Add 2024 GSoC report Compute Summary for all detected packages
Signed-off-by: swastik <swastkk@gmail.com>
swastkk committed Aug 23, 2024
1 parent a9508b4 commit 0e4975e
Showing 2 changed files with 74 additions and 1 deletion.
11 changes: 10 additions & 1 deletion docs/source/archive/gsoc-toc.rst
@@ -6,7 +6,16 @@ GSoC -- Google Summer of Code
open source software development. GSoC is completely online designed to encourage university
student participation in open source software development.
It was started by Google in 2005.
More about GSoc - <https://summerofcode.withgoogle.com/about/>_
More about GSoC - `<https://summerofcode.withgoogle.com/about/>`_

GSoC 2024
---------

.. toctree::
   :maxdepth: 2

   gsoc/reports/2024/scancode_toolkit_swastkk


GSoC 2022
---------
64 changes: 64 additions & 0 deletions docs/source/archive/gsoc/reports/2024/scancode_toolkit_swastkk.rst
@@ -0,0 +1,64 @@
========================================================================
Compute summary for all detected packages.
========================================================================


| **Organization:** `AboutCode <https://aboutcode.org>`_
| **Project:** `ScanCode Toolkit <https://github.com/aboutcode-org/scancode-toolkit>`_
| **Mentee:** `Swastik Sharma (swastkk) <https://github.com/swastkk>`_
| **Mentors:** Philippe Ombredanne, AyanSinhaMahapatra, AvishrantSh, Jonathan Yang, Jay Kumar

Overview
--------

Previously, the summary was computed only at the codebase level. That summary includes fields such as
``license_clarity_score``, ``declared_holder``, ``other_license_expressions`` and many more. This project aims
to improve scanning accuracy by computing the summary and license clarity score for each package and its files,
rather than for the entire scan. This involves enhancing the package models and ensuring proper attribute
collection for all package ecosystems.

Implementation
--------------

All the work I did is contained in `this single PR <https://github.com/aboutcode-org/scancode-toolkit/pull/3792>`_.
I added a new command line option called ``--package-summary`` that computes a package-level
summary for each package within a single codebase. The package-level summary covers the
``license_clarity_score`` calculation and the population of package attributes such as ``copyright``,
``holder``, ``other_license_expression`` and ``notice_text``. This option must be used together with
two other options:

- ``--classify``, which helps ScanCode further classify scanned files and directories to determine
  whether they fall into the ``legal``, ``readme``, ``top-level`` or ``manifest`` categories.
- ``--package`` (or ``-p``), which detects various package manifests, lockfiles and package-like data,
  assembles codebase-level packages and dependencies from the package data detected in files,
  and tags files that are part of a package.
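
The option requirements described above can be illustrated with a small Python sketch. It is not
ScanCode's own code: the helper names ``build_scan_command`` and ``missing_required_options`` are
hypothetical, the input and output paths are placeholders, and ``--json-pp`` (an existing ScanCode
output option) is an assumption about how the results would be written. The ``--package-summary``
option itself exists only in the PR linked above, and how ScanCode actually enforces the dependency
is not covered here.

```python
import shlex

# Options that, according to this report, must accompany --package-summary.
REQUIRED_WITH_PACKAGE_SUMMARY = ("--classify", "--package")


def build_scan_command(input_path, output_path, package_summary=True):
    """Return an argument list for a scancode run (hypothetical helper)."""
    args = ["scancode", "--package", "--classify"]
    if package_summary:
        args.append("--package-summary")
    # Assumption: results are written as pretty-printed JSON via --json-pp.
    args += ["--json-pp", output_path, input_path]
    return args


def missing_required_options(args):
    """List the options required by --package-summary that are absent."""
    if "--package-summary" not in args:
        return []
    present = set(args)
    if "-p" in present:
        # -p is the short form of --package.
        present.add("--package")
    return [opt for opt in REQUIRED_WITH_PACKAGE_SUMMARY if opt not in present]


command = build_scan_command("samples/", "results.json")
assert missing_required_options(command) == []
# Print a copy-pasteable command line.
print(shlex.join(command))
```

Passing ``["scancode", "--package-summary"]`` to ``missing_required_options`` returns both required
options, which mirrors the rule that the package summary depends on classification and package detection.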

This change allows users to get a more refined summary for each individual package present in a codebase.
This feature also improves package assembly for various package ecosystems such as npm, python-whl, rust and rubygems.


Finally, all these changes are tested through multiple unit tests validating both correct
behavior and error handling as needed.

Post GSoC
---------

I would like to merge this PR into ScanCode Toolkit, so that users can use this feature
to expand their package and codebase scanning capabilities.

Links
-----

`Project idea <https://github.com/aboutcode-org/aboutcode/wiki/GSOC-2024-Project-Ideas#compute-summary-for-all-detected-packages>`_

`Official GSoC project page <https://summerofcode.withgoogle.com/programs/2024/projects/JzMlDtnM>`_

`GSoC Proposal <https://docs.google.com/document/d/1TcGqQVzXhTkz6Pmu9UaXAr4R4q1rlT4tof7H7dsVG0o/edit?usp=sharing>`_

Acknowledgements
----------------

I would like to thank my mentors:

- `@pombredanne <https://github.com/pombredanne>`_
- `@AyanSinhaMahapatra <https://github.com/AyanSinhaMahapatra>`_
- `@AvishrantSh <https://github.com/AvishrantSsh>`_
- `@35C4n0r <https://github.com/35C4n0r>`_

The weekly calls were a great help, and the special 1:1 calls with `@AyanSinhaMahapatra` and `@pombredanne`
were amazing. Thank you for your time and your patience!
