MeetingMinutes2021
We meet online on Mondays at 16:00 UTC as a reference. See https://www.timeanddate.com/worldclock/meeting.html to get the time in your timezone.
Join us at https://meet.jit.si/AboutCode
The current meeting notes are at:
Here are the running meeting notes:
Participants:
- Ayan
- Tom
- Jono
- Tushar
- Philippe
Agenda:
- Misc. project updates
- Scancode.io and scancode toolkit codebase and resource roots
- FAQ, QA session community for aspiring contributors
- Workbench problem issue with SCTK
Discussions:
- Misc. project updates: SCTK, VC, SCIO, Univers version control lib
- Scancode.io and scancode toolkit codebase and resource roots: how to reuse SCTK codebase navigation in SCIO? There is a pending PR that needs discussion.
- FAQ, QA session community for aspiring contributors: we have a lot of questions, such as how to get started. Having good first issues is useful. We can try it once in early January. Promotion of the event is TBD: LinkedIn, Twitter, some ....
- Workbench problem issue with SCTK: seems quite manageable. It is a candidate for a good first issue.
Participants:
- Ayan
- Tom
- Jono
- Tushar
- Philippe
- Hritik
Topics:
- Status
- FetchCode
- ScanCode Workbench failing on latest SCTK: Ayan volunteered to look into this
- VulnerableCode DB
- Jono @JonoYang
- Harsh @harshagrawal523
- Hritik @Hritik14
- Tushar @TG1999
- Philippe @pombredanne
- status update
- question wrt. GSoC: we will participate? which projects?
Philippe: reviewing PR on SCTK:
- PR: new key phrases in licenses
- PR: from Ayan on package files
Other discussions:
- vulncode-db is shutting down. Maybe we can take over? We will need to collect the data asap before it goes dark
- There are two new implementations of purl: one in Ruby and one in Swift made by a GitHub contributor
- next year GSoC:
- which projects will we have?... TODO: we need to create and update the list of projects.
- FOSDEM:
- accepted as a devroom for Software Composition And Dependencies Management
- Ayan @AyanSinhaMahapatra
- Harsh @harshagrawal523
- Hritik @Hritik14
- Tushar @TG1999
- Philippe @pombredanne
- scancode-toolkit update
Philippe:
- New PR with a lot of new license detection rules https://github.com/nexB/scancode-toolkit/pull/2765
- New WIP PR by folks from softsense https://github.com/softsense/scancode-toolkit/pull/1 and https://github.com/softsense/scancode-toolkit/pull/2. This adds keywords to rules which should be present in matches or they will be dropped, making sure key words are present in matches. For example: GPL should be present in the match for a successful match to a GPL rule with the GPL word in it. This would potentially get rid of a lot of false positives from the matches.
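The key phrase idea above can be sketched as follows. This is a minimal illustration and not the code from the softsense PRs: the `Rule` class, its `key_phrases` field and the `keep_match` helper are hypothetical names, and the check is a plain case-insensitive substring test.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical, simplified stand-in for a license detection rule.
    license_key: str
    text: str
    key_phrases: tuple = ()


def keep_match(rule, matched_text):
    """
    Return True if every key phrase of the rule occurs in the matched text.
    A rule without key phrases always keeps its match.
    """
    lowered = matched_text.lower()
    return all(phrase.lower() in lowered for phrase in rule.key_phrases)


gpl_rule = Rule(
    license_key="gpl-2.0",
    text="licensed under the GPL version 2",
    key_phrases=("GPL",),
)

# A partial match that lost the "GPL" word would be dropped.
print(keep_match(gpl_rule, "licensed under the GPL version 2"))
print(keep_match(gpl_rule, "licensed under the version 2"))
```

A real implementation would presumably work on matched token positions rather than raw substrings, so that "LGPL" would not satisfy a "GPL" key phrase.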
Ayan:
- Changing the Package classes to the new PackageManifest classes https://github.com/nexB/scancode-toolkit/pull/2748
- Looking into PackageDatabase classes for system package manifests
- Would be working on PackageInstances and their creation next.
Philippe:
- Work on the Univers spec, which could eventually be moved to PackageURL, on a common version range syntax for all versioning schemes
Tushar: Waiting on the ONAP people for the PR.
Philippe answering Harsh:
- First make sure you're interested in our projects
- Read through https://aboutcode.readthedocs.io/en/latest/contributing.html for a start
- Look into the projects which interest you the most
- Starting with a beginners issue and trying to solve it would make the most sense
- PRs for small doc typos are not useful at all.
- Ayan @AyanSinhaMahapatra
- Tushar
- Jono @JonoYang
- Philippe @pombredanne
- univers update
- Go port of scancode-toolkit sponsored by interested parties - Initial reactions seem tepid: this is a big undertaking, which is not helped by an unfamiliarity with Go. We would also have to maintain two separate codebases.
- PackageManifest implementation/update by Ayan in scancode-toolkit
- Ayan @AyanSinhaMahapatra
- Jono @JonoYang
- Philippe @pombredanne
scancode TK: package files
Ayan:
- Replacing ecosystem specific package classes with PackageManifest classes, one for each package manifest type, so one or more PackageManifest classes would be present for each package ecosystem, and there would be standard functions for package manifest detection and for creating PackageManifest objects from manifest files, which would be overridden for each specific manifest type. This is WIP now, see https://github.com/nexB/scancode-toolkit/tree/2098-top-level-packages
- Next would be adding PackageInstance objects, which are created out of one or multiple package manifests and the files associated with the package instance. Every package ecosystem would have a PackageInstance class, which would override and implement functions to find all other package manifests for an instance, given one manifest, and to get all the files for that package instance.
- Functions related to package root are not touched, but these would be deprecated, and as this top level list of package instances is really package consolidation, the existing package consolidation has to be looked at after this.
Jono:
- Package roots are important in most cases as they can be used to get all the package resources, and there should be a way to keep doing this
Philippe:
- There exists no package root in a lot of specific package ecosystem cases, and what we need is to be able to get all the resources associated with a particular package instance and to be able to tag them as a part of that package instance. The upcoming changes are in that direction.
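The design discussed above could look roughly like the sketch below. It is only an illustration under stated assumptions: the method names (`is_manifest`, `from_text`), the `NpmPackageJson` subclass and the `build_instance` helper are hypothetical and are not taken from the WIP branch. The file collection shown here uses the manifest directory, which only fits ecosystems such as npm where a package root exists.

```python
import fnmatch
import json
import posixpath


class PackageManifest:
    """Hypothetical base class: one subclass per package manifest type."""
    file_patterns = ()

    def __init__(self, path, name=None, version=None):
        self.path = path
        self.name = name
        self.version = version

    @classmethod
    def is_manifest(cls, path):
        # Standard detection function, driven by per-type filename patterns.
        filename = posixpath.basename(path)
        return any(fnmatch.fnmatchcase(filename, p) for p in cls.file_patterns)

    @classmethod
    def from_text(cls, path, text):
        # Overridden for each specific manifest type.
        raise NotImplementedError


class NpmPackageJson(PackageManifest):
    file_patterns = ("package.json",)

    @classmethod
    def from_text(cls, path, text):
        data = json.loads(text)
        return cls(path=path, name=data.get("name"), version=data.get("version"))


class PackageInstance:
    """Hypothetical top-level package: its manifests plus its files."""

    def __init__(self, manifests, files):
        self.manifests = manifests
        self.files = files


def build_instance(manifest, all_paths):
    # Assumption for this sketch: the package files live under the
    # directory of the manifest. Other ecosystems need other strategies.
    root = posixpath.dirname(manifest.path)
    prefix = root + "/" if root else ""
    files = [p for p in all_paths if p.startswith(prefix)]
    return PackageInstance(manifests=[manifest], files=files)


paths = ["app/package.json", "app/index.js", "other/readme.txt"]
manifest = NpmPackageJson.from_text(
    "app/package.json", '{"name": "app", "version": "1.0.0"}'
)
instance = build_instance(manifest, paths)
print(instance.files)
```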
- Ayan @AyanSinhaMahapatra
- Ishu @ishukhr
- Jono @JonoYang
- Philippe @pombredanne
- Tushar @TG1999
- Hritik @Hritik14
Ayan: There has been a PR from @balakrishna-mukundaraj, https://github.com/nexB/scancode-toolkit/pull/2546, and there have been some installation failures there with a version mismatch. Philippe, could you check this out?
Philippe: There have been some problems since we switched to version constraints from having pinned requirements, and this needs to be inspected.
Hritik: On separating import and improve operations and revisit time travel.
There has been a conversation in the packageurl Gitter about having a logo, with an initial suggestion from @iamwillbar.
Tushar: Should an issue be added for this and should that be in packageurl-spec or packageurl-python? Ayan: It should be packageurl-spec as that is the main PURL repo; other repos are just tool implementations in different languages. Philippe: Yes, please add an issue.
- Tushar: Adding Black pre-commit hooks to packageurl-python, waiting for PR from @aditirao7 on that
- Philippe: new WIP spec for version ranges notation
- Tushar: PR to add Black to purl Python library needs review
- Jono @JonoYang
- Tom @tdruez
- Philippe @pombredanne
- Tushar @TG1999
- Hritik @Hritik14
This "vers" spec draft is at https://github.com/nexB/univers/blob/386eb32468c75ecac25ec872ea004b3257962946/VERSION-RANGE-SPEC.rst This will be moved to its own proper PR and is to address specific needs in purl and VulnerableCode. See: - https://github.com/package-url/purl-spec/issues/66 - https://github.com/package-url/purl-spec/issues/84 - https://github.com/package-url/purl-spec/pull/93 - https://github.com/nexB/vulnerablecode/issues/119 - https://github.com/nexB/vulnerablecode/issues/140
univers is the implementation, done in parallel.
https://github.com/package-url/packageurl-python/pull/64 has been submitted by @aditirao7 to add Black style to the purl python library and consider using pre-commit.
We discussed using pre-commit CI to automatically push fixes to the PR branches. None present liked this, so we would likely use pre-commit with local git hooks instead and have failures in the CI if the code style is not correct. Tushar @TG1999 and Hritik @Hritik14 will help set this up.
- Summarization and data aggregation: should it be in SCTK vs. SCIO. Or can we use a VirtualCodebase and SCTK plugins across the board?
- Drop Python 3.6 and Ubuntu 16 support
- How to deal with optimized build of Docker images such that lower layers are not rebuilt with each code changes. We need a ticket for this
- project statuses
- hacktoberfest
- we said we would put one project on deck for planning discussion each week... which one this week?: VulnerableCode
- Jono @JonoYang
- Tom @tdruez
- Philippe @pombredanne
- Tushar @TG1999
- Ayan @AyanSinhaMahapatra
- Hritik @Hritik14
- Avishrant @AvishrantsSh
- Should it be in SCTK vs. SCIO? Or can we use a VirtualCodebase and SCTK plugins across the board?
- The VirtualCodebase can be useful to walk a filesystem tree in a specific tree order
- Is it worth keeping consolidation SCTK plugins in SCTK? the Codebase model is not great when there is no DB.
- in particular the package pipeline in SCIO would need such features
There was no conclusion yet from the discussion, and ideally we would like to keep summary functions in both. But the programming model for data aggregation in SCTK is really problematic. For instance, to find a file or directory resource that has a certain attribute in a VirtualCodebase, the whole codebase needs to be walked and all resources of the codebase checked. Basically we are badly missing the ability to do queries, something that a DB is fairly good at.
So unless we can find a clean way to get the code working cleanly in both cases, we may deprecate aggregation in SCTK and update its migrated code in SCIO to leverage the DB.
Note, that the issue is not so much the performance (which is poor in SCTK for these features) but rather the programming model that is really painful.
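The difference between the two programming models can be sketched as follows. This is not SCTK or SCIO code: the resource dicts, the `resource` table and both helper functions are hypothetical, and an in-memory SQLite database stands in for the SCIO DB.

```python
import sqlite3

# Hypothetical, flattened view of a scanned codebase.
resources = [
    {"path": "src", "type": "directory", "license": None},
    {"path": "src/a.c", "type": "file", "license": "mit"},
    {"path": "src/b.c", "type": "file", "license": "gpl-2.0"},
]


def find_by_walk(resources, license_key):
    # Without a DB, every resource has to be visited and checked.
    return [r["path"] for r in resources if r["license"] == license_key]


def find_by_query(connection, license_key):
    # With a DB, the same question is a declarative, indexable query.
    rows = connection.execute(
        "SELECT path FROM resource WHERE license = ? ORDER BY path",
        (license_key,),
    )
    return [row[0] for row in rows]


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE resource (path TEXT PRIMARY KEY, type TEXT, license TEXT)"
)
connection.execute("CREATE INDEX resource_license ON resource (license)")
connection.executemany(
    "INSERT INTO resource VALUES (:path, :type, :license)", resources
)

print(find_by_walk(resources, "mit"))
print(find_by_query(connection, "mit"))
```

Both calls return the same paths; the point is that the query form stays one line as the questions get more complex, while the walk form needs new traversal code each time.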
- Everyone is A-OK to drop support for Python 3.6, which is EOL by the end of the year
- We will likely adopt 3.8 as a minimum version, which is the minimum version that Django will move to too.
- Ubuntu 16 is being dropped from Azure and has long been out of maintenance. SCTK and SCIO are now on Ubuntu 20 for core tests, and on Ubuntu 18 and other OSes for smoke tests
- We will use Ubuntu 20 or Debian buster as needed as a base OS for core tests.
- ... such that lower layers are not rebuilt with each code change.
- For now, the way we build most Docker images, where we first copy a project then install it, creates a layer for dependencies that is rebuilt each time the core code changes. In development this means constant rebuilds of everything.
- We want smaller images, faster builds and a way to publish pre-built Docker images.
We need a ticket for this: Tom to create this in SCIO
- VulnerableCode:
  - Hritik: working on refactoring with improvers
  - Hritik: how to share data efficiently in a decentralized way: BitTorrent?
  - Philippe: still working on deployment
  - TODO: add Azure pipelines to CI for tests
- ScanCode TK:
  - Ayan: one PR merged on changing output structure; working to use one class for each package manifest type, rather than one for each package ecosystem
  - Ayan: new reference scans diff and doc for SCTK https://github.com/nexB/scancode-toolkit-reference-scans
- ScanCode.io:
  - Jono: https://github.com/nexB/scancode.io-reference-scans needs some update.
  - Tom: released a new version with the latest TK. Dropped Celery for RQ, which is better at managing tasks.
- ExtractCode:
  - Philippe: Bugs and fixes require a new release
- FetchCode:
  - Pending PRs such as https://github.com/nexB/fetchcode/pull/70 ... which files need special attention? TODO: ask Alexander to set up some live review time or to help focus the review on the specific parts that need attention.
- Package URL:
  - Lots of PRs merged and chatter around OCI images and whether a purl is a location or not.
- already 10 days in, so we need to start fast or it will be too late
- Hritik: project board created in VC. other projects that want to participate should join there
- Ayan: Repos, issues and PR need to be tagged accordingly.
- Hacktoberfest: from @Hritik
- ScanCode.io homepage content
- Package URL for RPM and debs.
- FetchCode pending PRs
- ScanCode.io Keycloak PR
- Recent events presentations
- Jono @JonoYang
- Tom @tdruez
- Philippe @pombredanne
- Tushar @TG1999
- Alexander @aalexanderr
- Ayan @AyanSinhaMahapatra
- need just to tag issues with Hacktoberfest for beginners
- Tushar will look into it, sync with Hritik and report back
- Philippe to work on draft content
- https://github.com/package-url/packageurl-python/issues/62
- could be great for Hacktoberfest and there could be some minimal sponsoring available too
- Package URL for RPM and debs.
- https://github.com/nexB/fetchcode/pull/71 : ready to merge and merged
- https://github.com/nexB/fetchcode/pull/70
- https://github.com/nexB/fetchcode/pull/54
- https://github.com/nexB/fetchcode/pull/56
Using CLI tools like wget or curl vs. the standard library needs to be discussed in a ticket. See https://github.com/nexB/fetchcode/issues/72
This may be useful or needed for large files with multipart data.
Alexander is trying to deploy SCIO on a public cloud and wants to gate it behind a login using OpenID Connect: for now with GH, and later using LF as an identity provider.
Auth should be mostly configuration and not for only one specific auth server.
At the LF OSS Summit, we had two presentations that talked of ScanCode.io:
- Alexander and Krzysztof: https://osselc21.sched.com/event/lAR7/virtual-emerging-automated-license-compliance-for-containers-alexander-mazuruk-krzysztof-opasiak-samsung-rd-institute-poland?iframe=no
- Philippe: https://osselc21.sched.com/event/lAMB/virtual-software-composition-analysis-with-free-tools-philippe-ombredanne-aboutcodeorg-and-nexb-inc?iframe=no
Alexander and Krzysztof will also present to the Open Networking Edge + Kubernetes on October 11th: https://events.linuxfoundation.org/open-networking-edge-summit-north-america/program/schedule/
- changes in package/package-manifests reporting
- scancode TK output format documentation with diffs between versions
- @AyanSinhaMahapatra
- @JonoYang
- @tdruez
@AyanSinhaMahapatra:
Some documentation on how the scancode output data changes across versions is needed, as there are upcoming changes to both the package and license data structures. It would be nice to have a collection of sample codebases to scan, with the diffs generated with Sphinx and hosted, so that adopters can make sense of the changes easily. Is there something we can use whose scan would cover/show most scancode features in the data?
@AyanSinhaMahapatra:
Working on reporting package instances at top-level with data from possibly multiple package manifests and with the files present under that package. Design Doc at: https://docs.google.com/document/d/1cHAxXZ_VxwEDxRF4BcOXTSSjGp3-_tYLVxXTx2X8oC4/edit?usp=sharing
@JonoYang:
It would be useful to have:
- npm manifests and node_modules directories
- different python manifests in the same directory
to check these features of having package instances and one instance being created from multiple package manifests data. These should also be there in the samples part to effectively document and show diffs.
- planning process
- scancode TK format changes
- ONAP presentation
- license scanning campaign (Debian and Alpine)
- @aalexanderr
- @pombredanne
- @kopasiak
- @AyanSinhaMahapatra
- @tdruez
- @Hritik14
- @JonoYang
The idea would be to add a simple ROADMAP.rst to each repo. And ensure that each project gets its time in turns in the spotlight during the weekly call so that we can review and update the roadmaps, focusing on one at a time.
@pombredanne:
- Would recognize package manifests
- multiple manifests contribute to making a package
- generally the plan is to decouple low level scans/detections that are tied to a file and/or positions within a file, and conflate several of these in a single reported value, still keeping the details of the per-file and per-line matches.
For instance:
- multiple package manifests form one package and its files
- multiple license detections form one inferred license expression in a given file
- multiple copyright statements may refer to one copyright holder
@kopasiak:
- ONAP is a comprehensive platform for management and automation of network and telco services for easy scaling and monitoring
See https://docs.onap.org/en/latest/guides/onap-developer/architecture/onap-architecture.html#onap-architecture for more docs.
- License compliance is important to ONAP. The project is deployed using 100's of container images, mostly using Alpine Linux.
- Using ScanCode.io will help ensure that compliant and vetted images are used
@kopasiak:
- have some openstack infrastructure which can be used to scan packages
- for Alpine, which versions to scan?
- an estimate of the machine resources required is needed before starting
- whether CPU/RAM/DISK bound
@pombredanne:
- scanning is mostly CPU bound
- Versioning scancode toolkit
- Debian license improvement campaign, possibly also on alpine
- Alpine WIP with maintainers on how to get to a source package
- Docker/container model in SCIO
- @aalexanderr
- @pombredanne
- @Hritik14
- @JonoYang
- @TG1999
- @tdruez
There is a need to create a graph, with dot, of the dependencies of container images.
- there is a need for both a new data structure
- and new data to support it
Alexander will create a ticket for this, and will also enter a ticket to avoid re-scanning what has already been scanned, based on checksums.
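The checksum idea could be sketched like this. It is a minimal illustration, not ScanCode.io code: the `ScanCache` class and the `scan_function` callable are hypothetical, and SHA1 is only an assumed choice of checksum.

```python
import hashlib


def sha1_of(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


class ScanCache:
    """Hypothetical cache that reuses scan results for identical content."""

    def __init__(self, scan_function):
        self.scan_function = scan_function
        self.results_by_checksum = {}
        self.scan_count = 0

    def scan(self, data: bytes):
        checksum = sha1_of(data)
        if checksum not in self.results_by_checksum:
            # Only content that was never seen before is actually scanned.
            self.results_by_checksum[checksum] = self.scan_function(data)
            self.scan_count += 1
        return self.results_by_checksum[checksum]


# A stand-in "scanner" that only reports the content size.
cache = ScanCache(scan_function=lambda data: {"size": len(data)})
cache.scan(b"same layer content")
cache.scan(b"same layer content")
cache.scan(b"another layer")
print(cache.scan_count)
```

For container images this would matter most for layers and packages shared by many images, since each shared item would be scanned once.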
- Given a binary Alpine package, it is not possible to get to the corresponding source package directly. Each of community, main, non-free, scripts, testing, unmaintained needs to be tried in turn until the package name is found. This is problematic.
- Alexander will get in touch with Alpine maintainers... Mateuz has a pending patch on apktools to fix this.
- The idea of these projects is to organize campaigns to massively improve licensing documentation quality and contribute this upstream.
- first targets are Debian and Alpine.
- This will need some serious sponsoring: TBD with LF projects and other sponsors
- next step: Philippe to draft one pager so we can start engaging possible sponsors.
- calver is not super useful. We are switching back to plain semver. We can start at 22.0.0
- Alexander suggested why not just 30.0.0 instead? This will separate it from calver and make a nice round basis for next semver compatible releases.
- next step: Philippe to draft doc and use the new way on SCTK
- Alexander will be speaking at the OSPO conference and at Open kubernetes and will mention ScanCode.io!
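For the Alpine source package lookup noted in this list, the "try each repository in turn" approach can be sketched as below. The repository names come from the notes; the `find_source_repository` function and the in-memory index of package names are hypothetical, and a real lookup would read the Alpine indexes or aports tree instead.

```python
# Repository names as listed in the notes, tried in this order.
ALPINE_REPOSITORIES = (
    "community",
    "main",
    "non-free",
    "scripts",
    "testing",
    "unmaintained",
)


def find_source_repository(package_name, package_names_by_repository):
    """
    Return the first repository that contains package_name, or None.
    package_names_by_repository is a hypothetical mapping of
    repository name to a set of package names.
    """
    for repository in ALPINE_REPOSITORIES:
        names = package_names_by_repository.get(repository, ())
        if package_name in names:
            return repository
    return None


index = {"main": {"busybox", "musl"}, "community": {"htop"}}
print(find_source_repository("htop", index))
print(find_source_repository("busybox", index))
```

The cost of this approach is up to six lookups per package, which is why a direct binary-to-source reference would be preferable.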
- @aalexanderr
- @AyanSinhaMahapatra
- @pombredanne
Agenda:
- PR to FetchCode that is ready to merge
- Versioning data format on ScanCode toolkit
- Design update on package ScanCode models
- Misc: Debian package formats updates
- Adding image id to package model
- pip updates questions
Discussion:
Alexander:
- What remains to be done on pip attribution
- https://github.com/nexB/fetchcode/pull/70/files
Philippe:
- Adding an SPDX license identifier tag to files would be straightforward
Alexander:
- Should we support typing in fetchcode?
Philippe:
- It should be enforced and universally applied for it to be useful
- No need to change anything if typing is already added
Alexander:
- DCO check is failing on two commits: they are code from pip, so sign-offs were not added
Philippe:
- It doesn't have to be your code for you to sign off, you just need to have the rights to push it
Alexander:
- adding image ID to the ScanCode.io package model
Philippe:
- we should not add anything to our model that is specific to the pipeline, but this would be important
- let's put this in a ticket and also discuss next week with @tdruez
Philippe:
- Debian copyright scanning for structured files currently does not report line numbers. To add this, changes have to be made to debian-inspector, replacing the email module with a new parser with line tracking capabilities.
Alexander:
- Connection alive bug and one ONAP image scanning failed in scancode.io
Philippe:
- These are bugs and issues should be opened
Ayan:
- Versioning the Output Data Format for scancode introduced. --future-format flag now removed as it's hard to implement two supported versions.
- Changes to the package format planned, with new top-level packages (instances) and file level package metadata reporting. See https://github.com/nexB/scancode-toolkit/projects/10 for more details.
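A versioned output format lets consumers check compatibility before reading the data. The sketch below is only an illustration: the `headers` / `output_format_version` layout, the version value and both helper functions are assumptions made for this example, not a statement of the actual SCTK format.

```python
import json

# Hypothetical version value for this sketch.
OUTPUT_FORMAT_VERSION = "1.0.0"


def build_scan_output(files):
    # Assumed layout: the format version is stored in a top-level header.
    return {
        "headers": [{"output_format_version": OUTPUT_FORMAT_VERSION}],
        "files": files,
    }


def is_compatible(scan, supported_major=1):
    # Under semver, a consumer only needs to reject a different major version.
    version = scan["headers"][0]["output_format_version"]
    major = int(version.split(".")[0])
    return major == supported_major


serialized = json.dumps(build_scan_output([{"path": "README.rst"}]))
loaded = json.loads(serialized)
print(is_compatible(loaded))
print(is_compatible(loaded, supported_major=2))
```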
- @AvishrantsSh
- @akugarg
- @AyanSinhaMahapatra
- @JonoYang
- @pombredanne
- @TG1999
- @Hritik14
- @tdruez
Agenda:
- GSOC wrap-up
- Data Versioning in ScanCode Toolkit: discussing https://github.com/nexB/scancode-toolkit/issues/2653
- FetchCode session with Samsung: reporting on the discussion
For next week, we will have a 10/15 minute session on each GSoC project as a wrap-up, where each GSoC student will present their project and make a quick demo.
GSoC:
- AvishrantsSh: Wrapping GSoC things up, submitted the final version of evaluation and released a new version of the plugin on PyPI.
- Akanksha: Submitted the final version of evaluation, need help to wrap the LicenseMatch for unknown license detection.
- Hritik: Working on the new improver design for VulnerableCode and project documentation. Discussed imports
- (Pratik could not join)
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @pratikrocks
- @JonoYang
- @pombredanne
- @tdruez
- @TG1999
Agenda: - GSOC status
Akanksha:
- Following file references to other files in licensedcode
- For now this is done just in the same dir; should the whole codebase be covered?
Philippe:
- Looking only in the current directory is fine and should cover most cases
- The other case is "see license in root", and this is complex because finding the root is complicated and depends on context
- need to create a ticket for package ecosystem specific referenced file checks
Avishrant:
- working on making all the tests work for the GLC pipeline
- documentation on adding a new pipeline
- Is it okay to have the final report just as a .rst file instead of RTD?
Philippe:
- Yes perfectly okay as there is no RTD for the
Hritik:
- working on inference
- Not sure about having different confidence levels, would be inference if not full confidence
Philippe:
- Not sure on the naming of inference, needs refinement
- Would discuss in details in the vulnerablecode meeting tomorrow
Pratik
- Working on documentation, and final report
- Asked if it was okay to have the final report in the wiki
Philippe:
- having it as .rst files in RTD is best because there are tests, and it is better than a separate wiki
Ayan:
- need to remove the old wiki contents and link to corresponding RTD sites in deltacode
Ayan:
- GSoC evaluation forms will open today/tomorrow, deadline on 23rd for students.
- Will follow up on activating RTD for vulnerablecode and deltacode
Philippe:
- have pushed a release prep on fetchcode
- added some issues with fetchcode, on better tracing and other problems
- monorepo vs manyrepo, should have a discussion on this next week
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @pratikrocks
- @JonoYang
- @pombredanne
- @TG1999
Agenda: - GSOC status
Akanksha:
- Following file references to other files in licensedcode
- Added PR for adding referenced_filenames to API, working on feedback that it should be in matched_rule and not resource_attribute
- Added new licenses which were not detected
Avishrant:
- working on adding documentation for the GLC pipeline
Hritik:
- working on importer restructuring (some problems with Oval based importers, looking into them)
- added configure files for documentation
Ayan:
- Will follow up on adding RTD page for vulnerablecode
Pratik
- fixing the deltacode documentation, and adding additional documentation for the use of docker image
Ayan:
- We need to review PR https://github.com/nexB/fetchcode/pull/54
Tushar:
- Mostly ready to release as a package
- Will look into issues and ping for discussion
Philippe:
- Will review scancode.io PR which depends on this
Avishrant:
- Received a mail from Google on writing reports; where should the report live?
Ayan:
- Will share GSoC reports from previous years
- It is good to have them in RTD or wikis, instead of having blogs/docs present elsewhere, as they are more permanent links. This is beneficial for the project, for the participant to link to, and for future participants.
- It would be nice, but not mandatory, to link to blogs or other documentation on experience and POV, if there are any
Some Previous Reports:
- https://github.com/nexB/aboutcode/blob/master/docs/source/gsoc/gsoc19_final_report.rst
- https://scancode-toolkit.readthedocs.io/en/latest/contribute/gsoc19_final_report.html
- https://scancode-toolkit.readthedocs.io/en/latest/contribute/gsoc17_final_report.html
- https://gist.github.com/sbs2001/26d42784e738c078a97e3904e8833fc6
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @pratikrocks
- @JonoYang
- @pombredanne
Agenda: - GSOC status
Akanksha:
- Working on following file references to other files
- Question on whether existing unknown matches should be replaced with new resolved ones
Philippe:
- There are two cases:
  - when added to the license plugin, matches should not be replaced, just a new match added
  - in packagedcode, in specific package manifests (like npm), they can be replaced as these are the official specification for declaring licenses
Avishrant:
- the glc-pipeline repo is generated from skeleton
- working on packaging the pipeline; problems adding scancode.io as a requirement, have tried extras_require and installing from git
- adding test cases
Philippe:
- There are various solutions:
  - make scancode.io available on PyPI and then have it in dependencies
  - install scancode.io locally as a wheel (should do this to test now anyway)
  - have an installation script
Hritik:
- changing the structure of importer (did it for one importer)
- added basic files for documentation
- which distros are/should be supported and how to mention that in docs
Philippe:
- we need to run tests on CI to support distros
Ayan:
- Will add config and other files for basic RTD setup
Pratik
- Adding CSV output option in the deltacode CLI from script
- https://github.com/nexB/deltacode/issues/179 added later
Philippe:
- Usually a good idea to create a ticket first
Philippe:
- new scancode released
- Would make python 3.7-3.9 default as 3.6 nears EOL
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @pratikrocks
- @tdruez
- @TG1999
- @JonoYang
- @pombredanne
Agenda: - GSOC status - fetchcode - scancode-toolkit updates
Avishrant:
- will work on memory issues in go side (at conversions)
- documentation on the pipeline
Philippe:
- important to fix the bugs but more important to finish first
- create a ticket on that and postpone that
Thomas:
- Have published a repo for the glc pipeline, https://github.com/nexB/scancodeio.io-pipeline-glc_scan
Pratik
- having scancode options in deltacode results
- issues pointed by steven (on removing redundant models)
- work on Documentation
Philippe:
- Ping for session, some planning on the fingerprints side
Hritik:
- implemented rate limiters
- have to restructure importers and make it easier to contribute importers
- sorting imports and tests
- docker bug fix (review needed)
- subversion http webdav
Philippe:
- we want to design an authentication service which could be common with scancode.io
- make subversion as a requirement and use xml output
- discussion on subversion
- PR for nixOS packaging was submitted. CI being brittle because of that
Akanksha:
- (by text) could not join today not feeling well
Philippe:
- refactoring in fetchcode
- working on alpine apkbuild parsers
- project versioning (semver vs calver) https://github.com/nexB/scancode-toolkit/issues/2601
Jono:
- Extractcode bug replacing spaces with underscore, added fixes for that
- update package detection for miu files
- new releases for commoncode, extractcode
Ayan:
- working on parsers for cocoapod lockfiles (getting dependencies of xcode projects and link to their specs json)
- getting package objects for parsing podspec.jsons which are present in Cocoapods/Specs
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @majurg
- @pratikrocks
- @tdruez
- @pombredanne
Agenda: - GSOC status
Akanksha:
- working on PR unknown-unknowns, adding unknowns matches where there are none based on n-grams (some blockers, will continue discussion)
- also working on following license references to another file
Ayan:
- It would make sense for now to follow [this comment](https://github.com/nexB/scancode-toolkit/issues/1364#issuecomment-869995820) but just for file references in the same directory, and to implement it in a post-processing step in the license scan plugin (process_codebase function) instead of in a separate post-scan plugin.
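Resolving references only within the same directory could be sketched as follows. This is an illustration and not the plugin code: the `resolve_referenced_files` helper and the flat set of codebase paths are hypothetical simplifications of what a `process_codebase` step would work with.

```python
import posixpath


def resolve_referenced_files(resource_path, referenced_filenames, codebase_paths):
    """
    Return the paths of referenced files that exist next to resource_path.
    References to files outside of the same directory are left unresolved.
    """
    directory = posixpath.dirname(resource_path)
    resolved = []
    for filename in referenced_filenames:
        candidate = posixpath.join(directory, filename)
        if candidate in codebase_paths:
            resolved.append(candidate)
    return resolved


codebase_paths = {"pkg/setup.py", "pkg/LICENSE", "LICENSE.txt"}

# "pkg/setup.py" says "see LICENSE": only the sibling file is followed.
print(resolve_referenced_files("pkg/setup.py", ["LICENSE", "COPYING"], codebase_paths))
```

The license matches of the resolved file could then be added to the referencing file, keeping the original match in place.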
Pratik
- PRs merged and more in review
- work on Documentation
Steven:
- Will add more issues on the specific tasks.
Ayan:
- Updated project board to have only ToDo, In Progress and Done columns, arrange tickets accordingly.
Avishrant:
- rebased on google licenseclassifier upstream
- working on mapping for license (glc handles notices and headers differently than scancode)
- working on fixing bug that is caused by filestreams opened
Philippe:
- We need to focus on having the format conversion and not on modifying tool behaviour
- Binary files/files larger than a size could be ignored
- Open a ticket in google licenseclassifier with the problem
Hritik:
- Fixing mattermost and mozilla importers
- Rate Limiters
- Opened https://github.com/nexB/vulnerablecode/issues/506
Philippe:
- Open ticket on API rate limiters politely
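A client-side rate limiter for importers could be as simple as the sketch below. It is not the VulnerableCode implementation: the `RateLimiter` class is hypothetical, and the clock and sleep functions are injectable only so the behavior can be checked without waiting.

```python
import time


class RateLimiter:
    """Hypothetical limiter that enforces a minimum delay between calls."""

    def __init__(self, calls_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / calls_per_second
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        # Call this right before each API request.
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


# Fake clock and sleep, so the example runs instantly.
current_time = [0.0]
slept = []


def fake_sleep(seconds):
    slept.append(seconds)
    current_time[0] += seconds


limiter = RateLimiter(2, clock=lambda: current_time[0], sleep=fake_sleep)
limiter.wait()  # first call goes through immediately
limiter.wait()  # second call is delayed by the minimum interval
print(slept)
```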
Philippe:
- working on versioning the JSON format of SCTK
- accepting @tdruez's suggestion on having that as an experimental feature: it will not be a default change, and will be made default in later versions
- Presented at UCSC's CROSS, on Open Source Compliance License Tools. Link: https://www.crowdcast.io/e/open-source-compliance
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @pratikrocks
- @tdruez
- @pombredanne
Agenda: - GSOC status
Evaluation for Phase 1
- Output format changes in SCTK
Pratik:
- test CLI
- work on Documentation
- large PR ready to merge
- TODO: have a session to work on fingerprints formats
Akanksha:
- working on PR work unknowns
- will update to have a single
Avishrant:
- rebased on google licenseclassifier upstream
- working on mapping for license
- working on test cases for the module and now for the pipeline
Hritik:
- Fixing mattermost and mozilla importers
- Found new JSON API to get all mozilla products versions
Philippe:
- discussion of versioning the JSON format of SCTK
- Proposal:
  - add a new top level version format attribute
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @majurg
- @pratikrocks
- @tdruez
- @TG1999
- @pombredanne
- GSoC project updates
- Other projects
@ ScanCode.io
- Have a working pipeline
- Submitted upstream ticket for Go Classifier
- Rebased modifications for Go Classifier
- are there ways to ignore files like binaries?
  - best would be to have that in ScanCode.io
- Should I use the skeleton?
  - this can wait.
TODO: we need to make a presentation on how to use the skeleton next week
@tdruez:
- Made some tests on the pipeline and have some issues to review
- Working on adding new data structure to license: done
- What is next? Either improve low score detection of licenses or unknown/unknown license detection
  - @ayan and @philippe: unknown/unknown license detection
  - @ayan and @philippe: should be in ScanCode TK
- Should "See license" be worked on next?
  - @ayan and @philippe: unknown/unknown license detection is best first
- adding extensive documentation on DeltaCode
  - the wiki part should be best moved in the main repo docs directory
- PR for 1st phase ready but some CI issues on Windows
  - create a ticket as this may be a problem with an outdated skeleton configure.bat file
- Working on importers
  - fixing mozilla importers
  - next is openstack
- Some issues:
  - issues in the way: should I solve first or later?
  - documentation is weak and especially at the low level of the code
  - adopt doc standard from Linux Kernel
- Timing of VulnerableCode meeting needs to be worked out
@TG1999 @ FetchCode
- major restructuring of the code reviewed; it needs to be reviewed by a second pair of eyes
@TG1999 in general: we should have smaller PRs when possible. Big ones are hard to review
New Gitter room created for off topic discussion from @Hritik14 request https://gitter.im/aboutcode-org/coffee-room
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @pratikrocks
- @tdruez
- GSoC project updates
- Have been able to make a pipeline with LicenseClassifier
- Working on Multiprocessing and efficiency issues
- Adding copyright detection as LicenseClassifier V2 doesn't have copyrights detected
@tdruez:
- It does not make sense to add functionalities to the projects, we just want to create a pipeline with the project as it is, so no need to work on adding copyright detection
- Should work on documenting the process of adding the pipeline, the issues faced, installing the package and running the pipeline
- Making a branch on scancode.io proper for review and feedback would be better, and point to docs to install the package and run the pipeline
- create Unit tests on running the pipeline
- [sound issues so could not speak, posted status on discuss]
- Hey! @/all I was having some sound issues in today's meeting.
- I was first working on the addition of a new flag in the models definition, which is completed!
- Moving on to the next part, i.e. reporting unknown licenses separately: I have created a PR nexB/scancode-toolkit#2578.
- As Ayan said, instead of having a subsection in licenses itself we need to have a separate section for "unknown" ones.
- Also I am working in parallel on "Following indirect references" in files.
- Pushed PR: https://github.com/nexB/scancode-toolkit/pull/2578 on reporting unknown licenses separately
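To make the "separate section" idea above concrete, here is a minimal sketch that splits detected license matches into known and unknown ones. The `is_unknown` flag is assumed to be the new flag from the models definition, and the match dictionaries and the `licenses` / `unknown_licenses` section names are illustrative only; this is not the code from the PR.

```python
def split_unknown_licenses(matches):
    """
    Split license matches into two separate sections instead of nesting
    unknown matches inside the regular licenses section.

    Each match is assumed to be a dict that may carry an "is_unknown" flag;
    a missing flag is treated as a known license.
    """
    known = []
    unknown = []
    for match in matches:
        if match.get("is_unknown", False):
            unknown.append(match)
        else:
            known.append(match)
    return {"licenses": known, "unknown_licenses": unknown}


if __name__ == "__main__":
    detected = [
        {"key": "mit", "is_unknown": False},
        {"key": "unknown", "is_unknown": True},
        {"key": "apache-2.0"},
    ]
    print(split_unknown_licenses(detected))
```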
@ayan:
- The https://github.com/nexB/scancode-toolkit/pull/2548 PR is almost done; there's one test failure, but it may be unrelated to what's added (?) I'll check this.
- On #2578, we need to add unknown_licenses as a CodebaseResource, rather than adding it inside licenses.
- Need to sync with Philippe on the design and how to go ahead; will set up a sync meeting for tomorrow
- Please post a status update on the Chat
- Work on virtualcodebase is ready for review and comments on the PR have been addressed
- Working on documenting the changes made
- Also add general docs to be posted in RTD: https://github.com/nexB/deltacode/issues/133
- Working on test speed improvements: https://github.com/nexB/vulnerablecode/pull/490
- General work on vulnerablecode has progressed, would focus on importers
- Working on adding importers, will push PRs on that next
@tdruez:
- Please make sure you leave status updates and post regularly to keep us updated on the work, and let us know about blockers.
- Keep status updates on the main public chat, as others would be able to see them too.
- @akugarg
- @AvishrantsSh
- @AyanSinhaMahapatra
- @Hritik14
- @JonoYang
- @pratikrocks
- GSoC project updates
- Working on improving license data model definition
- Moving onto reporting known licenses and unknown licenses separately
- Work on virtualcodebase is ready for review
- Working on additional test cases, documenting the changes made, remove unused dependencies from project
- Working on speed improvements
- Begin adding importers, create Contributing.md file
- Worked on scancode.io pipeline for google license classifier
- @Hritik14 asked if we should also discuss documentation updates related to GSoD in the GSoC call
- It would behoove us to combine both calls so we are on the same page regarding documentation
- Reminder that evaluations start on 2021-07-12
- Avishrant @AvishrantsSh
- Shivam @sbs2001
- Tushar @tg1999
- Philippe @pombredanne
- Thomas @tdruez
- Dennis @DennisClark
- Pratick @pratikrocks
- Steven @majurg
- Akanksha @akugarg
- Ayan Mahapatra @AyanSinhaMahapatra
- Hritik @Hritik14
- GSoC projects status
- ScanCode.io integration with VulnerableCode
- Q: We need a project board for each GSoC project
- A: Philippe to send invites as GitHub committers to: Akanksha on ScanCode Toolkit, Pratick on DeltaCode, Avishrant on ScanCode.io
- Working on ScanCode TK license models changes to add "is_unknown" flag. Had questions on models resolved by Ayan.
- Made PR on CommonCode that was merged.
- other PR for fingerprint support is pending for review. Steven will check out.
- Discussion about options for Python integration for Go: either as a command line subprocess or using a shared library integration (native, cffi or ctypes)
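The command line subprocess option discussed above could look roughly like the sketch below. It assumes the Go tool is built as an executable that takes a path argument and prints JSON on stdout; that interface and the `run_classifier` helper are hypothetical. To keep the example runnable without a Go binary, a Python one-liner stands in for it.

```python
import json
import subprocess
import sys


def run_classifier(command, path, timeout=60):
    """
    Run an external classifier `command` (a list of arguments) on `path`
    and return its stdout parsed as JSON.
    """
    completed = subprocess.run(
        [*command, path],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Stand-in for the Go executable: prints a JSON document for its argument.
    fake_classifier = [
        sys.executable,
        "-c",
        "import json, sys; print(json.dumps({'path': sys.argv[1], 'licenses': []}))",
    ]
    print(run_classifier(fake_classifier, "LICENSE"))
```

The subprocess route keeps the Go code untouched and isolates crashes, at the cost of one process start per call. The shared library route (for instance building the Go code with `go build -buildmode=c-shared` and loading it through ctypes or cffi) avoids that overhead but requires exporting a C-compatible API from the Go side.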
Some questions:
- Q: I have some issues with ScanCode.io pipelines failing
- A: best is to enter an issue with error log
- Q: Do I need to support multiple OSses?
- A: not needed. For your project this is only Linux
- Working on performance for VulnerableCode, with major performance improvements
- Working on improving test speed
Some questions:
- Q: what should be our main channels of communications?
- A: instant discussions on chat, anything that needs to persist goes in tickets
- Q: GSoC evaluations: do we need daily work log?
- A: Nope. The code and commits are all that's needed, but you are welcome to keep your own if you find it useful
New importers additions/questions from Shivam pending in the chat
Tushar: New contribution for fetching details for Alpine Docker images for https://github.com/nexB/scancode.io/issues/194