refactor: accept provenance data in artifact pipeline check #872

Open · wants to merge 8 commits into base: staging
Conversation

@behnazh-w (Member) commented Sep 27, 2024

Refactoring the artifact pipeline detection check

  • Renames mcn_infer_artifact_pipeline_1 to mcn_find_artifact_pipeline_1.
  • This check can support all the package registries now.
  • Modifies the check fact table schema by adding new columns and allowing some existing columns to be nullable. This change enables us to store the reasons for check failures, such as when a GitHub workflow run is deleted, which may result in some previous columns lacking values.
  • Improves the heuristics, e.g., if an artifact is published before the corresponding code is committed, there cannot be a CI pipeline that triggered the publishing.
  • This check depends on the deploy command identified by the mcn_build_as_code_1 check. If a deploy command is detected, this check will attempt to locate a successful CI pipeline that triggered the step containing the deploy command.
  • When a verifiable provenance is found for an artifact, we use it to obtain the pipeline trigger. Otherwise, we use heuristics to find the triggering pipeline.
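The publish-before-commit heuristic above can be sketched as follows (a minimal illustration with a hypothetical helper name, not the actual Macaron code):

```python
from datetime import datetime, timezone

def pipeline_publish_possible(commit_time: datetime, publish_time: datetime) -> bool:
    """No CI pipeline run for a commit can have triggered a publish that
    happened before the commit itself existed."""
    return publish_time >= commit_time

# An artifact published a day before its commit cannot have come from CI.
commit = datetime(2024, 9, 1, tzinfo=timezone.utc)
```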

Improvements to mcn_build_as_code_1

  • If a provenance is found, we obtain the workflow that has triggered the artifact release.
  • Add support for Reusable GitHub Actions that perform automatic deployment. Since we do not analyze the external Reusable GitHub Actions, we use an allow list of approved Actions.
  • A new function, infer_confidence_deploy_workflow, is added to BaseBuildTool to infer the confidence for such Reusable workflows.

The store_inferred_build_info_results function

  • Renamed store_inferred_provenance to store_inferred_build_info_results.
  • To avoid confusion, we avoid using the term inferred provenance here and instead simply store build-related information in the context object provided to checks.
  • CIInfo["provenances"] is also renamed to CIInfo["build_info_results"].

Provenance Extractor

  • New abstractions added to the provenance extractor to reuse the logic for extracting information such as ProvenanceBuildDefinition and ProvenancePredicate. With these new abstractions, we don't need to hardcode the expected buildType value while processing a provenance.

find_publish_timestamp

  • Added an API that can obtain the artifact timestamp for all the supported package registries.
  • By default we use deps.dev to obtain the timestamp except for Maven artifacts because we have observed that Maven Central has more accurate results.
  • Decoupled the Maven Central search API from the repository, making the hostname fully configurable to enable offline testing with a localhost server.

Tutorial and integration tests

  • Changed the Detecting a malicious Java dependency uploaded manually to Maven Central tutorial to Detecting Java dependencies manually uploaded to Maven Central.
  • Used the log4j-core artifact instead of guava, since guava has an automated deployment workflow.
  • Fixed the integration tests and added a new one for log4j-core.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Sep 27, 2024
@behnazh-w behnazh-w force-pushed the behnazh/refactor-infer-publish-check branch 5 times, most recently from ac6cbcd to 7eac146 Compare October 2, 2024 09:47
@behnazh-w behnazh-w force-pushed the behnazh/refactor-infer-publish-check branch from 1586789 to e592c5d Compare October 29, 2024 05:29
@behnazh-w behnazh-w marked this pull request as ready for review October 29, 2024 05:29
@behnazh-w behnazh-w requested a review from benmss October 29, 2024 05:30
Signed-off-by: behnazh-w <behnaz.hassanshahi@oracle.com>
@behnazh-w behnazh-w force-pushed the behnazh/refactor-infer-publish-check branch from e592c5d to c9761dc Compare November 6, 2024 06:03
actions/setup-java
# Parent project used in Maven-based projects of the Apache Logging Services.
apache/logging-parent/.github/workflows/build-reusable.yaml
# This action can be used to deploy artifacts to a JFrog artifactory server.
Should this entry belong to builder.maven.ci.deploy instead?

@@ -494,7 +503,7 @@ artifact_extensions =
# Package registries.
[package_registry]
# The allowed time range (in seconds) from a deploy workflow run start time to publish time.
publish_time_range = 3600
publish_time_range = 7200
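For illustration, a window check like the one this setting configures might look as follows (hypothetical helper, assuming the window is measured from the deploy workflow run start to the publish time):

```python
from datetime import datetime, timedelta, timezone

PUBLISH_TIME_RANGE = 7200  # seconds, mirroring the [package_registry] setting

def within_publish_window(run_start: datetime, publish_time: datetime) -> bool:
    """Attribute a publish to a deploy workflow run only if it happens
    within the configured window after the run started."""
    delta = publish_time - run_start
    return timedelta(0) <= delta <= timedelta(seconds=PUBLISH_TIME_RANGE)
```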
@tromai tromai Nov 10, 2024
I wonder why we decided to increase the publish time range? (e.g., in what case is a time range of 3600 not enough?)

Should we rename this file to reflect the new name of the check?

request to fetch metadata about the package, and extracts the publication timestamp
from the response.

Note: The method expects the response to include a ``version`` field with a ``publishedAt``
Does deps.dev have any reference about the format of this version field? It would be good to include that here if that page is available.

# implemented at the beginning of the analyze command to ensure that the data
# is available for subsequent processing.

base_url_parsed = urllib.parse.urlparse(registry_url or "https://api.deps.dev")
If the default value of registry_url is https://api.deps.dev, could we make the type of this parameter registry_url: str = "https://api.deps.dev"?

# is available for subsequent processing.

base_url_parsed = urllib.parse.urlparse(registry_url or "https://api.deps.dev")
path_params = "/".join(["v3alpha", "purl", encode(purl).replace("/", "%2F")])
The encode function here allows you to specify the set of safe characters that will not be encoded (see here). If we replace all / anyway, should we leverage this safe parameter?
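To illustrate the reviewer's point, urllib.parse.quote keeps "/" unescaped by default (it is in the default safe set), but encodes it when safe="" is passed:

```python
from urllib.parse import quote

purl = "pkg:maven/org.apache.logging.log4j/log4j-core@2.24.1"
# Default: "/" is in quote()'s safe set, so it survives encoding and a
# separate .replace("/", "%2F") step is needed afterwards.
default_encoded = quote(purl)
# With safe="", "/" is percent-encoded up front, removing the extra step.
strict_encoded = quote(purl, safe="")
```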

logger.debug("Found timestamp: %s.", timestamp)

try:
return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
If the timestamp value is already in ISO 8601 format, why do we need to perform extra modification on it before providing it to datetime.fromisoformat?


try:
return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
except (OverflowError, OSError) as error:
I wonder in what scenarios OverflowError and OSError happen.
As far as I know, datetime.fromisoformat raises:

  • TypeError if the input is not a string.
  • ValueError if the date is incorrect.
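The reviewer's point can be illustrated: the replace("Z", "+00:00") call matters on Python versions before 3.11, which reject a trailing "Z" in fromisoformat, and malformed input raises ValueError (not OverflowError or OSError):

```python
from datetime import datetime, timedelta

ts = "2024-11-06T06:03:00Z"
# Rewriting the "Z" suffix to an explicit UTC offset keeps parsing
# portable across Python versions (3.11+ accepts "Z" directly).
parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))

# Malformed input raises ValueError.
try:
    datetime.fromisoformat("not-a-timestamp")
    raised = False
except ValueError:
    raised = True
```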

InvalidHTTPResponseError
If the URL construction fails, the HTTP response is invalid, or if the response
cannot be parsed correctly, or if the expected timestamp is missing or invalid.
NotImplementedError
nit: Should we document this exception if it's not raised in this implementation? 🤔

@tromai tromai Nov 12, 2024

I have some comments on this file that we don't need to fix in this PR:

purl_object = PackageURL.from_string(purl)
except ValueError as error:
logger.debug("Could not parse PURL: %s", error)
query_params = [f"q=g:{purl_object.namespace}", f"a:{purl_object.name}", f"v:{purl_object.version}"]
purl_object.version has type str | None. If it's None, the last query param would be 'v:None'. I don't think this has the same behavior as the old implementation: there, if version is None, no extra query param is added. I wonder if this new behavior is intended?

Comment on lines 38 to +42
provenances: Sequence[DownloadedProvenanceData]
"""The provenances data."""

build_info_results: InTotoV01Payload
"""The build information results computed for a build step. We use the in-toto 0.1 as the spec."""
@tromai tromai Nov 12, 2024

It's mentioned in the PR description that CIInfo["provenances"] is also renamed to CIInfo["build_info_results"]. Are we planning to remove this provenances attribute here too?

Comment on lines +172 to +175
# or Reusable GitHub Action to be be a GitHubJobNode.
if not isinstance(job, GitHubJobNode):
continue

Suggested change
# or Reusable GitHub Action to be be a GitHubJobNode.
if not isinstance(job, GitHubJobNode):
continue
# or Reusable GitHub Action to be a GitHubJobNode.
if not isinstance(job, GitHubJobNode):
continue

Comment on lines +669 to +672
if build_type := json_extract(statement["predicate"], ["buildType"], str):
return build_type

return json_extract(statement["predicate"], ["buildDefinition", "buildType"], str)
It might be good to document why we have two different ways to extract the build type:

  • json_extract(statement["predicate"], ["buildType"], str)
  • json_extract(statement["predicate"], ["buildDefinition", "buildType"], str)

Comment on lines +697 to +711
build_type = ProvenancePredicate.get_build_type(statement)
build_defs: list[ProvenanceBuildDefinition] = [
SLSAGithubGenericBuildDefinitionV01(),
SLSAGithubActionsBuildDefinitionV1(),
SLSANPMCLIBuildDefinitionV2(),
SLSAGCBBuildDefinitionV1(),
SLSAOCIBuildDefinitionV1(),
WitnessGitLabBuildDefinitionV01(),
]

for build_def in build_defs:
if build_def.expected_build_type == build_type:
return build_def

raise ProvenanceError("Unable to find build definition in the provenance statement.")
A minor improvement could be to differentiate between the 2 cases in which we raise an exception:

  • There is no build definition, which happens when ProvenancePredicate.get_build_type(statement) returns None.
  • There is a build definition, but we don't support its value (the condition build_def.expected_build_type == build_type in the for loop is never met).

Right now we raise the same exception message for both cases.

@@ -355,3 +358,354 @@ def check_if_repository_purl_and_url_match(url: str, repo_purl: PackageURL) -> b
purl_path = f"{repo_purl.namespace}/{purl_path}"
# Note that the urllib method includes the "/" before path while the PURL method does not.
return f"{parsed_url.hostname}{parsed_url.path}".lower() == f"{expanded_purl_type or repo_purl.type}/{purl_path}"


class ProvenanceBuildDefinition(ABC):
@tromai tromai Nov 12, 2024
I wonder if these or some of these abstractions should be put within the package src/macaron/slsa_analyzer/provenance as they are closely related to the provenance format 🤔 ?
I think only find_build_def and get_build_type should remain here as a static function.

Comment on lines +132 to +133
except NotImplementedError:
continue
@tromai tromai Nov 13, 2024
I think catching NotImplementedError here strongly suggests that find_publish_timestamp in subtypes of PackageRegistry is expected to raise NotImplementedError. I believe this is not the case, because only JfrogMavenRegistry will raise it. A better way to communicate this would be to explicitly skip running find_publish_timestamp if registry_info is of type JfrogMavenRegistry.

        for registry_info in ctx.dynamic_data["package_registries"]:
            if isinstance(registry_info.package_registry, JfrogMavenRegistry):
                # We currently don't support this.
                continue 
            if registry_info.build_tool.purl_type == ctx.component.type:
                try:
                    artifact_published_date = registry_info.package_registry.find_publish_timestamp(ctx.component.purl)
                    break
                except InvalidHTTPResponseError as error:
                    logger.debug(error)

What do you think?

This is because I think NotImplementedError should only be raised when we mistakenly call a method that is not implemented (similar to an unexpected critical error).

Comment on lines +415 to +416
This method is intended to be implemented by subclasses to extract
specific invocation details from a provenance statement.
Should we remove these 2 lines, as this is a concrete method? 🤔
The same comment applies to SLSAGithubActionsBuildDefinitionV1.get_build_invocation.

tuple[str | None, str | None]
A tuple containing two elements:
- The first element is the build invocation entry point (e.g., workflow name), or None if not found.
- The second element is the invocation URL or identifier (e.g., job URL), or None if not found.
@tromai tromai Nov 13, 2024
I'm not clear about the difference between the invocation URL and the identifier. Does it mean that if the second element of the returned tuple is not None, it could be a URL or an arbitrary string serving as the "identifier" of the workflow run?

# TODO: change this check if this issue is resolved:
# https://github.com/orgs/community/discussions/138249
if datetime.now(timezone.utc) - timedelta(days=400) > timestamp:
logger.debug("Artifact published at %s is older than 410 days.", timestamp)
Suggested change
logger.debug("Artifact published at %s is older than 410 days.", timestamp)
logger.debug("Artifact published at %s is older than 400 days.", timestamp)
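The age guard around this log line can be sketched as follows (hypothetical helper name; 400 days per the condition in the diff, pending the linked GitHub discussion):

```python
from datetime import datetime, timedelta, timezone

def run_data_likely_gone(publish_time: datetime, max_age_days: int = 400) -> bool:
    """Skip artifacts published more than max_age_days ago, since the
    corresponding GitHub workflow run data may no longer be retained."""
    return datetime.now(timezone.utc) - timedelta(days=max_age_days) > publish_time
```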

@tromai tromai left a comment
I have finished my review. Thanks.
Overall, there aren't any major changes needed. Most of my comments are minor improvements/nitpicks.
