Motivation
Given a repository of quality metrics and a package repository, how can we guarantee that the metrics are reproducible using the packages served from that repository?
User Experience
From an end-user perspective (administrator, analyst, etc.), a user may want to include assertions about checksum consistency as part of their filter. Because CRAN is a moving target, packages that passed the filter a week ago may be filtered out today. To ensure that the metrics continue to reflect the repository's contents, a snapshot of the repository would be needed.
A user may provide a filter such as:
options(available_packages_filters = risk_filter(...))
where assertions about checksum matches (for the package source code, its hard dependencies, or its soft dependencies) can be enforced and used as filtering criteria.
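A checksum assertion of this kind could plug into the `filters` mechanism of `utils::available.packages()`, where a filter function receives the matrix of available packages and returns the rows to keep. The sketch below is a minimal example, assuming the metrics repository exposes the MD5 sum recorded for each package at evaluation time as a named character vector. `checksum_filter()` is a hypothetical name, not part of any existing package.

```r
# Hypothetical helper (not an existing API): builds a filter function for
# utils::available.packages() that keeps only packages whose MD5sum matches
# the checksum recorded when their metrics were derived.
checksum_filter <- function(expected) {
  # `expected`: named character vector; names are package names and values
  # are the MD5 sums stored alongside the metrics (an assumption).
  function(db) {
    recorded <- expected[db[, "Package"]]  # NA where no checksum was recorded
    current <- db[, "MD5sum"]              # checksum published by the repository
    # Drop packages with no recorded checksum, no published checksum,
    # or a checksum that no longer matches.
    keep <- !is.na(recorded) & !is.na(current) & current == recorded
    db[keep, , drop = FALSE]
  }
}
```

Assuming `expected` has been loaded from the metrics repository, such a filter could be combined with R's default filters through base R's `available_packages_filters` option, e.g. `options(available_packages_filters = list(add = TRUE, checksum_filter(expected)))`.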
Repository structure
This would require extending the PACKAGES format to include this metadata, for example:
Package: A3
Version: 1.0.0
Depends: R (>= 2.15.0), xtable, pbapply
Suggests: randomForest, e1071
# ... additional stats ...
MD5sum: 027ebdd8affce8f0effaecfcd5f5ade2
MD5sumReqDeps: 0a1b2c3d4f5g6h
MD5sumAllDeps: 0a1b2c3d4f5g6h
# ... metrics ...
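How a combined field such as `MD5sumReqDeps` would be computed is not yet specified. One possible approach, assumed here, is to hash a canonical, sorted list of `package:md5sum` lines for the dependencies, so the value does not depend on the order in which dependencies are declared. `combined_md5()` is a hypothetical helper; because base R's `tools::md5sum()` hashes files, the sketch writes the canonical text to a temporary file first.

```r
# Hypothetical helper: combine per-dependency MD5 sums into a single
# order-independent checksum (e.g. for an MD5sumReqDeps field).
combined_md5 <- function(dep_md5) {
  # `dep_md5`: named character vector; names are dependency package names,
  # values are the MD5 sums of those packages' source tarballs.
  lines <- paste0(names(dep_md5), ":", dep_md5)
  lines <- sort(lines, method = "radix")  # locale-independent (C) ordering
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(lines, tmp)
  unname(tools::md5sum(tmp))
}
```

Sorting first means the combined checksum changes only when a dependency or its checksum changes, not when the `Depends` field is merely reordered.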
Alternatively, we could store a hash for each dependency used during evaluation:
Package: A3
Version: 1.0.0
Depends: R (>= 2.15.0), xtable, pbapply
Suggests: randomForest, e1071
MD5sum: 027ebdd8affce8f0effaecfcd5f5ade2
MD5sum/xtable: 0a1b2c3d4f5g6h
MD5sum/pbapply: 0a1b2c3d4f5g6h
MD5sum/randomForest: 0a1b2c3d4f5g6h
MD5sum/e1071: 0a1b2c3d4f5g6h
This would let us ignore cases where Suggests dependencies are unavailable to an end user or were unavailable during evaluation. It is also somewhat more interpretable, at the cost of a larger file.
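With the per-dependency layout, a check could compare only the dependencies that are actually available, which is what allows missing Suggests packages to be skipped. The sketch below assumes PACKAGES entries are read with base R's `read.dcf()`, which keeps field names such as `MD5sum/xtable` as column names. `dependency_hashes()` and `dependency_hashes_match()` are hypothetical names.

```r
# Hypothetical helper: extract the per-dependency checksums recorded for one
# package from a PACKAGES database read with read.dcf().
dependency_hashes <- function(db, package) {
  row <- db[db[, "Package"] == package, , drop = FALSE]
  if (nrow(row) == 0L) return(character(0))
  cols <- colnames(db)[startsWith(colnames(db), "MD5sum/")]
  hashes <- row[1, cols]
  names(hashes) <- sub("^MD5sum/", "", cols)
  hashes[!is.na(hashes)]  # other entries may lack these fields
}

# Compare recorded hashes against the checksums available to the user,
# ignoring dependencies that are not available (e.g. missing Suggests).
# Note: if no recorded dependency is available, this returns TRUE.
dependency_hashes_match <- function(recorded, available) {
  shared <- intersect(names(recorded), names(available))
  all(recorded[shared] == available[shared])
}
```

Whether unavailable hard dependencies should also be skipped is a policy choice; a stricter variant could require every `Depends`/`Imports` entry to be present before comparing.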
Implementation
I think the tools for deriving these checksums should live alongside the filtering tools, because the checksums must be re-derived from the package repository whenever a filter is applied. The same function should then be used in these pipelines so that identical checksums are computed during metric derivation.
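To guarantee that the filter and the metric pipelines derive checksums identically, the logic could be a single shared function. A minimal sketch, assuming checksums are taken over source tarballs named in the usual `<package>_<version>.tar.gz` form; `derive_checksums()` is a hypothetical name.

```r
# Hypothetical shared helper: compute MD5 sums for a set of source tarballs,
# keyed by package name, so the filter and the metric pipeline derive
# checksums in exactly the same way.
derive_checksums <- function(tarballs) {
  sums <- tools::md5sum(tarballs)
  # Package names cannot contain "_", so everything before the first "_"
  # in "<package>_<version>.tar.gz" is the package name.
  pkgs <- sub("_.*$", "", basename(tarballs))
  stats::setNames(unname(sums), pkgs)
}
```

A metric pipeline would record the output next to its metrics, while the filter would call the same function (or read the repository's published `MD5sum` fields) when deciding which packages to keep.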