Description
I have set up PurlDB locally and explored the data model, particularly Package and DependentPackage in packagedb/models.py.
From my understanding:
Dependencies are stored via the DependentPackage model
Each DependentPackage record links to its source (depending) Package via the package ForeignKey
The target dependency is stored as a PURL string (purl field), not a direct ForeignKey to another Package
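Because the dependency side is stored only as a PURL string, each dependency PURL has to be mapped to a version-less package identity before it can become a graph edge. Below is a minimal sketch of that step. `purl_identity` is a hypothetical helper, not part of PurlDB. Its deliberately simplified parser applies only the PyPI name normalization from the PURL spec. For real data, the packageurl-python library's `PackageURL.from_string` would be the more complete option.

```python
from typing import Tuple
from urllib.parse import unquote


# Hypothetical PoC helper: maps a dependency PURL string to a version-less
# (type, namespace, name) identity key. Simplified parser, not spec-complete.
def purl_identity(purl: str) -> Tuple[str, str, str]:
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    rest = purl[len("pkg:"):]
    # Drop the subpath ("#...") and qualifiers ("?..."), plus stray slashes.
    rest = rest.split("#", 1)[0].split("?", 1)[0].strip("/")
    type_, sep, remainder = rest.partition("/")
    if not sep or not remainder:
        raise ValueError(f"PURL has no name: {purl!r}")
    # The last segment is "name" or "name@version"; earlier ones form the namespace.
    namespace, _, name_version = remainder.rpartition("/")
    name = unquote(name_version.split("@", 1)[0])  # the version is discarded here
    namespace = unquote(namespace)
    type_ = type_.lower()
    if type_ == "pypi":
        # PURL spec: PyPI names are lowercased and "_" is replaced with "-".
        name = name.lower().replace("_", "-")
    return type_, namespace, name


print(purl_identity("pkg:pypi/Django@4.2.1"))  # → ('pypi', '', 'django')
print(purl_identity("pkg:npm/%40babel/core@7.0.0"))  # → ('npm', '@babel', 'core')
```

Different versions of the same dependency map to the same key. That is what allows edges to be collapsed at the package-identity level.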
This suggests that dependency relationships form a directed graph:
Package A ---> Package B
where A is the source package and B is the package that the dependency PURL resolves to.
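The sketch below shows one possible in-memory representation of that graph. `build_graph` is a hypothetical helper. It takes (source, target) pairs of already version-less identities, where source ---> target means "source depends on target", and stores them as an adjacency map. The example edges are made up for illustration. The duplicate pair stands in for two versions of the same package declaring the same dependency.

```python
from typing import Dict, Iterable, Set, Tuple


# Hypothetical PoC helper: builds {package: set of packages it depends on}.
def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    graph: Dict[str, Set[str]] = {}
    for source, target in edges:
        # Register both endpoints so leaf dependencies are nodes too.
        graph.setdefault(source, set())
        graph.setdefault(target, set())
        if source != target:  # skip self-dependencies
            graph[source].add(target)  # the set collapses duplicate edges
    return graph


example_edges = [
    ("pkg:pypi/requests", "pkg:pypi/urllib3"),
    ("pkg:pypi/requests", "pkg:pypi/urllib3"),  # e.g. from another requests version
    ("pkg:pypi/requests", "pkg:pypi/idna"),
    ("pkg:pypi/botocore", "pkg:pypi/urllib3"),
]
graph = build_graph(example_edges)
print(len(graph))  # → 4
```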
Proposal:
Graph-Based Popularity Metric
I would like to explore building a popularity metric based on:
Dependency graph centrality (PageRank-style algorithm)
In-degree (number of reverse dependencies)
Optional freshness/activity decay factor (e.g., release_date or mining_level)
Possibly ignoring versions and computing popularity at the package-identity level
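For reference, these are the standard in-degree and PageRank definitions on this graph, plus one possible way to apply the optional freshness factor. The symbols are defined as follows:

- $N$ is the number of package identities.
- $d$ is the damping factor (0.85 is the usual default).
- $\mathrm{out}(u)$ is the number of distinct dependencies of $u$.
- $\Delta t(v)$ is the time since $v$'s latest release.
- $h$ is a half-life.

The multiplicative half-life decay is an illustrative assumption, not an existing PurlDB formula. An edge $u \to v$ means $u$ depends on $v$, so rank flows from dependents to their dependencies. A package therefore scores highly when many packages, or highly ranked ones, depend on it. Packages with no dependencies ($\mathrm{out}(u) = 0$) are usually handled by spreading their rank uniformly over all nodes.

```math
\begin{aligned}
\mathrm{in}(v) &= \bigl|\{\, u : u \to v \,\}\bigr| \\
\mathrm{PR}(v) &= \frac{1-d}{N} + d \sum_{u \to v} \frac{\mathrm{PR}(u)}{\mathrm{out}(u)} \\
\mathrm{score}(v) &= \mathrm{PR}(v) \cdot 2^{-\Delta t(v)/h}
\end{aligned}
```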
Proposed Approach (PoC)
Start with a single ecosystem (e.g., PyPI)
Resolve dependency PURLs to canonical package identities
Build a directed graph:
Nodes = Packages (ignoring version)
Edges = Dependency relationships
Compute:
In-degree
PageRank score
Store the result in a popularity_score field on Package
Expose the score via the REST API
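The Compute step could be prototyped in plain Python before reaching for a graph library. This sketch assumes the graph is a dict mapping each package identity to the set of identities it depends on. `in_degree` and `pagerank` are hypothetical helpers, and d = 0.85 is only the conventional default damping factor. Packages with no dependencies are treated as linking to every node, so their rank is spread uniformly and the scores keep summing to 1.

```python
from typing import Dict, Set


def in_degree(graph: Dict[str, Set[str]]) -> Dict[str, int]:
    # Number of distinct packages that depend on each node.
    counts = {node: 0 for node in graph}
    for deps in graph.values():
        for dep in deps:
            counts[dep] = counts.get(dep, 0) + 1
    return counts


def pagerank(graph: Dict[str, Set[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> Dict[str, float]:
    # Power iteration; an edge u -> v passes part of u's rank to v.
    nodes = set(graph)
    for deps in graph.values():
        nodes |= deps
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by packages with no dependencies is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {node: (1.0 - d) / n + d * dangling / n for node in nodes}
        for node, deps in graph.items():
            if deps:
                share = d * rank[node] / len(deps)
                for dep in deps:
                    new_rank[dep] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


graph = {"a": {"c"}, "b": {"c"}, "c": set()}
print(in_degree(graph))  # → {'a': 0, 'b': 0, 'c': 2}
scores = pagerank(graph)  # "c", which both a and b depend on, scores highest
```

For a larger PoC, `networkx.pagerank(G, alpha=0.85)` on a `networkx.DiGraph` built with the same edge direction should give closely matching scores. Edge direction matters here. Reversing it would reward packages that have many dependencies instead of many dependents.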
Questions
Should popularity be computed:
As a periodic batch job
Or dynamically?
Should we:
Store resolved dependency edges in a normalized table?
Or resolve PURLs during computation?
Is ignoring versions appropriate for the initial PoC?
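The sketch below makes the normalized-table option concrete. It uses an in-memory SQLite database only so that it runs standalone; PurlDB itself would use its own database through Django. The `dependency_edge` table and the `in_degrees` helper are hypothetical names. The composite primary key deduplicates edges that differ only by source version once versions are dropped.

```python
import sqlite3

# Hypothetical schema: one row per distinct version-less dependency edge.
SCHEMA = """
CREATE TABLE dependency_edge (
    source_identity TEXT NOT NULL,
    target_identity TEXT NOT NULL,
    PRIMARY KEY (source_identity, target_identity)
)
"""


def in_degrees(conn: sqlite3.Connection):
    # In-degree = number of distinct packages depending on each target.
    return conn.execute(
        "SELECT target_identity, COUNT(*) FROM dependency_edge "
        "GROUP BY target_identity ORDER BY COUNT(*) DESC, target_identity"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT OR IGNORE INTO dependency_edge VALUES (?, ?)",
    [
        ("pkg:pypi/requests", "pkg:pypi/urllib3"),
        ("pkg:pypi/requests", "pkg:pypi/urllib3"),  # duplicate edge, ignored
        ("pkg:pypi/requests", "pkg:pypi/idna"),
        ("pkg:pypi/botocore", "pkg:pypi/urllib3"),
    ],
)
print(in_degrees(conn))  # → [('pkg:pypi/urllib3', 2), ('pkg:pypi/idna', 1)]
```

With edges materialized like this, in-degree becomes a single aggregate query. Resolving PURLs during computation would avoid the extra table, at the cost of re-parsing every dependency PURL on each run.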
I would appreciate feedback before proceeding with implementation.