fix: deduplicate vulnerabilities in recommend endpoint#2160
fix: deduplicate vulnerabilities in recommend endpoint#2160Strum355 merged 1 commit intoguacsec:mainfrom
Conversation
Reviewer's GuideThis PR ensures the /api/v2/purl/recommend endpoint returns each vulnerability only once even when multiple advisories reference the same vulnerability, by deduplicating vulnerability statuses in the service layer, adjusting the VexStatus model to accept arbitrary string statuses, and adding a regression test (with new test data) that covers the deduplication behavior for a real-world OSV/RustSec advisory pair. Sequence diagram for deduplicated vulnerabilities in recommend endpointsequenceDiagram
actor Client
participant ApiV2PurlRecommend as ApiV2PurlRecommendHandler
participant PurlService
participant AdvisoryRepo as AdvisoryRepository
Client->>ApiV2PurlRecommend: GET /api/v2/purl/recommend?purl=...
ApiV2PurlRecommend->>PurlService: recommend(purl)
PurlService->>AdvisoryRepo: fetch_purl_details(purl)
AdvisoryRepo-->>PurlService: PurlDetails{ advisories }
Note over PurlService: Iterate advisories, flat_map advisory.status, unique_by vulnerability.identifier
PurlService-->>ApiV2PurlRecommend: RecommendResponse{ vulnerabilities (deduplicated) }
ApiV2PurlRecommend-->>Client: 200 OK
Updated class diagram for purl recommendation vulnerability modelingclassDiagram
class PurlService {
+recommend(purl)
}
class PurlDetails {
+Vec~Advisory~ advisories
}
class Advisory {
+Vec~AdvisoryStatus~ status
}
class AdvisoryStatus {
+Vulnerability vulnerability
+VexStatus status
}
class Vulnerability {
+String identifier
}
class VulnerabilityStatus {
+String id
+Option~VexStatus~ status
+Option~String~ justification
}
class VexStatus {
<<enum>>
Affected
Fixed
NotAffected
UnderInvestigation
Recommended
Other(String)
}
PurlService --> PurlDetails : builds_from
PurlDetails --> Advisory : has_many
Advisory --> AdvisoryStatus : has_many_status
AdvisoryStatus --> Vulnerability : refers_to
AdvisoryStatus --> VexStatus : uses
PurlService --> VulnerabilityStatus : produces_deduplicated
VulnerabilityStatus --> VexStatus : wraps
%% Deduplication behavior in PurlService
class DeduplicationLogic {
+collect_unique_vulnerabilities(advisories)
}
PurlService --> DeduplicationLogic : uses
DeduplicationLogic --> AdvisoryStatus : iterates_over
DeduplicationLogic --> Vulnerability : unique_by_identifier
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
- The
#[serde(untagged)]attribute on theOther(String)variant ofVexStatuslooks misplaced;untaggedis an enum-level attribute and putting it on a single variant is likely to be ignored or fail to compile—if the intention is to change howVexStatusis deserialized, consider moving this to the enum or clarifying why it’s variant-scoped. - When deduplicating vulnerabilities with
unique_by(|status| &status.vulnerability.identifier), you now implicitly assume that all advisories for a given vulnerability share the same effective status; if differing statuses per advisory are possible, consider documenting this behavior or revisiting whether deduplication should merge or prioritize statuses explicitly.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `#[serde(untagged)]` attribute on the `Other(String)` variant of `VexStatus` looks misplaced; `untagged` is an enum-level attribute and putting it on a single variant is likely to be ignored or fail to compile—if the intention is to change how `VexStatus` is deserialized, consider moving this to the enum or clarifying why it’s variant-scoped.
- When deduplicating vulnerabilities with `unique_by(|status| &status.vulnerability.identifier)`, you now implicitly assume that all advisories for a given vulnerability share the same effective status; if differing statuses per advisory are possible, consider documenting this behavior or revisiting whether deduplication should merge or prioritize statuses explicitly.
## Individual Comments
### Comment 1
<location> `modules/fundamental/src/purl/model/mod.rs:159-160` </location>
<code_context>
NotAffected,
UnderInvestigation,
Recommended,
+ #[serde(untagged)]
Other(String),
}
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `#[serde(untagged)]` on a single enum variant is likely incorrect and may not yield the intended serialization format.
`#[serde(untagged)]` is meant for whole enums, not individual variants, so mixing a single untagged variant with tagged ones is likely to behave unexpectedly or fail to compile. If you need to capture unknown string values, consider a more explicit approach (e.g., a custom `Deserialize` impl or a string-backed enum with `#[serde(rename = "...")]` for known variants) and align it with the intended JSON schema and Serde’s `untagged` docs.
</issue_to_address>
### Comment 2
<location> `modules/fundamental/src/purl/endpoints/test.rs:385` </location>
<code_context>
+
+#[test_context(TrustifyContext)]
+#[test(actix_web::test)]
+async fn get_recommendations_dedup(ctx: &TrustifyContext) -> Result<(), anyhow::Error> {
+ ctx.ingestor
+ .graph()
</code_context>
<issue_to_address>
**suggestion (testing):** Add a complementary test to ensure different vulnerabilities are not merged by the deduplication logic.
You already cover multiple advisories for the same vulnerability ID. Please also add a test where two advisories reference different vulnerability IDs for the same PURL and assert that both are returned. This will help catch any future over-deduplication that might merge distinct vulnerabilities.
Suggested implementation:
```rust
#[test_context(TrustifyContext)]
#[test(actix_web::test)]
async fn get_recommendations_dedup(ctx: &TrustifyContext) -> Result<(), anyhow::Error> {
ctx.ingestor
.graph()
.ingest_qualified_package(
&Purl::from_str("pkg:cargo/hyper@0.14.1-redhat-00001")?,
&ctx.db,
)
.await?;
// Two advisories that refer to the same vulnerability ID for the same package.
ctx.ingest_documents([
"osv/RUSTSEC-2021-0079.json",
"osv/RUSTSEC-2021-0079-DUPLICATE.json",
])
.await?;
let app = caller(ctx).await?;
let body: Value = app
.call_and_read_body_json(
TestRequest::post()
.uri("/api/v1/purl/recommendations")
.set_json(&json!({
"packages": [{
"purl": "pkg:cargo/hyper@0.14.1-redhat-00001"
}]
})),
)
.await;
// Ensure that deduplication merges multiple advisories for the same vulnerability.
let packages = body["packages"].as_array().expect("packages should be an array");
assert_eq!(packages.len(), 1, "expected a single package in the response");
let vulns = packages[0]["vulnerabilities"]
.as_array()
.expect("vulnerabilities should be an array");
assert_eq!(
vulns.len(),
1,
"expected duplicate advisories for the same vulnerability to be deduplicated"
);
Ok(())
}
#[test_context(TrustifyContext)]
#[test(actix_web::test)]
async fn get_recommendations_no_overdedup(ctx: &TrustifyContext) -> Result<(), anyhow::Error> {
// Same package as in the deduplication test.
ctx.ingestor
.graph()
.ingest_qualified_package(
&Purl::from_str("pkg:cargo/hyper@0.14.1-redhat-00001")?,
&ctx.db,
)
.await?;
// Two advisories that reference the same PURL but represent *different* vulnerability IDs.
ctx.ingest_documents([
"osv/RUSTSEC-2021-0079.json",
"osv/RUSTSEC-2021-0080.json",
])
.await?;
let app = caller(ctx).await?;
let body: Value = app
.call_and_read_body_json(
TestRequest::post()
.uri("/api/v1/purl/recommendations")
.set_json(&json!({
"packages": [{
"purl": "pkg:cargo/hyper@0.14.1-redhat-00001"
}]
})),
)
.await;
// Ensure that the two distinct vulnerabilities are both returned and not merged.
let packages = body["packages"].as_array().expect("packages should be an array");
assert_eq!(packages.len(), 1, "expected a single package in the response");
let vulns = packages[0]["vulnerabilities"]
.as_array()
.expect("vulnerabilities should be an array");
assert_eq!(
vulns.len(),
2,
"expected two distinct vulnerabilities for the same PURL"
);
let vuln_ids: std::collections::HashSet<_> = vulns
.iter()
.filter_map(|v| v["id"].as_str().map(|s| s.to_string()))
.collect();
assert_eq!(
vuln_ids.len(),
2,
"expected two distinct vulnerability IDs"
);
Ok(())
}
```
- Ensure `serde_json::json` is imported at the top of the file (e.g. `use serde_json::{json, Value};`) if it is not already.
- Adjust the endpoint path (`"/api/v1/purl/recommendations"`), request shape, and response field paths (`"packages"`, `"vulnerabilities"`, `"id"`) to match the actual API if they differ in your codebase.
- Replace `"osv/RUSTSEC-2021-0080.json"` with an advisory fixture that:
1. Affects the same PURL (`pkg:cargo/hyper@0.14.1-redhat-00001`), and
2. Has a different vulnerability identifier than `RUSTSEC-2021-0079`,
so that the new test truly verifies that different vulnerabilities are not merged by the deduplication logic.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
|
||
| #[test_context(TrustifyContext)] | ||
| #[test(actix_web::test)] | ||
| async fn get_recommendations_dedup(ctx: &TrustifyContext) -> Result<(), anyhow::Error> { |
There was a problem hiding this comment.
suggestion (testing): Add a complementary test to ensure different vulnerabilities are not merged by the deduplication logic.
You already cover multiple advisories for the same vulnerability ID. Please also add a test where two advisories reference different vulnerability IDs for the same PURL and assert that both are returned. This will help catch any future over-deduplication that might merge distinct vulnerabilities.
Suggested implementation:
#[test_context(TrustifyContext)]
#[test(actix_web::test)]
async fn get_recommendations_dedup(ctx: &TrustifyContext) -> Result<(), anyhow::Error> {
ctx.ingestor
.graph()
.ingest_qualified_package(
&Purl::from_str("pkg:cargo/hyper@0.14.1-redhat-00001")?,
&ctx.db,
)
.await?;
// Two advisories that refer to the same vulnerability ID for the same package.
ctx.ingest_documents([
"osv/RUSTSEC-2021-0079.json",
"osv/RUSTSEC-2021-0079-DUPLICATE.json",
])
.await?;
let app = caller(ctx).await?;
let body: Value = app
.call_and_read_body_json(
TestRequest::post()
.uri("/api/v1/purl/recommendations")
.set_json(&json!({
"packages": [{
"purl": "pkg:cargo/hyper@0.14.1-redhat-00001"
}]
})),
)
.await;
// Ensure that deduplication merges multiple advisories for the same vulnerability.
let packages = body["packages"].as_array().expect("packages should be an array");
assert_eq!(packages.len(), 1, "expected a single package in the response");
let vulns = packages[0]["vulnerabilities"]
.as_array()
.expect("vulnerabilities should be an array");
assert_eq!(
vulns.len(),
1,
"expected duplicate advisories for the same vulnerability to be deduplicated"
);
Ok(())
}
#[test_context(TrustifyContext)]
#[test(actix_web::test)]
async fn get_recommendations_no_overdedup(ctx: &TrustifyContext) -> Result<(), anyhow::Error> {
// Same package as in the deduplication test.
ctx.ingestor
.graph()
.ingest_qualified_package(
&Purl::from_str("pkg:cargo/hyper@0.14.1-redhat-00001")?,
&ctx.db,
)
.await?;
// Two advisories that reference the same PURL but represent *different* vulnerability IDs.
ctx.ingest_documents([
"osv/RUSTSEC-2021-0079.json",
"osv/RUSTSEC-2021-0080.json",
])
.await?;
let app = caller(ctx).await?;
let body: Value = app
.call_and_read_body_json(
TestRequest::post()
.uri("/api/v1/purl/recommendations")
.set_json(&json!({
"packages": [{
"purl": "pkg:cargo/hyper@0.14.1-redhat-00001"
}]
})),
)
.await;
// Ensure that the two distinct vulnerabilities are both returned and not merged.
let packages = body["packages"].as_array().expect("packages should be an array");
assert_eq!(packages.len(), 1, "expected a single package in the response");
let vulns = packages[0]["vulnerabilities"]
.as_array()
.expect("vulnerabilities should be an array");
assert_eq!(
vulns.len(),
2,
"expected two distinct vulnerabilities for the same PURL"
);
let vuln_ids: std::collections::HashSet<_> = vulns
.iter()
.filter_map(|v| v["id"].as_str().map(|s| s.to_string()))
.collect();
assert_eq!(
vuln_ids.len(),
2,
"expected two distinct vulnerability IDs"
);
Ok(())
}- Ensure
serde_json::jsonis imported at the top of the file (e.g.use serde_json::{json, Value};) if it is not already. - Adjust the endpoint path (
"/api/v1/purl/recommendations"), request shape, and response field paths ("packages","vulnerabilities","id") to match the actual API if they differ in your codebase. - Replace
"osv/RUSTSEC-2021-0080.json"with an advisory fixture that:- Affects the same PURL (
pkg:cargo/hyper@0.14.1-redhat-00001), and - Has a different vulnerability identifier than
RUSTSEC-2021-0079,
so that the new test truly verifies that different vulnerabilities are not merged by the deduplication logic.
- Affects the same PURL (
02e9785 to
0f63fe1
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2160 +/- ##
=======================================
Coverage 68.17% 68.18%
=======================================
Files 375 375
Lines 21048 21046 -2
Branches 21048 21046 -2
=======================================
Hits 14350 14350
+ Misses 5836 5828 -8
- Partials 862 868 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| NotAffected, | ||
| UnderInvestigation, | ||
| Recommended, | ||
| #[serde(untagged)] |
There was a problem hiding this comment.
Looks good in general. I would also note that serde untagged is both not documented nor tested behaviour. Do we have an example of "Other" status, that we can test with?
There was a problem hiding this comment.
I had a test but it seems I forgot to git add the change. QE had examples of it during testing but I wasnt able to reproduce it locally so I had to do a bit of a workaround in the unit test by manually adding it to postgres
#[test_context(TrustifyContext)]
#[test(actix_web::test)]
async fn get_recommendations_other_status(ctx: &TrustifyContext) -> Result<(), anyhow::Error> {
use sea_orm::{ActiveModelTrait, ColumnTrait, EntityTrait, QueryFilter, Set};
use trustify_entity::{purl_status, status};
ctx.ingestor
.graph()
.ingest_qualified_package(
&Purl::from_str("pkg:cargo/hyper@0.14.1-redhat-00001")?,
&ctx.db,
)
.await?;
ctx.ingest_documents(["osv/RUSTSEC-2021-0079.json"]).await?;
let custom_status_id = Uuid::new_v4();
let custom_status = status::ActiveModel {
id: Set(custom_status_id),
slug: Set("custom_status".to_string()),
name: Set("Custom Status".to_string()),
description: Set(Some("A custom status for testing".to_string())),
};
status::Entity::insert(custom_status).exec(&ctx.db).await?;
let purl_statuses = purl_status::Entity::find()
.filter(purl_status::Column::VulnerabilityId.eq("CVE-2021-32714"))
.all(&ctx.db)
.await?;
assert!(!purl_statuses.is_empty());
for ps in purl_statuses {
let mut active: purl_status::ActiveModel = ps.into();
active.status_id = Set(custom_status_id);
active.update(&ctx.db).await?;
}
let app = caller(ctx).await?;
let recommendations: Value = app
.call_and_read_body_json(
TestRequest::post()
.uri("/api/v2/purl/recommend")
.set_json(json!({"purls": ["pkg:cargo/hyper@0.14.1"]}))
.to_request(),
)
.await;
log::info!("{recommendations:#?}");
let entry =
&recommendations["recommendations"].as_object().unwrap()["pkg:cargo/hyper@0.14.1"][0];
let vulns = entry["vulnerabilities"].as_array().unwrap();
let vuln = vulns
.iter()
.find(|v| v["id"].as_str().unwrap() == "CVE-2021-32714")
.unwrap();
assert_eq!(vuln["status"], "custom_status");
Ok(())
}without the variant being untagged, this test yields:
2025-12-03T11:30:22.239191Z INFO trustify_module_fundamental::purl::endpoints::test: Object {
"recommendations": Object {
"pkg:cargo/hyper@0.14.1": Array [
Object {
"package": String("pkg:cargo/hyper@0.14.1-redhat-00001"),
"vulnerabilities": Array [
Object {
"id": String("CVE-2021-32714"),
"status": Object {
"Other": String("custom_status"),
},
},
],
},
],
},
}
thread 'purl::endpoints::test::get_recommendations_other_status' (599692) panicked at modules/fundamental/src/purl/endpoints/test.rs:483:5:
assertion `left == right` failed
left: Object {"Other": String("custom_status")}
right: "custom_status"
0f63fe1 to
2d8ee62
Compare
2d8ee62 to
476ada3
Compare
dejanb
left a comment
There was a problem hiding this comment.
Looks good. Let's try to find a real case when status can be "other" and test with that. If that is not possible, we need to refactor the enum.
|
/backport |
|
Successfully created backport PR for |
If multiple advisories referenced the same vulnerability, the /api/v2/purl/recommend endpoint would return duplicate vulnerabilities as there is no advisory ID included to differentiate them from each other.
Summary by Sourcery
Deduplicate vulnerabilities returned by the purl recommendation endpoint when multiple advisories reference the same underlying vulnerability.
Bug Fixes:
Enhancements:
Tests: