
[PRIV-412] Add duplicate check for items in the pending queue #21437

Open
cedric-cordenier wants to merge 2 commits into develop from PRIV-412

Conversation

@cedric-cordenier
Contributor

Requires

Supports

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in the changeset text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet, where external communication about it should be held off until it is feature-complete.

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

✅ No conflicts with other open PRs targeting develop

Contributor

Copilot AI left a comment


Pull request overview

Risk Rating: HIGH (changes touch OCR observation validation and state-transition logic for pending-queue consensus)

This PR hardens the Vault OCR3 plugin’s pending-queue processing by preventing duplicate items from being counted/accepted from a single oracle, improving the integrity of the DON-wide pending queue aggregation.

Changes:

  • Add duplicate detection for pending-queue item observations during ValidateObservation.
  • Prevent a single oracle’s duplicated pending-queue items from being double-counted during stateTransitionPendingQueue.
  • Add unit tests covering duplicate rejection and “no double counting” behavior.
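A minimal sketch of the ValidateObservation-side check, assuming the unmarshal-and-fetch step can be modeled as one function from item bytes to blob bytes. The names fetchFunc and validatePendingQueueItems are hypothetical; the SHA-256-over-blob-bytes digest and the error strings mirror the plugin.go diff quoted in the trunk-io comment further down.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// fetchFunc is a hypothetical stand-in for r.unmarshalBlob + blobFetcher.FetchBlob:
// it resolves one pending-queue item (a serialized blob handle) to blob contents.
type fetchFunc func(item []byte) ([]byte, error)

// validatePendingQueueItems rejects the whole observation if two of its
// pending-queue items resolve to blobs with the same SHA-256 digest.
func validatePendingQueueItems(items [][]byte, fetch fetchFunc) error {
	seen := map[string]bool{}
	for _, item := range items {
		blob, err := fetch(item)
		if err != nil {
			return fmt.Errorf("could not fetch blob for observation pending queue item: %w", err)
		}
		sha := fmt.Sprintf("%x", sha256.Sum256(blob))
		if seen[sha] {
			return errors.New("duplicate item found in pending queue item observation")
		}
		seen[sha] = true
	}
	return nil
}

func main() {
	// Toy fetcher: the "handle" is the blob itself.
	identity := func(b []byte) ([]byte, error) { return b, nil }

	ok := [][]byte{[]byte("req-1"), []byte("req-2")}
	dup := [][]byte{[]byte("req-1"), []byte("req-1")}

	fmt.Println(validatePendingQueueItems(ok, identity))  // <nil>
	fmt.Println(validatePendingQueueItems(dup, identity)) // duplicate item found in pending queue item observation
}
```

Rejecting at validation time means a malformed observation is dropped entirely, whereas the state-transition dedupe only skips the repeated entries.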

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
core/services/ocr2/plugins/vault/plugin.go | Adds duplicate checks in ValidateObservation and avoids double-counting duplicates per oracle in pending-queue aggregation.
core/services/ocr2/plugins/vault/plugin_test.go | Adds tests for duplicate pending-queue item handling in validation and state transition.

Scrupulous human review recommended for:

  • ValidateObservation pending-queue item validation loop (duplicate detection strategy vs. blob fetch cost and canonicalization).
  • stateTransitionPendingQueue per-oracle dedupe semantics and its interaction with F+1 consensus selection.

Reviewer recommendations (per CODEOWNERS):

  • @smartcontractkit/foundations
  • @smartcontractkit/core
Comments suppressed due to low confidence (1)

core/services/ocr2/plugins/vault/plugin.go:1333

  • stateTransitionPendingQueue still unmarshals/fetches every pending-queue blob before it can identify duplicates. Since duplicates are now explicitly skipped, consider deduping by the blob-handle bytes (pqi) per-oracle before calling unmarshalBlob/FetchBlob to avoid redundant blob fetches when an oracle repeats the same handle many times.
	for oid, o := range obs {
		shaSeenForOracle := map[string]bool{}
		for _, pqi := range o.PendingQueueItems {
			bh, err := r.unmarshalBlob(pqi)
			if err != nil {
				r.lggr.Errorw("failed to unmarshal blob handle from pending queue item", "error", err, "item", pqi)
				continue
			}

			blob, err := blobFetcher.FetchBlob(ctx, bh)
			if err != nil {
				r.lggr.Errorw("failed to fetch blob for pending queue item", "error", err, "item", pqi)

Comment on lines +1037 to 1045
+ 	seen := map[string]bool{}
  	for _, i := range obs.PendingQueueItems {
  		bh, err := r.unmarshalBlob(i)
  		if err != nil {
  			return fmt.Errorf("could not unmarshal blob handle from observation pending queue item: %w", err)
  		}

- 		_, err = blobFetcher.FetchBlob(ctx, bh)
+ 		blob, err := blobFetcher.FetchBlob(ctx, bh)
  		if err != nil {

Copilot AI Mar 6, 2026


Duplicate detection in ValidateObservation happens after unmarshalling and fetching the blob, so a repeated blob-handle entry can still trigger repeated FetchBlob calls and hashing work. Consider first deduping based on the raw handle bytes (obs.PendingQueueItems entries) before calling FetchBlob, and (if you need content-based dedupe) computing the sha from the decoded StoredPendingQueueItem via the existing shaForProto helper for canonicalization.

Copilot uses AI. Check for mistakes.
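Why canonicalization matters for content-based dedupe: hashing raw blob bytes treats two encodings of the same logical item as different. This sketch uses JSON as a stand-in for the protobuf StoredPendingQueueItem, and canonicalSHA is a hypothetical stand-in for shaForProto, whose actual signature and behavior are not shown in this thread.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// pendingItem is a hypothetical stand-in for the decoded StoredPendingQueueItem.
type pendingItem struct {
	ID      string `json:"id"`
	Payload string `json:"payload"`
}

// rawSHA hashes the blob bytes exactly as received.
func rawSHA(blob []byte) string {
	return fmt.Sprintf("%x", sha256.Sum256(blob))
}

// canonicalSHA decodes the blob and hashes a re-encoding of the decoded value,
// so byte-level differences that do not change the item (here: JSON key
// order) produce the same digest.
func canonicalSHA(blob []byte) (string, error) {
	var it pendingItem
	if err := json.Unmarshal(blob, &it); err != nil {
		return "", err
	}
	b, err := json.Marshal(it) // struct fields are emitted in declaration order
	if err != nil {
		return "", err
	}
	return rawSHA(b), nil
}

func main() {
	a := []byte(`{"id":"req-1","payload":"p"}`)
	b := []byte(`{"payload":"p","id":"req-1"}`) // same item, different bytes

	fmt.Println(rawSHA(a) == rawSHA(b)) // false
	ca, _ := canonicalSHA(a)
	cb, _ := canonicalSHA(b)
	fmt.Println(ca == cb) // true
}
```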
Contributor


Optional, but nice to have.

		}

		if shaSeenForOracle[sha] {
			r.lggr.Warnw("duplicate sha found for oracle, skipping...")

Copilot AI Mar 6, 2026


The Warnw call for duplicate pending-queue items doesn’t include any context (e.g., oracle id, request id, sha), which makes it hard to debug which observer is misbehaving and what was duplicated. Recommend adding structured fields like "oracleID", "sha", and/or the decoded item Id (and consider lowering to Debug if it can be triggered frequently).

Suggested change
- 			r.lggr.Warnw("duplicate sha found for oracle, skipping...")
+ 			r.lggr.Warnw("duplicate sha found for oracle, skipping...",
+ 				"oracleID", oid,
+ 				"sha", sha,
+ 				"itemID", i.Id,
+ 			)

Contributor


Seems like a good find.

Contributor


Perhaps also log the blob for which we saw this, so we can troubleshoot and see the raw request that was being duplicated.

trunk-io[bot]

This comment was marked as outdated.

trunk-io[bot]

This comment was marked as outdated.

trunk-io[bot]

This comment was marked as outdated.

@cl-sonarqube-production

@trunk-io trunk-io bot left a comment


🔴 Test Results: Unrelated Failure

Affected failures:

  1. Workflow Run: Run CCIP integration In Memory Tests For PR / smoke/ccip/ccip_token_transfer_test.go:*_LOOPP
  2. Workflow Run: Integration Tests
  3. Workflow Run: Run CCIP integration In Memory Tests For PR / smoke/ccip/ccip_messaging_test.go:Test_CCIPMessaging_Solana2EVM_LOOPP

What Broke

These failures appear to be unrelated to the changes in this PR. Some failures are due to generic 'exit 1' errors in 'Integration Tests' lacking specific error messages or stack traces. Other failures are caused by timeouts in CCIP messaging and token transfer integration tests, indicating issues with commit report processing or environmental flakiness. The PR's changes are in a separate component (e.g., vault plugin) and do not directly relate to these CI issues.

Autofix Options

You can use our MCP server to get AI assistance with debugging and fixing these failures.

  • Use MCP in your IDE to debug the issue. Try "Help me fix CI failures from bm90PWJY" to get started.

Tip

Get Better Results: This CI job is not uploading test reports. Adding structured test reports enables more precise, test-level analysis with better root cause identification and more targeted fix recommendations.
👉🏻 Learn how to upload test results.

@trunk-io

trunk-io bot commented Mar 6, 2026


View Full Report ↗︎ · Docs


@trunk-io trunk-io bot left a comment


🔴 Test Results: OCR3 Duplicate Check Regression

Affected failures:

  1. Workflow Run: Run CCIP integration In Memory Tests For PR / smoke/ccip/ccip_token_transfer_test.go:*_LOOPP

What Broke

The introduction of a duplicate check in the ValidateObservation function of the vault OCR2 plugin caused the OCR3 reporting process to fail, leading to a timeout in the CCIP token transfer integration test.

Proposed Fixes

Remove the duplicate item check from the ValidateObservation function in plugin.go. This check was overly strict and caused valid OCR3 reports to be rejected, leading to timeouts in integration tests. The stateTransitionPendingQueue function already handles duplicate items from individual oracles, which is sufficient.

In plugin.go:1037

- 	seen := map[string]bool{}
- 	for _, i := range obs.PendingQueueItems {
- 		bh, err := r.unmarshalBlob(i)
- 		if err != nil {
- 			return fmt.Errorf("could not unmarshal blob handle from observation pending queue item: %w", err)
- 		}
-
- 		blob, err := blobFetcher.FetchBlob(ctx, bh)
- 		if err != nil {
- 			return fmt.Errorf("could not fetch blob for observation pending queue item: %w", err)
- 		}
-
- 		sha := fmt.Sprintf("%x", sha256.Sum256(blob))
- 		if seen[sha] {
- 			return errors.New("duplicate item found in pending queue item observation")
- 		}
- 		seen[sha] = true
-
- 	}

Autofix Options

You can apply the proposed fixes directly to your branch. Try the following:

  • Comment /trunk stack-fix ioMv1HW3 to generate a stacked PR with the proposed fixes.
  • Use MCP in your IDE to fix the issue. Try "Help me fix CI failures from ioMv1HW3" to get started.


	}

	seen := map[string]bool{}
	for _, i := range obs.PendingQueueItems {
Contributor


Can we also add a check on the total size of obs.PendingQueueItems?
We don't want a faulty or malicious oracle to send an excessively large number of items here.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants