Skip to content

Add replay-witness-test job to L2 CI workflow#6364

Draft
avilagaston9 wants to merge 1 commit intorefactor/remove-keys-from-execution-witnessfrom
ci/replay-witness-test
Draft

Add replay-witness-test job to L2 CI workflow#6364
avilagaston9 wants to merge 1 commit intorefactor/remove-keys-from-execution-witnessfrom
ci/replay-witness-test

Conversation

@avilagaston9
Copy link
Contributor

Motivation

We need CI coverage to verify that ethrex-replay works against the current ethrex code. This catches breaking changes to ExecutionWitness, the debug_executionWitness RPC endpoint, or any VM/blockchain APIs that ethrex-replay depends on. Follows up on #6356.

Description

Add a replay-witness-test job to the L2 CI workflow that:

  1. Starts an L1 dev node using the existing docker image
  2. Clones ethrex-replay and patches its ethrex dependencies to the current PR's local checkout using Cargo [patch]
  3. Sends a test transaction via rex and replays the block containing it with --no-zkvm
  4. Asserts the replay succeeds (state root matches)

The [patch] approach means ethrex-replay's Cargo.toml always points to ethrex main, but CI overrides the dependencies with the local checkout. No pinned SHAs or branch name assumptions.

Checklist

  • Updated STORE_SCHEMA_VERSION — N/A, CI-only change

patches its ethrex dependencies with the current PR's local checkout using
Cargo [patch], then sends a transaction to a dev L1 node and replays the
block to verify the execution witness end-to-end.
@github-actions
Copy link

🤖 Kimi Code Review

Review of PR #6364: Replay Witness Test CI Job

Security & Correctness Issues

  1. Hard-coded private key in CI workflow (lines 1016-1017)

    • The private key 0x850643a0224065ecce3882673c21f56bcf6eef86274cc21cadff15930b59fc8c is hard-coded in the workflow file
    • This is a security risk as it's exposed in the repository history
    • Recommendation: Use GitHub secrets or environment variables for sensitive data
  2. Unvalidated transaction hash extraction (lines 1018-1021)

    • The TX_HASH extraction uses head -1 without validation
    • If the rex send command fails or outputs unexpected format, this could capture incorrect data
    • Recommendation: Add validation to ensure the output is a valid transaction hash format
  3. Race condition in transaction receipt polling (lines 1026-1032)

    • Uses fixed 3-second sleep instead of proper polling with timeout
    • Could fail if block production is slower than expected
    • Recommendation: Implement proper polling with exponential backoff and timeout

Performance & Reliability Issues

  1. Missing timeout for replay command (lines 1037-1041)

    • The ethrex-replay command runs without timeout protection
    • Could hang indefinitely if there are issues
    • Recommendation: Add timeout wrapper or use GitHub Actions timeout-minutes more granularly
  2. Inefficient dependency patching (lines 965-1000)

    • The Cargo.toml patching uses a large block of hard-coded paths
    • This is brittle and will break if crate structure changes
    • Recommendation: Consider using a script to generate the patch dynamically or use workspace inheritance

Code Quality & Maintainability

  1. Python dependency assumption (line 1031)

    • Assumes Python3 is available for JSON parsing
    • Recommendation: Use jq instead (likely already available in CI) or add explicit Python installation step
  2. Missing cleanup steps

    • No cleanup of /tmp/ethrex-replay directory after test completion
    • Recommendation: Add cleanup step to prevent disk space issues
  3. Inconsistent job naming

    • Job name is "Replay Witness Test" but the actual test seems to be a basic transaction replay
    • Recommendation: Rename to better reflect the test's purpose

Minor Issues

  1. Redundant condition in final job (line 1055)

    • The if condition is getting very long and hard to read
    • Recommendation: Consider using a more concise approach or splitting into multiple lines
  2. Missing error handling for docker commands

    • No validation that docker load and docker compose up commands succeed
    • Recommendation: Add || exit 1 or similar error handling

The workflow addition is generally well-structured, but addressing the security issue with the private key and the reliability issues with transaction polling should be prioritized.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions
Copy link

🤖 Codex Code Review

  1. High – Unverified remote binary execution in CI
    File: .github/workflows/pr-main_l2.yaml
    curl -L .../rex-linux-x86_64 downloads and executes a binary without checksum/signature verification. This is a supply-chain risk (runner code execution if release/CDN/repo is compromised).
    Suggested change: pin artifact digest and verify before install (e.g., sha256sum -c), or build from source at a pinned commit.

  2. High – Non-deterministic/unpinned external repo in required check
    File: .github/workflows/pr-main_l2.yaml
    git clone https://github.com/lambdaclass/ethrex-replay.git pulls default branch head. Required CI behavior can change independently of this PR, and compromise of that repo can alter gate behavior.
    Suggested change: checkout a pinned commit/tag (git checkout <sha>), and update intentionally.

  3. Medium – Flaky transaction/receipt handling can produce false CI failures
    File: .github/workflows/pr-main_l2.yaml
    TX_HASH=$(... | head -1) assumes first output line is hash; this is brittle if CLI output format changes.
    File: .github/workflows/pr-main_l2.yaml
    Single sleep 3 + one receipt fetch is race-prone; result can be null, causing Python failure at line 1029.
    Suggested change: parse structured output if available (--json), validate hash format, and poll receipt with timeout/retries until non-null block number.

No Rust/EVM/state-transition code changed in this PR, so consensus/opcode/gas/state-trie/RLP correctness is not directly impacted here.


Automated review by OpenAI Codex · custom prompt

@github-actions
Copy link

🤖 Claude Code Review

Here is my review of PR 6364:


PR Review: Add replay-witness-test job to L2 CI workflow

The approach is sound: using Cargo [patch] to override ethrex dependencies in ethrex-replay with the local PR checkout is clean and avoids branch-name assumptions. The overall structure follows the existing CI patterns.


Bugs / Reliability Issues

Fragile sleep 3 for transaction receipt (line 1025)

The fixed 3-second sleep before requesting the receipt is a race condition. If the dev node is under any load or the block time is configured longer, eth_getTransactionReceipt returns {"result": null}, and the downstream python3 invocation will crash with an unhelpful TypeError: 'NoneType' object is not subscriptable.

The "Wait for L1 to be ready" step already has a well-structured polling loop — the same pattern should be used here:

MAX_TRIES=15
for i in $(seq 1 $MAX_TRIES); do
  BLOCK_NUMBER=$(curl -sf -X POST http://localhost:8545 \
    -H "Content-Type: application/json" \
    -d "{\"jsonrpc\":\"2.0\",\"method\":\"eth_getTransactionReceipt\",\"params\":[\"$TX_HASH\"],\"id\":1}" \
    | python3 -c "import sys,json; r=json.load(sys.stdin); d=r.get('result'); print(int(d['blockNumber'],16)) if d else exit(1)" 2>/dev/null) && break
  echo "Waiting for receipt... (attempt $i)"
  sleep 2
done

TX_HASH extracted without validation (line 1021)

rex send ... 2>&1 | head -1 returns the first line of combined stdout+stderr. If rex prints a warning or progress line before the hash, TX_HASH will contain non-hash content, and the subsequent eth_getTransactionReceipt call will silently return null (or an error), again crashing in python3 with no useful message. At minimum, validate the extracted value looks like a hex hash:

if ! [[ "$TX_HASH" =~ ^0x[0-9a-fA-F]{64}$ ]]; then
  echo "Failed to extract valid tx hash: '$TX_HASH'"
  exit 1
fi

Security

rex binary downloaded without checksum verification (line 977)

curl -L https://github.com/lambdaclass/rex/releases/download/v8.0.0/rex-linux-x86_64 -o /tmp/rex

The binary is downloaded and executed with no integrity check. If the GitHub release is tampered with or the download is MITM'd (unlikely with GitHub but still a supply-chain concern for a CI pipeline that runs on PRs), the binary runs in the pipeline with full access to $GITHUB_WORKSPACE and the Docker socket. A SHA-256 checksum check should follow the download:

echo "<expected-sha256>  /tmp/rex" | sha256sum --check

Note: The hardcoded private key on line 1017 is fine — it is a well-known dev genesis account used in a local-only ephemeral node, equivalent to Hardhat/Foundry's default accounts.


Performance

cargo build --release for a CI smoke test (line 1012)

A release build for ethrex-replay is significantly slower than a debug build. Since this job is purely a correctness/integration test (not a performance benchmark), cargo build (debug mode) would be sufficient and meaningfully faster. The binary path would change from ./target/release/ethrex-replay to ./target/debug/ethrex-replay.


Minor / Design Notes

ethrex-replay cloned from latest main (line 983)

Acknowledged in the PR description, but worth noting: if ethrex-replay's main branch has a breaking API change or build failure unrelated to this PR, the CI job will fail on this PR with a misleading error. Pinning to a tag (e.g., git clone --branch v1.2.3) or a specific commit SHA would make failures deterministic and attributable.

all-tests gate condition is asymmetric (line 1058)

The existing if: condition checks that several jobs are not skipped, and this PR correctly adds needs.replay-witness-test.result != 'skipped'. However, integration-test-tdx is listed in needs but is absent from the if: condition check (this is a pre-existing issue, not introduced here). The new replay-witness-test is handled correctly.

[patch] section indentation (lines 991–1009)

The heredoc content has leading whitespace (4 spaces before [patch....] and each line). Cargo accepts this for TOML, but it's slightly non-standard. More importantly, Cargo requires [patch] sections to be at the top level — leading spaces on the [patch....] header line may be rejected by stricter TOML parsers in future Cargo versions. It would be safer to strip the leading indentation from the heredoc body.


Summary

Severity Item
Medium sleep 3 is a race — poll for receipt with retry logic
Medium rex binary not checksum-verified
Low TX_HASH not validated before use
Low cargo build --release is slow; debug build suffices for CI
Info ethrex-replay pinned to latest main
Info TOML heredoc indentation

The core design (Cargo [patch], docker image reuse, dev node setup) is solid. Addressing the receipt polling race and the binary checksum are the most important fixes before merging.


Automated review by Claude (Anthropic) · custom prompt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant