Skip to content

Conversation

@zhexuany
Copy link
Contributor

@zhexuany zhexuany commented Feb 8, 2026

Overview

This PR fixes S3 request signing for non-standard ports, brings ROS1 bag writing and reading into full format compliance (so bags open correctly in Foxglove), implements the extract CLI end-to-end, and tightens bag parsing with strict validation.

Closes #30, #31, #32

Changes

S3 transport

  • Host header: Include port in the Host header when using non-standard ports (e.g. MinIO), per AWS behavior.
  • SigV4: Fix canonical request construction to match the AWS signing spec so pre-signed URLs and signed requests work correctly.

ROS1 bag writer (BagWriter)

  • Connection records: Add the required topic field in the connection record data section (previously only in the header). Omit callerid when empty instead of writing a bare /.
  • INDEX_DATA offsets: Store message offsets relative to the start of chunk data only. Offsets previously included the chunk header length, so readers (e.g. Foxglove) seeking by index read into the wrong byte range and hit "Header field is missing equals sign" on multi-chunk bags.

ROS1 bag reader (BagParser)

  • Strict validation:
    • parse_record_header: error if any header field is missing the = separator (no silent skip).
    • connection_from_fields: require topic, type, and md5sum in the connection data section; return Result and propagate errors instead of using unwrap_or_default().
  • Streaming parser: Correctly handle OP_CHUNK records in StreamingBagParser so streaming and sequential reads are consistent.

Extract CLI and public API

  • iter_raw: Add FormatReader::iter_raw_boxed() and RoboReader::iter_raw() to iterate raw (undecoded) messages with channel info. Implement for bag (sequential and parallel) and MCAP readers.
  • Extract commands: Implement extract messages, extract topics, extract per-topic, extract time-range, and extract fixture using the new API.

Fixtures and tests

  • Bag fixtures: Regenerate all 8 bag fixtures with the fixed writer so they are valid (connection data + correct INDEX_DATA offsets). Removed temporary debug test tests/test_bag_conn_simple.rs.
  • Streaming: Add integration tests for StreamingBagParser with real fixtures; expand test_bag_stream.rs. Fix clippy len_zero lint in tests.

Testing

  • Unit and integration tests pass (library, CLI, round-trip, bag decode, schema preservation, bag streaming).
  • Manual check: 90-message extract from a real bag opens and plays in Foxglove without "Header field is missing equals sign".
  • cargo clippy --package robocodec -- -D warnings clean.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality – extract CLI, iter_raw API)
  • Breaking change
  • Documentation update
  • Performance improvement
  • Code cleanup/refactoring
  • CI/CD changes

Checklist

  • Code follows project style (format, clippy)
  • Self-review
  • Existing tests pass; new/updated tests for bag streaming and extract
  • No new warnings
  • Bag fixtures regenerated and validated

Fixes SigV4 signature mismatch when connecting to S3-compatible
services (e.g., MinIO) on non-standard ports. The signer now
includes the port in the Host header when explicitly specified
in the URI, matching what reqwest sends in the actual request.

Previously, uri.host() returned only the hostname without port,
but reqwest sends "host:port" for non-standard ports, causing
signature verification to fail with 403 Forbidden.
@codecov
Copy link

codecov bot commented Feb 8, 2026

@greptile-apps
Copy link

greptile-apps bot commented Feb 8, 2026

Greptile Overview

Greptile Summary

Fixed AWS SigV4 signature mismatch when connecting to S3-compatible services on non-standard ports (e.g., MinIO on port 9000).

Key changes:

  • Modified sign_request() in src/io/s3/signer.rs:34-46 to check for explicit ports using uri.port_u16() and include them in the Host header value when present
  • Matches reqwest's behavior: includes port for non-standard ports (e.g., 127.0.0.1:9000), omits port for standard ports (implicit 443/80)
  • Added two comprehensive tests: test_sign_request_with_non_standard_port verifies port inclusion, test_sign_request_with_standard_port verifies port exclusion

Root cause: The signer was computing signatures using only the hostname (via uri.host()), but reqwest automatically includes the port in the Host header for non-standard ports, causing signature verification to fail on the server side.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The fix is surgical, well-tested, and directly addresses a real production issue with MinIO and other S3-compatible services. The logic correctly handles both standard and non-standard ports, matching reqwest's behavior. Two new tests provide comprehensive coverage of both cases, and all existing tests continue to pass.
  • No files require special attention

Important Files Changed

Filename Overview
src/io/s3/signer.rs Fixed SigV4 signature mismatch by including port in Host header for non-standard ports, with comprehensive test coverage

Sequence Diagram

sequenceDiagram
    participant Client as S3Client
    participant Signer as sign_request()
    participant URI as http::Uri
    participant Reqwest as reqwest::Client
    participant MinIO as S3-compatible Service

    Note over Client,MinIO: Before fix: signature mismatch on non-standard ports

    Client->>URI: Parse URL (http://127.0.0.1:9000/bucket/key)
    Client->>Signer: sign_request(credentials, region, method, uri, headers)
    
    Signer->>URI: uri.host()
    URI-->>Signer: "127.0.0.1" (no port!)
    Note over Signer: OLD: Host = "127.0.0.1"
    Note over Signer: Signature computed with Host: 127.0.0.1
    
    Signer-->>Client: headers with Authorization signature
    Client->>Reqwest: build request with signed headers
    Note over Reqwest: Reqwest auto-sets Host: 127.0.0.1:9000
    Reqwest->>MinIO: Request with Host: 127.0.0.1:9000
    MinIO-->>Reqwest: 403 Forbidden (signature mismatch)
    
    Note over Client,MinIO: After fix: signatures match

    Client->>URI: Parse URL (http://127.0.0.1:9000/bucket/key)
    Client->>Signer: sign_request(credentials, region, method, uri, headers)
    
    Signer->>URI: uri.port_u16()
    URI-->>Signer: Some(9000)
    Signer->>URI: uri.host()
    URI-->>Signer: "127.0.0.1"
    Note over Signer: NEW: Host = "127.0.0.1:9000"
    Note over Signer: Signature computed with Host: 127.0.0.1:9000
    
    Signer-->>Client: headers with Authorization signature
    Client->>Reqwest: build request (excludes Host from signed headers)
    Note over Reqwest: Reqwest auto-sets Host: 127.0.0.1:9000
    Reqwest->>MinIO: Request with Host: 127.0.0.1:9000
    Note over MinIO: Verifies signature with Host: 127.0.0.1:9000 ✓
    MinIO-->>Reqwest: 200 OK
Loading

zhexuany and others added 4 commits February 8, 2026 20:02
Three issues in the SigV4 signer:

1. Missing payload hash: The canonical request had 5 components but
   AWS SigV4 requires 6. The HexEncode(Hash(Payload)) was missing as
   the final component, causing signature mismatches with all S3-
   compatible services.

2. Query string in canonical URI: The URI query string was appended
   to the canonical URI field. Per the spec, the canonical URI must
   contain only the URL-encoded path; the query string belongs in
   the separate canonical query string field.

3. Missing blank line separator: The format string omitted the '\n'
   between canonical headers and signed headers, which (combined
   with canonical headers' trailing '\n') produces the required
   blank line separator per the AWS spec.

Also adds comprehensive tests covering:
- Canonical request structure (6-component validation)
- Query string separation from URI path
- Payload hash presence
- Deterministic signing with fixed timestamps
- MinIO path-style URL signing
- Host:port for non-standard ports

Co-authored-by: Cursor <cursoragent@cursor.com>
The streaming BAG parser was incorrectly classifying OP_CHUNK records
(opcode 0x05) as metadata and skipping them entirely. In ROS1 bag files,
OP_CHUNK records contain the actual compressed/uncompressed message data
(OP_MSG_DATA and OP_CONNECTION records).

This fix adds:
- decompress_chunk(): handles none/lz4/bz2 compression formats
- parse_inner_records(): extracts message data and connection records
  from decompressed chunk payloads
- Updated channel cache rebuilding to detect new connections from chunks
- 22 new tests covering decompression (all formats, error cases),
  inner record parsing, end-to-end chunk processing, incremental
  streaming, and the StreamingParser trait integration
- Fix clippy needless_borrow warning in signer.rs

Co-authored-by: Cursor <cursoragent@cursor.com>
Add integration tests that feed real .bag fixture files through the
streaming parser and verify:
- All 8 fixture bags produce non-zero messages and channels
- Streaming vs non-streaming parser consistency (same connection and
  message counts for robocodec_test_18.bag)
- Small 64-byte chunk streaming produces identical results to 256KB chunks
- Individual fixture validation (test_15, _18, _19, _23)

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace `len() > 0` with `!is_empty()` to satisfy clippy's
len_zero lint, which recommends using is_empty() for
clarity when checking if a collection is non-empty.
@zhexuany zhexuany changed the title fix(s3): include port in Host header for non-standard ports fix: S3 signer, ROS1 bag writer/reader, and extract CLI for Foxglove-compatible bag Feb 8, 2026
- Use RoboReader, RoboWriter, RawMessage directly instead of robocodec::
- Add use statement for RoboReader in cli/mod.rs
- Fix proptest case for negative duration (restrict nanos range)
@zhexuany zhexuany merged commit f57b550 into main Feb 8, 2026
14 checks passed
@zhexuany zhexuany deleted the fix/s3-signer-host-header-port branch February 8, 2026 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BAG] Implement partial message extraction (extract messages --count)

1 participant