
E2E integration test reliability and worker logging gaps #135

@mihow


Context

Found during end-to-end (E2E) testing of RolnickLab/antenna#1197 together with #134.

Issues

1. psv2_integration_test.sh populate step fails intermittently

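The api_post_empty call to /captures/collections/{id}/populate/ fails intermittently: curl -sf exits with code 22 (the server returned an HTTP status of 400 or above), yet the same endpoint returns 200 when called manually moments later. Because the script runs under set -euo pipefail, this single failure aborts the entire run. The -s flag also suppresses curl's error message, so the log does not show which status code was returned.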

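Likely cause: a race condition between collection creation and populate, or a transient server-side error that curl -sf treats as fatal. Exit code 22 means curl did receive an HTTP error response, so the failure is an error status rather than a failed connection.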

Suggested fix: add a short retry loop around the populate call, or replace curl -sf with a function that retries on transient failures.
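A minimal sketch of the suggested retry loop, written as a generic Bash helper so it can wrap either a bare curl call or an existing script function. The name retry, the attempt count, the delay, and the ANTENNA_API and COLLECTION_ID variables in the example comment are illustrative assumptions, not part of the current script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns the last exit status if every attempt fails.
retry() {
  local attempts=$1 delay=$2 n=1 status
  shift 2
  while true; do
    # Running the command as an `if` condition keeps `set -e` from
    # aborting the script on a failed attempt.
    if "$@"; then
      return 0
    else
      status=$?
    fi
    if (( n >= attempts )); then
      echo "retry: '$*' failed after $n attempt(s) (exit $status)" >&2
      return "$status"
    fi
    echo "retry: attempt $n failed (exit $status), retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Example (hypothetical variable names, auth headers omitted):
#   retry 5 2 curl -sf -X POST "$ANTENNA_API/captures/collections/$COLLECTION_ID/populate/"
```

curl's own --retry option (and --retry-all-errors in newer curl versions) is an alternative; a shell-level wrapper has the advantage of also working around helpers such as api_post_empty and of logging each failed attempt.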

2. Worker "Done" summary lines missing from log files

When the worker spawns per-GPU subprocesses (log message: "Found 2 GPUs, spawning one AMI worker instance per GPU"), each batch-completion summary line ("Done, detections: N. Detecting time: ...") is emitted only by the subprocess that processed that batch. When the worker's output is redirected to a log file, these lines are sometimes missing because the parent process's log stream captures only its own output.

psv2_integration_test.sh greps the worker log for "Done, detections:" to report timing, so its timing summary is unreliable with multi-GPU workers.
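Until the subprocess logging itself is fixed, the test script could at least treat missing summary lines as informational rather than fatal. A minimal sketch, assuming a hypothetical helper named show_batch_timing that accepts one or more log paths; it does not address the underlying logging gap.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print batch timing lines from one or more worker logs.
# Missing "Done, detections:" lines are reported, not treated as a failure,
# because with multi-GPU workers they may not reach the captured log.
show_batch_timing() {
  local matches
  # -h omits file names when several logs are given; `|| true` absorbs
  # grep's status 1 (no match) or 2 (unreadable file) under `set -e`.
  matches=$(grep -h "Done, detections:" "$@" 2>/dev/null || true)
  if [[ -n "$matches" ]]; then
    printf '%s\n' "$matches"
  else
    echo "No 'Done, detections:' lines found (multi-GPU worker output may not be captured)"
  fi
}
```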

3. Worker log analysis section in test script sometimes skipped

The integration test script uses set -euo pipefail. grep exits with status 1 when it finds no matches (e.g., when the logs contain no errors). Under set -e, that status (propagated through any pipeline by pipefail) terminates the script before it prints the final PASS/FAIL verdict, so clean runs are reported as failures.
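One way to keep the log-analysis section running is to absorb grep's "no match" status explicitly. A minimal sketch, assuming hypothetical helpers count_matches and report_errors and an "ERROR" pattern; the real script's patterns and log paths may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count lines matching a pattern without aborting under `set -e`.
# grep -c prints "0" but exits with status 1 when nothing matches;
# `|| true` turns that into success so the script keeps going.
count_matches() {
  local pattern=$1 file=$2
  grep -c -- "$pattern" "$file" || true
}

# Hypothetical usage in the log-analysis section: report errors, then
# always fall through to the PASS/FAIL verdict.
report_errors() {
  local log=$1 n
  n=$(count_matches "ERROR" "$log")
  if (( n > 0 )); then
    echo "Found $n error line(s) in $log:"
    grep -- "ERROR" "$log"
  else
    echo "No errors in $log"
  fi
}
```

Using grep inside an if condition (or followed by || true) is the usual pattern here, because set -e does not apply to commands tested by if, while, or the left side of || and &&.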

4. POST URLs require a trailing slash (easy to miss)

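Django's APPEND_SLASH silently redirects GET requests to the slash-terminated URL, but a POST to a URL without a trailing slash cannot be redirected without losing its body; with DEBUG enabled, Django raises an error instead, which surfaces as a 500. This caused the first E2E test failure. The error message is clear in the Django logs, but the worker only sees a generic 500.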

Suggestion: document this in the ADC's CLAUDE.md or add a note in the Antenna API docs. Alternatively, the ADC HTTP client could normalize URLs to always include a trailing slash.
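The ADC client fix would live in its Python code, but the same normalization can also be applied in the test script before each POST. A minimal sketch, assuming a hypothetical helper named with_trailing_slash; the URLs in the example comment are placeholders, and URL fragments (#...) are not handled.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: ensure the path part of a URL ends with "/".
# A query string, if present, is kept after the added slash.
with_trailing_slash() {
  local url=$1 path query=""
  path=${url%%\?*}                  # everything before the first "?"
  if [[ "$url" == *\?* ]]; then
    query="?${url#*\?}"             # "?" plus everything after it
  fi
  [[ "$path" == */ ]] || path="$path/"
  printf '%s%s\n' "$path" "$query"
}

# Example (placeholder URL):
#   curl -sf -X POST "$(with_trailing_slash "http://localhost:8000/api/v2/jobs")"
```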

Affected files

  • scripts/psv2_integration_test.sh (Antenna repo)
  • trapdata/antenna/datasets.py (trailing slash)
  • Worker subprocess logging infrastructure
