Add SWE-bench setup automation script (setup_from_parquet.sh) #3

@rowan-stein

Description

Add a reusable Bash script to automate creating forks and per-instance branches from SWE-bench Parquet metadata.

Scope
- Script: scripts/setup_from_parquet.sh (Bash 4+)
- Reads: SWE-bench_Verified/test-00000-of-00001.parquet (columns: repo, instance_id, environment_setup_commit)
- For each row: ensures a fork exists in TARGET_ORG and creates branch refs/heads/<instance_id> at <environment_setup_commit>

Requirements
- Preflight: gh CLI auth with GITHUB_TOKEN; python3; pandas+pyarrow (auto-install unless NO_INSTALL=1)
- Filters: FILTER_REPO (exact list or regex), FILTER_INSTANCE_ID (regex)
- Controls: DRY_RUN, CONCURRENCY, SLEEP_SECS, MAX_ERRORS, LOG_JSON, OUTPUT_CSV
- Idempotency: skip existing forks/branches
- Retries: transient 5xx, rate limit handling, invalid SHA retry after merge-upstream
- Logging: human-readable or JSON; summary counts; optional CSV mapping
- Docs: README section “SWE-bench Setup Automation”; scripts/requirements.txt

Acceptance criteria
- Script and docs added; runnable on macOS/Linux; idempotent and robust per spec; provides clear summary and optional CSV output.
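
For discussion, here is a rough Python sketch of the row-selection step the script could delegate to its python3 helper: load the three columns, apply the filters, and produce a dry-run plan of fork and branch operations. It is only an illustration under several assumptions, not the intended implementation: the function names (`load_rows`, `matches_repo`, `plan_operations`) are hypothetical, forks are assumed to keep the upstream repository name, and FILTER_REPO is assumed to be a comma-separated exact list when every item looks like `owner/name` and a regex otherwise.

```python
import re
from typing import Dict, Iterable, List, Optional

COLUMNS = ["repo", "instance_id", "environment_setup_commit"]
# Assumption: an item shaped like "owner/name" is an exact repo name, not a regex.
EXACT_REPO = re.compile(r"[\w.-]+/[\w.-]+")


def load_rows(parquet_path: str) -> List[Dict[str, str]]:
    """Read only the needed columns; requires pandas and pyarrow."""
    import pandas as pd  # imported lazily so the planner works without pandas

    frame = pd.read_parquet(parquet_path, columns=COLUMNS)
    return frame.to_dict("records")


def matches_repo(repo: str, filter_repo: Optional[str]) -> bool:
    """Apply FILTER_REPO as an exact list or, failing that, as a regex."""
    if not filter_repo:
        return True
    items = [item.strip() for item in filter_repo.split(",") if item.strip()]
    if items and all(EXACT_REPO.fullmatch(item) for item in items):
        return repo in items
    return re.search(filter_repo, repo) is not None


def plan_operations(
    rows: Iterable[Dict[str, str]],
    target_org: str,
    filter_repo: Optional[str] = None,
    filter_instance_id: Optional[str] = None,
) -> List[Dict[str, object]]:
    """Return one planned branch creation per selected row (no network calls)."""
    plans: List[Dict[str, object]] = []
    planned_forks = set()
    for row in rows:
        repo = row["repo"]
        instance_id = row["instance_id"]
        if not matches_repo(repo, filter_repo):
            continue
        if filter_instance_id and not re.search(filter_instance_id, instance_id):
            continue
        # Assumption: the fork keeps the upstream repository name.
        fork = f"{target_org}/{repo.split('/', 1)[1]}"
        plans.append(
            {
                "upstream": repo,
                "fork": fork,
                "ref": f"refs/heads/{instance_id}",
                "sha": row["environment_setup_commit"],
                # Only the first row per fork needs the "ensure fork" step.
                "ensure_fork": fork not in planned_forks,
            }
        )
        planned_forks.add(fork)
    return plans
```

A plan like this maps naturally onto DRY_RUN (print the plan and stop) and onto the optional OUTPUT_CSV mapping (one line per planned ref). The idempotency checks and retries would still have to happen at execution time, since this step never contacts GitHub.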

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)
