Description
Add a reusable Bash script to automate creating forks and per-instance branches from SWE-bench Parquet metadata.

Scope
- Script: scripts/setup_from_parquet.sh (Bash 4+)
- Reads: SWE-bench_Verified/test-00000-of-00001.parquet (columns: repo, instance_id, environment_setup_commit)
- For each row: ensures a fork exists in TARGET_ORG and creates branch refs/heads/<instance_id> at <environment_setup_commit>

Requirements
- Preflight: gh CLI authenticated with GITHUB_TOKEN; python3; pandas + pyarrow (auto-installed unless NO_INSTALL=1)
- Filters: FILTER_REPO (exact list or regex), FILTER_INSTANCE_ID (regex)
- Controls: DRY_RUN, CONCURRENCY, SLEEP_SECS, MAX_ERRORS, LOG_JSON, OUTPUT_CSV
- Idempotency: skip existing forks and branches
- Retries: transient 5xx errors, rate-limit handling, retry on invalid SHA after merge-upstream
- Logging: human-readable or JSON; summary counts; optional CSV mapping
- Docs: README section “SWE-bench Setup Automation”; scripts/requirements.txt

Acceptance criteria
- Script and docs added; runnable on macOS/Linux; idempotent and robust per spec; provides a clear summary and optional CSV output.
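
For illustration, here is a minimal sketch of the per-row step (ensure fork, then create the branch at the setup commit) with the DRY_RUN and idempotency behavior described above. It is not the proposed scripts/setup_from_parquet.sh. The helper names `run` and `process_row`, the `example-org` default, and the assumption that a fork keeps the upstream repository name are all hypothetical. Parquet reading, retries, rate-limit handling, and concurrency are left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the per-row step; not the actual script.
set -euo pipefail

TARGET_ORG="${TARGET_ORG:-example-org}"   # placeholder org (assumption)
DRY_RUN="${DRY_RUN:-1}"                   # this sketch defaults to a dry run

# In dry-run mode, print the command instead of executing it.
run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    printf 'DRY_RUN:'
    printf ' %s' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# process_row <upstream owner/name> <instance_id> <environment_setup_commit>
process_row() {
  local upstream="$1" instance_id="$2" sha="$3"
  # Assumption: the fork keeps the upstream repository name.
  local fork="${TARGET_ORG}/${upstream##*/}"

  # Idempotency: fork only if the target repo is missing.
  # A dry run skips the existence check and just prints the plan.
  if [[ "$DRY_RUN" == "1" ]] || ! gh repo view "$fork" >/dev/null 2>&1; then
    run gh repo fork "$upstream" --org "$TARGET_ORG" --clone=false
  fi

  # Idempotency: create the branch ref only if it does not exist yet.
  if [[ "$DRY_RUN" == "1" ]] ||
     ! gh api "repos/${fork}/git/ref/heads/${instance_id}" >/dev/null 2>&1; then
    run gh api "repos/${fork}/git/refs" \
      -f "ref=refs/heads/${instance_id}" -f "sha=${sha}"
  fi
}
```

Calling `process_row octo/demo octo__demo-1 0123abc` (placeholder values, not real SWE-bench rows) in dry-run mode prints the two `gh` commands that would run, without contacting GitHub. With `DRY_RUN=0`, each command runs only when the corresponding fork or branch is missing.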