Lightweight CLI + job-runner for dispatching compute jobs to remote Linux workstations.
Zero dependencies — Python 3.8+ stdlib only.
- Code sync: rsync your project to the remote machine
- Job dispatch: submit Python scripts with GPU/CPU queue separation
- Live logs: stream job output in real time (`-f` flag)
- Progress tracking: scripts can print `[progress X/Y]` for ETA estimation
- Output retrieval: pull results back to your local machine
- Duplicate detection: warns if the same job is already running
- Orphan detection: kills zombie processes (OOM-killed children)
- Retry logic: configurable max retries on failure
- Multi-project: per-project config via `.remote-run.yaml`
- Start the job runner on your compute server:

```bash
# Copy job-runner.py to your compute server
scp job-runner.py user@server:~/

# Start the job runner (port 9810 by default)
ssh user@server 'python3 job-runner.py'

# Or with an auth token:
ssh user@server 'AUTH_TOKEN=mysecret python3 job-runner.py'
```

- Install the CLI:
```bash
# Option 1: Copy to PATH
cp remote-run /usr/local/bin/
chmod +x /usr/local/bin/remote-run

# Option 2: Symlink
ln -s $(pwd)/remote-run /usr/local/bin/remote-run
```

- Create a global config:
```bash
mkdir -p ~/.config/remote-run
cat > ~/.config/remote-run/config.yaml << 'EOF'
host: my-server                 # SSH host (from ~/.ssh/config)
api_url: http://my-server:9810  # job-runner API URL
EOF
```

- Create a project config:
```bash
cat > .remote-run.yaml << 'EOF'
project_name: my-ml-project
remote_path: /home/user/my-ml-project
# venv: /home/user/my-venv   # optional
gpu: false
timeout: 3600
excludes:
  - output/
  - data/
  - "*.h5"
EOF
```

Usage:

```bash
remote-run sync                    # Sync code to server
remote-run run train.py --gpu -f   # Submit GPU job, follow logs
remote-run run eval.py             # Submit CPU job
remote-run jobs                    # List jobs (current project)
remote-run jobs --all              # List all projects
remote-run log <job_id> -f         # Stream logs
remote-run pull                    # Pull output/ back
remote-run cancel <job_id>         # Cancel a job
remote-run gpu                     # GPU status
remote-run status                  # System overview
remote-run info                    # Show resolved config
```

Global config (`~/.config/remote-run/config.yaml`):

```yaml
host: my-server                  # SSH host alias (required)
api_url: http://my-server:9810
# default_venv: /path/to/venv    # optional default Python venv
```

Project config (`.remote-run.yaml`): place it in your project root. The CLI walks up from the CWD to find it.
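That lookup can be sketched with the stdlib alone. This is a minimal illustration of "walk up from CWD", not the CLI's actual code, and `find_project_config` is a hypothetical name:

```python
from pathlib import Path
from typing import Optional

def find_project_config(start: Optional[Path] = None) -> Optional[Path]:
    """Walk up from `start` (default: CWD) until a .remote-run.yaml is found."""
    here = (start or Path.cwd()).resolve()
    # Check the starting directory, then each ancestor up to the filesystem root.
    for directory in [here, *here.parents]:
        candidate = directory / ".remote-run.yaml"
        if candidate.is_file():
            return candidate
    return None  # no project config anywhere up the tree
```

Because the search stops at the first match, nested projects resolve to the innermost `.remote-run.yaml`.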
```yaml
project_name: my-project
remote_path: /home/user/my-project
# venv: /path/to/venv   # overrides global default_venv
gpu: false              # default queue (true = GPU serial, false = CPU parallel)
timeout: 3600           # default timeout in seconds
excludes:               # additional rsync excludes
  - output/
  - data/
  - "*.h5"
```

The job runner reads its settings from environment variables:

| Variable | Default | Description |
|---|---|---|
| `PORT` | `9810` | HTTP port |
| `AUTH_TOKEN` | (none) | Bearer token for API auth |
| `DATA_DIR` | `~/remote-run` | Data directory (DB + logs) |
| `MAX_CPU_WORKERS` | cores/4 | Parallel CPU job slots |
Scripts can report progress by printing to stdout:

```python
for i, batch in enumerate(batches):
    train(batch)
    print(f"[progress {i+1}/{len(batches)}]")
```

The job runner parses these lines and calculates an ETA, visible via `remote-run log <id>`.
For multiprocessing scripts, use `pool.imap_unordered()` instead of `pool.map()` to get incremental progress:

```python
from multiprocessing import Pool

with Pool(N) as pool:
    for i, result in enumerate(pool.imap_unordered(fn, items)):
        print(f"[progress {i+1}/{len(items)}]")
```

```
┌─────────────┐         rsync          ┌──────────────────┐
│    Your     │ ─────────────────────→ │  Remote Server   │
│   Laptop    │                        │                  │
│             │    HTTP API (:9810)    │  job-runner.py   │
│  remote-run │ ←────────────────────→ │   ├─ GPU queue   │
│   (CLI)     │                        │   └─ CPU queue   │
└─────────────┘         rsync          └──────────────────┘
                ←─────────────────
                  (pull output/)
```
The job runner exposes a simple HTTP API:

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/health` | Health check |
| `GET` | `/jobs` | List jobs (`?status=running&limit=50`) |
| `POST` | `/jobs` | Submit a job (JSON body) |
| `GET` | `/jobs/<id>` | Job detail |
| `GET` | `/jobs/<id>/log` | Job log (`?tail=200` or `?offset=0&limit=100`) |
| `POST` | `/jobs/<id>/cancel` | Cancel a job |
| `POST` | `/jobs/cleanup` | Delete old logs (`?days=7`) |
| `GET` | `/gpu` | GPU status (nvidia-smi) |
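For scripting against the API directly (bypassing the CLI), requests can be built with the stdlib only. This is a sketch: `api_request` is a hypothetical helper, and the `Authorization: Bearer <token>` header format is an assumption based on the "Bearer token" description above:

```python
import json
import urllib.request
from typing import Optional

API_URL = "http://my-server:9810"   # api_url from config.yaml
AUTH_TOKEN = "mysecret"             # must match the runner's AUTH_TOKEN

def api_request(path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET (or, when a JSON body is given, POST) request with auth attached."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(API_URL + path, data=data,
                                 method="POST" if data is not None else "GET")
    req.add_header("Authorization", "Bearer " + AUTH_TOKEN)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req

# Example (requires a reachable runner):
# with urllib.request.urlopen(api_request("/jobs?status=running&limit=50")) as resp:
#     print(json.load(resp))
```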
License: MIT