remote-run

Lightweight CLI + job-runner for dispatching compute jobs to remote Linux workstations.

Zero dependencies — Python 3.8+ stdlib only.

Features

  • Code sync: rsync your project to the remote machine
  • Job dispatch: Submit Python scripts with GPU/CPU queue separation
  • Live logs: Stream job output in real-time (-f flag)
  • Progress tracking: Scripts can print [progress X/Y] for ETA estimation
  • Output retrieval: Pull results back to your local machine
  • Duplicate detection: Warns if the same job is already running
  • Orphan detection: Kills orphaned/zombie child processes (e.g. left behind after an OOM kill)
  • Retry logic: Configurable max retries on failure
  • Multi-project: Config-per-project via .remote-run.yaml

Quick Start

Server Setup (remote machine)

# Copy job-runner.py to your compute server
scp job-runner.py user@server:~/

# Start the job runner (port 9810 by default)
ssh user@server 'python3 job-runner.py'

# Or with auth token:
ssh user@server 'AUTH_TOKEN=mysecret python3 job-runner.py'

Client Setup (your laptop)

  1. Install the CLI:

# Option 1: Copy to PATH
cp remote-run /usr/local/bin/
chmod +x /usr/local/bin/remote-run

# Option 2: Symlink
ln -s $(pwd)/remote-run /usr/local/bin/remote-run

  2. Create global config:

mkdir -p ~/.config/remote-run
cat > ~/.config/remote-run/config.yaml << 'EOF'
host: my-server          # SSH host (from ~/.ssh/config)
api_url: http://my-server:9810  # job-runner API URL
EOF

  3. Create project config:

cat > .remote-run.yaml << 'EOF'
project_name: my-ml-project
remote_path: /home/user/my-ml-project
# venv: /home/user/my-venv  # optional
gpu: false
timeout: 3600
excludes:
  - output/
  - data/
  - "*.h5"
EOF

Usage

remote-run sync                    # Sync code to server
remote-run run train.py --gpu -f   # Submit GPU job, follow logs
remote-run run eval.py             # Submit CPU job
remote-run jobs                    # List jobs (current project)
remote-run jobs --all              # List all projects
remote-run log <job_id> -f         # Stream logs
remote-run pull                    # Pull output/ back
remote-run cancel <job_id>         # Cancel a job
remote-run gpu                     # GPU status
remote-run status                  # System overview
remote-run info                    # Show resolved config

Configuration

Global Config (~/.config/remote-run/config.yaml)

host: my-server          # SSH host alias (required)
api_url: http://my-server:9810
# default_venv: /path/to/venv   # optional default Python venv

Project Config (.remote-run.yaml)

Place in your project root. The CLI walks up from CWD to find it.

project_name: my-project
remote_path: /home/user/my-project
# venv: /path/to/venv    # overrides global default_venv
gpu: false                # default queue (true=GPU serial, false=CPU parallel)
timeout: 3600             # default timeout in seconds
excludes:                 # additional rsync excludes
  - output/
  - data/
  - "*.h5"

Job Runner Config (environment variables)

| Variable        | Default      | Description                |
|-----------------|--------------|----------------------------|
| PORT            | 9810         | HTTP port                  |
| AUTH_TOKEN      | (none)       | Bearer token for API auth  |
| DATA_DIR        | ~/remote-run | Data directory (DB + logs) |
| MAX_CPU_WORKERS | cores/4      | Parallel CPU job slots     |
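Resolving these variables might look like the sketch below (names and defaults come from the table above; the dict shape and helper are assumptions, not job-runner.py's actual code):

```python
import os

def runner_config() -> dict:
    """Resolve job-runner settings from environment variables,
    falling back to the documented defaults."""
    cores = os.cpu_count() or 1
    return {
        "port": int(os.environ.get("PORT", 9810)),
        "auth_token": os.environ.get("AUTH_TOKEN"),  # None = auth disabled
        "data_dir": os.path.expanduser(os.environ.get("DATA_DIR", "~/remote-run")),
        # cores/4 keeps parallel CPU jobs from starving the GPU queue
        "max_cpu_workers": int(os.environ.get("MAX_CPU_WORKERS", max(1, cores // 4))),
    }
```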

Progress Protocol

Scripts can report progress by printing to stdout:

for i, batch in enumerate(batches):
    train(batch)
    print(f"[progress {i+1}/{len(batches)}]")

The job runner parses these and calculates ETA, visible via remote-run log <id>.

For multiprocessing scripts, use pool.imap_unordered() instead of pool.map() to get incremental progress:

from multiprocessing import Pool

with Pool() as pool:  # defaults to one worker per CPU core
    for i, result in enumerate(pool.imap_unordered(fn, items)):
        print(f"[progress {i+1}/{len(items)}]")

Architecture

┌─────────────┐         rsync          ┌──────────────────┐
│  Your       │ ─────────────────────→ │  Remote Server   │
│  Laptop     │                        │                  │
│             │    HTTP API (:9810)     │  job-runner.py   │
│  remote-run │ ←────────────────────→ │  ├─ GPU queue    │
│  (CLI)      │                        │  └─ CPU queue    │
└─────────────┘         rsync          └──────────────────┘
                  ←─────────────────
                    (pull output/)

API Reference

The job runner exposes a simple HTTP API:

| Method | Endpoint          | Description                                |
|--------|-------------------|--------------------------------------------|
| GET    | /health           | Health check                               |
| GET    | /jobs             | List jobs (?status=running&limit=50)       |
| POST   | /jobs             | Submit a job (JSON body)                   |
| GET    | /jobs/<id>        | Job detail                                 |
| GET    | /jobs/<id>/log    | Job log (?tail=200 or ?offset=0&limit=100) |
| POST   | /jobs/<id>/cancel | Cancel a job                               |
| POST   | /jobs/cleanup     | Delete old logs (?days=7)                  |
| GET    | /gpu              | GPU status (nvidia-smi)                    |

License

MIT
