remote-run

Lightweight CLI + job-runner for dispatching compute jobs to remote Linux workstations.

Zero dependencies — Python 3.8+ stdlib only.

Features

  • Code sync: rsync your project to the remote machine
  • Job dispatch: Submit Python scripts with GPU/CPU queue separation
  • Live logs: Stream job output in real-time (-f flag)
  • Progress tracking: Scripts can print [progress X/Y] for ETA estimation
  • Output retrieval: Pull results back to your local machine
  • Duplicate detection: Warns if the same job is already running
  • Orphan detection: Kills orphaned/zombie child processes (e.g. left behind after an OOM kill)
  • Retry logic: Configurable max retries on failure
  • Multi-project: Config-per-project via .remote-run.yaml

Quick Start

Server Setup (remote machine)

# Copy job-runner.py to your compute server
scp job-runner.py user@server:~/

# Start the job runner (port 9810 by default)
ssh user@server 'python3 job-runner.py'

# Or with auth token:
ssh user@server 'AUTH_TOKEN=mysecret python3 job-runner.py'

Client Setup (your laptop)

  1. Install the CLI:

# Option 1: Copy to PATH
cp remote-run /usr/local/bin/
chmod +x /usr/local/bin/remote-run

# Option 2: Symlink
ln -s $(pwd)/remote-run /usr/local/bin/remote-run

  2. Create global config:

mkdir -p ~/.config/remote-run
cat > ~/.config/remote-run/config.yaml << 'EOF'
host: my-server          # SSH host (from ~/.ssh/config)
api_url: http://my-server:9810  # job-runner API URL
EOF

  3. Create project config:

cat > .remote-run.yaml << 'EOF'
project_name: my-ml-project
remote_path: /home/user/my-ml-project
# venv: /home/user/my-venv  # optional
gpu: false
timeout: 3600
excludes:
  - output/
  - data/
  - "*.h5"
EOF

Usage

remote-run sync                    # Sync code to server
remote-run run train.py --gpu -f   # Submit GPU job, follow logs
remote-run run eval.py             # Submit CPU job
remote-run jobs                    # List jobs (current project)
remote-run jobs --all              # List all projects
remote-run log <job_id> -f         # Stream logs
remote-run pull                    # Pull output/ back
remote-run cancel <job_id>         # Cancel a job
remote-run gpu                     # GPU status
remote-run status                  # System overview
remote-run info                    # Show resolved config

Configuration

Global Config (~/.config/remote-run/config.yaml)

host: my-server          # SSH host alias (required)
api_url: http://my-server:9810
# default_venv: /path/to/venv   # optional default Python venv

Project Config (.remote-run.yaml)

Place in your project root. The CLI walks up from CWD to find it.

project_name: my-project
remote_path: /home/user/my-project
# venv: /path/to/venv    # overrides global default_venv
gpu: false                # default queue (true=GPU serial, false=CPU parallel)
timeout: 3600             # default timeout in seconds
excludes:                 # additional rsync excludes
  - output/
  - data/
  - "*.h5"

Job Runner Config (environment variables)

| Variable        | Default      | Description                |
|-----------------|--------------|----------------------------|
| PORT            | 9810         | HTTP port                  |
| AUTH_TOKEN      | (none)       | Bearer token for API auth  |
| DATA_DIR        | ~/remote-run | Data directory (DB + logs) |
| MAX_CPU_WORKERS | cores/4      | Parallel CPU job slots     |
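Resolving these variables might look like the sketch below (names and defaults come from the table above; the dict shape and helper are assumptions, not job-runner.py's actual code):

```python
import os

def runner_config() -> dict:
    """Resolve job-runner settings from environment variables,
    falling back to the documented defaults."""
    cores = os.cpu_count() or 1
    return {
        "port": int(os.environ.get("PORT", 9810)),
        "auth_token": os.environ.get("AUTH_TOKEN"),  # None = auth disabled
        "data_dir": os.path.expanduser(os.environ.get("DATA_DIR", "~/remote-run")),
        # cores/4 keeps parallel CPU jobs from starving the GPU queue
        "max_cpu_workers": int(os.environ.get("MAX_CPU_WORKERS", max(1, cores // 4))),
    }
```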

Progress Protocol

Scripts can report progress by printing to stdout:

for i, batch in enumerate(batches):
    train(batch)
    print(f"[progress {i+1}/{len(batches)}]")

The job runner parses these and calculates ETA, visible via remote-run log <id>.

For multiprocessing scripts, use pool.imap_unordered() instead of pool.map() to get incremental progress:

from multiprocessing import Pool

with Pool() as pool:  # defaults to one worker per CPU core
    for i, result in enumerate(pool.imap_unordered(fn, items)):
        print(f"[progress {i+1}/{len(items)}]")

Architecture

┌─────────────┐         rsync          ┌──────────────────┐
│  Your       │ ─────────────────────→ │  Remote Server   │
│  Laptop     │                        │                  │
│             │    HTTP API (:9810)     │  job-runner.py   │
│  remote-run │ ←────────────────────→ │  ├─ GPU queue    │
│  (CLI)      │                        │  └─ CPU queue    │
└─────────────┘         rsync          └──────────────────┘
                  ←─────────────────
                    (pull output/)

API Reference

The job runner exposes a simple HTTP API:

| Method | Endpoint          | Description                                |
|--------|-------------------|--------------------------------------------|
| GET    | /health           | Health check                               |
| GET    | /jobs             | List jobs (?status=running&limit=50)       |
| POST   | /jobs             | Submit a job (JSON body)                   |
| GET    | /jobs/<id>        | Job detail                                 |
| GET    | /jobs/<id>/log    | Job log (?tail=200 or ?offset=0&limit=100) |
| POST   | /jobs/<id>/cancel | Cancel a job                               |
| POST   | /jobs/cleanup     | Delete old logs (?days=7)                  |
| GET    | /gpu              | GPU status (nvidia-smi)                    |

License

MIT
