Merged
129 changes: 129 additions & 0 deletions docs/della-instructions.md
@@ -0,0 +1,129 @@
# Running MuSE Translation Jobs on Della

For general Della documentation, see the [Princeton Research Computing Della page](https://researchcomputing.princeton.edu/systems/della).

## Prerequisites

- A Princeton HPC account with access to Della — request access through the [Research Computing portal](https://researchcomputing.princeton.edu/get-started/request-account)
- Membership in the `CDHRSE` group to access `/scratch/gpfs/CDHRSE/`
- The faculty collaborator's netid to use as the Slurm `--account`

## Scratch Storage

CDH RSE files live at `/scratch/gpfs/CDHRSE/<netid>/`. A few things to know about this space:

- It is **not backed up** — do not store anything you cannot reproduce
- Unlike `/tmp` and other scratch areas, `/scratch/gpfs` is **not purged** on a schedule, so files persist across sessions
- Large model files and corpora should live here rather than in your home directory, which has a much smaller quota
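To keep an eye on usage, Princeton's clusters provide a `checkquota` command. A quick sketch, guarded so it degrades gracefully when run off-cluster:

```bash
# Report usage against quotas on Della; fall back to du elsewhere
if command -v checkquota >/dev/null 2>&1; then
    checkquota
else
    du -sh /scratch/gpfs/CDHRSE 2>/dev/null || echo "not on Della"
fi
```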

## Setup

### Set up the muse working directory

Clone the repo into your scratch space:

```bash
cd /scratch/gpfs/CDHRSE/<netid>
git clone <repo-url> muse
```

Create the logs directory that the Slurm script writes to:

```bash
cd muse
mkdir -p logs
```

### Create the conda environment

Della's module system does not include `uv`, so we use `conda` as a thin wrapper solely to make `uv` available. The actual Python environment and dependencies are managed by `uv`. Create the environment once on a login node:

```bash
module purge
module load anaconda3/2025.12
conda create -n muse python=3.12 -y
conda activate muse
pip install uv
uv sync
```

Note: we tried using a project-specific conda environment with all dependencies managed by conda, but ran into compatibility issues. The current approach — a minimal conda env that installs `uv`, which then manages everything else — is the workaround.
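Before running `uv sync` in later sessions, it's worth confirming that `uv` actually resolves from the activated env. A small guarded check:

```bash
# uv should be on PATH once the muse conda env is active
if command -v uv >/dev/null 2>&1; then
    uv --version
else
    echo "uv not on PATH; activate the muse conda env first"
fi
```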

### Set up the HuggingFace cache

Compute nodes have no internet access, so models must be cached on a login node before submitting jobs. The cache should live in scratch:

```bash
export HF_HOME=/scratch/gpfs/CDHRSE/<netid>/huggingface-cache
export HF_HUB_CACHE=/scratch/gpfs/CDHRSE/<netid>/huggingface-cache/hub
```

To populate the cache, download the models on a login node, for example:

```bash
python -c "
from transformers import AutoTokenizer, AutoModelForCausalLM
AutoTokenizer.from_pretrained('tencent/HY-MT1.5-1.8B')
AutoModelForCausalLM.from_pretrained('tencent/HY-MT1.5-1.8B')
"
```
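Because compute nodes are offline, it's worth confirming the model actually landed in the cache before submitting. The hub cache stores each repo under `models--<org>--<name>/snapshots/<revision>`; a minimal check based on that layout:

```bash
# A non-empty snapshots directory means at least one revision resolved fully
HUB=${HF_HUB_CACHE:-$HOME/.cache/huggingface/hub}
SNAP="${HUB}/models--tencent--HY-MT1.5-1.8B/snapshots"
if [ -d "$SNAP" ] && [ -n "$(ls -A "$SNAP" 2>/dev/null)" ]; then
    echo "cached"
else
    echo "not cached; download on a login node first"
fi
```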

## Submitting a Job

The example script is at `examples/slurm/translate-della.slurm`. It accepts three positional arguments:

```bash
sbatch examples/slurm/translate-della.slurm <model> <input> <output>
```

For example:

```bash
sbatch examples/slurm/translate-della.slurm hymt input.jsonl output.jsonl
```

Before submitting, update the two placeholder variables at the top of the script:

- `FACULTY_NETID` — the Slurm account (faculty collaborator's netid), used for `--account`
- `YOUR_NETID` — your Princeton netid, used to construct scratch paths
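Both placeholders can be filled with `sed`; the sketch below demonstrates the substitution on a temporary copy (`prof123` and `ab1234` are hypothetical netids):

```bash
# Build a miniature stand-in for the script header and substitute the placeholders
tmp=$(mktemp)
printf '#SBATCH --account=FACULTY_NETID\nNETID=YOUR_NETID\n' > "$tmp"
sed -e 's/FACULTY_NETID/prof123/' -e 's/YOUR_NETID/ab1234/' "$tmp"
rm -f "$tmp"
```

Run the same two `sed` expressions with `-i` against `examples/slurm/translate-della.slurm` to edit the real script in place.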

### Script configuration

The script is configured for a **CPU job** by default:

- `--cpus-per-task=1` — single CPU; the translation models are not parallelised across CPUs
- `--mem-per-cpu=10G` — 10G is sufficient for the 1.8B–4B parameter models
- `--time=00:15:00` — 15-minute wall time limit; increase this for large corpora

### Running on GPU

GPU jobs run ~14x faster than CPU for the 1.8B–4B models. To switch to GPU:

1. Uncomment the `##SBATCH --gres=gpu:1` and `##SBATCH --partition=mig` lines in the script
2. Remove or comment out `--mem-per-cpu`; memory for GPU jobs is pre-defined by the partition and cannot be set manually

`--partition=mig` allocates a MIG slice of an A100, which is sufficient for these models. For a full A100 or other GPU configurations, see the GPU Jobs section of the [Della documentation](https://researchcomputing.princeton.edu/systems/della); depending on the configuration, the relevant directive is `--partition` or `--constraint`.
Review comment: Our job should include the `--partition=mig` parameter and should state this.

Replace the second sentence with a pointer to the Della GPU Jobs section. Other Slurm directives should be used to access a GPU in the `gpu` partition; I was mistaken about the directive always being `--partition`, in other cases it is `--constraint`.

It's worth reading through this documentation if you haven't.


All HuggingFace models are loaded with `device_map="auto"`, so they use the GPU automatically when a GPU is allocated — no code changes needed.

## Logs

Job stdout and stderr are written to `logs/` in the repo directory, named `<job-name>_<job-id>.out` and `.err`. Check them after a job completes:

```bash
cat logs/muse-translate_<jobid>.out
cat logs/muse-translate_<jobid>.err
```

## Useful Commands

```bash
# Check job status
squeue -u <netid>

# Check efficiency after job completes
jobstats <jobid>

# Pull latest code and sync dependencies (login node)
git pull && uv sync
```
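A few more standard Slurm commands that come in handy (placeholders as above):

```bash
# Cancel a running or queued job
scancel <jobid>

# Show recent job history with states and exit codes
sacct -u <netid> --format=JobID,JobName,State,Elapsed,ExitCode
```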
59 changes: 59 additions & 0 deletions examples/slurm/translate-della.slurm
Review comment: Need to add the following GPU configuration setting: `#SBATCH --partition=mig`

@@ -0,0 +1,59 @@
#!/bin/bash
# MuSE translation job (Della)
#
# Usage: sbatch translate-della.slurm <model> <input> <output>
# model — model identifier: hymt | madlad | nllb | gemma
# input — path to input JSONL file
# output — path to write translation output JSONL
#
# Placeholder variables to update before submitting:
# FACULTY_NETID — Slurm account (faculty collaborator's netid)
# YOUR_NETID — your Princeton netid
#
# To run on GPU, uncomment the ##SBATCH lines below and remove --mem-per-cpu.
#
#SBATCH --job-name=muse-translate
#SBATCH --account=FACULTY_NETID
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=10G # CPU only — remove this line for GPU jobs
##SBATCH --gres=gpu:1 # GPU only — uncomment to request a GPU
##SBATCH --partition=mig # GPU only — default MIG slice; use 'gpu' for a full A100
#SBATCH --time=00:15:00 # Wall time limit — increase for large corpora
#SBATCH --output=logs/%x_%j.out # Logs are written to logs/ relative to the working directory
#SBATCH --error=logs/%x_%j.err

# ---------------------------------------------------------------------------
# Environment
# ---------------------------------------------------------------------------
NETID=YOUR_NETID
REPO=/scratch/gpfs/CDHRSE/${NETID}/muse

module purge
# uv is not available as a module, so we load anaconda to get conda
module load anaconda3/2025.12
# Activate the muse conda environment (contains uv; see della-instructions.md for setup)
conda activate muse

export HF_HOME=/scratch/gpfs/CDHRSE/${NETID}/huggingface-cache
export HF_HUB_CACHE=/scratch/gpfs/CDHRSE/${NETID}/huggingface-cache/hub
export HF_HUB_OFFLINE=1

# ---------------------------------------------------------------------------
# Validate arguments
# ---------------------------------------------------------------------------
if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
echo "Usage: sbatch translate-della.slurm <model> <input> <output>"
echo " model — hymt | madlad | nllb | gemma"
echo " input — path to input JSONL file"
echo " output — path to write translation output JSONL"
exit 1
fi

# ---------------------------------------------------------------------------
# Run translation
# ---------------------------------------------------------------------------
cd "${REPO}" || exit 1

uv run src/muse/translation/translate_corpus.py "$1" "$2" "$3"
21 changes: 15 additions & 6 deletions src/muse/translation/translate.py
@@ -89,7 +89,10 @@ def hymt_translate(
start = timer()
LOADED_MODEL["model_name"] = model_name
LOADED_MODEL["tokenizer"] = AutoTokenizer.from_pretrained(model_name)
- LOADED_MODEL["model"] = AutoModelForCausalLM.from_pretrained(model_name)
+ # device_map="auto" places the model on GPU if available, CPU otherwise
+ LOADED_MODEL["model"] = AutoModelForCausalLM.from_pretrained(
+     model_name, device_map="auto"
+ )
if verbose:
print(f"Loaded tokenizer & model in {timer() - start:.0f} seconds")
tokenizer = LOADED_MODEL["tokenizer"]
@@ -155,7 +158,9 @@ def nllb_translate(
start = timer()
LOADED_MODEL["model_name"] = model_name
LOADED_MODEL["tokenizer"] = AutoTokenizer.from_pretrained(model_name)
- LOADED_MODEL["model"] = AutoModelForSeq2SeqLM.from_pretrained(model_name)
+ LOADED_MODEL["model"] = AutoModelForSeq2SeqLM.from_pretrained(
+     model_name, device_map="auto"
+ )
if verbose:
print(f"Loaded tokenizer & model in {timer() - start:.0f} seconds")
tokenizer = LOADED_MODEL["tokenizer"]
@@ -164,7 +169,7 @@
# Generate model input
## Set source language for proper tokenization
tokenizer.src_lang = nllb_lang_idx[src_lang]
- model_inputs = tokenizer(text, return_tensors="pt")
+ model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
input_len = model_inputs["input_ids"][0].size()[0]
if verbose:
print(f"Input length: {input_len} tokens")
@@ -216,14 +221,18 @@ def madlad_translate(
start = timer()
LOADED_MODEL["model_name"] = model_name
LOADED_MODEL["tokenizer"] = AutoTokenizer.from_pretrained(model_name)
- LOADED_MODEL["model"] = AutoModelForSeq2SeqLM.from_pretrained(model_name)
+ LOADED_MODEL["model"] = AutoModelForSeq2SeqLM.from_pretrained(
+     model_name, device_map="auto"
+ )
if verbose:
print(f"Loaded tokenizer & model in {timer() - start:.0f} seconds")
tokenizer = LOADED_MODEL["tokenizer"]
model = LOADED_MODEL["model"]

# Generate model input
- model_inputs = tokenizer(f"<2{tgt_lang}> {text}", return_tensors="pt")
+ model_inputs = tokenizer(f"<2{tgt_lang}> {text}", return_tensors="pt").to(
+     model.device
+ )
input_len = model_inputs["input_ids"][0].size()[0]
if verbose:
print(f"Input length: {input_len} tokens")
@@ -272,7 +281,7 @@ def gemma_translate(
LOADED_MODEL["model_name"] = model_name
LOADED_MODEL["tokenizer"] = AutoTokenizer.from_pretrained(model_name)
LOADED_MODEL["model"] = AutoModelForImageTextToText.from_pretrained(
- model_name
+ model_name, device_map="auto"
)
except Exception as e:
# Check if error is related to authentication