Conversation
Also move model inputs to the correct device before generation in `nllb_translate` and `madlad_translate`.
laurejt
left a comment
Please also document how to run on della with CPUs. I'll review once that information is added.
laurejt
left a comment
Thanks for documenting this.
Requested changes:
- Simplify `translate.py` code (see comments)
- Combine `della-cpu.md` and `della-gpu.md` into a single document. Be sure to document the differences between the two types of jobs
- Combine `translate-della-cpu.slurm` and `translate-della-gpu.slurm`. For the non-overlapping parameters, simply comment them out (i.e., start with `##SBATCH`)
- Update the slurm script so it uses bash input args
- Update the slurm script so it uses the recommended configurations for della
- Include more documentation in the `md` file. It should describe in enough detail how the slurm script is configured by default. For example, mention the time, partition and/or memory allocation (gpu vs. cpu).
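A combined script along these lines might look like the following sketch. The account name, time limit, and resource numbers are placeholders for illustration, not verified Della recommendations:

```bash
#!/bin/bash
#SBATCH --job-name=muse-translate   # Job name shown in squeue
#SBATCH --account=FACULTY_NETID     # placeholder: faculty collaborator's account
#SBATCH --nodes=1                   # Single node job
#SBATCH --ntasks=1                  # Single task
#SBATCH --cpus-per-task=1           # CPUs for data loading / tokenization
#SBATCH --time=01:00:00             # placeholder walltime
##SBATCH --gres=gpu:1               # uncomment for a GPU job
##SBATCH --partition=mig            # uncomment for a GPU job

# Bash positional args: corpus path, source language, target language
uv run src/muse/translation/translate_corpus.py "$1" "$2" "$3"
```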
src/muse/translation/translate.py
Outdated
```python
model_inputs = tokenizer(text, return_tensors="pt")
# Move input tensors to the same device as the model
model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}
```
You shouldn't need to iterate the tensor directly
```diff
- model_inputs = tokenizer(text, return_tensors="pt")
- # Move input tensors to the same device as the model
- model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}
+ model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
```
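The suggested one-liner works because the tokenizer returns a `BatchEncoding`, which implements `.to(device)` itself, so the dict comprehension is redundant. For readers without `transformers` installed, the behavior can be sketched with stand-in classes (all names below are illustrative, not the real library internals):

```python
class FakeTensor:
    """Minimal stand-in for torch.Tensor that records its device."""
    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


class FakeBatchEncoding(dict):
    """Minimal stand-in for transformers.BatchEncoding: a dict whose
    .to(device) moves every contained tensor in one call."""
    def to(self, device):
        return FakeBatchEncoding({k: v.to(device) for k, v in self.items()})


inputs = FakeBatchEncoding(
    {"input_ids": FakeTensor(), "attention_mask": FakeTensor()}
)
moved = inputs.to("cuda:0")
print(sorted(v.device for v in moved.values()))  # → ['cuda:0', 'cuda:0']
```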
src/muse/translation/translate.py
Outdated
```python
model_inputs = tokenizer(f"<2{tgt_lang}> {text}", return_tensors="pt")
model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}
```
You shouldn't need to iterate the tensor directly
```diff
- model_inputs = tokenizer(f"<2{tgt_lang}> {text}", return_tensors="pt")
- model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}
+ model_inputs = tokenizer(f"<2{tgt_lang}> {text}", return_tensors="pt").to(model.device)
```
```diff
@@ -0,0 +1,44 @@
+#!/bin/bash
+#SBATCH --job-name=muse-translate   # Job name shown in squeue
+#SBATCH --account=YOUR_ACCOUNT      # Slurm account (faculty collaborator's account)
```
Make the fake var more descriptive
```diff
- #SBATCH --account=YOUR_ACCOUNT    # Slurm account (faculty collaborator's account)
+ #SBATCH --account=FACULTY_NETID   # Slurm account (faculty collaborator's account)
```
```bash
#SBATCH --nodes=1           # Single node job
#SBATCH --ntasks=1          # Single task
#SBATCH --cpus-per-task=1   # CPUs for data loading / tokenization
#SBATCH --mem-per-cpu=10G   # Memory per CPU (10G is sufficient for 1.8B–4B models)
```
This should not be included per Della's instructions
docs/della-instructions.md
Outdated
```bash
# Update HuggingFace cache (login node only)
export HF_HOME=/scratch/gpfs/CDHRSE/<netid>/huggingface-cache
python -c "from transformers import AutoTokenizer, AutoModelForCausalLM; \
  AutoTokenizer.from_pretrained('tencent/HY-MT1.5-1.8B'); \
  AutoModelForCausalLM.from_pretrained('tencent/HY-MT1.5-1.8B')"
```
Seems weird to have it here, since it is not a slurm command. Move this to set-up.
Also, worth mentioning that an easy alternative to this would be to just copy over the cache from your local dev environment
```bash
#SBATCH --mail-type=END,FAIL   # Email on job end or failure
#SBATCH --mail-user=YOUR_NETID@princeton.edu
```
If you intend to keep this, you should mention that this is happening in the documentation. This is not something I would personally want on by default.
docs/della-gpu.md
Outdated
```bash
cd /scratch/gpfs/CDHRSE/<netid>/muse
git clone <repo-url> muse && cd muse
mkdir -p logs
```
- Fix tensor device placement in `translate.py`: use `.to(model.device)` on tokenizer output directly instead of iterating the dict
- Rewrite `della-instructions.md`: add scratch storage info, conda env setup, HF cache setup, CPU vs GPU job differences, logging detail
- Update slurm scripts: use bash positional args, rename account placeholder, add header comments, remove `mem-per-cpu` from GPU script, remove mail-type default
- Merge `translate-della-cpu.slurm` and `translate-della-gpu.slurm` into a single `translate-della.slurm`; GPU-only lines are commented out with `##SBATCH`
- Update `della-instructions.md` to reference the combined script and document default configuration (time, memory, CPU vs GPU differences)
laurejt
left a comment
This is looking pretty good. This should only need one more round of changes:
Requested changes:
- Add the slurm directive `--partition=mig` to `translate-della.slurm`
- For `della-instructions.md`:
  - Update the "Clone the repo" section to better document the logging directory creation (and hard-coding within the slurm script)
  - Expand the documentation for the conda environment and our current workaround. See comments.
  - Update the "Set up the HuggingFace cache" section so that it does not recommend building the HuggingFace cache by loading the tokenizers and models on the head node.
  - Update the "Running on GPU" section so that it directs the reader to the Della GPU Jobs documentation rather than the not-quite-right slurm directive.
examples/slurm/translate-della.slurm
Outdated
```bash
# ---------------------------------------------------------------------------
cd ${REPO}

uv run src/muse/translation/translate_corpus.py $1 $2 $3
```
This works for now, but if we intend to use this. It's useful to add some argument validation.
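A minimal sketch of such validation, assuming the script takes a corpus path, source language, and target language (the argument names and checks are illustrative, not taken from the actual script):

```shell
# Hypothetical validation for the three positional args; run it before
# the uv command, e.g. `validate_args "$@" || exit 1`.
validate_args() {
    if [ "$#" -ne 3 ]; then
        echo "Usage: sbatch translate-della.slurm <corpus-path> <src-lang> <tgt-lang>" >&2
        return 1
    fi
    if [ ! -e "$1" ]; then
        echo "Error: corpus path '$1' does not exist" >&2
        return 1
    fi
    return 0
}
```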
Need to add the following GPU configuration setting:

```bash
#SBATCH --partition=mig
```
docs/della-instructions.md
Outdated
```markdown
## Prerequisites

- A Princeton HPC account with access to Della — request access through the [Research Computing portal](https://researchcomputing.princeton.edu/get-started/request-account)
- Membership in the `CDHRSE` group to access `/scratch/gpfs/CDHRSE/` — ask a current CDH RSE to add you
```
@rlskoeser Is this how the CDHRSE group works?
docs/della-instructions.md
Outdated
```markdown
## Setup

### Clone the repo
```
This is more accurately setting up the muse working directory. Rename accordingly.
examples/slurm/translate-della.slurm
Outdated
```bash
#SBATCH --output=logs/%x_%j.out
#SBATCH --error=logs/%x_%j.err
```
I think it's worth documenting that this choice is being hard-coded. I think it's fine to do that, but worth mentioning, since the logs could be saved anywhere.
```markdown
1. Uncomment `##SBATCH --gres=gpu:1` in the script
2. Remove or comment out `--mem-per-cpu` — GPU memory allocation is pre-defined by the partition and cannot be set manually

By default, `--gres=gpu:1` allocates a MIG slice of an A100 (the `mig` partition). If you need a full A100, also add `--partition=gpu`.
```
Our job should include the `--partition=mig` parameter and should state this.
Replace the second sentence with a pointer to the Della GPU Jobs section. Other slurm directives should be used if wanting to access a GPU in the "gpu" partition. I was mistaken about the directive always being `--partition`; in other cases it is `--constraint`.
It's worth reading through this documentation if you haven't.
- Add `--partition=mig` as commented-out GPU option in slurm script
- Add argument validation to slurm script
- Add comments explaining module load and conda env activation
- Add comment noting logs location is hard-coded
- Rename Setup section to "Set up the muse working directory"
- Separate git clone and mkdir into explicit steps in doc
- Expand conda env explanation: why conda instead of uv, workaround context
- Remove scp cache copy option (non-trivial, loads model/tokenizer)
- Flag CDHRSE group access detail for @rlskoeser to confirm
Associated Issue(s): resolves #
Changes in this PR
- Add `device_map="auto"` to all `from_pretrained` calls so HuggingFace models use GPU when available
- Add `examples/slurm/translate-della-gpu.slurm` for running translation jobs on Della
- Add `docs/della-gpu.md` covering Della setup, job submission, and useful commands

Notes
Before adding the GPU support, the job took ~6.5 minutes of CPU time. After adding GPU support, CPU time dropped to ~27 seconds. 🚀 Roughly a 14x speedup.
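The quoted speedup is consistent with the reported timings:

```python
cpu_seconds = 6.5 * 60  # job runtime before GPU support (~6.5 minutes)
gpu_seconds = 27        # job runtime after GPU support
print(round(cpu_seconds / gpu_seconds, 1))  # → 14.4, i.e. roughly 14x
```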
Reviewer Checklist