
Della gpu support#65

Merged
tanhaow merged 16 commits into develop from feature/della-gpu-support
Apr 8, 2026

Conversation


@tanhaow tanhaow commented Apr 6, 2026

Associated Issue(s): resolves #

Changes in this PR

  • Add device_map="auto" to all from_pretrained calls so HuggingFace models use GPU when available
  • Add example Slurm script examples/slurm/translate-della-gpu.slurm for running translation jobs on Della
  • Add docs/della-gpu.md covering Della setup, job submission, and useful commands
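As a rough sketch of what the first bullet relies on: `device_map="auto"` only helps if the Slurm job actually received a GPU. A stdlib-only sanity check (a hypothetical helper, not part of this PR) that can run before any model loads:

```python
import os
import shutil

def gpu_visible() -> bool:
    # Assumption: Slurm exports CUDA_VISIBLE_DEVICES for GPU allocations,
    # and Della GPU nodes have nvidia-smi on PATH; either signal suggests
    # device_map="auto" will place model weights on the GPU.
    if os.environ.get("CUDA_VISIBLE_DEVICES"):
        return True
    return shutil.which("nvidia-smi") is not None

print("GPU visible:", gpu_visible())
```

Printing this at the top of the job makes it easy to tell from the logs whether a run silently fell back to CPU.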

Notes

Before adding the GPU support, the job took ~6.5 minutes of CPU time. After adding GPU support, CPU time dropped to ~27 seconds. 🚀 Roughly a 14x speedup.

Reviewer Checklist

  • Review the code change
  • Review the example script
  • Review the markdown doc

@tanhaow tanhaow self-assigned this Apr 6, 2026
@tanhaow tanhaow requested a review from laurejt April 6, 2026 18:03

@laurejt laurejt left a comment

Please also document how to run on della with CPUs. I'll review once that information is added.

@tanhaow tanhaow requested a review from laurejt April 8, 2026 13:44

@laurejt laurejt left a comment

Thanks for documenting this.

Requested changes:

  • Simplify translate.py code (see comments)
  • Combine della-cpu.md and della-gpu.md into a single document. Be sure to document the differences between the two types of jobs
  • Combine the translate-della-cpu.slurm and translate-della-gpu.slurm scripts. For the non-overlapping parameters, simply comment them out (i.e., start with ##SBATCH)
  • Update the slurm script so it uses bash input args
  • Update the slurm script so it uses the recommended configurations for della
  • Include more documentation in the md file. It should describe in enough detail how the slurm script is configured by default. For example, mention the time, partition and/or memory allocation (gpu vs. cpu).

Comment on lines +172 to +174
model_inputs = tokenizer(text, return_tensors="pt")
# Move input tensors to the same device as the model
model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}

You shouldn't need to iterate the tensor directly

Suggested change
model_inputs = tokenizer(text, return_tensors="pt")
# Move input tensors to the same device as the model
model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

Comment on lines +235 to +236
model_inputs = tokenizer(f"<2{tgt_lang}> {text}", return_tensors="pt")
model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}

You shouldn't need to iterate the tensor directly

Suggested change
model_inputs = tokenizer(f"<2{tgt_lang}> {text}", return_tensors="pt")
model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}
model_inputs = tokenizer(f"<2{tgt_lang}> {text}", return_tensors="pt").to(model.device)

@@ -0,0 +1,44 @@
#!/bin/bash
#SBATCH --job-name=muse-translate # Job name shown in squeue
#SBATCH --account=YOUR_ACCOUNT # Slurm account (faculty collaborator's account)

Make the fake var more descriptive

Suggested change
#SBATCH --account=YOUR_ACCOUNT # Slurm account (faculty collaborator's account)
#SBATCH --account=FACULTY_NETID # Slurm account (faculty collaborator's account)

#SBATCH --nodes=1 # Single node job
#SBATCH --ntasks=1 # Single task
#SBATCH --cpus-per-task=1 # CPUs for data loading / tokenization
#SBATCH --mem-per-cpu=10G # Memory per CPU (10G is sufficient for 1.8B–4B models)

This should not be included per della's instructions

Comment on lines +45 to +50
# Update HuggingFace cache (login node only)
export HF_HOME=/scratch/gpfs/CDHRSE/<netid>/huggingface-cache
python -c "from transformers import AutoTokenizer, AutoModelForCausalLM; \
AutoTokenizer.from_pretrained('tencent/HY-MT1.5-1.8B'); \
AutoModelForCausalLM.from_pretrained('tencent/HY-MT1.5-1.8B')"
```

Seems weird to have it here, since it is not a slurm command. Move this to set-up.

Also, worth mentioning that an easy alternative to this would be to just copy over the cache from your local dev environment

Comment on lines +12 to +13
#SBATCH --mail-type=END,FAIL # Email on job end or failure
#SBATCH --mail-user=YOUR_NETID@princeton.edu

If you intend to keep this, you should mention that this is happening in the documentation. This is not something I would personally want on by default.

```bash
cd /scratch/gpfs/CDHRSE/<netid>/muse
git clone <repo-url> muse && cd muse
mkdir -p logs

Provide more detail about logging.

tanhaow added 4 commits April 8, 2026 11:28
- Fix tensor device placement in translate.py: use .to(model.device) on
  tokenizer output directly instead of iterating the dict
- Rewrite della-instructions.md: add scratch storage info, conda env
  setup, HF cache setup, CPU vs GPU job differences, logging detail
- Update slurm scripts: use bash positional args, rename account
  placeholder, add header comments, remove mem-per-cpu from GPU script,
  remove mail-type default
- Merge translate-della-cpu.slurm and translate-della-gpu.slurm into a
  single translate-della.slurm; GPU-only lines are commented out with ##SBATCH
- Update della-instructions.md to reference the combined script and
  document default configuration (time, memory, CPU vs GPU differences)
@tanhaow tanhaow requested a review from laurejt April 8, 2026 16:13

@laurejt laurejt left a comment

This is looking pretty good. This should only need one more round of changes:

Requested changes:

  • Add the slurm directive --partition=mig to translate-della.slurm
  • For della-instructions.md:
    • Update the "Clone the repo" section to better document the logging directory creation (and hard-coding within the slurm script)
    • Expand the documentation for the conda environment and our current workaround. See comments.
    • Update the "Set up the HuggingFace cache" section so that it does not recommend building the HuggingFace cache by loading the tokenizers and models on the head node.
    • Update the "Running on GPU" so that it directs the reader to the Della GPU Jobs documentation rather than the not-quite-right slurm directive.

# ---------------------------------------------------------------------------
cd ${REPO}

uv run src/muse/translation/translate_corpus.py $1 $2 $3

This works for now, but if we intend to keep using this, it's useful to add some argument validation.
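For example, a minimal validation block (function name and usage text illustrative, assuming the three positional args are input file, source language, and target language) might be:

```bash
# Illustrative argument validation for translate-della.slurm (not the
# script's actual implementation).
validate_args() {
  if [ "$#" -ne 3 ]; then
    echo "usage: sbatch translate-della.slurm <input-file> <src-lang> <tgt-lang>" >&2
    return 1
  fi
  if [ ! -e "$1" ]; then
    echo "error: input file '$1' not found" >&2
    return 1
  fi
}
```

Calling `validate_args "$@"` before the `uv run` line would make the job fail fast with a readable message instead of a Python traceback.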

Need to add the following GPU configuration setting:

#SBATCH --partition=mig

## Prerequisites

- A Princeton HPC account with access to Della — request access through the [Research Computing portal](https://researchcomputing.princeton.edu/get-started/request-account)
- Membership in the `CDHRSE` group to access `/scratch/gpfs/CDHRSE/` — ask a current CDH RSE to add you

@rlskoeser Is this how the CDHRSE group works?


## Setup

### Clone the repo

This is more accurately setting up the muse working directory. Rename accordingly.

Comment on lines +23 to +24
#SBATCH --output=logs/%x_%j.out
#SBATCH --error=logs/%x_%j.err

I think it's worth documenting that this choice is being hard-coded. I think it's fine to do that, but worth mentioning, since the logs could be saved anywhere.

1. Uncomment `##SBATCH --gres=gpu:1` in the script
2. Remove or comment out `--mem-per-cpu` — GPU memory allocation is pre-defined by the partition and cannot be set manually

By default, `--gres=gpu:1` allocates a MIG slice of an A100 (the `mig` partition). If you need a full A100, also add `--partition=gpu`.

Our job should include the --partition=mig parameter and should state this.

Replace the second sentence with a pointer to the Della GPU Jobs section. Other slurm directives should be used if wanting to access a GPU in the "gpu" partition. I was mistaken about the directive always being --partition; in other cases it is --constraint.

It's worth reading through this documentation if you haven't.
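Taken together, the reviewer's requested GPU directives might read roughly like this (a sketch, not the final script):

```bash
#SBATCH --partition=mig    # MIG slice of an A100, as requested above
#SBATCH --gres=gpu:1       # one GPU for the translation job
# For a full A100, see the Della GPU Jobs documentation; depending on the
# case, the relevant directive there is --constraint rather than --partition.
```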

tanhaow added 2 commits April 8, 2026 13:22
- Add --partition=mig as commented-out GPU option in slurm script
- Add argument validation to slurm script
- Add comments explaining module load and conda env activation
- Add comment noting logs location is hard-coded
- Rename Setup section to 'Set up the muse working directory'
- Separate git clone and mkdir into explicit steps in doc
- Expand conda env explanation: why conda instead of uv, workaround context
- Remove scp cache copy option (non-trivial, loads model/tokenizer)
- Flag CDHRSE group access detail for @rlskoeser to confirm
@tanhaow tanhaow requested a review from laurejt April 8, 2026 17:59

@laurejt laurejt left a comment

🚀

@tanhaow tanhaow merged commit 9154a0b into develop Apr 8, 2026
1 check passed
@tanhaow tanhaow deleted the feature/della-gpu-support branch April 8, 2026 19:06