Conversation
Also move model inputs to the correct device before generation in `nllb_translate` and `madlad_translate`.
laurejt
left a comment
Please also document how to run on della with CPUs. I'll review once that information is added.
laurejt
left a comment
Thanks for documenting this.
Requested changes:
- Simplify `translate.py` code (see comments)
- Combine `della-cpu.md` and `della-gpu.md` into a single document. Be sure to document the differences between the two types of jobs
- Combine `translate-della-cpu.slurm` and `translate-della-gpu.slurm`. For the non-overlapping parameters, simply comment them out (i.e., start with `##SBATCH`)
- Update the slurm script so it uses bash input args
- Update the slurm script so it uses the recommended configurations for della
- Include more documentation in the `md` file. It should describe in enough detail how the slurm script is configured by default. For example, mention the time, partition and/or memory allocation (gpu vs. cpu).
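A combined script along these lines might look like the following sketch. The account name, time limit, and resource numbers are placeholders for illustration, not verified Della recommendations:

```bash
#!/bin/bash
#SBATCH --job-name=muse-translate   # Job name shown in squeue
#SBATCH --account=FACULTY_NETID     # placeholder: faculty collaborator's account
#SBATCH --nodes=1                   # Single node job
#SBATCH --ntasks=1                  # Single task
#SBATCH --cpus-per-task=1           # CPUs for data loading / tokenization
#SBATCH --time=01:00:00             # placeholder walltime
##SBATCH --gres=gpu:1               # uncomment for a GPU job
##SBATCH --partition=mig            # uncomment for a GPU job

# Bash positional args: corpus path, source language, target language
uv run src/muse/translation/translate_corpus.py "$1" "$2" "$3"
```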
src/muse/translation/translate.py
Outdated
```python
model_inputs = tokenizer(text, return_tensors="pt")
# Move input tensors to the same device as the model
model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}
```
You shouldn't need to iterate the tensor directly
```diff
- model_inputs = tokenizer(text, return_tensors="pt")
- # Move input tensors to the same device as the model
- model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}
+ model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
```
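The suggested one-liner works because the tokenizer returns a `BatchEncoding`, which implements `.to(device)` itself, so the dict comprehension is redundant. For readers without `transformers` installed, the behavior can be sketched with stand-in classes (all names below are illustrative, not the real library internals):

```python
class FakeTensor:
    """Minimal stand-in for torch.Tensor that records its device."""
    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


class FakeBatchEncoding(dict):
    """Minimal stand-in for transformers.BatchEncoding: a dict whose
    .to(device) moves every contained tensor in one call."""
    def to(self, device):
        return FakeBatchEncoding({k: v.to(device) for k, v in self.items()})


inputs = FakeBatchEncoding(
    {"input_ids": FakeTensor(), "attention_mask": FakeTensor()}
)
moved = inputs.to("cuda:0")
print(sorted(v.device for v in moved.values()))  # → ['cuda:0', 'cuda:0']
```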
src/muse/translation/translate.py
Outdated
```python
model_inputs = tokenizer(f"<2{tgt_lang}> {text}", return_tensors="pt")
model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}
```
You shouldn't need to iterate the tensor directly
```diff
- model_inputs = tokenizer(f"<2{tgt_lang}> {text}", return_tensors="pt")
- model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}
+ model_inputs = tokenizer(f"<2{tgt_lang}> {text}", return_tensors="pt").to(model.device)
```
```diff
@@ -0,0 +1,44 @@
+#!/bin/bash
+#SBATCH --job-name=muse-translate   # Job name shown in squeue
+#SBATCH --account=YOUR_ACCOUNT      # Slurm account (faculty collaborator's account)
```
Make the fake var more descriptive
```diff
- #SBATCH --account=YOUR_ACCOUNT    # Slurm account (faculty collaborator's account)
+ #SBATCH --account=FACULTY_NETID   # Slurm account (faculty collaborator's account)
```
```bash
#SBATCH --nodes=1           # Single node job
#SBATCH --ntasks=1          # Single task
#SBATCH --cpus-per-task=1   # CPUs for data loading / tokenization
#SBATCH --mem-per-cpu=10G   # Memory per CPU (10G is sufficient for 1.8B–4B models)
```
This should not be included per Della's instructions
docs/della-instructions.md
Outdated
```bash
# Update HuggingFace cache (login node only)
export HF_HOME=/scratch/gpfs/CDHRSE/<netid>/huggingface-cache
python -c "from transformers import AutoTokenizer, AutoModelForCausalLM; \
  AutoTokenizer.from_pretrained('tencent/HY-MT1.5-1.8B'); \
  AutoModelForCausalLM.from_pretrained('tencent/HY-MT1.5-1.8B')"
```
Seems weird to have it here, since it is not a slurm command. Move this to set-up.
Also, worth mentioning that an easy alternative to this would be to just copy over the cache from your local dev environment
```bash
#SBATCH --mail-type=END,FAIL   # Email on job end or failure
#SBATCH --mail-user=YOUR_NETID@princeton.edu
```
If you intend to keep this, you should mention that this is happening in the documentation. This is not something I would personally want on by default.
docs/della-gpu.md
Outdated
```bash
cd /scratch/gpfs/CDHRSE/<netid>/muse
git clone <repo-url> muse && cd muse
mkdir -p logs
```
- Fix tensor device placement in `translate.py`: use `.to(model.device)` on tokenizer output directly instead of iterating the dict
- Rewrite `della-instructions.md`: add scratch storage info, conda env setup, HF cache setup, CPU vs GPU job differences, logging detail
- Update slurm scripts: use bash positional args, rename account placeholder, add header comments, remove `mem-per-cpu` from GPU script, remove mail-type default
- Merge `translate-della-cpu.slurm` and `translate-della-gpu.slurm` into a single `translate-della.slurm`; GPU-only lines are commented out with `##SBATCH`
- Update `della-instructions.md` to reference the combined script and document default configuration (time, memory, CPU vs GPU differences)
laurejt
left a comment
This is looking pretty good. This should only need one more round of changes:
Requested changes:
- Add the slurm directive `--partition=mig` to `translate-della.slurm`
- For `della-instructions.md`:
  - Update the "Clone the repo" section to better document the logging directory creation (and hard-coding within the slurm script)
  - Expand the documentation for the conda environment and our current workaround. See comments.
  - Update the "Set up the HuggingFace cache" section so that it does not recommend building the HuggingFace cache by loading the tokenizers and models on the head node.
  - Update the "Running on GPU" section so that it directs the reader to the Della GPU Jobs documentation rather than the not-quite-right slurm directive.
examples/slurm/translate-della.slurm
Outdated
```bash
# ---------------------------------------------------------------------------
cd ${REPO}

uv run src/muse/translation/translate_corpus.py $1 $2 $3
```
This works for now, but if we intend to use this. It's useful to add some argument validation.
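A minimal sketch of such validation, assuming the script takes a corpus path, source language, and target language (the argument names and checks are illustrative, not taken from the actual script):

```shell
# Hypothetical validation for the three positional args; run it before
# the uv command, e.g. `validate_args "$@" || exit 1`.
validate_args() {
    if [ "$#" -ne 3 ]; then
        echo "Usage: sbatch translate-della.slurm <corpus-path> <src-lang> <tgt-lang>" >&2
        return 1
    fi
    if [ ! -e "$1" ]; then
        echo "Error: corpus path '$1' does not exist" >&2
        return 1
    fi
    return 0
}
```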
Need to add the following GPU configuration setting:

```bash
#SBATCH --partition=mig
```
docs/della-instructions.md
Outdated
```markdown
## Prerequisites

- A Princeton HPC account with access to Della — request access through the [Research Computing portal](https://researchcomputing.princeton.edu/get-started/request-account)
- Membership in the `CDHRSE` group to access `/scratch/gpfs/CDHRSE/` — ask a current CDH RSE to add you
```
@rlskoeser Is this how the CDHRSE group works?
docs/della-instructions.md
Outdated
```markdown
## Setup

### Clone the repo
```
This is more accurately setting up the muse working directory. Rename accordingly.
examples/slurm/translate-della.slurm
Outdated
```bash
#SBATCH --output=logs/%x_%j.out
#SBATCH --error=logs/%x_%j.err
```
I think it's worth documenting that this choice is being hard-coded. I think it's fine to do that, but worth mentioning, since the logs could be saved anywhere.
```markdown
1. Uncomment `##SBATCH --gres=gpu:1` in the script
2. Remove or comment out `--mem-per-cpu` — GPU memory allocation is pre-defined by the partition and cannot be set manually

By default, `--gres=gpu:1` allocates a MIG slice of an A100 (the `mig` partition). If you need a full A100, also add `--partition=gpu`.
```
Our job should include the `--partition=mig` parameter and should state this.
Replace the second sentence with a pointer to the Della GPU Jobs section. Other slurm directives should be used if wanting to access a GPU in the "gpu" partition. I was mistaken about the directive always being `--partition`; in other cases it is `--constraint`.
It's worth reading through this documentation if you haven't.
- Add `--partition=mig` as commented-out GPU option in slurm script
- Add argument validation to slurm script
- Add comments explaining module load and conda env activation
- Add comment noting logs location is hard-coded
- Rename Setup section to "Set up the muse working directory"
- Separate git clone and mkdir into explicit steps in doc
- Expand conda env explanation: why conda instead of uv, workaround context
- Remove scp cache copy option (non-trivial, loads model/tokenizer)
- Flag CDHRSE group access detail for @rlskoeser to confirm
Associated Issue(s): resolves #
Changes in this PR
- Add `device_map="auto"` to all `from_pretrained` calls so HuggingFace models use GPU when available
- Add `examples/slurm/translate-della-gpu.slurm` for running translation jobs on Della
- Add `docs/della-gpu.md` covering Della setup, job submission, and useful commands

Notes
Before adding the GPU support, the job took ~6.5 minutes of CPU time. After adding GPU support, CPU time dropped to ~27 seconds. 🚀 Roughly a 14x speedup.
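The quoted speedup is consistent with the reported timings:

```python
cpu_seconds = 6.5 * 60  # job runtime before GPU support (~6.5 minutes)
gpu_seconds = 27        # job runtime after GPU support
print(round(cpu_seconds / gpu_seconds, 1))  # → 14.4, i.e. roughly 14x
```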
Reviewer Checklist