build: fix the blackwell dockerfile #84
Conversation
```
PyTorch: 2.7.0a0+ecf3bae40a.nv25.02
CUDA: 13.1
```
This is important: with CUDA 12.9+ we get sm121 support out of the box
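As a quick sanity check (these exact commands are an assumption, not part of this PR), the architectures the toolchain supports can be inspected inside the image:

```bash
# Hypothetical sanity check: list the GPU codes this CUDA toolkit can target.
# On CUDA 12.9+ the output should include sm_121.
nvcc --list-gpu-code
# And the arch list the bundled torch build was compiled for:
python3 -c "import torch; print(torch.cuda.get_arch_list())"
```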
```dockerfile
RUN git clone https://github.com/aqlaboratory/openfold-3.git && \
    cd openfold-3 && \
    cp -p environments/production-linux-64.yml environments/production.yml.backup && \
    grep -v "pytorch::pytorch" environments/production.yml > environments/production.yml.tmp && \
    mv environments/production.yml.tmp environments/production.yml
```
This was completely unused: everything is installed via the system python+pip
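Since everything goes through the system python+pip, a direct install does the same job without any environment-file juggling; a minimal sketch (the pip flags and the repo being pip-installable are assumptions):

```dockerfile
# Sketch: install openfold-3 straight into the system python
RUN git clone https://github.com/aqlaboratory/openfold-3.git && \
    pip install --no-cache-dir ./openfold-3
```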
```dockerfile
# Set environment variables including CUDA architecture for Blackwell
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    KMP_AFFINITY=none \
    CUTLASS_PATH=/opt/cutlass \
    TORCH_CUDA_ARCH_LIST="12.1"
```
I think we can still remove some of these – all of those could be provided at runtime, and are quite specific to the use case here
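For reference, anything use-case specific could be supplied at `docker run` time instead of being baked into the image; a sketch (values copied from the ENV block above, image name is a placeholder):

```bash
# Sketch: provide the Blackwell-specific settings at runtime instead of build time
docker run --gpus all \
  -e TORCH_CUDA_ARCH_LIST="12.1" \
  -e KMP_AFFINITY=none \
  -e CUTLASS_PATH=/opt/cutlass \
  openfold3-blackwell
```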
| "nvidia-cutlass<4" \ | ||
| "cuda-python<12.9.1" |
- We get cuda-python with the image, no need to duplicate that
- We also only need the CUTLASS headers, no need to install the package
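Since only the headers are needed, a shallow clone plus an env var is enough; a sketch based on the clone command referenced later in this PR (same tag as below):

```dockerfile
# Sketch: CUTLASS is used header-only here, so a shallow clone suffices
RUN git clone --depth 1 --branch v3.6.0 https://github.com/NVIDIA/cutlass.git /opt/cutlass
ENV CUTLASS_PATH=/opt/cutlass
```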
```dockerfile
# Pre-compile DeepSpeed operations for Blackwell GPUs to avoid runtime compilation
# Create necessary cache directories
RUN python3 -c "import os; os.makedirs('/root/.triton/autotune', exist_ok=True)"
```
This is empirically needed in my tests, which is a bit odd
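If it has to stay, a plain shell `mkdir` is a behaviorally equivalent and slightly more obvious way to pre-create the cache directory:

```dockerfile
# Equivalent to the python one-liner: pre-create Triton's autotune cache dir
RUN mkdir -p /root/.triton/autotune
```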
```dockerfile
RUN mkdir -p /usr/local/lib/python3.12/site-packages && \
    echo 'import os' > /usr/local/lib/python3.12/site-packages/sitecustomize.py && \
    echo 'os.environ.setdefault("TORCH_CUDA_ARCH_LIST", "12.0")' >> /usr/local/lib/python3.12/site-packages/sitecustomize.py && \
    echo 'os.environ.setdefault("CUTLASS_PATH", "/opt/cutlass")' >> /usr/local/lib/python3.12/site-packages/sitecustomize.py && \
    echo 'os.environ.setdefault("KMP_AFFINITY", "none")' >> /usr/local/lib/python3.12/site-packages/sitecustomize.py
```
All of this can be removed
Lots and lots of ENV magic and overrides... not great
```yaml
      - --extra-index-url https://download.pytorch.org/whl/cu130
      - torch>=2.9.0
```
This is important to get a sufficiently high version of torch. A couple of things got removed or moved:
- the biotite conda package only exists for linux-64, but the pip package works fine
- mkl was removed
- pytorch-cuda was removed; again, it only exists for linux-64
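For context, this is roughly how the pip block sits inside the conda environment file (the surrounding keys are an assumption; only the two entries from the diff above are taken from this PR):

```yaml
# Sketch of the relevant environment.yml fragment
dependencies:
  - pip
  - pip:
      - --extra-index-url https://download.pytorch.org/whl/cu130
      - torch>=2.9.0
```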
```yaml
    CUDA_HOME: /usr/local/cuda
    PATH: /usr/local/cuda/bin:${PATH}
    LD_LIBRARY_PATH: /usr/local/cuda/lib64:${LD_LIBRARY_PATH}
    # Triton bundles its own ptxas which does not support sm_121
    # This forces Triton to use the system ptxas compiler, which is aware of sm_121
    TRITON_PTXAS_PATH: /usr/local/cuda/bin/ptxas
    # Requires: git clone https://github.com/NVIDIA/cutlass --branch v3.6.0 --depth 1 ~/workspace/cutlass
    CUTLASS_PATH: /home/jandom/workspace/cutlass
    # Note: OMP_NUM_THREADS=1 is required to avoid threading conflicts
    OMP_NUM_THREADS: "1"
```
This is the really ugly part, especially the hard-coded paths specific to my box or $HOME. All of this gets taken care of when using the Docker image from NVIDIA with torch pre-installed.
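Starting from the NGC PyTorch image makes all of these overrides unnecessary, since CUDA, torch, and a Blackwell-aware ptxas ship together; a sketch (the tag is inferred from the `nv25.02` version string above and is an assumption):

```dockerfile
# Sketch: NGC base image with torch pre-installed; CUDA_HOME, PATH,
# LD_LIBRARY_PATH and an sm_121-capable ptxas come with the image
FROM nvcr.io/nvidia/pytorch:25.02-py3
```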
Summary
Goal: ideally we can have a single process (a single base image?) for building the Docker image that also works on Blackwell.
Context: the current state is a Dockerfile that uses conda (one base image) and a Blackwell image that installs everything through the system Python and comes with PyTorch included (another base image).
We've had a number of contributions around this.
Mike Henry has donated his DGX for testing, and I was able to complete the Blackwell build and get some simple inference runs going.
Bad news: because our environment.yml uses pytorch-cuda, which is not actually available for aarch64/arm, we basically install all the deps manually, both the apt-get packages and the pip packages. I also used a completely different base image (in line with both contributions, but different from our standard base).
Good news: I was able to simplify the Dockerfile significantly because all the tools have been upgraded to handle sm_121. The performance looks comparable to what was reported for ubiquitin.
Changes
Upgraded the base image, removed duplicate package installs that were not needed, and improved layering.
Related Issues
Testing
Other Notes