Conversation

@jandom jandom commented Jan 7, 2026

Summary

Goal: ideally we can have a single process (single base image?) for building the docker image that also works on Blackwell.

Context: the current state is a Dockerfile that uses conda (one base image), and a Blackwell image that installs everything through the system Python and comes with PyTorch included (another base image).

We've had a number of contributions around this.

Mike Henry has donated his DGX for testing, and I was able to complete the Blackwell build and get some simple inferences running.

Bad news: because our environment.yml uses pytorch-cuda, which is not actually available for aarch64/arm, we basically install all the deps manually, both the apt-get packages and the pip packages. I also used a completely different base image (in line with both contributions, but different from our standard base).

Good news: I was able to simplify the Dockerfile significantly because all the tools have been upgraded to handle sm121. The performance looks comparable to what was previously reported.

For ubiquitin

  • cold-start 0:02:26
  • warm-start 0:00:05

Changes

Upgraded the base image, removed some duplicate package installs that were not needed, and improved the layering.

Related Issues

  • Training on Blackwell (out of scope)
  • Pre-compile the triton extension via docker commit (out of scope)
  • Run multiple benchmark runs to get a full picture
  • Visually confirm that the predictions look sane
  • Unify the Blackwell and 'main' docker image

Testing

Other Notes

@jandom jandom requested a review from jnwei January 7, 2026 17:43
@jandom jandom self-assigned this Jan 7, 2026
```
PyTorch: 2.7.0a0+ecf3bae40a.nv25.02
CUDA: 13.1
```

This is important: with CUDA 12.9+ we get sm121 support out of the box
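A minimal runtime check, as a sketch (it assumes the image's PyTorch was built with CUDA enabled), that sm_121 actually made it into the compiled arch list:

```python
# Sketch: confirm the PyTorch build in the image actually targets Blackwell
# (sm_121). torch.cuda.get_arch_list() reports the architectures it was
# compiled for; it returns an empty list on a CPU-only build.
import torch

print("torch", torch.__version__, "| CUDA", torch.version.cuda)
arch_list = torch.cuda.get_arch_list()
assert any("121" in arch for arch in arch_list), f"sm_121 missing from {arch_list}"
```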

Comment on lines -18 to -22

```
RUN git clone https://github.com/aqlaboratory/openfold-3.git && \
    cd openfold-3 && \
    cp -p environments/production-linux-64.yml environments/production.yml.backup && \
    grep -v "pytorch::pytorch" environments/production.yml > environments/production.yml.tmp && \
    mv environments/production.yml.tmp environments/production.yml
```

This was completely unused: everything is installed via the system python+pip

Comment on lines +25 to +30

```
# Set environment variables including CUDA architecture for Blackwell
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    KMP_AFFINITY=none \
    CUTLASS_PATH=/opt/cutlass \
    TORCH_CUDA_ARCH_LIST="12.1"
```

I think we can still remove some of these – all of them could be provided at runtime, and they are quite specific to the use case here (see the sketch below).
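As a sketch of why baking these in isn't strictly needed: TORCH_CUDA_ARCH_LIST is only consulted when CUDA extensions get (re)built, and to the best of my understanding recent PyTorch falls back to the architectures of the GPUs it detects when the variable is unset, so a per-run override would do:

```python
# Sketch: these settings only matter at extension-build time, so they can be
# supplied per-run (e.g. `docker run -e TORCH_CUDA_ARCH_LIST=12.1 ...`) rather
# than baked into the image.
import os
import torch

print("TORCH_CUDA_ARCH_LIST:", os.environ.get("TORCH_CUDA_ARCH_LIST", "<unset>"))
if torch.cuda.is_available():
    # On the DGX used for this PR this should report (12, 1), i.e. sm_121.
    print("detected capability:", torch.cuda.get_device_capability(0))
```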

Comment on lines -50 to -51

```
    "nvidia-cutlass<4" \
    "cuda-python<12.9.1"
```

  • We get cuda-python with the image, no need to duplicate that
  • We also only need the cutlass headers, no need to install the package (quick sanity check below)
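A quick sanity check along those lines, as a sketch (the bare `cuda` import and the /opt/cutlass fallback are assumptions based on the snippets above):

```python
# Sketch: the base image already ships cuda-python, and CUTLASS is only needed
# as a headers checkout pointed to by CUTLASS_PATH, not as a pip package.
import os
import cuda  # assumption: provided by the NVIDIA base image

cutlass_include = os.path.join(os.environ.get("CUTLASS_PATH", "/opt/cutlass"), "include")
assert os.path.isdir(cutlass_include), f"CUTLASS headers not found at {cutlass_include}"
print("cuda-python:", getattr(cuda, "__version__", "unknown"))
```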


```
# Pre-compile DeepSpeed operations for Blackwell GPUs to avoid runtime compilation
# Create necessary cache directories
RUN python3 -c "import os; os.makedirs('/root/.triton/autotune', exist_ok=True)"
```

This is empirically needed in my tests, which is a bit odd
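For reference, the same workaround written so it also works for non-root users, as a sketch (it assumes the autotune cache lives under the invoking user's home directory, as the hard-coded /root path suggests):

```python
# Sketch of the same cache-directory workaround, parameterised on $HOME instead
# of hard-coding /root, so it also behaves when the container runs as non-root.
import os

autotune_dir = os.path.join(os.path.expanduser("~"), ".triton", "autotune")
os.makedirs(autotune_dir, exist_ok=True)
print("created", autotune_dir)
```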

Comment on lines -74 to -78

```
RUN mkdir -p /usr/local/lib/python3.12/site-packages && \
    echo 'import os' > /usr/local/lib/python3.12/site-packages/sitecustomize.py && \
    echo 'os.environ.setdefault("TORCH_CUDA_ARCH_LIST", "12.0")' >> /usr/local/lib/python3.12/site-packages/sitecustomize.py && \
    echo 'os.environ.setdefault("CUTLASS_PATH", "/opt/cutlass")' >> /usr/local/lib/python3.12/site-packages/sitecustomize.py && \
    echo 'os.environ.setdefault("KMP_AFFINITY", "none")' >> /usr/local/lib/python3.12/site-packages/sitecustomize.py
```

All of this can be removed

Lots and lots of ENV magic and overrides... not great
Comment on lines +50 to +51

```
  - --extra-index-url https://download.pytorch.org/whl/cu130
  - torch>=2.9.0
```

This is important to get a sufficiently high version of torch (quick check below). A couple of things got removed or moved:

  • the biotite conda package only exists for linux-64, but the pip package does the job better
  • mkl removed
  • pytorch-cuda removed, again linux-64 only
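A quick check, as a sketch, that the resolved wheel really is a new-enough cu130 build (the version floors mirror the pins above):

```python
# Sketch: verify pip actually resolved torch >= 2.9.0 from the cu130 extra index.
import torch

major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert (major, minor) >= (2, 9), torch.__version__
assert torch.version.cuda and torch.version.cuda.startswith("13"), torch.version.cuda
print("ok:", torch.__version__, "| CUDA", torch.version.cuda)
```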

Comment on lines +4 to +13

```
CUDA_HOME: /usr/local/cuda
PATH: /usr/local/cuda/bin:${PATH}
LD_LIBRARY_PATH: /usr/local/cuda/lib64:${LD_LIBRARY_PATH}
# Triton bundles its own ptxas, which does not support sm_121
# This forces Triton to use the system ptxas compiler, which is aware of sm_121
TRITON_PTXAS_PATH: /usr/local/cuda/bin/ptxas
# Requires: git clone https://github.com/NVIDIA/cutlass --branch v3.6.0 --depth 1 ~/workspace/cutlass
CUTLASS_PATH: /home/jandom/workspace/cutlass
# Note: OMP_NUM_THREADS=1 is required to avoid threading conflicts
OMP_NUM_THREADS: "1"
```

This is the really ugly part, especially the hard-coded paths specific to my box or $HOME – all of this gets taken care of when using the docker image from NVIDIA with torch pre-installed.
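Something along these lines could resolve the same settings at startup without machine-specific paths, as a sketch (the variable names come from the compose snippet above; the fallback locations are assumptions):

```python
# Sketch: derive the Triton/CUTLASS settings from CUDA_HOME and $HOME at startup
# instead of hard-coding per-machine paths. Fallback locations are assumptions.
import os

cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
os.environ.setdefault("TRITON_PTXAS_PATH", os.path.join(cuda_home, "bin", "ptxas"))
os.environ.setdefault("CUTLASS_PATH", os.path.expanduser("~/workspace/cutlass"))
os.environ.setdefault("OMP_NUM_THREADS", "1")
print({k: os.environ[k] for k in ("TRITON_PTXAS_PATH", "CUTLASS_PATH", "OMP_NUM_THREADS")})
```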
