Modernize conda environment #34
Conversation
Thank you for the draft!

DeepSpeed accepted our first upstream fix regarding the ninja detection (deepspeedai/DeepSpeed#7687). Once a new version is released, this should allow us to get rid of the PyPI ninja dependency. Of course, this fix will only come into play if we decide against the vendoring approach.
As of 2024/12/01, packages still installed from PyPI after installing openfold3 in devel/editable mode (to investigate):

- Because of aria2: proposed solution is to remove aria2 from the PyPI dependencies, as it is currently unused in the OF3 codebase. The PyPI package is an old convenience that should in general not be used to install aria2, as it is not even that much of a convenience.
- Because of cuequivariance_ops_torch_cu12: these should at the very least be aligned with the CF version, but it is likely best to just install everything from PyPI until we understand how to deal with the license. The key question is what to do with libcublas; maybe we should add synonyms to parselmouth in pixi, although I am not 100% sure these two packages are fully binary compatible. Currently the biggest blocker to having a conda package for these is their LICENSE. See also NVIDIA/cuEquivariance#218. It could also be interesting to see whether openequivariance is a viable alternative.
- Because of mkl: proposed solution is to remove mkl from the PyPI dependencies, as it is actually unused (pytorch links it statically; numpy and scipy are not built against it and do not dynamically dispatch).
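To audit which installed packages actually came from pip rather than conda, one can inspect each distribution's `INSTALLER` metadata file. A minimal sketch (the helper name is mine, not part of this PR; pip writes `INSTALLER`, while conda-installed packages usually lack it or record a different tool):

```python
# Hypothetical helper (not from this PR): map each installed distribution
# to the installer recorded in its metadata, to spot packages that still
# come from PyPI/pip in a supposedly conda-first environment.
from importlib.metadata import distributions

def installers_by_package() -> dict[str, str]:
    """Return {distribution name: installer}, "unknown" when no INSTALLER file exists."""
    result = {}
    for dist in distributions():
        installer = dist.read_text("INSTALLER")  # e.g. "pip\n", or None if absent
        name = dist.metadata.get("Name", "unknown")
        result[name] = (installer or "unknown").strip()
    return result

if __name__ == "__main__":
    for name, installer in sorted(installers_by_package().items()):
        print(f"{name}: {installer}")
```

Running this inside the devel/editable environment would list exactly the PyPI stragglers discussed above.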
Also:
|
Coming back to this after the end-of-year "hiatus". Current state and TODOs:

- All tests pass and predictions seem to be correct; this corresponds to a modernized conda environment following best practices.
- Incomplete: we might not need the native sources from upstream commit df59f203f40c8a292dd019ae68c9e6c88f107026.
- Use vendored deepspeed in the attention primitives.
- …ks from pixi environment
Slowly cooking; hopefully this will taste good. While we still need to tie up multiple loose ends, everything is usable. So feel free to try it, @jnwei @jandom @Emrys-Merlin @dwhswenson. Feedback, suggestions, and bug reports are very welcome! Happy to meet again sometime soon if useful.

Try

Install pixi (you need >=0.63.1), then:

```shell
# Clone or switch to the branch
git clone -b modernize-conda-environment https://github.com/sdvillal/openfold-3.git
cd openfold-3

# We could instruct pixi not to update the environment
# export PIXI_FROZEN=1

# See what is in store
pixi info

# List dependencies in all environments
pixi list

# Run tests in the CPU environment (which is also the default and works everywhere)...
pixi run -e openfold3-cpu test
# ...this will save the test log in test-results

# Run openfold3 using CUDA12
pixi run -e openfold3-cuda12 run_openfold --help

# Enter a shell with an activated environment
pixi shell -e openfold3-cuda13-pypi
run_openfold --help
# ...then exit
exit

# Export to good old conda environment yamls (I have to test these)
pixi run export-conda openfold3-cuda12 linux-aarch64
# ...or export all in bulk
pixi run export-conda-all
# ...files will be saved to the environments directory

# Same, but generating conda-lock environments
pixi run export-conda-lock openfold3-cuda12
# ...or export all in bulk
pixi run export-conda-lock-all
# ...files will be saved to the environments directory
```

Status

Some more loose ends
|
@sdvillal legend – this is huge, an insane amount of work! Thank you – instant tier-1 contributor status 🏆 Will take a look at this next week, really looking forward to it!
|
Ah, any chance we can remove the vendored code? I think you mentioned it's not needed any more.
Not yet. It depends on how long we want to wait: we need the last changes in DeepSpeed to make it into a release, and then run a quick round of testing. I have just checked, and it seems DeepSpeed has a reasonably fast release cadence (every 2-4 weeks). They replied quickly to our two pull requests, and the CF people were also fast at merging our package update request. So I feel it should not take long.

I too see potential problems with vendoring: a large diff for an only-temporary solution and a risk of bugs, although we have not experienced any so far. I do not think there is any license concern: DeepSpeed is Apache-2.0 and every file contains the LICENSE stanza. But if we decide to wait, then I will remove all traces of the DeepSpeed code from the branch history before merging.

I suggest letting you all try it for a bit, seeing how many problems and suggestions arise, and deciding then if and when we merge. An idea would also be to point people struggling with environment setup, or with access to different untested hardware, to this PR, to get some more eyes on it. Happy with anything, and I am anyway now out of time to work much on this until around the first week of February.
|
Ah, makes sense – I mostly meant splitting up the auto-generated code into a separate PR. I'm particularly interested in the Blackwell build; I struggled massively with this – ultimately I was able to build, but to support Blackwell fully I needed to go with the latest-and-greatest CUDA, and that wasn't fully supported downstream.
`EvoformerAttnBuilder` has some problems that preclude compiling the extension in several scenarios (e.g., an isolated conda environment with a CUDA toolchain, see aqlaboratory/openfold-3#34, or lack of hardware in the system) and it breaks some standard DeepSpeed configuration of target capabilities.

Changes:
- Fix evoformer CUTLASS detection:
  - Allow skipping it, useful when CUTLASS is already correctly set up (e.g., in a conda environment with CUTLASS and the CUDA toolchain)
  - Fix misleading use of the deprecated nvidia-cutlass PyPI package by actually using the provided bindings, while discouraging this route, as these bindings are not maintained anymore (NVIDIA/cutlass#2119)
- Fix evoformer compilation when no GPU is present:
  - This is taken care of correctly and more generally by builder.compute_capability_args
  - Allows cross-compilation on systems without a GPU
  - Allows compilation against all available virtual architectures and binary outputs (see e.g. #5308)
- Make all these changes configurable and explicit through documented environment variables

Tested in all scenarios.

Signed-off-by: Santi Villalba <sdvillal@gmail.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
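The target-capability handling mentioned above (builder.compute_capability_args) boils down to turning a list of compute capabilities into nvcc `-gencode` flags. A minimal sketch of that idea, assuming a `TORCH_CUDA_ARCH_LIST`-style input such as `"8.0;9.0+PTX"` (the function name and exact flag format are my illustration, not DeepSpeed's actual implementation):

```python
# Illustrative sketch, not DeepSpeed's code: convert a TORCH_CUDA_ARCH_LIST-style
# string like "8.0;9.0+PTX" into nvcc -gencode flags. A "+PTX" suffix additionally
# embeds virtual (PTX) code so the binary can be JIT-compiled for newer,
# unlisted architectures.
def compute_capability_flags(arch_list: str) -> list[str]:
    flags = []
    for spec in arch_list.split(";"):
        spec = spec.strip()
        if not spec:
            continue
        ptx = spec.endswith("+PTX")
        cap = spec.removesuffix("+PTX").replace(".", "")  # "8.0" -> "80"
        # Binary (SASS) output for this architecture
        flags.append(f"-gencode=arch=compute_{cap},code=sm_{cap}")
        if ptx:
            # Also embed virtual PTX for forward compatibility
            flags.append(f"-gencode=arch=compute_{cap},code=compute_{cap}")
    return flags

print(compute_capability_flags("8.0;9.0+PTX"))
# → ['-gencode=arch=compute_80,code=sm_80',
#    '-gencode=arch=compute_90,code=sm_90',
#    '-gencode=arch=compute_90,code=compute_90']
```

Cross-compilation on a GPU-less machine then just means choosing this list explicitly instead of querying the (absent) hardware.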
Force-pushed eb192fd to 23a7f0c.
|
Note: I force-pushed & merged main.
There is no autogenerated code, but rather some simple copying from deepspeed plus patching to change the namespace and simplify. I believe it is all well isolated in a few commits, so it would not be hard to take apart. From my tests, things work on Blackwell, but they might not be taking full advantage of the new architecture yet. Just give it a try following the instructions above and let me know what fails :-)
|
Thanks Santi :-) Regarding:

I think it should be perfectly fine to remove all of them. Some context on why I created the Dockerfiles originally: I had trouble isolating the separate environments on the same machine. In particular, even the "pure python" environment needed at least one conda-installed package (kalign2). The cleanest idea I could come up with was to isolate everything into docker containers. The added benefit was that for the pure python environment I could avoid installing the cudatoolkit etc. via conda, because I outsourced that to the base image. I think it's important to stress that I only created the images to test the environments. These are definitely not production-ready images that should be used for inference runs or the like. If there is any chance of misuse, we should definitely remove the files.

As an intermediate step, I moved them to their own directory in 9d0e7f1 (which will break building them, but that can easily be fixed by manipulating paths here and there). A key question is whether these conda environments could help build the docker images, to ease our maintenance burden, or whether it is better to depend on NVIDIA's official images, which bring the latest and greatest dependencies that are supposed to work well together.
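On the question of whether the pixi workspace could ease Docker maintenance: one option is to build images directly from the workspace instead of maintaining separate environment files. A minimal sketch, assuming the prefix-dev pixi base image and the environment names used in this PR (this is my illustration, not something the PR currently ships):

```dockerfile
# Hypothetical sketch, not part of this PR: build an image straight from the
# pixi workspace, so the Dockerfile does not duplicate dependency lists.
FROM ghcr.io/prefix-dev/pixi:latest

WORKDIR /app
COPY . .

# Install only the chosen environment, honoring the lockfile
RUN pixi install --locked -e openfold3-cpu

# Run commands inside the activated environment
CMD ["pixi", "run", "-e", "openfold3-cpu", "run_openfold", "--help"]
```

Whether this beats depending on NVIDIA's official CUDA base images for the GPU variants is exactly the open question above.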
Something similar should actually be supported in pyproject.toml.
|
After a new round of yak shaving, things on secondary features are cleaner now:
|
Hey there @sdvillal – this is fantastic, should really be in all caps. Some ideas and comments.

Here is some further good and bad news from when I checked out your code. First, some background on my box (CUDA12 / CUDA13).
|
Hey @jandom
At the moment, indeed, we are targeting only inference in this PR. When you say there is no way to test these changes in training, do you mean that our test coverage is not useful enough? There will, of course, always be effects that can only be seen in long training runs.
|
@jandom DGX Spark, right? That is AARCH64 + GB10 (= compute capability 12.1), possibly the least mature combo? Let's make it work and add it to our support matrix :-). Since I do not have access to the hardware, I will need a bit of help; I hope it does not become too hairy! Two notes first:
Let's start with the CUDA12 environment; that sounds like cuequivariance/triton not supporting the GPU. It seems to try to query something that is not queryable on that machine, like the GPU temperature, which you can also see in the nvidia-smi output. Off the top of my head, we can try:

Once we know, we can create yet another environment variant (e.g., -no-acceleration or -reduced-acceleration). I suspect this is a problem we should report to cuequivariance upstream, like here. Let's do the CUDA13 bit, which to me is where the real prize is, later – maybe already against pytorch 2.10. I have some suspicions.
Summary
Adds a modern conda environment following best practices to improve the quality of life of conda users.
The environment is self-contained, including a sane toolchain to build extensions fully compatible with the rest of dependencies, and with batteries included (inference, bioinformatics, fast kernels, dev dependencies).
We maintain a pixi workspace and an automatically generated conda environment for non-pixi users.
We still need to iron out four known problems (see the comments in pixi.toml and upcoming issues) and add documentation.
From here, creating a conda-forge openfold3 package and a bioconda openfold3-extra package should be simple enough.
Changes
Related Issues
TBC
Testing
The current environment passes all tests and produces sensible predictions.
Other Notes
This is exploratory at the moment. Will clean up the commit history or open a clean PR when we are done.