[DOC] Add NCCL notice for Docker containers running 12.8 in the Release Selector #574

Open
taureandyernv opened this issue Feb 12, 2025 · 1 comment
Labels
? - Needs Triage Need team to review and classify doc

Comments

Contributor

taureandyernv commented Feb 12, 2025

Report needed documentation

@dantegd and team discovered that the 25.02 ARM containers on CUDA 12.8 hit a Docker permission issue. As a workaround, users need to set the environment variable NCCL_CUMEM_HOST_ENABLE=0 inside the container before running some multi-GPU algorithms.
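
For illustration, here is a minimal Python sketch of applying the workaround from inside the container. It assumes the variable has to be in the environment before any library initializes NCCL, so it should run before importing multi-GPU libraries or starting workers. The helper name `apply_nccl_workaround` is hypothetical and not part of any RAPIDS or NCCL API.

```python
import os


def apply_nccl_workaround(env=os.environ):
    """Set NCCL_CUMEM_HOST_ENABLE=0 in the given environment mapping.

    Hypothetical helper for illustration only. Assumption: NCCL reads this
    variable at initialization, so call this before importing or starting
    anything that uses NCCL.
    """
    env["NCCL_CUMEM_HOST_ENABLE"] = "0"
    return env


# Apply to the current process; child processes started afterwards
# inherit the variable.
apply_nccl_workaround()
```

The same effect could likely be had without code, for example with `export NCCL_CUMEM_HOST_ENABLE=0` in the container shell or by passing `-e NCCL_CUMEM_HOST_ENABLE=0` to `docker run`.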

Describe the documentation you'd like
Can we add a note between the selector and the generated command, shown when the user selects the Stable, Docker, and CUDA 12.8 options, explaining that NCCL_CUMEM_HOST_ENABLE=0 needs to be set?

@dantegd @bdice @aravenel for awareness

@taureandyernv taureandyernv added ? - Needs Triage Need team to review and classify doc labels Feb 12, 2025
Member

dantegd commented Feb 13, 2025

Linking PR rapidsai/docker#735 in case we prefer to set the variable in the container itself rather than document it in the instructions (or do both).
