[develop] Replace nvidia-persistenced with parallelcluster_nvidia service #2348

Merged: 1 commit, Jul 7, 2023

@enrico-usai (Contributor) commented Jul 7, 2023

The parallelcluster_nvidia service ensures that the /dev/nvidia0 device node is created, which the slurmd service needs.

The service either starts nvidia-persistenced or runs nvidia-smi. This avoids race conditions with other services and avoids conflicts when using a DLAMI on a GPU instance.
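
To make the behaviour described above concrete, here is a minimal Python sketch of the idea. It is not the code from this PR. The function names are hypothetical, and the selection rule (prefer nvidia-persistenced when its binary is on PATH, otherwise fall back to nvidia-smi) is an assumption, since the description does not say how the service chooses between the two.

```python
import os
import shutil
import stat
import subprocess


def pick_gpu_init_command(which=shutil.which):
    """Return the command that should trigger creation of the GPU device nodes.

    Assumption (not stated in the PR): prefer the persistence daemon when
    its binary is found on PATH, otherwise fall back to one nvidia-smi run.
    """
    if which("nvidia-persistenced"):
        return ["nvidia-persistenced"]
    return ["nvidia-smi"]


def device_node_ready(path="/dev/nvidia0"):
    """Return True if `path` exists and is a character device."""
    try:
        return stat.S_ISCHR(os.stat(path).st_mode)
    except OSError:
        # Missing path, permission problem, etc.: treat as not ready.
        return False


def ensure_gpu_device(path="/dev/nvidia0", which=shutil.which, run=subprocess.run):
    """Run the chosen command, then report whether the device node exists.

    `which` and `run` are injectable so the logic can be exercised on a
    machine without NVIDIA tooling installed.
    """
    run(pick_gpu_init_command(which), check=True)
    return device_node_ready(path)
```

On a GPU node, something like `ensure_gpu_device()` would have to return True before a dependent service such as slurmd starts. Funnelling both paths through a single service is what gives other units one thing to order themselves after.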

Tests

  • Modified the ChefSpec tests to verify the new changes.
  • bash kitchen.ec2.sh platform-config test nvidia-uvm-alinux2

References

Backport of: #2341

Signed-off-by: Enrico Usai <usai@amazon.com>
@enrico-usai requested review from a team as code owners July 7, 2023 09:26
@enrico-usai changed the title from "Replace nvidia-persistenced with parallelcluster_nvidia service" to "[develop] Replace nvidia-persistenced with parallelcluster_nvidia service" Jul 7, 2023
codecov bot commented Jul 7, 2023

Codecov Report

Merging #2348 (1eb92b7) into develop (c7ead76) will increase coverage by 0.28%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop    #2348      +/-   ##
===========================================
+ Coverage    70.05%   70.34%   +0.28%     
===========================================
  Files           13       13              
  Lines         1837     1851      +14     
===========================================
+ Hits          1287     1302      +15     
+ Misses         550      549       -1     
Flag Coverage Δ
unittests 70.34% <100.00%> (+0.28%) ⬆️

Impacted Files Coverage Δ
...s/default/head_node_slurm/slurm/config_renderer.py 99.21% <100.00%> (+0.03%) ⬆️
...ode_slurm/slurm/pcluster_slurm_config_generator.py 65.10% <100.00%> (+1.98%) ⬆️

... and 1 file with indirect coverage changes

@enrico-usai enabled auto-merge (rebase) July 7, 2023 09:42
@enrico-usai closed this Jul 7, 2023
Auto-merge was automatically disabled July 7, 2023 11:00 because the pull request was closed
@enrico-usai reopened this Jul 7, 2023
@enrico-usai enabled auto-merge (rebase) July 7, 2023 11:00
@enrico-usai merged commit 111488b into aws:develop Jul 7, 2023