Do not interpret GPU index as GPU device minor.
There are 3 relevant ways to identify a GPU device:
1. GPU Index: The index of the GPU in PCIe ordering.
2. GPU Device Minor: The minor device number. The GPU device number on the system is 195:minor.
3. /dev/nvidia#: The GPU can be accessed via the device mounted at this path.

nvproxy incorrectly assumed that all 3 identifiers are the same. In reality, only (2) and (3) are always the same. (1) can differ, as demonstrated in #9389.

So pass the value of `NVIDIA_VISIBLE_DEVICES` through to `nvidia-container-cli` via the `--devices` flag when invoking the `configure` command. The CLI uses NVML to figure out the GPU Index -> GPU Device mapping and mounts the right devices into the container's root filesystem at /container/rootfs/dev/. We subsequently scan this directory to *infer* the GPU devices exposed to the container. This information is plumbed through various places and the appropriate virtualized gVisor devices are created.

An unintended benefit is that identifying GPUs via GPU UUIDs now works! Earlier we only accepted the GPU index. Now we can just pass the GPU UUID through to the CLI and it will deal with it. So `docker run --gpus="device=GPU-4e716e7d"` works now.

Alternatives considered:
1. Parsing `nvidia-container-cli info` output to figure out the index -> minor mapping. However, this is a costly operation (as reported in #9215) which can take 2-3 seconds.
2. Using NVML in runsc via the nvidia/go-nvml library. Apart from the downsides of adding another third-party dependency, this would mean duplicating logic from nvidia-container-cli in runsc (mainly the logic around src/cli/common.c:select_devices()). This is technical debt and won't age well (it would require us to mirror CLI updates in runsc).

Fixes #9389

PiperOrigin-RevId: 567443164
1 parent ae1294b, commit 05b7c55
Showing 9 changed files with 126 additions and 63 deletions.
@@ -0,0 +1,52 @@
// Copyright 2023 The gVisor Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package boot

import (
	"fmt"
	"strconv"
	"strings"
)

// NvidiaDevMinors can be used to pass nvidia device minors via flags.
type NvidiaDevMinors []uint32

// String implements flag.Value.
func (n *NvidiaDevMinors) String() string {
	minors := make([]string, 0, len(*n))
	for _, minor := range *n {
		minors = append(minors, strconv.Itoa(int(minor)))
	}
	return strings.Join(minors, ",")
}

// Get implements flag.Value.
func (n *NvidiaDevMinors) Get() any {
	return n
}

// Set implements flag.Value and appends a device minor from the command
// line to the device minors array. Set(String()) should be idempotent.
func (n *NvidiaDevMinors) Set(s string) error {
	minors := strings.Split(s, ",")
	for _, minor := range minors {
		minorVal, err := strconv.Atoi(minor)
		if err != nil {
			return fmt.Errorf("invalid device minor value (%q): %v", minor, err)
		}
		*n = append(*n, uint32(minorVal))
	}
	return nil
}