Merge pull request #5179 from GeorgianaElena/gpu-prefix
AWS: GPU nodegroups and docs
GeorgianaElena authored Nov 22, 2024
2 parents 9797232 + 7865fce commit 5079562
Showing 7 changed files with 23 additions and 18 deletions.
9 changes: 7 additions & 2 deletions docs/howto/features/gpu.md
@@ -112,8 +112,12 @@ AWS, and we can configure a node group there to provide us GPUs.
```
{
instanceType: "g4dn.xlarge",
+ namePrefix: "gpu-{{hub-name}}",
+ minSize: 0,
+ labels+: { "2i2c/hub-name": "{{hub-name}}" },
tags+: {
- "k8s.io/cluster-autoscaler/node-template/resources/nvidia.com/gpu": "1"
+ "k8s.io/cluster-autoscaler/node-template/resources/nvidia.com/gpu": "1",
+ "2i2c:hub-name": "{{hub-name}}",
},
taints+: {
"nvidia.com/gpu": "present:NoSchedule"
@@ -126,7 +130,8 @@ AWS, and we can configure a node group there to provide us GPUs.

`g4dn.xlarge` gives us 1 Nvidia T4 GPU and ~4 CPUs. The `tags` definition
is necessary to let the autoscaler know that this nodegroup has
- 1 GPU per node. The `taints` definition is required to prevent scheduling of
+ 1 GPU per node and also for the cost attribution system to differentiate
+ between hubs. The `taints` definition is required to prevent scheduling of
non-GPU pods onto the GPU nodes. If you're using a different machine type with
more GPUs, adjust this definition accordingly.
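
As an illustration of that last sentence (a sketch only, not part of this commit): a node group entry for a machine type with more GPUs might look like the following. It assumes `g4dn.12xlarge`, which has 4 Nvidia T4 GPUs, and it reuses the `{{hub-name}}` placeholder from the docs. The only adjusted values are `instanceType` and the GPU count in the autoscaler tag.

```jsonnet
{
  // Assumed machine type for illustration: g4dn.12xlarge has 4 T4 GPUs
  instanceType: "g4dn.12xlarge",
  namePrefix: "gpu-{{hub-name}}",
  minSize: 0,
  labels+: { "2i2c/hub-name": "{{hub-name}}" },
  tags+: {
    // Tell the cluster autoscaler how many GPUs each node in this group provides
    "k8s.io/cluster-autoscaler/node-template/resources/nvidia.com/gpu": "4",
    // Used by the cost attribution system to differentiate between hubs
    "2i2c:hub-name": "{{hub-name}}",
  },
  taints+: {
    // Keep non-GPU pods off the GPU nodes
    "nvidia.com/gpu": "present:NoSchedule"
  },
}
```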

4 changes: 2 additions & 2 deletions eksctl/2i2c-aws-us.jsonnet
@@ -84,7 +84,7 @@ local notebookNodes = [
},
{
instanceType: "g4dn.xlarge",
- namePrefix: "nb-showcase",
+ namePrefix: "gpu-showcase",
minSize: 0,
labels+: { "2i2c/hub-name": "showcase" },
tags+: {
@@ -119,7 +119,7 @@ local notebookNodes = [
},
{
instanceType: "g4dn.xlarge",
- namePrefix: "nb-ncar-cisl",
+ namePrefix: "gpu-ncar-cisl",
minSize: 0,
labels+: { "2i2c/hub-name": "ncar-cisl" },
tags+: {
4 changes: 2 additions & 2 deletions eksctl/gridsst.jsonnet
@@ -132,7 +132,7 @@ local notebookNodes = [
{
instanceType: "g4dn.xlarge",
minSize: 0,
- namePrefix: "nb-staging",
+ namePrefix: "gpu-staging",
labels+: { "2i2c/hub-name": "staging" },
tags+: {
"2i2c:hub-name": "staging",
@@ -148,7 +148,7 @@ local notebookNodes = [
{
instanceType: "g4dn.xlarge",
minSize: 0,
- namePrefix: "nb-prod",
+ namePrefix: "gpu-prod",
labels+: { "2i2c/hub-name": "prod" },
tags+: {
"2i2c:hub-name": "prod",
12 changes: 6 additions & 6 deletions eksctl/jupyter-meets-the-earth.jsonnet
@@ -112,7 +112,7 @@ local notebookNodes = [
{
instanceType: "g4dn.xlarge",
minSize: 0,
- namePrefix: "nb-staging",
+ namePrefix: "gpu-staging",
labels+: { "2i2c/hub-name": "staging" },
tags+: {
"2i2c:hub-name": "staging",
@@ -125,7 +125,7 @@ local notebookNodes = [
{
instanceType: "g4dn.xlarge",
minSize: 0,
- namePrefix: "nb-prod",
+ namePrefix: "gpu-prod",
labels+: { "2i2c/hub-name": "prod" },
tags+: {
"2i2c:hub-name": "prod",
@@ -138,7 +138,7 @@ local notebookNodes = [
{
instanceType: "g4dn.4xlarge",
minSize: 0,
- namePrefix: "nb-staging",
+ namePrefix: "gpu-staging",
labels+: { "2i2c/hub-name": "staging" },
tags+: {
"2i2c:hub-name": "staging",
@@ -151,7 +151,7 @@ local notebookNodes = [
{
instanceType: "g4dn.4xlarge",
minSize: 0,
- namePrefix: "nb-prod",
+ namePrefix: "gpu-prod",
labels+: { "2i2c/hub-name": "prod" },
tags+: {
"2i2c:hub-name": "prod",
@@ -164,7 +164,7 @@ local notebookNodes = [
{
instanceType: "g4dn.16xlarge",
minSize: 0,
- namePrefix: "nb-staging",
+ namePrefix: "gpu-staging",
labels+: { "2i2c/hub-name": "staging" },
taints+: {
"nvidia.com/gpu": "NoSchedule"
@@ -177,7 +177,7 @@ local notebookNodes = [
{
instanceType: "g4dn.16xlarge",
minSize: 0,
- namePrefix: "nb-prod",
+ namePrefix: "gpu-prod",
labels+: { "2i2c/hub-name": "prod" },
taints+: {
"nvidia.com/gpu": "NoSchedule"
4 changes: 2 additions & 2 deletions eksctl/kitware.jsonnet
@@ -63,7 +63,7 @@ local notebookNodes = [
},
{
instanceType: "g4dn.xlarge",
- namePrefix: "nb-staging",
+ namePrefix: "gpu-staging",
labels+: { "2i2c/hub-name": "staging" },
tags+: {
"2i2c:hub-name": "staging",
@@ -78,7 +78,7 @@ local notebookNodes = [
},
{
instanceType: "g4dn.xlarge",
- namePrefix: "nb-prod",
+ namePrefix: "gpu-prod",
labels+: { "2i2c/hub-name": "prod" },
tags+: {
"2i2c:hub-name": "prod",
4 changes: 2 additions & 2 deletions eksctl/nasa-cryo.jsonnet
@@ -63,7 +63,7 @@ local notebookNodes = [
},
{
instanceType: "g4dn.xlarge",
- namePrefix: "staging",
+ namePrefix: "gpu-staging",
labels+: { "2i2c/hub-name": "staging" },
tags+: {
"2i2c:hub-name": "staging",
@@ -78,7 +78,7 @@ local notebookNodes = [
},
{
instanceType: "g4dn.xlarge",
- namePrefix: "prod",
+ namePrefix: "gpu-prod",
labels+: { "2i2c/hub-name": "prod" },
tags+: {
"2i2c:hub-name": "prod",
4 changes: 2 additions & 2 deletions eksctl/smithsonian.jsonnet
@@ -63,7 +63,7 @@ local notebookNodes = [
},
{
instanceType: "g4dn.xlarge",
- namePrefix: "nb-staging",
+ namePrefix: "gpu-staging",
labels+: { "2i2c/hub-name": "staging" },
tags+: {
"2i2c:hub-name": "staging",
@@ -78,7 +78,7 @@ local notebookNodes = [
},
{
instanceType: "g4dn.xlarge",
- namePrefix: "nb-prod",
+ namePrefix: "gpu-prod",
labels+: { "2i2c/hub-name": "prod" },
tags+: {
"2i2c:hub-name": "prod",