@espg asked for instances with even more memory than n1.16xlarge and, if possible, access to high-performance storage. I'll explore the ability to arrange both.
Technical steps

- Request AWS quota to be able to start x1.16xlarge and x1.32xlarge servers
  - Approved
- Enable a JupyterHub user to start an x1.16xlarge server via JupyterHub
  - Configure eksctl to create an x1.16xlarge node pool
  - Configure JupyterHub to make use of it
  - Verify it works
- Learn how to enable the JupyterHub user pod to access the local SSD storage, and make it happen
  - Install and configure sig-storage-local-static-provisioner
  - Mount the storage to the user pod as /tmp
  - Verify it works
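The eksctl step above could be sketched roughly like this. Note that the cluster name, region, and min/max sizes are hypothetical placeholders, not values from this issue, and the exact taint syntax varies between eksctl versions:

```yaml
# Sketch of an eksctl ClusterConfig nodegroup for the x1.16xlarge pool.
# metadata values and sizes are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster   # placeholder
  region: us-west-2       # placeholder
nodeGroups:
  - name: user-x1-16xlarge
    instanceType: x1.16xlarge
    minSize: 0
    maxSize: 2
    labels:
      hub.jupyter.org/node-purpose: user
    taints:
      hub.jupyter.org/dedicated: "user:NoSchedule"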
Discussion
Mount location of high-performance local SSD disk

@espg would you agree that it would make sense to mount the nodes' high-performance local SSD disks to /tmp - a folder you can always count on existing, and a folder that makes it clear that whatever you create there may not remain at a later time?
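If /tmp is the chosen mount point, the Zero to JupyterHub Helm config could look roughly like this. The volume name and host path are assumptions (the actual host path depends on how the disk gets provisioned on the node):

```yaml
# Sketch of z2jh config mounting a node-local disk into the user pod.
singleuser:
  storage:
    extraVolumes:
      - name: local-ssd          # hypothetical volume name
        hostPath:
          path: /mnt/disks/ssd0  # assumed host path, depends on provisioning
    extraVolumeMounts:
      - name: local-ssd
        mountPath: /tmp
```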
A need for an even bigger node
x1.16xlarge provides 64 CPU + 976 GB of memory, compared to the m5.16xlarge nodes with 64 CPU + 256 GB of memory. Are we okay with x1.16xlarge, or should I set up x1.32xlarge as well directly, with twice that?
@consideRatio I'm fine with a block storage path - something like /dev/nvmep1, or whatever the block device presents 'natively' and needs the least amount of configuration. Using /tmp may not be as good a solution: other processes write to /tmp, and I don't mind coding the extra path if it keeps things clean, with only files that are explicitly put there.
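For reference, sig-storage-local-static-provisioner works by discovering disks under a configured host directory and exposing each one as a PersistentVolume, so user pods would consume the disk via a PVC rather than a raw device path. A hedged sketch of its Helm chart values, where the class name and discovery directory are assumptions:

```yaml
# Sketch of sig-storage-local-static-provisioner Helm values; the key
# layout follows the chart's documented values, names are hypothetical.
classes:
  - name: local-nvme    # hypothetical StorageClass name
    hostDir: /mnt/disks # assumed directory where local disks are mounted
    volumeMode: Filesystem
    fsType: ext4
```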
btw, what is our base image type? You mention n1.16xlarge, but I don't see that as a type - are we using m5n/m5dn, or m-something, or other? Does our current base unit have another memory tier?
We run x1.16xlarge nodes when a user chooses a 64 CPU high-memory node, m5.16xlarge when choosing a normal 64 CPU node, and then either m5 or m5a nodes as worker nodes with 4, 16, or 64 CPU.
@consideRatio @fperez pinging this issue after our Ray discussion last week - this is related to #92, which has the error message for the block storage. While block storage isn't technically required to run Ray, it is needed if we want to do anything with shared memory via Redis.
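Ray's object store lives in shared memory (backed by /dev/shm on Linux), which is part of why node memory sizing matters here. A minimal stdlib sketch of the same POSIX shared-memory mechanism, with illustrative names, not Ray's actual API:

```python
from multiprocessing import shared_memory

# Create a small named shared-memory segment (on Linux this is backed
# by /dev/shm, the same mechanism Ray's object store relies on).
shm = shared_memory.SharedMemory(create=True, size=8)
shm.buf[:5] = b"hello"

# A second handle attaches to the same segment by name, as another
# process would, and reads the bytes back.
reader = shared_memory.SharedMemory(name=shm.name)
data = bytes(reader.buf[:5])

reader.close()
shm.close()
shm.unlink()
print(data)  # → b'hello'
```

If /dev/shm is sized too small relative to the working set, allocations like this fail, which is the failure mode the block-storage discussion in #92 is about.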