Provide high memory server and expose its local SSD storage #88

Open · 6 of 11 tasks
consideRatio opened this issue Oct 14, 2021 · 4 comments
Labels: 🏷️ JupyterHub (Something related to JupyterHub)

@consideRatio (Member) commented Oct 14, 2021

@espg asked for instances with even more memory than n1.16xlarge, and also, if possible, access to high-performance storage. I'll explore the ability to arrange both.

Technical steps

  • Request AWS quota to be able to start an x1.16xlarge and an x1.32xlarge server
    • Approved
  • Enable a JupyterHub user to start an x1.16xlarge server via JupyterHub
    • Configure eksctl to create an x1.16xlarge node pool (a rough sketch follows this list)
    • Configure JupyterHub to make use of them (also sketched below)
    • Verify it works
  • Learn how to enable the JupyterHub user pod to access the local SSD storage and make it happen
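
A rough sketch of the eksctl step; the cluster name, node group name, scaling bounds, and label below are placeholders rather than this deployment's actual configuration:

$ eksctl create nodegroup \
    --cluster <cluster-name> \
    --name nb-x1-16xlarge \
    --node-type x1.16xlarge \
    --nodes 0 --nodes-min 0 --nodes-max 2 \
    --node-labels "hub.jupyter.org/node-purpose=user"

And a hedged sketch of the JupyterHub side, as a z2jh profileList entry that pins user pods to those nodes; the memory guarantee and the instance-type label key are assumptions to double-check against the cluster:

$ cat > highmem-profile-values.yaml <<'EOF'
# Helm values snippet for the z2jh chart
singleuser:
  profileList:
    - display_name: "High memory: x1.16xlarge"
      description: "64 CPU, ~976 GB RAM, local SSD instance store"
      kubespawner_override:
        node_selector:
          node.kubernetes.io/instance-type: x1.16xlarge
        mem_guarantee: 900G
EOF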

Discussion

  • Mount location of the high-performance local SSD
    @espg would you agree that it would make sense to mount the node's high-performance local SSD to /tmp - a folder you can always count on existing, and one that makes it clear you can't count on whatever you create there remaining around later? (One way to wire this up is sketched after this list.)
  • A need for an even bigger node
    x1.16xlarge provides 64 CPU + 976 GB of memory, compared to the m5.16xlarge nodes with 64 CPU + 256 GB of memory. Are we okay with x1.16xlarge, or should I set up x1.32xlarge right away as well, with twice that?
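
A minimal sketch of the pod side, assuming the instance store has already been formatted and mounted on the node (for example at /mnt/local-ssd via an eksctl preBootstrapCommands hook); the paths and volume name are placeholders, and the final mountPath depends on where we land in the discussion above:

$ cat > local-ssd-values.yaml <<'EOF'
# Helm values snippet for the z2jh chart: bind-mount the node's local-SSD
# mount point into user pods via a hostPath volume
singleuser:
  storage:
    extraVolumes:
      - name: local-ssd
        hostPath:
          path: /mnt/local-ssd
    extraVolumeMounts:
      - name: local-ssd
        mountPath: /scratch
EOF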
@consideRatio consideRatio added the 🏷️ JupyterHub Something related to JupyterHub label Oct 14, 2021
@consideRatio consideRatio self-assigned this Oct 14, 2021
@espg (Contributor) commented Oct 15, 2021

@consideRatio I'm fine with a block storage path -- something like /dev/nvmep1, or wherever the block device shows up 'natively' and needs the least amount of configuration. Using /tmp may not be as good a solution... other processes write to /tmp, and I don't mind coding in the extra path if it keeps things clean, with only the files that are explicitly put there.
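
For reference, the instance-store device and its current mount point can be listed from the node itself (or from a privileged debug pod on it); the exact device name will vary by instance type:

$ lsblk -o NAME,SIZE,TYPE,MOUNTPOINT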

btw, what is our base instance type? You mention n1.16xlarge, but I don't see that as a type -- are we using m5n/m5dn, some other m-family type, or something else? Does our current base unit have another memory tier?

@consideRatio (Member, Author) commented:

We run x1.16xlarge nodes when a 64 CPU high-memory node is chosen, m5.16xlarge when a normal 64 CPU node is chosen, and then either m5 or m5a nodes as worker nodes with 4, 16, or 64 CPU.
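
For reference, the instance type backing each node is exposed as a standard node label and can be listed with kubectl (older clusters may use beta.kubernetes.io/instance-type instead):

$ kubectl get nodes -L node.kubernetes.io/instance-type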

How to test disk performance

Thanks @espg for this command!

$ cd /tmp
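# oflag=dsync makes dd wait for each write to reach the device, so the
# reported rate reflects the disk itself rather than the page cache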
$ dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync

1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 8.14749 s, 132 MB/s 
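
Once the local SSD is exposed at an agreed-upon path, rerunning the same command against that path (the /scratch path below is just a placeholder) should show how the local disk compares to whatever currently backs /tmp:

$ dd if=/dev/zero of=/scratch/test1.img bs=1G count=1 oflag=dsync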

@espg (Contributor) commented Apr 5, 2022

@consideRatio @fperez pinging this issue after our Ray discussion last week -- this is related to #92, which has the error message for the block storage. While block storage isn't technically required to run Ray, it is needed if we want to do anything with shared memory via Redis.

@espg (Contributor) commented Apr 5, 2022

...apparently another option for this that doesn't involve setting up block storage is spinning up a separate Redis instance and connecting to it -- https://www.anyscale.com/blog/redis-in-ray-past-and-future
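
A minimal sketch of that approach, assuming a reachable standalone Redis (the Docker-based Redis and the host placeholder are illustrative, and the RAY_REDIS_ADDRESS environment variable should be double-checked against the Ray version we run):

$ docker run -d --name ray-redis -p 6379:6379 redis:6
$ RAY_REDIS_ADDRESS=<redis-host>:6379 ray start --head

Whether that actually removes the need for local block storage here would still need testing against the error in #92.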
