Ray Executor #488

alxmrs · 2024-06-24T21:22:10Z

In addition to accelerator support (e.g. via #304), Cubed could benefit ML users by providing a Ray executor: https://docs.ray.io/en/latest/ray-core/walkthrough.html

Since Cubed is a serverless model, I bet it could get away with only using Tasks/remote functions.
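
A minimal sketch of what "only Tasks/remote functions" could look like, assuming each stage of a Cubed plan boils down to applying one function to a list of independent inputs. The helper names (`square_block`, `run_stage_locally`, `run_stage_on_ray`) are hypothetical and not part of Cubed; the only Ray Core calls used are `ray.is_initialized`, `ray.init`, `ray.remote`, and `ray.get`.

```python
from typing import Any, Callable, List, Sequence


def square_block(block: List[int]) -> List[int]:
    """Toy stand-in for the per-chunk work one task would do (hypothetical)."""
    return [x * x for x in block]


def run_stage_locally(func: Callable[[Any], Any], inputs: Sequence[Any]) -> List[Any]:
    """Reference behaviour: run every task in-process, in input order."""
    return [func(item) for item in inputs]


def run_stage_on_ray(func: Callable[[Any], Any], inputs: Sequence[Any]) -> List[Any]:
    """Run the same stage as stateless Ray tasks (no actors)."""
    # Imported lazily so the module still loads when Ray is not installed.
    import ray

    if not ray.is_initialized():
        ray.init()  # starts a local Ray instance if none is running

    remote_func = ray.remote(func)  # wrap the plain function as a Ray task
    refs = [remote_func.remote(item) for item in inputs]  # one task per input
    return ray.get(refs)  # blocks until all tasks finish; keeps input order
```

With Ray installed, `run_stage_on_ray(square_block, [[1, 2], [3, 4]])` should give the same result as `run_stage_locally`. A real executor would presumably also need retries and per-task memory requests, which this sketch leaves out.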

From talking with @cromwellian a bit, my hope is that Cubed could provide memory bounds when trying to saturate GPUs during model training. I'm not sure exactly what a training loop with Cubed would look like. Here's how Ray integrates with PyTorch, for example: https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.TorchTrainer.html#ray.train.torch.TorchTrainer

@shoyer once pointed out to me that GPU OOM errors occur while taking the gradient of a function graph, not necessarily on the forward pass. I'm not sure right now whether Cubed is in fact a good fit for tackling this problem, only that the potential is exciting.

tomwhite · 2024-06-25T10:22:24Z

Thanks for opening this issue @alxmrs! I think Ray would be a great runtime for Cubed, and should be relatively straightforward to write an executor for (maybe a bit like the Modal one?). Do you know what people generally run Ray on in production/at scale?

alxmrs · 2024-06-25T10:42:14Z

Hey Tom! Do you mean what does the userbase look like, or do I know specific people? On the former: Ray is the engine that OpenAI uses to train its GPT models; it's really popular in the ML world. On the latter: Ray, the person (cromwellian), uses Ray, the framework, at Roblox for model training. :)

> should be relatively straightforward to write an executor for (maybe a bit like the Modal one?).

I agree, and it does look like it will be similar to Modal.

tomwhite · 2024-06-25T10:51:32Z

I meant usage of Anyscale vs KubeRay vs ?? I was wondering if there was a choice that most people use, or whether it's a bit of everything.

> Ray, the person (cromwellian), uses Ray, the framework, at Roblox for model training. :)

Got it!

tomwhite · 2024-07-18T10:33:36Z

I added some notes on how to write a new executor in #498.

rbavery · 2025-01-25T07:31:56Z

> I meant usage of Anyscale vs KubeRay vs ?? I was wondering if there was a choice that most people use, or whether it's a bit of everything.

From attending Ray Summit last year, I think both Anyscale and the KubeRay operator (open source, self-managed Ray) are popular. If I had to guess based on the conversations and talks I attended, there are more users of self-managed KubeRay.

tomwhite added the runtime label Jun 25, 2024

tomwhite linked a pull request Feb 3, 2025 that will close this issue: Add Ray executor #687 (Open)
