
Warm up runpod workers #90

Open
triplecookedchips opened this issue Jan 1, 2025 · 10 comments

@triplecookedchips

Problem: When a new worker gets assigned, the first run always takes significantly longer because the models need to be loaded into GPU memory (around 45 secs in my case). Subsequent runs are much faster since the models have been cached (15 secs in my case). Runpod is pretty good at caching, but workers do come and go, and occasionally I'll have to wait a long time for my image if I'm dealing with a new worker.

It would be great if, every time a worker gets assigned, it automatically ran a warm-up workflow that caches the models. That way, all API calls would be rapid.

I just can't figure out how to trigger the workflow once the Docker image has been pulled.
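
The best partial idea I have is to put the warm-up in the handler script's module-level code, so it runs once per worker at cold start, before any jobs are pulled; it still doesn't fire at image-pull time, though. A rough, untested sketch, assuming ComfyUI is already listening on 127.0.0.1:8188 inside the container and that `warmup_workflow.json` is a tiny API-format workflow baked into the image:

```python
# Rough sketch: warm up ComfyUI once per worker at cold start.
# Assumptions: ComfyUI is already running on 127.0.0.1:8188 and
# warmup_workflow.json (a tiny API-format workflow) is baked into the image.
import json
import urllib.request

import runpod  # RunPod serverless SDK

COMFY_URL = "http://127.0.0.1:8188"

def queue_workflow(workflow: dict) -> str:
    """POST a workflow to ComfyUI's /prompt endpoint and return the prompt id."""
    body = json.dumps({"prompt": workflow}).encode()
    req = urllib.request.Request(
        f"{COMFY_URL}/prompt", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt_id"]

def handler(job):
    # ... queue the real workflow from job["input"] and collect outputs ...
    return {"status": "ok"}

# Module-level code runs once when the worker process boots, so the warm-up
# happens before jobs are pulled. Note it still only runs when the first
# request spins the worker up, not when the image is pulled.
with open("warmup_workflow.json") as f:
    queue_workflow(json.load(f))

runpod.serverless.start({"handler": handler})
```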

@franckdsf

I'm not an expert, but one potential workaround could be to run a Python script that preloads the model into RAM before starting ComfyUI.

Note that if you're using RunPod serverless, what you refer to as a "new worker" is essentially a worker with the Docker image pre-pulled onto it. However, nothing is actually running on the machine; only the image has been pulled. The "new worker" isn't online yet. It's still offline. As a result, preloading anything beyond the image itself isn't possible. You might want to set up an "active worker" that always runs and keeps the model loaded.
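
If you do want to try the preload route from the container's start script, a minimal sketch (similar in spirit to vmtouch): sequentially read each model file once so it lands in the OS page cache before ComfyUI starts. This only skips the disk read on first use; moving the weights into VRAM still happens on the first run. The `/comfyui/models` path is an assumption:

```python
# Minimal preload sketch: warm the OS page cache before ComfyUI starts.
# The /comfyui/models path is an assumption; adjust for your image layout.
import pathlib

MODEL_DIR = pathlib.Path("/comfyui/models")
CHUNK = 64 * 1024 * 1024  # read in 64 MiB chunks

def preload(path: pathlib.Path) -> None:
    """Read the whole file and discard it, pulling it into the page cache."""
    with path.open("rb") as f:
        while f.read(CHUNK):
            pass

for model_file in MODEL_DIR.glob("**/*.safetensors"):
    preload(model_file)
    print(f"preloaded {model_file}")
```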

@alan0xd7

alan0xd7 commented Jan 6, 2025

Hey, I'm interested in doing some warmup exercise as well!

I tried adding vmtouch in the startup script, but as franckdsf said above, nothing actually gets run before a worker receives a request, so it doesn't really help much.

Maybe some kind of external ping to the endpoint? But there's no guarantee it'll hit a "cold" worker 🤔
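
For what it's worth, a sketch of such a pinger (untested), using the endpoint's async `/run` route. The endpoint id and API key come from env vars, and the handler would have to treat `{"warmup": true}` as a cheap no-op:

```python
# Keep-warm pinger sketch: periodically submit a tiny job to the endpoint.
# Caveat above applies: no guarantee a ping lands on a cold worker.
# ENDPOINT_ID / RUNPOD_API_KEY are assumptions you would set yourself.
import os
import time

import requests

RUN_URL = f"https://api.runpod.ai/v2/{os.environ['ENDPOINT_ID']}/run"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

def ping() -> None:
    """Submit a minimal warm-up job via the endpoint's async /run route."""
    resp = requests.post(RUN_URL, headers=HEADERS,
                         json={"input": {"warmup": True}}, timeout=30)
    resp.raise_for_status()

if __name__ == "__main__":
    while True:
        ping()
        time.sleep(300)  # ping every 5 minutes
```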

Also, just wanted to ask: how are you guys getting the models into the container? I tried adding them in the Dockerfile, but the build just takes extremely long (over 1-2 hours!), so I ended up using a network volume, but that restricts my workers to one data center and limits availability...

@triplecookedchips
Author

I have resorted to using active workers, as I couldn't figure out a solution. As I'm only using SD1.5, I can get away with a low-VRAM GPU, plus there's a 30% discount for active workers.

Yeah, I bake the models into the Docker image - it takes a while to build, but then I believe it's faster than running from a network volume.
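
Roughly, the bake step is just a download script run at build time (e.g. `RUN python download_models.py` in the Dockerfile). A sketch with a placeholder URL and assumed paths:

```python
# download_models.py: hypothetical build-time download script, invoked from
# the Dockerfile so the checkpoint is baked into the image.
# MODEL_URL and DEST are placeholders; substitute your own.
import pathlib
import urllib.request

MODEL_URL = "https://example.com/path/to/sd15-checkpoint.safetensors"  # placeholder
DEST = pathlib.Path("/comfyui/models/checkpoints/sd15-checkpoint.safetensors")

DEST.parent.mkdir(parents=True, exist_ok=True)
with urllib.request.urlopen(MODEL_URL) as resp, DEST.open("wb") as out:
    # Stream in chunks so the whole checkpoint never sits in memory at once.
    while chunk := resp.read(16 * 1024 * 1024):
        out.write(chunk)
print(f"saved {DEST}")
```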

@maxpaynestory

ComfyUI on Runpod Serverless is a pretty dumb idea. ComfyUI takes several minutes to boot up, which makes it unsuitable for any serverless architecture.

@maxpaynestory

maxpaynestory commented Mar 29, 2025

> I have resorted to using active workers as I couldn't figure out a solution. […]

What is the point of using serverless if your workers are always active?

Runpod Serverless pricing is more expensive than their normal on-demand and spot instances. Raising the price and then giving a 30% discount. 👎

@BenDes21


Hi guys! Did you find a solution that's easy to set up? I need 1-2 requests after X time to warm up my workers, which is not suitable for my app.

@triplecookedchips
Author

I believe Runpod are working on a solution; hopefully it will be out soon.

@franckdsf

franckdsf commented Mar 31, 2025

Your approach to serverless seems a bit incorrect. The principle of serverless is to provide an on-demand solution for requests and request surges. There's no need to scale your GPUs manually—serverless "spins up" machines for you when the number of requests increases.

However! If you don’t have traffic, you can’t "warm up" your worker because serverless workers (and GPUs) are shared among all users. This means that the model you want preloaded on your worker may be replaced (i.e., unloaded) by another user’s workload when no requests are received on your instance for some time. In this case, your GPU goes into throttle mode, meaning it is being used by another user.

The only solutions to this problem are:

  • Keeping at least one active worker to ensure your GPU stays powered on and your model remains cached.
  • Spinning up the GPU before a user request, keeping it running, and then canceling the initial request once the actual user request is received; see the sketch after this list. (Obviously, this will cost more than the actual request since the GPU needs to remain active. Additionally, you would need to implement a way to cancel the request within a certain timeout if no user requests are pending, etc. I believe this introduces more problems than it solves.)
  • Having enough traffic (clients) so that your GPU is almost always in use by your users.
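
For the second option, a rough sketch (untested; whether the worker actually stays warm after the cancel depends on RunPod's scheduling). It assumes your handler recognises a `{"placeholder": true}` input and simply idles until cancelled:

```python
# Pre-warm-and-cancel sketch: queue a placeholder job so a worker spins up
# and caches the models, then cancel it when the real request arrives.
# Endpoint id / API key are assumptions; the handler must treat
# {"placeholder": true} as "idle until cancelled".
import os

import requests

BASE = f"https://api.runpod.ai/v2/{os.environ['ENDPOINT_ID']}"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

def spin_up_placeholder() -> str:
    """Queue a placeholder job via /run so a worker boots and loads models."""
    resp = requests.post(f"{BASE}/run", headers=HEADERS,
                         json={"input": {"placeholder": True}}, timeout=30)
    resp.raise_for_status()
    return resp.json()["id"]

def cancel(job_id: str) -> None:
    """Cancel the placeholder so the now-warm worker can take the real job."""
    resp = requests.post(f"{BASE}/cancel/{job_id}", headers=HEADERS, timeout=30)
    resp.raise_for_status()
```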

@triplecookedchips
Author

It works for me, but maybe it isn't optimal. I have several active workers, as I have constant requests throughout the day - this ensures 90% of my customers receive a fast response. Then, when there's a surge, my idle workers boot up. Obviously there's a delay while a worker loads the models, but it isn't too bad.

However, Runpod say they will deploy a 'priority flashboot' that will preload models etc. once a worker is assigned, so any idle assigned workers should run ComfyUI instantly.

@BenDes21

BenDes21 commented Apr 1, 2025

> However, Runpod say they will deploy a 'priority flashboot' that will preload models etc. once a worker is assigned. […]

Good news, hope they implement this feature soon.
