Summary
I believe I’ve found a regression introduced after commit 6db4465.
On an NVIDIA L4 (22GB) with Torch 2.10.0+cu128, newer commits fail to initialize the 5Hz LM with nano-vLLM because CUDA graph capture errors out. ACE-Step then falls back to the PyTorch backend, which later crashes during generation with:
RuntimeError: Offset increment outside graph capture encountered unexpectedly.
Pinning to commit 6db4465 (from ~2 days ago) restores correct behavior: nano-vLLM initializes successfully and generation works.
Environment
- GPU: NVIDIA L4 (22GB VRAM)
- Runtime: Google Colab
- GPU tier detected by ACE-Step: tier6b
- Torch: 2.10.0+cu128
- nano-vLLM installed via project setup
- LM model: acestep-5Hz-lm-1.7B
- DiT config: acestep-v15-turbo
More details
During Gradio startup, the log shows:
Initializing 5Hz LM with model: ...,
[nanovllm] KV cache allocated ...
❌ Error initializing 5Hz LM:
CUDA error: operation failed due to a previous error during capture
...
torch.AcceleratorError: CUDA error: operation not permitted when stream is capturing
which then yields:
WARNING Falling back to PyTorch backend
5Hz LM initialized successfully using PyTorch backend on cuda
and if I then go to generate a song:
RuntimeError: Offset increment outside graph capture encountered unexpectedly.
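For anyone debugging this, here is a minimal diagnostic sketch (not ACE-Step code, just a way to confirm the hypothesis) that checks whether the process is still considered to be capturing a CUDA graph after the failed nano-vLLM init, which would explain why the later RNG offset update is rejected:

```python
# Minimal diagnostic sketch (assumption: run in the same Python process after
# the failed nano-vLLM init, e.g. from a Colab cell or a pdb breakpoint).
import torch

# True here would mean the aborted CUDA graph capture was never torn down,
# which is consistent with "Offset increment outside graph capture".
print("current stream capturing:", torch.cuda.is_current_stream_capturing())

# A tiny RNG-consuming op on the default stream; if the CUDA RNG state is
# still marked as "in capture", this is where the offset error surfaces.
_ = torch.randn(8, device="cuda")
print("randn on cuda succeeded")
```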
What I tried
- Restarted Gradio / service multiple times
- Disabled torch.compile/dynamo via env vars at launch (see the first sketch after this list):
- TORCHDYNAMO_DISABLE=1
- TORCH_COMPILE_DISABLE=1
- TORCHINDUCTOR_DISABLE_CUDAGRAPHS=1
- Selected vLLM backend in UI
- Tried to force eager mode (enforce_eager=True) to disable CUDA graphs, but this did not resolve the issue: the logs still show enforce_eager: False, and nano-vLLM still attempts CUDA graph capture and fails (see the second sketch after this list).
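For reference, this is roughly how the env vars were set, as a sketch assuming the app is launched from Python; the variables have to be exported before torch/ACE-Step are imported to take effect:

```python
# Sketch: disable dynamo/compile/inductor CUDA graphs before anything imports torch.
import os

os.environ["TORCHDYNAMO_DISABLE"] = "1"
os.environ["TORCH_COMPILE_DISABLE"] = "1"
os.environ["TORCHINDUCTOR_DISABLE_CUDAGRAPHS"] = "1"

# ...then launch the Gradio app as usual (exact entry point omitted here).
```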
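And this is the kind of change I expected to take effect. A sketch of passing enforce_eager when constructing nano-vLLM directly; the import name, model path, and keyword placement are assumptions based on the logged "enforce_eager: False" config field, not a confirmed snippet of ACE-Step's code:

```python
# Sketch only: how enforce_eager would be passed if nano-vLLM were constructed
# directly. ACE-Step may set this option somewhere else internally.
from nanovllm import LLM  # assumed import name, matching the "[nanovllm]" log prefix

llm = LLM(
    "path/to/acestep-5Hz-lm-1.7B",  # illustrative model path
    enforce_eager=True,             # skip CUDA graph capture entirely
)
```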
To reproduce:
- Launch Gradio on Colab with NVIDIA L4
- Initialize service (DiT + LM enabled)
- Observe the nano-vLLM init failure → PyTorch fallback
- Generate a song → generation crash