nano-vLLM initialization fails, falls back to PT backend and crashes during generation (on Google Colab L4) #616

@sean3939

Summary
I believe I’ve found a regression introduced after commit 6db4465.

On NVIDIA L4 (22GB) with Torch 2.10.0+cu128, newer commits fail to initialize the 5Hz LM using nano-vLLM due to CUDA graph capture errors. ACE-Step then falls back to the PyTorch backend, which later crashes during generation with:

RuntimeError: Offset increment outside graph capture encountered unexpectedly.

Pinning to commit 6db4465 (from ~2 days ago) restores correct behavior — vLLM initializes successfully and generation works.

Environment

  • GPU: NVIDIA L4 (22GB VRAM)
  • Runtime: Google Colab
  • CUDA detected by ACE-Step: tier6b
  • Torch: 2.10.0+cu128
  • nano-vLLM installed via project setup
  • LM model: acestep-5Hz-lm-1.7B
  • DiT config: acestep-v15-turbo

More details
During startup of Gradio:

Initializing 5Hz LM with model: ..., 
[nanovllm] KV cache allocated ...
❌ Error initializing 5Hz LM:
CUDA error: operation failed due to a previous error during capture
...
torch.AcceleratorError: CUDA error: operation not permitted when stream is capturing

which then yields

WARNING Falling back to PyTorch backend
5Hz LM initialized successfully using PyTorch backend on cuda

When I then try to generate a song, it crashes with:
RuntimeError: Offset increment outside graph capture encountered unexpectedly.
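
To check whether another run hits the same failure chain (capture error, then PT fallback, then the offset error), a saved log can be scanned for the messages quoted above. This is a rough sketch: the function names are hypothetical, and the substrings are copied from this report, so they may not match logs from other versions.

```python
# Substrings taken from the log excerpts in this report.
SIGNATURES = {
    "capture_failed": "operation failed due to a previous error during capture",
    "op_during_capture": "operation not permitted when stream is capturing",
    "pt_fallback": "Falling back to PyTorch backend",
    "offset_increment": "Offset increment outside graph capture",
}


def classify_log(text):
    """Return the names of the known signatures found in `text`."""
    return [name for name, needle in SIGNATURES.items() if needle in text]


def matches_report(text):
    """True if the log shows a capture error, the PT fallback, and the offset error."""
    found = set(classify_log(text))
    has_capture_error = bool(found & {"capture_failed", "op_during_capture"})
    return has_capture_error and "pt_fallback" in found and "offset_increment" in found
```

Usage: read the captured console output into a string and call `matches_report(log_text)`; a `False` result with only `pt_fallback` present would suggest a different root cause.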

What I tried

  • Restarted Gradio / service multiple times
  • Disabled torch.compile/dynamo via env vars at launch:
      • TORCHDYNAMO_DISABLE=1
      • TORCH_COMPILE_DISABLE=1
      • TORCHINDUCTOR_DISABLE_CUDAGRAPHS=1
  • Selected vLLM backend in UI
  • Tried to force eager mode (enforce_eager=True) to disable CUDA graphs, but it did not resolve the issue: the logs still show enforce_eager: False, and nano-vLLM still attempts CUDA graph capture and fails.
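
For reference, the environment flags from the list above can be applied from a Colab cell roughly as follows. This is a sketch, not the project's launch procedure: `apply_flags` is a hypothetical helper, the variable names are copied from the list as-is, and whether a given Torch build honors each one is not verified here. These flags target torch.compile, so they probably do not affect explicit CUDA graph capture done elsewhere, which would be consistent with them not helping in this case.

```python
import os

# Variable names copied from the list above; not verified against Torch 2.10.
FLAGS = {
    "TORCHDYNAMO_DISABLE": "1",
    "TORCH_COMPILE_DISABLE": "1",
    "TORCHINDUCTOR_DISABLE_CUDAGRAPHS": "1",
}


def apply_flags(env=os.environ):
    """Set each flag in `env` and return a copy of what was applied."""
    for name, value in FLAGS.items():
        env[name] = value
    return dict(FLAGS)


# Run this before importing torch or launching the service from the same
# process, since such settings are typically read at import time.
apply_flags()
```

Processes started afterwards from the same notebook inherit `os.environ`, so the Gradio launch sees the same values.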

To reproduce:

  1. Launch Gradio on Colab with NVIDIA L4
  2. Initialize service (DiT + LM enabled)
  3. Observe vLLM init failure → PT fallback
  4. Generate a song → generation crash
