Open
Description
On an RTX 5090, it takes a full 13 minutes to generate a 25-second audio clip with Step-Audio-EditX TTS.
Is this normal? It feels insanely slow compared to what I expected from a 5090.
For reference: 25 seconds of audio → 13 minutes (780 s) of generation time, i.e. a real-time factor (RTF) of about 31.
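
The RTF figure follows from simple arithmetic: generation time divided by audio duration. A minimal sketch of that calculation is below; the helper name `real_time_factor` is hypothetical and is not part of Step-Audio-EditX.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by audio duration.

    An RTF above 1 means generation is slower than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Numbers from this report: 13 minutes of generation for a 25-second clip.
rtf = real_time_factor(generation_seconds=13 * 60, audio_seconds=25)
print(f"RTF = {rtf:.1f}")  # prints "RTF = 31.2"
```

Timing a run with `time.perf_counter()` around the generation call and passing the elapsed seconds to this helper would give a comparable number on other hardware.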