Builds two binaries: `ace-qwen3` (LLM) and `dit-vae` (DiT + VAE).

-**CI (GitHub Actions)**
-- **Build**: on every push/PR, builds on Ubuntu (BLAS) and macOS (Metal); smoke test runs each binary `--help`.
-- **Test generation**: on release or manual trigger only; runs the same checks as **local** `tests/run-generation-tests.sh`. Validate locally first (build + `./models.sh`, then `tests/run-generation-tests.sh`), then use CI to confirm. See `.github/workflows/`.
-
## Models

Pre-quantized GGUFs on [Hugging Face](https://huggingface.co/Serveurperso/ACE-Step-1.5-GGUF).
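Both binaries can be smoke-tested locally the same way the CI description does it: run each one with `--help`. The sketch below assumes that `--help` exits with status 0 and that the binaries end up in a `build/` directory. `smoke_test` and `BUILD_DIR` are hypothetical names, not something the repo ships.

```sh
#!/bin/sh
# Hypothetical local smoke test: run each binary with --help and count failures.
# BUILD_DIR=build is an assumption about the build layout; override it as needed.
BUILD_DIR="${BUILD_DIR:-build}"

smoke_test() {
    failures=0
    for bin in "$@"; do
        # Assumes a healthy binary prints usage and exits 0 on --help.
        if "$bin" --help >/dev/null 2>&1; then
            echo "ok:   $bin"
        else
            echo "FAIL: $bin"
            failures=$((failures + 1))
        fi
    done
    # Exit status = number of binaries that failed (0 means all passed).
    return "$failures"
}
```

Run it as `smoke_test "$BUILD_DIR/ace-qwen3" "$BUILD_DIR/dit-vae"`. Because the return value is the failure count, a script can chain it with `||` to stop on the first broken build.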
@@ -143,16 +139,10 @@ cd examples
./partial.sh # caption + lyrics + duration
./full.sh # all metadata provided
./dit-only.sh # skip LLM, DiT from noise
-./cover.sh # cover mode: decode precomputed audio_codes (no LLM)
-./cover-reference.sh # cover + reference_audio for timbre (WAV/MP3; needs reference.wav or .mp3)
-./test-reference.sh # reference_audio (WAV or MP3) + audio_cover_strength
-./lora.sh # DiT + LoRA adapter
```

Each example has a `-sft` variant (SFT model, 50 steps, CFG 7.0)
-alongside the turbo default (8 steps, no CFG). For **reference timbre**, set `reference_audio` to a **WAV or MP3** path; dit-vae loads it (MP3 decoded in memory via header-only minimp3, no temp files), encodes with the VAE encoder (requires a full VAE GGUF that includes encoder weights).
-
-**LoRA adapters**: use `--lora <path>` and optional `--lora-scale <float>` with dit-vae to run the DiT with PEFT-style Ace-Step LoRAs.
+alongside the turbo default (8 steps, no CFG).
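The turbo default (8 steps, no CFG) and the SFT variant (50 steps, CFG 7.0) can be requested with the same caption for a side-by-side comparison. This is a minimal sketch using only fields from the request JSON reference: `make_request` is a hypothetical helper, the turbo request omits `guidance_scale` because turbo models don't use CFG, the SFT request leaves `shift` at its default, and the caption is inserted without JSON escaping, so keep quotes and backslashes out of it. Selecting the SFT model itself is not covered here; the `-sft` example scripts show how the repo does that.

```sh
#!/bin/sh
# Hypothetical helper: print a minimal request JSON for the turbo or SFT preset.
# Only "caption" is required; every other field falls back to its default.
make_request() {
    preset="$1"
    caption="$2"   # inserted verbatim: no JSON escaping is done here
    case "$preset" in
        turbo)
            # Turbo preset: 8 steps, shift 3.0; guidance_scale omitted (no CFG).
            printf '{"caption": "%s", "inference_steps": 8, "shift": 3.0}\n' "$caption" ;;
        sft)
            # SFT preset: 50 steps with CFG 7.0; shift left at its default.
            printf '{"caption": "%s", "inference_steps": 50, "guidance_scale": 7.0}\n' "$caption" ;;
        *)
            echo "unknown preset: $preset" >&2
            return 1 ;;
    esac
}
```

For example, `make_request sft "warm lo-fi piano" > request-sft.json`; compare the result with `examples/dit-only.json` to see a request file the tools already accept.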
## Generation modes
@@ -180,11 +170,10 @@ Run `dit-vae` to decode existing codes. See `examples/dit-only.json`.
## Request JSON reference

-All fields with defaults. Only `caption` is required. Built-in modes (text2music, cover, repaint) and audio inputs follow the [ACE-Step 1.5 Tutorial](https://github.com/ace-step/ACE-Step-1.5/blob/main/docs/en/Tutorial.md); see [docs/MODES.md](docs/MODES.md) for what is implemented.
+All fields with defaults. Only `caption` is required.

```json
{
-"task_type": "text2music",
"caption": "",
"lyrics": "",
"instrumental": false,
@@ -199,12 +188,7 @@ All fields with defaults. Only `caption` is required. Built-in modes (text2music
"lm_top_p": 0.9,
"lm_top_k": 0,
"lm_negative_prompt": "",
-"reference_audio": "",
-"src_audio": "",
"audio_codes": "",
-"audio_cover_strength": 1.0,
-"repainting_start": 0.0,
-"repainting_end": 0.0,
"inference_steps": 8,
"guidance_scale": 7.0,
"shift": 3.0
@@ -214,12 +198,7 @@ All fields with defaults. Only `caption` is required. Built-in modes (text2music
Key fields: `seed` -1 means random (resolved once, then +1 per batch
element). `audio_codes` is generated by ace-qwen3 and consumed by
dit-vae (comma separated FSQ token IDs). When present, the LLM is
-skipped entirely (cover-style generation). `reference_audio`: path to a **WAV or MP3** file for global timbre/style (MP3 decoded in memory; encoded via built-in VAE encoder; requires VAE GGUF with encoder weights). `src_audio`: path to a **WAV or MP3** for cover source; dit-vae encodes it (VAE + FSQ nearest-codeword) to codes internally, no Python required (see docs/MODES.md).
-
-**Reference and cover strength (not the same as guidance_scale):**
-- **`audio_cover_strength`** (0.0–1.0): Controls how strongly the **cover/source** (from `audio_codes` or `src_audio`) influences the DiT context. The context is blended with silence: `(1 - audio_cover_strength)*silence + audio_cover_strength*decoded`. Use 1.0 for full cover influence, lower values to soften it. Only applies when cover context is present.
-- **`reference_audio`**: Timbre from the reference file is applied at full strength; there is no separate strength parameter for reference timbre.
-- **`guidance_scale`**: This is **DiT classifier-free guidance** (conditioned vs unconditioned prediction), not reference or cover strength. Turbo models ignore it (forced to 1.0).
+skipped entirely.

Turbo preset: `inference_steps=8, shift=3.0` (no guidance_scale, turbo models don't use CFG).
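The `seed` and `audio_codes` rules in Key fields can be sketched in a few lines of shell, for example inside a wrapper script. The helper names are hypothetical. `batch_seeds` follows "resolved once, then +1 per batch element", but it draws the random base with `awk`'s `rand()`, which is an assumption and not necessarily how the tools pick it. `count_codes` only splits the comma-separated FSQ token IDs and does not validate them.

```sh
#!/bin/sh
# Hypothetical helpers illustrating the "Key fields" rules; not part of the repo.

# seed: -1 means random, resolved once; batch element i then uses base + i.
# Drawing the base from awk's rand() is an illustrative assumption.
batch_seeds() {
    seed="$1"
    batch="$2"
    if [ "$seed" -eq -1 ]; then
        seed=$(awk 'BEGIN { srand(); print int(rand() * 2147483647) }')
    fi
    i=0
    out=""
    while [ "$i" -lt "$batch" ]; do
        out="$out${out:+ }$((seed + i))"   # space-separated list of seeds
        i=$((i + 1))
    done
    echo "$out"
}

# audio_codes: comma-separated FSQ token IDs; prints how many there are (0 if empty).
count_codes() {
    printf '%s\n' "$1" | awk -F, '{ print NF }'
}

# A non-empty audio_codes means the LLM stage is skipped.
llm_skipped() {
    [ "$(count_codes "$1")" -gt 0 ]
}
```

With these definitions, `batch_seeds 42 3` prints `42 43 44` and `count_codes "101,7,2048"` prints `3`.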