Commit ff2e739

zhzhongshi and beveradb authored
Add support for MDXC models (#50)
* Add support for MDXC models
* Updated poetry lockfile to match dependencies
* fix err: CLI does not work
* Fixed MDXC config YAML download, formatted mdxc separator class, bumped version ready for release
* Added progress bar for file downloads
* Added error handling for failed model load due to incomplete/corrupt download
* Fixed outstanding issues with YAML config loading and file download, added todo list for integration tests to write
* Moved load model into own method for consistency with mdxc class
* Refactored MDXC class to use more descriptive variable names, removed dead code, added debug logging and clearer parameters etc.
* Fixed and tested pitch shift logic for MDXC, added CLI params for other MDXC config parameters and tested these
* Added MDXC to readme
* Added thanks!

---------

Co-authored-by: Andrew Beveridge <andrew@beveridge.uk>
1 parent 70ca099 commit ff2e739

10 files changed: +528 -143 lines changed

README.md

Lines changed: 18 additions & 10 deletions
@@ -5,7 +5,7 @@
 [![Docker pulls](https://img.shields.io/docker/pulls/beveradb/audio-separator.svg)](https://hub.docker.com/r/beveradb/audio-separator/tags)
 [![codecov](https://codecov.io/gh/karaokenerds/python-audio-separator/graph/badge.svg?token=N7YK4ET5JP)](https://codecov.io/gh/karaokenerds/python-audio-separator)
 
-Summary: Easy to use audio stem separation from the command line or as a dependency in your own Python project, using the amazing MDX-Net and VR Arch models available in UVR by @Anjok07 & @aufr33.
+Summary: Easy to use audio stem separation from the command line or as a dependency in your own Python project, using the amazing MDX-Net, VR Arch, Demucs and MDXC models available in UVR by @Anjok07 & @aufr33.
 
 Audio Separator is a Python package that allows you to separate an audio file into various stems, using models trained by @Anjok07 for use with UVR (https://github.com/Anjok07/ultimatevocalremovergui).
 
@@ -136,8 +136,9 @@ Any file listed in the list models output can be specified (with file extension)
 usage: audio-separator [-h] [-v] [-d] [-e] [-l] [--log_level LOG_LEVEL] [-m MODEL_FILENAME] [--output_format OUTPUT_FORMAT] [--output_dir OUTPUT_DIR] [--model_file_dir MODEL_FILE_DIR] [--invert_spect]
                        [--normalization NORMALIZATION] [--single_stem SINGLE_STEM] [--sample_rate SAMPLE_RATE] [--mdx_segment_size MDX_SEGMENT_SIZE] [--mdx_overlap MDX_OVERLAP] [--mdx_batch_size MDX_BATCH_SIZE]
                        [--mdx_hop_length MDX_HOP_LENGTH] [--mdx_enable_denoise] [--vr_batch_size VR_BATCH_SIZE] [--vr_window_size VR_WINDOW_SIZE] [--vr_aggression VR_AGGRESSION] [--vr_enable_tta]
-                       [--vr_high_end_process] [--vr_enable_post_process] [--vr_post_process_threshold VR_POST_PROCESS_THRESHOLD] [--demucs_stem DEMUCS_STEM] [--demucs_segment_size DEMUCS_SEGMENT_SIZE]
-                       [--demucs_shifts DEMUCS_SHIFTS] [--demucs_overlap DEMUCS_OVERLAP] [--demucs_segments_enabled DEMUCS_SEGMENTS_ENABLED]
+                       [--vr_high_end_process] [--vr_enable_post_process] [--vr_post_process_threshold VR_POST_PROCESS_THRESHOLD] [--demucs_segment_size DEMUCS_SEGMENT_SIZE] [--demucs_shifts DEMUCS_SHIFTS]
+                       [--demucs_overlap DEMUCS_OVERLAP] [--demucs_segments_enabled DEMUCS_SEGMENTS_ENABLED] [--mdxc_segment_size MDXC_SEGMENT_SIZE] [--mdxc_use_model_segment_size] [--mdxc_overlap MDXC_OVERLAP]
+                       [--mdxc_batch_size MDXC_BATCH_SIZE] [--mdxc_pitch_shift MDXC_PITCH_SHIFT]
                        [audio_file]
 
 Separate audio file into different stems.
@@ -149,11 +150,11 @@ options:
   -h, --help show this help message and exit
 
 Info and Debugging:
-  -v, --version show program's version number and exit
-  -d, --debug enable debug logging, equivalent to --log_level=debug
-  -e, --env_info print environment information and exit.
-  -l, --list_models list all supported models and exit.
-  --log_level LOG_LEVEL log level, e.g. info, debug, warning (default: info)
+  -v, --version Show the program's version number and exit.
+  -d, --debug Enable debug logging, equivalent to --log_level=debug.
+  -e, --env_info Print environment information and exit.
+  -l, --list_models List all supported models and exit.
+  --log_level LOG_LEVEL Log level, e.g. info, debug, warning (default: info).
 
 Separation I/O Params:
   -m MODEL_FILENAME, --model_filename MODEL_FILENAME model to use for separation (default: UVR-MDX-NET-Inst_HQ_3.onnx). Example: -m 2_HP-UVR.pth
@@ -164,7 +165,7 @@ Separation I/O Params:
 Common Separation Parameters:
   --invert_spect invert secondary stem using spectogram (default: False). Example: --invert_spect
   --normalization NORMALIZATION max peak amplitude to normalize input and output audio to (default: 0.9). Example: --normalization=0.7
-  --single_stem SINGLE_STEM output only single stem, either instrumental or vocals. Example: --single_stem=instrumental
+  --single_stem SINGLE_STEM output only single stem, e.g. Instrumental, Vocals, Drums, Bass, Guitar, Piano, Other. Example: --single_stem=Instrumental
   --sample_rate SAMPLE_RATE modify the sample rate of the output audio (default: 44100). Example: --sample_rate=44100
 
 MDX Architecture Parameters:
@@ -184,11 +185,17 @@ VR Architecture Parameters:
   --vr_post_process_threshold VR_POST_PROCESS_THRESHOLD threshold for post_process feature: 0.1-0.3 (default: 0.2). Example: --vr_post_process_threshold=0.1
 
 Demucs Architecture Parameters:
-  --demucs_stem DEMUCS_STEM stem to extract from audio file, e.g. Vocals, Drums, Bass, Other (default: All Stems). Example: --demucs_stem=vocals
   --demucs_segment_size DEMUCS_SEGMENT_SIZE size of segments into which the audio is split, 1-100. higher = slower but better quality (default: Default). Example: --demucs_segment_size=256
   --demucs_shifts DEMUCS_SHIFTS number of predictions with random shifts, higher = slower but better quality (default: 2). Example: --demucs_shifts=4
   --demucs_overlap DEMUCS_OVERLAP overlap between prediction windows, 0.001-0.999. higher = slower but better quality (default: 0.25). Example: --demucs_overlap=0.25
   --demucs_segments_enabled DEMUCS_SEGMENTS_ENABLED enable segment-wise processing (default: True). Example: --demucs_segments_enabled=False
+
+MDXC Architecture Parameters:
+  --mdxc_segment_size MDXC_SEGMENT_SIZE larger consumes more resources, but may give better results (default: 256). Example: --mdxc_segment_size=256
+  --mdxc_use_model_segment_size use model default segment size instead of the value from the config file. Example: --mdxc_use_model_segment_size
+  --mdxc_overlap MDXC_OVERLAP amount of overlap between prediction windows, 2-50. higher is better but slower (default: 8). Example: --mdxc_overlap=8
+  --mdxc_batch_size MDXC_BATCH_SIZE larger consumes more RAM but may process slightly faster (default: 1). Example: --mdxc_batch_size=4
+  --mdxc_pitch_shift MDXC_PITCH_SHIFT shift audio pitch by a number of semitones while processing. may improve output for deep/high vocals. (default: 0). Example: --mdxc_pitch_shift=2
 ```
 
 ### As a Dependency in a Python Project
@@ -348,6 +355,7 @@ This project is licensed under the MIT [License](LICENSE).
 - [Kuielab & Woosung Choi](https://github.com/kuielab) - Developed the original MDX-Net AI code.
 - [KimberleyJSN](https://github.com/KimberleyJensen) - Advised and aided the implementation of the training scripts for MDX-Net and Demucs. Thank you!
 - [Hv](https://github.com/NaJeongMo/Colab-for-MDX_B) - Helped implement chunks into the MDX-Net AI code. Thank you!
+- [zhzhongshi](https://github.com/zhzhongshi) - Helped add support for the MDXC models in `audio-separator`. Thank you!
 
 ## Contact 💌
 
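
Reviewer note: the five MDXC flags added to the README usage text above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the project's actual CLI code. The flag names and defaults come from the help text in the diff. The helper name `build_parser`, the `int` argument types, and the choice of `store_true` for `--mdxc_use_model_segment_size` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: a parser holding only the MDXC options from the usage text."""
    parser = argparse.ArgumentParser(prog="audio-separator")
    parser.add_argument("audio_file", nargs="?", help="audio file to separate")

    # Group name taken from the README; argument types are assumed, not confirmed.
    mdxc = parser.add_argument_group("MDXC Architecture Parameters")
    mdxc.add_argument("--mdxc_segment_size", type=int, default=256)
    mdxc.add_argument("--mdxc_use_model_segment_size", action="store_true")
    mdxc.add_argument("--mdxc_overlap", type=int, default=8)
    mdxc.add_argument("--mdxc_batch_size", type=int, default=1)
    mdxc.add_argument("--mdxc_pitch_shift", type=int, default=0)
    return parser


# Parse an example command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["song.wav", "--mdxc_overlap=8", "--mdxc_pitch_shift=2"])
print(args.audio_file, args.mdxc_segment_size, args.mdxc_overlap, args.mdxc_pitch_shift)
```

Keeping the options in their own argument group mirrors how the README help output lists one section per architecture.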
Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 from .mdx_separator import MDXSeparator
 from .vr_separator import VRSeparator
 from .demucs_separator import DemucsSeparator
+from .mdxc_separator import MDXCSeparator

audio_separator/separator/architectures/mdx_separator.py

Lines changed: 21 additions & 14 deletions
@@ -90,8 +90,28 @@ def __init__(self, common_config, arch_config):
         # We haven't implemented support for the checkpoint models here, so we're not using it.
         # self.dim_c = 4
 
-        # Loading the model for inference
+        self.load_model()
+
+        self.n_bins = 0
+        self.trim = 0
+        self.chunk_size = 0
+        self.gen_size = 0
+        self.stft = None
+
+        self.primary_source = None
+        self.secondary_source = None
+        self.audio_file_path = None
+        self.audio_file_base = None
+        self.secondary_source_map = None
+        self.primary_source_map = None
+
+    def load_model(self):
+        """
+        Load the model into memory from file on disk, initialize it with config from the model data,
+        and prepare for inferencing using hardware accelerated Torch device.
+        """
         self.logger.debug("Loading ONNX model for inference...")
+
         if self.segment_size == self.dim_t:
             ort_session_options = ort.SessionOptions()
             if self.log_level > 10:
@@ -107,19 +127,6 @@ def __init__(self, common_config, arch_config):
             self.model_run.to(self.torch_device).eval()
             self.logger.warning("Model converted from onnx to pytorch due to segment size not matching dim_t, processing may be slower.")
 
-        self.n_bins = 0
-        self.trim = 0
-        self.chunk_size = 0
-        self.gen_size = 0
-        self.stft = None
-
-        self.primary_source = None
-        self.secondary_source = None
-        self.audio_file_path = None
-        self.audio_file_base = None
-        self.secondary_source_map = None
-        self.primary_source_map = None
-
     def separate(self, audio_file_path):
         """
         Separates the audio file into primary and secondary sources based on the model's configuration.
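
Reviewer note: the refactor in this file moves model loading out of `__init__` into a `load_model()` method that `__init__` calls before resetting per-run state. A minimal sketch of that pattern follows, assuming a hypothetical stand-in class `SketchSeparator` with a hypothetical `backend` attribute. It imports neither `onnxruntime` nor `onnx2torch` and only records which branch would be taken.

```python
import logging


class SketchSeparator:
    """Hypothetical stand-in for illustration; this is not the project's MDXSeparator."""

    def __init__(self, segment_size: int, dim_t: int):
        self.logger = logging.getLogger(__name__)
        self.segment_size = segment_size
        self.dim_t = dim_t

        # Every attribute load_model() reads must already be set at this point.
        self.load_model()

        # Per-run state is initialized after the model is loaded, as in the diff.
        self.primary_source = None
        self.secondary_source = None

    def load_model(self):
        """Record which backend would be used; the real method builds the model itself."""
        if self.segment_size == self.dim_t:
            # Assumed label for the branch that keeps the ONNX inference session.
            self.backend = "onnx"
        else:
            # Assumed label for the branch that converts the model to PyTorch.
            self.backend = "pytorch"
            self.logger.warning("segment size does not match dim_t, processing may be slower")


print(SketchSeparator(256, 256).backend)
print(SketchSeparator(128, 256).backend)
```

One consequence of this layout is ordering: because `load_model()` runs inside `__init__`, any configuration it depends on has to be assigned before the call.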
