From e164c6ab6a6fe8c51ae4d46a5dfa526dcad20e70 Mon Sep 17 00:00:00 2001 From: William Dye Date: Fri, 27 Oct 2023 11:35:54 -0400 Subject: [PATCH 1/3] Minor documentation updates --- README.md | 22 ++++++++++++---------- docs/api.md | 12 ++++-------- docs/mac.md | 12 ++++++------ docs/windows.md | 2 +- 4 files changed, 23 insertions(+), 25 deletions(-) diff --git a/README.md b/README.md index a93e294d..a1207212 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ This is the 4th release of Demucs (v4), featuring Hybrid Transformer based source separation. **For the classic Hybrid Demucs (v3):** [Go this commit][demucs_v3]. -If you are experiencing issues and want the old Demucs back, please fill an issue, and then you can get back to the v3 with +If you are experiencing issues and want the old Demucs back, please file an issue, and then you can get back to Demucs v3 with `git checkout v3`. You can also go [Demucs v2][demucs_v2]. @@ -15,7 +15,7 @@ Demucs is a state-of-the-art music source separation model, currently capable of drums, bass, and vocals from the rest of the accompaniment. Demucs is based on a U-Net convolutional architecture inspired by [Wave-U-Net][waveunet]. The v4 version features [Hybrid Transformer Demucs][htdemucs], a hybrid spectrogram/waveform separation model using Transformers. -It is based on [Hybrid Demucs][hybrid_paper] (also provided in this repo) with the innermost layers are +It is based on [Hybrid Demucs][hybrid_paper] (also provided in this repo), with the innermost layers replaced by a cross-domain Transformer Encoder. This Transformer uses self-attention within each domain, and cross-attention across domains. The model achieves a SDR of 9.00 dB on the MUSDB HQ test set. 
Moreover, when using sparse attention @@ -123,7 +123,7 @@ python3 -m pip install -U git+https://github.com/facebookresearch/demucs#egg=demucs Advanced OS support are provided on the following page, **you must read the page for your OS before posting an issues**: - **If you are using Windows:** [Windows support](docs/windows.md). -- **If you are using MAC OS X:** [Mac OS X support](docs/mac.md). +- **If you are using macOS:** [macOS support](docs/mac.md). - **If you are using Linux:** [Linux support](docs/linux.md). ### For machine learning scientists @@ -194,16 +194,18 @@ demucs --two-stems=vocals myfile.mp3 ``` -If you have a GPU, but you run out of memory, please use `--segment SEGMENT` to reduce length of each split. `SEGMENT` should be changed to a integer. Personally recommend not less than 10 (the bigger the number is, the more memory is required, but quality may increase). Create an environment variable `PYTORCH_NO_CUDA_MEMORY_CACHING=1` is also helpful. If this still cannot help, please add `-d cpu` to the command line. See the section hereafter for more details on the memory requirements for GPU acceleration. +If you have a GPU, but you run out of memory, please use `--segment SEGMENT` to reduce the length of each split. `SEGMENT` should be changed to an integer describing the length of each segment in seconds. +A segment length of at least 10 is recommended (the bigger the number is, the more memory is required, but quality may increase). Note that the Hybrid Transformer models only support a maximum segment length of 7.8 seconds. +Create an environment variable `PYTORCH_NO_CUDA_MEMORY_CACHING=1` is also helpful. If this still does not help, please add `-d cpu` to the command line. See the section hereafter for more details on the memory requirements for GPU acceleration. Separated tracks are stored in the `separated/MODEL_NAME/TRACK_NAME` folder. 
There you will find four stereo wav files sampled at 44.1 kHz: `drums.wav`, `bass.wav`, `other.wav`, `vocals.wav` (or `.mp3` if you used the `--mp3` option). -All audio formats supported by `torchaudio` can be processed (i.e. wav, mp3, flac, ogg/vorbis on Linux/Mac OS X etc.). On Windows, `torchaudio` has limited support, so we rely on `ffmpeg`, which should support pretty much anything. +All audio formats supported by `torchaudio` can be processed (e.g. wav, mp3, flac, ogg/vorbis on Linux/macOS). On Windows, `torchaudio` has limited support, so we rely on `ffmpeg`, which should support pretty much anything. Audio is resampled on the fly if necessary. -The output will be a wave file encoded as int16. +The output will be a wav file encoded as int16. You can save as float32 wav files with `--float32`, or 24 bits integer wav with `--int24`. -You can pass `--mp3` to save as mp3 instead, and set the bitrate with `--mp3-bitrate` (default is 320kbps). +You can pass `--mp3` to save as mp3 instead, and set the bitrate (in kbps) with `--mp3-bitrate` (default is 320). It can happen that the output would need clipping, in particular due to some separation artifacts. Demucs will automatically rescale each output stem so as to avoid clipping. This can however break @@ -226,8 +228,8 @@ The list of pre-trained models is: but quality can be slightly worse. - `SIG`: where `SIG` is a single model from the [model zoo](docs/training.md#model-zoo). -The `--two-stems=vocals` option allows to separate vocals from the rest (e.g. karaoke mode). -`vocals` can be changed into any source in the selected model. +The `--two-stems=vocals` option allows separating vocals from the rest of the accompaniment (i.e., karaoke mode). +`vocals` can be changed to any source in the selected model. This will mix the files after separating the mix fully, so this won't be faster or use less memory. 
The `--shifts=SHIFTS` performs multiple predictions with random shifts (a.k.a the *shift trick*) of the input and average them. This makes prediction `SHIFTS` times @@ -248,7 +250,7 @@ If you do not have enough memory on your GPU, simply add `-d cpu` to the command ## Calling from another Python program -The main function provides a `opt` parameter as a simple API. You can just pass the parsed command line as this parameter: +The main function provides an `opt` parameter as a simple API. You can just pass the parsed command line as this parameter: ```python # Assume that your command is `demucs --mp3 --two-stems vocals -n mdx_extra "track with space.mp3"` # The following codes are same as the command above: diff --git a/docs/api.md b/docs/api.md index e6d9e873..ba31646d 100644 --- a/docs/api.md +++ b/docs/api.md @@ -5,13 +5,11 @@ Notes: Type hints have been added to all API functions. It is recommended to check them before passing parameters to a function as some arguments only support limited types (e.g. parameter `repo` of method `load_model` only support type `pathlib.Path`). 1. The first step is to import api module: - ```python import demucs.api ``` -2. Then initialize the `Separator`. Parameters which will be served as default values for methods can be passed. Model should be specified. - +1. Then initialize the `Separator`. Parameters which will be served as default values for methods can be passed. Model should be specified. ```python # Initialize with default parameters: separator = demucs.api.Separator() @@ -22,8 +20,7 @@ separator = demucs.api.Separator(model="mdx_extra", segment=12) # You can also use other parameters defined ``` -3. Separate it! - +1. Separate it! ```python # Separating an audio file origin, separated = separator.separate_audio_file("file.mp3") @@ -35,8 +32,7 @@ origin, separated = separator.separate_tensor(audio) separator.update_parameter(segment=smaller_segment) ``` -4. Save audio - +1. 
Save audio ```python # Remember to create the destination folder before calling `save_audio` # Or you are likely to recieve `FileNotFoundError` @@ -47,7 +43,7 @@ for file, sources in separated: ## API References -The types of each parameter and return value is not listed in this document. To know the exact type of them, please read the type hints in api.py (most modern code editors support infering types based on type hints). +The types of each parameter and return value are not listed in this document. To know their exact types, please read the type hints in api.py (most modern code editors support inferring types based on type hints). ### `class Separator` diff --git a/docs/mac.md b/docs/mac.md index 6e6c3d0c..62dd235e 100644 --- a/docs/mac.md +++ b/docs/mac.md @@ -1,6 +1,6 @@ -# Mac OS X support for Demucs +# macOS support for Demucs -If you have a sufficiently recent version of OS X, you can just run +If you have a sufficiently recent version of macOS, you can just run ```bash python3 -m pip install --user -U demucs @@ -10,10 +10,10 @@ python3 -m demucs -d cpu PATH_TO_AUDIO_FILE_1 demucs -d cpu PATH_TO_AUDIO_FILE_1 ``` -If you do not already have Anaconda installed or much experience with the terminal on Mac OS X here are some detailed instructions: +If you do not already have Anaconda installed or much experience with the terminal on macOS, here are some detailed instructions: -1. Download [Anaconda 3.8 (or more recent) 64 bits for MacOS][anaconda]: -2. Open [Anaconda Prompt in MacOSX][prompt] +1. Download [Anaconda 3.8 (or more recent) 64-bit for macOS][anaconda]: +2. Open [Anaconda Prompt in macOS][prompt] 3. Follow these commands: ```bash conda activate @@ -24,5 +24,5 @@ demucs -d cpu PATH_TO_AUDIO_FILE_1 **Important, torchaudio 0.12 update:** Torchaudio no longer supports decoding mp3s without ffmpeg installed. 
You must have ffmpeg installed, either through Anaconda (`conda install ffmpeg -c conda-forge`) or with Homebrew for instance (`brew install ffmpeg`). -[anaconda]: https://www.anaconda.com/distribution/#download-section +[anaconda]: https://www.anaconda.com/download [prompt]: https://docs.anaconda.com/anaconda/user-guide/getting-started/#open-nav-mac diff --git a/docs/windows.md b/docs/windows.md index a84e89bf..bb597027 100644 --- a/docs/windows.md +++ b/docs/windows.md @@ -54,5 +54,5 @@ If you have an error saying that `mkl_intel_thread.dll` cannot be found, you can **If you get a permission error**, please try starting the Anaconda Prompt as administrator. -[install]: https://www.anaconda.com/distribution/#windows +[install]: https://www.anaconda.com/download [prompt]: https://docs.anaconda.com/anaconda/user-guide/getting-started/#open-prompt-win From fe124077a224a36bec1bad1bc7966c9b53453f23 Mon Sep 17 00:00:00 2001 From: William Dye Date: Fri, 27 Oct 2023 11:39:50 -0400 Subject: [PATCH 2/3] Update readme --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index a1207212..5a1943ee 100644 --- a/README.md +++ b/README.md @@ -139,7 +139,7 @@ pip install -e . This will create a `demucs` environment with all the dependencies installed. -You will also need to install [soundstretch/soundtouch](https://www.surina.net/soundtouch/soundstretch.html): on Mac OSX you can do `brew install sound-touch`, +You will also need to install [soundstretch/soundtouch](https://www.surina.net/soundtouch/soundstretch.html): on macOS you can do `brew install sound-touch`, and on Ubuntu `sudo apt-get install soundstretch`. This is used for the pitch/tempo augmentation. @@ -196,7 +196,7 @@ demucs --two-stems=vocals myfile.mp3 If you have a GPU, but you run out of memory, please use `--segment SEGMENT` to reduce length of each split. `SEGMENT` should be changed to a integer describing the length of each segment in seconds. 
A segment length of at least 10 is recommended (the bigger the number is, the more memory is required, but quality may increase). Note that the Hybrid Transformer models only support a maximum segment length of 7.8 seconds. -Create an environment variable `PYTORCH_NO_CUDA_MEMORY_CACHING=1` is also helpful. If this still does not help, please add `-d cpu` to the command line. See the section hereafter for more details on the memory requirements for GPU acceleration. +Creating an environment variable `PYTORCH_NO_CUDA_MEMORY_CACHING=1` is also helpful. If this still does not help, please add `-d cpu` to the command line. See the section hereafter for more details on the memory requirements for GPU acceleration. Separated tracks are stored in the `separated/MODEL_NAME/TRACK_NAME` folder. There you will find four stereo wav files sampled at 44.1 kHz: `drums.wav`, `bass.wav`, `other.wav`, `vocals.wav` (or `.mp3` if you used the `--mp3` option). From 435d96aa88b743d464b17f00499c36b8584aac26 Mon Sep 17 00:00:00 2001 From: William Dye Date: Fri, 27 Oct 2023 11:42:08 -0400 Subject: [PATCH 3/3] Update api.md --- docs/api.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/docs/api.md b/docs/api.md index ba31646d..86ca3e89 100644 --- a/docs/api.md +++ b/docs/api.md @@ -5,11 +5,13 @@ Notes: Type hints have been added to all API functions. It is recommended to check them before passing parameters to a function as some arguments only support limited types (e.g. parameter `repo` of method `load_model` only support type `pathlib.Path`). 1. The first step is to import api module: + ```python import demucs.api ``` -1. Then initialize the `Separator`. Parameters which will be served as default values for methods can be passed. Model should be specified. +2. Then initialize the `Separator`. Parameters which will be served as default values for methods can be passed. Model should be specified. 
+ ```python # Initialize with default parameters: separator = demucs.api.Separator() @@ -20,7 +22,8 @@ separator = demucs.api.Separator(model="mdx_extra", segment=12) # You can also use other parameters defined ``` -1. Separate it! +3. Separate it! + ```python # Separating an audio file origin, separated = separator.separate_audio_file("file.mp3") @@ -32,7 +35,8 @@ origin, separated = separator.separate_tensor(audio) separator.update_parameter(segment=smaller_segment) ``` -1. Save audio +4. Save audio + ```python # Remember to create the destination folder before calling `save_audio` # Or you are likely to recieve `FileNotFoundError`
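Taken together, the `docs/api.md` steps patched above amount to: import `demucs.api`, build a `Separator`, separate, then save each stem. Below is a minimal end-to-end sketch, not part of the patch series: the `stem_path` helper is a hypothetical name, the commented-out calls assume the `Separator`/`save_audio` interface described in the patched docs, and only the folder preparation runs as-is, since `save_audio` expects the destination folder to already exist.

```python
# Minimal sketch of the docs/api.md workflow above. Only the folder-preparation
# helper is concrete; `stem_path` is a hypothetical name, not part of the
# demucs API.
import tempfile
from pathlib import Path


def stem_path(out_dir, track, stem, ext="wav"):
    """Return the output path for one separated stem, creating its folder first.

    `save_audio` raises FileNotFoundError when the destination folder is
    missing, so the folder is created up front here.
    """
    folder = Path(out_dir) / track
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{stem}.{ext}"


# Hypothetical usage with the API described in docs/api.md (requires demucs):
#   import demucs.api
#   separator = demucs.api.Separator(model="mdx_extra", segment=12)
#   origin, separated = separator.separate_audio_file("file.mp3")
#   for stem, source in separated.items():
#       demucs.api.save_audio(source, stem_path("separated", "file", stem),
#                             samplerate=separator.samplerate)

out = stem_path(tempfile.mkdtemp(), "my track", "vocals")
print(out.name)  # prints: vocals.wav
```

Whether `separated` iterates as pairs directly or via `.items()` depends on the return type of the separation methods; as the notes above advise, check the type hints in api.py rather than relying on this sketch.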