Podcastmix: A dataset for separating music and speech in podcasts

This repository contains instructions for downloading and using the pre-trained UNet and ConvTasNet models from the ICASSP 2022 submission "Podcastmix: A dataset for separating music and speech in podcasts". To download the complete dataset and train or evaluate your own models, please refer to the main PodcastMix repository (https://github.com/MTG/Podcastmix).

Install

Create a conda environment:

conda env create -f environment.yml

Activate the environment:

conda activate Podcastmix

Download the UNet and ConvTasNet model code from the PodcastMix repository:

curl https://raw.githubusercontent.com/MTG/Podcastmix/main/UNet_model/unet_model.py -o UNet_model/unet_model.py
curl https://raw.githubusercontent.com/MTG/Podcastmix/main/UNet_model/unet_parts.py -o UNet_model/unet_parts.py
curl https://raw.githubusercontent.com/MTG/Podcastmix/main/ConvTasNet_model/conv_tasnet_norm.py -o ConvTasNet_model/conv_tasnet_norm.py

Uncompress the pre-trained models, overwriting any previous files:

zip -F UNet_model/exp/tmp/best_model_splitted.zip --out UNet_model/exp/tmp/best_model.zip
unzip UNet_model/exp/tmp/best_model.zip -d UNet_model/exp/tmp/
zip -F ConvTasNet_model/exp/tmp/best_model_splitted.zip --out ConvTasNet_model/exp/tmp/best_model.zip
unzip ConvTasNet_model/exp/tmp/best_model.zip -d ConvTasNet_model/exp/tmp/
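
As an optional sanity check after the install steps above, the following minimal sketch lists which of the downloaded or rebuilt files are still missing. The helper name missing_files and the EXPECTED_FILES list are illustrative and not part of the repository. The list only contains paths that appear in the commands above; the name of the unzipped checkpoint is not stated here, so it is not checked.

```python
from pathlib import Path

# Paths taken from the curl and zip commands above (relative to the repo root).
EXPECTED_FILES = [
    "UNet_model/unet_model.py",
    "UNet_model/unet_parts.py",
    "ConvTasNet_model/conv_tasnet_norm.py",
    "UNet_model/exp/tmp/best_model.zip",
    "ConvTasNet_model/exp/tmp/best_model.zip",
]


def missing_files(repo_root="."):
    """Return the expected files that do not exist under repo_root."""
    root = Path(repo_root)
    return [path for path in EXPECTED_FILES if not (root / path).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected files are in place.")
```

Run it from the repository root; an empty result means every listed file was found.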

Use the model to separate podcasts:

Without GPU:

python forward_podcast.py \
    --test_dir=<directory-of-the-podcastmix-real-no-reference-or-your-files> --target_model=[MODEL] \
    --exp_dir=[MODEL]_model/exp/tmp --out_dir=separations \
    --segment=18 --sample_rate=44100 --use_gpu=0

With GPU:

CUDA_VISIBLE_DEVICES=0 python forward_podcast.py \
    --test_dir=<directory-of-the-podcastmix-real-no-reference-or-your-files> --target_model=[MODEL] \
    --exp_dir=[MODEL]_model/exp/tmp --out_dir=separations \
    --segment=18 --sample_rate=44100 --use_gpu=1
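
If you prefer to launch the separation from Python, for example to loop over several folders, the two commands above can be assembled programmatically. This is only a sketch: build_separation_command is a hypothetical helper, "my_podcasts" is a placeholder directory, and the only flags used are the ones shown in the commands above.

```python
import os
import sys


def build_separation_command(test_dir, model, segment=18, sample_rate=44100,
                             use_gpu=False, out_dir="separations"):
    """Assemble the forward_podcast.py call as an argument list plus environment."""
    if model not in ("ConvTasNet", "UNet"):
        raise ValueError("model must be 'ConvTasNet' or 'UNet'")
    exp_dir = f"{model}_model/exp/tmp"
    command = [
        sys.executable, "forward_podcast.py",
        f"--test_dir={test_dir}",
        f"--target_model={model}",
        f"--exp_dir={exp_dir}",
        f"--out_dir={out_dir}",
        f"--segment={segment}",
        f"--sample_rate={sample_rate}",
        f"--use_gpu={int(use_gpu)}",
    ]
    # Mirror the GPU example: expose device 0 only when a GPU is requested.
    env = dict(os.environ)
    if use_gpu:
        env["CUDA_VISIBLE_DEVICES"] = "0"
    return command, env


if __name__ == "__main__":
    # "my_podcasts" is a placeholder; replace it with your own directory.
    command, env = build_separation_command("my_podcasts", "UNet")
    print(" ".join(command[1:]))
    # To actually run it from the repository root:
    # import subprocess
    # subprocess.run(command, env=env, check=True)
```

Returning the argument list instead of running it directly makes it easy to inspect the command before starting a long separation job.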

Notes:

[MODEL] can be either ConvTasNet or UNet.
Due to the size of the convolutions, the UNet only supports segments of 2 + 16*i seconds for a non-negative integer i (2, 18, 34, 50, ...). ConvTasNet supports segments of any length.
You can change the sample_rate to fit your needs, but the published pre-trained models were trained at a sample_rate of 44100 Hz.
The --out_dir folder will be created inside the --exp_dir directory.
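
The UNet segment rule in the notes above can be checked before starting a run. The sketch below is illustrative only: both function names are hypothetical, and rounding down to the previous supported length is just one possible way to pick a valid value.

```python
def is_valid_unet_segment(seconds):
    """Return True if seconds has the form 2 + 16*i for an integer i >= 0."""
    return seconds >= 2 and (seconds - 2) % 16 == 0


def floor_to_unet_segment(seconds):
    """Round seconds down to the closest supported length (never below 2)."""
    i = max(0, int((seconds - 2) // 16))
    return 2 + 16 * i


if __name__ == "__main__":
    # First supported lengths, matching the list in the notes.
    print([2 + 16 * i for i in range(4)])
    for requested in (18, 30, 50):
        print(requested, is_valid_unet_segment(requested),
              floor_to_unet_segment(requested))
```

For ConvTasNet no such check is needed, since it accepts segments of any length.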
