Audio segmentation toolchain

Toolchain to segment and cluster audio programmes.

The toolchain is built using INA Speech Segmenter and LIUM Speaker Diarization.

Usage

The toolchain is deployed as a Docker container that provides a Jupyter notebook as its UI. It is started from the provided notebook and writes its results to a mounted folder.

Build

Prerequisites:

NVIDIA CUDA driver installed on the host system (needed at runtime, not during the build)
Docker installation with GPU support
git and wget installed

Running build.sh fetches the sources and prebuilt binaries, builds the INA Speech Segmenter container first, and then builds the Audio Segmentation container on top of it. By default, the resulting container is labelled audioseg.

Run

Starting the container

To run the container, execute run.sh. You may need to make the following adjustments to this script:

source media folder on the host machine (to be mounted as a volume into the container)
folder to store results on host machine (to be mounted as volume into the container)
the port number on the host machine (in case the default 8888 is already taken)
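
These three adjustments correspond to standard docker run options (volume mounts and a port mapping). The following is a rough sketch of how such a command could be assembled, not the contents of run.sh, which may differ. The helper name docker_run_args and the container-side paths /data/media and /data/results are hypothetical, and the container port is assumed to be Jupyter's default 8888.

```python
from pathlib import Path


def docker_run_args(media_dir, result_dir, host_port=8888, image="audioseg"):
    """Assemble a `docker run` argument list (illustrative, not run.sh itself)."""
    media = Path(media_dir).resolve()
    results = Path(result_dir).resolve()
    return [
        "docker", "run",
        "--gpus", "all",                   # expose the host GPUs to the container
        "-v", f"{media}:/data/media",      # hypothetical container-side path
        "-v", f"{results}:/data/results",  # hypothetical container-side path
        "-p", f"{host_port}:8888",         # host port -> assumed Jupyter port
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(" ".join(docker_run_args("./media", "./results", host_port=8889)))
```

Printing the command rather than running it makes it easy to compare against the actual run.sh before changing anything.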

After starting up, the container prints the URL for connecting to the Jupyter notebook in the browser. Alternatively, you can open http://: in your browser and enter the token printed on the command line.

Running the segmenter

The notebook defines a mediafiles list with the filenames of the media files in the source folder. This list needs to be adjusted to the files present.
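
Instead of typing the list by hand, it can be derived from the folder contents. A minimal sketch, assuming the source folder is mounted at the hypothetical path /data/media inside the container and that the files of interest carry one of the extensions listed. Both the path and the extension set are assumptions, not taken from the notebook, and list_media_files is not part of segwrapper.

```python
from pathlib import Path

# Assumed set of media extensions; adjust to the formats you actually use.
MEDIA_EXTENSIONS = {".wav", ".mp3", ".mp4", ".m4a", ".flac"}


def list_media_files(source_dir):
    """Return the sorted names of media files directly inside source_dir."""
    return sorted(
        p.name
        for p in Path(source_dir).iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS
    )


if __name__ == "__main__":
    source = Path("/data/media")  # hypothetical mount point
    mediafiles = list_media_files(source) if source.is_dir() else []
    print(mediafiles)
```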

The algorithm uses a set of default parameters. Those can be initialised by calling segwrapper.get_default_params(), which returns a dictionary with the parameters. A parameter set can be printed using segwrapper.print_params(params). Parameters can be modified by changing the respective value in the dictionary.
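
Because the parameters are a plain dictionary, overrides can be applied to a copy so that the defaults stay available for comparison. The sketch below uses a stand-in dictionary with hypothetical keys (min_segment_length, n_clusters). In the notebook the defaults would come from segwrapper.get_default_params(), and the real keys are best inspected with segwrapper.print_params(params). The helper with_overrides is likewise an illustration and not part of segwrapper.

```python
import copy


def with_overrides(defaults, **overrides):
    """Return a copy of defaults with overrides applied; reject unknown keys."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    params = copy.deepcopy(defaults)
    params.update(overrides)
    return params


if __name__ == "__main__":
    # Stand-in for segwrapper.get_default_params(); the keys are hypothetical.
    defaults = {"min_segment_length": 2.0, "n_clusters": 8}
    params = with_overrides(defaults, n_clusters=4)
    print(params)
```

Rejecting unknown keys catches misspelled parameter names early, which would otherwise be added to the dictionary without any effect on the pipeline.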

segwrapper.segment_plot runs the entire pipeline and writes the resulting CSV files and plots to the result folder.
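
After a run, the CSV files can be checked with the Python standard library. A minimal sketch, assuming the result folder is reachable at the hypothetical path /data/results and that the files are comma-delimited. The column layout is not documented here, so the code only reports each file's first row and the number of rows that follow, without interpreting the contents. The helper summarise_csvs is an illustration and not part of segwrapper.

```python
import csv
from pathlib import Path


def summarise_csvs(result_dir):
    """Map each CSV file name to (first row, number of remaining rows)."""
    summary = {}
    for path in sorted(Path(result_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            rows = list(csv.reader(handle))
        first = rows[0] if rows else []
        summary[path.name] = (first, max(len(rows) - 1, 0))
    return summary


if __name__ == "__main__":
    results = Path("/data/results")  # hypothetical mount point
    if results.is_dir():
        for name, (first, count) in summarise_csvs(results).items():
            print(name, first, count)
```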

Acknowledgement

The research leading to these results has been funded partially by the program ICT of the Future by the Austrian Federal Ministry of Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK) in the project TailoredMedia.
