Data Preparation Pipeline

In this work, we use two datasets: VGGSound and AudioSet. Here we provide the pipeline to pre-process VGGSound data. The process for AudioSet is similar.

For VGGSound, pre-processing consists of three steps:

Convert each .mp4 file to a .wav file
Extract the waveforms (numpy.array) into an hdf5 file
Create an index on the hdf5 file and distribute the data

Pipeline scripts

Assume the .mp4 files are stored in vggsound/videos and the .wav files will be written to vggsound/audios:

cd vggsound
./tools/convert_mp4_to_wav.sh  \
    ./videos/ \
    ./audios/ \
    32000
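
For readers who want to see what such a conversion involves, here is a minimal Python sketch that builds an ffmpeg command for one file. It is not the content of convert_mp4_to_wav.sh. The helper name build_ffmpeg_command, the example file name, and the choice of mono output are assumptions made for illustration.

```python
from pathlib import Path


def build_ffmpeg_command(mp4_path, wav_dir, sample_rate=32000):
    """Return an ffmpeg argument list that extracts the audio track as .wav."""
    wav_path = Path(wav_dir) / (Path(mp4_path).stem + ".wav")
    return [
        "ffmpeg", "-y",            # overwrite an existing output file
        "-i", str(mp4_path),       # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # mono output (an assumption, not from the README)
        "-ar", str(sample_rate),   # resample to the target rate
        str(wav_path),
    ]


# Hypothetical file name, used only to show the resulting command.
cmd = build_ffmpeg_command("videos/clip_000.mp4", "audios", 32000)
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would run the conversion, provided ffmpeg is installed.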

Extract the waveforms and store them in an hdf5 file. This step requires vggsound.csv, which must be downloaded beforehand:

# The third argument is the sample rate (Hz); the fourth is the clip length (seconds).
./tools/extract_waveform_in_h5.sh \
    ./audios/ \
    train \
    32000 \
    10 \
    vggsound.csv \
    features/vggsound_train.hdf5
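
The sample rate and clip length together imply a fixed number of samples per clip (32000 × 10 = 320000). The sketch below shows one plausible way to bring waveforms to that length and write them to an hdf5 file. It assumes clips are zero-padded or truncated, and the dataset name "waveform" and both function names are hypothetical; the actual script may do this differently.

```python
import numpy as np


def pad_or_truncate(waveform, sample_rate=32000, clip_seconds=10):
    """Return a float32 array of exactly sample_rate * clip_seconds samples."""
    target = sample_rate * clip_seconds
    waveform = np.asarray(waveform, dtype=np.float32)[:target]
    # Zero-pad at the end when the clip is shorter than the target length.
    return np.pad(waveform, (0, target - len(waveform)))


def write_hdf5(waveforms, out_path, sample_rate=32000, clip_seconds=10):
    """Store fixed-length clips in a single dataset (name is an assumption)."""
    import h5py  # imported here so pad_or_truncate works without h5py installed

    with h5py.File(out_path, "w") as f:
        dset = f.create_dataset(
            "waveform",
            shape=(len(waveforms), sample_rate * clip_seconds),
            dtype=np.float32,
        )
        for i, w in enumerate(waveforms):
            dset[i] = pad_or_truncate(w, sample_rate, clip_seconds)


short = pad_or_truncate(np.ones(1000))    # shorter than 10 s: padded
long = pad_or_truncate(np.ones(400000))   # longer than 10 s: truncated
print(short.shape, long.shape)
```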

Build the index and distribute the data. The output is written to features/Train:

python tools/create_index_distribute_data.py \
    --input_h5 features/vggsound_train.hdf5 \
    --index_file features/vggsound_train_index.hdf5 \
    --data_dir Train \
    -n 20
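
The README does not say what -n controls. Assuming it sets the number of parts the data is split into, distributing the clips could look like the sketch below, which divides clip indices into contiguous, near-equal chunks. The function name split_into_shards and the item count are hypothetical.

```python
def split_into_shards(num_items, num_shards):
    """Split range(num_items) into num_shards contiguous, near-equal chunks."""
    base, extra = divmod(num_items, num_shards)
    shards, start = [], 0
    for k in range(num_shards):
        # The first `extra` shards receive one additional item.
        size = base + (1 if k < extra else 0)
        shards.append(list(range(start, start + size)))
        start += size
    return shards


# 103 is an arbitrary example count, not the size of VGGSound.
shards = split_into_shards(num_items=103, num_shards=20)
print([len(s) for s in shards])
```

An index file could then record, for each clip, which shard holds it and at what offset.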
