This repo collects multimodal datasets and processes them in a unified way. Please let me know if you have interesting datasets you would like to see processed.
- MELD: This dataset is from the TV series Friends. It has visual, audio, and text modalities.
- IEMOCAP: This dataset consists of dyadic conversations between two people, with 10 actors in total. It has visual, audio, and text modalities.
- CarLani: This dataset is a conversation between the robot Leolani and the human Carl.
Every dataset has its own directory, which looks something like the tree below. If a dataset does not have all three modalities (i.e. visual, audio, and text), the corresponding directories will simply be empty.
DATASET
├── raw-videos
│   ├── train / val / test
├── raw-audios
│   ├── train / val / test
├── raw-texts
│   ├── train / val / test
├── face-videos
│   ├── train / val / test
├── face-features
│   ├── train / val / test
├── visual-features
│   ├── train / val / test
├── audio-features
│   ├── train / val / test
├── text-features
│   ├── train / val / test
├── foo.json
├── bar.json
└── README.txt
Every data sample belongs to one of the train, val, or test splits. Beware that the number of samples per modality is not always the same. For example, a video might exist, but if its transcription is not available or if audio extraction fails, then that sample won't have the text or audio modality.
- `DATASET` is the name of the dataset.
- `raw-videos` contains the raw, unprocessed videos.
- `raw-audios` contains the raw, unprocessed audio.
- `raw-texts` contains the raw, unprocessed texts.
- `face-videos` contains the face videos made from the facial features.
- `face-features` contains the detected faces and their features, extracted using this repo and this repo. The features are age, bounding box, face-detection probability, gender, five landmarks, and a 512-dimensional ArcFace embedding vector.
- `visual-features` contains other visual features (e.g. COCO 80 objects) that are not facial features.
- `audio-features` contains audio features (e.g. spectrograms, encoded embeddings, etc.).
- `text-features` contains text features (e.g. word embeddings, BERT-like model features, etc.).
- `*.json` files are important dataset-specific text files (e.g. labels).
- `README.txt` briefly explains the dataset and gives you its metadata.
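For example, the snippet below is a minimal sketch of how you could check which modalities a sample actually has. It assumes that files belonging to the same sample share a file stem across modality directories, which may not hold for every dataset.

```python
from pathlib import Path

MODALITIES = ["raw-videos", "raw-audios", "raw-texts"]


def modality_coverage(dataset_dir, split="train"):
    """Map every sample stem in a split to the modalities it actually has.

    Assumes files of the same sample share a stem across modality
    directories (e.g. foo.mp4 / foo.wav / foo.json).
    """
    coverage = {}  # sample stem -> set of modalities present
    for modality in MODALITIES:
        split_dir = Path(dataset_dir) / modality / split
        if not split_dir.is_dir():
            continue  # this dataset may not provide this modality at all
        for path in split_dir.iterdir():
            coverage.setdefault(path.stem, set()).add(modality)
    return coverage


if __name__ == "__main__":
    for stem, present in modality_coverage("MELD", "train").items():
        missing = set(MODALITIES) - present
        if missing:
            print(f"{stem} is missing: {sorted(missing)}")
```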
There are several processing levels, which are explained step by step below.
Use Python >= 3.8.
This step is about extracting the original datasets into the above-mentioned structure (i.e. `raw-videos`, `raw-audios`, and `raw-texts`).
- Since I don't have a license for all of the datasets, you should contact the dataset authors and download them yourself.
- Install the python requirements with `pip install -r requirements-extract-dataset.txt`. I highly recommend running this in a virtual environment.
- Move each archive into the corresponding directory:
  - Put `MELD.Raw.tar.gz` in `MELD/`
  - Put `IEMOCAP_full_release.tar.gz` in `IEMOCAP/`
  - Put `CarLani.zip` in `CarLani/`
- In this current directory, where `README.md` is located, run `python extract-dataset.py --dataset DATASET`. Replace `DATASET` with your desired dataset (e.g. `python extract-dataset.py --dataset MELD`).
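If you want to extract all three datasets in one go, a small wrapper like the sketch below will do. It simply calls the same command once per dataset and assumes the archives are already in place.

```python
import subprocess

# Assumes the archives have already been placed in their directories.
for dataset in ["MELD", "IEMOCAP", "CarLani"]:
    subprocess.run(
        ["python", "extract-dataset.py", "--dataset", dataset],
        check=True,  # stop as soon as one extraction fails
    )
```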
In the next sections, we will go through extracting features and annotating with EMISSOR. Go ahead if you want to do it yourself; otherwise, you can just download the results from the links below.
- MELD: In the current repo root directory, unzip what you downloaded into `./MELD/`.
- IEMOCAP: In the current repo root directory, unzip what you downloaded into `./IEMOCAP/`.
- CarLani: In the current repo root directory, unzip what you downloaded into `./CarLani/`.
This step is about extracting the features (e.g. `face-features`).
- For facial features, you will pull three Docker images.
- For visual features, you will have to build things yourself for now. I'll soon make them into Docker containers.
- For audio features, you will have to build things yourself for now. I'll soon make them into Docker containers.
- For text features, you will have to build things yourself for now. I'll soon make them into Docker containers.
- Install the python requirements with `pip install -r requirements-extract-features.txt`. I highly recommend running this in a virtual environment.
- In this current directory, where `README.md` is located, run `python extract-features.py --dataset DATASET --face-features --face-videos --visual-features --audio-features --text-features --run-on-gpu --num-jobs NUM_JOBS`. Replace `DATASET` with your desired dataset. Only add the boolean flags (i.e. `--face-features`, `--face-videos`, `--visual-features`, `--audio-features`, `--text-features`) that you want to extract. For example, if you only want to extract face features and audio features from the MELD dataset, the command should be `python extract-features.py --dataset MELD --face-features --audio-features`. If you want to run in parallel, you can add the GPU flag `--run-on-gpu` and add more workers with `--num-jobs NUM_JOBS`. Running on GPU requires an NVIDIA GPU, and you should build the GPU images for this. Read https://github.com/tae898/face for more information.
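After extraction you can inspect the face features. The exact on-disk format is determined by the extraction script, so the sketch below is only an assumption: one pickle file per video holding a list of per-face dictionaries, with keys such as `age`, `bbox`, `gender`, and `embedding` that mirror the feature list above but are not a guaranteed schema.

```python
import pickle
from pathlib import Path

# Hypothetical location and file extension; adjust to whatever
# extract-features.py actually produced on your machine.
feature_files = sorted(Path("MELD/face-features/train").glob("*.pkl"))

if feature_files:
    with open(feature_files[0], "rb") as f:
        faces = pickle.load(f)  # assumed: a list of dicts, one per detected face
    for face in faces[:3]:
        # Assumed keys; the real field names may differ.
        print(face.get("age"), face.get("gender"), face.get("bbox"))
        print(len(face.get("embedding", [])))  # expected: 512-dim ArcFace vector
```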
This step is optional. The processed datasets can also be annotated in the EMISSOR annotation format. Using the EMISSOR annotation tool, users can visualize the data or even annotate it themselves.
- Install the python requirements with `pip install -r requirements-annotate-emissor.txt`. I highly recommend running this in a virtual environment.
- In this current directory, where `README.md` is located, run `python annotate-emissor.py --dataset DATASET --num-jobs NUM_JOBS`. The python script can only use features that have already been extracted.
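Because the annotation script can only use features that already exist on disk, a quick check along these lines tells you what is available before you run it. The directory names follow the layout described earlier; `MELD` is just an example.

```python
from pathlib import Path

FEATURE_DIRS = ["face-features", "visual-features", "audio-features", "text-features"]
SPLITS = ["train", "val", "test"]


def report_available_features(dataset_dir):
    """Print how many extracted feature files exist per feature type and split."""
    for feature in FEATURE_DIRS:
        for split in SPLITS:
            split_dir = Path(dataset_dir) / feature / split
            count = len(list(split_dir.iterdir())) if split_dir.is_dir() else 0
            print(f"{feature}/{split}: {count} files")


report_available_features("MELD")  # example dataset
```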
You can also download the EMISSOR-annotated datasets from the links below.
- MELD: In the current repo root directory, unzip what you downloaded into `./MELD/`.
- IEMOCAP: In the current repo root directory, unzip what you downloaded into `./IEMOCAP/`.
- CarLani: In the current repo root directory, unzip what you downloaded into `./CarLani/`.
The best way to find and solve your problems is to check the GitHub issues tab. If you can't find what you want, feel free to raise an issue. We are pretty responsive.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Run `make style && make quality` in the root repo directory to ensure code style and quality
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
If you have any questions, or have interesting datasets, please let me know.