# AV-SSRL

MSc Thesis: "Audio-Visual Self-Supervised Representation Learning in-the-wild"

## Pre-trained models

We provide checkpoints for two models pre-trained on a 50,000-video subset of VGGSound: one trained with Cross-modal Instance Discrimination (xID) and one with the recently proposed VICReg method.

| Method | Checkpoint (100 epochs) |
| ------ | ----------------------- |
| xID    | download link           |
| VICReg | download link           |
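
The checkpoints are standard PyTorch files. A minimal inspection/loading sketch is shown below; the file name and the `'model'` key are assumptions for illustration, not the repository's documented format:

```python
import torch

# Inspect a downloaded checkpoint (file name is an assumption).
ckpt = torch.load("xid_vggsound_100ep.pth", map_location="cpu")
print(ckpt.keys())  # e.g., dict_keys(['model', 'optimizer', 'epoch'])

# Checkpoints saved under DistributedDataParallel typically prefix
# parameter names with 'module.'; strip it before calling
# model.load_state_dict(state_dict) on a plain (non-DDP) model.
state_dict = {k.replace("module.", "", 1): v for k, v in ckpt["model"].items()}
```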

## Self-supervised pre-training

To train a model with the xID method, run the following (assuming a DistributedDataParallel (DDP) strategy):

```bash
python3 main-ssl.py configs/VGGSound-N1024.yaml --multiprocessing-distributed
```
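
For intuition, xID pulls together the audio and video embeddings of the same clip while pushing apart all other clips in the batch. Below is a minimal sketch of a symmetric cross-modal InfoNCE loss in that spirit; it is an illustration, not the repository's exact objective:

```python
import torch
import torch.nn.functional as F

def cross_modal_nce(video_emb, audio_emb, temperature=0.07):
    """Matching (video, audio) pairs are positives; all other
    pairings in the batch serve as negatives."""
    v = F.normalize(video_emb, dim=1)
    a = F.normalize(audio_emb, dim=1)
    logits = v @ a.t() / temperature  # (B, B) cross-modal similarities
    targets = torch.arange(v.size(0), device=v.device)
    # Average the video-to-audio and audio-to-video directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

loss = cross_modal_nce(torch.randn(8, 128), torch.randn(8, 128))
```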

For the VICReg method, run:

```bash
python3 main-vicreg.py configs/VGGSound-VICReg.yaml --multiprocessing-distributed
```
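
As a reminder of what VICReg optimizes, the sketch below implements its three terms (invariance, variance, covariance) following the original paper's formulation; the coefficients are the paper's defaults, not necessarily the values in `configs/VGGSound-VICReg.yaml`:

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    n, d = z1.shape
    inv = F.mse_loss(z1, z2)  # invariance: paired embeddings should match
    # Variance: hinge loss keeping each dimension's std above 1.
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    var = F.relu(1 - std1).mean() + F.relu(1 - std2).mean()
    # Covariance: penalize off-diagonal covariance entries.
    z1c, z2c = z1 - z1.mean(dim=0), z2 - z2.mean(dim=0)
    cov1 = (z1c.t() @ z1c) / (n - 1)
    cov2 = (z2c.t() @ z2c) / (n - 1)
    off = lambda m: m - torch.diag(torch.diag(m))
    cov = off(cov1).pow(2).sum() / d + off(cov2).pow(2).sum() / d
    return sim_w * inv + var_w * var + cov_w * cov

loss = vicreg_loss(torch.randn(16, 128), torch.randn(16, 128))
```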

To avoid data parallelism, omit the `--multiprocessing-distributed` argument and set the `--gpu` argument of either script to a specific device id (e.g., `0` for the first GPU), as in the example below.
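
For example, to run xID pre-training on the first GPU only:

```bash
python3 main-ssl.py configs/VGGSound-N1024.yaml --gpu 0
```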

## Linear classification

For this experiment, run the following (e.g., for the UCF-101 dataset and a model pre-trained with xID):

```bash
python3 eval-action-recg-linear.py configs/ucf/8at16-linear.yaml configs/VGGSound-N1024.yaml --distributed
```

Note that this script does not yet support multi-node evaluation.
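
Conceptually, linear evaluation freezes the pre-trained backbone and trains only a linear classifier on its features. The sketch below uses a placeholder backbone and feature dimension purely for illustration:

```python
import torch
import torch.nn as nn

# Placeholder for the frozen pre-trained encoder (512-d features assumed).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512))
for p in backbone.parameters():
    p.requires_grad = False  # keep pre-trained weights fixed
backbone.eval()

classifier = nn.Linear(512, 101)  # e.g., 101 classes for UCF-101
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1)

x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 101, (4,))
with torch.no_grad():
    feats = backbone(x)  # features never receive gradients
loss = nn.functional.cross_entropy(classifier(feats), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```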

Final results on both the UCF-101 and HMDB-51 datasets are shown in the following table:

| Method | Top-1 Acc. (UCF-101) | Top-5 Acc. (UCF-101) | Top-1 Acc. (HMDB-51) | Top-5 Acc. (HMDB-51) |
| ------ | -------------------- | -------------------- | -------------------- | -------------------- |
| xID    | 51.20%               | 80.91%               | 28.08%               | 61.29%               |
| VICReg | 39.75%               | 71.30%               | 21.85%               | 52.69%               |

## Fine-tuning

For this experiment, run the following (e.g., for the HMDB-51 dataset and a model pre-trained with VICReg):

```bash
python3 eval-action-recg.py configs/hmdb51/8at16-fold1.yaml configs/VGGSound-VICReg.yaml --distributed
```

Note that this script does not yet support multi-node evaluation.
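
Unlike linear evaluation, fine-tuning updates the backbone as well, typically with a smaller learning rate on the pre-trained weights than on the new classification head. A minimal sketch with placeholder modules:

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512))  # placeholder encoder
head = nn.Linear(512, 51)  # e.g., 51 classes for HMDB-51

# All parameters train; pre-trained weights get a lower learning rate.
optimizer = torch.optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-3},
    {"params": head.parameters(), "lr": 1e-2},
], momentum=0.9)

x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 51, (4,))
loss = nn.functional.cross_entropy(head(backbone(x)), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```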

Final results on both the UCF-101 and HMDB-51 datasets are shown in the following table:

| Method | Top-1 Acc. (UCF-101) | Top-5 Acc. (UCF-101) | Top-1 Acc. (HMDB-51) | Top-5 Acc. (HMDB-51) |
| ------ | -------------------- | -------------------- | -------------------- | -------------------- |
| xID    | 73.22%               | 92.78%               | 42.85%               | 73.69%               |
| VICReg | 59.53%               | 85.94%               | 34.65%               | 68.96%               |

## Concept Generalization

In this experiment, we test how well the self-supervised models generalize to data from unknown classes, i.e., classes not found in the pre-training dataset. To split the downstream classes into so-called seen and unseen concepts, use the `label_similarities.ipynb` notebook; the resulting sets of unseen concepts for UCF-101 and HMDB-51 are provided in the `datasets/rest_classes/` directory.
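
One plausible way to derive such a split is to score each downstream label against every pre-training label and treat classes with no sufficiently similar counterpart as unseen. The sketch below uses a simple string-similarity stand-in; the actual notebook may well use semantic (embedding-based) similarities and a different threshold:

```python
from difflib import SequenceMatcher

def unseen_classes(downstream, pretraining, threshold=0.6):
    """Return downstream classes whose best similarity to any
    pre-training label falls below the threshold."""
    unseen = []
    for c in downstream:
        best = max(SequenceMatcher(None, c.lower(), p.lower()).ratio()
                   for p in pretraining)
        if best < threshold:
            unseen.append(c)
    return unseen

print(unseen_classes(["playing guitar", "archery"],
                     ["playing acoustic guitar", "dog barking"]))
```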

To perform this experiment, run the following (e.g., for the xID model on UCF-101, using 20% of the training data per class to tune the linear classifier):

```bash
python3 eval-action-recg-linear.py configs/ucf/8at16-linear.yaml configs/VGGSound-N1024.yaml --distributed --few-shot-ratio 0.2 --use-rest-classes
```
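
For reference, keeping a fixed fraction of training videos per class (what `--few-shot-ratio` suggests) can be sketched as below; this is a generic illustration, not the repository's data loader:

```python
import random
from collections import defaultdict

def few_shot_subset(samples, ratio, seed=0):
    """samples: list of (path, label) pairs; keep `ratio` of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)
    subset = []
    for label, paths in by_class.items():
        rng.shuffle(paths)
        k = max(1, int(len(paths) * ratio))  # at least one sample per class
        subset.extend((p, label) for p in paths[:k])
    return subset
```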

Final results are depicted in the plots provided in the repository.

