This repository provides the code for "Improving Query-by-Vocal Imitation with Contrastive Learning and Audio Pretraining", presented at DCASE 2024. The paper addresses the challenge of audio retrieval using vocal imitations as queries, proposing a dual encoder architecture that leverages pretrained CNNs and an adapted NT-Xent loss for fine-tuning.
-
Updated
Oct 25, 2024 - Python