librosa
soundfile
accelerate
ffmpeg
torchaudio
transformers==4.45.1
2.1.1 Download the LibriSpeech dataset from the official LibriSpeech site (openslr.org).
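One way to fetch a split is to download it directly from OpenSLR, where LibriSpeech is hosted (the exact subsets this project expects are not stated here, so the split name below is only an example):

```shell
# Download and unpack the train-clean-100 split from OpenSLR.
# Swap in other archive names (e.g. dev-clean.tar.gz, test-clean.tar.gz) as needed.
wget https://www.openslr.org/resources/12/train-clean-100.tar.gz
tar -xzf train-clean-100.tar.gz   # extracts into LibriSpeech/train-clean-100
```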
The model will be saved in checkpoints/model.
By default, the pre-trained model laion/clap-htsat-unfused is used.
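This checkpoint can be loaded through the CLAP classes in transformers (pinned above); a minimal sketch of what that looks like — the actual loading code lives in train.py, and the text prompt here is purely illustrative:

```python
from transformers import ClapModel, ClapProcessor

# Load the default pre-trained CLAP checkpoint (downloaded on first use).
model = ClapModel.from_pretrained("laion/clap-htsat-unfused")
processor = ClapProcessor.from_pretrained("laion/clap-htsat-unfused")

# Embed a text prompt into CLAP's shared audio-text space.
inputs = processor(text=["a person speaking"], return_tensors="pt")
text_embed = model.get_text_features(**inputs)
print(text_embed.shape)
```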
python train.py
Note: remember to modify the data path and other parameters in train.py before running.
cd local
pip install -r requirements.txt
python generate.py
python clap_opt_1_minut.py
The sample code is in local/vote.py.
cd local
python vote.py
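The internals of vote.py are not shown here, but voting over several systems' outputs typically reduces to picking the most frequent hypothesis per utterance. A minimal sketch of that idea — the function name and data are illustrative, not the actual vote.py API:

```python
from collections import Counter

def majority_vote(hypotheses):
    """Return the most frequent hypothesis; ties break on first-seen order."""
    counts = Counter(hypotheses)
    return counts.most_common(1)[0][0]

# Three systems' outputs for one utterance; two of them agree.
print(majority_vote(["hello world", "hello word", "hello world"]))  # hello world
```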