🌠 STAR: Speech-to-Audio Generation via Representation Learning

This work presents STAR, the first end-to-end (E2E) speech-to-audio (STA) generation framework, designed to improve efficiency and avoid the error propagation inherent in cascaded systems. Our contributions:

  • We recognize the potential of the speech-to-audio generation task and design the first E2E system, STAR;
  • We validate the feasibility of E2E STA through representation learning experiments, showing that spoken sound event semantics can be extracted directly from speech;
  • We achieve effective speech-to-audio modal alignment through a bridge network mapping mechanism and a two-stage training strategy;
  • We reduce speech processing latency from 156 ms to 36 ms (≈ 76.9% reduction) while surpassing the generation performance of cascaded systems.

✂️ Data Preparation

Generate speech for the captions in AudioCaps with VITS, then extract features using one of the speech encoders (DAC, HuBERT, WavLM):

git clone https://github.com/AnonymousGit0/STAR
cd STAR
python src/data_preparation/vits/vits_inference.py
python src/data_preparation/data_preparation/speech_encoder/hubert_extract_feature.py
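When sizing the extracted features, it helps to know how many frames an utterance produces. The sketch below assumes the standard HuBERT Base / WavLM Base convolutional front end (seven unpadded layers, 16 kHz input); the extraction script in this repo may use a different checkpoint or configuration, and DAC uses a different frame rate entirely. The function name `num_feature_frames` is illustrative and not part of the repo.

```python
# Assumption: standard HuBERT Base / WavLM Base conv front end
# (7 unpadded 1-D conv layers). The repo's script may differ.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_feature_frames(num_samples: int) -> int:
    """Return how many frames the conv front end emits for a waveform."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # waveform too short to produce any frame
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    sample_rate = 16_000  # HuBERT and WavLM expect 16 kHz audio
    for seconds in (1, 5, 10):
        frames = num_feature_frames(seconds * sample_rate)
        print(f"{seconds:>2d} s -> {frames} frames")
```

Under these assumptions the frame rate is roughly 50 frames per second (one frame per 320 samples), so feature length grows linearly with the duration of the synthesized caption.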

💡 Stage1: Bridge Network

Pre-train the Bridge Network using sound event labels from AudioSet:

python src/bridge_network/qformer_predictions.py
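The script name suggests a Q-Former-style bridge, although this README does not describe the architecture. As a rough illustration of that general idea only, the NumPy sketch below lets a fixed set of query vectors cross-attend over variable-length speech features and return a fixed-length summary. Everything here is an assumption: the sizes, the single attention head, the random (untrained) weights, and the function name `query_cross_attention` are hypothetical, and a real Q-Former also has self-attention and feed-forward layers.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def query_cross_attention(queries, speech_feats, w_q, w_k, w_v):
    """Single-head cross-attention: queries attend over speech frames.

    queries:      (num_queries, dim), learnable in a real model
    speech_feats: (num_frames, dim), frame-level encoder output
    returns:      (num_queries, dim) summary and the attention weights
    """
    q = queries @ w_q
    k = speech_feats @ w_k
    v = speech_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query's distribution over frames
    return weights @ v, weights


rng = np.random.default_rng(0)
dim, num_queries = 64, 8  # hypothetical sizes, not taken from the repo
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
queries = rng.standard_normal((num_queries, dim))

# Utterances of different lengths map to the same output shape.
for num_frames in (49, 249):
    feats = rng.standard_normal((num_frames, dim))
    out, attn = query_cross_attention(queries, feats, w_q, w_k, w_v)
    print(num_frames, "frames ->", out.shape)
```

The point of the sketch is the shape contract: the output size depends only on the number of queries, which is what lets a bridge of this kind hand a fixed-length conditioning signal to a downstream generator.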

🌱 Stage2: STA Generation

Train the end-to-end speech-to-audio generation model on paired speech-audio data, then run inference and evaluation:

sh src/sta_generation/bash_scripts_star/train_star_fm.sh
sh src/sta_generation/bash_scripts_star/infer_multi_gpu.sh
python src/sta_generation/evaluation/star.py --gen_audio_dir {generated_audio_folder}

Acknowledgement

Our code draws on WavLM, fairseq, DAC, SECap, and HEAR. We thank the authors for open-sourcing their code.