Papers, code, and resources for speech language models and end-to-end speech dialogue systems.
- LTU: Listen, Think, and Understand - ICLR 2024
- SALMONN: Towards Generic Hearing Abilities for Large Language Models - ICLR 2024
- LTU-AS: Joint Audio and Speech Understanding - ASRU 2023
- Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models - arXiv 2023
- Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities - ICML 2024
- Qwen2-Audio Technical Report - arXiv 2024
- WavLLM: Towards Robust and Adaptive Speech Large Language Model - EMNLP 2024
- DiVA: Distilling an End-to-End Voice Assistant Without Instruction Training Data - arXiv 2024
- Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech - ICASSP 2024
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension - ACL 2024
- SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words - arXiv 2024
- AudioBench: A Universal Benchmark for Audio Large Language Models - arXiv 2024
- SALMon: A Suite for Acoustic Language Model Evaluation - arXiv 2024
- MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark - arXiv 2024
- Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks - ICLR 2025 (OpenReview)
- SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities - EMNLP 2023
- GPT-4o Voice Mode - API 2024
- PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems - EMNLP 2024
- VITA: Towards Open-Source Interactive Omni Multimodal LLM - arXiv 2024
- Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming - arXiv 2024
- LLaMA-Omni: Seamless Speech Interaction with Large Language Models - arXiv 2024
- Moshi: a speech-text foundation model for real-time dialogue - arXiv 2024
- Westlake-Omni - GitHub 2024
- EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions - arXiv 2024
- IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities - arXiv 2024
- Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities - arXiv 2024
- MooER-omni - GitHub 2024
- GLM-4-Voice - GitHub 2024
- Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM - arXiv 2024
- Hertz-dev - GitHub 2024
- Fish Agent - GitHub 2024
- VoiceBench: Benchmarking LLM-Based Voice Assistants - arXiv 2024
- A Full-duplex Speech Dialogue Scheme Based On Large Language Models - NeurIPS 2024
- MiniCPM-duplex: Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models - EMNLP 2024
- LSLM: Language Model Can Listen While Speaking - arXiv 2024
- SyncLLM: Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents - EMNLP 2024
- Enabling Real-Time Conversations with Minimal Training Costs - arXiv 2024
- Towards audio language modeling -- an overview - arXiv 2024
- Recent Advances in Speech Language Models: A Survey - arXiv 2024
- A Survey on Speech Large Language Models - arXiv 2024
- Speech Trident - GitHub