Commit 9c338fc

Author: Yuan Gong
Commit message: add ltu-as paper
1 parent 5fea7b1 commit 9c338fc

1 file changed, 0 additions and 1 deletion

README.md

Lines changed: 0 additions & 1 deletion
@@ -29,6 +29,5 @@ The ability of artificial intelligence (AI) systems to perceive and comprehend a
In this paper, we propose a novel audio foundation model, called LTU (Listen, Think, and Understand). To train LTU, we created a new OpenAQA-5M dataset consisting of 1.9 million closed-ended and 3.7 million open-ended, diverse (audio, question, answer) tuples, and used an autoregressive training framework and a perception-to-understanding curriculum. LTU demonstrates strong performance and generalization ability on conventional audio tasks such as classification and captioning. Moreover, it exhibits remarkable reasoning and comprehension abilities in the audio domain. To the best of our knowledge, LTU is the first audio-enabled large language model that bridges audio perception with advanced reasoning.

-
**How about the code?**
We plan to release the code, but our institute must first review the software release; we are preparing for that review.
