Questions about flow matching decoders #905

Closed
sphmel opened this issue Jan 19, 2025 · 4 comments

@sphmel

sphmel commented Jan 19, 2025

Hi, thanks for open-sourcing this nice work.

The CosyVoice decoder uses a speaker embedding, while many other works (Voicebox, SoundStorm, E2 TTS, F5-TTS, etc.) do not use a speaker embedding on the decoder side.

In CosyVoice 2, the speech tokenizer has improved considerably. If the speech tokens carry very little speaker information, relying only on the prefix prompt should work well for zero-shot cloning. I assume your team has already run some experiments on dropping the speaker embedding. Is there a good reason to keep the speaker embedding in the flow matching decoder? I hope such results appear in the CosyVoice 2 tech report, or in the tech report for the next version of the model.

@aluminumbox
Collaborator

Check our report. We ran an experiment on dropping the speaker embedding in the LLM, and the CER decreased.

@sphmel
Author

sphmel commented Jan 20, 2025

@aluminumbox I mean the speaker embedding in the flow matching decoder, not in the LLM.
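
To make the two decoder-side setups concrete, here is a minimal sketch of how per-frame conditioning could be assembled with and without a global speaker vector. Everything in it is hypothetical (`build_condition`, the feature sizes, and concatenation as the way of combining features are my assumptions for illustration); it is not CosyVoice's actual decoder code.

```python
def build_condition(token_emb, prompt_mel, spk_emb=None):
    """Concatenate per-frame conditioning features for a hypothetical decoder.

    token_emb:  list of T frames, each a list of floats (speech-token features)
    prompt_mel: list of P <= T reference mel frames (the prefix prompt)
    spk_emb:    optional utterance-level speaker vector, repeated on every frame
    """
    if len(prompt_mel) > len(token_emb):
        raise ValueError("prompt must not be longer than the target sequence")
    mel_dim = len(prompt_mel[0]) if prompt_mel else 0
    frames = []
    for t, tok in enumerate(token_emb):
        # Prefix prompt: real mel frames in the prompt region, zeros elsewhere.
        mel = prompt_mel[t] if t < len(prompt_mel) else [0.0] * mel_dim
        frame = list(tok) + list(mel)
        if spk_emb is not None:
            # Global speaker vector broadcast along the time axis.
            frame += list(spk_emb)
        frames.append(frame)
    return frames


# Toy inputs: 4 target frames, 2 prompt frames, a 2-dim speaker vector.
token_emb = [[0.1, 0.2]] * 4
prompt_mel = [[1.0, 1.5, 2.0]] * 2
spk_emb = [0.3, 0.7]

with_spk = build_condition(token_emb, prompt_mel, spk_emb)  # embedding + prompt
prompt_only = build_condition(token_emb, prompt_mel)        # prefix prompt only
print(len(with_spk[0]), len(prompt_only[0]))
```

In the `prompt_only` case the only speaker cue the decoder sees is the reference mel prefix, which is the setup I am asking about.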


This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Feb 20, 2025

github-actions bot commented Mar 6, 2025

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Mar 6, 2025

2 participants