Thank you for creating emotion2vec. How can I access the previous model, which output only a few labels instead of 9?
I find this new checkpoint (the plus large) much worse than the old one, at least for Persian.
The model also hallucinates a lot on short inputs (1-2 seconds), even in English.
You can restrict the output to specific emotions (e.g. 5 of them) by masking the logits of the emotions you don't need. You will get performance similar to the previous model.
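The masking suggestion above can be sketched as follows. This is a minimal illustration, not the project's own code: the 9-label list and the raw scores are assumptions for the example, so check the model card for the actual label order.

```python
import numpy as np

# Assumed 9-label set for emotion2vec+ (illustrative; verify against the model card).
LABELS = ["angry", "disgusted", "fearful", "happy", "neutral",
          "other", "sad", "surprised", "unknown"]

def mask_scores(scores, keep):
    """Set the scores of all emotions not in `keep` to -inf so argmax ignores them."""
    scores = np.asarray(scores, dtype=float).copy()
    keep_idx = [LABELS.index(label) for label in keep]
    mask = np.full_like(scores, -np.inf)
    mask[keep_idx] = 0.0
    return scores + mask

# Hypothetical per-label scores from one utterance.
scores = [0.10, 0.05, 0.02, 0.30, 0.25, 0.08, 0.10, 0.05, 0.05]
keep = ["angry", "happy", "neutral", "sad", "surprised"]

masked = mask_scores(scores, keep)
pred = LABELS[int(np.argmax(masked))]  # highest remaining score wins -> "happy"
```

The same idea works on logits or on post-softmax scores, since masking to -inf (or renormalizing over the kept subset) only changes which of the remaining labels can be selected.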
If I use the feature vectors ('feats') generated by the AutoModel library's model.generate function on audio files as input to train a new model for speech emotion recognition, is this process equivalent to fine-tuning, or to training a downstream model for speech emotion recognition? Are these features equivalent to embeddings, or to raw audio features?
I don't fully follow your question. We provide emotion2vec for extracting features and emotion2vec+ for classification, and both types of model provide embeddings for further exploration of your tasks.
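Training a new classifier on the extracted 'feats' is a downstream (linear-probe) setup, not fine-tuning: the emotion2vec weights stay frozen and only the small head is trained. A minimal sketch of such a probe, using random vectors as stand-ins for the real utterance-level embeddings (the embedding dimension of 768 and the 5-class setup are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholders for embeddings you would extract with emotion2vec
# (the 'feats' output); random here purely so the sketch runs end to end.
DIM, N_CLASSES, N = 768, 5, 200
X = rng.normal(size=(N, DIM))            # frozen utterance embeddings
y = rng.integers(0, N_CLASSES, size=N)   # emotion labels

# Linear probe: a single softmax layer trained with gradient descent.
W = np.zeros((DIM, N_CLASSES))
b = np.zeros(N_CLASSES)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr, losses = 0.1, []
onehot = np.eye(N_CLASSES)[y]
for _ in range(100):
    p = softmax(X @ W + b)
    losses.append(-np.log(p[np.arange(N), y]).mean())  # cross-entropy
    W -= lr * (X.T @ (p - onehot) / N)
    b -= lr * (p - onehot).mean(axis=0)
```

Fine-tuning, by contrast, would backpropagate through the emotion2vec encoder itself and update its weights along with the head.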