Generalized Deep Multiset Canonical Correlation Analysis for Multiview Learning of Speech Representations
Updated Apr 9, 2019 - Python
Fine-tuning wav2vec2 for Pathological Speech Processing
DNN embedding extraction from audio and speech recordings using PyTorch.
Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity
This repository belongs to my Bachelor's thesis on predicting voice likability from pre-trained speech embeddings.
The Dis-Vector project enhances voice conversion and synthesis through disentangled embeddings, allowing for high-quality, zero-shot voice cloning across multiple languages. This model leverages separate encoders for content, pitch, rhythm, and timbre, enabling precise control over synthesized voice characteristics.