This is a repository for our Bangla Text to speech NLP work

sajidahmed12/Bangla-Speech-Emotion

Bangla-Speech-Emotion

This repository contains an implementation of a Bangla Text-to-Speech (TTS) system based on the paper "Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention", along with additional notes, code, and related work.

Paper Summary

The implementation is based on the paper mentioned above. For detailed insights and notes on the paper, refer to Bangla TTS with Guided Attention Notes.
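
As background, the guided attention term in the cited paper penalizes attention mass that falls far from the diagonal, using weights of the form W[n][t] = 1 - exp(-(n/N - t/T)^2 / (2 g^2)), with g = 0.2 in the paper. The following is a minimal pure-Python sketch of that weight matrix. The function name `guided_attention_weights` and the zero-based indexing are assumptions made for illustration and are not taken from this repository's code.

```python
import math


def guided_attention_weights(n_chars, n_frames, g=0.2):
    """Return an n_chars x n_frames matrix of guided-attention penalties.

    Entries are near 0 on the diagonal (text position and audio position
    advance together) and approach 1 far away from it.
    """
    weights = []
    for n in range(n_chars):
        row = []
        for t in range(n_frames):
            # Relative position in the text minus relative position in the audio.
            offset = n / n_chars - t / n_frames
            row.append(1.0 - math.exp(-(offset ** 2) / (2.0 * g * g)))
        weights.append(row)
    return weights


if __name__ == "__main__":
    # Print a small matrix so the diagonal band of low penalties is visible.
    for row in guided_attention_weights(4, 8):
        print(" ".join(f"{value:.2f}" for value in row))
```

In the paper, this matrix is multiplied element-wise with the attention matrix and averaged to form an extra loss term, which encourages near-monotonic alignments early in training.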

Dataset

Training

To train a model using the LJ Speech Dataset:

  1. Download the dataset and extract it into a directory. Set the directory path in pkg/hyper.py.
  2. Run the preprocessing script:
    python3 main.py --action preprocess
    
  3. Train the Text2Mel network:
    python3 main.py --action train --module Text2Mel
    
  4. Train the SSRN (spectrogram super-resolution) network:
    python3 main.py --action train --module SuperRes
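
The steps above can be chained from a small wrapper script. The sketch below only reuses the commands and flags listed in the steps. The wrapper itself, its `--run` switch, and the function names are hypothetical, and it assumes each command exits on its own when its stage finishes, which this README does not state.

```python
import subprocess
import sys

# Stages in the order given in the steps above; flags are copied from the README.
PIPELINE = [
    ["--action", "preprocess"],
    ["--action", "train", "--module", "Text2Mel"],
    ["--action", "train", "--module", "SuperRes"],
]


def build_commands(python="python3", script="main.py"):
    """Return the full argument list for every stage of the pipeline."""
    return [[python, script, *args] for args in PIPELINE]


def run_pipeline(dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one stage fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "--run" is a hypothetical switch of this wrapper, not a main.py option.
    run_pipeline(dry_run="--run" not in sys.argv)
```

Run without arguments, the wrapper only prints the three commands, which is a cheap way to confirm the order before starting a long training job.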
    

Samples

Synthesized samples along with their corresponding sentences are contained in the synthesis directory. The pre-trained models for Text2Mel and SuperRes (auto-saved during training at logdir/text2mel/pkg/trained.pkg and logdir/superres/pkg/trained.pkg, respectively) will be loaded during synthesis.
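
Because synthesis depends on both checkpoints being present, it can help to verify them first. The following is a minimal sketch that assumes the two paths quoted above are relative to the repository root; the helper name `missing_checkpoints` is hypothetical.

```python
from pathlib import Path

# Checkpoint locations as quoted in this README, relative to the repository root.
CHECKPOINTS = {
    "Text2Mel": Path("logdir/text2mel/pkg/trained.pkg"),
    "SuperRes": Path("logdir/superres/pkg/trained.pkg"),
}


def missing_checkpoints(root="."):
    """Return the names of the modules whose checkpoint file is absent."""
    root = Path(root)
    return [name for name, rel in CHECKPOINTS.items() if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints for:", ", ".join(missing))
    else:
        print("Both checkpoints found; synthesis can load them.")
```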

To synthesize samples listed in sentences.txt:

python3 main.py --action synthesis

An example of the attention matrix for a specific sentence is also provided.

Pre-trained Model

The current pre-trained model was trained for 20k batches (Text2Mel) and 19k batches (SuperRes). The results are not yet fully satisfactory, and tuning the hyperparameters may improve them. You can download the pre-trained model from our Google Drive.

Dependencies

Ensure you have the following dependencies installed:

  • scipy, librosa, num2words, matplotlib
  • PyTorch == 1.8.1
  • CUDA 10.2
  • numpy
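
A quick way to compare an environment against this list is to query the installed package metadata. The sketch below uses only the standard library. It assumes the PyPI distribution names match the names listed above (with PyTorch installed as `torch`), and it does not check the CUDA version, which is not a Python package. The helper name `check_dependencies` is hypothetical.

```python
from importlib import metadata

# Distribution names as published on PyPI; PyTorch is distributed as "torch".
REQUIRED = ("scipy", "librosa", "num2words", "matplotlib", "numpy", "torch")
# Only PyTorch has a pinned version in the list above.
PINNED = {"torch": "1.8.1"}


def check_dependencies(required=REQUIRED, pinned=PINNED):
    """Return a {package: status} report for the given distributions."""
    report = {}
    for name in required:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        want = pinned.get(name)
        # Drop a local build suffix such as "+cu102" before comparing.
        if want and version.split("+")[0] != want:
            report[name] = f"found {version}, expected {want}"
        else:
            report[name] = f"ok ({version})"
    return report


if __name__ == "__main__":
    for package, status in check_dependencies().items():
        print(f"{package}: {status}")
```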

Related

For a TensorFlow implementation, refer to Kyubyong/dc_tts.

For any questions or suggestions, please contact Sajid Ahmed (sajid.ahmed1@northsouth.edu) or Arifuzzaman Arman (arifuzzaman.arman@northsouth.edu).


