GitHub - igormedeiros/voice-clone-narrator: This project is a text-to-speech (TTS) converter that uses the Coqui TTS library. It reads a text file, splits the text into chunks that do not exceed the character limit, and converts each chunk into an audio file. The project is configured to use the GPU for faster processing.


MIT license


# Text-to-Speech Converter (TTS) using Coqui TTS
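
The project summary mentions splitting the input text into chunks that do not exceed a character limit before synthesis. The snippet below is a minimal sketch of one way to do that, not the code in `main.py`: the function name `split_into_chunks`, the `max_chars` default of 200, and the preference for sentence boundaries are all assumptions made for illustration.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: prefers sentence boundaries, then word boundaries.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is broken on word boundaries.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space found: hard cut
            piece, sentence = sentence[:cut].strip(), sentence[cut:].strip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        # Append the sentence to the current chunk if it still fits.
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Cutting at sentence boundaries where possible tends to keep the intonation of each synthesized segment natural, which is why the sketch falls back to word-level cuts only for oversized sentences.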

## Requirements

- Python 3.7 or higher
- CUDA and cuDNN configured correctly to use the GPU
- ffmpeg
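
To confirm that the GPU requirement is met, a quick check through PyTorch (which Coqui TTS runs on) can help. This is a minimal sketch and not necessarily what the repository's `cuda_check.py` does; the function name `cuda_status` is an assumption.

```python
def cuda_status() -> str:
    """Return a short description of GPU availability as seen by PyTorch."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Device 0 is the first visible CUDA device.
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "CUDA not available; PyTorch would run on the CPU"


if __name__ == "__main__":
    print(cuda_status())
```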

## Installation

1. Clone this repository:
```bash
git clone https://github.com/igormedeiros/voice-clone-narrator.git
cd voice-clone-narrator
```

2. Create a virtual environment and activate it:
```bash
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
```

3. Install the dependencies:
```bash
pip install -r requirements.txt
```

4. Ensure ffmpeg is installed:
```bash
sudo apt-get install ffmpeg # For Debian/Ubuntu based systems
```
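
If you prefer to verify the ffmpeg installation from Python, something like the following works on any platform. It is only a sketch; the helper name `ffmpeg_version` is hypothetical and not part of this project.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg not found on PATH")
```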

## Usage

Run the script with the necessary parameters:

```bash
python main.py --text_file "path/to/your/textfile.txt" --output_path "output.wav" --speed 1.0 --language "pt" --sample_voice "./voices/sample-voice.wav" --thumb "path/to/your/thumbnail.jpg"
```

## Parameters

- `--text_file`: Path to the text file to be converted to speech.
- `--output_path`: Path to save the output audio file.
- `--speed`: Speech speed (optional, default is 1.0).
- `--language`: Speech language (optional, default is "pt").
- `--sample_voice`: Path to the sample voice file.
- `--thumb`: Path to the thumbnail image for MP4 output (optional).
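
For reference, the parameters above could be declared with `argparse` roughly as follows. This is a sketch based only on the list above, not the parser in `main.py`; in particular, treating `--text_file`, `--output_path`, and `--sample_voice` as required is an assumption, since the list only marks the other three as optional.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command-line parameters."""
    parser = argparse.ArgumentParser(description="Text-to-speech narrator")
    # Assumed required: the README does not mark these as optional.
    parser.add_argument("--text_file", required=True, help="Text file to convert")
    parser.add_argument("--output_path", required=True, help="Where to save the output")
    parser.add_argument("--sample_voice", required=True, help="Sample voice file")
    # Documented as optional, with the documented defaults.
    parser.add_argument("--speed", type=float, default=1.0, help="Speech speed")
    parser.add_argument("--language", default="pt", help="Speech language")
    parser.add_argument("--thumb", default=None, help="Thumbnail image for MP4 output")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```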

## Logging

Each run writes a log file whose name starts with `narrator`, followed by the date and time of execution, and ends with the `.log` extension.
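
A per-run log file of this kind can be set up with the standard `logging` module. The sketch below is illustrative only: the exact timestamp format and separator (`narrator_YYYYMMDD_HHMMSS.log`) and the helper names are assumptions, not taken from the project.

```python
import logging
from datetime import datetime
from typing import Optional


def log_filename(now: Optional[datetime] = None) -> str:
    """Build a log file name such as narrator_20240102_030405.log (assumed format)."""
    now = now or datetime.now()
    return f"narrator_{now:%Y%m%d_%H%M%S}.log"


def setup_logging() -> str:
    """Send log records to a fresh file for this run and return its name."""
    name = log_filename()
    logging.basicConfig(
        filename=name,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    return name
```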

## Output

- If `--thumb` is provided, the output will be an MP4 file with the audio and the thumbnail image as a static video.
- If `--thumb` is not provided, the output will be a WAV audio file.
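
Combining a still image with an audio track into an MP4 is a common ffmpeg task. The sketch below builds one plausible command for it; the function name `build_mp4_command` and the choice of codecs and bitrate are assumptions, and the project may invoke ffmpeg differently.

```python
from typing import List


def build_mp4_command(audio_path: str, thumb_path: str, output_path: str) -> List[str]:
    """Return an ffmpeg command that pairs a still image with an audio file."""
    return [
        "ffmpeg", "-y",                  # overwrite the output if it exists
        "-loop", "1", "-i", thumb_path,  # repeat the still image as the video stream
        "-i", audio_path,                # audio stream
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac", "-b:a", "192k",
        "-pix_fmt", "yuv420p",           # widely compatible pixel format
        "-shortest",                     # stop when the audio ends
        output_path,
    ]


if __name__ == "__main__":
    print(" ".join(build_mp4_command("output.wav", "thumbnail.jpg", "output.mp4")))
```

The list can be passed to `subprocess.run(cmd, check=True)`. Note that `libx264` with `yuv420p` expects even image dimensions, so an odd-sized thumbnail may need scaling first.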

## Project Structure

    voice-clone-narrator/
    ├── .gitignore
    ├── LICENSE
    ├── README.md
    ├── cuda_check.py
    ├── main.py
    ├── requirements.txt
    └── voices/

## Contribution

Feel free to contribute to the project! Just follow the steps below:

1. Fork the project
2. Create a new branch (`git checkout -b feature/new-feature`)
3. Commit your changes (`git commit -am 'Add new feature'`)
4. Push to the branch (`git push origin feature/new-feature`)
5. Create a new Pull Request