PiSugar/melotts.axcl

melotts.axcl

This repository is a fork of melotts.axcl, which is an implementation of the MeloTTS text-to-speech model running on the LLM8850 accelerator card.

To provide a continuous audio synthesis service, we have added a server implementation in Python that interacts with the melotts C++ binary. The server listens for incoming text requests, processes them using the melotts model, and returns the generated audio files. Because the model stays loaded between requests, the program does not have to load it for each request, which significantly improves performance when handling multiple requests.

Prerequisites

Building this project requires cmake. Make sure to install it first:

sudo apt update
sudo apt install -y cmake

Compile on Pi 5

Clone this repository and run the aarch64 build script:

cd
git clone https://github.com/PiSugar/melotts.axcl.git
cd melotts.axcl
sudo chmod +x build_aarch64.sh
./build_aarch64.sh

Download Models

Chinese Models

English Models

Japanese Models

Spanish Models

You can clone the model repositories and link them in arguments.json for easier management.

Start Server

Run bash serve.sh in the root directory of the repository. This starts the server at http://localhost:8802.

Arguments Configuration

The server uses the arguments.json file to configure the model paths and parameters. Make sure to update the paths in arguments.json to point to the correct model files you downloaded.

For example, for English models, the arguments.json should look like this:

{
  "encoder": "/home/pi/MeloTTS-English-ax650/encoder-en.onnx",
  "decoder": "/home/pi/MeloTTS-English-ax650/decoder-en-br.axmodel",
  "lexicon": "/home/pi/MeloTTS-English-ax650/lexicon-en.txt",
  "token": "/home/pi/MeloTTS-English-ax650/tokens-en.txt",
  "g": "/home/pi/MeloTTS-English-ax650/g-en-br.bin",
  "volume": "4"
}
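Since a wrong path in arguments.json is an easy mistake to make, a small check before starting the server can help. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes that every key in the example above except "volume" is a path to a file on disk.

```python
import json
import os

# Keys taken from the example above. Treating all of them as file paths
# is an assumption of this sketch; "volume" is deliberately excluded.
PATH_KEYS = ("encoder", "decoder", "lexicon", "token", "g")


def find_missing_paths(config):
    """Return {key: path} for entries that are absent or not existing files."""
    missing = {}
    for key in PATH_KEYS:
        path = config.get(key)
        if not path or not os.path.isfile(path):
            missing[key] = path
    return missing


def check_arguments_file(config_path="arguments.json"):
    """Load the JSON file and report which model paths look wrong."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    return find_missing_paths(config)
```

Calling check_arguments_file() from the repository root returns an empty dict when every listed file exists, and otherwise maps each problematic key to the path that was configured for it (or None if the key is absent).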

Request Format

The server accepts POST requests with a JSON payload containing the text to be synthesized. The request format is as follows:

curl -X POST http://localhost:8802/synthesize \
     -H "Content-Type: application/json" \
     -d '{"sentence": "hello, I'\''m a student from somewhere", "outputPath": "/path/to/output.wav"}'

If outputPath is not provided, the server will create a temporary file and delete it after returning the base64-encoded audio data.

Response:

{
  "success": true,
  "base64": "wav_file_in_base64_format"
}

The base64 field is always provided when outputPath is not given in the request body.

Error Response:

{
  "success": false,
  "error": "Error message here"
}
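The same request can be made from Python using only the standard library. This is a minimal sketch of a client, assuming the endpoint, field names, and response shapes shown above; the function names are hypothetical and not part of the repository. It also assumes that error responses arrive with a 2xx status. If the server uses an HTTP error status instead, urlopen raises urllib.error.HTTPError and the body would need to be read from that exception.

```python
import base64
import json
import urllib.request

# URL and field names come from the curl example and responses above.
SYNTHESIZE_URL = "http://localhost:8802/synthesize"


def build_request(sentence, output_path=None, url=SYNTHESIZE_URL):
    """Build the POST request; outputPath is only included when given."""
    payload = {"sentence": sentence}
    if output_path is not None:
        payload["outputPath"] = output_path
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_audio(response_body):
    """Return WAV bytes from a response body, or None if there is no base64 field.

    Raises RuntimeError when the server reports "success": false.
    """
    result = json.loads(response_body)
    if not result.get("success"):
        raise RuntimeError(result.get("error", "unknown error"))
    encoded = result.get("base64")
    return base64.b64decode(encoded) if encoded else None


def synthesize(sentence, output_path=None, timeout=60):
    """Send one request to a running server and return the decoded audio bytes."""
    request = build_request(sentence, output_path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_audio(response.read())
```

With the server running, synthesize("hello") returns the WAV data as bytes, which can then be written to a file opened in binary mode. Building the payload with json.dumps also avoids the shell quoting issues that apostrophes in the sentence cause with curl.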

If the melotts process is not running correctly, use the /restart endpoint to restart it:

curl -X POST http://localhost:8802/restart
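A client could also call /restart automatically when a request fails. The sketch below is one possible way to do that and is not taken from the repository: both function names are hypothetical, the retry policy is an assumption, and nothing is assumed about the body that /restart returns.

```python
import urllib.request

# URL taken from the curl command above.
RESTART_URL = "http://localhost:8802/restart"


def restart_server(url=RESTART_URL, timeout=30):
    """POST to /restart with an empty body and return the HTTP status code."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def call_with_restart(action, restart=restart_server, retries=1):
    """Run action(); if it raises, call restart() and try again.

    The last failure is re-raised once the retries are used up.
    """
    for attempt in range(retries + 1):
        try:
            return action()
        except Exception:
            if attempt == retries:
                raise
            restart()
```

Here action is any zero-argument callable that sends a synthesis request. Retrying on every exception is deliberately coarse; a real client would likely want to restart only on errors that indicate the melotts process itself has stopped responding.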

Run as Systemd Service

A systemd service file melotts.service is provided to run the server as a background service.

To enable and start the service, run the following command:

sudo bash startup.sh
