melotts.axcl

This repository is a fork of melotts.axcl, an implementation of the MeloTTS text-to-speech model running on the LLM8850 accelerator card.

To provide a continuous audio synthesis service, we have added a Python server that interacts with the melotts C++ binary. The server listens for incoming text requests, processes them with the melotts model, and returns the generated audio. Because the server stays running, the program does not have to load the model for each request, which significantly improves performance when handling multiple requests.

Prerequisites

Building this project requires cmake. Make sure it is installed first:

sudo apt update
sudo apt install -y cmake

Compile on Pi 5

Clone this repository and run the aarch64 build script:

cd
git clone https://github.com/PiSugar/melotts.axcl.git
cd melotts.axcl
sudo chmod +x build_aarch64.sh
./build_aarch64.sh

Download Models

You can clone the model repositories and link them in arguments.json for easier management.

Start Server

From the repository root, run bash serve.sh, which starts the server at http://localhost:8802.

Arguments Configuration

The server uses the arguments.json file to configure the model paths and parameters. Make sure to update the paths in arguments.json to point to the correct model files you downloaded.

For example, for English models, the arguments.json should look like this:

{
  "encoder": "/home/pi/MeloTTS-English-ax650/encoder-en.onnx",
  "decoder": "/home/pi/MeloTTS-English-ax650/decoder-en-br.axmodel",
  "lexicon": "/home/pi/MeloTTS-English-ax650/lexicon-en.txt",
  "token": "/home/pi/MeloTTS-English-ax650/tokens-en.txt",
  "g": "/home/pi/MeloTTS-English-ax650/g-en-br.bin",
  "volume": "4"
}
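A wrong path in arguments.json is easy to miss, so a quick check before starting the server can help. The following is a minimal sketch and not part of this repository: the helper names find_missing_paths and check_arguments_file are hypothetical, and the sketch assumes that the five path keys shown in the example above each name a file on disk.

```python
import json
import os

# Keys from the example above that are assumed to point at files on disk.
PATH_KEYS = ("encoder", "decoder", "lexicon", "token", "g")


def find_missing_paths(config):
    """Return the keys whose value is absent or does not name an existing file."""
    missing = []
    for key in PATH_KEYS:
        path = config.get(key)
        if not path or not os.path.isfile(path):
            missing.append(key)
    return missing


def check_arguments_file(config_path="arguments.json"):
    """Load the JSON config and return the path keys that fail the check."""
    with open(config_path, "r", encoding="utf-8") as f:
        return find_missing_paths(json.load(f))
```

Calling check_arguments_file() from the repository root returns an empty list when every listed model file exists.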

Request Format

The server accepts POST requests with a JSON payload containing the text to be synthesized. The request format is as follows:

curl -X POST http://localhost:8802/synthesize \
     -H "Content-Type: application/json" \
     -d '{"sentence": "hello, i'\''m a student from somewhere", "outputPath": "/path/to/output.wav"}'

If outputPath is not provided, the server creates a temporary file and deletes it after returning the base64-encoded audio data.
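The same request can be sent from Python using only the standard library. This is a sketch under stated assumptions: the function names build_synthesize_request and synthesize are hypothetical, the 60-second timeout is an arbitrary choice, and the server is assumed to reply with a JSON body as described in this section.

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8802"  # address from the Start Server section


def build_synthesize_request(sentence, output_path=None, base_url=SERVER_URL):
    """Build a POST request for /synthesize; outputPath is sent only if given."""
    payload = {"sentence": sentence}
    if output_path is not None:
        payload["outputPath"] = output_path
    return urllib.request.Request(
        base_url + "/synthesize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(sentence, output_path=None, timeout=60):
    """Send the request and return the parsed JSON reply as a dict."""
    request = build_synthesize_request(sentence, output_path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the payload with json.dumps avoids the shell-quoting problems that apostrophes in the sentence cause on the command line. Note that urlopen raises urllib.error.HTTPError if the server answers with an HTTP error status.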

Response:

{
  "success": true,
  "base64": "wav_file_in_base64_format"
}

The base64 field is always included when outputPath is omitted from the request body.

Error Response:

{
  "success": false,
  "error": "Error message here"
}
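A client can handle both response shapes with a small helper. The sketch below is illustrative only: extract_audio and save_audio are hypothetical names, and the code assumes the reply has already been parsed into a Python dict with the success, base64, and error fields shown above.

```python
import base64


def extract_audio(response):
    """Return decoded WAV bytes, or None when the reply has no base64 field.

    Raises RuntimeError carrying the server's message when success is false.
    """
    if not response.get("success"):
        raise RuntimeError(response.get("error", "unknown error"))
    encoded = response.get("base64")
    if encoded is None:
        return None
    return base64.b64decode(encoded)


def save_audio(response, path):
    """Write the decoded audio to path; return True if a file was written."""
    audio = extract_audio(response)
    if audio is None:
        return False
    with open(path, "wb") as f:
        f.write(audio)
    return True
```

For example, save_audio(reply, "speech.wav") writes the audio locally when the reply carries base64 data, and returns False when it does not.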

If the melotts process is not running correctly, use the /restart endpoint to restart it:

curl -X POST http://localhost:8802/restart
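The restart call can also be issued from Python. This is a minimal sketch: build_restart_request and restart_server are hypothetical names, and because the body of the /restart reply is not documented here, the helper returns only the HTTP status code.

```python
import urllib.request


def build_restart_request(base_url="http://localhost:8802"):
    """Build an empty-body POST request for the /restart endpoint."""
    return urllib.request.Request(base_url + "/restart", data=b"", method="POST")


def restart_server(base_url="http://localhost:8802", timeout=30):
    """Send the restart request and return the HTTP status code."""
    request = build_restart_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```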

Run as Systemd Service

A systemd service file, melotts.service, is provided to run the server as a background service.

To enable and start the service, run the following command:

sudo bash startup.sh
