xphoneBR is a library for grapheme-to-phoneme conversion for Brazilian Portuguese, based on Transformer models.
It is designed for use in production text-to-speech systems, prioritizing accuracy and efficiency.
You can choose between a forward Transformer model (trained with CTC) and its autoregressive
counterpart. The former is faster and more stable, while the latter is slightly more accurate.
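To make the forward/autoregressive trade-off concrete, here is a schematic sketch of the two decoding styles. This is illustrative only and not the xphoneBR internals: `ctc_collapse` and `autoregressive_decode` are hypothetical helpers showing why one-pass CTC decoding is faster while step-by-step generation sees more context.

```python
BLANK = "_"

def ctc_collapse(frames):
    """Forward/CTC style: collapse repeated symbols and drop blanks.
    A single pass over frame-wise predictions -> fast, non-autoregressive."""
    out = []
    prev = None
    for sym in frames:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)

def autoregressive_decode(step_fn, max_len=10, eos="<eos>"):
    """Autoregressive style: each symbol is predicted from the previously
    generated ones -> one model call per symbol, but full left context."""
    seq = []
    for _ in range(max_len):
        sym = step_fn(seq)
        if sym == eos:
            break
        seq.append(sym)
    return "".join(seq)
```

The CTC path makes one prediction per input frame in parallel, whereas the autoregressive path must call the model once per output symbol, which is the source of the speed difference noted above.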
- Grapheme-to-phoneme conversion (G2P)
- Text normalization
- BERT tokenizer
The library has been tested on Python 3.7 to 3.11.
To install the latest version of xphoneBR, use pip:

```shell
pip install xphonebr -U
```
Or, to install from source:

```shell
pip install git+https://github.com/traderpedroso/xphoneBR.git
```
Using the forward model (the default):

```python
from xphonebr import Phonemizer

phones = Phonemizer(normalizer=True)
phones.phonemise("Dra. Ana tem 45% da empresa & cia e iniciou as 8:45 de quinta feira do ano 2024 etc. ")
# output: 'dowˈtoɾə. ˈãnə ˈtẽ kwaˈɾẽtə ˈi ˈsĩkʊ ˈpox ˈsẽtʊ ˈda ẽˈpɾɛzə ˈi kõpãˈiə ˈi inisiˈow ˈas ˈoytʊ ˈɔɾəs ˈi kwaˈɾẽtə ˈi ˈsĩkʊ miˈnutʊs ˈdʒi ˈkĩtə ˈfeyɾə ˈdʊ ˈãnʊ ˈdoys ˈmiw ˈi ˈvĩtʃɪ ˈi ˈkwatɾʊ ˈit seˈteɾə.'
```
The autoregressive model is selected with `autoreg=True`:

```python
from xphonebr import Phonemizer

phones = Phonemizer(autoreg=True, normalizer=True)
phones.phonemise("Dra. Ana tem 45% da empresa & cia e iniciou as 8:45 de quinta feira do ano 2024 etc. ")
# output: 'dowˈtoɾə. ˈãnə ˈtẽ kwaˈɾẽtə ˈi ˈsĩkʊ ˈpox ˈsẽtʊ ˈda ẽˈpɾɛzə ˈi kõpãˈiə ˈi inisiˈow ˈas ˈoytʊ ˈɔɾəs ˈi kwaˈɾẽtə ˈi ˈsĩkʊ miˈnutʊs ˈdʒi ˈkĩtə ˈfeyɾə ˈdʊ ˈãnʊ ˈdoys ˈmiw ˈi ˈvĩtʃɪ ˈi ˈkwatɾʊ ˈit seˈteɾə.'
```
The normalizer can also be used on its own:

```python
from xphonebr import normalizer

normalizer("Dra. Ana tem 45% da empresa & cia e iniciou as 8:45 de quinta feira do ano 2024 etc. ")
# output: 'doutora. Ana tem quarenta e cinco por cento da empresa e companhia e iniciou as oito horas e quarenta e cinco minutos de quinta feira do ano dois mil e vinte e quatro et cetera.'
```
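The kind of expansion the normalizer performs (abbreviations, percentages, numbers to words) can be sketched in a few lines. This is a minimal illustration, not the xphonebr implementation: `normalize`, `number_to_words`, and the lookup tables are hypothetical and cover only single digits and the range 20–99.

```python
import re

UNITS = ["zero", "um", "dois", "três", "quatro", "cinco",
         "seis", "sete", "oito", "nove"]
TENS = {20: "vinte", 30: "trinta", 40: "quarenta", 50: "cinquenta",
        60: "sessenta", 70: "setenta", 80: "oitenta", 90: "noventa"}
ABBREV = {"Dra.": "doutora", "Dr.": "doutor", "etc.": "et cetera", "&": "e"}

def number_to_words(n: int) -> str:
    """Spell out 0-9 and 20-99 in Portuguese (sketch; no teens/hundreds)."""
    if n < 10:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return TENS[tens * 10]
    return f"{TENS[tens * 10]} e {UNITS[unit]}"

def normalize(text: str) -> str:
    # Expand abbreviations first, then percentages like "45%".
    for abbr, full in ABBREV.items():
        text = text.replace(abbr, full)
    return re.sub(r"(\d{2})%",
                  lambda m: number_to_words(int(m.group(1))) + " por cento",
                  text)
```

A real normalizer additionally handles times, years, ordinals, and context-dependent readings, which is why the library ships its own.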
You can easily train your own autoregressive or forward Transformer model. All necessary parameters are set in a config.yaml, which you can find under:

- configs/forward_config.yaml
- configs/autoreg_config.yaml

for the forward and autoregressive Transformer model, respectively.
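For orientation, such a config typically groups paths, model, and training settings. The fragment below is a hypothetical illustration only; the key names and values that actually apply are the ones in configs/forward_config.yaml:

```yaml
paths:
  checkpoint_dir: checkpoints   # where model checkpoints are written (assumed key)
  data_dir: datasets            # where preprocessed data is stored (assumed key)

model:
  d_model: 512                  # assumed hyperparameter names
  layers: 6

training:
  epochs: 100
  batch_size: 32
  learning_rate: 0.0001
```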
Download the pretrained model: pt_br
Inside the training script, prepare the data as tuples of (language, text, phonemes) and use the preprocess and train APIs (the second data block below is the validation set, which `preprocess` expects as `val_data`):

```python
import torch
import torch.multiprocessing as mp

from dp.preprocess import preprocess
from dp.train import train

train_data = [('pt_br', 'Dra. Ana', 'dowˈtoɾə. ˈãnə'),
              ('pt_br', 'tem 45%', 'tẽ kwaˈɾẽtə ˈi ˈsĩkʊ ˈpox ˈsẽtʊ')]

val_data = [('pt_br', 'Dra. Ana', 'dowˈtoɾə. ˈãnə'),
            ('pt_br', 'tem 45%', 'tẽ kwaˈɾẽtə ˈi ˈsĩkʊ ˈpox ˈsẽtʊ')]

config_file = 'configs/forward_config.yaml'

preprocess(config_file=config_file,
           train_data=train_data,
           val_data=val_data,
           deduplicate_train_data=False)

# Use all available GPUs via multiprocessing, or fall back to a single process.
num_gpus = torch.cuda.device_count()

if num_gpus > 1:
    mp.spawn(train, nprocs=num_gpus, args=(num_gpus, config_file))
else:
    train(rank=0, num_gpus=num_gpus, config_file=config_file)
```
Model checkpoints will be stored in the checkpoint path specified in the config.yaml.

To load a custom model and run inference with it:
```python
from xphonebr import Phonemizer

phones = Phonemizer(autoreg=True, normalizer=True, custom_model="path/to/custom_model")
phones.phonemise("Dra. Ana tem 45% da empresa & cia e iniciou as 8:45 de quinta feira do ano 2024 etc. ")
# output: 'dowˈtoɾə. ˈãnə ˈtẽ kwaˈɾẽtə ˈi ˈsĩkʊ ˈpox ˈsẽtʊ ˈda ẽˈpɾɛzə ˈi kõpãˈiə ˈi inisiˈow ˈas ˈoytʊ ˈɔɾəs ˈi kwaˈɾẽtə ˈi ˈsĩkʊ miˈnutʊs ˈdʒi ˈkĩtə ˈfeyɾə ˈdʊ ˈãnʊ ˈdoys ˈmiw ˈi ˈvĩtʃɪ ˈi ˈkwatɾʊ ˈit seˈteɾə.'
```
We welcome any contribution to xphoneBR. Here are some ways to contribute:
- Report issues or suggest improvements by opening an issue.
- Contribute code to fix issues or add features via a pull request.

Before submitting a pull request, please make sure your code is well formatted and tested.
I would like to express my gratitude to @as-ideas for creating the initial project. Their work has been an invaluable starting point for my modifications and improvements for Brazilian Portuguese.