
Merge pull request #12 from ionite34/dev
Fixes and performance improvements
ionite34 authored May 18, 2022
2 parents 26ca087 + 4edefd6 commit ed180c4
Showing 19 changed files with 228 additions and 316 deletions.
4 changes: 1 addition & 3 deletions .github/codecov.yml
@@ -23,6 +23,4 @@ comment:

ignore:
- "tests/**"
- "test_*.py"
- "**/__main__.py"
- "**/__init__.py"
- "test_*.py"
63 changes: 47 additions & 16 deletions README.md
@@ -10,19 +10,50 @@

### Augmented Recurrent Neural G2P with Inflectional Orthography

Grapheme-to-phoneme (G2P) conversion is the process of converting the written form of words (Graphemes) to their
pronunciations (Phonemes). Deep learning models for text-to-speech (TTS) synthesis using phoneme / mixed symbols
typically require a G2P conversion method for both training and inference.

Aquila Resolve presents a new approach for accurate and efficient English G2P resolution.
Input text graphemes are translated into their phonetic pronunciations,
using [ARPAbet](https://wikipedia.org/wiki/ARPABET) as the [phoneme symbol set](#Symbol-Set).
Aquila Resolve presents a new approach for accurate and efficient English to
[ARPAbet](https://wikipedia.org/wiki/ARPABET) G2P resolution.
The pipeline employs a context layer, multiple transformer and n-gram morpho-orthographical search layers,
and an autoregressive recurrent neural transformer base.

The current implementation offers state-of-the-art accuracy for out-of-vocabulary (OOV) words, as well as contextual
and an autoregressive recurrent neural transformer base. The current implementation offers state-of-the-art accuracy for out-of-vocabulary (OOV) words, as well as contextual
analysis for correct inferencing of [English Heteronyms](https://en.wikipedia.org/wiki/Heteronym_(linguistics)).

The package is offered in a pre-trained state that is ready for use as a dependency or in
notebook environments. There are no additional resources needed, other than the model checkpoint which is
automatically downloaded on the first usage. See [Installation](#Installation) for more information.
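
A minimal setup sketch for the examples below (assumes the package is installed and the checkpoint can be fetched on first use; `G2p` is re-exported from the package root):

```python
from Aquila_Resolve import G2p

g2p = G2p()  # the model checkpoint is downloaded on first instantiation if not already present
```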

### 1. Dynamic Word Mappings based on context:

```pycon
g2p.convert('I read the book, did you read it?')
# >> '{AY1} {R EH1 D} {DH AH0} {B UH1 K}, {D IH1 D} {Y UW1} {R IY1 D} {IH1 T}?'
```
```pycon
g2p.convert('The researcher was to subject the subject to a test.')
# >> '{DH AH0} {R IY1 S ER0 CH ER0} {W AA1 Z} {T UW1} {S AH0 B JH EH1 K T} {DH AH0} {S AH1 B JH IH0 K T} {T UW1} {AH0} {T EH1 S T}.'
```

| | 'The subject was told to read. Eight records were read in total.' |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|
| *Ground Truth* | The `S AH1 B JH IH0 K T` was told to `R IY1 D`. Eight `R EH1 K ER0 D Z` were `R EH1 D` in total. |
| Aquila Resolve | The `S AH1 B JH IH0 K T` was told to `R IY1 D`. Eight `R EH1 K ER0 D Z` were `R EH1 D` in total. |
| [Deep Phonemizer](https://github.com/as-ideas/DeepPhonemizer)<br/>([en_us_cmudict_forward.pt](https://github.com/as-ideas/DeepPhonemizer#pretrained-models)) | The **S AH B JH EH K T** was told to **R EH D**. Eight **R AH K AO R D Z** were `R EH D` in total. |
| [CMUSphinx Seq2Seq](https://github.com/cmusphinx/g2p-seq2seq)<br/>([checkpoint](https://github.com/cmusphinx/g2p-seq2seq#running-g2p)) | The `S AH1 B JH IH0 K T` was told to `R IY1 D`. Eight **R IH0 K AO1 R D Z** were **R IY1 D** in total. |
| [ESpeakNG](https://github.com/espeak-ng/espeak-ng) <br/> (with [phonecodes](https://github.com/jhasegaw/phonecodes)) | The **S AH1 B JH EH K T** was told to `R IY1 D`. Eight `R EH1 K ER0 D Z` were **R IY1 D** in total. |

### 2. Leading Accuracy for unseen words:

```pycon
g2p.convert('Did you kalpe the Hevinet?')
# >> (predicted ARPAbet phonemes for the out-of-vocabulary words 'kalpe' and 'Hevinet')
```

| | "tensorflow" | "agglomerative" | "necrophages" |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------|------------------------------------|----------------------------------|
| Aquila Resolve | `T EH1 N S ER0 F L OW2` | `AH0 G L AA1 M ER0 EY2 T IH0 V` | `N EH1 K R OW0 F EY2 JH IH0 Z` |
| [Deep Phonemizer](https://github.com/as-ideas/DeepPhonemizer)<br/>([en_us_cmudict_forward.pt](https://github.com/as-ideas/DeepPhonemizer#pretrained-models)) | `T EH N S ER F L OW` | **AH G L AA M ER AH T IH V** | `N EH K R OW F EY JH IH Z` |
| [CMUSphinx Seq2Seq](https://github.com/cmusphinx/g2p-seq2seq)<br/>([checkpoint](https://github.com/cmusphinx/g2p-seq2seq#running-g2p)) | **T EH1 N S ER0 L OW0 F** | **AH0 G L AA1 M ER0 T IH0 V** | **N AE1 K R AH0 F IH0 JH IH0 Z** |
| [ESpeakNG](https://github.com/espeak-ng/espeak-ng) <br/> (with [phonecodes](https://github.com/jhasegaw/phonecodes)) | **T EH1 N S OW0 R F L OW2** | **AA G L AA1 M ER0 R AH0 T IH2 V** | **N EH1 K R AH0 F IH JH EH0 Z** |


## Installation

```bash
pip install aquila-resolve
```

@@ -32,8 +63,8 @@
> automatically downloaded on the first use of relevant public methods that require inferencing. For example,
> when [instantiating `G2p`](#Usage). You can also start this download manually by calling `Aquila_Resolve.download()`.
>
> If you are in an environment where remote file downloads are not possible, you can also download the checkpoint
> manually and instantiate `G2p` with the flag: `G2p(custom_checkpoint='path/model.pt')`
> If you are in an environment where remote file downloads are not possible, you can also transfer the checkpoint
> manually, placing `model.pt` within the `Aquila_Resolve.data` module folder.
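
A short sketch of pre-fetching the checkpoint manually, per the note above (`download` is re-exported from the package root):

```python
from Aquila_Resolve import download

# Fetch model.pt ahead of time, e.g. before moving to an offline environment
download()
```
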
## Usage

@@ -48,10 +79,10 @@ g2p.convert('The book costs $5, will you read it?')

> Additional optional parameters are available when defining a `G2p` instance:
| Parameter | Default | Description |
|--------------------|----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `device` | `'cpu'` | Device for Pytorch inference model |
| `process_numbers` | `True` | Toggles conversion of some numbers and symbols to their spoken pronunciation forms. See [numbers.py](src/Aquila_Resolve/text/numbers.py) for details on what is covered. |
| Parameter | Default | Description |
|-------------------|---------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `device` | `'cpu'` | Device for Pytorch inference model. GPU is supported using `'cuda'` |
| `process_numbers` | `True` | Toggles conversion of some numbers and symbols to their spoken pronunciation forms. See [numbers.py](src/Aquila_Resolve/text/numbers.py) for details on what is covered. |
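
A hedged sketch of these optional parameters in use (GPU inference via `'cuda'` assumes a CUDA-enabled PyTorch install):

```python
from Aquila_Resolve import G2p

# GPU inference, with number-to-word conversion disabled
g2p = G2p(device='cuda', process_numbers=False)
```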

## Model Architecture

4 changes: 2 additions & 2 deletions setup.cfg
@@ -1,12 +1,12 @@
[metadata]
name = Aquila-Resolve
version = 0.1.2-dev1
version = 0.1.2
author = ionite
author_email = dev@ionite.io
description = Augmented Recurrent Neural Grapheme-to-Phoneme conversion with Inflectional Orthography.
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/ionite34/Aquila-Resolve'
url = https://github.com/ionite34/Aquila-Resolve
license = Apache 2.0
license_file = LICENSE
classifiers =
2 changes: 1 addition & 1 deletion src/Aquila_Resolve/__init__.py
@@ -4,7 +4,7 @@
Grapheme to Phoneme Resolver
"""
__version__ = "0.1.2-dev1"
__version__ = "0.1.2"

from .g2p import G2p
from .data.remote import download
2 changes: 1 addition & 1 deletion src/Aquila_Resolve/data/__init__.py
@@ -2,7 +2,7 @@

if sys.version_info < (3, 9):
# In Python versions below 3.9, this is needed
from importlib_resources import files
from importlib_resources import files # pragma: no cover
else:
# Since python 3.9+, importlib.resources.files is built-in
from importlib.resources import files
13 changes: 4 additions & 9 deletions src/Aquila_Resolve/g2p.py
@@ -9,15 +9,14 @@
from nltk.stem.snowball import SnowballStemmer

from .h2p import H2p
from .h2p import replace_first
from .text.replace import replace_first
from .format_ph import with_cb
# from .dict_reader import DictReader
from .static_dict import get_cmudict
from .text.numbers import normalize_numbers
from .filter import filter_text
from .processors import Processor
from .infer import Infer
from .symbols import contains_alpha, brackets_match
from .symbols import contains_alpha, valid_braces

re_digit = re.compile(r"\((\d+)\)")
re_bracket_with_digit = re.compile(r"\(.*\)")
@@ -143,13 +142,9 @@ def convert(self, text: str, convert_num: bool = True) -> str | None:
:param convert_num: True to convert numbers to words
"""

# Check that every {} bracket is paired
check = brackets_match(text)
if check is not None:
raise ValueError(check)

# Normalize numbers, if enabled
# Convert numbers, if enabled
if convert_num:
valid_braces(text, raise_on_invalid=True)
text = normalize_numbers(text)

# Filter and Tokenize
23 changes: 7 additions & 16 deletions src/Aquila_Resolve/h2p.py
@@ -1,27 +1,18 @@
import nltk
import re
from nltk.tokenize import TweetTokenizer
from nltk import pos_tag
from nltk import pos_tag_sents
from .dictionary import Dictionary
from .filter import filter_text as ft
from .text.replace import replace_first
from . import format_ph as ph

# Check that the nltk data is downloaded, if not, download it
# Check required nltk data exists, if not, download it
try:
nltk.data.find('taggers/averaged_perceptron_tagger.zip')
except LookupError:
nltk.download('averaged_perceptron_tagger')


# Method to use Regex to replace the first instance of a word with its phonemes
def replace_first(target, replacement, text):
# Skip if target invalid
if target is None or target == '':
return text
# Replace the first instance of a word with its phonemes
# return re.sub(r'(?i)\b' + target + r'\b', replacement, text, 1)
return re.sub(r'(?<!\{)\b' + target + r'\b(?![\w\s]*[}])', replacement, text, count=1, flags=re.IGNORECASE)
from nltk.data import find
find('taggers/averaged_perceptron_tagger.zip')
except LookupError: # pragma: no cover
from nltk.downloader import download
download('averaged_perceptron_tagger', raise_on_error=True)


class H2p:
6 changes: 3 additions & 3 deletions src/Aquila_Resolve/infer.py
@@ -17,13 +17,13 @@ def __init__(self, device='cpu'):
self.lang = 'en_us'
self.batch_size = 32

def __call__(self, words: list[str]) -> list[str]:
def __call__(self, text: list[str]) -> list[str]:
"""
Infers phonemes for a list of words.
:param words: list of words
:param text: list of words
:return: list of phoneme strings, one per input word
"""
res = self.model.phonemise_list(words, lang=self.lang, batch_size=self.batch_size).phonemes
res = self.model.phonemise_list(text, lang=self.lang, batch_size=self.batch_size).phonemes
# Replace all occurrences of '][' with spaces, remove remaining brackets
res = [r.replace('][', ' ').replace('[', '').replace(']', '') for r in res]
return res
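
A hedged call sketch for this internal wrapper (not part of the documented public API; it only illustrates the list-in, list-out contract of `__call__`):

```python
from Aquila_Resolve.infer import Infer

infer = Infer(device='cpu')
phonemes = infer(['tensorflow', 'necrophages'])  # one ARPAbet string per input word
```
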
2 changes: 1 addition & 1 deletion src/Aquila_Resolve/models/__init__.py
@@ -2,7 +2,7 @@

if sys.version_info < (3, 9):
# In Python versions below 3.9, this is needed
from importlib_resources import files
from importlib_resources import files # pragma: no cover
else:
# Since python 3.9+, importlib.resources.files is built-in
from importlib.resources import files
138 changes: 6 additions & 132 deletions src/Aquila_Resolve/models/dp/model/model.py
@@ -4,8 +4,7 @@

import torch
import torch.nn as nn
from torch.nn import TransformerEncoderLayer, LayerNorm, TransformerEncoder
from .utils import get_dedup_tokens, _make_len_mask, _generate_square_subsequent_mask, PositionalEncoding
from .utils import _make_len_mask, _generate_square_subsequent_mask, PositionalEncoding
from ..preprocessing.text import Preprocessor


@@ -17,7 +16,7 @@ def is_autoregressive(self) -> bool:
"""
Returns: bool: Whether the model is autoregressive.
"""
return self in {ModelType.AUTOREG_TRANSFORMER}
return self in {ModelType.AUTOREG_TRANSFORMER} # pragma: no cover


class Model(torch.nn.Module, ABC):
@@ -39,91 +38,7 @@ def generate(self, batch: Dict[str, torch.Tensor]) -> Tuple[torch.Tensor, torch.
Tuple[torch.Tensor, torch.Tensor]: The predictions. The first element is a tensor (phoneme tokens)
and the second element is a tensor (phoneme token probabilities)
"""
pass


class ForwardTransformer(Model):

def __init__(self,
encoder_vocab_size: int,
decoder_vocab_size: int,
d_model=512,
d_fft=1024,
layers=4,
dropout=0.1,
heads=1) -> None:
super().__init__()

self.d_model = d_model

self.embedding = nn.Embedding(encoder_vocab_size, d_model)
self.pos_encoder = PositionalEncoding(d_model, dropout)

encoder_layer = TransformerEncoderLayer(d_model=d_model,
nhead=heads,
dim_feedforward=d_fft,
dropout=dropout,
activation='relu')
encoder_norm = LayerNorm(d_model)
self.encoder = TransformerEncoder(encoder_layer=encoder_layer,
num_layers=layers,
norm=encoder_norm)

self.fc_out = nn.Linear(d_model, decoder_vocab_size)

def forward(self,
batch: Dict[str, torch.Tensor]) -> torch.Tensor: # shape: [N, T]
"""
Forward pass of the model on a data batch.
Args:
batch (Dict[str, torch.Tensor]): Input batch entry 'text' (text tensor).
Returns:
Tensor: Predictions.
"""

x = batch['text']
x = x.transpose(0, 1) # shape: [T, N]
src_pad_mask = _make_len_mask(x).to(x.device)
x = self.embedding(x)
x = self.pos_encoder(x)
x = self.encoder(x, src_key_padding_mask=src_pad_mask)
x = self.fc_out(x)
x = x.transpose(0, 1)
return x

@torch.jit.export
def generate(self,
batch: Dict[str, torch.Tensor]) -> Tuple[torch.Tensor, torch.Tensor]:
"""
Inference pass on a batch of tokenized texts.
Args:
batch (Dict[str, torch.Tensor]): Input batch with entry 'text' (text tensor).
Returns:
Tuple: The first element is a Tensor (phoneme tokens) and the second element
is a tensor (phoneme token probabilities).
"""

with torch.no_grad():
x = self.forward(batch)
tokens, logits = get_dedup_tokens(x)
return tokens, logits

@classmethod
def from_config(cls, config: dict) -> 'ForwardTransformer':
preprocessor = Preprocessor.from_config(config)
return ForwardTransformer(
encoder_vocab_size=preprocessor.text_tokenizer.vocab_size,
decoder_vocab_size=preprocessor.phoneme_tokenizer.vocab_size,
d_model=config['model']['d_model'],
d_fft=config['model']['d_fft'],
layers=config['model']['layers'],
dropout=config['model']['dropout'],
heads=config['model']['heads']
)
pass # pragma: no cover


class AutoregressiveTransformer(Model):
@@ -151,42 +66,6 @@ def __init__(self,
dropout=dropout, activation='relu')
self.fc_out = nn.Linear(d_model, decoder_vocab_size)

def forward(self, batch: Dict[str, torch.Tensor]): # shape: [N, T]
"""
Forward pass of the model on a data batch.
Args:
batch (Dict[str, torch.Tensor]): Input batch with entries 'text' (text tensor) and 'phonemes'
(phoneme tensor for teacher forcing).
Returns:
Tensor: Predictions.
"""

src = batch['text']
trg = batch['phonemes'][:, :-1]

src = src.transpose(0, 1) # shape: [T, N]
trg = trg.transpose(0, 1)

trg_mask = _generate_square_subsequent_mask(len(trg)).to(trg.device)

src_pad_mask = _make_len_mask(src).to(trg.device)
trg_pad_mask = _make_len_mask(trg).to(trg.device)

src = self.encoder(src)
src = self.pos_encoder(src)

trg = self.decoder(trg)
trg = self.pos_decoder(trg)

output = self.transformer(src, trg, src_mask=None, tgt_mask=trg_mask,
memory_mask=None, src_key_padding_mask=src_pad_mask,
tgt_key_padding_mask=trg_pad_mask, memory_key_padding_mask=src_pad_mask)
output = self.fc_out(output)
output = output.transpose(0, 1)
return output

@torch.jit.export
def generate(self,
batch: Dict[str, torch.Tensor],
@@ -278,15 +157,10 @@ def create_model(model_type: ModelType, config: Dict[str, Any]) -> Model:
Returns: Model: Model object.
"""

if model_type is ModelType.TRANSFORMER:
model = ForwardTransformer.from_config(config)
elif model_type is ModelType.AUTOREG_TRANSFORMER:
model = AutoregressiveTransformer.from_config(config)
else:
if model_type is not ModelType.AUTOREG_TRANSFORMER: # pragma: no cover
raise ValueError(f'Unsupported model type: {model_type}. '
f'Supported types: {[t.value for t in ModelType]}')
return model
'Supported type: AUTOREG_TRANSFORMER')
return AutoregressiveTransformer.from_config(config)


def load_checkpoint(checkpoint_path: str, device: str = 'cpu') -> Tuple[Model, Dict[str, Any]]: