PeechApp
diff --git a/README.md b/README.md
Lines changed: 11 additions & 68 deletions
diff --git a/docs-md/review.md b/docs-md/review.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/assets/1-Figure1-1.png b/docs/assets/1-Figure1-1.png
28.6 KB
diff --git a/docs/assets/143lgCTyM5cTTABjC2VEHdA.webp b/docs/assets/143lgCTyM5cTTABjC2VEHdA.webp
28.5 KB
diff --git a/docs/assets/20240514174225.png b/docs/assets/20240514174225.png
56.2 KB
diff --git a/docs/assets/20240517155909.png b/docs/assets/20240517155909.png
34.1 KB
diff --git a/docs/assets/20240520150027.png b/docs/assets/20240520150027.png
76.1 KB
diff --git a/docs/assets/20240524164947.png b/docs/assets/20240524164947.png
20.5 KB
diff --git a/docs/assets/20240527121619.png b/docs/assets/20240527121619.png
104 KB
diff --git a/docs/assets/20240527161523.png b/docs/assets/20240527161523.png
117 KB
diff --git a/docs/assets/AR_Model.png b/docs/assets/AR_Model.png
11.3 KB
diff --git a/docs/assets/NAR_schema.png b/docs/assets/NAR_schema.png
6.91 KB
diff --git a/docs/assets/audio-animation.gif b/docs/assets/audio-animation.gif
3.9 MB
diff --git a/docs/assets/mel_loss.png b/docs/assets/mel_loss.png
47.9 KB
diff --git a/docs/assets/total_loss.png b/docs/assets/total_loss.png
50.2 KB
diff --git a/docs/review.md b/docs/review.md
Lines changed: 744 additions & 0 deletions
diff --git a/models/config/configs.py b/models/config/configs.py
Lines changed: 1 addition & 26 deletions
diff --git a/models/tts/delightful_tts/acoustic_model/acoustic_model.py b/models/tts/delightful_tts/acoustic_model/acoustic_model.py
Lines changed: 0 additions & 2 deletions
diff --git a/models/tts/delightful_tts/train/train.py b/models/tts/delightful_tts/train/train.py
Lines changed: 1 addition & 2 deletions
diff --git a/delightful-hifi.py b/notebooks/delightful-hifi.py
rename from delightful-hifi.py
rename to notebooks/delightful-hifi.py
diff --git a/web_server.py b/server/web_server.py
rename from web_server.py
rename to server/web_server.py
diff --git a/train.py b/train.py
Lines changed: 100 additions & 0 deletions
@@ -1,53 +1,23 @@
 # TTS-Framework
-Modified version of DelightfulTTS and UnivNet
-
-### Conda env
-
-Create / activate env
-
-```
-conda create --name tts_framework python=3.11
-conda activate tts_framework
-```
-
-Export / import env
-
-```
-conda env export > environment.yml
 
-```
-
-By default, conda will export your environment with builds, but builds can be platform-specific.
-A solution that worked for me is to use the `--no-build` flag:
-
-```
-conda env export --no-build > environment.yml
-```
+Modified version of DelightfulTTS and UnivNet
 
-Create an env
-```
-conda env create -f environment.yml
-```
+## Install deps
 
-If you have troubles with export, like:
-```
-InvalidVersionSpec: Invalid version '3.0<3.3': invalid character(s)                                                           
+```bash
+sudo apt install ffmpeg libasound2-dev build-essential espeak-ng -y
 ```
 
-Find a problem by this way:
+Create env from the `environment.yml` file:
 
-```
-cd /mnt/Data/anaconda3/envs/tts_framework/lib/python3.11/site-packages/
-
-grep -Rnw . -e "3.0<3.3"
+```bash
+conda env create -f ./tts-framework/environment.yml
 
+# After the setup
+conda activate tts_framework
 ```
 
-A Faster Solver for Conda: [Libmamba](https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community)
-
-
-Generate docs:
-
+## Generate docs
 
 ```
 # live preview server
@@ -57,35 +27,8 @@ mkdocs serve
 mkdocs build
 ```
 
-Test cases:
+## Test cases
 
 ```
 python -m unittest discover -v
 ```
-
-### [Libmamba solver](https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community):
-
-```
-conda update -n base conda
-```
-
-And then:
-
-```
-conda install -n base conda-libmamba-solver
-conda config --set solver libmamba
-```
-
-### Env Installation process
-
-Install separately
-
-```
-# First - pytorch
-# conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch-nightly -c nvidia
-
-pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
-
-# Second - lightning
-pip3 install lightning
-```
@@ -701,9 +701,9 @@ Additionally, you can explore the dataset code I prepared for various experiment
 
 I firmly believe that a good project starts with comprehensive documentation, and good code is built upon a solid foundation of test cases. With this in mind, I have made concerted efforts to maintain consistent documentation and ensure thorough test coverage for my code. The repository serves as a comprehensive resource where you can explore the implementation details, review the documentation, and examine the test cases that ensure the code's reliability and correctness.
 
-You can find all the documentation markdown inside the `docs-md` directory, run the docs locally with `mkdocs serve`
+You can find all the documentation inside the `docs` directory. Run the docs locally with `mkdocs serve`.
 
-Also here you can the [docs online](https://storage.googleapis.com/tts-docs/index.html)
+You can also read the [docs online](https://peechapp.github.io/tts-peech/).
 
 ### Acoustic model
 
 
@@ -1,8 +1,6 @@
 from dataclasses import dataclass, field
 from typing import List, Literal, Tuple, Union
 
-import torch
-
 PreprocessLangType = Literal["english_only", "multilingual"]
 
 
@@ -22,15 +20,10 @@ class PreprocessingConfig:
     language: PreprocessLangType
     stft: STFTConfig
     sampling_rate: int = 22050
-    val_size: float = 0.05
     min_seconds: float = 0.5
     max_seconds: float = 6.0
     use_audio_normalization: bool = True
     workers: int = 8
-    forced_alignment_batch_size: int = 200000
-    skip_on_error: bool = True
-    pitch_fmin: int = 1
-    pitch_fmax: int = 640
 
 
 @dataclass
@@ -61,10 +54,7 @@ class PreprocessingConfigHifiGAN(PreprocessingConfig):
     )
 
     def __post_init__(self):
-        r"""Post-initialization method for the `PreprocessingConfig` dataclass.
-
-        This method is automatically called after the instance is initialized.
-        It modifies the 'stft' attribute based on the 'sampling_rate' attribute.
+        r"""It modifies the 'stft' attribute based on the 'sampling_rate' attribute.
         If 'sampling_rate' is 44100, 'stft' is set with specific values for this rate.
         If 'sampling_rate' is not 22050 or 44100, a ValueError is raised.
 
@@ -84,21 +74,6 @@ def __post_init__(self):
             raise ValueError("Sampling rate must be 22050 or 44100")
 
 
-@dataclass
-class SampleSplittingRunConfig:
-    workers: int
-    device: torch.device
-    skip_on_error: bool
-    forced_alignment_batch_size: int
-
-
-@dataclass
-class CleaningRunConfig:
-    workers: int
-    device: torch.device
-    skip_on_error: bool
-
-
 @dataclass
 class AcousticTrainingOptimizerConfig:
     learning_rate: float
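
Note on the `__post_init__` hunk above: the trimmed docstring no longer says that dataclasses call `__post_init__` automatically right after the generated `__init__`. The sketch below illustrates that pattern under stated assumptions. `ToySTFT`, `ToyPreprocessingConfig`, and every field value are hypothetical stand-ins, not the repository's `STFTConfig` or `PreprocessingConfigHifiGAN`.

```python
from dataclasses import dataclass, field


@dataclass
class ToySTFT:
    # Hypothetical fields and values, chosen only for illustration.
    filter_length: int = 1024
    hop_length: int = 256


@dataclass
class ToyPreprocessingConfig:
    sampling_rate: int = 22050
    stft: ToySTFT = field(default_factory=ToySTFT)

    def __post_init__(self) -> None:
        # The dataclass machinery calls this right after the generated __init__.
        if self.sampling_rate == 44100:
            # Swap in settings meant for the higher rate (assumed values).
            self.stft = ToySTFT(filter_length=2048, hop_length=512)
        elif self.sampling_rate != 22050:
            raise ValueError("Sampling rate must be 22050 or 44100")


default_cfg = ToyPreprocessingConfig()
hi_rate_cfg = ToyPreprocessingConfig(sampling_rate=44100)
```

Validating in `__post_init__` means an unsupported rate fails at construction time rather than later in preprocessing.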
 
@@ -237,8 +237,6 @@ def freeze_params(self) -> None:
         for par in self.parameters():
             par.requires_grad = False
         self.speaker_embed.requires_grad = True
-        # NOTE: requires_grad prop
-        # self.pitch_adaptor.pitch_embedding.embeddings.requires_grad = True
 
     # NOTE: freeze/unfreeze params changed, because of the conflict with the lightning module
     def unfreeze_params(self, freeze_text_embed: bool, freeze_lang_embed: bool) -> None:
 
@@ -9,7 +9,7 @@
 from lightning.pytorch.tuner.tuning import Tuner
 import torch
 
-from models.tts.delightful_tts.delightful_tts_refined import DelightfulTTS
+from models.tts.delightful_tts.delightful_tts import DelightfulTTS
 
 # Node rank in the cluster
 node_rank = 0
@@ -87,7 +87,6 @@
         # NOTE: Preload the cached dataset into the RAM
         cache_dir="/dev/shm/",
         cache=True,
-        mem_cache=False,
     )
 
     trainer.fit(
 
@@ -0,0 +1,100 @@
+from datetime import datetime
+import logging
+import os
+import sys
+
+from lightning.pytorch import Trainer
+from lightning.pytorch.accelerators import find_usable_cuda_devices  # type: ignore
+from lightning.pytorch.strategies import DDPStrategy
+from lightning.pytorch.tuner.tuning import Tuner
+import torch
+
+from models.config import PreprocessingConfigUnivNet as PreprocessingConfig
+from models.tts.delightful_tts.delightful_tts import DelightfulTTS
+
+# Number of nodes in the cluster
+num_nodes = 1
+# Node rank in the cluster
+node_rank = 0
+
+os.environ["WORLD_SIZE"] = f"{num_nodes}"
+os.environ["NODE_RANK"] = f"{node_rank}"
+
+# IP/Port of the master node
+os.environ["MASTER_PORT"] = "12355"
+os.environ["MASTER_ADDR"] = "10.148.0.6"
+
+# Create a logger
+# Set the level of the logger to ERROR
+logger = logging.getLogger("my_logger")
+logger.setLevel(logging.ERROR)
+
+# Format the current date and time as a string
+timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+
+# Create a file handler that logs error messages to a file with the current timestamp in its name
+handler = logging.FileHandler(f"logs/error_{timestamp}.log")
+
+# Create a formatter and add it to the handler
+formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
+handler.setFormatter(formatter)
+
+# Add the handler to the logger
+logger.addHandler(handler)
+
+print("usable_cuda_devices: ", find_usable_cuda_devices())
+
+# Allow float32 matrix multiplications to use faster reduced-precision kernels (e.g. TF32), trading some precision for training speed
+torch.set_float32_matmul_precision("high")
+
+# Set the logs dir and the checkpoint paths
+default_root_dir = "logs"
+ckpt_acoustic = "./checkpoints/epoch=301-step=124630.ckpt"
+ckpt_vocoder = "./checkpoints/vocoder.ckpt"
+
+try:
+    trainer = Trainer(
+        accelerator="cuda",
+        devices=-1,
+        num_nodes=num_nodes,
+        strategy=DDPStrategy(
+            gradient_as_bucket_view=True,
+            find_unused_parameters=True,
+        ),
+        # Save checkpoints to the `default_root_dir` directory
+        default_root_dir=default_root_dir,
+        enable_checkpointing=True,
+        accumulate_grad_batches=5,
+        max_epochs=-1,
+        log_every_n_steps=10,
+        gradient_clip_val=0.5,
+    )
+
+    preprocessing_config = PreprocessingConfig("multilingual")
+    model = DelightfulTTS(preprocessing_config)
+    # NOTE: Load the model from the checkpoint file
+    # In case of loading the model from the checkpoint file, model states will be restored
+    # from the checkpoint file but the training states will be reset
+    # model = DelightfulTTS.load_from_checkpoint(ckpt_acoustic, strict=False)
+
+    tuner = Tuner(trainer)
+    # NOTE: Tune the learning rate of the model if needed
+    # tuner.lr_find(model)
+
+    train_dataloader = model.train_dataloader(
+        # NOTE: Preload the cached dataset into the RAM
+        cache_dir="/dev/shm/",
+        cache=True,
+    )
+
+    trainer.fit(
+        model=model,
+        train_dataloaders=train_dataloader,
+        # Resume training states from the checkpoint file
+        # ckpt_path=ckpt_acoustic,
+    )
+
+except Exception as e:
+    # Log the error message
+    logger.error(f"An error occurred: {e}")
+    sys.exit(1)
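
Note on the logging setup in `train.py`: `logging.FileHandler` opens its file immediately and does not create missing parent directories, so the script as written would fail on a fresh checkout without a `logs/` directory. A minimal sketch of one way to guard against that follows. `build_error_logger` and the `sketch_logger` name are hypothetical helpers for illustration, not part of the repository.

```python
import logging
import os
import tempfile
from datetime import datetime


def build_error_logger(log_dir: str, name: str = "my_logger") -> logging.Logger:
    """Return a logger that writes ERROR-level records to a timestamped file."""
    # Create the directory first, because FileHandler will not do it.
    os.makedirs(log_dir, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    handler = logging.FileHandler(os.path.join(log_dir, f"error_{timestamp}.log"))
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"),
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.ERROR)
    logger.addHandler(handler)
    return logger


# Demo in a temporary location so the sketch does not touch a real logs dir.
log_dir = os.path.join(tempfile.mkdtemp(), "logs")
logger = build_error_logger(log_dir, name="sketch_logger")
try:
    raise RuntimeError("boom")
except Exception:
    # logger.exception logs at ERROR level and appends the traceback.
    logger.exception("An error occurred")
```

Using `logger.exception` inside the `except` block keeps the full traceback in the log file, which an f-string of the exception message alone would drop.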