add structure and references

KdaiP · Apr 2, 2024 · 327379e
1 parent c3ab841

Showing 3 changed files with 18 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -46,6 +46,20 @@ Feel free to explore and modify settings in `config.py` to modify the hyperparam
 | StableTTS | text to mel | Model is currently in training...|
 | Vocos | mel to wav | [🤗](https://huggingface.co/KdaiP/StableTTS/blob/main/vocos.pt)|
 
+## Model structure
+
+<div align="center">
+
+<p style="text-align: center;">
+  <img src="./figures/structure.jpg" height="512"/>
+</p>
+
+</div>
+
+- We use the Diffusion Convolution Transformer block from [Hierspeech++](https://github.com/sh-lee-prml/HierSpeechpp), which combines the original [DiT](https://github.com/sh-lee-prml/HierSpeechpp) with the [FFT](https://arxiv.org/pdf/1905.09263.pdf) block (the Feed-Forward Transformer from FastSpeech) for better prosody.
+
+- In the flow-matching decoder, we add a [FiLM layer](https://arxiv.org/abs/1709.07871) before the DiT block to condition the model on the timestep embedding.
+
 ## References
 
 The development of our models heavily relies on insights and code from various projects. We express our heartfelt thanks to the creators of the following:
@@ -58,7 +72,7 @@ The development of our models heavily relies on insights and code from various p
 
 [Stable Diffusion 3](https://stability.ai/news/stable-diffusion-3): Idea of combining flow-matching and DiT.
 
-[Vits](https://github.com/jaywalnut310/vits): Code style and MAS insights.
+[Vits](https://github.com/jaywalnut310/vits): Code style and MAS insights, DistributedBucketSampler.
 
 ### Additional References:
 
@@ -70,6 +84,8 @@ The development of our models heavily relies on insights and code from various p
 
 [gpt-sovits](https://github.com/RVC-Boss/GPT-SoVITS): melstyle encoder for voice clone
 
+[diffsinger](https://github.com/openvpi/DiffSinger): Chinese three-section phoneme scheme for Chinese G2P
+
 ## TODO
 
 - [ ] Release pretrained models.
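
The FiLM conditioning mentioned in the new "Model structure" notes can be illustrated with a minimal PyTorch sketch. This is not the repository's implementation: the class name `FiLMLayer`, the tensor shapes, and the use of a single linear projection are assumptions chosen for illustration. It only shows the general idea of feature-wise linear modulation, where a conditioning vector (here a timestep embedding) produces a per-channel scale and shift.

```python
import torch
import torch.nn as nn


class FiLMLayer(nn.Module):
    """Hypothetical FiLM layer: out = gamma(cond) * x + beta(cond)."""

    def __init__(self, channels: int, cond_channels: int):
        super().__init__()
        # One projection produces both the scale (gamma) and the shift (beta).
        self.proj = nn.Linear(cond_channels, 2 * channels)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); cond: (batch, cond_channels)
        gamma, beta = self.proj(cond).chunk(2, dim=-1)
        # Broadcast the per-channel scale and shift over the time axis.
        return gamma.unsqueeze(-1) * x + beta.unsqueeze(-1)


# Assumed shapes, for illustration only.
film = FiLMLayer(channels=8, cond_channels=16)
x = torch.randn(2, 8, 50)      # hidden features entering a decoder block
t_emb = torch.randn(2, 16)     # timestep embedding
out = film(x, t_emb)           # same shape as x
```

In this sketch the modulated features would then be passed to the following DiT block, so every block sees features already adjusted for the current timestep.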

diff --git a/datas/sampler.py b/datas/sampler.py
@@ -1,5 +1,6 @@
 import torch
 
+# reference: https://github.com/jaywalnut310/vits/blob/main/data_utils.py
 class DistributedBucketSampler(torch.utils.data.distributed.DistributedSampler):
     """
     Maintain similar input lengths in a batch.

diff --git a/figures/structure.jpg b/figures/structure.jpg
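
For readers unfamiliar with the sampler referenced in `datas/sampler.py`, the docstring's goal ("maintain similar input lengths in a batch") can be sketched in plain Python. This is a simplified illustration, not the code from VITS or this repository: the function name `bucket_batches` and its parameters are hypothetical, and it omits the distributed parts (splitting batches across replicas and padding buckets to a multiple of the global batch size).

```python
import bisect
import random


def bucket_batches(lengths, boundaries, batch_size, seed=0):
    """Group sample indices into batches whose lengths share a bucket."""
    buckets = [[] for _ in range(len(boundaries) - 1)]
    for idx, length in enumerate(lengths):
        # Bucket i holds lengths in (boundaries[i], boundaries[i + 1]].
        b = bisect.bisect_left(boundaries, length) - 1
        if 0 <= b < len(buckets):
            buckets[b].append(idx)
        # Samples outside all boundaries are dropped in this sketch.

    rng = random.Random(seed)
    batches = []
    for bucket in buckets:
        rng.shuffle(bucket)
        # The last batch of a bucket may be smaller than batch_size.
        for start in range(0, len(bucket), batch_size):
            batches.append(bucket[start:start + batch_size])
    rng.shuffle(batches)
    return batches


# Illustrative lengths; the sample with length 300 exceeds the last boundary.
lengths = [50, 120, 80, 190, 95, 150, 300]
batches = bucket_batches(lengths, boundaries=[0, 100, 200], batch_size=2)
```

Because each batch is drawn from a single bucket, the padding needed to equalize sequence lengths within a batch stays bounded by the bucket width.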