Changes from all commits (280 commits)
d41c6e4
fix run_experiments.sh bug
poonehmousavi Dec 24, 2024
04ea1e6
add bash script for token extraction
poonehmousavi Dec 24, 2024
95333cf
fix bug
poonehmousavi Dec 24, 2024
096fc43
add hyperparam tuning
poonehmousavi Dec 25, 2024
8dc0161
fix precommit
poonehmousavi Dec 25, 2024
c0f4fee
modify hparams.sh input order
poonehmousavi Dec 25, 2024
a595cf6
only apply testing for the final HT run
poonehmousavi Dec 25, 2024
78da6c1
fix bug
poonehmousavi Dec 26, 2024
6a3a7a5
fix bug
poonehmousavi Dec 26, 2024
e9ff250
add hyperparam tuning for contextnet
poonehmousavi Dec 26, 2024
3e2fe0c
add testing to average run
poonehmousavi Dec 26, 2024
f378aec
add lr for HT for contextnet
poonehmousavi Dec 26, 2024
80238bc
Merge branch 'DASB-refactor' of https://github.com/Chaanks/benchmarks…
poonehmousavi Dec 26, 2024
b2bd316
add measuring time
poonehmousavi Dec 26, 2024
9de6934
add time measure
poonehmousavi Dec 26, 2024
c4e2738
update readme + minor changes
poonehmousavi Dec 28, 2024
279e48b
fix link in readme
poonehmousavi Dec 28, 2024
7f32f1b
update table of contents
poonehmousavi Dec 28, 2024
30fc2d6
fix
poonehmousavi Dec 28, 2024
a576ba7
fix
poonehmousavi Dec 28, 2024
7c75515
Merge pull request #47 from Chaanks/DASB-refactor
poonehmousavi Dec 28, 2024
0fafc1c
Tokotron LJSpeech: Update to work with the new tokenizer pipeline
flexthink Dec 30, 2024
e66a00e
Tokotron: Add Tokotron integration for LibriTTS (multi-speaker recipes)
flexthink Jan 12, 2025
252f1d7
add new tokenizers and adapt to SB main repo
poonehmousavi Jan 12, 2025
f534bdf
fix precommit
poonehmousavi Jan 12, 2025
6935cb9
Merge pull request #51 from poonehmousavi/sb_new_tokenizers
poonehmousavi Jan 12, 2025
54dab67
Tokotron: Fixes
flexthink Jan 13, 2025
2552b06
DASB: Tokotron: Cosmetic changes
flexthink Jan 14, 2025
f982325
DASB: More cosmetic changes from linters
flexthink Jan 15, 2025
cc9f6cc
Merge branch 'DASB' into DASB-tts
flexthink Jan 16, 2025
1357ff1
DASB: Tokotron: Relative paths
flexthink Jan 16, 2025
958ee87
DASB: Tokotron: Add choices for the model type
flexthink Jan 16, 2025
043eb9c
DASB: Tokotron: more clean-up
flexthink Jan 16, 2025
900481d
DASB: Tokotron: Updates for hyperparameter fitting
flexthink Jan 17, 2025
4dcd1d3
DASB: Batch size updates, device fixes
flexthink Jan 17, 2025
fc08f58
DASB: Tokotron: Fixes
flexthink Jan 17, 2025
fcb37c7
DASB: Ensure UTMOS is maximized rather than minimized!
flexthink Jan 17, 2025
9563cd5
DASB: Tokotron: Fixes
flexthink Jan 20, 2025
4442b44
DASB: Tokotron: Fixes
flexthink Jan 20, 2025
d31ad9c
Update tokenizer_interface.py
poonehmousavi Jan 20, 2025
fbebd2e
Update sq_codec.py
poonehmousavi Jan 20, 2025
9f64966
add sq-codec, mimi and wavtokenizer for librispeech
poonehmousavi Jan 20, 2025
1e18ead
DASB: VALL-E: Initial import
flexthink Jan 20, 2025
3ce4d4d
DASB: Fixes
flexthink Jan 20, 2025
57d68cf
DASB: Fixes
flexthink Jan 20, 2025
c1c3b52
DASB: Add a "brokenness check" to ensure that tokens runs that produc…
flexthink Jan 21, 2025
123e124
DASB: Tokotron/VALL-E: Work in progress
flexthink Jan 21, 2025
0eb8d53
Merge branch 'DASB' into DASB-tts
flexthink Jan 21, 2025
b1ca7ad
DASB: Tokotron: Implement SQCodec, Mimi and WavTokenizer (single-spea…
flexthink Jan 21, 2025
2daeaa5
DASB: Cosmetic changes (pre-commit hooks)
flexthink Jan 21, 2025
99395f8
DASB: Update sample rates
flexthink Jan 22, 2025
0d70359
Merge branch 'DASB-tts-tmp-valle' into DASB-tts
flexthink Jan 22, 2025
0971b8e
fix bug and update LibriSpeech recipe
poonehmousavi Jan 23, 2025
2a8f9d9
Merge pull request #54 from poonehmousavi/update_libirspeech_dasb
poonehmousavi Jan 23, 2025
3b6a99f
Merge branch 'DASB' into DASB-tts
flexthink Jan 23, 2025
3d3e04c
DASB: Tokotron: Add validation batch size customization (to avoid OOM)
flexthink Jan 23, 2025
8535392
Update README.md
poonehmousavi Jan 23, 2025
16912d5
DASB: Tokotron: Minor fixes
flexthink Jan 23, 2025
ec47b0d
DASB: Fixes
flexthink Jan 23, 2025
5dba59d
DASB: Tokotron: Update priors
flexthink Jan 24, 2025
f7116a8
DASB: Fixes
flexthink Jan 24, 2025
199a37c
DASB: Tokotron: Fixes
flexthink Jan 27, 2025
dd7f3d3
DASB: Tokotron: Fix layer selection for Discrete SSL
flexthink Jan 27, 2025
46c8ba4
DASB: VALL-E: Add LibriTTS
flexthink Jan 28, 2025
ba6bddb
DASB: VALL-E: Fixes/Updates
flexthink Jan 28, 2025
d8a720c
DASB: VALL-E: Fixes
flexthink Jan 28, 2025
11c427b
DASB: VALL-E: Fixes
flexthink Jan 28, 2025
2d1a46a
DASB: Fix ST extraction
flexthink Jan 29, 2025
e53d7c6
DASB: Add support for using Orion Trial IDs instead of randomness
flexthink Jan 30, 2025
602f41f
Update run_experiments.sh
poonehmousavi Jan 30, 2025
4f5153e
Update run_hparam_optimization.sh
poonehmousavi Jan 30, 2025
1d0aec0
DASB: Disable random directory name generation for the final test phase
flexthink Jan 30, 2025
e0bb265
DASB: Fixed the codebook count
flexthink Jan 31, 2025
5f5105f
DASB: Extraction fixes/updates
flexthink Jan 31, 2025
d02e870
DASB: Clean-up
flexthink Jan 31, 2025
0f2561d
DASB: Tokotron: Config updates
flexthink Jan 31, 2025
c9578e8
DASB: Cosmetic changes (pre-commit hooks)
flexthink Jan 31, 2025
7270d4e
DASB: Add the ability to turn off evaluation for debugging purposes.
flexthink Jan 31, 2025
2b22169
DASB: Add the ability to turn off evaluation
flexthink Jan 31, 2025
6eaa206
DASB: Tokotron: SQCodec update to use ternary coding
flexthink Feb 3, 2025
a99fddb
DASB: Device fix
flexthink Feb 3, 2025
650cf2e
DASB: Tokotron: Add the ability to add an "initialization model" when…
flexthink Feb 3, 2025
b43b565
DASB: A small fix for cases where strides are not compatible (not nece…
flexthink Feb 3, 2025
693d499
DASB: Extra logging
flexthink Feb 4, 2025
7b79ffc
DASB: Fix maximum validation set size
flexthink Feb 4, 2025
24bebfe
DASB: Add the ability to change the saved folder for Encodec
flexthink Feb 4, 2025
7ede118
DASB: Fixes
flexthink Feb 5, 2025
123248d
DASB: Tokotron: Fixes
flexthink Feb 5, 2025
0b11188
DASB: Tokotron: Fixes
flexthink Feb 5, 2025
4eaa7cd
DASB: Fixes
flexthink Feb 5, 2025
60e7d9e
DASB: Tokotron: Fixes
flexthink Feb 5, 2025
54df7ed
DASB: Tokotron LibriTTS: Fixes
flexthink Feb 5, 2025
3aa7de3
DASB: Fixes
flexthink Feb 6, 2025
10f8202
DASB: Fixes
flexthink Feb 6, 2025
2cd7c6a
DASB: Fixes
flexthink Feb 6, 2025
7e1bf0f
DASB: Fixes
flexthink Feb 6, 2025
2c72caf
DASB: Fixes
flexthink Feb 6, 2025
4b51644
VALL-E: Cosmetic changes, hparams updates
flexthink Feb 6, 2025
748cc86
DASB: Fixes
flexthink Feb 6, 2025
858b5d4
DASB: Fixes
flexthink Feb 6, 2025
7a5ea84
DASB: Fixes
flexthink Feb 6, 2025
30ee0c0
DASB: Fix prefix masking for VALL-E
flexthink Feb 6, 2025
3d89d2d
DASB: Update loss calculation to match ESPNet
flexthink Feb 6, 2025
779bf99
DASB: VALL-E: Fixes
flexthink Feb 6, 2025
92c40b6
VALL-E: Hyperparameter updates
flexthink Feb 6, 2025
5618797
DASB: Fix the sample rate
flexthink Feb 7, 2025
71cd316
DASB: Fixes
flexthink Feb 7, 2025
9e4c550
DASB: Encodec: Small fix
flexthink Feb 7, 2025
165eaac
DASB: Add Mimi, fix defaults for VALL-E Encodec
flexthink Feb 7, 2025
c1b30db
DASB: mimi fixes
flexthink Feb 7, 2025
c3b647e
DASB: add init_from
flexthink Feb 7, 2025
f27ebad
DASB: small updates
flexthink Feb 7, 2025
9840824
DASB: small updates
flexthink Feb 7, 2025
b4afc68
DASB: Add support for alignments
flexthink Feb 9, 2025
cbea7f7
DASB: Fixed
flexthink Feb 9, 2025
e48a91f
VALL-E: Fixes, add encodec
flexthink Feb 10, 2025
45d6130
DASB: Add encodec
flexthink Feb 10, 2025
e1635df
DASB: fixes
flexthink Feb 10, 2025
64b73e7
DASB: Fixes
flexthink Feb 10, 2025
79ca7a6
DASB: Vall-E: Multi-GPU inference fix
flexthink Feb 10, 2025
c6c6cf6
DASB: Fixes
flexthink Feb 10, 2025
e25d146
DASB: Fixes
flexthink Feb 10, 2025
45b3d1b
DASB: CPU/GPU fixes
flexthink Feb 10, 2025
370ab8e
DASB: Minor fixes
flexthink Feb 10, 2025
256fa35
DASB: Fixes
flexthink Feb 11, 2025
9f27332
DASB: Review debugging code
flexthink Feb 11, 2025
bad8999
VALL-E: Update token sequence initialization to account for special t…
flexthink Feb 11, 2025
39ddfd1
DASB: hparam file updates, new hparams for additional tokenizers
flexthink Feb 11, 2025
5acd1d3
VALL-E: Add files for multiple configurations
flexthink Feb 12, 2025
a78f011
DASB: Add Lifeteng-style curriculum, some config updates
flexthink Feb 13, 2025
953540b
DASB: Add init_from
flexthink Feb 13, 2025
f8b9a67
DASB: Add init_from
flexthink Feb 13, 2025
4f8cc9c
DASB: VALL-E: Implement checkpoint retention based on dWER
flexthink Feb 13, 2025
856df20
DASB: ESPNet Encodec support
flexthink Feb 13, 2025
be174df
DASB: Inference mode, remove an unused evaluator
flexthink Feb 14, 2025
750f3a4
DASB: Add customization for the validation batch size
flexthink Feb 14, 2025
55fc383
DASB: VALL-E: Add ESPNET Encodec
flexthink Feb 14, 2025
0730254
DASB: Add the ability to skip resampling
flexthink Feb 14, 2025
41afc01
DASB: Add the switch for LM head training
flexthink Feb 15, 2025
f529e62
DASB: Undo the gradient change - it did not help
flexthink Feb 15, 2025
554e52a
DASB: VALL-E: Add the ability to disable fixed batches, add the abili…
flexthink Feb 16, 2025
e2d7440
DASB: Fixes
flexthink Feb 16, 2025
d7fc323
DASB: Update wav2vec2
flexthink Feb 16, 2025
8a9e873
DASB: Add back LM head freezing (with a toggle)
flexthink Feb 17, 2025
a1f5e94
DASB: Fix for data parallel
flexthink Feb 17, 2025
e752146
DASB: Fix padding
flexthink Feb 17, 2025
c6d5883
DASB: VALL-E: Fix a crash
flexthink Feb 17, 2025
99588e3
DASB: VALL-E: Add LM head freezing
flexthink Feb 17, 2025
dad02cb
DASB: Vall-E: Fix data-parallel
flexthink Feb 18, 2025
63e9972
DASB: VALL-E: Update hyperparameters
flexthink Feb 18, 2025
bacc9f9
DASB: VALL-E: Add data scaling support
flexthink Feb 18, 2025
b1e270a
DASB: Tokotron: Add scaling + selection based on dWER (for comparison)
flexthink Feb 18, 2025
ef35a2f
DASB: Fixes
flexthink Feb 19, 2025
a6073f5
DASB: Add support for test set filtering
flexthink Feb 19, 2025
1be28c7
DASB: Add support for test set filtering
flexthink Feb 19, 2025
4e5f4eb
DASB: Add filtering (useful when some samples aren't present, e.g. wh…
flexthink Feb 19, 2025
8dadf96
DASB: Fixes
flexthink Feb 19, 2025
5272a73
DASB: Fixes
flexthink Feb 20, 2025
b0df9ac
DASB: VALL-E: Fixes for WavTokenizer (AR-only)
flexthink Feb 21, 2025
cf24b23
DASB: VALL-E: Update/add test stage logging
flexthink Feb 24, 2025
b6224d6
DASB: Fix extraction for clusters with no internet connection on comp…
flexthink Feb 24, 2025
d0900e0
DASB: VALL-E: Add layer selection, hpopt updates
flexthink Feb 24, 2025
c5a3f3a
DASB: Add support for eval_run flags
flexthink Feb 24, 2025
3ddbc57
DASB: VALL-E: Fixes
flexthink Feb 25, 2025
e1bfb7e
DASB: VALL-E: Fixes
flexthink Feb 25, 2025
851bd7d
DASB: VALL-E: Update max length
flexthink Feb 25, 2025
7463474
DASB: Fix WavTokenizer
flexthink Feb 25, 2025
05f8014
DASB: VALL-E: Add speaker prompt resampling
flexthink Feb 25, 2025
f94c61b
DASB: VALL-E: Add SQCodec
flexthink Feb 25, 2025
398304e
DASB: Tokotron: Update SQ-Codec ternary coding
flexthink Feb 25, 2025
c90037c
DASB: Add the ability to disable test runs
flexthink Feb 26, 2025
131eea3
DASB: Tokotron: Update ternary loss aggregation
flexthink Feb 27, 2025
7c5e82f
DASB: Fix an issue with contiguous tensors
flexthink Feb 27, 2025
7046db0
DASB: Tokotron: SQ-Codec Add the ability to bypass additional ternary…
flexthink Feb 28, 2025
ebe1811
DASB: Tokotron: Fixes
flexthink Mar 1, 2025
dae8bcb
DASB: Fixes: SQ-Codec refactoring (decouple from Tokotron, simplify)
flexthink Mar 1, 2025
9b09d20
DASB: VALL-E: Fixes
flexthink Mar 4, 2025
4c4663d
DASB: Update VALL-E for SQCodec
flexthink Mar 5, 2025
6af2d83
DASB: Fixes / clean-up
flexthink Mar 5, 2025
8c6a886
DASB: SQ-Codec: Make the special loss optional
flexthink Mar 5, 2025
583f42a
DASB: SQ Codec: Fixes
flexthink Mar 6, 2025
7a011eb
DASB: SQCodec: Fixes
flexthink Mar 6, 2025
24a4014
DASB: VALL-E: SQ-Codec updates
flexthink Mar 6, 2025
7e5d15d
DASB: SQCodec: Fixes
flexthink Mar 6, 2025
0f14a23
DASB: SQ-Codec: Fully implement ternary mode
flexthink Mar 6, 2025
10f8fdb
DASB: Fix SpeechTokenizer
flexthink Mar 6, 2025
c00962e
Fixes for SQCodec: Make offsets optional, align the shift with ternary
flexthink Mar 7, 2025
50ef659
DASB: SQ-Codec: Add chunking to avoid OOM
flexthink Mar 7, 2025
08b14ff
DASB: SQ-Codec: Update LibriTTS
flexthink Mar 8, 2025
d1ce08a
DASB: Add a multitrack ternary language model head (a separate proje…
flexthink Mar 8, 2025
15f096c
DASB: Vall-E: Multitrack fixes
flexthink Mar 8, 2025
981fe93
DASB: SQ-Codec: Fixes
flexthink Mar 9, 2025
de4aaaa
DASB: SQ-Codec: Remove the multi-track ternary head (it did not help)
flexthink Mar 10, 2025
e8af899
DASB: VALL-E Fix ternary loss masking
flexthink Mar 10, 2025
851eb84
DASB: SQCodec: Fixes
flexthink Mar 10, 2025
38ed432
DASB: Add the ability to filter priors
flexthink Mar 10, 2025
d5aea40
DASB: Removed debugging code
flexthink Mar 11, 2025
6cef549
DASB: VALL-E: SQ-Codec fixes
flexthink Mar 11, 2025
9fe48e4
DASB: SQ-Codec: Fix the sample rate
flexthink Mar 11, 2025
263f8b5
VALL-E: SQ-Codec: Add target dropout (optional, disabled by default)
flexthink Mar 11, 2025
fb2d573
DASB: SQ-Codec updates
flexthink Mar 12, 2025
51438b9
DASB: SQ-Codec: Add argmax mode
flexthink Mar 12, 2025
acbcfcf
DASB: SQ-Codec: Add argmax mode
flexthink Mar 12, 2025
b38c1cc
DASB: Fixes
flexthink Mar 12, 2025
44e93fd
SQCodec: Fixes
flexthink Mar 14, 2025
f875cd9
DASB: SQCodec: Fixes
flexthink Mar 14, 2025
69b346b
DASB: SQCodec: Update to predict everything autoregressively
flexthink Mar 15, 2025
f51b3a8
DASB: VALL-E: Fixes
flexthink Mar 15, 2025
1f05e76
DASB: SQCodec: Fixes, add LibriTTS
flexthink Mar 16, 2025
9011781
DASB: SQCodec updates
flexthink Mar 16, 2025
9a75652
DASB: VALL-E fixes
flexthink Mar 16, 2025
add349a
DASB: Fixes
flexthink Mar 17, 2025
331bad0
DASB: Train dataset data loader fix
flexthink Mar 18, 2025
17ebf5d
DASB: Add a fallback for hparams files
flexthink Mar 28, 2025
47744ab
DASB: Fixes
flexthink Mar 28, 2025
fa87f1d
DASB: Fix the summary.json check
flexthink Apr 16, 2025
bb08a3a
DASB: Fixes
flexthink Apr 16, 2025
33daea8
DASB: Fixes
flexthink Apr 20, 2025
d27a9ef
DASB: Add memory fraction (to share a large GPU)
flexthink Apr 21, 2025
4ae6e86
DASB: Fix kmeans path conflicts
flexthink Apr 21, 2025
27a4608
DASB: Mimi fix
flexthink Apr 27, 2025
6de5acb
DASB: WER/CER fix
flexthink May 14, 2025
7210b3c
WER/CER fixes
flexthink May 14, 2025
4c5dba5
DASB: VALL-E: Added an option to do preparation only without training
flexthink May 16, 2025
7fec49f
DASB: VALL-E: Add a duration filter
flexthink May 17, 2025
3d67a55
DASB: A fix for broken annotations
flexthink May 17, 2025
bd47e64
DASB: Minor fix for backward compatibility
flexthink May 20, 2025
42ecf13
DASB: Add inference grid search and micro dWER
flexthink May 22, 2025
d6150c3
DASB: Add ASR-based selection + minor updates
flexthink May 24, 2025
5804bba
DASB: Fix the max validation set size
flexthink May 24, 2025
dd65c62
DASB: Evaluations and fit fixes
flexthink May 24, 2025
427be64
DASB: Add sampling temperature
flexthink May 25, 2025
1a9aed4
DASB: Fix the WER calculation bug
flexthink May 26, 2025
8e95fd1
DASB: VALL-E: Add sample selection to other tokenizers
flexthink May 27, 2025
6f60f5a
DASB: VALL-E: Add sample selection
flexthink May 27, 2025
1c7e41f
DASB: VALL-E: Device fixes
flexthink May 28, 2025
c7d8866
DASB: Device fixes
flexthink May 28, 2025
d3e94a0
DASB: Inference Fit: Device Fix
flexthink Jun 3, 2025
fc1b0ce
DASB: add resume logic
flexthink Jun 5, 2025
0de5ad2
DASB: Add top_k customization
flexthink Jun 5, 2025
b59d0e4
DASB: Remove a duplicate setting
flexthink Jun 5, 2025
62be9d2
DASB: Add a generator saver/loader for better reproducibility when in…
flexthink Jun 11, 2025
13f1345
DASB: Fixed the saveable generator wrapper to account for CUDA deprec…
flexthink Jun 12, 2025
cf90559
DASB: Fix an issue with Discrete SSL + generators
flexthink Jun 16, 2025
b9488e4
DASB: Cosmetic changes
flexthink Jul 7, 2025
c6a20d8
DASB: TTS: Fix docstrings
flexthink Jul 8, 2025
ac7d6d6
DASB: Update a docstring
flexthink Jul 10, 2025
f32c99f
Merge branch 'main' into DASB
pplantinga Jul 15, 2025
2cec700
Merge branch 'DASB' into DASB-tts-clean
pplantinga Jul 15, 2025
17bde9d
DASB: Cosmetic changes to pass pre-commit
flexthink Jul 18, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -21,7 +21,7 @@ The SpeechBrain Benchmarks currently include the following:

- [MOABB](https://github.com/speechbrain/benchmarks/tree/main/benchmarks/MOABB) - A benchmark designed for evaluating neural models in well-known EEG tasks like motor imagery, P300, and SSVEP.

- [DASB](https://github.com/speechbrain/benchmarks/tree/main/benchmarks/DASB) - A benchmark designed for evaluating discrete audio tokens across a wide range of discriminative
- [DASB](https://github.com/speechbrain/benchmarks/tree/DASB/benchmarks/DASB) - A benchmark designed for evaluating discrete audio tokens across a wide range of discriminative
and generative tasks.


1 change: 0 additions & 1 deletion benchmarks/DASB/LJSpeech/TTS/tokotron/audio_tokens.py

This file was deleted.

43 changes: 2 additions & 41 deletions benchmarks/DASB/LJSpeech/TTS/tokotron/evaluate.py
@@ -51,17 +51,7 @@ def __init__(self, hparams, create_waveform_fn, device):
else:
self.evaluators = {}

bulk_evaluators = getattr(self.hparams, "bulk_evaluators", {})
if bulk_evaluators:
self.bulk_evaluators = {
key: evaluator_f()
for key, evaluator_f in bulk_evaluators.items()
if key in self.enabled_evaluators
}
else:
self.bulk_evaluators = {}

if not self.evaluators and not self.bulk_evaluators:
if not self.evaluators:
logger.warn(
"No evaluators were defined - this run will produce samples only"
)
@@ -98,9 +88,7 @@ def on_evaluate_start(self, stage, epoch):
self.create_reports()
self.modules.model.show_inference_progress = False
self.item_ids = []
details_keys = list(self.evaluators.keys()) + list(
self.bulk_evaluators.keys()
)
details_keys = list(self.evaluators.keys())
self.details = {evaluator_key: [] for evaluator_key in details_keys}
self.sample_text = []
self.sample_file_names = []
@@ -141,7 +129,6 @@ def on_evaluate_end(self):
dataset : speechbrain.dataio.dataset.DynamicItemDataset
a dataset
"""
self.evaluate_bulk()
self.write_summary()
logger.info("Evaluation done")

@@ -182,19 +169,6 @@ def get_report_columns(self, evaluator_key):
wavs_ref=bogus_wavs,
length_ref=bogus_length,
)
else:
bogus_file_name = self.output_folder / "bogus.wav"
evaluator = self.bulk_evaluators[evaluator_key]
sb.dataio.dataio.write_audio(
str(bogus_file_name),
bogus_wavs[0].cpu(),
samplerate=self.hparams.model_sample_rate,
)
result = evaluator.evaluate_files(
file_names=[bogus_file_name],
text=["BOGUS"],
file_names_ref=[bogus_file_name],
)

return ["uttid"] + list(result.details.keys())

@@ -228,19 +202,6 @@ def evaluate_batch(self, batch):
self.write_result(evaluator_key, batch.uttid, details)
self.details[evaluator_key].extend(details)

def evaluate_bulk(self):
"""Runs all configured bulk evaluators, which evaluate a directory
of files - rather than one file at a time"""
for evaluator_key, evaluator in self.bulk_evaluators.items():
result = evaluator.evaluate_files(
file_names=self.sample_file_names,
text=self.sample_text,
file_names_ref=self.ref_file_names,
)
self.details[evaluator_key].append(result.details)
details = undo_batch(result.details)
self.write_result(evaluator_key, self.item_ids, details)

def write_result(self, evaluator_key, uttid, details):
"""Outputs the result details to the report for the specified evaluator

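Note on the change above: with the bulk-evaluator path removed, every metric flows through the single per-item interface. Each evaluator's evaluate() returns a result whose .details maps metric names to per-utterance values, which evaluate_batch() accumulates and writes to the report. A minimal, self-contained sketch of that protocol (DummyEvaluator and its RMS "metric" are illustrative stand-ins, not the benchmark's real UTMOS/ASR evaluators):

import torch

class DummyEvaluator:
    """Stand-in mimicking the evaluate() call shape used in evaluate.py."""

    class Result:
        def __init__(self, details):
            self.details = details

    def evaluate(self, wavs, length, text=None, wavs_ref=None, length_ref=None):
        # Toy per-utterance metric: RMS level of each waveform in the batch.
        rms = (wavs ** 2).mean(dim=-1).sqrt()
        return self.Result(details={"rms": rms.tolist()})

evaluators = {"dummy": DummyEvaluator()}  # the hparams file supplies utmos/asr
details = {key: [] for key in evaluators}
wavs = torch.randn(2, 16000)  # fake batch: two one-second waveforms at 16 kHz
length = torch.ones(2)
for key, evaluator in evaluators.items():
    result = evaluator.evaluate(wavs, length, text=["a", "b"])
    details[key].append(result.details)
print(details)  # {'dummy': [{'rms': [..., ...]}]}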
76 changes: 41 additions & 35 deletions benchmarks/DASB/LJSpeech/TTS/tokotron/hparams/eval.yaml
@@ -1,50 +1,56 @@
# ############################################################################
# Evaluation Hyperparameters
# Common to old models, appended to main hyperparameters
#
# Authors: Artem Ploujnikov
# ############################################################################

eval_enabled: True
eval_sample_rate: 16000
eval_samples: null
eval_interval: 1
eval_asr_type: whisper
eval_asr_source: !apply:speechbrain.utils.hparams.choice
value: !ref <eval_asr_type>
choices:
encoder_decoder: speechbrain/asr-transformer-transformerlm-librispeech
whisper: openai/whisper-small
eval_asr_source: openai/whisper-small
evaluations: utmos,asr
tmp_folder: null
utmos_batch_size: 8
utmos_model_path: ./utmos
utmos_ckpt_name: epoch=3-step=7459.ckpt
utmos_ckpt_path: !ref <utmos_model_path>/<utmos_ckpt_name>
utmos_use_python: True
utmos_script: predict.py


eval_asr: !apply:speechbrain.utils.hparams.choice
value: !ref <eval_asr_type>
choices:
encoder_decoder: !name:eval.EncoderDecoderASRSpeechEvaluator
source: !ref <eval_asr_source>
sample_rate: !ref <eval_sample_rate>
overrides:
lm_weight: 0.0
whisper: !name:eval.WhisperASRSpeechEvaluator
source: !ref <eval_asr_source>
sample_rate: !ref <eval_sample_rate>
savedir: !ref <pretrained_model_save_folder>
eval_utmos_source: chaanks/wav2vec2-small
eval_utmos_save_path: !ref <pretrained_model_save_folder>/utmos
eval_utmos_model_name: utmos.ckpt
eval_utmos_model_url: https://huggingface.co/chaanks/UTMOS/resolve/main
eval_utmos_domain_id: null
eval_utmos_judge_id: null
eval_perf: False


eval_utmos: !name:eval.UTMOSSpeechEvaluator
source: !ref <eval_utmos_source>
save_path: !ref <eval_utmos_save_path>
model_name: !ref <eval_utmos_model_name>
model_url: !ref <eval_utmos_model_url>
domain_id: !ref <eval_utmos_domain_id>
judge_id: !ref <eval_utmos_judge_id>

eval_asr: !name:eval.WhisperASRSpeechEvaluator
source: !ref <eval_asr_source>
sample_rate: !ref <eval_sample_rate>
savedir: !ref <pretrained_model_save_folder>

evaluators:
utmos: !ref <eval_utmos>
asr: !ref <eval_asr>

bulk_evaluators:
utmos: !name:eval.UTMOSSpeechEvaluator
model_path: !ref <utmos_model_path>
output_folder: !ref <output_folder>
ckpt_path: !ref <utmos_ckpt_path>
batch_size: !ref <utmos_batch_size>
script: !ref <utmos_script>
use_python: !ref <utmos_use_python>
tmp_folder: !ref <tmp_folder>

eval_summary:
asr:
descriptive: ["wer", "cer", "wer_ref", "cer_ref", "dwer", "dcer"]
utmos:
descriptive: ["utmos"]

eval_summary_log:
utmos: utmos_utmos_mean
dwer: asr_dwer_median

eval_threshold:
dwer_max: 90.0

eval_threshold_set:
utmos: 0.0
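Note: the !name: entries above (eval_utmos, eval_asr) are not instantiated when the YAML is loaded. hyperpyyaml turns each one into a partially-applied constructor, which is why evaluate.py builds only the evaluators listed in the evaluations string by calling evaluator_f(). A small sketch of that mechanism, assuming only that hyperpyyaml is installed (builtins.dict stands in as a trivially importable callable; the real targets are eval.WhisperASRSpeechEvaluator and eval.UTMOSSpeechEvaluator):

from hyperpyyaml import load_hyperpyyaml

# !name: yields a functools.partial; calling it applies the stored kwargs.
yaml_string = """
demo_factory: !name:builtins.dict
    source: openai/whisper-small
evaluators:
    demo: !ref <demo_factory>
"""
hparams = load_hyperpyyaml(yaml_string)
factory = hparams["evaluators"]["demo"]
print(factory())  # {'source': 'openai/whisper-small'}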
79 changes: 41 additions & 38 deletions benchmarks/DASB/LJSpeech/TTS/tokotron/hparams/train_dac.yaml
@@ -8,18 +8,23 @@ experiment_name: tokotron/dac
# Seed needs to be set at top of yaml, before objects with parameters are made
seed: 74443
__set_seed: !apply:torch.manual_seed [!ref <seed>]
run_name: !PLACEHOLDER
output_folder: !ref results/<experiment_name>/<seed>
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt
testing: True # If set to True, the test evaluation is done, otherwise skipped.


token_model_src: "facebook/encodec_24khz"
g2p_src: flexthink/soundchoice-g2p
vocoder_type: encodec
vocoder_src: "charactr/vocos-encodec-24khz"

# Model type
representation_mode: discrete

# Data files
data_folder: !PLACEHOLDER # e.g., /path/to/LibriSpeech
prepare_save_folder: !ref <data_folder>/prepared/dac
data_folder: !PLACEHOLDER
cached_data_folder: !PLACEHOLDER
prepare_save_folder: !ref <cached_data_folder>
pretrained_model_save_folder: !ref <prepare_save_folder>
prepare_archive_path: null
prepare_skip_ignore_folders: False
@@ -29,16 +34,27 @@ test_json: !ref <prepare_save_folder>/test.json
frozen_split_path: null
sample_path: null
progress_folder: !ref <output_folder>/progress
progress_archive: !ref <progress_folder>/progress.tar
progress_current: !ref <progress_folder>/current
progress_meta: !ref <progress_folder>/meta.yaml
num_audio_samples: 32
samples_interval: 5

splits: ["train", "valid", "test"]
split_ratio: [90, 5, 5]

tokens_folder: !PLACEHOLDER # Path to the folder where extracted tokens are saved.

tokens_loader: !new:utils.tokens.TokensLoader
data_path: !ref <tokens_folder>

token_model_kwargs:
n_quantizers: !ref <audio_tokens_per_step>

splits: ["train", "valid", "test"]
split_ratio: [90, 5, 5]
ckpt_key: dwer
ckpt_key_kind: min
ckpt_keep: 2
test_key: null
test_key_kind: min
ckpt_interval_minutes: 30 # save checkpoint every N min

# Training parameters
@@ -61,7 +77,7 @@ bos_index: 0
bos_width: 1

# stages related parameters
lr: 0.001
lr: 0.001 # @orion_step1: --lr~"loguniform(0.00001,0.005)"
lr_warmup_steps: 10000
lr_annealing_mode: step
guided_attention_weight: 50.0
@@ -85,33 +101,22 @@ model_bitrate: 8kbps

# Label encoder
label_encoder: !new:speechbrain.dataio.encoder.TextEncoder
token_list_file_text: ./hparams/char_en.txt
token_list_file_phn: ./hparams/arpabet.txt
token_list_file_text: char_en.txt
token_list_file_phn: arpabet.txt
token_list_file: !apply:speechbrain.utils.hparams.choice
value: !ref <input>
choices:
text: !ref <token_list_file_text>
phonemes: !ref <token_list_file_phn>

# Gate offset
gate_offset: !apply:Tokotron.distance_diff_loss_ramp
gate_offset: !apply:model.Tokotron.distance_diff_loss_ramp
beta: !ref <gate_loss_beta>
gamma: !ref <gate_loss_gamma>
max_weight: !ref <gate_loss_max_weight>

silence_padding: !ref <gate_offset>

# Token model (pretrained)
dac: !new:speechbrain.lobes.models.discrete.dac.DAC
sample_rate: !ref <model_sample_rate>
model_type: !ref <model_type>
model_bitrate: !ref <model_bitrate>
load_pretrained: True

# Token model (pretrained)
token_model: !new:Tokotron.DACFeatureExtractor
dac: !ref <dac>
n_quantizers: !ref <audio_tokens_per_step>

# Dataloader options
train_dataloader_opts:
@@ -143,20 +148,13 @@ sample_dataloader_opts:
padding_kwargs:
value: !ref <pad_index>

extract_features_opts:
dataloader_opts:
batch_size: !ref <batch_size>
token_model: !ref <token_model>
sample_rate: !ref <sample_rate>
model_sample_rate: !ref <model_sample_rate>


####################### Model parameters ###########################
# Transformer
d_model: 512
nhead: 4
enc_num_layers: 6
dec_num_layers: 12
enc_num_layers: 6 # @orion_step1: --enc_num_layers~"choices([3, 6, 12])"
dec_num_layers: 12 # @orion_step1: --dec_num_layers~"choices([3, 6, 12])"
d_ffn: 2048
transformer_dropout: 0.2
target_dropout: 0.2
@@ -165,6 +163,7 @@ audio_num_tokens: 1024
audio_emb_size: 1024
audio_emb_freeze: False
audio_emb_pretrained: False
audio_token_offsets: False
text_num_tokens: 39
phn_num_tokens: 52
input_num_tokens: !apply:speechbrain.utils.hparams.choice
@@ -178,7 +177,7 @@ attention_type: regularMHA

############################## models ################################

model: !new:Tokotron.TokotronTransformerModel # yamllint disable-line rule:line-length
model: !new:model.Tokotron.TokotronTransformerModel # yamllint disable-line rule:line-length
input_num_tokens: !ref <input_num_tokens>
audio_num_tokens: !ref <audio_num_tokens>
audio_tokens_per_step: !ref <audio_tokens_per_step>
@@ -198,15 +197,23 @@ model: !new:Tokotron.TokotronTransformerModel # yamllint disable-line rule:line
max_audio_length: !ref <max_audio_length>
infer_max_audio_length: !ref <infer_max_audio_length>

tokenizer: !new:utils.tokenizer_interface.DACTokenizer
model_type: !ref <model_type>
model_bitrate: !ref <model_bitrate>
n_codebooks: !ref <audio_tokens_per_step>
load_pretrained: True
tag: latest


modules:
model: !ref <model>
dac: !ref <dac>
tokenizer: !ref <tokenizer>

# define two optimizers here for two-stage training
opt_class: !name:torch.optim.Adam
lr: !ref <lr>

compute_cost: !new:Tokotron.TokotronLoss
compute_cost: !new:model.Tokotron.TokotronLoss
guided_attention_weight: !ref <guided_attention_weight>
guided_attention_sigma: !ref <guided_attention_sigma>
gate_weight: !ref <gate_loss_weight>
@@ -226,10 +233,6 @@ checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
lr_scheduler: !ref <lr_annealing>
counter: !ref <epoch_counter>

freezer: !new:preparation.Freezer
save_path: !ref <prepare_save_folder>
archive_path: !ref <prepare_archive_path>

epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
limit: !ref <number_of_epochs>

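Note: the # @orion_step1: comments added to train_dac.yaml (on lr, enc_num_layers and dec_num_layers) are machine-read markers, not ordinary comments: the benchmark's run_hparam_optimization.sh scans the hparams file for them to assemble the Orion search space of each tuning step. A sketch of that extraction in Python (the actual script works in shell, so the reader below only illustrates the annotation format; the path in the example call is hypothetical):

import re

# Matches, e.g.:  lr: 0.001 # @orion_step1: --lr~"loguniform(0.00001,0.005)"
ANNOTATION = re.compile(r'#\s*@orion_step(\d+):\s*(.+)$')

def orion_flags(yaml_path, step):
    """Collect the Orion CLI flags annotated for the given tuning step."""
    flags = []
    with open(yaml_path) as f:
        for line in f:
            match = ANNOTATION.search(line)
            if match and int(match.group(1)) == step:
                flags.append(match.group(2).strip())
    return flags

# For the train_dac.yaml above, step 1 would yield:
#   --lr~"loguniform(0.00001,0.005)"
#   --enc_num_layers~"choices([3, 6, 12])"
#   --dec_num_layers~"choices([3, 6, 12])"
print(orion_flags("hparams/train_dac.yaml", step=1))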