Configs for wav2vec experiments #259
base: main
Conversation
Looks mostly good, some comments.
...speech_100_ctc/fairseq_finetuning/ctc_standalone/sisyphus_configs/config_phoneme_boundary.py
```python
model_conf_w2v = base_model_conf.copy()
model_conf_w2v["w2v_path"] = phon_boundary_pretrain_job.out_models[CHECKPOINT].model
model_conf_w2v["mask_strategy"] = "phoneme"
model_conf_w2v["mask_length"] = 1
eow_phon_ls100_ctc_base(
    model_conf_w2v=model_conf_w2v,
    train_name_suffix=os.path.join(
        "w2v_phoneme_boundary_masking",
        "1_phoneme_spec",
        f"checkpoint_{CHECKPOINT}",
    ),
    fairseq_root=fairseq_root,
)
```
Isn't this the same as above?
Yes, this is just for adding all relevant jobs under the new alias `1_phoneme_spec`. Since there is a large number of jobs saved under the `train_name_suffix`, I thought this approach is easier than calling `job.add_alias(...)` on all relevant jobs. Feel free to suggest any other approach to accomplish this :)
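For comparison, the manual alternative would look roughly like this (a minimal sketch; `relevant_jobs` and the loop are assumptions for illustration, only `job.add_alias(...)` is from the discussion above):

```python
import os

# Hypothetical alternative: attach the additional alias to each job by
# hand instead of re-running the experiment under a new train_name_suffix.
# `relevant_jobs` is a placeholder for the jobs produced by the experiment.
for name, job in relevant_jobs.items():
    job.add_alias(
        os.path.join(
            "w2v_phoneme_boundary_masking",
            "1_phoneme_spec",
            f"checkpoint_{CHECKPOINT}",
            name,
        )
    )
```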
But do we need the old alias?
The base model represents four different models in the ablations:
- Testing the w2v modification itself (i.e. just the normal `w2v_phoneme_boundary_masking` alias prefix)
- Testing masking strategies in finetuning (there, for `negatives_other`, the base model is the same as the ablation with the `random` masking strategy; for `negatives_other_boundary_masking` and `boundary_masking`, the baseline model is the same as the ablation with the `phoneme` masking strategy)
- Testing the phoneme mask length in finetuning (only for `negatives_other_boundary_masking` and `boundary_masking`, where it is the same as the ablation with `mask_len = 1`)
- Testing the masking probability in finetuning (the base model is the same as the ablation with mask probability 0.65)

I wanted to make this structure clear in the alias folder as well; otherwise, it is not clear from the outside that the base model also represents the "masking probability = 0.65" model. So in that sense, yes, we need the old alias and all the other ones as well.
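To illustrate, the intended alias layout would look roughly like this (a sketch; only `w2v_phoneme_boundary_masking`, `1_phoneme_spec`, and the `checkpoint_{CHECKPOINT}` suffix appear in this PR, the `alias/` root and the annotations are assumptions based on the usual Sisyphus setup):

```
alias/
└── w2v_phoneme_boundary_masking/
    ├── checkpoint_<N>/       # base alias: the w2v modification itself
    └── 1_phoneme_spec/
        └── checkpoint_<N>/   # same jobs, re-aliased as the mask_length = 1 ablation
```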
Okay, then maybe put a comment above the block like "repeat same experiment to obtain additional alias in line with following experiments" and it should be fine.
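Applied to the block in question, that suggestion would look something like this (a sketch, not the committed change; the claim that only the alias is added is taken from the discussion above):

```python
# Repeat the same experiment to obtain an additional alias in line with
# the following experiments; as discussed above, the underlying jobs are
# identical, so this only adds the 1_phoneme_spec alias.
model_conf_w2v = base_model_conf.copy()
model_conf_w2v["w2v_path"] = phon_boundary_pretrain_job.out_models[CHECKPOINT].model
model_conf_w2v["mask_strategy"] = "phoneme"
model_conf_w2v["mask_length"] = 1
eow_phon_ls100_ctc_base(
    model_conf_w2v=model_conf_w2v,
    train_name_suffix=os.path.join(
        "w2v_phoneme_boundary_masking",
        "1_phoneme_spec",
        f"checkpoint_{CHECKPOINT}",
    ),
    fairseq_root=fairseq_root,
)
```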
Previously, all wav2vec experiments were located in a single file. As more and more experiments were added, this became chaotic and less modular, hindering flexible development and experimentation. I therefore split the previous `config_phoneme_pretrain_finetune.py` into separate files, one per wav2vec modification. Furthermore, I added configs for the experiments with hard negatives and positive samples.

Note: I still need to test-run the new configs to verify that I transferred everything correctly. Until then, this PR remains a draft, but please feel welcome to give feedback on the current config structure :)
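For orientation, the new split would be along these lines (only `config_phoneme_boundary.py` is visible in this PR's file paths; the other file names are hypothetical placeholders for the per-modification configs described above):

```
sisyphus_configs/
├── config_phoneme_boundary.py    # phoneme-boundary masking experiments (in this PR)
├── config_hard_negatives.py      # hypothetical name: hard-negative experiments
└── config_positive_samples.py    # hypothetical name: positive-sample experiments
```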