diff --git a/longform_reconstitution/README.md b/longform_reconstitution/README.md
index 402cd20..0ba97a3 100644
--- a/longform_reconstitution/README.md
+++ b/longform_reconstitution/README.md
@@ -12,10 +12,10 @@ All data in this repository made available as Lhotse supervisions - see https://
 
 Currently available: Lhotse supervisions for four versions of the GigaSpeech M subset.
 
-gigaspeech/gigaspeech_supervisions_m.jsonl.gz - original segmentation of the transcriptions of the GigaSpeech M subset (Note: not time-aligned). Average segment length 4s
-gigaspeech/gigaspeech_supervisions_m_lf.jsonl.gz - full reconstitutions of the transcriptions of the GigaSpeech M subset. Average segment length 10s
-gigaspeech/gigaspeech_supervisions_m_lf_15.jsonl.gz - full reconstitution, followed by re-segmentation with a maximum segment length of 15s. Average segment length 7s
-gigaspeech/gigaspeech_supervisions_m_lf_30.jsonl.gz - full reconstitution, followed by re-segmentation with a maximum segment length of 30s. Average segment length 9s
+1. `gigaspeech/gigaspeech_supervisions_m.jsonl.gz` - original segmentation of the transcriptions of the GigaSpeech M subset (Note: not time-aligned). Average segment length 4s
+2. `gigaspeech/gigaspeech_supervisions_m_lf.jsonl.gz` - full reconstitutions of the transcriptions of the GigaSpeech M subset. Average segment length 10s
+3. `gigaspeech/gigaspeech_supervisions_m_lf_15.jsonl.gz` - full reconstitution, followed by re-segmentation with a maximum segment length of 15s. Average segment length 7s
+4. `gigaspeech/gigaspeech_supervisions_m_lf_30.jsonl.gz` - full reconstitution, followed by re-segmentation with a maximum segment length of 30s. Average segment length 9s
 
 ## TED-LIUM
 