End-to-end Chinese-English code-switching speech recognition in PyTorch
## This is a mixed project that borrows from many awesome, recently open-sourced projects.
With pytorch-lightning, experiments can be carried out easily.
I will try to implement every computation in a batched and clean way
(such as adding bos & eos to the batched targets, and spec augment). Ideas can be posted in the issues,
and discussion is welcome. (This project is still being built and reorganized.)
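As an illustration of the batched style mentioned above, here is a minimal sketch of adding bos & eos to a padded target batch without a Python loop over utterances. The function name `add_bos_eos` and the ids `bos_id`, `eos_id`, `pad_id` are assumptions made for this example and are not taken from the repository.

```python
import torch


def add_bos_eos(targets, target_lengths, bos_id, eos_id, pad_id=0):
    """Build decoder input (bos + y) and decoder output (y + eos) for a padded batch.

    targets:        LongTensor [B, T], padded with pad_id
    target_lengths: LongTensor [B], true length of each row
    """
    batch_size, max_len = targets.shape
    device = targets.device
    positions = torch.arange(max_len + 1, device=device).unsqueeze(0)  # [1, T+1]
    lengths = target_lengths.unsqueeze(1)                              # [B, 1]

    # Decoder input: prepend bos to every row with a single concatenation.
    bos = torch.full((batch_size, 1), bos_id, dtype=targets.dtype, device=device)
    decoder_in = torch.cat([bos, targets], dim=1)                      # [B, T+1]
    decoder_in = decoder_in.masked_fill(positions > lengths, pad_id)

    # Decoder output: append a pad column, then write eos at index == length.
    pad = torch.full((batch_size, 1), pad_id, dtype=targets.dtype, device=device)
    decoder_out = torch.cat([targets, pad], dim=1)                     # [B, T+1]
    decoder_out = decoder_out.masked_fill(positions >= lengths, pad_id)
    decoder_out = decoder_out.scatter(1, lengths, eos_id)
    return decoder_in, decoder_out
```

Both outputs have shape [B, T+1], so they can be fed to the decoder and the loss without per-utterance handling.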
project features:
joint attention & CTC beam search decoding with an RNN LM
multiple datasets
16-bit training using pytorch-lightning
Chinese character-level & English word-level tokenizer
SentencePiece tokenizer for English tokenization
RNN LM training
label smoothing
customized transformer encoder and decoder, see: src/model/modules/transformer_encoder...
ReZero transformer, used because of convergence problems with half precision and for speed
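To make the Chinese character-level & English word-level tokenization concrete, here is a small sketch using only the standard library. It is not the repository's tokenizer: the name `tokenize_mixed`, the lowercasing step, and the restriction to the basic CJK Unified Ideographs block are assumptions made for illustration.

```python
import re

# Assumption: each character in the basic CJK block is one token, and each
# run of Latin letters, digits, or apostrophes is one word token.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def tokenize_mixed(text):
    """Split a code-switched sentence into Chinese characters and English words."""
    # Lowercasing is an illustrative normalisation choice, not from the source.
    return _TOKEN_RE.findall(text.lower())
```

For example, `tokenize_mixed("我想听 Taylor Swift 的歌")` yields one token per Chinese character and one token per English word. Whitespace and punctuation are dropped because they match neither pattern.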
feature:
log fbank with subsampling
speed augmentation
spec augment running on the GPU as a layer in the model
customized feature filtering, see src/loader/utils/build_fbank remove_empty_line_2d
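The idea of running spec augment on the GPU as a model layer can be sketched as an `nn.Module` that masks a whole batch at once. This is an assumption-laden sketch, not the repository's layer: the class name `SpecAugmentLayer`, its parameters, and the choice to fill masked regions with zeros are hypothetical, and padding frames are not treated specially.

```python
import torch
import torch.nn as nn


class SpecAugmentLayer(nn.Module):
    """Batched frequency/time masking applied inside the model (hypothetical name)."""

    def __init__(self, max_freq_width=10, max_time_width=20,
                 num_freq_masks=2, num_time_masks=2):
        super().__init__()
        self.max_freq_width = max_freq_width
        self.max_time_width = max_time_width
        self.num_freq_masks = num_freq_masks
        self.num_time_masks = num_time_masks

    @staticmethod
    def _random_band(batch_size, length, max_width, device):
        """Return a bool mask [B, length] that is True inside one random band per row."""
        max_width = min(max_width, length)
        width = torch.randint(0, max_width + 1, (batch_size, 1), device=device)
        # The start is drawn per row so that the band fits inside [0, length).
        start = (torch.rand(batch_size, 1, device=device) * (length - width + 1)).long()
        index = torch.arange(length, device=device).unsqueeze(0)
        return (index >= start) & (index < start + width)

    def forward(self, feats):
        # feats: [B, T, F] log fbank features, already on the target device.
        if not self.training:
            return feats  # augmentation is disabled at evaluation time
        batch_size, num_frames, num_bins = feats.shape
        drop = torch.zeros_like(feats, dtype=torch.bool)
        for _ in range(self.num_freq_masks):
            band = self._random_band(batch_size, num_bins, self.max_freq_width, feats.device)
            drop |= band.unsqueeze(1)  # broadcast over time
        for _ in range(self.num_time_masks):
            band = self._random_band(batch_size, num_frames, self.max_time_width, feats.device)
            drop |= band.unsqueeze(2)  # broadcast over frequency
        return feats.masked_fill(drop, 0.0)
```

Because the masks are built with tensor operations on `feats.device`, the layer adds no data-loader work and each utterance in the batch gets its own random bands.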
optimizer:
Ranger
model:
ReZero transformer
restricted encoder field
better masking (may be a little slower than in other projects, but effective)
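For readers unfamiliar with ReZero, the following sketch shows the core idea in one encoder layer: LayerNorm is dropped and each residual branch is scaled by a learnable weight initialised to zero, so the layer starts out as the identity. The class name `ReZeroEncoderLayer` and its hyperparameters are assumptions for this example. The repository's own layer lives under src/model/modules/ and may differ.

```python
import torch
import torch.nn as nn


class ReZeroEncoderLayer(nn.Module):
    """Transformer encoder layer with ReZero residuals (illustrative sketch)."""

    def __init__(self, d_model=256, num_heads=4, d_ff=1024, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(
            d_model, num_heads, dropout=dropout, batch_first=True)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)
        # One learnable residual weight, initialised to zero; no LayerNorm.
        self.resweight = nn.Parameter(torch.zeros(1))

    def forward(self, x, key_padding_mask=None):
        # x: [B, T, d_model]; key_padding_mask: [B, T], True at padded frames.
        attn_out, _ = self.self_attn(
            x, x, x, key_padding_mask=key_padding_mask, need_weights=False)
        x = x + self.resweight * self.dropout(attn_out)
        x = x + self.resweight * self.dropout(self.feed_forward(x))
        return x
```

At initialisation the output equals the input, since both branches are multiplied by zero. The residual weight is then learned during training.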
loss:
lambda * ce loss + (1 - lambda) * ctc loss + code switch loss
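A minimal sketch of how such a combined objective could be computed in PyTorch, assuming the usual joint CTC-attention interpolation with weight lambda on the CE term and 1 - lambda on the CTC term. The code switch loss is not defined here, so it is treated as an optional, already-computed scalar. The function name `joint_loss` and all argument names are hypothetical.

```python
import torch
import torch.nn.functional as F


def joint_loss(att_logits, att_targets, ctc_log_probs, ctc_targets,
               input_lengths, target_lengths, lam=0.7,
               ignore_id=-1, blank_id=0, code_switch_loss=None):
    """Interpolate attention CE loss and CTC loss, then add an optional extra term.

    att_logits:    [B, L, V] decoder outputs
    att_targets:   [B, L] gold tokens, padded with ignore_id
    ctc_log_probs: [T, B, V] log-softmax encoder outputs
    ctc_targets:   [B, S] gold tokens (must not contain blank_id)
    """
    vocab_size = att_logits.size(-1)
    ce = F.cross_entropy(att_logits.reshape(-1, vocab_size),
                         att_targets.reshape(-1),
                         ignore_index=ignore_id)
    ctc = F.ctc_loss(ctc_log_probs, ctc_targets, input_lengths, target_lengths,
                     blank=blank_id, zero_infinity=True)
    loss = lam * ce + (1.0 - lam) * ctc
    if code_switch_loss is not None:
        # Assumption: the code switch term is computed elsewhere and simply added.
        loss = loss + code_switch_loss
    return loss
```

With `lam=1.0` the result reduces to the attention cross-entropy alone, and with `lam=0.0` to the CTC loss alone, which is a convenient sanity check when tuning the weight.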
requirement:
see docker/
references:
https://github.com/ZhengkunTian/OpenTransformer
https://github.com/espnet/espnet
https://github.com/jadore801120/attention-is-all-you-need-pytorch
https://github.com/alphadl/lookahead.pytorch
https://github.com/LiyuanLucasLiu/RAdam
https://github.com/vahidk/tfrecord
https://github.com/kaituoxu/Speech-Transformer
https://github.com/majumderb/rezero
https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
data:
aishell1 170h
aishell2 1000h
magic data 750h
prime 100h (not used)
stcmd 100h (not used)
datatang 200h
datatang 500h
datatang mix 200h
librispeech 960h
train step:
english -> eng(sub) + mix + chinese -> chinese + mix -> mix