fix(tn): full-width digits (#157)
xingchensong authored Nov 15, 2023
1 parent 3d3cb80 commit 2c0e38f
Showing 5 changed files with 24 additions and 1 deletion.
9 changes: 9 additions & 0 deletions tn/chinese/data/number/digit.tsv
@@ -7,3 +7,12 @@
7
8
9
9 changes: 9 additions & 0 deletions tn/chinese/data/number/teen.tsv
@@ -7,3 +7,12 @@
7
8
9
1 change: 1 addition & 0 deletions tn/chinese/data/number/zero.tsv
@@ -1 +1,2 @@
0
2 changes: 1 addition & 1 deletion tn/chinese/rules/cardinal.py
@@ -34,7 +34,7 @@ def build_tagger(self):
         sign = string_file('tn/chinese/data/number/sign.tsv')
         dot = string_file('tn/chinese/data/number/dot.tsv')
 
-        rmzero = delete('0')
+        rmzero = delete('0') | delete('０')
         rmpunct = delete(',').ques
         digits = zero | digit
         self.digits = digits
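Note on the `rmzero` change: per the commit title, the grammar is being extended to accept full-width digits alongside ASCII ones. The sketch below is a plain-Python illustration of what those characters are, not the commit's FST-based implementation. `to_halfwidth_digits` and `FULLWIDTH_OFFSET` are hypothetical names introduced only for this example. It relies on the fact that full-width digits U+FF10 to U+FF19 sit exactly 0xFEE0 code points above ASCII `0` to `9`.

```python
# Illustration only: the commit itself handles full-width digits inside the
# FST grammar and TSV tables, not with a helper like this one.
FULLWIDTH_OFFSET = 0xFEE0  # U+FF10 (full-width zero) minus U+0030 (ASCII zero)


def to_halfwidth_digits(text):
    """Map full-width digits to ASCII digits; leave every other character alone."""
    out = []
    for ch in text:
        if "\uff10" <= ch <= "\uff19":
            out.append(chr(ord(ch) - FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


# Escape sequences are used so the full-width characters survive copy/paste.
fullwidth_year = "\uff12\uff10\uff11\uff15\u5e74"  # full-width 2015 followed by 年
halfwidth_year = to_halfwidth_digits(fullwidth_year)
```

Writing the full-width characters as `\uXXXX` escapes keeps the two forms distinguishable in tools that silently normalize them to ASCII.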
4 changes: 4 additions & 0 deletions tn/chinese/test/data/normalizer.txt
@@ -42,3 +42,7 @@ B2B => B to B
当场票数≥100万 => 当场票数大于等于一百万
独得300w张 => 独得三百万张
面积是10km² => 面积是十平方千米
仅仅是2015年 => 仅仅是二零一五年
包含3000余件 => 包含三千余件
查处450余名 => 查处四百五十余名
查处450余名 => 查处四百五十余名
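
Each line of the test data above pairs a written form with its expected spoken form, separated by ` => `. A minimal sketch of reading that format follows. It assumes the separator is exactly ` => ` with surrounding spaces and that blank lines can be skipped. `parse_case` and `parse_cases` are hypothetical helpers for illustration, not the project's actual test loader.

```python
def parse_case(line):
    """Split one 'written => spoken' line into an (input, expected) pair."""
    written, sep, spoken = line.partition(" => ")
    if not sep:
        raise ValueError(f"missing ' => ' separator: {line!r}")
    return written.strip(), spoken.strip()


def parse_cases(lines):
    """Parse every non-blank line; blank lines are skipped (an assumption)."""
    return [parse_case(line) for line in lines if line.strip()]


# Sample lines copied from the existing test data shown in the diff.
cases = parse_cases([
    "B2B => B to B",
    "",
    "独得300w张 => 独得三百万张",
])
```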
