Commit

[fix] fix itn 三四十万 一万六七 (#234)
* [fix] fix itn 三四十万 一万六七

xingchensong authored Jun 5, 2024
1 parent d96d2b2 commit 25d5938
Showing 4 changed files with 15 additions and 24 deletions.
12 changes: 1 addition & 11 deletions itn/chinese/data/number/special_dash.tsv
@@ -1,6 +1,7 @@
 一二 1-2
 二三 2-3
 三四 3-4
+三五 3-5
 四五 4-5
 五六 5-6
 六七 6-7
@@ -37,14 +38,3 @@
 六七千 6000-7000
 七八千 7000-8000
 八九千 8000-9000
-一二万 1-2万
-一两万 1-2万
-二三万 2-3万
-两三万 2-3万
-三四万 3-4万
-三五万 3-5万
-四五万 4-5万
-五六万 5-6万
-六七万 6-7万
-七八万 7-8万
-八九万 8-9万
12 changes: 1 addition & 11 deletions itn/chinese/data/number/special_tilde.tsv
@@ -1,6 +1,7 @@
 一二 1~2
 二三 2~3
 三四 3~4
+三五 3~5
 四五 4~5
 五六 5~6
 六七 6~7
@@ -37,14 +38,3 @@
 六七千 6000~7000
 七八千 7000~8000
 八九千 8000~9000
-一二万 1~2万
-一两万 1~2万
-二三万 2~3万
-两三万 2~3万
-三四万 3~4万
-三五万 3~5万
-四五万 4~5万
-五六万 5~6万
-六七万 6~7万
-七八万 7~8万
-八九万 8~9万
10 changes: 8 additions & 2 deletions itn/chinese/rules/cardinal.py
@@ -42,9 +42,13 @@ def build_tagger(self):
         special_tilde = string_file(
             get_abs_path(
                 '../itn/chinese/data/number/special_tilde.tsv'))  # 七八十->70~80
+        special_tilde = special_tilde + add_weight(
+            (accep("万") | accep("亿")), -0.1).ques
         special_dash = string_file(
             get_abs_path(
                 '../itn/chinese/data/number/special_dash.tsv'))  # 七八十->70-80
+        special_dash = special_dash + add_weight(
+            (accep("万") | accep("亿")), -0.1).ques
         sign = string_file(
             get_abs_path('../itn/chinese/data/number/sign.tsv'))  # + -
         dot = string_file(
@@ -101,12 +105,14 @@ def build_tagger(self):
             (number + accep('亿') + delete('零').ques).ques + number)
         # 负的xxx 1.11, 1.01
         number = sign.ques + number + (dot + digits.plus).ques
-        # 五六万 => 5~6万,三五千 => 3000~5000,六七百 => 600~700,三四十 => 30~40
+        # 五六万 => 5~6万,三五千 => 3000~5000,六七百 => 600~700,三四十 => 30~40, 三四十亿 => 30~40亿
         number |= special_tilde
-        # 十七八 => 17-8, 四十五六 => 45-6, 三百七八十 => 370-80
+        # 十七八 => 17-8, 四十五六 => 45-6, 三百七八十 => 370-80, 四十五六万 => 45-6万, 一万六七 => 16000-7000
         _special_dash = cross('十', '1') + special_dash
         _special_dash |= digit + delete('十') + special_dash
         _special_dash |= digit + delete('百') + special_dash
+        _special_dash |= digit + delete('万') + digit + insert(
+            '000-') + digit + insert('000')
         number |= _special_dash
 
         self.number = number.optimize()
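
Note on the cardinal.py change: the explicit 万 rows are dropped from the two .tsv tables, and the grammar instead accepts an optional trailing 万 or 亿 after any table entry. A separate branch handles the "digit 万 digit digit" pattern. The following is a minimal plain-Python sketch of the intended input/output behavior only. It does not use pynini and is not the project's implementation. The names SPECIAL_TILDE, DIGITS, tilde_with_suffix and wan_dash are hypothetical, the table holds only a few sample rows, and the digit set is assumed to be 一 through 九 (the real digit FST may accept more). The -0.1 weight in the diff presumably makes the suffix-absorbing path preferred over competing parses, which this sketch does not model.

```python
import re

# Hypothetical stand-in for a few rows of special_tilde.tsv after this commit.
SPECIAL_TILDE = {"三四": "3~4", "三五": "3~5", "五六十": "50~60", "三四十": "30~40"}

# Assumed digit set; the grammar's real `digit` FST may differ.
DIGITS = {"一": "1", "二": "2", "三": "3", "四": "4", "五": "5",
          "六": "6", "七": "7", "八": "8", "九": "9"}

WAN_PATTERN = re.compile(r"([一二三四五六七八九])万([一二三四五六七八九])([一二三四五六七八九])")


def tilde_with_suffix(text):
    """Look up a table entry, copying an optional trailing 万/亿 through unchanged."""
    suffix = text[-1] if text and text[-1] in "万亿" else ""
    stem = text[:-1] if suffix else text
    if stem in SPECIAL_TILDE:
        return SPECIAL_TILDE[stem] + suffix
    return None


def wan_dash(text):
    """Map 'digit 万 digit digit' to '<d1><d2>000-<d3>000', e.g. 一万六七."""
    match = WAN_PATTERN.fullmatch(text)
    if match is None:
        return None
    d1, d2, d3 = (DIGITS[ch] for ch in match.groups())
    return f"{d1}{d2}000-{d3}000"


print(tilde_with_suffix("三四十万"))  # 30~40万
print(wan_dash("一万六七"))           # 16000-7000
```

Moving the suffix into the grammar means one table row such as 三四十 covers 三四十, 三四十万 and 三四十亿, rather than needing a row per combination.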
5 changes: 5 additions & 0 deletions itn/chinese/test/data/cardinal.txt
@@ -6,11 +6,16 @@
三五万 => 3~5万
三四万 => 3~4万
五六十 => 50~60
三四十万 => 30~40万
三四十亿 => 30~40亿
十五六 => 15-6
四十五六 => 45-6
四十五六万 => 45-6万
七百三四十 => 730-40
十七八万 => 17-8万
六十三四万 => 63-4万
一万六七 => 16000-7000
三万四五 => 34000-5000
我的身份证号是三四零二零三一九三七零幺零幺零五幺七 => 我的身份证号是340203193701010517
我的身份证号是三四零二零三一九三七零幺零幺零五幺X => 我的身份证号是34020319370101051X
给一三三四五三一二二二一打电话 => 给13345312221打电话
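
Note on the test data: each line of cardinal.txt pairs a spoken-form input with its expected written form, separated by " => ". A minimal sketch of reading such lines into (spoken, written) pairs follows. The helper name parse_cases is hypothetical and is not taken from the repository's test harness.

```python
def parse_cases(lines):
    """Split 'spoken => written' lines into (spoken, written) tuples.

    Blank lines and lines without the '=>' separator are skipped.
    """
    cases = []
    for line in lines:
        line = line.strip()
        if not line or "=>" not in line:
            continue
        spoken, written = (part.strip() for part in line.split("=>", 1))
        cases.append((spoken, written))
    return cases


sample = [
    "三四十万 => 30~40万",
    "一万六七 => 16000-7000",
    "",
]
for spoken, written in parse_cases(sample):
    print(spoken, "->", written)
```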