Commit

fix(tn): 300w张 50000票 (#156)
xingchensong authored Nov 15, 2023
1 parent e91e6f8 commit 3d3cb80
Showing 3 changed files with 6 additions and 2 deletions.
1 change: 1 addition & 0 deletions tn/chinese/data/measure/units_zh.tsv
@@ -7,6 +7,7 @@
平方
立方
公里
5 changes: 3 additions & 2 deletions tn/chinese/rules/measure.py
@@ -16,7 +16,7 @@
from tn.processor import Processor

from pynini import accep, cross, string_file
-from pynini.lib.pynutil import delete, insert
+from pynini.lib.pynutil import delete, insert, add_weight


class Measure(Processor):
@@ -29,7 +29,8 @@ def __init__(self):
     def build_tagger(self):
         units_en = string_file('tn/chinese/data/measure/units_en.tsv')
         units_zh = string_file('tn/chinese/data/measure/units_zh.tsv')
-        units = units_en | units_zh
+        units = add_weight((cross("k", "千") | cross("w", "万")), 0.1).ques + \
+            (units_en | units_zh)
         rmspace = delete(' ').ques
         to = cross('-', '到') | cross('~', '到') | accep('到')

2 changes: 2 additions & 0 deletions tn/chinese/test/data/normalizer.txt
@@ -40,3 +40,5 @@ B2B => B to B
给12315打个电话 => 给幺二三幺五打个电话
人均200以内 => 人均两百以内
当场票数≥100万 => 当场票数大于等于一百万
+独得300w张 => 独得三百万张
+面积是10km² => 面积是十平方千米
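
For readers without pynini at hand, the effect of the new optional `k`/`w` prefix in front of a unit can be approximated in plain Python. This is only a rough sketch, not the code in `measure.py`: the `UNITS` and `MAGNITUDE` tables and the `expand_units` helper are hypothetical stand-ins, digits are left unverbalized, and a lazy regex quantifier is used to loosely mimic the small penalty that `add_weight(..., 0.1)` puts on the prefix path (so that `km` is still read as one unit rather than `k` + `m`).

```python
import re

# Hypothetical stand-ins for a few entries of units_en.tsv / units_zh.tsv.
UNITS = {"km": "千米", "m": "米", "张": "张", "票": "票"}
# Magnitude shorthands introduced by the commit: k -> 千, w -> 万.
MAGNITUDE = {"k": "千", "w": "万"}

# Longest units first so that "km" is tried before "m".
_unit_alt = "|".join(sorted((re.escape(u) for u in UNITS), key=len, reverse=True))

# "([kw])??" is lazy: the match without a magnitude prefix is tried first,
# which loosely imitates the extra weight on the prefix path in the FST.
_PATTERN = re.compile(rf"(\d+)\s?([kw])??({_unit_alt})")


def expand_units(text):
    """Rewrite '<digits>[k|w]<unit>' spans; digits are kept as digits."""
    def _repl(match):
        number, magnitude, unit = match.groups()
        return number + MAGNITUDE.get(magnitude, "") + UNITS[unit]
    return _PATTERN.sub(_repl, text)


print(expand_units("独得300w张"))
print(expand_units("面积是10km"))
```

In the actual tagger the preference between competing readings is presumably resolved by shortest-path search over the weighted FST; the lazy quantifier here is only an analogy for that.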
