I don't like that zfill(3)
but realistically, a sentence with more than 1000 protected patterns?
jelmervdl committed Oct 30, 2023
1 parent 23571f7 commit 97747e9
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions sacremoses/tokenize.py
@@ -458,6 +458,7 @@ def tokenize(
for protected_pattern in protected_patterns
for match in protected_pattern.finditer(text)
]
assert len(protected_tokens) <= 1000 # so we don't run out of the zfill(3) space.

# Apply the protected_patterns, longest match first.
for i, token in sorted(enumerate(protected_tokens), key=lambda pair:len(pair[1]), reverse=True):
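
Why the bound is 1000: a minimal sketch, assuming each protected token is swapped for a placeholder built from a fixed prefix plus str(i).zfill(3). The prefix "PROTECTED" and the helper placeholder() are hypothetical names chosen for illustration, not taken from sacremoses. str.zfill pads short numbers but never truncates long ones, so indices 0 through 999 give three-digit suffixes, while index 1000 gives a four-digit suffix whose first three digits match the placeholder for index 100.

```python
# Hypothetical prefix, for illustration only; not the library's actual string.
PREFIX = "PROTECTED"


def placeholder(index: int, width: int = 3) -> str:
    """Build a placeholder with a zero-padded index.

    str.zfill pads on the left up to `width` digits but leaves longer
    numbers untouched, so the suffix is fixed-width only while
    index < 10 ** width.
    """
    return PREFIX + str(index).zfill(width)


# Indices 0..999 all produce a three-digit suffix.
in_range = [placeholder(i) for i in (0, 7, 999)]

# Index 1000 overflows the width: the suffix grows to four digits.
overflow = placeholder(1000)

# The overflowing placeholder starts with the placeholder for index 100,
# so a plain substring replace of one would also hit the other.
collides = overflow.startswith(placeholder(100))

# Restoring index 100 first corrupts the text holding index 1000.
corrupted = f"a {overflow} b".replace(placeholder(100), "X")
```

With exactly 1000 protected tokens the largest index is 999, which is why the assertion uses <= 1000 rather than < 1000.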