Releases: CanCLID/ToJyutping
3.2.0
What's Changed
- Update Dictionary Data: Slightly reduced dictionary size.
Full Changelog: 3.1.0...3.2.0
3.1.0
What's Changed
- Changes to the g2p method:
  - We removed the restriction that did not allow patching unknown_id without supplying puncts_map, since the built-in punctuation mapping can already be patched by the extra_puncts option.
  - Fixed: The lengths attribute of the output PhonemesList now agrees with the original input. Each element of lengths is now the number of elements of segmentals or tones that correspond to each character of the input, instead of each element of the PhonemesList.
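The 3.1.0 lengths fix can be shown with a small, self-contained sketch. For illustration only, it assumes two things. First, each input character yields a group of (segmental, tone) pairs. Second, segmentals and tones are parallel flat lists. The helpers flatten_with_lengths and split_by_lengths and all integer IDs are hypothetical, not ToJyutping internals. The sketch shows the invariant the changelog describes: there is one lengths entry per input character, and the entries sum to the length of segmentals.

```python
# Hypothetical sketch, not ToJyutping internals: per-character groups of
# (segmental, tone) pairs are flattened, and lengths records one count per
# input character.
from itertools import accumulate
from typing import List, Sequence, Tuple


def flatten_with_lengths(
    per_char: Sequence[Sequence[Tuple[int, int]]],
) -> Tuple[List[int], List[int], List[int]]:
    segmentals: List[int] = []
    tones: List[int] = []
    lengths: List[int] = []
    for pairs in per_char:  # one iteration per input character
        for segmental, tone in pairs:
            segmentals.append(segmental)
            tones.append(tone)
        lengths.append(len(pairs))  # elements contributed by this character
    return segmentals, tones, lengths


def split_by_lengths(flat: Sequence[int], lengths: Sequence[int]) -> List[List[int]]:
    """Regroup a flat per-element list back into per-character slices."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return [list(flat[start:end]) for start, end in zip(starts, ends)]
```

Because lengths is indexed by input character, split_by_lengths can map any per-element output (for example, model predictions aligned with segmentals) back onto the original characters.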
Full Changelog: 3.0.0...3.1.0
3.0.0
What's Changed
- Breaking Change: Internal methods are no longer exposed by the default entrypoint.
- Changes to the g2p method:
  - Breaking Change: The output list now includes fillers for unknown characters (1) and punctuations (2 to 7) in addition to syllable components (8 to 94). Unknown-character fillers and punctuations are output as singletons (1-tuples). These values can be adjusted by the offset and puncts_offset arguments.
  - The output list now has useful properties, namely segmentals, tones and lengths.
  - Punctuations can be added by the extra_puncts argument or customized by the puncts_map and unknown_id arguments.
  - Read the documentation for more info.
- The new customize method:
  - Adds the ability to include custom entries and to override or exclude built-in entries.
  - The constructed converters can be chained without affecting each other.
  - Read the documentation for more info.
- The get_jyutping and get_ipa methods are slightly optimized.
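A copy-on-customize design is one way to get the chaining guarantee of customize. The sketch below is hypothetical. The Converter class, its lookup method, and the convention that a None value excludes an entry are all assumptions for illustration. They do not reproduce ToJyutping's actual customize signature or entry format. Each call copies the current table, applies the changes, and returns a new converter, so earlier converters never see later edits.

```python
# Hypothetical sketch, not ToJyutping's API: customize() returns a new
# converter instead of mutating the existing one.
from types import MappingProxyType
from typing import Dict, Mapping, Optional


class Converter:
    """Toy character-to-Jyutping converter backed by a read-only table."""

    def __init__(self, table: Mapping[str, str]) -> None:
        # Private copy exposed read-only, so the table cannot change later.
        self._table: Mapping[str, str] = MappingProxyType(dict(table))

    def customize(self, entries: Mapping[str, Optional[str]]) -> "Converter":
        """A reading adds or overrides an entry; None (assumed) excludes it."""
        table: Dict[str, str] = dict(self._table)  # copy; self stays unchanged
        for char, reading in entries.items():
            if reading is None:
                table.pop(char, None)
            else:
                table[char] = reading
        return Converter(table)

    def lookup(self, char: str) -> Optional[str]:
        return self._table.get(char)


base = Converter({"你": "nei5", "好": "hou2"})
custom = base.customize({"好": "hou3", "嘢": "je5"})  # override + add
trimmed = custom.customize({"你": None})              # exclude
```

Copying the whole table on every call costs memory proportional to its size. A persistent map, or an overlay of changes over a shared base, would avoid that cost, but the plain copy keeps the sketch simple.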
Full Changelog: 2.0.0...3.0.0
2.0.0
What's Changed
- Breaking Change: g2p now outputs tones as integers from 1 to 6 (instead of 87 to 92) by default.
  - To retain the old behavior, pass the argument tone_same_seq=True.
- Accept a triplet as the offset argument of g2p
- Slightly optimize the two get_*_candidates methods
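Under the 2.0.0 defaults, the two tone encodings differ by a constant. Tone 1 maps to 87 and tone 6 maps to 92, so the legacy value is 86 + tone. One plausible reading of the name tone_same_seq is that it keeps tones in the same ID sequence as the other phoneme tokens. The helpers below are hypothetical conversions that assume the default offset. Since offset can now be a triplet, the actual values may differ in other configurations.

```python
# Hypothetical helpers, not part of ToJyutping. They assume the 2.0.0
# default offset, under which tone_same_seq=True numbers tones 1..6 as 87..92.
SAME_SEQ_TONE_BASE = 86


def to_same_seq(tone: int) -> int:
    """Map a default tone (1-6) to its legacy same-sequence ID (87-92)."""
    if not 1 <= tone <= 6:
        raise ValueError(f"tone must be in 1..6, got {tone}")
    return SAME_SEQ_TONE_BASE + tone


def from_same_seq(token: int) -> int:
    """Map a legacy same-sequence tone ID (87-92) back to a tone (1-6)."""
    if not SAME_SEQ_TONE_BASE + 1 <= token <= SAME_SEQ_TONE_BASE + 6:
        raise ValueError(f"not a same-sequence tone ID: {token}")
    return token - SAME_SEQ_TONE_BASE
```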
Full Changelog: 1.0.0...2.0.0
1.0.0
What's Changed
- Completely rewrite the codebase in an object-oriented manner
- Add g2p (grapheme-to-phoneme) conversion function for machine learning purposes
- Optimize performance & memory usage (#7)
Full Changelog: 0.3.0...1.0.0
0.3.0
What's Changed
- Use new dictionary to increase accuracy and reduce module size
- Add methods for retrieving all possible pronunciations of a character
- Drop dependency
- Drop Python 3.5; add 3.11 & 3.12
Full Changelog: 0.2.3...0.3.0
0.2.3
0.2.2
What's Changed
- m and ng alone should be treated as coda by @graphemecluster in #2
- Handle punctuation in get_*_text by @graphemecluster in #3
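A naive Jyutping splitter shows why bare m and ng need special handling. A regex that tries the onset first reads m4 (唔) and ng5 (五) as an onset with no vowel. The sketch below is not ToJyutping's implementation. It splits a syllable into onset, nucleus, coda and tone using the standard Jyutping inventory, and moves a bare syllabic nasal into the coda slot, in line with #2. It is deliberately simplified and does not reject every phonotactically impossible syllable.

```python
# Hypothetical sketch, not ToJyutping's implementation: split a Jyutping
# syllable into (onset, nucleus, coda, tone), treating bare m / ng as codas.
import re
from typing import NamedTuple


class Syllable(NamedTuple):
    onset: str
    nucleus: str
    coda: str
    tone: int


# Longer alternatives come first so "ng" beats "n", "aa" beats "a", etc.
_SYLLABLE = re.compile(
    r"^(gw|kw|ng|[bpmfdtnlgkhwzcsj])?"  # onset
    r"(aa|oe|eo|yu|[aeiou])?"           # nucleus
    r"(ng|[iumnptk])?"                  # coda
    r"([1-6])$"                         # tone
)


def split_syllable(jyutping: str) -> Syllable:
    match = _SYLLABLE.match(jyutping)
    if match is None:
        raise ValueError(f"not a Jyutping syllable: {jyutping!r}")
    onset, nucleus, coda, tone = (g or "" for g in match.groups())
    if not nucleus:
        if not coda and onset in ("m", "ng"):
            # The regex read the bare nasal as an onset; move it to the coda.
            onset, coda = "", onset
        elif coda not in ("m", "ng"):
            # Only nasals (m4, ng5, hm4, hng6) can stand without a vowel.
            raise ValueError(f"not a Jyutping syllable: {jyutping!r}")
    return Syllable(onset, nucleus, coda, int(tone))
```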
Full Changelog: 0.2.1...0.2.2
0.2.1
- Update dictionary to rime/rime-cantonese@5b6d334