Added a fallible try_from_code
method to obtain a hyphenation::Language
from BCP 47 tags. (Contributed by Tuomas Laakkonen)
Supported languages now include:
- Albanian (
sq
) - Scholastic Finnish (
fi-x-school
)
Updated source hyphenation patterns to CTAN version 2021-03-22:
- Updates and fixes to Macedonian (
mk
) - Updates to German (
de-1901
,de-1996
,de-ch-1901
)
unsafe
code is now forbidden crate-wide.- The
Standard
dictionary implementsClone
. - Moved to Rust 2018.
- Updated dependencies.
Language support:
- New language: Macedonian [mk]
hyph-utf8
patterns updated to CTAN version 2020.03.25:- Major update to Spanish [es] patterns
- Major update to Classical Latin patterns [la-x-classic]
Library changes:
- Dictionaries now have a much more compact representation. Although hyphenation is ~3x slower, there should be no discernible difference in typical workloads.
hyphenator::exact
has been renamed tohyphenator::exception
, in accordance with TeX conventions.- Exceptions can now be removed from dictionaries with
Hyphenator::remove_exception()
Fixes:
hyphenation
will no longer attempt to rebuild (#19) upon compilation of a dependant.Hyphenator::exception()
(néeexact()
) no longer ignores hyphenation boundaries (#20) set by the dictionary. The old behavior can be replicated withexception_within()
.
- Backward compatibility with Rust 1.21.0 and earlier.
- Downstream dependency:
num-traits
0.1 → 0.2
This release brings extended hyphenation, a substantial API reform, and a more flexible approach to dictionary loading.
-
API reform.
The
hyphenation
API is now based on concrete dictionary types (“hyphenators”), and includes a richer set of methods and capabilities. Refer to the documentation for more. -
Full-text hyphenation has been removed. To hyphenate words within text runs, see the notes on segmentation.
-
Hyphenation patterns are not normalized by default. Normalization is now behind feature flags.
-
Hyphenation dictionaries are bundled with the artifact, and need only be rebuilt if normalization is required.
-
Hyphenation dictionaries are not embedded by default; to enable embedding, use the
embed_all
feature.
- Extended (“non-standard”) hyphenation, available for Hungarian and Catalan. Extended hyphenation can improve orthography by altering or inserting characters around word breaks.
- Supported languages now include:
- Belarusian (be)
- Ecclesiastical Latin (la-x-liturgic)
- Pāli (pi)
-
Special casing
Words containing characters that change byte size when lowercased (the Turkish “İ”) are now hyphenated correctly.
- Hyphenation dictionaries are ~40% smaller.
- Hyphenation is ~20% faster.
serde
→ 1.0bincode
→ 1.0
- Slightly reduced embedded pattern size.
- Unicode normalization can be configured via build flags.
- Patterns are now serialized by
bincode
at build time. - Hyphenation is faster by:
- 10–20%, generally;
- 40–60%, for lowercase ASCII words (in a mostly ASCII language).
- Embedded pattern size increased by ~25%.
- The
Standard
hyphenating iterator implementssize_hint()
andExactSizeIterator
.
- Reverted from previous change in 0.3.0: the
Hyphenation
trait only performs word hyphenation. Full text hyphenation is now part of theFullTextHyphenation
trait. - Language support: added Church Slavonic.
-
Updated patterns to hyph-utf8.json v0.1.0.0.
Previously, patterns containing multibyte letters with combining marks could be mapped to mismatching scores of incorrect length, due to normalization issues. The following languages were affected:
- Assamese
- Bengali
- Panjabi
- Sanskrit
- The pattern repository is now bundled with the crate, and no longer requires manual initialization.
- Hyphenation of lowercase words is ~10% faster.
Patterns
andExceptions
now return preexisting values oninsert()
, analogously to HashMap.
- Language loading time is halved.
- Word hyphenation is ~15% faster.
- The
Hyphenation
trait accepts full text as well as individual words. Standard
hyphenators expose apunctuate_with()
method.
- Loading a language is ~30% faster.
- Hyphenation patterns are stored as JSON.