Releases: PyThaiNLP/pythainlp
PyThaiNLP v5.0.4 Released!
PyThaiNLP v5.0.4 is a bug fix release of PyThaiNLP v5.0.3.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: #788.
What's Changed
- Fixed #914 by @wannaphong in #917
Full Changelog: v5.0.3...v5.0.4
PyThaiNLP v5.0.3 Released!
PyThaiNLP v5.0.3 is a bug fix release of PyThaiNLP v5.0.2.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: #788.
What's Changed
- Create .editorconfig by @bact in #909
- Fix empty string ('') added (in some cases) when using word_tokenize with join_broken_num=True by @S2P2 in #912
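For code that still has to run on releases older than 5.0.3, the empty-string issue fixed in #912 can be worked around by filtering the token list after tokenization. This is only a minimal sketch: `drop_empty_tokens` is a hypothetical helper, not part of PyThaiNLP, and the sample list is made-up output used for illustration.

```python
from typing import Iterable, List


def drop_empty_tokens(tokens: Iterable[str]) -> List[str]:
    """Remove zero-length tokens while keeping order and whitespace tokens."""
    return [token for token in tokens if token != ""]


# Hypothetical tokenizer output containing a stray empty string.
sample = ["1", ",", "234", "", " ", "บาท"]
print(drop_empty_tokens(sample))
```

Only zero-length strings are dropped, so whitespace tokens are left untouched.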
New Contributors
Full Changelog: v5.0.2...v5.0.3
PyThaiNLP v5.0.2 Released!
PyThaiNLP v5.0.2 is a bug fix release of PyThaiNLP v5.0.1.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: #788.
What's Changed
- Update README and license header by @bact in #902
- Updated crfcut.py by @varunkatiyar819 in #905
New Contributors
- @varunkatiyar819 made their first contribution in #905
Full Changelog: v5.0.1...v5.0.2
Contributors
Thanks to all the contributors. (Image made with contributors-img)
PyThaiNLP v5.0.1 Released!
PyThaiNLP v5.0.1 is a bug fix release of PyThaiNLP v5.0.0.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: #788.
What's Changed
- Fixed bug: ImportError pycrfsuite #901
Full Changelog: v5.0.0...v5.0.1
Contributors
Thanks to all the contributors. (Image made with contributors-img)
PyThaiNLP v5.0.0 Released!
We are excited to announce PyThaiNLP 5.0, the latest release of PyThaiNLP, a Python library for Thai natural language processing (NLP).
With PyThaiNLP 5.0, you can expect improved performance and accuracy for NLP tasks in Thai. We have also added new functions to make your NLP tasks even easier and more efficient.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: #788.
What is new?
License information
- Use SPDX license identifier at the header of source code #876
Deprecation and other API changes
- Change default NER to thainer-v2 5e97e7c
- Move pythainlp.util.is_native_thai to pythainlp.morpheme.is_native_thai 524759a
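Code that must work both before and after the is_native_thai move can look the function up in the new location first and fall back to the old one. The sketch below is one possible approach; `resolve_is_native_thai` is a hypothetical helper name, and it returns None when neither module provides the function (for example, when PyThaiNLP is not installed).

```python
import importlib


def resolve_is_native_thai():
    """Return is_native_thai from its new or old location, or None if unavailable."""
    # Try the new location (5.0 and later) first, then the pre-5.0 location.
    for module_name in ("pythainlp.morpheme", "pythainlp.util"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module missing (or pythainlp not installed); try the next one
        func = getattr(module, "is_native_thai", None)
        if callable(func):
            return func
    return None


is_native_thai = resolve_is_native_thai()
if is_native_thai is None:
    print("is_native_thai not found; is pythainlp installed?")
```

New code that only targets 5.0 or later can import directly from pythainlp.morpheme instead.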
Dependency
- Add tzdata as a dependency on Windows by @BLKSerene in #841
New API
- Add pythainlp.coref for Thai coreference resolution #802
- Add wtpsplit to sentence segmentation & paragraph segmentation #804 and add paragraph_threshold into paragraph_tokenize() function #806
- Add word approximation to pythainlp.soundex.sound #809 by @wannaphong
- Add pythainlp.wsd for Thai word sense disambiguation #818 by @wannaphong
- Add pythainlp.chat and WangChanGLM to pythainlp.generate #819 by @wannaphong
- Add pythainlp.cls a param-free classification model #821 by @c4n
- Add pythainlp.el entity linking #822 by @wannaphong
- Add pythainlp.ancient by @wannaphong in #833
- Add pythainlp.util.rhyme by @wannaphong in #849
- Add remove_trailing_repeat_consonants by @konbraphat51 in #862
- Add pythainlp.util.to_idn by @wannaphong in #875
- Add pythainlp.corpus.find_synonyms by @wannaphong in #890
- Add pythainlp.util.morse by @wannaphong in #891
- Add pythainlp.morpheme by @wannaphong in #896
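The list above does not say what pythainlp.util.to_idn does. Assuming it converts an internationalized (for example Thai) domain name to its ASCII-compatible form, the underlying idea can be sketched with Python's built-in idna codec. `to_idn_sketch` is a hypothetical stand-in written for illustration, not the library's implementation.

```python
def to_idn_sketch(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII (Punycode) form."""
    # Python's built-in "idna" codec encodes each dot-separated label.
    return domain.encode("idna").decode("ascii")


ascii_domain = to_idn_sketch("ไทย.com")
print(ascii_domain)
# Decoding with the same codec recovers the original name.
print(ascii_domain.encode("ascii").decode("idna"))
```

The built-in codec follows the older IDNA 2003 rules, so results for some names may differ from tools that implement IDNA 2008.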
Improve
- Update code comments and clean up codes by @BLKSerene in #845
- Improve the documentation by fixing typos and adding necessary details, explanations of the code, and missing details about models and examples, by @Saharshjain78 in #850
- Fix tests of khavee functions by @BLKSerene in #854
- Update Git Actions versions by @bact in #878
- Fix ruff args in workflow by @bact in #880
- Revise ruff args in workflow by @bact in #881
- Fix coref return type and add fallback by @bact in #883
- Fix wrong/incompatible types, code readability by @bact in #884
- Bump protobuf from 3.20 to 3.20.2 by @dependabot in #885
- Add license info to /tests and README_TH.md by @bact in #886
- phayathaibert, khavee, parse: Code clean up by @bact in #889
- ruff: docstring-code-format = true by @bact in #892
Tokenizer
- Add wtpsplit engine to sentence_tokenize #804
- New paragraph_tokenize function to split Thai text into paragraphs #804
- Add paragraph_threshold into paragraph_tokenize() function #806 by @pavaris-pm
- Add 🪿 Han-solo by @wannaphong in #830
- Fix newmm to better handle non-Thai characters in tokens #856 by @konbraphat51
- Fix incorrect passing of flags to re.split by @hauntsaninja in #832
- Add syllable_tokenize by @wannaphong in #834
- Add wanchanberta_thai_grammarly by @wannaphong in #836
- Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
- Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856
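The re.split fix in the list above (#832) is not described in detail here. A common form of this mistake, assumed here for illustration, is passing a flag as the third positional argument of re.split, where it is interpreted as maxsplit rather than flags. The sketch below uses only the standard library and made-up input.

```python
import re

text = "aXbXcXd"

# A third positional argument binds to maxsplit, not flags.
# re.IGNORECASE has the integer value 2, so the line below behaves like the
# mistaken call re.split("x", text, re.IGNORECASE): the flag is never applied.
buggy = re.split("x", text, maxsplit=int(re.IGNORECASE))

# Passing the flag by keyword applies it as intended.
fixed = re.split("x", text, flags=re.IGNORECASE)

print(buggy)  # ['aXbXcXd']
print(fixed)  # ['a', 'b', 'c', 'd']
```

Passing flags by keyword avoids the problem regardless of Python version.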
Tag
- Add function for pos tag with transformers by @MpolaarbearM in #857
- Update pos_tag_transformers function by @pavaris-pm in #865
- Add PhayaThaiBERT engine with new features by @pavaris-pm in #873
Chat
- Fixed bug #828
Translate
- Add small100 to pythainlp.translate #815 by @wannaphong
Transliterate
- Fix duplicate keys in ISO 11940 and IPA-RTGS phoneme mapping #851 #852 by @BLKSerene and @bact
- Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in #852
Corpus
- Add pythainlp.corpus.thai_orst_words() Thai word list from Royal Society of Thailand (ORST) #810 by @wannaphong
- Add pythainlp.corpus.thai_wikipedia_titles() Thai word list (noun and noun phrases) from Thai Wikipedia titles #869 by @konbraphat51
- Add pythainlp.corpus.thai_volubilis_words() Thai word list from Volubilis dictionary #870 by @konbraphat51
- Add pythainlp.corpus.thai_icu_words() Thai word list from ICU BreakIterator dictionary #879 by @pavaris-pm
- Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in #882
Util
- Add pythainlp.util.encoding #813 by @wannaphong
- Add pythainlp.util.spell_words #817 by @wannaphong
- Add pythainlp.util.remove_trailing_repeat_consonants() #862 by @konbraphat51
New Contributors
- @pavaris-pm made their first contribution in #806
- @hauntsaninja made their first contribution in #832
- @Saharshjain78 made their first contribution in #850
- @konbraphat51 made their first contribution in #856
- @MpolaarbearM made their first contribution in #857
Full Changelog: v4.0.2...v5.0.0
Contributors
Thanks to all the contributors. (Image made with contributors-img)
PyThaiNLP v5.0.0-beta1
Schedule
- First Beta release: 5 February 2024
- Production release: 10 February 2024
See 5.0 Milestone.
What is new?
License information
- Use SPDX license identifier at the header of source code #876
Deprecation and other API changes
- Change default NER to thainer-v2 5e97e7c
- Move pythainlp.util.is_native_thai to pythainlp.morpheme.is_native_thai 524759a
Dependency
- Add tzdata as a dependency on Windows by @BLKSerene in #841
New API
- Add pythainlp.coref for Thai coreference resolution #802
- Add wtpsplit to sentence segmentation & paragraph segmentation #804 and add paragraph_threshold into paragraph_tokenize() function #806
- Add word approximation to pythainlp.soundex.sound #809 by @wannaphong
- Add pythainlp.wsd for Thai word sense disambiguation #818 by @wannaphong
- Add pythainlp.chat and WangChanGLM to pythainlp.generate #819 by @wannaphong
- Add pythainlp.cls a param-free classification model #821 by @c4n
- Add pythainlp.el entity linking #822 by @wannaphong
- Add pythainlp.ancient by @wannaphong in #833
- Add pythainlp.util.rhyme by @wannaphong in #849
- Add remove_trailing_repeat_consonants by @konbraphat51 in #862
- Add pythainlp.util.to_idn by @wannaphong in #875
- Add pythainlp.corpus.find_synonyms by @wannaphong in #890
- Add pythainlp.util.morse by @wannaphong in #891
- Add pythainlp.morpheme by @wannaphong in #896
Improve
- Update code comments and clean up codes by @BLKSerene in #845
- Improve the documentation by fixing typos and adding necessary details, explanations of the code, and missing details about models and examples, by @Saharshjain78 in #850
- Fix tests of khavee functions by @BLKSerene in #854
- Update Git Actions versions by @bact in #878
- Fix ruff args in workflow by @bact in #880
- Revise ruff args in workflow by @bact in #881
- Fix coref return type and add fallback by @bact in #883
- Fix wrong/incompatible types, code readability by @bact in #884
- Bump protobuf from 3.20 to 3.20.2 by @dependabot in #885
- Add license info to /tests and README_TH.md by @bact in #886
- phayathaibert, khavee, parse: Code clean up by @bact in #889
- ruff: docstring-code-format = true by @bact in #892
Tokenizer
- Add wtpsplit engine to sentence_tokenize #804
- New paragraph_tokenize function to split Thai text into paragraphs #804
- Add paragraph_threshold into paragraph_tokenize() function #806 by @pavaris-pm
- Add 🪿 Han-solo by @wannaphong in #830
- Fix newmm to better handle non-Thai characters in tokens #856 by @konbraphat51
- Fix incorrect passing of flags to re.split by @hauntsaninja in #832
- Add syllable_tokenize by @wannaphong in #834
- Add wanchanberta_thai_grammarly by @wannaphong in #836
- Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
- Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856
Tag
- Add function for pos tag with transformers by @MpolaarbearM in #857
- Update pos_tag_transformers function by @pavaris-pm in #865
- Add PhayaThaiBERT engine with new features by @pavaris-pm in #873
Chat
- Fixed bug #828
Translate
- Add small100 to pythainlp.translate #815 by @wannaphong
Transliterate
- Fix duplicate keys in ISO 11940 and IPA-RTGS phoneme mapping #851 #852 by @BLKSerene and @bact
- Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in #852
Corpus
- Add pythainlp.corpus.thai_orst_words() Thai word list from Royal Society of Thailand (ORST) #810 by @wannaphong
- Add pythainlp.corpus.thai_wikipedia_titles() Thai word list (noun and noun phrases) from Thai Wikipedia titles #869 by @konbraphat51
- Add pythainlp.corpus.thai_volubilis_words() Thai word list from Volubilis dictionary #870 by @konbraphat51
- Add pythainlp.corpus.thai_icu_words() Thai word list from ICU BreakIterator dictionary #879 by @pavaris-pm
- Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in #882
Util
- Add pythainlp.util.encoding #813 by @wannaphong
- Add pythainlp.util.spell_words #817 by @wannaphong
- Add pythainlp.util.remove_trailing_repeat_consonants() #862 by @konbraphat51
New Contributors
- @pavaris-pm made their first contribution in #806
- @hauntsaninja made their first contribution in #832
- @Saharshjain78 made their first contribution in #850
- @konbraphat51 made their first contribution in #856
- @MpolaarbearM made their first contribution in #857
PyThaiNLP v5.0.0-dev2
What's Changed
- Add pythainlp.morpheme by @wannaphong in #896
Full Changelog: v5.0.0-dev1...v5.0.0-dev2
PyThaiNLP v5.0.0-dev1
What's Changed
- Add Thai word list from Volubilis dictionary by @konbraphat51 in #870
- Add Thai word list from Thai Wikipedia titles by @konbraphat51 in #869
- Switch PyThaiNLP source code to SPDX license ID by @pavaris-pm in #876
- Add pythainlp.util.to_idn by @wannaphong in #875
- Update Git Actions versions by @bact in #878
- Fix ruff args in workflow by @bact in #880
- Revise ruff args in workflow by @bact in #881
- Add Thai word list from ICU BreakIterator dictionary by @pavaris-pm in #879
- Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in #882
- Fix coref return type and add fallback by @bact in #883
- Fix wrong/incompatible types, code readability by @bact in #884
- Bump protobuf from 3.20 to 3.20.2 by @dependabot in #885
- Add license info to /tests and README_TH.md by @bact in #886
- Add PhayaThaiBERT engine with new features [WIP] by @pavaris-pm in #873
- phayathaibert, khavee, parse: Code clean up by @bact in #889
- Add pythainlp.corpus.find_synonyms by @wannaphong in #890
- ruff: docstring-code-format = true by @bact in #892
- Add pythainlp.util.morse by @wannaphong in #891
Full Changelog: v5.0.0-dev0...v5.0.0-dev1
PyThaiNLP v5.0.0-dev0
What's Changed
- Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
- Update code comments and clean up codes by @BLKSerene in #845
- Improve the documentation by fixing typos and adding necessary details, explanations of the code, and missing details about models and examples, by @Saharshjain78 in #850
- Fix ISO 11940 duplicate keys by @bact in #851
- Add pythainlp.util.rhyme by @wannaphong in #849
- Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in #852
- Fix tests of khavee functions by @BLKSerene in #854
- Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856
- Add function for pos tag with transformers by @MpolaarbearM in #857
- Add: remove_trailing_repeat_consonants() by @konbraphat51 in #862
- Update pos_tag_transformers function by @pavaris-pm in #865
New Contributors
- @Saharshjain78 made their first contribution in #850
- @konbraphat51 made their first contribution in #856
- @MpolaarbearM made their first contribution in #857
Full Changelog: v4.1.0-beta5...v5.0.0-dev0
PyThaiNLP v4.1.0-beta5
Docs: https://pythainlp.github.io/dev-docs/
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
Install: pip install --pre pythainlp
See 4.1 Milestone.
What's Changed
- Fix "List of possible extras" in README by @BLKSerene in #839
- Add tzdata as a dependency on Windows by @BLKSerene in #841
Full Changelog: v4.1.0-beta4...v4.1.0-beta5