
PyThaiNLP v3.0.0-dev0

Pre-release
Released by @wannaphong on 27 Dec 11:21 (commit 3414373)

PyThaiNLP v3.0.0-dev0 is the first development release of PyThaiNLP 3.0 (for development use only).

Docs: https://pythainlp.github.io/dev-docs/index.html
Report bug: https://github.com/PyThaiNLP/pythainlp/issues
GitHub: https://github.com/PyThaiNLP/pythainlp

News

Starting with PyThaiNLP 2.4, we no longer support Python 3.6. Python 3.6 users can continue to use PyThaiNLP 2.3.1.
We have updated the dictionary and rules for newmm. If your model uses newmm for word tokenization, we recommend retraining it.
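Why a dictionary update matters for trained models: newmm is a dictionary-based maximal-matching tokenizer, so adding or removing dictionary words can move token boundaries. The sketch below is a simplified illustration, not PyThaiNLP's newmm code: it chooses the segmentation with the fewest tokens via dynamic programming and ignores newmm's Thai Character Cluster constraints. The function name `maximal_matching` and the toy dictionaries are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def maximal_matching(text: str, dictionary: Iterable[str]) -> List[str]:
    """Hypothetical simplified maximal matching (no TCC constraints).

    Prefers segmentations with fewer unknown characters, then fewer tokens.
    """
    words = set(dictionary)
    max_len = max([1] + [len(w) for w in words])
    n = len(text)
    # best[i] holds (unknown_chars, token_count, tokens) for the prefix text[:i]
    best: List[Optional[Tuple[int, int, List[str]]]] = [None] * (n + 1)
    best[0] = (0, 0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev = best[start]
            if prev is None:
                continue
            piece = text[start:end]
            if piece in words:
                unknown = 0
            elif end - start == 1:
                unknown = 1  # fall back to a single unknown character
            else:
                continue
            candidate = (prev[0] + unknown, prev[1] + 1, prev[2] + [piece])
            current = best[end]
            if current is None or candidate[:2] < current[:2]:
                best[end] = candidate
    result = best[n]
    return result[2] if result is not None else []


old_dict = ["ไป", "โรง", "เรียน"]
new_dict = old_dict + ["โรงเรียน"]  # a dictionary update adds one word
print(maximal_matching("ไปโรงเรียน", old_dict))  # ['ไป', 'โรง', 'เรียน']
print(maximal_matching("ไปโรงเรียน", new_dict))  # ['ไป', 'โรงเรียน']
```

The same sentence yields different tokens once the dictionary changes, so features learned from the old tokenization may no longer match what the tokenizer produces at inference time.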

What is new?

Deprecation and other API changes

  • #550 Deprecate syllable_tokenize; use subword_tokenize instead.
  • 701fb3a pythainlp.tag.named_entity.ThaiNameTagger has moved to pythainlp.tag.thainer.ThaiNameTagger. The old class path will be deprecated in PyThaiNLP 2.5.
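Deprecated names usually keep working for a while and emit a `DeprecationWarning` that points to the replacement. The following is a generic, standard-library sketch of that pattern; `legacy_tokenize` and `current_tokenize` are hypothetical stand-ins for an old and a new function, not PyThaiNLP's actual code.

```python
import functools
import warnings
from typing import Any, Callable, List


def deprecated(replacement: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Wrap a function so each call warns and names its replacement."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__} is deprecated, use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def current_tokenize(text: str) -> List[str]:
    # Hypothetical new API; whitespace splitting is only a placeholder.
    return text.split()


@deprecated("current_tokenize")
def legacy_tokenize(text: str) -> List[str]:
    # Hypothetical old API that simply delegates to the new one.
    return current_tokenize(text)
```

Calling `legacy_tokenize("a b")` still returns `['a', 'b']`. Python's default filters hide `DeprecationWarning` unless it is triggered from `__main__`, so running with `python -W default` surfaces these warnings in library or test code.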

Augment

  • #580 Add Thai Text Augmentation

Corpus

  • #557 Fix lots of misspellings in dictionary (words_th.txt)
  • #576 Add get_corpus_default_db and the thainer 1.5 model. You can now add a corpus to default_db.json, and the latest ThaiNER model no longer has to be downloaded from the Internet.

Tag

  • #599 Add tltk (pos_tag and ner): a tltk wrapper for PyThaiNLP functions such as ner, word_tokenize, and more.
  • #600 Add NER class for named-entity recognition tasks.

Translate

  • #589 Add pythainlp.translate.Translate Class
  • #588 Add Chinese-Thai Machine Translation

Tokenization

  • #585 Fix token_max_len bug that makes it always zero
  • #562 Tokenize repeating dots and commas from numbers (fix #461)
  • #594 Retrain sentenceseg_crfcut.model for PyThaiNLP 2.4
  • 3144110 Add SEFR CUT to PyThaiNLP
  • #599 Add tltk (sentence_tokenize and word_tokenize): a tltk wrapper for PyThaiNLP functions such as ner, word_tokenize, and more.
  • #622 Add nlpo3
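One reading of #562 is that runs of repeated dots or commas are split away from adjacent numbers, while single separators inside a number stay attached. A minimal regular-expression sketch of that behaviour, assuming a number is a digit run with single `.` or `,` separators between digit groups; `split_number_punct` is a hypothetical helper, not PyThaiNLP's actual rule.

```python
import re
from typing import List

# Hypothetical pattern, tried left to right at each position:
#   1. a number: digits with single "." or "," between digit groups (1,000.50)
#   2. a run of dots and/or commas ("...", ",,")
#   3. any other run of non-space characters
#   4. whitespace, kept as its own token
_TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|[.,]+|[^\d.,\s]+|\s+")


def split_number_punct(text: str) -> List[str]:
    """Split text so repeated dots/commas never stick to a number."""
    return _TOKEN_RE.findall(text)


print(split_number_punct("12...34"))  # ['12', '...', '34']
print(split_number_punct("ราคา 1,000.50 บาท..."))
# ['ราคา', ' ', '1,000.50', ' ', 'บาท', '...']
```

Because every character falls into one of the four classes, joining the tokens always reproduces the input text.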

Transliterate

  • #566 Refactor Royin transliteration: avoid nested if blocks and simplify consonant replacement operations
  • #585 Manually merge the update-royin branch into the dev branch to add the O-ANG rule
  • #599 Add tltk (g2p and ipa): a tltk wrapper for PyThaiNLP functions such as ner, word_tokenize, and more.
  • #624 Add pythainlp.transliterate.puan
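#566 describes replacing nested `if` blocks with simpler consonant replacement. A common way to do that is a lookup table keyed by syllable position and consonant. The sketch below is hypothetical and deliberately partial: it covers only a handful of consonants using Royal Thai General System (RTGS) values for initial and final position, and it is not PyThaiNLP's royin implementation.

```python
from typing import Dict

# Hypothetical, partial RTGS tables: one dictionary per syllable position
# replaces a chain of nested if/elif checks.
_ROYIN_CONSONANTS: Dict[str, Dict[str, str]] = {
    "initial": {"ก": "k", "ง": "ng", "จ": "ch", "ด": "d", "บ": "b", "ร": "r", "ส": "s"},
    "final": {"ก": "k", "ง": "ng", "จ": "t", "ด": "t", "บ": "p", "ร": "n", "ส": "t"},
}


def replace_consonant(char: str, position: str) -> str:
    """Romanize one consonant for 'initial' or 'final' position.

    Consonants missing from the table are returned unchanged in this sketch.
    """
    table = _ROYIN_CONSONANTS[position]  # KeyError for an invalid position
    return table.get(char, char)


print(replace_consonant("จ", "initial"))  # ch
print(replace_consonant("จ", "final"))    # t
```

With this layout, adding or correcting a rule means editing one table entry instead of adding another branch.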

Word Vector

  • #573 Fix token_max_len bug that makes it always zero
  • #583 Add pythainlp.word_vector.WordVector

Spell

  • #591 Add more spelling engines
  • #599 Add tltk (spell): a tltk wrapper for PyThaiNLP functions such as ner, word_tokenize, and more.

Generate

  • #579 Add pythainlp.generate

Tool

  • #614 Add misspell module

Other

  • #599 Add tltk: a tltk wrapper for PyThaiNLP functions such as ner, word_tokenize, and more.
  • e357cf8 Update requirements from ssg 0.0.6 to ssg 0.0.8
  • #631 Spoonerism: add support for words with more than 3 syllables
  • #623 Add maiyamok, a function for preprocessing Mai Yamok (ๆ) in Thai sentences
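Mai Yamok (ๆ) marks that the preceding word is repeated, as in เด็กๆ ("kids"). The sketch below shows one way such preprocessing can work on an already tokenized sentence, by writing the repeated word out explicitly. `expand_maiyamok` is a hypothetical stand-alone helper; it does not reproduce the signature or exact behaviour of PyThaiNLP's maiyamok function.

```python
from typing import List, Optional

MAI_YAMOK = "\u0e46"  # ๆ


def expand_maiyamok(tokens: List[str]) -> List[str]:
    """Replace each Mai Yamok with an explicit copy of the word it repeats."""
    out: List[str] = []
    last_word: Optional[str] = None
    for tok in tokens:
        text = tok.strip()
        if not text:
            continue  # whitespace tokens are dropped in this simplified sketch
        if text.endswith(MAI_YAMOK):
            base = text.rstrip(MAI_YAMOK).rstrip()
            if base:
                # Mark attached to the word itself, e.g. "เด็กๆ"
                out.extend([base, base])
                last_word = base
            elif last_word is not None:
                # Stand-alone mark: repeat the previous word
                out.append(last_word)
            continue
        out.append(text)
        last_word = text
    return out


print(expand_maiyamok(["เด็ก", "ๆ", "ชอบ", "ไป", "โรงเรียน"]))
# ['เด็ก', 'เด็ก', 'ชอบ', 'ไป', 'โรงเรียน']
```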