dotnetthailand/ThaiRomanizationSharp

ThaiRomanizationSharp

This .NET project contains several implementations of Thai text romanization.

For example, สวัสดี is "romanized" to sawatdi.

There are currently two romanization algorithms:

  1. The Thai Language Toolkit (TLTK) algorithm, originally from the TLTK Python project.
    • This implementation invokes Python code and requires Python to be installed.
  2. The Thai2Rom algorithm, originally from the PyThaiNLP project.
    • This implementation runs native code using the Torch machine learning framework, which is available for macOS, Windows, and Linux. The native code for these three platforms is bundled in the project, so no installation is required.
    • It currently does not do word separation; the romanized output follows the same spacing as the input Thai text.
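
Because the Thai2Rom implementation does not separate words, a caller who wants space-separated output has to supply the spaces in the input. The following is a minimal sketch, assuming the words have already been segmented by some other tool. `RomanizeSegmented` is a hypothetical helper, not part of this library, and the romanizer is passed in as a delegate so the sketch compiles without the package.

```csharp
using System;
using System.Collections.Generic;

public static class SegmentedRomanization
{
    // Hypothetical helper (not part of ThaiRomanizationSharp).
    // Joins pre-segmented Thai words with single spaces, then romanizes the
    // joined string once, relying on the romanizer to keep the input spacing.
    public static string RomanizeSegmented(IEnumerable<string> words, Func<string, string> romanize)
    {
        if (words == null) throw new ArgumentNullException(nameof(words));
        if (romanize == null) throw new ArgumentNullException(nameof(romanize));

        return romanize(string.Join(" ", words));
    }
}
```

Assuming `Romanize` has the `string -> string` shape shown in the Usage section, a call could look like `SegmentedRomanization.RomanizeSegmented(words, new Thai2RomService().Romanize)`.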

Usage

// Option 1: the TLTK implementation (requires Python to be installed)
using ThaiRomanizationSharp.Tltk;
IThaiRomanizationService romanizer = new ThaiRomanizationService();
string english = romanizer.Romanize("สวัสดี");

// Option 2: the Thai2Rom implementation (bundled native code)
using ThaiRomanizationSharp.Thai2Rom;
IThaiRomanizationService romanizer = new Thai2RomService();
string english = romanizer.Romanize("สวัสดี");
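
Both services above are called through a `Romanize(string)` method, so code that processes many strings can stay independent of which implementation is chosen. The following is a minimal sketch of that idea. `RomanizeLines` is a hypothetical helper, not part of this library, and it assumes only that the romanizer maps one string to another.

```csharp
using System;
using System.Collections.Generic;

public static class RomanizationBatch
{
    // Hypothetical helper (not part of ThaiRomanizationSharp).
    // Applies the supplied romanizer to each line. Null, empty, or
    // whitespace-only lines are passed through without calling the romanizer.
    public static List<string> RomanizeLines(IEnumerable<string> lines, Func<string, string> romanize)
    {
        if (lines == null) throw new ArgumentNullException(nameof(lines));
        if (romanize == null) throw new ArgumentNullException(nameof(romanize));

        var results = new List<string>();
        foreach (var line in lines)
        {
            results.Add(string.IsNullOrWhiteSpace(line) ? line : romanize(line));
        }
        return results;
    }
}
```

Either service could then be supplied as a method group, for example `RomanizationBatch.RomanizeLines(lines, romanizer.Romanize)`, which makes it easy to swap implementations or to substitute a stub in unit tests.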

Credits for Thai Language Toolkit

Credits for Thai2Rom project

The C# code of the Thai2Rom algorithm is based on the Python code from the PyThaiNLP project.

How to run the project locally

  1. To run the ThaiRomanizationSharp.Thai2Rom library, either reference it from your project, or run the unit tests as normal via the dotnet command line, Visual Studio Code, or Visual Studio.
    • See the README.md in the ThaiRomanizationSharp.Thai2Rom subdirectory for more information.
  2. To run the ThaiRomanizationSharp.Tltk library (the Thai Language Toolkit project), you need to complete some setup steps first. The rest of the README is devoted to these steps.
    • See the README.md in the ThaiRomanizationSharp.Tltk subdirectory for more information.

Run the project

  • In VS Code, open the integrated terminal by pressing Ctrl+`.
  • The terminal should start in the root of the project.
  • Run the project with the following command:
    $ dotnet run
  • After a short wait, an output message should appear in the integrated terminal.

Reference & useful resources

Todo

  • Add more details about what code changed in nlp.py
  • Convert project to a class library
  • Unit tests with xUnit
  • GitHub Actions to run the unit tests
  • GitHub Actions to deploy the library to NuGet and the release page
  • Custom Docker image
  • Deploy example project to Azure App Service container

Removed functions in nlp.py

read_thaidict
reset_thaidict
check_thaidict
edits2
pos_tag
pos_tag_wordlist
pos_load
change_tag
chunk
ner
ner_load
wrd_len
g2p_all
sylparse_all
th2ipa
word_segmentX
wordseg_w2v
word_segment_nbest
wordsegmm_bn
chartparse_mm_bn
word_segment_mm
wordseg_mm
