Skip to content
forked from koskenni/pytwolc

Experimental two-level rule compilation using Python HFST

Notifications You must be signed in to change notification settings

keeleleek/pytwolc

This branch is 72 commits behind koskenni/pytwolc:master.

Folders and files

NameName
Last commit message
Last commit date

Latest commit

64c8204 · Dec 6, 2018
Dec 4, 2018
Nov 11, 2018
Dec 2, 2018
Dec 6, 2018
Mar 26, 2018
Dec 3, 2018
Nov 25, 2018
Jan 20, 2018
Nov 30, 2018
May 9, 2018
Dec 2, 2018
Mar 29, 2018
Dec 2, 2018
Dec 2, 2018
Nov 9, 2018
Nov 24, 2018
Nov 6, 2018
Mar 21, 2018
Dec 6, 2018
Apr 20, 2018
Nov 25, 2018
Mar 21, 2018
Dec 6, 2018
Dec 5, 2018
Apr 3, 2018
Dec 2, 2018
Dec 2, 2018
Dec 5, 2018
Nov 10, 2018

Repository files navigation

pytwolc

Experimental two-level rule compilation using Python HFST. For more information, see https://github.com/hfst/python

Rule compiler: twol.py

The Python program twol.py is a rule compiler and tester for rules of simplified two-level model, see https://pytwolc.readthedocs.io/en/latest/formalism.html for more information on the rule formalism and the compiler. The HST package can be loaded using the command:

$ python3 -m pip install hfst

The program twol.py uses and depend on the 'tatsu' Python parser generator by Juancarlo Añez, seee http://tatsu.readthedocs.io/en/stable/index.html for detailed documentation. You can load and install TaTsu from the net using a command:

$ python3 -m pip install tatsu

The compiler needs two files: (1) examples as a FST and (2) a rule file. The human readable examples must be converted into a FST using twexamp.py program.

The compiler is normally executed as follows:

$ python3 twol.py examples.fst rules.twolc

One can get more information by using the --help parameter. More documentation on twol.py can be found at https://pytwolc.readthedocs.io/en/latest/compiletest.html

Converting examples from pair string format into a FST: twexamp.py

The module twexamp.py handles various tasks for the compiler during the compilation process. It is also needed for converting human readable examples into a FST so that ti is not necessary recompile it at every step of testing rules. A recompilation is only needed when the examples are changed. In order to convert examples from a pair string format into a fst you can e.g.:

$ python3 twexamp.py examples.pstr examples.fst

Morphophonemic representations

The sequence of programs parad2words.py, words2zerofilled.py, zerofilled2raw.py and raw2named.py is intended for determining the underlying or morphophonemic representations for word stems. It starts from a table of word forms or paradigms where morphs are separated from each other e.g. by a period (.). See https://pytwolc.readthedocs.io/en/latest/morphophon.html for more information on their use. Each program is run from the command line, and one can get detailed information on the parameters by running the command with a --help argument, e.g.

$ python3 words2zerofilled.py --help

Some of the programs of this sequence need the package orderedset which one can get from the net by

$ python3 -m pip install orderedset

Discovering raw rules: twdiscov.py

This program builds tentative or raw rules out of a set of examples. The examples must be given one example per line as a space-separated list of symbol pairs. See https://pytwolc.readthedocs.io/en/latest/twdiscov.html for more information.

About

Experimental two-level rule compilation using Python HFST

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 96.8%
  • Makefile 3.2%