This allows training and predicting morpheme segmentations for the 2022 SIGMORPHON shared task using either an LSTM or a Transformer architecture, combined with either character-level tokenization or subword tokenization (via SentencePiece).
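For reference, the two tokenization modes differ only in how each word is split before training. Below is a rough sketch of the distinction, assuming the standard SentencePiece command-line tools (`spm_train`, `spm_encode`) are installed; `train_words.txt` and the model prefix are placeholders, and the script's internal preprocessing may differ.

```bash
# Character-level: insert a space after every character
# (assumes a UTF-8 locale so accented letters stay intact).
echo "kutyáknak" | sed 's/./& /g'
# -> k u t y á k n a k

# Subword-level: train a SentencePiece model on the training words,
# then segment with it. The vocab size mirrors the optional
# fourth argument to run.sh (default 6000).
spm_train --input=train_words.txt --model_prefix=subwords --vocab_size=6000
echo "kutyáknak" | spm_encode --model=subwords.model
```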
To train, segment, and evaluate, the command arguments are:
./run.sh <language code> <architecture> <tokenization> <OPTIONAL: subword vocab size>
Examples:
./run.sh hun lstm chars
./run.sh hun transformer subwords
# OR
./run.sh hun lstm subwords
./run.sh hun transformer chars
# OR to change the subword vocab target size (only used with subwords):
./run.sh hun lstm subwords 200 # default is 6000
Language options include:
Language | Language code |
---|---|
English | eng |
French | fra |
Hungarian | hun |
Italian | ita |
Latin | lat |
Mongolian | mon |
Russian | rus |
Spanish | spa |
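To run one configuration across every supported language, a simple shell loop over the codes above works; this is just a convenience sketch, not a script shipped with the repository:

```bash
for lang in eng fra hun ita lat mon rus spa; do
  ./run.sh "$lang" transformer chars
done
```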
Output will look something like:
category: all
distance 0.34
f_measure 95.44
precision 95.05
recall 95.83
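Here `distance` is an edit-distance score (lower is better), while `f_measure` is the harmonic mean of `precision` and `recall` (higher is better). A quick check that the numbers above are consistent:

```bash
# f_measure = 2 * precision * recall / (precision + recall)
awk 'BEGIN { p = 95.05; r = 95.83; printf "%.2f\n", 2 * p * r / (p + r) }'
# -> 95.44
```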