
little-pos-tagger

Simple part-of-speech (POS) tagger in Java, ported from a Python version explained and linked here.

I wrote this while trying to get a better understanding of POS taggers, and to build one I could use myself with some Java code I have.

There are also several more mature POS taggers for Java, such as Apache OpenNLP and the Stanford POS tagger.

This one is simpler, and the Python link above has useful explanations you can follow, so it may be easier for getting an idea of at least the basic concepts.
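The Python version referenced above is not reproduced here. As a rough illustration of the basic concepts, the following is a minimal sketch of one common approach for simple taggers: a greedy left-to-right perceptron that scores each candidate tag from a few sparse features and adjusts weights whenever it guesses wrong. The class name, the feature set (word, three-letter suffix, previous tag) and the training loop are assumptions made for illustration; this is not this project's code or API.

```java
import java.util.*;

/** Minimal greedy perceptron POS tagger sketch (illustrative only, not this project's API). */
public class PerceptronTaggerSketch {
    // weights.get(feature).get(tag) = how much that feature votes for that tag
    private final Map<String, Map<String, Double>> weights = new HashMap<>();
    // TreeSet gives a fixed iteration order, so ties are broken deterministically
    private final Set<String> tags = new TreeSet<>();

    // Hypothetical feature set: bias, lowercased word, 3-letter suffix, previous tag
    private static List<String> features(String[] words, int i, String prevTag) {
        String w = words[i].toLowerCase();
        String suffix = w.length() > 3 ? w.substring(w.length() - 3) : w;
        return List.of("bias", "word=" + w, "suffix=" + suffix, "prev=" + prevTag);
    }

    // Pick the tag with the highest summed feature weight
    private String predict(List<String> feats) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String tag : tags) {
            double score = 0.0;
            for (String f : feats) {
                score += weights.getOrDefault(f, Map.of()).getOrDefault(tag, 0.0);
            }
            if (score > bestScore) { bestScore = score; best = tag; }
        }
        return best;
    }

    // Perceptron update: reward the correct tag, penalize the wrong guess
    private void update(List<String> feats, String truth, String guess) {
        if (truth.equals(guess)) return;
        for (String f : feats) {
            Map<String, Double> w = weights.computeIfAbsent(f, k -> new HashMap<>());
            w.merge(truth, 1.0, Double::sum);
            w.merge(guess, -1.0, Double::sum);
        }
    }

    public void train(List<String[]> sentences, List<String[]> goldTags, int epochs) {
        for (String[] t : goldTags) tags.addAll(Arrays.asList(t));
        for (int e = 0; e < epochs; e++) {
            for (int s = 0; s < sentences.size(); s++) {
                String[] words = sentences.get(s);
                String[] gold = goldTags.get(s);
                String prev = "<START>";
                for (int i = 0; i < words.length; i++) {
                    List<String> feats = features(words, i, prev);
                    String guess = predict(feats);
                    update(feats, gold[i], guess);
                    prev = guess; // greedy: condition on our own previous prediction
                }
            }
        }
    }

    public String[] tag(String[] words) {
        String[] out = new String[words.length];
        String prev = "<START>";
        for (int i = 0; i < words.length; i++) {
            out[i] = predict(features(words, i, prev));
            prev = out[i];
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> sents = List.of(
                new String[]{"the", "dog", "runs"},
                new String[]{"a", "cat", "sleeps"});
        List<String[]> gold = List.of(
                new String[]{"DET", "NOUN", "VERB"},
                new String[]{"DET", "NOUN", "VERB"});
        PerceptronTaggerSketch tagger = new PerceptronTaggerSketch();
        tagger.train(sents, gold, 5);
        System.out.println(String.join(" ", tagger.tag(new String[]{"the", "cat", "runs"})));
    }
}
```

With a toy corpus like this the output is not meaningful; real training needs a treebank-sized dataset. Fuller implementations typically also average the weights over all updates and shuffle sentences between epochs, which this sketch leaves out for brevity.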

I originally tried this with the Finnish language and used the FinnTreeBank data to train the tagger. However, any language with a similar dataset should probably work.

There is a Python script in the source tree that was used to convert the FinnTreeBank into the input format this tagger expects. Another Python script there takes the same data and outputs a format suitable for OpenNLP, so you can try the different approaches if you like, and perhaps use the scripts as a basis for handling other treebanks. I couldn't quite figure out a good configuration for the Stanford tagger, but it should be able to take one of the above inputs as well if you can create the config.
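The conversion scripts themselves are not shown here. As a rough illustration of this kind of conversion (in Java rather than Python, to match the rest of the project), the following minimal sketch reads CoNLL-style data, one token per line with tab-separated columns and a blank line between sentences, and writes OpenNLP's POS training format: one sentence per line, each token written as `word_TAG`. The input layout, and in particular taking the word form from the second column and the tag from the fourth (as in CoNLL-X), is an assumption; the class name is hypothetical, and the column indices should be checked against the actual treebank export.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

/** Sketch: convert CoNLL-style token-per-line data to OpenNLP "word_TAG" sentences. */
public class ConllToOpenNlp {
    // Assumed 0-based column positions (CoNLL-X style): FORM = 1, POS tag = 3.
    static final int FORM_COL = 1;
    static final int TAG_COL = 3;

    static List<String> convert(BufferedReader in) throws IOException {
        List<String> out = new ArrayList<>();
        List<String> sentence = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {                // blank line ends a sentence
                if (!sentence.isEmpty()) out.add(String.join(" ", sentence));
                sentence.clear();
                continue;
            }
            if (line.startsWith("#")) continue;  // skip comment lines, if present
            String[] cols = line.split("\t");
            if (cols.length <= TAG_COL) continue; // skip lines without enough columns
            sentence.add(cols[FORM_COL] + "_" + cols[TAG_COL]);
        }
        if (!sentence.isEmpty()) out.add(String.join(" ", sentence)); // final sentence
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Reads the treebank from stdin and prints OpenNLP training lines to stdout
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            for (String s : convert(in)) System.out.println(s);
        }
    }
}
```

Producing this tagger's own input format would follow the same pattern, with only the output line construction changed.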

There are examples in the examples package showing how to train the tagger and how to use it for predictions.

The process of building this and trying to figure out what it is all about is explained in more detail here.