Commit 39918ee

Update README with more information
1 parent 4976852 commit 39918ee

3 files changed: 34217 additions, 4 deletions

.gitignore

Lines changed: 0 additions & 1 deletion
@@ -1,5 +1,4 @@
 bin/
-WSJ_24.pos
 .vscode
 .settings
 .classpath

README.md

Lines changed: 18 additions & 3 deletions
@@ -1,6 +1,10 @@
-# Viterbi POS Tagger
+# HMM POS Tagger with Viterbi Decoding
 
-Tested using Java 11.
+## To Run
+
+This was primarily tested on Java 11.
+
+### Compile
 
 To run from the command line, make sure you are in the root directory `viterbi`. This is the directory that contains `src` and `WSJ_POS_CORPUS_FOR_STUDENTS`.
 Run
@@ -9,11 +13,22 @@ Run
 javac src/viterbi/*.java
 ```
 
+### Run
+
 Then to train and evaluate
 
 ```bash
 java -cp src viterbi.WSJPOSTagger WSJ_POS_CORPUS_FOR_STUDENTS/WSJ_02-21.pos TEST_FILE MAX_SUFFIX_LENGTH MAX_WORD_FREQUENCY
 ```
 
 where `TEST_FILE` is the file with sentences that you want to tag, `MAX_SUFFIX_LENGTH` is the maximum suffix length to use for the suffix tree and
-`MAX_WORD_FREQUENCY` is the maximum word frequency as found in the training set of the words to use for the suffix tree.
+`MAX_WORD_FREQUENCY` is the maximum word frequency as found in the training set of the words to use for the suffix tree.
+
+## Implementation Details
+
+This is a Hidden Markov Model part of speech tagger that uses the Viterbi algorithm for decoding.
+The model is trained on the Wall Street Journal POS corpus and attempts to handle unknown words by performing suffix analysis using suffix trees as described by [(Brants, 2000)](#brants).
+
+## References
+
+<a id="brants"></a> Brants, T. (2000). TnT: A statistical part-of-speech tagger. In *ANLP 2000*, Seattle, WA, pp. 224–231.
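
The new "Implementation Details" section names Viterbi decoding without showing it. As a rough illustration only, here is a minimal Java sketch of Viterbi decoding for a first-order (bigram) HMM in log space. It is not the code from this repository: the class name `ViterbiSketch`, the method `decode`, and the array-based parameters are all hypothetical, and the actual tagger may use a different model order, smoothing, or data structures.

```java
/** Hypothetical sketch: Viterbi decoding for a first-order HMM in log space. */
final class ViterbiSketch {

    /**
     * @param obs      observation indices (for example word ids), one per token
     * @param logStart logStart[s]    = log P(tag s at the first position)
     * @param logTrans logTrans[p][s] = log P(tag s | previous tag p)
     * @param logEmit  logEmit[s][o]  = log P(observation o | tag s)
     * @return the most probable tag index for each token
     */
    static int[] decode(int[] obs, double[] logStart,
                        double[][] logTrans, double[][] logEmit) {
        int n = obs.length;
        int k = logStart.length;
        if (n == 0) {
            return new int[0];
        }

        double[][] score = new double[n][k]; // best log score of a path ending in tag s at t
        int[][] back = new int[n][k];        // previous tag on that best path

        // Initialisation: start probability plus first emission.
        for (int s = 0; s < k; s++) {
            score[0][s] = logStart[s] + logEmit[s][obs[0]];
        }

        // Recursion: extend the best path into each tag at each position.
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                int bestPrev = 0;
                double best = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double candidate = score[t - 1][p] + logTrans[p][s];
                    if (candidate > best) {
                        best = candidate;
                        bestPrev = p;
                    }
                }
                score[t][s] = best + logEmit[s][obs[t]];
                back[t][s] = bestPrev;
            }
        }

        // Termination: pick the best final tag.
        int last = 0;
        for (int s = 1; s < k; s++) {
            if (score[n - 1][s] > score[n - 1][last]) {
                last = s;
            }
        }

        // Backtrace: follow the back-pointers to recover the full path.
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```

In a tagger, the observation and tag indices would stand in for words and POS tags. Emission scores for unknown words (for example, from the suffix analysis the README mentions) would need to be filled in before calling `decode`.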
