Natural Language Processing

Project: Cipher / Decipher Katakana and English

Following Kevin Knight's tradition, this project aims to understand and implement the following:

finite-state machines (weighted FSAs and FSTs)
syntactic structures (weighted context-free grammars and parsing algorithms)
machine learning methods (maximum likelihood and expectation-maximization)
modern quantitative techniques in NLP that use large corpora and statistical learning
various dynamic programming algorithms (Viterbi, CKY, Forward-Backward, and Inside-Outside)
Japanese as a running example to demonstrate linguistic diversity, to illustrate transliteration and translation, and to understand the Viterbi and EM algorithms
For the linguistic background of Japanese, please see this video.
The finite-state toolkit used is USC/ISI's CARMEL.
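
To make the Viterbi item in the list above concrete, here is a minimal sketch of Viterbi decoding for POS tagging with a hidden Markov model. It is not code from this repository: the function name `viterbi`, the three-tag tag set, and every probability in the toy model are made up for illustration. The project itself works with weighted FSTs in CARMEL rather than hand-written Python.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_state_path) for a non-empty observation list.

    All probability tables are plain dicts; missing entries count as 0.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p.get(s, 0)) + log(emit_p[s].get(observations[0], 0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r].get(s, 0))) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Toy model (hypothetical numbers, for illustration only).
STATES = ["DT", "NN", "VB"]
START_P = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
TRANS_P = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.2, "VB": 0.7},
    "VB": {"DT": 0.5, "NN": 0.3, "VB": 0.2},
}
EMIT_P = {
    "DT": {"the": 1.0},
    "NN": {"dog": 0.6, "barks": 0.4},
    "VB": {"dog": 0.2, "barks": 0.8},
}

if __name__ == "__main__":
    log_prob, tags = viterbi(["the", "dog", "barks"], STATES, START_P, TRANS_P, EMIT_P)
    print(tags, math.exp(log_prob))
```

Working in log space avoids numerical underflow on long sentences. The same recurrence applies to Katakana-to-English back-transliteration once the states and emissions are replaced by the corresponding sound and character models.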

0. n-gram language model
1. FSAs_FSTs_recovering spaces and vowels
2. English pronunciation, part-of-speech tagging as composition, Katakana-to-English (back)transliteration
3. Viterbi decoding for POS tagging and Katakana-to-English (back)transliteration
4. EM to learn Katanana-English correspondence; EM for decipherment
5. Syntax, CFG, and CKY parsing
6. Recurrent Neural Language Models_Beam Search
README.md
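
As a rough illustration of the topic of the first directory, the following sketch estimates a bigram language model by maximum likelihood with add-one smoothing. It is an assumption-laden toy, not the code in that directory: the function names, the `<s>` and `</s>` boundary markers, and the two-sentence corpus are all hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])       # every token that can be predicted
        contexts.update(tokens[:-1])   # every token that can act as a context
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


def sentence_logprob(sentence, contexts, bigrams, vocab):
    """Log-probability of a sentence, including the end-of-sentence marker."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log(bigram_prob(prev, word, contexts, bigrams, vocab))
        for prev, word in zip(tokens, tokens[1:])
    )


if __name__ == "__main__":
    model = train_bigram(["the dog barks", "the cat sleeps"])
    print(sentence_logprob("the dog barks", *model))
    print(sentence_logprob("barks dog the", *model))
```

Add-one smoothing is used here only because it is the simplest way to keep unseen bigrams from receiving zero probability; other smoothing schemes would slot into `bigram_prob` in the same way.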