As part of an independent research project in natural language processing, I implemented a modified, interpolated Kneser-Ney smoothing algorithm. None of the implementations I found online met my exact needs, so I wrote my own.
- It includes a correction for out-of-vocabulary words, which is necessary for assigning probabilities to unseen n-grams (see the first sketch after the usage example below)
- It estimates discount values from the training data rather than fixing them at the typical 0.75 (see the second sketch below)
- It is super easy to use:
# let corpus be a large string of training data
# and sentence be a string you wish to score
kn = ModifiedKneserNey()
kn.train(corpus)
log_prob = kn.log_score_per_ngram(sentence)
# Done! :)
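The out-of-vocabulary correction is described above only at a high level, so here is a minimal sketch of one standard treatment, assuming a simple count threshold and an `<unk>` placeholder; the helper names `build_vocab` and `map_oov` are illustrative, and this library's actual correction may differ.

from collections import Counter

UNK = "<unk>"  # placeholder token; name chosen for this sketch

def build_vocab(tokens, min_count=2):
    # Hypothetical helper: words seen fewer than min_count times in
    # training are treated as out-of-vocabulary.
    freq = Counter(tokens)
    return {word for word, count in freq.items() if count >= min_count}

def map_oov(tokens, vocab):
    # Hypothetical helper: OOV tokens become <unk>, so unseen words at
    # scoring time share the probability mass the model learned for <unk>.
    return [t if t in vocab else UNK for t in tokens]

train_tokens = "the cat sat on the mat the cat ran".split()
vocab = build_vocab(train_tokens)             # {'the', 'cat'}
print(map_oov("the dog sat".split(), vocab))  # ['the', '<unk>', '<unk>']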
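The estimated discounts are presumably in the spirit of the count-of-counts formulas from Chen and Goodman (1999), cited in the references below; this sketch shows those formulas directly, and the function name `estimate_discounts` is illustrative rather than this library's API.

from collections import Counter

def estimate_discounts(ngram_counts):
    # n_k = number of distinct n-grams that occur exactly k times.
    counts_of_counts = Counter(ngram_counts.values())
    n1, n2 = counts_of_counts[1], counts_of_counts[2]
    n3, n4 = counts_of_counts[3], counts_of_counts[4]

    # Chen and Goodman (1999): Y = n1 / (n1 + 2*n2); D1, D2, D3+ are the
    # discounts for n-grams seen once, twice, and three or more times.
    # Assumes n1..n4 > 0, which holds for any reasonably large corpus.
    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * (n2 / n1)
    d2 = 2 - 3 * y * (n3 / n2)
    d3plus = 3 - 4 * y * (n4 / n3)
    return d1, d2, d3plus

# Synthetic count table: only the count values matter here.
fake_counts = dict(enumerate([1] * 5000 + [2] * 1200 + [3] * 500 + [4] * 250))
print(estimate_discounts(fake_counts))  # roughly (0.68, 1.16, 1.65)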
- Python 3, with the following packages:
  - nltk
  - numpy
- Chen, Stanley F. and Joshua Goodman (1999), "An empirical study of smoothing techniques for language modeling," Computer Speech and Language, vol. 13, no. 4, pp. 359-394.
- Taraba, P. (2007), "Kneser-Ney Smoothing With a Correcting Transformation for Small Data Sets," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 6, pp. 1912-1921.
- Heafield, Kenneth, Ivan Pouzyrevsky, Jonathan H. Clark, and Philipp Koehn (2013), "Scalable Modified Kneser-Ney Language Model Estimation," Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol. 2, pp. 690-696.
- Kneser, Reinhard and Hermann Ney (1995), "Improved backing-off for M-gram language modeling," Proceedings of ICASSP.
- Jurafsky, Daniel and James H. Martin (2017), "Speech and Language Processing," third edition draft.