Get the conditional probability of next word for a given Context using Ngram Model

rikenshah/predict-next-word

Predict Next Word

This is an N-gram language model that predicts the next word from precalculated conditional probabilities. The model is trained on bigrams from the COCA corpus, which can be downloaded from this link.
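To make the idea concrete, here is a minimal sketch of how a bigram conditional probability, P(next | context) = count(context, next) / count(context, any word), can be computed and queried. It is not the repository's code: the function names `conditional_probabilities` and `top_next_words` and the toy counts are assumptions made for illustration.

```python
from collections import defaultdict


def conditional_probabilities(bigram_counts):
    """Map each context word to {next_word: P(next_word | context)}.

    bigram_counts is a dict of (word1, word2) -> frequency.
    """
    # Total frequency of every bigram that starts with a given context word.
    context_totals = defaultdict(int)
    for (w1, _w2), count in bigram_counts.items():
        context_totals[w1] += count

    # Divide each bigram count by the total for its context word.
    model = defaultdict(dict)
    for (w1, w2), count in bigram_counts.items():
        model[w1][w2] = count / context_totals[w1]
    return dict(model)


def top_next_words(model, context, k=3):
    """Return up to k (next_word, probability) pairs, most likely first."""
    candidates = model.get(context.lower(), {})
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy counts, invented for this example (not taken from COCA).
toy_counts = {
    ("new", "york"): 6,
    ("new", "year"): 3,
    ("new", "car"): 1,
    ("the", "car"): 5,
}
model = conditional_probabilities(toy_counts)
print(top_next_words(model, "new", k=2))  # [('york', 0.6), ('year', 0.3)]
```

An unseen context simply returns an empty list here; a real model would probably want smoothing or a fallback, which this sketch leaves out.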

Steps to run this script

  1. Download the bigram dataset from the link above. You will need to register with an email address to do so. There are three files for the bigram model:
  • Case-insensitive - w2_.zip
  • Case-sensitive - w2.zip
  • Case-sensitive with POS tagging - w2c.zip

Only the case-insensitive zip is needed.

  2. Extract the case-insensitive zip to get the file w2_.txt. Put this file in the same folder as the scripts 'getTopBigram.py' and 'bigramConditionalProbability.py'.

  3. Make a new directory named pickleDumps.

  4. Run 'bigramConditionalProbability.py' using python bigramConditionalProbability.py. The pickle module it relies on is part of the Python standard library, so nothing needs to be installed with pip (running the scripts inside a virtual environment is still recommended). This command creates pickle dumps in the pickleDumps directory.

  5. Run python getTopBigram.py and enjoy :p.
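The steps above amount to reading bigram counts from w2_.txt, storing a processed form as pickle files, and loading them again for lookup. The sketch below illustrates that round trip using only the standard library. It is an assumption-laden outline, not the contents of the two scripts: the tab-separated `frequency, word1, word2` line layout, the helper names, and the file name `bigram_counts.pkl` are all hypothetical.

```python
import pickle
import tempfile
from collections import defaultdict
from pathlib import Path


def read_bigram_counts(lines):
    """Parse lines assumed to look like 'frequency<TAB>word1<TAB>word2'."""
    counts = defaultdict(int)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        try:
            freq = int(parts[0])
        except ValueError:
            continue  # skip lines whose first column is not a number
        counts[(parts[1].lower(), parts[2].lower())] += freq
    return dict(counts)


def dump_pickle(obj, path):
    """Write obj to path, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Read back an object written by dump_pickle."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Demo with in-memory sample lines and a temporary directory, so no real
# dataset or pickleDumps folder is touched.
sample_lines = ["6\tnew\tyork\n", "not a bigram line\n", "4\tNew\tYork\n"]
with tempfile.TemporaryDirectory() as tmp:
    dump_path = Path(tmp) / "pickleDumps" / "bigram_counts.pkl"  # hypothetical name
    dump_pickle(read_bigram_counts(sample_lines), dump_path)
    restored = load_pickle(dump_path)
print(restored)  # {('new', 'york'): 10}
```

With the real dataset, the sample lines would be replaced by an open file handle on w2_.txt, for example `open("w2_.txt", encoding="utf-8", errors="replace")`; the file's actual encoding is not stated in this README, so that argument is a guess.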

Screenshot of Output

output
