An introductory tutorial on how to do Natural Language Processing using NLTK (Natural Language Toolkit) in Python.
intro2NLP.ipynb: a Jupyter notebook that shows how to access, clean, and analyze a corpus using the nltk library.
After loading Jane Austen's Sense and Sensibility from the nltk.corpus package, I preprocess the text (for example, by removing stopwords and punctuation), plot the word-frequency distribution, and apply sentiment analysis using the textblob library. To show how to do sentiment analysis with a classifier, I train a Naive Bayes classifier on the movie_reviews dataset available in nltk.corpus.
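
The steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the small STOPWORDS set, the helper names clean_tokens and word_features, and the toy sentences are all assumptions chosen so the example runs without downloading any NLTK data. The notebook itself works on the real corpora from nltk.corpus, which need nltk.download() first.

```python
import string

from nltk import FreqDist, NaiveBayesClassifier

# Tiny illustrative stopword set (an assumption for this sketch); the full
# English list lives in nltk.corpus.stopwords and must be downloaded first.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "it"}


def clean_tokens(tokens):
    """Lowercase tokens, then drop stopwords and punctuation-only tokens."""
    cleaned = []
    for tok in tokens:
        tok = tok.lower()
        if tok in STOPWORDS:
            continue
        if all(ch in string.punctuation for ch in tok):
            continue
        cleaned.append(tok)
    return cleaned


def word_features(tokens):
    """Bag-of-words featureset in the dict form NLTK classifiers accept."""
    return {tok: True for tok in clean_tokens(tokens)}


# 1) Word-frequency distribution on a toy token list (stand-in for a corpus).
sample_tokens = ["The", "house", "was", "large", ",", "and",
                 "the", "garden", "was", "large", "."]
cleaned = clean_tokens(sample_tokens)
freq = FreqDist(cleaned)          # FreqDist behaves like collections.Counter
top_word, top_count = freq.most_common(1)[0]

# 2) Naive Bayes sentiment classifier on toy labelled reviews
#    (stand-in for the movie_reviews corpus).
train = [
    ("a wonderful and moving film".split(), "pos"),
    ("great acting and a wonderful story".split(), "pos"),
    ("a dull and boring film".split(), "neg"),
    ("boring story and terrible acting".split(), "neg"),
]
train_set = [(word_features(tokens), label) for tokens, label in train]
classifier = NaiveBayesClassifier.train(train_set)
prediction = classifier.classify(word_features("a wonderful story".split()))

print(cleaned)
print(top_word, top_count)
print(prediction)
```

With the real data, the toy lists would be replaced by tokens from the Gutenberg and movie_reviews corpora, and the classifier would be evaluated on a held-out test split rather than a single sentence.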