You can find the dataset here.
The goal is to classify tweets made by travelers in February 2015 as Neutral, Positive, or Negative.
I used a Random Forest classifier because the problem involved a relatively large dataset. Random Forests also tend to perform well when dealing with a large number of features.
However, is there a way to get better results?
I decided to use an ANN with:
- 2 hidden layers (adding a third did not have a significant effect in this case)
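As a rough sketch, such an architecture could look like the following in Keras; the layer sizes and the 1500-feature bag-of-words input dimension are assumptions for illustration, not the exact values used.

```python
# Minimal sketch of a 2-hidden-layer ANN for 3-class sentiment classification.
# NOTE: layer sizes and input_dim=1500 are illustrative assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(128, activation='relu', input_dim=1500))  # hidden layer 1
model.add(Dense(128, activation='relu'))                  # hidden layer 2
model.add(Dense(3, activation='softmax'))                 # Neutral / Positive / Negative
```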
Overall, the ANN resulted in better classifications with an accuracy ranging between
Finally, the results were not bad for the given dataset, which contained many ambiguous or abbreviated tweets that would be difficult for a machine to interpret.
The steps taken for the Random Forest classifier were as follows (a code sketch follows the list):
- Get the Dataset
- Pre-process the text
- Create the Bag of Words Model
- Label Encode and OneHot Encode the Dependent Variable
- Split the data into Test and Training sets
- Train the Random Forest Classifier
- Get the predicted values for the test set
- Compare the predicted and actual values and use a confusion matrix to calculate the accuracy of the model
- Accuracy = (number of correct predictions on the test data) / (total number of test samples)
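A minimal sketch of this pipeline in Python is shown below. It is not the exact code used: the file name 'Tweets.csv', the column names 'text' and 'airline_sentiment', and the hyperparameters (1500 bag-of-words features, 100 trees, 80/20 split) are assumptions, and the pre-processing is a standard stopword-removal-plus-stemming pass with NLTK.

```python
# Sketch of the Random Forest pipeline described above (assumed file/column names).
import re
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

nltk.download('stopwords')

dataset = pd.read_csv('Tweets.csv')                        # Get the dataset

# Pre-process the text: keep letters only, lowercase, remove stopwords, stem
stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))
corpus = []
for tweet in dataset['text']:
    words = re.sub('[^a-zA-Z]', ' ', tweet).lower().split()
    words = [stemmer.stem(w) for w in words if w not in stop_words]
    corpus.append(' '.join(words))

# Create the Bag of Words model (cap the vocabulary size)
cv = CountVectorizer(max_features=1500)
X = cv.fit_transform(corpus).toarray()

# Label-encode the target (scikit-learn's Random Forest does not need the one-hot step)
le = LabelEncoder()
y = le.fit_transform(dataset['airline_sentiment'])

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train the Random Forest classifier
classifier = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)

# Predict on the test set and evaluate with a confusion matrix
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)                  # correct predictions / total test samples
print(cm)
print('Accuracy:', accuracy)
```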
The steps taken for the ANN were as follows (a code sketch follows the list):
- Get the Dataset
- Pre-process the text
- Create the Bag of Words Model
- Label Encode and OneHot Encode the Dependent Variable
- Split the data into Test and Training sets
- Add Layers to your ANN
- Compile the ANN
- Get the predicted values for the test set
- Compare the predicted and actual values and use a confusion matrix to calculate the accuracy of the model
- Accuracy = (number of correct predictions on the test data) / (total number of test samples)
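A corresponding sketch for the ANN is below. As with the Random Forest sketch, the file and column names, layer sizes, optimizer, batch size, and epoch count are illustrative assumptions; for brevity it vectorises the raw text directly, whereas in practice the same pre-processing step as above would be applied first.

```python
# Sketch of the ANN pipeline described above (assumed file/column names and hyperparameters).
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix, accuracy_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

dataset = pd.read_csv('Tweets.csv')

# Bag of Words model (apply the same text pre-processing as in the Random Forest sketch first)
cv = CountVectorizer(max_features=1500)
X = cv.fit_transform(dataset['text']).toarray()

# Label-encode and then one-hot encode the dependent variable
le = LabelEncoder()
y_int = le.fit_transform(dataset['airline_sentiment'])     # 0, 1, 2
y = to_categorical(y_int)                                   # one-hot vectors

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Add layers to the ANN: 2 hidden layers, softmax output over the 3 classes
model = Sequential()
model.add(Dense(128, activation='relu', input_dim=X.shape[1]))
model.add(Dense(128, activation='relu'))
model.add(Dense(3, activation='softmax'))

# Compile and train the ANN
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=10, verbose=1)

# Predict on the test set and evaluate with a confusion matrix
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)
cm = confusion_matrix(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)                   # correct predictions / total test samples
print(cm)
print('Accuracy:', accuracy)
```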