Sentiment analysis is an NLP (Natural Language Processing) task: determining whether the sentiment of a text is positive or negative. In this case, we use tweets to determine whether the sentiment is positive, negative, neutral, or irrelevant.
I built this application using several tools, libraries, and frameworks. In particular, I used Google Colab as the notebook environment for building the model:
- Tensorflow
- pandas
- seaborn
- matplotlib
- sklearn
- numpy
- zipfile
- html
- FastAPI
- Go to the web directory in a terminal
- Run the application with this command: `uvicorn app:app`
- Open http://127.0.0.1:8000/ in your browser
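For orientation, here is a minimal hypothetical sketch of what an `app.py` served by that command could look like; the route handler below is a placeholder, not the project's actual code:

```python
from fastapi import FastAPI
from fastapi.responses import HTMLResponse

app = FastAPI()  # `app:app` in the uvicorn command points at this object

@app.get("/", response_class=HTMLResponse)
def index():
    # Placeholder page; the real app renders the sentiment analysis UI.
    return "<h1>Twitter Sentiment Analysis</h1>"
```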
The dataset can be downloaded from Kaggle - Twitter Sentiment Analysis
There are 2 CSV files in this zip file:
- twitter_training.csv: for training the model
- twitter_validation.csv: for validation data
In this notebook I only use twitter_training.csv
1. Show the first five records in the dataset
From that output we can see that the dataset has no column names, so we add a name for each column. Naming the columns helps us explore the dataset more easily.
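A sketch of that loading step, assuming the four column names referenced later in this README:

```python
import pandas as pd

# The CSV has no header row, so pass explicit column names.
column_names = ["tweet_id", "entity", "sentiment", "tweet_content"]
df = pd.read_csv("twitter_training.csv", header=None, names=column_names)

df.head()  # show the first five records
```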
2. Check the shape of the dataset: there are 74681 rows and 4 columns
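This can be confirmed with:

```python
df.shape  # (74681, 4)
```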
3. Check for missing values in the dataset
We found 686 missing values in the tweet_content column, so we need to handle them. In this case I removed those rows, leaving 73995 rows and 4 columns.
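A sketch of the check and removal with pandas:

```python
df.isnull().sum()   # shows 686 missing values in tweet_content

df = df.dropna()    # drop rows containing missing values
df.shape            # (73995, 4)
```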
4. Drop unnecessary columns
I dropped the tweet_id and entity columns because we do not need them.
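For example:

```python
df = df.drop(columns=["tweet_id", "entity"])
```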
5. Check the labels
I checked the label counts to guard against imbalanced data, and the classes appear balanced.
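For example, with value_counts():

```python
df["sentiment"].value_counts()  # roughly equal counts per class => balanced
```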
1. One Hot Encoding
The first preprocessing step is one-hot encoding the label data, because the labels are categorical rather than numerical. I used pd.get_dummies() to one-hot encode the label (sentiment) column.
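A sketch of that encoding step:

```python
# One column per sentiment class, with 0/1 indicators.
labels = pd.get_dummies(df["sentiment"])
labels.head()
```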
2. Change columns into numpy arrays
To process the dataset we need to convert each column into a numpy array, which makes tokenizing easier.
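A sketch of the conversion; the astype call is a small assumption added to keep the labels numeric:

```python
# Plain arrays are what the tokenizer and model work with.
tweets = df["tweet_content"].to_numpy()
labels = labels.to_numpy().astype("float32")  # get_dummies may return booleans
```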
3. Split data
Then I split the dataset into train_tweet, test_tweet, train_label, and test_label with a train size of 80%, a test size of 20%, and random_state = 42, and checked the resulting shapes.
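A sketch of the split with scikit-learn, using the parameters above:

```python
from sklearn.model_selection import train_test_split

train_tweet, test_tweet, train_label, test_label = train_test_split(
    tweets, labels, train_size=0.8, test_size=0.2, random_state=42
)
print(train_tweet.shape, test_tweet.shape, train_label.shape, test_label.shape)
```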
4. Tokenizer
After that I used a Tokenizer with num_words = 10000 to tokenize train_tweet, replacing out-of-vocabulary words with the `<OOV>` token. After fitting the tokenizer on train_tweet, I used tokenizer.texts_to_sequences() to convert the texts in train_tweet and test_tweet into sequences.
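A sketch of the tokenization step; the exact `<OOV>` token string is an assumption based on the usual Keras convention:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")
tokenizer.fit_on_texts(train_tweet)

# Convert each tweet into a sequence of integer word indices.
train_sequences = tokenizer.texts_to_sequences(train_tweet)
test_sequences = tokenizer.texts_to_sequences(test_tweet)
```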
5. Add padding
To handle the varying length of the sequences, I applied pad_sequences to train_sequences and test_sequences with maxlen = 150, padding = 'post' so the padding values are appended at the end of each sequence, and truncating = 'post' so longer sequences are cropped from the end.
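A sketch of the padding step with those parameters:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Pad (and truncate) at the end of each sequence so all are length 150.
train_padded = pad_sequences(train_sequences, maxlen=150,
                             padding="post", truncating="post")
test_padded = pad_sequences(test_sequences, maxlen=150,
                            padding="post", truncating="post")
```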
I built the model with TensorFlow using an Embedding layer and Bidirectional LSTM layers. For the Embedding layer I used input_dim = 10000, output_dim = 16, and input_length = 150.
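A sketch of an architecture matching that description; only the Embedding parameters come from the text, so the LSTM unit counts, Dense layer, optimizer, and loss are assumptions:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Embedding parameters are the ones stated above.
    tf.keras.layers.Embedding(input_dim=10000, output_dim=16, input_length=150),
    # Bidirectional LSTM stack; the unit counts here are assumptions.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation="relu"),
    # 4 output classes: Positive, Negative, Neutral, Irrelevant.
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
```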
I used a callback that stopped training at 9 epochs, reaching 92% training accuracy and 84% validation accuracy, with a training loss of 0.19 and a validation loss of 0.59.
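A sketch of how such a callback might look; the accuracy threshold, epoch budget, and stopping criterion are assumptions, since only the final metrics and the 9-epoch stop are stated above:

```python
# Hypothetical stopping criterion: end training once accuracy passes 92%.
class StopAtAccuracy(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        if logs and logs.get("accuracy", 0.0) >= 0.92:
            self.model.stop_training = True

history = model.fit(
    train_padded, train_label,
    validation_data=(test_padded, test_label),
    epochs=30,  # upper bound; the callback stopped training at epoch 9
    callbacks=[StopAtAccuracy()],
)
```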