The code implements common practices, widely used in NLP tasks, that lead to a well-performing sentiment analysis system for the ETH CIL Text Classification challenge.
The preprocessing step aims to reduce noise in the training and test data. It also groups the training data into sentence-label pairs when exporting the final dataset.
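The cleaning and export logic lives in process_data.py; the snippet below is only a minimal sketch of the idea, with hypothetical function names, cleaning rules, and file arguments (the actual script may differ):

```python
import re

def clean_tweet(text):
    """Light noise reduction: lowercase, drop URL/user placeholders, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"<url>|https?://\S+", "", text)   # URL placeholders / raw links
    text = re.sub(r"<user>|@\w+", "", text)          # user-mention placeholders
    return re.sub(r"\s+", " ", text).strip()

def export_pairs(pos_file, neg_file, out_file):
    """Write one 'sentence<TAB>label' pair per line for the final training set."""
    with open(out_file, "w") as out:
        for path, label in ((pos_file, 1), (neg_file, 0)):
            with open(path) as f:
                for line in f:
                    out.write(f"{clean_tweet(line)}\t{label}\n")
```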
The model implements a dual-path architecture that combines LSTMs with a self-implemented attention mechanism in order to extract rich information from the input data.
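As a rough illustration of the dual-path idea (not the exact layers, sizes, or framework used in sentiment.py), one path can keep the bi-LSTM's final hidden state while a second path pools all timesteps with a simple additive self-attention. The PyTorch sketch below rests entirely on those assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPathSentiment(nn.Module):
    """Path 1: final bi-LSTM hidden state. Path 2: attention-pooled LSTM outputs."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn_score = nn.Linear(2 * hidden_dim, 1)        # one score per timestep
        self.classifier = nn.Linear(4 * hidden_dim, num_classes)

    def forward(self, token_ids):
        outputs, (h_n, _) = self.lstm(self.embedding(token_ids))  # outputs: (B, T, 2H)
        last = torch.cat([h_n[-2], h_n[-1]], dim=-1)               # final fwd/bwd states, (B, 2H)
        weights = F.softmax(self.attn_score(outputs), dim=1)       # attention over timesteps
        context = (weights * outputs).sum(dim=1)                   # weighted sum, (B, 2H)
        return self.classifier(torch.cat([last, context], dim=-1))
```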
- A bucketing technique is used to feed the data to the model. That way, sentences of different lengths are handled efficiently without excessive padding, which would slow down training (see the bucketing sketch after this list).
- GloVe 300d vectors are used to initialize the word embeddings. Good results can also be obtained by randomly initializing and training the embedding matrix (see the embedding sketch after this list).
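A minimal sketch of length bucketing; the bucket bounds and batch size are illustrative and may differ from the actual implementation:

```python
from collections import defaultdict

def make_buckets(examples, bounds=(10, 20, 30, 40)):
    """Group (token_ids, label) pairs by sentence length so each batch
    contains similarly sized sentences and needs little or no padding."""
    buckets = defaultdict(list)
    for tokens, label in examples:
        # first bound that fits; longer sentences fall into the largest bucket
        bound = next((b for b in bounds if len(tokens) <= b), bounds[-1])
        buckets[bound].append((tokens, label))
    return buckets

def iter_batches(buckets, batch_size=64):
    """Yield batches drawn from one bucket at a time."""
    for items in buckets.values():
        for i in range(0, len(items), batch_size):
            yield items[i:i + batch_size]
```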
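And a sketch of how a GloVe-initialized embedding matrix can be built from the word index; the GloVe file name and the random range for out-of-vocabulary words are assumptions, not taken from the code:

```python
import numpy as np

def build_embedding_matrix(word_index, glove_path="glove.840B.300d.txt", dim=300):
    """Known words get their GloVe vector; unknown words keep a small random init."""
    matrix = np.random.uniform(-0.05, 0.05, (len(word_index) + 1, dim)).astype(np.float32)
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in word_index and len(values) == dim:
                matrix[word_index[word]] = np.asarray(values, dtype=np.float32)
    return matrix
```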
To execute the code, first set the required data paths at the top of process_data.py. Once process_data.py has created the required files (embeddings.pickle, final_train_dataset.txt, final_test_dataset.txt, word_index.txt), sentiment.py can be executed.
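For illustration, the path configuration at the top of process_data.py might look like the following; the variable names and file locations are placeholders, not the exact ones used in the script:

```python
# Hypothetical path configuration at the top of process_data.py.
POS_TRAIN_PATH = "data/train_pos_full.txt"   # positive training tweets
NEG_TRAIN_PATH = "data/train_neg_full.txt"   # negative training tweets
TEST_PATH      = "data/test_data.txt"        # unlabeled test tweets
GLOVE_PATH     = "data/glove.840B.300d.txt"  # pretrained 300d GloVe vectors

# Run order: `python process_data.py` first, then `python sentiment.py`.
```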