alexnikop/cil-text-classification

CIL-text-classification

This repository applies common practices from NLP to build a well-performing sentiment analysis system for the ETH CIL text classification challenge.

Preprocessing

The preprocessing step aims to reduce noise in the training and test data. It also groups training data in sentence-label pairs when exporting the final dataset.
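The README does not list the individual cleaning steps, so the following is only a minimal sketch of what noise reduction and sentence-label pairing could look like. The function names `clean_tweet` and `build_pairs`, the specific normalizations (lowercasing, squeezing elongated words and whitespace), and the 1/0 label encoding are assumptions for illustration, not the contents of process_data.py.

```python
import re


def clean_tweet(text):
    """Normalize one raw line (hypothetical cleaning rules)."""
    text = text.lower().strip()
    # Collapse runs of 3+ identical characters to 2 ("soooo" -> "soo").
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Collapse any run of whitespace to a single space.
    text = re.sub(r"\s+", " ", text)
    return text


def build_pairs(positive_lines, negative_lines):
    """Return (sentence, label) pairs: 1 for positive, 0 for negative (assumed encoding)."""
    pairs = [(clean_tweet(line), 1) for line in positive_lines]
    pairs += [(clean_tweet(line), 0) for line in negative_lines]
    # Drop lines that became empty after cleaning.
    return [(sentence, label) for sentence, label in pairs if sentence]
```

Keeping the label next to the cleaned sentence in a single structure makes it straightforward to shuffle or split the dataset without misaligning inputs and targets.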

Model

The model implements a dual-path architecture that combines LSTMs with custom-implemented attention techniques to extract rich information from the input data.
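The attention mechanism itself is not described here. As an illustration of the general idea only, the sketch below shows dot-product attention pooling over per-token hidden states (such as LSTM outputs) in plain Python. The names `attention_pool` and `query`, and the choice of dot-product scoring, are assumptions and may differ from what sentiment.py does.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Pool a sequence of vectors into one vector.

    hidden_states: list of equal-length float lists, one per token.
    query: float list of the same length (a learned vector in a real model).
    """
    # Score each token by its dot product with the query.
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of the hidden states, one dimension at a time.
    pooled = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights
```

In a trained model the query would be a learned parameter, so tokens that matter for the sentiment decision receive larger weights in the pooled representation.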

Extra info

  • A bucketing technique is used to feed the data to the model. That way, sentences of different lengths are handled efficiently, without the padding that would otherwise slow down training.
  • GloVe 300d vectors are used to initialize the word embeddings. Good results can also be obtained by randomly initializing the embedding matrix and training it.
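The bucketing idea can be sketched as follows: group examples by token count so that every batch holds sequences of a single length and needs no padding. This is a minimal illustration under assumptions; `bucket_batches`, its parameters, and the exact-length grouping are hypothetical and not taken from the repository.

```python
import random
from collections import defaultdict


def bucket_batches(pairs, batch_size, seed=0):
    """Return a shuffled list of batches whose sequences all share one length.

    pairs: iterable of (tokens, label), where tokens is a list.
    """
    # Group examples by sequence length.
    buckets = defaultdict(list)
    for tokens, label in pairs:
        buckets[len(tokens)].append((tokens, label))

    rng = random.Random(seed)
    batches = []
    for items in buckets.values():
        rng.shuffle(items)
        # Slice each bucket into batches; the last one may be smaller.
        for start in range(0, len(items), batch_size):
            batches.append(items[start:start + batch_size])
    # Shuffle batch order so training does not see lengths in sequence.
    rng.shuffle(batches)
    return batches
```

Grouping by exact length removes padding entirely at the cost of some small batches for rare lengths; grouping by length ranges is a common alternative that trades a little padding for fuller batches.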

Execute

To execute the code, set the required data paths at the top of process_data.py. Once process_data.py has created the required files (embeddings.pickle, final_train_dataset.txt, final_test_dataset.txt, word_index.txt), run sentiment.py.
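Before starting sentiment.py, it can help to confirm that the preprocessing outputs exist. The helper below is a hypothetical convenience, not part of the repository, and it assumes the files are written to a single directory (the working directory by default), which the README does not state.

```python
import os

# File names as listed in the README.
REQUIRED_FILES = [
    "embeddings.pickle",
    "final_train_dataset.txt",
    "final_test_dataset.txt",
    "word_index.txt",
]


def missing_outputs(directory="."):
    """Return the required files that are not present in `directory`."""
    return [
        name
        for name in REQUIRED_FILES
        if not os.path.exists(os.path.join(directory, name))
    ]
```

An empty result suggests process_data.py has finished and sentiment.py can be run; otherwise the returned names show which outputs are still missing.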
