Instructions

Run the model from the command prompt using:

python trainModel.py
python predictModel.py

Load Project

Open the project in the PyCharm IDE and set the project interpreter to Anaconda.

Set-up Data

To divide the training data into train and evaluate files, run the script “splitData.py”.
a. Input file: train_test.csv
b. Output file: train.csv, evaluate.csv

Variables to enter:

Randomize the division of data: yes/no. If yes (‘y’) is selected, enter any number greater than 0.
Training data fraction: any fraction between 0 and 1 (e.g. 0.8 divides the data into 80% training and 20% evaluation samples).
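The split described above can be sketched roughly as follows. This is a minimal illustration and not the code in splitData.py: the function name `split_rows` is hypothetical, and treating the number entered after ‘y’ as a random seed is an assumption.

```python
import random


def split_rows(rows, train_fraction, seed=None):
    """Split rows into (train, evaluate) lists.

    seed is an assumption: when given, rows are shuffled reproducibly
    before the split; when None, the original order is kept.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    rows = list(rows)
    if seed is not None:
        random.Random(seed).shuffle(rows)
    cut = int(round(len(rows) * train_fraction))
    return rows[:cut], rows[cut:]


rows = [f"sample {i}" for i in range(10)]
train, evaluate = split_rows(rows, 0.8, seed=7)
print(len(train), len(evaluate))
```

With 10 rows and a fraction of 0.8, 8 rows go to training and 2 to evaluation.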

To inflate the data by creating duplicates, run the script “InflateAndSampleData.py”.
a. Input file: train.csv
b. Output file: train_samples.csv
c. Change the train_samples.csv file format to match the train data template, add an ID column, and save the file as train.csv

Variables to set:

samples_count: Number of training samples per class
REMOVE_EXTRA: Remove extra samples if the sample count for any class is greater than the given samples_count
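One way the two variables above could interact is sketched below. It is an assumption-laden illustration, not the logic of InflateAndSampleData.py: the function name `inflate_and_sample` and the (text, label) row layout are hypothetical, and duplicates are produced by simply cycling through the existing samples of a class.

```python
from collections import defaultdict
from itertools import cycle, islice


def inflate_and_sample(rows, samples_count, remove_extra=True):
    """Return rows with each class brought to samples_count entries.

    rows: list of (text, label) tuples (hypothetical layout).
    Small classes are inflated by repeating their samples in order;
    large classes are trimmed only when remove_extra is True.
    """
    by_class = defaultdict(list)
    for text, label in rows:
        by_class[label].append((text, label))

    result = []
    for label, items in by_class.items():
        if len(items) < samples_count:
            items = list(islice(cycle(items), samples_count))
        elif remove_extra:
            items = items[:samples_count]
        result.extend(items)
    return result


rows = [("a", "x"), ("b", "x"), ("c", "x"), ("d", "y")]
balanced = inflate_and_sample(rows, samples_count=2)
print(balanced)
```

Here class "x" is trimmed from three samples to two, and class "y" is inflated from one sample to two by duplication.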

Model parameters

All the model parameters can be set/changed using the “settings.json” file:

EPOCHS_DEFAULT: Default epochs count for training
TOP_WORDS: Maximum number of words in the bag of words
BATCH_SIZE: Training batch size
MAX_WORDS_LIMIT: Maximum number of words in one text sample/answer
MINIMUM_WORDS_LENGTH: Minimum length of a word to be added to Bag of words
BASE_LR: Base learning rate
OPTIMIZER: Training optimizer
EMBEDDING_VECTOR_LENGTH: Length of embedding vector
CNN_NO_OF_FILTER: Number of filters in the CNN
CNN_FILTER_LENGTH: Filter length in the CNN
CNN_POOL_LENGTH: Pooling size for max pooling
LSTM_CELLS_COUNT: Number of LSTM cells
DROPOUT: Dropout applied in the model
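As a rough illustration of how such a file might be read, the sketch below parses a settings document and checks that every parameter listed above is present. The key names come from the list above; all values, the `parse_settings` helper, and the flat JSON layout are assumptions, not the project's actual defaults.

```python
import json

EXPECTED_KEYS = {
    "EPOCHS_DEFAULT", "TOP_WORDS", "BATCH_SIZE", "MAX_WORDS_LIMIT",
    "MINIMUM_WORDS_LENGTH", "BASE_LR", "OPTIMIZER",
    "EMBEDDING_VECTOR_LENGTH", "CNN_NO_OF_FILTER", "CNN_FILTER_LENGTH",
    "CNN_POOL_LENGTH", "LSTM_CELLS_COUNT", "DROPOUT",
}

# Illustrative values only; the real settings.json may differ.
EXAMPLE_SETTINGS = """
{
  "EPOCHS_DEFAULT": 10,
  "TOP_WORDS": 5000,
  "BATCH_SIZE": 32,
  "MAX_WORDS_LIMIT": 200,
  "MINIMUM_WORDS_LENGTH": 2,
  "BASE_LR": 0.001,
  "OPTIMIZER": "adam",
  "EMBEDDING_VECTOR_LENGTH": 32,
  "CNN_NO_OF_FILTER": 32,
  "CNN_FILTER_LENGTH": 3,
  "CNN_POOL_LENGTH": 2,
  "LSTM_CELLS_COUNT": 100,
  "DROPOUT": 0.2
}
"""


def parse_settings(raw):
    """Parse a settings JSON string and fail early on missing keys."""
    settings = json.loads(raw)
    missing = EXPECTED_KEYS - set(settings)
    if missing:
        raise KeyError(f"missing settings: {sorted(missing)}")
    return settings


settings = parse_settings(EXAMPLE_SETTINGS)
print(settings["BATCH_SIZE"], settings["OPTIMIZER"])
```

Checking for missing keys up front surfaces a typo in the settings file before training starts, instead of partway through.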

Train Model

Run the script “trainModel.py”.
a. Input file: train.csv
b. Output file: Model and data pickles to be used for predictions (Model & PickleJar folder)

Once trainModel.py is executed, the user is prompted to enter the following options:

Train a new model or continue training a previously trained model: To continue training the previously trained model, enter “y” in the console. To train a new model, enter “n”.
Number of epochs: Enter an integer to set the number of epochs for training the model. Leave blank to use the default value from the settings.json file.
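The two prompts could be interpreted along the lines below. This is a sketch under assumptions, not the code in trainModel.py: the helper names `resolve_continue` and `resolve_epochs` are hypothetical, and rejecting anything other than “y” or “n” is a design choice made here for clarity.

```python
def resolve_continue(user_input):
    """Return True to continue a previous model, False to train a new one."""
    answer = user_input.strip().lower()
    if answer not in ("y", "n"):
        raise ValueError("please enter 'y' or 'n'")
    return answer == "y"


def resolve_epochs(user_input, default_epochs):
    """Return the epoch count, falling back to the default on blank input."""
    text = user_input.strip()
    if not text:
        return default_epochs
    epochs = int(text)  # raises ValueError for non-integer input
    if epochs <= 0:
        raise ValueError("epochs must be a positive integer")
    return epochs


# default_epochs would come from EPOCHS_DEFAULT in settings.json;
# the value 10 here is only illustrative.
print(resolve_continue("n"), resolve_epochs("", default_epochs=10))
```

In an interactive run, the strings passed to these helpers would come from `input()`.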

More classes can be added for training by adding them to the “CLASS” column of the train.csv file. Then run the trainModel script to train the model with the updated class structure.
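To illustrate why adding rows with a new “CLASS” value is enough, the sketch below derives the set of classes directly from that column. It is only an assumption about how the labels might be collected: the `class_index` helper, the “TEXT” column name, and the sample rows are hypothetical.

```python
import csv
import io


def class_index(csv_text, column="CLASS"):
    """Map each distinct label in the given column to an integer index."""
    reader = csv.DictReader(io.StringIO(csv_text))
    labels = sorted({row[column] for row in reader})
    return {label: i for i, label in enumerate(labels)}


# Hypothetical contents of train.csv; only the CLASS column name
# comes from the instructions above.
sample = (
    "ID,TEXT,CLASS\n"
    "1,good answer,correct\n"
    "2,bad answer,incorrect\n"
    "3,partly right,partial\n"
)
print(class_index(sample))
```

Because the mapping is rebuilt from the file, a new label appearing in the column yields a new class index the next time training is run.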

Test/Predict

Run the script “predictModel.py”.
a. Input file: test.csv
b. Default inputs: Trained model and pickled data (Model & PickleJar folder)
c. Output file: predictions.csv

hsakas/TextClassification
