Run the model from the command prompt using:
    python trainModel.py
    python predictModel.py
- Open the project in the PyCharm IDE and set the project interpreter to Anaconda.
- To divide the training data into train and evaluate files, run the script “splitData.py”.
  a. Input file: train_test.csv
  b. Output files: train.csv, evaluate.csv
  The script prompts for two values:
  Randomize the division of data: yes/no. If you answer yes (‘y’), enter any number greater than 0.
  Training data fraction: any fraction between 0 and 1 (e.g. 0.8 divides the data into 80% training and 20% evaluation samples).
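The split step can be illustrated with a minimal sketch. This is not the code in splitData.py; the function name `split_rows` is hypothetical, and it assumes the number entered after answering ‘y’ acts as a random seed, which the description above does not state.

```python
import random

def split_rows(rows, train_fraction, seed=None):
    """Split rows into (train, evaluate) lists.

    seed=None keeps the original order (the 'n' answer);
    an integer seed shuffles reproducibly (assumed meaning of the 'y' answer).
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    rows = list(rows)  # copy so the caller's list is not reordered
    if seed is not None:
        random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)  # index where evaluation rows start
    return rows[:cut], rows[cut:]

# Ten placeholder rows split 80/20 without shuffling.
train, evaluate = split_rows(range(10), 0.8)
print(len(train), len(evaluate))
```

Using a dedicated `random.Random(seed)` instance keeps the shuffle repeatable without touching the global random state.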
- To inflate the data by creating duplicates, run the script “InflateAndSampleData.py”.
  a. Input file: train.csv
  b. Output file: train_samples.csv
  c. Change the train_samples.csv file format to match the train data template, add an ID column, and save the file as train.csv.
  The script uses two parameters:
  samples_count: number of training samples per class.
  REMOVE_EXTRA: remove extra samples if the sample count for any class is greater than the given samples_count.
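A rough sketch of how samples_count and REMOVE_EXTRA might interact is shown below. It is an illustration, not the logic of InflateAndSampleData.py: the function name `inflate_and_sample` and the TEXT field are hypothetical, and it assumes duplicates are created by cycling through the existing rows of each class.

```python
import itertools
from collections import defaultdict

def inflate_and_sample(rows, samples_count, remove_extra=True, label_key="CLASS"):
    """Return rows with each class brought to samples_count rows.

    Classes with too few rows are padded with duplicates; classes with too
    many rows are trimmed only when remove_extra is True.
    """
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    result = []
    for members in by_class.values():
        if len(members) < samples_count:
            # Repeat existing rows in order until the target count is reached.
            padded = itertools.islice(itertools.cycle(members), samples_count)
            result.extend(dict(m) for m in padded)
        elif remove_extra:
            result.extend(members[:samples_count])
        else:
            result.extend(members)
    return result

# Placeholder data: one row of class "a", three rows of class "b".
rows = [{"CLASS": "a", "TEXT": "x"}] + [{"CLASS": "b", "TEXT": t} for t in "pqr"]
balanced = inflate_and_sample(rows, samples_count=2)
print(len(balanced))
```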
All model parameters can be set or changed in the “settings.json” file:
- EPOCHS_DEFAULT: Default epochs count for training
- TOP_WORDS: Maximum number of words in the bag of words
- BATCH_SIZE: Training batch size
- MAX_WORDS_LIMIT: Maximum number of words in one text sample/answer
- MINIMUM_WORDS_LENGTH: Minimum length of a word to be added to the bag of words
- BASE_LR: Base learning rate
- OPTIMIZER: Training optimizer
- EMBEDDING_VECTOR_LENGTH: Length of embedding vector
- CNN_NO_OF_FILTER: Number of filters in the CNN
- CNN_FILTER_LENGTH: Filter length in the CNN
- CNN_POOL_LENGTH: Pooling size for max pooling
- LSTM_CELLS_COUNT: Number of LSTM cells
- DROPOUT: Dropout applied in the model
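As an illustration of the settings file, the sketch below parses a JSON document containing the keys listed above and reports any that are missing. The values are placeholders chosen for the example, not the project's defaults, and the helper `load_settings` is hypothetical.

```python
import json

# Keys documented for settings.json.
REQUIRED_KEYS = [
    "EPOCHS_DEFAULT", "TOP_WORDS", "BATCH_SIZE", "MAX_WORDS_LIMIT",
    "MINIMUM_WORDS_LENGTH", "BASE_LR", "OPTIMIZER", "EMBEDDING_VECTOR_LENGTH",
    "CNN_NO_OF_FILTER", "CNN_FILTER_LENGTH", "CNN_POOL_LENGTH",
    "LSTM_CELLS_COUNT", "DROPOUT",
]

# Placeholder values only; substitute the values from your own settings.json.
EXAMPLE_SETTINGS = """
{
  "EPOCHS_DEFAULT": 10,
  "TOP_WORDS": 5000,
  "BATCH_SIZE": 32,
  "MAX_WORDS_LIMIT": 200,
  "MINIMUM_WORDS_LENGTH": 2,
  "BASE_LR": 0.001,
  "OPTIMIZER": "adam",
  "EMBEDDING_VECTOR_LENGTH": 32,
  "CNN_NO_OF_FILTER": 32,
  "CNN_FILTER_LENGTH": 3,
  "CNN_POOL_LENGTH": 2,
  "LSTM_CELLS_COUNT": 100,
  "DROPOUT": 0.2
}
"""

def load_settings(text):
    """Parse settings JSON and fail early if a documented key is absent."""
    settings = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise KeyError("settings.json is missing: " + ", ".join(missing))
    return settings

settings = load_settings(EXAMPLE_SETTINGS)
print(settings["EPOCHS_DEFAULT"])
```

Checking the keys up front surfaces a typo in the settings file before training starts rather than partway through it.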
Run the script “trainModel.py”.
a. Input file: train.csv
b. Output files: Model and data pickles to be used for predictions (Model & PickleJar folders)
- Train a new model or continue training a previously trained model: To continue training the previously trained model, enter “y” in the console. To train a new model, enter “n”.
- Number of epochs: Enter an integer to set the number of epochs for training the model. Leave blank to use the default value from the settings.json file.
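The epochs prompt can be sketched as follows: a blank answer falls back to EPOCHS_DEFAULT, anything else must be a positive integer. The function name `resolve_epochs` is hypothetical, and rejecting zero or negative values is an assumption rather than documented behaviour of trainModel.py.

```python
def resolve_epochs(answer, default_epochs):
    """Turn the console answer into an epoch count.

    A blank answer selects default_epochs (EPOCHS_DEFAULT from settings.json).
    """
    answer = answer.strip()
    if not answer:
        return default_epochs
    epochs = int(answer)  # raises ValueError for non-integer input
    if epochs <= 0:
        raise ValueError("epochs must be a positive integer")
    return epochs

print(resolve_epochs("", 10))
print(resolve_epochs("25", 10))
```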
More classes can be added for training by adding them to the “CLASS” column of the train.csv file. Then run the trainModel.py script again to train the model with the updated class structure.
Run the script “predictModel.py”.
a. Input file: test.csv
b. Default inputs: Trained model and pickled data (Model & PickleJar folders)
c. Output file: predictions.csv