You will need the following things properly installed on your computer.
Clone the repository:
git clone https://github.com/smorzhov/hour_of_code_2019.git
Download the pretrained glove.840B.300d model (2.03 GB) and unzip it into the src/data directory.
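After unzipping, it can be worth checking that the embedding file is readable before starting a long run. The sketch below is not part of this repository: parse_glove_lines is a hypothetical helper, and it assumes the usual GloVe text layout (a token followed by 300 space-separated floats per line). It splits from the right because some tokens in large GloVe vocabularies contain spaces themselves.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_glove_lines(
    lines: Iterable[str], dim: int = 300
) -> Iterator[Tuple[str, List[float]]]:
    """Yield (token, vector) pairs from GloVe-style text lines.

    Hypothetical helper for a quick sanity check; nlp.py may load the
    embeddings differently.
    """
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            raise ValueError(
                f"expected a token plus {dim} values, got {len(parts)} fields"
            )
        # Take the last `dim` fields as the vector; whatever precedes them
        # is the token, which may itself contain spaces.
        token = " ".join(parts[:-dim])
        vector = [float(value) for value in parts[-dim:]]
        yield token, vector


def peek(path: str, count: int = 5, dim: int = 300) -> List[str]:
    """Return the first `count` tokens of an embedding file."""
    tokens = []
    with open(path, encoding="utf-8") as handle:
        for token, _vector in parse_glove_lines(handle, dim):
            tokens.append(token)
            if len(tokens) >= count:
                break
    return tokens
```

Calling peek on the unzipped file in src/data (the exact file name depends on the archive contents) should return a handful of tokens without raising an error.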
If you plan to use nvidia-docker, you need to build the nvidia-docker image first. Otherwise, you can skip this step.
nvidia-docker build -t sm_keras_tf_py3:gpu .
Run the container (run this command in the directory that contains the Dockerfile):
nvidia-docker run --user $(id -u):$(id -g) -dt --name sm_hoc -m 50GB -v $(pwd)/src:/$(basename $(pwd)) -w /$(basename $(pwd)) sm_keras_tf_py3:gpu /bin/bash
Cleaning dataset
nvidia-docker exec --env CUDA_VISIBLE_DEVICES='0' sm_hoc python3 -u nlp.py prepare-data
Training
By default, only the 0th GPU is visible for the docker container. You can change this by passing the --env option to exec. For example:
nvidia-docker exec --env CUDA_VISIBLE_DEVICES='0' sm_hoc python3 -u nlp.py train --data-path ./processed_data
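CUDA_VISIBLE_DEVICES takes a comma-separated list of device ids, so a value such as '0,1' exposes two GPUs. To confirm what a process inside the container actually sees, a small check like the following may help. It is only a sketch: visible_gpu_ids is a hypothetical helper, not something nlp.py provides.

```python
import os
from typing import List, Mapping, Optional


def visible_gpu_ids(env: Mapping[str, str] = os.environ) -> Optional[List[str]]:
    """Return the device ids listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is not set, in which case CUDA does not
    restrict the device list. An empty value yields an empty list, which
    hides every GPU from the process.
    """
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Ids are kept as strings because the variable may also hold GPU UUIDs.
    return [item.strip() for item in raw.split(",") if item.strip()]
```

Running print(visible_gpu_ids()) through nvidia-docker exec with the same --env option shows the ids that the training process will be given.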
You can add some custom stop words. They must be placed in the src/data/stopwords.txt file (one word per line).
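The one-word-per-line format can be read with a few lines of Python. The sketch below shows one plausible way to do it. load_stopwords is a hypothetical name, and skipping blank lines and trimming whitespace are assumptions; the way nlp.py actually reads the file is not shown here.

```python
from pathlib import Path
from typing import Set


def load_stopwords(path: str) -> Set[str]:
    """Read a stop word file with one word per line into a set.

    Surrounding whitespace is trimmed and blank lines are skipped
    (assumed behaviour, for illustration only).
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}
```

A set is used so that duplicate entries in the file are harmless and membership checks stay fast during cleaning.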