LDA topic modeling for Polar Deep Insights.
Copy the Dockerfile to the project folder and run the following command to build the image:
docker build -t pdi-topics .
To run a container we use the following command.
docker run -d -t -p 8888:8888 --name pdi-topics pdi-topics
Or, if we want to run notebooks from a particular local directory, we can mount it as a volume, where $MY_LOCAL_PATH is the absolute path to that directory:
docker run -d -t -p 8888:8888 -v "$MY_LOCAL_PATH":/opt/pdi-topics/notebooks --name pdi-topics pdi-topics
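A bare name such as `notebooks` in `-v` is interpreted by Docker as a named volume rather than a directory, and some Docker versions reject relative host paths outright, so it is safest to resolve `$MY_LOCAL_PATH` to an absolute path first. A minimal sketch, assuming a bash-compatible shell; `abs_dir` is a hypothetical helper, not part of this project:

```shell
# abs_dir (hypothetical helper): print the absolute path of an existing
# directory, or fail with a message if it does not exist.
abs_dir() {
  if [ ! -d "$1" ]; then
    echo "abs_dir: '$1' is not a directory" >&2
    return 1
  fi
  # Clear CDPATH so `cd` never echoes the resolved directory to stdout.
  (CDPATH= cd -- "$1" && pwd)
}

# Example usage (not run here):
#   MY_LOCAL_PATH="$(abs_dir ./notebooks)" || exit 1
#   docker run -d -t -p 8888:8888 -v "$MY_LOCAL_PATH":/opt/pdi-topics/notebooks --name pdi-topics pdi-topics
```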
You'll need the Jupyter token to access the notebooks; you can get it by inspecting the container's logs:
docker logs pdi-topics
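When the log is long, the token can be pulled out of the startup URL that Jupyter prints (`...?token=<hex>`). A minimal sketch, assuming the default hexadecimal token (a custom token configured in the image would not match); `jupyter_token` is a hypothetical helper. Jupyter normally writes its startup messages to stderr, and `docker logs` replays the container's stderr on stderr, hence the `2>&1` in the usage line:

```shell
# jupyter_token (hypothetical helper): read Jupyter log output on stdin and
# print the token from the first "?token=..." URL it finds.
# Assumes the default lowercase hex token format.
jupyter_token() {
  grep -Eo 'token=[0-9a-f]+' | head -n 1 | cut -d= -f2
}

# Example usage (not run here):
#   docker logs pdi-topics 2>&1 | jupyter_token
```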
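- Follow the steps at https://github.com/USCDataScience/sparkler to run Sparkler on a seed URL or seed file.
- After the crawl completes, the indexed data can be queried at http://localhost:8983/solr/#/crawldb/query
- Build the Docker image as described above and run it with the following command, replacing {HOST-IP} with your machine's IP address. The --add-host option maps the hostname docker inside the container to that address, so the container can reach services on the host, such as Solr, under that name.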
docker run -d -t --add-host=docker:{HOST-IP} -p 8888:8888 --name pdi-topics pdi-topics
- Run the sparkler-pdi-topics.ipynb and sparkler-pdi-scikit-topics.ipynb notebooks to view the results for the Sparkler data.
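The Sparkler steps above can be partly scripted: first confirm that the crawl actually indexed documents, then start the container with a detected or explicit host IP. A sketch under stated assumptions: Solr listens on localhost:8983 with a core named `crawldb` (taken from the admin URL above), `curl` is installed, and all function names are hypothetical. IP detection is best-effort (`hostname -I` on Linux, `ipconfig getifaddr en0` on macOS), so passing the address explicitly is the reliable path:

```shell
# Hypothetical helpers for the Sparkler workflow; not part of this repository.

# solr_num_found: read a Solr JSON response on stdin and print response.numFound.
solr_num_found() {
  grep -Eo '"numFound": *[0-9]+' | head -n 1 | grep -Eo '[0-9]+$'
}

# crawldb_count: ask the local Solr "crawldb" core how many documents it holds.
# Uses the standard /select handler with rows=0 so only the count is returned.
crawldb_count() {
  curl -fsS 'http://localhost:8983/solr/crawldb/select?q=*:*&rows=0&wt=json' | solr_num_found
}

# is_ipv4: succeed if $1 looks like a dotted-quad IPv4 address.
# Octet ranges (0-255) are not checked.
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# detect_host_ip: best-effort guess of this machine's IP address.
detect_host_ip() {
  local ip
  ip=$(hostname -I 2>/dev/null | awk '{print $1}')
  if [ -z "$ip" ] && command -v ipconfig >/dev/null 2>&1; then
    ip=$(ipconfig getifaddr en0 2>/dev/null)
  fi
  printf '%s\n' "$ip"
}

# run_pdi_topics: start the container, mapping hostname "docker" to the host IP.
# The first argument overrides auto-detection.
run_pdi_topics() {
  local host_ip="${1:-$(detect_host_ip)}"
  if ! is_ipv4 "$host_ip"; then
    echo "run_pdi_topics: could not determine host IP; pass it as the first argument" >&2
    return 1
  fi
  docker run -d -t --add-host="docker:${host_ip}" -p 8888:8888 --name pdi-topics pdi-topics
}

# Example usage (not run here):
#   crawldb_count                  # expect a non-zero count after a crawl
#   run_pdi_topics                 # auto-detect the host IP
#   run_pdi_topics 192.168.1.20    # or pass it explicitly
```

Validating the address before calling `docker run` avoids creating a container whose `docker` hostname points nowhere, which would otherwise only surface later as a connection error inside the notebooks.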
To avoid using Docker, we can also run the topic notebooks in a conda environment created with Anaconda3 or Miniconda3:
conda env create -f environment.yml
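Then, to use the notebooks, we activate the environment and start Jupyter, with $MY_DIR set to the directory containing the notebooks. On conda 4.4 and later, conda activate pdi-topics is the recommended replacement for source activate.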
source activate pdi-topics
jupyter notebook --allow-root --notebook-dir=$MY_DIR --ip='0.0.0.0' --port=8888 --no-browser
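Since `$MY_DIR` must be set before running the command above, the two steps can be wrapped in a small helper that defaults to the current directory. A sketch assuming bash and a recent conda release, where `eval "$(conda shell.bash hook)"` makes `conda activate` usable in a non-interactive shell; `start_pdi_notebooks` is a hypothetical name. `--allow-root` is only needed when running Jupyter as root, so it is omitted here:

```shell
# start_pdi_notebooks (hypothetical helper): activate the pdi-topics conda
# environment and serve notebooks from the given directory (default: current dir).
start_pdi_notebooks() {
  local dir="${1:-$PWD}"
  if [ ! -d "$dir" ]; then
    echo "start_pdi_notebooks: '$dir' is not a directory" >&2
    return 1
  fi
  if ! command -v conda >/dev/null 2>&1; then
    echo "start_pdi_notebooks: conda not found on PATH" >&2
    return 1
  fi
  # Load conda's shell functions so `conda activate` works inside a script.
  eval "$(conda shell.bash hook)" || return 1
  conda activate pdi-topics || return 1
  jupyter notebook --notebook-dir="$dir" --ip='0.0.0.0' --port=8888 --no-browser
}

# Example usage (not run here):
#   start_pdi_notebooks ~/pdi-notebooks
```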