I work with unstructured Stack Overflow question data (~60,000 collected ML-related questions). I process it with NLP techniques and do a short data visualization. I then build models based on Word2Vec's Skip-Gram model to find the k questions most similar to a given query, and evaluate these models on a small test dataset with HitsCount and nDCG scores.
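To illustrate the retrieval step, here is a minimal sketch of ranking candidate questions by cosine similarity of averaged word vectors. It assumes word vectors have already been trained (for example with a Skip-Gram model); the `toy_vectors` dictionary, the helper names, and the sample questions are made up for illustration and are not taken from the notebook, which may represent and compare questions differently.

```python
import math

# Toy 2-D word vectors standing in for trained Skip-Gram embeddings.
# These values are invented for illustration only.
toy_vectors = {
    "neural": [0.9, 0.1],
    "network": [0.8, 0.2],
    "training": [0.7, 0.3],
    "pandas": [0.1, 0.9],
    "dataframe": [0.2, 0.8],
}

def embed(text, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[t] for t in text.lower().split() if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_similar(query, candidates, vectors, k=2):
    """Return the k candidates most similar to the query, best first."""
    q = embed(query, vectors)
    if q is None:
        return []
    scored = []
    for candidate in candidates:
        v = embed(candidate, vectors)
        if v is not None:
            scored.append((cosine(q, v), candidate))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored[:k]]

questions = [
    "pandas dataframe",
    "neural network training",
    "training pandas",
]
result = top_k_similar("neural network", questions, toy_vectors, k=2)
print(result)
```

Averaging word vectors is only one simple way to get a question-level embedding; it ignores word order and treats all known tokens equally.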
The notebook contains several interactive plots made with plotly that do not render on GitHub; to see them all, open the notebook in nbviewer or run it locally in trusted mode.
To create a conda virtual environment with all the dependencies needed for the notebook:
conda env create -n ENV_NAME --file environment.yml
Alternatively, create a virtual environment with Python's venv module, pipenv, or virtualenv, and install the packages with the following command:
pip install -r requirements.txt
For more details about the metrics, see the notebook.
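As a rough orientation, the sketch below shows one common way such ranking metrics are computed. It assumes each test query has exactly one relevant candidate and that `ranks` holds the 1-based position of that candidate in the model's ranking; under that assumption the ideal DCG is 1, so DCG and nDCG coincide. The function names and the example ranks are hypothetical, and the definitions used in the notebook may differ.

```python
import math

def hits_count(ranks, k):
    """Fraction of queries whose relevant candidate is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def ndcg_score(ranks, k):
    """Mean of 1 / log2(1 + rank) over queries, counting only ranks within the top k.

    Assumes a single relevant candidate per query, so the ideal DCG equals 1.
    """
    return sum(1.0 / math.log2(1 + r) for r in ranks if r <= k) / len(ranks)

# Hypothetical ranks of the relevant question for three test queries.
example_ranks = [1, 3, 7]

print(hits_count(example_ranks, k=5))
print(ndcg_score(example_ranks, k=5))
```

Both scores lie between 0 and 1. HitsCount only checks whether the relevant question appears in the top k, while nDCG also rewards placing it closer to the top.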