Status: In Progress
Purpose: This repo builds and deploys an app that, given a Hiking Upward hike (identified by its URL), recommends similar Hiking Upward hikes.
I started this project because I wasn't satisfied with the simple filtering page that Hiking Upward provides. I wanted to find hikes similar to the ones I enjoyed, and I wanted to learn about recommender systems.
App version 0 (complete): Data scraping and recommendation are done in notebooks, and the results are exported to CSV files. The Streamlit app uses the CSVs to display recommendations.
App version 1 (in progress): Convert the notebooks to functions (maybe classes later). The Streamlit app runs the functions, caches the results, and displays recommendations.
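As a rough sketch of the version 1 idea (plain functions whose results are cached between calls), the block below uses `functools.lru_cache` from the standard library as a stand-in for Streamlit's caching, so it runs without Streamlit installed. The function names `load_hike_table` and `closest_by_length`, the example URLs, and the sample rows are all hypothetical and not taken from this repo.

```python
from functools import lru_cache

# Counts how often the expensive step actually runs (for illustration only).
CALLS = {"load": 0}


@lru_cache(maxsize=None)
def load_hike_table():
    """Stand-in for the scraping/cleaning step; runs once, then served from cache."""
    CALLS["load"] += 1
    # Hypothetical rows: (url, length in miles, difficulty rating)
    return (
        ("https://example.com/hike-a", 9.0, 4),
        ("https://example.com/hike-b", 8.5, 4),
        ("https://example.com/hike-c", 2.0, 1),
    )


@lru_cache(maxsize=None)
def closest_by_length(url):
    """Toy 'recommendation': the other hike whose length is closest to this one."""
    lengths = {u: miles for u, miles, _ in load_hike_table()}
    target = lengths[url]
    candidates = [u for u in lengths if u != url]
    return min(candidates, key=lambda u: abs(lengths[u] - target))


if __name__ == "__main__":
    print(closest_by_length("https://example.com/hike-a"))
    print(closest_by_length("https://example.com/hike-a"))  # served from cache
    print("expensive loads:", CALLS["load"])
```

In the Streamlit app the same shape would presumably use Streamlit's own caching decorator (for example `st.cache_data`) in place of `lru_cache`; that substitution is an assumption about how the app might be wired, not a description of the current code.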
To see the app, please visit this link.
Data Source: Hiking Upward (web scraped)
TO DO:
- load data into a vector database (maybe Annoy (Approximate Nearest Neighbors Oh Yeah) ??)
- update the Dockerfile and Streamlit app to pull from the vector database
- on front end have an optional selection button for use
- Web scraping using requests and beautifulsoup
- Validating a dataframe using pandera
- Content-based recommender systems using Apple's Turicreate
- Creating reproducible virtual environments using poetry
- Dockerizing the virtual environment
- Deploying the recommender app using streamlit
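To make the content-based recommender item above a little more concrete, here is a minimal sketch that uses only the standard library. It is not the Turicreate model or the code from the notebooks: it simply scales a few numeric features and returns the nearest hikes by Euclidean distance. The feature set (length, elevation gain, difficulty), the example URLs, and the names `HIKES`, `scale_features`, and `recommend` are assumptions made up for illustration.

```python
import math

# Hypothetical features per hike, keyed by URL:
# (length in miles, elevation gain in feet, difficulty 1-5)
HIKES = {
    "https://example.com/hike-a": (9.0, 2500.0, 4.0),
    "https://example.com/hike-b": (8.0, 2300.0, 4.0),
    "https://example.com/hike-c": (2.0, 200.0, 1.0),
    "https://example.com/hike-d": (5.0, 1200.0, 3.0),
}


def scale_features(hikes):
    """Min-max scale each feature column to [0, 1] so no single unit dominates."""
    columns = list(zip(*hikes.values()))
    lows = [min(col) for col in columns]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return {
        url: tuple((v - lo) / span for v, lo, span in zip(feats, lows, spans))
        for url, feats in hikes.items()
    }


def recommend(url, hikes=HIKES, top_n=2):
    """Return the top_n hike URLs closest to `url` in scaled feature space."""
    scaled = scale_features(hikes)
    target = scaled[url]
    distances = {
        other: math.dist(target, feats)
        for other, feats in scaled.items()
        if other != url
    }
    return sorted(distances, key=distances.get)[:top_n]


if __name__ == "__main__":
    for similar in recommend("https://example.com/hike-a"):
        print(similar)
```

The hike's URL acts as the identifier, matching the purpose stated at the top. Scaling before measuring distance matters here because elevation gain in feet would otherwise swamp the other features.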
- The turicreate package doesn't run on Windows (only macOS, Linux, or Windows Subsystem for Linux)
- This required the use of a Docker container.
- Eventually, I decided against using turicreate. Some of the reasons:
  - Turicreate uses its own data types (such as SFrame and SArray)
  - Turicreate only supports up to Python 3.8
  - There were issues with dependency resolution related to turicreate (mainly coremltools version 3.3 and the tensorflow version)
  - It was difficult to figure out how to get a recommendation from the model using new data (maybe because of the data types)
- Poetry Docs
- https://realpython.com/dependency-management-python-poetry/
- https://mungingdata.com/python/jupyter-workflow-poetry-pandas/
- https://pythonspeed.com/articles/poetry-vs-docker-caching/
Useful command-line commands for Poetry:
poetry cache clear --all pypi ## (seems to speed up dependency resolution)
poetry env remove python ## delete poetry virtual env based on this [link](https://stackoverflow.com/questions/65783409/poetry-install-fails-with-envcommanderror-looks-for-version-2020-12-21-3-lambda)