annnvv/hiking_upward_recommender

Hiking Upward Recommender

Status: In Progress

Purpose: Create and deploy an app that recommends similar Hiking Upward hikes for a given Hiking Upward hike, using the hike's URL as its identifier.

I started this project because I wasn't satisfied with the simple filtering page that Hiking Upward offers. I wanted to find hikes similar to the ones I enjoyed, and I wanted to learn about recommender systems.

App version 0 (complete): Data scraping and recommendation are done in notebooks, and the results are exported to CSV files. The Streamlit app reads the CSVs to display recommendations.
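As a rough illustration of the version 0 flow, the sketch below looks up precomputed recommendations by hike URL from an exported CSV. The column names (`hike_url`, `recommended_url`, `rank`) and the example.com URLs are assumptions made for this example, not the repo's actual export format, and the Streamlit display step is left out.

```python
import csv
import io

# Hypothetical export format: one row per (source hike, recommended hike).
# The URLs and column names below are made up for illustration.
SAMPLE_CSV = """hike_url,recommended_url,rank
https://example.com/hikes/a,https://example.com/hikes/b,1
https://example.com/hikes/a,https://example.com/hikes/c,2
https://example.com/hikes/b,https://example.com/hikes/a,1
"""


def load_recommendations(csv_file):
    """Group recommended URLs by source hike URL, ordered by rank."""
    rows_by_hike = {}
    for row in csv.DictReader(csv_file):
        rows_by_hike.setdefault(row["hike_url"], []).append(
            (int(row["rank"]), row["recommended_url"])
        )
    # Sort each hike's recommendations by rank and keep only the URLs.
    return {
        url: [rec for _, rec in sorted(pairs)]
        for url, pairs in rows_by_hike.items()
    }


# In a real app this would be an open file handle instead of a string buffer.
recommendations = load_recommendations(io.StringIO(SAMPLE_CSV))
print(recommendations.get("https://example.com/hikes/a", []))
```

Keying the lookup on the URL means the app only needs the URL the user pastes in. An unknown URL falls back to an empty list.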

App version 1 (in progress): Convert the notebooks to functions (maybe classes later). The Streamlit app runs the functions, caches the results, and displays recommendations.

To see the app, please visit this link.

Data source: web scraping Hiking Upward

TO DO:

  • load data into a vector database (maybe Annoy (Approximate Nearest Neighbors Oh Yeah)?)
  • update the Dockerfile and the Streamlit app to pull from the vector database
  • on front end have an optional selection button for use

Skills used:

  • Web scraping using requests and beautifulsoup
  • Validating a dataframe using pandera
  • Content-based recommender systems using Apple's Turicreate
  • Creating reproducible virtual environments using poetry
  • Dockerizing the virtual environment
  • Deploying the recommender app using streamlit
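For readers new to content-based recommenders, here is a minimal sketch of the general idea: represent each hike as a feature vector and rank the other hikes by cosine similarity to the selected one. This is not the repo's Turicreate model. The features, values, and example.com URLs are invented for illustration, and plain cosine similarity is a swapped-in technique.

```python
import math

# Hypothetical, pre-scaled features per hike: [length, elevation gain, difficulty].
# URLs and values are invented for illustration, not scraped data.
HIKES = {
    "https://example.com/hikes/ridge": [0.9, 0.8, 0.7],
    "https://example.com/hikes/summit": [0.8, 0.9, 0.8],
    "https://example.com/hikes/creek": [0.9, 0.1, 0.1],
    "https://example.com/hikes/falls": [0.1, 0.3, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(url, hikes, top_n=2):
    """Return the top_n hike URLs most similar to the hike at `url`."""
    target = hikes[url]
    scored = [
        (cosine_similarity(target, features), other_url)
        for other_url, features in hikes.items()
        if other_url != url  # never recommend the hike itself
    ]
    scored.sort(reverse=True)
    return [other_url for _, other_url in scored[:top_n]]


print(recommend("https://example.com/hikes/ridge", HIKES))
```

This brute-force comparison is fine for a small catalogue. An approximate nearest-neighbor index would only matter once the number of hikes grows large.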

Lessons learned:

  • The turicreate package doesn't run natively on Windows (only on macOS, Linux, or Windows Subsystem for Linux).
    • This required the use of a Docker container.
  • Eventually, I decided against using turicreate. Some of the reasons:
    • Turicreate uses its own data types (such as SFrame and SArray)
    • Turicreate only supports Python versions up to 3.8
    • There were issues with dependency resolution related to turicreate (mainly coremltools version 3.3 and tensorflow version)
    • It was difficult to figure out how to get a recommendation from the model using new data (maybe because of the data types)

Useful resources:

Poetry:

Useful command-line commands for Poetry

poetry cache clear --all pypi ## (seems to speed up dependency resolution)

poetry env remove python ## delete poetry virtual env based on this [link](https://stackoverflow.com/questions/65783409/poetry-install-fails-with-envcommanderror-looks-for-version-2020-12-21-3-lambda) 

Docker:

Streamlit:

Pandera:

Turicreate: