An app for folks to get personalized trip recommendations and itineraries based on their profile and interests.
WanderMatch

1: Features

  1. Ability to create a profile that is used to generate trip recommendations.
  2. Ability to create, update, join, and leave trips posted on the app.
  3. Ability to perform advanced trip searches powered by Elasticsearch and Kafka.
  4. Ability to generate audit logs for every business-logic action, powered by Kafka.
  5. Ability to auto-generate an itinerary for a given trip under a given budget using a lightweight Llama 3.2 model, powered by Kafka.
  6. Ability to recommend trips that suit a particular user's profile, budget, and calendar availability. Users are matched to trips that align with their interests and whose organizer they are likely to get along with. This is done by a two-step customized recommendation system that first filters candidates and then ranks them on weighted heuristics (geospatial scoring and Jaccard similarity).
  7. Ability to run BI queries on top of business data.

2: Components

  1. PostgreSQL: the relational database.
  2. Elasticsearch: assists with advanced search queries.
  3. Kafka: backbone of the event-driven architecture.
  4. Llama 3.2 (1B): LLM for itinerary planning.
  5. Metabase: business intelligence.

3: Installation

  1. Run `docker compose up -d` to get the stack up and running.
  2. Start the API with `./wandermatch-api`.
  3. Fire some cURL requests. Endpoints and sample requests are documented in ENDPOINTS.md.

4: Kafka Uses

Kafka is used as a pub/sub queue to ingest events issued across the various components of the system.

  1. AI Itinerary Generation: Trip itineraries based on budget and calendar availability are produced via an LLM call (the LLM is hosted in the stack). Generating the itinerary and committing it to the DB takes several seconds. To decouple such heavy-weight work from the core request-response cycle, an event is published to Kafka and consumed asynchronously; the consumer then makes the LLM calls.
  2. Advanced Searches: Fuzzy word search happens via Elasticsearch. However, inserting data into ES requires segmenting and transforming it into a format the search engine accepts, which happens asynchronously via Kafka consumers.
  3. Audit Trails: Events are published to Kafka; consumers transform and validate them, then asynchronously persist the audit trails and the data associated with each event.

5: Metabase Uses

Metabase is used to provide business intelligence and insights into the data for analytical purposes. Below we have insights on the following business use-cases:

  • Trends for creation of new trips.
  • Geographic information on users.
  • Most popular actions on the app, shown from audit logs.

Dashboards: Trip Analytics, User Analytics, Audit Trails.

6: Recommendation System

  • Recommendation works as a two-stage funnel:
    • Filtering Phase: The scoring functions are math-heavy and too expensive to run over the entire database. Hence, hard filters (such as preferred budget range and number of available days) first narrow down the set of trips on which scoring needs to be done.
    • Scoring & Ranking: Using the algorithm below, the remaining trips are scored, sorted, and presented to the user as recommendations.
  • A content-based filtering system with weighted heuristics fits the current use case. The weighted factors are:
    • Jaccard similarity between the user's interests and the trip tags. Weight = 0.3
    • Jaccard similarity between the user's interests and the trip organizer's interests. Weight = 0.3
    • Distance decay score between the user and the trip organizer. Weight = 0.2
    • Monetary minimization score. Weight = 0.2
  • Jaccard similarity is a classic measure: the interests common to both parties (set intersection) divided by the total distinct interests of both parties (set union). It is widely used for score-based filtering.
  • Distance decay scoring produces a higher score when the distance between two coordinates is lower, so it penalizes greater distances. The distance itself is computed with the Haversine formula.
  • Big Tech generally uses deep learning and neural networks for these problems, with the best results. It is not implemented here due to insufficient data: DL models need to be trained on large datasets to have any efficacy, which is not the case here, so content-based filtering with weights is the next best thing.
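The weighted heuristics above can be sketched in Go. The Jaccard and Haversine pieces follow their standard definitions; the 1000 km decay scale and the budget-score input are assumptions for illustration, since the README does not specify how those are computed.

```go
package main

import (
	"fmt"
	"math"
)

// jaccard returns |A ∩ B| / |A ∪ B| for two interest lists (0 if both empty).
func jaccard(a, b []string) float64 {
	set := map[string]int{}
	for _, x := range a {
		set[x] |= 1
	}
	for _, x := range b {
		set[x] |= 2
	}
	inter, union := 0, 0
	for _, v := range set {
		union++
		if v == 3 {
			inter++
		}
	}
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// haversineKm computes the great-circle distance between two lat/lon points.
func haversineKm(lat1, lon1, lat2, lon2 float64) float64 {
	const r = 6371.0 // mean Earth radius in km
	toRad := func(d float64) float64 { return d * math.Pi / 180 }
	dLat := toRad(lat2 - lat1)
	dLon := toRad(lon2 - lon1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * r * math.Asin(math.Sqrt(a))
}

// distanceDecay maps a distance to (0, 1], penalizing greater distances.
// The 1000 km scale constant is an assumption for this sketch.
func distanceDecay(km float64) float64 {
	return math.Exp(-km / 1000.0)
}

// score combines the four weighted heuristics (0.3 / 0.3 / 0.2 / 0.2).
// budgetScore stands in for the monetary-minimization term, which the
// README does not define in detail.
func score(userInterests, tripTags, organizerInterests []string, km, budgetScore float64) float64 {
	return 0.3*jaccard(userInterests, tripTags) +
		0.3*jaccard(userInterests, organizerInterests) +
		0.2*distanceDecay(km) +
		0.2*budgetScore
}

func main() {
	user := []string{"hiking", "painting", "food"}
	tags := []string{"hiking", "food", "history"}
	org := []string{"painting", "music"}
	fmt.Printf("score: %.3f\n", score(user, tags, org, 250, 0.8))
}
```

In the real system this score would only be computed for trips that survive the hard filters, which is what keeps the math-heavy stage affordable.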

7: Database Problems

  1. N + 1 Problem: The N+1 problem arises when fetching a list of N records triggers one additional query per record (N+1 queries in total) instead of a single query. The ideal solution is to use JOINs, or a single query over the foreign keys, so that all associated items come back in one round trip. This is how I solved the problem of getting the list of Trips from the foreign keys on a list of TripMembers.
  2. N + 1 Problem, With Need of Referencing: In these cases it is important to "preload" rather than join: the result carries a nested object in a dedicated field so that referencing it later down the line is easy. The downside is bloated payloads, so this should be done sparingly, only when the advantage is worth it. In my case, it saved N + 1 API calls and a huge, complex join that would have needed far too many fields.
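The JOIN-style fix can be sketched as building one IN query over the collected foreign keys instead of issuing one query per TripMember. This is a minimal illustration: the table and column names are assumptions, not necessarily the project's schema.

```go
package main

import (
	"fmt"
	"strings"
)

// tripsInQuery builds a single parameterized query that fetches every trip
// referenced by a slice of foreign keys, replacing N per-row lookups with
// one round trip. Placeholders use the Postgres $N style.
func tripsInQuery(tripIDs []int) (string, []any) {
	placeholders := make([]string, len(tripIDs))
	args := make([]any, len(tripIDs))
	for i, id := range tripIDs {
		placeholders[i] = fmt.Sprintf("$%d", i+1)
		args[i] = id
	}
	q := "SELECT id, name FROM trips WHERE id IN (" +
		strings.Join(placeholders, ", ") + ")"
	return q, args
}

func main() {
	// Foreign keys collected from a list of TripMembers (example values).
	q, args := tripsInQuery([]int{7, 9, 12})
	fmt.Println(q) // SELECT id, name FROM trips WHERE id IN ($1, $2, $3)
	fmt.Println(len(args))
}
```

Whether this is expressed as an explicit IN query or an ORM-level preload, the effect is the same: 1 query instead of N+1.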

8: Improvements (In Progress)

  1. Versioned Migrations: Auto-migrate is good for rapid development, but production is best served by versioned migrations. In database/migrations, every addition or modification gets 2 SQL files, created with `migrate create -ext sql -dir db/migrations -seq create_users_table`: a `.up.sql` file specifying what actually happens when the migration runs, and a `.down.sql` file for rollbacks. Everything runs via the migrate container in the Compose setup. A very clean way of operating DBs.
  2. Switch to UUIDs: Incremental IDs pose security risks (they are trivially enumerable) and clash with how distributed systems generate identifiers. This is an easy fix: replace all such IDs with UUIDs.
  3. Separate Users & Profiles: There is no need to load a user's entire social profile during authentication or other basic security checks, so the two should be separated.
  4. Normalized Interests: For analytics, it is better to keep interests in a separate table rather than a string of comma-separated values, as per standard practice. This enables more business-forward analytical questions to be answered, such as: how many people have "painting" as an interest?
  5. DB Indexes: Currently, only the IDs need to be indexed. Down the line there may be analytical queries (business-forward) such as "How many users are from Paris?". To serve queries of that nature, it is best to add indexes to the relevant columns, as sparingly as possible.
  6. Native Postgres Types: Postgres, via the PostGIS extension, offers types like geography that speed up geospatial calculations; that could replace the raw latitude and longitude columns.
  7. Containerize the API: This would just entail adding a service for the API to the Docker Compose file so the app itself runs in a container as well.
