Hybrid MySQL + MongoDB recommendation system for Amazon books. The data was obtained by carefully merging and processing Kaggle datasets.

PFans-201/Book_rec_system

Book Recommendation Project

Hybrid MySQL + MongoDB recommendation system for Amazon books, built for the Advanced Database course.

Highlights

  • Hybrid persistence: MySQL for transactional data, MongoDB for flexible user/book profiles.
  • Automated ingestion: Kaggle download + split loaders for both databases.
  • Recommendation engine: Content-based, collaborative, geographic, and demographic recommendations. Queries range from simple ones that read a single table or collection, through complex ones that combine several tables or collections within one database, to hybrid queries that combine both databases.
  • Performance analysis: Query execution plans and optimizations for both databases.
  • Optimization: Indexing, query rewriting, and, where needed, schema adjustments to improve query performance and overall system efficiency.
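
As an illustration of the hybrid queries mentioned above, the sketch below joins rows fetched from MySQL with documents fetched from MongoDB on the application side. It is only a sketch under stated assumptions: the function name `merge_hybrid_results`, the shared key `isbn`, and the sample fields are hypothetical and are not taken from this repository's schema.

```python
from typing import Any


def merge_hybrid_results(
    sql_rows: list[dict[str, Any]],
    mongo_docs: list[dict[str, Any]],
    key: str = "isbn",  # assumed shared key; the real schema may differ
) -> list[dict[str, Any]]:
    """Join MySQL rows with MongoDB documents on a shared key (inner join)."""
    # Index the MongoDB documents by key so each lookup is O(1).
    docs_by_key = {doc[key]: doc for doc in mongo_docs if key in doc}
    merged = []
    for row in sql_rows:
        doc = docs_by_key.get(row[key])
        if doc is None:
            continue  # no matching profile document: skip the row
        # Drop MongoDB's internal _id; MySQL values win on name clashes.
        profile = {k: v for k, v in doc.items() if k != "_id"}
        merged.append({**profile, **row})
    return merged


# Hypothetical sample data standing in for real query results.
sql_rows = [
    {"isbn": "111", "avg_rating": 4.5},
    {"isbn": "222", "avg_rating": 3.9},
]
mongo_docs = [
    {"_id": 1, "isbn": "111", "title": "Example Book A", "genres": ["fantasy"]},
    {"_id": 2, "isbn": "333", "title": "Example Book C", "genres": ["history"]},
]
hybrid = merge_hybrid_results(sql_rows, mongo_docs)
print(hybrid)
```

Joining in application code keeps each database responsible for the query it handles best; the trade-off is that both result sets must fit in memory.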

Repository map

  • data/ – raw/, interim/, processed/ (mostly kept locally and ignored by git).
  • db/ – MySQL table schemas, MongoDB schema docs.
  • docs/ – professor's guideline and architecture notes.
  • notebooks/ – EDA, cleaning, merging, DB loading, etc.

Quickstart

  1. Set up the environment
    python -m venv .venv
    # for Linux/Mac
    source .venv/bin/activate
    
    # for Windows
    .venv\Scripts\activate
    
    pip install -r requirements.txt
  2. Configure .env. Example:
    MYSQL_HOST=localhost
    MYSQL_PORT=3306
    MYSQL_USER=root
    MYSQL_PASSWORD=root
    MYSQL_DATABASE=bookrec
    MONGO_HOST=localhost
    MONGO_PORT=27017
    MONGO_DATABASE=bookrec

Note: Make sure your .env file contains your own database credentials; see .env.example for reference.

  3. Download + ingest
    python -m bookrec.cli download-kaggle
    python -m bookrec.cli ingest --data-dir data/raw --drop-existing
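
The variables from the .env example above could be read in Python roughly as follows. This is a minimal sketch, not the project's actual configuration code: `load_db_settings` is a hypothetical helper, and it assumes the variables are already present in the process environment (for example, loaded beforehand with python-dotenv's `load_dotenv()`).

```python
import os
from collections.abc import Mapping


def load_db_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect MySQL and MongoDB settings from environment variables.

    Hypothetical helper; the defaults mirror the .env example above.
    """
    mysql = {
        "host": env.get("MYSQL_HOST", "localhost"),
        "port": int(env.get("MYSQL_PORT", "3306")),
        "user": env.get("MYSQL_USER", "root"),
        "password": env.get("MYSQL_PASSWORD", ""),
        "database": env.get("MYSQL_DATABASE", "bookrec"),
    }
    mongo_host = env.get("MONGO_HOST", "localhost")
    mongo_port = int(env.get("MONGO_PORT", "27017"))
    mongo = {
        # Standard MongoDB connection string without credentials.
        "uri": f"mongodb://{mongo_host}:{mongo_port}",
        "database": env.get("MONGO_DATABASE", "bookrec"),
    }
    return {"mysql": mysql, "mongo": mongo}


# Passing a plain dict instead of os.environ makes the helper easy to test.
settings = load_db_settings({"MYSQL_HOST": "db.internal", "MONGO_PORT": "27018"})
print(settings["mysql"]["host"], settings["mongo"]["uri"])
```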

Usage

Work through the notebooks in order:

  • 0. Download and explore the original datasets.
  • 1. Preprocess the datasets before merging.
  • 2. Merge the datasets and perform final cleaning.
  • 3. Load the data into the MySQL and MongoDB databases.
  • 4. Generate recommendations for a user.
  • TODO: remaining sections (concurrency testing, performance analysis, and optimization).

Queries

  • queries.md: contains a description of available queries and their usage.
  • Query_execution.md: contains a description of how to execute the queries in the scripts folder.
  • query_helper.py: contains the query definitions for both MySQL and MongoDB, plus helper functions for executing queries, measuring their performance, and analyzing query plans.
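
To give a feel for the kind of helpers described above, here is a hedged sketch of a timing function and an `EXPLAIN` wrapper. It is not the content of query_helper.py: `explain_sql`, `time_query`, and the `books` table are hypothetical names chosen for illustration. For MongoDB, PyMongo cursors expose an `explain()` method that plays a similar role to MySQL's `EXPLAIN`.

```python
import statistics
import time
from typing import Callable


def explain_sql(query: str) -> str:
    """Prefix a SQL statement with EXPLAIN so MySQL returns its plan."""
    return "EXPLAIN " + query.strip().rstrip(";")


def time_query(run: Callable[[], object], repeats: int = 5) -> dict[str, float]:
    """Call `run` several times and summarize wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()  # in practice this would execute a MySQL or MongoDB query
        timings.append(time.perf_counter() - start)
    return {
        "min_s": min(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }


# A cheap stand-in workload replaces a real database call here.
stats = time_query(lambda: sum(range(10_000)), repeats=3)
plan_sql = explain_sql("SELECT * FROM books WHERE isbn = '111';")
print(plan_sql)
print(stats)
```

Repeating each measurement and reporting the minimum alongside the mean helps separate the query's own cost from cache warm-up and background noise.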

Contributing

Open issues for improvements; follow course requirements for hybrid design.
