Hybrid MySQL + MongoDB recommendation system for Amazon books, built for the Advanced Database course.
- Hybrid persistence: MySQL for transactional data, MongoDB for flexible user/book profiles.
- Automated ingestion: Kaggle download + split loaders for both databases.
- Recommendation engine: content-based, collaborative, geographic, and demographic recommendations. Queries range from simple (a single table or collection) and complex (multiple tables or collections) within one database to hybrid queries that combine both databases.
- Performance analysis: Query execution plans and optimizations for both databases.
- Optimization: indexing, query rewriting, and, where needed, schema adjustments to improve query performance and overall system efficiency.
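As an illustration of the hybrid queries mentioned above, here is a minimal sketch of an application-side join: ratings as they might come back from MySQL are aggregated and then matched against book profiles as they might come back from MongoDB. The in-memory lists stand in for real query results, and every field name (`user_id`, `book_id`, `genres`, ...) and the function `top_books_by_genre` are hypothetical, not taken from the project's schema.

```python
from collections import defaultdict

# Stand-ins for query results (hypothetical fields, not the project's schema).
mysql_rating_rows = [
    {"user_id": 1, "book_id": "B1", "rating": 5},
    {"user_id": 2, "book_id": "B1", "rating": 3},
    {"user_id": 2, "book_id": "B2", "rating": 4},
    {"user_id": 3, "book_id": "B3", "rating": 2},
]
mongo_book_docs = [
    {"_id": "B1", "title": "Book One", "genres": ["fantasy"]},
    {"_id": "B2", "title": "Book Two", "genres": ["fantasy", "romance"]},
    {"_id": "B3", "title": "Book Three", "genres": ["history"]},
]


def top_books_by_genre(rating_rows, book_docs, genre, limit=10):
    """Join relational ratings with document profiles in application code."""
    # Aggregate ratings per book: book_id -> [sum of ratings, number of ratings].
    totals = defaultdict(lambda: [0, 0])
    for row in rating_rows:
        totals[row["book_id"]][0] += row["rating"]
        totals[row["book_id"]][1] += 1

    # Keep only rated books whose profile lists the requested genre.
    results = []
    for doc in book_docs:
        if genre in doc.get("genres", []) and doc["_id"] in totals:
            rating_sum, count = totals[doc["_id"]]
            results.append(
                {"title": doc["title"], "avg_rating": rating_sum / count, "n_ratings": count}
            )

    # Best average first; ties are broken alphabetically by title.
    results.sort(key=lambda r: (-r["avg_rating"], r["title"]))
    return results[:limit]


fantasy_top = top_books_by_genre(mysql_rating_rows, mongo_book_docs, "fantasy")
print(fantasy_top)
```

In a real setup the aggregation would more likely be pushed into SQL (`GROUP BY`) and the genre filter into a MongoDB query, leaving only the final merge to Python.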
- data/: raw/, interim/, processed/ (most kept locally, ignored in git).
- db/: MySQL table schemas, MongoDB schema docs.
- docs/: professor's guideline and architecture notes.
- notebooks/: EDA, cleaning, merging, DB loading, etc.
- Setup env
    python -m venv .venv
    # for Linux/Mac
    source .venv/bin/activate
    # for Windows
    .venv\Scripts\activate
    pip install -r requirements.txt
- Configure
.env example:

    MYSQL_HOST=localhost
    MYSQL_PORT=3306
    MYSQL_USER=root
    MYSQL_PASSWORD=root
    MYSQL_DATABASE=bookrec
    MONGO_HOST=localhost
    MONGO_PORT=27017
    MONGO_DATABASE=bookrec
Note: Make sure your .env file contains your own database credentials. See .env.example for reference.
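For orientation, here is a minimal sketch of how these variables could be read in Python. It assumes the values are already present in the process environment and does not parse the .env file itself. The `DbSettings` class and the `load_settings` and `mongo_uri` helpers are hypothetical names for illustration, not the project's actual code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DbSettings:
    """Connection settings for both databases (hypothetical container)."""
    mysql_host: str
    mysql_port: int
    mysql_user: str
    mysql_password: str
    mysql_database: str
    mongo_host: str
    mongo_port: int
    mongo_database: str


def load_settings(env=None):
    """Build settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    return DbSettings(
        mysql_host=env.get("MYSQL_HOST", "localhost"),
        mysql_port=int(env.get("MYSQL_PORT", "3306")),  # ports arrive as strings
        mysql_user=env.get("MYSQL_USER", "root"),
        mysql_password=env.get("MYSQL_PASSWORD", ""),
        mysql_database=env.get("MYSQL_DATABASE", "bookrec"),
        mongo_host=env.get("MONGO_HOST", "localhost"),
        mongo_port=int(env.get("MONGO_PORT", "27017")),
        mongo_database=env.get("MONGO_DATABASE", "bookrec"),
    )


def mongo_uri(settings):
    """Format an unauthenticated MongoDB connection URI."""
    return f"mongodb://{settings.mongo_host}:{settings.mongo_port}/{settings.mongo_database}"


settings = load_settings()
print(mongo_uri(settings))
```

Passing a plain dict to `load_settings` instead of relying on `os.environ` keeps the function easy to test without touching the real environment.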
- Download + ingest
    python -m bookrec.cli download-kaggle
    python -m bookrec.cli ingest --data-dir data/raw --drop-existing
Open the notebooks in order to:
- 0. Download and explore the original datasets.
- 1. Preprocess the datasets before merging.
- 2. Merge the datasets and perform final cleaning.
- 3. Load data into the MySQL and MongoDB databases.
- 4. Generate recommendations for a user.
- TODO: remaining sections (concurrency testing; performance analysis and optimization).
- queries.md: contains a description of available queries and their usage.
- Query_execution.md: contains a description of how to execute the queries in the scripts folder.
- query_helper.py: contains query definitions for both MySQL and MongoDB, plus helper functions for executing queries and analyzing their performance and query plans.
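To show what query plan analysis looks like in practice, here is a small self-contained sketch that compares the plan of one query before and after adding an index. It deliberately uses SQLite from the Python standard library as a stand-in so it runs without a database server. In this project the equivalent tools would be MySQL's `EXPLAIN` statement and MongoDB's `explain()`. The `ratings` table, the index name, and the `query_plan` helper are hypothetical and are not part of query_helper.py.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail lines of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the plan detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (user_id INTEGER, book_id TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(i % 50, f"B{i}", i % 5 + 1) for i in range(500)],
)

sql = "SELECT book_id, rating FROM ratings WHERE user_id = ?"

# Without an index the engine has to scan the whole table.
plan_before = query_plan(conn, sql, (7,))

# With an index on the filter column it can search the index instead.
conn.execute("CREATE INDEX idx_ratings_user ON ratings (user_id)")
plan_after = query_plan(conn, sql, (7,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan lines differs between SQLite versions, and MySQL and MongoDB report plans in their own formats, so the same before/after comparison should be repeated on the real databases.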
Open issues for improvements; follow course requirements for hybrid design.