Knowledge-Based Recommender System with Multi-Armed Bandit Optimization
A hybrid movie recommendation system that combines semantic knowledge from DBpedia/Wikidata with adaptive Multi-Armed Bandit strategies for personalized recommendations.
- Overview
- Features
- Architecture
- Installation
- Quick Start
- Project Structure
- How It Works
- Evaluation
- Results
- Documentation
- License
CineWisdom is a research project that implements a Knowledge-Based Recommender System (KBRS) enhanced with Multi-Armed Bandit (MAB) algorithms for adaptive strategy selection. The system leverages:
- Semantic Knowledge: Movie metadata enriched from DBpedia and Wikidata (directors, actors, genres, themes)
- Hybrid Strategies: Multiple recommendation approaches (collaborative filtering, content-based, exploration)
- Adaptive Learning: Thompson Sampling MAB to dynamically select the best strategy per user
- Real-time Feedback: Online learning from user interactions
- Semantic Filtering: Recommendations based on shared directors, actors, or genres
- Exploration vs Exploitation: Balances familiar recommendations with discovery
- Online Adaptation: Learns user preferences in real-time without retraining
- Knowledge Graph Integration: Enriches movie features with linked open data
- Knowledge-Based Recommendations: Semantic similarity using movie metadata
- Multi-Armed Bandit: Thompson Sampling for strategy selection
- Hybrid Strategies:
  - Exploitation (high similarity)
  - Exploration (low similarity for discovery)
  - Semantic filtering (director, cast, genre)
- DBpedia/Wikidata Integration: Automated SPARQL queries for metadata enrichment
- Offline & Online Evaluation: Comprehensive metrics (RMSE, MAE, Reward, Coverage)
- Visualization: Learning curves, strategy selection, reward evolution
- Parallel Processing: Multi-core data preprocessing
- Cosine Similarity Matrix: Pre-computed for fast recommendations
- Feature Engineering: One-hot encoding for genres, cast, directors
- Performance Tracking: Real-time metrics during simulation
- Persistent Storage: Saves models, results, and history
┌─────────────────────────────────────────────────────────────┐
│                      CineWisdom System                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐      ┌──────────────┐      ┌───────────┐  │
│  │     Data     │      │     KBRS     │      │    MAB    │  │
│  │   Manager    │─────▶│    Engine    │─────▶│ Selector  │  │
│  └──────────────┘      └──────────────┘      └───────────┘  │
│          │                    │                    │        │
│          │                    │                    │        │
│     ┌────▼────┐         ┌─────▼─────┐        ┌─────▼─────┐  │
│     │DBpedia/ │         │  Cosine   │        │ Thompson  │  │
│     │Wikidata │         │Similarity │        │ Sampling  │  │
│     └─────────┘         └───────────┘        └───────────┘  │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                   Online Simulator                   │   │
│  │  • User Interaction Loop                             │   │
│  │  • Strategy Selection (MAB)                          │   │
│  │  • Reward Calculation                                │   │
│  │  • Real-time Learning                                │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                      Evaluator                       │   │
│  │  • Offline Metrics (RMSE, MAE)                       │   │
│  │  • Online Metrics (Reward, Regret)                   │   │
│  │  • Visualization & Reports                           │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
- Python 3.8+
- 8GB+ RAM (for similarity matrix computation)
- Internet connection (for DBpedia/Wikidata enrichment)
# Clone repository
git clone https://github.com/marcellorussox/CineWisdom.git
cd CineWisdom
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Download MovieLens dataset (if not included)
# Place in datasets/ml-small-100k/raw/
pandas>=1.5.0
numpy>=1.23.0
scikit-learn>=1.2.0
scipy>=1.10.0
matplotlib>=3.6.0
seaborn>=0.12.0
SPARQLWrapper>=2.0.0
tqdm>=4.64.0
python kbrs_pipeline.py --dataset ml-small-100k
This will:
- Load MovieLens data
- Enrich movies with DBpedia/Wikidata metadata
- Normalize features and create similarity matrix
- Split data (train/val/test/online)
- Evaluate KBRS offline
- Run online MAB simulation
- Generate results and plots
If you already have enriched data:
python kbrs_pipeline.py --dataset ml-small-100k --skip-enrichment
For quick testing:
python kbrs_pipeline.py --dataset ml-small-100k --limit 500
# Check results directory
ls -lh results/ml-small-100k/semantic_mab/
# View experiment report
cat results/ml-small-100k/semantic_mab/EXPERIMENT_REPORT.md
# View plots
open results/ml-small-100k/semantic_mab/plots/
CineWisdom/
├── kbrs_pipeline.py              # Main entry point
├── requirements.txt              # Python dependencies
├── README.md                     # This file
├── TESTING.md                    # Testing documentation
├── PROJECT_STRUCTURE.md          # Detailed structure
│
├── src/                          # Source code
│   ├── recommender/
│   │   └── kbrs.py               # KBRS core engine
│   ├── simulation/
│   │   └── kbrs_simulator.py     # Online MAB simulator
│   ├── evaluation/
│   │   ├── kbrs_evaluator.py     # Evaluation metrics
│   │   └── metrics.py            # Generic metrics (RMSE, MAE, NDCG)
│   ├── data/
│   │   ├── manager.py            # Data loading & preprocessing
│   │   ├── split_manager.py      # Train/val/test splitting
│   │   ├── sparql.py             # DBpedia/Wikidata queries
│   │   ├── templates.py          # SPARQL query templates
│   │   └── mapping.py            # Data mapping utilities
│   ├── viz/
│   │   └── plot_manager.py       # Visualization tools
│   └── core/
│       └── config.py             # Configuration classes
│
├── datasets/                     # Data storage
│   └── ml-small-100k/
│       ├── raw/                  # Original MovieLens CSVs
│       ├── processed/            # Enriched & normalized data
│       └── splits/               # Train/val/test/online splits
│
├── models/                       # Saved models
│   └── kbrs/
│       └── ml-small-100k/
│           ├── cosine_sim_matrix.npy
│           └── movie_ids.csv
│
└── results/                      # Experiment results
    └── ml-small-100k/
        └── semantic_mab/
            ├── online_history.csv
            ├── offline_evaluation.json
            ├── online_evaluation.json
            ├── EXPERIMENT_REPORT.md
            └── plots/
Movies are enriched with semantic metadata from DBpedia/Wikidata:
# SPARQL query to DBpedia
SELECT ?director ?actor ?genre ?abstract
WHERE {
  ?film owl:sameAs wd:Q12345 .
  ?film dbo:director ?director .
  ?film dbo:starring ?actor .
  ...
}
- One-hot encoding for genres, directors, actors
- TF-IDF for text features (abstracts, themes)
- Normalization of numerical features (runtime, release year)
- Cosine similarity matrix (9742 × 9742 for ml-small-100k)
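The one-hot and cosine-similarity steps can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's implementation: the toy movie records and the helper names `one_hot` and `cosine` are hypothetical, and TF-IDF and numeric normalization are left out. The actual pipeline presumably builds the matrix with the numpy/scikit-learn packages from the requirements list.

```python
import math

# Toy metadata; ids and field values are illustrative only.
movies = {
    "A": {"genres": ["Drama"], "directors": ["D1"], "actors": ["X", "Y"]},
    "B": {"genres": ["Drama"], "directors": ["D1"], "actors": ["X", "Z"]},
    "C": {"genres": ["Comedy"], "directors": ["D2"], "actors": ["W"]},
}

def one_hot(movies):
    """Map each movie to a binary vector over all (field, value) pairs."""
    vocab = sorted(
        {(field, value)
         for meta in movies.values()
         for field, values in meta.items()
         for value in values}
    )
    index = {token: i for i, token in enumerate(vocab)}
    vectors = {}
    for movie_id, meta in movies.items():
        vec = [0.0] * len(vocab)
        for field, values in meta.items():
            for value in values:
                vec[index[(field, value)]] = 1.0
        vectors[movie_id] = vec
    return vectors

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vectors = one_hot(movies)
ids = sorted(vectors)
# Pre-compute the full pairwise matrix once, as the pipeline does.
sim_matrix = [[cosine(vectors[i], vectors[j]) for j in ids] for i in ids]
```

Movies A and B share three of their four features, so their similarity is 0.75, while A and C share none and score 0.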
For a user with history H = {m1, m2, ..., mn}:
- Aggregate user profile: Average feature vectors of liked movies
- Compute similarity: Cosine similarity between profile and all movies
- Apply strategy:
- Exploitation: Top-K most similar movies
- Exploration: Bottom-K (least similar) movies, for discovery
- Semantic: Filter by shared director/cast/genre
- Predict rating: Weighted average based on similarity
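The steps above can be sketched as follows. This is a simplified illustration, not the code in `src/recommender/kbrs.py`: the function names, the `allowed` parameter used for semantic filtering, the toy vectors, and the fallback rating of 3.0 are all assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def user_profile(liked_ids, vectors):
    """Average the feature vectors of the movies the user liked."""
    dims = len(next(iter(vectors.values())))
    return [sum(vectors[m][d] for m in liked_ids) / len(liked_ids) for d in range(dims)]

def recommend(liked_ids, vectors, k=2, strategy="exploitation", allowed=None):
    """Rank unseen movies by similarity to the user profile.

    "exploitation" returns the k most similar movies, "exploration" the k
    least similar. `allowed` is an optional set of movie ids (for example,
    movies sharing a director) that implements semantic filtering.
    """
    profile = user_profile(liked_ids, vectors)
    candidates = [
        m for m in vectors
        if m not in liked_ids and (allowed is None or m in allowed)
    ]
    ranked = sorted(candidates, key=lambda m: cosine(profile, vectors[m]), reverse=True)
    if strategy == "exploitation":
        return ranked[:k]
    return ranked[-k:][::-1]  # least similar first

def predict_rating(target_id, ratings, vectors, default=3.0):
    """Similarity-weighted average of the user's known ratings."""
    weights = {m: cosine(vectors[target_id], vectors[m]) for m in ratings}
    total = sum(weights.values())
    if total == 0:
        return default  # assumed fallback when nothing similar was rated
    return sum(weights[m] * r for m, r in ratings.items()) / total

# Toy feature vectors (hypothetical).
vectors = {
    "A": [1.0, 1.0, 0.0, 0.0],
    "B": [1.0, 0.0, 1.0, 0.0],
    "C": [0.0, 0.0, 0.0, 1.0],
    "D": [1.0, 1.0, 1.0, 0.0],
}
top = recommend(["A"], vectors, k=2)
discover = recommend(["A"], vectors, k=1, strategy="exploration")
filtered = recommend(["A"], vectors, k=1, allowed={"B", "C"})
```

For a user who liked only A, exploitation ranks D ahead of B, exploration surfaces C, and restricting candidates to `{"B", "C"}` yields B.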
Thompson Sampling selects the best strategy:
# For each strategy i:
α_i = successes + 1
β_i = failures + 1
θ_i ~ Beta(α_i, β_i)
# Select strategy with highest θ
strategy = argmax(θ_i)
For each user interaction:
1. MAB selects strategy
2. KBRS generates recommendations
3. User rates a movie
4. Compute reward (based on rating accuracy)
5. Update MAB statistics
6. Repeat
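A minimal sketch of this loop with Beta-Bernoulli Thompson Sampling is shown below. It is not the simulator in `src/simulation/kbrs_simulator.py`: the class and function names, the 0.5 success threshold, and the per-strategy success probabilities are hypothetical, and steps 2 to 4 are replaced by a simulated Bernoulli reward.

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over a fixed set of strategies."""

    def __init__(self, strategies, seed=None):
        self.rng = random.Random(seed)
        self.successes = {s: 0 for s in strategies}
        self.failures = {s: 0 for s in strategies}

    def select(self):
        # Draw theta_i ~ Beta(successes + 1, failures + 1), then take the argmax.
        samples = {
            s: self.rng.betavariate(self.successes[s] + 1, self.failures[s] + 1)
            for s in self.successes
        }
        return max(samples, key=samples.get)

    def update(self, strategy, reward, threshold=0.5):
        # Assumption: a reward at or above `threshold` counts as a success.
        if reward >= threshold:
            self.successes[strategy] += 1
        else:
            self.failures[strategy] += 1

def simulate(sampler, success_prob, steps, rng):
    """Run the interaction loop against simulated Bernoulli rewards."""
    counts = {s: 0 for s in success_prob}
    for _ in range(steps):
        strategy = sampler.select()                      # step 1
        hit = rng.random() < success_prob[strategy]      # stand-in for steps 2-4
        sampler.update(strategy, 1.0 if hit else 0.0)    # step 5
        counts[strategy] += 1
    return counts

# Hypothetical success probabilities, chosen only to make the demo converge.
probs = {"exploitation": 0.3, "exploration": 0.2, "semantic_genre": 0.8}
sampler = ThompsonSampler(probs, seed=0)
counts = simulate(sampler, probs, steps=2000, rng=random.Random(1))
```

Because each arm keeps its own Beta posterior, poorly performing strategies are still sampled occasionally, but the arm with the highest success rate ends up selected most often.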
Evaluated on test set (unseen user-movie pairs):
- RMSE (Root Mean Square Error): Prediction accuracy
- MAE (Mean Absolute Error): Average prediction error
- Coverage: % of items that can be recommended
- Precision@K: Relevant items in top-K
- NDCG@K: Ranking quality
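For reference, the offline metrics can be written in a few lines of plain Python. This is a sketch under assumptions rather than the contents of `src/evaluation/metrics.py`: the function signatures are hypothetical, and NDCG uses the linear-gain form of DCG, which may differ from the variant the project uses.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def coverage(recommendable, catalog):
    """Fraction of catalog items that the system can recommend."""
    return len(set(recommendable) & set(catalog)) / len(catalog)

def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def ndcg_at_k(recommended, relevance, k):
    """NDCG@k with linear gain; `relevance` maps item -> graded relevance."""
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    gains = [relevance.get(item, 0.0) for item in recommended[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg else 0.0

# Toy values for illustration.
err_rmse = rmse([4, 3, 5], [3, 3, 4])
err_mae = mae([4, 3, 5], [3, 3, 4])
p_at_2 = precision_at_k(["a", "b", "c"], {"a", "c"}, 2)
ndcg_ideal = ndcg_at_k(["a", "b"], {"a": 3, "b": 1}, 2)
ndcg_swapped = ndcg_at_k(["b", "a"], {"a": 3, "b": 1}, 2)
```

A ranking that matches the ideal order scores an NDCG of 1.0, and swapping the two items lowers it.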
Evaluated during simulation:
- Cumulative Reward: Total reward over time
- Mean Reward: Average reward per interaction
- Exploration Rate: % of exploration strategies selected
- Exploitation Rate: % of exploitation strategies selected
- Cumulative Regret: Difference from optimal strategy
- Strategy Performance: Per-strategy reward statistics
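The online metrics can be derived from a log of `(strategy, reward)` pairs, as in this sketch. The log format and function name are assumptions, and since the true optimal strategy is unknown in practice, regret is estimated here against the strategy with the best empirical mean reward, which is only one possible definition.

```python
from collections import defaultdict
from itertools import accumulate

def online_metrics(history):
    """Summarize a non-empty list of (strategy, reward) pairs in order."""
    rewards = [reward for _, reward in history]
    per_strategy = defaultdict(list)
    for strategy, reward in history:
        per_strategy[strategy].append(reward)
    mean_by_strategy = {s: sum(v) / len(v) for s, v in per_strategy.items()}
    # Assumption: the "optimal" strategy is the one with the best empirical mean.
    best_mean = max(mean_by_strategy.values())
    return {
        "cumulative_reward": list(accumulate(rewards)),
        "mean_reward": sum(rewards) / len(rewards),
        "cumulative_regret": list(
            accumulate(best_mean - mean_by_strategy[s] for s, _ in history)
        ),
        "selection_rate": {s: len(v) / len(history) for s, v in per_strategy.items()},
        "mean_by_strategy": mean_by_strategy,
    }

# Toy interaction log (hypothetical).
history = [("a", 1.0), ("b", 0.0), ("a", 1.0), ("b", 1.0)]
metrics = online_metrics(history)
```

In this toy log strategy `a` always earns 1.0 and `b` averages 0.5, so each selection of `b` adds 0.5 to the cumulative regret.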
- Learning curves (RMSE, reward over time)
- Strategy selection evolution
- Reward distribution by strategy
- Cumulative regret
- Error distribution
Offline evaluation:
RMSE: 0.93
MAE: 0.72
Coverage: 100%
Online simulation:
Final RMSE: 0.89
Mean Reward: 0.85
Exploration Rate: 28%
Exploitation Rate: 22%
Semantic (Genre): 30%
Semantic (Cast): 20%
- MAB Adaptation: System learns to prefer semantic strategies for most users
- Exploration Value: 25-30% exploration maintains diversity without hurting accuracy
- Semantic Filtering: Genre-based filtering performs best (30% selection rate)
- Online Improvement: RMSE improves from 0.93 (offline) to 0.89 (online)
- TESTING.md: Complete testing guide and results
- PROJECT_STRUCTURE.md: Detailed architecture
- TECHNICAL_DOCS.md: Algorithm details and theory
- docs/plans/: Design documents and planning
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- MovieLens: GroupLens Research for the dataset
- DBpedia/Wikidata: Linked open data for movie metadata
- Thompson Sampling: Classic MAB algorithm for exploration-exploitation
For questions or feedback:
- Author: Marcello Russo
- Email: [your-email@example.com]
- GitHub: [github.com/marcellorussox/CineWisdom]
Built with ❤️ for intelligent movie recommendations