marcello-russo/CineWisdom


🎬 CineWisdom

Knowledge-Based Recommender System with Multi-Armed Bandit Optimization

A hybrid movie recommendation system that combines semantic knowledge from DBpedia/Wikidata with adaptive Multi-Armed Bandit strategies for personalized recommendations.




🎯 Overview

CineWisdom is a research project that implements a Knowledge-Based Recommender System (KBRS) enhanced with Multi-Armed Bandit (MAB) algorithms for adaptive strategy selection. The system leverages:

  • Semantic Knowledge: Movie metadata enriched from DBpedia and Wikidata (directors, actors, genres, themes)
  • Hybrid Strategies: Multiple recommendation approaches (collaborative filtering, content-based, exploration)
  • Adaptive Learning: Thompson Sampling MAB to dynamically select the best strategy per user
  • Real-time Feedback: Online learning from user interactions

Key Innovations

  1. Semantic Filtering: Recommendations based on shared directors, actors, or genres
  2. Exploration vs Exploitation: Balances familiar recommendations with discovery
  3. Online Adaptation: Learns user preferences in real-time without retraining
  4. Knowledge Graph Integration: Enriches movie features with linked open data

✨ Features

Core Capabilities

  • ✅ Knowledge-Based Recommendations: Semantic similarity using movie metadata
  • ✅ Multi-Armed Bandit: Thompson Sampling for strategy selection
  • ✅ Hybrid Strategies:
    • Exploitation (high similarity)
    • Exploration (low similarity for discovery)
    • Semantic filtering (director, cast, genre)
  • ✅ DBpedia/Wikidata Integration: Automated SPARQL queries for metadata enrichment
  • ✅ Offline & Online Evaluation: Comprehensive metrics (RMSE, MAE, Reward, Coverage)
  • ✅ Visualization: Learning curves, strategy selection, reward evolution

Technical Features

  • 🚀 Parallel Processing: Multi-core data preprocessing
  • 📊 Cosine Similarity Matrix: Pre-computed for fast recommendations
  • 🎨 Feature Engineering: One-hot encoding for genres, cast, directors
  • 📈 Performance Tracking: Real-time metrics during simulation
  • 💾 Persistent Storage: Saves models, results, and history

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                      CineWisdom System                       │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────┐      ┌──────────────┐      ┌───────────┐   │
│  │   Data       │      │    KBRS      │      │    MAB    │   │
│  │  Manager     │─────▶│   Engine     │◀────▶│  Selector │   │
│  └──────────────┘      └──────────────┘      └───────────┘   │
│         │                      │                   │         │
│         │                      │                   │         │
│    ┌────▼────┐           ┌─────▼─────┐       ┌─────▼─────┐   │
│    │DBpedia/ │           │ Cosine    │       │ Thompson  │   │
│    │Wikidata │           │Similarity │       │ Sampling  │   │
│    └─────────┘           └───────────┘       └───────────┘   │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │              Online Simulator                          │  │
│  │  • User Interaction Loop                               │  │
│  │  • Strategy Selection (MAB)                            │  │
│  │  • Reward Calculation                                  │  │
│  │  • Real-time Learning                                  │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │              Evaluator                                 │  │
│  │  • Offline Metrics (RMSE, MAE)                         │  │
│  │  • Online Metrics (Reward, Regret)                     │  │
│  │  • Visualization & Reports                             │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘

📦 Installation

Requirements

  • Python 3.8+
  • 8GB+ RAM (for similarity matrix computation)
  • Internet connection (for DBpedia/Wikidata enrichment)

Setup

# Clone repository
git clone https://github.com/marcello-russo/CineWisdom.git
cd CineWisdom

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download MovieLens dataset (if not included)
# Place in datasets/ml-small-100k/raw/

Dependencies

pandas>=1.5.0
numpy>=1.23.0
scikit-learn>=1.2.0
scipy>=1.10.0
matplotlib>=3.6.0
seaborn>=0.12.0
SPARQLWrapper>=2.0.0
tqdm>=4.64.0

🚀 Quick Start

1. Run the Full Pipeline

python kbrs_pipeline.py --dataset ml-small-100k

This will:

  1. Load MovieLens data
  2. Enrich movies with DBpedia/Wikidata metadata
  3. Normalize features and create similarity matrix
  4. Split data (train/val/test/online)
  5. Evaluate KBRS offline
  6. Run online MAB simulation
  7. Generate results and plots

2. Skip Enrichment (Faster)

If you already have enriched data:

python kbrs_pipeline.py --dataset ml-small-100k --skip-enrichment

3. Limit Online Simulation

For quick testing:

python kbrs_pipeline.py --dataset ml-small-100k --limit 500
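
The three flags used in the commands above could be declared with `argparse` roughly as follows. This is a hypothetical sketch, not the actual contents of `kbrs_pipeline.py`: the function name `build_parser`, the defaults, and the help strings are all assumptions.

```python
import argparse

def build_parser():
    """Hypothetical CLI mirroring the flags shown in this Quick Start."""
    parser = argparse.ArgumentParser(description="CineWisdom KBRS pipeline (sketch)")
    # Dataset folder name under datasets/ (default is an assumption)
    parser.add_argument("--dataset", default="ml-small-100k",
                        help="dataset directory name under datasets/")
    # Reuse previously enriched data instead of querying DBpedia/Wikidata
    parser.add_argument("--skip-enrichment", action="store_true",
                        help="skip the SPARQL enrichment step")
    # Cap the number of online interactions for quick tests
    parser.add_argument("--limit", type=int, default=None,
                        help="maximum number of online interactions to simulate")
    return parser

args = build_parser().parse_args(["--dataset", "ml-small-100k", "--limit", "500"])
print(args.dataset, args.skip_enrichment, args.limit)
```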

4. View Results

# Check results directory
ls -lh results/ml-small-100k/semantic_mab/

# View experiment report
cat results/ml-small-100k/semantic_mab/EXPERIMENT_REPORT.md

# View plots (macOS; use xdg-open on Linux)
open results/ml-small-100k/semantic_mab/plots/

📁 Project Structure

CineWisdom/
├── kbrs_pipeline.py              # Main entry point
├── requirements.txt              # Python dependencies
├── README.md                     # This file
├── TESTING.md                    # Testing documentation
├── PROJECT_STRUCTURE.md          # Detailed structure
│
├── src/                          # Source code
│   ├── recommender/
│   │   └── kbrs.py              # KBRS core engine
│   ├── simulation/
│   │   └── kbrs_simulator.py    # Online MAB simulator
│   ├── evaluation/
│   │   ├── kbrs_evaluator.py    # Evaluation metrics
│   │   └── metrics.py           # Generic metrics (RMSE, MAE, NDCG)
│   ├── data/
│   │   ├── manager.py           # Data loading & preprocessing
│   │   ├── split_manager.py     # Train/val/test splitting
│   │   ├── sparql.py            # DBpedia/Wikidata queries
│   │   ├── templates.py         # SPARQL query templates
│   │   └── mapping.py           # Data mapping utilities
│   ├── viz/
│   │   └── plot_manager.py      # Visualization tools
│   └── core/
│       └── config.py            # Configuration classes
│
├── datasets/                     # Data storage
│   └── ml-small-100k/
│       ├── raw/                 # Original MovieLens CSVs
│       ├── processed/           # Enriched & normalized data
│       └── splits/              # Train/val/test/online splits
│
├── models/                       # Saved models
│   └── kbrs/
│       └── ml-small-100k/
│           ├── cosine_sim_matrix.npy
│           └── movie_ids.csv
│
└── results/                      # Experiment results
    └── ml-small-100k/
        └── semantic_mab/
            ├── online_history.csv
            ├── offline_evaluation.json
            ├── online_evaluation.json
            ├── EXPERIMENT_REPORT.md
            └── plots/

🔬 How It Works

1. Data Enrichment

Movies are enriched with semantic metadata from DBpedia/Wikidata:

# SPARQL query to DBpedia
SELECT ?director ?actor ?genre ?abstract
WHERE {
  ?film owl:sameAs wd:Q12345 .
  ?film dbo:director ?director .
  ?film dbo:starring ?actor .
  ...
}
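
A query like the one above returns one row per director/actor combination, so the rows need to be collapsed into per-movie value sets before encoding. The sketch below assumes the endpoint answers in the standard SPARQL JSON results format; `collect_bindings` and the sample payload are illustrative and are not taken from `src/data/sparql.py`.

```python
from collections import defaultdict

def collect_bindings(sparql_json):
    """Collapse SPARQL JSON result rows into one sorted value list per variable."""
    values = defaultdict(set)
    for row in sparql_json["results"]["bindings"]:
        for var, cell in row.items():
            values[var].add(cell["value"])
    return {var: sorted(vals) for var, vals in values.items()}

# Made-up response for a single film (resource names are placeholders)
sample = {
    "head": {"vars": ["director", "actor"]},
    "results": {"bindings": [
        {"director": {"type": "uri", "value": "http://dbpedia.org/resource/Director_A"},
         "actor": {"type": "uri", "value": "http://dbpedia.org/resource/Actor_X"}},
        {"director": {"type": "uri", "value": "http://dbpedia.org/resource/Director_A"},
         "actor": {"type": "uri", "value": "http://dbpedia.org/resource/Actor_Y"}},
    ]},
}

metadata = collect_bindings(sample)
print(metadata["director"])  # one director despite two result rows
print(metadata["actor"])     # two distinct actors
```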

2. Feature Engineering

  • One-hot encoding for genres, directors, actors
  • TF-IDF for text features (abstracts, themes)
  • Normalization of numerical features (runtime, release year)
  • Cosine similarity matrix (9742 × 9742 for ml-small-100k)
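
As a rough illustration of the one-hot and cosine-similarity steps, here is a plain-Python sketch on three made-up movies. The helper names and the `genre:`/`director:` attribute labels are assumptions; TF-IDF and numeric normalization are left out.

```python
import math

def one_hot(movies, vocabulary):
    """Map each movie's attribute set to a 0/1 vector over a fixed vocabulary."""
    return {mid: [1.0 if term in attrs else 0.0 for term in vocabulary]
            for mid, attrs in movies.items()}

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy catalogue (illustrative attributes only)
movies = {
    1: {"genre:Drama", "director:A"},
    2: {"genre:Drama", "director:B"},
    3: {"genre:Comedy", "director:C"},
}
vocabulary = sorted(set().union(*movies.values()))
vectors = one_hot(movies, vocabulary)

# Dense pairwise similarity matrix, rows and columns ordered by movie id
ids = sorted(vectors)
sim = [[cosine(vectors[i], vectors[j]) for j in ids] for i in ids]
```

Movies 1 and 2 share one of their two attributes, so their similarity is 0.5, while movie 3 shares nothing with movie 1 and scores 0. For the full catalogue, a vectorized routine such as scikit-learn's `cosine_similarity` would replace the nested loop.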

3. KBRS Recommendation

For a user with history H = {m1, m2, ..., mn}:

  1. Aggregate user profile: Average feature vectors of liked movies
  2. Compute similarity: Cosine similarity between profile and all movies
  3. Apply strategy:
    • Exploitation: Top-K most similar movies
    • Exploration: Bottom-K least similar movies (for discovery)
    • Semantic: Filter by shared director/cast/genre
  4. Predict rating: Weighted average based on similarity
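
The four steps above can be sketched in plain Python as follows. This is a minimal illustration under assumptions, not the code in `src/recommender/kbrs.py`: the function names, the four-column toy feature layout, and the fallback to the mean rating when all similarities are zero are invented for the example.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_profile(liked_vectors):
    """Step 1: average the feature vectors of the movies the user liked."""
    n = len(liked_vectors)
    return [sum(col) / n for col in zip(*liked_vectors)]

def rank_candidates(profile, catalogue, seen):
    """Step 2: score unseen movies against the profile, most similar first."""
    scored = [(mid, cosine(profile, vec))
              for mid, vec in catalogue.items() if mid not in seen]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def recommend(ranked, strategy, k):
    """Step 3: top-K for exploitation, bottom-K for exploration."""
    if strategy == "exploitation":
        return ranked[:k]
    if strategy == "exploration":
        return ranked[-k:]
    raise ValueError(f"unknown strategy: {strategy}")

def semantic_filter(ranked, catalogue, profile, columns):
    """Step 3 (semantic): keep candidates sharing an active feature in `columns`."""
    return [(mid, s) for mid, s in ranked
            if any(profile[c] > 0 and catalogue[mid][c] > 0 for c in columns)]

def predict_rating(target_vec, rated):
    """Step 4: similarity-weighted average of past ratings; rated = [(vector, rating)]."""
    weights = [(cosine(target_vec, vec), rating) for vec, rating in rated]
    total = sum(w for w, _ in weights)
    if total == 0:
        return sum(r for _, r in weights) / len(weights)
    return sum(w * r for w, r in weights) / total

# Toy features: [Drama, Comedy, director A, director B]
catalogue = {1: [1, 0, 1, 0], 2: [1, 0, 0, 1], 3: [0, 1, 0, 0], 4: [1, 0, 1, 0]}
profile = build_profile([catalogue[1]])            # the user liked movie 1
ranked = rank_candidates(profile, catalogue, seen={1})
same_director = semantic_filter(ranked, catalogue, profile, columns=[2, 3])
predicted = predict_rating(catalogue[4], [(catalogue[1], 5.0), (catalogue[2], 3.0)])
```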

4. Multi-Armed Bandit

Thompson Sampling selects a strategy by drawing one sample per strategy from its Beta posterior and picking the largest draw:

# For each strategy i:
α_i = successes_i + 1
β_i = failures_i + 1
θ_i ~ Beta(α_i, β_i)

# Select the strategy with the highest sampled θ
strategy = argmax_i(θ_i)
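
A runnable version of this update rule might look like the sketch below, using `random.betavariate` from the standard library. The class name `ThompsonSampler`, the arm names, and the success probabilities in the demo are made up for illustration.

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over a fixed set of strategies."""

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        self.successes = {arm: 0 for arm in arms}
        self.failures = {arm: 0 for arm in arms}

    def select(self):
        # Draw theta_i ~ Beta(successes_i + 1, failures_i + 1) and take the argmax
        samples = {arm: self.rng.betavariate(self.successes[arm] + 1,
                                             self.failures[arm] + 1)
                   for arm in self.successes}
        return max(samples, key=samples.get)

    def update(self, arm, success):
        if success:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

# Demo with invented success probabilities per strategy
true_rates = {"exploitation": 0.3, "semantic_genre": 0.8}
sampler = ThompsonSampler(true_rates, seed=42)
env = random.Random(7)
picks = {arm: 0 for arm in true_rates}
for _ in range(2000):
    arm = sampler.select()
    picks[arm] += 1
    sampler.update(arm, env.random() < true_rates[arm])
```

In this demo the sampler concentrates its picks on the arm with the higher success rate while still trying the other one occasionally.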

5. Online Learning

For each user interaction:
  1. MAB selects strategy
  2. KBRS generates recommendations
  3. User rates a movie
  4. Compute reward (based on rating accuracy)
  5. Update MAB statistics
  6. Repeat
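
The loop above can be sketched as follows. The README only says the reward is "based on rating accuracy", so the linear reward in `rating_reward`, the 0.8 success threshold, and the stand-in predictors are all assumptions made for this example.

```python
import random

def rating_reward(predicted, actual, max_error=4.5):
    """Assumed reward: 1.0 for a perfect prediction, falling linearly to 0.0."""
    return max(0.0, 1.0 - abs(predicted - actual) / max_error)

def run_simulation(interactions, predictors, seed=0, threshold=0.8):
    """interactions: (user_id, movie_id, rating); predictors: name -> callable."""
    rng = random.Random(seed)
    stats = {name: [1, 1] for name in predictors}  # [alpha, beta] per strategy
    history = []
    for user_id, movie_id, actual in interactions:
        # 1. MAB selects a strategy (one Beta draw per strategy, take the max)
        strategy = max(stats, key=lambda s: rng.betavariate(*stats[s]))
        # 2-3. The chosen strategy predicts the rating the user then reveals
        predicted = predictors[strategy](user_id, movie_id)
        # 4. Reward from rating accuracy
        reward = rating_reward(predicted, actual)
        # 5. Update the Beta counts: alpha on success, beta on failure
        stats[strategy][0 if reward >= threshold else 1] += 1
        history.append((strategy, reward))
    return stats, history

# Stand-in predictors: one always close to the true rating, one always far off
predictors = {"good": lambda u, m: 4.0, "bad": lambda u, m: 1.0}
interactions = [(1, movie_id, 4.0) for movie_id in range(200)]
stats, history = run_simulation(interactions, predictors)
```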

📊 Evaluation

Offline Metrics

Evaluated on test set (unseen user-movie pairs):

  • RMSE (Root Mean Square Error): Prediction accuracy
  • MAE (Mean Absolute Error): Average prediction error
  • Coverage: % of items that can be recommended
  • Precision@K: Relevant items in top-K
  • NDCG@K: Ranking quality
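
For reference, these metrics can be computed as in the sketch below. The function names and toy inputs are illustrative, and how a "relevant" item is defined (for example, a rating threshold) is an assumption left to the caller.

```python
import math

def rmse(predicted, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))

def mae(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

def coverage(recommended_ids, catalogue_ids):
    """Fraction of the catalogue that appears in at least one recommendation."""
    catalogue = set(catalogue_ids)
    return len(set(recommended_ids) & catalogue) / len(catalogue)

def precision_at_k(ranked_ids, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    return sum(1 for mid in ranked_ids[:k] if mid in relevant) / k

def ndcg_at_k(ranked_ids, relevance, k):
    """relevance maps movie id -> graded relevance; missing ids count as 0."""
    dcg = sum(relevance.get(mid, 0) / math.log2(i + 2)
              for i, mid in enumerate(ranked_ids[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg else 0.0

predicted, actual = [4.0, 3.0, 5.0], [5.0, 3.0, 4.0]
ranked = [10, 20, 30]
error_rmse = rmse(predicted, actual)
error_mae = mae(predicted, actual)
p_at_2 = precision_at_k(ranked, relevant={10, 30}, k=2)
ndcg_3 = ndcg_at_k(ranked, relevance={10: 1, 30: 1}, k=3)
cov = coverage(ranked, catalogue_ids=[10, 20, 30, 40])
```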

Online Metrics

Evaluated during simulation:

  • Cumulative Reward: Total reward over time
  • Mean Reward: Average reward per interaction
  • Exploration Rate: % of exploration strategies selected
  • Exploitation Rate: % of exploitation strategies selected
  • Cumulative Regret: Accumulated reward gap relative to always choosing the best strategy
  • Strategy Performance: Per-strategy reward statistics
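
Given a log of `(strategy, reward)` pairs, the online metrics could be derived as sketched below. The log layout is an assumption (the actual columns of `online_history.csv` are not documented here), and regret is measured against the best strategy's empirical mean reward, which is only one possible baseline.

```python
from collections import Counter, defaultdict
from itertools import accumulate

def summarize(history):
    """history: list of (strategy, reward) pairs in interaction order."""
    rewards = [reward for _, reward in history]
    per_strategy = defaultdict(list)
    for strategy, reward in history:
        per_strategy[strategy].append(reward)
    strategy_means = {s: sum(rs) / len(rs) for s, rs in per_strategy.items()}
    # Assumed baseline: the best strategy's empirical mean reward
    best_mean = max(strategy_means.values())
    counts = Counter(strategy for strategy, _ in history)
    return {
        "cumulative_reward": list(accumulate(rewards)),
        "mean_reward": sum(rewards) / len(rewards),
        "selection_rate": {s: c / len(history) for s, c in counts.items()},
        "strategy_mean_reward": strategy_means,
        "cumulative_regret": list(accumulate(best_mean - r for r in rewards)),
    }

# Toy log with made-up rewards
history = [("genre", 1.0), ("exploration", 0.5), ("genre", 1.0), ("exploration", 0.5)]
summary = summarize(history)
```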

Visualization

  • Learning curves (RMSE, reward over time)
  • Strategy selection evolution
  • Reward distribution by strategy
  • Cumulative regret
  • Error distribution

📈 Results

Typical Performance (MovieLens 100K)

Offline Evaluation

RMSE:     0.93
MAE:      0.72
Coverage: 100%

Online Simulation (20K interactions)

Final RMSE:        0.89
Mean Reward:       0.85
Exploration Rate:  28%
Exploitation Rate: 22%
Semantic (Genre):  30%
Semantic (Cast):   20%

Key Findings

  1. MAB Adaptation: System learns to prefer semantic strategies for most users
  2. Exploration Value: 25-30% exploration maintains diversity without hurting accuracy
  3. Semantic Filtering: Genre-based filtering is the most frequently selected strategy (30% selection rate)
  4. Online Improvement: RMSE improves from 0.93 (offline) to 0.89 (online)

📚 Documentation


🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • MovieLens: GroupLens Research for the dataset
  • DBpedia/Wikidata: Linked open data for movie metadata
  • Thompson Sampling: Classic MAB algorithm for exploration-exploitation

📧 Contact

For questions or feedback:


Built with ❤️ for intelligent movie recommendations
