🔗 Live Demo: EGY-MOV-RECS App
EGY-MOV-RECS is a Python-based movie recommendation system focused on Egyptian cinema. Given a movie title, it recommends similar Egyptian movies using content-based filtering: each movie's genres, keywords, tagline, cast, and director are represented as a TF-IDF vector, and movies are ranked by the cosine similarity between these vectors.
This project is designed to help users discover Egyptian movies similar to their favorite ones, offering personalized movie suggestions based on movie attributes.
## **Features**
- Personalized Movie Recommendations: Provides movie recommendations based on a given movie title.
- Content-Based Filtering: Uses movie attributes such as genres, keywords, tagline, cast, and director to generate recommendations.
- Advanced Text Processing: Implements TF-IDF vectorization with bi-grams and custom Arabic stop words for better accuracy.
- Arabic Text Normalization: Standardizes Arabic letters (e.g., `أ` → `ا`, `إ` → `ا`, `ى` → `ي`, `ة` → `ه`) for better matching.
- Efficient Movie Matching: Uses cosine similarity to find the most relevant recommendations.
- Handles Data Issues: Fills missing values in the dataset to ensure stability.
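The project's `clean_text()` function is not reproduced in this README, so the following is only a minimal sketch of the letter normalization listed above. The helper name `normalize_arabic` and the whitespace collapsing are assumptions for illustration, not the project's code.

```python
import re

# Hypothetical sketch: mirrors only the substitutions listed in the README,
# not necessarily everything the project's clean_text() does.
ARABIC_LETTER_MAP = str.maketrans({
    "أ": "ا",  # alef with hamza above -> bare alef
    "إ": "ا",  # alef with hamza below -> bare alef
    "ى": "ي",  # alef maqsura -> yaa
    "ة": "ه",  # taa marbuta -> haa
})


def normalize_arabic(text: str) -> str:
    """Map letter variants to one form and collapse runs of whitespace."""
    text = text.translate(ARABIC_LETTER_MAP)
    return re.sub(r"\s+", " ", text).strip()


print(normalize_arabic("الفيل  الأزرق"))  # → الفيل الازرق
```

Applying the same function to both the stored titles and the user's input is what lets a query typed without the hamza (`الفيل الازرق`) still match `الفيل الأزرق`.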
## **Requirements**
Ensure you have the following Python libraries installed:
- `pandas` (for handling data)
- `scikit-learn` (for TF-IDF and cosine similarity)
- `numpy` (for numerical operations)

To install all dependencies listed in `requirements.txt` at once, run:
```bash
pip install -r requirements.txt
```
## **Installation**
- Clone the repository:
  ```bash
  git clone https://github.com/aeyouseff/EGY-MOV-RECS.git
  ```
- Navigate to the project directory:
  ```bash
  cd EGY-MOV-RECS
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Ensure the dataset is available: make sure `EgyptionMoviesDataset.csv` is in the project directory and contains these columns:
  - `title` (movie title)
  - `genres` (genres of the movie)
  - `keywords` (movie-related keywords)
  - `tagline` (movie tagline)
  - `cast` (list of actors)
  - `director` (movie director)
- Run the system:
  ```bash
  python egyptian_movies_recommendation.py
  ```
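Before running the script, it can help to confirm that the CSV really has the columns listed above. This is a minimal sketch with pandas; the function names `missing_columns` and `check_dataset` are hypothetical and not part of the project.

```python
import pandas as pd

# Columns the README says EgyptionMoviesDataset.csv must contain.
REQUIRED_COLUMNS = ["title", "genres", "keywords", "tagline", "cast", "director"]


def missing_columns(df: pd.DataFrame) -> list:
    """Return the required column names that are absent from df."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]


def check_dataset(path: str = "EgyptionMoviesDataset.csv") -> pd.DataFrame:
    """Load the CSV and raise ValueError if any required column is missing."""
    movies = pd.read_csv(path)
    missing = missing_columns(movies)
    if missing:
        raise ValueError(f"{path} is missing columns: {missing}")
    return movies
```

Calling `check_dataset()` from the project directory either returns the loaded DataFrame or names the missing columns.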
## **Usage**
- Import the recommendation function:
  ```python
  from egyptian_movies_recommendation import recommend_movie
  ```
- Call the function with a movie title:
  ```python
  movie_title = "الفيل الأزرق"  # Example ("The Blue Elephant")
  recommended_movies = recommend_movie(movie_title)
  print("🎥 Recommended Movies:", recommended_movies)
  ```
- Expected output:
  ```
  🎥 Recommended Movies: ['الفيل الأزرق 2', 'كيرة والجن', 'تراب الماس']
  ```
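You can also run the script directly in the terminal:
```bash
python egyptian_movies_recommendation.py
```
The script prompts you for a movie name and prints the recommendations.

Example interaction:
```
🎥 نظام توصية الأفلام المصرية
🔹 اكتب اسم فيلم مصري وشوف التوصيات لأفلام مشابهة.
🔍 أدخل اسم الفيلم: الفيل الأزرق
🎥 أفلام مشابهة:
✅ الفيل الأزرق 2
✅ كيرة والجن
✅ تراب الماس
```
In English, the prompts read "Egyptian Movie Recommendation System", "Type the name of an Egyptian movie and see recommendations for similar movies.", "Enter the movie name:", and "Similar movies:".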
--------------------------------------------------
## **How It Works**
### **1️⃣ Data Preprocessing**
- The dataset is loaded from `EgyptionMoviesDataset.csv`.
- Missing values in columns like `genres`, `keywords`, `tagline`, `cast`, and `director` are replaced with `"Unknown"` or `"No Data"`.
- Movie titles are **cleaned and normalized** using **`clean_text()`** to ensure accurate matching.
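The fill step can be sketched as follows. The README says missing values become `"Unknown"` or `"No Data"` without saying which column gets which, so this sketch assumes a single placeholder for all five columns; `fill_missing_text` is a hypothetical name.

```python
import pandas as pd

TEXT_FEATURES = ["genres", "keywords", "tagline", "cast", "director"]


def fill_missing_text(df: pd.DataFrame, placeholder: str = "Unknown") -> pd.DataFrame:
    """Return a copy of df with missing values in the text columns replaced."""
    df = df.copy()
    df[TEXT_FEATURES] = df[TEXT_FEATURES].fillna(placeholder)
    return df


# Made-up rows with gaps, only to show the effect.
movies = pd.DataFrame({
    "title": ["A", "B"],
    "genres": ["drama", None],
    "keywords": [None, "friendship"],
    "tagline": [None, None],
    "cast": ["actor_a", "actor_b"],
    "director": ["director_a", None],
})
print(fill_missing_text(movies).loc[1, "genres"])  # → Unknown
```

Filling matters for the feature-combination step: `' '.join` raises a `TypeError` on the float `NaN` that `pd.read_csv` produces for empty cells, so a single missing tagline would otherwise break it.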
### **2️⃣ Feature Combination**
Relevant text features (`genres`, `keywords`, `tagline`, `cast`, and `director`) are combined into a single string for each movie.
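```python
selected_features = ['genres', 'keywords', 'tagline', 'cast', 'director']
combined_features = movies_dataset[selected_features].agg(' '.join, axis=1)  # one string per movie
```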
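On a toy frame, the same `agg` call produces one space-separated string per movie. The values below are made-up English placeholders standing in for the dataset's Arabic text.

```python
import pandas as pd

selected_features = ["genres", "keywords", "tagline", "cast", "director"]

movies_dataset = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "genres": ["drama", "comedy"],
    "keywords": ["revenge", "friendship"],
    "tagline": ["Unknown", "Unknown"],  # a filled-in missing value
    "cast": ["actor_a actor_b", "actor_c"],
    "director": ["director_a", "director_b"],
})

# Apply ' '.join to each row (axis=1), concatenating that row's feature values.
combined_features = movies_dataset[selected_features].agg(" ".join, axis=1)
print(combined_features.iloc[0])  # → drama revenge Unknown actor_a actor_b director_a
```

Because everything is flattened into one bag of words, a multi-word actor or director name is split into separate tokens; bi-grams partly recover such pairs.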
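### **3️⃣ TF-IDF Vectorization**
`TfidfVectorizer` converts each combined string into a numerical vector, using:
- Bi-grams (`ngram_range=(1, 2)`) so that word pairs count as features alongside single words.
- A custom Arabic stop-word list (`arabic_stop_words`) to drop common words that carry little meaning.
```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(
    min_df=1,                      # keep terms that appear in at least one movie
    stop_words=arabic_stop_words,  # custom Arabic stop-word list
    lowercase=True,                # only affects Latin characters; Arabic has no case
    ngram_range=(1, 2),            # single words and word pairs
    max_features=1000,             # keep the 1000 most frequent terms
    sublinear_tf=True              # use 1 + log(tf) instead of raw term counts
)
# One row per movie, one column per retained term
feature_matrix = vectorizer.fit_transform(combined_features)
```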
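### **4️⃣ Cosine Similarity**
The similarity between every pair of movies is computed from their TF-IDF vectors.
```python
from sklearn.metrics.pairwise import cosine_similarity
similarity_matrix = cosine_similarity(feature_matrix)  # [i, j] = similarity of movies i and j
```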
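Cosine similarity divides the dot product of two vectors by the product of their lengths, cos(a, b) = (a · b) / (‖a‖ ‖b‖), so it compares direction and ignores overall magnitude. The toy check below, with made-up vectors, shows that scikit-learn's `cosine_similarity` matches this formula.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Three made-up vectors standing in for rows of a TF-IDF matrix.
X = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
])

similarity_matrix = cosine_similarity(X)

# Same result by hand: scale each row to unit length, then take all pairwise dot products.
unit_rows = X / np.linalg.norm(X, axis=1, keepdims=True)
manual = unit_rows @ unit_rows.T

print(np.round(similarity_matrix, 3))
```

Every diagonal entry is 1.0 because each vector points in its own direction, which is why the recommendation step skips the first sorted position. `TfidfVectorizer` also L2-normalizes each row by default (`norm='l2'`), so on its output cosine similarity equals a plain dot product.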
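### **5️⃣ Getting Recommendations**
- The input title is normalized with `clean_text()` and looked up among the stored titles.
- Its row of the similarity matrix is sorted from highest to lowest score, and the three movies after the first position (the movie itself) are returned.
```python
import numpy as np

# clean_text, movie_titles_list, movie_titles_dict and similarity_matrix are defined earlier in the script
def recommend_movie(movie_name):
    movie_name = clean_text(movie_name)  # Normalize user input the same way as the stored titles
    if movie_name not in movie_titles_list:
        # "This movie isn't here yet, try another movie."
        return ["❌ الفيلم ده مش موجود هنا لسه، جرب فيلم تاني."]
    movie_index = movie_titles_list.index(movie_name)
    similarity_scores = similarity_matrix[movie_index]
    # Sort scores high to low; position 0 is expected to be the movie itself, so take the next three
    similar_movies_indices = np.argsort(similarity_scores)[::-1][1:4]
    # Map cleaned titles back to their original spelling for display
    recommended_movies = [movie_titles_dict[movie_titles_list[i]] for i in similar_movies_indices]
    return recommended_movies
```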
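One caveat with the `[1:4]` slice: it assumes the input movie sorts first in its own row. If another movie has an identical feature string (for example, a duplicate entry), the two tie at a score of 1.0 and `argsort` may put either one first, so the input movie could be recommended to itself. A variant that drops the input movie by index instead is sketched below on a toy matrix; `top_k_similar` is a hypothetical helper, not part of the project.

```python
import numpy as np


def top_k_similar(similarity_matrix: np.ndarray, movie_index: int, k: int = 3) -> list:
    """Return indices of the k movies most similar to movie_index, excluding itself."""
    ranked = np.argsort(similarity_matrix[movie_index])[::-1]  # highest score first
    return [int(i) for i in ranked if i != movie_index][:k]


# Toy matrix: movies 0 and 1 are exact duplicates, so they tie at 1.0.
sim = np.array([
    [1.0, 1.0, 0.9, 0.2],
    [1.0, 1.0, 0.9, 0.2],
    [0.9, 0.9, 1.0, 0.4],
    [0.2, 0.2, 0.4, 1.0],
])
print(top_k_similar(sim, 0))  # → [1, 2, 3]
```

Inside `recommend_movie`, the slice line could then become `similar_movies_indices = top_k_similar(similarity_matrix, movie_index)`.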
## **Contributing**
Want to contribute? Feel free to fork the project and submit a Pull Request with improvements or bug fixes.