GitHub - bkennedy06/MovieRatingPrediction: A Scala/Spark project for predicting the ratings of IMDb movies using KNN

Preprocess.scala
Purpose: Handle data loading and cleaning.
Should do:
- Load IMDb datasets from the data/ directory.
- Clean missing or invalid data.
- Transform datasets into a format suitable for KNN, such as converting genres and runtime into numerical features.
Output: Returns a processed DataFrame for model training.

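The cleaning and feature-conversion steps above can be sketched in plain Scala, without Spark, to show the intended shape of the data. This is a minimal illustration, not the project's code: `RawMovie`, `Features`, `toFeatures`, and the `maxRuntime` cap of 300 minutes are hypothetical names and values, and the sketch assumes missing fields are marked with the literal string `\N`, as in IMDb's published TSV files.

```scala
object PreprocessSketch {
  // Hypothetical input row: genres is a comma-separated string, runtime is raw text.
  final case class RawMovie(id: String, genres: String, runtimeMinutes: String)
  // Hypothetical output row: one numeric feature vector per movie.
  final case class Features(id: String, vector: Vector[Double])

  // Assumption: missing values are encoded as the literal string \N.
  private val Missing = "\\N"

  def toFeatures(movies: Seq[RawMovie], maxRuntime: Double = 300.0): Seq[Features] = {
    // Drop rows with missing genres or a runtime that is not a positive number.
    val cleaned = movies.flatMap { m =>
      val runtime = scala.util.Try(m.runtimeMinutes.toDouble).toOption.filter(_ > 0)
      if (m.genres == Missing || m.genres.isEmpty) None
      else runtime.map(r => (m, r))
    }

    // Genre vocabulary, sorted so that vector positions are stable.
    val vocabulary = cleaned.flatMap(_._1.genres.split(",").toSeq).distinct.sorted

    cleaned.map { case (m, r) =>
      val present = m.genres.split(",").toSet
      // One-hot encode the genres, then append runtime scaled into [0, 1].
      val oneHot = vocabulary.map(g => if (present.contains(g)) 1.0 else 0.0)
      Features(m.id, (oneHot :+ math.min(r / maxRuntime, 1.0)).toVector)
    }
  }
}
```

Scaling runtime into the same range as the one-hot genre flags keeps it from dominating a distance computation; the cap is an arbitrary choice for illustration.
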
TrainModel.scala
Purpose: Implement the KNN algorithm to predict movie ratings.
Should do:
- Accept the preprocessed DataFrame from Preprocess.scala.
- Compute distances between movies based on features (e.g., genres and runtime).
- Use the nearest neighbors to predict ratings for movies.
Output: Stores the predicted ratings for evaluation.

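As a rough illustration of the prediction step, the sketch below runs brute-force KNN regression over in-memory feature vectors. It assumes Euclidean distance and an unweighted mean of the neighbors' ratings; the README does not specify either choice, and `Labeled`, `euclidean`, and `predict` are hypothetical names. A Spark version would need a distributed strategy for the distance computation.

```scala
object KnnSketch {
  // Hypothetical training row: a feature vector plus the known rating.
  final case class Labeled(features: Vector[Double], rating: Double)

  // Euclidean distance between two feature vectors of equal length.
  def euclidean(a: Vector[Double], b: Vector[Double]): Double = {
    require(a.length == b.length, "feature vectors must have the same length")
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
  }

  // Predict a rating as the mean rating of the k nearest training movies.
  def predict(train: Seq[Labeled], query: Vector[Double], k: Int): Double = {
    require(k > 0 && train.nonEmpty, "need k > 0 and at least one training movie")
    val byDistance = train
      .map(m => (euclidean(m.features, query), m.rating))
      .sortWith(_._1 < _._1)
    val nearest = byDistance.take(k).map(_._2)
    nearest.sum / nearest.size
  }
}
```

For example, with training points at 0.0, 1.0, and 10.0 rated 2.0, 4.0, and 9.0, a query at 0.4 with k = 2 averages the two closest ratings.
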
EvalModel.scala
Purpose: Evaluate the accuracy and performance of the KNN predictions.
Should do:
- Compare predicted ratings to actual ratings from the dataset.
- Calculate evaluation metrics, such as:
  - Mean Absolute Error (MAE)
  - Root Mean Square Error (RMSE)
- Generate a summary of model performance.

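The two metrics named above follow their standard definitions: MAE is the mean of the absolute errors, and RMSE is the square root of the mean squared error. A minimal sketch over (predicted, actual) pairs might look like this; `EvalSketch`, `mae`, and `rmse` are hypothetical names, not functions from the repository.

```scala
object EvalSketch {
  // Mean Absolute Error over (predicted, actual) rating pairs.
  def mae(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one prediction")
    pairs.map { case (predicted, actual) => math.abs(predicted - actual) }.sum / pairs.size
  }

  // Root Mean Square Error over (predicted, actual) rating pairs.
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one prediction")
    val meanSquared = pairs.map { case (predicted, actual) =>
      val diff = predicted - actual
      diff * diff
    }.sum / pairs.size
    math.sqrt(meanSquared)
  }
}
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE, so reporting both gives a fuller picture of the predictions.
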
Main.scala
Purpose: The main entry point for running the project pipeline.
Should do:
- Initialize the Spark session.
- Call the functions from Preprocess.scala, TrainModel.scala, and EvalModel.scala in sequence.
- Print the results or save outputs for further analysis.

