Analyzing student performance with machine learning. This project compares NN and SVM models for both regression (G3 score prediction) and classification (Pass/Fail). Features comprehensive data preparation (Encoding, Scaling) and rigorous evaluation using MAE, Accuracy, and AUC.
This project uses machine learning models to analyze and predict student performance in mathematics based on a comprehensive dataset of student attributes. The core objective is to compare the efficacy of a Sequential Neural Network (NN) and a Support Vector Machine (SVM) across two distinct prediction tasks:
- Regression: Predicting the exact final grade (G3).
- Classification: Predicting whether a student will pass or fail (binary outcome derived from G3).
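
As a minimal sketch of how the pass/fail target could be derived from G3, the snippet below thresholds the final grade. The cutoff of 10 (on the dataset's 0 to 20 grading scale) and the names `PASS_MARK`, `add_pass_label`, and `passed` are assumptions made for illustration; the notebook may use a different threshold or column name.

```python
import pandas as pd

# Assumed cutoff on the 0-20 grade scale; this README does not state the
# threshold the notebook actually uses.
PASS_MARK = 10


def add_pass_label(df: pd.DataFrame, pass_mark: int = PASS_MARK) -> pd.DataFrame:
    """Return a copy of df with a binary 'passed' column derived from G3."""
    out = df.copy()
    # 1 = pass (G3 at or above the cutoff), 0 = fail
    out["passed"] = (out["G3"] >= pass_mark).astype(int)
    return out


# Tiny made-up example, not real dataset rows.
grades = pd.DataFrame({"G3": [0, 9, 10, 15, 20]})
labelled = add_pass_label(grades)
print(labelled)
```

The regression task keeps G3 as-is, while the classification task would train on the derived `passed` column.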
- Comprehensive Data Preprocessing: Includes checking for missing values (none found), applying Label Encoding and One-Hot Encoding for categorical features, and Standard Scaling to normalize numerical data.
- Dual-Model Comparison: Implementation and comparison of a custom Sequential Neural Network and an SVM model for both regression (SVR) and classification (SVC).
- Rigorous Evaluation: Utilizes 10-Fold Cross-Validation to assess model generalization.
- Performance Metrics: Evaluates performance using Mean Absolute Error (MAE) for regression and Accuracy, Precision, Recall, F1-Score, and ROC Curve/AUC for classification.
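
The preprocessing and cross-validation steps listed above can be sketched roughly as follows for the SVR regression case. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic (a few columns named after dataset fields, with made-up values), all categorical columns are one-hot encoded (Label Encoding is left out for brevity), and the SVR uses scikit-learn defaults.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for a few student-mat.csv columns (values are made up).
rng = np.random.default_rng(0)
n = 200
g2 = rng.integers(0, 21, size=n)
data = pd.DataFrame({
    "school": rng.choice(["GP", "MS"], size=n),
    "sex": rng.choice(["F", "M"], size=n),
    "age": rng.integers(15, 23, size=n),
    "absences": rng.integers(0, 30, size=n),
    "G2": g2,
})
# Toy target: the final grade tracks G2 plus noise, clipped to 0-20.
data["G3"] = np.clip(g2 + rng.normal(0, 1.5, size=n), 0, 20)

categorical = ["school", "sex"]
numeric = ["age", "absences", "G2"]

# One-hot encode the categorical columns and standardize the numeric ones.
preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("scale", StandardScaler(), numeric),
])
model = make_pipeline(preprocess, SVR(kernel="rbf"))

# 10-fold cross-validation; scikit-learn reports MAE as a negated score.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(
    model,
    data[categorical + numeric],
    data["G3"],
    cv=cv,
    scoring="neg_mean_absolute_error",
)
mae_per_fold = -scores
print(f"MAE: {mae_per_fold.mean():.2f} +/- {mae_per_fold.std():.2f}")
```

Keeping the encoder and scaler inside the pipeline means they are refit on the training portion of each fold, so no statistics from the held-out fold leak into preprocessing.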
This project is implemented in Python and requires the following libraries:
- pandas (for data manipulation)
- matplotlib (for plotting/visualization)
- numpy (for numerical operations)
- scikit-learn (sklearn): for preprocessing, model selection (KFold, train_test_split), and model implementations (SVR, SVC)
- tensorflow & keras: for building and training the deep learning model (Neural Network)
You can install the necessary packages using pip:
pip install pandas matplotlib numpy scikit-learn tensorflow

The analysis is performed on the student mathematics performance dataset, loaded from the file student-mat.csv. The dataset contains 395 entries and 33 features covering various student attributes such as school, sex, age, parents' education (Medu, Fedu), family size, absence record, and prior grades (G1, G2).
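
A quick way to confirm the dataset shape and the absence of missing values is sketched below. The helper name `load_student_data` is hypothetical, and the default `sep=";"` is an assumption based on the commonly distributed version of student-mat.csv; pass `sep=","` if your copy is comma-separated.

```python
import pandas as pd


def load_student_data(path: str = "student-mat.csv", sep: str = ";") -> pd.DataFrame:
    """Read the CSV and print its shape and total missing-value count.

    The default separator is an assumption; adjust it to match your file.
    """
    df = pd.read_csv(path, sep=sep)
    print("shape:", df.shape)
    print("missing values:", int(df.isna().sum().sum()))
    return df
```

Calling `df = load_student_data()` from the folder containing the file should report 395 rows, 33 columns, and no missing values if the file matches the description above.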
Key Findings
The comparative analysis yielded the following conclusions:
Regression Task (Predicting G3): The Support Vector Regression (SVR) model was found to be the preferred choice, demonstrating a lower Mean Absolute Error (MAE) compared to the Neural Network.
Classification Task (Predicting Pass/Fail): The Neural Network showed better performance with a higher Area Under the ROC Curve (AUC).
On the classification task, both the NN and SVM models clearly outperformed a random classifier (which would score an AUC of about 0.5), supporting their usefulness for this problem.
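
For reference, the classification metrics named in this README can be computed with scikit-learn along the following lines. This is a hedged sketch on synthetic data from `make_classification`, using an SVC with default settings; it does not reproduce the project's numbers or its Neural Network. A classifier that guesses at random would land near an AUC of 0.5, which is the baseline the AUC is read against.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary problem standing in for the pass/fail task.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)             # hard labels for accuracy/precision/recall/F1
y_score = clf.decision_function(X_test)  # continuous scores for the ROC curve / AUC

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the AUC is computed from continuous decision scores rather than the predicted labels, since the ROC curve is traced by sweeping a threshold over those scores.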
Clone the repository:
git clone https://github.com/zekooo69/DeepLearning-vs-SVM-Student-Analysis.git
cd DeepLearning-vs-SVM-Student-Analysis
Ensure Data is Available: Place your student-mat.csv file in the appropriate location referenced by the notebook.
Open the Notebook: Launch the analysis notebook with Jupyter (or upload it to Google Colab):
jupyter notebook LFD_Project.ipynb
Execute Cells: Run all cells in the LFD_Project.ipynb notebook sequentially to perform data loading, preprocessing, model training, and evaluation.
⭐ Support
If you find the project helpful, consider giving the repository a star on GitHub!