TODO: Clean stuff, save model and create UI, deploy w Docker to actually host, run, save and compare results.
SAT Score Prediction Project using raw neural networks constructed using Numpy and PyTorch with both regression and classification methods. This project aims to predict SAT scores based on GPA and other factors using a machine learning model built with the PyTorch framework. The model utilizes a neural network architecture to make accurate predictions and has been fine-tuned to achieve high accuracy and fast convergence.
-
PyTorch Neural Network: Developed a multi-layer neural network architecture using PyTorch for SAT score prediction.
-
Data Preprocessing: Implemented comprehensive data preprocessing techniques such as binary splitting and Principal component analysis (PCA) to handle missing values, outliers, and feature scaling.
-
Hyperparameter Tuning and Model Design: Conducted 100+ hyperparameter fine-tuning and trying different combinations of neural network models, loss function, optimizer, and learning rate scheduler to optimize the model's performance, achieving an accuracy of 90%.
-
Advanced Activation Functions: Utilized advanced activation functions, such as Leaky ReLU, to enhance model training and convergence.
-
Dropout Regularization: Applied dropout layers to prevent overfitting and improve the generalization capability of the model.
-
Reduce Learning Rate: Implemented a learning rate scheduler to gradually reduce the learning rate during training, improving convergence and final performance.
-
Data Randomization and Generation: Generated synthetic data and introduced randomization techniques to enhance model robustness and generalization, and prevent overfitting.
-
Evaluation: Using R-squared value and MAE error to evaluate the accuracy of the model over a set of 10% test data.
-
Visualization: Created informative plotting visualizations of training and validation metrics to analyze the model's performance.
-
User Demo: Created a CLI demo of predicting the SAT score by taking in user's inputs.
Before you begin, ensure you have met the following requirements:
- Python 3.6 or later installed
- Required Python packages listed in
requirements.txt
- CSV files listed in
data
- GPA_Small.csv: contains the original CSV file.
- GPA_dummies.csv: generative data used to prevent overfitting.
- GPA_Big.csv: final data file with multiple generative data combined with original data.
In the pursuit of predicting SAT scores based on GPA and other factors, rigorous experimentation and analysis were conducted. All model runs were performed with a consistent training setup of 200 epochs, capturing valuable insights into the convergence, accuracy, and performance over time.
The initial phase of the project involved training a neural network model from scratch using NumPy.
Optimized Regression Neural Network Models Achieving 90% Accuracy with Advanced Techniques and Enhanced Data Preprocessing.
Optimized Classification Neural Network Models with Advanced Techniques and Enhanced Data Preprocessing for comparison.
The combination of regression and classification approaches, along with rigorous model optimization and preprocessing techniques, showcases the project's progress in predicting SAT scores using regression. The achievements and insights gained underline the potential impact of these models in educational evaluation and assessment, while also highlighting the challenges associated with certain classification tasks.
- The framework in
SAT_score_scratch
is adapted from the BikeSharing Project by Udacity