Student Outcome Classification

Overview

This project builds and evaluates multiple supervised machine learning models to predict student success outcomes in higher education.

The target variable represents whether a student drops out, remains enrolled, or graduates.

The project follows an end-to-end machine learning workflow including data preprocessing, exploratory data analysis (EDA), model training, model evaluation, and model comparison.

Dataset

The dataset used in this project comes from Kaggle:

Higher Education Predictors of Student Retention
https://www.kaggle.com/datasets/thedevastator/higher-education-predictors-of-student-retention

The raw dataset includes demographic, academic performance, and socioeconomic features related to student performance and retention of first-year college students.

Project Structure

student-outcome-classification/
├── data/
│   ├── raw/
│   │   └── student_data_raw.csv                    # Original dataset from Kaggle
│   └── processed/
│       └── student_data_processed.csv              # Cleaned and preprocessed dataset
├── notebooks/
│   ├── 00_run_in_colab.ipynb                       # Interactive notebook for user execution. 
│   ├── 01_data_preprocessing_and_cleaning.ipynb    # Cleaned and preoprocessed the raw student dataset in preparation for modeling. 
│   ├── 02_exploratory_data_analysis.ipynb          # Performed EDA using a multiple linear regression model.
│   ├── 03_logistic_regression_model.ipynb          # Implements a multinomial logistic regression model. 
│   ├── 04_gradient_boosting_model.ipynb            # Implements a gradient boosting classifier model. 
│   └── 05_model_comparison_and_conclusions.ipynb   # Compares both models and summarizes findings. 
└── README.md

Models Implemented

Multiple Linear Regression (EDA only)
Multinomial Logistic Regression
Gradient Boosting Classifier

Models are evaluated using accuracy, classification reports, and confusion matrices.

Technologies & Tools Used

Programming & Environment

Python
Jupyter Notebook
Google Colab

Data Analysis & Machine Learning

Pandas
NumPy
Scikit-learn
SciPy
Matplotlib
Seaborn

Models & Techniques

Gradient Boosting Classifier
Multinomial Logistic Regression
Exploratory Data Analysis (EDA)
Multiple Linear Regression
Feature Scaling & Encoding
Cross-Validation & Hyperparameter Tuning

Interface

ipywidgets (interactive UI)

How to Run

Open 00_run_in_colab.ipynb
Follow the setup instructions
Run notebooks in numerical order

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Student Outcome Classification

Overview

Dataset

Project Structure

Models Implemented

Technologies & Tools Used

How to Run

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
data		data
notebooks		notebooks
README.md		README.md

sngillard/student-outcome-classification

Folders and files

Latest commit

History

Repository files navigation

Student Outcome Classification

Overview

Dataset

Project Structure

Models Implemented

Technologies & Tools Used

How to Run

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages