Source
Dataset Source - source
The data consists of 8 columns and 1000 rows.
The goal is to predict the math score of a given student (regression analysis).
There are 7 independent variables:
- gender: sex of the student (male/female)
- race/ethnicity: ethnicity of the student (group A, B, C, D, E)
- parental level of education: parents' highest level of education (bachelor's degree, some college, master's degree, associate's degree, high school)
- lunch: type of lunch (standard or free/reduced)
- test preparation course: whether the course was completed before the test (completed or not completed)
- reading score: the student's reading score
- writing score: the student's writing score
Target variable:
math score: the student's math score.
AWS Elastic Beanstalk link: link
Data Ingestion:
- In the data ingestion phase, the raw data is first read from a CSV file.
- The data is then split into training and test sets, and each set is saved as a CSV file.
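The ingestion step can be sketched with pandas and scikit-learn's `train_test_split`. This is a minimal sketch under stated assumptions, not the project's code: the helper name `ingest`, the output file names `train.csv` and `test.csv`, the 80/20 split, and `random_state=42` are all illustrative choices.

```python
import os

import pandas as pd
from sklearn.model_selection import train_test_split


def ingest(csv_path, out_dir, test_size=0.2, random_state=42):
    """Hypothetical helper: read the raw CSV, split it, and save both parts.

    The file names and the default 80/20 split are assumptions.
    """
    # Read the raw dataset from CSV.
    df = pd.read_csv(csv_path)

    # Shuffle and split rows into training and test sets (reproducible via random_state).
    train_df, test_df = train_test_split(
        df, test_size=test_size, random_state=random_state
    )

    # Save each split as its own CSV file.
    os.makedirs(out_dir, exist_ok=True)
    train_path = os.path.join(out_dir, "train.csv")
    test_path = os.path.join(out_dir, "test.csv")
    train_df.to_csv(train_path, index=False)
    test_df.to_csv(test_path, index=False)
    return train_path, test_path
```

Fixing `random_state` keeps the split identical across runs, so the later transformation and training steps always see the same rows.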
Data Transformation:
- In this phase, a ColumnTransformer preprocessing pipeline is created.
- For numeric variables, a SimpleImputer with the median strategy is applied first, followed by standard scaling (StandardScaler).
- For categorical variables, a SimpleImputer with the most-frequent strategy is applied, the categories are then ordinal-encoded, and the result is scaled with StandardScaler.
- This preprocessor is saved as a pickle file.
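A minimal sketch of the preprocessor described above, using scikit-learn's `ColumnTransformer` and `Pipeline`. It assumes the CSV headers match the variable names listed earlier; the helper names `build_preprocessor` and `save_object` are hypothetical, and mapping unseen categories to `-1` (`handle_unknown="use_encoded_value"`) is an added assumption so prediction does not fail on a new category.

```python
import pickle

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Column names are assumed to match the dataset's CSV headers.
NUMERIC_COLS = ["reading score", "writing score"]
CATEGORICAL_COLS = [
    "gender",
    "race/ethnicity",
    "parental level of education",
    "lunch",
    "test preparation course",
]


def build_preprocessor():
    """Numeric: median impute -> scale; categorical: mode impute -> ordinal encode -> scale."""
    numeric_pipeline = Pipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ])
    categorical_pipeline = Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        # Assumption: categories unseen during fit are encoded as -1 instead of raising.
        ("encoder", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)),
        ("scaler", StandardScaler()),
    ])
    return ColumnTransformer([
        ("num", numeric_pipeline, NUMERIC_COLS),
        ("cat", categorical_pipeline, CATEGORICAL_COLS),
    ])


def save_object(obj, path):
    """Serialize a fitted object (e.g. the preprocessor) to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)
```

The preprocessor is fitted on the training split only and then reused, unchanged, on the test split and at prediction time, which is why it is pickled.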
Model Training:
- In this phase, several base models are trained and evaluated. The best-performing model was Linear Regression.
- This model is saved as a pickle file.
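Model selection can be sketched as fitting each candidate and comparing R² on the test split. The candidate set below (`LinearRegression`, `Ridge`, `DecisionTreeRegressor`), the use of R² as the selection metric, and the helper names are assumptions for illustration; the README does not list which base models were compared.

```python
import pickle

from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor


def select_best_model(X_train, y_train, X_test, y_test):
    """Fit each candidate, score it by R^2 on the test split, and return the best one."""
    # Illustrative candidates only; the project's actual list is not given.
    candidates = {
        "LinearRegression": LinearRegression(),
        "Ridge": Ridge(alpha=1.0),
        "DecisionTree": DecisionTreeRegressor(random_state=42),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    # Pick the candidate with the highest test R^2.
    best_name = max(scores, key=scores.get)
    return best_name, candidates[best_name], scores


def save_model(model, path):
    """Persist the chosen model as a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
```

Returning the full `scores` dictionary makes it easy to log every candidate's result, not just the winner.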
Prediction Pipeline:
- This pipeline converts the input data into a DataFrame and provides Python functions that load the pickled preprocessor and model and predict the final result.
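A minimal sketch of the prediction step, assuming the preprocessor and model were saved with `pickle` and that the training data used the column names listed earlier. The function names `load_object`, `to_dataframe`, and `predict` are hypothetical, not the project's actual API.

```python
import pickle

import pandas as pd


def load_object(path):
    """Load a pickled object (preprocessor or model) from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def to_dataframe(gender, race_ethnicity, parental_level_of_education, lunch,
                 test_preparation_course, reading_score, writing_score):
    """Wrap one student's inputs in a single-row DataFrame with the training column names."""
    return pd.DataFrame([{
        "gender": gender,
        "race/ethnicity": race_ethnicity,
        "parental level of education": parental_level_of_education,
        "lunch": lunch,
        "test preparation course": test_preparation_course,
        "reading score": float(reading_score),
        "writing score": float(writing_score),
    }])


def predict(features, preprocessor_path, model_path):
    """Apply the saved preprocessor, then the saved model, to a feature DataFrame."""
    preprocessor = load_object(preprocessor_path)
    model = load_object(model_path)
    return model.predict(preprocessor.transform(features))
```

Keeping the column names identical to the training data matters: a `ColumnTransformer` fitted on a DataFrame selects columns by name, so a renamed field would fail at `transform` time.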
Flask App Creation:
- A Flask web application with a user interface is created to predict a student's math score.
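A stripped-down sketch of the Flask side, assuming a POST form whose field names are hypothetical (the project's HTML form is not shown). For brevity it returns JSON instead of rendering a page, and it takes the prediction function as a parameter so the route can be exercised without pickle files; `create_app` and `FORM_TO_COLUMN` are illustrative names.

```python
from flask import Flask, jsonify, request

# Hypothetical form field names mapped to the dataset's column names.
FORM_TO_COLUMN = {
    "gender": "gender",
    "race_ethnicity": "race/ethnicity",
    "parental_level_of_education": "parental level of education",
    "lunch": "lunch",
    "test_preparation_course": "test preparation course",
    "reading_score": "reading score",
    "writing_score": "writing score",
}


def create_app(predict_fn):
    """Build a Flask app whose /predict route passes form inputs to `predict_fn`."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict_route():
        # Collect form inputs under the dataset's column names.
        features = {col: request.form.get(field) for field, col in FORM_TO_COLUMN.items()}
        # Form values arrive as strings; the two score inputs are numeric.
        features["reading score"] = float(features["reading score"])
        features["writing score"] = float(features["writing score"])
        score = predict_fn(features)
        return jsonify({"math_score": float(score)})

    return app
```

In a full app, `predict_fn` would wrap the DataFrame conversion and pickle-loading steps of the prediction pipeline, and `create_app(...).run()` would start Flask's development server.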
Link: EDA Notebook
Link: Model Training Notebook
