This repository records all my machine learning practices and projects at Brandeis University.
The Regression.Rmd, Classification.Rmd, and Clustering.Rmd files contain the code for the Diabetes Diagnosis Machine Learning Project.
- Extracted diabetes diagnosis data from an NIH web database, including glucose level, body mass index (BMI), age, blood pressure, skin thickness, insulin, and diabetes pedigree function.
- Cleaned the data, performed feature engineering, and examined the correlation matrix to detect multicollinearity among features; used the K-means clustering algorithm to uncover the inherent structure and patterns in the data.
- Developed predictive models using linear regression, logistic regression, k-nearest neighbors (KNN), and decision trees, validated with k-fold cross-validation.
- Compared each model against the baseline linear model by evaluating accuracy through confusion matrices, and improved recall to 90.3%.
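The multicollinearity check described above amounts to computing pairwise correlations and flagging feature pairs that move together too closely. The project's own code is in R Markdown (where base R's `cor()` would typically be used); the following is only a minimal Python sketch of the idea. The helper names `pearson` and `flag_collinear`, the 0.8 cutoff, and the toy columns are all hypothetical and are not taken from the project.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined for a constant column
    return cov / math.sqrt(var_a * var_b)


def flag_collinear(columns, threshold=0.8):
    """List feature pairs whose absolute correlation meets the (assumed) threshold."""
    flagged = []
    for (name1, col1), (name2, col2) in combinations(columns.items(), 2):
        r = pearson(col1, col2)
        if abs(r) >= threshold:  # nan compares False, so constant columns are skipped
            flagged.append((name1, name2, round(r, 3)))
    return flagged


if __name__ == "__main__":
    # Made-up columns: x2 is a linear function of x1, x3 is only weakly related.
    columns = {
        "x1": [1, 2, 3, 4, 5],
        "x2": [3, 5, 7, 9, 11],
        "x3": [5, 1, 4, 2, 3],
    }
    print(flag_collinear(columns))  # [('x1', 'x2', 1.0)]
```

A flagged pair is a candidate for dropping or combining one of the two features before fitting a linear model, since near-duplicate predictors make coefficient estimates unstable.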
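K-fold cross-validation, mentioned above alongside KNN, splits the rows into k folds, trains on k − 1 of them, scores on the held-out fold, and averages the k scores. This is a minimal pure-Python sketch of that procedure wrapped around a plain Euclidean KNN classifier; it is not the project's R code. The function names, the choice of k = 5, three neighbors, and the toy 2-D points are assumptions for illustration only.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and split them into k nearly equal folds (assumes k <= n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Predict by majority vote among the n_neighbors nearest training points (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda row: math.dist(x, row[0]))
    votes = [label for _, label in nearest[:n_neighbors]]
    return max(set(votes), key=votes.count)


def cross_validate_accuracy(X, y, k=5, n_neighbors=3, seed=0):
    """Mean held-out accuracy of KNN over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(
            knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up, well-separated 2-D points; not the NIH diabetes data.
    X = [(0.1 * i, 0.2 * i) for i in range(10)] + [(5 + 0.1 * i, 5 + 0.2 * i) for i in range(10)]
    y = [0] * 10 + [1] * 10
    print(cross_validate_accuracy(X, y, k=5))  # 1.0
```

Averaging over folds gives a less optimistic estimate than scoring on the training data, and the same loop can wrap any of the other models (logistic regression, decision tree) by swapping out `knn_predict`.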
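The accuracy and recall figures in the last bullet both come from a binary confusion matrix. This sketch, again in Python rather than the project's R, shows how the four counts are tallied and how accuracy, recall, and precision follow from them. The labels are invented for illustration (1 = diabetic, 0 = not diabetic) and do not reproduce the project's 90.3% result; `confusion_counts` and `summarize` are hypothetical helper names.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (tp, fp, fn, tn) for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(y_true, y_pred, positive=1):
    """Derive accuracy, recall, and precision from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Recall (sensitivity): share of actual positives the model catches.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Precision: share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


if __name__ == "__main__":
    # Made-up labels for illustration; 1 = diabetic, 0 = not diabetic.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
    print(confusion_counts(y_true, y_pred))  # (4, 2, 1, 3)
    print(summarize(y_true, y_pred))
```

Here accuracy is 0.7 while recall is 0.8, which illustrates why the two can diverge. In a screening setting a missed diabetic case (false negative) is usually the costlier error, which is a common reason to track recall alongside accuracy.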