Alcoholic Classification Using Body Signals

Author: Sai Prathyusha Kanisetti
Institution: George Washington University
Instructor: Prof. David W. Trott
Course: Machine Learning (CSCI 6364)
Date: 05/08/2024

Project Overview

Alcohol consumption poses significant public health challenges and is linked to a range of physical and mental health disorders. This project uses machine learning to classify individuals as drinkers or non-drinkers from body signal data, with the aim of improving understanding of drinking behavior and informing potential health interventions.

Dataset

The dataset, sourced from Kaggle, contains:

Observations: 991,346 (after cleaning: 906,676)
Features: 24 (22 numerical, 2 categorical)
Key metrics include hemoglobin, glucose levels, height, weight, etc.

Preprocessing:

Duplicate removal (26 entries)
Outlier analysis and removal:
- Waistline (e.g., 999.0)
- Cholesterol levels (e.g., HDL 8110, LDL 5119)
Feature engineering (e.g., identifying blindness from eyesight metrics)
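
The preprocessing steps above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the column names (`waistline`, `HDL_chole`, `LDL_chole`, `sight_left`, `sight_right`), the outlier cut-offs, and the assumption that a sight value of 9.9 encodes blindness are all hypothetical and should be checked against the real dataset.

```python
import pandas as pd


def clean_body_signals(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicates, filter implausible values, and flag blindness."""
    out = df.drop_duplicates()

    # Illustrative cut-offs (assumptions): they only need to exclude
    # placeholder-like values such as waistline 999.0 or HDL 8110.
    plausible = (
        (out["waistline"] < 500)
        & (out["HDL_chole"] < 1000)
        & (out["LDL_chole"] < 1000)
    )
    out = out.loc[plausible].copy()

    # Assumed encoding: 9.9 in either eyesight column marks blindness.
    out["blind"] = (
        (out["sight_left"] == 9.9) | (out["sight_right"] == 9.9)
    ).astype(int)
    return out.reset_index(drop=True)


# Tiny invented sample: one duplicate row, two outlier rows, one blind eye.
sample = pd.DataFrame({
    "waistline": [80.0, 80.0, 999.0, 75.0, 90.0],
    "HDL_chole": [55.0, 55.0, 60.0, 8110.0, 48.0],
    "LDL_chole": [110.0, 110.0, 120.0, 100.0, 130.0],
    "sight_left": [1.0, 1.0, 0.8, 1.2, 9.9],
    "sight_right": [1.0, 1.0, 0.9, 1.0, 0.7],
})
cleaned = clean_body_signals(sample)
print(cleaned)
```

Filtering with explicit thresholds keeps the rule easy to audit; a percentile- or IQR-based rule would be a reasonable alternative if the cut-offs need to adapt to the data.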

Exploratory Data Analysis (EDA)

Class Distribution: Balanced between drinkers and non-drinkers.
Gender Analysis: Male participants are more likely to drink than females.
Age Trends: Younger and middle-aged individuals consume more alcohol.
Smoking-Alcohol Correlation: Non-smokers exhibit higher alcohol consumption.
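
Summaries like the ones above can be computed with simple group-by aggregations. The sketch below uses a handful of invented rows purely to show the mechanics; the column names (`sex`, `age`, `DRK_YN`) and the age-band boundaries are assumptions, and the printed rates are not results from the project.

```python
import pandas as pd

# Invented toy rows, only to demonstrate the aggregation pattern.
toy = pd.DataFrame({
    "sex": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "age": [25, 40, 30, 65, 70, 45],
    "DRK_YN": ["Y", "Y", "N", "N", "N", "Y"],
})

# Encode the target as 0/1 so that a mean is a drinking rate.
toy["drinker"] = (toy["DRK_YN"] == "Y").astype(int)

# Class distribution as proportions.
class_share = toy["DRK_YN"].value_counts(normalize=True)

# Drinking rate by sex.
rate_by_sex = toy.groupby("sex")["drinker"].mean()

# Drinking rate by age band (band edges are an assumption).
toy["age_band"] = pd.cut(
    toy["age"], bins=[0, 35, 55, 120], labels=["young", "middle", "older"]
)
rate_by_age = toy.groupby("age_band", observed=True)["drinker"].mean()

print(class_share, rate_by_sex, rate_by_age, sep="\n\n")
```

The same pattern applies to smoking status by grouping on the smoking column instead of `sex`.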

SMART Questions

Which age group is most habituated to drinking alcohol?
Does every individual who drinks also smoke?
Is there a significant impact of alcohol on eyesight?
Does regular alcohol consumption affect the liver?

Model Building

Three algorithms were employed:

Random Forest Classifier
Gradient Boosting
XGBoost
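
As a rough sketch of how the three classifiers might be trained side by side, the example below fits each on synthetic data from scikit-learn's `make_classification`. The hyperparameters, the train/test split, and the synthetic data are assumptions for illustration; the project's notebook may differ. XGBoost is added only if the `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the body signal features (assumption).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# XGBoost is an optional third-party dependency.
try:
    from xgboost import XGBClassifier

    models["xgboost"] = XGBClassifier(n_estimators=50, random_state=0)
except ImportError:
    pass

# Fit each model and record held-out accuracy.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)

for name, acc in accuracy.items():
    print(f"{name}: {acc:.3f}")
```

Keeping the models in a dictionary makes it straightforward to run the same fit-and-score loop over every candidate.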

Key Features

Set 1: Gamma-GTP, HDL-cholesterol, age, smoking status, etc.
Set 2: Added variables like left/right eyesight, triglycerides, and hemoglobin.

Results

XGBoost outperformed Random Forest and Gradient Boosting in both predictive accuracy and efficiency.
Cross-validation yielded a mean score of ~0.735 with minimal variance.
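
A cross-validated mean and spread of this kind can be obtained with scikit-learn's `cross_val_score`. The sketch below is an illustration under assumptions: the source does not state the number of folds, the scoring metric, or which model the reported score belongs to, so 5-fold stratified accuracy on synthetic data with a gradient boosting model is used here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic data standing in for the cleaned dataset (assumption).
X, y = make_classification(n_samples=400, n_features=8, random_state=1)

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(
    GradientBoostingClassifier(random_state=1), X, y, cv=cv, scoring="accuracy"
)

# A low standard deviation across folds indicates stable performance.
print(f"mean={scores.mean():.3f} std={scores.std():.3f}")
```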

Conclusion

XGBoost proved most effective due to its:

High accuracy
Resource efficiency
Versatility in handling noisy and incomplete datasets

The insights derived from this project underscore the critical health impacts of alcohol and highlight the value of machine learning in public health research.

Files and Usage

Data: Dataset from Kaggle (not included here)
Scripts: Preprocessing, EDA, and model training files
Models: Trained models for Random Forest, Gradient Boosting, and XGBoost

Instructions

Preprocess the dataset:
- Remove duplicates and outliers.
- Engineer relevant features.
Perform EDA to gain insights into class distributions and trends.
Train models using the scripts provided.
Evaluate models with metrics like F1-score and ROC-AUC curves.
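
The evaluation step can be sketched as follows, using scikit-learn's `f1_score`, `roc_auc_score`, and `roc_curve`. The model, the synthetic data, and the split are assumptions chosen only to make the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=2
)

model = RandomForestClassifier(n_estimators=50, random_state=2)
model.fit(X_train, y_train)

# F1 uses hard class predictions; ROC-AUC uses positive-class probabilities.
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]

f1 = f1_score(y_test, pred)
auc = roc_auc_score(y_test, proba)

# Points of the ROC curve, ready to plot as fpr (x) against tpr (y).
fpr, tpr, thresholds = roc_curve(y_test, proba)

print(f"F1={f1:.3f} ROC-AUC={auc:.3f}")
```

Reporting both metrics is useful because F1 depends on the chosen decision threshold while ROC-AUC summarizes ranking quality across all thresholds.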

Future Work

Extend analysis to other health metrics.
Explore deep learning models for feature representation.
Implement real-time predictions in a healthcare setting.
