Health-check

Project Overview

This project analyzes the Postpartum Depression (PPD) dataset (THP_clean.csv) to explore demographic, medical, and social factors linked to PPD and to build a machine learning model that predicts risk. The dataset schema (datasetschema.txt) was also used, because some column names are encoded.

The dataset includes:

Demographics: age, marital status, education, employment, parity (number of children)

Medical history: previous depression, HAMD baseline score, birth complications

Social support: MSPSS baseline score

Target: Postpartum depression (yes/no)

Methodology

Data Loading: Loaded the dataset (THP_clean.csv) into a Pandas DataFrame and checked its shape, column names, and data types.
Data Cleaning: Removed duplicate rows, standardized column names (lowercase, underscores), and filled missing values: numeric columns with the median, categorical columns with the mode (most frequent value).
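
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name clean_dataframe and the toy columns are hypothetical, and the small frame stands in for the result of reading THP_clean.csv.

```python
import numpy as np
import pandas as pd


def clean_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate rows, normalise column names, and impute missing values."""
    df = df.drop_duplicates().copy()
    # Lowercase the names and replace runs of whitespace with underscores.
    df.columns = (
        df.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
    )
    for col in df.columns:
        if df[col].isna().all():
            continue  # nothing to impute from
        if pd.api.types.is_numeric_dtype(df[col]):
            # Numeric columns: fill with the median.
            df[col] = df[col].fillna(df[col].median())
        else:
            # Categorical columns: fill with the most frequent value.
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    return df


# Made-up rows standing in for pd.read_csv("THP_clean.csv").
raw = pd.DataFrame(
    {
        "Mother Age": [25.0, 31.0, np.nan, 25.0, 40.0],
        "Marital Status": ["married", None, "married", "married", "single"],
    }
)
cleaned = clean_dataframe(raw)
print(cleaned.shape)
print(cleaned.dtypes)
```

Duplicates are dropped before imputation here so that repeated rows do not skew the median or mode; the notebook may order these steps differently.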
Exploratory Data Analysis (EDA) & Visualizations

Twenty key research questions were asked and answered. Visualizations used: bar plots, boxplots, pie chart, histogram, and heatmap.

Does age group influence PPD?
Does marital status affect PPD?
Does social support reduce PPD risk?
What is the depression rate by parental education level?
What percentage of mothers experienced PPD?
Do birth complications increase PPD risk?
Does the number of children (parity) affect PPD?
How do social relationships influence PPD?
Does employment status affect PPD?
What is the effect of the mother's HAMD severity score on the WPPSI variables?
How does PPD affect children's social behaviour, as measured by SCAS and SDQ?
What is the ratio of depressed to non-depressed mothers among those with no health issues?
How does the presence of external family members influence PPD?
Does the number of children born influence PPD?
Does the living condition influence PPD?
What is the financial influence on PPD?
How does BMI influence the mother's physical and mental health?
What is the effect of the Home on PPD?
Is PPD influenced by practicing birth spacing?
What is the mortality rate of children?
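
Most of the questions above reduce to comparing the PPD rate across the levels of one variable. A minimal sketch of that pattern, assuming a binary target column named depressed and a hypothetical marital_status column (the values are made up, not taken from THP_clean.csv):

```python
import pandas as pd

# Toy data; the real analysis would use the cleaned dataset instead.
toy = pd.DataFrame(
    {
        "marital_status": ["married", "married", "single", "single", "married", "single"],
        "depressed": [0, 1, 1, 1, 0, 0],
    }
)

# Overall share of mothers with PPD, as a percentage.
overall_rate = toy["depressed"].mean() * 100

# PPD rate per group: the mean of a 0/1 column is the proportion of 1s.
rate_by_group = toy.groupby("marital_status")["depressed"].mean().mul(100).round(1)

print(f"Overall PPD rate: {overall_rate:.1f}%")
print(rate_by_group)
```

With matplotlib installed, rate_by_group.plot(kind="bar") would give the kind of bar plot listed among the visualizations.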
Predictive Modeling: Preprocessed categorical variables using label encoding. Defined the features (X) and target (y = depressed). Split the dataset into training and testing sets (80/20). Trained a Random Forest classifier. Evaluated it with accuracy, precision, recall, and F1-score (classification report) and a confusion matrix.
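
The modeling steps above might look roughly like this with scikit-learn. It is a sketch on synthetic data: the feature names, the label-generating rule, the random seeds, and the stratified split are all assumptions made for illustration, not details confirmed by the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for the cleaned dataset (hypothetical columns).
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame(
    {
        "mspss_baseline": rng.normal(4.0, 1.0, n),
        "hamd_baseline": rng.normal(12.0, 4.0, n),
        "marital_status": rng.choice(["married", "single"], n),
    }
)
# Made-up rule: higher HAMD and lower support raise the chance of a positive label.
score = toy["hamd_baseline"] - 2 * toy["mspss_baseline"] + rng.normal(0, 2, n)
toy["depressed"] = (score > 4).astype(int)

# Label-encode every non-numeric feature column.
X = toy.drop(columns="depressed")
for col in X.select_dtypes(exclude="number").columns:
    X[col] = LabelEncoder().fit_transform(X[col])
y = toy["depressed"]

# 80/20 split; stratify keeps the class balance similar in both parts (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall, and F1 per class, plus overall accuracy.
print(classification_report(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
```

For a screening use case, recall on the depressed class is usually the figure to watch, since a missed case costs more than a false alarm.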
Feature Importance: Extracted feature importances from the Random Forest model. Identified the top 10 predictors of PPD (e.g., social support score, HAMD score, marital status, birth complications).

Results & Insights

Mothers with lower social support had a higher risk of PPD.
History of depression and higher HAMD scores were strong predictors.
Marital status and birth complications were also linked to increased risk.
The Random Forest model achieved good predictive performance, showing potential for risk screening.
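
The feature-importance step can be sketched as below. The data is synthetic and the column names are hypothetical; the point is only to show how a fitted Random Forest's feature_importances_ attribute can be ranked to obtain a top-10 list.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic features: two that drive the label and one that is pure noise.
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame(
    {
        "mspss_baseline": rng.normal(4.0, 1.0, n),
        "hamd_baseline": rng.normal(12.0, 4.0, n),
        "noise_feature": rng.normal(0.0, 1.0, n),
    }
)
# Made-up labelling rule, used only so the model has something to learn.
y = (X["hamd_baseline"] - 2 * X["mspss_baseline"] > 4).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Impurity-based importances, one per feature, summing to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

# head(10) returns at most ten rows, so it also works with fewer features.
top_10 = importances.head(10)
print(top_10)
```

Impurity-based importances can favour high-cardinality or continuous features, so a ranking like this is best read as a rough guide rather than a causal statement.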

How to Run

Open the Jupyter Notebook PPD_Analysis.ipynb.
Run all cells step by step.
Upload THP_clean.csv when prompted.
Visualizations and model results will display inline.

Libraries Used

pandas (data handling)
numpy (numerical processing)
matplotlib & seaborn (visualization)
scikit-learn (machine learning)
