Pregnancy Prediction Analysis

This project uses logistic regression to predict pregnancy status from purchasing behavior. The model reaches 84.5% accuracy using features that capture shopping habits and lifestyle changes.

Installation

Clone the repository:

git clone https://github.com/jfm56/regression_analysis.git
cd regression_analysis

Install required dependencies:

pip install -r requirements.txt

Required dependencies:

pandas==2.1.0
numpy==1.24.3
scikit-learn==1.3.0
openpyxl==3.1.2
matplotlib==3.7.1
seaborn==0.12.2

Development dependencies:

pytest==7.4.0
pylint==2.17.5
pytest-cov==4.1.0
black==23.7.0

Usage

Data Format

The analysis expects an Excel file (target.xlsx) with the following columns:

Implied Gender
Home/Apt/PO Box
Pregnancy Test
Birth Control
Feminine Hygiene
Folic Acid
Prenatal Vitamins
Prenatal Yoga
Body Pillow
Ginger Ale
Sea Bands
Stopped buying ciggies
Cigarettes
Smoking Cessation
Stopped buying wine
Wine
Maternity Clothes
Pregnant (target variable, 0 or 1)

An example file target.xlsx is included in the repository.
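Before running the analysis on your own spreadsheet, it can help to confirm that every expected column is present. The following is a minimal sketch, not part of the project's code: the names `EXPECTED_COLUMNS` and `missing_columns` are hypothetical, and the column names are copied from the list above.

```python
from typing import Iterable

# Column names copied from the list above; "Pregnant" is the target variable.
EXPECTED_COLUMNS = [
    "Implied Gender", "Home/Apt/PO Box", "Pregnancy Test", "Birth Control",
    "Feminine Hygiene", "Folic Acid", "Prenatal Vitamins", "Prenatal Yoga",
    "Body Pillow", "Ginger Ale", "Sea Bands", "Stopped buying ciggies",
    "Cigarettes", "Smoking Cessation", "Stopped buying wine", "Wine",
    "Maternity Clothes", "Pregnant",
]


def missing_columns(columns: Iterable[str]) -> list[str]:
    """Return the expected columns that are absent, in the documented order."""
    # Strip stray whitespace, which is common in spreadsheet headers.
    present = {str(name).strip() for name in columns}
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

With pandas installed, `missing_columns(pd.read_excel("target.xlsx").columns)` would return an empty list for any file whose header row matches the list above.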

Running the Analysis

Place your data file (Excel format) in the project directory
Run the analysis:

python -m regression_analysis.regression

The script will generate three visualization files:

correlation_heatmap.png: Shows correlations between features
scatter_plots.png: Displays relationships between key features and pregnancy
feature_importance.png: Shows the importance of each feature

Analysis Results

Model Performance

Accuracy: 84.50%
Precision (Pregnant): 92%
Recall (Pregnant): 74%
F1-Score (Pregnant): 82%
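The F1-score is the harmonic mean of precision and recall, so the reported figures can be cross-checked against each other. A short sketch of that arithmetic (the function name `f1_score` here is a local helper, not the project's code):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported values for the "Pregnant" class.
f1 = f1_score(precision=0.92, recall=0.74)
print(f"F1 = {f1:.2f}")  # prints "F1 = 0.82"
```

The result agrees with the reported 82%, so the three figures are mutually consistent.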

Key Findings

Strongest Positive Indicators of Pregnancy:

Folic Acid (2.94)
Prenatal Vitamins (2.22)
Pregnancy Test (1.96)
Maternity Clothes (1.72)
Ginger Ale (1.41)

Strongest Negative Indicators:

Birth Control (-2.03)
Feminine Hygiene (-1.73)
Wine (-1.29)
Cigarettes (-1.25)
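The README does not say what the numbers in parentheses are. Assuming they are logistic regression coefficients (log-odds), exponentiating each one gives an odds ratio: the factor by which the odds of pregnancy change for a one-unit increase in that feature, with the others held fixed. The sketch below applies that assumption to the values listed above; the interpretation also depends on how the features were encoded or scaled, which is not stated here.

```python
import math

# Values copied from the two lists above.
reported = {
    "Folic Acid": 2.94,
    "Prenatal Vitamins": 2.22,
    "Pregnancy Test": 1.96,
    "Maternity Clothes": 1.72,
    "Ginger Ale": 1.41,
    "Birth Control": -2.03,
    "Feminine Hygiene": -1.73,
    "Wine": -1.29,
    "Cigarettes": -1.25,
}

# Assumption: each value is a log-odds coefficient, so exp() yields an odds ratio.
odds_ratios = {name: math.exp(coef) for name, coef in reported.items()}

# Print from the strongest positive to the strongest negative indicator.
for name, ratio in sorted(odds_ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<18} {ratio:6.2f}")
```

Under this reading, a ratio above 1 raises the odds of pregnancy and a ratio below 1 lowers them.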

Visualization Explanation

Scatter Plots

The scatter plots show the relationships between six key features and pregnancy status:

X-axis: Feature value
Y-axis: Pregnancy status (0 = Not Pregnant, 1 = Pregnant)
Red trend lines indicate the direction and strength of relationships
Upward trends suggest positive correlation with pregnancy
Downward trends suggest negative correlation with pregnancy

Key observations:

Folic Acid and Prenatal Vitamins show strong positive correlations
Birth Control and Wine show strong negative correlations
The spread of points around each trend line indicates how consistent that relationship is

Feature Importance

The model identifies purchasing patterns that are most predictive of pregnancy:

Health supplements (Folic Acid, Prenatal Vitamins) are the strongest positive indicators
Contraceptives and lifestyle products (Birth Control, Wine, Cigarettes) are strong negative indicators
Changes in purchasing behavior (stopping wine/cigarettes) are moderately strong indicators

Development Setup

Local Development

Install development dependencies:

pip install -r requirements.txt

Run tests:

pytest

Check code quality:

pylint regression_analysis/regression.py tests/*.py

Check test coverage:

pytest --cov=regression_analysis --cov-report=html

The test suite includes:

Model accuracy testing with realistic data
Feature importance validation
Error handling tests
90% code coverage

Docker Development

Using Pre-built Image

docker pull jmullen029/regression_analysis:latest
docker run jmullen029/regression_analysis:latest

Local Development with Docker

Build and run using Docker:

docker-compose up --build

Run tests in Docker:

docker-compose run regression pytest

The Docker image is automatically built and published to Docker Hub on every push to the main branch.

Notes

The model uses logistic regression for binary classification (Pregnant/Not Pregnant)
Categorical features are encoded with label encoding
The dataset is split 80/20 for training and testing
Results include both positive and negative predictors for comprehensive analysis
A comprehensive pytest test suite is included
Docker support provides a consistent development environment
Code quality maintained with pylint
Test coverage tracked with pytest-cov
Automated dependency updates with Dependabot
- Weekly checks for Python packages
- Weekly checks for GitHub Actions
- Weekly checks for Docker base images
- Auto-merges patch updates
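
The modeling steps listed in these notes (label encoding, an 80/20 split, logistic regression) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the code in `regression_analysis/regression.py`: the function name `fit_model`, the `seed` parameter, and the `max_iter` setting are all hypothetical choices.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

TARGET = "Pregnant"  # target column named in the Data Format section


def fit_model(df: pd.DataFrame, seed: int = 42):
    """Label-encode non-numeric columns, split 80/20, fit logistic regression.

    Returns the fitted model and its accuracy on the 20% test split.
    """
    features = df.drop(columns=[TARGET]).copy()
    for col in features.columns:
        # Only text-like columns need encoding; numeric 0/1 flags pass through.
        if not pd.api.types.is_numeric_dtype(features[col]):
            features[col] = LabelEncoder().fit_transform(features[col].astype(str))

    X_train, X_test, y_train, y_test = train_test_split(
        features, df[TARGET], test_size=0.2, random_state=seed
    )
    model = LogisticRegression(max_iter=1000)  # max_iter is an assumed setting
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)
```

Calling `fit_model(pd.read_excel("target.xlsx"))` would return a fitted model whose `coef_` attribute holds one coefficient per feature. The accuracy obtained this way depends on the random split and need not match the figure reported above.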

