Medical Data Analyzer

This tool supports the analysis and visualization of medical datasets in Python. The project was created and tested in Jupyter Notebook.

Features

Data Cleaning: Handles missing values and outliers in medical datasets.
Exploratory Data Analysis (EDA): Generates summary statistics, correlations, and visualizations.
Custom Visualizations: Creates detailed plots such as histograms, heatmaps, and categorical plots.
Correlation Analysis: Calculates and visualizes correlation matrices to identify relationships between variables.
Data Validation: Filters out inconsistent records, such as those where diastolic pressure is higher than systolic pressure.
Disease Analysis: Compares patients with and without cardiovascular disease.

Parameters

The analysis is based on the following parameters:

id
age
sex
height
weight
ap_hi (systolic blood pressure)
ap_lo (diastolic blood pressure)
cholesterol
gluc (glucose level)
smoke
alco (alcohol consumption)
active (physical activity)
cardio (cardiovascular disease indicator)
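Before running any analysis, it can help to confirm that a loaded file actually contains every parameter listed above. The snippet below is a minimal sketch, assuming the CSV header uses exactly these column names; the helper `missing_columns` and the constant `EXPECTED_COLUMNS` are hypothetical and not part of the notebook.

```python
import pandas as pd

# Column names from the parameter list; assumed to match the CSV header exactly.
EXPECTED_COLUMNS = [
    "id", "age", "sex", "height", "weight", "ap_hi", "ap_lo",
    "cholesterol", "gluc", "smoke", "alco", "active", "cardio",
]


def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected columns absent from df, in list order (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]
```

Usage: `missing_columns(pd.read_csv('medical_examination.csv'))` returns an empty list when every parameter is present, and otherwise names the columns to fix before continuing.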

Getting Started

Prerequisites

Ensure you have the following installed on your system:

Python 3.7+
Jupyter Notebook

Required Python libraries:

pandas
numpy
matplotlib
seaborn
scikit-learn

Install these libraries using pip:

```
pip install pandas numpy matplotlib seaborn scikit-learn
```

Installation

Clone the repository:

```
git clone https://github.com/AyobamiMichael/medicaldata_analysis.git
```

Navigate to the project directory:
```
cd medicaldata_analysis
```
Start Jupyter Notebook from the project directory:
```
jupyter notebook
```
In the Jupyter interface, open the main analysis notebook, medical_data_visualizer.ipynb.

Usage

Load Data: Read your medical dataset from a .csv file.
Clean Data:
- Remove records where diastolic pressure (ap_lo) is higher than systolic pressure (ap_hi).
- Remove or handle other inconsistent or missing data as needed.
Run Analysis:
- Perform exploratory data analysis to examine relationships between parameters.
- Calculate the correlation matrix.
- Generate a heatmap to visualize correlations.
- Compare patients with and without cardiovascular disease (cardio).
Save Results: Export cleaned datasets and visualizations.
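The cleaning steps above can be combined into a single function. This is a minimal sketch, not the notebook's code: the function name `clean_medical_data` is hypothetical, dropping every row with a missing value is one possible handling strategy, and trimming height and weight to the 2.5th to 97.5th percentile range is an assumed outlier rule.

```python
import pandas as pd


def clean_medical_data(df: pd.DataFrame, lower_q: float = 0.025, upper_q: float = 0.975) -> pd.DataFrame:
    """Drop missing values, impossible blood pressures, and height/weight outliers.

    The percentile cut-offs are an assumption, not necessarily the
    thresholds used in the notebook.
    """
    # 1. Drop rows containing any missing value.
    df = df.dropna()

    # 2. Keep only rows where diastolic pressure does not exceed systolic.
    df = df[df["ap_lo"] <= df["ap_hi"]]

    # 3. Keep rows whose height and weight both fall inside the percentile bounds
    #    (bounds are computed on the data remaining after steps 1 and 2).
    mask = pd.Series(True, index=df.index)
    for col in ("height", "weight"):
        low, high = df[col].quantile(lower_q), df[col].quantile(upper_q)
        mask &= df[col].between(low, high)
    return df[mask]
```

Passing `lower_q=0.0, upper_q=1.0` disables the outlier step, which is useful for checking how many rows the first two rules alone remove.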

Example

Here's a snippet of the data cleaning and analysis process:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
data = pd.read_csv('medical_examination.csv')

# Filter out invalid data (diastolic > systolic)
data = data[data['ap_lo'] <= data['ap_hi']]

# Calculate the correlation matrix over numeric columns only
corr_matrix = data.select_dtypes(include='number').corr()

# Generate a heatmap; a larger figure keeps the annotated values legible
plt.figure(figsize=(12, 10))
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Matrix Heatmap')
plt.show()
```
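The disease-analysis step, comparing patients with and without cardiovascular disease, can be sketched by counting the values of each lifestyle feature within each `cardio` group. This is a minimal sketch, assuming `cardio`, `smoke`, `alco`, and `active` are coded as 0/1 (with `cardio` = 1 meaning disease present); the function name `counts_by_cardio` is hypothetical and the code is not taken from the notebook.

```python
import pandas as pd


def counts_by_cardio(df: pd.DataFrame, features=("smoke", "alco", "active")) -> pd.DataFrame:
    """Count each (cardio, feature, value) combination in long format."""
    # Reshape to one row per (patient, feature) with columns: cardio, variable, value.
    long_df = pd.melt(df, id_vars=["cardio"], value_vars=list(features))
    # Count rows per group and name the count column "total".
    return (
        long_df.groupby(["cardio", "variable", "value"])
        .size()
        .reset_index(name="total")
    )
```

The resulting table can be plotted with one panel per `cardio` value, for example `sns.catplot(data=counts_by_cardio(data), x="variable", y="total", hue="value", col="cardio", kind="bar")` followed by `plt.show()`.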

Contributing

Contributions are welcome! Follow these steps to contribute:

Fork the repository.
Create a new branch:
```
git checkout -b feature-name
```
Commit your changes:
```
git commit -m "Add new feature"
```
Push to the branch:
```
git push origin feature-name
```
Submit a pull request.

License

This project is licensed under the MIT License.

Contact

For any inquiries or feedback, please contact:

Author: Ayobami Michael Opefeyijimi
Email: ayobamiwealth@gmail.com

Thank you.
