Fetal-Health-Classification-with-Pyspark

Overview

This project analyzes fetal health data using Apache Spark to demonstrate data processing and machine learning techniques. The analysis covers data preprocessing, sampling methods for class imbalance, decision tree pruning, and a comparison with a Random Forest model.

Data Source

The data is loaded into a Spark DataFrame from a table named fetal_health. The schema of the data is printed at the beginning of the analysis to understand the structure.
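
As a minimal sketch of this loading step, the helper below reads a table into a DataFrame and prints its schema. The function name `load_fetal_health` is hypothetical, and the sketch assumes the `fetal_health` table is already registered in the Spark catalog; the notebook may load the data differently.

```python
def load_fetal_health(spark, table_name="fetal_health"):
    """Read a catalog table into a Spark DataFrame and print its schema.

    `spark` is an active SparkSession. The default table name comes from
    this README; the helper itself is only an illustration.
    """
    df = spark.table(table_name)  # look the table up in the session catalog
    df.printSchema()              # column names and types, printed to stdout
    return df
```

With PySpark installed, a session can be created with `SparkSession.builder.appName("fetal-health").getOrCreate()` and passed to the helper as `load_fetal_health(spark)`.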

Methods

The following methods are applied in the analysis:

Oversampling: To address potential class imbalance, oversampling techniques are tested.
Random Sampling: Random sampling methods are explored for creating balanced datasets.
Decision Tree Pruning: Decision tree models are pruned to optimize performance.
Random Forest: A Random Forest model is also trained so that its results can be compared with the decision tree.
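
The two sampling methods above can be sketched as follows. This is one possible approach, not necessarily the one used in the notebook: per-class sampling fractions are derived from the class counts, then each class is resampled with `DataFrame.sample`. The helper names (`class_counts`, `oversampling_fractions`, `undersampling_fractions`, `resample`) and the seed are assumptions made for illustration; only the `fetal_health` label column comes from this README.

```python
from functools import reduce


def class_counts(df, label_col="fetal_health"):
    """Collect the number of rows per class into a plain dict."""
    rows = df.groupBy(label_col).count().collect()
    return {row[label_col]: row["count"] for row in rows}


def oversampling_fractions(counts):
    """Fraction per class that brings it up to the size of the largest class."""
    largest = max(counts.values())
    return {label: largest / n for label, n in counts.items()}


def undersampling_fractions(counts):
    """Fraction per class that brings it down to the size of the smallest class."""
    smallest = min(counts.values())
    return {label: smallest / n for label, n in counts.items()}


def resample(df, fractions, label_col="fetal_health", seed=42):
    """Resample each class of a Spark DataFrame by its own fraction.

    Fractions above 1.0 need sampling with replacement. Spark's sample()
    is approximate, so class sizes end up near the target, not exactly on it.
    """
    parts = []
    for label, fraction in fractions.items():
        subset = df.filter(df[label_col] == label)
        if fraction != 1.0:
            subset = subset.sample(
                withReplacement=fraction > 1.0, fraction=fraction, seed=seed
            )
        parts.append(subset)
    return reduce(lambda a, b: a.unionByName(b), parts)
```

A typical call would be `resample(train_df, oversampling_fractions(class_counts(train_df)))`, where `train_df` is a hypothetical training split. Resampling only the training split keeps duplicated rows out of the evaluation data.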

Results

Results from each method are documented in the notebook. The notebook includes:

Counts of observations by fetal_health category.
Comparisons of model performance with different sampling techniques.
Evaluation of decision tree performance before and after pruning.
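
A comparison of this kind could be set up roughly as in the sketch below. Spark ML has no separate post-pruning step, so the sketch assumes "pruning" means constraining tree growth through `maxDepth`, `minInstancesPerNode`, and `minInfoGain`; the notebook may do it differently. All parameter values, the `features` vector column, the F1 metric, and the helper names are illustrative assumptions, and the label column is assumed to hold non-negative numeric class values.

```python
# Illustrative growth limits for the "pruned" tree; values are assumptions.
PRUNED_TREE_PARAMS = {"maxDepth": 5, "minInstancesPerNode": 10, "minInfoGain": 0.01}


def build_models(label_col="fetal_health", features_col="features", seed=42):
    """Create an unconstrained tree, a constrained tree, and a Random Forest."""
    from pyspark.ml.classification import (
        DecisionTreeClassifier,
        RandomForestClassifier,
    )

    common = dict(labelCol=label_col, featuresCol=features_col, seed=seed)
    return {
        # maxDepth=30 is the largest depth Spark allows
        "tree_unpruned": DecisionTreeClassifier(maxDepth=30, **common),
        "tree_pruned": DecisionTreeClassifier(**PRUNED_TREE_PARAMS, **common),
        "random_forest": RandomForestClassifier(numTrees=50, **common),
    }


def evaluate(models, train_df, test_df, label_col="fetal_health"):
    """Fit each model on train_df and return its F1 score on test_df."""
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    evaluator = MulticlassClassificationEvaluator(
        labelCol=label_col, predictionCol="prediction", metricName="f1"
    )
    return {
        name: evaluator.evaluate(model.fit(train_df).transform(test_df))
        for name, model in models.items()
    }


def rank_models(scores):
    """Sort (name, score) pairs from best to worst score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_models(evaluate(build_models(), train_df, test_df))` on hypothetical train and test splits lists the three models from best to worst. F1 is used here rather than accuracy because accuracy can look high on imbalanced classes even when minority classes are predicted poorly.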

Usage

To run this analysis, you will need:

An Apache Spark environment.
Access to the fetal_health table or a similar dataset.

Requirements

PySpark
Additional Python libraries used in the analysis.

Acknowledgements

Ayres de Campos et al. (2000) SisPorto 2.0: A Program for Automated Analysis of Cardiotocograms. J Matern Fetal Med 5:311-318

Maryamahmadii/Fetal-Health-Classification-with-Pyspark
