sachelsout/EDA-haberman-dataset

EDA-haberman-dataset

Exploratory Data Analysis (EDA) on the Haberman Dataset.

Exploratory Data Analysis is the process of performing initial investigations on data in order to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.
Haberman Dataset - The dataset contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer.
Kaggle Link - https://www.kaggle.com/gilsousa/habermans-survival-data-set

The types of plots explored in this project are: 2D Scatter Plot, 3D Scatter Plot, Pair Plots, Histogram, PDF, CDF, Box Plot, Violin Plot, Contour Plot.
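
As a rough illustration of the PDF and CDF plots listed above, the sketch below bins a feature into equal-width bins, turns the counts into per-bin fractions (an empirical PDF), and accumulates them into a CDF. The function name `pdf_cdf` and the sample values are hypothetical and are not taken from the project notebook; it assumes the values are not all identical.

```python
from itertools import accumulate


def pdf_cdf(values, bins):
    """Return (pdf, cdf) as per-bin fractions over equal-width bins.

    Hypothetical helper for illustration; assumes max(values) > min(values).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls into the last bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    pdf = [c / total for c in counts]   # fraction of points in each bin
    cdf = list(accumulate(pdf))         # running sum of those fractions
    return pdf, cdf


# Made-up sample values, not real Haberman data
sample = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
pdf, cdf = pdf_cdf(sample, bins=5)
```

Plotting `pdf` and `cdf` against the bin edges gives curves of the kind the project explores; reading the CDF at a bin tells what fraction of patients fall at or below that value.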
Mean, Variance, Standard Deviation, Median, Percentiles, Quantiles, Interquartile Range, and Median Absolute Deviation are also calculated in this project.
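
A minimal sketch of how these statistics could be computed with Python's standard `statistics` module is shown below. The function name `summarize` and the sample values are hypothetical, the population variance is an assumed choice, and the project notebook may well use a different library or quantile convention.

```python
import statistics


def summarize(values):
    """Return basic summary statistics for a list of numbers (illustrative)."""
    med = statistics.median(values)
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Median absolute deviation: median of |x - median|
    mad = statistics.median(abs(v - med) for v in values)
    return {
        "mean": statistics.mean(values),
        "variance": statistics.pvariance(values),  # population variance (assumed)
        "std": statistics.pstdev(values),
        "median": med,
        # Ninth of the nine decile cut points, i.e. the 90th percentile
        "p90": statistics.quantiles(values, n=10, method="inclusive")[8],
        "iqr": q3 - q1,
        "mad": mad,
    }


# Made-up node counts, not real Haberman data
stats = summarize([0, 0, 1, 1, 2, 3, 4, 8, 13, 22])
```

The median, IQR, and MAD are less sensitive to a few extreme values than the mean and standard deviation, which is why both sets are worth comparing on a skewed feature.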

Conclusion -

1. From the data analysis, I conclude that the given dataset is not linearly separable.
2. The dataset is imbalanced: there are more Class 1 data points than Class 2 data points.
3. The number of nodes is the most important feature for determining the survival of the patients.
4. The features 'age' and 'years' provide little value in determining the survival status of the patient, although 'age' was slightly more informative. These features may come in handy if more data is provided.
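
The imbalance noted in point 2 can be checked by counting how often each class label appears. The following is only a sketch: `class_balance` is a hypothetical helper and the labels are made up, not the real dataset.

```python
from collections import Counter


def class_balance(labels):
    """Return the fraction of data points belonging to each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Made-up labels for illustration (1 and 2 mirror the dataset's class names)
fractions = class_balance([1, 1, 1, 2])
```

Fractions far from an even split indicate an imbalanced dataset, which matters when judging how well any feature separates the classes.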
