Exploratory Data Analysis (EDA) on the Haberman Dataset
Exploratory Data Analysis is the process of performing initial investigations on data in order to discover patterns, spot anomalies, test hypotheses and check assumptions with the help of summary statistics and graphical representations.
Haberman Dataset - The dataset contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer.
Kaggle Link - https://www.kaggle.com/gilsousa/habermans-survival-data-set
The plots explored in this project are: 2D Scatter Plot, 3D Scatter Plot, Pair Plot, Histogram, PDF (Probability Density Function), CDF (Cumulative Distribution Function), Box Plot, Violin Plot and Contour Plot.
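The PDF and CDF plots are usually drawn with a plotting library, but the numbers behind them are simple to compute. Below is a minimal sketch using only the Python standard library, assuming the feature values are a plain list of numbers and equal-width bins are used. The function names `histogram_pdf` and `empirical_cdf` are hypothetical and not taken from the original project.

```python
from bisect import bisect_right


def histogram_pdf(values, bins=10):
    """Return (bin_edges, densities) for an equal-width histogram.

    Densities are normalised so that sum(density * bin_width) == 1,
    which makes the histogram an approximation of the PDF.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    densities = [c / (n * width) for c in counts]
    return edges, densities


def empirical_cdf(values):
    """Return a function F(x) giving the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n
```

Accumulating `density * bin_width` over the bins gives a histogram-based approximation of the CDF, which is a common way to draw the PDF and CDF curves side by side.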
Mean, Variance, Standard Deviation, Median, Percentiles, Quantiles, Interquartile Range (IQR) and Median Absolute Deviation (MAD) are also calculated and explored in this project.
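The statistics listed above can be sketched with Python's built-in `statistics` module. This is an illustrative version, not the project's own code: it assumes a plain list of numbers, uses the population variance, and computes quantiles with linear interpolation (the `"inclusive"` method, which matches NumPy's default percentile). The helper names `summary_stats` and `percentile` are hypothetical.

```python
import statistics


def percentile(values, p):
    """p-th percentile (integer 1 <= p <= 99), linear interpolation."""
    return statistics.quantiles(values, n=100, method="inclusive")[p - 1]


def summary_stats(values):
    """Mean, variance, std, median, quartiles, IQR and MAD of a list."""
    med = statistics.median(values)
    # Quartile cut points Q1, Q2 (= median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        # Population variance (ddof=0, like numpy.var); note that
        # pandas Series.var defaults to the sample variance (ddof=1).
        "variance": statistics.pvariance(values),
        "std": statistics.pstdev(values),
        "median": med,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        # Median Absolute Deviation: median of |x - median(x)|, unscaled.
        "mad": statistics.median([abs(v - med) for v in values]),
    }
```

The median and MAD are far less sensitive to a few extreme values than the mean and standard deviation, which matters for a skewed feature such as a count of nodes.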
2. The dataset is imbalanced: there are more Class 1 data points (patient survived 5 years or longer) than Class 2 data points (patient died within 5 years).
3. The number of positive axillary nodes is the most important feature for determining the survival of the patients.
4. The features 'age' and 'years' add little value in determining the survival status of a patient, though 'age' is slightly more informative. These features may become useful if more data is available.
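Observations 2 and 3 can be checked numerically. A minimal sketch, assuming the data is already loaded as a list of `(age, year, nodes, status)` tuples with status 1 or 2 in the last position; the function names and the toy rows below are hypothetical illustrations, not Haberman results. A large gap between the per-class medians of a feature is a rough sign that the feature helps separate the classes.

```python
import statistics
from collections import Counter


def class_balance(rows):
    """Fraction of rows for each survival status (last column)."""
    counts = Counter(row[-1] for row in rows)
    total = sum(counts.values())
    return {status: counts[status] / total for status in sorted(counts)}


def median_by_class(rows, column):
    """Median of one feature column, computed separately per status."""
    groups = {}
    for row in rows:
        groups.setdefault(row[-1], []).append(row[column])
    return {s: statistics.median(v) for s, v in sorted(groups.items())}


# Toy rows for illustration only: (age, year, nodes, status).
rows = [(30, 64, 1, 1), (34, 59, 0, 1), (45, 60, 0, 1),
        (52, 65, 13, 2), (41, 60, 23, 2), (60, 61, 1, 1)]
print(median_by_class(rows, 2))  # nodes column -> {1: 0.5, 2: 18.0}
```

With an imbalanced dataset, a classifier that always predicts the majority class already reaches an accuracy equal to that class's fraction, so `max(class_balance(rows).values())` is a useful baseline to beat.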