- The aim of this project is to analyze the Adult dataset and classify an individual's annual income into one of two categories, <=50K or >50K (binary classification). Several classifiers have been implemented and their performance compared to find the best model for this dataset.
- Data obtained from: https://archive.ics.uci.edu/ml/datasets/Adult
- Since the full Adult dataset was more data than I could handle on my computer, I performed the data analysis and classification on one third of the data, drawn by random sampling. This subset is provided in the data folder.
- The analysis and implementation were done in Python 3.6.
- The versions of the external libraries used are:
  - NumPy == 1.16.4
  - pandas == 0.24.2
  - Matplotlib == 3.0.3
  - scikit-learn == 0.21.1
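
Because the task is framed as binary classification on the UCI Adult files, here is a minimal sketch of turning one raw record into a feature dict and a 0/1 target. It assumes the 15 comma-separated columns described in adult.names, `?` for missing values, and the trailing period that adult.test adds to its labels. The label column is named `income` here by assumption, and `ADULT_COLUMNS`, `parse_adult_line` and `income_to_label` are hypothetical names, not functions from this project. All values are kept as strings.

```python
# Hypothetical helpers for parsing raw UCI Adult records (not this project's code).
ADULT_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",  # label column; the name "income" is an assumption
]


def parse_adult_line(line):
    """Parse one comma-separated record; return None for blank or non-record lines."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != len(ADULT_COLUMNS):
        return None  # skips blank lines and any line that is not a full record
    # Missing values are encoded as "?" in the raw files; map them to None.
    return {name: (None if value == "?" else value)
            for name, value in zip(ADULT_COLUMNS, fields)}


def income_to_label(income):
    """Map the income category to a binary target: 1 for >50K, 0 for <=50K."""
    income = income.rstrip(".")  # adult.test labels carry a trailing period
    if income == ">50K":
        return 1
    if income == "<=50K":
        return 0
    raise ValueError(f"unexpected income value: {income!r}")
```

Stripping the trailing period keeps the training and test labels consistent, so both files map onto the same two classes.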
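
The one-third subset in the data folder came from random sampling, but the exact procedure is not stated. The sketch below therefore assumes simple random sampling without replacement with a fixed seed. `sample_fraction` and the seed values are hypothetical. With pandas, `DataFrame.sample(frac=1/3, random_state=0)` draws a comparable sample.

```python
import random


def sample_fraction(rows, frac, seed=0):
    """Draw a simple random sample (without replacement) of about frac * len(rows) rows.

    A seeded random.Random makes the same subset come out on every run,
    and the original row order is preserved in the result.
    """
    if not 0 < frac <= 1:
        raise ValueError("frac must be in (0, 1]")
    rng = random.Random(seed)
    k = round(len(rows) * frac)  # number of rows to keep
    chosen = sorted(rng.sample(range(len(rows)), k))  # distinct indices, in order
    return [rows[i] for i in chosen]
```

Fixing the seed is what makes the subset reproducible. Without it, each run would analyze a different third of the data.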
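
To reproduce the environment listed above, a small script can compare the installed library versions, and the Python version, against these pins. This is a sketch: `PINNED` and `check_environment` are hypothetical names. It assumes each package exposes a `__version__` attribute, and note that scikit-learn is imported as `sklearn`.

```python
import importlib
import sys

# Versions pinned in this README, keyed by import name (scikit-learn imports as "sklearn").
PINNED = {
    "numpy": "1.16.4",
    "pandas": "0.24.2",
    "matplotlib": "3.0.3",
    "sklearn": "0.21.1",
}


def check_environment(pinned=PINNED):
    """Return {module: (expected, installed_or_None)} for every mismatched or missing package."""
    mismatches = {}
    for module_name, expected in pinned.items():
        try:
            installed = getattr(importlib.import_module(module_name), "__version__", None)
        except ImportError:
            installed = None  # package not installed at all
        if installed != expected:
            mismatches[module_name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 6):
        print("Note: the analysis was run on Python 3.6; found", sys.version.split()[0])
    for name, (expected, installed) in check_environment().items():
        print(f"{name}: expected {expected}, found {installed}")
```

Run as a script, it prints one line per mismatched or missing package and nothing for packages that match.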