This project implements predictive models that classify adult income based on U.S. Census Bureau data. The goal is to predict whether a person earns more than $50K per year using machine learning techniques.

Project files:

| File | Description |
|---|---|
| `FinalProject.ipynb` | Jupyter Notebook with complete code and outputs |
| `README.md` | Project overview and usage instructions |
| `adult-dataset.csv` | The dataset, extracted from the Census Bureau database found at: http://www.census.gov/ftp/pub/DES/www/welcome.html |

The dataset contains demographic and employment-related attributes for U.S. adults. The target variable is `Income`, which takes one of two values: `<=50K` or `>50K`.

Features include:
- Age (int)
- Work Class (categorical)
- Education (categorical)
- Marital Status (categorical)
- Occupation (categorical)
- Race (categorical)
- Sex (binary)
- Hours per week (int)
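
A minimal sketch of loading the data with pandas and mapping the target to a 0/1 label. It assumes the CSV has a header row and an `Income` column holding `<=50K` / `>50K`; the exact column names in `adult-dataset.csv` are an assumption, and the helper names `load_adult` and `encode_target` are hypothetical. The two in-memory rows are made-up illustrative values, not taken from the dataset.

```python
import io

import pandas as pd


def load_adult(source) -> pd.DataFrame:
    """Read the census CSV; `source` may be a path or a file-like object.

    skipinitialspace=True strips the space that may follow each comma
    (an assumption about this file's formatting).
    """
    return pd.read_csv(source, skipinitialspace=True)


def encode_target(df: pd.DataFrame, target: str = "Income") -> pd.Series:
    """Map '>50K' to 1 and '<=50K' to 0 (hypothetical helper).

    Whitespace and a trailing '.' are stripped first, since some copies
    of the census data write labels as '>50K.'.
    """
    labels = df[target].astype(str).str.strip().str.rstrip(".")
    return (labels == ">50K").astype(int)


# Hypothetical toy rows standing in for adult-dataset.csv.
toy_csv = io.StringIO(
    "Age,Work Class,Hours per week,Income\n"
    "39, State-gov, 40, <=50K\n"
    "52, Self-emp-not-inc, 45, >50K\n"
)
df = load_adult(toy_csv)
y = encode_target(df)
print(y.tolist())  # [0, 1]
```
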

Preprocessing steps:
- Handled missing data by removing or imputing invalid entries
- Converted categorical variables using one-hot encoding
- Removed irrelevant columns
- Ensured numeric data for model compatibility
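
The preprocessing steps listed above could be sketched with pandas as follows. This assumes missing values are encoded as `?` (as in the UCI release of the census data) and that incomplete rows are dropped rather than imputed; the `preprocess` helper, the toy rows, and the choice of `fnlwgt` as the irrelevant column are hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, target: str = "Income", drop_cols=()):
    """Hypothetical helper returning (features, 0/1 target)."""
    # Treat '?' placeholders as missing, then drop incomplete rows.
    df = df.replace("?", np.nan).dropna()
    # Remove columns judged irrelevant (list supplied by the caller).
    df = df.drop(columns=list(drop_cols), errors="ignore")
    y = (df[target].str.strip() == ">50K").astype(int)
    X = df.drop(columns=[target])
    # One-hot encode the non-numeric columns, then cast everything to float
    # so every column is numeric for the models.
    X = pd.get_dummies(X).astype(float)
    return X, y


# Made-up toy rows; the second has a missing Work Class.
toy = pd.DataFrame({
    "Age": [39, 50, 38],
    "Work Class": ["State-gov", "?", "Private"],
    "Sex": ["Male", "Female", "Male"],
    "fnlwgt": [77516, 83311, 215646],
    "Income": ["<=50K", ">50K", ">50K"],
})
X, y = preprocess(toy, drop_cols=["fnlwgt"])
print(list(X.columns))
# ['Age', 'Work Class_Private', 'Work Class_State-gov', 'Sex_Male']
```
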

Dimensionality reduction: applied both SVD and PCA to reduce the dataset's dimensionality.
- Explained variance: the retained PCA components capture more than 90% of the variance.
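
A sketch of both reductions with scikit-learn, assuming the features have already been one-hot encoded into a numeric matrix. Passing a float such as `0.90` as `n_components` (with `svd_solver="full"`) makes `PCA` keep the smallest number of components whose cumulative explained variance exceeds that fraction; `TruncatedSVD` needs an explicit count, and the value used here (like the `reduce_dimensions` helper itself) is a hypothetical choice.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.preprocessing import StandardScaler


def reduce_dimensions(X, variance=0.90, svd_components=10, random_state=0):
    """Hypothetical helper returning (X_pca, pca, X_svd, svd)."""
    # Standardize so no single column dominates the variance.
    X_scaled = StandardScaler().fit_transform(X)
    # Float n_components: keep enough components to explain > `variance`.
    pca = PCA(n_components=variance, svd_solver="full")
    X_pca = pca.fit_transform(X_scaled)
    # TruncatedSVD takes a fixed component count and, unlike PCA,
    # does not center the data itself.
    svd = TruncatedSVD(n_components=svd_components, random_state=random_state)
    X_svd = svd.fit_transform(X_scaled)
    return X_pca, pca, X_svd, svd


# Random stand-in data in place of the encoded census features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
X_pca, pca, X_svd, svd = reduce_dimensions(X, svd_components=5)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```

After fitting, `pca.n_components_` reports how many components were kept to pass the 90% threshold.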
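
Models:
- MLP: built with `MLPClassifier` and evaluated using a confusion matrix and a classification report.
- Logistic Regression: applied using scikit-learn; serves as a good baseline model.
- Naïve Bayes: a fast and interpretable model.
- K-Means: unsupervised clustering with `Income` excluded from the inputs.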
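
The four models could be set up roughly as below. This is a sketch, not the notebook's code: the MLP layer sizes, the use of `GaussianNB` as the Naïve Bayes variant, `n_clusters=2` for K-Means, the hypothetical `fit_models` helper, and the synthetic `make_classification` data standing in for the preprocessed census features are all assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def fit_models(X_train, y_train, random_state=0):
    """Fit the three classifiers, plus K-Means on the features alone."""
    supervised = {
        "MLP": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                             random_state=random_state),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Naive Bayes": GaussianNB(),
    }
    for model in supervised.values():
        model.fit(X_train, y_train)
    # K-Means never sees the target: Income is excluded from its input.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=random_state)
    kmeans.fit(X_train)
    return supervised, kmeans


# Synthetic stand-in for the preprocessed census features.
X, y = make_classification(n_samples=600, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models, kmeans = fit_models(X_train, y_train)
mlp_pred = models["MLP"].predict(X_test)
print(confusion_matrix(y_test, mlp_pred))
print(classification_report(y_test, mlp_pred, target_names=["<=50K", ">50K"]))
```
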

Model comparison:

| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| MLP | ✅ High | ✅ High | ✅ High | ✅ High |
| Logistic Regression | Moderate | Moderate | Moderate | Moderate |
| Naïve Bayes | Moderate | Lower | Moderate | Moderate |
| K-Means (unsupervised) | - | - | - | - |
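
To put numbers behind a qualitative table like this, the four metrics can be computed per model with `sklearn.metrics`. A minimal sketch, assuming binary labels with `>50K` encoded as 1 (so precision, recall, and F1 refer to the high-income class) and synthetic stand-in data; the `metrics_table` helper is hypothetical, and K-Means is left out because its cluster IDs are not class predictions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def metrics_table(models, X_test, y_test) -> pd.DataFrame:
    """One row per fitted classifier, one column per metric."""
    rows = {}
    for name, model in models.items():
        pred = model.predict(X_test)
        rows[name] = {
            "Accuracy": accuracy_score(y_test, pred),
            # Binary metrics for the positive class (label 1, i.e. >50K).
            "Precision": precision_score(y_test, pred, zero_division=0),
            "Recall": recall_score(y_test, pred, zero_division=0),
            "F1 Score": f1_score(y_test, pred, zero_division=0),
        }
    return pd.DataFrame.from_dict(rows, orient="index").round(3)


# Synthetic stand-in data; real use would pass the census train/test split.
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000).fit(X_train, y_train),
    "Naive Bayes": GaussianNB().fit(X_train, y_train),
}
table = metrics_table(models, X_test, y_test)
print(table)
```
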

Install the required Python packages with `pip install numpy pandas matplotlib seaborn scikit-learn`.