This project analyzes phone usage patterns in India and predicts the primary use of mobile devices based on various features. The notebook covers data preprocessing, exploratory data analysis (EDA), and model training using multiple classification algorithms.
The dataset contains the following features:
- Categorical features:
  - Gender
  - Location
  - Phone Brand
  - Operating System (OS)
  - Primary Use
- Numerical features:
  - Various usage metrics (unnamed in this draft)
- Dropped irrelevant features (e.g., User ID).
- Label encoded categorical variables.
- Normalized numerical features using `MinMaxScaler`.
- Split data into training and testing sets (80/20 split).
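The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function name `preprocess` is hypothetical, and the column names `User ID` and `Primary Use` are assumed to match the dataset's headers exactly. The list above scales before splitting; this sketch instead fits the scaler on the training split only, a common way to keep test-set statistics out of training, and that ordering is a choice of the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler


def preprocess(df, target="Primary Use", id_cols=("User ID",)):
    """Drop ID columns, label-encode categoricals, split 80/20, scale numerics.

    `target` and `id_cols` are assumed column names, not confirmed ones.
    """
    # Drop identifier columns that carry no predictive signal.
    df = df.drop(columns=[c for c in id_cols if c in df.columns])

    # Numeric feature columns get scaled; every other column is label-encoded.
    features = df.drop(columns=[target])
    numeric_cols = list(features.select_dtypes(include="number").columns)
    categorical_cols = [c for c in df.columns if c not in numeric_cols]

    for col in categorical_cols:
        df[col] = LabelEncoder().fit_transform(df[col])

    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Fit the scaler on the training split only, then reuse it on the test split.
    X_train = X_train.copy()
    X_test = X_test.copy()
    if numeric_cols:
        scaler = MinMaxScaler()
        X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
        X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])
    return X_train, X_test, y_train, y_test
```

With the real dataset this might be called as `preprocess(pd.read_csv("phone_usage_india.csv"))`, provided the column names match the assumptions above.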
- Visualized distributions of categorical features.
- Explored relationships between usage patterns and the target variable (Primary Use).
- Analyzed numerical feature distributions and their relationships with categorical variables using violin plots.
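A violin plot of one numerical feature grouped by a categorical one could look something like the sketch below. It uses plain matplotlib; the notebook may well use a different plotting library, and the helper name `violin_by_category` is made up for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd


def violin_by_category(df, numeric_col, category_col, ax=None):
    """Draw one violin per category showing the distribution of numeric_col."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))

    # One array of values per category; groupby sorts the category labels.
    groups = df.groupby(category_col)[numeric_col]
    labels = [str(name) for name, _ in groups]
    data = [values.to_numpy() for _, values in groups]

    ax.violinplot(data, showmedians=True)
    # violinplot places the violins at x = 1..n by default.
    ax.set_xticks(range(1, len(labels) + 1))
    ax.set_xticklabels(labels)
    ax.set_xlabel(category_col)
    ax.set_ylabel(numeric_col)
    return ax
```

Calling it once per numerical column, with `Primary Use` as the category, gives a quick view of how each usage metric differs across the target classes.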
Implemented and evaluated the following classifiers:
- Random Forest
- AdaBoost
- Gradient Boosting
- Extra Trees
- Logistic Regression
- Support Vector Machine
- XGBoost
- LightGBM
- Accuracy Score
- Confusion Matrix
- Classification Report
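As a rough sketch of how several classifiers can be trained and scored with the three metrics above, the loop below uses only the scikit-learn estimators from the list. XGBoost and LightGBM are left out to keep the example dependency-light; both ship scikit-learn-compatible classes (`XGBClassifier`, `LGBMClassifier`) that could be added to the same dictionary. The helper names `build_classifiers` and `evaluate`, and the hyperparameters shown, are assumptions for illustration, not the notebook's settings.

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.svm import SVC


def build_classifiers(random_state=42):
    """Return a name -> estimator mapping with mostly default settings."""
    return {
        "Random Forest": RandomForestClassifier(random_state=random_state),
        "AdaBoost": AdaBoostClassifier(random_state=random_state),
        "Gradient Boosting": GradientBoostingClassifier(random_state=random_state),
        "Extra Trees": ExtraTreesClassifier(random_state=random_state),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Support Vector Machine": SVC(),
    }


def evaluate(classifiers, X_train, X_test, y_train, y_test):
    """Fit each classifier and collect accuracy, confusion matrix, and report."""
    results = {}
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "confusion_matrix": confusion_matrix(y_test, y_pred),
            # zero_division=0 avoids warnings for classes never predicted.
            "report": classification_report(y_test, y_pred, zero_division=0),
        }
    return results
```

`evaluate(build_classifiers(), X_train, X_test, y_train, y_test)` returns one entry per model, so the accuracies can be compared side by side.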
- Bar plots to compare model accuracy.
- Heatmaps for confusion matrices of each classifier.
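The two plot types above could be produced along these lines. This is a matplotlib-only sketch under the assumption that accuracies are available as a name-to-score mapping and each confusion matrix as a 2D array; the function names are hypothetical and the notebook's own plotting code may differ.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_accuracy_bars(accuracies, ax=None):
    """Bar plot of accuracy per model; `accuracies` maps model name -> score."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))
    names = list(accuracies)
    ax.bar(names, [accuracies[n] for n in names])
    ax.set_ylim(0, 1)
    ax.set_ylabel("Accuracy")
    ax.set_title("Model accuracy comparison")
    ax.tick_params(axis="x", labelrotation=45)
    return ax


def plot_confusion_heatmap(cm, class_names, title="Confusion matrix", ax=None):
    """Heatmap of a confusion matrix with the count written in each cell."""
    cm = np.asarray(cm)
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 4))
    image = ax.imshow(cm, cmap="Blues")
    ax.figure.colorbar(image, ax=ax)
    ax.set_xticks(range(len(class_names)))
    ax.set_xticklabels(class_names)
    ax.set_yticks(range(len(class_names)))
    ax.set_yticklabels(class_names)
    ax.set_xlabel("Predicted")
    ax.set_ylabel("Actual")
    ax.set_title(title)
    # Rows are actual classes, columns are predicted classes.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, str(cm[i, j]), ha="center", va="center")
    return ax
```

Calling `plot_confusion_heatmap` once per classifier, with the model name as `title`, yields one heatmap for each model.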
- Python 3.x
- Required libraries: listed in requirements.txt
- Clone this repository.
- Install the dependencies using:
pip install -r requirements.txt
- Ensure the dataset (`phone_usage_india.csv`) is in the project directory.
- Open and run the Jupyter Notebook using:
jupyter notebook "Phone Usage in India .ipynb"
The notebook provides:
- Insights into the primary uses of phones in India.
- Performance comparison of multiple machine learning models for predicting primary phone usage.
- Visualizations for better interpretability of results.
- Dataset: The data was sourced from a study on phone usage patterns.
- Libraries: Special thanks to contributors of Python libraries used in this project.
This project is licensed under the MIT License. See the LICENSE file for details.