Your task is to conduct a comprehensive analysis of credit card customers using the provided dataset. This assignment is designed to assess your proficiency in data manipulation, visualization, customer segmentation, and predictive modeling, as well as your ability to use Git for version control.
Dataset
You will be working with the "Credit Card Customers" dataset available on Kaggle. This dataset includes information on customers' age, salary, marital status, credit card limit, credit card category, and more.
Dataset Link: Credit Card Customers
Please download the dataset directly from Kaggle and include it in your project repository in a dedicated data folder.
Tasks
Data Understanding: Load the dataset and perform an initial exploration to understand its structure, identify missing values, and gather basic statistics.
Preprocessing: Clean the dataset by handling missing values, outliers, and any erroneous data points. Document your decisions.
Customer Demographics Analysis: Analyze the demographics of the credit card holders (e.g., age, salary, marital status) and visualize the distributions.
Credit Usage Analysis: Explore how different demographics correlate with credit card limit, balance, and category. Identify any interesting patterns.
Segmentation Model: Use clustering techniques (e.g., K-Means) to segment the customers based on their credit card usage and demographic data. Determine the optimal number of clusters.
Segment Analysis: Analyze each customer segment to identify unique behaviors and characteristics. Provide actionable insights for targeted marketing strategies.
Churn Prediction: Build a predictive model to forecast customer churn based on the features available in the dataset. Experiment with at least two different algorithms and compare their performance.
Model Evaluation: Assess the models using appropriate performance metrics. Discuss the strengths and weaknesses of each model.
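As an illustration of the Data Understanding and Preprocessing tasks, the sketch below counts missing-looking values per column and flags outliers with Tukey's IQR rule. It is a minimal, dependency-free example and not a prescribed approach: the sample rows, the column names (customer_age, marital_status, credit_limit), and the set of placeholder tokens treated as missing are all assumptions made for illustration and are not taken from the Kaggle file.

```python
import csv
import io
import statistics

# Tiny made-up sample; column names and values are illustrative assumptions,
# not taken from the actual dataset.
SAMPLE_CSV = """customer_age,marital_status,credit_limit
45,Married,12000
38,Single,3500
52,Unknown,4200
41,Married,
29,Single,2800
47,Married,3900
36,Single,98000
"""

# Assumed placeholders for missing data; check what the real file uses.
MISSING_TOKENS = {"", "Unknown", "NA"}


def count_missing(rows):
    """Return {column: number of rows whose value looks missing}."""
    counts = {col: 0 for col in rows[0]}
    for row in rows:
        for col, value in row.items():
            if value.strip() in MISSING_TOKENS:
                counts[col] += 1
    return counts


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing = count_missing(rows)

# Only parse credit limits that are actually present.
limits = [
    float(r["credit_limit"])
    for r in rows
    if r["credit_limit"].strip() not in MISSING_TOKENS
]
outliers = iqr_outliers(limits)

print("missing per column:", missing)
print("credit_limit outliers:", outliers)
```

Whether a flagged value is an error or a legitimate high-limit customer is a judgment call, which is why the Preprocessing task asks you to document your decisions.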
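For the Segmentation Model task, one common way to choose the number of clusters is the elbow method: run K-Means for several values of k and look for the point where the within-cluster sum of squares (inertia) stops dropping sharply. The sketch below is a from-scratch illustration on made-up 2-D points (standing in for two already-scaled features). The deterministic farthest-point initialisation is a simplification chosen so the example is reproducible; it is not the standard K-Means initialisation.

```python
# Made-up, already-scaled 2-D points forming three visually separate groups.
POINTS = [
    (1, 1), (1, 2), (2, 1), (2, 2),
    (8, 8), (8, 9), (9, 8), (9, 9),
    (1, 8), (2, 9), (1, 9), (2, 8),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point initialisation (an illustrative simplification)."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the final within-cluster sum of squares."""
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


inertias = {k: kmeans_inertia(POINTS, k) for k in range(1, 5)}
for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.2f}")
```

On this toy data the inertia falls steeply up to k = 3 and changes little afterward, which is the "elbow". On the real dataset, a library implementation such as scikit-learn's KMeans (which exposes the same quantity as inertia_) would be the more usual choice, and a second criterion such as the silhouette score can help when the elbow is ambiguous.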
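For the Churn Prediction and Model Evaluation tasks, note that accuracy alone can be misleading when churners are a minority of customers. The sketch below compares two hypothetical sets of predictions using accuracy, precision, recall, and F1. The labels and predictions are made up (1 = churned, 0 = stayed), and the function name evaluate is an illustrative choice, not part of any library.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = churned)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Report 0.0 when a metric is undefined (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up ground truth: 2 churners out of 10 customers.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Hypothetical model A never predicts churn.
pred_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical model B catches both churners but raises two false alarms.
pred_b = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

report_a = evaluate(y_true, pred_a)
report_b = evaluate(y_true, pred_b)

for name, report in (("A", report_a), ("B", report_b)):
    print(name, {metric: round(value, 3) for metric, value in report.items()})
```

Both hypothetical models reach the same accuracy, yet only model B identifies any churners, so recall and F1 separate them where accuracy does not. With scikit-learn, the equivalent metrics are available in sklearn.metrics (for example precision_score, recall_score, and f1_score).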
Repository Requirements
Create a new GitHub repository specifically for this assignment.
Make sure the repository is public to allow evaluation.
Include a README.md file that provides an overview of your project, how to run your code, and a summary of your findings and insights.
Organize your repository with clear directories for the dataset, scripts/notebooks, and any additional resources used or created.
Commit your changes with clear and descriptive messages. Demonstrate effective use of version control throughout the project.
Evaluation Criteria
Problem Solving: Your approach to preprocessing, analyzing, and modeling the data.
Code Quality: Readability, structure, and documentation of your code.
Git Usage: Frequency and clarity of commits, including branching and merging practices.
Insights and Recommendations: Depth of the insights drawn from the analysis and the practicality of your recommendations.
Model Performance: Accuracy and robustness of your predictive models.