I undertook this data science project to improve my skills and expand my knowledge. It covers several key stages, including data cleaning, data visualization, exploratory data analysis (EDA), decision tree classification, and sentiment analysis, each applied to a different dataset:
- EDA on FIFA Dataset: Conducted exploratory data analysis to visualize the distribution of key player attributes such as stamina, skills, age, potential, and penalties using histograms.
- Correlation Analysis on Student Performance Dataset: Performed data preprocessing and cleaning followed by correlation analysis using Seaborn to explore relationships between various performance metrics through heatmaps, pairplots, relplots, distplots, and catplots.
- Behavioral Analysis using Bank Dataset: Cleaned the dataset, removed unwanted columns, applied label encoding to categorical values, and utilized a decision tree classifier to predict customer product uptake and analyze behavioral patterns.
- Sentiment Analysis on Twitter Data: Conducted sentiment analysis to classify tweets as positive or negative using logistic regression, including text preprocessing steps like removing stop words, followed by visualization of results with countplots and word clouds.
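
As an illustration of the text preprocessing mentioned in the Twitter task, here is a minimal sketch of stop-word removal using only the Python standard library. The function name `preprocess_tweet`, the tiny `STOP_WORDS` set, and the URL/mention stripping are assumptions made for this example; the notebooks may use a different stop-word list (for example from NLTK) and different cleaning steps.

```python
import re
from typing import List

# Hypothetical, deliberately small stop-word list for illustration only;
# the notebooks may rely on a fuller list from a library such as NLTK.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "it", "this", "i"}


def preprocess_tweet(text: str) -> List[str]:
    """Lowercase a tweet, strip URLs, mentions and punctuation, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    return [token for token in text.split() if token not in STOP_WORDS]


tokens = preprocess_tweet("@user This is the BEST day! https://example.com #happy")
print(tokens)  # ['best', 'day', 'happy']
```

The resulting tokens could then be vectorized and passed to a classifier such as logistic regression, as described in the task above.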
This repository contains the following files:
- notebooks/: Directory containing the Jupyter notebooks for all 4 tasks.
- README.md: File providing an overview of the project.
Together, the four tasks apply exploratory data analysis, correlation analysis, machine learning, and sentiment analysis to multiple datasets. The analysis provides a deeper understanding of player attributes in the FIFA dataset, student performance correlations, customer behavior in the bank dataset, and sentiment trends in Twitter data.