Repository containing a portfolio of data-science projects I completed for academic and self-learning purposes.
Note: Data used in the projects (provided in the respective folders) is for demonstration purposes only.
-
911 Calls - Exploratory Analysis : Exploratory Data Analysis of the 911 calls dataset hosted on Kaggle. Demonstrates extraction of useful features from different variables.
-
Airplane Crash Analysis : Exploratory data analysis to find interesting trends and behaviours in the airplane crash dataset.
-
Election Poll Data Analysis : Simple analysis of 2016 US General Election Poll data.
dataset link : http://elections.huffingtonpost.com/pollster/2016-general-election-trump-vs-clinton.csv
-
Netflix Original & IMDB Scores : Analysis performed in search of interesting correlations between the scores and other facets of the data.
-
TPS June 2021 Analysis : Performing basic exploratory data analysis using visualization packages like Seaborn & Plotly for insight generation on the dataset.
-
Titanic Analysis : Exploratory analysis of the passengers on board the RMS Titanic using Pandas and Seaborn visualizations.
Tools: Seaborn, Plotly, Matplotlib etc.
-
Flight Price Prediction (END-to-END) : A model to predict flight ticket prices using various statistical analysis tools & regression techniques.
-
Advance-House Price Prediction : A model to predict the value of a given house in the real estate market using various statistical analysis tools & regression techniques.
-
Breast-Cancer Prediction : Testing out several different supervised learning algorithms to build a model that accurately classifies a tumor as malignant or benign.
-
Credit Card Customer Segmentation : Identifying different segments in the existing customers based on their spending patterns as well as past interactions with the bank.
-
Data-Scientist Salary Prediction : Creating a machine-learning model to predict the salary of a data-scientist using various ensemble techniques like Random-Forest & Gradient-Boosting.
-
Diabetes Classification : A binary-classification problem: predicting whether a patient has diabetes based on the features available in the dataset.
-
Diamonds : Performing EDA and predicting the price of diamonds with the help of statistical analysis tools, regression techniques & hyperparameter tuning.
-
Heart Disease Prediction : Finding trends in heart data to predict certain cardiovascular events or any clear indications of heart health.
-
IPL-First-Innings Score Prediction : A model to predict the first innings score in the IPL using various statistical analysis tools & regression techniques.
-
Income-Gender Classification : Creating a machine-learning model to predict whether a person makes <=50k or >50k annually on the basis of available information in the dataset.
-
Laptop Price Prediction : Preparing a machine learning model to predict the price of a laptop given its configurations using various regression techniques.
-
Mall-Customer Segmentation : Analyzing a dataset of mall customers to gain customer insight and work out strategies for increasing sales to each customer segment.
-
Messy-vs-Clean Room : Image classification problem: classifying a room as messy or clean.
-
Real-vs-Fake Jobs : Creating a classification model that uses text-data features and meta-features to predict which job descriptions are fraudulent.
-
Real-vs-Fake News : Developing a machine learning model to detect opinion spam and fake news using text classification.
-
Spam Email Detection : A binary-classification problem: classifying a given email as spam or not spam using various NLP techniques.
-
Student Performance in Exams : Performing EDA & predicting students' marks to understand the influence of the parents' background, test preparation etc. on students' performance.
-
Wine Quality Prediction : A model to predict the quality of wine (from 1 to 10) by analyzing the amounts of various chemicals present in the wine and their effect on its quality.
Tools: scikit-learn, NumPy, Pandas, Seaborn, Matplotlib etc.
-
A/B testing on Advertisement Data
- Involves A/B testing on an advertisement dataset, examining the impact of a new distribution strategy on ad success rates.
- Statistical hypothesis testing is employed to compare control and exposed groups, aiming to determine the effectiveness of the new design strategy.
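The comparison between the control and exposed groups can be sketched with a two-proportion z-test. This is only an illustrative sketch, not necessarily the test used in the project: the function name `two_proportion_z_test` and the counts in the usage lines are hypothetical, and it relies on the Python standard library alone.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one success rate.

    Hypothetical helper for illustration; group A is the control group,
    group B is the exposed group.
    """
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    # Pooled success rate under the null hypothesis of no difference
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts, purely for demonstration
z, p_value = two_proportion_z_test(success_a=200, n_a=1000, success_b=250, n_b=1000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value (for example below a chosen 0.05 threshold) would suggest the success rates of the two groups differ; the threshold itself is a design choice, not something fixed by the test.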
-
Quote Sentiment Analysis using BeautifulSoup : Building an ML model combining Natural Language Processing (NLP) & the BeautifulSoup library to assess the sentiment of the given quotes. Data scraped from: https://www.goodreads.com/quotes
-
Sentiment Analysis - Stock using News Headlines : Creating a machine-learning model to analyze stock-prices using stock news headlines.
-
Spam SMS classification : Using the given dataset to build a prediction model that will accurately classify which texts are spam.
Tools: NLTK, scikit-learn etc.
-
Decision Tree + Random Forest : Using Decision Tree and Random Forest to predict whether a borrower will pay their loan back.
-
KNN : Using KNN to classify instances from a fake dataset into two target classes, while choosing the best value for K using the elbow method.
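The idea of choosing K from an error curve can be sketched as follows. This is a simplified, standard-library-only illustration and not the project's code: the helpers `knn_predict`, `error_rates` and `make_points` are hypothetical, the data is randomly generated, and picking the K with the lowest test error stands in for reading the elbow off a plot.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def error_rates(train_X, train_y, test_X, test_y, k_values):
    """Fraction of misclassified test points for each candidate k."""
    rates = []
    for k in k_values:
        wrong = sum(knn_predict(train_X, train_y, x, k) != y
                    for x, y in zip(test_X, test_y))
        rates.append(wrong / len(test_y))
    return rates


# Synthetic two-class data (made up for this sketch)
rng = random.Random(0)


def make_points(n, center, label):
    return [((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label)
            for _ in range(n)]


data = make_points(100, 0.0, 0) + make_points(100, 2.0, 1)
rng.shuffle(data)
train, test = data[:140], data[140:]
train_X, train_y = zip(*train)
test_X, test_y = zip(*test)

# Odd values of k avoid tied votes between the two classes
k_values = list(range(1, 31, 2))
rates = error_rates(train_X, train_y, test_X, test_y, k_values)
best_k = k_values[rates.index(min(rates))]

for k, rate in zip(k_values, rates):
    print(f"k={k:2d}  error rate={rate:.3f}")
print("k with the lowest error rate:", best_k)
```

In practice the error rates are usually plotted against K, and a value is chosen where the curve flattens out rather than strictly at the minimum.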
-
Linear Regression
- Using Linear Regression to help a company decide whether to focus its efforts on its mobile app experience or its website, depending on which of the two has the greater impact.
- Using Linear Regression to predict the salary of a person based on their years of experience.
-
Logistic Regression : Using Logistic Regression to predict whether an internet user clicked an ad or not.
-
SVM : Using a Support Vector Machine to classify samples from the Iris dataset into their species.
Tools: scikit-learn, NumPy, Pandas, Seaborn, Matplotlib etc.
Do ⭐ the repository, if it inspired you, gave you ideas for your own portfolio or helped you in any way.