Welcome to my data science projects repository! It collects the small to medium-sized data science projects I have worked on, ranging from simple analyses and visualizations to more complex machine learning models. Each project lives in its own folder with all necessary files (data, code, and documentation) and aims to solve a specific problem or explore a particular dataset.
Small projects are quick and concise, often focusing on a single concept or technique. They are great for beginners or for anyone looking to understand the basics of data science.
1- Exploring NYC Public School Test Result Scores
- Description: This project analyzes test result scores from NYC public schools to identify trends and insights that can help improve educational outcomes.
- Technologies: Python, Pandas, Matplotlib, Seaborn
- Link to Project
2- Netflix Data Analysis
- Description: This project analyzes Netflix data to determine whether movies are getting shorter, to identify the most frequent movie duration in the 1990s, and to count the short action movies released in that decade.
- Technologies: Python, Pandas, Matplotlib, Seaborn
- Link to Project
3- Visualizing the History of Nobel Prize Winners
- Description: This project analyzes Nobel Prize data to answer several key questions about the demographics and trends of laureates.
- Technologies: Python, Pandas, Matplotlib, Seaborn
- Link to Project
4- Analyzing Crime in Los Angeles
- Description: This project analyzes Los Angeles crime data to help the Los Angeles Police Department gain insight into crime in the city.
- Technologies: Python, Pandas, Matplotlib, Seaborn
- Link to Project
5- Customer Analytics: Preparing Data for Modeling
- Description: This project transforms a DataFrame called `ds_jobs_transformed` to store the data from `customer_train.csv` much more efficiently. The goal is to optimize data types and filter the dataset based on specific criteria to reduce memory usage.
- Technologies: Python, Pandas, NumPy
- Link to Project
6- Exploring Airbnb Market Trends
- Description: This project investigates the short-term rental market in New York using Airbnb listing data. It involves analyzing the dates of the earliest and most recent reviews, counting the number of private rooms, and calculating the average listing price. The results are combined into a summary DataFrame.
- Technologies: Python, Pandas, Jupyter Notebook, Seaborn, Matplotlib
- Link to Project
7- Modeling Car Insurance Claim Outcomes
- Description: This project aims to build a model to predict whether a customer will make a claim on their car insurance during the policy period.
- Technologies: Python, Pandas, Scikit-learn, statsmodels
- Link to Project
8- Hypothesis Testing with Men's and Women's Soccer Matches
- Description: This project investigates whether more goals are scored in women's international soccer matches than in men's. The analysis focuses on official FIFA World Cup matches since January 1, 2002, and uses statistical hypothesis testing to test that claim.
- Technologies: Python, Pandas, Matplotlib, Pingouin, SciPy
- Link to Project
9- Predictive Modeling for Agriculture
- Description: This project aims to assist farmers in selecting the best crops to plant each season by using machine learning to predict crop yields based on soil conditions.
- Technologies: Python, Pandas, Scikit-learn
- Link to Project
10- Clustering Antarctic Penguin Species
- Description: This project uses clustering techniques to identify different species of Antarctic penguins based on their physical characteristics.
- Technologies: Python, Pandas, Scikit-learn, Matplotlib, Seaborn
- Link to Project
11- Predicting Movie Rental Durations
- Description: This project aims to predict the duration for which a movie will be rented based on various features.
- Technologies: Python, Pandas, Scikit-learn
- Link to Project
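As a rough illustration of the data-type optimization described in project 5 (Customer Analytics), the sketch below downcasts numeric columns and converts low-cardinality text columns to categoricals. It is not the project's actual code: the helper `shrink_dtypes`, the `max_categories` threshold, the demo column names, and the filter criterion are all assumptions made for illustration, and a small synthetic DataFrame stands in for `customer_train.csv`.

```python
import pandas as pd


def shrink_dtypes(df: pd.DataFrame, max_categories: int = 20) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where the values allow it.

    Hypothetical helper: max_categories is an arbitrary illustrative threshold.
    """
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_bool_dtype(series):
            continue  # booleans are already compact
        if pd.api.types.is_integer_dtype(series):
            # Pick the smallest integer dtype that holds every value.
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # Downcast to float32 when the values fit.
            out[col] = pd.to_numeric(series, downcast="float")
        elif pd.api.types.is_object_dtype(series) or pd.api.types.is_string_dtype(series):
            # Text columns with few distinct values are cheaper as categoricals.
            if series.nunique() <= max_categories:
                out[col] = series.astype("category")
    return out


# Synthetic stand-in data; the column names are assumptions, not the real schema.
demo = pd.DataFrame({
    "student_id": range(1000),
    "experience_years": [i % 25 for i in range(1000)],
    "training_hours": [float(i % 300) for i in range(1000)],
    "education_level": ["Graduate", "Masters", "Phd", "High School"] * 250,
})

before = demo.memory_usage(deep=True).sum()
compact = shrink_dtypes(demo)
after = compact.memory_usage(deep=True).sum()
print(f"before: {before} bytes, after: {after} bytes")

# Hypothetical filter criterion, shown only to illustrate the filtering step.
filtered = compact[compact["experience_years"] >= 10]
```

Comparing `memory_usage(deep=True)` before and after is a simple way to check that a dtype change actually pays off on a given dataset.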
Medium projects are more comprehensive and involve multiple steps and techniques. They are suitable for those who have a basic understanding of data science and want to delve deeper into more complex problems.
12- DataCamp Data Scientist Associate Practical Supermarket Loyalty
- Description: This project involves analyzing supermarket loyalty data to understand customer behavior and predict future loyalty.
- Technologies: Python, Pandas, Scikit-learn, Matplotlib, Seaborn
- Link to Project
13- DataCamp Data Scientist Associate Certification DS601P
- Description: This project is part of the DataCamp Data Scientist Associate Certification and involves various data science tasks to demonstrate proficiency in data analysis and modeling.
- Technologies: Python, Pandas, Scikit-learn, Matplotlib, Seaborn
- Link to Project
To get started with any of the projects, follow these steps:
- Clone the repository:
  git clone https://github.com/AbdooMohamedd/data-science-projects.git
  cd data-science-projects
- Navigate to the desired project folder:
  cd project-name-1
- Install the required dependencies:
  Each project specifies its dependencies in its documentation. Install them using pip:
  pip install pandas matplotlib seaborn scikit-learn
- Run the project:
Follow the instructions provided in the project's README or documentation file to run the project.
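Before running a project, it can help to confirm that the packages from the pip command above are importable. The following is a minimal sketch using only the standard library; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced here, and the package list simply mirrors the pip command rather than any one project's exact requirements.

```python
from importlib.util import find_spec

# pip name -> import name (scikit-learn is imported as "sklearn").
# This list mirrors the pip command above; individual projects may need more.
REQUIRED = {
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names whose import name cannot be found."""
    return [
        pip_name
        for pip_name, import_name in required.items()
        if find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```

`find_spec` only locates a top-level package without importing it, so the check stays fast even for heavy libraries.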
I welcome contributions to this repository. If you have a project that you would like to add or improvements to suggest, please fork the repository and create a pull request.
If you have any questions or suggestions, feel free to contact me at abdelrahman.mohamed1081@gmail.com.