TechnoHacks-EduTech-Tasks

This repository contains the tasks assigned to me by TechnoHacks EduTech during my Data Analytics internship.

  1. Task 1: Data Cleaning - Cleaned a dataset by handling missing values and removing outliers. I used the Titanic dataset from Kaggle and Python libraries such as NumPy, Pandas, and SciPy to complete this task.

    Identified missing values in the dataset using functions such as isnull() and isnull().sum(). Handled them with the fillna() function, filling specific columns with their mean() or mode(). Dropped a column in which many values were missing, then verified that the cleaned DataFrame contained no remaining missing values.

    Used the IQR (Interquartile Range) method for outlier detection: calculated the IQR for specific columns, then identified and removed data points falling outside the acceptable range.
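A minimal sketch of the cleaning steps above, using a small made-up DataFrame in place of the actual Titanic data (the values and the choice of columns like Age, Embarked, and Cabin are illustrative):

```python
import numpy as np
import pandas as pd

# Small synthetic frame standing in for the Titanic data.
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, 200.0, 28.0],
    "Embarked": ["S", "C", "S", np.nan, "S", "Q"],
    "Cabin": [np.nan, "C85", np.nan, np.nan, np.nan, "E46"],
})

# 1. Count missing values per column.
print(df.isnull().sum())

# 2. Fill numeric gaps with the mean, categorical gaps with the mode.
df["Age"] = df["Age"].fillna(df["Age"].mean())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# 3. Drop a column that is mostly missing.
df = df.drop(columns=["Cabin"])

# 4. Remove outliers in "Age" with the IQR rule.
q1, q3 = df["Age"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["Age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[mask]

# 5. Verify: no missing values remain.
print(df.isnull().sum().sum())  # 0
```

The 1.5 * IQR fences are the conventional cutoff; here they flag the implausible Age of 200 while keeping the rest of the rows.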

  2. Task 2: Summary Statistics - Calculated summary statistics (mean, median, mode, standard deviation) for the numeric columns of a dataset. I used the Titanic dataset from Kaggle and Python libraries such as NumPy and Pandas to complete this task.

    Calculated the mean (average) with the Pandas mean() function, the median (middle value) with median(), the standard deviation (dispersion of the data points) with std(), and the mode (most frequent value) with mode() for the numeric columns.
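The statistics above can be sketched on a toy numeric column (the values are made up; the real task ran these calls on Titanic columns such as Age and Fare):

```python
import pandas as pd

# Illustrative stand-in for a numeric Titanic column.
fares = pd.Series([7.25, 71.28, 7.92, 53.10, 8.05, 8.05])

print(fares.mean())     # arithmetic average
print(fares.median())   # middle value -> 8.05
print(fares.mode()[0])  # most frequent value -> 8.05
print(fares.std())      # sample standard deviation (ddof=1 by default)
```

Note that mode() returns a Series (there can be ties), so the first value is taken with [0]; std() uses the sample (n-1) denominator unless ddof=0 is passed.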

  3. Task 3: Remove Duplicates - Identified and removed duplicate rows in a dataset. I used the Iris dataset from Kaggle and Python libraries such as NumPy and Pandas to complete this task.

    Identified duplicate rows with the duplicated() method, which flags each repeated occurrence; removed them with the drop_duplicates() method; and finally verified that no duplicates remained in the dataset.
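A short sketch of the deduplication steps, using a tiny made-up frame rather than the actual Iris data (the column names are illustrative):

```python
import pandas as pd

# Tiny frame with one exact duplicate row.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 5.1],
    "sepal_width":  [3.5, 3.0, 3.5],
})

print(df.duplicated())        # True only for later occurrences of a repeated row
df = df.drop_duplicates()     # keep the first occurrence, drop the rest
print(df.duplicated().any())  # False -> no duplicates remain
```

By default both methods compare all columns and keep the first occurrence; a subset= argument restricts the comparison to specific columns.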
