Data cleaning - Remove the outliers / check feasibility #3

Stefpur · 2024-08-08T07:37:40Z

Description:

The numerical values in the columns Goal, Pledged, and Backers appear to be highly skewed, which may indicate the presence of outliers.

Task:
Check whether these columns contain outliers and, if so, address them.

Visualize the distribution of the data:
Create histograms and boxplots to check the distribution of the data and identify outliers.
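
A minimal sketch of the visualization step, assuming pandas and matplotlib are available. The helper name `plot_distributions`, the toy data, and the exact column spellings are assumptions for illustration; the real `kickstarter` DataFrame would be passed in instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_distributions(df, columns):
    """Draw one histogram and one boxplot per column, side by side."""
    fig, axes = plt.subplots(
        len(columns), 2, figsize=(10, 3 * len(columns)), squeeze=False
    )
    for row, column in enumerate(columns):
        values = df[column].dropna().to_numpy()
        axes[row, 0].hist(values, bins=50)
        axes[row, 0].set_title(f"{column}: histogram")
        axes[row, 1].boxplot(values)
        axes[row, 1].set_title(f"{column}: boxplot")
    fig.tight_layout()
    return fig


# Toy data standing in for the real kickstarter DataFrame (values are made up).
toy = pd.DataFrame({
    "Goal": [1000, 2000, 3000, 4000, 5000, 1_000_000],
    "Pledged": [10, 500, 1200, 2500, 4000, 750_000],
    "Backers": [1, 5, 12, 30, 45, 9000],
})
fig = plot_distributions(toy, ["Goal", "Pledged", "Backers"])
```

With strongly skewed columns the histogram tends to collapse into a single bar, so calling `set_xscale("log")` on the histogram axes may make the shape easier to read, provided the values are positive.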

Remove outliers:
Apply a method to remove outliers, such as setting a threshold or using the Interquartile Range (IQR).
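
For the threshold option, one possible sketch is to cut at a high quantile. The function name `remove_above_quantile` and the 0.99 cutoff are assumptions for illustration, not values agreed on in this ticket.

```python
import pandas as pd


def remove_above_quantile(df, column, q=0.99):
    """Keep rows whose value in `column` is at or below the q-th quantile."""
    threshold = df[column].quantile(q)
    return df[df[column] <= threshold]


# Toy data: the values 1..100 in a single column.
toy = pd.DataFrame({"Backers": range(1, 101)})
trimmed = remove_above_quantile(toy, "Backers")
print(len(toy), "->", len(trimmed))
```

Unlike the IQR rule, this only trims the upper tail and always removes roughly the same share of rows, whether or not they are real outliers.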

# Delete outliers (e.g. with the IQR method)
def remove_outliers(df, column):
    # Keep only rows within 1.5 * IQR of the first and third quartiles
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]

# Apply column by column; each pass filters the rows left by the previous one
for feature in numerical_features:
    kickstarter = remove_outliers(kickstarter, feature)
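
To check feasibility before dropping anything, a sketch like the following could report how many rows fall outside the IQR bounds per column. The helper names `iqr_bounds` and `outlier_report` and the toy data are assumptions for illustration. Because the loop above reassigns `kickstarter` on every pass, the bounds for later columns are computed on already filtered data; this report uses the unfiltered frame for every column instead.

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) bounds Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outlier_report(df, columns):
    """Count, per column, the rows outside the IQR bounds without dropping any."""
    rows = []
    for column in columns:
        lower, upper = iqr_bounds(df[column])
        # between() is inclusive on both ends; missing values count as outside,
        # which matches what the boolean filter in remove_outliers would drop.
        outside = ~df[column].between(lower, upper)
        rows.append({
            "column": column,
            "lower": lower,
            "upper": upper,
            "n_outliers": int(outside.sum()),
            "share": float(outside.mean()),
        })
    return pd.DataFrame(rows).set_index("column")


# Toy data standing in for the real kickstarter DataFrame (values are made up).
toy = pd.DataFrame({
    "Goal": [1000, 2000, 3000, 4000, 5000, 1_000_000],
    "Backers": [1, 5, 12, 30, 45, 9000],
})
report = outlier_report(toy, ["Goal", "Backers"])
print(report)
```

If the `share` column turns out large for the real data, removing rows is probably too aggressive for such skewed columns, and a transformation may be the better route.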

If the evaluation is positive, a function will be created and added to base.jpyt.

Essejran · 2024-08-09T13:33:04Z

New ticket: deciding on best transformation for outliers

Stefpur assigned Essejran Aug 8, 2024

Essejran closed this as completed Aug 9, 2024
