Data cleaning - Remove the outliers / check feasibility #3

Closed
Stefpur opened this issue Aug 8, 2024 · 1 comment

Stefpur commented Aug 8, 2024

Description:

The numerical values in the columns Goal, Pledged, and Backers appear to be highly skewed, which may indicate the presence of outliers.

Task:
Check these columns for outliers and address any that are found.

Visualize the distribution of the data:
Create histograms and boxplots to check the distribution of the data and identify outliers.
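
The visualization step could look roughly like the sketch below. It assumes pandas and matplotlib are available; the helper name `plot_distributions` and its `bins` parameter are hypothetical and not part of the project.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_distributions(df, columns, bins=50):
    """Draw one histogram (top row) and one boxplot (bottom row) per column."""
    fig, axes = plt.subplots(
        2, len(columns), figsize=(5 * len(columns), 6), squeeze=False
    )
    for i, column in enumerate(columns):
        values = df[column].dropna()
        # Histogram: shows the overall shape and how long the tail is.
        axes[0, i].hist(values.to_numpy(), bins=bins)
        axes[0, i].set_title(f"{column} (skew = {values.skew():.2f})")
        # Boxplot: points beyond the whiskers are candidate outliers.
        axes[1, i].boxplot(values.to_numpy())
        axes[1, i].set_xlabel(column)
    fig.tight_layout()
    return fig
```

Calling `plot_distributions(kickstarter, ["Goal", "Pledged", "Backers"])` would then give one figure for all three columns, provided the DataFrame uses exactly those column names. For heavily skewed data, a logarithmic axis may make the histograms easier to read.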

Remove outliers:
Apply a method to remove outliers, such as setting a threshold or using the Interquartile Range (IQR).

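# Delete outliers (e.g. with the IQR method)
def remove_outliers(df, column):
    # Keep only rows whose value lies within 1.5 * IQR of the quartiles.
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]

# Assumed column names, taken from the description; adjust to the actual DataFrame.
numerical_features = ["Goal", "Pledged", "Backers"]

# Note: each pass computes its bounds on the rows left over from the previous pass.
for feature in numerical_features:
    kickstarter = remove_outliers(kickstarter, feature)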
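
To check feasibility before deleting anything, it may help to count how many rows the IQR rule would flag. The following is a minimal sketch under that assumption; `iqr_outlier_mask`, `outlier_report`, and the parameter `k` are hypothetical names introduced for illustration only.

```python
import pandas as pd


def iqr_outlier_mask(series, k=1.5):
    """Return a boolean mask that is True where a value lies outside the IQR fences."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def outlier_report(df, columns, k=1.5):
    """Count flagged rows per column and rows flagged in at least one column."""
    masks = pd.DataFrame({c: iqr_outlier_mask(df[c], k) for c in columns})
    report = masks.sum().rename("n_outliers").to_frame()
    report["share"] = report["n_outliers"] / len(df)
    # Rows that would be dropped if every column were filtered on the full data.
    any_flagged = int(masks.any(axis=1).sum())
    return report, any_flagged
```

Unlike the loop above, this computes all bounds on the full, unfiltered data, so the counts do not depend on the order of the columns. If a large share of rows is flagged, removing them outright may discard too much data.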

If the evaluation is positive, a function will be created and added to base.jpyt.


Essejran commented Aug 9, 2024

New ticket: deciding on best transformation for outliers

Essejran closed this as completed Aug 9, 2024