Data-preprocessing

This is my submission for Lab assignment 1 in the Data Mining class. The implementation consists of two sections: "Preprocessing" and "Introduction to numpy and pandas".

Preprocessing

Main features:

List attributes with missing data
Fill in missing data (with a constant, mean, median, or mode)
Filter out rows and columns whose amount of missing data exceeds a given threshold
Remove duplicated rows
Remove outliers
Calculate point combinations
Normalize data (min-max and Z-score)
Apply One-hot encoding
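
Several of the features above can be sketched in a few lines. The snippet below is only an illustration, assuming pandas is used; the notebook's own implementation may differ. The toy DataFrame and the column names `math` and `city` are hypothetical and are not taken from the repository's CSV files.

```python
import pandas as pd

# Hypothetical toy data, not taken from the repository's CSV files.
df = pd.DataFrame({
    "math": [8.0, None, 6.0, 8.0, 10.0],
    "city": ["A", "B", None, "A", "B"],
})

# List attributes (columns) that contain at least one missing value.
missing_cols = df.columns[df.isna().any()].tolist()

# Fill missing data: mean for the numeric column, mode for the categorical one.
# A constant or the median could be passed to fillna() in the same way.
filled = df.copy()
filled["math"] = filled["math"].fillna(filled["math"].mean())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])

# Remove duplicated rows (the first occurrence is kept).
deduped = filled.drop_duplicates()

# Min-max normalization rescales values to the range [0, 1].
col = deduped["math"]
min_max = (col - col.min()) / (col.max() - col.min())

# Z-score normalization centers on the mean and divides by the
# population standard deviation (ddof=0 is an assumption here).
z_score = (col - col.mean()) / col.std(ddof=0)

# One-hot encoding turns each category into its own indicator column.
one_hot = pd.get_dummies(deduped, columns=["city"])
```

Min-max scaling keeps the original shape of the distribution inside a fixed range, while Z-score scaling is usually preferred when attributes have very different spreads.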

Introduction to numpy and pandas

Calculate the correlation between any given pair of numeric attributes and visualize the results as a heat map
Use histograms to visualize the distribution of scores for every subject
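
The two visualizations above could look roughly like the following sketch. It assumes pandas and matplotlib and uses randomly generated, hypothetical scores on a 0 to 10 scale; the subject names and output file names are placeholders, not the columns of the repository's dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical scores; the subject names are placeholders.
rng = np.random.default_rng(0)
scores = pd.DataFrame({
    "math": rng.uniform(0, 10, 200),
    "physics": rng.uniform(0, 10, 200),
})
# Make one subject depend on another so the heat map shows a strong cell.
scores["chemistry"] = (0.8 * scores["physics"] + rng.normal(0, 1, 200)).clip(0, 10)

# Pairwise Pearson correlation between all numeric columns.
corr = scores.corr()

# Heat map of the correlation matrix.
fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(im, ax=ax, label="Pearson correlation")
fig.savefig("correlation_heatmap.png")

# One histogram per subject to show the score distribution.
axes = scores.hist(bins=20, figsize=(9, 3), layout=(1, 3))
axes[0][0].figure.savefig("score_histograms.png")
```

Inside a notebook, `plt.show()` would typically replace the `savefig` calls and the `Agg` backend line can be dropped.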
