3.2 Data preparation

Slides

Notes

This session covered obtaining the data and some data preparation procedures.

Commands, functions, and methods:

  • !wget - download data with the Linux shell command wget (the ! prefix runs a shell command from a Jupyter notebook cell)
  • pd.read_csv() - read CSV files
  • df.head() - take a look at the first rows of the dataframe
  • df.head().T - take a look at the first rows of the dataframe, transposed
  • df.columns - retrieve column names of a dataframe
  • df.columns.str.lower() - lowercase all the letters in the column names of a dataframe
  • df.columns.str.replace(' ', '_') - replace the spaces in the column names of a dataframe with underscores
  • df.dtypes - retrieve data types of all series
  • df.index - retrieve the index (row labels) of a dataframe
  • pd.to_numeric() - convert the values of a series to numbers. With errors='coerce', values that cannot be parsed become NaN instead of raising an error.
  • df.fillna() - replace NAs with some value
  • (df.x == "yes").astype(int) - convert the yes/no values of the series x to 1/0 integers.
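
The commands above can be chained into a short preparation script. The following is a minimal sketch, not the notebook's actual code: the inline CSV, its column names, and the `_` placeholder for a missing value are hypothetical stand-ins for a downloaded file, and filling with 0 is just one possible choice for `fillna()`.

```python
import io

import pandas as pd

# Hypothetical toy data standing in for a CSV file downloaded with wget.
raw_csv = """Customer ID,Total Charges,Churn
A-1,29.85,no
B-2,_,yes
C-3,108.15,no
"""

df = pd.read_csv(io.StringIO(raw_csv))

# Normalize column names: lowercase, spaces replaced by underscores.
df.columns = df.columns.str.lower().str.replace(' ', '_')

# "total_charges" was read as strings because of the "_" placeholder.
# errors='coerce' turns unparseable entries into NaN instead of raising.
df['total_charges'] = pd.to_numeric(df['total_charges'], errors='coerce')

# Replace the resulting missing values (0 is an assumption for this sketch).
df['total_charges'] = df['total_charges'].fillna(0)

# Encode the yes/no column as 1/0 integers.
df['churn'] = (df['churn'] == 'yes').astype(int)

print(df.head().T)
print(df.dtypes)
```

Checking `df.dtypes` before and after the `pd.to_numeric()` call is a quick way to confirm that a column which looks numeric was actually parsed as numbers.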

The entire code of this project is available in this Jupyter notebook.

⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.

Navigation