This session covered obtaining the data and some initial data preparation steps.
Commands, functions, and methods:
- !wget - Linux shell command for downloading data
- pd.read_csv() - read CSV files
- df.head() - take a look at the first rows of the dataframe
- df.head().T - take a look at the transposed dataframe
- df.columns - retrieve the column names of a dataframe
- df.columns.str.lower() - lowercase all the letters in the column names of a dataframe
- df.columns.str.replace(' ', '_') - replace the space separator in the column names of a dataframe with an underscore
- df.dtypes - retrieve the data types of all series
- df.index - retrieve the index of a dataframe
- pd.to_numeric() - convert the values of a series to numerical values. The errors='coerce' argument lets the conversion proceed when some values cannot be parsed; those values become NaN instead of raising an error.
- df.fillna() - replace NAs with some value
- (df.x == "yes").astype(int) - convert a series x of yes/no values to numerical values (1 for "yes", 0 otherwise)
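
The commands listed above can be chained into a short preparation sketch. This is only an illustration: the CSV content, the column names (`Customer ID`, `Total Charges`, `Churn`), and the `_` placeholder for a missing value are invented here and are not taken from the course dataset. The `!wget` download step is left out because it only works inside a notebook cell and needs a real URL.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded CSV file (invented for illustration).
raw_csv = io.StringIO(
    "Customer ID,Total Charges,Churn\n"
    "a1,29.85,no\n"
    "b2,_,yes\n"
    "c3,108.15,yes\n"
)

df = pd.read_csv(raw_csv)

# Inspect the first rows, then the transposed view (handy for wide tables).
print(df.head())
print(df.head().T)

# Normalize column names: lowercase, spaces replaced with underscores.
df.columns = df.columns.str.lower().str.replace(' ', '_')
print(df.columns)

# Check the data types and the index.
print(df.dtypes)
print(df.index)

# 'total_charges' was read as text because of the '_' placeholder.
# errors='coerce' turns values that cannot be parsed into NaN.
df['total_charges'] = pd.to_numeric(df['total_charges'], errors='coerce')

# Replace the resulting NaN with 0 (one possible choice of fill value).
df['total_charges'] = df['total_charges'].fillna(0)

# Convert the yes/no column to 1/0.
df['churn'] = (df.churn == 'yes').astype(int)

print(df)
```

Filling missing values with 0 is just one option; whether it is appropriate depends on what the column means in the actual dataset.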
The entire code of this project is available in this Jupyter notebook.
The notes are written by the community. If you see an error here, please create a PR with a fix.