- Python library that helps structure data in `DataFrames` and contains built-in data analysis functions.
import pandas as pd
- Pandas is an exploratory data analysis toolkit with a rich set of attributes and methods.
- Pandas provides a wide range of functions and methods.
- Widely used for data cleaning, data exploration, data manipulation, and data analysis tasks.
- Toolkit for reading, writing, accessing, filtering, grouping, aggregating, merging, joining, combining, reshaping, cleaning, and selecting data, and for performing statistical computation.
- The name derives from "panel data", the econometrics term for multidimensional structured data sets.
- Supports various data formats: `csv`, `tsv`, `txt`, `xls`, `xlsx`, `json`, etc.
- Performance optimization (changing data types, storage type)
- Integrates well with other important libraries like NumPy, Matplotlib, Seaborn, SciPy, etc.
- Time series support
- Handling missing values
- Grouped operations
- Categorical data support
- Merging and joining DataFrames
- Statistical functions
- Data visualization tools
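
Two of the features listed above, missing-value handling and data-type optimization, can be sketched on a small table. This is only an illustration: the column names and values are invented, and the memory saved by a categorical column depends on the pandas version and on how repetitive the data is.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({
    "city": ["Paris", "Lima", "Paris", "Lima", "Paris", "Lima"],
    "temp": [21.5, np.nan, 19.0, 24.0, np.nan, 23.0],
})

# Handling missing values: fill gaps with the column mean.
mean_temp = df["temp"].mean()  # NaN values are skipped by default
df["temp"] = df["temp"].fillna(mean_temp)

# Performance optimization: store a repetitive text column as a category.
bytes_as_text = df["city"].memory_usage(deep=True)
df["city"] = df["city"].astype("category")
bytes_as_category = df["city"].memory_usage(deep=True)

print(df.dtypes)
print(bytes_as_text, bytes_as_category)
```

Comparing the two printed byte counts on real data is a quick way to check whether the conversion is worth it.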
Data Structure | Description |
---|---|
`pandas.Series()` | One-dimensional labeled array that can hold any data type. |
`pandas.DataFrame()` | Two-dimensional labeled table whose columns can each hold a different data type. |
Attribute | Meaning |
---|---|
`df.index` | The row index labels of the DataFrame (axis = 0; default: `RangeIndex`) |
`df.columns` | The column index labels of the DataFrame (axis = 1) |
`df.size` | Number of elements in the DataFrame (rows × columns) |
`df.shape` | A tuple of rows and columns `(nrows, ncols)` |
`df.ndim` | Number of dimensions (always 2 for a DataFrame, 1 for a Series) |
`df.values` | Values of the DataFrame as a NumPy array |
`df.axes` | List containing the row index and the column index of the DataFrame |
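
As a quick check of these attributes, here is a minimal sketch on a three-row table. The data is made up for illustration.

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40]})

print(df.index)    # RangeIndex(start=0, stop=3, step=1)
print(df.columns)  # the column labels: name, age
print(df.shape)    # (3, 2)
print(df.size)     # 6
print(df.ndim)     # 2
print(df.values)   # the underlying values as a NumPy array
print(df.axes)     # [row index, column index]
```

Note that `size` counts cells, so it equals `shape[0] * shape[1]`.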
Method | Use |
---|---|
`pd.read_csv()`, `pd.read_excel()`, `pd.read_json()` | Import data |
`df.to_csv()`, `df.to_excel()`, `df.to_parquet()` | Export data |
`df.head()`, `df.tail()`, `df.sample()`, `df.sort_values()` | Preview data |
`df.query()` | Filter data |
`df.iat[]`, `df.at[]`, `df.iloc[]`, `df.loc[]` | Indexing and slicing |
`df.info()` | Metadata information |
`df.dropna()`, `df.fillna()`, `df.drop_duplicates()`, `df.rename()`, `df.set_index()` | Clean data |
`df.apply()`, `df.map()` (pandas 2.1+; `df.applymap()` in older versions), `df.explode()` | Transform data |
`df.groupby()`, `df.groupby().agg()`, `df.groupby().aggregate()` | Group and aggregate data |
`df.join()`, `df.merge()`, `pd.concat()` | Combine data |
`df.pivot_table()`, `df.stack()`, `df.unstack()` | Reshape data |
`df.plot()` | Visualize data |
`df.sum()`, `df.mean()`, `df.median()`, `df.max()`, `df.value_counts()`, `df.describe()` | Mathematical operations |
`pd.date_range()`, `pd.to_datetime()` | Time series analysis |
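
A few of these methods chained into a small workflow, as a sketch only. The tables and column names are invented. Note that `concat`, `to_datetime` and `date_range` are called on the `pd` module rather than on a DataFrame.

```python
import pandas as pd

# Hypothetical data, invented for this example.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 4, 7, 12],
})
managers = pd.DataFrame({
    "region": ["north", "south"],
    "manager": ["Kim", "Lee"],
})

# Filter data: keep rows matching a query expression.
big = sales.query("units > 5")

# Group and aggregate: total units per region (named aggregation).
totals = sales.groupby("region", as_index=False).agg(total_units=("units", "sum"))

# Combine data: attach the manager of each region.
report = totals.merge(managers, on="region", how="left")

# Combine data: stack two tables vertically with the top-level pd.concat.
stacked = pd.concat([sales, sales], ignore_index=True)

# Time series: parse strings into timestamps and build a daily range.
parsed = pd.to_datetime(["2024-01-01", "2024-01-02"])
days = pd.date_range("2024-01-01", periods=3, freq="D")

print(report)
```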
- A `Series` holds homogeneous data values, i.e. all data values are of the same data type.
- Data axis labels are called the `index`.
# Create a Series:
pd.Series([1, 2, 3, 4])
# Access a column of a DataFrame as a Series (either form works):
DataFrame['SeriesName']
DataFrame.SeriesName  # only when the name is a valid Python identifier
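
A runnable sketch of the points above, with labels and values chosen only for illustration. It shows a custom index, the single shared data type (mixing integers and floats gives a float Series), and the two ways of pulling a column out of a DataFrame.

```python
import pandas as pd

# A Series with explicit index labels.
s = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
print(s.index)   # the axis labels
print(s["b"])    # 2

# All values share one data type: ints mixed with a float become floats.
mixed = pd.Series([1, 2.5, 3])
print(mixed.dtype)  # float64

# Each column of a DataFrame is a Series ("score" is a made-up column name).
df = pd.DataFrame({"score": [90, 85]})
by_key = df["score"]
by_attr = df.score
print(by_key.equals(by_attr))  # True
```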
- Data is aligned in tabular form with rows and columns.
- A `DataFrame` is a sequence of `Series` that share the same `index`.
- The Python equivalent of an Excel or SQL table which is used to store and analyze data.
# Empty DataFrame:
pd.DataFrame()
# Select several columns (returns a DataFrame):
DataFrame[['SeriesName1', 'SeriesName2', 'SeriesName3']]
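
To make the "sequence of Series sharing an index" idea concrete, here is a small sketch with invented names and values. It also shows that single brackets return a Series while double brackets return a DataFrame.

```python
import pandas as pd

# Two Series built on the same index labels (hypothetical data).
labels = ["a", "b", "c"]
height = pd.Series([1.70, 1.82, 1.65], index=labels)
weight = pd.Series([65, 80, 58], index=labels)

# Each Series becomes a column; rows are aligned on the shared index.
df = pd.DataFrame({"height": height, "weight": weight})

one_column = df["height"]               # single brackets: a Series
two_columns = df[["height", "weight"]]  # double brackets: a DataFrame
empty = pd.DataFrame()                  # no rows, no columns

print(type(one_column).__name__, type(two_columns).__name__, empty.empty)
```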