This is a basic data analysis library based on pandas. It was built after taking tutorials from Ted Petrou.
- Have a DataFrame class with data stored in numpy arrays.
- Read in data from comma-separated value (CSV) files.
- Have a nicely formatted display of the DataFrame in the notebook.
- Select subsets of data with the brackets operator.
- Use special methods defined in the Python data model.
- Implement aggregation methods: sum, min, max, median, etc.
- Implement non-aggregation methods: isna, unique, rename, drop.
- Have methods specific to string columns, since aggregation methods such as sum don't make sense for string columns.
For usage of these features, see Examples.ipynb.
DataFrame(data)
- data should be a Python dict.
- Keys should be strings. They will form the column names.
- Values must be one-dimensional numpy arrays.
- All numpy arrays must have the same length.
- All unicode numpy arrays will be converted to object type to provide more flexibility.
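The constructor rules above can be sketched as a small validation helper. This is only an illustration under stated assumptions: `validate_data` is a hypothetical name, and the exception types are guesses rather than the library's actual behaviour.

```python
import numpy as np


def validate_data(data):
    """Check the constructor rules and return a cleaned copy of data (hypothetical helper)."""
    if not isinstance(data, dict):
        raise TypeError("data must be a dict")
    cleaned = {}
    length = None
    for name, values in data.items():
        if not isinstance(name, str):
            raise TypeError("column names must be strings")
        if not isinstance(values, np.ndarray):
            raise TypeError("values must be numpy arrays")
        if values.ndim != 1:
            raise ValueError("each array must be one-dimensional")
        if length is None:
            length = len(values)
        elif len(values) != length:
            raise ValueError("all arrays must have the same length")
        # Unicode arrays (dtype kind 'U') are converted to object arrays
        cleaned[name] = values.astype(object) if values.dtype.kind == "U" else values
    return cleaned


emps = validate_data({
    "name": np.array(["Ana", "Raj"]),
    "salary": np.array([5000, 6200]),
})
print(emps["name"].dtype)  # object
```

Storing strings as object arrays means a column can later hold values of any length, or None for missing entries, which a fixed-width unicode array cannot do.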
Note: Unlike pandas, accessing columns using the . operator is not supported.
>>> emps.salary # will give error
>>> emps['salary'] # will work
- columns: This property can be used to retrieve the column names as a list and to set them.
- shape: Returns the shape of the DataFrame as a tuple.
- values: Returns a list of all column values as numpy arrays.
- dtypes: Returns a two-column DataFrame containing the type of each column.
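As a rough illustration of how such properties can be built with Python's `property` decorator, here is a minimal sketch. `MiniFrame` is a hypothetical stand-in, not the library's class, and the setter is assumed to replace all column names at once.

```python
import numpy as np


class MiniFrame:
    """Hypothetical stand-in for the DataFrame class; not the library's code."""

    def __init__(self, data):
        self._data = dict(data)

    @property
    def columns(self):
        # Column names as a list, in insertion order
        return list(self._data)

    @columns.setter
    def columns(self, new_names):
        # Assumed behaviour: replace all column names at once
        if len(new_names) != len(self._data):
            raise ValueError("need exactly one name per existing column")
        self._data = dict(zip(new_names, self._data.values()))

    @property
    def shape(self):
        # (number of rows, number of columns)
        n_rows = len(next(iter(self._data.values()))) if self._data else 0
        return n_rows, len(self._data)

    @property
    def values(self):
        # One numpy array per column
        return list(self._data.values())


df = MiniFrame({"a": np.array([1, 2, 3]), "b": np.array([4.0, 5.0, 6.0])})
print(df.columns)  # ['a', 'b']
print(df.shape)    # (3, 2)
df.columns = ["x", "y"]
print(df.columns)  # ['x', 'y']
```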
- read_csv: Reads data from a CSV file.
- head: Returns the first n rows of the DataFrame.
- tail: Returns the last n rows of the DataFrame.
- isna: Returns a boolean DataFrame with each cell either True or False. True means the value in that cell is missing, False means it is not missing.
- count: Returns a single-row DataFrame with the number of non-missing values for each column.
- unique: Returns the unique values for each column in the DataFrame. Specifically, it returns a list of one-column DataFrames, one per column, each holding that column's unique values.
- nunique: Returns a single-row DataFrame with the number of unique values for each column.
- value_counts: Returns the frequency of the unique values in the DataFrame.
- rename: Renames one or more columns.
- drop: Accepts a single string or a list of column names as strings. Returns a DataFrame without those columns.
- sort_values: Sorts the DataFrame rows according to one or more columns.
- sample: Returns a randomly selected sample of rows.
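A few of these methods can be approximated with plain numpy on a dict of column arrays. The sketch below assumes, as in pandas, that isna marks missing cells with True and that None and NaN are the missing markers; the function names mirror the methods above but return plain Python and numpy objects rather than DataFrames.

```python
import numpy as np


def isna(col):
    """Boolean mask that is True where a value is missing."""
    if col.dtype.kind == "O":
        # Object (string) columns: None marks a missing value in this sketch
        return np.array([v is None for v in col], dtype=bool)
    if col.dtype.kind == "f":
        return np.isnan(col)
    # Integer and boolean arrays cannot hold missing values
    return np.zeros(len(col), dtype=bool)


def count(data):
    """Number of non-missing values for each column."""
    return {name: int((~isna(col)).sum()) for name, col in data.items()}


def value_counts(col):
    """Frequency of each unique non-missing value in one column."""
    present = col[~isna(col)]
    values, counts = np.unique(present, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))


data = {
    "dept": np.array(["ops", "dev", "dev", None], dtype=object),
    "salary": np.array([5000.0, np.nan, 6200.0, 4100.0]),
}
print(count(data))                 # {'dept': 3, 'salary': 3}
print(value_counts(data["dept"]))  # {'dev': 2, 'ops': 1}
```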
Selecting data using the brackets operator
The brackets operator supports several kinds of selection. Some are:
- Select a single column: df['col_name']
- Select multiple columns: df[list of columns]
- Select rows and columns simultaneously: df[list of rows, list of columns]
- Select using slices: df[2:5, 3:5]
- Boolean selection: df[boolean condition or boolean mask]
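The selection forms listed above all go through Python's `__getitem__` hook. The following is a minimal sketch of how one method can dispatch on the type of the key. `MiniFrame` is hypothetical, and the boolean mask here is a plain numpy array, which may differ from what the library accepts.

```python
import numpy as np


class MiniFrame:
    """Hypothetical stand-in for the DataFrame class; not the library's code."""

    def __init__(self, data):
        self.data = dict(data)

    @property
    def columns(self):
        return list(self.data)

    def __getitem__(self, item):
        if isinstance(item, str):              # df['col_name']
            return MiniFrame({item: self.data[item]})
        if isinstance(item, list):             # df[['a', 'b']]
            return MiniFrame({c: self.data[c] for c in item})
        if isinstance(item, np.ndarray) and item.dtype == bool:  # df[mask]
            return MiniFrame({c: v[item] for c, v in self.data.items()})
        if isinstance(item, tuple):            # df[rows, cols]
            rows, cols = item
            if isinstance(cols, slice):        # positional column slice
                cols = self.columns[cols]
            elif isinstance(cols, str):
                cols = [cols]
            # rows may be a list of positions, a slice, or a boolean mask
            return MiniFrame({c: self.data[c][rows] for c in cols})
        raise TypeError("unsupported selection")


df = MiniFrame({
    "name": np.array(["Ana", "Raj", "Lee"], dtype=object),
    "salary": np.array([5000, 6200, 4100]),
    "age": np.array([31, 45, 28]),
})
picked = df[[0, 2], ["name", "salary"]]   # rows 0 and 2, two columns
sliced = df[1:3, 0:2]                     # row and column slices
high = df[df.data["salary"] > 4500]       # boolean selection
print(picked.data["salary"].tolist())     # [5000, 4100]
print(sliced.columns)                     # ['name', 'salary']
print(high.data["name"].tolist())         # ['Ana', 'Raj']
```

Writing `df[rows, cols]` passes a single tuple to `__getitem__`, which is why the tuple branch unpacks two parts.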
Using special methods defined in the python data model
- Provided support for pointwise operations like +, - etc. Example: df[col_name] + 1000 will add 1000 to each value in column col_name.
- Implemented the __len__ special method to provide support for len(). It will return the number of rows in the DataFrame.
- Used special methods to implement arithmetic and comparison operators: +, -, *, % (modulus), // (floor division), / (true division), <, >, ==, !=, <=, >=
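To show how the Python data model maps operators to special methods, here is a small sketch with a hypothetical one-column container. Only a few operators are included, and the class is an illustration rather than the library's implementation.

```python
import numpy as np


class MiniColumn:
    """Hypothetical one-column container; not the library's code."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def __len__(self):
        # len(obj) reports the number of rows
        return len(self.values)

    def __add__(self, other):
        # obj + 1000 applies the addition to every value
        return MiniColumn(self.values + other)

    def __floordiv__(self, other):
        # obj // 1000 applies floor division to every value
        return MiniColumn(self.values // other)

    def __gt__(self, other):
        # obj > 4500 yields a boolean column
        return MiniColumn(self.values > other)


salary = MiniColumn([5000, 6200, 4100])
raised = salary + 1000
print(len(salary))                      # 3
print(raised.values.tolist())           # [6000, 7200, 5100]
print((salary // 1000).values.tolist()) # [5, 6, 4]
print((salary > 4500).values.tolist())  # [True, True, False]
```

Because numpy already applies operators element by element, each special method only needs to forward the operation to the underlying array and wrap the result.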
Aggregation methods summarize a sequence of values using a single value:
min
max
mean
median
sum
var
std
all
any
argmax - index of the maximum
argmin - index of the minimum
abs
cummin
cummax
cumsum
clip
round
copy
diff
pct_change
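The difference between the two kinds of method can be seen with plain numpy: an aggregation such as sum reduces a column to one value, while a method such as cumsum or diff keeps one value per row. The sketch below works on a dict of arrays. `apply_columnwise` and `diff` are hypothetical helpers, and padding the first row of diff with NaN is an assumption borrowed from pandas.

```python
import numpy as np


def apply_columnwise(data, func):
    """Apply a numpy function to every column of a dict of arrays."""
    return {name: func(col) for name, col in data.items()}


def diff(col):
    """Difference from the previous row; the first row has no predecessor."""
    out = np.full(len(col), np.nan)
    out[1:] = col[1:] - col[:-1]
    return out


data = {"sales": np.array([10.0, 12.0, 9.0, 15.0])}

total = apply_columnwise(data, np.sum)       # one value per column
peak = apply_columnwise(data, np.argmax)     # index of the maximum
running = apply_columnwise(data, np.cumsum)  # one value per row
change = apply_columnwise(data, diff)

print(total["sales"])             # 46.0
print(int(peak["sales"]))         # 3
print(running["sales"].tolist())  # [10.0, 22.0, 31.0, 46.0]
```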
The string methods listed below are implemented here. They have the same syntax as in Python, except that here they return a DataFrame.
capitalize
count
endswith
startswith
find
len
get
index
isalnum
isalpha
isdecimal
islower
isnumeric
isspace
istitle
isupper
lstrip
rstrip
strip
replace
swapcase
title
lower
upper
zfill
encode
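One way to offer string methods on a column is to call the matching Python str method on every element of an object array. The sketch below is an assumption about the general approach, not the library's code: `str_method` is a hypothetical helper, it returns a numpy array instead of a DataFrame, and it passes missing values (None) through unchanged.

```python
import numpy as np


def str_method(col, method, *args):
    """Call a str method on every string in an object array, keeping None as None."""
    out = [getattr(v, method)(*args) if isinstance(v, str) else None for v in col]
    return np.array(out, dtype=object)


names = np.array(["ana", "raj kumar", None], dtype=object)
print(str_method(names, "title").tolist())            # ['Ana', 'Raj Kumar', None]
print(str_method(names, "startswith", "r").tolist())  # [False, True, None]
print(str_method(names, "zfill", 5).tolist())         # ['00ana', 'raj kumar', None]
```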
- [] Simultaneously adding or overwriting multiple rows and columns in DataFrame. Current support is only for adding one column at a time.
- [] Generic aggregation methods: Support for column-wise aggregation is provided. Pandas provides both row and column aggregation. Implement row aggregation as well.
- [] Add support for accessing columns using the . operator. df.salary is not supported; only df['salary'] is supported.
- [] Add a groupby() method.