Data Analysis Library in Python

This is a basic data analysis library modeled on pandas. It was built after following tutorials by Ted Petrou.

Features of this library

Have a DataFrame class with data stored in numpy arrays.
Read in data from comma-separated values (CSV) files.
Have a nicely formatted display of the DataFrame in the notebook.
Select subsets of data with the brackets operator.
Use special methods defined in the Python data model.
Implement aggregation methods: sum, min, max, median, etc.
Implement non-aggregation methods: isna, unique, rename, drop.
Have methods specific to string columns, since aggregation methods such as sum do not make sense for strings.

DataFrame, Methods and Properties

For usage examples of these features, see Examples.ipynb.

DataFrame

DataFrame(data)

data should be a Python dict.
Keys should be strings. They become the column names.
Values must be one-dimensional numpy arrays.
All numpy arrays must have the same length.
All unicode numpy arrays are converted to object type to provide more flexibility.
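
The constructor rules above can be sketched as a standalone check. This is a minimal illustration, not the library's actual code: the function name `validate_data` and the error messages are assumptions made for this example.

```python
import numpy as np


def validate_data(data):
    """Check the constructor rules listed above and return a cleaned copy."""
    if not isinstance(data, dict):
        raise TypeError("data must be a dict")

    cleaned = {}
    expected_len = None
    for name, values in data.items():
        if not isinstance(name, str):
            raise TypeError("column names must be strings")
        if not isinstance(values, np.ndarray):
            raise TypeError("values must be numpy arrays")
        if values.ndim != 1:
            raise ValueError("each array must be one-dimensional")
        if expected_len is None:
            expected_len = len(values)
        elif len(values) != expected_len:
            raise ValueError("all arrays must have the same length")
        # Unicode arrays (dtype kind 'U') are stored as object arrays.
        if values.dtype.kind == "U":
            values = values.astype("O")
        cleaned[name] = values
    return cleaned


emps = validate_data({
    "name": np.array(["Ann", "Bob", "Cy"]),
    "salary": np.array([5000, 6200, 4800]),
})
print(emps["name"].dtype)    # object
```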

Note: Unlike pandas, accessing columns using the . operator is not supported.

        >>> emps.salary      # will give error  
        >>> emps['salary']   # will work

Basic Properties

columns: This property can be used to retrieve the column names as a list and to set new column names.
shape: Returns the shape of the DataFrame as a tuple.
values: Returns a list of all column values as numpy arrays.
dtypes: Returns a two-column DataFrame containing the data type of each column.
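
As a rough illustration of how such properties can be written, the sketch below uses a hypothetical `MiniFrame` class (not the library's DataFrame). It assumes the `columns` setter replaces all column names at once and that `shape` is `(rows, columns)`.

```python
import numpy as np


class MiniFrame:
    """Hypothetical stand-in for the library's DataFrame (illustration only)."""

    def __init__(self, data):
        self._data = dict(data)

    @property
    def columns(self):
        return list(self._data)

    @columns.setter
    def columns(self, names):
        # Assumed behaviour: replace all column names at once.
        if len(names) != len(self._data):
            raise ValueError("need exactly one name per column")
        if len(set(names)) != len(names):
            raise ValueError("column names must be unique")
        self._data = dict(zip(names, self._data.values()))

    @property
    def shape(self):
        # Assumes at least one column; every column has the same length.
        n_rows = len(next(iter(self._data.values())))
        return n_rows, len(self._data)


df = MiniFrame({"a": np.array([1, 2, 3]), "b": np.array([1.5, 2.5, 3.5])})
df.columns = ["x", "y"]
print(df.columns)   # ['x', 'y']
print(df.shape)     # (3, 2)
```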

Basic Methods

read_csv: Reads data from a CSV file.
head: Returns first n rows of DataFrame.
tail: Returns last n rows of DataFrame.
isna: Returns a boolean DataFrame with each cell either True or False. True means the value in that cell is missing, False means it is present.
count: Returns a single-row DataFrame with the number of non-missing values for each column.
unique: Returns the unique values of each column as a list of one-column DataFrames, one per column.
nunique: Returns a single-row DataFrame with the number of unique values for each column.
value_counts: Returns frequency of unique values in DataFrame.
rename: Renames one or more columns.
drop: Accepts a single column name or a list of column names as strings. Returns a DataFrame without those columns.
sort_values: Sorts the DataFrame rows according to one or more columns.
sample: Returns a randomly selected sample of rows.
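
To make the meaning of isna and count concrete, here is a small sketch on plain numpy arrays. It assumes that missing values are represented by None in object (string) columns and by NaN in numeric columns; the helper below is illustrative and is not the library's implementation.

```python
import numpy as np

data = {
    "name": np.array(["Ann", None, "Cy", "Ann"], dtype="O"),
    "score": np.array([1.5, np.nan, 3.0, 1.5]),
}


def isna(col):
    """Boolean mask: True where the value is missing."""
    if col.dtype.kind == "O":
        # Object (string) columns: None marks a missing value here.
        return np.array([v is None for v in col])
    return np.isnan(col)


missing = {name: isna(col) for name, col in data.items()}
# count is the number of cells that are NOT missing in each column.
count = {name: int((~mask).sum()) for name, mask in missing.items()}

print(missing["score"].tolist())   # [False, True, False, False]
print(count)                       # {'name': 3, 'score': 3}
```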

Using brackets operator

Data is selected with the brackets operator, which supports several forms of selection:

Select single column: df['col_name']
Select multiple columns: df[list of columns]
Select rows and columns simultaneously: df[list of rows, list of columns]
Select using slices: df[2:5, 3:5]
Boolean selection: df[boolean condition or boolean mask]
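
The selection forms above all go through `__getitem__`. The sketch below shows one way to dispatch on the type of the key. `MiniFrame` is a hypothetical stand-in and the exact rules (for example, column slices being positional) are assumptions for this example.

```python
import numpy as np


class MiniFrame:
    """Hypothetical stand-in for the library's DataFrame (illustration only)."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, item):
        if isinstance(item, str):                       # df['col']
            return MiniFrame({item: self._data[item]})
        if isinstance(item, list):                      # df[['a', 'b']]
            return MiniFrame({c: self._data[c] for c in item})
        if isinstance(item, np.ndarray) and item.dtype.kind == "b":
            # df[mask]: keep the rows where the mask is True
            return MiniFrame({c: v[item] for c, v in self._data.items()})
        if isinstance(item, tuple) and len(item) == 2:  # df[rows, cols]
            rows, cols = item
            if isinstance(rows, int):
                rows = [rows]
            if isinstance(cols, slice):
                cols = list(self._data)[cols]           # positional slice
            elif isinstance(cols, str):
                cols = [cols]
            return MiniFrame({c: self._data[c][rows] for c in cols})
        raise TypeError("unsupported selection")


df = MiniFrame({
    "name": np.array(["Ann", "Bob", "Cy"], dtype="O"),
    "dept": np.array(["ops", "dev", "dev"], dtype="O"),
    "salary": np.array([5000, 6200, 4800]),
})
high = df[df._data["salary"] > 4900]
print(high._data["name"].tolist())                   # ['Ann', 'Bob']
sub = df[1:3, 0:2]
print(list(sub._data), sub._data["dept"].tolist())   # ['name', 'dept'] ['dev', 'dev']
```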

Implementation of special methods of python data model

The library relies on special methods defined in the Python data model.

Pointwise operations such as + and - are supported:
Example: df[col_name] + 1000 adds 1000 to each value in column col_name.
The __len__ special method is implemented to support len(). It returns the number of rows in the DataFrame.
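
A minimal sketch of these two special methods, using a hypothetical `MiniFrame` class rather than the library's DataFrame:

```python
import numpy as np


class MiniFrame:
    """Hypothetical stand-in for the library's DataFrame (illustration only)."""

    def __init__(self, data):
        self._data = dict(data)

    def __len__(self):
        # Number of rows = length of any column.
        return len(next(iter(self._data.values())))

    def __add__(self, other):
        # Apply the addition to every column, element by element.
        return MiniFrame({c: v + other for c, v in self._data.items()})

    def __radd__(self, other):
        # Supports `1000 + df` as well as `df + 1000`.
        return self.__add__(other)


salary = MiniFrame({"salary": np.array([5000, 6200, 4800])})
bumped = salary + 1000
print(len(salary))                       # 3
print(bumped._data["salary"].tolist())   # [6000, 7200, 5800]
```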

Arithmetic and Comparison operators

Arithmetic and comparison operators are implemented with special methods: +, -, *, % (modulus), // (floor division), / (true division), <, >, ==, !=, <=, >=
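
Each of these operators maps to a special method (for example, // calls `__floordiv__`), and numpy arrays already apply them element by element. The snippet below demonstrates that behaviour on a plain numpy array using the standard `operator` module; it illustrates the semantics and is not the library's code.

```python
import operator
import numpy as np

salary = np.array([5000, 6200, 4800])

ops = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "%": operator.mod, "//": operator.floordiv, "/": operator.truediv,
    "<": operator.lt, ">": operator.gt, "==": operator.eq,
    "!=": operator.ne, "<=": operator.le, ">=": operator.ge,
}

# Apply every operator against the same scalar, element by element.
results = {symbol: func(salary, 5000).tolist() for symbol, func in ops.items()}
print(results["//"])   # [1, 1, 0]
print(results["%"])    # [0, 1200, 4800]
print(results[">="])   # [True, True, False]
```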

Aggregation methods

Aggregation methods summarize a sequence of values with a single value.

min
max
mean
median
sum
var
std
all
any
argmax - index of the maximum
argmin - index of the minimum
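
Column-wise aggregation can be sketched as applying one numpy function to every column and keeping a single value per column. The helper name `aggregate` is hypothetical, and skipping object (string) columns is an assumption made to keep the example deterministic.

```python
import numpy as np

data = {
    "name": np.array(["Ann", "Bob", "Cy"], dtype="O"),
    "salary": np.array([5000, 6200, 4800]),
    "bonus": np.array([0.10, 0.25, 0.15]),
}


def aggregate(data, func):
    """Apply func to each numeric column and keep one value per column."""
    result = {}
    for name, col in data.items():
        if col.dtype.kind == "O":
            # Assumption for this sketch: string columns are skipped.
            continue
        result[name] = np.array([func(col)])
    return result


maxima = {k: v.tolist() for k, v in aggregate(data, np.max).items()}
print(maxima)                                         # {'salary': [6200], 'bonus': [0.25]}
print(aggregate(data, np.argmax)["salary"].tolist())  # [1]
```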

Non-Aggregation methods

abs
cummin
cummax
cumsum
clip
round
copy
diff
pct_change
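
Unlike aggregations, these methods return a result with the same number of rows as the input. The sketch below shows a few of them on a plain numpy array. The `diff` and `pct_change` helpers are illustrative, and filling the first n rows with NaN (as pandas does) is an assumption here.

```python
import numpy as np

prices = np.array([10.0, 12.0, 9.0, 18.0])

running_total = np.cumsum(prices)              # cumsum
running_max = np.maximum.accumulate(prices)    # cummax
clipped = np.clip(prices, 10.0, 15.0)          # bound values to [10, 15]


def diff(values, n=1):
    """Difference between each value and the one n rows earlier (n >= 1)."""
    out = np.full(len(values), np.nan)
    out[n:] = values[n:] - values[:-n]
    return out


def pct_change(values, n=1):
    """Relative change with respect to the value n rows earlier (n >= 1)."""
    out = np.full(len(values), np.nan)
    out[n:] = (values[n:] - values[:-n]) / values[:-n]
    return out


print(running_total.tolist())            # [10.0, 22.0, 31.0, 49.0]
print(running_max.tolist())              # [10.0, 12.0, 12.0, 18.0]
print(clipped.tolist())                  # [10.0, 12.0, 10.0, 15.0]
print(diff(prices)[1:].tolist())         # [2.0, -3.0, 9.0]
print(pct_change(prices)[1:].tolist())   # [0.2, -0.25, 1.0]
```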

String Methods

The Python string methods listed below are implemented. They use the same syntax as in Python, except that they return a DataFrame.

capitalize
count
endswith
startswith
find
len
get
index
isalnum
isalpha
isdecimal
islower
isnumeric
isspace
istitle
isupper
lstrip
rstrip
strip
replace
swapcase
title
lower
upper
zfill
encode
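
One way to apply a built-in str method to every value of a string column is sketched below. The helper name `str_method` and the choice to pass None through unchanged are assumptions for this example. Note that len and get have no str method of the same name, so they would need separate handling.

```python
import numpy as np


def str_method(col, name, *args):
    """Call the built-in str method `name` on every non-missing value."""
    method = getattr(str, name)
    out = [None if v is None else method(v, *args) for v in col]
    return np.array(out, dtype="O")


names = np.array(["ann lee", None, "bob"], dtype="O")
print(str_method(names, "title").tolist())            # ['Ann Lee', None, 'Bob']
print(str_method(names, "startswith", "a").tolist())  # [True, None, False]
print(str_method(names, "zfill", 5).tolist())         # ['ann lee', None, '00bob']
```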

TODO

[] Add or overwrite multiple rows and columns in a DataFrame simultaneously. Currently only one column can be added at a time.
[] Generic aggregation methods: only column-wise aggregation is supported. Pandas provides both row and column aggregation, so implement row aggregation as well.
[] Add support for accessing columns using the . operator. Currently df.salary is not supported; only df['salary'] is.
[] Add a groupby() method.
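
As a possible starting point for the . operator item, `__getattr__` can fall back to a column lookup when normal attribute lookup fails. This is only a sketch on a hypothetical `MiniFrame` class. Column names that clash with existing methods or properties would still need bracket access.

```python
import numpy as np


class MiniFrame:
    """Hypothetical stand-in for the library's DataFrame (illustration only)."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, name):
        return self._data[name]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        # Reading through __dict__ avoids infinite recursion on _data.
        data = self.__dict__.get("_data", {})
        if name in data:
            return data[name]
        raise AttributeError(name)


emps = MiniFrame({"salary": np.array([5000, 6200, 4800])})
print(emps.salary.tolist())            # [5000, 6200, 4800]
print(emps.salary is emps["salary"])   # True
```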

yash9724/pandas_cub