
Python-BasicOperations-and-DataAnalysis


This repository contains two notebooks.

Notebook 1: "Python-BasicOperations.ipynb" covers basic operations on pandas dataframes and walks through the topics below. The dataset used in this notebook (1000_Sales_Records.csv) is included in the same repository; I pulled it at random from the Internet.

Topic 1: Basic Dataframe Reading/Operations

Code Block 1.1: Reading the dataframe
Code Block 1.2: Getting to know the shape of the dataset (Rows and Columns)
Code Block 1.3: Length of the dataframe
Code Block 1.4: Getting to know the data type of the dataset
Code Block 1.5: Extracting one column from the dataframe and getting to know the data type and the size (of Series)
Code Block 1.6: Printing the size of the series
Code Block 1.7: Data types for whole dataframe (variables)
Code Block 1.8: Working on creating specific indexes
Code Block 1.9: Printing the first 5 rows of the dataframe
Code Block 1.10: Printing the last 5 rows of the dataframe
Code Block 1.11: Displaying the information of the dataframe
Code Block 1.12: Extracting all rows from the dataframe with only one column
Code Block 1.13: Understanding difference between Series and Dataframe
Code Block 1.14: Extracting a range of columns, for example all columns from Country to the right end
Code Block 1.15: Selection based on a single index column
Code Block 1.16: Selection based on multiple index column values
Code Block 1.17: Selection based on multiple index column values
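The Topic 1 operations can be sketched roughly as follows; a tiny inline frame and illustrative column names stand in for 1000_Sales_Records.csv, which the notebook actually reads.

```python
import pandas as pd

# Illustrative frame; the notebook reads 1000_Sales_Records.csv instead.
df = pd.DataFrame({
    "Country": ["India", "Japan", "Brazil"],
    "Region": ["Asia", "Asia", "Americas"],
    "Units": [10, 20, 30],
})

rows, cols = df.shape                  # shape: (rows, columns)
n = len(df)                            # length of the dataframe
country = df["Country"]                # one column -> a Series, not a DataFrame
dtypes = df.dtypes                     # data type of every column (variable)
first = df.head(5)                     # first 5 rows
last = df.tail(5)                      # last 5 rows
indexed = df.set_index("Country")      # create a specific index
by_label = indexed.loc["Japan"]        # selection based on the index column
subset = df.loc[:, "Country":"Units"]  # range of columns, Country to right end
```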

Topic 2: Conversion of operations/code from SQL to Python

Code Block 2.1: SQL (WHERE clause with a single condition) --> Python code
Code Block 2.2: SQL (WHERE clause with multiple conditions) --> Python code
Code Block 2.3: SQL (WHERE clause with multiple conditions using NOT IN) --> Python code
Code Block 2.4: SQL (WHERE clause with ORDER BY on a single variable) --> Python code
Code Block 2.5: SQL (WHERE clause with ORDER BY on multiple variables) --> Python code
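A hedged sketch of these SQL-to-pandas translations, again on a small made-up frame rather than the sales dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["India", "Japan", "Brazil", "India"],
    "Units": [10, 40, 25, 5],
})

# WHERE Country = 'India'
single = df[df["Country"] == "India"]

# WHERE Country = 'India' AND Units > 5
multi = df[(df["Country"] == "India") & (df["Units"] > 5)]

# WHERE Country NOT IN ('India', 'Japan')
not_in = df[~df["Country"].isin(["India", "Japan"])]

# ORDER BY Units DESC
by_one = df.sort_values("Units", ascending=False)

# ORDER BY Country ASC, Units DESC
by_many = df.sort_values(["Country", "Units"], ascending=[True, False])
```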

Topic 3: Data Exploration

Code Block 3.1: Checking various statistics for categorical variables with 1 variable (Series)
Code Block 3.2: Checking various statistics for integer variables with 1 variable (Series)
Code Block 3.3: Using the describe method on the whole dataframe, which consists of a mix of numeric and categorical variables
Code Block 3.4: Data exploration methods for Series vs Dataframe
Code Block 3.5: Checking the median of all integer columns in the dataframe
Code Block 3.6: Select distinct values of any column
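The exploration calls might look like this on a small made-up frame; the describe() summary fields are standard pandas:

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["India", "Japan", "India", "Brazil"],
    "Units": [10, 40, 25, 5],
})

cat_stats = df["Country"].describe()    # count/unique/top/freq for a categorical Series
num_stats = df["Units"].describe()      # count/mean/std/min/quartiles/max
all_stats = df.describe(include="all")  # mixed numeric + categorical summary
median_units = df["Units"].median()     # median of an integer column
distinct = df["Country"].unique()       # distinct values, like SELECT DISTINCT
```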

Topic 4: Creating a new column, calculated columns, cleaning the column names, dropping rows/columns

4.1 Creating a new column within the dataframe
4.2 Printing all the columns from the dataset/Getting to know the column names
4.3 Cleaning the columns
4.4 Converting the data type of the column
4.5 Renaming the column names
4.6 Counting the missing values for each column in the dataset
4.7 Dropping rows and columns that have missing values
4.8 Replacing the missing values with some value (with some conditional logic)
4.9 map() function
4.10 Writing the final (cleaned) dataset to CSV
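One possible sketch of the Topic 4 cleaning pipeline; the column names and the output file name cleaned.csv are invented for illustration:

```python
import pandas as pd

# Invented frame with a messy column name and some gaps;
# the notebook applies the same steps to the sales dataset.
df = pd.DataFrame({
    "Unit Price ": [10.0, None, 5.0],   # stray trailing space on purpose
    "units": [2, 4, None],
})

# 4.2-4.3: inspect and clean the column names
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# 4.1: a new calculated column
df["revenue"] = df["unit_price"].fillna(0) * df["units"].fillna(0)

# 4.4-4.5: convert a dtype, then rename
df["units"] = df["units"].astype("float64")
df = df.rename(columns={"unit_price": "price"})

# 4.6-4.8: count, drop, and fill missing values
missing_per_col = df.isna().sum()
dropped = df.dropna()                              # rows with any NaN removed
filled = df.fillna({"price": df["price"].mean()})  # mean imputation

# 4.9-4.10: map() over a column, then write the cleaned frame to CSV
df["price_band"] = df["price"].map(lambda p: "high" if p > 7 else "low")
df.to_csv("cleaned.csv", index=False)
```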

Topic 5: Plotting

5.1 Plotting a horizontal bar/histogram
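A minimal horizontal-bar sketch using pandas' matplotlib backend; the data is made up, and the Agg backend keeps the script display-free:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import pandas as pd

df = pd.DataFrame({"Country": ["India", "Japan", "Brazil"],
                   "Units": [10, 40, 25]})

# 5.1: horizontal bar chart of units per country
ax = df.set_index("Country")["Units"].plot.barh(title="Units by Country")
ax.figure.savefig("units_barh.png")
```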

Notebook 2: "Python-Data Combining.ipynb" covers various techniques for combining/merging dataframes and walks through the topics below. The datasets (Trans1.csv, Trans2.csv, Transactions.csv, Info.csv) used in this notebook are in the same repository; I created them myself with random data to work on these operations.

Topic 1: Data Combine

1.1 Combining dataframes using concat function
1.2 Combining dataframes using concat function- with Ignore Index option
1.4 Combining dataframes using the merge function (inner join)
1.5 Combining dataframes using the merge function (left join)
1.6 Combining dataframes using the merge function (right join)
1.7 Combining dataframes using the merge function (outer join)
1.8 Use of suffixes
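Stand-in frames for Trans1/Trans2/Info (the real ones live in the repository's CSVs) make the concat/merge variants easy to show:

```python
import pandas as pd

trans1 = pd.DataFrame({"id": [1, 2], "amount": [100, 200]})
trans2 = pd.DataFrame({"id": [3], "amount": [300]})
info = pd.DataFrame({"id": [1, 2, 4], "city": ["Pune", "Tokyo", "Rio"]})

stacked = pd.concat([trans1, trans2])                     # keeps original row labels
reindexed = pd.concat([trans1, trans2], ignore_index=True)

inner = trans1.merge(info, on="id", how="inner")  # ids present in both: 1, 2
left = trans1.merge(info, on="id", how="left")    # all of trans1
right = trans1.merge(info, on="id", how="right")  # all of info
outer = trans1.merge(info, on="id", how="outer")  # union of ids

# Suffixes disambiguate overlapping column names
both = trans1.merge(trans1, on="id", suffixes=("_a", "_b"))
```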

Topic 2: Transforming Data with Pandas- Using map(), apply(), applymap(), melt()

2.1 Creation of a new column based on cases (similar to a CASE statement in SAS/SQL)
2.2 Difference between apply() and map()
2.3 Use of applymap()
2.4 Using pd.melt(): unpivots a DataFrame from wide format to long format
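A sketch of the four transforms on an invented frame; note that applymap() was renamed to DataFrame.map() in pandas 2.1, so the code below picks whichever the installed version has:

```python
import pandas as pd

df = pd.DataFrame({"name": ["ann", "bob"], "q1": [10, 20], "q2": [30, 40]})

# 2.1: CASE-like derived column via a condition
df["size"] = df["q1"].map(lambda v: "big" if v >= 15 else "small")

# 2.2: map() is element-wise on a Series; apply() runs per row/column
totals = df[["q1", "q2"]].apply(sum, axis=1)

# 2.3: element-wise transform over a whole frame
num = df[["q1", "q2"]]
elementwise = num.map if hasattr(num, "map") else num.applymap  # pre-2.1 fallback
doubled = elementwise(lambda v: v * 2)

# 2.4: melt() unpivots wide quarters into long (name, quarter, value) rows
long = pd.melt(df, id_vars="name", value_vars=["q1", "q2"],
               var_name="quarter", value_name="value")
```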

Topic 3: Working with Strings in Pandas

3.1 Renaming one of the columns
3.2 Commonly used string functions
3.3 Calculating the length of a string for one column and storing it in a different column
3.4 Creating a new calculated column by converting the string stored in one column to upper case
3.5 Pattern searching- Using contains
3.6 extract() and extractall()
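The string operations might look like this on an invented item column; extract() keeps the first match per row, while extractall() would return every match:

```python
import pandas as pd

df = pd.DataFrame({"product": ["Apple-12", "banana-7", "Cherry-30"]})

df = df.rename(columns={"product": "item"})  # 3.1: rename a column
df["length"] = df["item"].str.len()          # 3.3: string length into a new column
df["upper"] = df["item"].str.upper()         # 3.4: upper-case calculated column
has_an = df["item"].str.contains("an")       # 3.5: pattern search with contains
codes = df["item"].str.extract(r"(\d+)")     # 3.6: first numeric code per row
```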

Topic 4: Working with Missing and Duplicate Data

4.1 Identifying missing values
4.2 Identifying the duplicate values
4.3 Dropping the duplicates
4.4 Imputation of missing values with mean or any fixed value- Using fillna()
4.5 Dropping rows/columns which contains missing values
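A compact sketch of the missing- and duplicate-data handling, on a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "score": [10.0, np.nan, np.nan, 30.0],
})

missing_mask = df["score"].isna()  # 4.1: identify missing values
dup_mask = df.duplicated()         # 4.2: True for repeated full rows
deduped = df.drop_duplicates()     # 4.3: drop the duplicates
filled = df["score"].fillna(df["score"].mean())  # 4.4: impute with the mean
no_missing = df.dropna()           # 4.5: drop rows containing missing values
```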