Notebook 1: "Python-BasicOperations.ipynb" - Contains basic operations with pandas dataframes. This notebook covers the topics below. The dataset (1000_Sales_Records.csv) used for this notebook is placed in the same repository. I pulled this dataset at random from the Internet.
Topic 1: Basic Dataframe Reading/Operations
Code Block 1.1: Reading the dataframe
Code Block 1.2: Getting to know the shape of the dataset (Rows and Columns)
Code Block 1.3: Length of dataframe.
Code Block 1.4: Getting to know the data type of the dataset
Code Block 1.5: Extracting one column from the dataframe and getting to know the data type and the size (of Series)
Code Block 1.6: Printing the size of the series
Code Block 1.7: Data types for whole dataframe (variables)
Code Block 1.8: Working on creating specific indexes
Code Block 1.9: Printing the first 5 rows of the dataframe
Code Block 1.10: Printing the last 5 rows of the dataframe
Code Block 1.11: Displaying the information of the dataframe
Code Block 1.12: Extracting all rows from the dataframe with only one column
Code Block 1.13: Understanding difference between Series and Dataframe
Code Block 1.14: Extracting a range of columns. For example, all columns from country to the right end
Code Block 1.15: Selection Based on single index column
Code Block 1.16: Selection Based on multiple index columns values
Code Block 1.17: Selection Based on multiple index columns values
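The reading and selection steps in Topic 1 can be sketched roughly as below. This is a minimal illustration, not the notebook's actual code: the small inline dataframe stands in for 1000_Sales_Records.csv, and the column names (`Region`, `Country`, `Units_Sold`, `Total_Profit`) are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the sales data; in the notebook the dataframe
# would come from pd.read_csv("1000_Sales_Records.csv").
df = pd.DataFrame({
    "Region": ["Europe", "Asia", "Europe"],
    "Country": ["France", "Japan", "Spain"],
    "Units_Sold": [120, 80, 45],
    "Total_Profit": [300.0, 150.5, 90.0],
})

shape = df.shape              # (rows, columns)
n_rows = len(df)              # number of rows only
dtypes = df.dtypes            # data type of every column

country = df["Country"]       # single brackets return a Series
country_df = df[["Country"]]  # double brackets return a one-column DataFrame

# Label-based slices are inclusive: every column from "Country" to the right end.
right_side = df.loc[:, "Country":]

# Selection based on a single index column.
by_region = df.set_index("Region")
europe = by_region.loc["Europe"]

# Selection based on multiple index columns (a MultiIndex), using a tuple key.
by_both = df.set_index(["Region", "Country"]).sort_index()
spain = by_both.loc[("Europe", "Spain")]
```

`df.head()` and `df.tail()` return the first and last 5 rows by default, and `df.info()` prints the column types and non-null counts.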
Topic 2: Conversion of operations/code from SQL to Python
Code Block 2.1: SQL (where clause with single condition)-->Python Code
Code Block 2.2: SQL (where clause with multiple conditions)-->Python Code
Code Block 2.3: SQL (where clause with multiple conditions using NOT IN)-->Python Code
Code Block 2.4: SQL (where clause with order by on single variable)-->Python Code
Code Block 2.5: SQL (where clause with order by on multiple variables)-->Python Code
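As a rough guide to the SQL-to-Python conversions listed in Topic 2, each SQL statement in the comments below is paired with one possible pandas equivalent. The table name `sales` and its columns are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data; table and column names are assumptions.
sales = pd.DataFrame({
    "Region": ["Europe", "Asia", "Europe", "Africa"],
    "Units_Sold": [120, 80, 45, 200],
    "Unit_Price": [9.5, 20.0, 15.0, 3.25],
})

# SELECT * FROM sales WHERE Region = 'Europe'
single = sales[sales["Region"] == "Europe"]

# SELECT * FROM sales WHERE Region = 'Europe' AND Units_Sold > 100
# Each condition needs its own parentheses; use & for AND, | for OR.
multiple = sales[(sales["Region"] == "Europe") & (sales["Units_Sold"] > 100)]

# SELECT * FROM sales WHERE Region NOT IN ('Europe', 'Asia')
not_in = sales[~sales["Region"].isin(["Europe", "Asia"])]

# SELECT * FROM sales WHERE Units_Sold > 50 ORDER BY Units_Sold
ordered_one = sales[sales["Units_Sold"] > 50].sort_values(by="Units_Sold")

# SELECT * FROM sales WHERE Units_Sold > 50 ORDER BY Region ASC, Units_Sold DESC
ordered_many = sales[sales["Units_Sold"] > 50].sort_values(
    by=["Region", "Units_Sold"], ascending=[True, False]
)
```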
Topic 3: Data Exploration
Code Block 3.1: Checking various statistics for categorical variables with 1 variable (Series)
Code Block 3.2: Checking various statistics for integer variables with 1 variable (Series)
Code Block 3.3: Using the describe method on the whole dataframe, which consists of a mix of numeric and categorical variables
Code Block 3.4: Data exploration methods for Series vs Dataframe
Code Block 3.5: Checking the median of all integer columns in the dataframe
Code Block 3.6: Select distinct values of any column
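The exploration steps in Topic 3 might look something like the sketch below. The dataframe and its column names are made up for the example and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical data; column names are assumptions.
df = pd.DataFrame({
    "Region": ["Europe", "Asia", "Europe", "Africa"],
    "Units_Sold": [120, 80, 45, 200],
    "Unit_Price": [9.5, 20.0, 15.0, 3.25],
})

# Categorical Series: frequency of each value.
region_counts = df["Region"].value_counts()

# Numeric Series: count, mean, std, min, quartiles, max.
units_stats = df["Units_Sold"].describe()

# Whole dataframe: include="all" covers numeric and categorical columns together.
all_stats = df.describe(include="all")

# Median of every numeric column (integer and float alike).
medians = df.median(numeric_only=True)

# Distinct values of one column, similar to SELECT DISTINCT in SQL.
regions = df["Region"].unique()
```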
Topic 4: Creating a new column, calculated columns, cleaning the column names, dropping rows/columns
4.1 Creating a new column within the dataframe
4.2 Printing all the columns from the dataset/Getting to know the column names
4.3 Cleaning the columns
4.4 Converting the data type of the column
4.5 Renaming the column names
4.6 Counting the missing values for each column in the dataset
4.7 Dropping rows and columns that have missing values
4.8 Replacing the missing values with some value (chosen using conditional logic)
4.9 Map function
4.10 Writing the final dataset (cleaned) one into csv
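A minimal sketch of the cleaning steps in Topic 4 follows. The messy column names and values are invented for the example, and `to_csv` is called without a path so that it returns the CSV text instead of writing a file.

```python
import pandas as pd

# Hypothetical raw data with untidy column names and a missing value.
raw = pd.DataFrame({
    " Item Type ": ["Snacks", "Fruits", None],
    "Units Sold": ["10", "20", "30"],
    "Sales Channel": ["Online", "Offline", "Online"],
})

clean = raw.copy()

# Clean the column names: trim spaces, lower-case, replace spaces with underscores.
clean.columns = clean.columns.str.strip().str.lower().str.replace(" ", "_")

# Convert a column stored as text to integers.
clean["units_sold"] = clean["units_sold"].astype(int)

# Rename a column.
clean = clean.rename(columns={"item_type": "item"})

# Count missing values per column.
missing_per_column = clean.isnull().sum()

# Replace missing values with a fixed value.
clean["item"] = clean["item"].fillna("Unknown")

# New column built with map(): translate each value through a dictionary.
clean["is_online"] = clean["sales_channel"].map({"Online": True, "Offline": False})

# Passing a file name, e.g. clean.to_csv("cleaned.csv", index=False), would write to disk.
csv_text = clean.to_csv(index=False)
```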
Topic 5: Plotting
5.1 Plotting a horizontal bar/histogram
Notebook 2: "Python-Data Combining.ipynb" - Contains various techniques for combining/merging dataframes. This notebook covers the topics below. The datasets (Trans1.csv, Trans2.csv, Transactions.csv, Info.csv) used in this notebook are placed in the same repository. I created all of these datasets myself with random data to practice the operations.
Topic 1: Data Combine
1.1 Combining dataframes using concat function
1.2 Combining dataframes using concat function- with Ignore Index option
1.4 Combining dataframes using Merge function (Inner Join)
1.5 Combining dataframes using Merge function (left Join)
1.6 Combining dataframes using Merge function (right Join)
1.7 Combining dataframes using Merge function (outer Join)
1.8 Use of suffixes
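The combining techniques in this topic can be illustrated roughly as follows. The three small dataframes only borrow their names from the CSV files above; their columns and values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical contents; the real CSV files may have different columns.
trans1 = pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]})
trans2 = pd.DataFrame({"id": [3, 4], "amount": [30.0, 40.0]})
info = pd.DataFrame({
    "id": [2, 3, 5],
    "amount": [0.5, 0.7, 0.9],
    "city": ["Pune", "Oslo", "Lima"],
})

# concat stacks rows; the original index labels (0, 1, 0, 1) are kept.
stacked = pd.concat([trans1, trans2])

# ignore_index=True renumbers the result 0..n-1.
transactions = pd.concat([trans1, trans2], ignore_index=True)

# merge joins on a key column; `how` selects the join type.
inner = transactions.merge(info, on="id", how="inner")  # ids in both
left = transactions.merge(info, on="id", how="left")    # every transaction
right = transactions.merge(info, on="id", how="right")  # every info row
outer = transactions.merge(info, on="id", how="outer")  # ids in either

# Both frames have an "amount" column; suffixes replace the default _x / _y.
suffixed = transactions.merge(
    info, on="id", how="inner", suffixes=("_trans", "_info")
)
```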
Topic 2: Transforming Data with Pandas - Using map(), apply(), applymap(), melt()
2.1 Creation of a new column based on cases (similar to a CASE statement in SAS/SQL)
2.2 Difference between apply() and map()
2.3 Use of applymap()
2.4 Using pd.melt(): unpivots a Dataframe from wide format to long format
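One way to picture the difference between these methods is sketched below, using a made-up `scores` dataframe and a hypothetical `grade` helper. Note that `applymap()` was deprecated in pandas 2.1 in favor of `DataFrame.map()`, so the element-wise step here is written with `apply()` plus `Series.map()`, which should run on both older and newer versions.

```python
import pandas as pd

# Hypothetical data for illustration.
scores = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "math": [82, 47],
    "art": [64, 91],
})


def grade(score):
    """CASE-like logic: label a single score."""
    return "pass" if score >= 50 else "fail"


# Series.map(): applies the function to each element of ONE column.
math_grade = scores["math"].map(grade)

# DataFrame.apply(axis=1): the function receives a whole row at a time.
best_score = scores[["math", "art"]].apply(lambda row: row.max(), axis=1)

# Element-wise over SEVERAL columns, which is what applymap() does in one call.
all_grades = scores[["math", "art"]].apply(lambda col: col.map(grade))

# pd.melt(): wide format (one column per subject) to long format
# (one row per name/subject pair).
long_scores = pd.melt(
    scores,
    id_vars="name",
    value_vars=["math", "art"],
    var_name="subject",
    value_name="score",
)
```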
Topic 3: Working with Strings in Pandas
3.1 Renaming one of the columns
3.2 Commonly used String functions
3.3 Calculating the length of a string in one column and storing it in a different column
3.4 Creating a new calculated column by converting the string stored in one column to upper case letters
3.5 Pattern Searching- Using contains
3.6 extract() and extractall()
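The string operations in this topic all go through the `.str` accessor. The sketch below uses invented column names and values to show the general shape of each call.

```python
import pandas as pd

# Hypothetical data for illustration.
people = pd.DataFrame({
    "full_name": ["Ana Lopez", "Li Wei"],
    "code": ["AB-12 CD-34", "EF-56"],
})

# Rename one column.
people = people.rename(columns={"full_name": "name"})

# Length of each string, stored in a new column.
people["name_length"] = people["name"].str.len()

# Upper-case copy of a column.
people["name_upper"] = people["name"].str.upper()

# Pattern search: boolean Series, True where the pattern occurs.
has_lopez = people["name"].str.contains("Lopez")

# extract(): capture groups from the FIRST match in each row.
pattern = r"([A-Z]{2})-(\d{2})"
first_match = people["code"].str.extract(pattern)

# extractall(): one output row per match, indexed by (row, match number).
all_matches = people["code"].str.extractall(pattern)
```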
Topic 4: Working with Missing and Duplicate Data
4.1 Identifying missing values
4.2 Identifying the duplicate values
4.3 Dropping the duplicates
4.4 Imputation of missing values with mean or any fixed value- Using fillna()
4.5 Dropping rows/columns that contain missing values
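The missing-value and duplicate handling listed above could be sketched as follows. The data is invented for the example, and imputing with the column mean is just one of the options the topic mentions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one fully duplicated row and two missing values.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "amount": [10.0, 20.0, 20.0, np.nan],
    "city": ["Pune", "Oslo", "Oslo", None],
})

# Identify missing values: count per column.
missing_per_column = df.isnull().sum()

# Identify duplicates: True for each row that repeats an earlier row.
duplicate_mask = df.duplicated()

# Drop the duplicates, keeping the first occurrence.
deduped = df.drop_duplicates()

# Impute missing amounts with the column mean (a fixed value works the same way).
filled = deduped.assign(
    amount=deduped["amount"].fillna(deduped["amount"].mean())
)

# Drop rows that contain any missing value.
complete_rows = deduped.dropna()

# Drop columns that contain any missing value.
complete_columns = deduped.dropna(axis=1)
```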