GitHub - solegalli/Python-Feature-Engineering-Cookbook-First-Edition

Python Feature Engineering Cookbook - Code Repository

Published January 22nd, 2020

Paperback: 372 pages
Publisher: Packt Publishing
Language: English
ISBN: 9781789806311

Links

Packt Page

Table of Contents and Recipes

Foreseeing Variable Problems in Building ML Models
1. Identifying numerical and categorical variables
2. Quantifying missing data
3. Determining cardinality in categorical variables
4. Pinpointing rare categories in categorical variables
5. Identifying a linear relationship
6. Identifying normal distributions
7. Distinguishing variable distribution
8. Highlighting Outliers
9. Comparing feature magnitude
Missing Data Imputation
1. Removing observations with missing data
2. Performing mean or median imputation
3. Implementing mode or frequent category imputation
4. Replacing missing values by an arbitrary number
5. Capturing missing values in a bespoke category
6. Replacing missing values by a value at the end of the distribution
7. Implementing random sample imputation
8. Adding a missing value indicator variable
9. Performing multivariate imputation by chained equations (MICE)
10. Assembling an imputation pipeline with Scikit-learn
11. Assembling an imputation pipeline with feature-engine
Encoding Categorical Variables
1. Creating binary variables through One Hot Encoding
2. Performing One Hot Encoding of frequent categories
3. Replacing categories by ordinal numbers
4. Replacing categories by counts or frequency of observations
5. Encoding with integers in an ordered manner
6. Encoding with the mean of the target
7. Encoding with the Weight of evidence
8. Grouping rare or infrequent categories
9. Performing Binary encoding
10. Performing Feature hashing
Transforming Numerical Variables
1. Transforming variables with the logarithm
2. Transforming variables with the reciprocal function
3. Using square and cube root to transform variables
4. Using power transformations on numerical variables
5. Performing Box-Cox transformation on numerical variables
6. Carrying out Yeo-Johnson transformation on numerical variables
Performing Variable Discretisation
1. Dividing the variable into intervals of equal width
2. Sorting the variable values into intervals of equal frequency
3. Performing discretization followed by categorical encoding
4. Allocating the variable values into arbitrary intervals
5. Performing discretization with k-means
6. Using decision trees for discretization
Working with Outliers
1. Trimming outliers from the data set
2. Performing Winsorization
3. Capping the variable at arbitrary maximum and minimum values
4. Performing zero-coding – capping the variable at zero
Deriving Features from Dates and Time Variables
1. Extracting date and time parts from a datetime variable
2. Deriving representations of year and month
3. Creating representations of day and week
4. Extracting time parts from a time variable
5. Capturing elapsed time between datetime variables
6. Working with time in different timezones
Performing Feature Scaling
1. Standardizing the features
2. Performing Mean Normalisation
3. Scaling to the maximum and minimum values
4. Implementing maximum absolute scaling
5. Scaling with the median and quantiles
6. Scaling to vector unit length
Applying Mathematical Computations to Features
1. Combining multiple features with statistical operations
2. Combining pairs of features with mathematical functions
3. Performing polynomial expansion
4. Deriving new features with decision trees
5. Carrying out Principal Component Analysis
Creating Features from Time Series and Transactional Data
1. Aggregating transactions with mathematical operations
2. Aggregating transactions in a time window
3. Determining number of local maxima and minima
4. Deriving time elapsed between time-stamped events
5. Creating features from transactions with Featuretools
Extracting Features from Text Variables
1. Counting characters, words and vocabulary
2. Estimating text complexity by counting sentences
3. Creating features with Bag of words and ngrams
4. Implementing term frequency-inverse document frequency
5. Cleaning and stemming text variables
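
As a rough illustration of the kind of recipe listed above, the sketch below combines median imputation with a missing value indicator (two recipes from the missing data chapter). It is not taken from the book or this repository: the function name `impute_median_with_indicator` and the sample `ages` list are hypothetical, and it uses only the Python standard library rather than the pandas, Scikit-learn, or feature-engine tooling the recipes refer to.

```python
from statistics import median


def impute_median_with_indicator(values):
    """Fill None entries with the median of the observed values.

    Returns a tuple (imputed_values, missing_flags), where missing_flags
    holds 1 for every position that was originally None and 0 otherwise.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot compute a median without observed values")
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return imputed, flags


# Hypothetical sample data: None marks a missing observation.
ages = [22, None, 35, 41, None, 29]
imputed_ages, age_missing = impute_median_with_indicator(ages)
```

In a modelling workflow the median would normally be learned from the training set only and then reused on new data; this sketch skips that split for brevity.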

Folders and files

.github
ch01-variable-characteristics
ch02-missing-data-imputation
ch03-categorical-encoding
ch04-tranforming-numerical-vars
ch05-discretisation
ch06-outliers
ch07-datetime
ch08-feature-scaling
ch09-mathematical-transformations
ch10-transactional-data
ch11-text
.gitignore
LICENSE
README.md
cover.png
environment.yml
requirements.txt
