3.8 One-hot encoding

Notes

One-hot encoding converts categorical variables into numerical ones. It represents each category of a variable as its own column, which gets a 1 if the value belongs to that category and a 0 otherwise.
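The idea can be illustrated with a minimal pure-Python sketch. The function name `one_hot` and the example contract values are hypothetical and are not taken from the course code. The categories are sorted so the column order is deterministic.

```python
def one_hot(values):
    """One-hot encode a list of categorical values.

    Returns the sorted list of categories (one column per category) and a list
    of rows, where each row has a 1 in its category's column and 0 elsewhere.
    """
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows


# Hypothetical example values for a "contract" variable.
contracts = ["month-to-month", "two_year", "one_year", "month-to-month"]
columns, encoded = one_hot(contracts)
print(columns)  # ['month-to-month', 'one_year', 'two_year']
for row in encoded:
    print(row)
```

Each row contains exactly one 1, so the number of columns equals the number of distinct categories.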

Classes, functions, and methods:

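df[x].to_dict(orient='records') - convert the columns x of the DataFrame df into a list of dictionaries, one dictionary per row.
DictVectorizer().fit_transform(x) - Scikit-Learn class for one-hot encoding: it converts the list of dictionaries x into a sparse matrix (the default, sparse=True). Numerical variables are passed through unchanged.
DictVectorizer().get_feature_names() - return the names of the columns in the matrix (the vectorizer must be fitted first). In Scikit-Learn 1.0 and later, use get_feature_names_out() instead; get_feature_names() was removed in version 1.2.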

The entire code of this project is available in this Jupyter notebook.

⚠️	The notes are written by the community. If you see an error here, please create a PR with a fix.

Notes from Peter Ernicke

Navigation

Machine Learning Zoomcamp course
Session 3: Machine Learning for Classification
Previous: Feature importance: Correlation
Next: Logistic regression
