2.12 Categorical variables

Notes

Categorical variables are typically represented as strings, and pandas identifies them with the object dtype. However, some variables that look numerical may actually be categorical (e.g., the number of doors a car has). All of these categorical variables need to be converted to a numerical form, because most ML models, including the regression model in this session, can only work with numerical features. We don't have to encode every category of a feature; we can include only some of them (for example, the most frequent values). Each selected category becomes its own binary column that is 1 when a row has that value and 0 otherwise. This transformation from categorical to numerical variables is known as one-hot encoding.
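The steps above can be sketched in pandas as follows. This is a minimal sketch under assumptions: the toy DataFrame, its values, and the helper `one_hot_top_k` are hypothetical, and dropping the original columns after encoding is our own choice, not necessarily what the course notebook does. Non-numerical columns are found with `pd.api.types.is_numeric_dtype`, which does not depend on strings being stored with the object dtype.

```python
import pandas as pd

# Hypothetical toy data standing in for the car dataset used in the course.
df = pd.DataFrame({
    "make": ["bmw", "audi", "bmw", "ford", "audi", "bmw", "fiat"],
    "number_of_doors": [4, 2, 4, 4, 2, 3, 4],
    "engine_hp": [300.0, 220.0, 250.0, 150.0, 200.0, 320.0, 90.0],
})

# String columns are non-numerical, so they are categorical candidates.
non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
# number_of_doors looks numerical, but we choose to treat it as categorical.
categorical_columns = non_numeric + ["number_of_doors"]


def one_hot_top_k(data: pd.DataFrame, columns: list, k: int) -> pd.DataFrame:
    """Hypothetical helper: add a 0/1 column for each of the k most
    frequent values of every listed column, then drop that column.

    Values outside the top k get 0 in all of that column's new features.
    """
    result = data.copy()
    for column in columns:
        top_values = data[column].value_counts().head(k).index
        for value in top_values:
            # 1 where the row has this value, 0 otherwise.
            result[f"{column}_{value}"] = (data[column] == value).astype(int)
        # Remove the original column so only numerical features remain.
        result = result.drop(columns=column)
    return result


encoded = one_hot_top_k(df, categorical_columns, k=2)
print(encoded.columns.tolist())
```

With `k=2`, rare values such as `ford` and `fiat` get 0 in every `make_*` column, which is how restricting the encoding to the most frequent categories works in practice.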

The complete code for this project is available in this Jupyter notebook.

⚠️	The notes are written by the community. If you see an error here, please create a PR with a fix.

Notes from Peter Ernicke

Comments

This way of encoding categorical features is called "one-hot encoding". We'll learn more about it in Session 3.
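For comparison, pandas also offers `pd.get_dummies`, which one-hot encodes every distinct value of a column in a single call. This is a small sketch with hypothetical data; it is just one option and not necessarily the tool introduced in Session 3.

```python
import pandas as pd

# Hypothetical data with a single categorical column.
df = pd.DataFrame({"make": ["bmw", "audi", "bmw", "fiat"]})

# One indicator column per distinct value; dtype=int gives 0/1 instead of booleans.
dummies = pd.get_dummies(df["make"], prefix="make", dtype=int)
print(dummies)
```

Unlike the top-k approach, this encodes all categories, so each row has exactly one 1 across the `make_*` columns.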

Navigation

Machine Learning Zoomcamp course
Session 2: Machine Learning for Regression
Previous: Feature engineering
Next: Regularization
