Covers transforming, cleaning, and handling categorical data when building predictive models.
Added: Jul 18, 2020
Slides: 4 pages
Slide Content
How to Handle Categorical Data
By Srinivas Rao PrithviNag Kolla, Masters in Data Science, University of North Texas, Email: [email protected]
Categorical Variable: Data that falls into a fixed set of categories is called categorical data. Ex: A survey of which phone brand people own produces categorical data.

Id  Name     Phone Brand
1   Alex     Apple
2   George   Nokia
3   Chen     Apple
4   prithvi  Samsung

Dropping Categorical Variables: If a column in the data set holds categorical values that are not useful for modeling, we can drop it.
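Dropping a column can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's code: the `drop_columns` helper is hypothetical, and treating "Name" as the column with no modeling value is an assumption made only for this toy table.

```python
# Rows from the slide's example table.
rows = [
    {"Id": 1, "Name": "Alex", "Phone Brand": "Apple"},
    {"Id": 2, "Name": "George", "Phone Brand": "Nokia"},
    {"Id": 3, "Name": "Chen", "Phone Brand": "Apple"},
    {"Id": 4, "Name": "prithvi", "Phone Brand": "Samsung"},
]


def drop_columns(rows, columns_to_drop):
    """Return new rows without the given columns; the input rows are not modified."""
    to_drop = set(columns_to_drop)
    return [{key: value for key, value in row.items() if key not in to_drop}
            for row in rows]


# Assumption: "Name" carries no predictive signal in this toy example.
cleaned = drop_columns(rows, ["Name"])
print(cleaned[0])
```

Returning new rows instead of editing in place keeps the original data available if the column turns out to matter later.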
Label Encoding: Assigning a unique integer value to each label in the categorical column.

Ex:
Phone Brand   Phone Brand (Label Encoded)
Apple         0
Nokia         1
Jio           2
Apple         0

Decision Trees and Random Forests work well with Label Encoding. Note that the integers imply an order, ‘Apple’(0) < ‘Nokia’(1) < ‘Jio’(2), even though the brands themselves have no natural ranking.
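Label encoding can be sketched as follows, assuming integers are assigned in order of first appearance so that the result matches the mapping ‘Apple’(0), ‘Nokia’(1), ‘Jio’(2) given on the slide. The `label_encode` helper is hypothetical and written only for illustration.

```python
def label_encode(values):
    """Map each distinct label to an integer, numbered by first appearance."""
    mapping = {}
    codes = []
    for value in values:
        if value not in mapping:
            # The next unused integer is the current number of known labels.
            mapping[value] = len(mapping)
        codes.append(mapping[value])
    return codes, mapping


phone_brand = ["Apple", "Nokia", "Jio", "Apple"]
codes, mapping = label_encode(phone_brand)
print(codes)
print(mapping)
```

Library encoders may number the labels differently (for example, by sorting them alphabetically), so the same mapping should be stored and reused when encoding new data.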
One-Hot Encoding: In Label Encoding, we gave the labels unique values that carry an order. That implied order does not work well for models that treat the integers as magnitudes, so instead we separate the categorical values into columns and give ‘1’ in the column whose category is present in that row, and ‘0’ otherwise.

Ex:
Phone Brand   Apple  Nokia  Jio
Apple         1      0      0
Nokia         0      1      0
Apple         1      0      0
Jio           0      0      1
Nokia         0      1      0
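One-hot encoding can be sketched in plain Python as below. This is a minimal illustration under the assumption that columns are created in order of first appearance; the `one_hot_encode` helper is hypothetical and not part of any library.

```python
def one_hot_encode(values):
    """Return (categories, rows) where each row has a 1 in its category's column."""
    # dict.fromkeys keeps only distinct labels, in order of first appearance.
    categories = list(dict.fromkeys(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


phone_brand = ["Apple", "Nokia", "Apple", "Jio", "Nokia"]
categories, encoded = one_hot_encode(phone_brand)
print(categories)
for brand, row in zip(phone_brand, encoded):
    print(brand, row)
```

Each row contains exactly one ‘1’, so no category is ranked above another. The cost is one extra column per distinct category.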