Data Preprocessing: Feature Scaling Methods

sonalisonavane 29 views 11 slides Jul 18, 2024

About This Presentation

Execute feature scaling on given dataset

Slide Content

Assignment 4: Execute feature scaling on a given dataset.

What is Feature Scaling?

Feature scaling is a data preprocessing step applied to the independent variables (features) of a dataset. It rescales the values to a common range or spread. It can also speed up computation in some algorithms, for example those trained by gradient descent.

Why and Where to Apply Feature Scaling?

Real-world datasets often contain features that vary widely in magnitude, unit, and range. Scale a feature when its scale is irrelevant or misleading; leave it unscaled when the scale itself carries meaning. Algorithms that rely on Euclidean distance are sensitive to magnitude: a feature measured on a much larger scale than the others dominates the distance calculation unless it is rescaled. Feature scaling lets all features contribute on a comparable footing.
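The dominance effect can be illustrated with a small sketch in plain Python. The three cars and their (age, price) values are made-up numbers chosen for illustration, not rows from the assignment dataset, and the helper names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_columns(rows):
    """Rescale every column to [0, 1]. Assumes no column is constant."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

# Hypothetical cars: (age in years, price)
cars = [(2, 500_000), (12, 501_000), (2, 520_000)]

# Raw distances: price differences swamp the 10-year age difference.
raw_ab = euclidean(cars[0], cars[1])
raw_ac = euclidean(cars[0], cars[2])

# After scaling, both features contribute on the same footing.
scaled = min_max_columns(cars)
scaled_ab = euclidean(scaled[0], scaled[1])
scaled_ac = euclidean(scaled[0], scaled[2])

print("raw:   ", raw_ab, raw_ac)
print("scaled:", scaled_ab, scaled_ac)
```

On the raw values the second car looks far closer to the first than the third does, purely because price is measured in much larger numbers than age. After scaling, the two distances are nearly equal.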

Feature Scaling Techniques

1. Min-Max Scaling
2. Normalization
3. Standardization

Min-Max Scaling

This method takes two steps:

1. Find the minimum and the maximum value of the column.
2. Subtract the minimum from each entry and divide the result by the difference between the maximum and the minimum.

Every value in the column then falls between 0 and 1.
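The two steps can be sketched by hand in plain Python, which makes the arithmetic easy to check. The `km_driven` values are illustrative only, and mapping a constant column to 0.0 is an assumption of this sketch.

```python
def min_max_scale(values):
    """Min-max scale a list of numbers to the range [0, 1]."""
    # Step 1: find the minimum and maximum of the column.
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: avoid dividing by zero (a choice made for this sketch).
        return [0.0 for _ in values]
    # Step 2: subtract the minimum and divide by (max - min).
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative values, not taken from the assignment dataset.
km_driven = [10_000, 25_000, 40_000, 70_000]
scaled = min_max_scale(km_driven)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```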

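Min-Max code

    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    # Load the dataset and select columns 3 to 8 (assumed to be numeric)
    df = pd.read_csv('car_details.csv')
    x = df.iloc[:, 3:9]
    print(x.head())

    # Rescale each column to the range [0, 1]
    scaler = MinMaxScaler()
    scaled_data = scaler.fit_transform(x)

    # fit_transform returns a NumPy array, so restore the column names
    scaled_df = pd.DataFrame(scaled_data, columns=x.columns)
    print(scaled_df.head())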

Normalization

This method is similar to min-max scaling, but instead of subtracting the minimum we subtract the mean of the column from each entry, then divide the result by the difference between the maximum and the minimum. This is often called mean normalization; the scaled values are centered on zero.
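A minimal hand-written sketch of this formula, in plain Python with made-up values. The function name `mean_normalize` is hypothetical and is not a scikit-learn API.

```python
import statistics

def mean_normalize(values):
    """Subtract the mean, then divide by (max - min)."""
    mean = statistics.fmean(values)
    spread = max(values) - min(values)
    if spread == 0:
        # Constant column: avoid dividing by zero (a choice made for this sketch).
        return [0.0 for _ in values]
    return [(v - mean) / spread for v in values]

# Illustrative values: mean is 30, max - min is 50.
values = [10, 20, 30, 60]
normalized = mean_normalize(values)
print(normalized)
```

For these values the result is approximately -0.4, -0.2, 0.0 and 0.6, so the scaled column sums to zero.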

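Normalizer code

    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import Normalizer

    # Load the dataset and select columns 3 to 8 (assumed to be numeric)
    df = pd.read_csv('car_details.csv')
    x = df.iloc[:, 3:9]

    # Replace missing values with 0 (a simple choice that can distort the result)
    final_dataset = x.replace({np.nan: 0})

    # Note: Normalizer rescales each row (sample) to unit L2 norm by default.
    # It does not apply the column-wise mean normalization described above.
    scaler = Normalizer()
    scaled_data = scaler.fit_transform(final_dataset)

    scaled_df = pd.DataFrame(scaled_data, columns=final_dataset.columns)
    print(scaled_df.head())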
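scikit-learn's `Normalizer`, used in the code above, works row by row: with its default settings it divides each sample by its own L2 norm. The sketch below reproduces that idea by hand in plain Python on made-up rows. The helper name is hypothetical, and leaving an all-zero row unchanged is a choice made for this sketch.

```python
import math

def l2_normalize_row(row):
    """Scale one row so that its Euclidean (L2) length is 1."""
    norm = math.sqrt(sum(v * v for v in row))
    if norm == 0:
        # All-zero row: leave it unchanged to avoid dividing by zero.
        return list(row)
    return [v / norm for v in row]

# Illustrative rows, not taken from the assignment dataset.
rows = [[3.0, 4.0], [1.0, 0.0], [0.0, 0.0]]
normalized = [l2_normalize_row(r) for r in rows]
print(normalized)
```

The first row has length 5, so it becomes roughly (0.6, 0.8). Because the operation depends only on the values within a row, it does not bring different columns onto a common scale the way min-max scaling or standardization does.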

Standardization

Standardization is based on the mean and variance of the data.

1. Calculate the mean and standard deviation of the column.
2. Subtract the mean from each entry and divide the result by the standard deviation.

The result has a mean of zero and a standard deviation of 1. Standardization shifts and rescales the data but does not change the shape of its distribution, so skewed data stays skewed.
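The same steps can be sketched by hand in plain Python with made-up values. This sketch uses the population standard deviation (dividing by n), which is also what scikit-learn's `StandardScaler` uses; the function name `standardize` is hypothetical.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    # Step 1: mean and standard deviation of the column.
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Constant column: avoid dividing by zero (a choice made for this sketch).
        return [0.0 for _ in values]
    # Step 2: subtract the mean and divide by the standard deviation.
    return [(v - mean) / std for v in values]

# Illustrative values: mean is 5, population standard deviation is 2.
values = [2, 4, 4, 4, 5, 5, 7, 9]
z_scores = standardize(values)
print(z_scores)

# The standardized column has mean 0 and standard deviation 1.
print(statistics.fmean(z_scores), statistics.pstdev(z_scores))
```

The z-scores keep the same ordering and relative spacing as the original values, which is why the shape of the distribution is unchanged.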

Standardization code

    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    # Load the dataset and select columns 3 to 8 (assumed to be numeric)
    df = pd.read_csv('car_details.csv')
    x = df.iloc[:, 3:9]

    # Replace missing values with 0 (a simple choice that can distort the mean)
    final_dataset = x.fillna(0)

    # Rescale each column to mean 0 and standard deviation 1
    scaler = StandardScaler()
    scaled_data = scaler.fit_transform(final_dataset)

    scaled_df = pd.DataFrame(scaled_data, columns=final_dataset.columns)
    print(scaled_df.head())

Examples of Algorithms Where Feature Scaling Matters

1. K-Means: uses the Euclidean distance measure, so feature scaling matters.
2. K-Nearest Neighbors: also distance-based, and also requires feature scaling.
3. Principal Component Analysis (PCA): looks for the directions of maximum variance, so features on larger scales dominate unless the data is scaled.
4. Gradient descent: converges faster once the features are on similar scales, which speeds up the calculation of the parameters (theta).

Note: Naive Bayes, Linear Discriminant Analysis, and tree-based models are largely unaffected by feature scaling. In short, scaling matters most for algorithms that are distance-based, variance-based, or trained by gradient descent.
