Feature Scaling and Normalization

Nishant83346 · 19 slides · Jun 10, 2024

About This Presentation

Feature Scaling and Normalization


Slide Content

Feature Scaling and Normalization

Introduction Feature scaling can change your results significantly with certain algorithms while having minimal or no effect with others. To understand this, let's look at why features need to be scaled, the varieties of scaling methods available, and when we should scale our features.

What is Feature Scaling? A technique to standardize the independent features present in the data to a fixed range. It is performed during data pre-processing and is one of the most important steps before creating a machine learning model; it can make the difference between a weak machine learning model and a strong one. The two most important scaling techniques are Standardization and Normalization.

When to Scale The rule of thumb I follow here: if an algorithm computes distances or assumes normality, scale your features! Some examples of algorithms where feature scaling matters: k-nearest neighbours with a Euclidean distance measure is sensitive to magnitudes, so features should be scaled for all of them to weigh in equally. Scaling is critical when performing Principal Component Analysis (PCA): PCA tries to get the features with maximum variance, and variance is high for high-magnitude features, which skews PCA towards those features. We can speed up gradient descent by scaling, because θ descends quickly on small ranges and slowly on large ranges, and so oscillates inefficiently down to the optimum when the variables are very uneven. Tree-based models are not distance-based and can handle varying ranges of features, so scaling is not required when modelling trees. Algorithms like Linear Discriminant Analysis (LDA) and Naive Bayes are by design equipped to handle this and give weights to the features accordingly, so performing feature scaling for these algorithms may not have much effect.

Why and Where to Apply Feature Scaling? Real-world datasets include features that vary greatly in magnitude, units, and range. Normalization should be performed when the scale of a feature is meaningless or misleading, and should not be performed when the scale is meaningful. Algorithms that use a Euclidean distance measure are sensitive to magnitudes, and feature scaling helps to weigh all the features equally. If a feature in the dataset is big in scale compared to the others, then in algorithms where Euclidean distance is measured this big-scaled feature becomes dominant and needs to be normalized. Left alone, these algorithms take in only the magnitude of features, neglecting the units, so the results would vary greatly between different units such as 5 kg and 5000 g. Features with high magnitudes will weigh in far more in the distance calculations than features with low magnitudes. To suppress this effect, we need to bring all features to the same level of magnitude, which is achieved by scaling; the sketch below illustrates this effect numerically.
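To make this concrete, here is a minimal sketch (with made-up age and salary values, not taken from the slides) showing how a large-magnitude column dominates a Euclidean distance until the columns are standardized:

```python
import numpy as np

# Three samples with two features: age (years) and salary (dollars).
X = np.array([
    [25, 50_000],
    [26, 80_000],
    [60, 51_000],
])

def euclidean(u, v):
    return np.sqrt(((u - v) ** 2).sum())

# Unscaled: salary dominates, so sample 0 looks far closer to sample 2
# (similar salary, very different age) than to sample 1 (similar age).
print(euclidean(X[0], X[1]), euclidean(X[0], X[2]))

# Standardize each column to zero mean and unit standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# After scaling, both features contribute on a comparable footing.
print(euclidean(X_std[0], X_std[1]), euclidean(X_std[0], X_std[2]))
```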

How to Scale Features: Standardization, Min-Max Scaling (Normalization), MaxAbs Scaler, Robust Scaler, Normalizer.

Standardization Standardization is a scaling technique where the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resulting distribution has a unit standard deviation.

Standardization (Points to Note) In standardization we transform our values such that the mean (μ) of the values is 0 and the standard deviation (σ) is 1. Standardization replaces the values by their Z-scores. The values are not restricted to a particular range. Standardizing the features so that they are centered around 0 with a standard deviation of 1 is important not only when we are comparing measurements that have different units; it is also a general requirement for many machine learning algorithms.

Example Consider the dataframe below, which has two numerical features, Age and Salary. They are not on the same scale: Age is in years and Salary is in dollars, and since Salary will always be numerically greater than Age, our model will give more weight to Salary, which is not ideal because Age is also an integral factor here. To avoid this issue we perform standardization.

Formula for Standardization x' = (x − μ) / σ. So in simple terms we just calculate the mean and standard deviation of the values, and then for each data point we subtract the mean and divide by the standard deviation.
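As a concrete illustration, here is a minimal sketch using scikit-learn's StandardScaler on an assumed, made-up Age/Salary table similar to the one described above:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative values only (not taken from the slides).
df = pd.DataFrame({
    "Age":    [25, 32, 47, 51, 62],
    "Salary": [48_000, 54_000, 90_000, 61_000, 120_000],
})

# Apply z = (x - mean) / std to each column.
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# Each column now has mean ~0 and standard deviation ~1, but the values
# are not bounded to any fixed range.
print(scaled)
print(scaled.mean().round(6), scaled.std(ddof=0).round(6))
```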

Normalization (Min-Max Scaling) Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max scaling.

Min-Max scaling An alternative approach to Z-score normalization (or standardization) is the so-called  Min-Max scaling  (often also simply called “normalization” - a common cause for ambiguities). In this approach, the data is scaled to a fixed range - usually 0 to 1. The cost of having this bounded range - in contrast to standardization - is that we will end up with smaller standard deviations, which can suppress the effect of outliers.

Formula for Min-Max scaling X_scaled = (X − X_min) / (X_max − X_min), which maps the minimum value of each feature to 0 and the maximum to 1.
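A minimal sketch of the same idea with scikit-learn's MinMaxScaler, reusing the assumed Age/Salary values from the standardization example:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative values only (not taken from the slides).
df = pd.DataFrame({
    "Age":    [25, 32, 47, 51, 62],
    "Salary": [48_000, 54_000, 90_000, 61_000, 120_000],
})

# (0, 1) is also the default feature_range.
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# The minimum of each column maps to 0 and the maximum to 1; every other
# value lands proportionally in between.
print(scaled)
```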

Z-score standardization or Min-Max scaling? It is hard to know whether rescaling your data will improve the performance of your algorithms before you apply them. It often can, but not always. A good tip is to create rescaled copies of your dataset and race them against each other using your test harness and a handful of algorithms you want to spot-check. This can quickly highlight the benefits (or lack thereof) of rescaling your data with given models, and which rescaling method may be worthy of further investigation.
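One possible way to set up such a test harness (a sketch only; the dataset and model chosen here are assumptions, not from the slides) is to cross-validate the same distance-based model with different scalers:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = load_wine(return_X_y=True)

# Same model, three rescaled "copies" of the data racing each other.
candidates = {
    "no scaling":     make_pipeline(KNeighborsClassifier()),
    "standardized":   make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "min-max scaled": make_pipeline(MinMaxScaler(), KNeighborsClassifier()),
}

for name, pipeline in candidates.items():
    scores = cross_val_score(pipeline, X, y, cv=5)
    print(f"{name:15s} mean accuracy = {scores.mean():.3f}")
```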

MaxAbs Scaler The MaxAbsScaler works very similarly to the MinMaxScaler but automatically scales the data to a [-1, 1] range based on the absolute maximum. This scaler is meant for data that is already centered at zero or sparse data. It does not shift/center the data, and thus does not destroy any sparsity. x_scaled = x / max(abs(x))
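A minimal sketch of MaxAbsScaler on made-up data, alongside the equivalent by-hand computation x / max(abs(x)):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Illustrative data only; note the zeros, which stay zeros after scaling,
# so sparsity would be preserved.
X = np.array([
    [ 0.0, -400.0],
    [ 2.0,    0.0],
    [-4.0,  100.0],
])

# Each column is divided by its maximum absolute value, landing in [-1, 1].
print(MaxAbsScaler().fit_transform(X))

# Same thing by hand: x / max(abs(x)) per column.
print(X / np.abs(X).max(axis=0))
```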

Robust Scaler If your data contains many outliers, scaling using the mean and standard deviation of the data is likely not to work very well. In these cases, you can use the RobustScaler. It removes the median and scales the data according to the quantile range. The exact formula of the RobustScaler is not spelled out in the documentation; if you want full details you can always check the source code.

Continued… By default, the scaler uses the Inter-Quartile Range (IQR), which is the range between the 1st quartile and the 3rd quartile. The quantile range can be set manually by specifying the quantile_range parameter when creating a new instance of the RobustScaler. Here, we transform feature 3 using a quantile range from 10% to 90%, as in the sketch below.
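A minimal sketch (made-up one-column data with an outlier, not the "feature 3" example referenced above) showing RobustScaler with the default IQR and with quantile_range=(10.0, 90.0):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# A single column with an outlier that would distort mean/std based scaling.
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])

default_scaler = RobustScaler()                       # quantile_range=(25.0, 75.0)
wide_scaler    = RobustScaler(quantile_range=(10.0, 90.0))

# Both subtract the median; they differ in the quantile range used as the scale.
print(default_scaler.fit_transform(x).ravel())
print(wide_scaler.fit_transform(x).ravel())
```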

Normalizer The Normalizer scales each sample by dividing each of its values by the sample's magnitude in n-dimensional space, for n features. Say your features were x, y and z Cartesian coordinates; your scaled value for x would be: x_i / sqrt(x_i^2 + y_i^2 + z_i^2)
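A minimal sketch of scikit-learn's Normalizer applied row-wise to assumed three-feature samples, alongside the equivalent by-hand division by the Euclidean norm:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Illustrative samples; each row is rescaled to unit L2 norm.
X = np.array([
    [3.0, 4.0, 0.0],   # norm 5 -> [0.6, 0.8, 0.0]
    [1.0, 2.0, 2.0],   # norm 3 -> [1/3, 2/3, 2/3]
])

print(Normalizer(norm="l2").fit_transform(X))

# Same thing by hand: divide each row by its Euclidean length.
print(X / np.linalg.norm(X, axis=1, keepdims=True))
```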

Examples of Algorithms where Feature Scaling matters 1. K-Means uses the Euclidean distance measure, so feature scaling matters here. 2. K-Nearest Neighbours also requires feature scaling. 3. Principal Component Analysis (PCA) tries to get the features with maximum variance, so feature scaling is required here too (sketched below). 4. Gradient Descent: calculation speed increases, as the θ updates converge faster after feature scaling. Note: Naive Bayes, Linear Discriminant Analysis, and tree-based models are not affected by feature scaling. In short, any algorithm which is not distance-based is not affected by feature scaling.
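To illustrate point 3, here is a sketch (with synthetic age/salary data, an assumption for illustration) showing how PCA's explained variance is dominated by the high-magnitude feature until the data is standardized:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
age    = rng.uniform(20, 65, size=200)           # years, small magnitude
salary = rng.uniform(30_000, 150_000, size=200)  # dollars, large magnitude
X = np.column_stack([age, salary])

# Unscaled: the first component's explained variance ratio is dominated
# by the high-magnitude salary column.
print(PCA(n_components=2).fit(X).explained_variance_ratio_)

# Standardized: both features get a comparable say in the components.
X_std = StandardScaler().fit_transform(X)
print(PCA(n_components=2).fit(X_std).explained_variance_ratio_)
```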