Feature Scaling and Normalization
Introduction
Feature scaling can change your results a lot with certain algorithms and have minimal or no effect with others. To understand this, let's look at why features need to be scaled, the varieties of scaling methods, and when we should scale our features.
What is Feature Scaling?
Feature scaling is a technique to standardize the independent features present in the data to a fixed range. It is performed during data pre-processing and is one of the most important steps before creating a machine learning model; it can make the difference between a weak model and a strong one. The two most important scaling techniques are Standardization and Normalization.
When to Scale
A rule of thumb: if an algorithm computes distances or assumes normality, scale your features. Some examples of algorithms where feature scaling matters:
k-Nearest Neighbours with a Euclidean distance measure is sensitive to magnitudes, so features should be scaled so that they all weigh in equally.
Scaling is critical when performing Principal Component Analysis (PCA). PCA tries to find the features with maximum variance, and variance is larger for high-magnitude features, which skews PCA towards those features.
We can speed up gradient descent by scaling: θ descends quickly on small ranges and slowly on large ranges, so it oscillates inefficiently down to the optimum when the variables are very uneven.
Tree-based models are not distance-based and can handle varying ranges of features, so scaling is not required when modelling trees.
Algorithms such as Linear Discriminant Analysis (LDA) and Naive Bayes are designed to handle this and weight the features accordingly, so feature scaling may not have much effect on them.
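As a rough sketch of this rule of thumb (not part of the original slides), the snippet below standardizes features inside a scikit-learn Pipeline before a distance-based k-NN classifier; the toy Age/Salary numbers are invented purely for illustration.

```python
# Minimal sketch: scale features before a distance-based model (k-NN).
# The Age/Salary values below are invented for illustration only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[25, 50_000],
              [32, 120_000],
              [47, 70_000],
              [51, 300_000]], dtype=float)   # columns: Age (years), Salary (dollars)
y = np.array([0, 1, 0, 1])

# StandardScaler puts both columns on the same footing, so the Euclidean
# distances used by k-NN are not dominated by the large Salary values.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)
print(model.predict([[40, 90_000]]))
```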
Why and Where to Apply Feature Scaling?
Real-world datasets include features that vary greatly in magnitude, units, and range. Normalization should be applied when the scale of a feature is meaningless or misleading, and should not be applied when the scale is meaningful. Algorithms that use a Euclidean distance measure are sensitive to magnitudes, and feature scaling helps them weigh all features equally. If one feature is much larger in scale than the others, it dominates the distance calculation and needs to be normalized. Left alone, these algorithms only consider the magnitude of features and neglect the units, so results would vary greatly between, say, 5 kg and 5000 g. Features with high magnitudes weigh far more in distance calculations than features with low magnitudes. To suppress this effect, we need to bring all features to the same level of magnitude, which is achieved by scaling.
How to Scale Features
Standardization
Min-Max Scaling (Normalization)
MaxAbs Scaler
Robust Scaler
Normalizer
Standardization
Standardization is a scaling technique where the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resulting distribution has a unit standard deviation.
Standardization (Points to Note)
In standardization we transform our values such that the mean (μ) of the values is 0 and the standard deviation (σ) is 1. Standardization replaces the values by their Z-scores. The values are not restricted to a particular range. Standardizing features so that they are centered around 0 with a standard deviation of 1 is not only important when comparing measurements that have different units; it is also a general requirement for many machine learning algorithms.
Example
Consider a dataframe with two numerical features, Age and Salary. They are not on the same scale: Age is in years and Salary is in dollars, and Salary will always be much larger than Age. As a result, the model would give more weight to Salary, which is not ideal because Age is also an integral factor here. To avoid this issue we perform standardization.
Formula for Standardization
x_scaled = (x − μ) / σ
In simple terms, we calculate the mean (μ) and standard deviation (σ) of the values, and then for each data point we subtract the mean and divide by the standard deviation.
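A minimal sketch of this formula on an invented Age/Salary table, also confirming that scikit-learn's StandardScaler computes the same Z-scores:

```python
# Manual standardization versus scikit-learn's StandardScaler.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[25, 50_000],
              [32, 120_000],
              [47, 70_000]], dtype=float)   # invented Age/Salary values

# Z-score by hand: subtract the column mean, divide by the column std.
z_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# Same operation with scikit-learn.
z_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(z_manual, z_sklearn))   # True: each column now has mean 0, std 1
```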
Normalization (Min-Max Scaling)
Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max scaling.
Min-Max Scaling
An alternative to Z-score normalization (standardization) is Min-Max scaling, often also simply called "normalization", which is a common cause of ambiguity. In this approach the data is scaled to a fixed range, usually 0 to 1. The cost of having this bounded range, in contrast to standardization, is that we end up with smaller standard deviations, which can suppress the effect of outliers.
Formula for Min-Max Scaling
x_scaled = (x − x_min) / (x_max − x_min)
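A minimal sketch of Min-Max scaling on an invented 1-D feature, using both the formula above and scikit-learn's MinMaxScaler:

```python
# Manual Min-Max scaling versus scikit-learn's MinMaxScaler.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x = np.array([[2.0], [5.0], [10.0], [50.0]])   # invented values

scaled_manual = (x - x.min()) / (x.max() - x.min())
scaled_sklearn = MinMaxScaler(feature_range=(0, 1)).fit_transform(x)

print(np.allclose(scaled_manual, scaled_sklearn))   # True: all values now lie in [0, 1]
```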
Z-score Standardization or Min-Max Scaling?
It is hard to know whether rescaling your data will improve the performance of your algorithms before you apply them. It often can, but not always. A good tip is to create rescaled copies of your dataset and race them against each other using your test harness and a handful of algorithms you want to spot-check. This can quickly highlight the benefits (or lack thereof) of rescaling your data with given models, and which rescaling method may be worth further investigation.
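A rough sketch of this spot-checking idea, assuming scikit-learn's bundled wine dataset and a k-NN model purely as placeholders; the point is the structure of the comparison, not these particular choices:

```python
# Race "no scaling" against two scalers with the same model and CV harness.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

for name, scaler in [("no scaling", None),
                     ("standardization", StandardScaler()),
                     ("min-max", MinMaxScaler())]:
    steps = ([scaler] if scaler is not None else []) + [KNeighborsClassifier()]
    score = cross_val_score(make_pipeline(*steps), X, y, cv=5).mean()
    print(f"{name:15s} mean CV accuracy: {score:.3f}")
```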
MaxAbs Scaler
The MaxAbsScaler works very similarly to the MinMaxScaler but scales the data to a [-1, 1] range based on the absolute maximum. This scaler is meant for data that is already centered at zero or for sparse data. It does not shift or center the data, and thus does not destroy any sparsity.
x_scaled = x / max(abs(x))
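A minimal sketch of the formula above with scikit-learn's MaxAbsScaler on an invented, roughly zero-centered column:

```python
# MaxAbsScaler divides each column by its maximum absolute value.
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

X = np.array([[-400.0], [-50.0], [0.0], [200.0]])   # invented, roughly zero-centered

print(MaxAbsScaler().fit_transform(X).ravel())   # [-1. -0.125 0. 0.5]
print((X / np.abs(X).max()).ravel())             # same result from x / max(abs(x))
```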
Robust Scaler
If your data contains many outliers, scaling using the mean and standard deviation of the data is likely to not work very well. In these cases you can use the RobustScaler. It removes the median and scales the data according to a quantile range. For the full details of its behaviour you can always check the scikit-learn documentation or the source code.
Continued…
By default the scaler uses the interquartile range (IQR), which is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile). The quantile range can be set manually via the quantile_range parameter when creating a new RobustScaler instance, for example a quantile range from the 10th to the 90th percentile, as in the sketch below.
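A minimal sketch of the RobustScaler on an invented feature containing one large outlier, once with the default IQR and once with quantile_range=(10, 90):

```python
# RobustScaler: subtract the median, divide by the chosen quantile range.
import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])   # 100 is an outlier

default_iqr = RobustScaler().fit_transform(X)                         # 25th-75th percentile
wide_range = RobustScaler(quantile_range=(10.0, 90.0)).fit_transform(X)

# In both cases the outlier stretches the scale far less than
# mean/standard-deviation scaling would.
print(default_iqr.ravel())
print(wide_range.ravel())
```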
Normalizer
The Normalizer scales each sample by dividing each of its values by the sample's magnitude in n-dimensional space, for n features. Say your features were x, y, and z Cartesian coordinates; your scaled value for x would be:
x_i / sqrt(x_i² + y_i² + z_i²)
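A minimal sketch of the Normalizer on invented x, y, z coordinates, checked against dividing each row by its own Euclidean norm:

```python
# Normalizer rescales each sample (row) to unit Euclidean length.
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0, 0.0],
              [1.0, 2.0, 2.0]])   # invented x, y, z coordinates

norm_sklearn = Normalizer(norm="l2").fit_transform(X)
norm_manual = X / np.linalg.norm(X, axis=1, keepdims=True)

print(np.allclose(norm_sklearn, norm_manual))   # True: rows become [0.6, 0.8, 0] and [1/3, 2/3, 2/3]
```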
Examples of Algorithms Where Feature Scaling Matters
1. K-Means uses a Euclidean distance measure, so feature scaling matters.
2. K-Nearest-Neighbours also requires feature scaling.
3. Principal Component Analysis (PCA) tries to find the features with maximum variance, so feature scaling is required here too.
4. Gradient Descent: calculation speed increases, because θ converges faster after feature scaling.
Note: Naive Bayes, Linear Discriminant Analysis, and tree-based models are not affected by feature scaling. In short, any algorithm that is not distance-based is not affected by feature scaling.