Feature Engineering Fundamentals Explained.pptx

shilpamathur13 · 41 slides · Aug 11, 2024

About This Presentation

Feature engineering is the process of selecting, modifying, or creating new features (variables) from raw data to improve the performance of machine learning models. It involves identifying the most relevant features, transforming data into a suitable format, handling missing values, encoding catego...


Slide Content

Module 3 Advanced Feature Engineering and Feature Selection

Introduction to Feature Engineering: Feature engineering is the process of using domain knowledge to select and transform the most relevant variables in raw data into features that better represent the underlying problem, thereby improving the accuracy of a predictive model.

Feature Engineering branches into: Feature Transformation, Feature Construction, Feature Selection, and Feature Extraction. Feature Transformation covers: Missing value imputation, Handling categorical features, Outlier detection, and Feature scaling.

Missing Value Imputation
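As an illustration of this step, here is a minimal sketch using scikit-learn's SimpleImputer on a made-up DataFrame (the column names and values are hypothetical): mean imputation for a numeric column and most-frequent imputation for a categorical one.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data with missing entries (hypothetical example, not from the slides)
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "city": ["Pune", "Delhi", np.nan, "Delhi"],
})

# Numeric column: replace missing values with the column mean
df[["age"]] = SimpleImputer(strategy="mean").fit_transform(df[["age"]])
# Categorical column: replace missing values with the most frequent category
df[["city"]] = SimpleImputer(strategy="most_frequent").fit_transform(df[["city"]])
print(df)
```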

Handling Categorical Features
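A small sketch of two common encodings mentioned later in the deck, on a hypothetical "colour" feature: one-hot encoding with pandas and label encoding with scikit-learn.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})  # hypothetical feature

one_hot = pd.get_dummies(df["colour"], prefix="colour")   # one binary column per category
labels = LabelEncoder().fit_transform(df["colour"])       # one integer code per category
print(one_hot)
print(labels)   # [2 1 0 1] -- categories are coded in alphabetical order
```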

Outlier Detection: Interquartile range (IQR) = Upper Quartile − Lower Quartile = Q3 − Q1
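A quick sketch of IQR-based outlier detection on a made-up numeric series, using the 1.5 × IQR fences that usually accompany this rule.

```python
import pandas as pd

values = pd.Series([12, 14, 15, 13, 14, 16, 15, 120])   # 120 is an obvious outlier

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1                                  # interquartile range = Q3 - Q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # common outlier fences
outliers = values[(values < lower) | (values > upper)]
print(outliers)                                # flags 120
```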

Feature Scaling

Why do we need feature scaling?

Feature Scaling: Standardization and Normalization

Standardisation (Z-score normalization): Assume our dataset has random numeric values in the range of 1 to 95,000, in random order. For understanding, consider a small dataset of barely 10 values with numbers in that range and in randomized order. The range of these values is so wide that training a model on 10,000 such values would take a lot of time. Standardization solves this by rescaling the values onto a common scale, z = (x − mean) / standard deviation, so that each feature is centred at 0 with unit variance while the relative spacing between the values is kept intact.
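A minimal sketch of standardization on a few widely spread values (the numbers are illustrative); StandardScaler applies z = (x − mean) / standard deviation to each column.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[1.0], [250.0], [4000.0], [95000.0]])   # widely spread toy values
z = StandardScaler().fit_transform(x)                 # z = (x - mean) / std
print(z.round(2))                                     # centred at 0 with unit variance
```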

Normalization
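For contrast with standardization, a minimal sketch of min-max normalization, which typically rescales each feature to the [0, 1] range using x' = (x − min) / (max − min); the data is the same toy example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x = np.array([[1.0], [250.0], [4000.0], [95000.0]])
x_norm = MinMaxScaler().fit_transform(x)   # (x - min) / (max - min)
print(x_norm.round(4))                     # smallest value maps to 0, largest to 1
```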

Feature Selection Techniques: Feature selection is a crucial step in the machine learning pipeline, involving the selection of a subset of relevant features (variables, predictors) for use in model construction. Effective feature selection can improve model performance, reduce overfitting, and decrease training time. The role of feature selection in machine learning is:
1. To reduce the dimensionality of the feature space.
2. To speed up a learning algorithm.
3. To improve the predictive accuracy of a classification algorithm.

There are several techniques for feature selection:

Filter Methods: In the filter method, features are selected on the basis of statistical measures. This method does not depend on the learning algorithm and chooses features as a pre-processing step. Filter methods are faster and less computationally expensive than wrapper methods, which makes them the cheaper choice for high-dimensional data. They are very good at removing duplicated, correlated, and redundant features, but they do not remove multicollinearity.

Information Gain: Information gain is defined as the amount of information a feature provides for identifying the target value; it measures the reduction in entropy. The information gain of each attribute is calculated with respect to the target variable for feature selection. Chi-square Test: The chi-square test is a technique to determine the relationship between categorical variables. The chi-square value is calculated between each feature and the target variable, and the desired number of features with the best chi-square values is selected.
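A small sketch of scoring features this way in Python; scikit-learn's mutual_info_classif is used here as a practical stand-in for information gain, and the Iris dataset is just a convenient example.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

data = load_iris()
scores = mutual_info_classif(data.data, data.target, random_state=0)

for name, score in zip(data.feature_names, scores):
    print(f"{name}: {score:.3f}")   # higher score = more informative feature
```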

Chi-square Test Example

Steps: 1. Define the Null and Alternative Hypotheses: Null Hypothesis: there is no significant association between the two categorical variables. Alternative Hypothesis: there is a significant association between the two categorical variables. 2. Calculate the Contingency Table:

3. Calculate the Expected Values: E = (row total × column total) / grand total

4. Calculate the Chi-square value: χ² = Σ (O − E)² / E, where O is the observed count and E is the expected count

5. Compare the Chi-square value with the critical value to accept or reject the null hypothesis. Degrees of freedom = (r − 1)(c − 1); significance level = 0.05

Therefore, income level is a relevant feature for predicting subscription status.
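A hedged sketch of the five steps above with SciPy, using a made-up contingency table of income level versus subscription status; chi2_contingency computes the expected counts, the chi-square statistic, the degrees of freedom, and the p-value in one call.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table
# rows: income level (low, high); columns: subscribed (no, yes)
observed = np.array([[30, 10],
                     [20, 40]])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value, dof)          # dof = (r - 1)(c - 1) = 1
# If p_value < 0.05 we reject the null hypothesis: the feature is relevant.
print("relevant" if p_value < 0.05 else "not relevant")
```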

Fisher’s Score: The Fisher score is one of the most widely used supervised feature selection methods. The algorithm returns the ranks of the variables based on the Fisher score in descending order. Missing Value Ratio: The missing value ratio can be used to evaluate each feature against a threshold value. It is obtained by dividing the number of missing values in a column by the total number of observations. A variable whose missing value ratio exceeds the threshold can be dropped.

Fisher Score
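Since the deck only names the score, here is a hedged sketch of one common formulation, F_j = Σ_k n_k (μ_jk − μ_j)² / Σ_k n_k σ_jk², computed per feature over the classes; the Iris data is just an example.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

def fisher_scores(X, y):
    """Fisher score per feature: between-class spread divided by within-class variance."""
    overall_mean = X.mean(axis=0)
    numerator = np.zeros(X.shape[1])
    denominator = np.zeros(X.shape[1])
    for k in np.unique(y):
        Xk = X[y == k]
        nk = Xk.shape[0]
        numerator += nk * (Xk.mean(axis=0) - overall_mean) ** 2
        denominator += nk * Xk.var(axis=0)
    return numerator / denominator

print(fisher_scores(X, y).round(2))   # rank features by descending score
```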

Missing Value Ratio: Calculate the missing value ratio for each feature by dividing the number of missing values by the total number of instances in the dataset. Set a threshold for the acceptable missing value ratio (e.g., 0.8, meaning a feature may have at most 80% of its values missing to be kept). Filter out features whose missing value ratio is above the threshold.
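A minimal sketch of the procedure just described with pandas; the DataFrame and the 0.8 threshold are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "f1": [1, 2, 3, 4],
    "f2": [np.nan, np.nan, np.nan, 4],        # 75% missing -> kept
    "f3": [np.nan, np.nan, np.nan, np.nan],   # 100% missing -> dropped
})

threshold = 0.8
missing_ratio = df.isnull().mean()                 # missing values / total rows, per column
keep = missing_ratio[missing_ratio <= threshold].index
df_filtered = df[keep]                             # drops f3
print(missing_ratio)
print(df_filtered.columns.tolist())
```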

Advanced Feature Selection

Wrapper Methods: Wrapper methods, also referred to as greedy algorithms, train the learning algorithm on subsets of features in an iterative manner. Based on the conclusions drawn from the previously trained model, features are added or removed. The stopping criterion for selecting the best subset is usually pre-defined by the person training the model, such as when the performance of the model decreases or a specific number of features has been reached. The main advantage of wrapper methods over filter methods is that they provide an optimal set of features for training the model, resulting in better accuracy than filter methods, but they are computationally more expensive.

Forward selection: Forward selection is an iterative process that begins with an empty set of features. In each iteration it adds a feature and evaluates whether the performance improves. The process continues until adding a new variable/feature no longer improves the performance of the model.
Backward elimination: Backward elimination is also an iterative approach, but it is the opposite of forward selection. It starts with all the features and removes the least significant feature in each iteration. This elimination process continues until removing a feature no longer improves the performance of the model.
Recursive Feature Elimination: Recursive feature elimination is a recursive greedy optimization approach, in which features are selected by recursively taking a smaller and smaller subset of features. An estimator is trained on each set of features, and the importance of each feature is determined using the coef_ attribute or the feature_importances_ attribute.
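A short sketch of these three wrapper methods with scikit-learn: SequentialFeatureSelector covers forward and backward selection, and RFE covers recursive feature elimination; the logistic-regression estimator and the Iris data are illustrative choices, not requirements.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
est = LogisticRegression(max_iter=1000)

forward = SequentialFeatureSelector(est, n_features_to_select=2, direction="forward").fit(X, y)
backward = SequentialFeatureSelector(est, n_features_to_select=2, direction="backward").fit(X, y)
rfe = RFE(est, n_features_to_select=2).fit(X, y)   # ranks features via the estimator's coef_

print(forward.get_support())    # mask of features kept by forward selection
print(backward.get_support())   # mask of features kept by backward elimination
print(rfe.support_)             # mask of features kept by RFE
```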

Exhaustive Feature Selection: Exhaustive feature selection is the most thorough feature selection method: it evaluates every feature set by brute force, i.e., it tries each possible combination of features and returns the best performing feature set. How Exhaustive Feature Selection works:
1. Generate all possible feature subsets: for a dataset with n features, this means evaluating 2^n subsets (including the empty set).
2. Evaluate each subset: train and evaluate a model using each subset of features; the evaluation metric could be accuracy, precision, recall, F1 score, etc.
3. Select the best subset: identify the subset of features that gives the best performance according to the chosen evaluation metric.
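A brute-force sketch of the procedure above: score every non-empty subset of features with cross-validation and keep the best one (feasible here only because the toy dataset has 4 features, i.e. 15 non-empty subsets).

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

best_score, best_subset = -np.inf, None
for size in range(1, n_features + 1):
    for subset in combinations(range(n_features), size):
        cols = list(subset)
        score = cross_val_score(LogisticRegression(max_iter=1000), X[:, cols], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print(best_subset, round(best_score, 3))   # best of the 2^n - 1 non-empty subsets
```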

Embedded Methods: 1. Regularization: This method adds a penalty to the parameters of the machine learning model to avoid overfitting. Lasso Regression (L1 Regularization): adds an L1 penalty (the absolute value of the magnitude of the coefficients) to the loss function. This can shrink some coefficients to exactly zero, effectively performing feature selection. Ridge Regression (L2 Regularization): adds an L2 penalty (the square of the magnitude of the coefficients) to the loss function. While it does not perform feature selection by shrinking coefficients to zero, it helps reduce overfitting and improve model generalization.
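A minimal sketch of L1 regularization acting as feature selection on scikit-learn's diabetes data (whose features come pre-scaled): Lasso drives some coefficients exactly to zero, while Ridge only shrinks them.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge

X, y = load_diabetes(return_X_y=True)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: sparse coefficients
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: small but non-zero coefficients

print("Lasso zero coefficients:", int((lasso.coef_ == 0).sum()))   # several features dropped
print("Ridge zero coefficients:", int((ridge.coef_ == 0).sum()))   # typically none
```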

2. Tree-based methods: Decision Trees: Decision trees split the data into subsets based on the values of input features, and the splits that provide the best separation (based on criteria like Gini impurity or information gain) indicate the most important features. The depth of the tree and the features selected for splits at various levels provide insight into feature importance. Random Forests: Random Forests are ensembles of decision trees. They provide feature importance by averaging the importance measures of each feature across all the trees; feature importance in Random Forests is typically calculated from the decrease in impurity (e.g., Gini impurity) that each feature produces, averaged over the forest.
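A quick sketch of tree-based importance with a Random Forest on the Iris data; feature_importances_ holds the impurity decreases averaged over all trees.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(data.data, data.target)

ranked = sorted(zip(data.feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")   # higher = more important across the forest
```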

Automated Feature Engineering: Automated feature engineering aims to simplify and speed up the process of creating features from raw data by leveraging algorithms and tools. This approach reduces manual effort and can uncover complex patterns and interactions that might otherwise be missed. Benefits of Automated Feature Engineering:
Speed: quickly generates and evaluates a large number of features.
Complexity handling: captures complex interactions and transformations that might be difficult to specify manually.
Consistency: applies feature engineering techniques uniformly across different datasets and tasks.
Performance: often improves model performance by discovering useful features that enhance predictive power.

EvalML, an AutoML library to automate feature engineering: EvalML is an open-source Python library designed to automate and streamline the machine learning workflow, with a particular focus on end-to-end model development.
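A hedged sketch of how EvalML is typically driven; AutoMLSearch, search(), and best_pipeline follow EvalML's documented interface, but treat the exact arguments and the demo dataset as illustrative rather than a complete recipe.

```python
import evalml
from evalml.automl import AutoMLSearch

# Demo binary-classification data shipped with EvalML
X, y = evalml.demos.load_breast_cancer()

automl = AutoMLSearch(X_train=X, y_train=y, problem_type="binary")
automl.search()                  # evaluates multiple pipelines, including preprocessing steps
print(automl.best_pipeline)      # best-performing pipeline found by the search
```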

Feature Engineering for Specific Data Types: 1. Numerical Data: Feature scaling, Power transformations

2. Categorical Data: One-hot encoding, Label encoding, Target encoding

3. Text Data: Bag of Words (BoW), TF-IDF (Term Frequency-Inverse Document Frequency), Word embeddings. 4. Time-Series Data: Lag features, Fourier transforms, Time-based features
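To close, a small sketch of lag and time-based features for a time series with pandas; the daily sales values are made up.

```python
import pandas as pd

sales = pd.DataFrame(
    {"sales": [100, 120, 130, 125, 140, 150, 160]},          # hypothetical daily sales
    index=pd.date_range("2024-01-01", periods=7, freq="D"),
)

sales["lag_1"] = sales["sales"].shift(1)                      # value one day earlier
sales["day_of_week"] = sales.index.dayofweek                  # time-based feature
sales["rolling_mean_3"] = sales["sales"].rolling(3).mean()    # smoothed recent history
print(sales)
```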