Week_8machine learning (feature selection).pptx

muhammadsamroz, 20 slides, May 28, 2024

About This Presentation

ML


Slide Content

Feature Selection

Feature Selection Selection of a subset of features from a larger pool of available features. Goal: to select features that are rich in discriminatory information with respect to the classification problem at hand. A poor choice of features drives the classifier to perform badly. Selecting highly informative features is an attempt to place the classes far apart from each other in the feature space (large between-class distance) and to position the data points within each class close to each other (small within-class variance).

Feature Selection Another major issue in feature selection is choosing the number of features l to be used out of an original n > l. Reducing this number helps in avoiding overfitting to the specific training data set and in designing classifiers with good generalization performance; that is, classifiers that perform well when faced with data outside the training set. Before feature selection techniques can be used, a preprocessing stage is necessary for "housekeeping" purposes, such as removal of outlier points and data normalization.

Feature Selection OUTLIER REMOVAL An outlier is a point that lies far away from the mean value of the corresponding random variable. Points with values far from the rest of the data may cause large errors during the classifier training phase. This is not desirable, especially when the outliers are the result of noisy measurements. For normally distributed data, a threshold of 1, 2, or 3 times the standard deviation is used to define outliers; points that lie away from the mean by more than this threshold are removed. However, for non-normal distributions, more rigorous measures should be considered (e.g., cost functions).
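A minimal sketch of threshold-based outlier removal, assuming a single feature stored in a NumPy array; the choice of k = 3 standard deviations is just one of the thresholds mentioned above.

import numpy as np

def remove_outliers(x, k=3.0):
    """Drop values lying more than k standard deviations from the mean."""
    mu, sigma = x.mean(), x.std()
    return x[np.abs(x - mu) <= k * sigma]

# Example: one noisy measurement far from the rest of the data
x = np.array([1.1, 0.9, 1.0, 1.2, 0.8, 9.5])
print(remove_outliers(x))   # the value 9.5 is discarded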

Feature Selection DATA NORMALIZATION Features are brought to a common scale, typically zero mean and unit variance, so that features with large numerical ranges do not dominate those with small ranges.
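A small sketch of z-score normalization (zero mean, unit variance per feature); the assumption that samples sit in rows and features in columns is illustrative, not taken from the slide.

import numpy as np

def zscore_normalize(X):
    """Standardize each column (feature) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

X = np.array([[1.0, 200.0],
              [2.0, 220.0],
              [3.0, 180.0]])
print(zscore_normalize(X))   # each column now has mean 0 and std 1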

Feature Selection Three approaches to feature selection: (1) individual feature selection, (2) combinations of features, and (3) feature subset selection.

Individual Feature Selection The first step in feature selection is to look at each feature individually and check whether or not it is informative. If not, the feature is discarded. To this end, statistical tests are commonly used. The idea is to test whether the mean values of a feature differ significantly in the two classes. In the case of more than two classes, the test may be applied for each class pair. Assuming that the data in the classes are normally distributed, the t-test is a popular choice.

Individual Feature Selection HYPOTHESIS TESTING: THE t-TEST The goal of the statistical t-test is to determine which of the following two hypotheses is true: H0: the mean values of the feature in the two classes are equal (null hypothesis); H1: the mean values of the feature in the two classes are different (alternative hypothesis). If the null hypothesis is true, i.e., no significant difference between the class means exists, the feature is discarded. The hypothesis test is carried out against the so-called significance level, α, which corresponds to the probability of committing an error in our decision. Typical values used in practice are α = 0.05 and α = 0.001. A significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.
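A minimal sketch of this test using SciPy's two-sample t-test; the synthetic class samples and the α = 0.05 threshold are illustrative assumptions, not values from the slides.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Values of one feature in two classes (synthetic, normally distributed)
class_a = rng.normal(loc=0.0, scale=1.0, size=50)
class_b = rng.normal(loc=0.8, scale=1.0, size=50)

t_stat, p_value = ttest_ind(class_a, class_b)
alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.4f}: reject H0, keep the feature")
else:
    print(f"p = {p_value:.4f}: cannot reject H0, discard the feature")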

Individual Feature Selection HYPOTHESIS TESTING: THE t-TEST The t-test assumes that the values of the feature are drawn from normal distributions. If the feature distributions turn out not to be normal, one should choose a nonparametric statistical significance test, such as the Wilcoxon rank-sum test, or the Fisher ratio.
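A brief sketch of the nonparametric alternative, assuming SciPy's rank-sum implementation; the skewed synthetic data are an illustrative assumption.

import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(1)
# Skewed (non-normal) feature values in two classes
class_a = rng.exponential(scale=1.0, size=50)
class_b = rng.exponential(scale=2.0, size=50)

stat, p_value = ranksums(class_a, class_b)
print(f"Wilcoxon rank-sum p-value: {p_value:.4f}")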

Individual Feature Selection FISHER’S DISCRIMINANT RATIO The FDR is commonly employed to quantify the discriminatory power of individual features between two equiprobable classes. It is independent of the type of the class distributions associated with the values of a feature in the two classes. The FDR is defined as FDR = (μ1 − μ2)² / (σ1² + σ2²), where μ1, μ2 are the mean values and σ1², σ2² the variances of the feature in the two classes.
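A short sketch computing the FDR per feature, under the assumption that the two classes are passed as separate arrays with samples in rows and features in columns; the synthetic data are illustrative.

import numpy as np

def fisher_discriminant_ratio(class_a, class_b):
    """FDR = (mu1 - mu2)^2 / (var1 + var2), computed per feature (column)."""
    mu1, mu2 = class_a.mean(axis=0), class_b.mean(axis=0)
    var1, var2 = class_a.var(axis=0), class_b.var(axis=0)
    return (mu1 - mu2) ** 2 / (var1 + var2)

rng = np.random.default_rng(2)
class_a = rng.normal([0.0, 0.0], 1.0, size=(100, 2))   # second feature is uninformative
class_b = rng.normal([2.0, 0.0], 1.0, size=(100, 2))
print(fisher_discriminant_ratio(class_a, class_b))      # first feature scores much higher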

CLASS SEPARABILITY MEASURES The previous measures quantify the class-discriminatory power of individual features. In this section, we turn our attention from individual features to combinations of features (i.e., feature vectors) and describe measures that quantify class separability in the respective feature space. Three class-separability measures are considered: divergence, Bhattacharyya distance, and scatter matrices.

Divergence

Bhattacharyya Distance
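A hedged sketch of the Bhattacharyya distance between two Gaussian class distributions, which is the standard form of this measure; the Gaussian assumption and the example means and covariances are illustrative and are not taken from the slides.

import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussian class distributions."""
    cov = (cov1 + cov2) / 2.0
    diff = mu2 - mu1
    term1 = diff @ np.linalg.inv(cov) @ diff / 8.0
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

mu1, cov1 = np.array([0.0, 0.0]), np.eye(2)
mu2, cov2 = np.array([2.0, 0.0]), np.eye(2)
print(bhattacharyya_gaussian(mu1, cov1, mu2, cov2))   # larger value = better separated classes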

FEATURE SUBSET SELECTION Reduce the number of features by discarding the less informative ones, using scalar feature selection. Then consider the features that survive this step in different combinations in order to keep the "best" combination. Search strategies: exhaustive search; sequential forward and backward selection (a sketch of sequential forward selection follows below).
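A minimal sketch of sequential forward selection, assuming the caller supplies a subset-scoring function; the Fisher-ratio-style score and the synthetic data are illustrative assumptions, not the scoring criterion used in the slides.

import numpy as np

def sequential_forward_selection(X, y, score_fn, k):
    """Greedily add the feature that most improves score_fn until k features are chosen."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best_feat, best_score = None, -np.inf
        for f in remaining:
            s = score_fn(X[:, selected + [f]], y)
            if s > best_score:
                best_feat, best_score = f, s
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected

# Illustrative subset score: FDR summed over the candidate features (an assumption)
def fdr_score(X_sub, y):
    a, b = X_sub[y == 0], X_sub[y == 1]
    return float(np.sum((a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0) + 1e-12)))

rng = np.random.default_rng(3)
y = np.array([0] * 50 + [1] * 50)
X = rng.normal(size=(100, 5))
X[y == 1, 0] += 2.0                      # only feature 0 carries class information
print(sequential_forward_selection(X, y, fdr_score, k=2))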

Evaluating ML Models

Confusion Matrix In a two-class (positive and negative) problem, a classifier's predictions give rise to four output possibilities: True positive (TP): the actual class is positive and the classifier also predicts it as positive. False negative (FN): the actual class is positive, but the classifier predicts it as negative. True negative (TN): the actual class is negative and the classifier also predicts it as negative. False positive (FP): the actual class is negative, but the classifier predicts it as positive.

Worked example with 10 samples: actual positives P = 5, actual negatives N = 5; predicted positives PP = 7, predicted negatives PN = 3.
TP = 4 (hit), FN = 1 (Type II error, miss), FP = 3 (Type I error, false alarm), TN = 2 (correct rejection).
True positive rate (TPR), recall, sensitivity, probability of detection = TP/P = 4/5
False negative rate (FNR) = FN/P = 1/5
False positive rate (FPR) = FP/N = 3/5
True negative rate (TNR), specificity = TN/N = 2/5
Precision, positive predictive value (PPV) = TP/PP = 4/7
False discovery rate (FDR) = FP/PP = 3/7
False omission rate (FOR) = FN/PN = 1/3
Negative predictive value (NPV) = TN/PN = 2/3
F1 score = 2·TP/(2·TP + FP + FN) = 8/12 ≈ 0.67
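A small sketch that reproduces the counts and rates above from label arrays; the hard-coded label vectors are an assumption chosen so that the counts match the worked example.

import numpy as np

# Actual and predicted labels arranged to give TP=4, FN=1, FP=3, TN=2
actual    = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
predicted = np.array([1, 1, 1, 1, 0, 1, 1, 1, 0, 0])

tp = np.sum((actual == 1) & (predicted == 1))
fn = np.sum((actual == 1) & (predicted == 0))
fp = np.sum((actual == 0) & (predicted == 1))
tn = np.sum((actual == 0) & (predicted == 0))

recall      = tp / (tp + fn)          # TPR = 4/5
specificity = tn / (tn + fp)          # TNR = 2/5
precision   = tp / (tp + fp)          # PPV = 4/7
f1 = 2 * tp / (2 * tp + fp + fn)      # ≈ 0.67
print(tp, fn, fp, tn, recall, specificity, precision, round(f1, 2))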

1. Mean-normalized features from a data set are given below: x1 = [0.6 0 -0.6] and x2 = [0.5 -0.1 -0.4]. Write the data matrix X and find the covariance matrix.
2. If the eigenvalues and eigenvectors of the covariance matrix of the data matrix are as given, write the transformed data matrix in terms of the projection onto the vector that explains maximum variance.
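A short numerical sketch for problem 1 and the projection step of problem 2, assuming each xi is a feature stored as a row of the data matrix and using NumPy's covariance convention (division by N − 1), which may differ from the convention used in the lecture.

import numpy as np

# Features in rows, samples in columns (mean-normalized, as given in problem 1)
X = np.array([[0.6, 0.0, -0.6],
              [0.5, -0.1, -0.4]])

# Covariance matrix (np.cov divides by N-1; the slides may instead divide by N)
C = np.cov(X)
print(C)

# Eigen-decomposition; the eigenvector with the largest eigenvalue
# explains the maximum variance (problem 2)
eigvals, eigvecs = np.linalg.eigh(C)
w = eigvecs[:, np.argmax(eigvals)]

# Transformed data: projection of each sample onto that eigenvector
X_proj = w @ X
print(X_proj)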