Pattern Recognition for a Master's in Computer Science

minacodegirl · 19 slides · Sep 28, 2025

About This Presentation

These slides are for a Pattern Recognition course in a master's computer science program.


Slide Content

Feature Selection for Classification Using PCA and Information Gain (Erick Odhiambo Omuya)

Feature Selection & Classification

Classification process: a breakdown of data into groups. 1. The method finds a model for the class attribute as a function of the other variables in the dataset. 2. It applies the previously built model to new, unseen data, using machine learning methods (e.g., Decision Trees, Logistic & Linear Regression, ANN, Naïve Bayes, SVM, kNN, ...).
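As an illustration of the two steps above, here is a minimal sketch assuming Python with scikit-learn and its bundled Iris dataset (the classifier and dataset are illustrative choices, not taken from the slides):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Any labelled dataset with a class attribute would do here.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Step 1: find a model for the class attribute as a function of the other variables.
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_train, y_train)

    # Step 2: apply the learned model to new, unseen records.
    y_pred = model.predict(X_test)
    print("accuracy:", accuracy_score(y_test, y_pred))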

Feature selection: the process of removing non-relevant and repeated features from a dataset so as to improve the performance of machine learning techniques and their applications.
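A tiny, hypothetical illustration of the idea (the column names are invented for this example): repeated and constant columns can be dropped before modelling, e.g. with pandas:

    import pandas as pd

    df = pd.DataFrame({
        "age":      [25, 32, 47, 51],
        "age_copy": [25, 32, 47, 51],   # repeated feature
        "constant": [1, 1, 1, 1],       # non-relevant: carries no information
        "income":   [30, 60, 80, 90],
    })

    df = df.loc[:, ~df.T.duplicated()]  # drop exact duplicate columns ("age_copy")
    df = df.loc[:, df.nunique() > 1]    # drop constant columns ("constant")
    print(list(df.columns))             # ['age', 'income']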

Feature selection algorithms. Supervised: select relevant features based on labelled datasets; supervised methods can follow filter, wrapper, or embedded models. Semi-supervised: use both labelled and unlabelled data to evaluate the relevance of features. Unsupervised: identify and select relevant features without using class label information.

Supervised methods. The filter model works so that feature selection and model learning are independent. The wrapper model uses a small set of features. The embedded model mainly deals with selecting features that rate highly in terms of accuracy; the feature search process is embedded into the classification algorithm, and the learning process and the feature selection process cannot be separated.
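For instance, in the filter model features are scored before and independently of any learner. A minimal sketch assuming scikit-learn (SelectKBest and the breast-cancer dataset are illustrative choices, not from the slides):

    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    X, y = load_breast_cancer(return_X_y=True)

    # Score every feature against the class labels and keep the 10 best;
    # any classifier can then be trained on the reduced matrix.
    selector = SelectKBest(score_func=mutual_info_classif, k=10)
    X_reduced = selector.fit_transform(X, y)
    print(X.shape, "->", X_reduced.shape)   # (569, 30) -> (569, 10)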

Big challenge: the curse of dimensionality!

High dimensionality in datasets results from collecting information with many features or variables that have not been shown to be either needed or significant for the task.

A hybrid model for selecting features and classifying data: it works to reduce data dimensions, reduce training time, and provide better classification performance using the selected features.

The hybrid model consists of the following components: first, Principal Component Analysis; second, evaluation of features using Information Gain; last, model training and classification.
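A sketch of how these three stages could be chained, assuming scikit-learn; mutual_info_classif stands in here for Information Gain, and the component and feature counts are illustrative, not taken from the slides:

    from sklearn.datasets import load_breast_cancer
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.pipeline import Pipeline

    X, y = load_breast_cancer(return_X_y=True)

    pipeline = Pipeline([
        ("pca", PCA(n_components=15)),                   # first: reduce dimensionality
        ("ig",  SelectKBest(mutual_info_classif, k=8)),  # second: rank features by information gain
        ("clf", GaussianNB()),                           # last: model training and classification
    ])
    print("CV accuracy:", cross_val_score(pipeline, X, y, cv=5).mean())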

Information Gain, step one: given an attribute A and a class C, the first step is to calculate the entropy (H) before observation of attribute A, given by the formula below.
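In standard notation (the slide's formula image is not reproduced in the extracted text), the entropy of class C before observing A is:

    H(C) = -\sum_{c \in C} p(c) \log_2 p(c)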

Step two: calculate the entropy after observation of attribute A, given by the formula below.
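Again in standard notation, the conditional entropy of C after observing A is:

    H(C \mid A) = -\sum_{a \in A} p(a) \sum_{c \in C} p(c \mid a) \log_2 p(c \mid a)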

Last step: calculate the Information Gain. The Information Gain of attribute A is the difference between the entropy before observation of attribute A and the entropy after observation of the attribute.
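That is, IG(A) = H(C) - H(C \mid A). A short worked sketch in Python (NumPy assumed; the toy attribute and labels are invented for illustration):

    import numpy as np

    def entropy(labels):
        """H(C) = -sum p(c) log2 p(c) over the empirical class distribution."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(attribute, labels):
        """IG(A) = H(C) - H(C|A) for a categorical attribute A."""
        h_before = entropy(labels)
        values, counts = np.unique(attribute, return_counts=True)
        weights = counts / counts.sum()
        h_after = sum(w * entropy(labels[attribute == v])
                      for v, w in zip(values, weights))
        return h_before - h_after

    outlook = np.array(["sunny", "sunny", "rain", "rain", "overcast", "overcast"])
    play    = np.array([0, 0, 1, 1, 1, 1])
    print(information_gain(outlook, play))   # ~0.918: outlook removes almost all class uncertainty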


Thanks! Any questions?