Lecture04_Concept Learning_ FindS Algorithm.pptx

DrMTayyabChaudhry1 · 11 views · 78 slides · Sep 13, 2024

About This Presentation

Concept Learning for Machine Learning


Slide Content

Machine Learning Lecture 03 – Concept Learning – Find-S Algorithm. Slides Credit: Dr. Rao Muhammad Adeel Nawab. Edited by: Dr. Allah Bux Sargana

Representing Training Examples and Hypothesis for FIND-S Machine Learning Algorithm

Concept Learning
Concept learning, also known as category learning, was defined by Bruner, Goodnow, & Austin (1967) as "the search for and listing of attributes that can be used to distinguish exemplars from non-exemplars of various categories". In a concept learning task, a human or machine learner is trained to classify objects by being shown a set of example objects along with their class labels. A concept is an idea of something formed by combining all the features or attributes which construct the given concept. Every concept has two components:
Attributes: the features one must look for to decide whether a data instance is a positive instance of the concept.
A rule: the conjunction of constraints on the attributes that qualifies an instance as a positive instance of the concept.

Concept Learning (Cont.) Concept Learning: Acquiring the definition of a general category from given sample positive and negative training examples of the category. Concept Learning can be seen as a problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples. The hypothesis space has a general-to-specific ordering of hypotheses, and the search can be efficiently organized by taking advantage of a naturally occurring structure over the hypothesis space.

Concept Learning (Cont.)
A Formal Definition for Concept Learning: inferring a Boolean-valued function from training examples of its input and output. An example of concept learning is learning the bird concept from given examples of birds (positive examples) and non-birds (negative examples). We are trying to learn the definition of a concept from given examples.

Learning Input-Output Functions – General Settings
Input to Learner: Set of Training Examples (D); Set of Functions / Hypotheses (H)
Output by Learner: A Hypothesis (h) from H which best fits the Training Examples (D)
Note that h is an approximation of the Target Function f. In this lecture, the Learner is the FIND-S Machine Learning Algorithm 😊

Lecture Focus
In this Lecture, we will take the Gender Identification Problem and try to explain three main things:
Representation of Training Examples (D): how to represent Training Examples (D) in a format which the FIND-S Algorithm can understand and learn from.
Representation of Hypothesis (h): how to represent a Hypothesis (h) in a format which the FIND-S Algorithm can understand.
Searching Strategy: what searching strategy is used by the FIND-S Algorithm to find an h from H which best fits the Training Examples (D).

Gender Identification Problem
Gender Identification Machine Learning Problem:
Input: Human Features
Output: Gender of a Human
Task: Given the features of a Human (Input), predict the Gender of the Human (Output)
Treated as: Learning an Input-Output Function, i.e., learn from Input to predict Output

Representation of Examples
Representation of Input and Output: Example = Input + Output, represented as Attribute-Value Pairs. Input = Human; Output = Gender.

Representation of Input
Input is represented as a set of 6 Input Attributes:
1. Height
2. Weight
3. HairLength
4. HeadCovered
5. WearingChain
6. ShirtSleeves

Representation of Input (Cont.)

No.  Input Attribute  Data Type    Input Attribute Values
x1   Height           Categorical  Short, Normal, Tall
x2   Weight           Categorical  Light, Heavy
x3   HairLength       Categorical  Short, Long
x4   HeadCovered      Categorical  Yes, No
x5   WearingChain     Categorical  Yes, No
x6   ShirtSleeves     Categorical  Half, Full

Representation of Output
Output is represented as a set of 1 Output Attribute: Gender.

No.  Output Attribute  Data Type    Output Attribute Values
x1   Gender            Categorical  Yes, No

Note: Yes means Female and No means Male.

Computing Size of Instance Space (X)

No.  Input Attribute  Input Attribute Values  No. of Values
x1   Height           Short, Normal, Tall     3
x2   Weight           Light, Heavy            2
x3   HairLength       Short, Long             2
x4   HeadCovered      Yes, No                 2
x5   WearingChain     Yes, No                 2
x6   ShirtSleeves     Half, Full              2

|X| = (No. of Height values) × (No. of Weight values) × (No. of HairLength values) × (No. of HeadCovered values) × (No. of WearingChain values) × (No. of ShirtSleeves values)
|X| = 3 × 2 × 2 × 2 × 2 × 2 = 96

Sample Data
We obtained a Sample Data of 6 examples:

No.  Height  Weight  HairLength  HeadCovered  WearingChain  ShirtSleeves  Gender
x1   Short   Light   Short       Yes          Yes           Half          Male
x2   Short   Light   Long        Yes          Yes           Half          Female
x3   Tall    Heavy   Long        Yes          Yes           Full          Female
x4   Short   Light   Long        Yes          No            Full          Male
x5   Short   Heavy   Short       Yes          Yes           Half          Female
x6   Tall    Light   Short       No           Yes           Full          Male

Representation of Hypothesis (h)
We represent a Hypothesis (h) as a Conjunction (AND) of Constraints on Input Attributes. Each constraint can be:
No value allowed (null hypothesis, Ø): e.g. Height = Ø
A specific value: e.g. Height = Short
A don't care value (any of the possible values): e.g. Height = ?
Most Specific Hypothesis (h), over <Height, Weight, HairLength, HeadCovered, WearingChain, ShirtSleeves>:
h = <Ø, Ø, Ø, Ø, Ø, Ø>

Representation of Hypothesis (h) (Cont.)
Most General Hypothesis (h), over <Height, Weight, HairLength, HeadCovered, WearingChain, ShirtSleeves>:
h = <?, ?, ?, ?, ?, ?>
Another Hypothesis (h):
h = <Normal, Light, ?, ?, No, ?>
Important Note: the order of Input Attributes must be exactly the same in a Training Example (d) and a Hypothesis (h).

Computing Size of Concept Space (C) and Hypothesis Space (H)

No.  Input Attribute  Input Attribute Constraints  No. of Constraints
x1   Height           Ø, Short, Normal, Tall, ?    5
x2   Weight           Ø, Light, Heavy, ?           4
x3   HairLength       Ø, Short, Long, ?            4
x4   HeadCovered      Ø, Yes, No, ?                4
x5   WearingChain     Ø, Yes, No, ?                4
x6   ShirtSleeves     Ø, Half, Full, ?             4

Computing Size of Concept Space (C) and Hypothesis Space (H)
Size of Instance Space (X): |X| = 96
Size of Concept Space (C): |C| = 2^|X| = 2^96 = 79,228,162,514,264,337,593,543,950,336
Size of Hypothesis Space (H), syntactically distinct hypotheses: |H| = 5 × 4 × 4 × 4 × 4 × 4 = 5,120
Size of Hypothesis Space (H), semantically distinct hypotheses: |H| = 1 + (4 × 3 × 3 × 3 × 3 × 3) = 973
For the semantic count, each attribute has only one more value than its specific values (the ?), plus one hypothesis representing the empty set of instances, since every hypothesis containing Ø classifies all instances as negative.
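These counts follow from simple products over the attribute tables above; a minimal sketch in Python (the dictionary layout is mine, the numbers are from the slides):

```python
# Number of values per input attribute (from the tables above)
values = {"Height": 3, "Weight": 2, "HairLength": 2,
          "HeadCovered": 2, "WearingChain": 2, "ShirtSleeves": 2}

# Instance space: product of the attribute value counts
X = 1
for v in values.values():
    X *= v                      # 3 * 2 * 2 * 2 * 2 * 2 = 96

# Concept space: every subset of the instance space is a possible concept
C = 2 ** X                      # 2^96

# Syntactically distinct hypotheses: each attribute allows its
# values plus the two extra constraints null (Ø) and '?'
syntactic = 1
for v in values.values():
    syntactic *= (v + 2)        # 5 * 4^5 = 5120

# Semantically distinct hypotheses: any hypothesis containing Ø
# matches nothing, so all of them collapse into one; otherwise
# each attribute allows its values plus '?'
semantic = 1
for v in values.values():
    semantic *= (v + 1)
semantic += 1                   # 4 * 3^5 + 1 = 973

print(X, C, syntactic, semantic)
```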

FIND-S Algorithm - Machine Learning Cycle

Machine Learning Cycle
The four phases of a Machine Learning Cycle are:
Training Phase: build the Model using Training Data.
Testing Phase: evaluate the performance of the Model using Testing Data.
Application Phase: deploy the Model in the Real-world, to make predictions on Real-time unseen Data.
Feedback Phase: take Feedback from the Users and Domain Experts to improve the Model.

Split the Sample Data
We split the Sample Data using a Random Split Approach into Training Data (2/3) and Testing Data (1/3).
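A random 2/3–1/3 split can be sketched as follows (the fixed seed and the list layout are illustrative choices; the six examples are from the Sample Data slide):

```python
import random

# The six labelled examples from the Sample Data slide
sample = [
    (("Short", "Light", "Short", "Yes", "Yes", "Half"), "Male"),
    (("Short", "Light", "Long",  "Yes", "Yes", "Half"), "Female"),
    (("Tall",  "Heavy", "Long",  "Yes", "Yes", "Full"), "Female"),
    (("Short", "Light", "Long",  "Yes", "No",  "Full"), "Male"),
    (("Short", "Heavy", "Short", "Yes", "Yes", "Half"), "Female"),
    (("Tall",  "Light", "Short", "No",  "Yes", "Full"), "Male"),
]

random.seed(0)                      # fixed seed so the split is reproducible
shuffled = sample[:]
random.shuffle(shuffled)
cut = (2 * len(shuffled)) // 3      # 2/3 of the examples go to training
train, test = shuffled[:cut], shuffled[cut:]
print(len(train), len(test))        # 4 2
```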

Sample Data

No.  Height  Weight  HairLength  HeadCovered  WearingChain  ShirtSleeves  Gender
x1   Short   Light   Short       Yes          Yes           Half          Male
x2   Short   Light   Long        Yes          Yes           Half          Female
x3   Tall    Heavy   Long        Yes          Yes           Full          Female
x4   Short   Light   Long        Yes          No            Full          Male
x5   Short   Heavy   Short       Yes          Yes           Half          Female
x6   Tall    Light   Short       No           Yes           Full          Male

Training Data

No.  Height  Weight  HairLength  HeadCovered  WearingChain  ShirtSleeves  Gender
x1   Short   Light   Short       Yes          Yes           Half          Female
x2   Short   Light   Long        Yes          Yes           Half          Female
x3   Tall    Heavy   Long        Yes          Yes           Full          Male
x4   Short   Light   Long        Yes          No            Full          Female

Testing Data

No.  Height  Weight  HairLength  HeadCovered  WearingChain  ShirtSleeves  Gender
x5   Short   Light   Short       Yes          Yes           Half          Male
x6   Tall    Light   Short      No           Yes           Full          Male

Note
After splitting the Sample Data using the Random Split Approach:
Sample Data is balanced: 3 Positive Instances (Female), 3 Negative Instances (Male).
Training Data is unbalanced: 3 Positive Instances (Female), 1 Negative Instance (Male).
Testing Data is unbalanced: 0 Positive Instances (Female), 2 Negative Instances (Male).

Sample Data – Vector Representation
Vector Representation of Examples (+ means Female / positive, − means Male / negative):
x1 = <Short, Light, Short, Yes, Yes, Half> −
x2 = <Short, Light, Long, Yes, Yes, Half> +
x3 = <Tall, Heavy, Long, Yes, Yes, Full> +
x4 = <Short, Light, Long, Yes, No, Full> −
x5 = <Short, Heavy, Short, Yes, Yes, Half> +
x6 = <Tall, Light, Short, No, Yes, Full> −

Training Data – Vector Representation
Vector Representation of Training Examples:
x1 = <Short, Light, Short, Yes, Yes, Half> +
x2 = <Short, Light, Long, Yes, Yes, Half> +
x3 = <Tall, Heavy, Long, Yes, Yes, Full> −
x4 = <Short, Light, Long, Yes, No, Full> +

Testing Data – Vector Representation
Vector Representation of Test Examples:
x5 = <Short, Light, Short, Yes, Yes, Half> −
x6 = <Tall, Light, Short, No, Yes, Full> −

Find-S Algorithm (or Learner)
1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x:
     For each attribute constraint a_i in h:
       If the constraint a_i in h is satisfied by x, then do nothing;
       else replace a_i in h by the next more general constraint that is satisfied by x.
3. Output hypothesis h.
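The steps above can be sketched in Python. The training tuples are the vectors from the Training Data slide; the choice of the string "0" as a stand-in for the null constraint Ø and the helper name find_s are my own:

```python
def find_s(training_examples, positive_label="Female"):
    """FIND-S: start from the most specific hypothesis and generalize
    it just enough to cover each positive example; negatives are ignored."""
    NULL = "0"   # stand-in for the 'no value allowed' constraint (Ø)
    n = len(training_examples[0][0])
    h = [NULL] * n                       # most specific hypothesis <Ø, ..., Ø>
    for x, label in training_examples:
        if label != positive_label:      # FIND-S skips negative examples
            continue
        for i, value in enumerate(x):
            if h[i] == NULL:
                h[i] = value             # Ø -> next more general: the value itself
            elif h[i] != value:
                h[i] = "?"               # conflicting values -> don't care
    return h

# Training Data from the slides (positive class: Female)
train = [
    (("Short", "Light", "Short", "Yes", "Yes", "Half"), "Female"),
    (("Short", "Light", "Long",  "Yes", "Yes", "Half"), "Female"),
    (("Tall",  "Heavy", "Long",  "Yes", "Yes", "Full"), "Male"),
    (("Short", "Light", "Long",  "Yes", "No",  "Full"), "Female"),
]
print(find_s(train))   # ['Short', 'Light', '?', 'Yes', '?', '?']
```

Running this reproduces the hypothesis derived step by step in the Training Phase below.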

Specific to General Constraints
We have three constraints on our Attributes, ordered from specific to general:
1. No value allowed (Ø) – the most specific constraint
2. A set of specific values (e.g. Short, Normal and Tall for the Height attribute)
3. A don't care value (?)
Note that a specific value is the next more general constraint than no value allowed (Ø), and ? is the next more general constraint than a specific value.

Training Phase
In the Training Phase, the FIND-S Algorithm will search H to find an h which best fits the Training Data. Best fit means h correctly classifies the Positive and Negative instances in the Training Data.
Correct Classification: a Positive instance is classified as Positive; a Negative instance is classified as Negative.
Incorrect Classification: a Positive instance is classified as Negative; a Negative instance is classified as Positive.

Training Phase (Cont.)
Initialize h to the most specific hypothesis in H: h = <Ø, Ø, Ø, Ø, Ø, Ø>
For each positive training instance x, for each attribute constraint a_i in h: if the constraint a_i in h is satisfied by x, then do nothing; else replace a_i in h by the next more general constraint that is satisfied by x.

Training Phase (Cont.)
First Training Example: x1 = <Short, Light, Short, Yes, Yes, Half> +
Let's see if the attribute constraints in h satisfy x1 or not:
If (Ø = Short AND Ø = Light AND Ø = Short AND Ø = Yes AND Ø = Yes AND Ø = Half) THEN Gender = Yes Else Gender = No

Training Phase (Cont.)
As we can see, the attribute constraints in h do not satisfy x1. Therefore, x1 is incorrectly classified as Negative. To satisfy x1, we replace the attribute constraints in h by the next more general constraints that are satisfied by x1:
h = <Ø, Ø, Ø, Ø, Ø, Ø> will become h1 = <Short, Light, Short, Yes, Yes, Half>

Training Phase (Cont.)
Let's see if the attribute constraints in h1 satisfy x1 or not:
If (Short = Short AND Light = Light AND Short = Short AND Yes = Yes AND Yes = Yes AND Half = Half) THEN Gender = Yes Else Gender = No
As we can see, the attribute constraints in h1 satisfy x1. Therefore, x1 is correctly classified as Positive.

Training Phase (Cont.)
Second Training Example: x2 = <Short, Light, Long, Yes, Yes, Half> +
Let's see if the attribute constraints in h1 satisfy x2 or not:
If (Short = Short AND Light = Light AND Short = Long AND Yes = Yes AND Yes = Yes AND Half = Half) THEN Gender = Yes Else Gender = No

Training Phase (Cont.)
As we can see, the attribute constraints in h1 do not satisfy x2. Therefore, x2 is incorrectly classified as Negative. To satisfy x2, we replace the attribute constraints in h1 by the next more general constraints that are satisfied by x2:
h1 = <Short, Light, Short, Yes, Yes, Half> will become h2 = <Short, Light, ?, Yes, Yes, Half>

Training Phase (Cont.)
Let's see if the attribute constraints in h2 satisfy x2 or not:
If (Short = Short AND Light = Light AND ? = Long AND Yes = Yes AND Yes = Yes AND Half = Half) THEN Gender = Yes Else Gender = No
As we can see, the attribute constraints in h2 satisfy x2. Therefore, x2 is correctly classified as Positive.

Note
The Learner (FIND-S Algorithm) has observed two Training Examples up till now, and our hypothesis is h2 = <Short, Light, ?, Yes, Yes, Half>. Let's see whether h2 best fits the observed Training Examples, i.e. x1 and x2: h2 correctly classifies x1 as Positive and x2 as Positive. To conclude, h2 best fits the first two observed Training Examples, i.e. x1 and x2.

Training Phase (Cont.)
Third Training Example: x3 = <Tall, Heavy, Long, Yes, Yes, Full> −
Note that the 3rd Training Example is Negative, and FIND-S only operates on Positive Training Examples. Therefore, there is no change, and h3 is the same as h2:
h2 = <Short, Light, ?, Yes, Yes, Half> will become h3 = <Short, Light, ?, Yes, Yes, Half>

Training Phase (Cont.)
Interestingly, h3 correctly classifies x3 as Negative:
If (Short = Tall AND Light = Heavy AND ? = Long AND Yes = Yes AND Yes = Yes AND Half = Full) THEN Gender = Yes Else Gender = No

Training Phase (Cont.)
h3 correctly classifies x1 as Positive, x2 as Positive, and x3 as Negative. Thus, h3 best fits the three Training Examples observed up till now.

Training Phase (Cont.)
Fourth Training Example: x4 = <Short, Light, Long, Yes, No, Full> +
Let's see if the attribute constraints in h3 satisfy x4 or not:
If (Short = Short AND Light = Light AND ? = Long AND Yes = Yes AND Yes = No AND Half = Full) THEN Gender = Yes Else Gender = No

Training Phase (Cont.)
As we can see, the attribute constraints in h3 do not satisfy x4. Therefore, x4 is incorrectly classified as Negative. To satisfy x4, we replace the attribute constraints in h3 by the next more general constraints that are satisfied by x4:
h3 = <Short, Light, ?, Yes, Yes, Half> will become h4 = <Short, Light, ?, Yes, ?, ?>

Training Phase (Cont.)
Let's see if the attribute constraints in h4 satisfy x4 or not:
If (Short = Short AND Light = Light AND ? = Long AND Yes = Yes AND ? = No AND ? = Full) THEN Gender = Yes Else Gender = No
As we can see, the attribute constraints in h4 satisfy x4. Therefore, x4 is correctly classified as Positive.

Training Phase (Cont.)
h4 correctly classifies x1 as Positive, x2 as Positive, x3 as Negative, and x4 as Positive. Thus, h4 best fits the four Training Examples observed up till now.
Note: there were a total of 4 Training Examples, and we have observed all of them.

Find-S Algorithm – Trace
h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
x1 = <Short, Light, Short, Yes, Yes, Half> +  →  h1 = <Short, Light, Short, Yes, Yes, Half>
x2 = <Short, Light, Long, Yes, Yes, Half> +  →  h2 = <Short, Light, ?, Yes, Yes, Half>
x3 = <Tall, Heavy, Long, Yes, Yes, Full> −  →  h3 = h2 = <Short, Light, ?, Yes, Yes, Half>
x4 = <Short, Light, Long, Yes, No, Full> +  →  h4 = <Short, Light, ?, Yes, ?, ?>

Training Phase (Cont.)
After observing all the Training Examples, the FIND-S Algorithm will output hypothesis h. The h returned by the FIND-S Algorithm is h = <Short, Light, ?, Yes, ?, ?>.
Note: h is an approximation of the Target Function f.

Training Phase (Cont.)
Training Data → Model: h = <Short, Light, ?, Yes, ?, ?>

Training Phase (Cont.)
Model – in the form of Rules:
If (Height = Short AND Weight = Light AND HairLength = ? AND HeadCovered = Yes AND WearingChain = ? AND ShirtSleeves = ?) THEN Gender = Yes Else Gender = No
In the next phase, i.e. the Testing Phase, we will evaluate the performance of the Model.

Testing Phase
Question: How well has the Model learned?
Answer: Evaluate the performance of the Model on unseen data (or Testing Data).

Evaluation Measures
Evaluation will be carried out using the Error measure.

Error
Definition: Error is defined as the proportion of incorrectly classified Test instances.
Formula: Error = (No. of incorrectly classified Test instances) / (Total No. of Test instances)
Note: Accuracy = 1 − Error

Evaluate Model
Apply the Model on the Test Data.
Applying the Model on x5 = <Short, Light, Short, Yes, Yes, Half>:
If (Short = Short AND Light = Light AND Short = ? AND Yes = Yes AND Yes = ? AND Half = ?) THEN Gender = Yes Else Gender = No
Prediction returned by the Model: x5 is predicted Positive (Incorrectly Classified Instance).

Evaluate Model (Cont.)
Applying the Model on x6 = <Tall, Light, Short, No, Yes, Full>:
If (Tall = Short AND Light = Light AND Short = ? AND No = Yes AND Yes = ? AND Full = ?) THEN Gender = Yes Else Gender = No
Prediction returned by the Model: x6 is predicted Negative (Correctly Classified Instance).

Evaluate Model (Cont.)

Test Example                                 Actual     Predicted
x5 = <Short, Light, Short, Yes, Yes, Half>   − (Male)   + (Female)
x6 = <Tall, Light, Short, No, Yes, Full>     − (Male)   − (Male)

Error = 1/2 = 0.5
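The error computation above can be reproduced with a short sketch (the classify helper and variable names are my assumptions; the hypothesis, the test vectors, and the 0.5 error are from the slides):

```python
def classify(h, x):
    # A hypothesis predicts Positive iff every constraint is '?'
    # or equals the corresponding attribute value.
    return all(c == "?" or c == v for c, v in zip(h, x))

h = ["Short", "Light", "?", "Yes", "?", "?"]   # Model from the Training Phase

# Testing Data: x5 and x6, both actually Negative (Male)
test = [
    (("Short", "Light", "Short", "Yes", "Yes", "Half"), False),
    (("Tall",  "Light", "Short", "No",  "Yes", "Full"), False),
]

# Error = incorrectly classified test instances / total test instances
errors = sum(classify(h, x) != actual for x, actual in test)
error = errors / len(test)
print(error)   # 0.5 (x5 is misclassified as Positive, x6 is correct)
```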

Application Phase
We assume that our Model performed well on large Test Data and can be deployed in the Real-world. The Model is deployed in the Real-world, and now we can make Predictions on Real-time Data.

Steps – Making Predictions on Real-time Data
Step 1: Take Input from the User.
Step 2: Convert the User Input into a Feature Vector (exactly the same format as the Feature Vectors of the Training and Testing Data).
Step 3: Apply the Model on the Feature Vector.
Step 4: Return the Prediction to the User.

Example – Making Predictions on Real-time Data
Step 1: Take Input from the User.
Enter Height (Short, Normal, Tall): Short
Enter Weight (Light, Heavy): Light
Enter HairLength (Short, Long): Long
Is HeadCovered (Yes, No): Yes
Is WearingChain (Yes, No): Yes
Is ShirtSleeves (Half, Full): Half
Step 2: Convert the User Input into a Feature Vector: <Short, Light, Long, Yes, Yes, Half>
Note that the order of Attributes must be exactly the same as that of the Training and Testing Examples.

Example – Making Predictions on Real-time Data
Step 3: Apply the Model on the Feature Vector.
If (Short = Short AND Light = Light AND Long = ? AND Yes = Yes AND Yes = ? AND Half = ?) THEN Gender = Yes Else Gender = No
Step 4: Return the Prediction to the User: Positive
Note: You can take Input from the user, apply the Model and return predictions as many times as you like 😊
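Steps 3 and 4 amount to evaluating the learned rule on the feature vector; a minimal sketch (the function and variable names are my own):

```python
def classify(h, x):
    # Constraint '?' matches anything; otherwise values must be equal.
    return all(c == "?" or c == v for c, v in zip(h, x))

h = ["Short", "Light", "?", "Yes", "?", "?"]   # deployed Model

# Step 2 output: the user's input as a feature vector, in the same
# attribute order as the Training and Testing Examples
user = ("Short", "Light", "Long", "Yes", "Yes", "Half")

# Steps 3-4: apply the Model and return the prediction
prediction = "Positive (Female)" if classify(h, user) else "Negative (Male)"
print(prediction)   # Positive (Female)
```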

Feedback Phase
Take Feedback on your deployed Model from Domain Experts and Users. Improve your Model based on the Feedback 😊

Inductive Bias - FIND-S Algorithm
Inductive Bias: the set of assumptions needed, in addition to the Training Examples, to justify the Learner's classifications deductively.
Inductive Bias of the FIND-S Algorithm: the Training Data is error-free; the Target Function / Concept is present in the Hypothesis Space (H).

Strengths and Weaknesses - FIND-S Algorithm
Strengths: Returns a Model (h), which can be used to make predictions on unseen data.
Weaknesses:
Only works on error-free Data; however, Real-world Data is noisy.
Works on the assumption that the Target Function is present in the Hypothesis Space (H); however, we may or may not find the Target Function in the Hypothesis Space (H), and this may or may not be known.
Only returns one hypothesis which best fits the Training Data; however, there can be multiple hypotheses which best fit the Training Data.

TODO
Task: Consider the Titanic Dataset with the following Attributes:
Gender: Male, Female
Ticket Class: Upper, Middle, Lower
Parent/Child Aboard: Zero, One, Two, Three
Embarked: Cherbourg, Queenstown, Southampton
Survival: No, Yes

TODO (Cont.)
We obtained the following Sample Data:

No.  Gender  Ticket Class  Parent/Child Aboard  Embarked     Survival
x1   Male    Lower         Zero                 Southampton  No
x2   Female  Upper         Zero                 Cherbourg    Yes
x3   Male    Lower         Zero                 Southampton  No
x4   Female  Lower         Zero                 Southampton  Yes
x5   Male    Lower         Zero                 Queenstown   No
x6   Female  Upper         Zero                 Southampton  Yes

Sample Data was split into Training and Testing Data in a Train-Test Split Ratio of 67%-33%.

TODO (Cont.)
Training Data

No.  Gender  Ticket Class  Parent/Child Aboard  Embarked     Survival
x1   Male    Lower         Zero                 Southampton  No
x2   Female  Upper         Zero                 Cherbourg    Yes
x3   Male    Lower         Zero                 Southampton  No
x4   Female  Lower         Zero                 Southampton  Yes

TODO (Cont.)
Testing Data

No.  Gender  Ticket Class  Parent/Child Aboard  Embarked     Survival
x5   Male    Lower         Zero                 Queenstown   No
x6   Female  Upper         Zero                 Southampton  Yes

Note: Consider the FIND-S Algorithm when answering the questions given on the next slide. Your answers should be well justified.

TODO (Cont.)
Questions
1. Write down the Input and Output for the above Machine Learning Problem.
2. How is a Training Example represented?
3. How should a Hypothesis (h) be represented?
4. Calculate the size of the Instance Space, the Concept Space, the syntactically distinct Hypothesis Space, and the semantically distinct Hypothesis Space.
5. Execute the Machine Learning Cycle.
6. Write down the observations you made during the execution of the Machine Learning Cycle.

Your Turn
Task: Select a Machine Learning Problem (similar to: Titanic – Machine Learning from Disaster) and answer the questions given on the next slide.
Note: Consider the FIND-S Algorithm in answering all the questions.

Your Turn
Questions
1. Write down the Input and Output for the selected Machine Learning Problem.
2. How is a Training Example represented?
3. How should a Hypothesis (h) be represented?
4. Calculate the size of the Instance Space, the Concept Space, the syntactically distinct Hypothesis Space, and the semantically distinct Hypothesis Space.
5. Execute the Machine Learning Cycle.
6. Write down the observations you made during the execution of the Machine Learning Cycle.

Lecture Summary (Cont.)
Therefore, in Research, we mainly refine the solution(s) proposed for a Real-world Problem. The main steps of a Research Cycle are as follows:
Step 1: Identify the Real-world Problem.
Step 2: Propose a Solution (called Solution 01) to solve the Real-world Problem.
Step 3: List down the Strengths and Weaknesses of Solution 01.

Lecture Summary (Cont.)
Step 4: Propose a Solution (called Solution 02) to overcome the limitations of Solution 01 and further strengthen the Strengths of Solution 01.
Step 5: List down the Strengths and Weaknesses of Solution 02.
Step 6: Propose a Solution (called Solution 03) to overcome the limitations of Solution 02 and further strengthen the Strengths of Solution 02.

Lecture Summary (Cont.)
Step 7: Continue this cycle till the Day of Judgment 😊
Considering the FIND-S Algorithm:
Input to Learner (FIND-S Algorithm): Set of Training Examples (D); Set of Functions / Hypotheses (H)
Output by Learner (FIND-S Algorithm): A Hypothesis (h) from H which best fits the Training Examples (D)
Note that h is an approximation of the Target Function.

Lecture Summary (Cont.)
FIND-S Algorithm – Summary
Representation of Example: Attribute-Value Pairs
Representation of Hypothesis (h): Conjunction (AND) of Constraints on Attributes
Inductive Bias: the set of assumptions needed, in addition to the Training Examples, to justify the Learner's classifications deductively

Lecture Summary (Cont.)
Training Regime: Incremental Method
Inductive Bias of the FIND-S Algorithm: the Training Data is error-free; the Target Function / Concept is present in the Hypothesis Space (H)
Strengths: Returns a Model (h), which can be used to "make predictions" on unseen data

Lecture Summary (Cont.)
Weaknesses:
Only works on error-free Data; however, Real-world Data is "noisy".
Works on the assumption that the Target Function is present in the Hypothesis Space (H); however, we may or may not find the Target Function in the Hypothesis Space (H), and this may or may not be known.
Only returns one hypothesis which best fits the Training Data; however, there can be multiple hypotheses which best fit the Training Data.