FDP ON MACHINE LEARNING
Dr. M. Bindhu, Associate Professor/ECE, Saveetha Engineering College
OUTLINE
Machine Learning (AI) - Learning Rules
Sequential Covering Algorithm
Learn One Rule
Why Learn First Order Rules?
First Order Logic: Terminology
The FOIL Algorithm
Why Combine Inductive And Analytical Learning?
KBANN: Prior Knowledge To Initialize The Hypothesis
TangentProp, EBNN: Prior Knowledge Alters Search Objective
FOCL Algorithm: Prior Knowledge Alters Search Operators
Machine Learning (AI)
A key aspect of intelligence is the ability to learn knowledge over time. The vast majority of machine learning algorithms have been developed to learn knowledge that is inherently propositional, where the domain of interest is encoded in terms of a fixed set of variables. Inductive logic programming, a subfield of symbolic artificial intelligence, instead uses logic programming as a uniform representation for examples, background knowledge, and hypotheses.
https://youtu.be/ad79nYk2keg
LEARNING RULES
We can learn sets of rules by first learning a decision tree and then converting the tree to rules. We can also use a genetic algorithm that encodes the rules as bit strings. However, these approaches only work with propositional rules (no variables). They also consider the set of rules as a whole, not one rule at a time.
SEQUENTIAL COVERING ALGORITHM
1. Learn one rule with high accuracy, any coverage
2. Remove the positive examples covered by this rule
3. Repeat
Example:
Wind(d1) = Strong, Humidity(d1) = Low, Outlook(d1) = Sunny, PlayTennis(d1) = No
Wind(d2) = Weak, Humidity(d2) = Med, Outlook(d2) = Sunny, PlayTennis(d2) = Yes
Wind(d3) = Med, Humidity(d3) = Med, Outlook(d3) = Rain, PlayTennis(d3) = No
Target_attribute is the attribute we wish to learn, e.g. PlayTennis(x). Attributes is the set of all possible attributes, e.g. Wind(x), Humidity(x), Outlook(x). Threshold is the desired performance.
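A minimal Python sketch of this outer loop, assuming a hypothetical learn_one_rule helper that returns a predicate rule(example) -> bool, a performance function that scores a rule on the remaining examples, and examples stored as dicts whose target value is True for positive examples; none of these names come from the slides.

    def sequential_covering(target, examples, learn_one_rule, performance,
                            threshold=0.9):
        """Illustrative sketch of the SEQUENTIAL-COVERING outer loop."""
        learned_rules = []
        rule = learn_one_rule(target, examples)        # step 1: one rule
        while rule is not None and performance(rule, examples, target) > threshold:
            learned_rules.append(rule)
            # Step 2: remove the positive examples this rule covers.
            examples = [e for e in examples if not (rule(e) and e[target])]
            rule = learn_one_rule(target, examples)    # step 3: repeat
        return learned_rules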
Drawbacks of the Sequential Covering Algorithm
We require Learn-One-Rule to have high (ideally perfect) accuracy but not necessarily high coverage (i.e., when the rule makes a prediction, that prediction should be true). Since the algorithm performs a greedy search, it is not guaranteed to find the best or smallest set of rules that cover the training examples.
Why Learn First Order Rules?
Propositional logic allows the expression of individual propositions and their truth-functional combination.
1. Propositions like "Tom is a man" or "All men are mortal" may be represented by single proposition letters such as P or Q.
2. Truth-functional combinations are built up using connectives such as ∧, ∨, ¬, →. Example: P ∧ Q.
Inference rules are defined over propositional forms, e.g. P → Q. If P is "Tom is a man" and Q is "All men are mortal", the inference that Tom is mortal does not follow in propositional logic.
Why Learn First Order Rules? (contd.)
First-order logic allows the expression of propositions and their truth-functional combination, but it also allows us to represent propositions as assertions of predicates about individuals or sets of individuals. Example: propositions like "Tom is a man" or "All men are mortal" may be represented by predicate-argument representations such as man(tom) or ∀x (man(x) → mortal(x)) (so variables range over individuals). Inference rules then permit conclusions to be drawn about individuals or sets, e.g. mortal(tom).
Why Learn First Order Rules? (contd.)
Day  Outlook   Temp  Humid  Wind    PlayTennis
D1   Sunny     Hot   High   Weak    No
D2   Sunny     Hot   High   Strong  No
D3   Overcast  Hot   High   Weak    Yes
D4   Rain      Mild  High   Weak    Yes
D5   Rain      Cool  Low    Weak    Yes
D6   Rain      Cool  Low    Strong  No
D7   Overcast  Cool  Low    Strong  Yes
D8   Sunny     Mild  High   Weak    No
D9   Sunny     Cool  Low    Weak    Yes
D10  Rain      Mild  Low    Weak    Yes
D11  Sunny     Mild  Low    Strong  Yes
D12  Overcast  Mild  High   Strong  Yes
D13  Overcast  Hot   Low    Weak    Yes
D14  Rain      Mild  High   Strong  No
Learn-One-Rule
Pos ← positive Examples
Neg ← negative Examples
while Pos is not empty, do        // learn a NewRule
    NewRule ← most general rule possible
    NewRuleNeg ← Neg
    while NewRuleNeg is not empty, do        // add a new literal to specialize NewRule
        1. Candidate_literals ← generate candidates
        2. Best_literal ← argmax over L in Candidate_literals of Performance(SpecializeRule(NewRule, L))
        3. Add Best_literal to NewRule preconditions
        4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
    Learned_rules ← Learned_rules + NewRule
    Pos ← Pos - {members of Pos covered by NewRule}
Return Learned_rules
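A minimal Python sketch of the inner specialization loop above, assuming examples are dicts with a Boolean target value, candidate literals are predicates over examples, and performance is a scoring function such as the entropy-based one sketched after the summary below; all names are illustrative, not part of the original pseudocode.

    def learn_one_rule(target, examples, candidate_literals, performance):
        """Greedy general-to-specific search for a single rule (sketch)."""
        preconditions = []                     # most general rule: empty body
        def rule(e):
            return all(lit(e) for lit in preconditions)

        remaining = list(candidate_literals)
        neg_covered = [e for e in examples if not e[target] and rule(e)]
        while neg_covered and remaining:
            # Step 2: pick the literal whose addition scores best.
            best = max(remaining,
                       key=lambda lit: performance(
                           lambda e, lit=lit: rule(e) and lit(e),
                           examples, target))
            remaining.remove(best)
            preconditions.append(best)         # Step 3: specialize the rule
            neg_covered = [e for e in neg_covered if rule(e)]   # Step 4
        return rule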
Autonomous Car Technology (application example): Laser Terrain Mapping (Stanley), Learning from Human Drivers, Adaptive Vision, Path Planning. Images and movies taken from Sebastian Thrun's multimedia website.
Learn-One-Rule Summary
Idea: organize the hypothesis space search in a general-to-specific fashion. Start with the most general rule precondition, then greedily add the attribute that most improves performance measured over the training examples. The approach can be generalized to multi-valued target functions. There are other ways to define Performance() besides using entropy.
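Since Performance() is often defined via entropy, here is a hedged sketch of that choice, assuming rules are predicates over example dicts and a Boolean target attribute; the helper names are illustrative.

    import math

    def entropy(labels):
        """Shannon entropy of a list of Boolean target values."""
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    def performance(rule, examples, target):
        """Negative entropy of the examples the rule covers: higher is purer."""
        covered = [e[target] for e in examples if rule(e)]
        return -entropy(covered)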
First-Order Logic Definitions
Every expression is composed of constants, variables, predicates, and functions.
A term is any constant, any variable, or any function applied to any term.
A literal is any predicate (or its negation) applied to any set of terms, e.g. Female(Mary), ¬Female(x).
A ground literal is a literal that does not contain any variables.
A clause is any disjunction of literals whose variables are universally quantified, e.g. ∀x: Female(x) ∨ Male(x).
A Horn clause is an expression of the form H ← L1 ∧ L2 ∧ ... ∧ Ln.
First Order Logic: Terminology – constants – e.g. bob, 23, a – variables – e.g. X,Y,Z – predicate symbols – e.g. female, father – predicates take on the values True or False only – function symbols – e.g. Age – functions can take on any constant as a value – connectives – e.g. ∧, ∨, ¬, → (or ←) – quantifiers – e.g. ∀, ∃
A term is:
– any constant – e.g. bob
– any variable – e.g. X
– any function applied to any term – e.g. age(bob)
A literal is any predicate or negated predicate applied to any terms – e.g. female(sue), ¬father(X,Y)
– A ground literal is a literal that contains no variables – e.g. female(sue)
– A positive literal is a literal that does not contain a negated predicate – e.g. female(sue)
– A negative literal is a literal that contains a negated predicate – e.g. ¬father(X,Y)
Learning First-Order Horn Clauses
Say we are trying to learn the concept Daughter(x, y) from examples. Each person is described by the attributes Name, Mother, Father, Male, Female. Each example is a pair of instances, say a and b:
Name(a) = Sharon, Mother(a) = Louise, Father(a) = Bob, Male(a) = False, Female(a) = True
Name(b) = Bob, Mother(b) = Nora, Father(b) = Victor, Male(b) = True, Female(b) = False, Daughter(a, b) = True
If we give a set of these examples to a propositional learner such as CN2 or C4.5, it will output rules like:
IF Father(a) = Bob ∧ Name(b) = Bob ∧ Female(a) THEN Daughter(a, b)
A first-order learner would output more general rules like:
IF Father(x) = y ∧ Female(x) THEN Daughter(x, y)
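To make the contrast concrete, here is a small illustrative sketch showing that the first-order rule applies to any pair of individuals, while the propositional rule is tied to the constant Bob. The dictionary-of-facts representation and the function name are assumptions for illustration, not part of CN2, C4.5, or any first-order learner.

    # Facts for the two instances from the slide, stored as simple dicts.
    people = {
        "Sharon": {"Mother": "Louise", "Father": "Bob",    "Female": True},
        "Bob":    {"Mother": "Nora",   "Father": "Victor", "Female": False},
    }

    def daughter(x, y, facts=people):
        """First-order rule: IF Father(x) = y AND Female(x) THEN Daughter(x, y).
        Works for any pair of names, not just the constant Bob."""
        return facts[x]["Father"] == y and facts[x]["Female"]

    print(daughter("Sharon", "Bob"))   # True
    print(daughter("Bob", "Victor"))   # False (Bob is not female)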
FOIL ALGORITHM
FOIL extends the SEQUENTIAL-COVERING and LEARN-ONE-RULE algorithms for propositional rule learning to first-order rule learning. FOIL learns in two phases:
• an outer loop which acquires a disjunction of Horn clause-like rules which together cover the positive examples
• an inner loop which constructs individual rules by progressive specialisation of a rule, adding new literals until no negative examples are covered.
FOIL(Target_predicate, Predicates, Examples)
Pos ← positive Examples
Neg ← negative Examples
while Pos is not empty, do        // learn a NewRule
    NewRule ← most general rule possible
    NewRuleNeg ← Neg
    while NewRuleNeg is not empty, do        // add a new literal to specialize NewRule
        1. Candidate_literals ← generate candidates
        2. Best_literal ← argmax over L in Candidate_literals of Foil_Gain(L, NewRule)
        3. Add Best_literal to NewRule preconditions
        4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
    Learned_rules ← Learned_rules + NewRule
    Pos ← Pos - {members of Pos covered by NewRule}
Return Learned_rules
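A hedged sketch of the Foil_Gain measure used in step 2, following the standard definition over rule bindings (p0, n0 are the positive/negative bindings covered before adding the literal, p1, n1 after, and t is the number of positive bindings of the old rule still covered); the function and argument names are illustrative.

    import math

    def foil_gain(p0, n0, p1, n1, t):
        """Foil_Gain(L, R) = t * ( log2(p1/(p1+n1)) - log2(p0/(p0+n0)) )."""
        if p0 == 0 or p1 == 0:
            return 0.0          # no positive coverage: adding L gains nothing
        return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

    # Example taken from the FOCL search slide further below: specializing the
    # clause covering [4+,2-] with the literal ¬HandleOnTop gives [4+,0-].
    print(foil_gain(p0=4, n0=2, p1=4, n1=0, t=4))   # ≈ 2.34 bits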
Inductive and Analytical Learning
Inductive logic programming is particularly useful in bioinformatics and natural language processing.

INDUCTIVE LEARNING                    ANALYTICAL LEARNING
Hypothesis fits data                  Hypothesis fits domain theory
Statistical inference                 Deductive inference
Requires little prior knowledge       Learns from scarce data
Syntactic inductive bias              Bias is the domain theory
Plentiful data, no prior knowledge    Scarce data, perfect prior knowledge
KBANN (Knowledge-Based Artificial Neural Networks)
KBANN(data D, domain theory B):
1. Create a feedforward network h equivalent to B
2. Use BACKPROPAGATION to tune h to fit D
Neural Net Equivalent to Domain Theory
[Figure: network with inputs Expensive, BottomIsFlat, MadeOfCeramic, MadeOfStyrofoam, MadeOfPaper, HasHandle, HandleOnTop, HandleOnSide, Light, HasConcavity, ConcavityPointsUp, Fragile; hidden units Stable, Liftable, OpenVessel, Graspable; output Cup. Legend: large positive weight, large negative weight, negligible weight.]
Creating a Network Equivalent to the Domain Theory
Create one unit per Horn clause rule (an AND unit).
Connect the unit inputs to the corresponding clause antecedents.
For each non-negated antecedent, set the corresponding input weight w ← W, where W is some constant.
For each negated antecedent, set the weight w ← -W.
Set the threshold weight w0 ← -(n - 0.5)W, where n is the number of non-negated antecedents.
Finally, add additional connections with near-zero weights.
Example: Liftable ← Graspable ∧ ¬Heavy
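A minimal sketch of this weight-setting scheme for a single AND unit, applied to the clause Liftable ← Graspable ∧ ¬Heavy; the dict-based representation and the particular value chosen for W are illustrative assumptions.

    import math

    W = 4.0   # constant weight assigned to clause antecedents (illustrative choice)

    def unit_from_clause(positive_antecedents, negated_antecedents):
        """Weights and threshold for one sigmoid AND unit built from a Horn clause."""
        weights = {a: W for a in positive_antecedents}        # non-negated: w = W
        weights.update({a: -W for a in negated_antecedents})  # negated:     w = -W
        n = len(positive_antecedents)                         # non-negated count
        threshold = -(n - 0.5) * W                            # bias weight w0
        return weights, threshold

    def unit_output(inputs, weights, threshold):
        """Sigmoid unit: output is close to 1 only when the clause body holds."""
        net = threshold + sum(w * inputs.get(a, 0) for a, w in weights.items())
        return 1 / (1 + math.exp(-net))

    # Liftable <- Graspable AND NOT Heavy
    weights, w0 = unit_from_clause(["Graspable"], ["Heavy"])
    print(unit_output({"Graspable": 1, "Heavy": 0}, weights, w0))   # ~0.88, fires
    print(unit_output({"Graspable": 1, "Heavy": 1}, weights, w0))   # ~0.12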
Result of Refining the Network
[Figure: the same Cup network after refinement, with inputs Expensive, BottomIsFlat, MadeOfCeramic, MadeOfStyrofoam, MadeOfPaper, HasHandle, HandleOnTop, HandleOnSide, Light, HasConcavity, ConcavityPointsUp, Fragile; hidden units Stable, Liftable, OpenVessel, Graspable; output Cup. Legend: large positive weight, large negative weight, negligible weight.]
Hypothesis Space Search in KBANN
[Figure: hypothesis space, highlighting the set of hypotheses that fit the training data equally well, the initial hypothesis for KBANN, and the initial hypothesis for Backpropagation.]
EBNN (Explanation-Based Neural Network)
Key idea:
- Previously learned approximate domain theory
- Domain theory represented by a collection of neural networks
- Learn the target function as another neural network
Explanation in Terms of Domain Theory
Prior learned networks for useful concepts (Graspable, Stable, OpenVessel, Liftable) are combined into a single target network for Cup.
[Figure: inputs Expensive, BottomIsFlat, MadeOfCeramic, MadeOfStyrofoam, MadeOfPaper, HasHandle, HandleOnTop, HandleOnSide, Light, HasConcavity, ConcavityPointsUp, Fragile feeding the combined network.]
Hypothesis Space Search in TangentProp
[Figure: hypothesis space, contrasting the Backpropagation search toward hypotheses that maximize fit to the data with the TangentProp search toward hypotheses that maximize fit to the data and to prior knowledge.]
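A hedged sketch of a TangentProp-style objective that fits both the data and prior knowledge: an extra term penalizes disagreement between the network's derivative along a known transformation and the derivative asserted by that prior knowledge (often zero, for invariances). The finite-difference approximation and all names are illustrative assumptions, not the original formulation.

    def tangentprop_error(net, examples, transforms, mu=0.1, eps=1e-3):
        """Data-fit error plus a penalty on derivatives along known transforms.

        net(x)             -> predicted value for input x
        examples           -> list of (x, target, target_derivs) triples, where
                              target_derivs[j] is the derivative of the true
                              function along transform j
        transforms[j](x,a) -> input x transformed by amount a (a = 0 is identity)
        """
        error = 0.0
        for x, y, target_derivs in examples:
            error += (y - net(x)) ** 2                      # ordinary fit to data
            for j, s in enumerate(transforms):
                # finite-difference estimate of d/da net(s(x, a)) at a = 0
                net_deriv = (net(s(x, eps)) - net(s(x, -eps))) / (2 * eps)
                error += mu * (target_derivs[j] - net_deriv) ** 2
        return error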
FOCL Algorithm (First Order Combined Learner)
FOCL is an adaptation of FOIL that uses a domain theory. When adding a literal to a rule, it considers not only single literals but also literals derived from the domain theory. Most importantly, a potentially incorrect hypothesis is allowed as an initial approximation to the predicate to be learned. The main goal of FOCL is to incorporate the methods of explanation-based learning (EBL) into inductive rule learning.
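A hedged sketch of how the specialization step differs from FOIL: alongside FOIL's single candidate literals, FOCL also evaluates whole operationalized domain-theory clause bodies and lets the caller keep whichever candidate scores best. The rule object, its extended_with method, and the helper names are hypothetical.

    def focl_candidates(rule, foil_literals, domain_theory_clauses, operationalize):
        """Candidate specializations considered at each step of FOCL (sketch).

        foil_literals:          single literals FOIL would normally try
        domain_theory_clauses:  clauses whose head matches the target predicate
        operationalize(clause): expands the clause body into observable literals
        """
        candidates = [rule.extended_with([lit]) for lit in foil_literals]
        for clause in domain_theory_clauses:
            body = operationalize(clause)      # e.g. BottomIsFlat, Light, ...
            candidates.append(rule.extended_with(body))
        return candidates   # the caller picks the candidate with the best gain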
Search in FOCL
Cup ← HasHandle [2+,3-]
Cup ← ¬HasHandle [2+,3-]
Cup ← Fragile [2+,4-]
...
Cup ← BottomIsFlat, Light, HasConcavity, ConcavityPointsUp [4+,2-]
Cup ← BottomIsFlat, Light, HasConcavity, ConcavityPointsUp, HandleOnTop [0+,2-]
Cup ← BottomIsFlat, Light, HasConcavity, ConcavityPointsUp, ¬HandleOnTop [4+,0-]
Cup ← BottomIsFlat, Light, HasConcavity, ConcavityPointsUp, HandleOnSide [2+,0-]
...
FOCL Results
Recognizing legal chess endgame positions: 30 positive, 30 negative examples
- FOIL: 86%
- FOCL: 94% (using a domain theory with 76% accuracy)
NYNEX telephone network diagnosis: 500 training examples
- FOIL: 90%
- FOCL: 98% (using a domain theory with 95% accuracy)
VIDEO LINK https://www.youtube.com/watch?v=rVlN0ZCYmtI https://www.youtube.com/watch?v=SSrD02pdE78
RECAP Sequential Covering Algorithm Learn One Rule The FOIL Algorithm FOCL Algorithm