Learning sets of rules, Sequential Learning Algorithm, FOIL
About This Presentation
Presentation covers learning sets of rules, general-to-specific search, beam search, the sequential covering algorithm, and FOIL.
Slide Content
Learning Sets of Rules
SEQUENTIAL COVERING ALGORITHMS
LEARNING RULE SETS
LEARNING FIRST ORDER RULES
LEARNING SETS OF FIRST ORDER RULES
Learning Rules
One of the most expressive and human-readable representations for learned hypotheses (target functions) is a set of production rules (if-then rules).
Rules can be derived from other representations:
Decision trees
Genetic algorithms
or they can be learned directly (direct methods).
An important aspect of direct rule-learning algorithms is that they can learn sets of first-order rules, which have much more representational power than the propositional rules that can be derived from decision trees.
Sequential Covering Algorithms:
Sequential covering algorithms (SCA) are a family of algorithms that learn rule sets using the strategy of learning one rule, removing the data it covers, and iterating the process.
Example (explains why it is called sequential covering):
Assume a subroutine Learn-one-rule
Input: a set of positive and negative examples
Expected output: a rule that covers many positive and few negative examples
Requirement: the output rule should have high accuracy; its coverage need not be high (it may cover only some of the positive examples)
Approach:
Learn a rule and remove the positive examples it covers
Invoke the subroutine again to learn a second rule for the remaining positive examples
Iterate as many times as required; the learned rules together form a disjunctive set of rules
Learning Propositional Rules:
Sequential Covering Algorithms
The algorithm is called a sequential covering algorithm because it
sequentially learns a set of rules that together cover the whole set of
positive examples.
It has the advantage of reducing the problem of learning a
disjunctive set of rules to a sequence of simpler problems, each
requiring that a single conjunctive rule be learned.
The final set of rules is sorted so that the most accurate rules are
considered first at classification time.
However, because it does not backtrack, this algorithm is not guaranteed to find the smallest or best set of rules → Learn-one-rule must be very effective!
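As a rough illustration (not part of the original slides), the covering loop can be sketched in Python. Here learn_one_rule, rule.performance, and rule.covers are hypothetical helpers standing in for the Learn-one-rule subroutine and a learned rule object; learn_one_rule is assumed to return None when no rule can be formed.

def sequential_covering(target_attribute, attributes, examples,
                        learn_one_rule, threshold=0.8):
    # Sketch of the sequential covering loop: learn one rule at a time,
    # remove the positive examples it covers, and iterate.
    learned_rules = []
    remaining = list(examples)
    rule = learn_one_rule(target_attribute, attributes, remaining)
    while rule is not None and rule.performance(remaining) > threshold:
        learned_rules.append(rule)
        # Remove the examples covered by the new rule, then learn the next one.
        remaining = [ex for ex in remaining if not rule.covers(ex)]
        rule = learn_one_rule(target_attribute, attributes, remaining)
    # Sort so the most accurate rules are considered first at classification time.
    learned_rules.sort(key=lambda r: r.performance(examples), reverse=True)
    return learned_rules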
Learning Propositional Rules: Learn-one-rule
General-to-Specific Search:
Consider the most general rule (hypothesis), which matches every instance in the training set.
Repeat
Add the attribute constraint that most improves rule performance, measured over the training set.
Until the hypothesis reaches an acceptable level of performance.
The difference between this algorithm and ID3 is that it follows only a single descendant at each step.
It does no backtracking, and hence there is a chance of making a suboptimal choice.
The solution is to use a beam search rather than a single best candidate.
General-to-Specific Beam Search (CN2):
Rather than considering a single candidate at each search step, keep track of the k best candidates.
It keeps track of the most promising alternatives to the current top-rated hypothesis.
Learn-one-rule (Target_attribute, Attributes, Examples, k)
It returns a single rule that covers some of the examples.
It conducts a general-to-specific greedy beam search for the best rule, guided by the performance metric.
Implementation of Learn-one-rule using general-to-specific beam search
Learn-one-rule (Target_attribute, Attributes, Examples, k)
1. Best-Hypothesis ← the most general hypothesis ∅
2. Candidate-Hypotheses ← {Best-Hypothesis}
3. While Candidate-Hypotheses is not empty, Do
a. Generate the next more specific candidate hypotheses
All-Constraints ← the set of all constraints of the form (a = v), where a is an attribute and v is a value of a occurring in Examples
New-Candidate-Hypotheses ← for each h in Candidate-Hypotheses and each c in All-Constraints, specialize h by adding the constraint c
Remove from New-Candidate-Hypotheses any hypotheses that are duplicates, inconsistent, or not maximally specific
b. Update Best-Hypothesis
For all h in New-Candidate-Hypotheses do
If (Performance(h, Examples, Target_attribute) > Performance(Best-Hypothesis, Examples, Target_attribute))
Then Best-Hypothesis ← h
c. Update Candidate-Hypotheses
Candidate-Hypotheses ← the k best members of New-Candidate-Hypotheses, according to the Performance measure
4. Return a rule of the form
"IF Best-Hypothesis THEN prediction"
where prediction is the most frequent value of Target_attribute among those Examples that match Best-Hypothesis.
Performance (h, Examples, Target_attribute)
h-examples ← the subset of Examples that match h
return −Entropy(h-examples), where entropy is computed with respect to Target_attribute
Variations:
1. Learn only rules that predict the positive class, treating uncovered instances as negative (negation as failure).
2. Use a single positive example as a seed and consider only the attribute constraints that this example satisfies (example-driven search).
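To make the pseudocode concrete, here is a minimal Python sketch (not part of the original slides) of Learn-one-rule with a general-to-specific beam search. It assumes each example is a dict of attribute values and each hypothesis is a dict of attribute-value constraints; setting k = 1 reduces it to the greedy single-candidate search of the earlier slide.

import math
from collections import Counter

def matches(hypothesis, example):
    # A hypothesis is a dict of (attribute: value) constraints; {} matches everything.
    return all(example.get(a) == v for a, v in hypothesis.items())

def entropy(examples, target_attribute):
    # Entropy of the target attribute's value distribution over the given examples.
    if not examples:
        return 0.0
    counts = Counter(ex[target_attribute] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def performance(h, examples, target_attribute):
    # Performance(h) = -Entropy of the examples matched by h (as on the slide).
    covered = [ex for ex in examples if matches(h, ex)]
    return -entropy(covered, target_attribute) if covered else float("-inf")

def learn_one_rule(target_attribute, attributes, examples, k=5):
    best = {}                                  # most general hypothesis (no constraints)
    candidates = [best]
    while candidates:
        # Step a: generate the next more specific candidate hypotheses.
        new_candidates = []
        for h in candidates:
            for a in attributes:
                if a in h:                     # keep refinements maximally specific
                    continue
                for v in {ex[a] for ex in examples}:
                    refined = {**h, a: v}
                    if refined not in new_candidates:   # drop duplicates
                        new_candidates.append(refined)
        # Step b: update the best hypothesis seen so far.
        for h in new_candidates:
            if (performance(h, examples, target_attribute) >
                    performance(best, examples, target_attribute)):
                best = h
        # Step c: keep only the k best refinements (the beam).
        new_candidates.sort(key=lambda h: performance(h, examples, target_attribute),
                            reverse=True)
        candidates = new_candidates[:k]
    covered = [ex for ex in examples if matches(best, ex)]
    prediction = Counter(ex[target_attribute] for ex in covered).most_common(1)[0][0]
    return best, prediction                    # "IF best THEN prediction"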
Comments and Variations regarding the Basic Rule Learning Algorithms
Sequential versus simultaneous covering: sequential covering algorithms (e.g. CN2) make a larger number of independent choices than simultaneous covering ones (e.g. ID3). Which is preferable depends on how much training data is available.
Direction of the search: CN2 uses a general-to-specific search strategy. Other systems (e.g. GOLEM) use a specific-to-general strategy. General-to-specific search has the advantage of there being a single maximally general hypothesis from which to start.
Generate-then-test versus example-driven: CN2 (Clark and Niblett's Learn-one-rule) is a generate-then-test method, which minimizes the impact of noise. Other methods (Find-S, Candidate-Elimination) are example-driven. Generate-then-test systems are more robust to noise, since example-driven methods can easily be misled by a single noisy training example.
Learning Sets of First-Order Rules: FOIL (Quinlan, 1990)
FOIL is similar to the propositional rule-learning approach except for the following:
FOIL accommodates first-order rules and thus needs to accommodate variables in the rule preconditions.
FOIL uses a special performance measure (Foil_Gain) which takes the different variable bindings into account.
FOIL seeks only rules that predict when the target literal is True (instead of predicting when it is True or when it is False).
FOIL performs a simple hill-climbing search rather than a beam search.
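The slides reference the Foil_Gain performance measure without defining it. For reference, the standard definition (from Quinlan's FOIL, as presented by Mitchell) is
Foil_Gain(L, R) = t * ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )
where p0 and n0 are the numbers of positive and negative bindings of rule R, p1 and n1 are the corresponding counts for the rule obtained by adding literal L to R, and t is the number of positive bindings of R that are still covered after adding L.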
Induction as Inverted Deduction
Let D be a set of training examples, each of the form ⟨xi, f(xi)⟩. Then learning is the problem of discovering a hypothesis h such that the classification f(xi) of each training instance xi follows deductively from the hypothesis h, the description of xi, and any other background knowledge B known to the system.
Example:
xi: Male(Bob), Female(Sharon), Father(Sharon, Bob)
f(xi): Child(Bob, Sharon)
B: Parent(u, v) ← Father(u, v)
We want to find h such that (B ∧ h ∧ xi) ⊢ f(xi) for every training example.
h1: Child(u, v) ← Father(v, u)
h2: Child(u, v) ← Parent(v, u)
Note that h1 entails f(xi) from xi alone, while h2 additionally needs the background knowledge B to derive Parent(Sharon, Bob) from Father(Sharon, Bob).
Basic Definitions from First Order Logic
Constants (capitalized symbols)
Ex: Bob, Sharon
Variables (lower-case symbols)
Ex: x, y, z
Predicate symbols (capitalized symbols)
Ex: Female, Male, Married
Function symbols (lower-case symbols)
Ex: age
Term: any constant, any variable, or any function applied to a term
Example: Mary, x, age(Mary), age(x)
Literal: any predicate, or its negation, applied to terms
Example: Married(Bob) is a positive literal; ¬Greater_Than(age(Sharon), 20) is a negative literal
Ground literal: a literal that does not contain any variables
Clause: a disjunction of literals
Basic Definitions from First Order Logic
Horn Clause: a clause containing at most one positive literal
H ∨ ¬L1 ∨ ¬L2 ∨ ... ∨ ¬Ln
which is equivalent to H ← (L1 ∧ L2 ∧ ... ∧ Ln), i.e. IF (L1 ∧ L2 ∧ ... ∧ Ln) THEN H
In a Horn clause:
H is the head or consequent of the Horn clause.
(L1 ∧ L2 ∧ ... ∧ Ln) is the body or antecedents of the Horn clause.
A well-formed expression is composed of:
1. Constants  2. Variables  3. Functions  4. Predicates
Example: Female(Joe)
Substitution: a function that replaces variables by terms
Example: {x/3, y/z} replaces the variable x by the term 3 and the variable y by the term z
In general, if θ is a substitution and L is a literal, Lθ denotes the result of applying the substitution θ to L
A unifying substitution: for any two literals L1 and L2, a substitution θ such that L1θ = L2θ
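A minimal Python sketch (not part of the original slides) of applying a substitution to a literal. It assumes literals and function terms are represented as nested tuples whose first element is the predicate or function symbol, with lower-case strings as variables and capitalized strings as constants, matching the conventions on these slides.

def is_variable(term):
    # Variables are lower-case symbols; constants are capitalized (slide convention).
    return isinstance(term, str) and term[0].islower()

def apply_substitution(literal, theta):
    # Return L-theta: the literal with every variable replaced as specified by theta.
    if isinstance(literal, tuple):
        # Keep the predicate/function symbol, substitute into the arguments.
        return (literal[0],) + tuple(apply_substitution(arg, theta) for arg in literal[1:])
    if is_variable(literal):
        return theta.get(literal, literal)     # unbound variables stay unchanged
    return literal                             # constants are left as they are

# Example: applying {x/Sharon} to Greater_Than(age(x), 20)
theta = {"x": "Sharon"}
print(apply_substitution(("Greater_Than", ("age", "x"), "20"), theta))
# prints: ('Greater_Than', ('age', 'Sharon'), '20')

A unifying substitution for L1 and L2 is then simply a theta for which apply_substitution(L1, theta) == apply_substitution(L2, theta).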
Learning Sets of First Order Rules (FOIL)
FOIL (Target_predicate, Predicates, Examples)
Pos ← those Examples for which Target_predicate is True
Neg ← those Examples for which Target_predicate is False
Learned_rules ← { }
While Pos is not empty, do                ; outer loop
Learn a new rule:
NewRule ← the rule that predicts Target_predicate with no preconditions
NewRuleNeg ← Neg
While NewRuleNeg is not empty, do          ; inner loop
Add a new literal to specialize NewRule:
Candidate_literals ← generate candidate new literals for NewRule, based on Predicates
Best_literal ← argmax over L ∈ Candidate_literals of Foil_Gain(L, NewRule)
Add Best_literal to the preconditions of NewRule
NewRuleNeg ← the subset of NewRuleNeg that satisfies the preconditions of NewRule
Learned_rules ← Learned_rules + NewRule
Pos ← Pos − {members of Pos covered by NewRule}
Return Learned_rules
Comments:
Each iteration of the outer loop adds a new rule to Learned_rules; over the whole rule set this is a specific-to-general search, since each added rule generalizes the overall disjunction.
The inner loop performs a finer-grained, general-to-specific hill-climbing search to determine the exact definition of each rule, since each added literal specializes that rule.
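As a rough illustration (not part of the original slides), the two nested FOIL searches can be sketched in Python. The sketch is propositional in spirit: examples (dicts with a boolean entry for the target predicate) stand in for FOIL's variable bindings, and candidate_literals_for and covers are hypothetical helpers that would enumerate candidate literals for a rule and test whether an example satisfies a rule's preconditions.

from math import log2

def foil_gain(literal, rule, pos, neg, covers):
    # Foil_Gain(L, R) = t * ( log2(p1/(p1+n1)) - log2(p0/(p0+n0)) )
    # p0, n0: positive/negative examples covered by R; p1, n1: by R with L added.
    # Here t is approximated by p1 (one binding per example, a simplification).
    p0, n0 = len(pos), len(neg)
    p1 = sum(1 for ex in pos if covers(rule + [literal], ex))
    n1 = sum(1 for ex in neg if covers(rule + [literal], ex))
    if p0 == 0 or p1 == 0:
        return float("-inf")
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

def foil(target_predicate, examples, candidate_literals_for, covers):
    pos = [ex for ex in examples if ex[target_predicate]]
    neg = [ex for ex in examples if not ex[target_predicate]]
    learned_rules = []
    while pos:                                  # outer loop: add rules to the set
        new_rule = []                           # rule preconditions, initially empty
        new_rule_neg = list(neg)
        while new_rule_neg:                     # inner loop: add literals to the rule
            rule_pos = [ex for ex in pos if covers(new_rule, ex)]
            candidates = candidate_literals_for(new_rule, target_predicate)
            best = max(candidates,
                       key=lambda L: foil_gain(L, new_rule, rule_pos, new_rule_neg, covers))
            new_rule.append(best)               # specialize the rule (general-to-specific)
            new_rule_neg = [ex for ex in new_rule_neg if covers(new_rule, ex)]
        learned_rules.append(new_rule)          # grow the rule set (specific-to-general)
        pos = [ex for ex in pos if not covers(new_rule, ex)]
    return learned_rules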