Machine Learning

Uploaded by nikola_tesla1, 126 slides, Aug 23, 2022

Slide Content

Chittaranjan Hota
Professor, Computer Science & Information Systems Department
BITS Pilani Hyderabad Campus, Hyderabad
[email protected]
Artificial Intelligence
(CS F407)
Machine Learning: Introduction

Some tasks…
‘cocktail party problem’
Content Promotion, Content Price Modeling, Adaptive Streaming QoE…

What is Learning?
• "Learning denotes changes in a system that enables a system to do the same task more efficiently the next time." – Herbert Simon
  (Writing your AI Midsem and then Compre…)
• Learning is the process of acquiring new or modifying existing knowledge, behaviors, skills, values, or preferences. – Wiki
• Humans learn from experiences…

What is Machine Learning?
[Figure: a system mapping input variables x_1 … x_N to output variables y_1 … y_M via hidden variables h_1, h_2, …, h_K]
Traditional programming: data + program → output
Machine learning: data + output → program

Defining the Learning Task
Improve on task T, with respect to performance metric P, based on experience E (Tom Mitchell, 1997).
T: Playing chess
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against whom?
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words
T: Categorize email messages as spam or legitimate.
P: Percentage of email messages correctly classified.
E: Database of emails, some with human-given labels

Why is Machine Learning important?
• Machine learning is programming computers to optimize a performance criterion using example data.
• There is no need to "learn" to calculate payroll.
• Learning is used when:
  – Human expertise does not exist
    • x: Bond graph for a new molecule, f(x): Predicted binding strength to AIDS protease molecule.
    • Navigating on Mars
  – Humans are unable to explain their expertise
    • x: Bitmap picture of hand-written character, f(x): ASCII code of the character
    • Speech recognition

Continued…
– Solution changes in time
  • x: Description of stock prices and trades for last 20 days, f(x): Recommended stock transactions
  • Routing on a computer network
– Solution needs to be adapted to particular cases
  • x: Incoming email message, f(x): Importance score for presenting to user (or deleting without presenting)
  • User biometrics
– Relationships and correlations can be hidden within large amounts of data
– New knowledge is being constantly discovered by humans

Applications of Machine Learning
• natural language processing
• search engines
• medical diagnosis
• detecting credit card fraud
• stock market analysis
• classifying DNA sequences
• speech and handwriting recognition
• spam filtering
• financial investments (real estate or postal deposits)
• game playing
• adaptive websites
• robot locomotion
• structural health monitoring
• sentiment analysis
Apache Singa (Frameworks/Tools)

Few more examples and algos…
[Figure: house prices, Rs. 5 lakhs, Rs. 80 lakhs, ? lakhs]
Example of Linear Regression… Gradient descent
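The regression-with-gradient-descent idea above can be sketched in a few lines of Python. The toy size/price numbers and the learning rate here are illustrative assumptions, not data from the slides.

```python
# Fit price = w*size + b by batch gradient descent on mean squared error.
# Toy data (sizes in hundreds of sq. m, prices in lakhs), chosen so the
# exact answer is w = 10, b = 0.
xs = [0.5, 0.8, 1.1, 1.4]
ys = [5.0, 8.0, 11.0, 14.0]

w, b, lr, n = 0.0, 0.0, 0.1, len(xs)
for _ in range(20000):
    # gradients of (1/n) * sum((w*x + b - y)^2) w.r.t. w and b
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w, b = w - lr * dw, b - lr * db

print(round(w, 2), round(b, 2))  # converges close to 10.0 and 0.0
```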

Few more examples and algos…
Spam indicators: "Boss", "Cheap", "Bulk", "Dollar", missing title, etc…
Naïve Bayes: identifying spam mails

Few more examples and algos…
Gender Age App
F 17 Instagram
F 28 WhatsApp
M 35 FB messenger
F 44 WhatsApp
M 13 Instagram
M 16 Instagram
Recommending apps to different users… Decision Tree.

Few more examples and algos…
Newspaper distributor: K-means clustering

University admissions
Logistic regression, SVM, ANNs etc…

Issues in Machine Learning
• What algorithms are available for learning a concept? How well do they perform?
• How much training data is sufficient to learn a concept with high confidence?
• Are some training examples more useful than others?
• What are the best tasks for a system to learn?
• What is the best way for a system to represent its knowledge?

ANNs: Nerve cell or neuron
• A healthy human brain has around 100 billion neurons
• A neuron may connect to as many as 100,000 other neurons

Nervous System → Computational Abstraction
Neuron → Node
Dendrites → Input link and propagation
Cell Body → Combination function, threshold, activation function
Axon → Output link
Synaptic strength → Connection strength/weight

Local, parallel computation
Acknowledgement: Few slides are adapted from R J Mooney's class slides

Simple Threshold Linear Unit

Simple Neuron Model
[Figure: a sequence of animation frames stepping a simple neuron model through binary (0/1) inputs and the resulting binary outputs]

Activation Functions
Step: t(x) = 1 if x ≥ t, else 0 (threshold = t)
Sign(x) = +1 if x ≥ 0, else −1
Sigmoid(x) = 1/(1 + e^(−x))
f(x) = 1/(1 + e^(−2ax)) (a general sigmoid with slope parameter a)
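The three named activation functions above translate directly into Python (a small sketch; the threshold t defaults to 0 here):

```python
import math

def step(x, t=0.0):
    # 1 if x >= t, else 0
    return 1 if x >= t else 0

def sign(x):
    # +1 if x >= 0, else -1
    return 1 if x >= 0 else -1

def sigmoid(x):
    # smooth squashing of any real x into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(step(0.3, t=0.5), sign(-2), sigmoid(0))  # 0 -1 0.5
```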

Artificial neurons
McCulloch & Pitts (1943) described an artificial neuron:
- inputs are either an electrical impulse (1) or not (0)
  (note: original version used +1 for excitatory and −1 for inhibitory signals)
- each input has a weight associated with it
- the activation function multiplies each input value by its weight
- if the sum of the weighted inputs ≥ θ, then the neuron fires (returns 1), else it doesn't fire (returns 0)
if Σ w_i x_i ≥ θ, output = 1
if Σ w_i x_i < θ, output = 0
[Figure: inputs x_1, x_2, …, x_n with weights w_1, w_2, …, w_n feeding one unit]

Computation via activation function
[Figure: two-input unit with weights .75 and .75, threshold 1]
INPUT: x1 = 1, x2 = 1: .75*1 + .75*1 = 1.5 ≥ 1 → OUTPUT: 1
INPUT: x1 = 1, x2 = 0: .75*1 + .75*0 = .75 < 1 → OUTPUT: 0
INPUT: x1 = 0, x2 = 1: .75*0 + .75*1 = .75 < 1 → OUTPUT: 0
INPUT: x1 = 0, x2 = 0: .75*0 + .75*0 = 0 < 1 → OUTPUT: 0
this neuron computes the AND function
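The truth table above can be checked mechanically (a sketch of the same unit):

```python
# The slide's unit: weights .75 and .75, threshold 1.
def fires(x1, x2):
    return 1 if 0.75 * x1 + 0.75 * x2 >= 1.0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", fires(x1, x2))  # only (1, 1) fires: AND
```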

Normalizing thresholds
to make life more uniform, we can normalize the threshold to 0:
simply add an additional input x_0 = 1 with weight w_0 = −θ
advantage: threshold = 0 for all neurons
Σ w_i x_i ≥ θ  ⇔  (−θ)*1 + Σ w_i x_i ≥ 0
[Figure: the same neuron with inputs x_1 … x_n and weights w_1 … w_n, plus an extra input fixed at 1 with weight −θ]

Normalized examples
AND (w0 = −1, w1 = .75, w2 = .75):
INPUT: x1 = 1, x2 = 1: −1*1 + .75*1 + .75*1 = .5 ≥ 0 → OUTPUT: 1
INPUT: x1 = 1, x2 = 0: −1*1 + .75*1 + .75*0 = −.25 < 0 → OUTPUT: 0
INPUT: x1 = 0, x2 = 1: −1*1 + .75*0 + .75*1 = −.25 < 0 → OUTPUT: 0
INPUT: x1 = 0, x2 = 0: −1*1 + .75*0 + .75*0 = −1 < 0 → OUTPUT: 0

OR (w0 = −.5, w1 = .75, w2 = .75):
INPUT: x1 = 1, x2 = 1: −.5*1 + .75*1 + .75*1 = 1 ≥ 0 → OUTPUT: 1
INPUT: x1 = 1, x2 = 0: −.5*1 + .75*1 + .75*0 = .25 ≥ 0 → OUTPUT: 1
INPUT: x1 = 0, x2 = 1: −.5*1 + .75*0 + .75*1 = .25 ≥ 0 → OUTPUT: 1
INPUT: x1 = 0, x2 = 0: −.5*1 + .75*0 + .75*0 = −.5 < 0 → OUTPUT: 0

Perceptron
• Rosenblatt, 1962
• 2-layer network.
• Threshold activation function at output:
  – +1 if weighted input is above threshold.
  – −1 if below threshold.
Perceptron learning algorithm:
1. Set the weights on the connections with random values.
2. Iterate through the training set, comparing the output of the network with the desired output for each example.
3. If all the examples were handled correctly, then DONE.
4. Otherwise, update the weights for each incorrect example:
  • if it should have fired on x_1, …, x_n but didn't: w_i += x_i (0 ≤ i ≤ n)
  • if it shouldn't have fired on x_1, …, x_n but did: w_i −= x_i (0 ≤ i ≤ n)
5. GO TO 2.
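The five steps above transcribe directly into Python. This sketch uses the normalized form (bias input x0 = 1) with a 0/1 threshold at 0; the AND training set and the starting weights are illustrative.

```python
# Perceptron learning rule: w_i += x_i if it should have fired but didn't,
# w_i -= x_i if it fired but shouldn't have. xs[0] is the bias input x0 = 1.
def train_perceptron(examples, weights, max_epochs=100):
    for _ in range(max_epochs):
        all_correct = True
        for xs, target in examples:
            fired = 1 if sum(w * x for w, x in zip(weights, xs)) >= 0 else 0
            if fired != target:
                all_correct = False
                direction = 1 if target == 1 else -1
                weights = [w + direction * x for w, x in zip(weights, xs)]
        if all_correct:
            return weights
    return weights

AND = [((1, 1, 1), 1), ((1, 1, 0), 0), ((1, 0, 1), 0), ((1, 0, 0), 0)]
w = train_perceptron(AND, [-0.9, 0.6, 0.2])   # arbitrary starting weights
print(w)
```

Since AND is linearly separable, the convergence theorem (below in these slides) guarantees this loop terminates with weights that classify all four cases.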

Learning Process for Perceptron
● Initially assign random weights to inputs between −0.5 and +0.5
● Training data is presented to the perceptron and its output is observed.
● If the output is incorrect, the weights are adjusted accordingly using the following formula:
  w_i ← w_i + (a · x_i · e), where 'e' is the error produced and 'a' (−1 ≤ a ≤ 1) is the learning rate
  – 'e' is defined as 0 if the output is correct; it is +ve if the output is too low and −ve if the output is too high.

Example: Perceptron to learn OR function
● Initially consider w1 = −0.2 and w2 = 0.4
● For training data x1 = 0 and x2 = 0, the output is 0.
● Compute y = Step(w1*x1 + w2*x2) = 0. The output is correct, so the weights are not changed.
● For training data x1 = 0 and x2 = 1, the output is 1.
● Compute y = Step(w1*x1 + w2*x2) = Step(0.4) = 1. The output is correct, so the weights are not changed.
● Next training data x1 = 1 and x2 = 0, and the output is 1.
● Compute y = Step(w1*x1 + w2*x2) = Step(−0.2) = 0. The output is incorrect, hence the weights are to be changed.
● Assume a = 0.2 and error e = 1:
  w_i = w_i + (a * x_i * e) gives w1 = 0 and w2 = 0.4
● With these weights, test the remaining training data.
● Repeat the process till we get a stable result.
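The same walk-through can be replayed in code. A sketch, assuming the slide's Step returns 1 only for sums strictly greater than 0 (consistent with its Step(0) = 0):

```python
# Replay the slide's OR example: update w_i <- w_i + a*x_i*e,
# with learning rate a = 0.2 and e = target - output.
def step(x):
    return 1 if x > 0 else 0

OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = [-0.2, 0.4]                      # initial weights from the slide
a = 0.2
for _ in range(100):                 # "repeat till stable", bounded here
    changed = False
    for (x1, x2), target in OR:
        y = step(w[0] * x1 + w[1] * x2)
        e = target - y               # 0 if correct, +1 too low, -1 too high
        if e != 0:
            w[0] += a * x1 * e
            w[1] += a * x2 * e
            changed = True
    if not changed:
        break

print([round(v, 1) for v in w])      # [0.2, 0.4]
```

The first weight change reproduces the slide's w1 = 0, w2 = 0.4; continuing until stable ends with w1 = 0.2, w2 = 0.4, which classifies all four OR cases correctly.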

Example: perceptron learning
Suppose we want to train a perceptron to compute AND.
training set: x1 = 1, x2 = 1 → 1
              x1 = 1, x2 = 0 → 0
              x1 = 0, x2 = 1 → 0
              x1 = 0, x2 = 0 → 0
randomly, let: w0 = −0.9, w1 = 0.6, w2 = 0.2
using these weights:
x1 = 1, x2 = 1: −0.9*1 + 0.6*1 + 0.2*1 = −0.1 → 0 WRONG
x1 = 1, x2 = 0: −0.9*1 + 0.6*1 + 0.2*0 = −0.3 → 0 OK
x1 = 0, x2 = 1: −0.9*1 + 0.6*0 + 0.2*1 = −0.7 → 0 OK
x1 = 0, x2 = 0: −0.9*1 + 0.6*0 + 0.2*0 = −0.9 → 0 OK
new weights: w0 = −0.9 + 1 = 0.1
             w1 = 0.6 + 1 = 1.6
             w2 = 0.2 + 1 = 1.2

Example: perceptron learning (cont.)
using these updated weights (w0 = 0.1, w1 = 1.6, w2 = 1.2):
x1 = 1, x2 = 1: 0.1*1 + 1.6*1 + 1.2*1 = 2.9 → 1 OK
x1 = 1, x2 = 0: 0.1*1 + 1.6*1 + 1.2*0 = 1.7 → 1 WRONG
x1 = 0, x2 = 1: 0.1*1 + 1.6*0 + 1.2*1 = 1.3 → 1 WRONG
x1 = 0, x2 = 0: 0.1*1 + 1.6*0 + 1.2*0 = 0.1 → 1 WRONG
new weights: w0 = 0.1 − 1 − 1 − 1 = −2.9
             w1 = 1.6 − 1 − 0 − 0 = 0.6
             w2 = 1.2 − 0 − 1 − 0 = 0.2
using these updated weights (w0 = −2.9, w1 = 0.6, w2 = 0.2):
x1 = 1, x2 = 1: −2.9*1 + 0.6*1 + 0.2*1 = −2.1 → 0 WRONG
x1 = 1, x2 = 0: −2.9*1 + 0.6*1 + 0.2*0 = −2.3 → 0 OK
x1 = 0, x2 = 1: −2.9*1 + 0.6*0 + 0.2*1 = −2.7 → 0 OK
x1 = 0, x2 = 0: −2.9*1 + 0.6*0 + 0.2*0 = −2.9 → 0 OK
new weights: w0 = −2.9 + 1 = −1.9
             w1 = 0.6 + 1 = 1.6
             w2 = 0.2 + 1 = 1.2

Example: perceptron learning (cont.)
using these updated weights (w0 = −1.9, w1 = 1.6, w2 = 1.2):
x1 = 1, x2 = 1: −1.9*1 + 1.6*1 + 1.2*1 = 0.9 → 1 OK
x1 = 1, x2 = 0: −1.9*1 + 1.6*1 + 1.2*0 = −0.3 → 0 OK
x1 = 0, x2 = 1: −1.9*1 + 1.6*0 + 1.2*1 = −0.7 → 0 OK
x1 = 0, x2 = 0: −1.9*1 + 1.6*0 + 1.2*0 = −1.9 → 0 OK
DONE!

Convergence in Perceptrons
key reason for interest in perceptrons:
Perceptron Convergence Theorem: The perceptron learning algorithm will always find weights to classify the inputs if such a set of weights exists.
Minsky & Papert showed such weights exist if and only if the problem is linearly separable.
intuition: consider the case with 2 inputs, x1 and x2
[Figure: accepting and non-accepting examples plotted in the (x1, x2) plane]
if you can draw a line and separate the accepting & non-accepting examples, then the problem is linearly separable

Linearly separable
firing depends on: w0 + w1*x1 + w2*x2 ≥ 0
the border case is when: w0 + w1*x1 + w2*x2 = 0
i.e., x2 = (−w1/w2)*x1 + (−w0/w2), the equation of a line
the training algorithm simply shifts the line around (by changing the weights) until the classes are separated
[Figure: the AND and OR functions plotted on the (x1, x2) plane, each separable by a single line]
why does this make sense?

Perceptron as Hill Climbing
The hypothesis space being searched is a set of weights and a threshold.
The objective is to minimize classification error on the training set.
The perceptron effectively does hill-climbing (gradient descent) in this space, changing the weights a small amount at each point to decrease training set error.
[Figure: training error plotted as a function of the weights]

Inadequacy of perceptron
• the inadequacy of perceptrons is due to the fact that many simple problems are not linearly separable (e.g., XOR)
[Figure: the XOR function plotted on the (x1, x2) plane; no single line separates the classes]
however, XOR can be computed by introducing a new, hidden unit
[Figure: two-input network with one hidden unit; connection weights 1.5, 1.5, −3.5, 1.5, 1.5 and bias inputs]

Recap: Perceptron learning
• Perceptron is an algorithm for learning a binary classifier.
An example: a perceptron updating its linear boundary as more training examples are added. (Source: Wiki)

Recap: Building multi-layer nets using Perceptrons
• smaller example: we can combine perceptrons to perform more complex computations (or classifications)
[Figure: 3-layer neural net with 2 input nodes, 1 hidden node, 2 output nodes; weights 1.5, 1.5, −3.5, 1, 1, .75, .75]
RESULT? HINT: the left output node is AND, the right output node is XOR → a HALF ADDER
The introduction of hidden layer(s) makes it possible for the network to exhibit non-linear behavior.

Hidden units
the addition of hidden units allows the network to develop complex feature detectors (i.e., internal representations)
e.g., Optical Character Recognition (OCR):
  perhaps one hidden unit "looks for" a horizontal bar
  another hidden unit "looks for" a diagonal
  another looks for the vertical base
  the combination of specific hidden units indicates a 7

Hidden units & learning
every classification problem has a perceptron solution if enough hidden layers are used.
i.e., multi-layer networks can compute anything (recall: can simulate AND, OR, NOT gates)
expressiveness is not the problem, learning is!
it is not known how to systematically find solutions; the Perceptron Learning Algorithm can't adjust weights between levels
Minsky & Papert's results about the "inadequacy" of perceptrons pretty much killed neural net research in the 1970's
rebirth in the 1980's due to several developments:
  faster, more parallel computers
  new learning algorithms, e.g., backpropagation
  new architectures, e.g., Hopfield nets

Backpropagation nets
perceptrons utilize a stepwise activation function:
  output = 1 if sum ≥ 0; 0 if sum < 0
backpropagation nets utilize a continuous activation function:
  output = 1/(1 + e^(−sum))
backpropagation nets are multi-layer networks:
  normalize inputs between 0 (inhibit) and 1 (excite)
  utilize a continuous activation function

Multi layer feed-forward NN (FFNN)
● Hidden nodes are used for complex feature detectors.
[Figure: fully connected feed-forward network: input nodes x_i, hidden nodes O_j, output nodes O_k, with weights w_ij (input to hidden) and w_jk (hidden to output); the output layer gives the output class]
The network is fully connected.

What do middle layers hide?
• A hidden layer "hides" its desired output. Neurons in the hidden layer cannot be observed through the input/output behaviour of the network. There is no obvious way to know what the desired output of the hidden layer should be.

Multiple hidden layers & their neurons
[Figure: input signals flow through the input layer, first hidden layer, second hidden layer, and output layer to produce output signals]
• How many hidden units in the layer?
  Too few ==> can't learn
  Too many ==> poor generalization
• How many hidden layers?
  Usually just one (i.e., a 2-layer net)
Underfitting occurs when a model is too simple (informed by too few features or regularized too much), which makes it inflexible in learning from the dataset. Overfitting is unable to generalize.
Solutions: cross-validation, remove features, early stopping, regularization (pruning a DT, dropout on a neural network), ensembling.

BPN Algorithm
• Initialize weights (typically random!)
• Keep doing epochs:
  • For each example 'e' in the training set do:
    • forward pass to compute
      • O = neural-net-output(network, e)
      • miss = (T − O) at each output unit
    • backward pass to calculate deltas to weights
    • update all weights
  • end
• until tuning set error stops improving

Gradient Descent
Try to minimize your position on the "error surface".

Error Backpropagation
First calculate the error of the output units and use this to change the top layer of weights.
Current output: o_j = 0.2
Correct output: t_j = 1.0
Error: δ_j = o_j(1 − o_j)(t_j − o_j) = 0.2(1 − 0.2)(1 − 0.2) = 0.128
Update weights into j: Δw_ij = η δ_j o_i
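The slide's numbers can be verified directly; the learning rate η and input activation o_i below are illustrative values, not from the slide.

```python
# Output-unit error term for a sigmoid unit: delta_j = o_j*(1-o_j)*(t_j-o_j)
o_j, t_j = 0.2, 1.0
delta_j = o_j * (1 - o_j) * (t_j - o_j)
print(round(delta_j, 3))          # 0.128, as on the slide

eta, o_i = 0.5, 0.4               # assumed learning rate and input activation
dw_ij = eta * delta_j * o_i       # weight update into unit j
print(round(dw_ij, 4))            # 0.0256
```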

Error Backpropagation
Next calculate the error for hidden units based on the errors of the output units they feed into:
δ_j = o_j(1 − o_j) Σ_k δ_k w_jk

Error Backpropagation
Finally update the bottom layer of weights based on the errors calculated for the hidden units:
δ_j = o_j(1 − o_j) Σ_k δ_k w_jk
Update weights into j: Δw_ij = η δ_j o_i

Sample Learned XOR Network
[Figure: trained network over inputs X and Y with weights 3.11, 7.38, 6.96, 5.24, 3.6, 3.58, 5.57, 5.74, 2.03]
Hidden Unit A represents: ¬(X ∧ Y)
Hidden Unit B represents: (X ∨ Y)
Output O represents: A ∧ B = ¬(X ∧ Y) ∧ (X ∨ Y) = X ⊕ Y

Backpropagation example (XOR)
[Figure: 2-2-1 sigmoid network; hidden unit H1: bias −2.2, input weights 5.7 and 5.7; hidden unit H2: bias −4.8, input weights 3.2 and 3.2; output unit: bias −2.8, weights 6.4 from H1 and −7 from H2]
x1 = 1, x2 = 1:
  sum(H1) = −2.2 + 5.7 + 5.7 = 9.2, output(H1) = 0.99
  sum(H2) = −4.8 + 3.2 + 3.2 = 1.6, output(H2) = 0.83
  sum = −2.8 + (0.99*6.4) + (0.83*−7) = −2.28, output = 0.09
x1 = 1, x2 = 0:
  sum(H1) = −2.2 + 5.7 + 0 = 3.5, output(H1) = 0.97
  sum(H2) = −4.8 + 3.2 + 0 = −1.6, output(H2) = 0.17
  sum = −2.8 + (0.97*6.4) + (0.17*−7) = 2.22, output = 0.90
x1 = 0, x2 = 1:
  sum(H1) = −2.2 + 0 + 5.7 = 3.5, output(H1) = 0.97
  sum(H2) = −4.8 + 0 + 3.2 = −1.6, output(H2) = 0.17
  sum = −2.8 + (0.97*6.4) + (0.17*−7) = 2.22, output = 0.90
x1 = 0, x2 = 0:
  sum(H1) = −2.2 + 0 + 0 = −2.2, output(H1) = 0.10
  sum(H2) = −4.8 + 0 + 0 = −4.8, output(H2) = 0.01
  sum = −2.8 + (0.10*6.4) + (0.01*−7) = −2.23, output = 0.10
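A forward pass with these weights confirms the table (a sketch; small differences from the slide's 0.09/0.90 come from the slide rounding the hidden activations first):

```python
import math

# Forward pass through the learned XOR net: hidden biases -2.2 and -4.8,
# hidden input weights (5.7, 5.7) and (3.2, 3.2), output bias -2.8,
# output weights 6.4 (from H1) and -7 (from H2); sigmoid units throughout.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x1, x2):
    h1 = sigmoid(-2.2 + 5.7 * x1 + 5.7 * x2)
    h2 = sigmoid(-4.8 + 3.2 * x1 + 3.2 * x2)
    return sigmoid(-2.8 + 6.4 * h1 - 7.0 * h2)

for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(x1, x2, "->", round(forward(x1, x2), 2))
# near 0 for (1,1) and (0,0), near 1 for (1,0) and (0,1): XOR
```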

Backpropagation example
consider the following political poll, taken by six potential voters.
Each ranked various topics as to their importance, on a scale of 0 to 10.
Voters 1-3 identified themselves as Democrats, voters 4-6 as Republicans.

         Economy  Defense  Crime  Environment
voter 1     9        3       4        7
voter 2     7        4       6        7
voter 3     8        5       8        4
voter 4     5        9       8        4
voter 5     6        7       6        2
voter 6     7        8       7        4

Backpropagation example (cont.)
using the neural net, try to classify the following new respondents:

         Economy  Defense  Crime  Environment
voter 7    10       10      10        1
voter 8     5        2       2        7
voter 9     8        3       3        3

Recurrent networks
● An FFNN is acyclic: data passes from input to the output nodes and not vice versa.
  – Once the FFNN is trained, its state is fixed and does not alter as new data is presented to it. It does not have memory.
● Recurrent networks can have at least one connection that goes backward from output to input nodes and can model dynamic systems (to do temporal processing).
  – In this way, a recurrent network's internal state can be altered as sets of input data are presented. It can be said to have memory.
  – It is useful in solving problems where the solution depends not just on the current inputs but on all previous inputs.
● Applications
  – predict stock market price
  – weather forecast

Hopfield Network: Example Recurrent Networks
● A Hopfield network is a kind of recurrent network as output values are fed back to input in an undirected way. It has the following features:
  – Distributed representation.
  – Distributed, asynchronous control.
  – Content addressable memory.
  – Fault tolerance.

Example Hopfield net: Pattern completion and association
• The purpose of a Hopfield net is to store one or more patterns and to recall the full patterns based on partial input.

Activation Algorithm: Parallel relaxation
An active unit is represented by 1 and an inactive one by 0.
● Repeat
  – Choose any unit randomly. The chosen unit may be active or inactive.
  – For the chosen unit, compute the sum of the weights on the connections to the active neighbours only, if any.
    If sum > 0 (threshold is assumed to be 0), then the chosen unit becomes active, otherwise it becomes inactive.
  – If the chosen unit has no active neighbours then ignore it, and its status remains the same.
● Until the network reaches a stable state

Example Hopfield nets
[Figure: a three-node Hopfield network and a seven-node Hopfield network]
In a Hopfield network, all the nodes are inputs to each other, and they're also outputs.
Four stable states of the Hopfield network (given any initial state, the network will settle into one of these four states, hence it stores these patterns).
Information being sent back and forth does not change after a few iterations (stability).

Weight Computation Method
● Weights are determined using training examples:
  W = Σ_i Xi · (Xi)^T − M·I, for 1 ≤ i ≤ M
● Here
  – W is the weight matrix
  – Xi is an input example represented by a vector of N values from the set {−1, 1}.
  – Here, N is the number of units in the network; 1 and −1 represent active and inactive units respectively.
  – (Xi)^T is the transpose of the input Xi,
  – M denotes the number of training input vectors,
  – I is an N×N identity matrix.

Example
● Let us now consider a Hopfield network with four units and three training input vectors that are to be learnt by the network.
● Consider three input examples, namely, X1, X2, and X3, defined as follows:
  X1 = [1, −1, −1, 1]^T, X2 = [1, 1, −1, 1]^T, X3 = [−1, 1, 1, −1]^T
W = X1·(X1)^T + X2·(X2)^T + X3·(X3)^T − 3I

Continued…
[Figure: the four-unit network shown in states X1 = [1 −1 −1 1] and X3 = [−1 1 1 −1], stable positions of the network]

W = |  3 −1 −3  3 |   | 3 0 0 0 |   |  0 −1 −3  3 |
    | −1  3  1 −1 | − | 0 3 0 0 | = | −1  0  1 −1 |
    | −3  1  3 −3 |   | 0 0 3 0 |   | −3  1  0 −3 |
    |  3 −1 −3  3 |   | 0 0 0 3 |   |  3 −1 −3  0 |
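The matrix arithmetic can be re-checked in plain Python:

```python
# W = sum_i Xi*Xi^T - M*I for the three four-unit training vectors.
X = [[1, -1, -1, 1], [1, 1, -1, 1], [-1, 1, 1, -1]]
M, N = len(X), len(X[0])
W = [[sum(x[r] * x[c] for x in X) - (M if r == c else 0)
      for c in range(N)] for r in range(N)]
for row in W:
    print(row)
# [0, -1, -3, 3]
# [-1, 0, 1, -1]
# [-3, 1, 0, -3]
# [3, -1, -3, 0]
```

Note that W is symmetric with a zero diagonal, as a Hopfield weight matrix must be.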

Solving TSP using Hopfield net
Source: The Traveling Salesman Problem: A Neural Network Perspective, Jean-Yves Potvin, Centre de Recherche sur les Transports, Université de Montréal, C.P. 6128, Succ. A, Montréal (Québec) Canada H3C 3J7

Boltzmann Machines
• A Boltzmann Machine is a variant of the Hopfield Network composed of N units with activations {x_i}. The state of each unit i is updated asynchronously according to a stochastic rule, with positive temperature constant T, network weights w_ij and thresholds θ_j.
The fundamental difference between the Boltzmann Machine and a standard Hopfield Network is the stochastic activation of the units. If T is very small, the dynamics of the Boltzmann Machine approximates the dynamics of the discrete Hopfield Network, but when T is large, the network visits the whole state space. Used in Deep learning.
https://en.wikipedia.org

Reinforcement Learning
• A child reading more for the exam to score high
• A robot cleaning the room and recharging its battery
• How to invest in shares
• and so on…
• It allows machines and software agents to automatically determine the ideal behavior within a specific context, in order to maximize performance. Simple reward feedback is required for the agent to learn its behavior; this is known as the reinforcement signal.
reward-motivated behaviour

Passive vs. Active learning
Passive learning
The agent has a fixed policy and tries to learn the utilities
of states by observing the world go by
Active learning
The agent attempts to find an optimal (or at least good)
policy by acting in the world
https://deepmind.com/blog/deep-reinforcement-learning/

Passive RL
Estimate U^π(s): follow the policy π for many epochs, giving training sequences.
Assume that after entering the +1 or −1 state the agent enters a zero-reward terminal state.
(1,1)→(1,2)→(1,3)→(1,2)→(1,3)→(2,3)→(3,3)→(3,4) +1
(1,1)→(1,2)→(1,3)→(2,3)→(3,3)→(3,2)→(3,3)→(3,4) +1
(1,1)→(2,1)→(3,1)→(3,2)→(2,4) −1

Competitive Learning: Unsupervised learning using ANNs
• Three groups: mammals, reptiles and birds.
• With a teacher, we can of course use BPNs.
• How will you do it without a teacher?
1. Present the input vector
2. Calculate the initial activation for each output unit
3. Let the output units fight till one is active
4. Increase the weights on connections between the active output and input units so that next time it will be more active
5. Repeat for many epochs.
Winner-take-all behaviour.

Self Organizing Maps (Kohonen nets) Example
Sum squared error for pattern p over all output layer neurons (deviations of predicted from actual):
  E_p = (1/2) Σ_j (w_ij − x_j^p)²
Any change Δw_ij in the weight is expected to cause a reduction in error E_p (η is the learning rate):
  Δ_p w_ij = −η ∂E_p/∂w_ij
The rate of change of E_p with respect to any one weight value w_ij has to be measured by calculating its partial derivative:
  ∂E_p/∂w_ij = (w_ij − x_j^p)
so the update is
  Δ_p w_ij = −η ∂E_p/∂w_ij = −η (w_ij − x_j^p) = η (x_j^p − w_ij)

TSP using Kohonen Self Organizing Nets
[Figure: "Random ring" and "Point pulled". Img. Source: https://cs.stanford.edu]
Consider the weight vectors as points on a plane. We can join these points together according to the position of their respective perceptron in the ring of the top layer of the network.
Coordinates of a city (x, y) are presented as the input vector of the network; the network will identify the weight vector closest to the city and move it and its neighbors closer to the city.
After many pulls for different cities, the net eventually converges into a ring around the cities. Given the elastic property of the ring, the length of the ring tends to be minimized. (TSP approximation)

Applications of ANNs: OCR
Image source: https://cs.stanford.edu/people/eroberts Anil K Jain, Michigan State Uni.

Learned to Pronounce English text: NETtalk (Sejnowski and Rosenberg, 1987)
Text input → Convert text to NETtalk pattern → NETtalk Network → Convert phoneme to analog signal → Analog signal to speaker → "Hello"

NETtalk continued…
Produces intelligible speech after 10 training epochs.
The network is trained how to pronounce each letter in a word in a sentence, given the three letters before and the three letters after it in a window.
Architecture: 7×29 inputs, 80 hidden units, 26 outputs. Phonemes are encoded using 21 different features and the remaining 5 for stress and syllable boundaries.

Machine learning: Symbolic
symbol-based learning: the primary influence on learning is domain knowledge
version space search, decision trees, explanation-based learning.
Image Source: http://web.media.mit.edu/~minsky/papers/SymbolicVs.Connectionist.html

Learning from Examples: Inductive
Winston's Learning Program
1. Select one known instance of the concept. Call this the concept definition.
2. Examine definitions of other known instances of the concept. Generalize the definition to include them.
3. Examine descriptions of near misses. Restrict the definition to exclude these.
[Figure: House]

An Example: Learning concepts from positive and negative Examples

Generalization of descriptions

Specialization of a description

Issues with Structural concept learning
The teacher must guide the system through carefully chosen sequences of examples.
In Winston's program the order of the process is important, since new links are added as and when new knowledge is gathered.
The concept of version spaces is insensitive to the order of the examples presented.

Example of Version space
Consider the task to obtain a description of the concept: Japanese Economy car.
The attributes under consideration are: Origin, Manufacturer, Color, Decade, Type
training data:
Positive ex: (Japan, Honda, Blue, 1980, Economy)
Positive ex: (Japan, Honda, White, 1980, Economy)
Negative ex: (Japan, Toyota, Green, 1970, Sports)

Continued…
The most general hypothesis that matches the positive data and does not match the negative data is:
(x, Honda, x, x, Economy)
The most specific hypothesis that matches the positive examples is:
(Japan, Honda, x, 1980, Economy)

Converging boundaries
Concept space
Img. Source: Wiki

Candidate Elimination (Mitchell)
Initialize G to contain one element: the null description, i.e., the most general description (all features are variables).
Initialize S to contain one element: the first positive example.
Accept a new training example.
Process Positive Examples:
• Remove from G any descriptions that do not cover the example.
• Generalize S as little as possible so that the new training example is covered.
• Remove from S all elements that cover negative examples.

Algorithm continued…
Process Negative Examples:
• Remove from S any descriptions that cover the negative example.
• Specialize G as little as possible so that the negative example is not covered.
• Remove from G all elements that do not cover the positive examples.
Continue processing new training examples, until one of the following occurs:
• Either S or G becomes empty; there are no consistent hypotheses over the training space. Stop.
• S and G are both singleton sets.
  – if they are identical, output their value and stop.
  – if they are different, the training cases were inconsistent. Output this result and stop.
• No more training examples. G has several hypotheses.

Example
Origin  Manufacturer  Color  Decade  Type     Example Type
Japan   Honda         Blue   1980    Economy  Positive
Japan   Toyota        Green  1970    Sports   Negative
Japan   Toyota        Blue   1990    Economy  Positive
USA     Chrysler      Red    1980    Economy  Negative
Japan   Honda         White  1980    Economy  Positive
Learning the concept of "Japanese Economy Car"
1. Positive Example: (Japan, Honda, Blue, 1980, Economy)
2. Negative Example: (Japan, Toyota, Green, 1970, Sports)

Example continued…
3. Positive Example: (Japan, Toyota, Blue, 1990, Economy)
4. Negative Example: (USA, Chrysler, Red, 1980, Economy)

Example continued…
5. Positive Example: (Japan, Honda, White, 1980, Economy)
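Only the S-boundary side of this example (i.e., the Find-S half of candidate elimination, which ignores the negative examples) fits in a few lines; a sketch:

```python
# Find-S on the positive examples: generalize the most specific hypothesis
# as little as possible, replacing mismatched attributes with variable 'x'.
positives = [
    ("Japan", "Honda", "Blue", "1980", "Economy"),
    ("Japan", "Toyota", "Blue", "1990", "Economy"),
    ("Japan", "Honda", "White", "1980", "Economy"),
]

S = positives[0]                 # initialize S to the first positive example
for example in positives[1:]:
    S = tuple(s if s == e else "x" for s, e in zip(S, example))

print(S)   # ('Japan', 'x', 'x', 'x', 'Economy')
```

Maintaining the G boundary against the negative examples requires the specialization bookkeeping described in the algorithm above and is omitted here.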

Decision Trees
A decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed.
Source: http://www.saedsayad.com/decision_tree.htm

Another example

Quinlan’s ID3 (1986)
• The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided it has entropy of one.
Information entropy is defined as the average amount of information produced/conveyed by a stochastic source of data.

Person   Hair Length  Weight  Age  Class
Homer    0"           250     36   M
Marge    10"          150     34   F
Bart     2"           90      10   M
Lisa     6"           78      8    F
Maggie   4"           20      1    F
Abe      1"           170     70   M
Selma    8"           160     41   F
Otto     10"          180     38   M
Krusty   6"           200     45   M
Comic    8"           290     38   ?
Source: Allan Neymark

Let us try splitting on Hair length
Hair Length <= 5? (yes / no)
Entropy(S) = −(p/(p+n))·log2(p/(p+n)) − (n/(p+n))·log2(n/(p+n))
Entropy(4F, 5M) = −(4/9)·log2(4/9) − (5/9)·log2(5/9) = 0.9911
Gain(A) = E(current set) − Σ E(all child sets)
Gain(Hair Length <= 5) = 0.9911 − (4/9 * 0.8113 + 5/9 * 0.9710) = 0.0911
Source: Allan Neymark
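The entropy and gain figures on these slides can be recomputed directly (a sketch; p and n are the two class counts):

```python
import math

# Binary entropy of a (p, n) split, and the gain of the Hair Length test.
def entropy(p, n):
    total = p + n
    e = 0.0
    for k in (p, n):
        if k:
            e -= (k / total) * math.log2(k / total)
    return e

parent = entropy(4, 5)                  # 4 F, 5 M in the full table
print(round(parent, 4))                 # 0.9911

# Hair Length <= 5 splits the table into (1 F, 3 M) and (3 F, 2 M)
gain = parent - (4/9 * entropy(1, 3) + 5/9 * entropy(3, 2))
print(round(gain, 4))                   # 0.0911
```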

Let us try splitting on Weight
Weight <= 160? (yes / no)
Entropy(S) = −(p/(p+n))·log2(p/(p+n)) − (n/(p+n))·log2(n/(p+n))
Entropy(4F, 5M) = −(4/9)·log2(4/9) − (5/9)·log2(5/9) = 0.9911
Gain(A) = E(current set) − Σ E(all child sets)
Gain(Weight <= 160) = 0.9911 − (5/9 * 0.7219 + 4/9 * 0) = 0.5900
Source: Allan Neymark

Let us try splitting on Age
Age <= 40? (yes / no)
Entropy(S) = −(p/(p+n))·log2(p/(p+n)) − (n/(p+n))·log2(n/(p+n))
Entropy(4F, 5M) = −(4/9)·log2(4/9) − (5/9)·log2(5/9) = 0.9911
Gain(A) = E(current set) − Σ E(all child sets)
Gain(Age <= 40) = 0.9911 − (6/9 * 1 + 3/9 * 0.9183) = 0.0183
Source: Allan Neymark

Of the 3 features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified… So we simply recurse!
Weight <= 160? (yes / no); this time we find that we can split the "yes" branch on Hair Length <= 2, and we are done!
Source: Allan Neymark

Weight <= 160?  no → Male;  yes → Hair Length <= 2?  yes → Male, no → Female
We don't need to keep the data around, just the test conditions.
How would these people be classified?
Source: Allan Neymark

It is trivial to convert Decision Trees to rules…
Weight <= 160?  no → Male;  yes → Hair Length <= 2?  yes → Male, no → Female
Rules to Classify Males/Females:
If Weight greater than 160, classify as Male
Else if Hair Length less than or equal to 2, classify as Male
Else classify as Female
Source: Allan Neymark
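The three rules translate literally into code; applied to Comic (hair 8 inches, weight 290), the unlabeled row of the table:

```python
# The decision tree above, written as its equivalent rule list.
def classify(hair_length, weight):
    if weight > 160:
        return "Male"
    elif hair_length <= 2:
        return "Male"
    else:
        return "Female"

print(classify(8, 290))    # Male  (Comic's row)
print(classify(10, 150))   # Female (Marge's row)
```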

Pruning DTs
1. Keep aside a part of the dataset for post-pruning of the tree. This is different from the test set and is known as the pruning set.
2. For a subtree S of the tree, if replacing S by a leaf does not make more prediction errors on the pruning set than the original tree, replace S by a leaf.
3. Perform step 2 only when no subtree of S possesses the property mentioned in 2.

Explanation Based Learning

Explanation-Based Learning
[Figure: blocks-world configurations of blocks A, B, and C]
Don't stack anything above a block, if the block has to be free in the final state.

A target concept
The learning system finds an "operational" definition of this concept, expressed in terms of some primitives. The target concept is represented as a predicate.
A training example
This is an instance of the target concept. It takes the form of a set of simple facts, not all of them necessarily relevant to the theory.
A domain theory
This is a set of rules, usually in predicate logic, that can explain how the training example fits the target concept.
Operationality criteria
These are the predicates (features) that should appear in an effective definition of the target concept.

Continued…
A classic example: a theory and an instance of a cup. A cup is a container for liquids that can be easily lifted. It has some typical parts, such as a handle and a bowl; the actual containers must be concave. Because a cup can be lifted, it should be light. And so on…
The target concept is cup(X).
The domain theory has five rules:
liftable(X) ∧ holds_liquid(X) → cup(X)
part(Z, W) ∧ concave(W) ∧ points_up(W) → holds_liquid(Z)
light(X) ∧ part(X, handle) → liftable(X)
small(A) → light(A)
made_of(A, feathers) → light(A)

The training example lists nine facts (some of them are not relevant):
cup(obj1), small(obj1), part(obj1, handle), owns(bob, obj1), part(obj1, bottom), part(obj1, bowl), points_up(bowl), concave(bowl), color(obj1, red)
Operationality criteria require a definition in terms of structural properties of objects (part, points_up, small, concave).

Step 1: prove the target concept using the training example
Continued…

Step2:generalizetheproof.Constantsfromthedomaintheory,for
examplehandle,arenotgeneralized.
Continued…

Step 3: Take the definition “off the tree”, only the root and the leaves.
In our example, we get this rule:
small( X ) ∧
part( X, handle ) ∧
part( X, W ) ∧
concave( W ) ∧
points_up( W ) → cup( X )
Continued…
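As a sanity check, the extracted rule can be run against the nine training facts; a small illustrative Python encoding (not part of the slides):

```python
# Training-example facts as (predicate, args) pairs.
facts = {
    ("cup", ("obj1",)), ("small", ("obj1",)),
    ("part", ("obj1", "handle")), ("owns", ("bob", "obj1")),
    ("part", ("obj1", "bottom")), ("part", ("obj1", "bowl")),
    ("points_up", ("bowl",)), ("concave", ("bowl",)),
    ("color", ("obj1", "red")),
}

def cup(x):
    # Learned operational rule:
    # small(X) ∧ part(X,handle) ∧ part(X,W) ∧ concave(W) ∧ points_up(W) → cup(X)
    parts = {a[1] for (p, a) in facts if p == "part" and a[0] == x}
    return (("small", (x,)) in facts
            and "handle" in parts
            and any(("concave", (w,)) in facts and ("points_up", (w,)) in facts
                    for w in parts))
```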

Ensemble learning: Motivation
• When designing a learning agent, we generally make some choices:
  • parameters, training data, representation, etc.
• This implies some sort of variance in performance
• Why not keep all learners and average?

Example: Weather Forecast
[Figure: predictions of five forecasters vs. reality; the errors (X) fall in different places for different forecasters, so combining them by majority vote tracks reality better than any individual forecaster]
Source: Carla P. Gomes

•Same algorithm, different versions of data set, e.g.
•Bagging: resample training data
•Boosting: Reweight training data
•Decorate: Add additional artificial training data
•Random Subspace (random forest): random subsets of
features
In WEKA, these are called meta-learners; they take a learning algorithm as an argument (the base learner) and create a new learning algorithm.
Different types of ensemble learning

Ensemble learning: Stacking
Img source: https://rasbt.github.io
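Since the slide relies on a figure, here is a minimal Python sketch of the idea: level-0 (base) learners are fit on the data, and a level-1 meta-learner combines their predictions. The base and meta learners below are deliberately simple and hypothetical; real stacking fits the meta-learner on held-out base predictions.

```python
def stack(base_fits, meta_fit, X, y):
    # Fit each level-0 learner, then fit the level-1 learner on the
    # level-0 predictions (in-sample here, for brevity).
    bases = [fit(X, y) for fit in base_fits]
    meta = meta_fit([[b(x) for b in bases] for x in X], y)
    return lambda x: meta([b(x) for b in bases])

def fit_threshold(i):
    # Hypothetical base learner: threshold feature i at the midpoint
    # of the two class means.
    def fit(X, y):
        pos = [x[i] for x, t in zip(X, y) if t == 1]
        neg = [x[i] for x, t in zip(X, y) if t == 0]
        cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return lambda x: 1 if x[i] > cut else 0
    return fit

def fit_vote_meta(meta_X, y):
    # Hypothetical meta-learner: plain majority vote (a real one would
    # learn weights from meta_X and y).
    return lambda preds: 1 if sum(preds) > len(preds) / 2 else 0
```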

Bootstrap
Sampling with replacement
Each bootstrap sample contains around 63.2% of the original records
Bootstrap Aggregation
Train a classifier on each bootstrap sample
Use majority voting to determine the class label of
ensemble classifier
Bagging
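The 63.2% figure follows because a given record is left out of a bootstrap sample of size n with probability (1 − 1/n)ⁿ ≈ 1/e ≈ 0.368; a quick illustrative simulation in Python:

```python
import random

def bootstrap_unique_fraction(n, trials=200, seed=0):
    # Average fraction of distinct original records appearing in a
    # bootstrap sample of size n (sampling with replacement).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += len({rng.randrange(n) for _ in range(n)}) / n
    return total / trials
```

For n = 1000 this comes out near 1 − 1/e ≈ 0.632.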

Bootstrap samples and classifiers:
Combine predictions by majority voting
Continued…
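Putting the two steps together, a minimal Python sketch of bagging; the 1-nearest-neighbour base learner is just an illustrative stand-in:

```python
import random

def fit_1nn(X, y):
    # Toy base learner: 1-nearest-neighbour over the training sample.
    def predict(x):
        d = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return y[d.index(min(d))]
    return predict

def bagging_fit(X, y, fit_fn, n_models=25, seed=0):
    # Train one classifier per bootstrap sample ...
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        models.append(fit_fn([X[i] for i in idx], [y[i] for i in idx]))
    def predict(x):
        votes = [m(x) for m in models]               # ... and majority-vote
        return max(set(votes), key=votes.count)
    return predict
```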

•Principles
•Boost a set of weak learners to a strong learner
•Make records currently misclassified more important
•Example
–Record 4 is hard to classify
–Its weight is increased, therefore it is more likely to be chosen
again in subsequent rounds
Boosting
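The reweighting step can be sketched as follows; the (1 − ε)/ε factor is one standard AdaBoost-style choice, shown here purely as an illustration:

```python
def reweight(weights, misclassified, epsilon):
    # Multiply the weight of each misclassified record by (1-eps)/eps,
    # then renormalise so the weights again sum to 1.  Hard records
    # (like record 4 above) become more likely to be resampled.
    factor = (1 - epsilon) / epsilon
    new = [w * factor if miss else w
           for w, miss in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new]
```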

Inductive Logic Programming (ILP)
• In ML, attribute-value representations are more usual (decision trees, rules, ...)
• Why predicate logic?
  • More expressive than attribute-value representations
  • Enables flexible use of background knowledge (knowledge known to the learner prior to learning)
1. The chair in the living room is red. The chair in the dining room is red. The
chair in the bedroom is red. All chairs in the house are red.
2. Amit leaves for school at 7:00 a.m. Amit is always on time. Amit assumes, then, that he will always be on time if he leaves at 7:00 a.m.

Deductive Vs Inductive Reasoning
T ∧ B ⊨ E (deduce)
parent(X,Y) :-mother(X,Y).
parent(X,Y) :-father(X,Y).
mother(mary,vinni).
mother(mary,andre).
father(carrey,vinni).
father(carrey,andre).
parent(mary,vinni).
parent(mary,andre).
parent(carrey,vinni).
parent(carrey,andre).
parent(mary,vinni).
parent(mary,andre).
parent(carrey,vinni).
parent(carrey,andre).
mother(mary,vinni).
mother(mary,andre).
father(carrey,vinni).
father(carrey,andre).
parent(X,Y) :-mother(X,Y).
parent(X,Y) :-father(X,Y).
E ∧ B ⇒ T (induce)

Learning definition of daughter
% Background knowledge
parent( pam, bob).
parent( tom, liz).
...
female( liz).
male( tom).
...
% Positive examples
ex( has_a_daughter(tom)). % Tom has a daughter
ex( has_a_daughter(bob)).
...
% Negative examples
nex( has_a_daughter(pam)). % Pam doesn't have a daughter
...

Top-down induction
has_a_daughter(X).

Refinements (one level down the search tree):
has_a_daughter(X) :- male(Y).
has_a_daughter(X) :- female(Y).
has_a_daughter(X) :- parent(Y,Z).

Further refinements:
has_a_daughter(X) :- male(X).
has_a_daughter(X) :- female(Y), parent(S,T).
has_a_daughter(X) :- parent(X,Z).
...
has_a_daughter(X) :- parent(X,Z), female(U).
has_a_daughter(X) :- parent(X,Z), female(Z).

Refinement operators:
• Apply a substitution to the clause
• Add a literal to the body of the clause
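The final clause can be checked directly; below is an illustrative Python encoding in which the background facts behind the slides' “...” are partly made up for the demo:

```python
# Partly hypothetical background knowledge (the slides elide most facts).
parent = {("pam", "bob"), ("tom", "liz"), ("bob", "ann")}
female = {"pam", "liz", "ann"}

def has_a_daughter(x):
    # Induced clause: has_a_daughter(X) :- parent(X,Z), female(Z).
    return any(p == x and child in female for (p, child) in parent)
```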

First-Order Inductive Logic
• Example: Consider learning a definition for the target predicate path for finding a path in a directed acyclic graph.
path(X,Y):-edge(X,Y).
path(X,Y):-edge(X,Z),path(Z,Y).
[Graph figure: nodes 1–6 with the edges listed below]
edge: {<1,2>,<1,3>,<3,6>,<4,2>,<4,6>,<6,5>}
path: {<1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,
<4,2>,<4,6>,<4,5>,<6,5>}
Training data:
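The positive path tuples are exactly the transitive closure of edge; a small Python sketch mirroring the two clauses:

```python
def transitive_closure(edges):
    # path(X,Y) :- edge(X,Y).
    # path(X,Y) :- edge(X,Z), path(Z,Y).
    path = set(edges)
    changed = True
    while changed:
        changed = False
        for (x, z) in edges:
            for (z2, y) in list(path):
                if z2 == z and (x, y) not in path:
                    path.add((x, y))
                    changed = True
    return path
```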

Negative Training Data
Negative examples of the target predicate can be provided directly, or generated indirectly by making a closed-world assumption:
every pair of constants <X,Y> not in the positive tuples for the path predicate.
Negative pathtuples:
{<1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,
<6,4>,<6,6>}
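Under the closed-world assumption, the negatives are just all constant pairs minus the positives; in Python:

```python
def cwa_negatives(constants, positives):
    # Closed-world assumption: every <X,Y> pair not among the positive
    # tuples for the predicate counts as a negative example.
    return {(x, y) for x in constants for y in constants} - set(positives)
```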

Sample FOIL Induction
Pos: {<1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,
<4,2>,<4,6>,<4,5>,<6,5>}
Start with clause:
path(X,Y):-.
Possible literals to add:
edge(X,X),edge(Y,Y),edge(X,Y),edge(Y,X),edge(X,Z),
edge(Y,Z),edge(Z,X),edge(Z,Y),path(X,X),path(Y,Y),
path(X,Y),path(Y,X),path(X,Z),path(Y,Z),path(Z,X),
path(Z,Y), plus negations of all of these.
Neg: {<1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,
<6,4>,<6,6>}

Neg: {<1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,
<6,4>,<6,6>}
Sample FOIL Induction
Pos: {<1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,
<4,2>,<4,6>,<4,5>,<6,5>}
Test:
path(X,Y):-edge(X,X).
Covers 0 positive examples
Covers 6 negative examples
Not a good literal.

Sample FOIL Induction
Pos: {<1,2>,<1,3>,<1,6>,<1,5>,<3,6>,<3,5>,
<4,2>,<4,6>,<4,5>,<6,5>}
Test:
path(X,Y):-edge(X,Y).
Covers 6 positive examples
Covers 0 negative examples
Chosen as best literal. Result is base clause.
Neg: {<1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,
<6,4>,<6,6>}

Sample FOIL Induction
Pos: {<1,6>,<1,5>,<3,5>,
<4,5>}
Neg: {<1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,
<6,4>,<6,6>}
Remove covered positive tuples.

Sample FOIL Induction
Pos: {<1,6>,<1,5>,<3,5>,
<4,5>}
Start new clause
path(X,Y):-.
Neg: {<1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,
<6,4>,<6,6>}

Sample FOIL Induction
Pos: {<1,6>,<1,5>,<3,5>,
<4,5>}
Test:
path(X,Y):-edge(X,Y).
Neg: {<1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,
<6,4>,<6,6>}
Covers 0 positive examples
Covers 0 negative examples
Not a good literal.

Sample FOIL Induction
Pos: {<1,6>,<1,5>,<3,5>,
<4,5>}
Test:
path(X,Y):-edge(X,Z).
Neg: {<1,1>,<1,4>,<2,1>,<2,2>,<2,3>,<2,4>,<2,5>,<2,6>,
<3,1>,<3,2>,<3,3>,<3,4>,<4,1>,<4,3>,<4,4>,<5,1>,
<5,2>,<5,3>,<5,4>,<5,5>,<5,6>,<6,1>,<6,2>,<6,3>,
<6,4>,<6,6>}
Covers all 4 positive examples
Covers 14 of 26 negative examples
Eventually chosen as best possible literal
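The coverage counts in these slides can be reproduced mechanically. The sketch below scores a few candidate literals by positive minus negative coverage, a crude stand-in for FOIL's information-gain heuristic (the candidate set and helper functions are illustrative):

```python
edges = {(1, 2), (1, 3), (3, 6), (4, 2), (4, 6), (6, 5)}
nodes = range(1, 7)

# A few candidate body literals for path(X,Y) :- ..., as predicates on <X,Y>.
candidates = {
    "edge(X,Y)": lambda x, y: (x, y) in edges,
    "edge(Y,X)": lambda x, y: (y, x) in edges,
    "edge(X,Z)": lambda x, y: any((x, z) in edges for z in nodes),
}

def coverage(literal, examples):
    # How many example tuples the literal covers.
    return sum(1 for (x, y) in examples if literal(x, y))

def best_literal(pos, neg):
    # Greedy choice: maximise positives covered minus negatives covered.
    return max(candidates, key=lambda name:
               coverage(candidates[name], pos) - coverage(candidates[name], neg))
```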

Next: Intro to NLP