Classification in Data Mining


About This Presentation

Process of Classification in Data Mining


Slide Content

Classification and Prediction


What is Classification?
•In classification, a model or classifier is constructed to predict categorical labels
•Data classification is a two-step process
1. Learning 2. Classification
Fig. The two-step process: training data is fed to a classification algorithm, which produces classification rules; the rules are then applied to test data and to new data (with unknown class label) to assign a class label.
Data Mining: Classification and Prediction 3

What is Classification?
•Learning step:
•A classification algorithm builds the classifier by analyzing or “learning from” a training set made up of database tuples and their associated class labels.
•The individual tuples making up the training set are referred to as training tuples.
•Data tuples can be referred to as samples, examples, instances, data points, or objects
•This is the supervised learning step
•The class label of each training tuple is provided
•This process can be viewed as the learning of a mapping or function $y = f(X)$
•It predicts the associated class label $y$ of a given tuple $X$
•This mapping is represented in the form of classification rules, decision trees, or mathematical formulae
Data Mining: Classification and Prediction 4

What is Classification?
•Classification Step
•The model is used for classification.
•A test set is used, made up of test tuples and their associated class labels.
•Randomly selected tuples from the general data set
•The accuracy of a classifier on a given test set is the percentage of test set tuples that are correctly classified by the classifier.
•The associated class label of each test tuple is compared with the learned classifier’s class
prediction for that tuple.
•If the accuracy of the classifier is considered acceptable, the classifier can be used to
classify future data tuples for which the class label is not known.
Data Mining: Classification and Prediction 5
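As a concrete illustration of the two steps, here is a minimal Python sketch, assuming scikit-learn is available; the iris data, the 70/30 split, and the choice of DecisionTreeClassifier are placeholders rather than anything prescribed by the slides.

# Illustrative sketch of the two-step classification process (learning + classification).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Step 1 (Learning): build the classifier from training tuples with known class labels.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2 (Classification): apply the model to test tuples and measure accuracy,
# i.e. the percentage of test tuples whose predicted label matches the known label.
y_pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))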

Data Mining: Classification and Prediction 6

What is Prediction?
•Data prediction is a two-step process, similar to data classification
•There is no class attribute
•Because the attribute values to be predicted are continuous-valued (ordered) rather than categorical (discrete-valued)
•The attribute whose values are predicted is referred to as the predicted attribute
•Prediction can also be viewed as a mapping or function $y = f(X)$
Data Mining: Classification and Prediction 7

How classification and prediction are
different?
•Data classification classifies categorical attribute values
•Data prediction predicts continuous-valued attribute values
•Testing data is used to assess accuracy of a classifier
•The accuracy of a predictor is estimated by computing an error based on the
difference between the predicted value and the actual known value of y for each
of the test tuples, X
Data Mining: Classification and Prediction 8

Decision Tree Induction
•It is the learning of decision trees from class-labeled training tuples.
•A decision tree is a flowchart-like tree structure, where
•Each internal node (non-leaf node) denotes a test on an attribute,
•Each branch represents an outcome of the test, and
•Each leaf node (or terminal node) holds a class label.
•The topmost node in a tree is the root node
•Is a person fit?
•Binary decision tree
•Non-binary decision tree
Data Mining: Classification and Prediction 9
Fig. Decision tree for the concept of being fit: root node “Age < 30?”; the Yes branch tests “Eats lots of pizza?” and the No branch tests “Exercises daily?”; each of these tests leads to Yes/No branches ending in Fit or Unfit leaves.

Decision Tree Induction
•How are decision trees used for classification?
•Given a tuple, X, for which the associated class label is unknown, the attribute values of
the tuple are tested against the decision tree.
•A path is traced from the root to a leaf node, which holds the class prediction for that
tuple.
•Advantages of decision trees:
•Does not require any domain knowledge or parameter setting
•Can handle high dimensional data.
•The learning and classification steps of decision tree induction are simple and fast
•Have good accuracy
Data Mining: Classification and Prediction 10

Decision Tree Induction
•Attribute selection measures
•Used to select the attribute that best partitions the tuples into distinct
classes.
•Information gain, Gain Ratio, Gini Index
•A basic decision tree algorithm is ID3 (Iterative Dichotomiser).
•The C4.5 algorithm (successor of ID3) is a benchmark against which newer supervised learning algorithms are often compared
•Classification and Regression Trees (CART)
•These algorithms adopt a greedy (i.e., nonbacktracking) approach in which decision trees are constructed in a top-down recursive divide-and-conquer manner
Data Mining: Classification and Prediction 11

Decision Tree Induction
•Information Gain
•ID3 uses information gain as its attribute selection measure
•The expected information needed to classify a tuple in &#3627408439;is given by
&#3627408496;&#3627408527;&#3627408519;&#3627408528;&#3627408491;=−෍
&#3627408522;=&#3627409359;
&#3627408526;
&#3627408529;
&#3627408522;??????????????????
&#3627409360;(&#3627408529;
&#3627408522;)
•where &#3627408477;
&#3627408470;is the probability that an arbitrary tuple in &#3627408439;belongs to class &#3627408438;
&#3627408470;,estimated as ൗ
&#3627408438;
??????,??????
&#3627408439;
•Info(D) is also known as the entropyof D.
•The expected information required to classify a tuple from &#3627408439;based on the partitioning by
attribute &#3627408436;.
&#3627408496;&#3627408527;&#3627408519;&#3627408528;
&#3627408488;&#3627408491;=෍
&#3627408523;=&#3627409359;
&#3627408535;
&#3627408491;
&#3627408523;
&#3627408491;
×&#3627408496;&#3627408527;&#3627408519;&#3627408528;(&#3627408491;
&#3627408523;)
Data Mining: Classification and Prediction 12
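A small Python sketch of Info(D), Info_A(D), and the resulting gain, using only the standard library; representing tuples as dictionaries with a "class" key is an assumption made here for readability.

# Sketch of the expected-information measures used by ID3.
from collections import Counter
from math import log2

def info(tuples, class_key="class"):
    """Info(D) = -sum_i p_i * log2(p_i): entropy of the class distribution in D."""
    counts = Counter(t[class_key] for t in tuples)
    total = len(tuples)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_attr(tuples, attr, class_key="class"):
    """Info_A(D) = sum_j |D_j|/|D| * Info(D_j) after partitioning D on attribute A."""
    total = len(tuples)
    partitions = {}
    for t in tuples:
        partitions.setdefault(t[attr], []).append(t)
    return sum(len(dj) / total * info(dj, class_key) for dj in partitions.values())

def gain(tuples, attr, class_key="class"):
    """Gain(A) = Info(D) - Info_A(D) (defined on the next slide)."""
    return info(tuples, class_key) - info_attr(tuples, attr, class_key)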

Decision Tree Induction
•Information Gain
•Information gain is defined as the difference between the original information
requirement and the new requirement
$Gain(A) = Info(D) - Info_A(D)$
•The attribute $A$ with the highest information gain, $Gain(A)$, is chosen as the splitting attribute at node $N$.
Data Mining: Classification and Prediction 13

Decision Tree Induction
•Decision Tree Generation Algorithm
•Input:
•Data partition, D,
which is a set of training tuples and their associated class labels;
•Attribute_list,
the set of candidate attributes;
•Attribute_selection_method,
a procedure to determine the splitting criterion that “best” partitions the data tuples into
individual classes. This criterion consists of a splitting attribute and, possibly, either a split point
or splitting subset.
Data Mining: Classification and Prediction 14

Decision Tree Induction
•Generate_decision_tree Algorithm
•Method
1. create a node N;
2. if tuples in D are all of the same class, C, then
3.    return N as a leaf node labeled with the class C;
4. if Attribute_list is empty then
5.    return N as a leaf node labeled with the majority class in D;
6. apply Attribute_selection_method(D, Attribute_list) to find the “best” splitting criterion;
7. label node N with the splitting criterion;
Data Mining: Classification and Prediction 15

Decision Tree Induction
•Decision Tree Generation Algorithm
•Method
8. if splitting_attribute is discrete-valued and multiway splits are allowed then
9.    Attribute_list ← Attribute_list − splitting_attribute; // remove splitting attribute
10. for each outcome j of the splitting criterion
11.    let D_j be the set of data tuples in D satisfying outcome j; // a partition
12.    if D_j is empty then
13.       attach a leaf labeled with the majority class in D to node N;
14.    else attach the node returned by Generate_decision_tree(D_j, Attribute_list) to node N;
15. end for
16. return N;
Data Mining: Classification and Prediction 16
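The method above can be rendered compactly in Python. The sketch below assumes discrete-valued attributes with multiway splits and takes the attribute-selection measure as a function supplied by the caller (for example, one based on the information-gain routine sketched earlier).

# Recursive, top-down, divide-and-conquer tree construction following steps 1-16 above.
from collections import Counter

def majority_class(D, class_key="class"):
    return Counter(t[class_key] for t in D).most_common(1)[0][0]

def generate_decision_tree(D, attribute_list, select_attribute, class_key="class"):
    classes = {t[class_key] for t in D}
    if len(classes) == 1:                        # steps 2-3: all tuples share one class
        return classes.pop()
    if not attribute_list:                       # steps 4-5: no candidate attributes left
        return majority_class(D, class_key)
    best = select_attribute(D, attribute_list)   # step 6: "best" splitting attribute
    node = {"attribute": best, "branches": {}}   # step 7: label node N
    remaining = [a for a in attribute_list if a != best]   # steps 8-9 (multiway split)
    for outcome in {t[best] for t in D}:         # step 10: one branch per observed outcome
        Dj = [t for t in D if t[best] == outcome]
        if not Dj:                               # steps 12-13 (can only occur if outcomes
            node["branches"][outcome] = majority_class(D, class_key)  # come from A's full domain)
        else:                                    # step 14: recurse on the partition
            node["branches"][outcome] = generate_decision_tree(Dj, remaining,
                                                               select_attribute, class_key)
    return node                                  # step 16

For instance, select_attribute could be lambda D, attrs: max(attrs, key=lambda a: gain(D, a)) using an information-gain helper like the one sketched earlier.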

Example
Data Mining: Classification and Prediction 17
Patient ID  Age     Sex  BP      Cholesterol  Class: Drug
P1 <=30 F High Normal Drug A
P2 <=30 F High High Drug A
P3 31…50 F High Normal Drug B
P4 >50 F Normal Normal Drug B
P5 >50 M Low Normal Drug B
P6 >50 M Low High Drug A
P7 31…50 M Low High Drug B
P8 <=30 F Normal Normal Drug A
P9 <=30 M Low Normal Drug B
P10 >50 M Normal Normal Drug B
P11 <=30 M Normal High Drug B
P12 31…50 F Normal High Drug B
P13 31…50 M High Normal Drug B
P14 >50 F Normal High Drug A
P15 31…50 F Low Normal ?

Example
•Reduced Training Data
•Establish the target class distribution: which drug to advise?
•5/14 → Drug A
•9/14 → Drug B
Data Mining: Classification and Prediction 18
Age     Gender  BP      Cholesterol  Class: Drug
<=30 F High Normal Drug A
<=30 F High High Drug A
31…50 F High Normal Drug B
>50 F Normal Normal Drug B
>50 M Low Normal Drug B
>50 M Low High Drug A
31…50 M Low High Drug B
<=30 F Normal Normal Drug A
<=30 M Low Normal Drug B
>50 M Normal Normal Drug B
<=30 M Normal High Drug B
31…50 F Normal High Drug B
31…50 M High Normal Drug B
>50 F Normal High Drug A

Example
•Calculate the expected information (entropy) for the class attribute Drug
$Info(D) = -\frac{5}{14}\log_2\frac{5}{14} - \frac{9}{14}\log_2\frac{9}{14} = 0.9403$
•Calculate information gain of remaining attributes to determine the root node
Data Mining: Classification and Prediction 19

Example
•Attribute: Age
•<=30 → 5, 31-50 → 4, >50 → 5
•3 distinct values for attribute Age, so we need 3 entropy calculations
<=30: 3-Drug A, 2-Drug B: $Info(\le 30) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} \approx 0.9710$
31-50: 0-Drug A, 4-Drug B: $Info(31\text{-}50) = -\frac{0}{4}\log_2\frac{0}{4} - \frac{4}{4}\log_2\frac{4}{4} = 0$
>50: 2-Drug A, 3-Drug B: $Info(>50) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} \approx 0.9710$
$Info_{Age}(D) = \frac{5}{14}\times Info(\le 30) + \frac{4}{14}\times Info(31\text{-}50) + \frac{5}{14}\times Info(>50) = 0.6936$
$Gain(Age) = Info(D) - Info_{Age}(D) = 0.9403 - 0.6936 = 0.2467$
Data Mining: Classification and Prediction 21

Example
•Attribute: Gender
•M → 7, F → 7
•2 distinct values for attribute Gender, so we need 2 entropy calculations
F: 4-Drug A, 3-Drug B: $Info(F) = -\frac{4}{7}\log_2\frac{4}{7} - \frac{3}{7}\log_2\frac{3}{7} \approx 0.9852$
M: 1-Drug A, 6-Drug B: $Info(M) = -\frac{1}{7}\log_2\frac{1}{7} - \frac{6}{7}\log_2\frac{6}{7} \approx 0.5917$
$Info_{Gender}(D) = \frac{7}{14}\times Info(F) + \frac{7}{14}\times Info(M) = 0.7885$
$Gain(Gender) = Info(D) - Info_{Gender}(D) = 0.9403 - 0.7885 = 0.1519$
Data Mining: Classification and Prediction 23

Example
•Attribute: BP
•High → 4, Normal → 6, Low → 4
•3 distinct values for attribute BP, so we need 3 entropy calculations
High: 2-Drug A, 2-Drug B: $Info(High) = -\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4} = 1.00$
Normal: 2-Drug A, 4-Drug B: $Info(Normal) = -\frac{2}{6}\log_2\frac{2}{6} - \frac{4}{6}\log_2\frac{4}{6} \approx 0.9183$
Low: 1-Drug A, 3-Drug B: $Info(Low) = -\frac{1}{4}\log_2\frac{1}{4} - \frac{3}{4}\log_2\frac{3}{4} \approx 0.8113$
$Info_{BP}(D) = \frac{4}{14}\times Info(High) + \frac{6}{14}\times Info(Normal) + \frac{4}{14}\times Info(Low) = 0.9111$
$Gain(BP) = Info(D) - Info_{BP}(D) = 0.9403 - 0.9111 = 0.0292$
Data Mining: Classification and Prediction 25

Data Mining: Classification and Prediction 26
Cholesterol  Class: Drug
High Drug A
High Drug A
High Drug B
High Drug B
High Drug B
High Drug A
Cholesterol  Class: Drug
Normal Drug A
Normal Drug B
Normal Drug B
Normal Drug B
Normal Drug A
Normal Drug B
Normal Drug B
Normal Drug B

Example
•Attribute: Cholesterol
•High → 6, Normal → 8
•2 distinct values for attribute Cholesterol, so we need 2 entropy calculations
High: 3-Drug A, 3-Drug B: $Info(High) = -\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6} = 1.00$
Normal: 2-Drug A, 6-Drug B: $Info(Normal) = -\frac{2}{8}\log_2\frac{2}{8} - \frac{6}{8}\log_2\frac{6}{8} \approx 0.8113$
$Info_{Cholesterol}(D) = \frac{6}{14}\times Info(High) + \frac{8}{14}\times Info(Normal) = 0.8922$
$Gain(Cholesterol) = Info(D) - Info_{Cholesterol}(D) = 0.9403 - 0.8922 = 0.0481$
Data Mining: Classification and Prediction 27

Example
•Recap
Gain(Age) = 0.2467
Gain(Gender) = 0.1519
Gain(BP) = 0.0292
Gain(Cholesterol) = 0.0481
•Age has the highest information gain, so we choose Age as the root node.
Data Mining: Classification and Prediction 29
Fig. Partial decision tree: root Age with branches <=30, 31-50, >50; the 31-50 branch is a Drug B leaf, while the <=30 and >50 branches are still “?” — repeat the steps on those partitions.
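As a quick check of the recap, the following Python sketch recomputes the four gains directly from the 14-tuple training table; the tuple encoding used here is only for illustration.

# Recompute Gain(Age), Gain(Gender), Gain(BP), Gain(Cholesterol) for the drug data.
from collections import Counter
from math import log2

rows = [  # (Age, Gender, BP, Cholesterol, Drug) from the reduced training table
    ("<=30", "F", "High", "Normal", "A"), ("<=30", "F", "High", "High", "A"),
    ("31-50", "F", "High", "Normal", "B"), (">50", "F", "Normal", "Normal", "B"),
    (">50", "M", "Low", "Normal", "B"), (">50", "M", "Low", "High", "A"),
    ("31-50", "M", "Low", "High", "B"), ("<=30", "F", "Normal", "Normal", "A"),
    ("<=30", "M", "Low", "Normal", "B"), (">50", "M", "Normal", "Normal", "B"),
    ("<=30", "M", "Normal", "High", "B"), ("31-50", "F", "Normal", "High", "B"),
    ("31-50", "M", "High", "Normal", "B"), (">50", "F", "Normal", "High", "A"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

info_d = entropy([r[-1] for r in rows])
for idx, name in enumerate(["Age", "Gender", "BP", "Cholesterol"]):
    parts = {}
    for r in rows:
        parts.setdefault(r[idx], []).append(r[-1])
    info_a = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
    # should match the slide's gains up to rounding (~0.247, 0.152, 0.029, 0.048)
    print(f"Gain({name}) = {info_d - info_a:.4f}")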

Example
Data Mining: Classification and Prediction 30
Fig. Final decision tree:
Age = 31-50 → Drug B
Age <= 30 → Gender: Male → Drug B; Female → Drug A
Age > 50 → Cholesterol: Normal → Drug B; High → Drug A

Decision Tree Induction
•What if the splitting attribute A is continuous-valued?
•The test at node N has two possible outcomes, corresponding to the conditions A <= split_point and A > split_point, respectively
•where split_point is the split-point returned by Attribute_selection_method as part of the splitting criterion.
•When A is discrete-valued and a binary tree must be produced:
•The test at node N is of the form “A ∈ S_A”,
•where S_A is the splitting subset for A returned by Attribute_selection_method as part of the splitting criterion.
Data Mining: Classification and Prediction 31

Decision Tree Induction
Data Mining: Classification and Prediction 32

Attribute Selection Measures
•Gain Ratio
•The information gain measure is biased toward tests with many outcomes.
•C4.5, a successor of ID3, uses an extension to information gain known as gain ratio
•Applies a kind of normalization to information gain using a “split information” value defined
analogously with Info(D) as
$SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\frac{|D_j|}{|D|}$
•This represents the potential information generated by splitting the training data set, $D$, into $v$ partitions, corresponding to the $v$ outcomes of a test on attribute $A$.
•The gain ratio is defined as
$GainRatio(A) = \frac{Gain(A)}{SplitInfo_A(D)}$
•The attribute with the maximum gain ratio is selected as the splitting attribute.
Data Mining: Classification and Prediction 33

Attribute Selection Measures
•Gain Ratio
•Computation of gain ratio for the attribute weight.
•Attribute Weight has three values, Heavy, Average, and Light, containing 5, 6, and 4 tuples respectively.
$SplitInfo_{Weight}(D) = -\frac{5}{15}\log_2\frac{5}{15} - \frac{6}{15}\log_2\frac{6}{15} - \frac{4}{15}\log_2\frac{4}{15} = 1.5655$
$Gain(Weight) = 0.0622$
$GainRatio(Weight) = \frac{0.0622}{1.5655} = 0.040$
Data Mining: Classification and Prediction 34
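The SplitInfo and gain-ratio arithmetic above can be reproduced in a few lines of Python; the partition sizes (5, 6 and 4 out of 15) and Gain(Weight) = 0.0622 are taken from the slide.

# Gain ratio for the Weight attribute (partitions of 5, 6 and 4 tuples out of 15).
from math import log2

sizes, total = [5, 6, 4], 15
split_info = -sum(s / total * log2(s / total) for s in sizes)
gain_weight = 0.0622                         # Gain(Weight), as given on the slide
print(round(split_info, 4))                  # ≈ 1.5656 (the slide rounds to 1.5655)
print(round(gain_weight / split_info, 3))    # gain ratio ≈ 0.040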

Attribute Selection Measures
•Gini Index
•The Gini index is used in CART
•The Gini index measures the impurity of $D$, a data partition or set of training tuples, as
$Gini(D) = 1 - \sum_{i=1}^{m} p_i^2$
•where $p_i$ is the probability that a tuple in $D$ belongs to class $C_i$ and is estimated by $|C_{i,D}| / |D|$.
•The sum is computed over $m$ classes.
•The Gini index considers a binary split for each attribute
•If a binary split on A partitions D into $D_1$ and $D_2$, the Gini index of D given that partitioning is
$Gini_A(D) = \frac{|D_1|}{|D|}\times Gini(D_1) + \frac{|D_2|}{|D|}\times Gini(D_2)$
Data Mining: Classification and Prediction 35

Attribute Selection Measures
•Gini Index
•For a discrete-valued attribute, the subset that gives the minimum Gini index for that attribute is selected as its splitting subset.
•For continuous-valued attributes, each possible split-point must be considered.
•The reduction in impurity that would be incurred by a binary split on a discrete- or continuous-valued attribute A is
$\Delta Gini(A) = Gini(D) - Gini_A(D)$
•The attribute that maximizes the reduction in impurity is selected as the splitting attribute.
Data Mining: Classification and Prediction 36
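A short Python sketch of the Gini measures above; the label lists and the particular binary partition used at the end are made-up values for illustration.

# Gini impurity of a partition and the impurity reduction of a binary split (CART-style).
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum_i p_i^2 over the m classes in D."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(d1_labels, d2_labels):
    """Gini_A(D) = |D1|/|D| * Gini(D1) + |D2|/|D| * Gini(D2) for a binary split."""
    n = len(d1_labels) + len(d2_labels)
    return (len(d1_labels) / n) * gini(d1_labels) + (len(d2_labels) / n) * gini(d2_labels)

# Reduction in impurity for one candidate split; the split maximizing this is chosen.
labels = ["A"] * 5 + ["B"] * 9                            # hypothetical class distribution
d1, d2 = ["A"] * 3 + ["B"] * 2, ["A"] * 2 + ["B"] * 7     # hypothetical binary partition
print(gini(labels) - gini_split(d1, d2))                  # ΔGini for this candidate split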

Data Mining: Classification and Prediction 37

Data Mining: Classification and Prediction 38

Bayesian Classification
•Bayesian classifiers are statistical classifiers
•Predicts class membership probabilities.
•based on Bayes’ theorem.
•Exhibits high accuracy and speed when applied to large databases.
•A simple Bayesian classifier is known as the naïve Bayesian classifier
•Assumes that the effect of an attribute value on a given class is independent of the values of
the other attributes: class conditional independence.
•Bayesian belief networks are graphical models, allow the representation of
dependencies among subsets of attributes
Data Mining: Classification and Prediction 39

Bayesian Classification
•Bayes’ Theorem
•Let X be a data tuple (X is considered “evidence”).
•Let H be some hypothesis, such as that the data tuple X belongs to a specified class C.
•Determine P(H|X), the probability that the hypothesis H holds given the “evidence” or observed data tuple X.
•P(H|X) is the posterior probability of H conditioned on X.
•P(H) is the prior probability of H.
•P(X|H) is the posterior probability of X conditioned on H.
•P(X) is the prior probability of X.
•“How are these probabilities estimated?”
$P(H|X) = \frac{P(X|H)\,P(H)}{P(X)}$ … Bayes’ Theorem
Data Mining: Classification and Prediction 40

Naïve Bayesian Classification
•A simple Bayesian classifier is known as the naïve Bayesian classifier
•Assumes that the effect of an attribute value on a given class is independent of
the values of the other attributes: class conditional independence.
•It is made to simplify the computations involved and, in this sense, is
considered “naïve.”
Data Mining: Classification and Prediction 41

Naïve Bayesian Classification
•Let D be a training set of tuples and their associated class labels.
•Suppose that there are m classes, $C_1, C_2, \ldots, C_m$.
•Given a tuple, $X = (x_1, x_2, \ldots, x_n)$, depicting n measurements made on the tuple from n attributes, the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X.
•The naïve Bayesian classifier predicts that tuple X belongs to the class $C_i$ if and only if
$P(C_i|X) > P(C_j|X) \text{ for } 1 \le j \le m,\ j \ne i$
•We maximize $P(C_i|X)$
•The class $C_i$ for which $P(C_i|X)$ is maximized is called the maximum posteriori hypothesis.
Data Mining: Classification and Prediction 42

Naïve Bayesian Classification
•By Bayes’ theorem
$P(C_i|X) = \frac{P(X|C_i)\,P(C_i)}{P(X)}$
•As $P(X)$ is constant for all classes, only $P(X|C_i)P(C_i)$ need be maximized.
•The naïve Bayesian classifier predicts that tuple X belongs to the class $C_i$ if and only if
$P(C_i|X) > P(C_j|X) \text{ for } 1 \le j \le m,\ j \ne i$
•The class $C_i$ for which $P(C_i|X)$ is maximized is called the maximum posteriori hypothesis.
•The class prior probabilities may be estimated by
$P(C_i) = \frac{|C_{i,D}|}{|D|}$, where $|C_{i,D}|$ is the number of training tuples of class $C_i$ in $D$
Data Mining: Classification and Prediction 43

Naïve Bayesian Classification
•In order to reduce computation in evaluating $P(X|C_i)$, the naive assumption of class conditional independence is made.
$P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i) = P(x_1|C_i) \times P(x_2|C_i) \times \cdots \times P(x_n|C_i)$
•In theory, Bayesian classifiers have the minimum error rate in comparison to all other classifiers.
Data Mining: Classification and Prediction 44

Naïve Bayesian Classification: Example
RID  age          income  student  credit_rating  class: Buys_computer
1    youth        high    no       fair           no
2    youth        high    no       excellent      no
3    middle_aged  high    no       fair           yes
4    senior       medium  no       fair           yes
5    senior       low     yes      fair           yes
6    senior       low     yes      excellent      no
7    middle_aged  low     yes      excellent      yes
8    youth        medium  no       fair           no
9    youth        low     yes      fair           yes
10   senior       medium  yes      fair           yes
11   youth        medium  yes      excellent      yes
12   middle_aged  medium  no       excellent      yes
13   middle_aged  high    yes      fair           yes
14   senior       medium  no       excellent      no
Data Mining: Classification and Prediction 45

Naïve Bayesian Classification: Example
Data Mining: Classification and Prediction 46
•Let $C_1$ be the class Buys_computer = yes and $C_2$ be the class Buys_computer = no
•The tuple we wish to classify is
X = (age = youth, income = medium, student = yes, credit_rating = fair)
•We need to maximize $P(X|C_i)P(C_i)$, for i = 1, 2
•Calculate $P(C_i)$, for i = 1, 2
•P(Buys_computer = yes) = 9/14 = 0.643
•P(Buys_computer = no) = 5/14 = 0.357
•Calculate $P(X|C_i)$, for i = 1, 2

Naïve Bayesian Classification: Example
RID  age          income  student  credit_rating  class: Buys_computer
1    youth        high    no       fair           no
2    youth        high    no       excellent      no
3    middle_aged  high    no       fair           yes
4    senior       medium  no       fair           yes
5    senior       low     yes      fair           yes
6    senior       low     yes      excellent      no
7    middle_aged  low     yes      excellent      yes
8    youth        medium  no       fair           no
9    youth        low     yes      fair           yes
10   senior       medium  yes      fair           yes
11   youth        medium  yes      excellent      yes
12   middle_aged  medium  no       excellent      yes
13   middle_aged  high    yes      fair           yes
14   senior       medium  no       excellent      no
Data Mining: Classification and Prediction 47

Naïve Bayesian Classification: Example
Data Mining: Classification and Prediction 48
•Calculate $P(X|C_i)$, for i = 1, 2
X = (age = youth, income = medium, student = yes, credit_rating = fair)
•$P(x_1|C_1)$ = P(age = youth | Buys_computer = yes) = 2/9 = 0.222
•$P(x_1|C_2)$ = P(age = youth | Buys_computer = no) = 3/5 = 0.600
•$P(x_2|C_1)$ = P(income = medium | Buys_computer = yes) = 4/9 = 0.444
•$P(x_2|C_2)$ = P(income = medium | Buys_computer = no) = 2/5 = 0.400
•$P(x_3|C_1)$ = P(student = yes | Buys_computer = yes) = 6/9 = 0.667
•$P(x_3|C_2)$ = P(student = yes | Buys_computer = no) = 1/5 = 0.200
•$P(x_4|C_1)$ = P(credit_rating = fair | Buys_computer = yes) = 6/9 = 0.667
•$P(x_4|C_2)$ = P(credit_rating = fair | Buys_computer = no) = 2/5 = 0.400

Naïve Bayesian Classification: Example
Data Mining: Classification and Prediction 49
•Now we calculate from the above probabilities
P(X | Buys_computer = yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
Similarly
P(X | Buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
•To find the class, $C_i$, that maximizes $P(X|C_i)P(C_i)$, we compute
P(X | Buys_computer = yes) P(Buys_computer = yes) = 0.044 × 0.643 = 0.028
P(X | Buys_computer = no) P(Buys_computer = no) = 0.019 × 0.357 = 0.007
•Therefore, the naïve Bayesian classifier predicts Buys_computer = yes for tuple X.
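The same arithmetic can be checked with a short Python sketch that estimates the priors and class-conditional probabilities directly from the buys_computer table; the tuple encoding is an assumption of this sketch.

# P(X|Ci)P(Ci) for X = (youth, medium, yes, fair) on the 14-tuple buys_computer data.
from collections import Counter

data = [  # (age, income, student, credit_rating, buys_computer)
    ("youth", "high", "no", "fair", "no"), ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"), ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"), ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"), ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"), ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"), ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"), ("senior", "medium", "no", "excellent", "no"),
]
x = ("youth", "medium", "yes", "fair")

priors = Counter(row[-1] for row in data)
for cls in ("yes", "no"):
    rows_c = [row for row in data if row[-1] == cls]
    likelihood = 1.0
    for i, value in enumerate(x):      # class-conditional independence: multiply P(x_k|Ci)
        likelihood *= sum(1 for row in rows_c if row[i] == value) / len(rows_c)
    score = likelihood * priors[cls] / len(data)    # proportional to P(Ci|X)
    print(cls, round(score, 3))        # expect ~0.028 for yes, ~0.007 for no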

Naïve Bayesian Classification: Example
RID  age          income  student  credit_rating  class: Buys_computer
1    youth        high    no       fair           no
2    youth        high    no       excellent      no
3    middle_aged  high    no       fair           yes
4    senior       medium  no       fair           yes
5    senior       low     yes      fair           yes
6    senior       low     yes      excellent      yes
7    middle_aged  low     yes      excellent      yes
8    youth        medium  no       fair           no
9    youth        low     yes      fair           yes
10   senior       medium  yes      fair           yes
11   youth        medium  yes      excellent      yes
12   middle_aged  medium  no       excellent      yes
13   middle_aged  high    yes      fair           yes
14   senior       medium  no       excellent      no
Data Mining: Classification and Prediction 50
Calculate $P(C_i)$, for i = 1, 2
P(Buys_computer = yes) = 10/14 = 0.714
P(Buys_computer = no) = 4/14 = 0.286

Naïve Bayesian Classification: Example
Data Mining: Classification and Prediction 51
•Calculate $P(X|C_i)$, for i = 1, 2
•$P(x_1|C_1)$ = P(age = youth | Buys_computer = yes) = 2/10 = 0.200
•$P(x_1|C_2)$ = P(age = youth | Buys_computer = no) = 3/4 = 0.750
•$P(x_2|C_1)$ = P(income = medium | Buys_computer = yes) = 4/10 = 0.400
•$P(x_2|C_2)$ = P(income = medium | Buys_computer = no) = 2/4 = 0.500
•$P(x_3|C_1)$ = P(student = yes | Buys_computer = yes) = 7/10 = 0.700
•$P(x_3|C_2)$ = P(student = yes | Buys_computer = no) = 0/4 = 0
•$P(x_4|C_1)$ = P(credit_rating = fair | Buys_computer = yes) = 6/10 = 0.600
•$P(x_4|C_2)$ = P(credit_rating = fair | Buys_computer = no) = 2/4 = 0.500

Naïve Bayesian Classification: Example
Data Mining: Classification and Prediction 52
•Now we calculate from the above probabilities
P(X | Buys_computer = yes) = 0.200 × 0.400 × 0.700 × 0.600 = 0.034
Similarly
P(X | Buys_computer = no) = 0.750 × 0.500 × 0 × 0.500 = 0
•To find the class, $C_i$, that maximizes $P(X|C_i)P(C_i)$, we compute
P(X | Buys_computer = yes) P(Buys_computer = yes) = 0.034 × 0.714 = 0.024
P(X | Buys_computer = no) P(Buys_computer = no) = 0 × 0.286 = 0
•Therefore, the naïve Bayesian classifier predicts Buys_computer = yes for tuple X.
IS IT A CORRECT CLASSIFICATION?

Naïve Bayesian Classification
Data Mining: Classification and Prediction 53
•A zero probability cancels the effects of all of the other (posteriori) probabilities
(on &#3627408438;
&#3627408470;) involved in the product.
•To avoid the effect of a zero probability value, the Laplacian correction or Laplace estimator is used.
•We add one to each count.

Naïve Bayesian Classification
Data Mining: Classification and Prediction 54
•E.g., suppose we have a training database D having 1500 tuples.
•Out of these, 1000 tuples are of class Buys_computer = yes.
•For the income attribute we have
•0 tuples for income = low,
•960 tuples for income = medium,
•40 tuples for income = high.
•Using the Laplacian correction for the three quantities, we pretend that we have 1 extra tuple for each income-value pair.
1/1003 = 0.001,   961/1003 = 0.958,   41/1003 = 0.041
•The “corrected” probability estimates are close to their “uncorrected” counterparts, yet the zero probability value is avoided.
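A small Python sketch of the Laplacian correction on these counts, assuming the usual add-one form (1 added to each count, and the number of attribute values added to the denominator).

# Laplace (add-one) smoothing for P(income = v | Buys_computer = yes).
counts = {"low": 0, "medium": 960, "high": 40}   # counts within the 1000 "yes" tuples
total = sum(counts.values())                     # 1000

uncorrected = {v: c / total for v, c in counts.items()}
corrected = {v: (c + 1) / (total + len(counts)) for v, c in counts.items()}

print(uncorrected)   # {'low': 0.0, 'medium': 0.96, 'high': 0.04}  -- the zero kills the product
print(corrected)     # {'low': ~0.001, 'medium': ~0.958, 'high': ~0.041}  -- no zero remains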

Rule-Based Classification
•The learned model is represented as a set of IF-THEN rules.
•An IF-THEN rule is an expression of the form
IF condition THEN conclusion
•The “IF” part is the rule antecedent or precondition (a set of attribute tests); the “THEN” part is the rule consequent (the class prediction).
•Example: R1: IF age = youth AND student = yes THEN Buys_computer = yes
•R1 can also be written as
R1: (age = youth) ∧ (student = yes) ⇒ (Buys_computer = yes)
Data Mining: Classification and Prediction 55

Rule-Based Classification
•If the condition in a rule antecedent holds true for a given tuple, the rule antecedent is satisfied and the rule covers the tuple.
•Evaluation of a rule R:
$coverage(R) = \frac{n_{covers}}{|D|}$
$accuracy(R) = \frac{n_{correct}}{n_{covers}}$
•where $n_{covers}$ is the number of tuples covered by R,
•$n_{correct}$ is the number of tuples correctly classified by R, and
•$|D|$ is the number of tuples in D.
Data Mining: Classification and Prediction 56
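A short Python sketch of coverage(R) and accuracy(R) for rule R1 over the buys_computer data; encoding the antecedent as a small predicate function is an assumption of the sketch.

# coverage(R) = n_covers / |D| and accuracy(R) = n_correct / n_covers for one rule.
data = [  # (age, student, buys_computer) -- only the attributes R1 needs
    ("youth", "no", "no"), ("youth", "no", "no"), ("middle_aged", "no", "yes"),
    ("senior", "no", "yes"), ("senior", "yes", "yes"), ("senior", "yes", "no"),
    ("middle_aged", "yes", "yes"), ("youth", "no", "no"), ("youth", "yes", "yes"),
    ("senior", "yes", "yes"), ("youth", "yes", "yes"), ("middle_aged", "no", "yes"),
    ("middle_aged", "yes", "yes"), ("senior", "no", "no"),
]

antecedent = lambda t: t[0] == "youth" and t[1] == "yes"   # IF age = youth AND student = yes
consequent = "yes"                                         # THEN Buys_computer = yes

covered = [t for t in data if antecedent(t)]
n_covers = len(covered)
n_correct = sum(1 for t in covered if t[-1] == consequent)
print("coverage:", n_covers / len(data))    # 2/14 ≈ 0.143
print("accuracy:", n_correct / n_covers)    # 2/2 = 1.0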

Naïve Bayesian Classification: Example
RID  age          income  student  credit_rating  class: Buys_computer
1    youth        high    no       fair           no
2    youth        high    no       excellent      no
8    youth        medium  no       fair           no
9    youth        low     yes      fair           yes
11   youth        medium  yes      excellent      yes
12   middle_aged  medium  no       excellent      yes
13   middle_aged  high    yes      fair           yes
14   senior       medium  no       excellent      no
Data Mining: Classification and Prediction 57

Rule-Based Classification
•If a rule is satisfied by X, the rule is said to be triggered.
X = (age = youth, income = medium, student = yes, credit_rating = fair)
•X satisfies the rule R1, which triggers the rule.
•If R1 is the only rule satisfied, then the rule fires by returning the class prediction for X.
•If more than one rule is triggered, we need a conflict resolution strategy.
•Size ordering: assigns the highest priority to the triggering rule that has the “toughest”
requirements
•Rule ordering: prioritizes the rules beforehand. The ordering may be class-based or rule-
based.
•Class-based ordering: the classes are sorted in order of decreasing “importance”
•Rule-based ordering, the rules are organized into one long priority list
Data Mining: Classification and Prediction 58

Rule-Based Classification
•Extracting rules from a decision tree
•One rule is created for each path from the root to a leaf node.
•Each splitting criterion along a given path is logically ANDed to form the rule
antecedent (“IF” part).
•The leaf node holds the class prediction, forming the rule consequent (“THEN”
part).
Data Mining: Classification and Prediction 59

Rule-Based Classification
•Extracting rules from a decision tree
Data Mining: Classification and Prediction 60
Fig. Decision tree for Buys_computer: root node age? with branches youth → student?, middle_aged → yes, and senior → credit_rating?; the student? node splits on no/yes and the credit_rating? node splits on fair/excellent, each leading to class leaves.

Rule-Based Classification
•Extracted rules from a decision tree are
R1: IF age = senior AND credit_rating = excellent THEN Buys_computer = yes
R2: IF age = senior AND credit_rating = fair THEN Buys_computer = no
R3: IF age = middle_aged THEN Buys_computer = yes
R4: IF age = youth AND student = yes THEN Buys_computer = yes
R5: IF age = youth AND student = no THEN Buys_computer = no
Data Mining: Classification and Prediction 61

Data Mining: Classification and Prediction 62
Extract the classification rules from the given decision tree

Data Mining: Classification and Prediction 63
X=(Color=Yellow, Type = SUV, Origin = Imported)

Prediction
•Numeric prediction is the task of predicting continuous (or ordered) values for
given input.
•Widely used approach for numeric prediction is regression.
•Regression is used to model the relationship between one or more independent or predictor variables and a dependent or response variable.
•The predictor variables are the attributes of interest describing the tuple.
•The response variable is what we want to predict.
Data Mining: Classification and Prediction 64
Example: in X = (age = youth, income = medium, student = yes, credit_rating = fair, Buys_computer = ?), the first four attributes are the predictor variables and Buys_computer is the response variable.

Prediction: Linear Regression
•Straight-line regression analysis involves a response variable, $y$, and a single predictor variable, $x$.
•It is the simplest regression technique, modeling $y$ as a linear function of $x$:
$y = b + wx$
•$b$ and $w$ are regression coefficients specifying the Y-intercept and slope of the line.
•The coefficients can also be thought of as weights:
$y = w_0 + w_1 x$
•These coefficients can be solved for by the method of least squares, which estimates the best-fitting straight line as the one that minimizes the error between the actual data and the estimate of the line.
Data Mining: Classification and Prediction 65

Prediction: Linear Regression
•The regression coefficients can be estimated
$w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}$
$w_0 = \bar{y} - w_1 \bar{x}$
Data Mining: Classification and Prediction 66

Prediction: Linear Regression
Age (x)   Avg. amount spent on medical expenses (per month in Rs.) (y)
15        100
20        135
25        135
37        150
40        250
45        270
48        290
50        360
55        375
61        400
64        500
67        1000
70        1500
Data Mining: Classification and Prediction 67
$\bar{x} = 45.92$, $\bar{y} = 420.38$
The regression coefficients are
$w_1 = 16.89$
$w_0 = -355.32$
The equation of the least squares (best-fitting) line is
$y = -355.32 + 16.89x$
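As a check, the least squares formulas from the previous slide can be applied to this table in a few lines of Python; the printed values may differ slightly from the slide's rounding.

# Least squares estimates w1 and w0 for the age vs. medical-expense data.
xs = [15, 20, 25, 37, 40, 45, 48, 50, 55, 61, 64, 67, 70]
ys = [100, 135, 135, 150, 250, 270, 290, 360, 375, 400, 500, 1000, 1500]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
w0 = y_bar - w1 * x_bar
print(round(w1, 2), round(w0, 2))   # roughly 16.89 and -355.3, as on the slide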

Prediction: Linear Regression
Data Mining: Classification and Prediction 68
Fig. Scatter plot of age (x) against average amount spent on medical expenses per month in Rs. (y), with the fitted least squares line y = 16.891x − 355.32.

Classifier Accuracy Measures
Data Mining: Classification and Prediction 69
•Confusion Matrix:
•Given m classes, a confusion matrix is a table of at least size m by m
•where the entry in row i and column j shows the number of tuples of class i that were labeled by the classifier as class j.

                  Class – Low   Class – Medium   Class – High
Class – Low           250             10               0
Class – Medium         10            440              10
Class – High            0             10             270
(Rows: actual class; columns: predicted class.)
Data Mining: Classification and Prediction 70
1000 tuples
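A short Python sketch computing the overall recognition rate from this confusion matrix: the correctly classified tuples lie on the diagonal.

# Overall accuracy from the Low/Medium/High confusion matrix (1000 tuples in total).
matrix = [
    [250, 10,  0],   # actual Low
    [10, 440, 10],   # actual Medium
    [0,  10, 270],   # actual High
]
correct = sum(matrix[i][i] for i in range(len(matrix)))
total = sum(sum(row) for row in matrix)
print(correct / total)   # 960/1000 = 0.96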

Classifier Accuracy Measures
Data Mining: Classification and Prediction 71
•Classifier Accuracy
•The percentage of test set tuples that are correctly classified by the classifier.
•Also referred to as the overall recognition rate of the classifier.
•Error Measure
•An error rate or misclassification rate of a classifier M, which is simply
$1 - Acc(M)$
where $Acc(M)$ is the accuracy of M.

Classifier Accuracy Measures
Data Mining: Classification and Prediction 72
•Confusion Matrix: Given 2 classes
•Positive tuples:
•tuples of the main class of interest
•Negative tuples: tuples of all the other classes
•True Positive:
•The positive tuples that were correctly labeled by the classifier
•True negatives
•The negative tuples that were correctly labeled by the classifier
•False positives
•The negative tuples that were incorrectly labeled as positive
•False negatives
•The positive tuples that were incorrectly labeled as negative

Classifier Accuracy Measures
Data Mining: Classification and Prediction 73
•We would like to be able to assess how well the classifier can recognize the positive tuples and how well it can recognize the negative tuples.
•Sensitivity (true positive (recognition) rate)
•The proportion of positive tuples that are correctly identified.
$sensitivity = \frac{t\_pos}{pos}$
•Specificity (true negative rate)
•The proportion of negative tuples that are correctly identified.
$specificity = \frac{t\_neg}{neg}$
•Precision
$precision = \frac{t\_pos}{t\_pos + f\_pos}$

Classifier Accuracy Measures
Data Mining: Classification and Prediction 74
•It can be shown that accuracy is a function of sensitivity and specificity.
accuracy = sensitivity × pos / (pos + neg) + specificity × neg / (pos + neg)
accuracy = (t_pos + t_neg) / (total no. of tuples)
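As a small illustration of these measures, the following Python sketch computes sensitivity, specificity, precision, and accuracy from the four counts of a two-class confusion matrix; the counts used in the example are hypothetical.

def sensitivity(t_pos, pos):
    # True positive (recognition) rate: fraction of positive tuples correctly identified
    return t_pos / pos

def specificity(t_neg, neg):
    # True negative rate: fraction of negative tuples correctly identified
    return t_neg / neg

def precision(t_pos, f_pos):
    # Fraction of tuples labeled positive that really are positive
    return t_pos / (t_pos + f_pos)

def accuracy(t_pos, t_neg, pos, neg):
    # Equals sensitivity*(pos/(pos+neg)) + specificity*(neg/(pos+neg))
    return (t_pos + t_neg) / (pos + neg)

# Hypothetical counts from a two-class test set
t_pos, f_neg, t_neg, f_pos = 90, 10, 880, 20
pos, neg = t_pos + f_neg, t_neg + f_pos
print(sensitivity(t_pos, pos), specificity(t_neg, neg),
      precision(t_pos, f_pos), accuracy(t_pos, t_neg, pos, neg))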

Predictor Accuracy Measures
Data Mining: Classification and Prediction 75
•Instead of focusing on whether the predicted value y'_i is an “exact” match with the actual
value y_i, we check how far off the predicted value is from the actual known value.
•Loss functions measure the error between the actual value y_i and the predicted value y'_i.
absolute error: |y_i − y'_i|
squared error: (y_i − y'_i)²
•The test error (rate), or generalization error, is the average loss over the test set.
mean absolute error = ( Σ_{i=1..d} |y_i − y'_i| ) / d
mean squared error = ( Σ_{i=1..d} (y_i − y'_i)² ) / d
•If we were to take the square root of the mean squared error, the resulting error measure
is called the root mean squared error.
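A minimal Python sketch of these predictor error measures, using hypothetical actual and predicted value lists:

from math import sqrt

def mean_absolute_error(y, y_pred):
    # Average absolute difference between actual and predicted values
    return sum(abs(a - p) for a, p in zip(y, y_pred)) / len(y)

def mean_squared_error(y, y_pred):
    # Average squared difference; emphasizes large deviations
    return sum((a - p) ** 2 for a, p in zip(y, y_pred)) / len(y)

def root_mean_squared_error(y, y_pred):
    return sqrt(mean_squared_error(y, y_pred))

# Hypothetical actual vs. predicted values
y = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
print(mean_absolute_error(y, y_pred),
      mean_squared_error(y, y_pred),
      root_mean_squared_error(y, y_pred))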

Predictor Accuracy Measures
Data Mining: Classification and Prediction 76
•Relative measures of error include
relative absolute error = Σ_{i=1..d} |y_i − y'_i| / Σ_{i=1..d} |y_i − ȳ|
relative squared error = Σ_{i=1..d} (y_i − y'_i)² / Σ_{i=1..d} (y_i − ȳ)²
where ȳ is the mean of the actual values y_1, ..., y_d.
•We can take the root of the relative squared error to obtain the root relative squared
error so that the resulting error is of the same magnitude as the quantity predicted.
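The relative measures can be sketched the same way; as above, the value lists are hypothetical and ȳ is the mean of the actual values.

from math import sqrt

def relative_absolute_error(y, y_pred):
    y_mean = sum(y) / len(y)
    return (sum(abs(a - p) for a, p in zip(y, y_pred)) /
            sum(abs(a - y_mean) for a in y))

def relative_squared_error(y, y_pred):
    y_mean = sum(y) / len(y)
    return (sum((a - p) ** 2 for a, p in zip(y, y_pred)) /
            sum((a - y_mean) ** 2 for a in y))

def root_relative_squared_error(y, y_pred):
    # Same magnitude as the quantity being predicted
    return sqrt(relative_squared_error(y, y_pred))

y, y_pred = [3.0, 5.0, 2.5, 7.0], [2.5, 5.0, 3.0, 8.0]
print(relative_absolute_error(y, y_pred), root_relative_squared_error(y, y_pred))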

Accuracy Measures
Data Mining: Classification and Prediction 77
•Evaluating the Accuracy of a Classifier or Predictor
Holdout
Random Subsampling
Cross Validation
Bootstrap

Accuracy Measures
Data Mining: Classification and Prediction 78
•Holdout
•The given data are randomly partitioned into two independent sets, a training set and a
test set.
•Typically, two-thirds of the data are allocated to the training set, and the remaining one-third is
allocated to the test set.
[Diagram: the data are split into a training set, used to derive the model, and a test set, used to estimate its accuracy.]
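A minimal Python sketch of the holdout split; the data set and the 2/3 training fraction below are illustrative.

import random

def holdout_split(data, train_fraction=2/3, seed=None):
    # Randomly partition the data into independent training and test sets
    rng = random.Random(seed)
    shuffled = data[:]          # copy so the original order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data set of (features, label) tuples
data = [((i, i % 3), "yes" if i % 2 else "no") for i in range(30)]
train_set, test_set = holdout_split(data, seed=42)
print(len(train_set), len(test_set))   # roughly a 2/3 : 1/3 split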

Accuracy Measures
Data Mining: Classification and Prediction 79
•Random Subsampling
•A variation of the holdout method in which the holdout method is repeated k times.
•The overall accuracy estimate is taken as the average of the accuracies obtained from
each iteration
[Diagram: the holdout procedure (split into training set and test set, derive model, estimate accuracy) is repeated in iterations 1, 2, ..., k.]
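Random subsampling can be sketched as repeated holdout; train_and_score below is a hypothetical caller-supplied helper that derives a model on the training set and returns its accuracy on the test set.

import random

def random_subsampling(data, k, train_and_score, train_fraction=2/3, seed=0):
    # Repeat the holdout method k times and average the accuracy estimates
    accuracies = []
    for i in range(k):
        rng = random.Random(seed + i)
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        accuracies.append(train_and_score(shuffled[:cut], shuffled[cut:]))
    return sum(accuracies) / k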

Accuracy Measures
Data Mining: Classification and Prediction 80
•Cross Validation
•The data are partitioned into k mutually exclusive folds, i.e. k data partitions D_1, D_2, ..., D_k, each of approximately equal size.
[Diagram: iterations 1 through k; in iteration i, fold D_i serves as the test set while the remaining folds together form the training set.]

Accuracy Measures
Data Mining: Classification and Prediction 81
•Cross Validation
•Each sample is used the same number of times for training and once for testing.
•For Classification, the accuracy estimate is the overall number of correct classifications
from the k iterations, divided by the total number of tuples in the initial data.
•For Prediction, the error estimate can be computed as the total loss from the k iterations,
divided by the total number of initial tuples.
•Leave-one-out
•k is set to the number of initial tuples. So, only one sample is “left out” at a time for the test set.
•Stratified cross-validation
•The folds are stratified so that the class distribution of the tuples in each fold is approximately
the same as that in the initial data
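A minimal k-fold cross-validation sketch following this description; train_and_count_correct is a hypothetical helper that derives a model on the training folds and returns the number of correct classifications on the test fold (stratification is not shown).

import random

def k_fold_cross_validation(data, k, train_and_count_correct, seed=0):
    # Shuffle once and split into k mutually exclusive folds of roughly equal size
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    correct = 0
    for i in range(k):
        test_set = folds[i]    # fold i is held out for testing in iteration i
        train_set = [t for j, fold in enumerate(folds) if j != i for t in fold]
        correct += train_and_count_correct(train_set, test_set)
    # Overall correct classifications from the k iterations / total number of tuples
    return correct / len(data)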

Accuracy Measures
Data Mining: Classification and Prediction 82
•Bootstrap
•The bootstrap method samples the given training tuples uniformly with replacement.
•i.e. each time a tuple is selected, it is equally likely to be selected again and re-added to the
training set.
•.632 Bootstrap
•On average, 63.2% of the original data tuples will end up in the bootstrap sample, and the remaining
36.8% will form the test set.
•Each tuple has a probability of 1/d of being selected, so the probability of not being chosen is (1 − 1/d).
•We have to select d times, so the probability that a tuple will not be chosen during this whole time is (1 − 1/d)^d.
•If d is large, this probability approaches e^(−1) ≈ 0.368.
•Thus, 36.8% of tuples will not be selected for training and thereby end up in the test set, and the
remaining 63.2% will form the training set.
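A quick numeric check of this limit in plain Python:

from math import exp

for d in (10, 100, 1000, 100000):
    print(d, (1 - 1/d) ** d)    # approaches e**-1 as d grows
print(exp(-1))                  # about 0.368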

Accuracy Measures
Data Mining: Classification and Prediction 83
•Bootstrap
•.632 Bootstrap
•Repeat the sampling procedure &#3627408472;times, where in each iteration, we use the current test set to
obtain an accuracy estimate of the model obtained from the current bootstrap sample.
•The overall accuracy of the model is
Acc(M) = Σ_{i=1..k} ( 0.632 × Acc(M_i)_test_set + 0.368 × Acc(M_i)_train_set )
where Acc(M_i)_test_set is the accuracy of the model obtained with bootstrap sample i when applied to test set i, and Acc(M_i)_train_set is its accuracy when applied to the original set of data tuples.
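A minimal sketch of the .632 bootstrap along these lines; train_and_score is a hypothetical helper that derives a model on its first argument and returns the model's accuracy on its second, and the k per-iteration estimates are averaged here (the slide writes the combination as a sum over the k iterations).

import random

def bootstrap_632(data, k, train_and_score, seed=0):
    rng = random.Random(seed)
    d = len(data)
    total = 0.0
    for _ in range(k):
        idx = [rng.randrange(d) for _ in range(d)]   # draw d tuples with replacement
        chosen = set(idx)
        train_set = [data[i] for i in idx]
        test_set = [data[i] for i in range(d) if i not in chosen]
        # Combine accuracy on the never-drawn tuples with accuracy on the original data
        total += (0.632 * train_and_score(train_set, test_set) +
                  0.368 * train_and_score(train_set, data))
    return total / k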