Decision Tree machine learning classification .pptx
asmaashalma456
About This Presentation
Machine learning
Size: 1.29 MB
Language: en
Added: Sep 21, 2024
Slides: 30
Slide Content
Decision Tree
Agenda Introduction to classification. Introduction to decision tree. Design issues. References.
Introduction To Classification
Introduction To Classification Classification is the task of assigning objects to one of several predefined classes. The set of records available for developing classification methods is divided into two subsets: a training set and a test set. The training set is used to build the model, and the test set is used to validate it.
Introduction To Classification Training Phase
Introduction To Classification Classification Phase
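As a rough sketch of these two phases (not part of the slides), assuming scikit-learn is available and using a made-up toy feature matrix X and label vector y:

```python
# Minimal sketch of the training and classification phases (assumes scikit-learn).
# X, y and the 70/30 split ratio are hypothetical, not taken from the slides.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X = [[25, 0], [40, 1], [35, 0], [50, 1], [23, 1], [60, 0]]  # toy records
y = [0, 1, 0, 1, 0, 1]                                      # class labels

# Split the available records into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = DecisionTreeClassifier()        # training phase: build the model
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # classification phase: validate on the test set
print("test accuracy:", accuracy)
```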
Introduction To Classification Evaluation of classification models is based on counts of test records that are correctly (or incorrectly) predicted by the classification model, summarized in a confusion matrix:

                     Predicted Class = 1   Predicted Class = 0
  Actual Class = 1          f11                   f10
  Actual Class = 0          f01                   f00
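A minimal sketch of how the four counts f11, f10, f01, f00 could be tallied for a binary problem; the actual and predicted label vectors below are hypothetical:

```python
# Tally the confusion-matrix counts f11, f10, f01, f00 for a binary problem.
# The actual/predicted vectors below are made-up examples.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

f11 = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # actual 1, predicted 1
f10 = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # actual 1, predicted 0
f01 = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # actual 0, predicted 1
f00 = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # actual 0, predicted 0

accuracy = (f11 + f00) / len(actual)  # fraction of correctly predicted records
print(f11, f10, f01, f00, accuracy)
```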
Decision Tree
Introduction To Decision Tree A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. A decision tree is constructed with a top-down strategy.
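To make the structure concrete, here is a small hand-built sketch; the attributes (outlook, humidity), outcomes, and class labels are illustrative, not taken from the slides. Internal nodes test an attribute, branches carry the test outcomes, and leaves hold class labels:

```python
# Hand-built decision tree: internal nodes test an attribute, branches are
# test outcomes, leaves hold class labels. All names/values are illustrative.
tree = {
    "attribute": "outlook",                # test at the root (internal node)
    "branches": {
        "sunny":    {"attribute": "humidity",
                     "branches": {"high":   {"label": "no"},   # leaf node
                                  "normal": {"label": "yes"}}},
        "overcast": {"label": "yes"},                           # leaf node
        "rain":     {"label": "no"},                            # leaf node
    },
}

def classify(node, record):
    """Follow the branch matching the record's attribute value until a leaf is reached."""
    while "label" not in node:
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node["label"]

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> "yes"
```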
Introduction To Decision Tree
Example of decision tree (worked example shown as figures across several slides)
Design issues How should the training records be split? How should the splitting procedure stop?
Methods for expressing attribute test conditions:
a. Binary attribute: generates two possible outcomes (binary split).
b. Nominal attribute: multiway split (one branch per distinct value), or binary split by grouping the values into two subsets (e.g., CART).
c. Ordinal attribute: multiway split, or binary split.
d. Continuous attribute: multiway split, or binary split.
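A brief sketch contrasting these test conditions on a toy record set; the attribute names (marital_status, income), the value grouping, and the threshold v are assumptions made for illustration:

```python
# Illustrative attribute test conditions; records, grouping, and threshold are made up.
from collections import defaultdict

records = [
    {"marital_status": "single",   "income": 70},
    {"marital_status": "married",  "income": 120},
    {"marital_status": "divorced", "income": 95},
    {"marital_status": "single",   "income": 60},
]

# Nominal attribute, multiway split: one branch per distinct value.
multiway = defaultdict(list)
for r in records:
    multiway[r["marital_status"]].append(r)

# Nominal attribute, binary split (CART-style): group values into two subsets.
left  = [r for r in records if r["marital_status"] in {"single", "divorced"}]
right = [r for r in records if r["marital_status"] == "married"]

# Continuous attribute, binary split: income < v versus income >= v.
v = 80
low  = [r for r in records if r["income"] < v]
high = [r for r in records if r["income"] >= v]

print(len(multiway), len(left), len(right), len(low), len(high))
```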
Classify the following attributes:
- Time in terms of AM or PM: binary, ordinal.
- Angles as measured in degrees between 0 and 360: continuous.
- Bronze, silver, and gold medals as awarded at the Olympics: discrete, ordinal.
- Number of patients in a hospital: discrete.
Attribute Selection Measures The attribute selection measure provides a ranking for each attribute describing the given training tuples. The attribute having the best score for the measure is chosen as the splitting attribute for the given tuples. Examples of attribute selection measures are information gain, gain ratio, and the Gini index.
Attribute Selection Measures (Information Gain) ID3 uses information gain as its attribute selection measure. The attribute with the highest information gain is chosen as the splitting attribute. This attribute minimizes the information needed to classify the tuples in the resulting partitions and reflects the least randomness or “impurity” in these partitions.
Attribute Selection Measures (Information Gain) The expected information (in bits) needed to classify a tuple in D, the entropy of D, is given by

Info(D) = - sum_{i=1..m} p_i * log2(p_i)

where p_i is the probability that a tuple in D belongs to class C_i. After splitting D on attribute A into v partitions D_1, ..., D_v, the information still needed is

Info_A(D) = sum_{j=1..v} (|D_j| / |D|) * Info(D_j)

and the final information gain of A is

Gain(A) = Info(D) - Info_A(D)
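These formulas can be translated almost directly into a short sketch, assuming class labels are stored as a plain list and a split on A is represented as a list of label sublists D_j:

```python
# Direct translation of Info(D), Info_A(D), and Gain(A); the data representation
# (a list of labels, and a list of label sublists for a split) is an assumption.
from math import log2
from collections import Counter

def info(labels):
    """Info(D) = -sum p_i * log2(p_i): expected bits needed to classify a tuple."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_after_split(partitions):
    """Info_A(D) = sum (|D_j| / |D|) * Info(D_j) over the partitions induced by A."""
    n = sum(len(p) for p in partitions)
    return sum((len(p) / n) * info(p) for p in partitions)

def gain(labels, partitions):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(labels) - info_after_split(partitions)

print(info(["a", "a", "b", "b"]))  # -> 1.0, maximum impurity for two equally likely classes
```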
Examples:
Splitting binary attributes (using Information Gain) Example: the data set D contains 6 records of class C0 (-) and 4 records of class C1 (+).

Info(D) = -(6/10) * log2(6/10) - (4/10) * log2(4/10) = 0.97

Suppose there are two ways (A and B) to split the data into smaller subsets.
Splitting binary attributes (using Information Gain) Attribute A splits D into a T branch with 7 records (4 +, 3 -) and an F branch with 3 records (3 -).

Gain(A) = Info(D) - Info_A(D) = 0.97 - 0.593 = 0.377
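The parent entropy Info(D) = 0.97 from this example can be checked in a few standalone lines; only the class counts 6 and 4 are taken from the slides:

```python
# Standalone check of Info(D) for 6 records of class C0 and 4 records of class C1.
from math import log2

p0, p1 = 6 / 10, 4 / 10
info_D = -(p0 * log2(p0) + p1 * log2(p1))
print(round(info_D, 2))  # -> 0.97
```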
References
Jerzy W. Grzymala-Busse, “Selected Algorithms of Machine Learning from Examples”, Fundamenta Informaticae 18 (1993), 193–207.
Thair Nu Phyu, “Survey of Classification Techniques in Data Mining”, Proceedings of the International MultiConference of Engineers and Computer Scientists, Vol. I, IMECS 2009, 18–20 March 2009, Hong Kong.
J. Han and M. Kamber, Data Mining: Concepts and Techniques, Third Edition, Morgan Kaufmann, 2011.