What is a Decision Tree?
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that contains only conditional control statements. In machine learning, a decision tree is a non-parametric supervised learning algorithm that can be used for both classification and regression tasks. It has a hierarchical tree structure consisting of a root node, branches, internal nodes, and leaf nodes, and the resulting models are easy to understand, interpret, and implement.
It is a tool with applications spanning several different areas. A decision tree uses a flowchart-like tree structure to show the predictions that result from a series of feature-based splits. It starts with a root node and ends with decisions made at the leaves.
Decision Tree Terminologies
Root Node: The initial node at the top of a decision tree, where the entire population or dataset starts dividing based on various features or conditions.
Decision Nodes: Nodes that result from splitting the root node (or other internal nodes). They represent intermediate decisions or conditions within the tree.
Leaf Nodes: Nodes where further splitting is not possible, usually indicating the final classification or outcome. Leaf nodes are also referred to as terminal nodes.
Sub-Tree: Just as a subsection of a graph is called a sub-graph, a subsection of a decision tree is referred to as a sub-tree. It represents a specific portion of the decision tree.
Pruning: The process of removing or cutting down specific nodes in a decision tree to prevent overfitting and simplify the model.
Branch / Sub-Tree: A subsection of the entire decision tree is referred to as a branch or sub-tree. It represents a specific path of decisions and outcomes within the tree.
Parent and Child Node: A node that is divided into sub-nodes is known as a parent node, and the sub-nodes emerging from it are referred to as child nodes. The parent node represents a decision or condition, while the child nodes represent the possible outcomes or further decisions based on that condition.
Let’s consider the following classification example. Using this example data we will predict whether a person is going to be an astronaut, depending on their age, whether they like dogs, and whether they like gravity. We can follow the paths to come to a decision. For example, we can see that a person who doesn’t like gravity is not going to be an astronaut, independent of the other features. On the other hand, we can also see that a person who likes gravity and likes dogs is going to be an astronaut, independent of their age.
Before discussing how to construct a decision tree, let’s have a look at the resulting decision tree for our example data.
Root Node: The top-level node, where the first decision is taken. In our example the root node is ‘likes gravity’.
Branches: Branches represent sub-trees. Our example has two branches: one is the sub-tree starting at ‘likes dogs’, the other the sub-tree starting at ‘age < 40.5’.
Node: A node represents a split into further (child) nodes. In our example the nodes are ‘likes gravity’, ‘likes dogs’, and ‘age < 40.5’.
Leaf: Leaves are at the end of the branches, i.e. they don’t split any further. They represent the possible outcomes for each path. In our example the leaves are ‘yes’ and ‘no’.
Parent Node: A node that precedes a (child) node is called a parent node. In our example ‘likes gravity’ is the parent node of ‘likes dogs’, and ‘likes dogs’ is the parent node of ‘age < 40.5’.
Child Node: A node below another node is a child node. In our example ‘likes dogs’ is a child node of ‘likes gravity’, and ‘age < 40.5’ is a child node of ‘likes dogs’.
Splitting: The process of dividing a node into two (child) nodes.
Pruning: Removing the (child) nodes of a parent node is called pruning. A tree is grown through splitting and shrunk through pruning. In our example, if we removed the node ‘age < 40.5’, we would prune the tree.
We can also observe that a decision tree allows us to mix data types: we can use numerical data (‘age’) and categorical data (‘likes dogs’, ‘likes gravity’) in the same tree.
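To make this structure concrete, here is a minimal Python sketch of the decision logic of the example tree described above. The function name and the dictionary-based input are illustrative choices, and the leaf assignment of the ‘age < 40.5’ split is assumed for the sake of the example.

```python
def will_be_astronaut(person):
    """Follow the example tree: root 'likes gravity', then 'likes dogs', then 'age < 40.5'."""
    if not person["likes_gravity"]:   # root node: 'likes gravity'
        return "no"                   # leaf: doesn't like gravity -> not an astronaut
    if person["likes_dogs"]:          # decision node: 'likes dogs'
        return "yes"                  # leaf: likes gravity and dogs -> astronaut, regardless of age
    # decision node: 'age < 40.5' (which leaf is 'yes' is assumed here for illustration)
    return "yes" if person["age"] < 40.5 else "no"

# Example usage with a hypothetical person
print(will_be_astronaut({"likes_gravity": True, "likes_dogs": False, "age": 35}))
```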
Advantages and Disadvantages of Decision Trees
Pros:
Decision trees are intuitive and easy to understand and interpret.
Decision trees are not strongly affected by outliers and missing values.
The data doesn’t need to be scaled.
Numerical and categorical data can be combined.
Decision trees are non-parametric algorithms.
Cons:
Overfitting is a common problem; pruning may help to overcome it.
Although decision trees can be used for regression problems, they cannot truly predict continuous variables, since the predictions are restricted to a finite set of values learned during training.
Training a decision tree is relatively expensive.
So far we have discussed what a decision tree looks like and how it can be used to make predictions. A crucial step in creating a decision tree is to find the best split of the data into two subsets. A common way to do this is the Gini Impurity, which is also used in Python’s scikit-learn library, often used in practice to build decision trees. It is important to keep in mind the limitations of decision trees, of which the most prominent one is the tendency to overfit.
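As a rough sketch of what this looks like in practice with scikit-learn (the encoded feature values and labels below are illustrative stand-ins, not the original example data):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical encoding of the astronaut example: [likes_gravity, likes_dogs, age]
X = [[1, 1, 24], [0, 1, 30], [1, 0, 36], [1, 1, 55], [0, 0, 42], [1, 0, 44]]
y = ["yes", "no", "yes", "yes", "no", "no"]  # illustrative labels only

# scikit-learn uses the Gini Impurity as its default splitting criterion
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X, y)

# Inspect the learned splits as text
print(export_text(clf, feature_names=["likes_gravity", "likes_dogs", "age"]))
```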
Create a Decision Tree
The most important step in creating a decision tree is the splitting of the data. We need to find a way to split the data set D into two data sets D_1 and D_2. There are different criteria that can be used to find the next split. We will concentrate on one of them: the Gini Impurity, which is a criterion for categorical target variables and also the criterion used by the Python library scikit-learn.
Gini Impurity
The Gini Impurity for a data set D is calculated as follows:

Gini(D) = 1 - (p_1^2 + p_2^2 + ... + p_c^2)

and the Gini Impurity of a split is

Gini_split = (n_1/n) * Gini(D_1) + (n_2/n) * Gini(D_2),

with n = n_1 + n_2 the size of the data set D, D_1 and D_2 subsets of D, p_j the probability of samples belonging to class j at a given node, and c the number of classes. The lower the Gini Impurity, the higher the homogeneity of the node. The Gini Impurity of a pure node is zero.
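As an illustrative calculation (not taken from the example data): a node containing three samples of one class and one sample of another has p_1 = 0.75 and p_2 = 0.25, so its Gini Impurity is 1 - (0.75^2 + 0.25^2) = 0.375, while a pure node containing samples of only one class has Gini Impurity 1 - 1^2 = 0.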
To split a decision tree using Gini Impurity, the following steps need to be performed:
1. For each possible split, calculate the Gini Impurity of each child node.
2. Calculate the Gini Impurity of each split as the weighted average Gini Impurity of its child nodes.
3. Select the split with the lowest value of Gini Impurity.
4. Repeat steps 1–3 until no further split is possible.
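The following Python sketch implements these steps for a single binary split; the function names and the small label lists are illustrative, not part of the original example.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini Impurity of a node: 1 - sum of p_j^2 over the classes present in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left_labels, right_labels):
    """Gini Impurity of a split: the weighted average of the two child nodes."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini_impurity(left_labels) \
         + (len(right_labels) / n) * gini_impurity(right_labels)

# Illustrative usage: compare two candidate splits and keep the one with the lower value
split_a = weighted_gini(["yes", "yes", "no"], ["no", "no"])
split_b = weighted_gini(["yes", "no", "no"], ["yes", "no"])
print(min(("split_a", split_a), ("split_b", split_b), key=lambda t: t[1]))
```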
Example: Decision Tree with two binary features
Before creating the decision tree for our entire dataset, we will first consider a subset that only contains two features: ‘likes gravity’ and ‘likes dogs’. The first thing we have to decide is which feature is going to be the root node. We do that by predicting the target with only one of the features at a time and then using the feature with the lowest Gini Impurity as the root node. That is, in our case we build two shallow trees, each with just the root node and two leaves. In the first case we use ‘likes gravity’ as the root node and in the second case ‘likes dogs’. We then calculate the Gini Impurity for both.
The trees look like this:
The Gini Impurity for these trees is calculated as follows.
Case 1 (‘likes gravity’ as root node): we calculate the Gini Impurity of each of the two resulting subsets, and the Gini Impurity of the split is the weighted mean of both.
Case 2 (‘likes dogs’ as root node): again, we calculate the Gini Impurity of each of the two resulting subsets and take the weighted mean of both.
The first case has the lower Gini Impurity and is therefore the chosen split. In this simple example, only one feature then remains, and we can build the final decision tree.
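The same selection of the root node can be sketched in a few lines of Python. The records below are hypothetical stand-ins (the original table is not reproduced here); with this toy data the procedure happens to pick ‘likes gravity’, matching Case 1 above.

```python
from collections import Counter

def gini(labels):
    """Gini Impurity of a single node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Hypothetical records: (likes_gravity, likes_dogs, is_astronaut) -- illustrative stand-ins
records = [(1, 1, "yes"), (1, 0, "yes"), (1, 1, "yes"), (0, 1, "no"), (0, 0, "no"), (1, 0, "no")]

best = None
for idx, name in [(0, "likes gravity"), (1, "likes dogs")]:
    left = [r[2] for r in records if r[idx] == 1]    # child node where the feature is true
    right = [r[2] for r in records if r[idx] == 0]   # child node where the feature is false
    split = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
    print(f"{name}: weighted Gini = {split:.3f}")
    if best is None or split < best[1]:
        best = (name, split)

print("chosen root node:", best[0])
```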