Tree-Based Model

Introduction

Tree-based machine learning methods are among the most commonly used supervised learning methods. They are built from two entities: nodes and branches. A tree is constructed by recursively splitting a training sample, choosing at each node the feature (and threshold) that splits the data most effectively. The splits are simple decision rules inferred from the training data.
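The recursive splitting step can be sketched in a few lines of Python. This is a minimal, illustrative implementation (the toy dataset and helper names are not from any library): it scans candidate thresholds on a single feature and picks the one that minimizes the weighted Gini impurity of the two resulting subsets.

```python
# Minimal sketch of one splitting step: choose the threshold that
# minimizes weighted Gini impurity. Data and names are illustrative.

def gini(labels):
    """Gini impurity of a list of binary class labels (0/1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n          # fraction of class 1
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_split(xs, ys):
    """Return (threshold, weighted_gini) of the best split x <= t."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left  = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

# Toy data: one feature, binary labels, cleanly separable.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # → (3.0, 0.0): a pure split at x <= 3
```

A full tree-builder would apply `best_split` recursively to each resulting subset until a stopping criterion (depth, purity, minimum samples) is met.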

Common Terminology
i) Root node — represents the entire population or sample, which gets divided into two or more homogeneous subsets.
ii) Splitting — dividing a node into two or more sub-nodes.
iii) Decision node — a sub-node that splits into further sub-nodes.
iv) Leaf/Terminal node — the final node, which provides the model output; it cannot be split further.
v) Pruning — removing unnecessary sub-nodes of a decision node to combat overfitting.
vi) Branch/Sub-tree — a sub-section of the entire tree.
vii) Parent and Child node — a node that is divided into sub-nodes is the parent; its sub-nodes are the children.
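To make the terminology concrete, here is a tiny hand-built tree in Python. The nested-dict representation is purely illustrative (it is not any library's internal format): internal nodes hold a rule and two children, leaves hold a prediction.

```python
# A tiny hand-built tree illustrating the terminology above.
# Illustrative structure only, not a library's internal representation.

tree = {                                  # root node
    "rule": "x <= 5",
    "left": {                             # decision node (parent of two leaves)
        "rule": "x <= 2",
        "left":  {"predict": "A"},        # leaf/terminal node (child)
        "right": {"predict": "B"},        # leaf/terminal node (child)
    },
    "right": {"predict": "C"},            # leaf/terminal node
}

def count_leaves(node):
    """Leaves are the nodes with no further splits."""
    if "predict" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

print(count_leaves(tree))  # → 3
```

Pruning, in these terms, would replace a decision node (and its sub-tree) with a single leaf.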
 

Types of Tree-Based Models

Tree-based machine learning models are a category of algorithms that make decisions by recursively partitioning the input space into regions. Some common types of tree-based models include:
  1. Decision Trees:
      • Overview: Decision trees are a fundamental type of tree-based model that makes decisions based on a series of if-else conditions. Each internal node represents a decision based on a feature, and each leaf node represents the output.
      • Applications: Decision trees are versatile and can be used for both classification and regression tasks.
  2. Random Forest:
      • Overview: Random Forest is an ensemble learning method that constructs a multitude of decision trees during training and outputs the average prediction (for regression tasks) or the majority vote (for classification tasks) of the individual trees.
      • Applications: Random Forest is effective in reducing overfitting and improving accuracy.
  3. Gradient Boosting Machines (GBM):
      • Overview: GBM is another ensemble method that builds trees sequentially, with each tree compensating for the errors of the previous ones. It combines weak learners to create a strong predictive model.
      • Applications: GBM is widely used for both regression and classification tasks and is known for its high predictive power.
  4. XGBoost (Extreme Gradient Boosting):
      • Overview: XGBoost is an optimized and efficient implementation of gradient boosting. It incorporates regularization techniques, parallel processing, and tree pruning to enhance performance.
      • Applications: XGBoost is commonly used in machine learning competitions and real-world applications due to its speed and accuracy.
  5. LightGBM:
      • Overview: LightGBM is a gradient boosting framework that uses a tree-based learning algorithm. It is designed for distributed, efficient training and can handle large datasets.
      • Applications: LightGBM is suitable for large-scale machine learning tasks and is particularly efficient in scenarios with high dimensionality.
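The bagging-and-voting idea behind Random Forest can be sketched in a few lines of Python. This is a deliberately simplified illustration (the data are toy values, and each "tree" is a one-split stump rather than a full decision tree): every stump is trained on a bootstrap resample, and the forest predicts by majority vote.

```python
import random

# Sketch of the ensemble idea behind Random Forest: many trees, each
# trained on a bootstrap resample, combined by majority vote.
# Each "tree" here is a one-split stump; data are illustrative.

random.seed(0)

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 1, 1, 1]

def fit_stump(xs, ys):
    """One-split 'tree': pick the threshold minimizing misclassifications,
    predicting class 0 on the left side and class 1 on the right."""
    best_t, best_err = xs[0], len(ys) + 1
    for t in set(xs):
        err = sum((x <= t) != (y == 0) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def predict(t, x):
    return 0 if x <= t else 1

def forest_predict(stumps, x):
    votes = [predict(t, x) for t in stumps]
    return max(set(votes), key=votes.count)   # majority vote

# Bagging: each stump sees a bootstrap resample of the training data.
stumps = []
for _ in range(25):
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

print(forest_predict(stumps, 1.0), forest_predict(stumps, 12.0))  # → 0 1
```

A real Random Forest additionally samples a random subset of features at each split, which further decorrelates the trees.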
 
Decision tree+Random forest
Ensemble Method
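The sequential error-correcting idea behind boosting ensembles (GBM and its descendants) can be sketched as follows. This is a toy illustration with squared-error loss: each new "tree" (here a constant-leaf stump at a fixed split point, chosen for brevity) fits the residuals of the ensemble so far, and its contribution is shrunk by a learning rate.

```python
# Sketch of gradient boosting for regression with squared error:
# each stump fits the residuals of the current ensemble.
# Data, split point, and learning rate are illustrative.

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]

def fit_stump(xs, res):
    """Split at a fixed midpoint; each side predicts its mean residual."""
    t = 3.5
    left  = [r for x, r in zip(xs, res) if x <= t]
    right = [r for x, r in zip(xs, res) if x > t]
    return t, sum(left) / len(left), sum(right) / len(right)

lr = 0.5                       # learning rate shrinks each tree's contribution
pred = [0.0] * len(xs)         # the ensemble starts from a zero prediction
stumps = []
for _ in range(10):
    res = [y - p for y, p in zip(ys, pred)]   # residuals of the ensemble so far
    t, lv, rv = fit_stump(xs, res)
    stumps.append((t, lv, rv))
    pred = [p + lr * (lv if x <= t else rv) for x, p in zip(xs, pred)]

print([round(p, 3) for p in pred])   # predictions approach [1, 1, 1, 3, 3, 3]
```

XGBoost and LightGBM follow the same residual-fitting loop but add regularization, second-order gradient information, and highly optimized tree construction.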