
LightGBM Explained: Objective Function, Split Finding, and Leaf-Wise Growth

  • Writer: Aryan
  • Sep 18
  • 4 min read

INTRODUCTION

 

XGBoost, while powerful, comes with certain limitations: computational complexity, memory inefficiency, and the need to encode categorical data before training. Like many traditional algorithms, it requires categorical features to be converted into numerical representations, and on very large datasets it consumes a lot of memory and can become relatively slow.

To address these challenges, newer gradient boosting frameworks were developed. One such framework is LightGBM, introduced by Microsoft’s DMTK team in 2016 (with wider adoption around 2017). LightGBM was specifically designed to improve the efficiency of gradient boosting by being faster, more memory-efficient, and better at handling categorical data.

LightGBM (Light Gradient Boosting Machine) is an open-source, high-performance gradient boosting framework optimized for both speed and accuracy.


Core Features of LightGBM:

  • High speed and memory efficiency

  • Strong predictive accuracy

  • Native support for categorical data (no manual encoding required)

  • Multi-language support (Python, R, C++, etc.)

  • GPU acceleration

  • Ability to handle missing values and sparse data

  • Custom loss function support

With the rapid growth of data due to the expansion of the internet, traditional machine learning algorithms struggled to maintain speed, memory efficiency, and accuracy on massive datasets. Similar problems were also observed in XGBoost. Recognizing these challenges, Microsoft developed LightGBM to overcome them.

Unlike XGBoost, LightGBM can directly handle categorical data without the need for manual encoding. The framework is intelligent enough to recognize categorical features and process them internally. This not only saves preprocessing effort but also improves performance.
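For instance, a categorical column can be handed to LightGBM as-is. The tiny dataset and column names below are made up purely for illustration; the key points are the pandas category dtype and the categorical_feature argument:

import lightgbm as lgb
import pandas as pd

# Toy data (hypothetical column names), repeated so LightGBM has something to split
df = pd.DataFrame({
    "age":    [22, 35, 47, 51, 29, 40] * 10,
    "city":   ["delhi", "mumbai", "delhi", "pune", "pune", "mumbai"] * 10,
    "bought": [0, 1, 1, 0, 1, 0] * 10,
})
df["city"] = df["city"].astype("category")   # no one-hot or label encoding needed

train_set = lgb.Dataset(
    df[["age", "city"]],
    label=df["bought"],
    categorical_feature=["city"],            # tell LightGBM which column is categorical
)

params = {"objective": "binary", "min_data_in_leaf": 5, "verbose": -1}
model = lgb.train(params, train_set, num_boost_round=10)
print(model.predict(df[["age", "city"]])[:5])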

In essence, LightGBM has all the features of XGBoost, along with significant improvements in speed, memory efficiency, and categorical data handling. Therefore, we can consider:

LightGBM ≥ XGBoost


BOOSTING AND OBJECTIVE FUNCTION

 

LightGBM is, after all, a gradient boosting machine learning algorithm, so the same simple logic of gradient boosting applies here.

In gradient boosting, we build models in a stage-wise manner. We start with a base model (usually the mean prediction), then train the first decision tree on the residual errors, and keep adding trees sequentially. Each new tree corrects the mistakes of the previous ones.

This principle is the same in Gradient Boosting, XGBoost, and LightGBM.
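The stage-wise loop described above can be sketched in a few lines. This is a simplified illustration of gradient boosting for squared error using scikit-learn trees, not LightGBM's actual implementation:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # stage 0: base model = mean prediction
trees = []

for _ in range(50):
    residuals = y - prediction           # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)               # each new tree learns the residuals
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("final training MSE:", np.mean((y - prediction) ** 2))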

Both XGBoost and LightGBM share the same objective function, which has two components:

 

  1. Loss function component (measuring prediction error)

  2. Regularization component (controlling model complexity)


Both frameworks use the same objective function and follow stage-wise additive modeling.
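Written out in the usual XGBoost-style notation (the standard way this shared objective is presented, not anything specific to this post), the objective at boosting round t is:

\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2

Here l is the loss function measuring prediction error, f_t is the tree added at round t, T is its number of leaves, and w_j are the leaf weights; the Ω term is the regularization component that penalizes overly complex trees.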

 

SPLIT FINDING

 

In boosting algorithms, the most computationally expensive task is building decision trees — specifically, finding the best split points for features.

To make a split:

  1. Data must first be sorted.

  2. For each feature, possible splits are evaluated.

  3. At each split, the gain (improvement in the objective function) is calculated.

  4. The split with the maximum gain is chosen.

This process is repeated for all nodes in the tree, which becomes time-consuming.

There are two main methods for split finding:

  1. Exact Greedy Method – tries all possible split points (very accurate but computationally heavy).

  2. Approximate Method – faster alternative. Instead of evaluating every split, it uses quantiles and bins.

In the approximate method:

  • The data is binned into histograms.

  • For each bin, the gradients and Hessians are computed.

  • Using these, the gain is calculated efficiently.

This histogram-based approach greatly speeds up tree building.
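A stripped-down sketch of this idea for a single feature is shown below. It uses the standard gain formula G_L²/(H_L+λ) + G_R²/(H_R+λ) − G²/(H+λ); the bin count, λ, and the random toy data are illustrative choices, not LightGBM's internals:

import numpy as np

def best_histogram_split(feature, grad, hess, n_bins=32, lam=1.0):
    """Find the best split of one feature from binned gradients/Hessians."""
    # 1. Bin the feature values (quantile-based histogram)
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(feature, edges)               # bin index for each sample

    # 2. Accumulate gradient and Hessian sums per bin
    G = np.bincount(bins, weights=grad, minlength=n_bins)
    H = np.bincount(bins, weights=hess, minlength=n_bins)

    # 3. Scan bin boundaries and compute the gain of each candidate split
    G_total, H_total = G.sum(), H.sum()
    best_gain, best_bin = -np.inf, None
    G_left = H_left = 0.0
    for b in range(n_bins - 1):
        G_left, H_left = G_left + G[b], H_left + H[b]
        G_right, H_right = G_total - G_left, H_total - H_left
        gain = (G_left**2 / (H_left + lam)
                + G_right**2 / (H_right + lam)
                - G_total**2 / (H_total + lam))
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain

# Toy usage with random gradients (Hessians = 1, as for squared error)
rng = np.random.default_rng(0)
x, g, h = rng.normal(size=1000), rng.normal(size=1000), np.ones(1000)
print(best_histogram_split(x, g, h))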

  • XGBoost supports both exact greedy and approximate split finding.

  • LightGBM, however, exclusively uses the histogram-based (approximate) method, which it calls Histogram-based Splitting.

This is one of the key differences between XGBoost and LightGBM — with LightGBM offering faster computation and better memory usage.


BEST FIT TREE (LEAF-WISE GROWTH STRATEGY)

 

When building decision trees, the way we grow the tree (i.e., decide which node to split next) has a huge impact on:

  • Accuracy

  • Speed of training

  • Memory efficiency

  • Risk of overfitting

Different algorithms adopt different tree growth strategies. There are three major ones:

  1. Depth-first growth

  2. Level-first (Breadth-first) growth

  3. Leaf-first (Best-first) growth

 

1. Depth-first Growth

  • Start with the root node.

  • At each step, split one side completely before moving to the other.

  • Conventionally, the algorithm starts with the left branch and keeps splitting until either:

    • The node is pure (all data in the node belongs to one class), or

    • The maximum depth is reached.

  • Once done, it backtracks and splits the remaining nodes.

Analogy: Imagine exploring a maze by always going left until you can’t go further, then coming back and exploring the right side.

Advantages: Simple, widely used (Decision Trees, XGBoost).

Drawback: It may spend too much time splitting unimportant branches before exploring better ones.
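To make the visiting order concrete, here is a tiny, purely illustrative sketch on a made-up tree of node labels (not any library's real implementation):

# Depth-first: recurse into the left child and finish it before touching the right
def depth_first(tree, node="root"):
    print("split:", node)
    for child in tree.get(node, []):      # left child is listed first
        depth_first(tree, child)

tree = {"root": ["L", "R"], "L": ["LL", "LR"], "R": ["RL", "RR"]}
depth_first(tree)   # Order: root, L, LL, LR, R, RL, RR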


2. Level-first (Breadth-first) Growth

  • Start from the root node and split it.

  • Then, at the next level, all nodes at the same depth are split before moving deeper.

  • For example:

    • Root → Left and Right nodes.

    • Both Left and Right are split before going further down.

Analogy: Like reading a book page by page, not skipping ahead.

Advantages: Produces balanced trees (all branches grow evenly).

Drawback: May waste computation on weak splits just because they’re at the same depth.
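The same toy tree from the previous sketch, visited level by level with a FIFO queue, gives a different split order:

from collections import deque

# Level-first (breadth-first): finish one depth level before going deeper
def level_first(tree, root="root"):
    queue = deque([root])
    while queue:
        node = queue.popleft()
        print("split:", node)
        queue.extend(tree.get(node, []))

tree = {"root": ["L", "R"], "L": ["LL", "LR"], "R": ["RL", "RR"]}
level_first(tree)   # Order: root, L, R, LL, LR, RL, RR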


3. Leaf-first (Best-first) Growth (LightGBM’s Strategy)

  • Start with the root split.

  • At each step, calculate the gain (improvement in the loss function) for all available leaves.

  • Pick the leaf with the highest gain and split it next.

  • Continue until stopping criteria (like max depth, min samples per leaf) are met.

Analogy: Imagine you’re repairing leaks in a pipe system. Instead of fixing them level by level, you always fix the biggest leak first. This way, overall water loss reduces much faster.

Advantages:

  • Much faster loss reduction: Because it always splits the most promising leaf.

  • Can achieve higher accuracy with fewer splits.

  • Saves computation by avoiding weak splits.

Drawbacks:

  • Produces unbalanced trees (some branches grow deep, others remain shallow).

  • Can overfit if not regularized properly (because it focuses too much on high-gain splits).
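Continuing the same toy tree, leaf-first growth keeps all open leaves in a priority queue and always splits the one with the highest gain. The gain numbers below are made up for illustration; real LightGBM computes them from gradients and Hessians and stops when limits such as num_leaves are reached:

import heapq

gain = {"root": 9.0, "L": 2.0, "R": 7.0, "LL": 0.5, "LR": 0.1, "RL": 4.0, "RR": 0.3}
children = {"root": ["L", "R"], "L": ["LL", "LR"], "R": ["RL", "RR"]}

heap = [(-gain["root"], "root")]              # negate gains to get a max-heap
while heap:
    neg_g, node = heapq.heappop(heap)
    print(f"split: {node} (gain={-neg_g})")
    for child in children.get(node, []):      # children become new open leaves
        heapq.heappush(heap, (-gain[child], child))
# Order: root, R, RL, L, LL, RR, LR -- the deep, high-gain branch grows first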


Why LightGBM Chooses Leaf-first Growth

 

  • LightGBM’s philosophy is “do more with less”.

  • By always splitting the most important leaf, it reduces the training loss quickly.

  • This makes LightGBM much faster than XGBoost, especially on large datasets.

  • To avoid overfitting, LightGBM introduces regularization parameters such as max_depth, min_data_in_leaf, and min_gain_to_split (a minimal example follows below).
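A minimal sketch of where these knobs live is shown below; the synthetic dataset and the specific values are arbitrary choices for illustration, with num_leaves added since it is the primary capacity control for leaf-wise trees:

import lightgbm as lgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

params = {
    "objective": "binary",
    "num_leaves": 31,           # main capacity control for leaf-wise growth
    "max_depth": 6,             # cap the depth of the unbalanced tree
    "min_data_in_leaf": 20,     # each leaf must cover enough samples
    "min_gain_to_split": 0.01,  # ignore splits with negligible gain
    "learning_rate": 0.1,
    "verbose": -1,
}

train_set = lgb.Dataset(X, label=y)
booster = lgb.train(params, train_set, num_boost_round=100)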


Comparison of Tree Growth Strategies

| Strategy    | How it Grows                                | Tree Shape    | Pros                          | Cons                                | Used In                 |
|-------------|---------------------------------------------|---------------|-------------------------------|-------------------------------------|-------------------------|
| Depth-first | Splits one branch fully before backtracking | Can be skewed | Simple, common                | Wastes time on weak splits          | Decision Trees, XGBoost |
| Level-first | Splits all nodes at the same depth          | Balanced      | Easy to control               | May split weak nodes unnecessarily  | Random Forests, XGBoost |
| Leaf-first  | Splits the leaf with the highest gain       | Unbalanced    | Fast loss reduction, accurate | Risk of overfitting                 | LightGBM                |

The Leaf-first strategy is what makes LightGBM unique.

It aggressively reduces the training loss by always picking the highest-gain split, which typically makes it faster than XGBoost while matching or exceeding its accuracy, but it requires careful parameter tuning to avoid overfitting.
