All Posts


K-Means Clustering Explained: Geometric Intuition, Assumptions, Limitations, and Variations
K-Means is a powerful unsupervised machine learning algorithm used to partition a dataset into a pre-determined number of distinct, non-overlapping clusters. It works by iteratively assigning data points to the nearest cluster "centroid" and then updating the centroid's position based on the mean of the assigned points. This guide breaks down the geometric intuition behind K-Means, explores its core assumptions and limitations, and introduces important variations you should know.
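To make the assign-then-update loop concrete, here is a minimal NumPy sketch of plain K-Means. It is illustrative only; the function name, defaults, and stopping rule are my own, not taken from the post.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-Means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the closest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```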

Aryan
Sep 22


Introduction to Unsupervised Learning: Clustering, Dimensionality Reduction & More
Unsupervised learning is a type of machine learning that uncovers hidden patterns in data without labels. Discover its key types, from clustering and dimensionality reduction to anomaly detection, and see how these techniques are applied in real-world scenarios like customer segmentation and image processing.
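As a small taste of what those techniques look like in practice, a scikit-learn sketch on toy data (the dataset and parameter values are placeholders chosen for illustration):

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Toy data standing in for, e.g., customer feature vectors (no labels used).
X, _ = make_blobs(n_samples=300, centers=4, n_features=10, random_state=42)

# Clustering: group similar points without any labels.
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

# Dimensionality reduction: compress 10 features down to 2 for visualization.
X_2d = PCA(n_components=2).fit_transform(X)
print(labels[:10], X_2d.shape)
```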

Aryan
Sep 22


Exclusive Feature Bundling (EFB) in LightGBM: Boost Speed & Reduce Memory Usage
Exclusive Feature Bundling (EFB) is a key LightGBM optimization that reduces the number of features by merging sparse, mutually exclusive columns—cutting memory usage and training time without sacrificing accuracy.
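The bundling idea is easy to illustrate outside LightGBM: two features that are never non-zero on the same row can share one column by shifting one of them into a separate value range. A toy NumPy sketch of that offset trick (not LightGBM's internal code):

```python
import numpy as np

# Two sparse features that are never non-zero on the same row
# (e.g., one-hot columns from the same categorical variable).
f1 = np.array([3.0, 0.0, 0.0, 5.0, 0.0])
f2 = np.array([0.0, 2.0, 4.0, 0.0, 0.0])

# Bundle them into one column by shifting f2 into a separate value range,
# so a single feature (and a single histogram) encodes both.
offset = f1.max() + 1.0
bundled = np.where(f1 != 0, f1, np.where(f2 != 0, f2 + offset, 0.0))
print(bundled)  # [ 3.  8. 10.  5.  0.]
```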

Aryan
Sep 21


GOSS Explained: How LightGBM Achieves Faster Training Without Sacrificing Accuracy
Gradient-based One-Side Sampling (GOSS) is a key innovation in LightGBM that accelerates model training without losing accuracy. By focusing on high-gradient (hard-to-learn) data points and selectively sampling low-gradient ones, GOSS strikes the perfect balance between speed and performance, making LightGBM faster and more efficient than traditional boosting methods.
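A toy sketch of the sampling step described above (a simplified illustration, not LightGBM's implementation; the rates and names are my own):

```python
import numpy as np

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Toy GOSS: keep the top_rate fraction of rows with the largest |gradient|,
    randomly sample other_rate of the rest, and up-weight the sampled rows so
    the gradient sum stays approximately unbiased."""
    rng = np.random.default_rng(seed)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))      # rows sorted by |gradient|, descending
    n_top, n_other = int(top_rate * n), int(other_rate * n)
    top_idx = order[:n_top]                     # hard-to-learn rows: always kept
    sampled_idx = rng.choice(order[n_top:], size=n_other, replace=False)
    weights = np.ones(n_top + n_other)
    # Low-gradient rows are re-weighted by (1 - top_rate) / other_rate.
    weights[n_top:] = (1.0 - top_rate) / other_rate
    return np.concatenate([top_idx, sampled_idx]), weights
```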

Aryan
Sep 19


LightGBM Explained: Objective Function, Split Finding, and Leaf-Wise Growth
Discover how LightGBM optimizes gradient boosting with faster training, memory efficiency, and advanced split finding. Learn its unique leaf-wise growth strategy, objective function, and why it outperforms traditional methods like XGBoost.
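A typical way to use it from Python, with num_leaves as the main capacity knob for leaf-wise growth. The values below are illustrative rather than tuned recommendations, and the snippet assumes a recent lightgbm release where early stopping is passed as a callback:

```python
import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=5000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Leaf-wise growth: num_leaves (not max_depth) is the primary complexity control.
model = lgb.LGBMRegressor(
    objective="regression",
    num_leaves=31,
    learning_rate=0.05,
    n_estimators=500,
)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)],
          callbacks=[lgb.early_stopping(50)])
```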

Aryan
Sep 18


Handling Missing Data in XGBoost
Struggling with missing data? XGBoost simplifies the process by handling it internally using its sparsity-aware split finding algorithm. Learn how it finds the optimal "default direction" for missing values at every tree split by testing which path maximizes information gain. This allows you to train robust models directly on incomplete datasets without manual imputation.
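In practice that means arrays containing NaN can be passed straight to XGBoost. A small sketch on synthetic data (the parameter values are illustrative, and the snippet assumes a recent xgboost release):

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
# Knock out ~10% of the values to simulate missing data.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.10] = np.nan

# No imputation needed: at each split, NaNs follow the learned default direction.
model = xgb.XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X, y)
print(model.predict_proba(X[:5]))
```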

Aryan
Sep 17


XGBoost Optimizations
XGBoost is one of the fastest gradient boosting algorithms, designed for high-dimensional and large-scale datasets. This guide explains its core optimizations—including approximate split finding, quantile sketches, and weighted quantile sketches—that reduce computation time while maintaining high accuracy.
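In the Python API, these optimizations surface mainly through the tree_method parameter. A short sketch, assuming a recent xgboost version (dataset size and max_bin are placeholders):

```python
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)

# 'exact' enumerates every candidate split; 'approx' and 'hist' instead propose
# candidate splits from (weighted) quantile sketches / histograms.
params = {"objective": "binary:logistic", "tree_method": "hist", "max_bin": 256}
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train(params, dtrain, num_boost_round=100)
```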

Aryan
Sep 12


XGBoost Regularization
XGBoost is a powerful boosting algorithm, but it can overfit if not controlled. Regularization helps by simplifying trees, pruning unnecessary splits, and balancing bias–variance. This guide explains overfitting, how XGBoost improves on Gradient Boosting, and key parameters like gamma, lambda, max_depth, min_child_weight, learning rate, subsample, and early stopping to build robust models.
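Those knobs map directly onto estimator parameters. The values below are placeholders that show where each one goes, not recommendations, and the snippet assumes a recent xgboost release where early_stopping_rounds is a constructor argument:

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# gamma prunes low-gain splits, reg_lambda shrinks leaf weights, and the
# remaining parameters limit tree complexity and per-round influence.
model = xgb.XGBClassifier(
    n_estimators=1000,
    learning_rate=0.05,
    max_depth=4,
    min_child_weight=5,
    gamma=1.0,
    reg_lambda=2.0,
    subsample=0.8,
    early_stopping_rounds=50,
    eval_metric="logloss",
)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
```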

Aryan
Sep 5


The Core Math Behind XGBoost
XGBoost isn’t just another boosting algorithm — its strength lies in the mathematics that power its objective function, optimization, and tree-building strategy. In this post, we break down the core math behind XGBoost: from gradients and Hessians to Taylor series approximation, leaf weight derivation, and similarity scores. By the end, you’ll understand how XGBoost balances accuracy with regularization to build powerful predictive models.
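For reference, the central formulas the post builds toward, written in LaTeX (these are the standard results from the XGBoost objective, with g_i and h_i the gradient and Hessian of the loss, T the number of leaves, and w_j the leaf weights):

```latex
% Second-order Taylor approximation of the objective at boosting round t
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}\Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \Big]
  + \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2

% With G_j = \sum_{i \in I_j} g_i and H_j = \sum_{i \in I_j} h_i,
% the optimal weight of leaf j and the resulting objective value are
w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad
\mathcal{L}^{*} = -\tfrac{1}{2}\sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T

% Gain of a candidate split into left and right children L and R
\text{Gain} = \tfrac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
  - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma
```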

Aryan
Aug 26


XGBoost for Classification
Master classification with XGBoost using a practical, beginner-friendly example. Understand how the algorithm builds decision trees, calculates log loss, optimizes splits, and uses probabilities to make accurate class predictions. A must-read for aspiring machine learning engineers.
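A minimal end-to-end example in that spirit (the dataset and settings are my own choices for illustration, not the post's worked example):

```python
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Binary classification trained against log loss; predictions come back
# as class probabilities via predict_proba.
model = xgb.XGBClassifier(n_estimators=300, learning_rate=0.1,
                          max_depth=3, eval_metric="logloss")
model.fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]
print("log loss:", log_loss(y_te, proba))
```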

Aryan
Aug 16