
KNN (K-Nearest Neighbors)

  • Writer: Aryan
  • Feb 22
  • 8 min read

K-Nearest Neighbors is a lazy learning algorithm, meaning that it doesn't require any explicit training phase. Instead, the algorithm simply stores the training data and performs all computations during the prediction phase. During prediction, KNN computes the distances between the test point and all training points to identify the nearest neighbors, and then makes a prediction based on these neighbors.


Why is KNN called a Lazy Learner?

 

KNN is referred to as a lazy learning algorithm because it does not explicitly learn a model during training. Instead, it stores the training data and defers computation until a query is made.

Key Reasons:

  • No explicit training phase (just stores the data).

  • All computation is done at the time of prediction.

  • High memory usage since the whole dataset is stored.

  • Slow prediction time due to distance calculations for each query.

 

Applications of KNN:

KNN is widely used in:

  • Pattern recognition: Image classification, handwriting recognition, etc.

  • Recommendation systems: Predicting the next item based on user behavior.

  • Medical diagnosis: Classifying patients based on symptoms.

  • Financial forecasting: Predicting stock prices or customer behavior.

  • Anomaly detection: Identifying outliers in datasets.


How Does KNN Work?

 

The KNN algorithm follows a simple sequence of steps for making predictions:

 

Step 1: Choose the Value of k

The first step is to determine the number of neighbors, k, that will be considered to make the prediction. Choosing the right value for k is crucial and affects the model's performance.

 

Step 2: Compute the Distance

The next step is to compute the distance between the test point and every data point in the training dataset. Popular distance metrics include Euclidean, Manhattan, and Minkowski distances.

 

Step 3: Identify the k Nearest Neighbors

Once the distances are computed, the k closest neighbors are selected based on the distance metric.

 

Step 4: Predict the Class or Value

  • For classification: The algorithm applies a majority voting system, where the class with the most votes from the neighbors is assigned to the test point.

  • For regression: The prediction is the mean (or weighted mean) of the values of the k nearest neighbors.

 

Step 5: Return the Predicted Label/Value

 

Finally, the predicted label (for classification) or value (for regression) is returned.
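
The whole procedure fits in a few lines. Below is a minimal from-scratch sketch of these five steps for classification, assuming Euclidean distance and a tiny made-up dataset (the function name knn_predict and the numbers are purely illustrative).

```python
# A minimal from-scratch sketch of the five steps above (classification with
# Euclidean distance). The function name and the toy data are illustrative only.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    # Step 2: distance from the test point to every training point
    distances = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Step 3: indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among the neighbors' labels
    votes = Counter(y_train[nearest])
    # Step 5: return the predicted label
    return votes.most_common(1)[0][0]

X_train = np.array([[1, 1], [2, 2], [8, 8], [9, 9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.8]), k=3))  # -> 0
```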


Types of KNN Models

 

KNN can be used for both classification and regression tasks. Let’s break down both models:

 

i) KNN Classifier


  • Purpose: Used for classification tasks where the output is categorical.

  • How it works: For a given test point, the KNN classifier assigns the class label that is the most frequent among its k nearest neighbors.

  • Advantages: Simple, intuitive, and works well for small to medium datasets.

  • Multi-class Classification: KNN can handle multiple classes, using majority voting to determine the class label.
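
As a concrete illustration, here is a short sketch using scikit-learn's KNeighborsClassifier on the Iris dataset; k = 5 is an arbitrary choice, not a recommendation.

```python
# Illustrative use of scikit-learn's KNeighborsClassifier on the Iris dataset;
# k = 5 is an arbitrary choice, not a recommendation.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = KNeighborsClassifier(n_neighbors=5)   # uniform weights by default
clf.fit(X_train, y_train)                   # "training" just stores the data
print("Test accuracy:", clf.score(X_test, y_test))
```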


ii) KNN Regressor


  • Purpose: Used for regression tasks where the output is continuous (e.g., predicting house prices or temperatures).

  • How it works: The KNN regressor predicts the continuous value by computing the average (or weighted average) of the values of the k nearest neighbors.

  • Impact of k: The choice of k controls how smooth the predictions are. A small k may lead to overfitting, while a large k may result in oversmoothing.
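
A similar sketch for regression, using KNeighborsRegressor on synthetic data; the sine curve, noise level, and k = 3 are illustrative assumptions.

```python
# Illustrative KNeighborsRegressor on synthetic data; the sine curve, noise
# level, and k = 3 are arbitrary choices for demonstration.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(40, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=40)

reg = KNeighborsRegressor(n_neighbors=3)    # small k -> wigglier predictions
reg.fit(X, y)
print(reg.predict([[2.5]]))                 # mean of the 3 nearest targets
```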


Overfitting in KNN

 

What is Overfitting?

 

Overfitting occurs when the KNN model memorizes the training data instead of learning the general pattern. This results in high accuracy on training data but poor performance on test data.

 

Causes of Overfitting in KNN

 

  1. Low value of K (e.g., K = 1 or 2)

    • Each data point is classified or predicted based on very few neighbors, leading to a model that is too specific to the training data.

    • In classification, this can cause high variance as even noisy points influence predictions.

    • In regression, the prediction is based on very close values, making the model unstable.


  2. Using distance-based weights excessively

    • In weighted KNN, assigning too much importance to very close neighbors can make the model overly sensitive to minor variations in the dataset.


  3. Noisy or irrelevant features

    • If there are irrelevant or highly correlated features, the distance calculations can be dominated by noise rather than by the actual trends in the data.

 

Effects of Overfitting


  • High training accuracy but low test accuracy.

  • Model is too sensitive to outliers and noise in the dataset.

  • Poor generalization to unseen data.

 

How to Reduce Overfitting in KNN?


  • Increase K (choose an optimal value, for example via cross-validation, as in the sketch after this list).

  • Use feature selection to remove irrelevant features.

  • Apply data preprocessing techniques like normalization to reduce the effect of extreme values.
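
One way to put these points into practice is sketched below: scale the features and compare a few values of k with cross-validation (the dataset and the candidate k values are arbitrary choices).

```python
# Sketch: scale features and compare a few k values with 5-fold cross-validation
# (dataset and candidate k values are arbitrary choices).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
for k in (1, 5, 15):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"k={k:2d}  mean CV accuracy = {score:.3f}")
```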


Underfitting in KNN

 

What is Underfitting?

 

Underfitting occurs when the KNN model is too simple to capture the underlying pattern in the data, leading to both low training accuracy and low test accuracy.

 

Causes of Underfitting in KNN


  1. Very high value of K (e.g., K is close to dataset size)

    • The model becomes too generalized because it averages over too many neighbors, losing important local patterns.

    • In classification, it can lead to majority class dominance, where smaller classes are ignored.

    • In regression, predictions become smooth and fail to capture variations.


  2. Inappropriate distance metric

    • If the chosen distance metric (e.g., Euclidean, Manhattan) is not suited for the dataset, the model may not effectively measure similarities between points.


  3. Too few features or too much noise

    • If important features are missing or noisy data dominates, the model won’t capture useful information.

 

Effects of Underfitting


  • Poor training and test accuracy

  • Model fails to capture important trends in data

  • Predictions are too simplistic or random

 

How to Reduce Underfitting in KNN?


  • Decrease K to capture local trends more effectively.

  • Use relevant features and proper feature scaling (normalization).

  • Experiment with different distance metrics (e.g., Manhattan, Minkowski).
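
The sketch below illustrates the last point, comparing distance metrics with cross-validation (the dataset, scaler, and k = 7 are arbitrary choices; note that Minkowski with its default p = 2 is identical to Euclidean).

```python
# Sketch: compare distance metrics with cross-validation (dataset, scaler, and
# k = 7 are arbitrary; Minkowski with its default p = 2 equals Euclidean).
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_wine(return_X_y=True)
for metric in ("euclidean", "manhattan", "minkowski"):
    model = make_pipeline(MinMaxScaler(),
                          KNeighborsClassifier(n_neighbors=7, metric=metric))
    print(metric, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```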


Choosing the Right Value of k

 

Selecting an optimal value for k is essential for ensuring good model performance.


  • Small k (e.g., k=1 or 2):

    • Pros: The model captures the fine-grained structure of the data.

    • Cons: It is highly sensitive to noise, leading to high variance and potential overfitting.

  • Large k (e.g., k=10 or 20):

    • Pros: The model is less sensitive to noise and captures the general pattern.

    • Cons: It may lead to high bias, underfitting the data, and oversmoothing.

  • Cross-validation is often used to find the optimal k value by splitting the data into training and validation sets and evaluating performance for different k values.

  • For binary classification, odd values of k are preferred because they avoid ties in the voting process.
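
A sketch of the cross-validation approach, using GridSearchCV over odd values of k (the search range 1–29 is an arbitrary choice):

```python
# Sketch of choosing k by cross-validation with GridSearchCV, searching over
# odd values only (the range 1-29 is an arbitrary choice).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
param_grid = {"n_neighbors": list(range(1, 30, 2))}   # 1, 3, ..., 29
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)
print("Best k:", search.best_params_["n_neighbors"])
print("Best CV accuracy:", round(search.best_score_, 3))
```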


Hyperparameters in KNN

 

KNN is a non-parametric, lazy-learning algorithm for classification and regression whose performance depends heavily on a few key hyperparameters.

 

  1. Number of Neighbors (K)

    Determines the number of nearest neighbors considered.

    Effect:

    Low K → Overfitting.

    High K → Underfitting.


  2. Distance Metric

    • Euclidean: d(x, y) = √( Σᵢ (xᵢ − yᵢ)² )

    • Manhattan: d(x, y) = Σᵢ |xᵢ − yᵢ|

    • Minkowski (generalized version of Euclidean & Manhattan): d(x, y) = ( Σᵢ |xᵢ − yᵢ|ᵖ )^(1/p)

      If p = 1, it behaves like Manhattan Distance.

      If p = 2, it behaves like Euclidean Distance.

  3. Weighting of Neighbors

    Uniform Weight (default): Equal contribution from all neighbors.

    Distance-based Weight: Closer points have higher influence.

 

  4. Power Parameter (p) in Minkowski Distance

    p = 1 → Manhattan Distance.

    p = 2 → Euclidean Distance (default).

 

  5. Leaf Size (for KD-Tree & Ball Tree)

    The number of points at which a node stops splitting and becomes a leaf.

    Effect: Smaller size → deeper tree and higher memory usage; larger size → faster construction but more brute-force comparisons inside each leaf.

 

  6. Algorithm Optimization

    Brute Force: A simple but slow approach to computing distances.

    KD-Tree: Efficient for low-dimensional data.

    Ball-Tree: Optimized for high-dimensional data.
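
The sketch below gathers these hyperparameters in one place, set explicitly on scikit-learn's KNeighborsClassifier (the specific values are illustrative, not tuned).

```python
# The hyperparameters above, set explicitly on scikit-learn's
# KNeighborsClassifier (the specific values are illustrative, not tuned).
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier(
    n_neighbors=7,        # 1. number of neighbors K
    metric="minkowski",   # 2. distance metric
    weights="distance",   # 3. closer neighbors get larger weights
    p=2,                  # 4. Minkowski power: p=1 Manhattan, p=2 Euclidean
    leaf_size=30,         # 5. leaf size used by the tree-based searches
    algorithm="kd_tree",  # 6. "brute", "kd_tree", "ball_tree", or "auto"
)
clf.fit(X, y)
print(clf.predict(X[:3]))
```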


Space and Time Complexity of KNN

 

Training Phase:

  • Time Complexity: O(1), as there is no actual model training phase.

  • Space Complexity: O(n × d), where n is the number of training samples and d is the number of features.

Prediction Phase:

  • Brute Force: O(n × d) per query to compute distances to all points.

  • KD-Tree: roughly O(d log n) per query on average, effective in low-dimensional spaces.

  • Ball-Tree: roughly O(d log n) per query on average, degrading more gracefully than a KD-Tree as dimensionality grows.
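
A rough way to see the brute-force vs. tree trade-off is to time queries on synthetic low-dimensional data; in the sketch below the dataset size is an arbitrary choice and the absolute numbers depend entirely on your machine.

```python
# Rough timing sketch: brute force vs. KD-Tree queries on synthetic
# low-dimensional data (sizes are arbitrary; timings depend on your machine).
import time
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.default_rng(0).random((20000, 3))   # n = 20000, d = 3
queries = X[:500]

for algo in ("brute", "kd_tree"):
    nn = NearestNeighbors(n_neighbors=5, algorithm=algo).fit(X)
    start = time.perf_counter()
    nn.kneighbors(queries)
    print(algo, f"{time.perf_counter() - start:.4f} s")
```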


 Optimized Variants of KNN

 

In machine learning, especially when dealing with large datasets, efficiently finding the nearest neighbors is crucial. K-Nearest Neighbors (KNN) is a fundamental algorithm that relies on distance-based comparisons, making the choice of an appropriate search algorithm critical for performance. Two widely used data structures to optimize KNN searches are KD-Tree and Ball Tree.


KD-Tree Algorithm

 

A KD-Tree (K-Dimensional Tree) is a space-partitioning data structure designed for efficient nearest-neighbor searches. It recursively divides the data space using axis-aligned hyperplanes to create a binary tree structure.

 

How KD-Tree Works for KNN


  1. Space Partitioning: The data is split along a coordinate axis at each level using the median value, ensuring balanced partitions.

  2. Recursive Construction: The process continues until each node contains a small subset of points (leaf node).

  3. Nearest Neighbor Search: At query time, the tree is traversed to find the nearest neighbors while pruning irrelevant branches, reducing the number of comparisons.

 

Key Features of KD-Tree in KNN


  • Efficient for low to moderate dimensions (typically up to ~10-20 dimensions).

  • Performs well with structured and uniformly distributed data.

  • Faster KNN queries than brute-force search, reducing computational complexity from O(N) to O(log N) in average cases.

  • Used in KNN-based classification, regression, and recommendation systems where quick nearest-neighbor retrieval is essential.

 

Limitations of KD-Tree in KNN


  • Not well-suited for high-dimensional data (curse of dimensionality makes it inefficient beyond ~20 dimensions).

  • Tree balancing challenges when data distribution is uneven.
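
A minimal sketch of building and querying a KD-Tree with scikit-learn's KDTree class (the random data and leaf_size value are illustrative assumptions):

```python
# Minimal sketch of building and querying a KD-Tree with scikit-learn's KDTree
# (the random data and leaf_size value are illustrative assumptions).
import numpy as np
from sklearn.neighbors import KDTree

X = np.random.default_rng(0).random((1000, 5))   # 1000 points, 5 dimensions
tree = KDTree(X, leaf_size=30)

dist, ind = tree.query(X[:1], k=3)               # 3 nearest neighbors of the first point
print(ind)   # the point itself plus its two closest neighbors
```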


Ball Tree Algorithm

 

Unlike KD-Trees, which use axis-aligned partitions, Ball Trees organize data into hierarchical hyperspheres. This makes them particularly effective for high-dimensional and sparse datasets.

 

How Ball Tree Works for KNN


  1. Recursive Partitioning: The dataset is divided into two subsets using a hypersphere around a centroid.

  2. Bounding Hyperspheres: Each node in the tree represents a region enclosed within a hypersphere.

  3. Pruning for Nearest Neighbor Search: During KNN queries, entire branches are pruned if their hypersphere is too far from the query point, improving efficiency.

 

Key Features of Ball Tree in KNN


  • More efficient than KD-Tree for high-dimensional spaces (typically >20 dimensions).

  • Handles non-uniform and sparse data distributions better.

  • Adaptive partitioning reduces unnecessary distance calculations during KNN queries.

  • Used in KNN applications for large-scale datasets, clustering, anomaly detection, and recommendation systems.

 

Limitations of Ball Tree in KNN


  • Slower tree construction compared to KD-Trees due to complex geometric computations.

  • Higher memory usage, as each node stores additional distance metrics for bounding hyperspheres.
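
BallTree exposes the same query interface in scikit-learn; the sketch below uses higher-dimensional data and a non-Euclidean metric only to suggest its typical use case.

```python
# BallTree follows the same query interface; the higher dimensionality and
# non-Euclidean metric here only hint at its typical use case.
import numpy as np
from sklearn.neighbors import BallTree

X = np.random.default_rng(1).random((1000, 50))  # 50-dimensional data
tree = BallTree(X, leaf_size=40, metric="manhattan")

dist, ind = tree.query(X[:1], k=3)
print(dist)
```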


Difference Between Ball Tree Algorithm and KD Tree Algorithm

 

| Feature | Ball Tree | KD-Tree |
| --- | --- | --- |
| Splitting strategy | Divides data using hyperspheres (multidimensional balls). | Divides data using axis-aligned hyperplanes (rectangular partitions). |
| Space partitioning | Organizes data in a hierarchical structure of nested hyperspheres. | Organizes data using hierarchical axis-aligned rectangles. |
| Tree structure | Binary tree: each node splits its points into two child hyperspheres. | Binary tree: each node splits its points along one coordinate axis. |
| Dimensionality | Scales better to moderate and high dimensions. | Works well for low to moderate dimensions; degrades in very high dimensions. |
| Nearest neighbor search | Generally faster for sparse, high-dimensional data due to hyperspherical pruning. | Efficient for low to moderate-dimensional data, but performance degrades in very high dimensions. |
| Tree construction | Slower, due to the geometric computations involved in forming hyperspheres. | Faster, due to simpler axis-aligned splitting. |
| Balancing | Naturally balanced, as hyperspheres are formed from the spatial distribution of the data. | May require balancing strategies to prevent deep trees that slow down queries. |
| Memory usage | Higher, as each node stores additional geometric properties (centroid and radius). | Lower, as each node stores a simple split along an axis. |
| Implementation complexity | More complex, due to the hypersphere (centroid and radius) calculations. | Simpler, due to its axis-aligned splits. |
| Query time complexity | Logarithmic in average cases. | Logarithmic in average cases. |
| Applications | Sparse, high-dimensional data, such as image retrieval, clustering, and high-dimensional search. | Low to moderate-dimensional applications like computer graphics, nearest-neighbor searches, and spatial indexing. |
| Library implementations | Implemented in scikit-learn (sklearn.neighbors.BallTree). | Implemented in scikit-learn (sklearn.neighbors.KDTree), SciPy, and other libraries. |

Leaf Size in KD-Tree and Ball Tree

 

The leaf size (also called leaf node size) is a crucial parameter in KD-Tree and Ball Tree, influencing their efficiency in nearest-neighbor searches. It sets the number of points at which a node stops being split and becomes a leaf.

 

Leaf Size in KD-Tree


  • In a KD-Tree, the leaf size determines when the recursive splitting stops.

  • Larger leaf size → fewer tree levels and faster tree construction, but more brute-force comparisons inside each leaf at query time.

  • Smaller leaf size → deeper tree with more nodes and higher memory usage; queries can be faster up to a point, but traversal overhead grows if leaves become very small.

  • Typical default value: 30 in scikit-learn's KNN estimators (but the best value depends on dataset size and dimensionality).

  • Best for: lower-dimensional data (≤ ~20 dimensions) with Euclidean distance (p = 2).

 

Leaf Size in Ball Tree


  • In a Ball Tree, the leaf size affects hypersphere grouping and pruning efficiency.

  • Larger leaf size → Fewer nodes, better for high-dimensional data.

  • Smaller leaf size → More nodes, finer partitions but increased query time.

  • Best for: High-dimensional data (>20 dimensions) or non-Euclidean distances (p ≠ 2).


How to Choose the Best Leaf Size?

 

| Tree Type | Best Leaf Size | Best Use Case |
| --- | --- | --- |
| KD-Tree | 20–40 | Low-dimensional data, Euclidean distance (p = 2) |
| Ball Tree | 40–60 | High-dimensional data, any p value (e.g., Minkowski, Manhattan) |

  • Rule of thumb: A higher leaf size makes queries slower but speeds up tree construction.

  • Use KD-Tree for low-dimensional data and Ball Tree for high-dimensional data.
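
As a closing sketch, leaf_size can be compared directly by timing KD-Tree queries for a few values (the dataset size and the leaf sizes are arbitrary; absolute timings depend on your machine).

```python
# Timing sketch for a few leaf sizes on a KD-Tree (dataset size and leaf sizes
# are arbitrary; absolute timings depend on your machine).
import time
import numpy as np
from sklearn.neighbors import KDTree

X = np.random.default_rng(0).random((50000, 3))
for leaf_size in (10, 30, 60):
    tree = KDTree(X, leaf_size=leaf_size)
    start = time.perf_counter()
    tree.query(X[:2000], k=5)
    print(f"leaf_size={leaf_size:2d}  query time {time.perf_counter() - start:.3f} s")
```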
