
K-Means Clustering Explained: Geometric Intuition, Assumptions, Limitations, and Variations

  • Writer: Aryan
  • Sep 22
  • 4 min read

K-Means is an unsupervised machine learning algorithm that partitions a dataset into k distinct, non-overlapping clusters by minimizing the Within-Cluster Sum of Squares (WCSS). It iteratively assigns each data point to the nearest cluster centroid and updates each centroid to the mean of its assigned points, repeating until the assignments no longer change (convergence).

In simple terms: K-Means groups similar data points together by placing them into k clusters, where each cluster is represented by the average position (centroid) of the points inside it.
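
If you want to see this end to end before the geometric walkthrough, here is a minimal sketch using scikit-learn (the array X and its size are illustrative; scikit-learn's inertia_ attribute is exactly the WCSS that K-Means minimizes):

```python
# Minimal K-Means sketch with scikit-learn (X is illustrative data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))  # e.g. 100 students x (CGPA, IQ)

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(km.labels_[:10])      # cluster assignment for each point
print(km.cluster_centers_)  # final centroids (cluster means)
print(km.inertia_)          # the WCSS that K-Means minimizes
```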


K-Means: A Geometric Intuition

 

Let's understand K-Means with a simple example. Suppose we have a dataset of 100 students with two features: their CGPA and IQ score. Our goal is to group similar students together using clustering.

If we plot the data, it may visually suggest the presence of distinct groups.


K-Means discovers clusters through the following iterative process:


Step 1: Choose the number of clusters (k)

The first step is to decide how many clusters, k, to create. The algorithm does not determine this value itself; we have to provide it.

In our 2D example, we can visually guess that there are three clusters. For higher-dimensional data where we can't see the plot, we'd use methods like the Elbow Method (which we'll cover next). For now, let's set k = 3.
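
As a preview, the Elbow Method simply runs K-Means for a range of k values and plots the WCSS for each; the "elbow" where the curve stops dropping sharply suggests a reasonable k. A minimal sketch (the data X is illustrative):

```python
# Elbow Method sketch: plot WCSS against k and look for the bend.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(42).normal(size=(100, 2))  # illustrative data

ks = range(1, 11)
wcss = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
        for k in ks]

plt.plot(ks, wcss, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("WCSS (inertia)")
plt.show()
```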


Step 2: Randomly Initialize Centroids

Next, the algorithm randomly places k centroids (in our case, 3) onto the plot. These centroids are the initial guesses for the center of each cluster.


Step 3: The Iterative Process (Assign and Move)

Now, the algorithm repeats two steps in a loop until the clusters are stable.

A. Assign Points to the Nearest Cluster

For every student (data point), the algorithm calculates the Euclidean distance to each of the three centroids. Each student is then assigned to the cluster of the nearest centroid. This forms our initial logical clusters.


B. Update the Centroids

Once all points are assigned to a cluster, the algorithm updates the position of each centroid. It does this by calculating the mean of all the points within a cluster and moving the centroid to that new mean position (e.g., the average CGPA and IQ of all students in that cluster).


Step 4: Repeat Until Convergence

The algorithm checks if the centroids have moved from their previous position.

  • If the centroids have moved, the process repeats from Step 3A. Points are reassigned to their new nearest centroid, and the centroids are updated again.

  • If the centroids did not move, it means the clusters are stable. The algorithm has converged, and the process stops.


And that's the geometric intuition! We successfully grouped our 100 students into three distinct clusters. The beauty of K-Means is its simplicity; it just uses the concept of distance to create these groupings, and this logic works well in any number of dimensions, not just two.
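
For readers who prefer code, here is a from-scratch sketch of the loop above in NumPy. It is deliberately minimal (for instance, it does not handle the rare case of a cluster losing all its points), and the data is illustrative:

```python
# From-scratch K-Means sketch: random init, then assign-and-update.
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: randomly pick k data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3A: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3B: move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  for j in range(k)])
        # Step 4: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# e.g. 100 students with (CGPA, IQ)-like features
X = np.random.default_rng(1).normal(size=(100, 2))
labels, centroids = kmeans(X, k=3)
```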


Assumptions of K-Means

 

  1. Spherical Cluster Shape

    • K-Means assumes clusters are spherical and isotropic, meaning they are uniform in all directions.

    • Works best when clusters are circular (2D) or spherical (higher dimensions); see the sketch after this list for what happens when they are not.

  2. Similar Cluster Size

    • The algorithm performs better when clusters have roughly equal sizes.

    • Unequal-sized clusters can cause misassignment of points.

  3. Equal Variance of Clusters

    • Assumes all clusters have similar variance.

    • Because assignments are based purely on Euclidean distance, clusters with very different variances tend to be split or merged incorrectly.

  4. Clusters Are Well Separated

    • Works best when clusters are clearly distinct and non-overlapping.

  5. Predefined Number of Clusters (k)

    • K-Means requires specifying the number of clusters in advance.

    • Choosing the right k often needs domain knowledge or methods like Elbow or Silhouette analysis.

  6. Large n, Small k

    • More efficient when the dataset has many points (large n) but relatively few clusters (small k).
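
A quick way to see assumptions 1 and 4 in action is to run K-Means on data that violates them. The sketch below uses scikit-learn's two-moons generator, whose clusters are crescent-shaped and interlocking; the adjusted Rand score against the true grouping is typically far from 1:

```python
# K-Means on non-spherical clusters (illustrative failure case).
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# A score near 1 means perfect agreement with the true moons;
# K-Means typically lands well below that here.
print(adjusted_rand_score(y_true, labels))
```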


Limitations of K-Means

 

  1. Determining the Number of Clusters (k)

    • Choosing the optimal number of clusters is not straightforward.

    • Often requires domain knowledge or methods like the Elbow Method or Silhouette Analysis.

  2. Requires Similar Cluster Sizes

    • K-Means assumes clusters are roughly equal in size.

    • Unequal-sized clusters may lead to poor assignments.

  3. Assumes Similar Variance

    • The algorithm works best when clusters have similar variance.

    • Clusters with widely different spreads may be misrepresented.

  4. Assumption of Spherical Clusters

    • K-Means assumes clusters are spherical and evenly distributed, which may not hold in real-world data.

  5. Vulnerability to Outliers

    • Outliers can distort cluster centroids, leading to misleading results.

  6. Hard Clustering

    • Each data point is assigned to exactly one cluster, which may not be suitable if points naturally belong to multiple groups.

  7. High-Dimensional Challenges

    • In very high-dimensional spaces, Euclidean distances become less meaningful, reducing clustering effectiveness.

  8. Sensitivity to Feature Scaling

    • K-Means is sensitive to the scale of features.

    • Preprocessing like standardization is often necessary before applying the algorithm.
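
In practice this usually means standardizing each feature to zero mean and unit variance before clustering. A minimal sketch with scikit-learn (the CGPA/IQ ranges below are illustrative):

```python
# Standardize features before K-Means so no single feature dominates.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# e.g. CGPA on a 0-10 scale vs IQ on a ~80-140 scale:
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(5, 10, 100), rng.uniform(80, 140, 100)])

model = make_pipeline(StandardScaler(),
                      KMeans(n_clusters=3, n_init=10, random_state=0))
labels = model.fit_predict(X)
```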


Variations of K-Means

 

  1. K-Means++

    • Improves the initialization of centroids by spreading them out.

    • Leads to better and more consistent clustering results than random initialization (see the sketches after this list).

  2. Mini-Batch K-Means

    • Uses small random batches of data in each iteration instead of the full dataset.

    • Speeds up computation, especially for large datasets, while achieving results similar to standard K-Means.

  3. K-Medoids

    • Uses the most centrally located data point (medoid) as the cluster center instead of the mean.

    • More robust to outliers than standard K-Means.

  4. Fuzzy C-Means

    • Allows each data point to belong to multiple clusters with varying degrees of membership.

    • Useful when data naturally fits into more than one group.

  5. Incremental K-Means (also called Online K-Means or Streaming K-Means)

    • Designed to handle streaming or dynamically arriving data.

    • Updates clusters incrementally as new data arrives, without retraining on the entire dataset.
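
The first two and the last of these variations are available directly in scikit-learn, as sketched below (K-Medoids and Fuzzy C-Means live in third-party packages such as scikit-learn-extra and scikit-fuzzy). The data and batch sizes are illustrative:

```python
# Sketches of three K-Means variations using scikit-learn.
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans

X = np.random.default_rng(0).normal(size=(10_000, 2))  # illustrative data

# 1. K-Means++ initialization (scikit-learn's default init):
km_pp = KMeans(n_clusters=3, init="k-means++", n_init=10,
               random_state=0).fit(X)

# 2. Mini-Batch K-Means: small random batches per iteration:
mbk = MiniBatchKMeans(n_clusters=3, batch_size=256, random_state=0).fit(X)

# 5. Incremental / online updates via partial_fit on arriving chunks:
online = MiniBatchKMeans(n_clusters=3, random_state=0)
for chunk in np.array_split(X, 20):  # simulate a data stream
    online.partial_fit(chunk)
```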

