Lesson 8 of 15

K-Means Clustering

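K-means is one of the most widely used clustering algorithms. Starting from $k$ initial centroids, it partitions $n$ data points into $k$ clusters by repeating two steps until the assignments stop changing (or an iteration limit is reached):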

Step 1 — Assignment

Assign each point to the nearest centroid:

$$c^{(i)} = \arg\min_j \| \mathbf{x}^{(i)} - \boldsymbol{\mu}_j \|_2$$
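The assignment step can be sketched in plain Python as follows. It assumes `X` and `centroids` are lists of equal-length coordinate lists (the lesson does not fix a data type) and breaks ties in favor of the lowest centroid index, which is a choice of this sketch rather than part of the formula.

```python
import math

def assign_clusters(X, centroids):
    """Return, for each point in X, the index of its nearest centroid.

    Assumption: X and centroids are lists of equal-length coordinate lists.
    Ties go to the lowest centroid index (min() keeps the first minimum).
    """
    assignments = []
    for x in X:
        # Euclidean (L2) distance from x to every centroid mu_j
        distances = [math.dist(x, mu) for mu in centroids]
        # arg min over j: index of the smallest distance
        assignments.append(min(range(len(centroids)), key=lambda j: distances[j]))
    return assignments

X = [[0.0, 0.0], [0.5, 0.0], [9.0, 9.0], [10.0, 10.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
print(assign_clusters(X, centroids))  # [0, 0, 1, 1]
```

Because the square root is monotonic, comparing squared distances would give the same assignments; `math.dist` is used here only to mirror the $\|\cdot\|_2$ in the formula.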

Step 2 — Update

Move each centroid to the mean of its assigned points, where $C_j = \{ i : c^{(i)} = j \}$ is the set of indices of the points assigned to cluster $j$:

$$\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{i \in C_j} \mathbf{x}^{(i)}$$
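A minimal sketch of the update step, again assuming list-of-lists inputs. The formula is undefined when $|C_j| = 0$; this sketch returns `None` for an empty cluster and leaves re-seeding to the caller, which is an assumption, not something the lesson prescribes.

```python
def update_centroids(X, assignments, k):
    """Return k centroids, each the coordinate-wise mean of its assigned points.

    Assumption: an empty cluster has no mean, so its slot is None.
    """
    dim = len(X[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    # Accumulate the coordinate sums of each cluster C_j and its size |C_j|
    for x, j in zip(X, assignments):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    # mu_j = (1 / |C_j|) * sum of x^(i) over i in C_j
    return [
        [s / counts[j] for s in sums[j]] if counts[j] > 0 else None
        for j in range(k)
    ]

X = [[0.0, 0.0], [2.0, 0.0], [9.0, 9.0], [11.0, 11.0]]
print(update_centroids(X, [0, 0, 1, 1], 2))  # [[1.0, 0.0], [10.0, 10.0]]
```

Common ways to handle an empty cluster in practice include keeping the previous centroid or re-seeding it at a far-away point; which one to use is a design choice outside this formula.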

Inertia (Within-Cluster Sum of Squares)

A common quality metric is inertia — the sum of squared distances from each point to its assigned centroid:

$$\text{Inertia} = \sum_{i=1}^{n} \| \mathbf{x}^{(i)} - \boldsymbol{\mu}_{c^{(i)}} \|_2^2$$

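For a fixed $k$, lower inertia means tighter, more compact clusters. The best achievable inertia never increases as $k$ grows (with $k = n$ it reaches zero), so compare inertia values only between runs that use the same $k$.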
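Inertia can be computed directly from the formula. This sketch assumes the same list-of-lists inputs as before; note that it sums *squared* distances, so no square root is taken.

```python
def kmeans_inertia(X, assignments, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for x, j in zip(X, assignments):
        mu = centroids[j]
        # Squared L2 distance, matching the ||.||_2^2 term in the formula
        total += sum((a - b) ** 2 for a, b in zip(x, mu))
    return total

X = [[0.0, 0.0], [2.0, 0.0], [9.0, 9.0], [11.0, 11.0]]
centroids = [[1.0, 0.0], [10.0, 10.0]]
print(kmeans_inertia(X, [0, 0, 1, 1], centroids))  # 6.0
```

Here the two left points each contribute $1$ and the two right points each contribute $1^2 + 1^2 = 2$, giving $1 + 1 + 2 + 2 = 6$.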

Your Task

Implement:

  • assign_clusters(X, centroids) → list of cluster indices (one per point)
  • update_centroids(X, assignments, k) → new centroids as mean of assigned points
  • kmeans_inertia(X, assignments, centroids) → total within-cluster sum of squares
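The two steps and the inertia metric can be combined into a full Lloyd-style loop. The function name `kmeans`, its `max_iters` and `seed` parameters, initialization by sampling $k$ data points at random, and keeping the previous centroid when a cluster empties are all assumptions of this sketch; the lesson does not specify them.

```python
import math
import random

def kmeans(X, k, max_iters=100, seed=0):
    """Alternate assignment and update until no point changes cluster.

    Assumptions (hypothetical, for illustration): centroids start at k points
    sampled without replacement from X, and an empty cluster keeps its
    previous centroid.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(X, k)]
    assignments = None
    for _ in range(max_iters):
        # Step 1 (assignment): index of the nearest centroid for each point
        new_assignments = [
            min(range(k), key=lambda j: math.dist(x, centroids[j])) for x in X
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Step 2 (update): move each centroid to the mean of its points
        for j in range(k):
            members = [x for x, c in zip(X, assignments) if c == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia: sum of squared distances to the assigned centroids
    inertia = sum(math.dist(x, centroids[c]) ** 2 for x, c in zip(X, assignments))
    return assignments, centroids, inertia

X = [[0, 0], [0, 1], [10, 10], [10, 11]]
assignments, centroids, inertia = kmeans(X, k=2)
print(inertia)  # 1.0
```

The cluster labels depend on the random initialization, but for this well-separated data every choice of two starting points converges to the same partition, with centroids $(0, 0.5)$ and $(10, 10.5)$. On less clean data, k-means can stop at a local optimum, which is why it is often run from several initializations and the result with the lowest inertia is kept.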