K-Means Clustering

K-means is the most widely used clustering algorithm. It partitions $n$ data points into $k$ clusters by iterating two steps:

Assign each point to the nearest centroid:

$c^{(i)} = \arg\min_j \| \mathbf{x}^{(i)} - \boldsymbol{\mu}_j \|_2$

Move each centroid to the mean of its assigned points:

$\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{i \in C_j} \mathbf{x}^{(i)}$

A common quality metric is inertia — the sum of squared distances from each point to its assigned centroid:

$\text{Inertia} = \sum_{i=1}^{n} \| \mathbf{x}^{(i)} - \boldsymbol{\mu}_{c^{(i)}} \|_2^2$

Lower inertia means tighter, more compact clusters.

Implement:

assign_clusters(X, centroids) → list of cluster indices (one per point)
update_centroids(X, assignments, k) → new centroids as mean of assigned points
kmeans_inertia(X, assignments, centroids) → total within-cluster sum of squares

Python runtime loading...

Click "Run" to execute your code.