Lesson 7 of 15

K-Nearest Neighbours (k-NN)

K-Nearest Neighbours is one of the simplest machine learning algorithms. To classify a new point, it:

  1. Computes the distance from the query point to every training point
  2. Selects the k closest training points
  3. Returns the most common label among those k neighbours

\hat{y} = \text{mode}\left(\{y^{(i)} : i \in \text{k-nearest}(\mathbf{x})\}\right)

Algorithm

for each training point:
    compute euclidean distance to query
sort by distance
take k smallest
return the majority label
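The steps above can be sketched in plain Python. This is one possible implementation, not the only one: it uses `math.dist` (Python 3.8+) for Euclidean distance, and the function name mirrors the task at the end of this lesson.

```python
import math
from collections import Counter

def knn_classify(X_train, y_train, x_query, k):
    """Majority label among the k training points nearest to x_query."""
    # Step 1: Euclidean distance from the query to every training point
    distances = [(math.dist(x, x_query), y) for x, y in zip(X_train, y_train)]
    # Step 2: sort by distance and keep the k smallest
    neighbours = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote; on a tie, return the smallest label value
    votes = Counter(label for _, label in neighbours)
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)
```

Sorting all distances costs O(n log n) per query; for large training sets a partial selection (e.g. `heapq.nsmallest`) or a spatial index would be faster, but the straightforward version above matches the pseudocode directly.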

Tie-breaking

When two labels are equally common among the k neighbours, return the smaller label value.
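For example, if k = 2 and the two neighbours carry labels 0 and 1, each label gets one vote, so the rule returns 0. A small illustration of the tie-break on a vote count:

```python
from collections import Counter

votes = Counter([1, 0])   # one vote for label 1, one for label 0: a tie
top = max(votes.values())
winners = [label for label, count in votes.items() if count == top]
print(min(winners))       # smaller label value wins the tie: prints 0
```

Note that `Counter.most_common(1)` alone is not enough here, because its ordering among equal counts follows insertion order rather than label value.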

Properties

  • Non-parametric: no training phase, all computation at prediction time
  • Lazy learner: stores the entire training set
  • k is a hyperparameter: small k = low bias, high variance; large k = high bias, low variance

Your Task

Implement knn_classify(X_train, y_train, x_query, k) that returns the most common label among the k nearest neighbours. Use Euclidean distance. Break ties by returning the lowest label value.
