Lesson 7 of 15

K-Nearest Neighbours (k-NN)

K-Nearest Neighbours is one of the simplest machine learning algorithms. To classify a new point, it:

  1. Computes the distance from the query point to every training point
  2. Selects the k closest training points
  3. Returns the most common label among those k neighbours

\hat{y} = \text{mode}\left(\{y^{(i)} : i \in \text{k-nearest}(\mathbf{x})\}\right)

Algorithm

for each training point:
    compute euclidean distance to query
sort by distance
take k smallest
return the majority label
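The steps above can be sketched in plain Python. This is one possible implementation, not the only one: it uses `math.dist` (Python 3.8+) for Euclidean distance, and the function name mirrors the task at the end of this lesson.

```python
import math
from collections import Counter

def knn_classify(X_train, y_train, x_query, k):
    """Majority label among the k training points nearest to x_query."""
    # Step 1: Euclidean distance from the query to every training point
    distances = [(math.dist(x, x_query), y) for x, y in zip(X_train, y_train)]
    # Step 2: sort by distance and keep the k smallest
    neighbours = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote; on a tie, return the smallest label value
    votes = Counter(label for _, label in neighbours)
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)
```

Sorting all distances costs O(n log n) per query; for large training sets a partial selection (e.g. `heapq.nsmallest`) or a spatial index would be faster, but the straightforward version above matches the pseudocode directly.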

Tie-breaking

When two labels are equally common among the k neighbours, return the smaller label value.
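For example, if k = 2 and the two neighbours carry labels 0 and 1, each label gets one vote, so the rule returns 0. A small illustration of the tie-break on a vote count:

```python
from collections import Counter

votes = Counter([1, 0])   # one vote for label 1, one for label 0: a tie
top = max(votes.values())
winners = [label for label, count in votes.items() if count == top]
print(min(winners))       # smaller label value wins the tie: prints 0
```

Note that `Counter.most_common(1)` alone is not enough here, because its ordering among equal counts follows insertion order rather than label value.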

Properties

  • Non-parametric: no training phase, all computation at prediction time
  • Lazy learner: stores the entire training set
  • k is a hyperparameter: small k = low bias, high variance; large k = high bias, low variance

Your Task

Implement knn_classify(X_train, y_train, x_query, k) that returns the most common label among the k nearest neighbours. Use Euclidean distance. Break ties by returning the lowest label value.
