Distance Metrics
Distance-based algorithms like k-NN and k-means rely on measuring the distance between data points. Different metrics capture different notions of similarity.
Euclidean Distance
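The straight-line distance between two points $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$:

$$d_{\text{Euclidean}}(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$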
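As a quick numeric check of the Euclidean formula, the sketch below works through a 3-4-5 right triangle in plain Python and compares the result with the standard library's `math.dist` (Python 3.8+). The points `a` and `b` are illustrative values chosen for this example, not part of the exercise.

```python
import math

# Illustrative points (assumed values, chosen to form a 3-4-5 triangle)
a = [1.0, 2.0]
b = [4.0, 6.0]

# Square each coordinate difference, sum the squares, take the square root
squared_diffs = [(x - y) ** 2 for x, y in zip(a, b)]
euclid = math.sqrt(sum(squared_diffs))

print(squared_diffs)  # [9.0, 16.0]
print(euclid)         # 5.0

# Cross-check against the standard library
assert math.isclose(euclid, math.dist(a, b))
```

Comparing a hand-rolled result against `math.dist` is one way to sanity-check an implementation on a few small inputs.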
Manhattan Distance
The sum of absolute coordinate differences between $a$ and $b$ (also called $L_1$ distance or "city block" distance):

$$d_{\text{Manhattan}}(a, b) = \sum_{i=1}^{n} |a_i - b_i|$$
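A small worked example may make the "city block" picture concrete. The points below are illustrative values, and the variable names are assumptions made for this sketch.

```python
# Illustrative points (assumed values)
a = [1.0, 2.0]
b = [4.0, 6.0]

# Absolute difference along each axis, then the total
abs_diffs = [abs(x - y) for x, y in zip(a, b)]
l1 = sum(abs_diffs)

print(abs_diffs)  # [3.0, 4.0]
print(l1)         # 7.0
```

On a street grid, this is the number of blocks walked: 3 across plus 4 up.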
Cosine Similarity
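Measures the cosine of the angle $\theta$ between two vectors, ignoring magnitude:

$$\cos\theta = \frac{a \cdot b}{\|a\| \, \|b\|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}$$

- A value of 1 means identical direction (angle = 0°)
- A value of 0 means perpendicular (angle = 90°)
- A value of -1 means opposite direction (angle = 180°)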
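Cosine similarity is widely used in text and document similarity because it is scale-invariant: multiplying either vector by a positive constant leaves the result unchanged, so a long document and a short one with the same word proportions are treated as identical in direction.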
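The three reference values and the scale invariance can be checked numerically. The sketch below is a minimal illustration, not a reference solution: `cos_sim` is a hypothetical helper defined only for this demo, the vectors are assumed values, and zero vectors are deliberately left unhandled.

```python
import math

def cos_sim(u, v):
    """Cosine of the angle between u and v (hypothetical demo helper)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    # Raises ZeroDivisionError if either vector is all zeros
    return dot / (norm_u * norm_v)

a = [1.0, 2.0]
scaled = [10.0, 20.0]        # same direction as a, ten times the length
perpendicular = [-2.0, 1.0]  # dot product with a is zero
opposite = [-1.0, -2.0]      # a reversed

# Compare with a tolerance, since floating-point results may be off by a rounding error
print(math.isclose(cos_sim(a, scaled), 1.0))                      # True
print(math.isclose(cos_sim(a, perpendicular), 0.0, abs_tol=1e-12))  # True
print(math.isclose(cos_sim(a, opposite), -1.0))                   # True
```

Exact equality checks such as `== 1.0` can fail here because of rounding in the square roots, which is why the comparisons use `math.isclose`.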
A Warning: The Curse of Dimensionality
All distance metrics suffer in very high dimensions. As the number of features grows, the relative difference between the distances to the nearest and farthest points shrinks, so "near" and "far" become harder to tell apart and distances become less meaningful. This is why dimensionality reduction (e.g., PCA, covered later in this course) is often applied before distance-based algorithms like k-NN or k-means.
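This shrinking contrast can be observed in a small simulation. The sketch below assumes points drawn uniformly at random from the unit hypercube and uses Euclidean distance. The function name `relative_contrast`, the sample size, and the chosen dimensions are all assumptions made for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

def relative_contrast(dim, n_points=200):
    """(farthest - nearest) / nearest distance from one random query point."""
    query = [random.random() for _ in range(dim)]
    dists = [
        math.dist(query, [random.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

# Measure the contrast at several dimensionalities
contrast = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, value in contrast.items():
    print(f"dim={dim:>4}  relative contrast={value:.3f}")
```

Under these assumptions the printed contrast should fall sharply as `dim` grows. Real datasets with correlated features may behave less severely than uniform random data.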
Your Task
Implement:
- `euclidean(a, b)`: Euclidean ($L_2$) distance
- `manhattan(a, b)`: Manhattan ($L_1$) distance
- `cosine_similarity(a, b)`: cosine similarity