Lesson 9 of 15
Principal Component Analysis (2D)
PCA finds the directions of maximum variance in the data. It is used for dimensionality reduction, visualisation, and noise removal.
Step 1 — Center the Data
Subtract the column mean from each feature so the data has zero mean. For feature j with mean μⱼ = (1/n) Σᵢ xᵢⱼ, the centered value is x̃ᵢⱼ = xᵢⱼ − μⱼ.
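A minimal pure-Python sketch of this centering step (matching the `center(X)` signature from the task, with `X` assumed to be a list of rows):

```python
def center(X):
    """Subtract each column's mean from every row, so each column has zero mean.

    X is a list of rows, e.g. [[x1, y1], [x2, y2], ...].
    """
    n = len(X)
    # zip(*X) transposes rows into columns
    means = [sum(col) / n for col in zip(*X)]
    return [[value - m for value, m in zip(row, means)] for row in X]
```

For example, `center([[1, 2], [3, 4]])` returns `[[-1.0, -1.0], [1.0, 1.0]]` — both columns now sum to zero.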
Step 2 — Covariance Matrix (2D)
For centered 2D data with n points (xᵢ, yᵢ), the sample covariance matrix is:

C = (1/(n−1)) · [ Σ xᵢ²    Σ xᵢyᵢ ]
                [ Σ xᵢyᵢ   Σ yᵢ²  ]
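This maps directly onto the `covariance_2d(X_centered)` function from the task; a minimal sketch, assuming the input is already centered:

```python
def covariance_2d(X_centered):
    """Sample covariance matrix (dividing by n - 1) for centered 2D data.

    X_centered is a list of [x, y] pairs with zero-mean columns.
    """
    n = len(X_centered)
    sxx = sum(x * x for x, _ in X_centered)  # Σ x_i^2
    syy = sum(y * y for _, y in X_centered)  # Σ y_i^2
    sxy = sum(x * y for x, y in X_centered)  # Σ x_i * y_i
    return [[sxx / (n - 1), sxy / (n - 1)],
            [sxy / (n - 1), syy / (n - 1)]]
```

Note the matrix is symmetric: the off-diagonal entries are both Σ xᵢyᵢ / (n−1).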
Step 3 — Explained Variance Ratio
Given eigenvalues λ₁ ≥ λ₂ of the covariance matrix, the explained variance ratio of component i is:

EVRᵢ = λᵢ / (λ₁ + λ₂)
If the first component's EVR is close to 1, most variance lives along a single direction.
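The ratio computation is a one-liner; a sketch matching the task's `explained_variance_ratio(eigenvalues)` signature:

```python
def explained_variance_ratio(eigenvalues):
    """Each eigenvalue's share of the total variance."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]
```

For instance, eigenvalues `[3.0, 1.0]` give ratios `[0.75, 0.25]`: the first component carries 75% of the variance.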
The Curse of Dimensionality
PCA is one of the main defences against the curse of dimensionality — the phenomenon where high-dimensional spaces behave counter-intuitively:
- Distances converge: In d dimensions, the ratio between the nearest and farthest neighbour approaches 1 as d → ∞. This makes distance-based methods (k-NN, k-means, DBSCAN) unreliable.
- Data becomes sparse: To maintain the same density of data points, you need exponentially more samples as dimensions grow. With a fixed number of samples, the data "spreads thin" and every point looks like an outlier.
- Overfitting risk increases: More features relative to samples means more opportunity for the model to memorise noise.
PCA combats this by projecting data onto the top principal components, discarding low-variance directions that are likely noise. As a rule of thumb, keep enough components to capture 90-95% of the total variance.
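The 90–95% rule of thumb can be applied by walking the (descending) eigenvalues and stopping once the cumulative share crosses the threshold. A small sketch — `components_for_variance` is a hypothetical helper name, not part of the task:

```python
def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest k such that the top-k eigenvalues capture `threshold`
    of the total variance. Assumes eigenvalues are sorted descending.
    """
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        cumulative += ev
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)
```

With eigenvalues `[5.0, 3.0, 1.5, 0.5]` and a 0.9 threshold, the first two components cover only 80% of the variance, so three components are needed.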
Your Task
Implement:
- center(X) → subtract column means from each row
- covariance_2d(X_centered) → covariance matrix as a list of lists
- explained_variance_ratio(eigenvalues) → list of ratios