Lesson 13 of 15


Principal Component Analysis (PCA)

PCA finds the directions of maximum variance in data — the principal components. The first PC is the direction along which the data varies most.

Algorithm

  1. Centre the data by subtracting the mean
  2. Compute the covariance matrix C = \frac{1}{n} X^T X, where X is the matrix of centred points
  3. Find the dominant eigenvector of C via power iteration

The dominant eigenvector is the first principal component.
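The three steps above can be sketched as follows. This is a minimal illustration, assuming the points arrive as a list of 2D tuples; the function name and iteration count are illustrative, not prescribed by the lesson.

```python
import numpy as np

def first_principal_component(points, iters=100):
    X = np.asarray(points, dtype=float)
    X = X - X.mean(axis=0)          # step 1: centre the data
    C = X.T @ X / len(X)            # step 2: covariance matrix (1/n) X^T X
    v = np.ones(C.shape[1])         # step 3: power iteration from a fixed start
    for _ in range(iters):
        v = C @ v
        v = v / np.linalg.norm(v)   # re-normalise so v stays a unit vector
    return v

print(first_principal_component([(2, 1), (4, 2), (6, 3), (8, 4)]))
```

Power iteration converges to the eigenvector of the largest eigenvalue (up to sign, which depends on the starting vector); on this rank-one example a single iteration already lands on the answer.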

Example

Data: (2, 1), (4, 2), (6, 3), (8, 4) — perfectly collinear along the line y = x/2

After centring, the covariance matrix is:

C = \begin{pmatrix} 5.0 & 2.5 \\ 2.5 & 1.25 \end{pmatrix}

The first PC (dominant eigenvector) points along [2, 1] / \sqrt{5}:

\text{PC}_1 = [0.8944,\ 0.4472]
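The worked example can be checked numerically. The snippet below recomputes the covariance matrix and its dominant eigenvector with NumPy; this is only a verification aid, not the required solution (note that an eigenvector's sign is arbitrary, so a solver may return the negated vector).

```python
import numpy as np

pts = np.array([(2, 1), (4, 2), (6, 3), (8, 4)], dtype=float)
X = pts - pts.mean(axis=0)       # centre the data
C = X.T @ X / len(X)
print(C)                         # [[5.   2.5 ], [2.5  1.25]]

w, V = np.linalg.eigh(C)         # eigenvalues ascending, eigenvectors in columns
pc1 = V[:, np.argmax(w)]         # eigenvector of the largest eigenvalue
print(pc1)                       # ±[0.8944, 0.4472]
```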

Your Task

Implement pca_first_component(data) that returns the first principal component (unit vector) of a list of 2D points.
