PCA
Motivation
- Data compression
- Data visualization
PCA algorithm
The principal component analysis (PCA) algorithm is a dimensionality reduction technique that projects the data onto $k$ dimensions while maximizing the variance retained in the projection.
- Data preprocessing: feature scaling / mean normalization
\[ x^{(i)}_{j} = \frac {x^{(i)}_{j} - \mu_{j}}{\sigma_{j}} \]
- Compute covariance matrix
\[ \Sigma = \frac{1}{m} \sum^{m}_{i = 1} (x^{(i)})(x^{(i)})^{T} \]
- Compute $u_{1}, ..., u_{k}$, the $k$ orthonormal eigenvectors of $\Sigma$ associated with its $k$ largest eigenvalues ($u$ rather than $\mu$, since $\mu_{j}$ already denotes the feature means)
- Project the data onto $\operatorname{span}_{\mathbb{R}}(u_{1}, ..., u_{k})$
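The steps above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the function name `pca`, the row-per-example layout of `X`, and the use of SVD to obtain the eigenvectors of the (symmetric) covariance matrix are all my own choices.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components."""
    # Data preprocessing: mean normalization and feature scaling
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    Xn = (X - mu) / sigma
    m = Xn.shape[0]
    # Covariance matrix: Sigma = (1/m) * sum_i x^(i) x^(i)^T
    Sigma = (Xn.T @ Xn) / m
    # For a symmetric PSD matrix, the SVD's left singular vectors are
    # its eigenvectors, sorted by decreasing eigenvalue
    U, S, _ = np.linalg.svd(Sigma)
    U_reduce = U[:, :k]
    # Projection onto span(u_1, ..., u_k)
    Z = Xn @ U_reduce
    return Z, U_reduce, S

# Usage on synthetic data
X = np.random.default_rng(0).normal(size=(100, 5))
Z, U_reduce, S = pca(X, 2)
```

Here `Z` holds the $k$-dimensional representation of each example, and the columns of `U_reduce` are the orthonormal directions it was projected onto.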
Choosing $k$ (the number of principal components)
- Average squared projection error
- Total variation in the data (how much variance you want to retain)
- Choose the smallest $k$ for which the ratio of the average squared projection error to the total variation is below a threshold, e.g. to retain $99\%$ of the variance:
\[ \frac{\frac{1}{m} \sum^{m}_{i = 1} \| x^{(i)} - x^{(i)}_{\text{approx}} \|^{2}}{\frac{1}{m} \sum^{m}_{i = 1} \| x^{(i)} \|^{2}} \leq 0.01 \]