PCA
Motivation
- Data compression
- Data visualization
PCA algorithm
The principal component analysis (PCA) algorithm is a dimensionality reduction technique that projects the data onto $k$ dimensions while maximizing the variance retained in the projection.
- Data preprocessing: feature scaling / mean normalization
\[ x^{(i)}_{j} = \frac {x^{(i)}_{j} - \mu_{j}}{\sigma_{j}} \]
- Compute covariance matrix
\[ \Sigma = \frac{1}{m} \sum^{m}_{i = 1} (x^{(i)})(x^{(i)})^{T} \]
- Compute $u_{1}, ..., u_{k}$, the $k$ orthonormal eigenvectors of $\Sigma$ associated with its $k$ largest eigenvalues ($u$ rather than $\mu$, since $\mu_{j}$ already denotes the feature means)
- Project the data onto $\operatorname{span}_{\mathbb{R}}(u_{1}, ..., u_{k})$
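The steps above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the function name `pca`, the row-per-example layout of `X`, and the use of SVD to obtain the eigenvectors of the (symmetric) covariance matrix are all my own choices.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components."""
    # Data preprocessing: mean normalization and feature scaling
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    Xn = (X - mu) / sigma
    m = Xn.shape[0]
    # Covariance matrix: Sigma = (1/m) * sum_i x^(i) x^(i)^T
    Sigma = (Xn.T @ Xn) / m
    # For a symmetric PSD matrix, the SVD's left singular vectors are
    # its eigenvectors, sorted by decreasing eigenvalue
    U, S, _ = np.linalg.svd(Sigma)
    U_reduce = U[:, :k]
    # Projection onto span(u_1, ..., u_k)
    Z = Xn @ U_reduce
    return Z, U_reduce, S

# Usage on synthetic data
X = np.random.default_rng(0).normal(size=(100, 5))
Z, U_reduce, S = pca(X, 2)
```

Here `Z` holds the $k$-dimensional representation of each example, and the columns of `U_reduce` are the orthonormal directions it was projected onto.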
Choosing $k$ (the number of principal components)
- Average squared projection error
- Total variation in the data (how much variance you want to retain)
- Choose the smallest $k$ for which the ratio of the average squared projection error to the total variation is below a threshold, e.g. to retain $99\%$ of the variance:
\[ \frac{\frac{1}{m} \sum^{m}_{i = 1} \| x^{(i)} - x^{(i)}_{\text{approx}} \|^{2}}{\frac{1}{m} \sum^{m}_{i = 1} \| x^{(i)} \|^{2}} \leq 0.01 \]