PCA

Motivation

  1. Data compression
  2. Data visualization

PCA algorithm

The principal component analysis (PCA) algorithm is a dimensionality reduction technique that projects the data onto $k$ dimensions chosen to maximize the retained variance of the data.

  1. Data preprocessing: feature scaling / mean normalization

\[ x^{(i)}_{j} = \frac {x^{(i)}_{j} - \mu_{j}}{\sigma_{j}} \]

  2. Compute the covariance matrix

\[ \Sigma = \frac{1}{m} \sum^{m}_{i = 1} (x^{(i)})(x^{(i)})^{T} \]

  3. Compute $u_{1}, ..., u_{k}$, the top $k$ orthogonal eigenvectors of $\Sigma$ (written $u$ to avoid clashing with the feature means $\mu_{j}$)

  4. Project the data onto $\mathrm{span}_{\mathbb{R}}(u_{1}, ..., u_{k})$
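The steps above can be sketched in NumPy. This is an illustrative sketch, not a reference implementation: the function name `pca` and the assumption that `X` holds $m$ examples as rows are mine, and the eigenvectors of $\Sigma$ are obtained via SVD, which is the usual numerically stable route.

```python
import numpy as np

def pca(X, k):
    """Sketch of the PCA procedure; X is assumed to be (m, n), one example per row."""
    # Step 1: feature scaling / mean normalization
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    X_norm = (X - mu) / sigma
    # Step 2: covariance matrix Sigma = (1/m) * sum_i x^(i) x^(i)^T
    m = X.shape[0]
    Sigma = (X_norm.T @ X_norm) / m
    # Step 3: eigenvectors of Sigma; since Sigma is symmetric PSD,
    # the columns of U from its SVD are its orthonormal eigenvectors
    U, S, _ = np.linalg.svd(Sigma)
    U_k = U[:, :k]            # top-k eigenvectors
    # Step 4: project the data onto span(u_1, ..., u_k)
    Z = X_norm @ U_k          # shape (m, k)
    return Z, U_k, S
```

`Z` holds the $k$-dimensional projections; `S` (the singular values of $\Sigma$) is kept because it is what the variance-retained criterion below needs.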

Choosing $k$ (the number of principal components)

  1. Average squared projection error: $\frac{1}{m} \sum^{m}_{i=1} \| x^{(i)} - x^{(i)}_{\text{approx}} \|^{2}$
  2. Total variation in the data: $\frac{1}{m} \sum^{m}_{i=1} \| x^{(i)} \|^{2}$

Choose the smallest $k$ for which the ratio of (1) to (2) is at most some threshold, e.g. $0.01$, meaning $99\%$ of the variance is retained.
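In practice the retained-variance check can be computed directly from the singular values $S_{ii}$ of $\Sigma$, since the variance retained by the top $k$ components equals $\sum_{i=1}^{k} S_{ii} / \sum_{i=1}^{n} S_{ii}$. A minimal sketch, assuming `S` is the array of singular values returned by `np.linalg.svd`:

```python
import numpy as np

def choose_k(S, variance_retained=0.99):
    """Smallest k such that the top-k components retain the requested
    fraction of the total variance; S are the singular values of Sigma."""
    ratios = np.cumsum(S) / np.sum(S)        # variance retained by k = 1, 2, ...
    return int(np.searchsorted(ratios, variance_retained) + 1)
```

Starting from $k = 1$ and checking the ratio avoids recomputing the projection error for each candidate $k$.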