Principal components analysis
PCA is a common technique for reducing high-dimensional data to a lower-dimensional subspace, either for visualization or for data compression.
Suppose we have a dataset consisting of n samples, each with p features. We represent the dataset by an n\times p matrix \mathbf{X}.
We also assume that \mathbf{X} is centered (each feature column has zero mean), so that the sample covariance matrix \boldsymbol{\Sigma} in the feature space is:
\begin{eqnarray}
\boldsymbol{\Sigma} = \frac{1}{n}\mathbf{X}^T\mathbf{X}\,.\tag{1}\end{eqnarray}
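As a concrete illustration (not part of the original derivation), here is a minimal NumPy sketch that centers a toy data matrix and forms the sample covariance of (1); the sizes n, p and the random data are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5                  # hypothetical sample and feature counts
X = rng.normal(size=(n, p))    # toy data matrix, stands in for a real dataset

X = X - X.mean(axis=0)         # center each feature (column) at zero mean
Sigma = (X.T @ X) / n          # sample covariance matrix, Eq. (1)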
PCA finds a set of k\leq p orthonormal feature vectors \mathbf{v}_1, \cdots, \mathbf{v}_k (called principal component directions) such that the projected data have the k largest sample variances
\mathbf{v}^T_1 \boldsymbol{\Sigma} \mathbf{v}_1,\cdots, \mathbf{v}^T_k \boldsymbol{\Sigma} \mathbf{v}_k. That is, \mathbf{v}_1, \cdots, \mathbf{v}_k are the eigenvectors of \boldsymbol{\Sigma} corresponding to the k largest eigenvalues \lambda_1\geq \cdots \geq \lambda_k.
In sum,
let \mathbf{V} be the p\times k matrix [\mathbf{v}_1, \cdots, \mathbf{v}_k] and \boldsymbol{\lambda} = \text{diag}\left[\lambda_1, \cdots, \lambda_k\right];
then \mathbf{V} solves the eigenvalue problem \begin{eqnarray}
\boldsymbol{\Sigma}\,\mathbf{V} = \mathbf{V}\,\boldsymbol{\lambda}\,.\tag{2} \end{eqnarray}
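Continuing the sketch above, one way to solve (2) numerically is an eigendecomposition of \boldsymbol{\Sigma} with numpy.linalg.eigh (applicable because \boldsymbol{\Sigma} is symmetric); the target dimension k below is a hypothetical choice.

k = 2                                      # hypothetical number of components
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues returned in ascending order
order = np.argsort(eigvals)[::-1]          # reorder to descending
lam = eigvals[order][:k]                   # top-k eigenvalues, lambda_1 >= ... >= lambda_k
V = eigvecs[:, order][:, :k]               # principal component directions, Eq. (2)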
In practice, instead of computing (1) and (2), one can obtain \mathbf{V} directly from the SVD of \mathbf{X},
\begin{eqnarray}
\mathbf{X} = \mathbf{U}\,\boldsymbol{\sigma}\,\mathbf{V}^T\,, \tag{3}\end{eqnarray} where \boldsymbol{\sigma} is the diagonal matrix of singular values, related to the eigenvalues of \boldsymbol{\Sigma} by \lambda_i = \sigma_i^2/n. The dataset \mathbf{X} can then be projected onto the lower k-dimensional feature space as \mathbf{X}\,\mathbf{V}, keeping only the first k columns of \mathbf{V}.
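Continuing the same sketch, the SVD route of (3) recovers the same subspace without ever forming \boldsymbol{\Sigma}; individual columns may differ from the eigendecomposition result by a sign, which does not affect the spanned subspace.

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U * diag(s) * Vt, Eq. (3)
V_svd = Vt[:k].T                                   # first k right singular vectors as columns
Z = X @ V_svd                                      # projection onto the k-dimensional subspace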