Principal components analysis

PCA is a common technique for reducing high-dimensional data to a lower-dimensional subspace, either for visualization or for data compression.
Suppose we have a dataset consisting of $n$ samples, each with $p$ features. We represent the dataset by an $n\times p$ matrix $\mathbf{X}$. We also assume that $\mathbf{X}$ is centered, so that the sample covariance matrix $\boldsymbol{\Sigma}$ in the feature space is: \begin{eqnarray} \boldsymbol{\Sigma} = \frac{1}{n}\mathbf{X}^T\mathbf{X}\,.\tag{1}\end{eqnarray} PCA finds a set of $k\leq p$ orthonormal feature vectors $\mathbf{v}_1, \cdots, \mathbf{v}_k$ (called principal component directions) such that these new feature directions have the $k$ largest sample variances $\mathbf{v}^T_1 \boldsymbol{\Sigma} \mathbf{v}_1,\cdots, \mathbf{v}^T_k \boldsymbol{\Sigma} \mathbf{v}_k$. That is, $\mathbf{v}_1, \cdots, \mathbf{v}_k$ are the eigenvectors of $\boldsymbol{\Sigma}$ corresponding to the $k$ largest eigenvalues $\lambda_1\geq \cdots \geq \lambda_k$. In sum, letting $\mathbf{V}$ be the $p\times k$ matrix $[\mathbf{v}_1, \cdots, \mathbf{v}_k]$ and $\boldsymbol{\lambda} = \text{diag}\left[\lambda_1, \cdots, \lambda_k\right]$, we can solve for $\mathbf{V}$ via the eigenvalue problem \begin{eqnarray} \boldsymbol{\Sigma}\,\mathbf{V} = \mathbf{V}\,\boldsymbol{\lambda}\,.\tag{2} \end{eqnarray}
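As a concrete illustration, the eigenvalue route (1)-(2) can be sketched in a few lines of NumPy. The synthetic data matrix and the choice $k=2$ below are made up purely for the example:

```python
import numpy as np

# Toy data: n = 200 samples, p = 5 features (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Center the data so that equation (1) gives the sample covariance.
X = X - X.mean(axis=0)
n, p = X.shape
Sigma = (X.T @ X) / n                      # equation (1)

# Eigen-decomposition of the symmetric covariance matrix, equation (2).
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]          # reorder to descending
k = 2
lam = eigvals[order][:k]                   # k largest eigenvalues
V = eigvecs[:, order][:, :k]               # p x k principal component directions
```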
In practice, instead of computing (1) and (2), one can obtain $\mathbf{V}$ directly from the SVD of $\mathbf{X}$, \begin{eqnarray} \mathbf{X} = \mathbf{U}\,\boldsymbol{\sigma}\,\mathbf{V}^T\,, \tag{3}\end{eqnarray} where the columns of $\mathbf{V}$ are the right singular vectors (the principal component directions) and the singular values relate to the eigenvalues of $\boldsymbol{\Sigma}$ by $\lambda_i = \sigma_i^2/n$. The dataset $\mathbf{X}$ can then be projected onto the lower $k$-dimensional feature space as $\mathbf{X}\,\mathbf{V}$, keeping only the first $k$ columns of $\mathbf{V}$.
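A minimal sketch of the SVD route, continuing from the snippet above (the variables `X`, `n`, and `k` are assumed from there); `numpy.linalg.svd` returns the right singular vectors as the rows of `Vt`:

```python
# Same centered X as above; the SVD route avoids forming Sigma explicitly.
U, s, Vt = np.linalg.svd(X, full_matrices=False)   # equation (3)
V_svd = Vt[:k].T                 # top-k right singular vectors, p x k
lam_svd = s[:k] ** 2 / n         # eigenvalues recovered as sigma_i^2 / n

# Project the data onto the k-dimensional principal subspace.
Z = X @ V_svd                    # n x k matrix of principal component scores
```

Up to sign flips of individual columns, `V_svd` agrees with the `V` obtained from the eigen-decomposition.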
