Quantum Computing: Quantum Machine Learning

We introduced the HHL algorithm and its subroutine, the quantum phase estimation (QPE) algorithm, in this blog. Now we introduce two quantum versions of machine learning algorithms: quantum principal component analysis (qPCA) and quantum support vector machine (qSVM), which rely on QPE and HHL, respectively.

Data Preparation

Suppose we have $n$ data samples and each data sample has $p$ features, i.e., $\mathbf{x}^{(i)}\equiv[x_1^{(i)}, \cdots, x_p^{(i)}]^T$ for $i=1,\cdots, n$. To apply quantum algorithms, we encode each data sample $\mathbf{x}^{(i)}$ in a quantum state \begin{eqnarray}\mathbf{x}^{(i)}\equiv[x_1^{(i)}, \cdots, x_p^{(i)}]^T \quad\xrightarrow{p=2^m} \quad |\psi_i\rangle\equiv \frac{1}{\left|\left| \mathbf{x}^{(i)}\right|\right|_2}\sum_{\tau\in \{0,1\}^m}x_{\tau}^{(i)}|\tau\rangle\,.\tag{1}\end{eqnarray} To encode the entire dataset, we entangle the data state $|\psi_i\rangle$ with an auxiliary state $|i\rangle$ and form a pure state \begin{equation}|\psi\rangle\equiv\sum_{i=1}^n \sqrt{p_i}|\psi_i\rangle|i\rangle\,,\tag{2}\end{equation} where \begin{equation}p_i\equiv \frac{\left|\left| \mathbf{x}^{(i)}\right|\right|^2_2}{\sum_{k=1}^n \left|\left| \mathbf{x}^{(k)}\right|\right|^2_2}\tag{3}\end{equation} comes from the fact that $\langle \psi_i|\psi_i\rangle=1$ while ${\mathbf{x}^{(i)}}^T\cdot \mathbf{x}^{(i)} = \left|\left|\mathbf{x}^{(i)}\right|\right|_2^2$.
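The encoding of Eqs. (1)–(3) can be checked numerically. Below is a minimal NumPy sketch; the toy dataset `X` is made up for illustration:

```python
import numpy as np

# Made-up toy dataset: n = 3 samples, p = 4 = 2^2 features.
X = np.array([[1.0, 0.0, 2.0, 0.0],
              [0.0, 1.0, 1.0, 1.0],
              [2.0, 2.0, 0.0, 1.0]])
n, p = X.shape

norms = np.linalg.norm(X, axis=1)      # ||x^(i)||_2
psi = X / norms[:, None]               # amplitude-encoded |psi_i>, Eq. (1)
p_i = norms**2 / np.sum(norms**2)      # weights p_i of Eq. (3)

# Pure state of Eq. (2): sum_i sqrt(p_i) |psi_i> |i>, stored as a
# (p*n)-dimensional vector with index ordering (tau, i).
state = np.zeros(p * n)
for i in range(n):
    state += np.sqrt(p_i[i]) * np.kron(psi[i], np.eye(n)[i])
```

Since each $|\psi_i\rangle$ is normalized and the weights $p_i$ sum to one, the full state $|\psi\rangle$ comes out normalized as well.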

Quantum Principal Component Analysis (qPCA) 

Classical PCA as described in this blog finds the eigenvectors of the covariance matrix $\boldsymbol{\Sigma}=\frac{1}{n}\sum_{i=1}^n\mathbf{x}^{(i)}{\mathbf{x}^{(i)}}^T$, assuming the data is centered. The classical covariance matrix corresponds to the density matrix \begin{equation}\rho\equiv\sum_{i=1}^n p_i |\psi_i\rangle\langle \psi_i|\tag{4}\end{equation} in quantum mechanics up to a global constant $n/\sum_{k=1}^n \left|\left| \mathbf{x}^{(k)}\right|\right|^2_2$. The qPCA algorithm finds the eigenvalues and eigenvectors of the density matrix $\rho$, given multiple copies of $\rho$ as input:
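The correspondence between the covariance matrix $\boldsymbol{\Sigma}$ and the density matrix (4) can be verified directly; a small NumPy check with a made-up centered dataset:

```python
import numpy as np

# Made-up dataset; rows are samples. Center it so classical PCA applies.
X = np.array([[1.0, 2.0], [3.0, 1.0], [2.0, 6.0]])
X = X - X.mean(axis=0)
n, p = X.shape

Sigma = X.T @ X / n                      # classical covariance matrix
norms2 = np.sum(X**2, axis=1)            # ||x^(i)||_2^2
p_i = norms2 / norms2.sum()              # weights p_i of Eq. (3)
psi = X / np.sqrt(norms2)[:, None]       # normalized data states
rho = sum(p_i[i] * np.outer(psi[i], psi[i]) for i in range(n))  # Eq. (4)

# rho equals Sigma up to the global constant n / sum_k ||x^(k)||_2^2.
```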

  1. The first step in qPCA is to construct the unitary operator $e^{i\rho t}$ in order to estimate the eigenvalues of $\rho$ by QPE. Since $\rho$ in general is not sparse, the authors of the original qPCA paper proposed a trick to realize $e^{i\rho t}$ via the SWAP operator: \begin{eqnarray}\text{Tr}_P\left\{e^{-iS\,\delta t}\rho \otimes \sigma e^{iS\,\delta t}\right\} = e^{-i\rho \,\delta t}\sigma e^{i\rho \,\delta t}\tag{5}\end{eqnarray} for any density matrix $\sigma$ and an infinitesimal time interval $\delta t$, where $\text{Tr}_P$ denotes the partial trace over the first subsystem and $S$ is the SWAP operator. As a result, to achieve $e^{-i\rho N\,\delta t}\sigma e^{i\rho N\,\delta t}$, we prepare $N$ copies of $\rho$ and apply the operation on the left-hand side of Eq. (5) $N$ times.
  2. The second step is to apply QPE with $\rho$ itself as the initial state: \begin{eqnarray}\rho \otimes |0\rangle\langle 0| \xrightarrow{\text{Q.P.E. on }e^{i\rho t}} \sum_{j=1}^p \lambda_j |u_j\rangle\langle u_j| \otimes |\tilde{\lambda}_j\rangle\langle \tilde{\lambda}_j|\,,\tag{6}\end{eqnarray} where $\lambda_j$ and $|u_j\rangle$ are the eigenvalues and eigenstates of $\rho$. The state $|\tilde{\lambda}_j\rangle$ encodes the binary representation of the eigenvalue $\lambda_j\frac{t}{2\pi}$. Finally, sampling from the mixed state (6) yields the eigenvalue $\tilde{\lambda}_j$ and eigenstate $|u_j\rangle$ with a classical probability proportional to $\lambda_j$. 
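Classically, the output distribution of this sampling step can be mimicked by diagonalizing $\rho$: the eigenpair $(\lambda_j, |u_j\rangle)$ occurs with probability $\lambda_j$. A NumPy sketch with an arbitrary made-up density matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up 4x4 density matrix: positive semidefinite with unit trace.
A = rng.normal(size=(4, 4))
rho = A @ A.T
rho /= np.trace(rho)

evals, evecs = np.linalg.eigh(rho)       # lambda_j and |u_j>
evals = np.clip(evals, 0.0, None)        # guard against tiny negative round-off
evals /= evals.sum()

# Mimic sampling from the mixed state (6): pick eigenpair j with probability lambda_j.
j = rng.choice(len(evals), p=evals)
sampled_eigenvalue, sampled_eigenstate = evals[j], evecs[:, j]
```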
Note:

  1. The proof of Eq. (5) is as follows: without loss of generality, we assume that $\rho=|\psi\rangle\langle \psi|$ and $\sigma=|\phi\rangle\langle \phi|$. We can compute that \begin{eqnarray}\text{Tr}_P\left\{S\,\rho\otimes \sigma\right\}&=&\text{Tr}_P\left\{S\,|\psi\rangle\langle\psi|\otimes |\phi\rangle\langle\phi|\right\}=\text{Tr}_P\left\{|\phi\rangle\langle\psi|\otimes |\psi\rangle\langle\phi|\right\}=\sum_{i} \langle i| \phi\rangle\langle\psi| i\rangle\,|\psi\rangle\langle\phi|\\ &=& \sum_{i} \langle\psi| i\rangle\langle i| \phi\rangle\,|\psi\rangle\langle\phi| =\langle\psi| \phi\rangle\,|\psi\rangle\langle\phi| =\rho\sigma\,.\end{eqnarray} Similarly, we can prove that $\text{Tr}_P\left\{\rho\otimes \sigma\,S\right\}=\sigma\rho$ and thus $\text{Tr}_P\left\{[S, \rho \otimes \sigma]\right\}=[\rho, \sigma]$. Therefore, under an infinitesimal time interval $\delta t$, the left-hand side of Eq. (5) to the first order of $\delta t$ is \begin{eqnarray} \text{L.H.S.}=\text{Tr}_P\left\{(1-iS\,\delta t)\,\rho\otimes\sigma\,(1+iS\,\delta t)\right\}=\text{Tr}_P\left\{\rho\otimes\sigma -i[S,\,\rho\otimes\sigma]\delta t\right\}=\sigma -i [\rho, \sigma]\delta t\,,\end{eqnarray} which equals the right-hand side of Eq. (5) to the first order of $\delta t$: \begin{eqnarray}\text{R.H.S.}=(1-i\rho \delta t)\,\sigma\,(1+i\rho \delta t)= \sigma -i [\rho, \,\sigma]\,\delta t\,.\end{eqnarray}
  2. Unlike the explanation of QPE in this blog, we apply QPE on a density matrix as shown in Eq. (6). This is because the density matrix (4) of a mixed state can be viewed as a reduced state of the pure state (2) with additional auxiliary states. Assuming $|\psi_i\rangle=\sum_{j=1}^p\beta_j^{(i)}|u_j\rangle$, $\rho=\sum_{i=1}^n p_i\,|\psi_i\rangle\langle \psi_i|=\sum_{j=1}^p \lambda_j|u_j\rangle\langle u_j|$ imposes the constraint $\sum_{i=1}^n p_i\beta^{(i)}_{j}{\beta^{(i)}_{k}}^*=\lambda_j\delta_{jk}$. We apply QPE on the pure state (2) as \begin{eqnarray}\sum_{i=1}^n\sqrt{p_i} |\psi_i\rangle|i\rangle|0\rangle \xrightarrow{\text{Q.P.E. on }e^{i\rho t}} \sum_{i=1}^n\sum_{j=1}^p \sqrt{p_i}\beta^{(i)}_{j}|u_j\rangle|i\rangle|\tilde{\lambda}_j\rangle\,,\end{eqnarray} leading to a reduced density matrix that is exactly the same as the right-hand side of Eq. (6) after tracing out the auxiliary states.
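Eq. (5) can also be verified numerically to first order in $\delta t$. A NumPy/SciPy sketch with random made-up density matrices (the residual error is $\mathcal{O}(\delta t^2)$):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

def random_density(d):
    """Random d-dimensional density matrix (PSD, unit trace)."""
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    M = A @ A.conj().T
    return M / np.trace(M)

d = 2
rho, sigma = random_density(d), random_density(d)

# SWAP operator on C^d x C^d: S |a>|b> = |b>|a>.
S = np.zeros((d * d, d * d))
for a in range(d):
    for b in range(d):
        S[b * d + a, a * d + b] = 1.0

def partial_trace_first(M, d):
    """Partial trace over the first d-dimensional subsystem."""
    return M.reshape(d, d, d, d).trace(axis1=0, axis2=2)

dt = 1e-3
lhs = partial_trace_first(
    expm(-1j * S * dt) @ np.kron(rho, sigma) @ expm(1j * S * dt), d)
rhs = expm(-1j * rho * dt) @ sigma @ expm(1j * rho * dt)
# lhs and rhs agree up to O(dt^2) corrections.
```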

Quantum Support Vector Machine (qSVM) 

Classical LS-SVM as described in this blog reduces to solving the linear system: \begin{eqnarray}\mathbf{F}\left[\begin{array}{c} b\\ \boldsymbol{\alpha} \end{array}\right] \equiv\left[\begin{array}{cc} 0 & \mathbf{1}^T\\ \mathbf{1} & \mathbf{K} + \gamma^{-1}\mathbf{I}\end{array}\right] \left[\begin{array}{c} b\\ \boldsymbol{\alpha} \end{array}\right] = \left[\begin{array}{c} 0\\ \mathbf{y}\end{array}\right]\,,\end{eqnarray} where $\mathbf{K}_{ij}={\mathbf{x}^{(i)}}^T\cdot {\mathbf{x}^{(j)}}$ is the kernel matrix. The qSVM algorithm solves the normalized equations $\hat{F}|b, \boldsymbol{\alpha}\rangle=|y\rangle$ by the HHL algorithm, where $\hat{F}\equiv \mathbf{F}/{\text{Tr}\,\mathbf{F}}$ so that $||\hat{F}||\leq 1$. The qSVM paper summarizes the algorithm in the following steps:

  1. Prepare the kernel by taking the partial trace of the pure state (2) over the data states: \begin{equation}\text{Tr}_p |\psi\rangle\langle\psi|=\frac{1}{\sum_{k=1}^n \left|\left| \mathbf{x}^{(k)}\right|\right|^2_2}\sum_{i,j=1}^n \langle \psi_i|\psi_j\rangle \left|\left| \mathbf{x}^{(i)}\right|\right|_2 \left|\left| \mathbf{x}^{(j)}\right|\right|_2 |i\rangle\langle j|=\frac{K}{\text{Tr}K}\equiv \hat{K}\,.\end{equation} The normalized kernel $\hat{K}$ is the density matrix of the auxiliary states.
  2. Construct the unitary operator $e^{i\hat{K}\,\delta t}$ using the trick as in Eq. (5).
  3. Compute $\text{Tr}K$: construct a Hamiltonian $\sum_{j=1}^n \left|\left|\mathbf{x}^{(j)}\right|\right|_2 |j\rangle\langle j| \otimes \sigma_x$ and apply it to the state $\frac{1}{\sqrt{n}}\sum_{j=1}^n|j\rangle|0\rangle$ to reach a state $\frac{1}{\sqrt{n}}\sum_{j=1}^n\left[\cos\left(\left|\left|\mathbf{x}^{(j)}\right|\right|_2t\right) |j\rangle|0\rangle+\sin\left(\left|\left|\mathbf{x}^{(j)}\right|\right|_2t\right)|j\rangle|1\rangle\right]$. The probability of ending in the state $|1\rangle$ after a measurement is $\sum_{i=1}^n \left|\left|\mathbf{x}^{(i)}\right|\right|_2^2 t^2/n$ when $\left(\max \left|\left|\mathbf{x}^{(i)}\right|\right|_2\right) t \ll 1$, from which one can estimate the kernel trace $\text{Tr}K=\sum_{i=1}^n \left|\left|\mathbf{x}^{(i)}\right|\right|_2^2$.
  4. Apply the HHL algorithm with \begin{equation} e^{-iF\,\delta t }=e^{-i \gamma^{-1}\,\mathbf{I}\,\delta t/\text{Tr}F }\, e^{-i  \left[\begin{array}{cc} 0 & \mathbf{1}^T\\ \mathbf{1} & \mathbf{0}\end{array}\right]\delta t/\text{Tr}F}\,e^{-i K\,\delta t/\text{Tr}F } + \mathcal{O}(\delta t^2)\,.\end{equation} In particular, $e^{-i K\,\delta t/\text{Tr}F}$ is obtained from the $e^{i\hat{K}\,\delta t}$ constructed in the second step by rescaling the time by $\frac{\text{Tr}K}{\text{Tr}F}$, with $\text{Tr}K$ computed in the third step.
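The linear system that the HHL step targets can be sanity-checked classically. A NumPy sketch that builds $\mathbf{F}$ for a made-up toy dataset with a linear kernel and solves $\mathbf{F}[b; \boldsymbol{\alpha}] = [0; \mathbf{y}]$ directly (the data and $\gamma$ are illustrative):

```python
import numpy as np

# Made-up toy training set: n = 4 points in 2D with labels +/-1.
X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)
gamma = 10.0                              # LS-SVM regularization parameter

K = X @ X.T                               # linear kernel K_ij = x_i . x_j
F = np.zeros((n + 1, n + 1))
F[0, 1:] = 1.0                            # top row     [0, 1^T]
F[1:, 0] = 1.0                            # left column [1, K + I/gamma]
F[1:, 1:] = K + np.eye(n) / gamma

# Classical solve of F [b; alpha] = [0; y]; qSVM's HHL step solves the
# normalized version F_hat = F / Tr(F) of this same system.
sol = np.linalg.solve(F, np.concatenate(([0.0], y)))
b, alpha = sol[0], sol[1:]

# Decision function f(x) = sum_i alpha_i x_i.x + b on the training points.
f = K @ alpha + b
```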
