Quantum Computing: Quantum Machine Learning

We introduced the HHL algorithm and its subroutine, the quantum phase estimation (QPE) algorithm, in this blog. Now we introduce two quantum versions of machine learning algorithms: quantum principal component analysis (qPCA) and quantum support vector machine (qSVM), which rely on QPE and HHL, respectively.

Data Preparation

Suppose we have n data samples and each data sample has p features, i.e., \mathbf{x}^{(i)}\equiv[x_1^{(i)}, \cdots, x_p^{(i)}]^T for i=1,\cdots, n. To apply quantum algorithms, we encode each data sample \mathbf{x}^{(i)} in a quantum state \begin{eqnarray}\mathbf{x}^{(i)}\equiv[x_1^{(i)}, \cdots, x_p^{(i)}]^T \quad\xrightarrow{p=2^m} \quad |\psi_i\rangle\equiv \frac{1}{\left|\left| \mathbf{x}^{(i)}\right|\right|_2}\sum_{\tau\in \{0,1\}^m}x_{\tau}^{(i)}|\tau\rangle\,.\tag{1}\end{eqnarray} To encode the entire dataset, we entangle the data state |\psi_i\rangle with an auxiliary state |i\rangle and form a pure state \begin{equation}|\psi\rangle\equiv\sum_{i=1}^n \sqrt{p_i}|\psi_i\rangle|i\rangle\,,\tag{2}\end{equation} where \begin{equation}p_i\equiv \frac{\left|\left| \mathbf{x}^{(i)}\right|\right|^2_2}{\sum_{k=1}^n \left|\left| \mathbf{x}^{(k)}\right|\right|^2_2}\tag{3}\end{equation} comes from the fact that \langle \psi_i|\psi_i\rangle=1 while {\mathbf{x}^{(i)}}^T\cdot \mathbf{x}^{(i)} = \left|\left|\mathbf{x}^{(i)}\right|\right|_2^2.
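The amplitude encoding of Eqs. (1)-(3) can be mimicked classically with plain numpy. The sketch below (the toy data and all variable names are hypothetical) builds the normalized states |\psi_i\rangle, the weights p_i, and the joint pure state |\psi\rangle:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 8                    # n samples, p = 2^m features (m = 3 qubits each)
X = rng.normal(size=(n, p))    # toy dataset, rows are x^(i)

norms = np.linalg.norm(X, axis=1)   # ||x^(i)||_2
psi = X / norms[:, None]            # amplitude-encoded |psi_i>, Eq. (1)
p_i = norms**2 / np.sum(norms**2)   # mixing weights, Eq. (3)

# Pure state of Eq. (2) on the joint (data x auxiliary) space:
# |psi> = sum_i sqrt(p_i) |psi_i> (x) |i>
basis = np.eye(n)
psi_full = sum(np.sqrt(p_i[i]) * np.kron(psi[i], basis[i]) for i in range(n))

# each |psi_i> and the full |psi> are normalized
assert np.allclose(np.linalg.norm(psi, axis=1), 1.0)
assert np.isclose(np.linalg.norm(psi_full), 1.0)
```

The normalization of |\psi\rangle follows from \sum_i p_i = 1, which is exactly why the weights of Eq. (3) are chosen as they are.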

Quantum Principal Component Analysis (qPCA) 

Classical PCA as described in this blog finds the eigenvectors of the covariance matrix \boldsymbol{\Sigma}=\frac{1}{n}\sum_{i=1}^n\mathbf{x}^{(i)}{\mathbf{x}^{(i)}}^T, assuming the data is centered. The classical covariance matrix corresponds to the density matrix \begin{equation}\rho\equiv\sum_{i=1}^n p_i |\psi_i\rangle\langle \psi_i|\tag{4}\end{equation} in quantum mechanics, up to a global constant n/\sum_{k=1}^n \left|\left| \mathbf{x}^{(k)}\right|\right|^2_2. The qPCA algorithm finds the eigenvalues and eigenvectors of the density matrix \rho, given multiple copies of \rho as the input:

  1. The first step in qPCA is to construct the unitary operator e^{i\rho t} in order to estimate the eigenvalues of \rho by QPE. Since \rho in general is not sparse, the authors of the original qPCA paper proposed a trick to realize e^{i\rho t} by a SWAP operation: \begin{eqnarray}\text{Tr}_P\left\{e^{-iS\,\delta t}\rho \otimes \sigma e^{iS\,\delta t}\right\} = e^{-i\rho \,\delta t}\sigma e^{i\rho \,\delta t}\tag{5}\end{eqnarray} for any density matrix \sigma and an infinitesimal time interval \delta t, where \text{Tr}_P denotes the partial trace over the first subsystem and S is the SWAP operator. As a result, to achieve e^{-i\rho N\,\delta t}\sigma e^{i\rho N\,\delta t}, we prepare N copies of \rho and apply the left-hand side of Eq. (5) N times.
  2. The second step is to apply QPE with the initial state being \rho itself: \begin{eqnarray}\rho \otimes |0\rangle\langle 0| \xrightarrow{\text{Q.P.E. on }e^{i\rho t}} \sum_{j=1}^p \lambda_j |u_j\rangle\langle u_j| \otimes |\tilde{\lambda}_j\rangle\langle \tilde{\lambda}_j|\,,\tag{6}\end{eqnarray} where \lambda_j and |u_j\rangle are the eigenvalues and eigenstates of \rho. The state |\tilde{\lambda}_j\rangle encodes the binary representation of the eigenvalue \lambda_j\frac{t}{2\pi}. Finally, sampling from the mixed state (6) gives the eigenvalue \tilde{\lambda}_j and eigenstate |u_j\rangle with a classical probability proportional to \lambda_j.
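Since \rho in Eq. (4) equals the covariance matrix up to normalization, the spectrum that qPCA samples can be checked with a classical eigendecomposition. A hypothetical numpy sketch (toy data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4
X = rng.normal(size=(n, p))          # centered toy data, rows are x^(i)

norms = np.linalg.norm(X, axis=1)
p_i = norms**2 / np.sum(norms**2)    # Eq. (3)
psi = X / norms[:, None]             # |psi_i>

# Density matrix of Eq. (4); it equals X^T X / sum_k ||x^(k)||^2,
# i.e. the covariance matrix up to the global constant in the text.
rho = sum(p_i[i] * np.outer(psi[i], psi[i]) for i in range(n))
assert np.allclose(rho, X.T @ X / np.sum(norms**2))

# qPCA samples (lambda_j, |u_j>) with probability lambda_j;
# classically these are just the eigenpairs of rho.
lam, U = np.linalg.eigh(rho)
assert np.isclose(lam.sum(), 1.0)    # Tr(rho) = 1
```

The columns of U are the principal directions of the data, and the sampling probabilities lam sum to one because \rho has unit trace.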
Note:

  1. The proof of Eq. (5) is as follows: without loss of generality, we assume that \rho=|\psi\rangle\langle \psi| and \sigma=|\phi\rangle\langle \phi|. We can compute that \begin{eqnarray}\text{Tr}_P\left\{S\,\rho\otimes \sigma\right\}&=&\text{Tr}_P\left\{S\,|\psi\rangle\langle\psi|\otimes |\phi\rangle\langle\phi|\right\}=\text{Tr}_P\left\{|\phi\rangle\langle\psi|\otimes |\psi\rangle\langle\phi|\right\}=\sum_{i} \langle i| \phi\rangle\langle\psi| i\rangle\,|\psi\rangle\langle\phi|\\ &=& \sum_{i} \langle\psi| i\rangle\langle i| \phi\rangle\,|\psi\rangle\langle\phi| =\langle\psi| \phi\rangle\,|\psi\rangle\langle\phi| =\rho\sigma\,.\end{eqnarray} Similarly, we can prove that \text{Tr}_P\left\{\rho\otimes \sigma\,S\right\}=\sigma\rho and thus \text{Tr}_P\left\{[S, \rho \otimes \sigma]\right\}=[\rho, \sigma]. Therefore, under an infinitesimal time interval \delta t, the left-hand side of Eq. (5) to the first order of \delta t is \begin{eqnarray} \text{L.H.S.}=\text{Tr}_P\left\{(1-iS\,\delta t)\,\rho\otimes\sigma\,(1+iS\,\delta t)\right\}=\text{Tr}_P\left\{\rho\otimes\sigma -i[S,\,\rho\otimes\sigma]\delta t\right\}=\sigma -i [\rho, \sigma]\delta t\,,\end{eqnarray} which equals the right-hand side of Eq. (5) to the first order of \delta t: \begin{eqnarray}\text{R.H.S.}=(1-i\rho \delta t)\,\sigma\,(1+i\rho \delta t)= \sigma -i [\rho, \,\sigma]\,\delta t\,.\end{eqnarray}
  2. Unlike the explanation of QPE in this blog, we apply QPE on a density matrix as shown in Eq. (6). This is because the density matrix (4) of a mixed state can be viewed as a reduced state of the pure state (2) with additional auxiliary states. Assuming |\psi_i\rangle=\sum_{j=1}^p\beta_j^{(i)}|u_j\rangle, \rho=\sum_{i=1}^n p_i\,|\psi_i\rangle\langle \psi_i|=\sum_{j=1}^p \lambda_j|u_j\rangle\langle u_j| imposes the constraint \sum_{i=1}^n p_i\beta^{(i)}_{j}{\beta^{(i)}_{k}}^*=\lambda_j\delta_{jk}. We apply QPE on the pure state (2) as \begin{eqnarray}\sum_{i=1}^n\sqrt{p_i} |\psi_i\rangle|i\rangle|0\rangle \xrightarrow{\text{Q.P.E. on }e^{i\rho t}} \sum_{i=1}^n\sum_{j=1}^p \sqrt{p_i}\beta^{(i)}_{j}|u_j\rangle|i\rangle|\tilde{\lambda}_j\rangle\,,\end{eqnarray}leading to a reduced density matrix that is exactly the same as the right-hand side of Eq. (6) after tracing out the auxiliary states.
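The identities used in the proof of Eq. (5) can be verified numerically. The sketch below is hypothetical (random pure states, illustrative names); since S^2=I, the exponential is computed exactly as e^{-iS\,\delta t}=\cos(\delta t)\,I - i\sin(\delta t)\,S, avoiding a matrix-exponential routine:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3

def random_density(d):
    """Random pure-state density matrix |v><v|."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    v /= np.linalg.norm(v)
    return np.outer(v, v.conj())

rho, sigma = random_density(d), random_density(d)

# SWAP operator on C^d (x) C^d: S |a>|b> = |b>|a>
S = np.zeros((d * d, d * d))
for a in range(d):
    for b in range(d):
        S[b * d + a, a * d + b] = 1.0

def partial_trace_first(M, d):
    """Trace out the first d-dimensional factor of a (d^2 x d^2) matrix."""
    return np.einsum('aiaj->ij', M.reshape(d, d, d, d))

# Tr_P{ S rho (x) sigma } = rho sigma
assert np.allclose(partial_trace_first(S @ np.kron(rho, sigma), d), rho @ sigma)

# Eq. (5) to first order in dt
dt = 1e-4
U = np.cos(dt) * np.eye(d * d) - 1j * np.sin(dt) * S
lhs = partial_trace_first(U @ np.kron(rho, sigma) @ U.conj().T, d)
rhs = sigma - 1j * dt * (rho @ sigma - sigma @ rho)
assert np.allclose(lhs, rhs, atol=1e-7)
```

The residual between the two sides is O(\delta t^2), consistent with the first-order expansion in the proof.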

Quantum Support Vector Machine (qSVM) 

Classical LS-SVM as described in this blog reduces to solving the linear system \begin{eqnarray}\mathbf{F}\left[\begin{array}{c} b\\ \boldsymbol{\alpha} \end{array}\right] \equiv\left[\begin{array}{cc} 0 & \mathbf{1}^T\\ \mathbf{1} & \mathbf{K} + \gamma^{-1}\mathbf{I}\end{array}\right] \left[\begin{array}{c} b\\ \boldsymbol{\alpha} \end{array}\right] = \left[\begin{array}{c} 0\\ \mathbf{y}\end{array}\right]\,,\end{eqnarray} where \mathbf{K}_{ij}={\mathbf{x}^{(i)}}^T\cdot {\mathbf{x}^{(j)}} is the kernel matrix. The qSVM algorithm solves the normalized equations \hat{F}|b, \alpha\rangle=|y\rangle using the HHL algorithm, where \hat{F}\equiv F/\text{Tr}F so that ||\hat{F}||\leq 1. The qSVM paper summarizes the algorithm in the following steps:

  1. Prepare the kernel by taking the partial trace of the pure state (2) over the data states: \begin{equation}\text{Tr}_p |\psi\rangle\langle\psi|=\frac{1}{\sum_{k=1}^n \left|\left| \mathbf{x}^{(k)}\right|\right|^2_2}\sum_{i,j=1}^n \langle \psi_i|\psi_j\rangle \left|\left| \mathbf{x}^{(i)}\right|\right|_2 \left|\left| \mathbf{x}^{(j)}\right|\right|_2 |i\rangle\langle j|=\frac{K}{\text{Tr}K}\equiv \hat{K}\,.\end{equation} The normalized kernel \hat{K} is the density matrix of the auxiliary states.
  2. Construct the unitary operator e^{i\hat{K}\,\delta t} using the trick as in Eq. (5).
  3. Compute \text{Tr}K: construct the Hamiltonian H=\sum_{j=1}^n \left|\left|\mathbf{x}^{(j)}\right|\right|_2 |j\rangle\langle j| \otimes \sigma_x and evolve the state \frac{1}{\sqrt{n}}\sum_{j=1}^n|j\rangle|0\rangle under e^{-iHt} to reach the state \frac{1}{\sqrt{n}}\sum_{j=1}^n\left[\cos\left(\left|\left|\mathbf{x}^{(j)}\right|\right|_2t\right) |j\rangle|0\rangle-i\sin\left(\left|\left|\mathbf{x}^{(j)}\right|\right|_2t\right)|j\rangle|1\rangle\right]. The probability of measuring the ancilla in the state |1\rangle is \sum_{i=1}^n \left|\left|\mathbf{x}^{(i)}\right|\right|_2^2 t^2/n when \left(\max_i \left|\left|\mathbf{x}^{(i)}\right|\right|_2\right) t \ll 1, from which one can estimate the kernel trace \text{Tr}K=\sum_{i=1}^n \left|\left|\mathbf{x}^{(i)}\right|\right|_2^2.
  4. Apply the HHL algorithm with \begin{equation} e^{-i\hat{F}\,\delta t }=e^{-i \gamma^{-1}\,\mathbf{I}\,\delta t/\text{Tr}F }\, e^{-i  \left[\begin{array}{cc} 0 & \mathbf{1}^T\\ \mathbf{1} & \mathbf{0}\end{array}\right]\delta t/\text{Tr}F}\,e^{-i K\,\delta t/\text{Tr}F } + \mathcal{O}(\delta t^2)\,.\end{equation} In particular, e^{-i K\,\delta t/\text{Tr}F} is obtained from the operator e^{i\hat{K}\,\delta t} constructed in the second step by rescaling the time by \frac{\text{Tr}K}{\text{Tr}F}, with \text{Tr}K computed in the third step.
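The linear system that HHL is asked to solve can be assembled and checked classically. Below is a hypothetical numpy sketch with a linear kernel, toy data, and an assumed regularization \gamma; it builds \mathbf{F}, its qSVM normalization \hat{F}, and the LS-SVM solution (b, \boldsymbol{\alpha}):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, gamma = 6, 3, 2.0                 # gamma is an assumed regularization
X = rng.normal(size=(n, p))             # toy training data
y = np.sign(rng.normal(size=n))         # labels +/- 1

K = X @ X.T                             # linear kernel K_ij = x^(i).x^(j)

# (n+1) x (n+1) LS-SVM matrix F = [[0, 1^T], [1, K + I/gamma]]
F = np.zeros((n + 1, n + 1))
F[0, 1:] = 1.0
F[1:, 0] = 1.0
F[1:, 1:] = K + np.eye(n) / gamma

F_hat = F / np.trace(F)                 # normalization used by qSVM

# HHL returns the state |b, alpha>; classically we just solve the system
rhs = np.concatenate([[0.0], y])
sol = np.linalg.solve(F, rhs)
b, alpha = sol[0], sol[1:]
assert np.allclose(F @ sol, rhs)

# resulting LS-SVM decision function on a new point x
f = lambda x: np.sign(alpha @ (X @ x) + b)
```

Rescaling F by its trace changes only the norm of the solution vector, not the direction that the quantum state |b, \alpha\rangle encodes.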
