Posts

Showing posts from December, 2020

Karush-Kuhn-Tucker (KKT) conditions

When I first took Andrew Ng's CS229 machine learning course, I did not really understand the KKT conditions in his lecture note 3, in which the KKT conditions are presented without proof and mixed with the content on Lagrangian duality. Recently I came across a blog which gives a clearer and more intuitive illustration of the KKT conditions. Let me try to rephrase the argument in this blog.

Lagrangian Multiplier

The KKT conditions concern optimization under inequality constraints. So before jumping into the KKT conditions, we should first recall optimization under equality constraints. Let's consider the optimization problem \begin{eqnarray} \begin{split} \min_{x}&&\quad f(x) \\ \text{s.t.}&& \quad g(x)=0 \end{split}\tag{1}\end{eqnarray} The key insight is that if $x^*$ is the optimal solution of problem (1), then \begin{eqnarray} \nabla f(x^*)&=&-\lambda\nabla g(x^*)\tag{2}\\ g(x^*)&=&0\tag{3}\end{eqnarray} for some $\lambda \in \mathbb{R}$. Ther
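Only the opening of the post is shown above, but conditions (2) and (3) are easy to check numerically. Below is a minimal sketch of my own (not taken from the post): it solves a toy instance of problem (1) with $f(x)=x_1^2+x_2^2$ and $g(x)=x_1+x_2-1$, then verifies that $\nabla f(x^*)=-\lambda\nabla g(x^*)$ at the solution.

```python
# A toy check of conditions (2)-(3), not from the original post:
# minimize f(x) = x1^2 + x2^2  subject to  g(x) = x1 + x2 - 1 = 0.
# The analytic optimum is x* = (0.5, 0.5) with lambda = -1.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + x[1]**2
g = lambda x: x[0] + x[1] - 1.0
grad_f = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])
grad_g = lambda x: np.array([1.0, 1.0])

res = minimize(f, x0=[2.0, -1.0], method="SLSQP",
               constraints=[{"type": "eq", "fun": g}])
x_star = res.x

# Condition (3): the equality constraint holds at the optimum.
print("g(x*) =", g(x_star))                      # ~ 0

# Condition (2): grad f(x*) = -lambda * grad g(x*) for some scalar lambda.
lam = -grad_f(x_star)[0] / grad_g(x_star)[0]
print("lambda =", lam)                           # ~ -1
print("residual:", grad_f(x_star) + lam * grad_g(x_star))  # ~ [0, 0]
```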

Support Vector Machine (SVM)

The training dataset of SVM consists of $n$ training examples of the form $\left(x^{(i)}, y^{(i)}\right)$ for $i=1, \cdots, n$, where each ${x}^{(i)}$ is a $p$-dimensional input vector and $y^{(i)}$ is the corresponding label. Furthermore, there is a fixed feature map $\phi$ in SVM that maps an input vector $x$ to a feature vector $\phi(x)$. Such a feature map is implicitly specified by a kernel in the dual formulation of SVM. The task of SVM is to find two parallel linear boundaries (hyperplanes) in the feature space by maximizing their margin, for either binary classification or regression.

SVM as binary classifier

For binary classification, $y^{(i)}\in \{1, -1\}$ labels the class to which ${x}^{(i)}$ belongs. The task of SVM is to find two parallel hyperplanes in the feature space that separate the two classes.

Fig. 1: Two parallel hyperplanes in the feature space with the maximum margin for binary classification: (a) Linearly separable data, taken from wiki. (b) Non-linearly separab
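To make the kernelized picture concrete, here is a minimal scikit-learn sketch of my own (the toy data is assumed, not part of the original post): the feature map $\phi$ is never computed explicitly; it is specified implicitly through the RBF kernel used in the dual formulation.

```python
# Minimal sketch (assumed toy data, not from the post): a kernel SVM classifier
# where phi is given implicitly by the RBF kernel k(x, x') = exp(-gamma ||x - x'||^2).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy non-linearly separable data: label +1 inside the unit circle, -1 outside.
X = rng.normal(size=(200, 2))                    # n = 200 samples, p = 2 features
y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1, -1)

clf = SVC(kernel="rbf", C=1.0, gamma=1.0)        # dual formulation with RBF kernel
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```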

Quantum Computing: Quantum Machine Learning

We introduced the HHL algorithm and its subroutine, the quantum phase estimation (QPE) algorithm, in this blog. Now we introduce two quantum versions of machine learning algorithms: quantum principal component analysis (qPCA) and quantum support vector machine (qSVM), which rely on QPE and HHL, respectively.

Data Preparation

Suppose we have $n$ data samples and each data sample has $p$ features, i.e., $\mathbf{x}^{(i)}\equiv[x_1^{(i)}, \cdots, x_p^{(i)}]^T$ for $i=1,\cdots, n$. To apply quantum algorithms, we encode each data sample $\mathbf{x}^{(i)}$ in a quantum state \begin{eqnarray}\mathbf{x}^{(i)}\equiv[x_1^{(i)}, \cdots, x_p^{(i)}]^T \quad\xrightarrow{p=2^m} \quad |\psi_i\rangle\equiv \frac{1}{\left|\left| \mathbf{x}^{(i)}\right|\right|_2}\sum_{\tau\in \{0,1\}^m}x_{\tau}^{(i)}|\tau\rangle\,.\tag{1}\end{eqnarray} To encode the entire dataset, we entangle the data state $|\psi_i\rangle$ with an auxiliary state $|i\rangle$ and form a pure state \begin{equation}|\psi\rangle\equiv\sum_{i=
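As a sanity check on the amplitude encoding in Eq. (1), here is a small sketch of my own (not from the post): a classical vector with $p=2^m$ entries becomes the amplitude vector of an $m$-qubit state $|\psi_i\rangle$ after $L_2$ normalization.

```python
# Amplitude encoding of Eq. (1) as plain linear algebra (illustrative sketch):
# the amplitudes of |psi_i> are x^(i) / ||x^(i)||_2, indexed by tau in {0,1}^m.
import numpy as np

m = 3                               # number of qubits
p = 2 ** m                          # number of features per sample
x_i = np.arange(1.0, p + 1)         # a toy data sample x^(i)

amplitudes = x_i / np.linalg.norm(x_i)          # amplitudes of |psi_i>

print(np.isclose(np.sum(amplitudes**2), 1.0))   # the state is normalized: True
for tau in range(p):
    # basis label tau as an m-bit string, with its amplitude x_tau / ||x||_2
    print(format(tau, f"0{m}b"), amplitudes[tau])
```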