Posts

Showing posts with the label computer vision

Note on Denoising Diffusion Probabilistic Models

I've recently discovered a fantastic online course titled "TinyML and Efficient Deep Learning Computing", taught by Prof. Song Han at MIT. This course delves into the latest advancements in large language models and generative AI. While Lecture 16 provides a comprehensive overview of diffusion models and their recent generalizations, it skips some mathematical details regarding Denoising Diffusion Probabilistic Models (DDPM). This post serves as my notes on those skipped mathematical details. In particular: We provide a simplified and much more transparent derivation of the training loss than the one presented in the DDPM paper. We show that the dropped $L_T$ term in the DDPM paper should not appear at all if we start with the correct loss. No special treatment is needed for the $L_0$ term in the DDPM paper, i.e. $L_{t-1}$ is applicable for $t=1$ as well. Forwa...

Note on convolutions

For simplicity, we consider a 1d convolution on a 1d input array as an example. The most important parameters of a convolution are the kernel size, stride, and padding, denoted by $K, S, P$, respectively. Let $I$ be the size of the 1d input array; the indices of the padded array then run from $-P$ to $I-1+P$ (inclusive). The convolution can be interpreted in a sliding-window picture: At the start (step 0), the first element in the window is aligned at index $-P$ of the input array; in each step, the window slides $S$ elements along the input array. That is, in the $i$-th step, the first element in the window is aligned at index $-P+i*S$ and the last element is thus aligned at index $-P+i*S + K-1$, leading to the constraint $-P+i*S + K-1\leq I-1+P$. As a result, we have $0 \leq i \leq (I + 2P-K) / S$ and the size of the output array is \begin{equation}\left\lfloor\frac{I + 2P - K}{S}\right\rfloor + 1\,.\end{equation} The above process can be summarized in Python code: ...
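The excerpt truncates the original snippet, so here is a minimal self-contained Python sketch of the output-size formula and the window positions derived above (the function names are my own):

```python
def conv1d_output_size(I, K, S, P):
    # Output length: floor((I + 2P - K) / S) + 1, as derived above.
    return (I + 2 * P - K) // S + 1

def window_starts(I, K, S, P):
    # Index of the first window element at step i: -P + i * S,
    # measured in the coordinates of the unpadded input array.
    n = conv1d_output_size(I, K, S, P)
    return [-P + i * S for i in range(n)]
```

For example, $I=7$, $K=3$, $S=2$, $P=1$ gives an output of size 4, with windows starting at indices $-1, 1, 3, 5$.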

Iterative Closest Point

The iterative closest point (ICP) algorithm aligns two point clouds iteratively: For each point $\mathbf{x}^{(i)}$ in the query point cloud, find its corresponding point $\mathbf{y}^{(\cal{C}_i)}$ in the reference point cloud using the current estimates of the rotation matrix $\mathbf{R}$ and translation $\mathbf{t}$: \begin{equation}\cal{C}_i = \arg\min_{j} \left|\left|\mathbf{y}^{(j)}-\mathbf{R}\mathbf{x}^{(i)}-\mathbf{t}\right|\right|_2^2\,.\tag{1}\end{equation} Then update the rotation matrix $\mathbf{R}$ and translation $\mathbf{t}$ by optimizing \begin{equation}\min_{\mathbf{R},\mathbf{t}} \sum_{i=1}^n\left|\left|\mathbf{y}^{(\cal{C}_i)}-\mathbf{R}\mathbf{x}^{(i)}-\mathbf{t}\right|\right|_2^2\,.\tag{2}\end{equation}  To solve (2), we first take the derivative with respect to $\mathbf{t}$ and obtain \begin{equation}\mathbf{t} = \frac{1}{n}\sum_{i=1}^n\mathbf{y}^{(\cal{C}_i)}-\mathbf{R}\,\frac{1}{n}\sum_{i=1}^n\mathbf{x}^{(i)}\,.\tag{3}\end{equation} As a result, we can remove $\mathbf{t}$ from (...
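After $\mathbf{t}$ is eliminated via (3), the optimal $\mathbf{R}$ is standardly obtained from an SVD of the cross-covariance of the centered point sets (the Kabsch solution). A NumPy sketch of one such update step, assuming correspondences are already established (function and variable names are mine):

```python
import numpy as np

def best_rigid_transform(X, Y):
    # Least-squares R, t minimizing sum_i ||y_i - R x_i - t||^2,
    # where rows of X and Y are corresponding points.
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (X - x_mean).T @ (Y - y_mean)
    U, _, Vt = np.linalg.svd(H)
    # Sign correction to ensure a proper rotation (det R = +1).
    D = np.eye(X.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ D @ U.T
    # Translation from Eq. (3): t = mean(Y) - R mean(X).
    t = y_mean - R @ x_mean
    return R, t
```

A full ICP loop would alternate this update with the nearest-neighbor correspondence search of (1) until convergence.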

Kalman filter

Conditional mean and variance of a Gaussian Let $X, Y$ be jointly Gaussian, i.e. \begin{eqnarray}\left[\begin{array}{c} X \\ Y\end{array}\right] \,\sim \, {\cal{N}}\left(\left[\begin{array}{c}\mu_X \\ \mu_Y\end{array}\right],\,\left[\begin{array}{cc}\Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY}\end{array}\right]\right)\,,\end{eqnarray} what are $\mathbb{E}(X|Y)$ and $\text{Var}(X|Y)$?  Trick: Find a constant matrix $A$ such that $X+AY$ is independent of $Y$. Then we have \begin{eqnarray} \mathbb{E}(X|Y) &=& \mathbb{E}(X + AY|Y) - \mathbb{E}(AY|Y) = \mathbb{E}(X + AY) - AY\nonumber\\ &=&\mu_X + A\left(\mu_Y - Y\right)\,,\nonumber\\ \text{Var}(X|Y) &=& \text{Var}(X+AY|Y) = \text{Var}(X+AY) = \text{Cov}(X+AY, X^T + Y^T A^T)\nonumber\\ &=&\Sigma_{XX} + \Sigma_{XY} A^T + A \Sigma_{YX} + A \Sigma_{YY} A^T\,. \nonumber \end{eqnarray} Finally, from $0=\text{Cov}(X+AY, Y)=\Sigma_{XY}+A\Sigma_{YY}$, we have $A=-\Sigma_{XY}\Sigma_{YY}^{-1}$. As a re...
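A small NumPy sketch of the resulting formulas (names are mine), plugging $A=-\Sigma_{XY}\Sigma_{YY}^{-1}$ into the expressions derived above:

```python
import numpy as np

def gaussian_conditional(mu_x, mu_y, S_xx, S_xy, S_yy, y):
    # A is chosen so that Cov(X + A Y, Y) = 0, i.e. A = -S_xy S_yy^{-1}.
    A = -S_xy @ np.linalg.inv(S_yy)
    # E(X|Y=y) = mu_x + A (mu_y - y).
    mean = mu_x + A @ (mu_y - y)
    # Var(X|Y) = S_xx + S_xy A^T + A S_yx + A S_yy A^T,
    # which simplifies to S_xx - S_xy S_yy^{-1} S_yx.
    var = S_xx + S_xy @ A.T + A @ S_xy.T + A @ S_yy @ A.T
    return mean, var
```

These are exactly the conditioning formulas that drive the Kalman filter's measurement update.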

Interview problem: Implement convex hull

Graham's scan $\text{arctan2}(y,x)$ can be replaced by \begin{equation}\text{sign}(y)\,\left(1 - \frac{x}{|x|+|y|}\right)\end{equation} since the two share the same monotonicity, as proposed here . Handling collinear points is trickier here than in the monotone chain algorithm. Here is my C++ implementation, which solves Leetcode 587 :

struct Point {
    int x;
    int y;
    double angle;
    int dist;
    bool operator<(const Point& other) {
        return angle < other.angle || (angle == other.angle && dist < other.dist);
    }
};

class Solution {
public:
    vector<vector<int>> outerTrees(vector<vector<int>>& points) {
        const auto& min_iter = min_elemen...
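As a quick sketch of that replacement (the function name is mine), a trig-free pseudo-angle in Python; note that in Graham's scan the points are sorted around the lowest point, so the inputs of interest have $y \geq 0$:

```python
import math

def pseudo_angle(y, x):
    # sign(y) * (1 - x / (|x| + |y|)): increases monotonically with
    # atan2(y, x) but avoids trigonometric calls entirely.
    if x == 0.0 and y == 0.0:
        return 0.0  # guard against division by zero at the origin
    sign = (y > 0) - (y < 0)
    # Caveat: for y == 0 and x < 0 this yields 0 while atan2 yields pi;
    # that degenerate collinear case needs separate handling.
    return sign * (1.0 - x / (abs(x) + abs(y)))
```

Sorting by this key gives the same order as sorting by $\text{arctan2}(y,x)$ for the points Graham's scan compares, at a fraction of the cost.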