Posts

Showing posts with the label computer vision

Note on Denoising Diffusion Probabilistic Models

I've recently discovered a fantastic online course titled "TinyML and Efficient Deep Learning Computing", taught by Prof. Song Han at MIT. This course delves into the latest advancements in large language models and generative AI. While Lecture 16 provides a comprehensive overview of diffusion models and their recent generalizations, it skips some mathematical details regarding Denoising Diffusion Probabilistic Models (DDPM). This post serves as my notes on those skipped mathematical details. In particular: We provide a simplified and much more transparent derivation of the training loss than the one presented in the DDPM paper. We show that the dropped $L_T$ term in the DDPM paper should not appear at all if we start with the correct loss. No special treatment is needed for the $L_0$ term in the DDPM paper, i.e. $L_{t-1}$ is applicable for $t=1$ as well. Forwa...

Note on convolutions

For simplicity, we consider a 1d convolution on a 1d input array as an example. The most important parameters of a convolution are the kernel size, stride, and padding, denoted by $K, S, P$, respectively. Let $I$ be the size of the 1d input array; the indices of the padded array then run from $-P$ to $I-1+P$ (inclusive). The convolution can be interpreted in a sliding-window picture: At the start (step 0), the first element in the window is aligned at index $-P$ of the input array; in each step, the window slides $S$ elements along the input array. That is, in the $i$-th step, the first element in the window is aligned at index $-P+i*S$ and the last element is thus aligned at index $-P+i*S + K-1$, leading to the constraint $-P+i*S + K-1\leq I-1+P$. As a result, we have $0 \leq i \leq (I + 2P-K) / S$ and the size of the output array is \begin{equation}\left\lfloor\frac{I + 2P - K}{S}\right\rfloor + 1\,.\end{equation} The above process can be summarized in Python code: ...
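The excerpt truncates the original snippet, so here is a minimal self-contained Python sketch of the output-size formula and the window positions derived above (the function names are my own):

```python
def conv1d_output_size(I, K, S, P):
    # Output length: floor((I + 2P - K) / S) + 1, as derived above.
    return (I + 2 * P - K) // S + 1

def window_starts(I, K, S, P):
    # Index of the first window element at step i: -P + i * S,
    # measured in the coordinates of the unpadded input array.
    n = conv1d_output_size(I, K, S, P)
    return [-P + i * S for i in range(n)]
```

For example, $I=7$, $K=3$, $S=2$, $P=1$ gives an output of size 4, with windows starting at indices $-1, 1, 3, 5$.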

Iterative Closest Point

The iterative closest point (ICP) algorithm aligns two point clouds iteratively: For each point $\mathbf{x}^{(i)}$ in the query point cloud, find its corresponding point $\mathbf{y}^{(\cal{C}_i)}$ in the reference point cloud using the current estimates of the rotation matrix $\mathbf{R}$ and translation $\mathbf{t}$: \begin{equation}\cal{C}_i = \arg\min_{j} \left|\left|\mathbf{y}^{(j)}-\mathbf{R}\mathbf{x}^{(i)}-\mathbf{t}\right|\right|_2^2\,.\tag{1}\end{equation} Then update the rotation matrix $\mathbf{R}$ and translation $\mathbf{t}$ by optimizing \begin{equation}\min_{\mathbf{R},\mathbf{t}} \sum_{i=1}^n\left|\left|\mathbf{y}^{(\cal{C}_i)}-\mathbf{R}\mathbf{x}^{(i)}-\mathbf{t}\right|\right|_2^2\,.\tag{2}\end{equation}  To solve (2), we first take the derivative with respect to $\mathbf{t}$ and obtain \begin{equation}\mathbf{t} = \frac{1}{n}\sum_{i=1}^n\mathbf{y}^{(\cal{C}_i)}-\mathbf{R}\,\frac{1}{n}\sum_{i=1}^n\mathbf{x}^{(i)}\,.\tag{3}\end{equation} As a result, we can remove $\mathbf{t}$ from (...
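After $\mathbf{t}$ is eliminated via (3), the optimal $\mathbf{R}$ is standardly obtained from an SVD of the cross-covariance of the centered point sets (the Kabsch solution). A NumPy sketch of one such update step, assuming correspondences are already established (function and variable names are mine):

```python
import numpy as np

def best_rigid_transform(X, Y):
    # Least-squares R, t minimizing sum_i ||y_i - R x_i - t||^2,
    # where rows of X and Y are corresponding points.
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (X - x_mean).T @ (Y - y_mean)
    U, _, Vt = np.linalg.svd(H)
    # Sign correction to ensure a proper rotation (det R = +1).
    D = np.eye(X.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ D @ U.T
    # Translation from Eq. (3): t = mean(Y) - R mean(X).
    t = y_mean - R @ x_mean
    return R, t
```

A full ICP loop would alternate this update with the nearest-neighbor correspondence search of (1) until convergence.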

Kalman filter

Conditional mean and variance of a Gaussian Let $X, Y$ be jointly Gaussian, i.e. \begin{eqnarray}\left[\begin{array}{c} X \\ Y\end{array}\right] \,\sim \, {\cal{N}}\left(\left[\begin{array}{c}\mu_X \\ \mu_Y\end{array}\right],\,\left[\begin{array}{cc}\Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY}\end{array}\right]\right)\,,\end{eqnarray} what are $\mathbb{E}(X|Y)$ and $\text{Var}(X|Y)$?  Trick: Find a constant matrix $A$ such that $X+AY$ is independent of $Y$. Then we have \begin{eqnarray} \mathbb{E}(X|Y) &=& \mathbb{E}(X + AY|Y) - \mathbb{E}(AY|Y) = \mathbb{E}(X + AY) - AY\nonumber\\ &=&\mu_X + A\left(\mu_Y - Y\right)\,,\nonumber\\ \text{Var}(X|Y) &=& \text{Var}(X+AY|Y) = \text{Var}(X+AY) = \text{Cov}(X+AY, X^T + Y^T A^T)\nonumber\\ &=&\Sigma_{XX} + \Sigma_{XY} A^T + A \Sigma_{YX} + A \Sigma_{YY} A^T\,. \nonumber \end{eqnarray} Finally, from $0=\text{Cov}(X+AY, Y)=\Sigma_{XY}+A\Sigma_{YY}$, we have $A=-\Sigma_{XY}\Sigma_{YY}^{-1}$. As a re...
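A small NumPy sketch of the resulting formulas (names are mine), plugging $A=-\Sigma_{XY}\Sigma_{YY}^{-1}$ into the expressions derived above:

```python
import numpy as np

def gaussian_conditional(mu_x, mu_y, S_xx, S_xy, S_yy, y):
    # A is chosen so that Cov(X + A Y, Y) = 0, i.e. A = -S_xy S_yy^{-1}.
    A = -S_xy @ np.linalg.inv(S_yy)
    # E(X|Y=y) = mu_x + A (mu_y - y).
    mean = mu_x + A @ (mu_y - y)
    # Var(X|Y) = S_xx + S_xy A^T + A S_yx + A S_yy A^T,
    # which simplifies to S_xx - S_xy S_yy^{-1} S_yx.
    var = S_xx + S_xy @ A.T + A @ S_xy.T + A @ S_yy @ A.T
    return mean, var
```

These are exactly the conditioning formulas that drive the Kalman filter's measurement update.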

Interview problem: Implement convex hull

Graham's scan $\text{arctan2}(y,x)$ can be replaced by \begin{equation}\text{sign}(y)\,\left(1 - \frac{x}{|x|+|y|}\right)\end{equation} since the two share the same monotonicity, as proposed here . Handling collinear points is trickier here than in the monotone chain algorithm. Here is my C++ implementation, which solves Leetcode 587 :

struct Point {
    int x;
    int y;
    double angle;
    int dist;
    bool operator<(const Point& other) {
        return angle < other.angle || (angle == other.angle && dist < other.dist);
    }
};

class Solution {
public:
    vector<vector<int>> outerTrees(vector<vector<int>>& points) {
        const auto& min_iter = min_elemen...
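As a quick sketch of that replacement (the function name is mine), a trig-free pseudo-angle in Python; note that in Graham's scan the points are sorted around the lowest point, so the inputs of interest have $y \geq 0$:

```python
import math

def pseudo_angle(y, x):
    # sign(y) * (1 - x / (|x| + |y|)): increases monotonically with
    # atan2(y, x) but avoids trigonometric calls entirely.
    if x == 0.0 and y == 0.0:
        return 0.0  # guard against division by zero at the origin
    sign = (y > 0) - (y < 0)
    # Caveat: for y == 0 and x < 0 this yields 0 while atan2 yields pi;
    # that degenerate collinear case needs separate handling.
    return sign * (1.0 - x / (abs(x) + abs(y)))
```

Sorting by this key gives the same order as sorting by $\text{arctan2}(y,x)$ for the points Graham's scan compares, at a fraction of the cost.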