Note on Denoising Diffusion Probabilistic Models
I've recently discovered a fantastic online course titled "TinyML and Efficient Deep Learning Computing", taught by Prof. Song Han at MIT. The course delves into the latest advancements in large language models and generative AI. While Lecture 16 provides a comprehensive overview of diffusion models and their recent generalizations, it skips some mathematical details regarding Denoising Diffusion Probabilistic Models (DDPM). This post serves as my notes on those skipped details. In particular, we provide a simplified and much more transparent derivation of the training loss than the one presented in the DDPM paper. We show that the $L_T$ term dropped in the DDPM paper should not appear at all if we start with the correct loss, and that no special treatment is needed for the $L_0$ term: $L_{t-1}$ is applicable for $t=1$ as well.

Forward diffusion process

The forward diffusion process gradually adds white noise to the data.
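As a reminder, here is a standard statement of this setup as given in the DDPM paper, using notation ($x_t$, $\beta_t$, $\alpha_t$, $\bar{\alpha}_t$) that has not yet been introduced in these notes: each forward step perturbs the previous sample with a small amount of Gaussian noise,

$$
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right),
\qquad
q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}).
$$

With $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$, the marginal at step $t$ has the convenient closed form

$$
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\right),
$$

which allows $x_t$ to be sampled directly from $x_0$ for a randomly chosen $t$ during training.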