Exponential family

An exponential family is a large class of distributions of the form \begin{equation} \mathbb{P}\left(Y|\theta\right)= \frac{1}{Z(\theta)} g(Y)\,e^{\theta \,H(Y)}\,.\end{equation} This is the form of the Boltzmann distribution in statistical physics, where $Y$ is a microstate of the ensemble, $H(Y)$ and $g(Y)$ are the energy and degeneracy, and $\theta$ relates to the temperature (in the usual physics convention, $\theta=-1/k_BT$). Finally, $Z(\theta)$ is the partition function: \begin{equation}Z(\theta)=\int g(y)\,e^{\theta \,H(y)}\,dy\,. \end{equation} As is well known in statistical physics, from the partition function we can compute the mean and variance of $H(Y)$: \begin{equation}\mu(\theta) \equiv\mathbb{E}\left[H(Y) |\theta\right]=\frac{d}{d\theta} \log Z(\theta)\,,\tag{1}\end{equation}\begin{equation}\sigma^2(\theta) \equiv \text{Var}\left[H(Y)|\theta\right] = \frac{d^2}{d\theta^2} \log Z(\theta)\,. \end{equation} 
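
Both identities follow from differentiating $Z(\theta)$ under the integral sign, assuming the family is regular enough for that exchange to be allowed. A short sketch of the derivation: \begin{equation}\frac{d}{d\theta}\log Z(\theta)=\frac{Z'(\theta)}{Z(\theta)}=\frac{1}{Z(\theta)}\int g(y)\,H(y)\,e^{\theta \,H(y)}\,dy=\mathbb{E}\left[H(Y)|\theta\right]\,,\end{equation}\begin{equation}\frac{d^2}{d\theta^2}\log Z(\theta)=\frac{Z''(\theta)}{Z(\theta)}-\left(\frac{Z'(\theta)}{Z(\theta)}\right)^2=\mathbb{E}\left[H(Y)^2|\theta\right]-\mathbb{E}\left[H(Y)|\theta\right]^2=\text{Var}\left[H(Y)|\theta\right]\,.\end{equation}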

Given independent observations $Y_1, \cdots, Y_n$, the (average) log-likelihood of the exponential family is \begin{equation} l_n(\theta)\equiv \frac{1}{n}\sum_{i=1}^n\log \mathbb{P}\left(Y_i|\theta\right)=\frac{\theta}{n} \sum_{i=1}^n\,H(Y_i)-\log Z(\theta)\,.\tag{2}\end{equation} In the last equality we drop the term $\frac{1}{n}\sum_{i=1}^n \log g(Y_i)$, which does not depend on $\theta$. Since \begin{equation} l_n''(\theta) = -\frac{d^2}{d\theta^2} \log Z(\theta)=-\sigma^2(\theta) \leq 0\,,\end{equation} $l_n(\theta)$ is concave and thus easy to maximize. 
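
Because $l_n'(\theta)=\frac{1}{n}\sum_i H(Y_i)-\mu(\theta)$ and $l_n''(\theta)=-\sigma^2(\theta)$, one way to maximize $l_n$ is Newton's method. The following is only a minimal sketch under stated assumptions: the function name `fit_theta`, the starting point, the tolerance, and the toy data are all made up for illustration, and the Bernoulli case ($Z(\theta)=1+e^{\theta}$) is used as the test family.

```python
import math

def fit_theta(h_values, mean_fn, var_fn, theta0=0.0, tol=1e-10, max_iter=100):
    """Maximize l_n(theta) = theta * mean(H) - log Z(theta) with Newton steps.

    mean_fn(theta) and var_fn(theta) are the first and second derivatives
    of log Z(theta), i.e. the mean and variance of H(Y) under theta.
    """
    h_bar = sum(h_values) / len(h_values)
    theta = theta0
    for _ in range(max_iter):
        grad = h_bar - mean_fn(theta)   # l_n'(theta)
        step = grad / var_fn(theta)     # -l_n'(theta) / l_n''(theta)
        theta += step
        if abs(step) < tol:
            break
    return theta

# Bernoulli family: H(Y) = Y and log Z(theta) = log(1 + e^theta).
def bernoulli_mean(theta):
    return 1.0 / (1.0 + math.exp(-theta))

def bernoulli_var(theta):
    m = bernoulli_mean(theta)
    return m * (1.0 - m)

ys = [1, 0, 1, 1, 0, 1, 1, 0]  # hypothetical observations
theta_hat = fit_theta(ys, bernoulli_mean, bernoulli_var)
p_hat = bernoulli_mean(theta_hat)  # should match the sample mean of ys
```

At the maximum the gradient vanishes, so $\mu(\hat\theta)$ equals the sample mean of $H(Y_i)$. Plain Newton steps can overshoot when the sample mean sits near the boundary of the allowed range, so a practical implementation would likely add damping or a line search.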

Examples of exponential families: 
  • Bernoulli: \begin{equation} \mathbb{P}(Y|p)=p^Y(1-p)^{1-Y}=(1-p)\exp\left[Y\log \frac{p}{1-p}\right]\,.\end{equation} We have the correspondences: $ H(Y)= Y$, $g(Y)=1$, $\theta = \log \frac{p}{1-p}$ and $Z(\theta)=\frac{1}{1-p}=1+e^{\theta}$. Furthermore, from Eq. (1) we can compute \begin{equation}\mu(\theta) = \frac{1}{1+e^{-\theta}}\,, \end{equation} which is the sigmoid function used in logistic regression. The log-likelihood in terms of $\mu(\theta)$ is \begin{equation} l_n(\theta) = \frac{1}{n}\sum_{i=1}^n \left[Y_i\log \mu(\theta) + (1-Y_i) \log(1-\mu(\theta))\right]\,.\end{equation}
  • Gaussian with known variance: \begin{equation} \mathbb{P}(Y|\mu)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{(Y-\mu)^2}{2\sigma^2}\right]\,.\end{equation} We have the correspondences: $ H(Y)= {Y}/{\sigma}$, $g(Y)=e^{-Y^2/2\sigma^2}/\sqrt{2\pi\sigma^2}$, $\theta = \mu /\sigma$ and $Z(\theta)=e^{\theta^2/2}$, and thus $\mu(\theta)=\theta$ (the mean of $H(Y)=Y/\sigma$, not to be confused with the Gaussian mean $\mu$) and, from Eq. (2), $l_n(\theta)=\frac{\theta}{n\sigma}\sum_{i=1}^n Y_i -\frac{\theta^2}{2}$.
  • More examples can be found in this Wiki.
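
The two partition functions above can be checked numerically against Eq. (1) and the variance formula. The snippet below is a rough sanity check rather than a proof: it differentiates $\log Z(\theta)$ by central finite differences, and the helper names `d1` and `d2`, the step size, and the test point $\theta=0.7$ are arbitrary choices made for illustration.

```python
import math

def d1(f, x, h=1e-4):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def log_z_bernoulli(theta):
    # Z(theta) = 1 + e^theta
    return math.log(1.0 + math.exp(theta))

def log_z_gaussian(theta):
    # Z(theta) = e^{theta^2 / 2}
    return 0.5 * theta ** 2

theta = 0.7  # arbitrary test point
sigmoid = 1.0 / (1.0 + math.exp(-theta))

bern_mean = d1(log_z_bernoulli, theta)   # expected: sigmoid(theta)
bern_var = d2(log_z_bernoulli, theta)    # expected: sigmoid * (1 - sigmoid)
gauss_mean = d1(log_z_gaussian, theta)   # expected: theta
gauss_var = d2(log_z_gaussian, theta)    # expected: 1, since Var[Y / sigma] = 1
```

The Bernoulli variance $\mu(\theta)\,(1-\mu(\theta))$ is the familiar $p(1-p)$, and the Gaussian variance is $1$ because $H(Y)=Y/\sigma$ has unit variance.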
