Balanced L1 loss

Smooth L1 loss  

For regression tasks, we usually start with the L2 loss function ${\cal{l}}_2(x)=x^2 / 2$. Since its gradient is linear, i.e., ${\cal{l}}'_2(x)=x \propto\sqrt{{\cal{l}}_2(x)}$, the batch gradient is dominated by the data examples with large L2 losses, i.e., the outliers.
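As a quick numerical illustration (the residuals below are made up for this example), a single outlier dominates the summed L2 gradient:

```python
import numpy as np

# Hypothetical regression residuals: two inliers and one outlier.
x = np.array([0.1, 0.2, 5.0])

l2_grad = x                                     # d/dx (x^2 / 2) = x
share = np.abs(l2_grad) / np.abs(l2_grad).sum()
print(share)                                    # ~[0.019, 0.038, 0.943]: the outlier dominates
```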

Starting from Fast R-CNN, one usually applies the so-called smooth L1 loss for object detection. The motivation is very simple: truncate the gradient from linear to some constant $\gamma$ when $|x|$ is too large, leading to the truncated gradient function \begin{equation} \frac{d}{dx}\text{Smooth-}{\cal{l}}_1(x) =  \begin{cases} x & |x| < 1  \\ \pm 1 & |x| \geq 1 \end{cases} \,,\tag{1}\end{equation} in which $\gamma=1$ ensures gradient continuity at $|x|=1$. In addition, one can introduce a scaling factor $\beta$ by replacing $x$ in Eq. (1) with $x/\beta$, which yields the more general gradient function \begin{equation} \frac{d}{dx}\text{Smooth-}{\cal{l}}_1(x) =  \begin{cases} x/\beta & |x| < \beta  \\ \pm 1 & |x| \geq \beta \end{cases} \,.\tag{2}\end{equation}

Finally, by integrating Eq. (2) with respect to $x$ and using the boundary condition $\text{Smooth-}{\cal{l}}_1(0)=0$ as well as the continuity condition $\text{Smooth-}{\cal{l}}_1(\beta^+)=\text{Smooth-}{\cal{l}}_1(\beta^-)$, one can derive the form of the smooth L1 loss as \begin{equation} \text{Smooth-}{\cal{l}}_1(x) =  \begin{cases} x^2/(2\beta) & |x| < \beta  \\ |x| -\beta / 2 & |x| \geq \beta \end{cases} \,.\tag{3}\end{equation}
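For reference, here is a minimal NumPy sketch of Eq. (3) (the function name and the sample residuals are mine, not from any library); for a matching $\beta$ it should agree elementwise with the piecewise form given in the PyTorch documentation.

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    """Elementwise smooth L1 loss of Eq. (3); x is the regression residual."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < beta,
                    0.5 * x ** 2 / beta,        # quadratic branch, |x| < beta
                    np.abs(x) - 0.5 * beta)     # linear branch, |x| >= beta

print(smooth_l1([0.05, 0.5, 2.0], beta=1.0))    # [0.00125, 0.125, 1.5]
# With beta = 1/9 the quadratic region shrinks to |x| < 1/9, so most residuals
# fall on the linear branch |x| - beta/2.
print(smooth_l1([0.05, 0.5, 2.0], beta=1/9))
```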

Note:

  • The form of Eq. (3) agrees with that in the PyTorch documentation.
  • The default value of $\beta$ is $1$ in the PyTorch implementation but $1/9$ in maskrcnn-benchmark (no idea why $1/9$).
  • As seen in Eq. (2), a smaller $\beta$ leads to a larger gradient in the quadratic region, i.e., it puts more weight on inliers' losses.

Balanced L1 loss

Inspired by the same motivation that leads from the L2 loss to the smooth L1 loss, Libra R-CNN proposes the so-called balanced L1 loss by further softening the linear gradient in Eq. (2) into a logarithmic form: \begin{equation} \frac{d}{dx}\text{Balanced-}{\cal{l}}_1(x) =  \begin{cases} \alpha \log\left(1+ b |x| /\beta\right) & |x| < \beta  \\ \pm \gamma & |x| \geq \beta \end{cases} \,.\tag{4}\end{equation} Gradient continuity at $|x|=\beta$ imposes the constraint $\gamma = \alpha \log(1+b)$.

We integrate Eq. (4) and take into account the boundary condition $\text{Balanced-}{\cal{l}}_1(0)=0$ as well as the continuity condition $\text{Balanced-}{\cal{l}}_1(\beta^+)=\text{Balanced-}{\cal{l}}_1(\beta^-)$. The final result is \begin{equation} \text{Balanced-}{\cal{l}}_1(x) =  \begin{cases} \frac{\alpha}{b}\left(b|x|+\beta\right)\log(1 + b|x| /\beta) -\alpha |x| & |x| < \beta  \\ \gamma  |x| + (\gamma / b -\alpha) \beta & |x| \geq \beta \end{cases} \,.\tag{5}\end{equation}
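The following NumPy sketch is only a sanity check of Eq. (5) (the function name and test values are mine; it is not meant as a drop-in replacement for any library loss). Note that $b$ is not a free parameter: the continuity constraint $\gamma = \alpha \log(1+b)$ gives $b = e^{\gamma/\alpha} - 1 \approx 19.09$ for the default $\alpha=0.5$ and $\gamma=1.5$.

```python
import numpy as np

def balanced_l1(x, alpha=0.5, gamma=1.5, beta=1.0):
    """Elementwise balanced L1 loss of Eq. (5); x is the regression residual."""
    x = np.abs(np.asarray(x, dtype=float))
    # Gradient continuity at |x| = beta fixes b: gamma = alpha * log(1 + b).
    b = np.expm1(gamma / alpha)                  # e^(gamma/alpha) - 1, ~19.09 for the defaults
    inner = alpha / b * (b * x + beta) * np.log1p(b * x / beta) - alpha * x
    outer = gamma * x + (gamma / b - alpha) * beta
    return np.where(x < beta, inner, outer)

# Sanity checks: zero loss at x = 0 and continuity across |x| = beta.
print(balanced_l1(0.0))                          # 0.0
print(balanced_l1([0.999999, 1.000001]))         # nearly identical values on both sides
# Scaling law (see the notes below): Balanced-l1(x; beta) == beta * Balanced-l1(x / beta; 1)
print(balanced_l1(0.5, beta=0.25), 0.25 * balanced_l1(0.5 / 0.25))
```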

Note:

  • The function $L_b(x)$ in the original Libra R-CNN paper is only a special case of Eq. (5) with $\beta=1$.
  • As a sanity check, $\text{Balanced-}{\cal{l}}_1(x)$ in Eq. (5) is equal to $\beta L_b(x/\beta)$, agreeing with the scaling law.
  • The mmdet implementation is NOT correct when $\beta \neq 1$.
  • The default values are $\alpha=0.5$ and $\gamma=1.5$.

As demonstrated in CenterNet3D, replacing the smooth L1 loss with the balanced L1 loss indeed improves the training of 3D object detectors on LiDAR point clouds.
