Balanced L1 loss
Smooth L1 loss
For regression tasks, we usually start with the L2 loss function ${\cal{l}}_2(x)=x^2/2$. Since its gradient is linear, i.e., ${\cal{l}}'_2(x)=x \propto \sqrt{{\cal{l}}_2(x)}$, the batch gradient is dominated by data examples with large L2 losses (outliers).
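As a quick numerical illustration of this point (a minimal sketch with a toy batch of residuals $x = \text{prediction} - \text{target}$, not taken from any paper), the per-example gradient of the L2 loss equals the residual itself, so a single outlier dominates the batch gradient:

```python
import torch

# Toy batch of residuals x = prediction - target; the last entry is an outlier.
x = torch.tensor([0.1, -0.2, 0.15, 5.0], requires_grad=True)

loss = (x ** 2 / 2).sum()  # batch L2 loss
loss.backward()

# The gradient is linear in x, so the outlier contributes most of the batch gradient:
print(x.grad)  # tensor([ 0.1000, -0.2000,  0.1500,  5.0000])
```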
Starting from Fast R-CNN, one usually applies the so-called smooth L1 loss for object detection. The motivation is very simple: truncate the gradient from linear to some constant $\gamma$ when $|x|$ is too large, leading to the truncated gradient function \begin{equation} \frac{d}{dx}\text{Smooth-}{\cal{l}}_1(x) = \begin{cases} x & |x| < 1 \\ \pm \gamma & |x| \geq 1 \end{cases} \,,\tag{1}\end{equation} in which $\gamma=1$ is fixed by gradient continuity at $|x|=1$. In addition, one can introduce a scaling factor $\beta$ by replacing $x$ in Eq. (1) with $x/\beta$, which yields the more general gradient function \begin{equation} \frac{d}{dx}\text{Smooth-}{\cal{l}}_1(x) = \begin{cases} x/\beta & |x| < \beta \\ \pm 1 & |x| \geq \beta \end{cases} \,.\tag{2}\end{equation}
Finally, by integrating Eq. (2) with respect to $x$ and using the boundary condition $\text{Smooth-}{\cal{l}}_1(0)=0$ as well as the continuity condition $\text{Smooth-}{\cal{l}}_1(\beta^+)=\text{Smooth-}{\cal{l}}_1(\beta^-)$, one can derive the form of the smooth L1 loss as \begin{equation} \text{Smooth-}{\cal{l}}_1(x) = \begin{cases} x^2/(2\beta) & |x| < \beta \\ |x| - \beta/2 & |x| \geq \beta \end{cases} \,.\tag{3}\end{equation}
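Below is a minimal PyTorch sketch of Eq. (3) (the function name `smooth_l1` is mine, not a library API), together with an autograd check that its gradient reproduces the truncated form of Eq. (2):

```python
import torch

def smooth_l1(x: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Elementwise smooth L1 loss of Eq. (3)."""
    abs_x = x.abs()
    return torch.where(abs_x < beta, x ** 2 / (2 * beta), abs_x - beta / 2)

# Autograd should reproduce Eq. (2): x/beta inside, sign(x) outside.
beta = 0.5
x = torch.tensor([-3.0, -0.25, 0.1, 2.0], requires_grad=True)
smooth_l1(x, beta).sum().backward()
expected = torch.where(x.abs() < beta, x / beta, torch.sign(x))
assert torch.allclose(x.grad, expected.detach())
```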
Note:
- The form of Eq. (3) agrees with the one in the PyTorch docs (see the numerical check after this list).
- The default value of $\beta$ is $1$ in the PyTorch implementation but $1/9$ in maskrcnn-benchmark (no idea why $1/9$).
- As seen in Eq. (2), a smaller $\beta$ leads to a larger gradient for inliers ($|x| < \beta$), i.e., it effectively puts more weight on inliers' losses.
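A quick check of the first note, using both default $\beta$ values mentioned above (this assumes a PyTorch version, 1.7 or later, in which `torch.nn.functional.smooth_l1_loss` exposes the `beta` argument):

```python
import torch
import torch.nn.functional as F

x, target = torch.randn(1000), torch.zeros(1000)
for beta in (1.0, 1 / 9):  # PyTorch default vs. maskrcnn-benchmark default
    d = (x - target).abs()
    eq3 = torch.where(d < beta, d ** 2 / (2 * beta), d - beta / 2).mean()  # Eq. (3), mean-reduced
    assert torch.allclose(eq3, F.smooth_l1_loss(x, target, beta=beta))
```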
Balanced L1 loss
Inspired by the above motivation from the L2 loss to the smooth L1 loss, Libra R-CNN proposes the so-called balanced L1 loss by further weakening the linear part of the gradient in Eq. (2) into a logarithmic form: \begin{equation} \frac{d}{dx}\text{Balanced-}{\cal{l}}_1(x) = \begin{cases} \pm\alpha \log\left(1+ b|x|/\beta\right) & |x| < \beta \\ \pm \gamma & |x| \geq \beta \end{cases} \,.\tag{4}\end{equation} Gradient continuity at $|x|=\beta$ imposes the constraint $\gamma = \alpha \log(1+b)$.
We integrate Eq. (4) and take into account the boundary condition $\text{Balanced-}{\cal{l}}_1(0)=0$ as well as the continuity condition $\text{Balanced-}{\cal{l}}_1(\beta^+)=\text{Balanced-}{\cal{l}}_1(\beta^-)$. The final result is \begin{equation} \text{Balanced-}{\cal{l}}_1(x) = \begin{cases} \frac{\alpha}{b}\left(b|x|+\beta\right)\log(1 + b|x|/\beta) - \alpha|x| & |x| < \beta \\ \gamma|x| + (\gamma/b - \alpha)\beta & |x| \geq \beta \end{cases} \,.\tag{5}\end{equation}
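Here is a minimal PyTorch sketch of Eq. (5) (the function name and signature are mine), with $b$ fixed by the continuity constraint $\gamma = \alpha\log(1+b)$, plus an autograd check against Eq. (4):

```python
import math
import torch

def balanced_l1(x: torch.Tensor, alpha: float = 0.5, gamma: float = 1.5,
                beta: float = 1.0) -> torch.Tensor:
    """Elementwise balanced L1 loss of Eq. (5)."""
    b = math.expm1(gamma / alpha)  # gradient continuity at |x| = beta: gamma = alpha * log(1 + b)
    abs_x = x.abs()
    inner = alpha / b * (b * abs_x + beta) * torch.log1p(b * abs_x / beta) - alpha * abs_x
    outer = gamma * abs_x + (gamma / b - alpha) * beta
    return torch.where(abs_x < beta, inner, outer)

# Autograd should reproduce Eq. (4): +/- alpha*log(1 + b|x|/beta) inside, +/- gamma outside.
alpha, gamma, beta = 0.5, 1.5, 0.5
b = math.expm1(gamma / alpha)
x = torch.tensor([-2.0, -0.3, 0.05, 0.4, 2.0], requires_grad=True)
balanced_l1(x, alpha, gamma, beta).sum().backward()
expected = torch.sign(x) * torch.where(x.abs() < beta,
                                       alpha * torch.log1p(b * x.abs() / beta),
                                       torch.full_like(x, gamma))
assert torch.allclose(x.grad, expected.detach())
```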
Note:
- The function $L_b(x)$ in the original Libra R-CNN paper is only a special case of Eq. (5) with $\beta=1$.
- As a sanity check, $\text{Balanced-}{\cal{l}}_1(x)$ in Eq. (5) is equal to $\beta L_b(x/\beta)$, agreeing with the scaling law (verified numerically in the sketch after this list).
- The mmdet implementation is NOT correct when $\beta \neq 1$.
- The default values are $\alpha=0.5$ and $\gamma=1.5$, with $b$ then fixed by $\gamma = \alpha \log(1+b)$.
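As a numerical check of the scaling-law note above, Eq. (5) should satisfy $\text{Balanced-}{\cal{l}}_1(x) = \beta L_b(x/\beta)$, where $L_b$ is the $\beta=1$ case (this reuses the `balanced_l1` sketch given after Eq. (5)):

```python
import torch

# balanced_l1 is the sketch given after Eq. (5), with defaults alpha=0.5, gamma=1.5.
x = torch.linspace(-3.0, 3.0, steps=121)
for beta in (0.5, 1.0, 2.0):
    lhs = balanced_l1(x, beta=beta)               # Eq. (5) with a general beta
    rhs = beta * balanced_l1(x / beta, beta=1.0)  # beta * L_b(x / beta)
    assert torch.allclose(lhs, rhs)
```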