Balanced L1 loss
Smooth L1 loss
For regression tasks, one usually starts with the L2 loss function {\cal{l}}_2(x)=x^2 / 2. Since its gradient is linear, i.e., {\cal{l}}'_2(x)=x \propto\sqrt{{\cal{l}}_2(x)}, the batch gradient is dominated by data examples with large L2 losses, i.e., by outliers.
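As a quick numerical illustration (toy residuals of my own choosing, not from any paper): with nine inliers at x=0.1 and a single outlier at x=10, the outlier alone contributes over 90% of the summed batch gradient.

```python
import numpy as np

# Toy batch of regression residuals: nine inliers and one outlier.
x = np.array([0.1] * 9 + [10.0])

grad_l2 = x  # gradient of l2(x) = x^2 / 2 is simply x
print(grad_l2[-1] / grad_l2.sum())  # ~0.917: the outlier dominates the batch gradient
```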
Starting from Fast R-CNN, one usually applies the so-called smooth L1 loss for object detection. The motivation is simple: truncate the gradient from linear to a constant once |x| becomes too large, leading to the truncated gradient function \begin{equation} \frac{d}{dx}\text{Smooth-}{\cal{l}}_1(x) = \begin{cases} x & |x| < 1 \\ \pm 1 & |x| \geq 1 \end{cases} \,.\tag{1}\end{equation} More generally, one can move the truncation point to a threshold \beta and rescale the inlier branch so that the gradient remains continuous at |x|=\beta: \begin{equation} \frac{d}{dx}\text{Smooth-}{\cal{l}}_1(x) = \begin{cases} x / \beta & |x| < \beta \\ \pm 1 & |x| \geq \beta \end{cases} \,.\tag{2}\end{equation}
Finally, by integrating Eq. (2) with respect to x and using the boundary condition \text{Smooth-}{\cal{l}}_1(0)=0 as well as the continuity condition \text{Smooth-}{\cal{l}}_1(\beta^+)=\text{Smooth-}{\cal{l}}_1(\beta^-), one can derive the form of the smooth L1 loss as \begin{equation} \text{Smooth-}{\cal{l}}_1(x) = \begin{cases} \frac{x^2}{2\beta} & |x| < \beta \\ |x| - \frac{\beta}{2} & |x| \geq \beta \end{cases} \,.\tag{3}\end{equation}
Note:
- The form of Eq. (3) agrees with the one in the PyTorch docs.
- The default value of \beta is 1 in the PyTorch implementation but 1/9 in the Mask R-CNN benchmark (no idea why 1/9).
- As seen in Eq. (2), a smaller \beta leads to a larger inlier gradient, i.e., it puts more weight on inliers' losses.
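To make Eq. (3) concrete, here is a minimal NumPy sketch; the function name smooth_l1 and the check below are mine, but the result should match the elementwise output of torch.nn.functional.smooth_l1_loss (with reduction='none' and its beta argument).

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    """Elementwise smooth L1 loss of Eq. (3)."""
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)

# Continuity check at |x| = beta, cf. the continuity condition above.
beta = 1 / 9
assert np.isclose(smooth_l1(beta - 1e-9, beta), smooth_l1(beta + 1e-9, beta))
```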
Balanced L1 Loss
Inspired by the above passage from the L2 loss to the smooth L1 loss, Libra R-CNN proposes the so-called balanced L1 loss by further flattening the linear inlier gradient in Eq. (2) to a logarithmic form: \begin{equation} \frac{d}{dx}\text{Balanced-}{\cal{l}}_1(x) = \begin{cases} \alpha \log\left(1+ b |x| /\beta\right) & |x| < \beta \\ \pm \gamma & |x| \geq \beta \end{cases} \,,\tag{4}\end{equation} where b is fixed by requiring the gradient to be continuous at |x|=\beta, i.e., \alpha\log(1+b)=\gamma, or equivalently b=e^{\gamma/\alpha}-1.
We integrate Eq. (4) and take into account the boundary condition \text{Balanced-}{\cal{l}}_1(0)=0 as well as the continuity condition \text{Balanced-}{\cal{l}}_1(\beta^+)=\text{Balanced-}{\cal{l}}_1(\beta^-). The final result is \begin{equation} \text{Balanced-}{\cal{l}}_1(x) = \begin{cases} \frac{\alpha}{b}\left(b|x|+\beta\right)\log(1 + b|x| /\beta) -\alpha |x| & |x| < \beta \\ \gamma |x| + (\gamma / b -\alpha) \beta & |x| \geq \beta \end{cases} \,.\tag{5}\end{equation}
Note:
- The function L_b(x) in the original Libra R-CNN paper is only a special case of Eq. (5) with \beta=1.
- As a sanity check, \text{Balanced-}{\cal{l}}_1(x) in Eq. (5) equals \beta L_b(x/\beta), the same scaling law that the smooth L1 loss in Eq. (3) obeys.
- The mmdet implementation is NOT correct when \beta \neq 1.
- The default values are \alpha=0.5 and \gamma=1.5.
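Since the mmdet implementation is flagged above as incorrect for \beta \neq 1, here is a reference sketch of Eq. (5) for general \beta; the code and the names in it are mine, with b fixed by the gradient-continuity constraint \alpha\log(1+b)=\gamma.

```python
import numpy as np

def balanced_l1(x, alpha=0.5, gamma=1.5, beta=1.0):
    """Elementwise balanced L1 loss of Eq. (5).

    b is fixed by gradient continuity at |x| = beta:
    alpha * log(1 + b) = gamma  =>  b = exp(gamma / alpha) - 1.
    """
    b = np.exp(gamma / alpha) - 1.0
    ax = np.abs(x)
    inlier = alpha / b * (b * ax + beta) * np.log1p(b * ax / beta) - alpha * ax
    outlier = gamma * ax + (gamma / b - alpha) * beta
    return np.where(ax < beta, inlier, outlier)

# Scaling-law sanity check from the notes:
# Balanced-l1(x; beta) == beta * Balanced-l1(x / beta; beta=1).
x = np.linspace(-3.0, 3.0, 101)
beta = 1 / 9
assert np.allclose(balanced_l1(x, beta=beta), beta * balanced_l1(x / beta, beta=1.0))
```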