
Balanced L1 loss

Smooth L1 loss

For regression tasks, we usually start with the L2 loss function ${\cal{l}}_2(x)=x^2 / 2$. Since its gradient is linear, i.e., ${\cal{l}}'_2(x)=x \propto\sqrt{{\cal{l}}_2(x)}$, the batch gradient is dominated by data examples with large L2 losses (outliers). Starting from Fast R-CNN, one usually applies the so-called smooth L1 loss for object detection. The motivation is very simple: truncate the gradient from linear to some constant $\gamma$ when $|x|$ is too large, which leads to the truncated gradient function
\begin{equation} \frac{d}{dx}\text{Smooth-}{\cal{l}}_1(x) = \begin{cases} x & |x| < 1 \\ \pm 1 & |x| \geq 1 \end{cases} \,,\tag{1}\end{equation}
in which $\gamma=1$ is chosen for gradient continuity at $|x|=1$. In addition, one can introduce a scaling factor $\beta$ by replacing $x$ in Eq. (1) with $x/\beta$, which gives the more general gradient function
\begin{equation} \frac{d}{dx}\text{Smooth-}{\cal{l}}_1(x) = \begin{cases} x/\beta & |x| < \beta \\ \pm 1 & |x| \geq \beta \end{cases} \,.\tag{2}\end{equation}
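Integrating Eq. (2) recovers the familiar piecewise form of the loss: $x^2/(2\beta)$ for $|x| < \beta$ and $|x| - \beta/2$ otherwise, with the constant $\beta/2$ chosen so the two branches meet at $|x|=\beta$. Below is a minimal NumPy sketch of this loss and its gradient; the names `smooth_l1_loss` and `smooth_l1_grad` are my own for illustration, not taken from any particular detection codebase.

```python
import numpy as np

def smooth_l1_loss(x, beta=1.0):
    """Smooth L1 loss with scaling factor beta (illustrative sketch).

    Quadratic for |x| < beta, linear otherwise, so its gradient
    matches Eq. (2): x / beta inside the threshold, +/-1 outside.
    """
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    return np.where(abs_x < beta,
                    0.5 * x ** 2 / beta,
                    abs_x - 0.5 * beta)

def smooth_l1_grad(x, beta=1.0):
    """Gradient of the smooth L1 loss, i.e., Eq. (2)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < beta, x / beta, np.sign(x))

# Example with beta = 1: quadratic below |x| = 1, linear above.
print(smooth_l1_loss([0.5, 2.0]))   # -> [0.125, 1.5]
print(smooth_l1_grad([0.5, 2.0]))   # -> [0.5, 1.0]
```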