Balanced L1 loss

Smooth L1 loss

For regression tasks, we usually start with the L2 loss function {\cal{l}}_2(x)=x^2/2. Since its gradient is linear, i.e., {\cal{l}}'_2(x)=x\propto\sqrt{{\cal{l}}_2(x)}, the batch gradient is dominated by data examples with large L2 losses (outliers). Starting from Fast R-CNN, one usually applies the so-called smooth L1 loss for object detection. The motivation is very simple: truncate the gradient from linear growth to some constant \gamma when |x| is too large, which leads to the truncated gradient function
\begin{equation} \frac{d}{dx}\text{Smooth-}{\cal{l}}_1(x) = \begin{cases} x & |x| < 1 \\ \pm 1 & |x| \geq 1 \end{cases}\,,\tag{1}\end{equation}
in which \gamma=1 ensures gradient continuity at |x|=1. In addition, one can introduce a scaling factor \beta by replacing x in Eq. (1) with x/\beta, which gives the more general gradient function
\begin{equation} \frac{d}{dx}\text{Smooth-}{\cal{l}}_1(x) = \begin{cases} x/\beta & |x| < \beta \\ \pm 1 & |x| \geq \beta \end{cases}\,.\tag{2}\end{equation}
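To make Eq. (2) concrete, here is a minimal NumPy sketch of the smooth L1 loss and its gradient with scaling factor \beta. The function names (smooth_l1_loss, smooth_l1_grad) are my own, not from any particular library; the loss itself is the usual antiderivative of Eq. (2), with the integration constant chosen so the quadratic and linear pieces meet continuously at |x|=\beta.

```python
import numpy as np

def smooth_l1_loss(x, beta=1.0):
    """Smooth L1 loss with scaling factor beta.

    Quadratic (L2-like) for |x| < beta, linear (L1-like) otherwise,
    matched so that both the loss and its gradient are continuous at |x| = beta.
    """
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    return np.where(abs_x < beta,
                    0.5 * x**2 / beta,      # quadratic region
                    abs_x - 0.5 * beta)     # linear region

def smooth_l1_grad(x, beta=1.0):
    """Gradient of the smooth L1 loss, i.e. Eq. (2): x/beta inside, sign(x) outside."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < beta, x / beta, np.sign(x))

# Example: the gradient is clipped to +/-1 for large residuals,
# so outliers no longer dominate the batch gradient.
errors = np.array([0.1, 0.5, 2.0, -5.0])
print(smooth_l1_loss(errors, beta=1.0))   # [0.005 0.125 1.5   4.5  ]
print(smooth_l1_grad(errors, beta=1.0))   # [ 0.1  0.5  1.  -1. ]
```

Here \beta simply controls where the loss switches from the quadratic to the linear regime: a smaller \beta clips the gradient earlier, making the loss behave more like plain L1.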