L1 regression (Lasso)
L1 regularization adds an L1-norm penalty term to the least-squares objective: \begin{equation}\hat{\boldsymbol{\beta}}^{\text{lasso}}=\text{argmin}_{\boldsymbol{\beta}}\left\{\sum_{i=1}^n\left(y^{(i)}-\beta_0-\sum_{j=1}^p \beta_jx^{(i)}_{j}\right)^2+\lambda\sum_{j=1}^p \left|\beta_j\right|\right\}\,.\end{equation} Note that the intercept term $\beta_0$ is not regularized, since $\beta_0$ is a global shift applied to all $y$. We can remove $\beta_0$ by centering $\mathbf{y}$ and the columns of $\mathbf{X}$. With centered inputs, we have \begin{equation}\hat{\boldsymbol{\beta}}^{\text{lasso}}=\text{argmin}_{\boldsymbol{\beta}}\left\{(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^T(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})+\lambda \left|\left|\boldsymbol{\beta}\right|\right|_1\right\}\,.\tag{1}\end{equation} Compared with ridge, the lasso can be used for continuous subset (feature) selection in addition to regularization, since it leads to sparse coefficients. The argument comes from an equivalent optimization problem of (1): \begin{equation}\hat{\boldsymbol{\beta}}^{\text{lasso}}=\text{argmin}_{\boldsymbol{\beta}}\,(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^T(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\quad\text{subject to}\quad\left|\left|\boldsymbol{\beta}\right|\right|_1\le t\,,\tag{2}\end{equation} where each $\lambda$ in (1) corresponds to some bound $t$ in (2). Because the constraint region $\left|\left|\boldsymbol{\beta}\right|\right|_1\le t$ is a diamond with corners on the coordinate axes, the elliptical contours of the squared-error loss often first touch it at a corner, where some coefficients are exactly zero.
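As a quick numerical check, here is a minimal sketch with synthetic data using scikit-learn's `Lasso` and `Ridge` (the data, dimensions, and `alpha` value are illustrative assumptions, not from the text above; also note that scikit-learn minimizes $\frac{1}{2n}||\mathbf{y}-\mathbf{X}\boldsymbol{\beta}||_2^2+\alpha||\boldsymbol{\beta}||_1$, so its `alpha` plays the role of $\lambda/(2n)$ in (1)):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Hypothetical toy data: only the first 3 of 10 true coefficients are nonzero.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.5 * rng.normal(size=n)

# Center y and the columns of X so the intercept beta_0 drops out, as in (1).
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# scikit-learn minimizes (1/(2n))||y - X beta||^2 + alpha * ||beta||_1,
# so alpha here corresponds to lambda / (2n) in equation (1).
lasso = Lasso(alpha=0.1, fit_intercept=False).fit(Xc, yc)
ridge = Ridge(alpha=0.1, fit_intercept=False).fit(Xc, yc)

print("lasso:", np.round(lasso.coef_, 2))  # irrelevant coefficients driven exactly to 0
print("ridge:", np.round(ridge.coef_, 2))  # coefficients shrink but stay nonzero
```

On this toy problem the lasso typically drives the seven irrelevant coefficients exactly to zero while ridge only shrinks them, which is the continuous subset-selection behavior described above.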