R-squared

Given n samples y^{(k)} and the corresponding fitted values \hat{y}^{(k)}, one can evaluate the goodness of fit by \begin{equation} R^2 = 1 - \frac{\sum_{k=1}^n \left(y^{(k)}-\hat{y}^{(k)}\right)^2}{\sum_{k=1}^n\left(y^{(k)}-\bar{y}\right)^2}\,,\tag{1}\end{equation}
where \bar{y}\equiv \frac{1}{n}\sum_{k=1}^n y^{(k)}.
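For concreteness, here is a minimal NumPy sketch of Eq. (1); the function name r_squared and the toy data are illustrative choices of mine, not from any particular library.

```python
import numpy as np

def r_squared(y, y_hat):
    """Eq. (1): 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)        # sum of squared residuals
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 4.0, 5.0])
print(r_squared(y, y))                           # perfect fit: 1.0
print(r_squared(y, np.full_like(y, y.mean())))   # constant fit at the mean: 0.0
```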
Writing Y=\hat{Y}+e and interpreting the sample moments as expectations, we can write R^2 compactly as \begin{equation}R^2=1-\frac{\mathbb{E}(e^2)}{\text{Var}(Y)}\,.\end{equation} In the following, we discuss R^2 in the context of linear regression.
  1. With an intercept term, as shown in this post, we have \mathbb{E}(e)=0 and \text{Var}(Y)=\text{Var}(\hat{Y})+\text{Var}(e). As a result, \begin{equation}R^2=1-\frac{\text{Var}(e)}{\text{Var}(Y)}=\frac{\text{Var}(\hat{Y})}{\text{Var}(Y)}\geq 0\,.\tag{2}\end{equation}
  2. Without an intercept term, as shown in this post, we can only conclude \mathbb{E}(Y^2)=\mathbb{E}(\hat{Y}^2)+\mathbb{E}(e^2), since the residuals need not have zero mean. Substituting \mathbb{E}(e^2)=\mathbb{E}(Y^2)-\mathbb{E}(\hat{Y}^2) gives \begin{equation} R^2=\frac{\mathbb{E}\left(\hat{Y}^2\right)-\left(\mathbb{E}(Y)\right)^2}{\text{Var}(Y)}\,,\end{equation} which can be negative.
  3. In simple linear regression \hat{Y}=\beta_0+\beta X, as shown in this post, \begin{equation}\beta = \frac{\text{Cov}(X, Y)}{\text{Var}(X)} =\rho_{X,Y}\sqrt{\frac{\text{Var}(Y)}{\text{Var}(X)}}\,,\end{equation} and thus \begin{equation}R^2= \frac{\text{Var}(\hat{Y})}{\text{Var}(Y)}=\beta^2 \frac{\text{Var}(X)}{\text{Var}(Y)}=\rho_{X,Y}^2\,.\tag{3}\end{equation} A numerical check of all three items is given below.
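The following sketch checks the three cases numerically; the data, seed, and helper fit_r2 are arbitrary choices, and the fit uses plain least squares rather than any particular regression library.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

def fit_r2(X, y):
    """Least-squares fit of y on the columns of X, then R^2 per Eq. (1)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# Items 1 and 3: with an intercept column, R^2 >= 0 and equals rho_{X,Y}^2.
y = 3.0 + 2.0 * x + rng.normal(size=1000)
X_with_intercept = np.column_stack([np.ones_like(x), x])
print(fit_r2(X_with_intercept, y), np.corrcoef(x, y)[0, 1] ** 2)  # the two agree

# Item 2: without an intercept, R^2 can be negative, e.g. when the best
# line through the origin fits worse than the constant fit at the mean.
y_shifted = 10.0 + 0.01 * x + rng.normal(size=1000)
print(fit_r2(x[:, None], y_shifted))  # strongly negative
```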
Problem:
If the R-squared values of the linear regressions Y\sim X_1 and Y\sim X_2 are R_1^2 and R_2^2, respectively, what is the range of the R^2 of the linear regression Y\sim X_1+X_2?

Figure 1: A solid-geometry picture of Y\sim X_1+X_2

Solution:
Without loss of generality, assume Y, X_1, X_2 are centered (the centering is absorbed by the intercept terms). We consider the solid-geometry picture in Fig. 1, in which Y, X_1, X_2 are represented by segments OA, OB, OC, with the length of each segment equal to the standard deviation of the corresponding variable. In this picture, R_1^2=\cos^2 \angle AOB, R_2^2=\cos^2\angle AOC, and R^2=\cos^2\angle AOD, where D is the projection of A onto the plane OBC. The value of R^2 depends on the angle \angle BOC, denoted by \theta. Using solid geometry, one can derive \begin{equation} R^2=\frac{R_1^2+R_2^2\pm 2R_1R_2\cos\theta}{\sin^2\theta}\,.\end{equation}
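This formula can be cross-checked without geometry. Let \rho_1, \rho_2 denote the correlations of Y with X_1, X_2 and \cos\theta the correlation of X_1 with X_2; the standard multiple-correlation identity (quoted here only as a cross-check) gives \begin{equation}R^2=\frac{\rho_1^2+\rho_2^2-2\rho_1\rho_2\cos\theta}{1-\cos^2\theta}\,,\end{equation} and substituting \rho_1=\pm R_1, \rho_2=\pm R_2 recovers the expression above, with the \pm sign accounting for the unknown signs of the two correlations.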
Assuming R_1\geq R_2, we can rewrite this as \begin{equation} R^2=\left(\frac{R_2\pm R_1\cos\theta}{\sin\theta}\right)^2+R_1^2\geq R_1^2\,,\end{equation}
where the equality holds when \cos^2\theta = R_2^2/R_1^2 (OD lies along OB, so X_2 adds nothing beyond X_1). On the other hand, R^2\leq 1, and the equality holds when \cos^2\theta=\left(R_1R_2\pm \sqrt{(1-R_1^2)(1-R_2^2)}\right)^2 (A lies in the plane OBC). In sum, \max(R_1^2, R_2^2)\leq R^2\leq 1\,.
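As a sanity check, here is a small numerical sweep over \theta, with R_1=0.8 and R_2=0.5 chosen arbitrarily. Values above 1 produced by the formula correspond to correlation patterns that cannot actually occur (the 3x3 correlation matrix would not be positive semidefinite), so they are filtered out.

```python
import numpy as np

def r2_combined(r1, r2, cos_t, sign):
    """R^2 of Y ~ X1 + X2 from the formula above; sign picks the +/- branch."""
    return (r1**2 + r2**2 + sign * 2 * r1 * r2 * cos_t) / (1 - cos_t**2)

r1, r2 = 0.8, 0.5
cos_grid = np.linspace(-0.999, 0.999, 4001)
vals = np.concatenate([r2_combined(r1, r2, cos_grid, s) for s in (+1, -1)])
vals = vals[vals <= 1.0]          # keep only feasible correlation configurations
print(vals.min(), vals.max())     # close to max(R1^2, R2^2) = 0.64 and to 1
```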
