Chapter 10 Multiple Regression
10.1 Reminder: Least Squares algorithm
Here, we look only at the case of a Normal distribution with the identity link function.
In this case, having collected \(N\) independent responses \(\lbrace y_i \rbrace_{i=1,\cdots,N}\), the likelihood function is: \[f(y_1, y_2,\cdots, y_N;\theta_1,\theta_2,\cdots,\theta_N)=\prod_{i=1}^N f(y_i;\theta_i)=\prod_{i=1}^N \frac{1}{\sqrt{2\pi}\sigma} \exp\left( \frac{-(y_i-\theta_i)^2}{2\sigma^2}\right)\quad \text{(independence)}\]

Now considering the link \(\theta_i=\mathbf{x}_i^{T}\pmb{\beta}=\beta_1+\beta_2\ x_{1,i}+\cdots+\beta_p\ x_{p-1,i}\), the likelihood becomes: \[f(y_1, y_2,\cdots, y_N;\pmb{\beta})=\prod_{i=1}^N \frac{1}{\sqrt{2\pi}\sigma} \exp\left( \frac{-(y_i-\mathbf{x}_i^{T}\pmb{\beta})^2}{2\sigma^2}\right)\]

The log-likelihood function is: \[\log f(y_1, y_2,\cdots, y_N;\pmb{\beta} )=- N\log\left (\sqrt{2\pi}\sigma \right)-\sum_{i=1}^N\frac{(y_i-\mathbf{x}_i^{T}\pmb{\beta})^2}{2\sigma^2}\]

Finding the estimate \(\pmb{\hat{\beta}}\) that maximises the likelihood is equivalent to finding the estimate \(\pmb{\hat{\beta}}\) that minimises the negative of the log-likelihood function. So \(\pmb{\hat{\beta}}\) is found by setting the derivative of \(-\log f(y_1, y_2,\cdots, y_N;\pmb{\beta} )\) to 0, which is equivalent to: \[\frac{\partial}{\partial \pmb{\beta}} \sum_{i=1}^N (y_i-\mathbf{x}_i^{T}\pmb{\beta})^2=0\]

So maximising the likelihood is equivalent to minimising the sum of squared errors (SSE) when defining the error to be \(\epsilon_i=y_i-\mathbf{x}_i^{T}\pmb{\beta}\). Let us define the vector \(\pmb{\epsilon}\) concatenating \(\epsilon_1, \cdots,\epsilon_N\), the vector \(\mathbf{y}=[y_1,\cdots,y_N]^T\) and the matrix \(\mathrm{X}\): \[\mathrm{X}= \left\lbrack \begin{array}{cccc} 1 & x_{1,1} &\cdots & x_{p-1,1}\\ 1 & x_{1,2} &\cdots & x_{p-1,2}\\ \vdots &\vdots & &\vdots\\ 1 & x_{1,N} &\cdots & x_{p-1,N}\\ \end{array}\right\rbrack\]

Then \(\pmb{\epsilon}=\mathbf{y}-\mathrm{X} \pmb{\beta}\). Minimising the SSE corresponds to finding \(\pmb{\beta}\): \[\begin{array}{ll} \pmb{\hat{\beta}}&=\arg\min_{\pmb{\beta}} \left\lbrace SSE=\sum_{i=1}^{N}\epsilon_i^2=\|\pmb{\epsilon}\|^2\right\rbrace\\ &=\arg\min_{\pmb{\beta}} \left\lbrace SSE=\|\mathbf{y}-\mathrm{X}\pmb{\beta}\|^2\right\rbrace\\ &=\arg\min_{\pmb{\beta}} \left\lbrace SSE=(\mathbf{y}-\mathrm{X}\pmb{\beta})^{T}(\mathbf{y}-\mathrm{X}\pmb{\beta})\right\rbrace\\ &=\arg\min_{\pmb{\beta}} \left\lbrace SSE=\mathbf{y}^{T}\mathbf{y}-\mathbf{y}^{T}\mathrm{X}\pmb{\beta}-\pmb{\beta}^{T}\mathrm{X}^{T}\mathbf{y}+\pmb{\beta}^{T}\mathrm{X}^{T}\mathrm{X}\pmb{\beta}\right\rbrace\\ \end{array}\]

The derivative of the SSE w.r.t. \(\pmb{\beta}\) is: \[\begin{array}{ll} \frac{\partial}{\partial \pmb{\beta}} SSE &=\frac{\partial}{\partial \pmb{\beta}} \left\lbrace \mathbf{y}^{T}\mathbf{y}-\mathbf{y}^{T}\mathrm{X}\pmb{\beta}-\pmb{\beta}^{T}\mathrm{X}^{T}\mathbf{y}+\pmb{\beta}^{T}\mathrm{X}^{T}\mathrm{X}\pmb{\beta}\right\rbrace\\ &=0-(\mathbf{y}^{T}\mathrm{X})^T -\mathrm{X}^{T}\mathbf{y}+\mathrm{X}^{T}\mathrm{X}\pmb{\beta}+(\mathrm{X}^{T}\mathrm{X})^T\pmb{\beta}\\ &= -2\mathrm{X}^T\mathbf{y}+2\mathrm{X}^{T}\mathrm{X}\pmb{\beta}\\ \end{array}\]

So the solution \(\pmb{\hat{\beta}}\) for which this derivative is 0 is \[\pmb{\hat{\beta}}=(\mathrm{X}^{T}\mathrm{X})^{-1}\mathrm{X}^{T}\mathbf{y} \quad \text{(Least Squares estimate)} \label{eq:LS}\]

This is the maximum likelihood estimate when considering a Normal distribution with the identity link function.
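To make this formula concrete, here is a minimal sketch in R (simulated data with arbitrarily chosen \(\pmb{\beta}\) and \(\sigma\), not part of the case study below) comparing the closed-form estimate with the coefficients returned by lm():

# sketch: closed-form Least Squares estimate on simulated data
set.seed(1)
N=50
x1=rnorm(N); x2=rnorm(N)
X=cbind(1,x1,x2)                       # design matrix with intercept column
beta=c(2,-1,0.5)                       # "true" parameters, chosen for illustration
y=drop(X%*%beta+rnorm(N,sd=2))         # simulated responses
betahat=solve(t(X)%*%X)%*%t(X)%*%y     # Least Squares estimate
cbind(betahat,coef(lm(y~x1+x2)))       # the two columns should agree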
10.2 Covariance of \(\hat{\beta}\)
Expectation of \(\hat{\beta}\) \[\mathbb{E}[\hat{\beta}]=\mathbb{E}\left\lbrack(\mathrm{X}^{T}\mathrm{X})^{-1}\mathrm{X}^{T}\mathbf{y} \right\rbrack=(\mathrm{X}^{T}\mathrm{X})^{-1}\mathrm{X}^{T}\mathbb{E}\left\lbrack\mathbf{y} \right\rbrack =(\mathrm{X}^{T}\mathrm{X})^{-1}\mathrm{X}^{T}\mathrm{X}\beta=\beta\]
Covariance of \(\hat{\beta}\) \[\mathbb{C}[\hat{\beta}]=\mathbb{E} \left\lbrack(\hat{\beta}-\beta)(\hat{\beta}-\beta)^T\right\rbrack=\mathbb{E} \left\lbrack((\mathrm{X}^{T}\mathrm{X})^{-1}\mathrm{X}^{T}\mathbf{y}-\beta)((\mathrm{X}^{T}\mathrm{X})^{-1}\mathrm{X}^{T}\mathbf{y}-\beta)^T\right\rbrack\] With \(\mathbf{y}=\mathrm{X}\beta+\epsilon\), we get \[\mathbb{C}[\hat{\beta}]=\mathbb{E} \left\lbrack ((\mathrm{X}^{T}\mathrm{X})^{-1}\mathrm{X}^{T} \epsilon)((\mathrm{X}^{T}\mathrm{X})^{-1}\mathrm{X}^{T} \epsilon)^T \right\rbrack =\mathbb{E} \left\lbrack (\mathrm{X}^{T}\mathrm{X})^{-1}\mathrm{X}^{T} \epsilon \epsilon^T ((\mathrm{X}^{T}\mathrm{X})^{-1}\mathrm{X}^{T})^T \right\rbrack\] and, since \(\mathbb{E}[\epsilon \epsilon^T]=\sigma^2 \mathrm{I}\), \[\mathbb{C}[\hat{\beta}] =\sigma^2 (\mathrm{X}^{T}\mathrm{X})^{-1}(\mathrm{X}^{T}\mathrm{X}) ((\mathrm{X}^{T}\mathrm{X})^{-1})^{T} =\sigma^2 (\mathrm{X}^{T}\mathrm{X})^{-1}\]
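Both results can be illustrated empirically. The sketch below (a simulation with a design matrix, \(\beta\) and \(\sigma\) chosen arbitrarily) repeatedly draws new noise and checks that the sample mean and sample covariance of \(\hat{\beta}\) are close to \(\beta\) and \(\sigma^2(\mathrm{X}^{T}\mathrm{X})^{-1}\):

# sketch: empirical check of the mean and covariance of the Least Squares estimate
set.seed(2)
N=30; sigma=1.5
X=cbind(1,runif(N,0,10))               # intercept and one covariate
beta=c(1,0.3)
betahats=replicate(5000,{
  y=X%*%beta+rnorm(N,sd=sigma)         # new noise at each replication
  drop(solve(t(X)%*%X)%*%t(X)%*%y)     # Least Squares estimate as a vector
})
rowMeans(betahats)                     # close to beta = (1, 0.3)
cov(t(betahats))                       # close to the theoretical covariance below
sigma^2*solve(t(X)%*%X)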
Remark: since the gradient of the log-likelihood is zero at \(\hat{\beta}\) (\(\nabla_{\hat{\beta}}\mathcal{L}=0\)), a second-order Taylor expansion around \(\hat{\beta}\) gives: \[\mathcal{L}(\beta) \approx \mathcal{L}(\hat{\beta}) +\frac{1}{2}\ (\beta-\hat{\beta})^T \ \mathrm{H}_{\hat{\beta}}\ (\beta-\hat{\beta})\] in comparison to the Gaussian approximation at \(\hat{\beta}\) \[\mathcal{L}(\beta) =\text{constant}-\frac{1}{2}\ (\beta-\hat{\beta})^T \ \Sigma^{-1}\ (\beta-\hat{\beta})\] \(-\mathrm{H}_{\hat{\beta}}\) is called the observed information matrix and approximates \(\Sigma^{-1}\). Hence the diagonal values of \((-\mathrm{H}_{\hat{\beta}})^{-1}\) approximate the squared standard errors of \(\hat{\beta}\).
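As an illustration of this remark, the sketch below (simulated data, with \(\sigma\) treated as known) minimises the negative log-likelihood numerically with optim(..., hessian = TRUE); the returned Hessian is the observed information, and the square roots of the diagonal of its inverse approximate the standard errors of \(\hat{\beta}\):

# sketch: standard errors from the observed information (sigma assumed known)
set.seed(3)
N=40; sigma=2
X=cbind(1,rnorm(N)); beta=c(1,2)
y=drop(X%*%beta+rnorm(N,sd=sigma))
negloglik=function(b) sum((y-X%*%b)^2)/(2*sigma^2)   # negative log-likelihood up to a constant
fit=optim(c(0,0),negloglik,method="BFGS",hessian=TRUE)
sqrt(diag(solve(fit$hessian)))         # Hessian-based standard errors
sqrt(diag(sigma^2*solve(t(X)%*%X)))    # analytic standard errors, essentially the same values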
10.3 Case study : Carbohydrate Diet
In this dataset (Dobson and Barnett 2008) (p.96), the response variable \(y\) corresponds to the percentage of total calories obtained from complex carbohydrates for 20 male insulin-dependent diabetics who have been on a high carbohydrate diet for six months. Additional information is collected about the individuals taking part in the study, including age (in years), weight (relative to ideal weight) and the percentage of calories obtained from protein.
10.3.1 Fitting with lm() and glm()
carbohydrate=c(33,40,37,27,30,43,34,48,30,38,50,51,30,36,41,42,46,24,35,37)
age=c(33,47,49,35,46,52,62,23,32,42,31,61,63,40,50,64,56,61,48,28)
weight=c(100,92,135,144,140,101,95,101,98,105,108,85,130,127,109,107,117,100,118,102)
protein=c(14,15,18,12,15,15,14,17,15,14,17,19,19,20,15,16,18,13,18,14)
data=cbind(carbohydrate,age,weight,protein)
knitr::kable(t(data), caption = "Carbohydrate diet data")
carbohydrate | 33 | 40 | 37 | 27 | 30 | 43 | 34 | 48 | 30 | 38 | 50 | 51 | 30 | 36 | 41 | 42 | 46 | 24 | 35 | 37 |
age | 33 | 47 | 49 | 35 | 46 | 52 | 62 | 23 | 32 | 42 | 31 | 61 | 63 | 40 | 50 | 64 | 56 | 61 | 48 | 28 |
weight | 100 | 92 | 135 | 144 | 140 | 101 | 95 | 101 | 98 | 105 | 108 | 85 | 130 | 127 | 109 | 107 | 117 | 100 | 118 | 102 |
protein | 14 | 15 | 18 | 12 | 15 | 15 | 14 | 17 | 15 | 14 | 17 | 19 | 19 | 20 | 15 | 16 | 18 | 13 | 18 | 14 |
#using lm
res.lm=lm(carbohydrate~age+weight+protein)
summary(res.lm)
##
## Call:
## lm(formula = carbohydrate ~ age + weight + protein)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.3424 -4.8203 0.9897 3.8553 7.9087
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.96006 13.07128 2.828 0.01213 *
## age -0.11368 0.10933 -1.040 0.31389
## weight -0.22802 0.08329 -2.738 0.01460 *
## protein 1.95771 0.63489 3.084 0.00712 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.956 on 16 degrees of freedom
## Multiple R-squared: 0.4805, Adjusted R-squared: 0.3831
## F-statistic: 4.934 on 3 and 16 DF, p-value: 0.01297
#using glm
res.glm=glm(carbohydrate~age+weight+protein,family=gaussian)
summary(res.glm)
##
## Call:
## glm(formula = carbohydrate ~ age + weight + protein, family = gaussian)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -10.3424 -4.8203 0.9897 3.8553 7.9087
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.96006 13.07128 2.828 0.01213 *
## age -0.11368 0.10933 -1.040 0.31389
## weight -0.22802 0.08329 -2.738 0.01460 *
## protein 1.95771 0.63489 3.084 0.00712 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 35.47893)
##
## Null deviance: 1092.80 on 19 degrees of freedom
## Residual deviance: 567.66 on 16 degrees of freedom
## AIC: 133.67
##
## Number of Fisher Scoring iterations: 2
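Note that, for the Gaussian family, the dispersion parameter reported above is simply the residual deviance divided by the residual degrees of freedom, i.e. the estimate \(\hat{\sigma}^2\); this can be checked directly from res.glm:

res.glm$deviance/res.glm$df.residual   # 567.66/16 = 35.47893
summary(res.glm)$dispersion            # same value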
10.3.2 Using Math
carbohydrate=c(33,40,37,27,30,43,34,48,30,38,50,51,30,36,41,42,46,24,35,37) # response vector
age=c(33,47,49,35,46,52,62,23,32,42,31,61,63,40,50,64,56,61,48,28)
weight=c(100,92,135,144,140,101,95,101,98,105,108,85,130,127,109,107,117,100,118,102)
protein=c(14,15,18,12,15,15,14,17,15,14,17,19,19,20,15,16,18,13,18,14)
#
X=matrix(1,20,4)   # design matrix: 20 observations, 4 columns
X[,1]=rep(1,20)    # intercept column of 1's
X[,2]=age
X[,3]=weight
X[,4]=protein
# least squares estimate: (X'X)^{-1} X'y
BetaHat=solve(t(X)%*%X)%*%t(X)%*%carbohydrate
Compare the estimated parameters \(\hat{\beta}\)
BetaHat
## [,1]
## [1,] 36.9600559
## [2,] -0.1136764
## [3,] -0.2280174
## [4,] 1.9577126
with those computed with glm and lm earlier.
res.lm$coefficients
## (Intercept) age weight protein
## 36.9600559 -0.1136764 -0.2280174 1.9577126
res.glm$coefficients
## (Intercept) age weight protein
## 36.9600559 -0.1136764 -0.2280174 1.9577126
So we have the maths right for computing the estimate \(\hat{\beta}\)!
fitted=X%*%BetaHat                        # fitted values
residuals=carbohydrate-fitted             # residuals
SSE=t(residuals)%*%residuals              # sum of squared errors
sigmahatsq=SSE/(length(carbohydrate)-length(BetaHat))  # estimate of sigma^2 = SSE/(N-p)
#Residual standard error
sqrt(sigmahatsq)
## [,1]
## [1,] 5.956419
Compare this estimate with the Residual standard error computed with lm.
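For instance, the value reported by lm can be extracted directly (reusing res.lm fitted earlier):

summary(res.lm)$sigma   # Residual standard error reported by lm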
# uncertainty of BetaHat (standard error)
CovBetaHat=solve(t(X)%*%X)*as.numeric(sigmahatsq)
Std.Error.BetaHat=sqrt(diag(CovBetaHat))
Std.Error.BetaHat
## [1] 13.07128293 0.10932548 0.08328895 0.63489286
Compare these standard errors with those reported in summary(res.lm) and summary(res.glm).
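For instance, the standard-error column of the coefficient tables can be extracted directly (reusing the fits from earlier):

summary(res.lm)$coefficients[,"Std. Error"]
summary(res.glm)$coefficients[,"Std. Error"]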
#Multiple R-squared (coefficient of determination)
S0=sum((carbohydrate-mean(carbohydrate))^2)  # total sum of squares
Rsq=(S0-SSE)/S0
Rsq
## [,1]
## [1,] 0.4805428
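This matches the Multiple R-squared reported by summary(res.lm); the same quantity (and its adjusted version) can also be extracted directly:

summary(res.lm)$r.squared       # Multiple R-squared
summary(res.lm)$adj.r.squared   # Adjusted R-squared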
References
Dobson, A. J., and A. G. Barnett. 2008. An Introduction to Generalized Linear Models. 3rd ed. CRC Press.