Chapter 4 Generalized Linear Models
4.1 Formal structure for the class of Generalized Linear Models
Generalized Linear Models (GLMs) are a large class of statistical models defined such that:
A set of responses \(y_i\) has been collected independently, together with the values of some explanatory variables stored in the vector \(x_i\). In other words, the observations collected are \(\lbrace (y_i,x_i)\rbrace_{i=1,\cdots,N}\).
The response \(y_i\) has a distribution \(p_{y|\theta}(y_i|\theta_i)\) that is a member of the exponential family, indexed by the parameter \(\theta_i\) that is related to the expectation of the response \(\mathbb{E}[y_i]\).
A model is constructed by linking the expectation of the response \(\mathbb{E}[y_i]=\mu_i\) with the linear predictor \(x_i^{T}\beta=\beta_0+\beta_1\ x_{1i}+\cdots+\beta_k\ x_{ki}\) such that \(g(\mu_i)=x_i^{T}\beta\) with \(g\) the link function. The expected response is: \[\mathbb{E}[y_i]=g^{-1}\left(x_i^{T}\beta\right)\]
The link function \(g\) is a strictly monotonic, differentiable function (hence the inverse function \(g^{-1}\) exists).
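The relation \(\mathbb{E}[y_i]=g^{-1}\left(x_i^{T}\beta\right)\) can be sketched in a few lines of Python. This is only an illustration: the names `LINKS` and `expected_response` are hypothetical, the numerical values are made up, and the three links shown (identity, log, logit) are common choices rather than an exhaustive list.

```python
import math

# Hypothetical helper table: each link g is paired with its inverse g^{-1}.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

def expected_response(x, beta, link="identity"):
    """Return E[y] = g^{-1}(x^T beta) for one observation.

    x    : explanatory variables, with a leading 1.0 for the intercept beta_0
    beta : coefficients (beta_0, beta_1, ..., beta_k)
    """
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor x^T beta
    _, inverse_link = LINKS[link]
    return inverse_link(eta)

# Illustrative (made-up) values: intercept plus one explanatory variable,
# chosen so that the linear predictor is 0.5 - 0.25 * 2 = 0.
x_i = [1.0, 2.0]
beta = [0.5, -0.25]
mu_log = expected_response(x_i, beta, link="log")
mu_logit = expected_response(x_i, beta, link="logit")
```

Because each \(g\) is strictly monotonic, applying \(g\) to the returned mean recovers the linear predictor, which is what makes the two parameterisations interchangeable.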
The joint density function of the responses given the parameters \(\theta_i\) (each related to \(\mathbb{E}[y_i]\)) corresponds to: \[\begin{array}{ll} \mathcal{L}(\theta_1,\cdots,\theta_N)&= p(y_1,\cdots, y_N | \theta_1,\cdots,\theta_N)\\ & = \prod_{i=1}^N p_{y|\theta}(y_i|\theta_i)\\ \end{array} \label{eq:glm:saturated}\] When the likelihood \(\mathcal{L}(\theta_1,\cdots,\theta_N)\) is unconstrained (the model is said to be saturated), the maximum likelihood solution corresponds to (\(\forall i=1,\cdots,N\)) \[\begin{array}{ll} \hat{\theta}_i&=\arg\max_{\theta_i}\mathcal{L}(\theta_1,\cdots,\theta_N)\\ &=\arg\max_{\theta_i} p_{y|\theta}(y_i|\theta_i) \\ \end{array}\] The second equality holds because \(\theta_i\) appears in only one factor of the product. The maximum likelihood estimate \(\hat{\theta}_i\) of the saturated model is found by solving \(\frac{\partial p_{y|\theta}(y_i|\theta_i)}{\partial \theta_i} =0\).
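As a worked illustration of the saturated solution, assume (for concreteness only; this choice is not imposed by the definition above) that the response is Poisson with mean \(\theta_i\), so that \(p_{y|\theta}(y_i|\theta_i)=\frac{\theta_i^{y_i}e^{-\theta_i}}{y_i!}\). Since the logarithm is increasing, maximising the density is equivalent to maximising its logarithm, and for \(y_i>0\): \[\begin{array}{ll} \frac{\partial \log p_{y|\theta}(y_i|\theta_i)}{\partial \theta_i}&=\frac{\partial}{\partial \theta_i}\left(y_i\log\theta_i-\theta_i-\log (y_i!)\right)\\ &=\frac{y_i}{\theta_i}-1=0\\ \end{array}\] which gives \(\hat{\theta}_i=y_i\). In this case the saturated model has one parameter per observation and reproduces each response exactly.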
When \(\mathbb{E}[y_i]\) (and hence \(\theta_i\)) is related to \(x_i^{T}\beta\) using a link function \(g\), the likelihood function can be rewritten as a function of \(\beta\) (and the explanatory variables): \[\begin{array}{ll} \mathcal{L}(\beta)&=p(y_1,\cdots, y_N | x_1,\cdots,x_N,\beta)\\ & = \prod_{i=1}^N p_{y|\theta}(y_i|x_i,\beta)\\ \end{array} \label{eq:glm:def2}\] The maximum likelihood estimate \(\hat{\beta}\) corresponds to: \[\hat{\beta}=\arg\max_{\beta} \mathcal{L}(\beta)\] and this estimate is found by solving \(\frac{\partial \mathcal{L}(\beta)}{\partial \beta}=0\). With this estimate, we can propose the following model for linking \(\mathbb{E}[y]\) with any input explanatory variables \(x\): \[\hat{\theta}(x)=g^{-1}(x^{T} \hat{\beta})\]
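The equation \(\frac{\partial \mathcal{L}(\beta)}{\partial \beta}=0\) generally has no closed-form solution and is solved numerically. The following is a minimal sketch, not a definitive implementation, under several assumptions made here for illustration: a Poisson response, a log link, a single explanatory variable plus intercept, made-up data, and Newton-Raphson applied to the log-likelihood (which has the same maximiser as \(\mathcal{L}(\beta)\)). The function name `fit_poisson_log_link` is hypothetical.

```python
import math

def fit_poisson_log_link(x, y, n_iter=25):
    """Newton-Raphson for Poisson regression with log link, one explanatory variable.

    Model: E[y_i] = exp(b0 + b1 * x_i). Returns the estimates (b0, b1).
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the constant-mean fit
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Information matrix X^T diag(mu) X, i.e. minus the Hessian.
        s0 = sum(mu)
        s1 = sum(mi * xi for mi, xi in zip(mu, x))
        s2 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = s0 * s2 - s1 * s1
        # Newton step: beta <- beta + information^{-1} * score (2x2 solve by hand).
        b0 += (s2 * g0 - s1 * g1) / det
        b1 += (s0 * g1 - s1 * g0) / det
    return b0, b1

# Made-up illustrative data.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 2, 4, 7, 12]
b0_hat, b1_hat = fit_poisson_log_link(x, y)
# Fitted means g^{-1}(x^T beta_hat) at the observed x values.
mu_hat = [math.exp(b0_hat + b1_hat * xi) for xi in x]
```

At convergence the score is zero, so with an intercept in the model the fitted means sum to the observed total. The fixed iteration count keeps the sketch short; a practical routine would stop on a convergence criterion instead.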
4.2 Statistical analysis with GLMs
Several familiar statistical techniques are GLMs, known under different names:
Linear regression: the natural link function \(g\) is the identity (\(\theta\in \mathbb{R}\)).
Poisson regression: the natural link function \(g\) is the log (\(\theta\in \mathbb{R}^{+*}\)).
Binomial regression: the natural link function \(g\) is the logit function (\(\theta\in [0,1]\)): \[g(\theta)=\log\left(\frac{\theta}{1-\theta}\right)\]
Survival analysis: the natural link function will be the log function.
Note how these proposed link functions relate to the function \(b(\theta)\) defined for distributions in canonical form in the exponential family of distributions. Other link functions can be used.
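To make this relation explicit, assume the canonical form is written as \(p(y|\theta)=\exp\left[y\ b(\theta)+c(\theta)+d(y)\right]\) (this notation is an assumption here, since the definition lies outside this section). Rewriting three common distributions in that form gives: \[\begin{array}{lll} \text{Normal (known } \sigma^2\text{)}: & p(y|\theta)=\exp\left[y\ \frac{\theta}{\sigma^2}-\frac{\theta^2}{2\sigma^2}-\frac{1}{2}\log(2\pi\sigma^2)-\frac{y^2}{2\sigma^2}\right] & b(\theta)=\frac{\theta}{\sigma^2}\\ \text{Poisson}: & p(y|\theta)=\exp\left[y\log\theta-\theta-\log (y!)\right] & b(\theta)=\log\theta\\ \text{Binomial (} n \text{ trials)}: & p(y|\theta)=\exp\left[y\log\left(\frac{\theta}{1-\theta}\right)+n\log(1-\theta)+\log\binom{n}{y}\right] & b(\theta)=\log\left(\frac{\theta}{1-\theta}\right)\\ \end{array}\] In each case the natural link \(g\) coincides with \(b\), up to the constant factor \(1/\sigma^2\) in the Normal case.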
Note that in some experiments the observations collected are the response \(y_i\), the explanatory variables \(x_i\) and, in addition, a value \(n_i\), e.g. when \(y_i\) is the number of successes in \(n_i\) trials. This is an indication that the Binomial distribution is appropriate to model the response.
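A minimal sketch of how such data enter the likelihood, assuming a logit link and a single explanatory variable plus intercept. The function name `binomial_log_likelihood`, the data and the two candidate values of \(\beta\) are made up for illustration.

```python
import math

def binomial_log_likelihood(beta, x, y, n):
    """log L(beta) for y_i ~ Binomial(n_i, theta_i) with logit(theta_i) = b0 + b1 * x_i."""
    b0, b1 = beta
    total = 0.0
    for xi, yi, ni in zip(x, y, n):
        theta = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # inverse logit
        total += (math.log(math.comb(ni, yi))
                  + yi * math.log(theta)
                  + (ni - yi) * math.log(1.0 - theta))
    return total

# Made-up illustrative data: y_i successes out of n_i trials at level x_i.
x = [0.0, 1.0, 2.0]
y = [2, 5, 9]
n = [10, 10, 10]
ll_flat = binomial_log_likelihood((0.0, 0.0), x, y, n)    # theta_i = 0.5 for all i
ll_trend = binomial_log_likelihood((-1.5, 1.5), x, y, n)  # theta_i increases with x_i
```

For these data the success proportion rises with \(x_i\), so the candidate with a positive slope attains the higher log-likelihood of the two.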