Chapter 6 Akaike Information Criterion

6.1 Likelihood and log-likelihood

Consider independent observations \(\lbrace (y_i,x_i)\rbrace_{i=1,\cdots,N}\) (or \(\lbrace (y_i,x_i,n_i)\rbrace_{i=1,\cdots,N}\)). The likelihood is defined as
\[\mathcal{L}(\theta_1,\cdots,\theta_{N})=\prod_{i=1}^N p_{y|\theta}(y_i|\theta_i)\]
where \(p_{y|\theta}\) is a probability distribution from the exponential family, i.e.
\[\mathcal{L}(\theta_1,\cdots,\theta_{N})=\prod_{i=1}^N \exp\left( a(y_i)\, b(\theta_i) + c(\theta_i) + d(y_i) \right)\]
The log transformation of the likelihood is often computed instead:
\[\log \mathcal{L}(\theta_1,\cdots,\theta_{N})=\sum_{i=1}^N \left( a(y_i)\, b(\theta_i) + c(\theta_i) + d(y_i) \right)\]
and the maximum likelihood estimates for the saturated model are then given by
\[(\hat{\theta}_1,\cdots,\hat{\theta}_{N})=\arg\max\ \log \mathcal{L}(\theta_1,\cdots,\theta_{N})\]

When a generalised linear model is used, a link function \(g\) constrains the parameters such that \(\theta_i\propto g^{-1}(x_i^{T}\beta)\) for all \(i=1,\cdots,N\). In this case the likelihood is written \(\mathcal{L}(\beta)\) and the log likelihood \(\log \mathcal{L}(\beta)\). The parameter \(\beta\) often has a lower dimension than the \(\theta_i\)'s (i.e. \(\dim(\beta)\leq N\)), and the maximum likelihood estimate is computed as
\[\hat{\beta}=\arg\max\ \log \mathcal{L}(\beta)\]
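
To make this concrete, the following R sketch uses a small simulated Poisson dataset (the variables x, y and the object fit are introduced here purely for illustration) and compares the log likelihood \(\log \mathcal{L}(\hat{\beta})\) returned by glm with the sum of log densities computed by hand.

```r
## Hypothetical simulated Poisson data, for illustration only
set.seed(1)
N <- 100
x <- runif(N)
y <- rpois(N, lambda = exp(0.5 + 1.2 * x))

## Poisson GLM with the canonical log link: log(E[y_i]) = beta_0 + beta_1 x_i
fit <- glm(y ~ x, family = poisson(link = "log"))

## Log likelihood evaluated at the maximum likelihood estimate beta-hat
logLik(fit)

## The same value obtained as a sum over the observations
mu_hat <- fitted(fit)                       # g^{-1}(x_i^T beta-hat)
sum(dpois(y, lambda = mu_hat, log = TRUE))
```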

6.2 Comparing working models with the AIC

When analysing data, it is not rare that more than one distribution from the exponential family could be used, that several link functions could be selected, and, when several explanatory variables are recorded, that several linear predictors \(x^{T}\beta\) could be defined. For instance, some explanatory variables may not be helpful in explaining the responses and should not be included in the model, limiting the number of parameters \(\beta_0,\beta_1,\cdots\) to estimate.

To compare the different models proposed for the same dataset, several criteria exist to facilitate the selection of the best one. We focus here on the Akaike Information Criterion (AIC), which is reported in the output of the glm function in R.

Definition 6.1 (Akaike Information Criterion) The Akaike Information Criterion is a measure of the quality of a fitted model, balancing goodness of fit against model complexity. It is defined as: \[\mathrm{AIC}=-2\,\log \mathcal{L}(\hat{\beta}) + 2\,p\] where

  • \(p=\dim(\beta)\) is the number of parameters to be estimated in the model,

  • \(\hat{\beta}\) are the estimated parameters that maximize the likelihood (or log likelihood),

  • \(\log \mathcal{L}(\hat{\beta})\) is the maximum value of the log likelihood.
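
As a quick numerical check of this definition, the AIC reported by R for a fitted glm object can be reconstructed from its log likelihood and its number of estimated parameters. The sketch below reuses the hypothetical Poisson fit from Section 6.1.

```r
## AIC as reported by R
AIC(fit)

## Reconstructed from the definition: -2 log L(beta-hat) + 2 p
ll <- logLik(fit)
p  <- attr(ll, "df")            # number of estimated parameters
-2 * as.numeric(ll) + 2 * p
```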

When fitting several models to the same data, the one with the smallest AIC is selected. Note that the best model according to this criterion is a trade-off between maximizing the likelihood and keeping the number of parameters small.
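
The following sketch illustrates such a comparison on the simulated Poisson data used above. A second, purely artificial explanatory variable z (unrelated to the response) is added, and the AIC values of the two competing working models are compared; adding an unhelpful variable increases \(p\) without improving the likelihood enough to be rewarded.

```r
## Hypothetical second explanatory variable, unrelated to the response
z <- rnorm(N)

## Two competing working models for the same data
fit1 <- glm(y ~ x,     family = poisson(link = "log"))
fit2 <- glm(y ~ x + z, family = poisson(link = "log"))

## The model with the smallest AIC is preferred
AIC(fit1, fit2)
```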