Chapter 11 Multinomial Distribution

11.1 From Binomial Distribution to Multinomial Distribution

The Binomial distribution was defined as the joint distribution of Bernoulli random variables. Bernoulli variables have only two possible outcomes (e.g. success and failure, or yes and no). The Multinomial distribution defined below extends the number of outcome categories from 2 to \(J\) (e.g. for \(J=3\): yes, maybe, no).

Definition 11.1 (Multinomial distribution) Consider \(J\) categories. Having collected the outcomes of \(n\) experiments, \(y_1\) denotes the number of experiments with outcomes in category 1, \(y_2\) the number with outcomes in category 2, ..., and \(y_J\) the number with outcomes in category \(J\). The joint density function of \((y_1,\ y_2,\cdots, \ y_J)\) is then: \[p(y_1,y_2,\cdots, y_J;\theta_1,\cdots,\theta_J,n)=\frac{n!}{y_1!\ y_2!\ \cdots y_{J}! } \ \theta_1^{y_1} \ \theta_2^{y_2}\ \cdots \theta_J^{y_J}\] where \(\theta_1, \cdots,\ \theta_{J}\) are the respective probabilities of the categories, so that \(\theta_1+\theta_2+\cdots+\theta_J=1\), and \(n=y_1+y_2+\cdots+y_J\). The Multinomial distribution is denoted: \[\mathrm{Multinomial}(n,\theta_1,\cdots,\theta_J)\] and when \(J=2\), we recover the Binomial distribution.
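The density above can be evaluated directly from its definition. A minimal Python sketch (the counts and probabilities below are illustrative values, not data from the text):

```python
from math import factorial, prod

def multinomial_pmf(y, theta):
    """Density of Multinomial(n, theta_1, ..., theta_J) at the counts y."""
    n = sum(y)
    # Multinomial coefficient n! / (y_1! y_2! ... y_J!)
    coef = factorial(n)
    for yj in y:
        coef //= factorial(yj)
    # Product of theta_j ** y_j over the J categories
    return coef * prod(t ** yj for t, yj in zip(theta, y))

# J = 3 categories (yes / maybe / no), n = 10 experiments
p = multinomial_pmf((5, 3, 2), (0.5, 0.3, 0.2))
```

For \(J=2\) the same function reproduces the Binomial pmf, consistent with the remark at the end of the definition.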

Is the Multinomial distribution a member of the exponential family of distributions?

No, it is not. This motivates the link, established in the next section, between the Multinomial distribution and Poisson random variables, which do belong to the exponential family.

Notation conventions.

The data collected for analysis are now \(\lbrace (y_{i,1}, \cdots,y_{i,J}, x_i,n_i)\rbrace_{i=1,\cdots,N}\) for \(N\) groups (we can use the vectorial notation \(\mathbf{y}_i=(y_{i,1}, \cdots,y_{i,J})\), leading equivalently to the observations \(\lbrace (\mathbf{y}_i, x_i,n_i)\rbrace_{i=1,\cdots,N}\)). The likelihood function for the saturated model is then: \[\mathcal{L}\left ( \lbrace\theta_{i,1}, \cdots \theta_{i,J}\rbrace_{i=1,\cdots,N} \right)= \prod_{i=1}^N \mathrm{Multinomial}(n_i,\theta_{i,1},\cdots,\theta_{i,J})= \prod_{i=1}^N \frac{n_i !}{y_{i,1}!\ y_{i,2}!\ \cdots y_{i,J}! } \ \theta_{i,1}^{y_{i,1}} \ \theta_{i,2}^{y_{i,2}}\ \cdots \theta_{i,J}^{y_{i,J}}\] The number of degrees of freedom for this saturated model is \((J-1)\ N\) (remember that we have the constraints \(\theta_{i,1}+\theta_{i,2}+\cdots+\theta_{i,J}=1, \ \forall i\) ).
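For the saturated model each group has its own vector of proportions, and the maximum-likelihood estimates are simply the observed proportions \(\hat{\theta}_{i,j}=y_{i,j}/n_i\) (a standard result, stated here without proof). A small numpy sketch with made-up counts:

```python
import numpy as np

# Hypothetical counts for N = 2 groups and J = 3 categories
Y = np.array([[5, 3, 2],
              [1, 6, 3]])
n = Y.sum(axis=1)                # group totals n_i

# Saturated model: one theta per cell, MLE = observed proportion y_ij / n_i
theta_hat = Y / n[:, None]

# (J - 1) N free parameters, because each row of theta_hat sums to 1
df = (Y.shape[1] - 1) * Y.shape[0]

# Log-likelihood at the MLE, up to the multinomial coefficients
# (which do not depend on theta)
loglik = np.sum(Y * np.log(theta_hat))
```

This saturated log-likelihood is the natural baseline against which fitted models are compared (e.g. via the deviance).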

11.2 Multinomial Distribution & Poisson random variables

Preliminary exercise:

Consider two independent random variables that follow a Poisson distribution, i.e. \(y_1\sim\mathcal{P}_o(\lambda_1)\) and \(y_2\sim\mathcal{P}_o(\lambda_2)\). Show that \(n=y_1+y_2\) follows a Poisson distribution with parameter \(\lambda_1+\lambda_2\) (i.e. \(n\sim\mathcal{P}_o(\lambda_1+\lambda_2)\) ).
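A quick numerical check of this exercise (not a proof): the pmf of \(y_1+y_2\) at a point is the convolution of the two Poisson pmfs, and it should match the pmf of \(\mathcal{P}_o(\lambda_1+\lambda_2)\). The rates below are arbitrary:

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """Poisson pmf: lam^k e^{-lam} / k!"""
    return lam**k * exp(-lam) / factorial(k)

lam1, lam2 = 1.5, 2.5
k = 4

# P(y1 + y2 = k): convolve the two pmfs over all splits of k
conv = sum(pois_pmf(j, lam1) * pois_pmf(k - j, lam2) for j in range(k + 1))

# pmf of Poisson(lam1 + lam2) at the same point
direct = pois_pmf(k, lam1 + lam2)
```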

Multinomial distribution & joint distribution of Poisson variables.

Consider \(y_1, y_2,\cdots, y_J\) independent random variables with distributions: \[y_j\sim\mathcal{P}_o(\lambda_j) , \ \forall j=1,\cdots,J\] Then the joint density function of \((y_1, y_2,\cdots, y_J )\) is: \[p(y_1, y_2,\cdots, y_J; \lambda_1,\cdots,\lambda_J )=\prod_{j=1}^J \frac{\lambda_j^{y_j}}{y_j!} \exp(-\lambda_j)\] Let us define \(n=y_1+y_2+\cdots+y_J\); then \(n\sim\mathcal{P}_o(\lambda_1+\lambda_2+\cdots+\lambda_J)\) (see exercise). Using Bayes' theorem, and ignoring the parameters of the distributions for now: \[p(y_1, y_2,\cdots, y_J|n)=\frac{p(n|y_1,\cdots,y_J)\ p(y_1, y_2,\cdots, y_J)}{p(n)}\] with \(p(n|y_1,\cdots,y_J)=\delta(n-y_1-\cdots-y_J)\) (the Kronecker delta) since \(n=y_1+y_2+\cdots+y_J\). Hence \[\begin{array}{ll} p(y_1, y_2,\cdots, y_J|n)&=\frac{\delta(n-y_1-\cdots-y_J) \ \prod_{j=1}^J \frac{\lambda_j^{y_j}}{y_j!} \exp(-\lambda_j)}{\frac{(\sum_{k=1}^J \lambda_k)^n }{n!} \exp(-\sum_{k=1}^J \lambda_k)}\\ &\\ &=\delta(n-y_1-\cdots-y_J)\ n! \ \prod_{j=1}^J \frac{1}{y_j!}\left( \frac{\lambda_j}{\sum_{k=1}^J \lambda_k} \right)^{y_j}\\ \end{array}\] This is exactly the Multinomial distribution: \[p(y_1, y_2,\cdots, y_J|n)= \frac{n!}{y_1! \cdots y_J!} \theta_1^{y_1}\cdots \theta_J^{y_J}\] with the convention \(\theta_j=\frac{\lambda_j}{\sum_{k=1}^J \lambda_k}\) and the constraint \(n=y_1+\cdots+y_J\) (the Kronecker delta equals 1 in that case and 0 otherwise). So the Multinomial distribution can be regarded as the joint distribution of Poisson random variables conditional upon their sum \(n\). This justifies the use of generalized linear models.
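The identity just derived can be verified numerically for a particular choice of rates and counts (the values below are illustrative): the joint Poisson density divided by the density of the sum should equal the Multinomial pmf with \(\theta_j=\lambda_j/\sum_k\lambda_k\).

```python
from math import exp, factorial, prod

def pois_pmf(k, lam):
    """Poisson pmf: lam^k e^{-lam} / k!"""
    return lam**k * exp(-lam) / factorial(k)

lams = [1.0, 2.0, 3.0]          # Poisson rates lambda_j, J = 3
y = [2, 1, 3]                   # counts, with n = 6
n = sum(y)

# Left-hand side: p(y_1,...,y_J | n) = joint Poisson density / density of sum
joint = prod(pois_pmf(yj, l) for yj, l in zip(y, lams))
cond = joint / pois_pmf(n, sum(lams))

# Right-hand side: Multinomial(n, theta) with theta_j = lambda_j / sum(lambda)
theta = [l / sum(lams) for l in lams]
coef = factorial(n)
for yj in y:
    coef //= factorial(yj)
multi = coef * prod(t ** yj for t, yj in zip(theta, y))
```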

11.3 Nominal logistic regression

Definition 11.2 (Nominal logistic regression) The outcomes of experiments fall in \(J\) categories and there is no natural order amongst the response categories. One category is arbitrarily chosen as the reference category, e.g. the first, with probability \(\theta_1\). The logits for the other categories are then defined by: \[\mathrm{logit}(\theta_j)=\log\left( \frac{\theta_j}{\theta_1}\right) = x^T \beta_j \quad,\ \forall j=2,\cdots, J\] with the constraint \(\sum_{j=1}^J\theta_j=1\). Once the estimates \(\hat{\beta}_j\) are computed, then \[\left\lbrace \begin{array}{l} \hat{\theta}_j = \hat{\theta}_1\ \exp\left(x^T\hat{\beta}_j\right)\quad \forall j=2,\cdots,J\\ \\ \hat{\theta}_1= \frac{1}{1+\sum_{k=2}^J \exp\left( x^T \hat{\beta}_k\right)}\\ \end{array} \right.\] or \[\hat{\theta}_j =\frac{ \exp\left(x^T\hat{\beta}_j\right) }{1+\sum_{k=2}^J \exp\left( x^T \hat{\beta}_k\right)}\quad \forall j=2,\cdots,J\]
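The back-transformation from estimated coefficients to fitted probabilities can be sketched as follows; the coefficient values are made up for illustration, not taken from any fitted model:

```python
import numpy as np

def nominal_probs(x, betas):
    """Fitted theta_hat for categories 1..J, with category 1 as reference.

    betas has shape (J-1, p): one coefficient vector beta_j per
    non-reference category j = 2..J.
    """
    eta = betas @ x                        # x^T beta_j for j = 2..J
    denom = 1.0 + np.exp(eta).sum()
    theta1 = 1.0 / denom                   # reference-category probability
    return np.concatenate(([theta1], np.exp(eta) / denom))

# Hypothetical estimates for J = 3 categories, intercept + one covariate
betas = np.array([[ 0.2, -0.5],
                  [-1.0,  0.3]])
x = np.array([1.0, 2.0])                   # (intercept, covariate value)
theta = nominal_probs(x, betas)            # probabilities summing to 1
```

Note that \(\hat{\theta}_j = \hat{\theta}_1\exp(x^T\hat{\beta}_j)\) holds by construction, and the probabilities sum to 1 for any \(x\).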

Having observations \(\lbrace (y_{i,1}, \cdots,y_{i,J}, x_i,n_i)\rbrace_{i=1,\cdots,N}\) for \(N\) groups, the estimates of the proportions using the model are: \[\hat{\theta}_{i,j} =\frac{ \exp\left(x_{i}^T\hat{\beta}_j\right) }{1+\sum_{k=2}^J \exp\left( x_{i}^T \hat{\beta}_k\right)}\quad \forall j=2,\cdots,J\]