Chapter 11 Multinomial Distribution
11.1 From Binomial Distribution to Multinomial Distribution
The Binomial distribution has been defined as the joint distribution of Bernoulli random variables. Bernoulli variables have only two possible outcomes (e.g. success and failure, or yes and no). The multinomial distribution defined below extends the number of outcome categories from 2 to \(J\) (e.g. for \(J=3\): yes, maybe, no).
Is the multinomial distribution a member of the exponential family of distributions?
No, it is not. However, it can be related to Poisson random variables, which do belong to the exponential family; this link (established in Section 11.2) is what justifies using generalized linear models here.
Notation conventions.
The data collected for analysis are now \(\lbrace (y_{i,1}, \cdots,y_{i,J}, x_i,n_i)\rbrace_{i=1,\cdots,N}\) for \(N\) groups (we can use the vector notation \(\mathbf{y}_i=(y_{i,1}, \cdots,y_{i,J})\), leading equivalently to the observations \(\lbrace (\mathbf{y}_i, x_i,n_i)\rbrace_{i=1,\cdots,N}\)). The likelihood function for the saturated model is then: \[\mathcal{L}\left ( \lbrace\theta_{i,1}, \cdots \theta_{i,J}\rbrace_{i=1,\cdots,N} \right)= \prod_{i=1}^N \mathrm{Multinomial}(n_i,\theta_{i,1},\cdots,\theta_{i,J})= \prod_{i=1}^N \frac{n_i !}{y_{i,1}!\ y_{i,2}!\ \cdots y_{i,J}! } \ \theta_{i,1}^{y_{i,1}} \ \theta_{i,2}^{y_{i,2}}\ \cdots \theta_{i,J}^{y_{i,J}}\] The number of degrees of freedom for this saturated model is \((J-1)\ N\) (remember that we have the constraints \(\theta_{i,1}+\theta_{i,2}+\cdots+\theta_{i,J}=1, \ \forall i\) ).
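As a sketch (the counts below are made up for illustration), the saturated-model likelihood can be evaluated with `scipy.stats.multinomial`; under the saturated model each proportion is estimated by \(\hat{\theta}_{i,j}=y_{i,j}/n_i\), and no other choice of proportions can give a higher likelihood:

```python
import numpy as np
from scipy.stats import multinomial

# Hypothetical data: N = 2 groups, J = 3 categories (counts are made up).
Y = np.array([[10, 5, 5],
              [2, 8, 10]])
n = Y.sum(axis=1)                      # group totals n_i
theta_hat = Y / n[:, None]             # saturated-model MLEs: theta_ij = y_ij / n_i

# Log-likelihood of the saturated model: sum over the N independent groups.
loglik_sat = sum(multinomial.logpmf(Y[i], n=n[i], p=theta_hat[i])
                 for i in range(len(n)))

# Any other proportions (e.g. uniform) give a strictly lower log-likelihood.
theta_other = np.full((2, 3), 1 / 3)
loglik_other = sum(multinomial.logpmf(Y[i], n=n[i], p=theta_other[i])
                   for i in range(len(n)))
```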
11.2 Multinomial Distribution & Poisson random variables
Preliminary exercise:
Consider two independent random variables that follow a Poisson distribution, i.e. \(y_1\sim\mathcal{P}_o(\lambda_1)\) and \(y_2\sim\mathcal{P}_o(\lambda_2)\). Show that \(n=y_1+y_2\) follows a Poisson distribution with parameter \(\lambda_1+\lambda_2\) (i.e. \(n\sim\mathcal{P}_o(\lambda_1+\lambda_2)\) ).
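A quick numerical check of this result (not a proof): the pmf of \(y_1+y_2\) is the discrete convolution of the two Poisson pmfs, and it matches the \(\mathcal{P}_o(\lambda_1+\lambda_2)\) pmf. A sketch with arbitrary rates:

```python
import numpy as np
from scipy.stats import poisson

lam1, lam2 = 2.0, 3.0                  # arbitrary rates for the check
k = np.arange(30)

# pmf of n = y1 + y2 obtained by convolving the two Poisson pmfs:
# P(n = k) = sum_i P(y1 = i) P(y2 = k - i)
pmf_sum = np.convolve(poisson.pmf(k, lam1), poisson.pmf(k, lam2))[:k.size]

# pmf of a single Poisson(lam1 + lam2) variable
pmf_direct = poisson.pmf(k, lam1 + lam2)

print(np.allclose(pmf_sum, pmf_direct))   # the two pmfs agree
```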
Multinomial distribution & joint distribution of Poisson variables.
Consider \(y_1, y_2,\cdots, y_J\) independent random variables with distributions: \[y_j\sim\mathcal{P}_o(\lambda_j) , \ \forall j=1,\cdots,J\] Then the joint density function of \((y_1, y_2,\cdots, y_J )\) is: \[p(y_1, y_2,\cdots, y_J; \lambda_1,\cdots,\lambda_J )=\prod_{j=1}^J \frac{\lambda_j^{y_j}}{y_j!} \exp(-\lambda_j)\] Let us define \(n=y_1+y_2+\cdots+y_J\); then \(n\sim\mathcal{P}_o(\lambda_1+\lambda_2+\cdots+\lambda_J)\) (see exercise). Using Bayes' theorem, and ignoring the parameters of the distributions for now: \[p(y_1, y_2,\cdots, y_J|n)=\frac{p(n|y_1,\cdots,y_J)\ p(y_1, y_2,\cdots, y_J)}{p(n)}\] with \(p(n|y_1,\cdots,y_J)=\delta(n-y_1-\cdots-y_J)\) (the Kronecker delta) since \(n=y_1+y_2+\cdots+y_J\). Hence \[\begin{array}{ll} p(y_1, y_2,\cdots, y_J|n)&=\frac{\delta(n-y_1-\cdots-y_J) \ \prod_{j=1}^J \frac{\lambda_j^{y_j}}{y_j!} \exp(-\lambda_j)}{\frac{(\sum_{k=1}^J \lambda_k)^n }{n!} \exp(-\sum_{k=1}^J \lambda_k)}\\ &\\ &=\delta(n-y_1-\cdots-y_J)\ n! \ \prod_{j=1}^J \frac{1}{y_j!}\left( \frac{\lambda_j}{\sum_{k=1}^J \lambda_k} \right)^{y_j}\\ \end{array}\] This is the multinomial distribution: \[p(y_1, y_2,\cdots, y_J|n)= \frac{n!}{y_1! \cdots y_J!} \theta_1^{y_1}\cdots \theta_J^{y_J}\] with the convention \(\theta_j=\frac{\lambda_j}{\sum_{k=1}^J \lambda_k}\) and the constraint \(n=y_1+\cdots+y_J\) (when this constraint holds the Kronecker delta equals 1; otherwise it equals 0 and so does the probability). So the multinomial distribution can be regarded as the joint distribution of independent Poisson random variables conditional upon their sum \(n\). This justifies the use of generalized linear models.
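The identity can be checked numerically for a particular case (the rates and counts below are arbitrary): divide the joint Poisson pmf by the pmf of the sum and compare with the multinomial pmf:

```python
import numpy as np
from scipy.stats import poisson, multinomial

lam = np.array([1.5, 2.0, 0.5])        # arbitrary Poisson rates lambda_j
y = np.array([3, 4, 1])                # one realisation; here n = sum(y), so delta = 1
n = y.sum()

# p(y_1,...,y_J | n) = p(y_1,...,y_J) / p(n)
p_cond = np.prod(poisson.pmf(y, lam)) / poisson.pmf(n, lam.sum())

# Multinomial pmf with theta_j = lambda_j / sum_k lambda_k
theta = lam / lam.sum()
p_multi = multinomial.pmf(y, n=n, p=theta)
```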
11.3 Nominal logistic regression
Nominal logistic regression treats the first category as the baseline and models the log-odds of each of the other categories against it: \[\log\left(\frac{\theta_{i,j}}{\theta_{i,1}}\right)=x_{i}^T\beta_j \quad \forall j=2,\cdots,J\] Having observations \(\lbrace (y_{i,1}, \cdots,y_{i,J}, x_i,n_i)\rbrace_{i=1,\cdots,N}\) for \(N\) groups, the estimates of the proportions under the model are: \[\hat{\theta}_{i,j} =\frac{ \exp\left(x_{i}^T\hat{\beta}_j\right) }{1+\sum_{k=2}^J \exp\left( x_{i}^T \hat{\beta}_k\right)}\quad \forall j=2,\cdots,J \qquad \text{and} \qquad \hat{\theta}_{i,1} =\frac{ 1 }{1+\sum_{k=2}^J \exp\left( x_{i}^T \hat{\beta}_k\right)}\]
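A minimal sketch of these fitted proportions (the function and variable names are ours, not from the text): given a covariate vector \(x_i\) and the fitted coefficient vectors \(\hat{\beta}_2,\cdots,\hat{\beta}_J\), the \(J\) proportions share the denominator \(1+\sum_{k=2}^J \exp(x_i^T\hat{\beta}_k)\), with category 1 as the baseline:

```python
import numpy as np

def fitted_proportions(x, betas):
    """Return (theta_1, ..., theta_J) for a covariate vector x.

    `betas` is the list of the J-1 coefficient vectors beta_2, ..., beta_J;
    category 1 is the baseline (illustrative sketch, hypothetical names).
    """
    eta = np.array([x @ b for b in betas])   # linear predictors x^T beta_j, j = 2..J
    denom = 1.0 + np.exp(eta).sum()          # shared denominator
    # theta_1 = 1/denom ; theta_j = exp(eta_j)/denom for j = 2..J
    return np.concatenate(([1.0], np.exp(eta))) / denom

# Example with J = 3 categories and x = (1, covariate) including an intercept
x = np.array([1.0, 0.5])
theta = fitted_proportions(x, [np.array([0.2, -1.0]), np.array([-0.5, 0.3])])
```

By construction the returned proportions are positive and sum to 1, as the constraint on the \(\theta_{i,j}\) requires.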