# Normal-inverse-Wishart distribution

In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).[1]

Notation: ${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu )}$

Parameters:

• ${\displaystyle {\boldsymbol {\mu }}_{0}\in \mathbb {R} ^{D}}$ : location (real vector)
• ${\displaystyle \lambda >0}$ (real)
• ${\displaystyle {\boldsymbol {\Psi }}\in \mathbb {R} ^{D\times D}}$ : inverse scale matrix (positive definite)
• ${\displaystyle \nu >D-1}$ (real)

Support: ${\displaystyle {\boldsymbol {\mu }}\in \mathbb {R} ^{D};\ {\boldsymbol {\Sigma }}\in \mathbb {R} ^{D\times D}}$ with ${\displaystyle {\boldsymbol {\Sigma }}}$ a covariance matrix (positive definite)

PDF: ${\displaystyle f({\boldsymbol {\mu }},{\boldsymbol {\Sigma }}|{\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu )={\mathcal {N}}({\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},{\tfrac {1}{\lambda }}{\boldsymbol {\Sigma }})\ {\mathcal {W}}^{-1}({\boldsymbol {\Sigma }}|{\boldsymbol {\Psi }},\nu )}$

## Definition

Suppose

${\displaystyle {\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Sigma }}\sim {\mathcal {N}}\left({\boldsymbol {\mu }}{\Big |}{\boldsymbol {\mu }}_{0},{\frac {1}{\lambda }}{\boldsymbol {\Sigma }}\right)}$

has a multivariate normal distribution with mean ${\displaystyle {\boldsymbol {\mu }}_{0}}$ and covariance matrix ${\displaystyle {\tfrac {1}{\lambda }}{\boldsymbol {\Sigma }}}$ , where

${\displaystyle {\boldsymbol {\Sigma }}|{\boldsymbol {\Psi }},\nu \sim {\mathcal {W}}^{-1}({\boldsymbol {\Sigma }}|{\boldsymbol {\Psi }},\nu )}$

has an inverse Wishart distribution. Then ${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})}$ has a normal-inverse-Wishart distribution, denoted as

${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu ).}$

## Characterization

### Probability density function

${\displaystyle f({\boldsymbol {\mu }},{\boldsymbol {\Sigma }}|{\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu )={\mathcal {N}}\left({\boldsymbol {\mu }}{\Big |}{\boldsymbol {\mu }}_{0},{\frac {1}{\lambda }}{\boldsymbol {\Sigma }}\right){\mathcal {W}}^{-1}({\boldsymbol {\Sigma }}|{\boldsymbol {\Psi }},\nu )}$
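Because the density factors into a multivariate normal term and an inverse-Wishart term, it can be evaluated directly with SciPy's building blocks. The helper below is an illustrative sketch (the function name `niw_logpdf` is ours, not a library API):

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

def niw_logpdf(mu, Sigma, mu0, lam, Psi, nu):
    """Log-density of NIW(mu0, lam, Psi, nu), factored as
    N(mu | mu0, Sigma/lam) * W^{-1}(Sigma | Psi, nu)."""
    log_normal = multivariate_normal.logpdf(mu, mean=mu0, cov=Sigma / lam)
    log_iw = invwishart.logpdf(Sigma, df=nu, scale=Psi)
    return log_normal + log_iw

# Small example with D = 2
mu0 = np.zeros(2)
lam, nu = 1.0, 4.0
Psi = np.eye(2)
val = niw_logpdf(np.array([0.1, -0.2]), np.eye(2), mu0, lam, Psi, nu)
```

Working in log space avoids underflow when the inverse-Wishart density is very small.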

## Properties

### Marginal distributions

By construction, the marginal distribution over ${\displaystyle {\boldsymbol {\Sigma }}}$ is an inverse Wishart distribution, and the conditional distribution over ${\displaystyle {\boldsymbol {\mu }}}$ given ${\displaystyle {\boldsymbol {\Sigma }}}$ is a multivariate normal distribution. The marginal distribution over ${\displaystyle {\boldsymbol {\mu }}}$ is a multivariate t-distribution.

## Posterior distribution of the parameters

Suppose the sampling density is a multivariate normal distribution

${\displaystyle {\boldsymbol {y_{i}}}|{\boldsymbol {\mu }},{\boldsymbol {\Sigma }}\sim {\mathcal {N}}_{p}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})}$

where ${\displaystyle {\boldsymbol {y}}}$ is an ${\displaystyle n\times p}$ matrix and ${\displaystyle {\boldsymbol {y_{i}}}}$ (of length ${\displaystyle p}$ ) is row ${\displaystyle i}$ of the matrix.

With the mean and covariance matrix of the sampling distribution unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly:

${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu ).}$

The resulting posterior distribution for the mean and covariance matrix is again normal-inverse-Wishart:

${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }}|y)\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{n},\lambda _{n},{\boldsymbol {\Psi }}_{n},\nu _{n}),}$

where

${\displaystyle {\boldsymbol {\mu }}_{n}={\frac {\lambda {\boldsymbol {\mu }}_{0}+n{\bar {\boldsymbol {y}}}}{\lambda +n}}}$
${\displaystyle \lambda _{n}=\lambda +n}$
${\displaystyle \nu _{n}=\nu +n}$
${\displaystyle {\boldsymbol {\Psi }}_{n}={\boldsymbol {\Psi }}+{\boldsymbol {S}}+{\frac {\lambda n}{\lambda +n}}({\bar {\boldsymbol {y}}}-{\boldsymbol {\mu }}_{0})({\bar {\boldsymbol {y}}}-{\boldsymbol {\mu }}_{0})^{T},\quad {\text{where}}\quad {\boldsymbol {S}}=\sum _{i=1}^{n}({\boldsymbol {y}}_{i}-{\bar {\boldsymbol {y}}})({\boldsymbol {y}}_{i}-{\bar {\boldsymbol {y}}})^{T}.}$
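The four update equations above translate directly into NumPy. This is a minimal sketch (the function name `niw_posterior` is ours for illustration):

```python
import numpy as np

def niw_posterior(y, mu0, lam, Psi, nu):
    """Conjugate NIW posterior update given n i.i.d. multivariate normal
    observations stacked as the rows of the (n, p) array y."""
    n = y.shape[0]
    ybar = y.mean(axis=0)
    S = (y - ybar).T @ (y - ybar)              # scatter matrix about the sample mean
    mu_n = (lam * mu0 + n * ybar) / (lam + n)  # precision-weighted mean
    lam_n = lam + n
    nu_n = nu + n
    d = (ybar - mu0).reshape(-1, 1)
    Psi_n = Psi + S + (lam * n / (lam + n)) * (d @ d.T)
    return mu_n, lam_n, Psi_n, nu_n

rng = np.random.default_rng(0)
y = rng.normal(size=(10, 2))  # n = 10 observations in p = 2 dimensions
mu_n, lam_n, Psi_n, nu_n = niw_posterior(y, np.zeros(2), 1.0, np.eye(2), 4.0)
```

Note that ${\boldsymbol {\Psi }}_{n}$ stays symmetric positive definite, since it adds positive semi-definite terms to a positive definite ${\boldsymbol {\Psi }}$.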

To sample from the joint posterior of ${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})}$ , first draw ${\displaystyle {\boldsymbol {\Sigma }}|{\boldsymbol {y}}\sim {\mathcal {W}}^{-1}({\boldsymbol {\Psi }}_{n},\nu _{n})}$ , then draw ${\displaystyle {\boldsymbol {\mu }}|{\boldsymbol {\Sigma ,y}}\sim {\mathcal {N}}_{p}({\boldsymbol {\mu }}_{n},{\boldsymbol {\Sigma }}/\lambda _{n})}$ . To draw from the posterior predictive distribution of a new observation, draw ${\displaystyle {\boldsymbol {\tilde {y}}}|{\boldsymbol {\mu ,\Sigma ,y}}\sim {\mathcal {N}}_{p}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})}$ using the values of ${\displaystyle {\boldsymbol {\mu }}}$ and ${\displaystyle {\boldsymbol {\Sigma }}}$ just drawn.[2]
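This composition sampler is short in code. The hyperparameter values below are placeholders chosen for illustration, not outputs of any particular dataset:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(1)

# Assumed posterior hyperparameters for illustration (p = 2)
mu_n = np.zeros(2)
lam_n, nu_n = 11.0, 14.0
Psi_n = np.eye(2)

# Step 1: Sigma | y  ~  W^{-1}(Psi_n, nu_n)
Sigma = invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng)
# Step 2: mu | Sigma, y  ~  N_p(mu_n, Sigma / lam_n)
mu = rng.multivariate_normal(mu_n, Sigma / lam_n)
# Posterior predictive: y_tilde | mu, Sigma, y  ~  N_p(mu, Sigma)
y_tilde = rng.multivariate_normal(mu, Sigma)
```

Repeating the three draws many times yields Monte Carlo samples from the joint posterior and the posterior predictive.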

## Generating normal-inverse-Wishart random variates

Generation of random variates is straightforward:

1. Sample ${\displaystyle {\boldsymbol {\Sigma }}}$ from an inverse Wishart distribution with parameters ${\displaystyle {\boldsymbol {\Psi }}}$ and ${\displaystyle \nu }$ .
2. Sample ${\displaystyle {\boldsymbol {\mu }}}$ from a multivariate normal distribution with mean ${\displaystyle {\boldsymbol {\mu }}_{0}}$ and covariance matrix ${\displaystyle {\boldsymbol {\tfrac {1}{\lambda }}}{\boldsymbol {\Sigma }}}$ .
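The two steps above can be wrapped in a small helper; `sample_niw` is a name we introduce here, not a library function:

```python
import numpy as np
from scipy.stats import invwishart

def sample_niw(mu0, lam, Psi, nu, rng):
    """Draw one (mu, Sigma) pair from NIW(mu0, lam, Psi, nu)."""
    # Step 1: Sigma ~ W^{-1}(Psi, nu)
    Sigma = invwishart.rvs(df=nu, scale=Psi, random_state=rng)
    # Step 2: mu | Sigma ~ N(mu0, Sigma / lam)
    mu = rng.multivariate_normal(mu0, Sigma / lam)
    return mu, Sigma

rng = np.random.default_rng(42)
mu, Sigma = sample_niw(np.zeros(3), 2.0, np.eye(3), 5.0, rng)
```

The same routine serves for prior and posterior draws; only the hyperparameters change.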
## Related distributions

• The normal-Wishart distribution is essentially the same distribution parameterized by precision rather than variance. If ${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu )}$ then ${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }}^{-1})\sim \mathrm {NW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }}^{-1},\nu )}$ .
• The normal-inverse-gamma distribution is the one-dimensional equivalent.
• The multivariate normal distribution and inverse Wishart distribution are the component distributions out of which this distribution is made.

## Notes

1. Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."
2. Gelman, Andrew, et al. Bayesian data analysis. Vol. 2, p.73. Boca Raton, FL, USA: Chapman & Hall/CRC, 2014.

## References

• Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.
• Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."