# Normal-Wishart distribution

In probability theory and statistics, the normal-Wishart distribution (or Gaussian-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and precision matrix (the inverse of the covariance matrix).[1]

Notation: ${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Lambda }})\sim \mathrm {NW} ({\boldsymbol {\mu }}_{0},\lambda ,\mathbf {W} ,\nu )}$

Parameters: ${\displaystyle {\boldsymbol {\mu }}_{0}\in \mathbb {R} ^{D}\,}$ location (vector of reals); ${\displaystyle \lambda >0\,}$ (real); ${\displaystyle \mathbf {W} \in \mathbb {R} ^{D\times D}}$ scale matrix (pos. def.); ${\displaystyle \nu >D-1\,}$ (real)

Support: ${\displaystyle {\boldsymbol {\mu }}\in \mathbb {R} ^{D};{\boldsymbol {\Lambda }}\in \mathbb {R} ^{D\times D}}$ precision matrix (pos. def.)

PDF: ${\displaystyle f({\boldsymbol {\mu }},{\boldsymbol {\Lambda }}|{\boldsymbol {\mu }}_{0},\lambda ,\mathbf {W} ,\nu )={\mathcal {N}}({\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},(\lambda {\boldsymbol {\Lambda }})^{-1})\ {\mathcal {W}}({\boldsymbol {\Lambda }}|\mathbf {W} ,\nu )}$

## Definition

Suppose

${\displaystyle {\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Lambda }}\sim {\mathcal {N}}({\boldsymbol {\mu }}_{0},(\lambda {\boldsymbol {\Lambda }})^{-1})}$

has a multivariate normal distribution with mean ${\displaystyle {\boldsymbol {\mu }}_{0}}$ and covariance matrix ${\displaystyle (\lambda {\boldsymbol {\Lambda }})^{-1}}$, where

${\displaystyle {\boldsymbol {\Lambda }}|\mathbf {W} ,\nu \sim {\mathcal {W}}({\boldsymbol {\Lambda }}|\mathbf {W} ,\nu )}$

has a Wishart distribution. Then ${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Lambda }})}$ has a normal-Wishart distribution, denoted as

${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Lambda }})\sim \mathrm {NW} ({\boldsymbol {\mu }}_{0},\lambda ,\mathbf {W} ,\nu ).}$

## Characterization

### Probability density function

${\displaystyle f({\boldsymbol {\mu }},{\boldsymbol {\Lambda }}|{\boldsymbol {\mu }}_{0},\lambda ,\mathbf {W} ,\nu )={\mathcal {N}}({\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},(\lambda {\boldsymbol {\Lambda }})^{-1})\ {\mathcal {W}}({\boldsymbol {\Lambda }}|\mathbf {W} ,\nu )}$
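As a rough numerical sketch, the product form above can be evaluated on the log scale with NumPy. The function names `log_multivariate_gamma` and `normal_wishart_logpdf` are illustrative and not taken from any library. The code assumes the usual Wishart density with scale matrix ${\displaystyle \mathbf {W} }$ and ${\displaystyle \nu }$ degrees of freedom, and it does not validate that the matrices are positive definite.

```python
import math
import numpy as np


def log_multivariate_gamma(a, dim):
    """Log of the multivariate gamma function Gamma_dim(a)."""
    return dim * (dim - 1) / 4.0 * math.log(math.pi) + sum(
        math.lgamma(a - j / 2.0) for j in range(dim)
    )


def normal_wishart_logpdf(mu, Lam, mu0, lam, W, nu):
    """log f(mu, Lam | mu0, lam, W, nu) = log N(mu | mu0, (lam*Lam)^-1) + log W(Lam | W, nu)."""
    mu, mu0 = np.asarray(mu, float), np.asarray(mu0, float)
    Lam, W = np.asarray(Lam, float), np.asarray(W, float)
    D = mu0.shape[0]
    _, logdet_Lam = np.linalg.slogdet(Lam)
    _, logdet_W = np.linalg.slogdet(W)

    # Normal factor: the precision matrix is lam * Lam, so log|lam * Lam| = D log(lam) + log|Lam|.
    diff = mu - mu0
    log_normal = (
        -0.5 * D * math.log(2 * math.pi)
        + 0.5 * (D * math.log(lam) + logdet_Lam)
        - 0.5 * lam * (diff @ Lam @ diff)
    )

    # Wishart factor (assumed standard form):
    # |Lam|^((nu-D-1)/2) exp(-tr(W^-1 Lam)/2) / (2^(nu*D/2) |W|^(nu/2) Gamma_D(nu/2)).
    log_wishart = (
        0.5 * (nu - D - 1) * logdet_Lam
        - 0.5 * np.trace(np.linalg.solve(W, Lam))
        - 0.5 * nu * D * math.log(2)
        - 0.5 * nu * logdet_W
        - log_multivariate_gamma(0.5 * nu, D)
    )
    return log_normal + log_wishart
```

Working with log-determinants from `np.linalg.slogdet` rather than raw determinants avoids overflow and underflow when ${\displaystyle D}$ or ${\displaystyle \nu }$ is large.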

## Properties

### Marginal distributions

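By construction, the marginal distribution over ${\displaystyle {\boldsymbol {\Lambda }}}$ is a Wishart distribution, and the conditional distribution over ${\displaystyle {\boldsymbol {\mu }}}$ given ${\displaystyle {\boldsymbol {\Lambda }}}$ is a multivariate normal distribution. Integrating out ${\displaystyle {\boldsymbol {\Lambda }}}$, the marginal distribution over ${\displaystyle {\boldsymbol {\mu }}}$ is a multivariate t-distribution with ${\displaystyle \nu -D+1}$ degrees of freedom, location ${\displaystyle {\boldsymbol {\mu }}_{0}}$ and scale matrix ${\displaystyle (\lambda (\nu -D+1)\mathbf {W} )^{-1}}$.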

## Posterior distribution of the parameters

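Suppose the prior on ${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Lambda }})}$ is ${\displaystyle \mathrm {NW} ({\boldsymbol {\mu }}_{0},\lambda ,\mathbf {W} ,\nu )}$. After making ${\displaystyle n}$ independent observations ${\displaystyle {\boldsymbol {x}}_{1},\dots ,{\boldsymbol {x}}_{n}}$ from a multivariate normal distribution with mean ${\displaystyle {\boldsymbol {\mu }}}$ and precision matrix ${\displaystyle {\boldsymbol {\Lambda }}}$, the posterior distribution of the parameters is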

${\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Lambda }})\sim \mathrm {NW} ({\boldsymbol {\mu }}_{n},\lambda _{n},\mathbf {W} _{n},\nu _{n}),}$

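where ${\displaystyle {\boldsymbol {\bar {x}}}={\frac {1}{n}}\sum _{i=1}^{n}{\boldsymbol {x}}_{i}}$ is the sample mean and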

${\displaystyle \lambda _{n}=\lambda +n,}$
${\displaystyle {\boldsymbol {\mu }}_{n}={\frac {\lambda {\boldsymbol {\mu }}_{0}+n{\boldsymbol {\bar {x}}}}{\lambda +n}},}$
${\displaystyle \nu _{n}=\nu +n,}$
${\displaystyle \mathbf {W} _{n}^{-1}=\mathbf {W} ^{-1}+\sum _{i=1}^{n}({\boldsymbol {x}}_{i}-{\boldsymbol {\bar {x}}})({\boldsymbol {x}}_{i}-{\boldsymbol {\bar {x}}})^{T}+{\frac {n\lambda }{n+\lambda }}({\boldsymbol {\bar {x}}}-{\boldsymbol {\mu }}_{0})({\boldsymbol {\bar {x}}}-{\boldsymbol {\mu }}_{0})^{T}.}$ [2]
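The update equations above translate almost line by line into NumPy. The following is a minimal sketch, assuming the observations are stacked as rows of an array `X` of shape `(n, D)`. The function name `normal_wishart_posterior` is hypothetical, and the explicit matrix inverses are kept for readability rather than numerical robustness.

```python
import numpy as np


def normal_wishart_posterior(X, mu0, lam, W, nu):
    """Return (mu_n, lam_n, W_n, nu_n) given data X of shape (n, D)."""
    X = np.atleast_2d(np.asarray(X, float))
    mu0 = np.asarray(mu0, float)
    W = np.asarray(W, float)
    n = X.shape[0]

    xbar = X.mean(axis=0)            # sample mean
    centered = X - xbar
    S = centered.T @ centered        # sum_i (x_i - xbar)(x_i - xbar)^T
    d = xbar - mu0

    lam_n = lam + n
    mu_n = (lam * mu0 + n * xbar) / (lam + n)
    nu_n = nu + n
    W_n_inv = np.linalg.inv(W) + S + (n * lam / (n + lam)) * np.outer(d, d)
    W_n = np.linalg.inv(W_n_inv)
    return mu_n, lam_n, W_n, nu_n


# Example: two observations in D = 2 with an identity scale matrix as prior.
X = np.array([[1.0, 0.0], [3.0, 2.0]])
mu_n, lam_n, W_n, nu_n = normal_wishart_posterior(X, np.zeros(2), 1.0, np.eye(2), 3.0)
print(mu_n, lam_n, nu_n)
print(np.linalg.inv(W_n))
```

In this example the sample mean is (2, 1), so the posterior location is 2·(2, 1)/3 = (4/3, 2/3), with ${\displaystyle \lambda _{n}=3}$ and ${\displaystyle \nu _{n}=5}$.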

## Generating normal-Wishart random variates

Generation of random variates is straightforward:

1. Sample ${\displaystyle {\boldsymbol {\Lambda }}}$ from a Wishart distribution with scale matrix ${\displaystyle \mathbf {W} }$ and ${\displaystyle \nu }$ degrees of freedom.
2. Sample ${\displaystyle {\boldsymbol {\mu }}}$ from a multivariate normal distribution with mean ${\displaystyle {\boldsymbol {\mu }}_{0}}$ and covariance matrix ${\displaystyle (\lambda {\boldsymbol {\Lambda }})^{-1}}$.
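The two steps can be sketched in NumPy as follows. The Wishart draw here uses the Bartlett decomposition, which is a choice made for this sketch and is not prescribed by the procedure above; any correct Wishart sampler would do. The function names `sample_wishart` and `sample_normal_wishart` are illustrative.

```python
import numpy as np


def sample_wishart(W, nu, rng):
    """Draw Lambda ~ Wishart(W, nu) via the Bartlett decomposition (requires nu > D - 1)."""
    W = np.asarray(W, float)
    D = W.shape[0]
    L = np.linalg.cholesky(W)                          # W = L L^T
    A = np.tril(rng.standard_normal((D, D)), k=-1)     # strictly lower entries ~ N(0, 1)
    # Diagonal entries: square roots of chi-square draws with nu, nu-1, ..., nu-D+1 d.o.f.
    A[np.diag_indices(D)] = np.sqrt(rng.chisquare(nu - np.arange(D)))
    LA = L @ A
    return LA @ LA.T


def sample_normal_wishart(mu0, lam, W, nu, rng):
    """Draw (mu, Lambda) ~ NW(mu0, lam, W, nu)."""
    mu0 = np.asarray(mu0, float)
    Lam = sample_wishart(W, nu, rng)                   # step 1
    cov = np.linalg.inv(lam * Lam)                     # covariance (lam * Lambda)^-1
    mu = rng.multivariate_normal(mu0, cov)             # step 2
    return mu, Lam


rng = np.random.default_rng(0)
mu, Lam = sample_normal_wishart(np.zeros(2), 2.0, 0.5 * np.eye(2), 5.0, rng)
print(mu)
print(Lam)
```

Because ${\displaystyle {\boldsymbol {\mu }}}$ depends on the sampled ${\displaystyle {\boldsymbol {\Lambda }}}$, the order of the two steps matters: the precision matrix must be drawn first.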

## Notes

1. Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media. Page 690.
2. Cross Validated, https://stats.stackexchange.com/q/324925

## References

• Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.