Normal-inverse-Wishart distribution
Multivariate parameter family of continuous probability distributions
normal-inverse-Wishart
Notation: $({\boldsymbol{\mu}},{\boldsymbol{\Sigma}}) \sim \mathrm{NIW}({\boldsymbol{\mu}}_0,\lambda,{\boldsymbol{\Psi}},\nu)$
Parameters: ${\boldsymbol{\mu}}_0 \in \mathbb{R}^D$, location (real vector); $\lambda > 0$ (real); ${\boldsymbol{\Psi}} \in \mathbb{R}^{D\times D}$, inverse scale matrix (positive definite); $\nu > D-1$ (real)
Support: ${\boldsymbol{\mu}} \in \mathbb{R}^D$; ${\boldsymbol{\Sigma}} \in \mathbb{R}^{D\times D}$, covariance matrix (positive definite)
PDF: $f({\boldsymbol{\mu}},{\boldsymbol{\Sigma}} \mid {\boldsymbol{\mu}}_0,\lambda,{\boldsymbol{\Psi}},\nu) = \mathcal{N}({\boldsymbol{\mu}} \mid {\boldsymbol{\mu}}_0, \tfrac{1}{\lambda}{\boldsymbol{\Sigma}})\ \mathcal{W}^{-1}({\boldsymbol{\Sigma}} \mid {\boldsymbol{\Psi}},\nu)$
In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).[1]
Definition
Suppose
$${\boldsymbol{\mu}} \mid {\boldsymbol{\mu}}_0,\lambda,{\boldsymbol{\Sigma}} \sim \mathcal{N}\!\left({\boldsymbol{\mu}} \,\Big|\, {\boldsymbol{\mu}}_0, \frac{1}{\lambda}{\boldsymbol{\Sigma}}\right)$$
has a multivariate normal distribution with mean ${\boldsymbol{\mu}}_0$ and covariance matrix $\tfrac{1}{\lambda}{\boldsymbol{\Sigma}}$, where
$${\boldsymbol{\Sigma}} \mid {\boldsymbol{\Psi}},\nu \sim \mathcal{W}^{-1}({\boldsymbol{\Sigma}} \mid {\boldsymbol{\Psi}},\nu)$$
has an inverse Wishart distribution. Then $({\boldsymbol{\mu}},{\boldsymbol{\Sigma}})$ has a normal-inverse-Wishart distribution, denoted as
$$({\boldsymbol{\mu}},{\boldsymbol{\Sigma}}) \sim \mathrm{NIW}({\boldsymbol{\mu}}_0,\lambda,{\boldsymbol{\Psi}},\nu).$$
Characterization
Probability density function
$$f({\boldsymbol{\mu}},{\boldsymbol{\Sigma}} \mid {\boldsymbol{\mu}}_0,\lambda,{\boldsymbol{\Psi}},\nu) = \mathcal{N}\!\left({\boldsymbol{\mu}} \,\Big|\, {\boldsymbol{\mu}}_0, \frac{1}{\lambda}{\boldsymbol{\Sigma}}\right) \mathcal{W}^{-1}({\boldsymbol{\Sigma}} \mid {\boldsymbol{\Psi}},\nu)$$
The full version of the PDF is as follows:[ 2]
$$f({\boldsymbol{\mu}},{\boldsymbol{\Sigma}} \mid {\boldsymbol{\mu}}_0,\lambda,{\boldsymbol{\Psi}},\nu) = \frac{\lambda^{D/2}\,|{\boldsymbol{\Psi}}|^{\nu/2}\,|{\boldsymbol{\Sigma}}|^{-\frac{\nu+D+2}{2}}}{(2\pi)^{D/2}\,2^{\frac{\nu D}{2}}\,\Gamma_D(\frac{\nu}{2})}\,\exp\left\{-\frac{1}{2}\operatorname{Tr}({\boldsymbol{\Psi}}{\boldsymbol{\Sigma}}^{-1}) - \frac{\lambda}{2}({\boldsymbol{\mu}}-{\boldsymbol{\mu}}_0)^T{\boldsymbol{\Sigma}}^{-1}({\boldsymbol{\mu}}-{\boldsymbol{\mu}}_0)\right\}$$
Here $\Gamma_D[\cdot]$ is the multivariate gamma function and $\operatorname{Tr}({\boldsymbol{\Psi}})$ is the trace of the given matrix.
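The factored form of the density above can be evaluated directly with standard library routines. The following sketch (using SciPy's `multivariate_normal` and `invwishart`; the function name `niw_logpdf` is our own, not a standard API) computes the log-density as the sum of the normal and inverse-Wishart log-densities:

```python
# Sketch: NIW log-density via the factorization
#   f(mu, Sigma | mu0, lam, Psi, nu)
#     = N(mu | mu0, Sigma/lam) * W^{-1}(Sigma | Psi, nu)
import numpy as np
from scipy.stats import multivariate_normal, invwishart

def niw_logpdf(mu, Sigma, mu0, lam, Psi, nu):
    """Log-density of NIW(mu0, lam, Psi, nu) at (mu, Sigma)."""
    normal_part = multivariate_normal.logpdf(mu, mean=mu0, cov=Sigma / lam)
    iw_part = invwishart.logpdf(Sigma, df=nu, scale=Psi)
    return normal_part + iw_part

# Example in D = 2 dimensions (arbitrary hyperparameters for illustration).
mu0 = np.zeros(2)
lam, nu = 1.0, 5.0
Psi = np.eye(2)
val = niw_logpdf(np.array([0.1, -0.2]), np.eye(2), mu0, lam, Psi, nu)
```

Working on the log scale avoids the underflow that the normalizing constant $\Gamma_D(\tfrac{\nu}{2})$ and the determinant powers can cause in higher dimensions.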
Properties
Scaling
Marginal distributions
By construction, the marginal distribution over ${\boldsymbol{\Sigma}}$ is an inverse Wishart distribution, and the conditional distribution over ${\boldsymbol{\mu}}$ given ${\boldsymbol{\Sigma}}$ is a multivariate normal distribution. The marginal distribution over ${\boldsymbol{\mu}}$ is a multivariate t-distribution.
Posterior distribution of the parameters
Suppose the sampling density is a multivariate normal distribution
$${\boldsymbol{y}}_i \mid {\boldsymbol{\mu}},{\boldsymbol{\Sigma}} \sim \mathcal{N}_p({\boldsymbol{\mu}},{\boldsymbol{\Sigma}}),$$
where ${\boldsymbol{y}}$ is an $n \times p$ matrix and ${\boldsymbol{y}}_i$ (of length $p$) is row $i$ of the matrix.
When the mean and covariance matrix of the sampling distribution are unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly:
$$({\boldsymbol{\mu}},{\boldsymbol{\Sigma}}) \sim \mathrm{NIW}({\boldsymbol{\mu}}_0,\lambda,{\boldsymbol{\Psi}},\nu).$$
The resulting posterior distribution for the mean and covariance matrix will also be a Normal-Inverse-Wishart
$$({\boldsymbol{\mu}},{\boldsymbol{\Sigma}} \mid {\boldsymbol{y}}) \sim \mathrm{NIW}({\boldsymbol{\mu}}_n,\lambda_n,{\boldsymbol{\Psi}}_n,\nu_n),$$
where
$${\boldsymbol{\mu}}_n = \frac{\lambda{\boldsymbol{\mu}}_0 + n\bar{\boldsymbol{y}}}{\lambda + n}$$
$$\lambda_n = \lambda + n$$
$$\nu_n = \nu + n$$
$${\boldsymbol{\Psi}}_n = {\boldsymbol{\Psi}} + {\boldsymbol{S}} + \frac{\lambda n}{\lambda + n}(\bar{\boldsymbol{y}} - {\boldsymbol{\mu}}_0)(\bar{\boldsymbol{y}} - {\boldsymbol{\mu}}_0)^T \quad \text{with} \quad {\boldsymbol{S}} = \sum_{i=1}^{n}({\boldsymbol{y}}_i - \bar{\boldsymbol{y}})({\boldsymbol{y}}_i - \bar{\boldsymbol{y}})^T.$$
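These conjugate updates are simple closed-form arithmetic on the data. A minimal sketch with NumPy (the helper name `niw_posterior` is ours, not a library function):

```python
# Sketch: conjugate NIW posterior update for n i.i.d. multivariate
# normal observations, following the update equations above.
import numpy as np

def niw_posterior(y, mu0, lam, Psi, nu):
    """Return (mu_n, lam_n, Psi_n, nu_n) for an (n x p) data matrix y."""
    n = y.shape[0]
    ybar = y.mean(axis=0)
    dev = y - ybar
    S = dev.T @ dev                      # scatter matrix about the sample mean
    mu_n = (lam * mu0 + n * ybar) / (lam + n)
    lam_n = lam + n
    nu_n = nu + n
    d = (ybar - mu0).reshape(-1, 1)
    Psi_n = Psi + S + (lam * n / (lam + n)) * (d @ d.T)
    return mu_n, lam_n, Psi_n, nu_n

# Example: three observations in p = 2 dimensions.
y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
mu_n, lam_n, Psi_n, nu_n = niw_posterior(y, np.zeros(2), 1.0, np.eye(2), 4.0)
```

Note how the posterior mean $\boldsymbol{\mu}_n$ is a precision-weighted average of the prior mean and the sample mean, with $\lambda$ acting as a prior "pseudo-sample size".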
To sample from the joint posterior of $({\boldsymbol{\mu}},{\boldsymbol{\Sigma}})$, one simply draws samples from ${\boldsymbol{\Sigma}} \mid {\boldsymbol{y}} \sim \mathcal{W}^{-1}({\boldsymbol{\Psi}}_n,\nu_n)$, then draws ${\boldsymbol{\mu}} \mid {\boldsymbol{\Sigma}},{\boldsymbol{y}} \sim \mathcal{N}_p({\boldsymbol{\mu}}_n,{\boldsymbol{\Sigma}}/\lambda_n)$. To draw from the posterior predictive of a new observation, draw $\tilde{\boldsymbol{y}} \mid {\boldsymbol{\mu}},{\boldsymbol{\Sigma}},{\boldsymbol{y}} \sim \mathcal{N}_p({\boldsymbol{\mu}},{\boldsymbol{\Sigma}})$, given the already drawn values of ${\boldsymbol{\mu}}$ and ${\boldsymbol{\Sigma}}$.[3]
Generating normal-inverse-Wishart random variates
Generation of random variates is straightforward:
1. Sample ${\boldsymbol{\Sigma}}$ from an inverse Wishart distribution with parameters ${\boldsymbol{\Psi}}$ and $\nu$.
2. Sample ${\boldsymbol{\mu}}$ from a multivariate normal distribution with mean ${\boldsymbol{\mu}}_0$ and covariance $\tfrac{1}{\lambda}{\boldsymbol{\Sigma}}$.
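The two steps above can be sketched directly with SciPy's inverse-Wishart sampler and NumPy's multivariate normal sampler (the wrapper `sample_niw` is an illustrative name, not a library function):

```python
# Sketch: draw (mu, Sigma) ~ NIW(mu0, lam, Psi, nu) by composition:
#   1. Sigma ~ W^{-1}(Psi, nu)
#   2. mu | Sigma ~ N(mu0, Sigma / lam)
import numpy as np
from scipy.stats import invwishart

def sample_niw(mu0, lam, Psi, nu, rng):
    """Draw one (mu, Sigma) pair from NIW(mu0, lam, Psi, nu)."""
    Sigma = invwishart.rvs(df=nu, scale=Psi, random_state=rng)
    mu = rng.multivariate_normal(mu0, Sigma / lam)
    return mu, Sigma

rng = np.random.default_rng(0)
mu, Sigma = sample_niw(np.zeros(2), 2.0, np.eye(2), 5.0, rng)
```

Repeating these draws yields i.i.d. samples from the joint prior; substituting the posterior hyperparameters $(\boldsymbol{\mu}_n, \lambda_n, \boldsymbol{\Psi}_n, \nu_n)$ gives posterior samples by the same recipe.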
Related distributions
The normal-Wishart distribution is essentially the same distribution parameterized by precision rather than variance. If
$$({\boldsymbol{\mu}},{\boldsymbol{\Sigma}}) \sim \mathrm{NIW}({\boldsymbol{\mu}}_0,\lambda,{\boldsymbol{\Psi}},\nu)$$
then
$$({\boldsymbol{\mu}},{\boldsymbol{\Sigma}}^{-1}) \sim \mathrm{NW}({\boldsymbol{\mu}}_0,\lambda,{\boldsymbol{\Psi}}^{-1},\nu).$$
The normal-inverse-gamma distribution is the one-dimensional equivalent.
The multivariate normal distribution and inverse Wishart distribution are the component distributions out of which this distribution is made.
Notes
^ Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."
^ Prince, Simon J.D. (June 2012). Computer Vision: Models, Learning, and Inference. Cambridge University Press. Section 3.8: "Normal inverse Wishart distribution".
^ Gelman, Andrew, et al. (2014). Bayesian Data Analysis. Vol. 2, p. 73. Boca Raton, FL, USA: Chapman & Hall/CRC.
References
Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.
Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."