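One common method of construction of a multivariate t-distribution, for the case of $p$ dimensions, is based on the observation that if $\mathbf{y}$ and $u$ are independent and distributed as $N(\mathbf{0}, \boldsymbol\Sigma)$ and $\chi^2_\nu$ (i.e. multivariate normal and chi-squared distributions) respectively, the matrix $\boldsymbol\Sigma$ is a p × p matrix, and $\boldsymbol\mu$ is a constant vector, then the random variable $\mathbf{x} = \mathbf{y}/\sqrt{u/\nu} + \boldsymbol\mu$ has the density[1]
$$f(\mathbf{x}) = \frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\,\nu^{p/2}\pi^{p/2}\left|\boldsymbol\Sigma\right|^{1/2}}\left[1+\frac{1}{\nu}(\mathbf{x}-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right]^{-(\nu+p)/2}$$
and is said to be distributed as a multivariate t-distribution with parameters $\boldsymbol\Sigma, \boldsymbol\mu, \nu$. Note that $\boldsymbol\Sigma$ is not the covariance matrix, since the covariance is given by $\frac{\nu}{\nu-2}\boldsymbol\Sigma$ (for $\nu > 2$).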
The constructive definition of a multivariate t-distribution simultaneously serves as a sampling algorithm:
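Generate $u \sim \chi^2_\nu$ and $\mathbf{y} \sim N(\mathbf{0}, \boldsymbol\Sigma)$, independently.
Compute $\mathbf{x} \gets \sqrt{\nu/u}\,\mathbf{y} + \boldsymbol\mu$.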
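The two steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the helper names `cholesky` and `sample_mvt`, the seed, and the example values of `mu`, `sigma` and `nu` are all assumptions made for this sketch. The normal vector is produced from a Cholesky factor of the scale matrix, and the chi-squared draw uses the fact that a chi-squared variable with ν degrees of freedom is a gamma variable with shape ν/2 and scale 2.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular factor L with L L^T = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def sample_mvt(mu, chol, nu, rng):
    """Draw one vector x = sqrt(nu / u) * y + mu, with y ~ N(0, Sigma), u ~ chi2(nu)."""
    p = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    # y = L z has distribution N(0, Sigma) when L is the Cholesky factor of Sigma.
    y = [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
    # chi-squared(nu) is Gamma(shape = nu / 2, scale = 2).
    u = rng.gammavariate(nu / 2.0, 2.0)
    scale = math.sqrt(nu / u)
    return [scale * y[i] + mu[i] for i in range(p)]


def sample_cov(draws, i, j):
    """Unbiased sample covariance between components i and j."""
    n = len(draws)
    mi = sum(d[i] for d in draws) / n
    mj = sum(d[j] for d in draws) / n
    return sum((d[i] - mi) * (d[j] - mj) for d in draws) / (n - 1)


# Illustrative parameters (assumed values, not taken from the text).
rng = random.Random(0)
mu = [1.0, -2.0]
sigma = [[2.0, 0.6], [0.6, 1.0]]
nu = 10.0
chol = cholesky(sigma)
draws = [sample_mvt(mu, chol, nu, rng) for _ in range(20000)]
```

With these values the sample covariance should approach ν/(ν − 2) times the scale matrix, that is 1.25 times `sigma`, which illustrates that the scale matrix is not itself the covariance.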
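This formulation gives rise to the hierarchical representation of a multivariate t-distribution as a scale-mixture of normals: $u \sim \mathrm{Ga}(\nu/2, \nu/2)$, where $\mathrm{Ga}(a,b)$ indicates a gamma distribution with density proportional to $x^{a-1}e^{-bx}$, and $\mathbf{x} \mid u$ conditionally follows $N(\boldsymbol\mu, u^{-1}\boldsymbol\Sigma)$. In this representation $u$ plays the role of the chi-squared variable of the construction divided by $\nu$.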
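The link between the chi-squared construction and the gamma mixing variable can be checked in two lines. The symbol $w$ for the chi-squared variable is introduced here only for this sketch; it does not appear in the text.

```latex
w \sim \chi^2_\nu \;\Longrightarrow\; f_w(w) \propto w^{\nu/2-1} e^{-w/2},
\qquad
u = \frac{w}{\nu} \;\Longrightarrow\; f_u(u) \propto u^{\nu/2-1} e^{-(\nu/2)\,u}
\;\Longrightarrow\; u \sim \mathrm{Ga}\!\left(\tfrac{\nu}{2}, \tfrac{\nu}{2}\right),
\\[4pt]
\mathbf{x} = \boldsymbol\mu + \frac{\mathbf{y}}{\sqrt{u}}, \quad \mathbf{y} \sim N(\mathbf{0}, \boldsymbol\Sigma)
\;\Longrightarrow\;
\mathbf{x} \mid u \sim N\!\left(\boldsymbol\mu,\; u^{-1}\boldsymbol\Sigma\right).
```

The second argument of $\mathrm{Ga}$ is a rate, matching the density proportional to $x^{a-1}e^{-bx}$.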
There are in fact many candidates for the multivariate generalization of Student's t-distribution. An extensive survey of the field has been given by Kotz and Nadarajah (2004). The essential issue is to define a probability density function of several variables that is the appropriate generalization of the formula for the univariate case. In one dimension ($p = 1$), with $t = x - \mu$ and $\Sigma = 1$, we have the probability density function
$$f(t) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma[\nu/2]}\left(1 + t^2/\nu\right)^{-(\nu+1)/2}$$
and one approach is to use a corresponding function of several variables. This is the basic idea of elliptical distribution theory, where one writes down a corresponding function of $p$ variables $t_i$ that replaces $t^2$ by a quadratic function of all the $t_i$. It is clear that this only makes sense when all the marginal distributions have the same degrees of freedom $\nu$. With $\mathbf{A} = \boldsymbol\Sigma^{-1}$, one has a simple choice of multivariate density function
$$f(\mathbf{t}) = \frac{\Gamma((\nu+p)/2)\left|\mathbf{A}\right|^{1/2}}{\sqrt{\nu^p\pi^p}\,\Gamma(\nu/2)}\left(1 + \sum_{i,j=1}^{p} A_{ij}t_it_j/\nu\right)^{-(\nu+p)/2}$$
which is the standard but not the only choice.
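As a rough illustration, the standard multivariate density can be evaluated directly from the inverse scale matrix. The sketch below uses only the Python standard library; the function names `log_det_spd` and `mvt_pdf` are hypothetical, and the matrix `A` (the inverse of the scale matrix) is assumed to be symmetric positive definite.

```python
import math


def log_det_spd(A):
    """log|A| for a symmetric positive definite matrix, via a Cholesky factor."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


def mvt_pdf(t, A, nu):
    """Standard multivariate t density at the centred point t, with A = Sigma^{-1}."""
    p = len(t)
    # Quadratic form sum_ij A_ij t_i t_j that replaces t^2 of the univariate case.
    quad = sum(A[i][j] * t[i] * t[j] for i in range(p) for j in range(p))
    log_norm = (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
                + 0.5 * log_det_spd(A) - 0.5 * p * math.log(nu * math.pi))
    return math.exp(log_norm - 0.5 * (nu + p) * math.log1p(quad / nu))
```

For $p = 1$ and $\nu = 1$ the function reduces to the Cauchy density, so `mvt_pdf([0.0], [[1.0]], 1.0)` equals $1/\pi$ up to rounding.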
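An important special case is the standard bivariate t-distribution, p = 2:
$$f(t_1,t_2) = \frac{\left|\mathbf{A}\right|^{1/2}}{2\pi}\left(1 + \sum_{i,j=1}^{2} A_{ij}t_it_j/\nu\right)^{-(\nu+2)/2}$$
Note that $\frac{\Gamma\left(\frac{\nu+2}{2}\right)}{\pi\,\nu\,\Gamma\left(\frac{\nu}{2}\right)} = \frac{1}{2\pi}$.
Now, if $\mathbf{A}$ is the identity matrix, the density is
$$f(t_1,t_2) = \frac{1}{2\pi}\left(1 + (t_1^2 + t_2^2)/\nu\right)^{-(\nu+2)/2}.$$
The difficulty with the standard representation is revealed by this formula, which does not factorize into the product of the marginal one-dimensional distributions. When $\boldsymbol\Sigma$ is diagonal the standard representation can be shown to have zero correlation, but the marginal distributions are not statistically independent.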
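The failure to factorize can be seen numerically. The sketch below, with hypothetical function names and an arbitrarily chosen evaluation point and degrees of freedom, compares the bivariate density with identity scale matrix against the product of two univariate Student-t densities with the same degrees of freedom.

```python
import math


def bivariate_t_pdf(t1, t2, nu):
    """Standard bivariate t density with identity scale matrix."""
    return (1.0 / (2.0 * math.pi)) * (1.0 + (t1 * t1 + t2 * t2) / nu) ** (-(nu + 2.0) / 2.0)


def student_t_pdf(t, nu):
    """Univariate Student-t density (the marginal of the bivariate case above)."""
    log_c = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1.0 + t * t / nu) ** (-(nu + 1.0) / 2.0)


nu = 3.0  # assumed value for illustration
joint = bivariate_t_pdf(1.0, 1.0, nu)
product = student_t_pdf(1.0, nu) * student_t_pdf(1.0, nu)
# joint and product differ: the components are uncorrelated but not independent.
```

The gap between `joint` and `product` shrinks as ν grows, consistent with the normal limit in which zero correlation does imply independence.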
A notable spontaneous occurrence of the elliptical multivariate distribution is its formal mathematical appearance when least squares methods are applied to multivariate normal data, as in the classical Markowitz minimum variance econometric solution for asset portfolios.[2]
Cumulative distribution function
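The definition of the cumulative distribution function (cdf) in one dimension can be extended to multiple dimensions by defining the following probability (here $\mathbf{x}$ is a real vector):
$$F(\mathbf{x}) = \mathbb{P}(\mathbf{X} \leq \mathbf{x}), \quad \text{where } \mathbf{X} \sim t_\nu(\boldsymbol\mu, \boldsymbol\Sigma),$$
with the inequality understood componentwise.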
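One simple way to approximate this probability is plain Monte Carlo: draw many vectors and count how often every component falls below the corresponding threshold. The sketch below is an assumption-laden illustration, not a method taken from the cited references. The function name `mvt_cdf_mc` is hypothetical, and the caller is assumed to supply a lower-triangular Cholesky factor `chol` of the scale matrix.

```python
import math
import random


def mvt_cdf_mc(x, mu, chol, nu, n, rng):
    """Monte Carlo estimate of P(X_1 <= x_1, ..., X_p <= x_p) for a multivariate t vector."""
    p = len(mu)
    hits = 0
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # chi-squared(nu) drawn as Gamma(shape = nu / 2, scale = 2).
        s = math.sqrt(nu / rng.gammavariate(nu / 2.0, 2.0))
        sample = [mu[i] + s * sum(chol[i][k] * z[k] for k in range(i + 1))
                  for i in range(p)]
        if all(sample[i] <= x[i] for i in range(p)):
            hits += 1
    return hits / n


rng = random.Random(42)
identity = [[1.0, 0.0], [0.0, 1.0]]
estimate = mvt_cdf_mc([0.0, 0.0], [0.0, 0.0], identity, 4.0, 20000, rng)
```

For a centred spherical bivariate case, symmetry gives exactly 1/4 at the origin, which provides a convenient sanity check on the estimate. The standard error decreases only as the inverse square root of the number of draws, so small tail probabilities need more specialised estimators.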
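The conditional distribution was developed by Muirhead[6] and Cornish,[7] but later derived using the simpler chi-squared ratio representation above by Roth[1] and Ding.[8] Let vector $X$ follow a multivariate t distribution and partition it into two subvectors of $p_1, p_2$ elements:
$$X_p = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim t_p\left(\mu_p, \Sigma_{p\times p}, \nu\right)$$
where $p_1 + p_2 = p$, the known mean vectors are $\mu_p = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$ and the scale matrix is $\Sigma_{p\times p} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$.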
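Roth and Ding find the conditional distribution $p(X_1|X_2)$ to be a new t-distribution with modified parameters:
$$X_1|X_2 \sim t_{p_1}\left(\mu_{1|2},\ \frac{\nu + d_2}{\nu + p_2}\Sigma_{11|2},\ \nu + p_2\right)$$
An equivalent expression in Kotz et al. is somewhat less concise.
Thus the conditional distribution is most easily represented as a two-step procedure. Form first the intermediate distribution $X_1|X_2 \sim t_{p_1}\left(\mu_{1|2}, \Psi, \tilde\nu\right)$ above; then, using the parameters below, the explicit conditional distribution becomes
$$f(X_1|X_2) = \frac{\Gamma\left[(\tilde\nu + p_1)/2\right]}{\Gamma(\tilde\nu/2)\,(\pi\tilde\nu)^{p_1/2}\left|\Psi\right|^{1/2}}\left[1 + \frac{1}{\tilde\nu}(X_1 - \mu_{1|2})^T\Psi^{-1}(X_1 - \mu_{1|2})\right]^{-(\tilde\nu + p_1)/2}$$
where
$\tilde\nu = \nu + p_2$ is the effective degrees of freedom: $\nu$ is augmented by the number of disused variables $p_2$.
$\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2)$ is the conditional mean of $X_1$.
$\Sigma_{11|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ is the Schur complement of $\Sigma_{22}$ in $\Sigma$.
$d_2 = (X_2 - \mu_2)^T\Sigma_{22}^{-1}(X_2 - \mu_2)$ is the squared Mahalanobis distance of $X_2$ from $\mu_2$ with scale matrix $\Sigma_{22}$.
$\Psi = \frac{\nu + d_2}{\tilde\nu}\Sigma_{11|2}$ is the conditional scale matrix for $\tilde\nu$.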
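A small sketch of the two-step procedure for the simplest partition, one conditioned variable and one conditioning variable, so that every block of the scale matrix is a scalar and no matrix inversion is needed. The function name and argument names are hypothetical, and the formulas in the comments are the standard conditional parameters attributed above to Roth and Ding.

```python
def conditional_t_params(x2, mu1, mu2, s11, s12, s22, nu):
    """Parameters of X1 | X2 = x2 for a bivariate t with scalar blocks (p1 = p2 = 1)."""
    p2 = 1
    nu_tilde = nu + p2                      # effective degrees of freedom
    mu_cond = mu1 + s12 / s22 * (x2 - mu2)  # conditional mean
    schur = s11 - s12 * s12 / s22           # Schur complement of s22
    d2 = (x2 - mu2) ** 2 / s22              # squared Mahalanobis distance of x2
    psi = (nu + d2) / nu_tilde * schur      # conditional scale
    return mu_cond, psi, nu_tilde


# Illustrative values (assumed): the observed x2 lies two scale units from its mean.
mu_cond, psi, nu_tilde = conditional_t_params(
    x2=2.0, mu1=0.0, mu2=0.0, s11=2.0, s12=1.0, s22=1.0, nu=4.0)
```

Unlike the multivariate normal case, the conditional scale `psi` depends on the observed value through `d2`: an outlying conditioning value inflates the conditional spread.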
The use of such distributions is enjoying renewed interest due to applications in mathematical finance, especially through the use of the Student's t copula.[9]
Elliptical representation
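Constructed as an elliptical distribution,[10] take the simplest centralised case with spherical symmetry and no scaling, $\Sigma = \operatorname{I}$; the multivariate t-PDF then takes the form
$$f_X(X) = g(X^TX) = \frac{\Gamma\left(\frac{1}{2}(\nu + p)\right)}{(\nu\pi)^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)}\left(1 + \nu^{-1}X^TX\right)^{-(\nu + p)/2}$$
where $X = (x_1, \dots, x_p)^T$ is a $p$-vector and $\nu$ = degrees of freedom as defined in Muirhead[6] section 1.5. The covariance of $X$ is
$$\operatorname{E}\left(XX^T\right) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f_X(x_1,\dots,x_p)\,XX^T\,dx_1\dots dx_p = \frac{\nu}{\nu - 2}\operatorname{I}$$
The aim is to convert the Cartesian PDF to a radial one. Kibria and Joarder[11] define the radial measure $r_2 = \frac{X^TX}{p}$ and, noting that the density depends only on $r_2$, we get
$$\operatorname{E}[r_2] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f_X(x_1,\dots,x_p)\,\frac{X^TX}{p}\,dx_1\dots dx_p = \frac{\nu}{\nu - 2}$$
which is equivalent to the variance of the $p$-element vector $X$ treated as a univariate heavy-tail zero-mean random sequence with uncorrelated, yet statistically dependent, elements.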
Given the Beta-prime distribution, the radial cumulative distribution function of $y = X^TX/\nu$ is known:
$$F_Y(y) = I\left(\frac{y}{1+y};\ \frac{p}{2}, \frac{\nu}{2}\right) B\left(\frac{p}{2}, \frac{\nu}{2}\right)^{-1}$$
where $I$ is the incomplete Beta function and applies with a spherical $\Sigma$ assumption.
In the scalar case, $p = 1$, the distribution is equivalent to Student-t with the equivalence $t^2 = \nu y$, the variable $t$ having double-sided tails for CDF purposes, i.e. the "two-tail-t-test".
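The radial CDF can be evaluated without special-function libraries by integrating the incomplete Beta integrand numerically. The sketch below is a minimal illustration under stated assumptions: the radial variable is taken to be the squared length of the spherical vector divided by ν, which follows a Beta-prime distribution with parameters p/2 and ν/2; the function name `beta_prime_cdf` is hypothetical; and composite Simpson's rule is used, which is only reliable here for p ≥ 2 because the integrand is singular at zero when p = 1.

```python
import math


def beta_prime_cdf(y, p, nu, steps=2000):
    """CDF of y ~ Beta-prime(p/2, nu/2) by Simpson's rule on the incomplete Beta integral.

    Assumes p >= 2 and an even number of steps.
    """
    a, b = p / 2.0, nu / 2.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    x = y / (1.0 + y)  # maps the Beta-prime variable onto (0, 1)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0 / math.exp(log_beta)
```

For p = 2 the integral has the closed form $1 - (1 + y)^{-\nu/2}$, which gives an easy check of the numerical result.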
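The radial distribution can also be derived via a straightforward coordinate transformation from Cartesian to spherical. A constant radius surface at $R = (X^TX)^{1/2}$ with PDF $p_X(X) \propto \left(1 + \nu^{-1}R^2\right)^{-(\nu+p)/2}$ is an iso-density surface. Given this density value, the quantum of probability on a shell of surface area $A_R$ and thickness $\delta R$ at $R$ is $\delta P = p_X(R)\,A_R\,\delta R$.
The enclosed $p$-sphere of radius $R$ has surface area $A_R = \frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}$. Substitution into $\delta P$ shows that the shell has element of probability $\delta P = p_X(R)\,\frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}\,\delta R$, which is equivalent to the radial density function
$$f_R(R) = \frac{\Gamma\left(\frac{1}{2}(\nu + p)\right)}{\nu^{p/2}\pi^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)}\,\frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}\left(1 + \frac{R^2}{\nu}\right)^{-(\nu + p)/2}$$
which further simplifies to $f_R(R) = \frac{2}{\nu^{1/2}\,B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)}\left(\frac{R^2}{\nu}\right)^{(p-1)/2}\left(1 + \frac{R^2}{\nu}\right)^{-(\nu + p)/2}$ where $B(\cdot,\cdot)$ is the Beta function.
Changing the radial variable to $y = R^2/\nu$ returns the previous Beta Prime distribution
$$f_Y(y) = \frac{1}{B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)}\,y^{p/2 - 1}\left(1 + y\right)^{-(\nu + p)/2}$$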
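To scale the radial variables without changing the radial shape function, define scale matrix $\Sigma = \alpha\operatorname{I}$, yielding a 3-parameter Cartesian density function, i.e. the probability $\Delta_P$ in volume element $dx_1 \dots dx_p$ is
$$\Delta_P\left(f_X(X\,|\,\alpha, p, \nu)\right) = \frac{\Gamma\left(\frac{1}{2}(\nu + p)\right)}{(\nu\pi)^{p/2}\,\alpha^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)}\left(1 + \frac{X^TX}{\alpha\nu}\right)^{-(\nu + p)/2} dx_1 \dots dx_p$$
or, in terms of scalar radial variable $R$,
$$f_R(R\,|\,\alpha, p, \nu) = \frac{2}{\alpha^{1/2}\,\nu^{1/2}\,B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)}\left(\frac{R^2}{\alpha\nu}\right)^{(p-1)/2}\left(1 + \frac{R^2}{\alpha\nu}\right)^{-(\nu + p)/2}$$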
Radial Moments
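The moments of all the radial variables, with the spherical distribution assumption, can be derived from the Beta Prime distribution. If $Z \sim \beta'(a, b)$ then $\operatorname{E}(Z^m) = \frac{B(a + m,\, b - m)}{B(a, b)}$, a known result. Thus, for variable $y = X^TX/\nu$ we have
$$\operatorname{E}(y^m) = \frac{B\left(\frac{1}{2}p + m,\ \frac{1}{2}\nu - m\right)}{B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)} = \frac{\Gamma\left(\frac{1}{2}p + m\right)\Gamma\left(\frac{1}{2}\nu - m\right)}{\Gamma\left(\frac{1}{2}p\right)\Gamma\left(\frac{1}{2}\nu\right)}, \quad \nu/2 > m$$
The moments of $R^2 = X^TX = \nu y$ are
$$\operatorname{E}(R^{2m}) = \nu^m\operatorname{E}(y^m)$$
while introducing the scale matrix $\alpha\operatorname{I}$ yields
$$\operatorname{E}(R^{2m}\,|\,\alpha) = \alpha^m\nu^m\operatorname{E}(y^m)$$
Moments relating to radial variable $R$ are found by setting $R = (\alpha\nu y)^{1/2}$ and $M = 2m$, whereupon
$$\operatorname{E}(R^M) = (\alpha\nu)^{M/2}\operatorname{E}(y^{M/2}) = (\alpha\nu)^{M/2}\,\frac{B\left(\frac{1}{2}(p + M),\ \frac{1}{2}(\nu - M)\right)}{B\left(\frac{1}{2}p, \frac{1}{2}\nu\right)}$$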
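The moment formulas translate directly into a few lines of code using log-gamma functions. This is a sketch under the assumptions that the Beta-prime variable is the squared length of the spherical vector divided by ν and that the radial variable is its length after scaling by α. The function names are hypothetical.

```python
import math


def beta_prime_moment(m, p, nu):
    """E(y^m) for y ~ Beta-prime(p/2, nu/2); finite only when nu / 2 > m."""
    if nu / 2.0 <= m:
        raise ValueError("moment of order m requires nu / 2 > m")
    a, b = p / 2.0, nu / 2.0
    return math.exp(math.lgamma(a + m) + math.lgamma(b - m)
                    - math.lgamma(a) - math.lgamma(b))


def radial_moment(M, p, nu, alpha=1.0):
    """E(R^M) for R = (alpha * nu * y)^(1/2); finite only when nu > M."""
    return (alpha * nu) ** (M / 2.0) * beta_prime_moment(M / 2.0, p, nu)
```

As a consistency check, `beta_prime_moment(1, p, nu)` equals $p/(\nu - 2)$, and `radial_moment(2, p, nu)` equals $p\nu/(\nu - 2)$, the trace of the covariance of the unscaled spherical vector.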
Linear Combinations and Affine Transformation
Full Rank Transform
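This closely relates to the multivariate normal method and is described in Kotz and Nadarajah, Kibria and Joarder, Roth, and Cornish. Starting from a somewhat simplified version of the central MV-t pdf: $f_X(X) = \frac{K}{|\Sigma|^{1/2}}\left(1 + \nu^{-1}X^T\Sigma^{-1}X\right)^{-(\nu+p)/2}$, where $K$ is a constant and $\nu$ is arbitrary but fixed, let $\Theta \in \mathbb{R}^{p\times p}$ be a full-rank matrix and form vector $Y = \Theta X$. Then, by straightforward change of variables
$$f_Y(Y) = \frac{K}{|\Sigma|^{1/2}}\left(1 + \nu^{-1}Y^T\Theta^{-T}\Sigma^{-1}\Theta^{-1}Y\right)^{-(\nu+p)/2}\left|\frac{\partial Y}{\partial X}\right|^{-1}$$
The matrix of partial derivatives is $\frac{\partial Y_i}{\partial X_j} = \Theta_{i,j}$ and the Jacobian becomes $\left|\frac{\partial Y}{\partial X}\right| = |\Theta|$. Thus
$$f_Y(Y) = \frac{K}{|\Sigma|^{1/2}|\Theta|}\left(1 + \nu^{-1}Y^T\Theta^{-T}\Sigma^{-1}\Theta^{-1}Y\right)^{-(\nu+p)/2}$$
The denominator reduces to
$$|\Sigma|^{1/2}|\Theta| = |\Sigma|^{1/2}|\Theta|^{1/2}|\Theta^T|^{1/2} = \left|\Theta\Sigma\Theta^T\right|^{1/2}$$
In full:
$$f_Y(Y) = \frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{p/2}\left|\Theta\Sigma\Theta^T\right|^{1/2}}\left(1 + \nu^{-1}Y^T\left(\Theta\Sigma\Theta^T\right)^{-1}Y\right)^{-(\nu+p)/2}$$
which is a regular MV-t distribution.
In general if $X \sim t_p(\mu, \Sigma, \nu)$ and $\Theta^{p\times p}$ has full rank $p$ then
$$\Theta X + c \sim t_p\left(\Theta\mu + c,\ \Theta\Sigma\Theta^T,\ \nu\right)$$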
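The parameter update under an affine map can be sketched with nested lists and no external libraries. The sketch assumes the usual rule for affine maps of a multivariate t vector: the location is mapped through the transform and shifted, the scale matrix is sandwiched between the transform and its transpose, and the degrees of freedom are unchanged. The function names, and the names `theta` and `c` for the transform matrix and shift vector, are hypothetical choices for this sketch.

```python
def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def affine_mvt_params(theta, c, mu, sigma, nu):
    """Parameters of theta X + c when X ~ t(mu, sigma, nu); nu is unchanged."""
    new_mu = [sum(theta[i][j] * mu[j] for j in range(len(mu))) + c[i]
              for i in range(len(theta))]
    new_sigma = matmul(matmul(theta, sigma), transpose(theta))
    return new_mu, new_sigma, nu


# Illustrative values (assumed).
theta = [[1.0, 1.0], [0.0, 2.0]]
new_mu, new_sigma, new_nu = affine_mvt_params(
    theta, [1.0, 0.0], [1.0, 2.0], [[2.0, 0.5], [0.5, 1.0]], 5.0)
```

The same helper accepts a rectangular `theta` with fewer rows than columns, provided it has full row rank; a single selector row such as `[[1.0, 0.0]]` then picks out the parameters of a one-dimensional marginal.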
Marginal Distributions
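This is a special case of the rank-reducing linear transform below. Kotz defines marginal distributions as follows. Partition $X \sim t_p(\mu, \Sigma, \nu)$ into two subvectors of $p_1, p_2$ elements:
$$X_p = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim t_{p_1+p_2}\left(\mu_p, \Sigma_{p\times p}, \nu\right)$$
with $p_1 + p_2 = p$, means $\mu_p = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$, scale matrix $\Sigma_{p\times p} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$
then $X_1 \sim t_{p_1}(\mu_1, \Sigma_{11}, \nu)$ and $X_2 \sim t_{p_2}(\mu_2, \Sigma_{22}, \nu)$, such that
$$f(X_1) = \frac{\Gamma\left[(\nu+p_1)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{p_1/2}\left|\Sigma_{11}\right|^{1/2}}\left[1 + \frac{1}{\nu}(X_1 - \mu_1)^T\Sigma_{11}^{-1}(X_1 - \mu_1)\right]^{-(\nu+p_1)/2}$$
and correspondingly for $X_2$ with $p_2$, $\mu_2$ and $\Sigma_{22}$.
If a transformation is constructed in the form
$$\Theta_{p_1\times p} = \begin{bmatrix} \operatorname{I}_{p_1} & 0_{p_1\times p_2} \end{bmatrix}$$
then vector $Y = \Theta X$, as discussed below, has the same distribution as the marginal distribution of $X_1$.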
Rank-Reducing Linear Transform
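In the linear transform case, if $\Theta$ is a rectangular matrix $\Theta \in \mathbb{R}^{m\times p}$, $m < p$, of rank $m$, the result is dimensionality reduction. Here, the Jacobian $|\Theta|$ is seemingly rectangular but the value $\left|\Theta\Sigma\Theta^T\right|^{1/2}$ in the denominator of the pdf is nevertheless correct. There is a discussion of rectangular matrix product determinants in Aitken.[12] In general if $X \sim t_p(\mu, \Sigma, \nu)$ and $\Theta^{m\times p}$ has full rank $m$ then
$$Y = \Theta X + c \sim t_m\left(\Theta\mu + c,\ \Theta\Sigma\Theta^T,\ \nu\right)$$
$$f_Y(Y) = \frac{\Gamma\left[(\nu+m)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{m/2}\left|\Theta\Sigma\Theta^T\right|^{1/2}}\left[1 + \frac{1}{\nu}(Y - c_1)^T\left(\Theta\Sigma\Theta^T\right)^{-1}(Y - c_1)\right]^{-(\nu+m)/2}, \quad c_1 = \Theta\mu + c$$
In extremis, if m = 1 and $\Theta$ becomes a row vector, then scalar Y follows a univariate double-sided Student-t distribution defined by $t^2 = (Y - c_1)^2/\sigma^2$, where $\sigma^2 = \Theta\Sigma\Theta^T$, with the same $\nu$ degrees of freedom. Kibria et al. use the affine transformation to find the marginal distributions, which are also MV-t.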
During affine transformations of variables with elliptical distributions, all vectors must ultimately derive from one initial isotropic spherical vector whose elements remain 'entangled' and are not statistically independent.
A vector of independent Student-t samples is not consistent with the multivariate t distribution.
Adding two sample multivariate t vectors generated with independent chi-squared samples and different $\nu$ values, $1/\sqrt{u_1/\nu_1}$ and $1/\sqrt{u_2/\nu_2}$, will not produce internally consistent distributions, though they will yield a Behrens-Fisher problem.[13]
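The difference between a multivariate t vector and a vector of independent Student-t samples can be illustrated by simulation. In the sketch below, whose function names, seed, degrees of freedom and threshold are all assumptions chosen for illustration, the first sampler shares one chi-squared draw across both components (the multivariate t construction with identity scale matrix), while the second uses a separate chi-squared draw per component. Both have identical Student-t marginals, yet the frequency with which both components are simultaneously large differs markedly.

```python
import math
import random


def spherical_mvt_pair(rng, nu):
    """Two components sharing a single chi-squared scaling (multivariate t, identity scale)."""
    s = math.sqrt(nu / rng.gammavariate(nu / 2.0, 2.0))
    return s * rng.gauss(0.0, 1.0), s * rng.gauss(0.0, 1.0)


def independent_t_pair(rng, nu):
    """Two independent univariate Student-t draws, each with its own chi-squared scaling."""
    def one():
        return rng.gauss(0.0, 1.0) * math.sqrt(nu / rng.gammavariate(nu / 2.0, 2.0))
    return one(), one()


def joint_exceedance(pairs, c):
    """Fraction of pairs in which both components exceed c in absolute value."""
    return sum(1 for a, b in pairs if abs(a) > c and abs(b) > c) / len(pairs)


rng = random.Random(1)
nu, c, n = 3.0, 2.0, 50000
shared = joint_exceedance([spherical_mvt_pair(rng, nu) for _ in range(n)], c)
indep = joint_exceedance([independent_t_pair(rng, nu) for _ in range(n)], c)
# shared is noticeably larger than indep: joint extremes cluster under the shared scaling.
```

The shared scaling makes large values occur together, which is the dependence that survives even though the components are uncorrelated.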
Taleb compares many examples of fat-tail elliptical vs non-elliptical multivariate distributions.
The elliptical multivariate-t distribution arises spontaneously in linearly constrained least squares solutions involving multivariate normal source data, for example the Markowitz global minimum variance solution in financial portfolio analysis,[14][15][2] which addresses an ensemble of normal random vectors or a random matrix. It does not arise in ordinary least squares (OLS) or multiple regression with fixed dependent and independent variables, a setting that tends to produce well-behaved normal error probabilities.
Chi distribution, the pdf of the scaling factor in the construction of the Student's t-distribution and also the 2-norm (or Euclidean norm) of a multivariate normally distributed vector (centered at zero).
^ a b Roth, Michael (17 April 2013). "On the Multivariate t Distribution" (PDF). Automatic Control group. Linköping University, Sweden. Archived (PDF) from the original on 31 July 2022. Retrieved 1 June 2022.
^ Botev, Z. I.; L'Ecuyer, P. (6 December 2015). "Efficient probability estimation and simulation of the truncated multivariate student-t distribution". 2015 Winter Simulation Conference (WSC). Huntington Beach, CA, USA: IEEE. pp. 380–391. doi:10.1109/WSC.2015.7408180.
^ Osiewalski, Jacek; Steele, Mark (1996). "Posterior Moments of Scale Parameters in Elliptical Sampling Models". Bayesian Analysis in Statistics and Econometrics. Wiley. pp. 323–335. ISBN 0-471-11856-7.