In mathematics, the disintegration theorem is a result in measure theory and probability theory. It rigorously defines the idea of a non-trivial "restriction" of a measure to a measure zero subset of the measure space in question. It is related to the existence of conditional probability measures. In a sense, "disintegration" is the opposite process to the construction of a product measure.
Consider the unit square $S=[0,1]\times [0,1]$ in the Euclidean plane $\mathbb{R}^{2}$. Consider the probability measure $\mu$ defined on $S$ by the restriction of two-dimensional Lebesgue measure $\lambda^{2}$ to $S$. That is, the probability of an event $E\subseteq S$ is simply the area of $E$. We assume $E$ is a measurable subset of $S$.
Consider a one-dimensional subset of $S$ such as the line segment $L_{x}=\{x\}\times [0,1]$. The segment $L_{x}$ has $\mu$-measure zero and, since the Lebesgue measure space is a complete measure space, every subset of $L_{x}$ is a $\mu$-null set: $E\subseteq L_{x}\implies \mu(E)=0$.
While true, this is somewhat unsatisfying. It would be nice to say that $\mu$ "restricted to" $L_{x}$ is the one-dimensional Lebesgue measure $\lambda^{1}$, rather than the zero measure. The probability of a "two-dimensional" event $E$ could then be obtained as an integral of the one-dimensional probabilities of the vertical "slices" $E\cap L_{x}$: more formally, if $\mu_{x}$ denotes one-dimensional Lebesgue measure on $L_{x}$, then $\mu(E)=\int_{[0,1]}\mu_{x}(E\cap L_{x})\,\mathrm{d}x$ for any "nice" $E\subseteq S$. The disintegration theorem makes this argument rigorous in the context of measures on metric spaces.
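The slicing identity can be checked numerically on a concrete event. The sketch below assumes the illustrative choice $E=\{(x,y)\in S : y\leq x^{2}\}$, which is not taken from the text above; for this set the slice $E\cap L_{x}$ has length $x^{2}$, so both sides should be close to $1/3$. The function names are hypothetical.

```python
def slice_length(x: float) -> float:
    """One-dimensional Lebesgue measure of the slice E ∩ L_x for the assumed E = {y <= x^2}."""
    return min(max(x * x, 0.0), 1.0)


def integrate_slices(n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of slice_length over [0, 1]."""
    h = 1.0 / n
    return sum(slice_length((i + 0.5) * h) for i in range(n)) * h


def area_by_grid(n: int = 400) -> float:
    """Approximate mu(E) directly by counting cell midpoints of an n x n grid that lie in E."""
    h = 1.0 / n
    hits = 0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if y <= x * x:
                hits += 1
    return hits * h * h


slices = integrate_slices()  # right-hand side: integral of the slice measures
area = area_by_grid()        # left-hand side: two-dimensional area of E
print(slices, area)
```

Both numbers approximate the same quantity by different routes: the first integrates the one-dimensional slice measures, the second estimates the area directly.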
(Hereafter, $\mathcal{P}(X)$ will denote the collection of Borel probability measures on a topological space $(X,T)$.) The assumptions of the theorem are as follows:
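In the usual formulation of the theorem, the hypotheses are stated along the following lines. This is a sketch of the standard statement rather than a quotation; $\pi$ is the map along which $\mu$ is disintegrated, and $\nu$ is the measure that appears in the conclusion.

```latex
\begin{itemize}
  \item $Y$ and $X$ are Radon spaces, i.e. topological spaces on which every
        Borel probability measure is inner regular (for example, any Polish space);
  \item $\mu \in \mathcal{P}(Y)$;
  \item $\pi : Y \to X$ is a Borel-measurable function, thought of as partitioning
        $Y$ into the fibres $\{\pi^{-1}(x) : x \in X\}$;
  \item $\nu \in \mathcal{P}(X)$ is the pushforward measure
        $\nu = \pi_{*}(\mu) = \mu \circ \pi^{-1}$.
\end{itemize}
```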
The conclusion of the theorem: There exists a $\nu$-almost everywhere uniquely determined family of probability measures $\{\mu_{x}\}_{x\in X}\subseteq \mathcal{P}(Y)$, which provides a "disintegration" of $\mu$ into $\{\mu_{x}\}_{x\in X}$, such that:
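In the usual formulation, the defining properties of the family are stated along the following lines. This is a sketch of the standard statement rather than a quotation; here $\pi : Y\to X$ denotes the Borel-measurable map along which $\mu$ is disintegrated and $\nu=\pi_{*}(\mu)$ its pushforward.

```latex
\begin{itemize}
  \item the map $x \mapsto \mu_{x}$ is Borel measurable, in the sense that
        $x \mapsto \mu_{x}(B)$ is a Borel-measurable function for each Borel set
        $B \subseteq Y$;
  \item $\mu_{x}$ lives on the fibre $\pi^{-1}(x)$: for $\nu$-almost all $x \in X$,
        $\mu_{x}\bigl(Y \setminus \pi^{-1}(x)\bigr) = 0$, so that
        $\mu_{x}(E) = \mu_{x}\bigl(E \cap \pi^{-1}(x)\bigr)$;
  \item for every Borel-measurable function $f : Y \to [0, \infty]$,
        \[
          \int_{Y} f(y)\,\mathrm{d}\mu(y)
          = \int_{X} \int_{\pi^{-1}(x)} f(y)\,\mathrm{d}\mu_{x}(y)\,\mathrm{d}\nu(x).
        \]
\end{itemize}
```

Taking $f$ to be the indicator function of an event $E\subseteq Y$ gives $\mu(E)=\int_{X}\mu_{x}(E)\,\mathrm{d}\nu(x)$, which reduces to the slicing formula for the unit square when $\pi(x,y)=x$ and $\nu=\lambda^{1}$ on $[0,1]$.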
The original example was a special case of the problem of product spaces, to which the disintegration theorem applies.
When $Y$ is written as a Cartesian product $Y=X_{1}\times X_{2}$ and $\pi_{i}:Y\to X_{i}$ is the natural projection, then each fibre $\pi_{1}^{-1}(x_{1})$ can be canonically identified with $X_{2}$ and there exists a Borel family of probability measures $\{\mu_{x_{1}}\}_{x_{1}\in X_{1}}$ in $\mathcal{P}(X_{2})$ (which is $(\pi_{1})_{*}(\mu)$-almost everywhere uniquely determined) such that

$$\mu =\int_{X_{1}}\mu_{x_{1}}\,\mu\left(\pi_{1}^{-1}(\mathrm{d}x_{1})\right)=\int_{X_{1}}\mu_{x_{1}}\,\mathrm{d}(\pi_{1})_{*}(\mu)(x_{1}).$$

Writing $\mu(\cdot \mid x_{1})$ for $\mu_{x_{1}}$, this gives in particular

$$\int_{X_{1}\times X_{2}}f(x_{1},x_{2})\,\mu(\mathrm{d}x_{1},\mathrm{d}x_{2})=\int_{X_{1}}\left(\int_{X_{2}}f(x_{1},x_{2})\,\mu(\mathrm{d}x_{2}\mid x_{1})\right)\mu\left(\pi_{1}^{-1}(\mathrm{d}x_{1})\right)$$

and

$$\mu(A\times B)=\int_{A}\mu\left(B\mid x_{1}\right)\,\mu\left(\pi_{1}^{-1}(\mathrm{d}x_{1})\right).$$
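A familiar special case may make the product formula concrete. The following worked equations assume, purely for illustration, that $X_{1}$ and $X_{2}$ are intervals of the real line and that $\mu$ has a joint density $p$ with respect to two-dimensional Lebesgue measure; the symbols $p$ and $p_{1}$ are introduced here and do not appear in the statement above.

```latex
\begin{align*}
  p_{1}(x_{1}) &= \int_{X_{2}} p(x_{1}, x_{2})\,\mathrm{d}x_{2},
  &
  \mu\left(\pi_{1}^{-1}(\mathrm{d}x_{1})\right) &= p_{1}(x_{1})\,\mathrm{d}x_{1}, \\
  \mu(\mathrm{d}x_{2} \mid x_{1}) &= \frac{p(x_{1}, x_{2})}{p_{1}(x_{1})}\,\mathrm{d}x_{2}
  \quad \text{whenever } p_{1}(x_{1}) > 0, \\
  \int_{A} \mu(B \mid x_{1})\,p_{1}(x_{1})\,\mathrm{d}x_{1}
  &= \int_{A} \int_{B} p(x_{1}, x_{2})\,\mathrm{d}x_{2}\,\mathrm{d}x_{1}
   = \mu(A \times B).
\end{align*}
```

Under these assumptions the disintegration is the usual conditional density, and the points where $p_{1}$ vanishes form a null set for the marginal, consistent with the family being determined only almost everywhere.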
The relation to conditional expectation is given by the identities

$$\operatorname{E}(f\mid \pi_{1})(x_{1})=\int_{X_{2}}f(x_{1},x_{2})\,\mu(\mathrm{d}x_{2}\mid x_{1}),$$

$$\mu(A\times B\mid \pi_{1})(x_{1})=1_{A}(x_{1})\cdot \mu(B\mid x_{1}).$$
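On a finite product space the integrals become sums, and the disintegration can be computed directly. The sketch below uses a hypothetical joint distribution on $X_{1}=\{0,1\}$ and $X_{2}=\{a,b,c\}$; the table of probabilities and all function names are assumptions made for illustration.

```python
from fractions import Fraction as F

# Hypothetical joint distribution mu on X1 x X2 (exact rational weights summing to 1).
mu = {
    (0, "a"): F(1, 8), (0, "b"): F(1, 8), (0, "c"): F(1, 4),
    (1, "a"): F(1, 4), (1, "b"): F(1, 8), (1, "c"): F(1, 8),
}


def pushforward(mu):
    """Marginal of mu on X1, i.e. the pushforward of mu under the projection pi_1."""
    nu = {}
    for (x1, _x2), p in mu.items():
        nu[x1] = nu.get(x1, 0) + p
    return nu


def disintegrate(mu):
    """Return (nu, kernel) with kernel[x1][x2] = mu(x2 | x1) wherever nu(x1) > 0."""
    nu = pushforward(mu)
    kernel = {x1: {} for x1, p in nu.items() if p > 0}
    for (x1, x2), p in mu.items():
        if nu[x1] > 0:
            kernel[x1][x2] = p / nu[x1]
    return nu, kernel


def rectangle_measure(nu, kernel, A, B):
    """Recover mu(A x B) as the nu-weighted sum over A of mu(B | x1)."""
    return sum(
        nu[x1] * sum(q for x2, q in kernel[x1].items() if x2 in B)
        for x1 in kernel
        if x1 in A
    )


def conditional_expectation(kernel, f, x1):
    """E(f | pi_1)(x1): the sum over x2 of f(x1, x2) * mu(x2 | x1)."""
    return sum(f(x1, x2) * q for x2, q in kernel[x1].items())


nu, kernel = disintegrate(mu)
print(rectangle_measure(nu, kernel, {0, 1}, {"a"}))
```

Points $x_{1}$ with zero marginal mass are simply left out of `kernel`, mirroring the fact that the family is only determined almost everywhere with respect to the marginal.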
The disintegration theorem can also be seen as justifying the use of a "restricted" measure in vector calculus. For instance, in Stokes' theorem as applied to a vector field flowing through a compact surface $\Sigma \subset \mathbb{R}^{3}$, it is implicit that the "correct" measure on $\Sigma$ is the disintegration of three-dimensional Lebesgue measure $\lambda^{3}$ on $\Sigma$, and that the disintegration of this measure on $\partial \Sigma$ is the same as the disintegration of $\lambda^{3}$ on $\partial \Sigma$.[2]
The disintegration theorem can be applied to give a rigorous treatment of conditional probability distributions in statistics, while avoiding purely abstract formulations of conditional probability.[3] The theorem is related to the Borel–Kolmogorov paradox, for example.