In optimal transport, a branch of mathematics, polar factorization of vector fields is a basic result due to Brenier (1987),[1] with antecedents in the work of Knott and Smith (1984)[2] and Rachev (1985).[3] It generalizes many existing results, among them the polar decomposition of real matrices and the rearrangement of real-valued functions.
Notation. Denote by $\xi_{\#}\mu$ the image (pushforward) measure of $\mu$ under the map $\xi$.
Definition: Measure-preserving map. Let $(X,\mu)$ and $(Y,\nu)$ be probability spaces and $\sigma : X \to Y$ a measurable map. Then $\sigma$ is said to be measure preserving if and only if $\sigma_{\#}\mu = \nu$, where $\sigma_{\#}\mu$ is the pushforward of $\mu$ by $\sigma$. Spelled out: for every $\nu$-measurable subset $\Omega$ of $Y$, $\sigma^{-1}(\Omega)$ is $\mu$-measurable, and $\mu(\sigma^{-1}(\Omega)) = \nu(\Omega)$. The latter is equivalent to

$$\int_X (f\circ\sigma)(x)\,\mu(dx) = \int_Y f(y)\,\nu(dy)$$

where $f$ is $\nu$-integrable and $f\circ\sigma$ is $\mu$-integrable.
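The integral criterion can be probed numerically. The sketch below is only an illustration: it takes $X = Y = [0,1]$ with $\mu = \nu$ the uniform (Lebesgue) measure and the doubling map $\sigma(t) = 2t \bmod 1$, which preserves this measure. It then compares Monte Carlo estimates of $\int_X f\circ\sigma\,d\mu$ and $\int_Y f\,d\nu$ for one test function $f$. The helpers `doubling_map` and `mc_integral` are hypothetical names. Agreement for a single $f$ is a necessary check, not a proof.

```python
import random

def doubling_map(t: float) -> float:
    """sigma(t) = 2t mod 1; maps [0, 1) onto itself and preserves the uniform measure."""
    return (2.0 * t) % 1.0

def mc_integral(g, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the integral of g against the uniform measure on [0, 1]."""
    rng = random.Random(seed)
    return sum(g(rng.random()) for _ in range(n)) / n

def f(y: float) -> float:
    """A nu-integrable test function; its exact integral over [0, 1] is 1/3."""
    return y * y

lhs = mc_integral(lambda x: f(doubling_map(x)))  # integral of f o sigma against mu
rhs = mc_integral(f, seed=1)                      # integral of f against nu
# For contrast, t -> t**2 does NOT preserve the uniform measure:
# the same integral becomes E[U**4] = 1/5 instead of 1/3.
contrast = mc_integral(lambda x: f(x * x), seed=2)
print(lhs, rhs, contrast)  # approximately 1/3, 1/3 and 1/5
```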
Theorem. Consider a map $\xi : \Omega \to \mathbb{R}^d$, where $\Omega$ is a convex subset of $\mathbb{R}^d$, and a measure $\mu$ on $\Omega$ which is absolutely continuous. Assume that $\xi_{\#}\mu$ is absolutely continuous. Then there is a convex function $\varphi : \Omega \to \mathbb{R}$ and a map $\sigma : \Omega \to \Omega$ preserving $\mu$ such that
$$\xi = (\nabla\varphi)\circ\sigma$$
In addition, $\nabla\varphi$ and $\sigma$ are uniquely defined almost everywhere.[1][4]
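A discrete analogue makes the structure of the theorem concrete. The sketch below rests on stated assumptions and is not Brenier's construction.

- $\mu$ is replaced by the uniform measure on $n$ sample points $x_1,\dots,x_n$, and $\xi_{\#}\mu$ by the uniform measure on $y_i = \xi(x_i)$.
- The role of $\nabla\varphi$ is played by the optimal assignment $T : x_j \mapsto y_{\tau(j)}$, which minimizes $\sum_j \Vert x_j - y_{\tau(j)}\Vert^2$. It is found here by brute force over all permutations, which is only feasible for tiny $n$.
- $\sigma$ is the permutation $\sigma(x_i) = x_{\tau^{-1}(i)}$ of the sample points. Any permutation preserves the uniform discrete measure.

The field `xi` and the helper `optimal_assignment` are hypothetical.

```python
import itertools
import numpy as np

def xi(x: np.ndarray) -> np.ndarray:
    """Hypothetical vector field on R^2, chosen only for illustration."""
    return np.array([x[1] ** 3 - x[0], np.sin(3.0 * x[0]) + x[1]])

def optimal_assignment(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Brute-force permutation tau minimizing sum_j ||X[j] - Y[tau[j]]||^2 (tiny n only)."""
    n = len(X)
    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    rows = np.arange(n)
    best = min(itertools.permutations(range(n)),
               key=lambda perm: cost[rows, list(perm)].sum())
    return np.array(best)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(7, 2))  # support of the discrete stand-in for mu
Y = np.array([xi(x) for x in X])          # support of xi_# mu

tau = optimal_assignment(X, Y)
T = Y[tau]                    # T[j]: image of X[j] under the map playing grad phi
sigma_idx = np.argsort(tau)   # inverse permutation: sigma(X[i]) = X[sigma_idx[i]]

# Polar factorization on the samples: xi(X[i]) = T(sigma(X[i])).
reconstructed = T[sigma_idx]

# Monotonicity of T: (T(x_a) - T(x_b)) . (x_a - x_b) >= 0 for every pair,
# which follows from optimality against all two-point swaps.
gaps = [(T[a] - T[b]) @ (X[a] - X[b]) for a in range(len(X)) for b in range(len(X))]
print(np.allclose(reconstructed, Y), min(gaps) >= -1e-9)  # True True
```

Pairwise monotonicity is only a necessary condition. The optimal assignment is in fact cyclically monotone, and by Rockafellar's theorem cyclical monotonicity characterizes subgradients of convex functions. For larger $n$, an assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` would replace the brute-force search.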
In dimension 1, when $\mu$ is the Lebesgue measure on the unit interval, the result specializes to Ryff's theorem.[5] In that case ($d = 1$ and $\mu$ the uniform distribution on $[0,1]$), the polar decomposition reduces to
$$\xi(t) = F_X^{-1}\left(\sigma(t)\right)$$
where $F_X$ is the cumulative distribution function of the random variable $\xi(U)$, and $U$ is uniformly distributed on $[0,1]$. $F_X$ is assumed to be continuous, and $\sigma(t) = F_X(\xi(t))$ preserves the Lebesgue measure on $[0,1]$. The quantile function $F_X^{-1}$ is nondecreasing, hence the derivative of a convex function $\varphi$, so it plays the role of $\nabla\varphi$.
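As a worked instance, take $\xi(t) = \sin(2\pi t)$ on $[0,1]$. This choice of $\xi$ is ours, for illustration only.

- $\xi(U)$ has the arcsine distribution on $[-1,1]$, with continuous CDF $F_X(x) = \tfrac12 + \arcsin(x)/\pi$.
- Its quantile function is $F_X^{-1}(u) = \sin(\pi(u - \tfrac12)) = -\cos(\pi u)$, which is nondecreasing.

The sketch checks $\xi = F_X^{-1}\circ\sigma$ on a grid. It also checks that $\sigma$ preserves the Lebesgue measure, by comparing the sorted values of $\sigma$ on a uniform grid with the uniform quantiles.

```python
import math

def xi(t: float) -> float:
    return math.sin(2.0 * math.pi * t)

def F_X(x: float) -> float:
    """CDF of xi(U) with U ~ Uniform[0, 1]: the arcsine law on [-1, 1]."""
    x = max(-1.0, min(1.0, x))  # guard against rounding just outside [-1, 1]
    return 0.5 + math.asin(x) / math.pi

def F_X_inv(u: float) -> float:
    """Quantile function: nondecreasing, i.e. the derivative of a convex phi."""
    return -math.cos(math.pi * u)

def sigma(t: float) -> float:
    return F_X(xi(t))

N = 100_000
grid = [(k + 0.5) / N for k in range(N)]  # midpoints of a uniform grid on [0, 1]

# Factorization error: xi(t) versus F_X^{-1}(sigma(t)).
max_err = max(abs(xi(t) - F_X_inv(sigma(t))) for t in grid)

# Measure preservation: the k-th smallest value of sigma should sit near (k + 1) / N.
values = sorted(sigma(t) for t in grid)
cdf_gap = max(abs((k + 1) / N - v) for k, v in enumerate(values))
print(max_err < 1e-9, cdf_gap < 1e-3)  # True True
```

Here $\sigma$ is piecewise linear with slopes $\pm 2$. Each value in $[0,1]$ has two preimages, each carrying half the mass, which is why the measure is preserved even though $\sigma$ is not monotone.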
When $\xi$ is a linear map and $\mu$ is the standard Gaussian distribution, the result coincides with the polar decomposition of matrices. Assume $\xi(x) = Mx$, where $M$ is an invertible $d\times d$ matrix, and let $\mu$ be the $\mathcal{N}(0, I_d)$ probability measure. The polar decomposition then reduces to
$$M = SO$$
where $S$ is a symmetric positive definite matrix and $O$ is an orthogonal matrix. The connection with the polar factorization is $\varphi(x) = x^\top S x/2$, which is convex, and $\sigma(x) = Ox$, which preserves the $\mathcal{N}(0, I_d)$ measure. Indeed, $\nabla\varphi(\sigma(x)) = SOx = Mx$.
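The two factors can be computed from a singular value decomposition. Write $M = U\Sigma V^\top$ and take $S = U\Sigma U^\top$ and $O = UV^\top$; then $SO = U\Sigma V^\top = M$. The NumPy sketch below uses an arbitrary random matrix. It checks the factorization and that $\nabla\varphi\circ\sigma$ reproduces $x \mapsto Mx$.

```python
import numpy as np

rng = np.random.default_rng(42)
d = 4
M = rng.standard_normal((d, d))  # almost surely invertible

# Left polar decomposition M = S O from the SVD M = U diag(s) V^T.
U, s, Vt = np.linalg.svd(M)
S = U @ np.diag(s) @ U.T   # symmetric positive definite (s > 0 since M is invertible)
O = U @ Vt                 # orthogonal

def grad_phi(y: np.ndarray) -> np.ndarray:
    # phi(y) = y^T S y / 2 is convex; its gradient is S y because S is symmetric.
    return S @ y

def sigma(x: np.ndarray) -> np.ndarray:
    # x -> O x preserves N(0, I_d): the density depends only on ||x||, and ||O x|| = ||x||.
    return O @ x

x = rng.standard_normal(d)
print(np.allclose(M, S @ O), np.allclose(M @ x, grad_phi(sigma(x))))  # True True
```

SciPy's `scipy.linalg.polar(M, side="left")` should return the same pair as `(u, p)` with `M = p @ u`. Computing it directly from the SVD keeps the sketch dependent on NumPy only.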
The result also makes it possible to recover the Helmholtz decomposition. Any smooth vector field $x \mapsto V(x)$ on $\Omega$ can be written in a unique way as
$$V = w + \nabla p$$
where $p$ is a smooth real function defined on $\Omega$, unique up to an additive constant, and $w$ is a smooth divergence-free vector field, parallel to the boundary of $\Omega$.
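A numerical version of this decomposition is simplest on a periodic domain, which we substitute here for the bounded $\Omega$ of the text. On the torus $[0,2\pi)^2$ there is no boundary condition to impose, and the constant (mean) part of $V$ is assigned to $w$.

Taking the divergence of $V = w + \nabla p$ gives $\Delta p = \nabla\cdot V$. The FFT solves this mode by mode: for each integer wavevector $k \neq 0$, $\widehat{\nabla p} = k\,(k\cdot\hat V)/\vert k\vert^2$ and $\hat w = \hat V - \widehat{\nabla p}$. The test field is a hypothetical sum of a known gradient and a known divergence-free part, so the recovered pieces can be compared with the truth.

```python
import numpy as np

N = 64
x = np.arange(N) * (2.0 * np.pi / N)
X, Y = np.meshgrid(x, x, indexing="ij")  # axis 0 <-> x, axis 1 <-> y

# Hypothetical test field V = w_true + grad p_true.
grad_p_true = np.stack([np.cos(X) * np.cos(2.0 * Y),          # d/dx of sin(x) cos(2y)
                        -2.0 * np.sin(X) * np.sin(2.0 * Y)])  # d/dy of sin(x) cos(2y)
# w = (d psi/dy, -d psi/dx) with psi = cos(x + y), plus a constant: divergence-free.
w_true = np.stack([0.5 - np.sin(X + Y), np.sin(X + Y)])
V = w_true + grad_p_true

k = np.fft.fftfreq(N, d=1.0 / N)   # integer wavenumbers for a 2*pi-periodic grid
KX, KY = np.meshgrid(k, k, indexing="ij")
K = np.stack([KX, KY])
K2 = KX**2 + KY**2
K2[0, 0] = 1.0                     # avoid 0/0; the k = 0 (mean) mode stays in w

V_hat = np.fft.fft2(V, axes=(1, 2))
grad_p_hat = K * (K * V_hat).sum(axis=0) / K2  # projection onto gradients
w_hat = V_hat - grad_p_hat                     # divergence-free remainder

grad_p = np.fft.ifft2(grad_p_hat, axes=(1, 2)).real
w = np.fft.ifft2(w_hat, axes=(1, 2)).real
div_w = np.fft.ifft2(1j * (K * w_hat).sum(axis=0)).real  # spectral divergence of w

print(np.abs(w - w_true).max() < 1e-10, np.abs(div_w).max() < 1e-10)  # True True
```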
The connection can be seen by assuming that $\mu$ is the Lebesgue measure on a compact set $\Omega \subset \mathbb{R}^n$ and by writing $\xi$ as a perturbation of the identity map
$$\xi_\epsilon(x) = x + \epsilon V(x)$$
where $\epsilon$ is small. The polar decomposition of $\xi_\epsilon$ is given by $\xi_\epsilon = (\nabla\varphi_\epsilon)\circ\sigma_\epsilon$. Then, for any test function $f : \mathbb{R}^n \to \mathbb{R}$, the following holds:
$$\int_\Omega f(x+\epsilon V(x))\,dx = \int_\Omega f\left((\nabla\varphi_\epsilon)\circ\sigma_\epsilon(x)\right)dx = \int_\Omega f(\nabla\varphi_\epsilon(x))\,dx$$
where the fact that $\sigma_\epsilon$ preserves the Lebesgue measure was used in the second equality.
Since $\xi_0$ is the identity, $\varphi_0(x) = \tfrac{1}{2}\Vert x\Vert^2$. One can therefore expand $\varphi_\epsilon(x) = \tfrac{1}{2}\Vert x\Vert^2 + \epsilon p(x) + O(\epsilon^2)$, so that $\nabla\varphi_\epsilon(x) = x + \epsilon\nabla p(x) + O(\epsilon^2)$. Expanding both sides of the previous identity to first order in $\epsilon$ and comparing the $\epsilon$ terms gives $\int_\Omega \left(V(x) - \nabla p(x)\right)\cdot\nabla f(x)\,dx = 0$ for any smooth function $f$. This implies that $w(x) = V(x) - \nabla p(x)$ is divergence-free.[1][6]
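The first-order step can be written out as follows. This is a sketch assuming that $f$, $V$ and $p$ are smooth, that the $O(\epsilon^2)$ remainders can be integrated over the compact set $\Omega$, and that $\partial\Omega$ is regular enough for the divergence theorem. Here $\hat n$ is a symbol we introduce for the outward unit normal. Taylor-expand each side of the identity $\int_\Omega f(x+\epsilon V(x))\,dx = \int_\Omega f(\nabla\varphi_\epsilon(x))\,dx$, then subtract and integrate by parts:

```latex
\begin{aligned}
\int_\Omega f(x+\epsilon V(x))\,dx
  &= \int_\Omega f(x)\,dx + \epsilon \int_\Omega \nabla f(x)\cdot V(x)\,dx + O(\epsilon^2),\\
\int_\Omega f(\nabla\varphi_\epsilon(x))\,dx
  &= \int_\Omega f(x)\,dx + \epsilon \int_\Omega \nabla f(x)\cdot \nabla p(x)\,dx + O(\epsilon^2),\\
\Longrightarrow\quad 0
  &= \int_\Omega \nabla f(x)\cdot w(x)\,dx
   = -\int_\Omega f(x)\,\nabla\cdot w(x)\,dx + \int_{\partial\Omega} f(x)\,w(x)\cdot\hat n(x)\,dS(x),
\end{aligned}
```

with $w = V - \nabla p$. Choosing $f$ compactly supported in the interior of $\Omega$ gives $\nabla\cdot w = 0$. The boundary term must then vanish for every $f$, which gives $w\cdot\hat n = 0$ on $\partial\Omega$, i.e. $w$ is parallel to the boundary, as stated for the Helmholtz decomposition.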