In mathematics, the matrix sign function is a matrix function on square matrices analogous to the complex sign function.[1]
It was introduced by J. D. Roberts in 1971, in a Cambridge University technical report, as a tool for model reduction and for solving Lyapunov and algebraic Riccati equations; the report was later published in a journal in 1980.[2][3]
The matrix sign function is a generalization of the complex signum function
{\displaystyle \operatorname {csgn} (z)={\begin{cases}1&{\text{if }}\mathrm {Re} (z)>0,\\-1&{\text{if }}\mathrm {Re} (z)<0,\end{cases}}}
to the matrix-valued analogue {\displaystyle \operatorname {csgn} (A)}. Although the sign function is not analytic, the matrix function is well defined for all matrices that have no eigenvalue on the imaginary axis; see, for example, the Jordan-form-based definition (where the derivatives are all zero).
Theorem: Let {\displaystyle A\in \mathbb {C} ^{n\times n}}, then {\displaystyle \operatorname {csgn} (A)^{2}=I}.[1]
Theorem: Let {\displaystyle A\in \mathbb {C} ^{n\times n}}, then {\displaystyle \operatorname {csgn} (A)} is diagonalizable and has eigenvalues that are {\displaystyle \pm 1}.[1]
Theorem: Let {\displaystyle A\in \mathbb {C} ^{n\times n}}, then {\displaystyle (I+\operatorname {csgn} (A))/2} is a projector onto the invariant subspace associated with the eigenvalues in the right half-plane, and analogously for {\displaystyle (I-\operatorname {csgn} (A))/2} and the left half-plane.[1]
Theorem: Let {\displaystyle A\in \mathbb {C} ^{n\times n}}, and let {\displaystyle A=P{\begin{bmatrix}J_{+}&0\\0&J_{-}\end{bmatrix}}P^{-1}} be a Jordan decomposition such that {\displaystyle J_{+}} corresponds to eigenvalues with positive real part and {\displaystyle J_{-}} to eigenvalues with negative real part. Then {\displaystyle \operatorname {csgn} (A)=P{\begin{bmatrix}I_{+}&0\\0&-I_{-}\end{bmatrix}}P^{-1}}, where {\displaystyle I_{+}} and {\displaystyle I_{-}} are identity matrices of sizes corresponding to {\displaystyle J_{+}} and {\displaystyle J_{-}}, respectively.[1]
The function can be computed with generic methods for matrix functions, but there are also specialized methods.
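For example, when {\displaystyle A} is diagonalizable the Jordan decomposition above reduces to an eigendecomposition, which can be evaluated directly. Below is a minimal NumPy sketch of this approach; the function name and test matrix are illustrative and not from the source.

```python
import numpy as np

def matrix_sign_eig(A):
    """csgn(A) via an eigendecomposition A = V diag(lam) V^{-1}.

    Assumes A is diagonalizable with no eigenvalues on the imaginary
    axis, so that csgn(A) = V diag(sign(Re(lam))) V^{-1}.
    """
    lam, V = np.linalg.eig(A)
    if np.any(np.isclose(lam.real, 0.0)):
        raise ValueError("csgn(A) undefined: eigenvalue on the imaginary axis")
    S = V @ np.diag(np.sign(lam.real)) @ np.linalg.inv(V)
    return S.real if np.allclose(S.imag, 0.0) else S

# Example: eigenvalues 3 and -2 lie on opposite sides of the imaginary axis
A = np.array([[3.0, 1.0],
              [0.0, -2.0]])
S = matrix_sign_eig(A)
print(np.allclose(S @ S, np.eye(2)))  # csgn(A)^2 = I
```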
The Newton iteration can be derived by observing that {\displaystyle \operatorname {csgn} (x)={\sqrt {x^{2}}}/x}, which in terms of matrices can be written as {\displaystyle \operatorname {csgn} (A)=A^{-1}{\sqrt {A^{2}}}}, where we use the matrix square root. If we apply the Babylonian method to compute the square root of the matrix {\displaystyle A^{2}}, that is, the iteration {\textstyle X_{k+1}={\frac {1}{2}}\left(X_{k}+A^{2}X_{k}^{-1}\right)}, and define the new iterate {\displaystyle Z_{k}=A^{-1}X_{k}}, we arrive at the iteration
{\displaystyle Z_{k+1}={\frac {1}{2}}\left(Z_{k}+Z_{k}^{-1}\right)},
where typically {\displaystyle Z_{0}=A}. Convergence is global, and locally it is quadratic.[1][2]
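A minimal NumPy sketch of this Newton iteration, assuming {\displaystyle A} has no eigenvalues on the imaginary axis; the stopping tolerance and iteration cap are illustrative choices.

```python
import numpy as np

def sign_newton(A, tol=1e-12, max_iter=100):
    """Matrix sign via Z_{k+1} = (Z_k + Z_k^{-1}) / 2, starting from Z_0 = A."""
    Z = np.array(A, dtype=np.result_type(A, 1.0))
    for _ in range(max_iter):
        Z_next = 0.5 * (Z + np.linalg.inv(Z))
        # Stop once the iterates have stabilized in relative Frobenius norm.
        if np.linalg.norm(Z_next - Z, 'fro') <= tol * np.linalg.norm(Z_next, 'fro'):
            return Z_next
        Z = Z_next
    return Z
```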
The Newton iteration uses the explicit inverse of the iterates {\displaystyle Z_{k}}.
To avoid the need for the explicit inverse used in the Newton iteration, the inverse can be approximated with one step of the Newton iteration for the inverse, {\displaystyle Z_{k}^{-1}\approx Z_{k}\left(2I-Z_{k}^{2}\right)}, derived by Schulz in 1933.[4] Substituting this approximation into the previous method, the new method becomes
{\displaystyle Z_{k+1}={\frac {1}{2}}Z_{k}\left(3I-Z_{k}^{2}\right)}.
Convergence is (still) quadratic, but only local (guaranteed for {\displaystyle \|I-A^{2}\|<1}).[1]
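An inverse-free sketch of this Newton–Schulz variant under the same caveats; since convergence is only local, {\displaystyle A} is assumed to be scaled so that {\displaystyle \|I-A^{2}\|<1}.

```python
import numpy as np

def sign_newton_schulz(A, tol=1e-12, max_iter=100):
    """Matrix sign via the inverse-free iteration Z_{k+1} = Z_k (3I - Z_k^2) / 2."""
    Z = np.array(A, dtype=np.result_type(A, 1.0))
    I = np.eye(Z.shape[0], dtype=Z.dtype)
    for _ in range(max_iter):
        Z_next = 0.5 * Z @ (3.0 * I - Z @ Z)
        if np.linalg.norm(Z_next - Z, 'fro') <= tol * np.linalg.norm(Z_next, 'fro'):
            return Z_next
        Z = Z_next
    return Z
```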
Theorem:[2][3] Let {\displaystyle A,B,C\in \mathbb {R} ^{n\times n}} and assume that {\displaystyle A} and {\displaystyle B} are stable, then the unique solution to the Sylvester equation, {\displaystyle AX+XB=C}, is given by {\displaystyle X} such that
{\displaystyle {\begin{bmatrix}-I&2X\\0&I\end{bmatrix}}=\operatorname {csgn} \left({\begin{bmatrix}A&-C\\0&-B\end{bmatrix}}\right).}
Proof sketch: The result follows from the similarity transform
{\displaystyle {\begin{bmatrix}A&-C\\0&-B\end{bmatrix}}={\begin{bmatrix}I&X\\0&I\end{bmatrix}}{\begin{bmatrix}A&0\\0&-B\end{bmatrix}}{\begin{bmatrix}I&X\\0&I\end{bmatrix}}^{-1},}
since
{\displaystyle \operatorname {csgn} \left({\begin{bmatrix}A&-C\\0&-B\end{bmatrix}}\right)={\begin{bmatrix}I&X\\0&I\end{bmatrix}}{\begin{bmatrix}-I&0\\0&I\end{bmatrix}}{\begin{bmatrix}I&-X\\0&I\end{bmatrix}},}
due to the stability of {\displaystyle A} and {\displaystyle B}.
The theorem is, naturally, also applicable to the Lyapunov equation. However, due to the structure, the Newton iteration simplifies to involving only inverses of {\displaystyle A} and {\displaystyle A^{T}}.
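A sketch of the resulting Sylvester solver; it evaluates the matrix sign with SciPy's scipy.linalg.signm (any of the iterations above would work equally well), and the stable test matrices are illustrative.

```python
import numpy as np
from scipy.linalg import signm

def solve_sylvester_sign(A, B, C):
    """Solve AX + XB = C for stable A and B via the matrix sign function."""
    n = A.shape[0]
    M = np.block([[A, -C],
                  [np.zeros_like(A), -B]])
    S = signm(M)            # equals [[-I, 2X], [0, I]]
    return 0.5 * S[:n, n:]  # X is half of the upper-right block

# Illustrative check against the defining equation
rng = np.random.default_rng(0)
A = -np.eye(3) + 0.1 * rng.standard_normal((3, 3))        # stable
B = -2.0 * np.eye(3) + 0.1 * rng.standard_normal((3, 3))  # stable
C = rng.standard_normal((3, 3))
X = solve_sylvester_sign(A, B, C)
print(np.allclose(A @ X + X @ B, C))
```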
There is a similar result applicable to the algebraic Riccati equation, {\displaystyle A^{H}P+PA-PFP+Q=0}.[1][2] Define {\displaystyle V,W\in \mathbb {C} ^{2n\times n}} as
{\displaystyle {\begin{bmatrix}V&W\end{bmatrix}}=\operatorname {csgn} \left({\begin{bmatrix}A^{H}&Q\\F&-A\end{bmatrix}}\right)-{\begin{bmatrix}I&0\\0&I\end{bmatrix}}.}
Under the assumption that {\displaystyle F,Q\in \mathbb {C} ^{n\times n}} are Hermitian and there exists a unique stabilizing solution, in the sense that {\displaystyle A-FP} is stable, that solution is given by the over-determined, but consistent, linear system
{\displaystyle VP=-W.}
Proof sketch: The similarity transform
{\displaystyle {\begin{bmatrix}A^{H}&Q\\F&-A\end{bmatrix}}={\begin{bmatrix}P&-I\\I&0\end{bmatrix}}{\begin{bmatrix}-(A-FP)&-F\\0&(A-FP)^{H}\end{bmatrix}}{\begin{bmatrix}P&-I\\I&0\end{bmatrix}}^{-1},}
and the stability of {\displaystyle A-FP} implies that
{\displaystyle \left(\operatorname {csgn} \left({\begin{bmatrix}A^{H}&Q\\F&-A\end{bmatrix}}\right)-{\begin{bmatrix}I&0\\0&I\end{bmatrix}}\right){\begin{bmatrix}P&-I\\I&0\end{bmatrix}}={\begin{bmatrix}P&-I\\I&0\end{bmatrix}}{\begin{bmatrix}0&Y\\0&-2I\end{bmatrix}},}
for some matrix {\displaystyle Y\in \mathbb {C} ^{n\times n}}.
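A sketch of the corresponding Riccati solve, again evaluating the sign with scipy.linalg.signm and solving the overdetermined but consistent system {\displaystyle VP=-W} by least squares; the test data is an illustrative LQR-type example.

```python
import numpy as np
from scipy.linalg import signm

def solve_riccati_sign(A, F, Q):
    """Stabilizing solution of A^H P + P A - P F P + Q = 0 via the sign function."""
    n = A.shape[0]
    H = np.block([[A.conj().T, Q],
                  [F, -A]])
    S = signm(H) - np.eye(2 * n)
    V, W = S[:, :n], S[:, n:]                    # [V  W] = csgn(H) - I
    P, *_ = np.linalg.lstsq(V, -W, rcond=None)   # solve the consistent system V P = -W
    return P

# Illustrative example: A = [[0,1],[0,0]], F = B B^T with B = [0,1]^T, Q = I
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
F = np.array([[0.0, 0.0],
              [0.0, 1.0]])
Q = np.eye(2)
P = solve_riccati_sign(A, F, Q)
print(np.allclose(A.T @ P + P @ A - P @ F @ P + Q, 0.0))  # Riccati residual
```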
The Denman–Beavers iteration for the square root of a matrix can be derived from the Newton iteration for the matrix sign function by noticing that {\displaystyle A-PIP=0} is a degenerate algebraic Riccati equation[3] and, by definition, a solution {\displaystyle P} is the square root of {\displaystyle A}.
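A minimal sketch of the Denman–Beavers iteration itself, assuming {\displaystyle A} has no eigenvalues on the closed negative real axis; the tolerance and iteration cap are illustrative.

```python
import numpy as np

def sqrtm_denman_beavers(A, tol=1e-12, max_iter=100):
    """Matrix square root via the Denman-Beavers iteration.

    Assumes A has no eigenvalues on the closed negative real axis;
    Y_k converges to A^{1/2} and Z_k to A^{-1/2}.
    """
    Y = np.array(A, dtype=np.result_type(A, 1.0))
    Z = np.eye(Y.shape[0], dtype=Y.dtype)
    for _ in range(max_iter):
        Y_next = 0.5 * (Y + np.linalg.inv(Z))
        Z_next = 0.5 * (Z + np.linalg.inv(Y))
        if np.linalg.norm(Y_next - Y, 'fro') <= tol * np.linalg.norm(Y_next, 'fro'):
            return Y_next
        Y, Z = Y_next, Z_next
    return Y

# Illustrative check
A = np.array([[4.0, 1.0],
              [0.0, 9.0]])
Y = sqrtm_denman_beavers(A)
print(np.allclose(Y @ Y, A))
```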