In mathematics, the matrix sign function is a matrix function on square matrices analogous to the complex sign function.[1]
It was introduced by J. D. Roberts in 1971, in a Cambridge University technical report, as a tool for model reduction and for solving Lyapunov and algebraic Riccati equations; the report was later published in a journal in 1980.[2][3]
The matrix sign function is a generalization of the complex signum function
{\displaystyle \operatorname {csgn} (z)={\begin{cases}1&{\text{if }}\mathrm {Re} (z)>0,\\-1&{\text{if }}\mathrm {Re} (z)<0,\end{cases}}}
to the matrix-valued analogue {\displaystyle \operatorname {csgn} (A)}. Although the sign function is not analytic, the matrix function is well defined for all matrices that have no eigenvalue on the imaginary axis; see, for example, the Jordan-form-based definition (where the derivatives are all zero).
Theorem: Let {\displaystyle A\in \mathbb {C} ^{n\times n}}, then {\displaystyle \operatorname {csgn} (A)^{2}=I}.[1]
Theorem: Let {\displaystyle A\in \mathbb {C} ^{n\times n}}, then {\displaystyle \operatorname {csgn} (A)} is diagonalizable and has eigenvalues that are {\displaystyle \pm 1}.[1]
Theorem: Let {\displaystyle A\in \mathbb {C} ^{n\times n}}, then {\displaystyle (I+\operatorname {csgn} (A))/2} is a projector onto the invariant subspace associated with the eigenvalues in the right half-plane, and analogously for {\displaystyle (I-\operatorname {csgn} (A))/2} and the left half-plane.[1]
Theorem: Let {\displaystyle A\in \mathbb {C} ^{n\times n}}, and let {\displaystyle A=P{\begin{bmatrix}J_{+}&0\\0&J_{-}\end{bmatrix}}P^{-1}} be a Jordan decomposition such that {\displaystyle J_{+}} corresponds to eigenvalues with positive real part and {\displaystyle J_{-}} to eigenvalues with negative real part. Then {\displaystyle \operatorname {csgn} (A)=P{\begin{bmatrix}I_{+}&0\\0&-I_{-}\end{bmatrix}}P^{-1}}, where {\displaystyle I_{+}} and {\displaystyle I_{-}} are identity matrices of sizes corresponding to {\displaystyle J_{+}} and {\displaystyle J_{-}}, respectively.[1]
The function can be computed with generic methods for matrix functions, but there are also specialized methods.
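For example, when {\displaystyle A} is diagonalizable the Jordan decomposition above reduces to an eigendecomposition, which can be evaluated directly. Below is a minimal NumPy sketch of this approach; the function name and test matrix are illustrative and not from the source.

```python
import numpy as np

def matrix_sign_eig(A):
    """csgn(A) via an eigendecomposition A = V diag(lam) V^{-1}.

    Assumes A is diagonalizable with no eigenvalues on the imaginary
    axis, so that csgn(A) = V diag(sign(Re(lam))) V^{-1}.
    """
    lam, V = np.linalg.eig(A)
    if np.any(np.isclose(lam.real, 0.0)):
        raise ValueError("csgn(A) undefined: eigenvalue on the imaginary axis")
    S = V @ np.diag(np.sign(lam.real)) @ np.linalg.inv(V)
    return S.real if np.allclose(S.imag, 0.0) else S

# Example: eigenvalues 3 and -2 lie on opposite sides of the imaginary axis
A = np.array([[3.0, 1.0],
              [0.0, -2.0]])
S = matrix_sign_eig(A)
print(np.allclose(S @ S, np.eye(2)))  # csgn(A)^2 = I
```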
The Newton iteration can be derived by observing that {\displaystyle \operatorname {csgn} (x)={\sqrt {x^{2}}}/x}, which in terms of matrices can be written as {\displaystyle \operatorname {csgn} (A)=A^{-1}{\sqrt {A^{2}}}}, where we use the matrix square root. If we apply the Babylonian method to compute the square root of the matrix {\displaystyle A^{2}}, that is, the iteration {\textstyle X_{k+1}={\frac {1}{2}}\left(X_{k}+A^{2}X_{k}^{-1}\right)}, and define the new iterate {\displaystyle Z_{k}=A^{-1}X_{k}}, we arrive at the iteration
{\displaystyle Z_{k+1}={\frac {1}{2}}\left(Z_{k}+Z_{k}^{-1}\right)},
where typically {\displaystyle Z_{0}=A}. Convergence is global, and locally it is quadratic.[1][2]
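A minimal NumPy sketch of this Newton iteration, assuming {\displaystyle A} has no eigenvalues on the imaginary axis; the stopping tolerance and iteration cap are illustrative choices.

```python
import numpy as np

def sign_newton(A, tol=1e-12, max_iter=100):
    """Matrix sign via Z_{k+1} = (Z_k + Z_k^{-1}) / 2, starting from Z_0 = A."""
    Z = np.array(A, dtype=np.result_type(A, 1.0))
    for _ in range(max_iter):
        Z_next = 0.5 * (Z + np.linalg.inv(Z))
        # Stop once the iterates have stabilized in relative Frobenius norm.
        if np.linalg.norm(Z_next - Z, 'fro') <= tol * np.linalg.norm(Z_next, 'fro'):
            return Z_next
        Z = Z_next
    return Z
```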
The Newton iteration uses the explicit inverse of the iterates {\displaystyle Z_{k}}.
To avoid the need for the explicit inverse used in the Newton iteration, the inverse can be approximated with one step of the Newton iteration for the inverse, {\displaystyle Z_{k}^{-1}\approx Z_{k}\left(2I-Z_{k}^{2}\right)}, derived by Schulz in 1933.[4] Substituting this approximation into the previous method, the new method becomes
{\displaystyle Z_{k+1}={\frac {1}{2}}Z_{k}\left(3I-Z_{k}^{2}\right)}.
Convergence is (still) quadratic, but only local (guaranteed for {\displaystyle \|I-A^{2}\|<1}).[1]
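An inverse-free sketch of this Newton–Schulz variant under the same caveats; since convergence is only local, {\displaystyle A} is assumed to be scaled so that {\displaystyle \|I-A^{2}\|<1}.

```python
import numpy as np

def sign_newton_schulz(A, tol=1e-12, max_iter=100):
    """Matrix sign via the inverse-free iteration Z_{k+1} = Z_k (3I - Z_k^2) / 2."""
    Z = np.array(A, dtype=np.result_type(A, 1.0))
    I = np.eye(Z.shape[0], dtype=Z.dtype)
    for _ in range(max_iter):
        Z_next = 0.5 * Z @ (3.0 * I - Z @ Z)
        if np.linalg.norm(Z_next - Z, 'fro') <= tol * np.linalg.norm(Z_next, 'fro'):
            return Z_next
        Z = Z_next
    return Z
```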
Theorem:[2][3] Let {\displaystyle A,B,C\in \mathbb {R} ^{n\times n}} and assume that {\displaystyle A} and {\displaystyle B} are stable, then the unique solution to the Sylvester equation, {\displaystyle AX+XB=C}, is given by {\displaystyle X} such that
{\displaystyle {\begin{bmatrix}-I&2X\\0&I\end{bmatrix}}=\operatorname {csgn} \left({\begin{bmatrix}A&-C\\0&-B\end{bmatrix}}\right).}
Proof sketch: The result follows from the similarity transform
{\displaystyle {\begin{bmatrix}A&-C\\0&-B\end{bmatrix}}={\begin{bmatrix}I&X\\0&I\end{bmatrix}}{\begin{bmatrix}A&0\\0&-B\end{bmatrix}}{\begin{bmatrix}I&X\\0&I\end{bmatrix}}^{-1},}
since
{\displaystyle \operatorname {csgn} \left({\begin{bmatrix}A&-C\\0&-B\end{bmatrix}}\right)={\begin{bmatrix}I&X\\0&I\end{bmatrix}}{\begin{bmatrix}-I&0\\0&I\end{bmatrix}}{\begin{bmatrix}I&-X\\0&I\end{bmatrix}},}
due to the stability of {\displaystyle A} and {\displaystyle B}.
The theorem is, naturally, also applicable to the Lyapunov equation. However, due to the structure, the Newton iteration simplifies to involving only inverses of {\displaystyle A} and {\displaystyle A^{T}}.
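A sketch of the resulting Sylvester solver; it evaluates the matrix sign with SciPy's scipy.linalg.signm (any of the iterations above would work equally well), and the stable test matrices are illustrative.

```python
import numpy as np
from scipy.linalg import signm

def solve_sylvester_sign(A, B, C):
    """Solve AX + XB = C for stable A and B via the matrix sign function."""
    n = A.shape[0]
    M = np.block([[A, -C],
                  [np.zeros_like(A), -B]])
    S = signm(M)            # equals [[-I, 2X], [0, I]]
    return 0.5 * S[:n, n:]  # X is half of the upper-right block

# Illustrative check against the defining equation
rng = np.random.default_rng(0)
A = -np.eye(3) + 0.1 * rng.standard_normal((3, 3))        # stable
B = -2.0 * np.eye(3) + 0.1 * rng.standard_normal((3, 3))  # stable
C = rng.standard_normal((3, 3))
X = solve_sylvester_sign(A, B, C)
print(np.allclose(A @ X + X @ B, C))
```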
There is a similar result applicable to the algebraic Riccati equation, {\displaystyle A^{H}P+PA-PFP+Q=0}.[1][2] Define {\displaystyle V,W\in \mathbb {C} ^{2n\times n}} as
{\displaystyle {\begin{bmatrix}V&W\end{bmatrix}}=\operatorname {csgn} \left({\begin{bmatrix}A^{H}&Q\\F&-A\end{bmatrix}}\right)-{\begin{bmatrix}I&0\\0&I\end{bmatrix}}.}
Under the assumption that {\displaystyle F,Q\in \mathbb {C} ^{n\times n}} are Hermitian and there exists a unique stabilizing solution, in the sense that {\displaystyle A-FP} is stable, that solution is given by the over-determined, but consistent, linear system
{\displaystyle VP=-W.}
Proof sketch: The similarity transform
{\displaystyle {\begin{bmatrix}A^{H}&Q\\F&-A\end{bmatrix}}={\begin{bmatrix}P&-I\\I&0\end{bmatrix}}{\begin{bmatrix}-(A-FP)&-F\\0&(A-FP)^{H}\end{bmatrix}}{\begin{bmatrix}P&-I\\I&0\end{bmatrix}}^{-1},}
and the stability of {\displaystyle A-FP} implies that
{\displaystyle \left(\operatorname {csgn} \left({\begin{bmatrix}A^{H}&Q\\F&-A\end{bmatrix}}\right)-{\begin{bmatrix}I&0\\0&I\end{bmatrix}}\right){\begin{bmatrix}P&-I\\I&0\end{bmatrix}}={\begin{bmatrix}P&-I\\I&0\end{bmatrix}}{\begin{bmatrix}0&Y\\0&-2I\end{bmatrix}},}
for some matrix {\displaystyle Y\in \mathbb {C} ^{n\times n}}.
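A sketch of the corresponding Riccati solve, again evaluating the sign with scipy.linalg.signm and solving the overdetermined but consistent system {\displaystyle VP=-W} by least squares; the test data is an illustrative LQR-type example.

```python
import numpy as np
from scipy.linalg import signm

def solve_riccati_sign(A, F, Q):
    """Stabilizing solution of A^H P + P A - P F P + Q = 0 via the sign function."""
    n = A.shape[0]
    H = np.block([[A.conj().T, Q],
                  [F, -A]])
    S = signm(H) - np.eye(2 * n)
    V, W = S[:, :n], S[:, n:]                    # [V  W] = csgn(H) - I
    P, *_ = np.linalg.lstsq(V, -W, rcond=None)   # solve the consistent system V P = -W
    return P

# Illustrative example: A = [[0,1],[0,0]], F = B B^T with B = [0,1]^T, Q = I
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
F = np.array([[0.0, 0.0],
              [0.0, 1.0]])
Q = np.eye(2)
P = solve_riccati_sign(A, F, Q)
print(np.allclose(A.T @ P + P @ A - P @ F @ P + Q, 0.0))  # Riccati residual
```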
The Denman–Beavers iteration for the square root of a matrix can be derived from the Newton iteration for the matrix sign function by noticing that {\displaystyle A-PIP=0} is a degenerate algebraic Riccati equation[3] and, by definition, a solution {\displaystyle P} is the square root of {\displaystyle A}.
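A minimal sketch of the Denman–Beavers iteration itself, assuming {\displaystyle A} has no eigenvalues on the closed negative real axis; the tolerance and iteration cap are illustrative.

```python
import numpy as np

def sqrtm_denman_beavers(A, tol=1e-12, max_iter=100):
    """Matrix square root via the Denman-Beavers iteration.

    Assumes A has no eigenvalues on the closed negative real axis;
    Y_k converges to A^{1/2} and Z_k to A^{-1/2}.
    """
    Y = np.array(A, dtype=np.result_type(A, 1.0))
    Z = np.eye(Y.shape[0], dtype=Y.dtype)
    for _ in range(max_iter):
        Y_next = 0.5 * (Y + np.linalg.inv(Z))
        Z_next = 0.5 * (Z + np.linalg.inv(Y))
        if np.linalg.norm(Y_next - Y, 'fro') <= tol * np.linalg.norm(Y_next, 'fro'):
            return Y_next
        Y, Z = Y_next, Z_next
    return Y

# Illustrative check
A = np.array([[4.0, 1.0],
              [0.0, 9.0]])
Y = sqrtm_denman_beavers(A)
print(np.allclose(Y @ Y, A))
```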