Singular value decomposition (SVD)

Theorem   Every (real) m×n matrix A can be written as a product $A = USV^T$. Here, U and V are orthogonal matrices, with U being m×m and V being n×n. S is m×n, and has the form
S = $\begin{bmatrix}
s_1 & 0 & \cdots & 0 & \cdots & 0 \\
0 & s_2 & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & & \vdots \\
0 & 0 & \cdots & s_r & \cdots & 0 \\
0 & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & \cdots & 0
\end{bmatrix}$
The diagonal entries are positive, and are ordered from greatest to least; r is the rank of A. The $s_k$'s are called the singular values of A.
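For readers who want to experiment, here is a minimal NumPy sketch (not part of the original notes) that computes the factorization for the 3×2 matrix used in the Example below and checks that $A = USV^T$. Note that np.linalg.svd returns the singular values as a vector, so S must be assembled as an m×n matrix before forming the product.

    import numpy as np

    A = np.array([[ 2.0, -2.0],
                  [ 1.0,  1.0],
                  [-2.0,  2.0]])

    U, s, Vt = np.linalg.svd(A)         # U is m x m, s holds the singular values, Vt is V^T
    S = np.zeros(A.shape)               # assemble the m x n matrix S
    S[:len(s), :len(s)] = np.diag(s)

    print(s)                            # [4.0, 1.414...], ordered greatest to least
    print(np.allclose(A, U @ S @ Vt))   # True: A = U S V^T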

Applications   Gilbert Strang has called this theorem the "Fundamental Theorem of Linear Algebra." The SVD contains very explicit information concerning everything one would want to know about a matrix.

  • Condition number   The condition number of a square, invertible matrix A is defined by
    $\mathrm{cond}(A) = s_1/s_n$.
    It measures the loss of accuracy in solving Ax = b: roughly $\log_{10}(\mathrm{cond}(A))$ significant digits are lost. For example, if b is known to 6 digits and $\mathrm{cond}(A) = 10^3$, then x is known only to about 6 - 3 = 3 digits. (See the numerical sketch after this list.)

  • Numerical rank   The rank of A is r, the number of (positive) singular values. The numerical rank of A is
    rank(A,t) = # of singular values greater than a tolerance t.
    Again, this is useful in working with problems having finite precision.

  • Least squares   The problem of minimizing $\| Ax - b \|$ is easily solved with the SVD. First we rewrite the problem using the SVD of A:
    $\| Ax - b\| = \| USV^Tx - UU^T b\| = \|U(Sz - c)\| = \|Sz - c\|$, where $z = V^Tx$ and $c = U^T b$; the last equality holds because multiplication by the orthogonal matrix U preserves lengths.

    Then we note that
    $\|Sz - c \|^2 = \sum_{k=1}^r (s_kz_k-c_k)^2 +\sum_{k=r+1}^m c_k^2$

    Choosing $z_k = c_k/s_k$ for $k = 1, \ldots, r$ and $z_k = 0$ for $k = r+1, \ldots, n$ then yields $\|Sz - c \|^2 = \sum_{k=r+1}^m c_k^2$, which not only solves the problem, but also gives the solution $x = Vz$ with smallest length $\| x \| = \|z\|$.
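The three applications above can be sketched numerically in a few lines of NumPy (not part of the original notes); the right-hand side b and the tolerance t below are illustrative choices:

    import numpy as np

    A = np.array([[ 2.0, -2.0],
                  [ 1.0,  1.0],
                  [-2.0,  2.0]])
    b = np.array([1.0, 2.0, 3.0])       # illustrative right-hand side

    U, s, Vt = np.linalg.svd(A)

    cond = s[0] / s[-1]                 # ratio of largest to smallest singular value
                                        # (this is cond(A) when A is square and invertible)

    t = 1e-10
    r = int(np.sum(s > t))              # numerical rank: singular values above the tolerance t

    # Minimum-norm least squares: z_k = c_k/s_k for k <= r, z_k = 0 otherwise, x = V z.
    c = U.T @ b
    z = np.zeros(A.shape[1])
    z[:r] = c[:r] / s[:r]
    x = Vt.T @ z                        # minimizes ||Ax - b|| with smallest ||x||

    print(cond, r, x)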

Finding the SVD of a matrix   The matrix S above is just the matrix of A relative to new bases for both the input and output spaces.

  • The input space basis   The matrix $A^TA$ is real and symmetric, so there is an orthonormal basis of eigenvectors relative to which it is diagonal. Let the eigenvalues of $A^TA$ be $z_1, \ldots, z_n$, and the corresponding basis of eigenvectors be $\{v_1, \ldots, v_n\}$. The eigenvalues are nonnegative, as this calculation shows:

    $A^TA v_k = z_k v_k$
    $v_k^T A^TA v_k = z_k\, v_k^T v_k$
    $\| A v_k \|^2 = z_k \| v_k \|^2$
    $\| A v_k \|^2 = z_k$    (since $\| v_k \| = 1$)

    This calculation also shows that a vector $v_k$ is in the null space of A if and only if it is an eigenvector corresponding to the eigenvalue 0. If r = rank(A), then "rank + nullity = # of columns" tells us that nullity(A) = n - r. Thus n - r of the eigenvalues are 0, and the remaining r eigenvalues are positive; list them as $z_1 \ge z_2 \ge \cdots \ge z_r > 0$. Our input basis is now chosen as $\{v_1, \ldots, v_r, v_{r+1}, \ldots, v_n\}$, numbered the same way as the eigenvalues. We now define the matrix V via
    $V = [\, v_1 \ \cdots \ v_r \ \ v_{r+1} \ \cdots \ v_n \,]$.

  • The output space basis   For $k = 1, \ldots, r$, let
    $u_k = A v_k / \| A v_k \| = A v_k / z_k^{1/2}$.
    We can also write this as the following equation:
    $A v_k = z_k^{1/2} u_k$.
    The $u_k$'s are orthonormal, for $k = 1, \ldots, r$. Again, we see this from these equations:
    $u_j^T u_k = v_j^T A^TA v_k / (z_j^{1/2} z_k^{1/2}) = z_k\, v_j^T v_k / (z_j^{1/2} z_k^{1/2})$.
    The orthonormality of the v's implies that the right side above is 0 unless j = k, in which case it is 1. Thus, the u's are orthonormal. Fill this set out with m - r vectors to form an orthonormal basis for the output space, $\mathbb{R}^m$. This gives us the basis $\{u_1, \ldots, u_r, u_{r+1}, \ldots, u_m\}$. As before, we define the m×m orthogonal matrix
    $U = [\, u_1 \ \cdots \ u_r \ \ u_{r+1} \ \cdots \ u_m \,]$.

  • The matrix of A relative to the new bases   We let S be the matrix of A relative to these new bases, computed via the formula for the matrix of a linear transformation:
    $S = [\, [Av_1]_U \ \ [Av_2]_U \ \cdots \ [Av_r]_U \ \ [Av_{r+1}]_U \ \cdots \ [Av_n]_U \,]$
    The $v_k$'s with $k = r+1, \ldots, n$ are all in the null space of A. Thus the last n - r of the vectors $Av_k$ are 0, and so are their coordinate columns relative to the basis of u's. The other columns we get from the definition of the $u_k$'s for $k = 1, \ldots, r$. The end result is this:

    $S = [\, [z_1^{1/2} u_1]_U \ \ [z_2^{1/2} u_2]_U \ \cdots \ [z_r^{1/2} u_r]_U \ \ [\,0\,]_U \ \cdots \ [\,0\,]_U \,]$

    $S = [\, z_1^{1/2} [u_1]_U \ \cdots \ z_r^{1/2} [u_r]_U \ \ 0 \ \cdots \ 0 \,]$

    $S = [\, z_1^{1/2} e_1 \ \cdots \ z_r^{1/2} e_r \ \ 0 \ \cdots \ 0 \,]$.

    If we let $s_k = z_k^{1/2}$ for $k = 1, \ldots, r$, we get the same S as the one given in the statement of the theorem. These $s_k$'s are the singular values of A. The matrix S is related to A via multiplication by change-of-basis matrices. The matrix U changes from new output to old output bases, and V changes from new input to old input bases. Since $V^T = V^{-1}$, we have that $V^T$ changes from old input to new input bases. In the end, this gives us $A = USV^T$.
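Putting the three steps together, here is a minimal NumPy sketch of this construction (not part of the original notes; the completion of the u's to a basis of $\mathbb{R}^m$ is done with a QR factorization, a convenience the notes do not mention):

    import numpy as np

    A = np.array([[ 2.0, -2.0],
                  [ 1.0,  1.0],
                  [-2.0,  2.0]])
    m, n = A.shape

    # Input basis: orthonormal eigenvectors of A^T A, eigenvalues ordered z_1 >= ... >= z_n >= 0.
    z, V = np.linalg.eigh(A.T @ A)      # eigh returns the eigenvalues in increasing order
    order = np.argsort(z)[::-1]
    z, V = z[order], V[:, order]

    r = int(np.sum(z > 1e-12))          # rank = number of positive eigenvalues
    s = np.sqrt(z[:r])                  # singular values s_k = z_k^(1/2)

    # Output basis: u_k = A v_k / s_k for k = 1, ..., r, then fill out to a basis of R^m.
    Ur = A @ V[:, :r] / s               # columns u_1, ..., u_r
    Q, _ = np.linalg.qr(Ur, mode='complete')
    U = np.hstack([Ur, Q[:, r:]])       # keep u_1, ..., u_r; append m - r orthonormal fill vectors

    # The matrix of A relative to the new bases.
    S = np.zeros((m, n))
    S[:r, :r] = np.diag(s)

    print(np.allclose(A, U @ S @ V.T))  # True: A = U S V^T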

Example   Consider the matrix
A = $\begin{bmatrix} 2 & -2 \\ 1 & 1 \\ -2 & 2 \end{bmatrix}$.
Here,
$A^TA = \begin{bmatrix} 9 & -7 \\ -7 & 9 \end{bmatrix}$.
The eigenvalues of this matrix are $z_1 = 16$ and $z_2 = 2$. The singular values are $s_1 = 4$ and $s_2 = 2^{1/2}$. We can immediately write out what S is. We have
S = $\begin{bmatrix} 4 & 0 \\ 0 & 2^{1/2} \\ 0 & 0 \end{bmatrix}$.
The eigenvector corresponding to 16 is $v_1 = 2^{-1/2}(1,-1)^T$, and the one corresponding to 2 is $v_2 = 2^{-1/2}(1,1)^T$. Hence, we see that
V = $\begin{bmatrix} 2^{-1/2} & 2^{-1/2} \\ -2^{-1/2} & 2^{-1/2} \end{bmatrix}$.
Next, we find the u's.

$u_1 = A v_1 / z_1^{1/2}$
$u_1 = 2^{-1/2} (4, 0, -4)^T / 4$
$u_1 = ( 2^{-1/2}, 0, -2^{-1/2} )^T$.

A similar calculation gives us
$u_2 = ( 0, 1, 0 )^T$.

We now have to add to these a "fill" vector
$u_3 = ( 2^{-1/2}, 0, 2^{-1/2} )^T$
to complete the new output basis. This finally yields
U = $\begin{bmatrix} 2^{-1/2} & 0 & 2^{-1/2} \\ 0 & 1 & 0 \\ -2^{-1/2} & 0 & 2^{-1/2} \end{bmatrix}$.
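The hand computation can be checked numerically; here is a minimal NumPy sketch (not part of the original notes) that verifies the eigenvalues of $A^TA$ and the factorization $A = USV^T$ with the matrices found above:

    import numpy as np

    r2 = np.sqrt(2.0)
    A = np.array([[ 2.0, -2.0],
                  [ 1.0,  1.0],
                  [-2.0,  2.0]])

    U = np.array([[ 1/r2, 0.0, 1/r2],
                  [ 0.0,  1.0, 0.0 ],
                  [-1/r2, 0.0, 1/r2]])
    S = np.array([[4.0, 0.0],
                  [0.0, r2 ],
                  [0.0, 0.0]])
    V = np.array([[ 1/r2, 1/r2],
                  [-1/r2, 1/r2]])

    print(np.linalg.eigvalsh(A.T @ A))  # eigenvalues of A^T A: [ 2. 16.]
    print(np.allclose(A, U @ S @ V.T))  # True: A = U S V^T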