CR and CAB, Rank Revealing Matrix Factorizations
The rank of a linear transformation is a fundamental concept in linear algebra, and matrix factorizations are fundamental tools in numerical linear algebra. Gil Strang's 2020 Vision of Linear Algebra seeks to introduce these notions early in an introductory linear algebra course.
The CR matrix factorization provides a view of rref, the reduced row echelon form, as a rank revealing matrix factorization. I discussed CR in a pair of posts in October. I now want to describe the CAB factorization, which uses rref twice in order to treat both rows and columns in the same way.
Gil and I are writing a review paper about these ideas, LU and CR Elimination. Here is a link to the current draft, LUCR_43.pdf.
CAB
This is a quick description of cab.
[C,W,B] = cab(A) returns C = A(:,cols), B = A(rows,:), and W = A(rows,cols) so that
A = C*inv(W)*B
[C,W,B,cols,rows] = cab(A) also returns the indices cols and rows.
All the elements of C, W and B are taken from the input matrix A itself; C is a subset of its columns, B is a subset of its rows, and W is the intersection of those rows and columns. In fact, the indices cols and rows are all this factorization actually computes. The column indices come from the reduced row echelon form, rref(A), and the row indices come from the rref of the transpose, rref(A').
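Here is a minimal sketch of how such a function might work, following the description above. It relies on the two-output form of rref, which returns the indices of the pivot columns. The actual cab may differ in details such as tolerances.

   function [C,W,B,cols,rows] = cab_sketch(A)
      % Minimal sketch of the CAB factorization.  Not the actual cab.
      [~,cols] = rref(A);     % pivot columns of A
      [~,rows] = rref(A');    % pivot columns of A', i.e. pivot rows of A
      C = A(:,cols);          % m-by-r, independent columns
      B = A(rows,:);          % r-by-n, independent rows
      W = A(rows,cols);       % r-by-r intersection, nonsingular
   end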
Rank
The fact that the number of linearly independent columns of a matrix is equal to the number of linearly independent rows is not trivial and requires proof. This number is the rank of the matrix and is the size of the nonsingular W.
W is r-by-r where rank(A) = r = length(cols) = length(rows).
If you try to find C, W and B with a W that is too small, then C*inv(W)*B will not be able to reproduce A. If W is too large, it will be singular. (An illustration follows the rank-two example below.)
Rank one
If A has rank one, then A = u*v' for some column vector u and row vector v'. In this case C is a multiple of u, B is a multiple of v', and W is a single nonzero element of A. Here is an example where the first row is deliberately zero.
A = (0:3)'*(4:6)
A =
     0     0     0
     4     5     6
     8    10    12
    12    15    18
format compact
[C,W,B] = cab(A)
C =
     0
     4
     8
    12
W =
     4
B =
     4     5     6
Rank two
In this example, even casual inspection will reveal the rank. Note that A is rectangular.
A = reshape(1:24,4,6)'
A =
     1     2     3     4
     5     6     7     8
     9    10    11    12
    13    14    15    16
    17    18    19    20
    21    22    23    24
[C,W,B,cols,rows] = cab(A)
C =
     1     2
     5     6
     9    10
    13    14
    17    18
    21    22
W =
     1     2
     5     6
B =
     1     2     3     4
     5     6     7     8
cols =
     1     2
rows =
     1     2
rank = length(cols)
rank = 2
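As promised, this matrix also illustrates why W must be exactly r-by-r. Here is a hand-built sketch; the indices are chosen by hand rather than by cab.

   % A 1-by-1 W gives only a rank-one approximation, not A itself.
   C1 = A(:,1);  W1 = A(1,1);  B1 = A(1,:);
   norm(A - C1*inv(W1)*B1)     % nonzero
   % A 3-by-3 W drawn from this rank-two matrix must be singular.
   det(A(1:3,1:3))             % zero, up to roundoff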
Full rank
If A is square and nonsingular, then C = B = W = A.
For example,
A = pascal(3)
A =
     1     1     1
     1     2     3
     1     3     6
[C,W,B] = cab(A)
C =
     1     1     1
     1     2     3
     1     3     6
W =
     1     1     1
     1     2     3
     1     3     6
B =
     1     1     1
     1     2     3
     1     3     6
Franklin magic square
The Franklin semimagic square is attributed to Benjamin Franklin in about 1752. A letter from Franklin to one of his colleagues is included in the Franklin papers referenced below.
All the rows and columns of Franklin's square have the required magic sum, but the diagonals do not, so the matrix isn't fully magic. However, many other interesting submatrices are magic, including bent diagonals and any eight elements arranged symmetrically about the center.
What is the rank of this 8-by-8 matrix?
A = franklin(8)
A =
    52    61     4    13    20    29    36    45
    14     3    62    51    46    35    30    19
    53    60     5    12    21    28    37    44
    11     6    59    54    43    38    27    22
    55    58     7    10    23    26    39    42
     9     8    57    56    41    40    25    24
    50    63     2    15    18    31    34    47
    16     1    64    49    48    33    32    17
It is hard to think of Franklin's square as a linear transformation, but cab can still compute its rank.
[C,W,B] = cab(A)
C =
    52    61     4
    14     3    62
    53    60     5
    11     6    59
    55    58     7
     9     8    57
    50    63     2
    16     1    64
W =
    52    61     4
    14     3    62
    53    60     5
B =
    52    61     4    13    20    29    36    45
    14     3    62    51    46    35    30    19
    53    60     5    12    21    28    37    44
So, the rank is only three.
Computing C*inv(W)*B introduces some roundoff error.
test = C*inv(W)*B;
displayer('test')
test =
   52.00   61.00    4.00   13.00   20.00   29.00   36.00   45.00
   14.00    3.00   62.00   51.00   46.00   35.00   30.00   19.00
   53.00   60.00    5.00   12.00   21.00   28.00   37.00   44.00
   11.00    6.00   59.00   54.00   43.00   38.00   27.00   22.00
   55.00   58.00    7.00   10.00   23.00   26.00   39.00   42.00
    9.00    8.00   57.00   56.00   41.00   40.00   25.00   24.00
   50.00   63.00    2.00   15.00   18.00   31.00   34.00   47.00
   16.00    1.00   64.00   49.00   48.00   33.00   32.00   17.00
Rounding test to the nearest integers reproduces Franklin's magic square exactly.
Backslash and forward slash
Evaluating C*inv(W)*B as (C/W)*B or C*(W\B) is usually more accurate. If A has integer elements, you might try round(C/W*B) or round(C*(W\B)).
This 5-by-5 matrix was deliberately constructed so that its rank deficiency is not obvious.
A = gallery(5)
A =
          -9          11         -21          63        -252
          70         -69         141        -421        1684
        -575         575       -1149        3451      -13801
        3891       -3891        7782      -23345       93365
        1024       -1024        2048       -6144       24572
[C,W,B] = cab(A)
C =
          -9          11         -21          63
          70         -69         141        -421
        -575         575       -1149        3451
        3891       -3891        7782      -23345
        1024       -1024        2048       -6144
W =
          -9          11         -21          63
          70         -69         141        -421
        -575         575       -1149        3451
        3891       -3891        7782      -23345
B =
          -9          11         -21          63        -252
          70         -69         141        -421        1684
        -575         575       -1149        3451      -13801
        3891       -3891        7782      -23345       93365
rank = size(W,1)
rank = 4
In this example, W is mildly ill-conditioned.
kappa = cond(W)
kappa = 2.4301e+04
There are three ways to recreate A.
invw = C*inv(W)*B;
back = (C/W)*B;
fore = C*(W\B);
The relative error with inversion is affected by the condition of W, while backslash and forward slash are almost five orders of magnitude more accurate.
format short e
relerr = [norm(A-invw) norm(A-fore) norm(A-back)]/norm(A)
relerr =
   2.1266e-13   4.7760e-18   3.7217e-17
Rounding any of the three to integers reproduces A exactly.
A = round(C*inv(W)*B)
A =
          -9          11         -21          63        -252
          70         -69         141        -421        1684
        -575         575       -1149        3451      -13801
        3891       -3891        7782      -23345       93365
        1024       -1024        2048       -6144       24572
Gauss-Jordan
The algorithm for computing the rref is Gauss-Jordan elimination, which has a dubious reputation in the numerical analysis community. After listening to me trying to characterize the algorithm's notoriety, Gil wrote in the introductory paragraphs of our survey that Gauss-Jordan is "expensive and numerically perilous".
The "expensive" part is easy to explain. The computation of the rref introduces zeros in a column both above and below the pivot element, while Gaussian elimination only zeros below the pivot. That's less work. We can measure computational complexity in terms of a fused multiply-add instruction, FMA, that multiplies two floating point values and then adds a third value to the result in one operation. For a large n-by-n system, Gauss-Jordan requires about (1/2) n^3 FMAs, while Gaussian elimination requires less, only (1/3) n^3 FMAs.
However, we don't intend CR and CAB to be used with large matrices. For our purposes, even using Gauss-Jordan twice is not expensive.
The "numerically perilous" portion of Gauss-Jordan's reputation is more subtle. It derives from solving a linear system A*x = b by computing the rref of an augmented matrix, [A b]. Gaussian elimination, on the other hand, finds triangular factors L and U and then computes x by back substitution. We could say that Gaussian elimination is partial elimination, while Gauss-Jordan is complete elimination. The two approaches are compared in the 1975 paper by Jim Wilkinson and Gwen Peters referenced below. Gaussian elimination almost always computes the solution of a nearby system, but the same cannot be said for Gauss-Jordan on the augmented matrix.
However, we are not using either form of elimination to actually compute the CAB factors; rref just selects indices. The matrices C, W and B from cab(A) are submatrices of A. They are certainly accurate. There are two potential numerical difficulties: W might be badly conditioned, or it might be the wrong size.
Condition
What if A is badly conditioned? Then W has to inherit that conditioning. Here is an example. Start with
n = 12; U = eye(n,n) - triu(ones(n,n),1)
U =
     1    -1    -1    -1    -1    -1    -1    -1    -1    -1    -1    -1
     0     1    -1    -1    -1    -1    -1    -1    -1    -1    -1    -1
     0     0     1    -1    -1    -1    -1    -1    -1    -1    -1    -1
     0     0     0     1    -1    -1    -1    -1    -1    -1    -1    -1
     0     0     0     0     1    -1    -1    -1    -1    -1    -1    -1
     0     0     0     0     0     1    -1    -1    -1    -1    -1    -1
     0     0     0     0     0     0     1    -1    -1    -1    -1    -1
     0     0     0     0     0     0     0     1    -1    -1    -1    -1
     0     0     0     0     0     0     0     0     1    -1    -1    -1
     0     0     0     0     0     0     0     0     0     1    -1    -1
     0     0     0     0     0     0     0     0     0     0     1    -1
     0     0     0     0     0     0     0     0     0     0     0     1
U is badly conditioned. cond(U,1) and cond(U,inf) are both equal to n*2^(n-1).
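A quick numerical check of that formula (a sketch; with n = 12 it gives 12*2^11 = 24576):

   [cond(U,1)  cond(U,inf)  n*2^(n-1)]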
Let
A = U'*U
A =
     1    -1    -1    -1    -1    -1    -1    -1    -1    -1    -1    -1
    -1     2     0     0     0     0     0     0     0     0     0     0
    -1     0     3     1     1     1     1     1     1     1     1     1
    -1     0     1     4     2     2     2     2     2     2     2     2
    -1     0     1     2     5     3     3     3     3     3     3     3
    -1     0     1     2     3     6     4     4     4     4     4     4
    -1     0     1     2     3     4     7     5     5     5     5     5
    -1     0     1     2     3     4     5     8     6     6     6     6
    -1     0     1     2     3     4     5     6     9     7     7     7
    -1     0     1     2     3     4     5     6     7    10     8     8
    -1     0     1     2     3     4     5     6     7     8    11     9
    -1     0     1     2     3     4     5     6     7     8     9    12
A is symmetric, positive definite and full rank. The determinant of A is equal to one. I don't see a neat formula for cond(A), but I know it does grow like 4^n.
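Here is a sketch checking these claims numerically. Since A = U'*U, the singular values of A are the squares of those of U, so cond(A) = cond(U)^2.

   d = det(A)        % 1 in exact arithmetic
   kappa = cond(A)
   growth = 4^n      % the claimed growth rate, for comparison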
Since A is full rank, its A = C*inv(W)*B factorization must be A = A*inv(A)*A.
[C,W,B] = cab(A); test = [isequal(C,A) isequal(W,A) isequal(B,A)]
test =
  1×3 logical array
   1   1   1
W is badly conditioned. inv(W) has large elements.
Winv = inv(W)
Winv =
  Columns 1 through 6
   1398102    699051    349526    174764     87384     43696
    699051    349526    174763     87382     43692     21848
    349526    174763     87382     43691     21846     10924
    174764     87382     43691     21846     10923      5462
     87384     43692     21846     10923      5462      2731
     43696     21848     10924      5462      2731      1366
     21856     10928      5464      2732      1366       683
     10944      5472      2736      1368       684       342
      5504      2752      1376       688       344       172
      2816      1408       704       352       176        88
      1536       768       384       192        96        48
      1024       512       256       128        64        32
  Columns 7 through 12
     21856     10944      5504      2816      1536      1024
     10928      5472      2752      1408       768       512
      5464      2736      1376       704       384       256
      2732      1368       688       352       192       128
      1366       684       344       176        96        64
       683       342       172        88        48        32
       342       171        86        44        24        16
       171        86        43        22        12         8
        86        43        22        11         6         4
        44        22        11         6         3         2
        24        12         6         3         2         1
        16         8         4         2         1         1
logWinv = floor(log2(Winv))
logWinv =
    20    19    18    17    16    15    14    13    12    11    10    10
    19    18    17    16    15    14    13    12    11    10     9     9
    18    17    16    15    14    13    12    11    10     9     8     8
    17    16    15    14    13    12    11    10     9     8     7     7
    16    15    14    13    12    11    10     9     8     7     6     6
    15    14    13    12    11    10     9     8     7     6     5     5
    14    13    12    11    10     9     8     7     6     5     4     4
    13    12    11    10     9     8     7     6     5     4     3     3
    12    11    10     9     8     7     6     5     4     3     2     2
    11    10     9     8     7     6     5     4     3     2     1     1
    10     9     8     7     6     5     4     3     2     1     1     0
    10     9     8     7     6     5     4     3     2     1     0     0
Follow the diagonal of logWinv from (n,n) to (1,1). The exponents increase by two at almost every step, so the elements of inv(W) grow by a factor of about four per step.
Winv(1,1) > 4^(n-2)
ans =
  logical
   1
Is A close to a matrix of lower rank? The answer is yes, but the lower rank approximation comes from the SVD.
CAB vs. SVD
The SVD is the ultimate rank revealing factorization, but it does not actually attempt to compute the rank.
[U,S,V] = svd(A)
computes a full set of singular values and vectors as if the matrix had full rank. The singular values, s = diag(S), are indexed in decreasing order,
s(1) >= s(2) >= ... >= s(r) >= s(r+1) >= ...
You can compare the singular values with a tolerance to obtain an approximate rank and factorization.
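For example, here is a sketch of the tolerance that MATLAB's rank function uses by default.

   s = svd(A);
   tol = max(size(A))*eps(s(1));   % default tolerance in rank
   r = sum(s > tol)                % number of significant singular values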
The choice of an effective rank is backed by the Eckart-Young-Mirsky theorem: If Ar is the approximation made by choosing the rank to be r,
Ar = U(:,1:r)*S(1:r,1:r)*V(:,1:r)'
the norm of the error is bounded by the first term that is omitted,
norm(A - Ar,2) <= s(r+1)
If the singular values decrease rapidly, an accurate low rank approximation is achievable with only a few terms.
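A short sketch, reusing gallery(5) from the earlier example, where the effective rank is r = 4:

   A = gallery(5);
   [U,S,V] = svd(A);
   s = diag(S);
   r = 4;
   Ar = U(:,1:r)*S(1:r,1:r)*V(:,1:r)';
   [norm(A - Ar)  s(r+1)]          % the error meets the bound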
On the other hand, CAB chooses bases from the columns and rows of the matrix itself. There is no Eckart-Young theorem for the CR or CAB factorizations.
We intend CR and CAB to be used in the classroom on small matrices, usually with integer entries. We will let the SVD and randomized matrix factorizations handle the big stuff.
References
Benjamin Franklin, The Papers of Benjamin Franklin, Volume 4, 1750-53, To Peter Collinson. (American Philosophical Society and Yale University) p. 392. https://franklinpapers.org/framedVolumes.jsp?vol=4&page=392a.
G. Peters and J. H. Wilkinson, "On the stability of Gauss-Jordan elimination with pivoting", Communications of the ACM 18, 1975, pp. 20-24. https://dl.acm.org/doi/10.1145/360569.360653, also at http://www.stat.uchicago.edu/~lekheng/courses/302/gauss-jordan2.pdf
Postscript
Ben Franklin can also supply the graphic for today's post.
A = franklin(8); [~,~,B] = cab(A); plot(B)