# CR and CAB, Rank Revealing Matrix Factorizations

The rank of a linear transformation is a fundamental concept in linear algebra and matrix factorizations are fundamental concepts in numerical linear algebra. Gil Strang's 2020 Vision of Linear Algebra seeks to introduce these notions early in an introductory linear algebra course.

The CR matrix factorization provides a view of rref, the reduced row echelon form, as a rank revealing matrix factorization. I discussed CR in a pair of posts in October. I now want to describe the CAB factorization, which uses rref twice in order to treat both rows and columns in the same way.

Gil and I are writing a review paper about these ideas, LU and CR Elimination. Here is a link to the current draft, LUCR_43.pdf.

### Contents

#### CAB

This is a quick description of cab.

  [C,W,B] = cab(A) returns C = A(:,cols), B = A(rows,:),
and W = A(rows,cols) so that
      A = C*inv(W)*B
  [C,W,B,cols,rows] = cab(A) also returns the indices cols and rows.

All the elements of C, W and B are taken from the input matrix A itself; C is some of its columns, B is some of its rows, and W is the set intersection of C and B. Actually, cols and rows are all this factorization computes. The column indices come from the reduced row echelon form, rref(A), and the row indices from the transpose, rref(A').

#### Rank

The fact that the number of linearly independent columns of a matrix is equal to the number of linearly independent rows is not trivial and requires proof. This number is the rank of the matrix and is the size of the nonsingular W.

  W is r-by-r where rank(A) = r = length(cols) = length(rows).

If you try to find C, W and R with a W that is too small, then C*inv(W)*R will not be able to reproduce A. If W is too large, it will be singular.

#### Rank one

If A has rank one, A = u*v' for column vector u and row vector v'. Then C is a multiple of u, B is a multiple of v', and W is the first nonzero element in u or v. Here is an example where the first row is deliberately zero.

   A = (0:3)'*(4:6)

A =
0     0     0
4     5     6
8    10    12
12    15    18

   format compact
[C,W,B] = cab(A)

C =
0
4
8
12
W =
4
B =
4     5     6


#### Rank two

In this example, even casual inspection will reveal the rank. Note that A is rectangular.

   A = reshape(1:24,4,6)'

A =
1     2     3     4
5     6     7     8
9    10    11    12
13    14    15    16
17    18    19    20
21    22    23    24

   [C,W,B,cols,rows] = cab(A)

C =
1     2
5     6
9    10
13    14
17    18
21    22
W =
1     2
5     6
B =
1     2     3     4
5     6     7     8
cols =
1     2
rows =
1     2

   rank = length(cols)

rank =
2


#### Full rank

  If A is square and nonsingular, then C = B = W = A.

For example,

   A = pascal(3)

A =
1     1     1
1     2     3
1     3     6

   [C,W,B] = cab(A)

C =
1     1     1
1     2     3
1     3     6
W =
1     1     1
1     2     3
1     3     6
B =
1     1     1
1     2     3
1     3     6


#### Franklin magic square

The Franklin semimagic square is attributed to Benjamin Franklin in about 1752. A letter from Franklin to one of his colleagues is included in the Franklin papers referenced below.

All the rows and columns of Franklin's square have the required magic sum, but the diagonals do not, so the matrix isn't fully magic. However, many other interesting submatrices are magic, including bent diagonals and any eight elements arranged symmetrically about the center.

What is the rank of this 8-by-8 matrix?

    A = franklin(8)

A =
52    61     4    13    20    29    36    45
14     3    62    51    46    35    30    19
53    60     5    12    21    28    37    44
11     6    59    54    43    38    27    22
55    58     7    10    23    26    39    42
9     8    57    56    41    40    25    24
50    63     2    15    18    31    34    47
16     1    64    49    48    33    32    17


It is hard to think of Franklin's square as a linear transformation, but cab can still compute its rank.

   [C,W,B] = cab(A)

C =
52    61     4
14     3    62
53    60     5
11     6    59
55    58     7
9     8    57
50    63     2
16     1    64
W =
52    61     4
14     3    62
53    60     5
B =
52    61     4    13    20    29    36    45
14     3    62    51    46    35    30    19
53    60     5    12    21    28    37    44


So, the rank is only three.

Computing C*inv(W)*B introduces some roundoff error.

   test = C*inv(W)*B;
displayer('test')

test =
52.00  61.00   4.00  13.00  20.00  29.00  36.00  45.00
14.00   3.00  62.00  51.00  46.00  35.00  30.00  19.00
53.00  60.00   5.00  12.00  21.00  28.00  37.00  44.00
11.00   6.00  59.00  54.00  43.00  38.00  27.00  22.00
55.00  58.00   7.00  10.00  23.00  26.00  39.00  42.00
9.00   8.00  57.00  56.00  41.00  40.00  25.00  24.00
50.00  63.00   2.00  15.00  18.00  31.00  34.00  47.00
16.00   1.00  64.00  49.00  48.00  33.00  32.00  17.00


Rounding test to the nearest integers reproduces Franklin's magic square exactly.

#### Backslash and forward slash

  Computing C*inv(W)*B by (C/W)*B or C*(W\B) is usually more accurate.
If A has integer elements, you might try round(C/W*B) or round(C*(W\B)).

This 5-by-5 matrix was deliberately constructed so that its rank deficiency is not obvious.

   A  = gallery(5)

A =
-9          11         -21          63        -252
70         -69         141        -421        1684
-575         575       -1149        3451      -13801
3891       -3891        7782      -23345       93365
1024       -1024        2048       -6144       24572

   [C,W,B] = cab(A)

C =
-9          11         -21          63
70         -69         141        -421
-575         575       -1149        3451
3891       -3891        7782      -23345
1024       -1024        2048       -6144
W =
-9          11         -21          63
70         -69         141        -421
-575         575       -1149        3451
3891       -3891        7782      -23345
B =
-9          11         -21          63        -252
70         -69         141        -421        1684
-575         575       -1149        3451      -13801
3891       -3891        7782      -23345       93365

   rank = size(W,1)

rank =
4


In this example, W is mildly ill-conditioned.

   kappa = cond(W)

kappa =
2.4301e+04


There are three ways to recreate A.

   invw = C*inv(W)*B;
back = (C/W) * B;
fore = C * (W\B);


The relative error with inversion is affected by the condition of W, while backslash and forward slash are almost five orders of magnitude more accurate,

   format short e
relerr = [norm(A-invw) norm(A-fore) norm(A-back)]/norm(A)

relerr =
2.1266e-13   4.7760e-18   3.7217e-17


Rounding any of the three to integers reproduces A exactly.

   A = round(C*inv(W)*B)

A =
-9          11         -21          63        -252
70         -69         141        -421        1684
-575         575       -1149        3451      -13801
3891       -3891        7782      -23345       93365
1024       -1024        2048       -6144       24572


#### Gauss-Jordan

The algorithm for computing the rref is Gauss-Jordan elimination, which has a dubious reputation in the numerical analysis community. After listening to me trying to characterize the algorithm's notoriety, Gil wrote in the introductory paragraphs of our survey that Gauss-Jordan is "expensive and numerically perilous".

The "expensive" part is easy to explain. The computation of the rref introduces zeros in a column both above and below the pivot element, while Gaussian elimination only zeros below the pivot. That's less work. We can measure computational complexity in terms of a fused multiply-add instruction, FMA, that multiplies two floating point values and then adds a third value to the result in one operation. For a large n-by-n system, Gauss-Jordan requires about (1/2) n^3 FMAs, while Gaussian elimination requires less, only (1/3) n^3 FMAs.

However, we don't intend CR and CAB to be used with large matrices. For our purposes, even using Gauss-Jordan twice is not expensive.

The "numerically perilous" portion of Gauss-Jordan's reputation is more subtle. It derives from solving a linear system A*x = b by computing the rref of an augmented matrix, [A b]. Gaussian elimination, on the other hand, finds triangular factors L and U and then computes x by back substitution. We could say that Gaussian elimination is partial elimination, while Gauss-Jordan is complete elimination. The two approaches are compared in the 1975 paper by Jim Wilkinson and Gwen Peters referenced below. Gaussian elimination almost always computes the solution of a nearby system, but the same cannot be said for Gauss-Jordan on the augmented matrix.

However, we are not using either form of elimination to actually compute the CAB factors; rref just selects indices. The matrices C, W and B from CAB(A) are submatrices of A. They are certainly accurate. There are two potential numerical difficulties -- W might be badly conditioned or it might be the wrong size.

#### Condition

What if A is badly conditioned? Then W has to inherit that conditioning. Here is an example. Start with

   n = 12;
U = eye(n,n) - triu(ones(n,n),1)

U =
1    -1    -1    -1    -1    -1    -1    -1    -1    -1    -1    -1
0     1    -1    -1    -1    -1    -1    -1    -1    -1    -1    -1
0     0     1    -1    -1    -1    -1    -1    -1    -1    -1    -1
0     0     0     1    -1    -1    -1    -1    -1    -1    -1    -1
0     0     0     0     1    -1    -1    -1    -1    -1    -1    -1
0     0     0     0     0     1    -1    -1    -1    -1    -1    -1
0     0     0     0     0     0     1    -1    -1    -1    -1    -1
0     0     0     0     0     0     0     1    -1    -1    -1    -1
0     0     0     0     0     0     0     0     1    -1    -1    -1
0     0     0     0     0     0     0     0     0     1    -1    -1
0     0     0     0     0     0     0     0     0     0     1    -1
0     0     0     0     0     0     0     0     0     0     0     1


U is badly conditioned. cond(U,1) and cond(U,inf) are both equal to n*2^(n-1).

Let

   A = U'*U

A =
1    -1    -1    -1    -1    -1    -1    -1    -1    -1    -1    -1
-1     2     0     0     0     0     0     0     0     0     0     0
-1     0     3     1     1     1     1     1     1     1     1     1
-1     0     1     4     2     2     2     2     2     2     2     2
-1     0     1     2     5     3     3     3     3     3     3     3
-1     0     1     2     3     6     4     4     4     4     4     4
-1     0     1     2     3     4     7     5     5     5     5     5
-1     0     1     2     3     4     5     8     6     6     6     6
-1     0     1     2     3     4     5     6     9     7     7     7
-1     0     1     2     3     4     5     6     7    10     8     8
-1     0     1     2     3     4     5     6     7     8    11     9
-1     0     1     2     3     4     5     6     7     8     9    12


A is symmetric, positive definite and full rank. The determinant of A is equal to one. I don't see a neat formula for cond(A), but I know it does grow like 4^n.

Since A is full rank, its A = C*inv(W)*B factorization must be A = A*inv(A)*A.

   [C,W,B] = cab(A);
test = [isequal(C,A) isequal(W,A) isequal(B,A)]

test =
1×3 logical array
1   1   1


W is badly conditioned. inv(W) has large elements.

   Winv = inv(W)

Winv =
Columns 1 through 6
1398102      699051      349526      174764       87384       43696
699051      349526      174763       87382       43692       21848
349526      174763       87382       43691       21846       10924
174764       87382       43691       21846       10923        5462
87384       43692       21846       10923        5462        2731
43696       21848       10924        5462        2731        1366
21856       10928        5464        2732        1366         683
10944        5472        2736        1368         684         342
5504        2752        1376         688         344         172
2816        1408         704         352         176          88
1536         768         384         192          96          48
1024         512         256         128          64          32
Columns 7 through 12
21856       10944        5504        2816        1536        1024
10928        5472        2752        1408         768         512
5464        2736        1376         704         384         256
2732        1368         688         352         192         128
1366         684         344         176          96          64
683         342         172          88          48          32
342         171          86          44          24          16
171          86          43          22          12           8
86          43          22          11           6           4
44          22          11           6           3           2
24          12           6           3           2           1
16           8           4           2           1           1

   logWinv = floor(log2(Winv))

logWinv =
20    19    18    17    16    15    14    13    12    11    10    10
19    18    17    16    15    14    13    12    11    10     9     9
18    17    16    15    14    13    12    11    10     9     8     8
17    16    15    14    13    12    11    10     9     8     7     7
16    15    14    13    12    11    10     9     8     7     6     6
15    14    13    12    11    10     9     8     7     6     5     5
14    13    12    11    10     9     8     7     6     5     4     4
13    12    11    10     9     8     7     6     5     4     3     3
12    11    10     9     8     7     6     5     4     3     2     2
11    10     9     8     7     6     5     4     3     2     1     1
10     9     8     7     6     5     4     3     2     1     1     0
10     9     8     7     6     5     4     3     2     1     0     0


Follow the diagonal of logWinv from (n,n) to (1,1).

   Winv(1,1) > 4^(n-2)

ans =
logical
1


Is A close to a matrix of lower rank? The answer is yes, but the lower rank approximation comes from the SVD.

#### CAB vs. SVD

The SVD is the ultimate rank revealing factorization, but it does not actually attempt to compute the rank.

  [U,S,V] = svd(A)

computes a full set of singular values and vectors as if the matrix had full rank. The singular values, s = diag(S), are indexed in decreasing order,

s(1) >=  . . . >= s(r) >= s(r+1) . . .


You can compare the singular values with a tolerance to obtain an approximate rank and factorization.

The choice of an effective rank is backed by the Eckart-Young-Mirsky theorem: If Ar is the approximation made by choosing the rank to be r,

Ar = U(:,1:r)*S(1:r,1:r)*V(:,1:r)'


the norm of the error is bounded by the first term that is omitted,

norm(A - Ar,2) <= s(r+1)


If the singular values decrease rapidly, an accurate low rank approximation is achievable with only a few terms.

On the other hand, CAB chooses bases from the columns and rows of a matrix itself. There is no Eckert-Young theorem for the CR or CAB factorizations.

We intend CR and CAB to be used in the classroom on small matrices, usually with integer entries. We will let the SVD and randomized matrix factorizations handle the big stuff.

#### References

Benjamin Franklin, The Papers of Benjamin Franklin, Volume 44, 1750-53, To Peter Collinson. (American Philosophical Society and Yale University) p. 392. https://franklinpapers.org/framedVolumes.jsp?vol=4&page=392a.

G. Peters and J. H. Wilkinson, "On the stability of Gauss-Jordan elimination with pivoting", Communications ACM 18, 1975, p. 20-24. https://dl.acm.org/doi/10.1145/360569.360653, also at http://www.stat.uchicago.edu/~lekheng/courses/302/gauss-jordan2.pdf

#### Postscript

Ben Franklin can also supply the graphic for today's post.

    A = franklin(8);
[~,~,B] = cab(A);
plot(B) Published with MATLAB® R2021a

|