Loren on the Art of MATLAB

March 1st, 2007

Creating Sparse Finite-Element Matrices in MATLAB

I'm pleased to introduce Tim Davis as this week's guest blogger. Tim is a professor at the University of Florida, and is the author or co-author of many of our sparse matrix functions (lu, chol, much of sparse backslash, ordering methods such as amd and colamd, and other functions such as etree and symbfact). He is also the author of a recent book, Direct Methods for Sparse Linear Systems, published by SIAM, where more details of MATLAB sparse matrices are discussed ( http://www.cise.ufl.edu/~davis ).

Contents

MATLAB is Slow? Only If You Use It Incorrectly

From time to time, I hear comments such as "MATLAB is slow for large finite-element problems." When I look at the details, the problem is typically due to an overuse of A(i,j)= ... when creating the sparse matrix (assembling the finite elements). This can be seen in typical user's code, MATLAB code in books on the topic, and even in MATLAB itself. The problem is widespread. MATLAB can be very fast for finite-element problems, but not if it's used incorrectly.

How Not to Create a Finite-Element Matrix

A good example of what not to do can be found in the wathen.m function, in MATLAB. A=gallery('wathen',200,200) takes a huge amount of time; a very minor modification cuts the run time drastically. I'm not intending to single out this one function for critique; this is a very common issue that I see over and over again. This function is built into MATLAB, which makes it an accessible example. It was first written when MATLAB did not support sparse matrices, and was modified only slightly to exploit sparsity. It was never meant for generating large sparse finite-element matrices. You can see the entire function with the command:

  type private/wathen.m

Below is an excerpt of the relevant parts of wathen.m. The function wathen1.m creates a finite-element matrix of an nx-by-ny mesh. Each i loop creates a single 8-by-8 finite-element matrix, and adds it into A.

type wathen1
tic
A = wathen1 (200,200) ;
toc
function A = wathen1 (nx,ny)
rand('state',0)
e1 = [6 -6 2 -8;-6 32 -6 20;2 -6 6 -6;-8 20 -6 32];
e2 = [3 -8 2 -6;-8 16 -8 20;2 -8 3 -8;-6 20 -8 16];
e = [e1 e2; e2' e1]/45;
n = 3*nx*ny+2*nx+2*ny+1;
A = sparse(n,n);
RHO = 100*rand(nx,ny);
nn = zeros(8,1);
for j=1:ny
    for i=1:nx
        nn(1) = 3*j*nx+2*i+2*j+1;
        nn(2) = nn(1)-1;
        nn(3) = nn(2)-1;
        nn(4) = (3*j-1)*nx+2*j+i-1;
        nn(5) = 3*(j-1)*nx+2*i+2*j-3;
        nn(6) = nn(5)+1;
        nn(7) = nn(6)+1;
        nn(8) = nn(4)+1;
        em = e*RHO(i,j);
        for krow=1:8
            for kcol=1:8
                A(nn(krow),nn(kcol)) = A(nn(krow),nn(kcol))+em(krow,kcol);
            end
        end
    end
end

Elapsed time is 305.832709 seconds.

What's Wrong with It?

The above code is fine for generating a modest sized matrix, but the A(i,j) = ... statement is quite slow when A is large and sparse, particularly when i and j are also scalars. The inner two for-loops can be vectorized so that i and j are vectors of length 8. Each A(i,j) = ... statement is assembling an entire finite-element matrix into A. However, this leads to very minimal improvement in run time.

type wathen1b
tic
A1b = wathen1b (200,200) ;
toc
function A = wathen1b (nx,ny)
rand('state',0)
e1 = [6 -6 2 -8;-6 32 -6 20;2 -6 6 -6;-8 20 -6 32];
e2 = [3 -8 2 -6;-8 16 -8 20;2 -8 3 -8;-6 20 -8 16];
e = [e1 e2; e2' e1]/45;
n = 3*nx*ny+2*nx+2*ny+1;
A = sparse(n,n);
RHO = 100*rand(nx,ny);
nn = zeros(8,1);
for j=1:ny
    for i=1:nx
        nn(1) = 3*j*nx+2*i+2*j+1;
        nn(2) = nn(1)-1;
        nn(3) = nn(2)-1;
        nn(4) = (3*j-1)*nx+2*j+i-1;
        nn(5) = 3*(j-1)*nx+2*i+2*j-3;
        nn(6) = nn(5)+1;
        nn(7) = nn(6)+1;
        nn(8) = nn(4)+1;
        em = e*RHO(i,j);
        A (nn,nn) = A (nn,nn) + em ;
    end
end

Elapsed time is 282.945593 seconds.
disp (norm (A-A1b,1))
     0

How MATLAB Stores Sparse Matrices

To understand why the above examples are so slow, you need to understand how MATLAB stores its sparse matrices. An n-by-n MATLAB sparse matrix is stored as three arrays; I'll call them p, i, and x. These three arrays are not directly accessible from M, but they can be accessed by a mexFunction. The nonzero entries in column j are stored in i(p(j):p(j+1)-1) and x(p(j):p(j+1)-1), where x holds the numerical values and i holds the corresponding row indices. Below is a very small example. First, I create a full matrix and convert it into a sparse one. This is only so that you can easily see the matrix C and how it's stored in sparse form. You should never create a sparse matrix this way, except for tiny examples.

C = [
4.5 0.0 3.2 0.0
3.1 2.9 0.0 0.9
0.0 1.7 3.0 0.0
3.5 0.4 0.0 1.0 ] ;
C = sparse (C)
C =
   (1,1)       4.5000
   (2,1)       3.1000
   (4,1)       3.5000
   (2,2)       2.9000
   (3,2)       1.7000
   (4,2)       0.4000
   (1,3)       3.2000
   (3,3)       3.0000
   (2,4)       0.9000
   (4,4)       1.0000

Notice that the nonzero entries in C are stored in column order, with sorted row indices. The internal p, i, and x arrays can be reconstructed as follows. The find(C) statement returns a list of "triplets," where the kth triplet is i(k), j(k), and x(k). This specifies that C(i(k),j(k)) is equal to x(k). Next, find(diff(...)) constructs the column pointer array p (this only works if there are no all-zero columns in the matrix).

[i j x] = find (C) ;
n = size(C,2) ;
p = find (diff ([0 ; j ; n+1])) ;
for col = 1:n
    fprintf ('column %d:\n    k      row index     value\n', col) ;
    disp ([(p(col):p(col+1)-1)' i(p(col):p(col+1)-1) x(p(col):p(col+1)-1)])
end
column 1:
    k      row index     value
    1.0000    1.0000    4.5000
    2.0000    2.0000    3.1000
    3.0000    4.0000    3.5000
column 2:
    k      row index     value
    4.0000    2.0000    2.9000
    5.0000    3.0000    1.7000
    6.0000    4.0000    0.4000
column 3:
    k      row index     value
    7.0000    1.0000    3.2000
    8.0000    3.0000    3.0000
column 4:
    k      row index     value
    9.0000    2.0000    0.9000
   10.0000    4.0000    1.0000

Now consider what happens when one entry is added to C:

C(3,1) = 42 ;

[i j x] = find (C) ;
n = size(C,2) ;
p = find (diff ([0 ; j ; n+1])) ;
for col = 1:n
    fprintf ('column %d:\n    k      row index     value\n', col) ;
    disp ([(p(col):p(col+1)-1)' i(p(col):p(col+1)-1) x(p(col):p(col+1)-1)])
end
column 1:
    k      row index     value
    1.0000    1.0000    4.5000
    2.0000    2.0000    3.1000
    3.0000    3.0000   42.0000
    4.0000    4.0000    3.5000
column 2:
    k      row index     value
    5.0000    2.0000    2.9000
    6.0000    3.0000    1.7000
    7.0000    4.0000    0.4000
column 3:
    k      row index     value
    8.0000    1.0000    3.2000
    9.0000    3.0000    3.0000
column 4:
    k      row index     value
   10.0000    2.0000    0.9000
   11.0000    4.0000    1.0000

and you can see that nearly every entry in C has been moved down by one in the i and x arrays. In general, the single statement C(3,1)=42 takes time proportional to the number of entries in matrix. Thus, looping nnz(A) times over the statement A(i,j)=A(i,j)+... takes time proportional to nnz(A)^2.

A Better Way to Create a Finite-Element Matrix

The version below is only slightly different. It could be improved, but I left it nearly the same to illustrate how simple it is to write fast MATLAB code to solve this problem, via a minor tweak. The idea is to create a list of triplets, and let MATLAB convert them into a sparse matrix all at once. If there are duplicates (which a finite-element matrix always has) the duplicates are summed, which is exactly what you want when assembling a finite-element matrix. In MATLAB 7.3 (R2006b), sparse uses quicksort, which takes nnz(A)*log(nnz(A)) time. This is slower than it could be (a linear-time bucket sort can be used, taking essentially nnz(A) time). However, it's still much faster than nnz(A)^2. For this matrix, nnz(A) is about 1.9 million.

type wathen2.m
tic
A2 = wathen2 (200,200) ;
toc
function A = wathen2 (nx,ny)
rand('state',0)
e1 = [6 -6 2 -8;-6 32 -6 20;2 -6 6 -6;-8 20 -6 32];
e2 = [3 -8 2 -6;-8 16 -8 20;2 -8 3 -8;-6 20 -8 16];
e = [e1 e2; e2' e1]/45;
n = 3*nx*ny+2*nx+2*ny+1;
ntriplets = nx*ny*64 ;
I = zeros (ntriplets, 1) ;
J = zeros (ntriplets, 1) ;
X = zeros (ntriplets, 1) ;
ntriplets = 0 ;
RHO = 100*rand(nx,ny);
nn = zeros(8,1);
for j=1:ny
    for i=1:nx
        nn(1) = 3*j*nx+2*i+2*j+1;
        nn(2) = nn(1)-1;
        nn(3) = nn(2)-1;
        nn(4) = (3*j-1)*nx+2*j+i-1;
        nn(5) = 3*(j-1)*nx+2*i+2*j-3;
        nn(6) = nn(5)+1;
        nn(7) = nn(6)+1;
        nn(8) = nn(4)+1;
        em = e*RHO(i,j);
        for krow=1:8
            for kcol=1:8
                ntriplets = ntriplets + 1 ;
                I (ntriplets) = nn (krow) ;
                J (ntriplets) = nn (kcol) ;
                X (ntriplets) = em (krow,kcol) ;
            end
        end
    end
end
A = sparse (I,J,X,n,n) ;

Elapsed time is 1.594073 seconds.
disp (norm (A-A2,1))
  1.4211e-014

If you do not know how many entries your matrix will have, you may not be able to preallocate the I, J, and X arrays, as done in wathen2.m. In that case, start them at a reasonable size (anything larger than zero will do) and add this code to the innermost loop, just after ntriplets is incremented:

    len = length (X) ;
    if (ntriplets > len)
       I (2*len) = 0 ;
       J (2*len) = 0 ;
       X (2*len) = 0 ;
    end

and when done, use sparse(I(1:ntriplets),J(1:ntriplets),X(1:ntriplets),n,n).

Moral: Do Not Abuse A(i,j)=... for Sparse A; Use sparse Instead

Avoid statements such as

C(4,2) = C(4,2) + 42 ;

in a loop. Create a list of triplets (i,j,x) and use sparse instead. This advice holds for any sparse matrix, not just finite-element ones.

Try a Faster Sparse Function

CHOLMOD includes a sparse2 mexFunction which is a replacement for sparse. It uses a linear-time bucket sort. The MATLAB 7.3 (R2006b) sparse accounts for about 3/4ths the total run time of wathen2.m. For this matrix sparse2 in CHOLMOD is about 10 times faster than the MATLAB sparse. CHOLMOD can be found in the SuiteSparse package, in MATLAB Central.

If you would like to see a short and concise C mexFunction implementation of the method used by sparse2, take a look at CSparse, which was written for a concise textbook-style presentation. The cs_sparse mexFunction uses cs_compress.c, to convert the triplets to a compressed column form of A', cs_dupl.c to sum up duplicate entries, and then cs_transpose.c to transpose the result (which also sorts the columns).

When Fast is Not Fast Enough...

Even with this dramatic improvement in constructing the matrix A, MATLAB could still use additional features for faster construction of sparse finite-element matrices. Constructing the matrix should be much faster than x=A\b, since chol is doing about 700 times more work as sparse for this matrix (1.3 billion flops, vs 1.9 million nonzeros in A). The run times are not that different, however:

tic
A = wathen2 (200,200) ;
toc
b = rand (size(A,1),1) ;
tic
x=A\b ;
toc
Elapsed time is 1.720397 seconds.
Elapsed time is 3.125791 seconds.


Get the MATLAB code

Published with MATLAB® 7.4

26 Responses to “Creating Sparse Finite-Element Matrices in MATLAB”

  1. Tom Richardson replied on :

    You suggested that users do effecient dynamic allocation with the code snippet

    len = length (X) ;
    if (ntriplets > len)
    I (2*len) = 0 ;

    What are the pros and cons of having Matlab arrays do this automatically, like C++ std::vector?

  2. David Heintz replied on :

    Hi,

    I added the following line in ‘wathen2′ after the nested loops

    mymat = sortrows([ I J X ], [ 2 1 ]);

    and then created the sparse matrix

    A = sparse(mymat(:, 1), mymat(:, 2), mymat(:, 3), n, n);

    as suggested.

    I was curious if sorting the index vectors would affect performance (although one might expect ’sparse’ to do something akin to this by default).

    I timed the original and modified codes for 10 consecutive runs and got a surprising (?) result:

    >> test
    Elapsed time is 45.761231 seconds.
    >> test
    Elapsed time is 21.771210 seconds.

    The code using sorted index vectors were about twice as fast.

    Any comments on this behavior?

    Best regards,
    David

  3. Eric Sampson replied on :

    David,

    I tried your suggested change. In both MATLAB 7.3 (R2006b) and 7.4 (R2007a), the original non-sorted version was twice as fast as the version with the sort.

    What version of MATLAB and platform are you running the tests with?

    Regards,
    Eric Sampson
    The MathWorks,Inc.

  4. David Heintz replied on :

    Hi Eric,

    I can reproduce these times (or similar) in MATLAB-versions R2006a and R2006b.

    My platform is an Intel Pentium 4 660 Prescott (3.6 GHz), running Fedora Core 6, so I use the 64-bit version of MATLAB for Linux.

    In my runs (R2006a) a sort using ’sortrows’ takes almost 1.5 s, followed by ’sparse’ which barley needs 0.3 s. Without presorting ’sparse’ takes about 3.9 s to complete instead.

    I can provide my function (’wathen3.m’) and script (’test.m’), should it be of any interest to you.

    I tried my sorting recipe in my FE-code, which provided a decent speedup for larger matrices.

    Best regards,
    David

  5. David Heintz replied on :

    I can confirm that the situation is reversed, i.e., like you described it with non-sorting being faster, on a 32-bit system using R2006b (AMD Mobile CPU, running Ubuntu Linux 6.10).

    Why I got the opposite before—which indeed I did—bets me.

    Best regards,
    David

  6. Tim Davis replied on :

    Reply to Tom Richardson:

    If you allow MATLAB to grow the size of the I, J,
    and X arrays, then they will grow by just one entry
    each iteration, and the total work will be quadratic
    (nnz(A)^2). The method I propose doubles the size
    of I, J and X, and the total work will be O(nnz(A)),
    using the ‘dynamic table’ method.

    I’m not a C++ expert, but looking over the documentation
    of the std::vector class, it seems that this class
    makes a distinction between the size of the vector
    (size(X,1) in MATLAB), and the “capacity” of the vector.
    The capacity is how much extra space has been allocated,
    in “elbow room” that allows the vector to grow in size
    without reallocating memory. MATLAB does not have any
    distinction between “size” and “capacity”, with the
    hard-to-use exception of the difference between nnz(A)
    and nzmax(A) for a sparse A.

    What you are really asking for is for MATLAB to take
    O(n) for the following loop:

    n = 1e5 ;

    tic
    x = 0 ;
    for k = 1:n
    x (k) = k ;
    end
    toc

    (elapsed time: 90 seconds)

    To do so, MATLAB would have to implement some notion
    of “capacity”, and allocate more space than needed
    when an entry outside the current size if x is accessed.
    I don’t think that’s necessary, when the following will
    work just as well (with a fprintf added to see what’s
    happening):

    y = 0 ;
    tic
    for k = 1:n
    if (k > length (y))
    y (2 * length (y)) = 0 ;
    fprintf (’k %d, y now has length %d\n’, k, length (y)) ;
    end
    y (k) = k ;
    end
    y = y (1:n) ;
    toc

    (elapsed time: 0.7 seconds, even with the printf’s).
    x and y are now equal.

    The output is:

    k 2, y now has length 2
    k 3, y now has length 4
    k 5, y now has length 8
    k 9, y now has length 16
    k 17, y now has length 32
    k 33, y now has length 64
    k 65, y now has length 128
    k 129, y now has length 256
    k 257, y now has length 512
    k 513, y now has length 1024
    k 1025, y now has length 2048
    k 2049, y now has length 4096
    k 4097, y now has length 8192
    k 8193, y now has length 16384
    k 16385, y now has length 32768
    k 32769, y now has length 65536
    k 65537, y now has length 131072

    If the time to reallocate y is equal to the size of y,
    then the total work is a sum of powers of two, or 2*n total.

    The same thing can be done with a C++ std::vector, if you
    resize the vector by doubling its capacity anytime its
    capacity is insufficient.

    Reference: Cormen, Leiserson, Rivest, “Introduction to
    Algorithms”, MIT Press, section 18.4 on Dynamic Tables.

  7. Tim Davis replied on :

    Regarding the sortrows issue: in MATLAB 7.4 (released today) and in all previous versions of MATLAB with sparse matrices, “sparse” uses quicksort to sort the triplets, taking O(K*log(K)) time to do so, if it has K triplets. If the triplets are already sorted on input, you can tell from your runtimes that this initial sort is skipped. I’m speaking in generalities, about things that can be observed from “sparse” just by looking at its run time. I’m not looking at the code. You can measure it to be O(K*log(K)) for creating a matrix from K triplets (if the triplets are sorted), and O(K) otherwise. Quite a bit can be determined about how an algorithm works with just tic and toc, without looking at the code.

    The CHOLMOD “sparse2″ is asymptotically faster than the “sparse” in MATLAB; it’s about 10 times faster for this wathen(200,200) matrix. sparse2 uses a linear-time bucket sort. Since it’s so fast, it doesn’t bother to check if the input is sorted or not.

    When I, J, X are not sorted, “sparse” takes 3.0 seconds on my machine. sortrows takes 2.4 seconds, and then “sparse” on the result takes 0.5 seconds. The total time is about the same.

    Using sparse2 on the unsorted I,J,X takes 0.4 seconds; it takes the same time on the sorted I,J,X.

    These timings are on a 32-bit Pentium 4, with a 32-bit MATLAB. I don’t yet have a 64-bit port for any of my sparse mexFunctions, because The MathWorks has made it quite difficult to write a portable sparse mexFunction that seemlessly works on 32-bit/64-bit, and current/older versions of MATLAB. My codes do work on 64-bit MATLAB (x=A\b works), but not my own mexFunctions. So I can’t tell you how fast “sparse2″ is on a 64-bit MATLAB.

    The fact that you’re seeing an improvement in “sparse” by sorting the triplets first is really an artifact of the “sparse” algorithm itself. A better algorithm (”sparse2″) is unaffected by the initial order of the triplets. If The MathWorks were to switch to the algorithm used by “sparse2″ then using “sortrows” first would actually slow your code down, since sortrows must take O(K*log(k)) time.

  8. Tim Davis replied on :

    I suppose the reason that sortrows+sparse is faster than sparse on a 64-bit platform, is that internally, sparse is dealing with a 64-bit integer. sortrows is dealing with a 64-bit floating-point value. Perhaps 64-bit integer comparisons are slower than 64-bit fl.pt. comparisons. This is just a guess.

  9. Tim Davis replied on :

    Oops. I got it backwards in post #7. “You can measure it to be O(K*log(K)) for creating a matrix from K triplets (if the triplets are sorted), and O(K) otherwise.” is backwards.

    It should be “O(K log (K)) if the triplets are UNsorted, and O(K) otherwise.”

  10. Tim Davis replied on :

    In my original post, I should have cited the paper by John Gilbert, Cleve Moler, and Rob Schreiber, “Sparse matrices in MATLAB”, SIAM J. Matrix Analysis & Applications, vol 13, 1992, pp. 333-356. They clearly point out the problem that A(i,j)=B and B=A(i,j) are both intrinsically slow. This is not a “bug” in MATLAB; it is intrinsically impossible to provide a data structure that does all things well. If MATLAB were to use a data structure that made A(i,j)=B fast, then (say), C=A*B where A and B are sparse, or x=A\b, would be horribly slow. Take your pick; I think John, Cleve, and Rob made the right choice.

    You can read their paper at http://www.mathworks.com/access/helpdesk/help/pdf_doc/otherdocs/simax.pdf

    Half tongue-in-cheek … but perhaps mlint could flag A(i,j)=B and B=A(i,j) usages and ask “Please read the Gilbert/Moler/Schreiber paper, page 10, Section 3.1.4; after reading it click OK to continue” …

  11. David Heintz replied on :

    First I would like to thank Tim for all help his blog post has provided me with… thank you!

    Next, as you might have guessed (?), I’ll ask for yet some advice:

    The problem is my index vectors, should we call them ‘row’, ‘col’ and ‘val’, grow too large. In fact that large I run out of RAM (3 x 16e6 x 8 / 1024^2 ~ 366 MB).

    Alas, any uint-format seems incompatible with ’sparse’ or ’sparse2′, which otherwise would have brought down memory consumption. I suppose another remedy could be—besides the obvious one of changing to improved hardware—to simply update the sparse matrix once the index vectors start draining memory (that way they could be reset and then the FE assembly process allowed to carry on). But that would, in some sense, be out of line with the spirit of this discussion … :)

    So, finally, my question becomes if it is possible to sum duplicate entries of the index vectors instead, i.e., if

    row = [1 2 2 4 4 5]
    col = [2 4 4 3 3 1]
    val = [1 2 3 4 5 6]

    could be transferred into

    row = [1 2 4 5]
    col = [2 4 3 1]
    val = [1 5 9 6]

    in an efficient manner? Could that prove better performing than updating the sparse matrix?

    This is only a matter for large scale problems, but still something to consider, I presume.

  12. David Heintz replied on :

    The line below seems to do the work, but I have not put it to any serious test:

    [row, col, val] = find(accumarray([row, col], val, [], @sum, [], true));

    Maybe it is all inefficient.

  13. Etienne Non replied on :

    Hi!
    I’m trying to understand why the Matlab function LU.m takes almost 20 times more time to factorize a sparse matrix with nnz=784418 than another one with nnz=1238796, the two matrices being numerically the same as showed by the code below
    >> [IindSCL,JindSCL,SnzSCL,nbrequaSCL]=Stokes3D_SCL(8,8,1,7,0,1);
    >> [Iindcor,Jindcor,Snzcor,nbrequacor]=Stokes3Dcor(8,8,1,1,1,0,1);
    >> a=(IindSCL==Iindcor);b=(JindSCL==Jindcor);c=(nbrequaSCL==nbrequacor);
    >> find(a==0)

    ans =

    Empty matrix: 0-by-1

    >> find(b==0), find(c==0)

    ans =

    Empty matrix: 0-by-1

    ans =

    []

    >> max(max(abs(SnzSCL-Snzcor)))

    ans =

    3.552713678800501e-15

    >> tic;matSCL=sparse2(IindSCL,JindSCL,SnzSCL,nbrequaSCL+1,nbrequaSCL+1);toc
    Elapsed time is 0.993688 seconds.
    >> tic;matcor=sparse2(Iindcor,Jindcor,Snzcor,nbrequacor+1,nbrequacor+1);toc
    Elapsed time is 0.961044 seconds.
    >> tic;matSCL=sparse(IindSCL,JindSCL,SnzSCL,nbrequaSCL+1,nbrequaSCL+1);toc
    Elapsed time is 1.882607 seconds.
    >> tic;matcor=sparse(Iindcor,Jindcor,Snzcor,nbrequacor+1,nbrequacor+1);toc
    Elapsed time is 1.893259 seconds.
    >>
    >>
    >> tic;[L_SCL,U_SCL,P_SCL,Q_SCL,R_SCL]=lu(matSCL);toc
    Elapsed time is 4.642097 seconds.
    >> tic;[L_cor,U_cor,P_cor,Q_cor,R_cor]=lu(matcor);toc
    Elapsed time is 84.450131 seconds.
    >>
    >>
    >> nnz(matSCL),nnz(matcor)

    ans =

    1238796

    ans =

    784418

    >> max(max(abs(matSCL-matcor)))

    ans =

    (1,1) 3.552713678800501e-15

    >>

    May be someone could help

  14. Loren replied on :

    Here’s Cleve’s reply to Etienne:

    The crucial factor is the number and location of the nonzero elements in the LU factors, not in the original matrices. This doesn’t show us the original matrices. “Numerically the same” apparently means the difference is small. That’s not enough. The magnitude of the nonzeros is irrelevant; it’s their existence that matters.
    — Cleve

  15. Roland replied on :

    Hi,
    I’m a newbie in Matlab (R 2007b, Win XP) and tried to implement the way of assembly shown in wathen2.m for my own FE code. Unfortunately, I could not gain any improvements using “wathen2.m” compared to the way used in “wathen1.m” (the way I wrote my code originally); with “wathen2.m” my code becomes even slower (factor > 10). What my problem distinguishes from the one shown here is that I consider a 9node finite plate element with up to 20 degrees of freedom per node.
    Following the information given by the profiler most time is spent for building the three vectors that comprise the sparse matrix subsequently. And what seems strange to me is that while running wathen2.m the statements
    ntriplets = ntriplets + 1 ;
    I (ntriplets) = nn (krow) ;
    need almost the same overall time, in my code the latter takes 100 times the time of the former although preallocation, etc. is the same.

    Maybe someone has any idea what could be the problem.
    Thanks in advance.

    Roland

  16. Tim Davis replied on :

    Roland,

    You need to pre-allocate the I, J, and X matrices so that they are large enough to all the entries.

  17. Tim Davis replied on :

    Etienne,

    Cleve is correct. Do not compare the numerical values of the two matrices. To compare two matrices A and B, look at spones(A) and spones(B). If they differ, the LU factorization can differ.

    Even tiny changes in the pattern can make a huge difference. Inside LU, backslash, chol, etc, is a heuristic that permutes the matrix to reduce fill-in. It’s a heuristic because to find the minimum fill is an NP-hard problem, and would take time exceeding the lifetime of the universe to solve exactly. I think you want your answer before that…. The heursistic can go down a different “path” and find very different fill-in, even with small changes in pattern.

    Even with an uncomputable, perfectly minimum fill ordering, tiny changes in the pattern can lead to huge changes in nnz(L+U).

  18. Etienne Non replied on :

    Thanks guys!!

  19. Phil replied on :

    Echoing David Heintz in comment 11, why does sparse require the index vectors to be doubles?
    On a 32 bit machine, this seems to cause the 2 index vectors and the entry vector to take (8 + 8 + 8)*n_entries bytes rather than (4 + 4 + 8)*n_entries bytes. That’s a 50% increase in memory usage! Plus, sparse must be doing some kind of double to integer conversion anyway.

  20. Pat Quillen replied on :

    Phil,

    Requiring index vectors to be double is a throwback to the days of “everything in MATLAB is a double”. Indeed, our default type for all index vectors is double (take the output of find for example). We’ll take it as an enhancement request to allow sparse to take vectors of an integral type as inputs.

  21. Dave replied on :

    Hi,
    Great stuff! Can anyone point me to something like sparse2 for a poor fortran (90) programmer…? In particular, I need a fast routine to convert a triplet form matrix to compressed sparse row format.
    I would greatly appreciate any suggestions!

  22. Xuexin Liu replied on :

    Dear Madam,

    Thank you for your posts! In this post, you gave us the advice “Moral: Do Not Abuse A(i,j)=… for Sparse A; Use sparse Instead”. My question is that can we always finish our job without using A(i,j)=… ?

    Thanks!

    And below are the details of my questions:

    In the formulation of a dynamic system equation (one example is a circuit containing resistors, capacitors, and inductors), the entries in the sparse matrix may be changed when a new device is loaded (the devices are loaded from a net list specified the interconnection of them). And after all the devices in the circuit are loaded, the program will solve the equation using the sparse matrix.

    In this case, I haven’t figured out a way without using assignment such as
    C(i,j) = C(i,j) + new_param;

    What do you think? Thanks in advance for your advices!

  23. Tim Davis replied on :

    Reply to Xuexin Liu: if you are changing just a few entries, then C(i,j)=C(i,j)+new_param is fine. If you are changing a lot of entries, then create a matrix New_param (for example) with all the changes, using the method described here. Then do C=C+New_param to make all the changes at once. This method will be a lot faster than using the C(i,j)=C(i,j)+new_param inside a loop.

  24. Tim Davis replied on :

    p.s. If you are solving x=A\b where A comes from a circuit matrix, you’ll be using UMFPACK (normally) or CHOLMOD (if A is symmetric positive definite. Both of those are my codes. A\b might also use MA57 (from Iain Duff) if it’s symmetric indefinite, or a banded solver if A is a band matrix.

    However, I also have another package that’s specifically designed for circuit matrices. It’s much faster than x=A\b (UMFPACK, CHOLMOD, etc). It’s not a part of A\b in MATLAB.

    See KLU, in SuiteSparse, at http://www.cise.ufl.edu/research/sparse (just download all of SuiteSparse; KLU is part of it). Using KLU with x=klu(A,’\',b) can be many times faster than x=A\b for circuit matrices.

  25. Aman P.V replied on :

    Hi all,

    I have a small doubt. If use sparse command with a repeated indices what will happen? will that sum up or just replace?
    As all of you know for a same position in the Global stiffness matrix we;ll have contribution from different elements and we’ll have to sum them up.
    Any one please clarify my doubt.

  26. Loren replied on :

    Aman-

    From the help

    “S = sparse(i,j,s,m,n,nzmax) uses vectors i, j, and s to generate an m-by-n sparse matrix such that S(i(k),j(k)) = s(k), with space allocated for nzmax nonzeros. Vectors i, j, and s are all the same length. Any elements of s that are zero are ignored, along with the corresponding values of i and j. Any elements of s that have duplicate values of i and j are added together. “

    Values at repeated indices are summed together.

    –Loren

Leave a Reply

Wrap code fragments inside <pre> tags, like this:

<pre class="code">
a = magic(3);
sum(a)
</pre>

If you have a "<" character in your code, either follow it with a space or replace it with "&lt;" (including the semicolon).


Loren Shure works on design of the MATLAB language at The MathWorks. She writes here about once a week on MATLAB programming and related topics.

  • Jun: I totally can not believe it, Loren. You are really helpful. Thank you so much, MATLAB master!
  • Loren: Wow folks- Always lots of interest when there’s a quickie to try out! I will only make 2 general...
  • Loren: Jun- ismember is your friend here: >> [aa,ind] = ismember(Array2,Arra y1) aa = 1 1 1 1 1 1 1 ind = 1 2 1 4 4 3...
  • Dan: I like the first way better than the second way. Combining the arrays into one and running any is nice, although...
  • James Myatt: How about I = (a == 0 | b == 0); a(I) = []; b(I) = [];
  • Tunc: Hello Loren, love your blog because of such inspiring and challenging comments to such ’small’...
  • Pekka Kumpulainen: Here is my tradeoff. I usually want to keep the original variables as they are most probably...
  • Iain: Followup: Of course, to allow NaNs (counting them as non-zero): mask = (a~=0) & (b~=0); The mask says “a...
  • Matt Fig: I would usually go with something like this: y = a&b; x = a(y); y = b(y); But I was surprised to find...
  • kk: c=all([a;b]) a(c) a(b)

These postings are the author's and don't necessarily represent the opinions of The MathWorks.