# Creating Sparse Finite-Element Matrices in MATLAB49

Posted by Loren Shure,

I'm pleased to introduce Tim Davis as this week's guest blogger. Tim is a professor at the University of Florida, and is the
author or co-author of many of our sparse matrix functions (lu, chol, much of sparse backslash, ordering methods such as amd
and colamd, and other functions such as etree and symbfact). He is also the author of a recent book, Direct Methods for Sparse Linear Systems, published by SIAM, where more details of MATLAB sparse matrices are discussed ( http://www.cise.ufl.edu/~davis ).

### MATLAB is Slow? Only If You Use It Incorrectly

From time to time, I hear comments such as "MATLAB is slow for large finite-element problems." When I look at the details,
the problem is typically due to an overuse of A(i,j)= ... when creating the sparse matrix (assembling the finite elements). This can be seen in typical user's code, MATLAB code in
books on the topic, and even in MATLAB itself. The problem is widespread. MATLAB can be very fast for finite-element problems,
but not if it's used incorrectly.

### How Not to Create a Finite-Element Matrix

A good example of what not to do can be found in the wathen.m function, in MATLAB. A=gallery('wathen',200,200) takes a huge amount of time; a very minor modification cuts the run time drastically. I'm not intending to single out this
one function for critique; this is a very common issue that I see over and over again. This function is built into MATLAB,
which makes it an accessible example. It was first written when MATLAB did not support sparse matrices, and was modified
only slightly to exploit sparsity. It was never meant for generating large sparse finite-element matrices. You can see the
entire function with the command:

  type private/wathen.m

Below is an excerpt of the relevant parts of wathen.m. The function wathen1.m creates a finite-element matrix of an nx-by-ny mesh. Each i loop creates a single 8-by-8 finite-element matrix, and adds it into A.

type wathen1
tic
A = wathen1 (200,200) ;
toc
function A = wathen1 (nx,ny)
rand('state',0)
e1 = [6 -6 2 -8;-6 32 -6 20;2 -6 6 -6;-8 20 -6 32];
e2 = [3 -8 2 -6;-8 16 -8 20;2 -8 3 -8;-6 20 -8 16];
e = [e1 e2; e2' e1]/45;
n = 3*nx*ny+2*nx+2*ny+1;
A = sparse(n,n);
RHO = 100*rand(nx,ny);
nn = zeros(8,1);
for j=1:ny
for i=1:nx
nn(1) = 3*j*nx+2*i+2*j+1;
nn(2) = nn(1)-1;
nn(3) = nn(2)-1;
nn(4) = (3*j-1)*nx+2*j+i-1;
nn(5) = 3*(j-1)*nx+2*i+2*j-3;
nn(6) = nn(5)+1;
nn(7) = nn(6)+1;
nn(8) = nn(4)+1;
em = e*RHO(i,j);
for krow=1:8
for kcol=1:8
A(nn(krow),nn(kcol)) = A(nn(krow),nn(kcol))+em(krow,kcol);
end
end
end
end

Elapsed time is 305.832709 seconds.


### What's Wrong with It?

The above code is fine for generating a modest sized matrix, but the A(i,j) = ... statement is quite slow when A is large and sparse, particularly when i and j are also scalars. The inner two for-loops can be vectorized so that i and j are vectors of length 8. Each A(i,j) = ... statement is assembling an entire finite-element matrix into A. However, this leads to very minimal improvement in run time.

type wathen1b
tic
A1b = wathen1b (200,200) ;
toc
function A = wathen1b (nx,ny)
rand('state',0)
e1 = [6 -6 2 -8;-6 32 -6 20;2 -6 6 -6;-8 20 -6 32];
e2 = [3 -8 2 -6;-8 16 -8 20;2 -8 3 -8;-6 20 -8 16];
e = [e1 e2; e2' e1]/45;
n = 3*nx*ny+2*nx+2*ny+1;
A = sparse(n,n);
RHO = 100*rand(nx,ny);
nn = zeros(8,1);
for j=1:ny
for i=1:nx
nn(1) = 3*j*nx+2*i+2*j+1;
nn(2) = nn(1)-1;
nn(3) = nn(2)-1;
nn(4) = (3*j-1)*nx+2*j+i-1;
nn(5) = 3*(j-1)*nx+2*i+2*j-3;
nn(6) = nn(5)+1;
nn(7) = nn(6)+1;
nn(8) = nn(4)+1;
em = e*RHO(i,j);
A (nn,nn) = A (nn,nn) + em ;
end
end

Elapsed time is 282.945593 seconds.

disp (norm (A-A1b,1))
     0


### How MATLAB Stores Sparse Matrices

To understand why the above examples are so slow, you need to understand how MATLAB stores its sparse matrices. An n-by-n
MATLAB sparse matrix is stored as three arrays; I'll call them p, i, and x. These three arrays are not directly accessible from M, but they can be accessed by a mexFunction. The nonzero entries in
column j are stored in i(p(j):p(j+1)-1) and x(p(j):p(j+1)-1), where x holds the numerical values and i holds the corresponding row indices. Below is a very small example. First, I create a full matrix and convert it into a sparse one. This is only so that you can easily see the matrix C and how it's stored in sparse form. You should never create a sparse matrix this way, except for tiny examples.

C = [
4.5 0.0 3.2 0.0
3.1 2.9 0.0 0.9
0.0 1.7 3.0 0.0
3.5 0.4 0.0 1.0 ] ;
C = sparse (C)
C =
(1,1)       4.5000
(2,1)       3.1000
(4,1)       3.5000
(2,2)       2.9000
(3,2)       1.7000
(4,2)       0.4000
(1,3)       3.2000
(3,3)       3.0000
(2,4)       0.9000
(4,4)       1.0000


Notice that the nonzero entries in C are stored in column order, with sorted row indices. The internal p, i, and x arrays can be reconstructed as follows. The find(C) statement returns a list of "triplets," where the kth triplet is i(k), j(k), and x(k). This specifies that C(i(k),j(k)) is equal to x(k). Next, find(diff(...)) constructs the column pointer array p (this only works if there are no all-zero columns in the matrix).

[i j x] = find (C) ;
n = size(C,2) ;
p = find (diff ([0 ; j ; n+1])) ;
for col = 1:n
fprintf ('column %d:\n    k      row index     value\n', col) ;
disp ([(p(col):p(col+1)-1)' i(p(col):p(col+1)-1) x(p(col):p(col+1)-1)])
end
column 1:
k      row index     value
1.0000    1.0000    4.5000
2.0000    2.0000    3.1000
3.0000    4.0000    3.5000
column 2:
k      row index     value
4.0000    2.0000    2.9000
5.0000    3.0000    1.7000
6.0000    4.0000    0.4000
column 3:
k      row index     value
7.0000    1.0000    3.2000
8.0000    3.0000    3.0000
column 4:
k      row index     value
9.0000    2.0000    0.9000
10.0000    4.0000    1.0000


Now consider what happens when one entry is added to C:

C(3,1) = 42 ;

[i j x] = find (C) ;
n = size(C,2) ;
p = find (diff ([0 ; j ; n+1])) ;
for col = 1:n
fprintf ('column %d:\n    k      row index     value\n', col) ;
disp ([(p(col):p(col+1)-1)' i(p(col):p(col+1)-1) x(p(col):p(col+1)-1)])
end
column 1:
k      row index     value
1.0000    1.0000    4.5000
2.0000    2.0000    3.1000
3.0000    3.0000   42.0000
4.0000    4.0000    3.5000
column 2:
k      row index     value
5.0000    2.0000    2.9000
6.0000    3.0000    1.7000
7.0000    4.0000    0.4000
column 3:
k      row index     value
8.0000    1.0000    3.2000
9.0000    3.0000    3.0000
column 4:
k      row index     value
10.0000    2.0000    0.9000
11.0000    4.0000    1.0000


and you can see that nearly every entry in C has been moved down by one in the i and x arrays. In general, the single statement C(3,1)=42 takes time proportional to the number of entries in matrix. Thus, looping nnz(A) times over the statement A(i,j)=A(i,j)+... takes time proportional to nnz(A)^2.

### A Better Way to Create a Finite-Element Matrix

The version below is only slightly different. It could be improved, but I left it nearly the same to illustrate how simple
it is to write fast MATLAB code to solve this problem, via a minor tweak. The idea is to create a list of triplets, and let
MATLAB convert them into a sparse matrix all at once. If there are duplicates (which a finite-element matrix always has)
the duplicates are summed, which is exactly what you want when assembling a finite-element matrix. In MATLAB 7.3 (R2006b),
sparse uses quicksort, which takes nnz(A)*log(nnz(A)) time. This is slower than it could be (a linear-time bucket sort can be used, taking essentially nnz(A) time). However, it's still much faster than nnz(A)^2. For this matrix, nnz(A) is about 1.9 million.

type wathen2.m
tic
A2 = wathen2 (200,200) ;
toc
function A = wathen2 (nx,ny)
rand('state',0)
e1 = [6 -6 2 -8;-6 32 -6 20;2 -6 6 -6;-8 20 -6 32];
e2 = [3 -8 2 -6;-8 16 -8 20;2 -8 3 -8;-6 20 -8 16];
e = [e1 e2; e2' e1]/45;
n = 3*nx*ny+2*nx+2*ny+1;
ntriplets = nx*ny*64 ;
I = zeros (ntriplets, 1) ;
J = zeros (ntriplets, 1) ;
X = zeros (ntriplets, 1) ;
ntriplets = 0 ;
RHO = 100*rand(nx,ny);
nn = zeros(8,1);
for j=1:ny
for i=1:nx
nn(1) = 3*j*nx+2*i+2*j+1;
nn(2) = nn(1)-1;
nn(3) = nn(2)-1;
nn(4) = (3*j-1)*nx+2*j+i-1;
nn(5) = 3*(j-1)*nx+2*i+2*j-3;
nn(6) = nn(5)+1;
nn(7) = nn(6)+1;
nn(8) = nn(4)+1;
em = e*RHO(i,j);
for krow=1:8
for kcol=1:8
ntriplets = ntriplets + 1 ;
I (ntriplets) = nn (krow) ;
J (ntriplets) = nn (kcol) ;
X (ntriplets) = em (krow,kcol) ;
end
end
end
end
A = sparse (I,J,X,n,n) ;

Elapsed time is 1.594073 seconds.

disp (norm (A-A2,1))
  1.4211e-014


If you do not know how many entries your matrix will have, you may not be able to preallocate the I, J, and X arrays, as done in wathen2.m. In that case, start them at a reasonable size (anything larger than zero will do) and add this code to the innermost loop,
just after ntriplets is incremented:

    len = length (X) ;
if (ntriplets > len)
I (2*len) = 0 ;
J (2*len) = 0 ;
X (2*len) = 0 ;
end

and when done, use sparse(I(1:ntriplets),J(1:ntriplets),X(1:ntriplets),n,n).

### Moral: Do Not Abuse A(i,j)=... for Sparse A; Use sparse Instead

Avoid statements such as

C(4,2) = C(4,2) + 42 ;

in a loop. Create a list of triplets (i,j,x) and use sparse instead. This advice holds for any sparse matrix, not just finite-element ones.

### Try a Faster Sparse Function

CHOLMOD includes a sparse2 mexFunction which is a replacement for sparse. It uses a linear-time bucket sort. The MATLAB 7.3 (R2006b) sparse accounts for about 3/4ths the total run time of wathen2.m. For this matrix sparse2 in CHOLMOD is about 10 times faster than the MATLAB sparse. CHOLMOD can be found in the SuiteSparse package, in MATLAB Central.

If you would like to see a short and concise C mexFunction implementation of the method used by sparse2, take a look at CSparse, which was written for a concise textbook-style presentation. The cs_sparse mexFunction uses cs_compress.c, to convert the triplets to a compressed column form of A', cs_dupl.c to sum up duplicate entries, and then cs_transpose.c to transpose the result (which also sorts the columns).

### When Fast is Not Fast Enough...

Even with this dramatic improvement in constructing the matrix A, MATLAB could still use additional features for faster construction of sparse finite-element matrices. Constructing the
matrix should be much faster than x=A\b, since chol is doing about 700 times more work as sparse for this matrix (1.3 billion flops, vs 1.9 million nonzeros in A). The run times are not that different, however:

tic
A = wathen2 (200,200) ;
toc
b = rand (size(A,1),1) ;
tic
x=A\b ;
toc
Elapsed time is 1.720397 seconds.
Elapsed time is 3.125791 seconds.


Get the MATLAB code

Published with MATLAB® 7.4

### Note

Tom Richardson replied on : 1 of 49

You suggested that users do effecient dynamic allocation with the code snippet

len = length (X) ;
if (ntriplets > len)
I (2*len) = 0 ;

What are the pros and cons of having Matlab arrays do this automatically, like C++ std::vector?

David Heintz replied on : 2 of 49

Hi,

I added the following line in ‘wathen2’ after the nested loops

mymat = sortrows([ I J X ], [ 2 1 ]);

and then created the sparse matrix

A = sparse(mymat(:, 1), mymat(:, 2), mymat(:, 3), n, n);

as suggested.

I was curious if sorting the index vectors would affect performance (although one might expect ‘sparse’ to do something akin to this by default).

I timed the original and modified codes for 10 consecutive runs and got a surprising (?) result:

>> test
Elapsed time is 45.761231 seconds.
>> test
Elapsed time is 21.771210 seconds.

The code using sorted index vectors were about twice as fast.

Best regards,
David

Eric Sampson replied on : 3 of 49

David,

I tried your suggested change. In both MATLAB 7.3 (R2006b) and 7.4 (R2007a), the original non-sorted version was twice as fast as the version with the sort.

What version of MATLAB and platform are you running the tests with?

Regards,
Eric Sampson
The MathWorks,Inc.

David Heintz replied on : 4 of 49

Hi Eric,

I can reproduce these times (or similar) in MATLAB-versions R2006a and R2006b.

My platform is an Intel Pentium 4 660 Prescott (3.6 GHz), running Fedora Core 6, so I use the 64-bit version of MATLAB for Linux.

In my runs (R2006a) a sort using ‘sortrows’ takes almost 1.5 s, followed by ‘sparse’ which barley needs 0.3 s. Without presorting ‘sparse’ takes about 3.9 s to complete instead.

I can provide my function (‘wathen3.m’) and script (‘test.m’), should it be of any interest to you.

I tried my sorting recipe in my FE-code, which provided a decent speedup for larger matrices.

Best regards,
David

David Heintz replied on : 5 of 49

I can confirm that the situation is reversed, i.e., like you described it with non-sorting being faster, on a 32-bit system using R2006b (AMD Mobile CPU, running Ubuntu Linux 6.10).

Why I got the opposite before—which indeed I did—bets me.

Best regards,
David

Tim Davis replied on : 6 of 49

If you allow MATLAB to grow the size of the I, J,
and X arrays, then they will grow by just one entry
each iteration, and the total work will be quadratic
(nnz(A)^2). The method I propose doubles the size
of I, J and X, and the total work will be O(nnz(A)),
using the ‘dynamic table’ method.

I’m not a C++ expert, but looking over the documentation
of the std::vector class, it seems that this class
makes a distinction between the size of the vector
(size(X,1) in MATLAB), and the “capacity” of the vector.
The capacity is how much extra space has been allocated,
in “elbow room” that allows the vector to grow in size
without reallocating memory. MATLAB does not have any
distinction between “size” and “capacity”, with the
hard-to-use exception of the difference between nnz(A)
and nzmax(A) for a sparse A.

What you are really asking for is for MATLAB to take
O(n) for the following loop:

n = 1e5 ;

tic
x = 0 ;
for k = 1:n
x (k) = k ;
end
toc

(elapsed time: 90 seconds)

To do so, MATLAB would have to implement some notion
of “capacity”, and allocate more space than needed
when an entry outside the current size if x is accessed.
I don’t think that’s necessary, when the following will
work just as well (with a fprintf added to see what’s
happening):

y = 0 ;
tic
for k = 1:n
if (k > length (y))
y (2 * length (y)) = 0 ;
fprintf (‘k %d, y now has length %d\n’, k, length (y)) ;
end
y (k) = k ;
end
y = y (1:n) ;
toc

(elapsed time: 0.7 seconds, even with the printf’s).
x and y are now equal.

The output is:

k 2, y now has length 2
k 3, y now has length 4
k 5, y now has length 8
k 9, y now has length 16
k 17, y now has length 32
k 33, y now has length 64
k 65, y now has length 128
k 129, y now has length 256
k 257, y now has length 512
k 513, y now has length 1024
k 1025, y now has length 2048
k 2049, y now has length 4096
k 4097, y now has length 8192
k 8193, y now has length 16384
k 16385, y now has length 32768
k 32769, y now has length 65536
k 65537, y now has length 131072

If the time to reallocate y is equal to the size of y,
then the total work is a sum of powers of two, or 2*n total.

The same thing can be done with a C++ std::vector, if you
resize the vector by doubling its capacity anytime its
capacity is insufficient.

Reference: Cormen, Leiserson, Rivest, “Introduction to
Algorithms”, MIT Press, section 18.4 on Dynamic Tables.

Tim Davis replied on : 7 of 49

Regarding the sortrows issue: in MATLAB 7.4 (released today) and in all previous versions of MATLAB with sparse matrices, “sparse” uses quicksort to sort the triplets, taking O(K*log(K)) time to do so, if it has K triplets. If the triplets are already sorted on input, you can tell from your runtimes that this initial sort is skipped. I’m speaking in generalities, about things that can be observed from “sparse” just by looking at its run time. I’m not looking at the code. You can measure it to be O(K*log(K)) for creating a matrix from K triplets (if the triplets are sorted), and O(K) otherwise. Quite a bit can be determined about how an algorithm works with just tic and toc, without looking at the code.

The CHOLMOD “sparse2” is asymptotically faster than the “sparse” in MATLAB; it’s about 10 times faster for this wathen(200,200) matrix. sparse2 uses a linear-time bucket sort. Since it’s so fast, it doesn’t bother to check if the input is sorted or not.

When I, J, X are not sorted, “sparse” takes 3.0 seconds on my machine. sortrows takes 2.4 seconds, and then “sparse” on the result takes 0.5 seconds. The total time is about the same.

Using sparse2 on the unsorted I,J,X takes 0.4 seconds; it takes the same time on the sorted I,J,X.

These timings are on a 32-bit Pentium 4, with a 32-bit MATLAB. I don’t yet have a 64-bit port for any of my sparse mexFunctions, because The MathWorks has made it quite difficult to write a portable sparse mexFunction that seemlessly works on 32-bit/64-bit, and current/older versions of MATLAB. My codes do work on 64-bit MATLAB (x=A\b works), but not my own mexFunctions. So I can’t tell you how fast “sparse2” is on a 64-bit MATLAB.

The fact that you’re seeing an improvement in “sparse” by sorting the triplets first is really an artifact of the “sparse” algorithm itself. A better algorithm (“sparse2”) is unaffected by the initial order of the triplets. If The MathWorks were to switch to the algorithm used by “sparse2” then using “sortrows” first would actually slow your code down, since sortrows must take O(K*log(k)) time.

Tim Davis replied on : 8 of 49

I suppose the reason that sortrows+sparse is faster than sparse on a 64-bit platform, is that internally, sparse is dealing with a 64-bit integer. sortrows is dealing with a 64-bit floating-point value. Perhaps 64-bit integer comparisons are slower than 64-bit fl.pt. comparisons. This is just a guess.

Tim Davis replied on : 9 of 49

Oops. I got it backwards in post #7. “You can measure it to be O(K*log(K)) for creating a matrix from K triplets (if the triplets are sorted), and O(K) otherwise.” is backwards.

It should be “O(K log (K)) if the triplets are UNsorted, and O(K) otherwise.”

Tim Davis replied on : 10 of 49

In my original post, I should have cited the paper by John Gilbert, Cleve Moler, and Rob Schreiber, “Sparse matrices in MATLAB”, SIAM J. Matrix Analysis & Applications, vol 13, 1992, pp. 333-356. They clearly point out the problem that A(i,j)=B and B=A(i,j) are both intrinsically slow. This is not a “bug” in MATLAB; it is intrinsically impossible to provide a data structure that does all things well. If MATLAB were to use a data structure that made A(i,j)=B fast, then (say), C=A*B where A and B are sparse, or x=A\b, would be horribly slow. Take your pick; I think John, Cleve, and Rob made the right choice.

You can read their paper at http://www.mathworks.com/access/helpdesk/help/pdf_doc/otherdocs/simax.pdf

Half tongue-in-cheek … but perhaps mlint could flag A(i,j)=B and B=A(i,j) usages and ask “Please read the Gilbert/Moler/Schreiber paper, page 10, Section 3.1.4; after reading it click OK to continue” …

David Heintz replied on : 11 of 49

First I would like to thank Tim for all help his blog post has provided me with… thank you!

Next, as you might have guessed (?), I’ll ask for yet some advice:

The problem is my index vectors, should we call them ‘row’, ‘col’ and ‘val’, grow too large. In fact that large I run out of RAM (3 x 16e6 x 8 / 1024^2 ~ 366 MB).

Alas, any uint-format seems incompatible with ‘sparse’ or ‘sparse2’, which otherwise would have brought down memory consumption. I suppose another remedy could be—besides the obvious one of changing to improved hardware—to simply update the sparse matrix once the index vectors start draining memory (that way they could be reset and then the FE assembly process allowed to carry on). But that would, in some sense, be out of line with the spirit of this discussion … :)

So, finally, my question becomes if it is possible to sum duplicate entries of the index vectors instead, i.e., if

row = [1 2 2 4 4 5]
col = [2 4 4 3 3 1]
val = [1 2 3 4 5 6]

could be transferred into

row = [1 2 4 5]
col = [2 4 3 1]
val = [1 5 9 6]

in an efficient manner? Could that prove better performing than updating the sparse matrix?

This is only a matter for large scale problems, but still something to consider, I presume.

David Heintz replied on : 12 of 49

The line below seems to do the work, but I have not put it to any serious test:

[row, col, val] = find(accumarray([row, col], val, [], @sum, [], true));

Maybe it is all inefficient.

Etienne Non replied on : 13 of 49

Hi!
I’m trying to understand why the Matlab function LU.m takes almost 20 times more time to factorize a sparse matrix with nnz=784418 than another one with nnz=1238796, the two matrices being numerically the same as showed by the code below
>> [IindSCL,JindSCL,SnzSCL,nbrequaSCL]=Stokes3D_SCL(8,8,1,7,0,1);
>> [Iindcor,Jindcor,Snzcor,nbrequacor]=Stokes3Dcor(8,8,1,1,1,0,1);
>> a=(IindSCL==Iindcor);b=(JindSCL==Jindcor);c=(nbrequaSCL==nbrequacor);
>> find(a==0)

ans =

Empty matrix: 0-by-1

>> find(b==0), find(c==0)

ans =

Empty matrix: 0-by-1

ans =

[]

>> max(max(abs(SnzSCL-Snzcor)))

ans =

3.552713678800501e-15

>> tic;matSCL=sparse2(IindSCL,JindSCL,SnzSCL,nbrequaSCL+1,nbrequaSCL+1);toc
Elapsed time is 0.993688 seconds.
>> tic;matcor=sparse2(Iindcor,Jindcor,Snzcor,nbrequacor+1,nbrequacor+1);toc
Elapsed time is 0.961044 seconds.
>> tic;matSCL=sparse(IindSCL,JindSCL,SnzSCL,nbrequaSCL+1,nbrequaSCL+1);toc
Elapsed time is 1.882607 seconds.
>> tic;matcor=sparse(Iindcor,Jindcor,Snzcor,nbrequacor+1,nbrequacor+1);toc
Elapsed time is 1.893259 seconds.
>>
>>
>> tic;[L_SCL,U_SCL,P_SCL,Q_SCL,R_SCL]=lu(matSCL);toc
Elapsed time is 4.642097 seconds.
>> tic;[L_cor,U_cor,P_cor,Q_cor,R_cor]=lu(matcor);toc
Elapsed time is 84.450131 seconds.
>>
>>
>> nnz(matSCL),nnz(matcor)

ans =

1238796

ans =

784418

>> max(max(abs(matSCL-matcor)))

ans =

(1,1) 3.552713678800501e-15

>>

May be someone could help

Loren replied on : 14 of 49

The crucial factor is the number and location of the nonzero elements in the LU factors, not in the original matrices. This doesn’t show us the original matrices. “Numerically the same” apparently means the difference is small. That’s not enough. The magnitude of the nonzeros is irrelevant; it’s their existence that matters.
— Cleve

Roland replied on : 15 of 49

Hi,
I’m a newbie in Matlab (R 2007b, Win XP) and tried to implement the way of assembly shown in wathen2.m for my own FE code. Unfortunately, I could not gain any improvements using “wathen2.m” compared to the way used in “wathen1.m” (the way I wrote my code originally); with “wathen2.m” my code becomes even slower (factor > 10). What my problem distinguishes from the one shown here is that I consider a 9node finite plate element with up to 20 degrees of freedom per node.
Following the information given by the profiler most time is spent for building the three vectors that comprise the sparse matrix subsequently. And what seems strange to me is that while running wathen2.m the statements
ntriplets = ntriplets + 1 ;
I (ntriplets) = nn (krow) ;
need almost the same overall time, in my code the latter takes 100 times the time of the former although preallocation, etc. is the same.

Maybe someone has any idea what could be the problem.

Roland

Tim Davis replied on : 16 of 49

Roland,

You need to pre-allocate the I, J, and X matrices so that they are large enough to all the entries.

Tim Davis replied on : 17 of 49

Etienne,

Cleve is correct. Do not compare the numerical values of the two matrices. To compare two matrices A and B, look at spones(A) and spones(B). If they differ, the LU factorization can differ.

Even tiny changes in the pattern can make a huge difference. Inside LU, backslash, chol, etc, is a heuristic that permutes the matrix to reduce fill-in. It’s a heuristic because to find the minimum fill is an NP-hard problem, and would take time exceeding the lifetime of the universe to solve exactly. I think you want your answer before that…. The heursistic can go down a different “path” and find very different fill-in, even with small changes in pattern.

Even with an uncomputable, perfectly minimum fill ordering, tiny changes in the pattern can lead to huge changes in nnz(L+U).

Etienne Non replied on : 18 of 49

Thanks guys!!

Phil replied on : 19 of 49

Echoing David Heintz in comment 11, why does sparse require the index vectors to be doubles?
On a 32 bit machine, this seems to cause the 2 index vectors and the entry vector to take (8 + 8 + 8)*n_entries bytes rather than (4 + 4 + 8)*n_entries bytes. That’s a 50% increase in memory usage! Plus, sparse must be doing some kind of double to integer conversion anyway.

Pat Quillen replied on : 20 of 49

Phil,

Requiring index vectors to be double is a throwback to the days of “everything in MATLAB is a double”. Indeed, our default type for all index vectors is double (take the output of find for example). We’ll take it as an enhancement request to allow sparse to take vectors of an integral type as inputs.

Dave replied on : 21 of 49

Hi,
Great stuff! Can anyone point me to something like sparse2 for a poor fortran (90) programmer…? In particular, I need a fast routine to convert a triplet form matrix to compressed sparse row format.
I would greatly appreciate any suggestions!

Xuexin Liu replied on : 22 of 49

Thank you for your posts! In this post, you gave us the advice “Moral: Do Not Abuse A(i,j)=… for Sparse A; Use sparse Instead”. My question is that can we always finish our job without using A(i,j)=… ?

Thanks!

And below are the details of my questions:

In the formulation of a dynamic system equation (one example is a circuit containing resistors, capacitors, and inductors), the entries in the sparse matrix may be changed when a new device is loaded (the devices are loaded from a net list specified the interconnection of them). And after all the devices in the circuit are loaded, the program will solve the equation using the sparse matrix.

In this case, I haven’t figured out a way without using assignment such as
C(i,j) = C(i,j) + new_param;

Tim Davis replied on : 23 of 49

Reply to Xuexin Liu: if you are changing just a few entries, then C(i,j)=C(i,j)+new_param is fine. If you are changing a lot of entries, then create a matrix New_param (for example) with all the changes, using the method described here. Then do C=C+New_param to make all the changes at once. This method will be a lot faster than using the C(i,j)=C(i,j)+new_param inside a loop.

Tim Davis replied on : 24 of 49

p.s. If you are solving x=A\b where A comes from a circuit matrix, you’ll be using UMFPACK (normally) or CHOLMOD (if A is symmetric positive definite. Both of those are my codes. A\b might also use MA57 (from Iain Duff) if it’s symmetric indefinite, or a banded solver if A is a band matrix.

However, I also have another package that’s specifically designed for circuit matrices. It’s much faster than x=A\b (UMFPACK, CHOLMOD, etc). It’s not a part of A\b in MATLAB.

See KLU, in SuiteSparse, at http://www.cise.ufl.edu/research/sparse (just download all of SuiteSparse; KLU is part of it). Using KLU with x=klu(A,’\’,b) can be many times faster than x=A\b for circuit matrices.

Aman P.V replied on : 25 of 49

Hi all,

I have a small doubt. If use sparse command with a repeated indices what will happen? will that sum up or just replace?
As all of you know for a same position in the Global stiffness matrix we;ll have contribution from different elements and we’ll have to sum them up.
Any one please clarify my doubt.

Loren replied on : 26 of 49

Aman-

From the help

“S = sparse(i,j,s,m,n,nzmax) uses vectors i, j, and s to generate an m-by-n sparse matrix such that S(i(k),j(k)) = s(k), with space allocated for nzmax nonzeros. Vectors i, j, and s are all the same length. Any elements of s that are zero are ignored, along with the corresponding values of i and j. Any elements of s that have duplicate values of i and j are added together. “

Values at repeated indices are summed together.

–Loren

Alex replied on : 27 of 49

Dear Loren,

is the function or suite sparse2 still available? How would one integrate it in MATLAB?

Thank you

Alex

Loren replied on : 28 of 49

Alex-

sparse2.m not part of MATLAB, and it doesn’t seem to be on the File Exchange. It does exist on Tim’s site:

http://www.cise.ufl.edu/research/sparse/ (there is a sparse2.m under there, and lots of documentation).

–Loren

Alex replied on : 29 of 49

Dear Loren,

Thank you for the link. I have another question that is often discussed when using the function eigs.

I call the following function:

val=eigs(Kr,Mr,10,'sm')


My reduced matrices Kr and Mr are finite-element stiffness and mass. They are obtained from the sparse, semi-positive definite matrices K and M as follows:

Kr=H'*K*H
Mr=H'*M*H


The matrix H is mostly zeros off diagonal. The diagonal terms are 1’s. Some off-diagonal terms are complex, exp(1i*pi/4) for example. H is not square. For example K = 10×10, H = 10×6.

Nominally, I expect Kr and Mr to be semi-positive definite and hermitian. This is what I obtain if M, K, and H are defined as full. If however I do this operation with sparse matrices (K,M, and H), I get very small real terms of the order of 1e-9 (max(K)~1e9) off diagonal. This triggers the following error with eigs:

Generalized matrix B must be the same size as A and either a symmetric positive
(semi-)definite matrix or its Cholesky factor.

I traced the error to line 805 of the function eigs. If I comment out this error, the function works properly and I get the correct eigenvalues for my problem.

Could you comment on this?
Is it possible the sparse-matrix multiplication introduces noise?
Could the error about the Cholesky factor be too strict a condition?

Thank you,

Alex

Prashanth replied on : 30 of 49

Is the following operation efficient?
A = [
A11,A12,A13;
A21,A22,A23;
A31,A32,A33;];

A11,A12,…A33 are all sparse FEM matrices assembled as per the suggestions you made in this post. This situation appears when there are more than one degree-of-freedom per node.

Thank you.

Loren replied on : 31 of 49

Prashanth-

If you could make the matrix A in one call without first making A11, etc., that would be more efficient. If not, you might try assembling the 3 “columns” separately and then concatenating those, and profiling that in comparison to the one single statement you have for building A.

–Loren

jiang-ming zhang replied on : 32 of 49

i often need to diagonalize a sparse hermitian matrix

fully diagonalize it

i wonder why eig does not accept any more input to indicate that the matrix is hermitian or not

the algorithm surely depends on whether the matrix is hermitian or not.

that is another stupid thing about eig, i think.

katak goreng replied on : 33 of 49

I’m developing my own Navier Stokes solver by finite element method and currently I’m testing some benchmark problem that need me to solve A\b for very large size of A and b. A is square, asymmetric matrix whereas is a vector. Size of A is 300,000 by 300,000 and b is 300,000 by 1. I used MEX to complied the assembly process for indices I,J,X and then used the indices I,J and X to setup sparse stiffness matrix i.e. Stiffness(I,J,X,variable_num,variable_num).After that I solve A\b. The assembly takes around 10 sec but the solution for A\b takes around >30 sec. How can I improve on this.

Suresh replied on : 34 of 49

Hi Loren,

Thanks for the post.
I have some very big array of matrices, which i want to convert to sparse, But it seems sparse doesnt support array.
Any other suggestion ?
Regards,
Suresh

Loren replied on : 35 of 49

Suresh,

You might see if there’s something on the file exchange. Depending on your problem, you could store sparse planes on a cell array or structure.

Loren

Al replied on : 36 of 49

i’m not sure if this can help for speeding the process of assembly but since the matrices are symmetric you could do this

for krow=1:8
for kcol=krow:8
ntriplets=ntriplets+1;
I(ntriplets)=nn(krow);
J(ntriplets)=nn(kcol);
if krow==kcol
X(ntriplets)=em(krow,kcol)/2;
else
X(ntriplets)=em(krow,kcol);
end
end
end

and after the whole cycle

A=sparse(I,J,X,n,n)+sparse(I,J,X,n,n)’;

Al replied on : 37 of 49

i forgot to mention that now ntriplets is nx*ny*(64/2+4), it saves up memory too (try nx=ny=500)

Al replied on : 38 of 49

last comment :)
it would be even better if the last command to assembly the sparse matrix is:

A=sparse(I,J,X,n,n);
A=A+A’;

wahyoe replied on : 39 of 49

hi all
Can you help me, if a small data file matrix, this listing can work, but if the matrix for large matrix in case of error, can you help fix this listing. thank you

??? Maximum variable size allowed by the program is exceeded.

Error in ==> assembly_t3d at 9
K=zeros(ndof,ndof,nel)

Error in ==> main at 34
[K] = assembly_t3d( index,IF,IR, Keg )

function [K] = assembly_t3d(index,IF,IR, Keg )

%% DATA
[nel,ni]=size(index)
ndof=length([IF IR])

%%Stiffness Matrix
K=zeros(ndof,ndof,nel)
for i=1:nel
for j=1:ni
for k=1:ni
ij=index(i,j)
ik=index(i,j)
K(ij,ik,i)=Keg(j,k,i)
end
end
end
K=(sum(K,3))

Loren replied on : 40 of 49

Wahyoe-

I suspect your progam is asking for more memory than it can find.

–Loren

Meenakshi Sundaram replied on : 41 of 49

Hi Loren,

I am developing a code for non-linear FEM in matlab. I notice that the same sparsity pattern occurs during the assembly of the matrix in every iteration.

Should I create the sparse matrix only once in the first iteration and dereference it again and again by using

K(element_dofs,element_dofs)=K(element_dofs,element_dofs)+ elem_K;


as opposed to assembling it again and again in every iteration.

It boils down to whether dereferencing a created sparse matrix is faster in comparison to creating one?

-Meenakshi

Loren replied on : 42 of 49

That would be the best idea for not thrashing memory.

–Loren

Meenakshi Sundaram replied on : 43 of 49

Loren,

But it needs zeroing it out the entries every time which I think causes it to loose the allocated memory. Any other clever suggestions would be really helpful.

Meenakshi

Loren replied on : 44 of 49

Meenakshi-

Not if the zeros stay in the same place each time, which is what you wrote above.

–Loren

Meenakshi Sundaram replied on : 45 of 49

Dear Loren,

The sparsity pattern remains the same. But the entries in the filled location of the matrix have to be cleared out and then re-filled in with fresh values every iteration.

Meenakshi

Loren replied on : 46 of 49

Meenakshi,

If the locations are the same, which I still can’t tell from your response, then you should be okay. If the sparsity is the same, you will have some overhead, but not more than creating the sparse matrix from scratch, I think. You can try it out and time it yourself.

–Loren

Terence replied on : 47 of 49

Dear Loren

I recently encountered a problem that seems relevant to the current forum.

I have a column vector of positive integers (basically indices), and a second column vector of associated real numbers. The task is basically to sort the first column in ascending order, while maintaing the correspondence with the second column. Any repeated indices must result in the corresponding entries being summed. Moving from for loops and if statements to the sparse and sparse2 functions resulted in very large savings in time. However since the above operations must be performed some 10^5 or 10^6 times, I wonder if there is a faster method than using sparse? Would C++ or fortran be able to do such a task faster?

A modified example of the code is as follows (for other reasons the for loop cannot be removed, Nt here is only 10, but needs to be 10^5 or 10^6):

NG = 20000;
N = 5e5;
Nt = 10;

for loop = 1 : Nt

col1 = floor(NG*rand(N, 1)) + 1;
col2 = rand(N, 1);

A = sparse(col1, 1, col2, NG, 1);

end


The A vector is then used elsewhere (not shown).

Thanks
Terence

Pat replied on : 48 of 49

Hi Terence.

If you only wish to use A, then consider the following:

A = zeros(NG,1);
for k = 1:N
A(col1(k)) = A(col1(k)) + col2(k);
end


While this is not quite as elegant as

A = sparse(col1, 1, col2, NG, 1);


it is linear in the number of elements in col1 and avoids a sort. On the other hand, if many entries of A wind up zero, then the code above will most likely use more memory.

Take care.
Pat.

Terence replied on : 49 of 49

Dear Pat

Thanks for the suggestion. I tried it and it works quite well, speeding up the algorithm by almost a factor of 10.

Out of interest I wrote a simple C code and used the mex function to put it in MATLAB. While this was about 2 times faster than sparse2 for my application, your suggest works much better and avoids the need for all the mex function complexity (observation: just don’t time the new code with the profiler; the overhead increases enormously as it goes through each step in the k loop!!)

Thanks
Terence