{"id":79,"date":"2007-03-01T12:23:51","date_gmt":"2007-03-01T17:23:51","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=79"},"modified":"2017-02-24T20:00:16","modified_gmt":"2017-02-25T01:00:16","slug":"creating-sparse-finite-element-matrices-in-matlab","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2007\/03\/01\/creating-sparse-finite-element-matrices-in-matlab\/","title":{"rendered":"Creating Sparse Finite-Element Matrices in MATLAB"},"content":{"rendered":"<div class=\"content\">\n<p>I'm pleased to introduce Tim Davis as this week's guest blogger. Tim is a professor at the University of Florida, and is the<br \/>\nauthor or co-author of many of our sparse matrix functions (lu, chol, much of sparse backslash, ordering methods such as amd<br \/>\nand colamd, and other functions such as etree and symbfact). He is also the author of a recent book, Direct Methods for Sparse Linear Systems, published by SIAM, where more details of MATLAB sparse matrices are discussed ( <a href=\"http:\/\/www.cise.ufl.edu\/~davis\">http:\/\/www.cise.ufl.edu\/~davis<\/a> ).<\/p>\n<p>&nbsp;<\/p>\n<h3>Contents<\/h3>\n<div>\n<ul>\n<li><a href=\"#1\">MATLAB is Slow? Only If You Use It Incorrectly<\/a><\/li>\n<li><a href=\"#2\">How Not to Create a Finite-Element Matrix<\/a><\/li>\n<li><a href=\"#3\">What's Wrong with It?<\/a><\/li>\n<li><a href=\"#5\">How MATLAB Stores Sparse Matrices<\/a><\/li>\n<li><a href=\"#9\">A Better Way to Create a Finite-Element Matrix<\/a><\/li>\n<li><a href=\"#12\">Moral: Do Not Abuse A(i,j)=... for Sparse A; Use sparse Instead<\/a><\/li>\n<li><a href=\"#14\">Try a Faster Sparse Function<\/a><\/li>\n<li><a href=\"#15\">When Fast is Not Fast Enough...<\/a><\/li>\n<\/ul>\n<\/div>\n<h3>MATLAB is Slow? 
Only If You Use It Incorrectly<a name=\"1\"><\/a><\/h3>\n<p>From time to time, I hear comments such as \"MATLAB is slow for large finite-element problems.\" When I look at the details,<br \/>\nthe problem is typically due to an overuse of <tt>A(i,j)= ...<\/tt> when creating the sparse matrix (assembling the finite elements). This can be seen in typical user's code, MATLAB code in<br \/>\nbooks on the topic, and even in MATLAB itself. The problem is widespread. MATLAB can be very fast for finite-element problems,<br \/>\nbut not if it's used incorrectly.<\/p>\n<h3>How Not to Create a Finite-Element Matrix<a name=\"2\"><\/a><\/h3>\n<p>A good example of what not to do can be found in the <tt>wathen.m<\/tt> function, in MATLAB. <tt>A=gallery('wathen',200,200)<\/tt> takes a huge amount of time; a very minor modification cuts the run time drastically. I'm not intending to single out this<br \/>\none function for critique; this is a very common issue that I see over and over again. This function is built into MATLAB,<br \/>\nwhich makes it an accessible example. It was first written when MATLAB did not support sparse matrices, and was modified<br \/>\nonly slightly to exploit sparsity. It was never meant for generating large sparse finite-element matrices. You can see the<br \/>\nentire function with the command:<\/p>\n<pre>  type private\/wathen.m<\/pre>\n<p>Below is an excerpt of the relevant parts of <tt>wathen.m<\/tt>. The function <tt>wathen1.m<\/tt> creates a finite-element matrix of an nx-by-ny mesh. 
Each <tt>i<\/tt> loop creates a single 8-by-8 finite-element matrix, and adds it into <tt>A<\/tt>.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">type <span style=\"color: #a020f0;\">wathen1<\/span>\r\ntic\r\nA = wathen1 (200,200) ;\r\ntoc<\/pre>\n<pre style=\"font-style: oblique;\">function A = wathen1 (nx,ny)\r\nrand('state',0)\r\ne1 = [6 -6 2 -8;-6 32 -6 20;2 -6 6 -6;-8 20 -6 32];\r\ne2 = [3 -8 2 -6;-8 16 -8 20;2 -8 3 -8;-6 20 -8 16];\r\ne = [e1 e2; e2' e1]\/45;\r\nn = 3*nx*ny+2*nx+2*ny+1;\r\nA = sparse(n,n);\r\nRHO = 100*rand(nx,ny);\r\nnn = zeros(8,1);\r\nfor j=1:ny\r\n    for i=1:nx\r\n        nn(1) = 3*j*nx+2*i+2*j+1;\r\n        nn(2) = nn(1)-1;\r\n        nn(3) = nn(2)-1;\r\n        nn(4) = (3*j-1)*nx+2*j+i-1;\r\n        nn(5) = 3*(j-1)*nx+2*i+2*j-3;\r\n        nn(6) = nn(5)+1;\r\n        nn(7) = nn(6)+1;\r\n        nn(8) = nn(4)+1;\r\n        em = e*RHO(i,j);\r\n        for krow=1:8\r\n            for kcol=1:8\r\n                A(nn(krow),nn(kcol)) = A(nn(krow),nn(kcol))+em(krow,kcol);\r\n            end\r\n        end\r\n    end\r\nend\r\n\r\nElapsed time is 305.832709 seconds.\r\n<\/pre>\n<h3>What's Wrong with It?<a name=\"3\"><\/a><\/h3>\n<p>The above code is fine for generating a modest sized matrix, but the <tt>A(i,j) = ...<\/tt> statement is quite slow when <tt>A<\/tt> is large and sparse, particularly when <tt>i<\/tt> and <tt>j<\/tt> are also scalars. The inner two for-loops can be vectorized so that <tt>i<\/tt> and <tt>j<\/tt> are vectors of length 8. Each <tt>A(i,j) = ...<\/tt> statement is assembling an entire finite-element matrix into <tt>A<\/tt>. 
However, this leads to very minimal improvement in run time.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">type <span style=\"color: #a020f0;\">wathen1b<\/span>\r\ntic\r\nA1b = wathen1b (200,200) ;\r\ntoc<\/pre>\n<pre style=\"font-style: oblique;\">function A = wathen1b (nx,ny)\r\nrand('state',0)\r\ne1 = [6 -6 2 -8;-6 32 -6 20;2 -6 6 -6;-8 20 -6 32];\r\ne2 = [3 -8 2 -6;-8 16 -8 20;2 -8 3 -8;-6 20 -8 16];\r\ne = [e1 e2; e2' e1]\/45;\r\nn = 3*nx*ny+2*nx+2*ny+1;\r\nA = sparse(n,n);\r\nRHO = 100*rand(nx,ny);\r\nnn = zeros(8,1);\r\nfor j=1:ny\r\n    for i=1:nx\r\n        nn(1) = 3*j*nx+2*i+2*j+1;\r\n        nn(2) = nn(1)-1;\r\n        nn(3) = nn(2)-1;\r\n        nn(4) = (3*j-1)*nx+2*j+i-1;\r\n        nn(5) = 3*(j-1)*nx+2*i+2*j-3;\r\n        nn(6) = nn(5)+1;\r\n        nn(7) = nn(6)+1;\r\n        nn(8) = nn(4)+1;\r\n        em = e*RHO(i,j);\r\n        A (nn,nn) = A (nn,nn) + em ;\r\n    end\r\nend\r\n\r\nElapsed time is 282.945593 seconds.\r\n<\/pre>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">disp (norm (A-A1b,1))<\/pre>\n<pre style=\"font-style: oblique;\">     0\r\n<\/pre>\n<h3>How MATLAB Stores Sparse Matrices<a name=\"5\"><\/a><\/h3>\n<p>To understand why the above examples are so slow, you need to understand how MATLAB stores its sparse matrices. An n-by-n<br \/>\nMATLAB sparse matrix is stored as three arrays; I'll call them <tt>p<\/tt>, <tt>i<\/tt>, and <tt>x<\/tt>. These three arrays are not directly accessible from M, but they can be accessed by a mexFunction. The nonzero entries in<br \/>\ncolumn <tt>j<\/tt> are stored in <tt>i(p(j):p(j+1)-1)<\/tt> and <tt>x(p(j):p(j+1)-1)<\/tt>, where <tt>x<\/tt> holds the numerical values and <tt>i<\/tt> holds the corresponding row indices. Below is a very small example. First, I create a <b>full<\/b> matrix and convert it into a sparse one. This is only so that you can easily see the matrix <tt>C<\/tt> and how it's stored in sparse form. 
You should never create a sparse matrix this way, except for tiny examples.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">C = [\r\n4.5 0.0 3.2 0.0\r\n3.1 2.9 0.0 0.9\r\n0.0 1.7 3.0 0.0\r\n3.5 0.4 0.0 1.0 ] ;\r\nC = sparse (C)<\/pre>\n<pre style=\"font-style: oblique;\">C =\r\n   (1,1)       4.5000\r\n   (2,1)       3.1000\r\n   (4,1)       3.5000\r\n   (2,2)       2.9000\r\n   (3,2)       1.7000\r\n   (4,2)       0.4000\r\n   (1,3)       3.2000\r\n   (3,3)       3.0000\r\n   (2,4)       0.9000\r\n   (4,4)       1.0000\r\n<\/pre>\n<p>Notice that the nonzero entries in <tt>C<\/tt> are stored in column order, with sorted row indices. The internal <tt>p<\/tt>, <tt>i<\/tt>, and <tt>x<\/tt> arrays can be reconstructed as follows. The <tt>find(C)<\/tt> statement returns a list of \"triplets,\" where the kth triplet is <tt>i(k)<\/tt>, <tt>j(k)<\/tt>, and <tt>x(k)<\/tt>. This specifies that <tt>C(i(k),j(k))<\/tt> is equal to <tt>x(k)<\/tt>. Next, <tt>find(diff(...))<\/tt> constructs the column pointer array <tt>p<\/tt> (this only works if there are no all-zero columns in the matrix).<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">[i j x] = find (C) ;\r\nn = size(C,2) ;\r\np = find (diff ([0 ; j ; n+1])) ;\r\n<span style=\"color: #0000ff;\">for<\/span> col = 1:n\r\n    fprintf (<span style=\"color: #a020f0;\">'column %d:\\n    k      row index     value\\n'<\/span>, col) ;\r\n    disp ([(p(col):p(col+1)-1)' i(p(col):p(col+1)-1) x(p(col):p(col+1)-1)])\r\n<span style=\"color: #0000ff;\">end<\/span><\/pre>\n<pre style=\"font-style: oblique;\">column 1:\r\n    k      row index     value\r\n    1.0000    1.0000    4.5000\r\n    2.0000    2.0000    3.1000\r\n    3.0000    4.0000    3.5000\r\ncolumn 2:\r\n    k      row index     value\r\n    4.0000    2.0000    2.9000\r\n    5.0000    3.0000    1.7000\r\n    6.0000    4.0000    0.4000\r\ncolumn 3:\r\n    k      row index     value\r\n    7.0000    1.0000    
3.2000\r\n    8.0000    3.0000    3.0000\r\ncolumn 4:\r\n    k      row index     value\r\n    9.0000    2.0000    0.9000\r\n   10.0000    4.0000    1.0000\r\n<\/pre>\n<p>Now consider what happens when one entry is added to <tt>C<\/tt>:<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">C(3,1) = 42 ;\r\n\r\n[i j x] = find (C) ;\r\nn = size(C,2) ;\r\np = find (diff ([0 ; j ; n+1])) ;\r\n<span style=\"color: #0000ff;\">for<\/span> col = 1:n\r\n    fprintf (<span style=\"color: #a020f0;\">'column %d:\\n    k      row index     value\\n'<\/span>, col) ;\r\n    disp ([(p(col):p(col+1)-1)' i(p(col):p(col+1)-1) x(p(col):p(col+1)-1)])\r\n<span style=\"color: #0000ff;\">end<\/span><\/pre>\n<pre style=\"font-style: oblique;\">column 1:\r\n    k      row index     value\r\n    1.0000    1.0000    4.5000\r\n    2.0000    2.0000    3.1000\r\n    3.0000    3.0000   42.0000\r\n    4.0000    4.0000    3.5000\r\ncolumn 2:\r\n    k      row index     value\r\n    5.0000    2.0000    2.9000\r\n    6.0000    3.0000    1.7000\r\n    7.0000    4.0000    0.4000\r\ncolumn 3:\r\n    k      row index     value\r\n    8.0000    1.0000    3.2000\r\n    9.0000    3.0000    3.0000\r\ncolumn 4:\r\n    k      row index     value\r\n   10.0000    2.0000    0.9000\r\n   11.0000    4.0000    1.0000\r\n<\/pre>\n<p>and you can see that nearly every entry in <tt>C<\/tt> has been moved down by one in the <tt>i<\/tt> and <tt>x<\/tt> arrays. In general, the single statement <tt>C(3,1)=42<\/tt> takes time proportional to the number of entries in the matrix. Thus, looping <tt>nnz(A)<\/tt> times over the statement <tt>A(i,j)=A(i,j)+...<\/tt> takes time proportional to <tt>nnz(A)^2<\/tt>.<\/p>\n<h3>A Better Way to Create a Finite-Element Matrix<a name=\"9\"><\/a><\/h3>\n<p>The version below is only slightly different. It could be improved, but I left it nearly the same to illustrate how simple<br \/>\nit is to write fast MATLAB code to solve this problem, via a minor tweak. 
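To make the cost just described concrete, here is a small pure-Python model of the p/i/x storage from the previous section. It is an illustration only, with hypothetical names and 0-based indices, not MATLAB's actual C implementation:

```python
# A toy model of MATLAB's compressed-column storage (column pointers p,
# row indices i, values x), illustrating the cost discussed above.
# NOTE: an illustration in Python, not MATLAB's actual C code; 0-based.

def insert_entry(p, i, x, row, col, val):
    """Set or update a single entry; each call does O(nnz) work."""
    lo, hi = p[col], p[col + 1]
    k = lo
    while k < hi and i[k] < row:        # row indices are kept sorted
        k += 1
    if k < hi and i[k] == row:          # entry already present: update it
        x[k] += val
        return
    i.insert(k, row)                    # shifts every later entry ...
    x.insert(k, val)                    # ... in BOTH arrays
    for c in range(col + 1, len(p)):    # and bumps all later column pointers
        p[c] += 1

# the 4-by-4 matrix C from the text, in compressed-column form (0-based)
p = [0, 3, 6, 8, 10]
i = [0, 1, 3, 1, 2, 3, 0, 2, 1, 3]
x = [4.5, 3.1, 3.5, 2.9, 1.7, 0.4, 3.2, 3.0, 0.9, 1.0]

insert_entry(p, i, x, row=2, col=0, val=42.0)   # the C(3,1) = 42 of the text
print(p)          # [0, 4, 7, 9, 11] -- every later column pointer moved
print(i[:4])      # [0, 1, 2, 3]
print(x[:4])      # [4.5, 3.1, 42.0, 3.5]
```

Each such call shifts, on average, about half the stored entries, which is where the nnz(A)^2 total for a loop of scalar updates comes from.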
The idea is to create a list of triplets, and let<br \/>\nMATLAB convert them into a sparse matrix all at once. If there are duplicates (which a finite-element matrix always has)<br \/>\nthe duplicates are summed, which is exactly what you want when assembling a finite-element matrix. In MATLAB 7.3 (R2006b),<br \/>\n<tt>sparse<\/tt> uses quicksort, which takes <tt>nnz(A)*log(nnz(A))<\/tt> time. This is slower than it could be (a linear-time bucket sort can be used, taking essentially <tt>nnz(A)<\/tt> time). However, it's still much faster than <tt>nnz(A)^2<\/tt>. For this matrix, <tt>nnz(A)<\/tt> is about 1.9 million.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">type <span style=\"color: #a020f0;\">wathen2.m<\/span>\r\ntic\r\nA2 = wathen2 (200,200) ;\r\ntoc<\/pre>\n<pre style=\"font-style: oblique;\">function A = wathen2 (nx,ny)\r\nrand('state',0)\r\ne1 = [6 -6 2 -8;-6 32 -6 20;2 -6 6 -6;-8 20 -6 32];\r\ne2 = [3 -8 2 -6;-8 16 -8 20;2 -8 3 -8;-6 20 -8 16];\r\ne = [e1 e2; e2' e1]\/45;\r\nn = 3*nx*ny+2*nx+2*ny+1;\r\nntriplets = nx*ny*64 ;\r\nI = zeros (ntriplets, 1) ;\r\nJ = zeros (ntriplets, 1) ;\r\nX = zeros (ntriplets, 1) ;\r\nntriplets = 0 ;\r\nRHO = 100*rand(nx,ny);\r\nnn = zeros(8,1);\r\nfor j=1:ny\r\n    for i=1:nx\r\n        nn(1) = 3*j*nx+2*i+2*j+1;\r\n        nn(2) = nn(1)-1;\r\n        nn(3) = nn(2)-1;\r\n        nn(4) = (3*j-1)*nx+2*j+i-1;\r\n        nn(5) = 3*(j-1)*nx+2*i+2*j-3;\r\n        nn(6) = nn(5)+1;\r\n        nn(7) = nn(6)+1;\r\n        nn(8) = nn(4)+1;\r\n        em = e*RHO(i,j);\r\n        for krow=1:8\r\n            for kcol=1:8\r\n                ntriplets = ntriplets + 1 ;\r\n                I (ntriplets) = nn (krow) ;\r\n                J (ntriplets) = nn (kcol) ;\r\n                X (ntriplets) = em (krow,kcol) ;\r\n            end\r\n        end\r\n    end\r\nend\r\nA = sparse (I,J,X,n,n) ;\r\n\r\nElapsed time is 1.594073 seconds.\r\n<\/pre>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 
1px solid #c8c8c8;\">disp (norm (A-A2,1))<\/pre>\n<pre style=\"font-style: oblique;\">  1.4211e-014\r\n<\/pre>\n<p>If you do not know how many entries your matrix will have, you may not be able to preallocate the <tt>I<\/tt>, <tt>J<\/tt>, and <tt>X<\/tt> arrays, as done in <tt>wathen2.m<\/tt>. In that case, start them at a reasonable size (anything larger than zero will do) and add this code to the innermost loop,<br \/>\njust after <tt>ntriplets<\/tt> is incremented:<\/p>\n<pre>    len = length (X) ;\r\n    if (ntriplets &gt; len)\r\n       I (2*len) = 0 ;\r\n       J (2*len) = 0 ;\r\n       X (2*len) = 0 ;\r\n    end<\/pre>\n<p>and when done, use <tt>sparse(I(1:ntriplets),J(1:ntriplets),X(1:ntriplets),n,n)<\/tt>.<\/p>\n<h3>Moral: Do Not Abuse A(i,j)=... for Sparse A; Use sparse Instead<a name=\"12\"><\/a><\/h3>\n<p>Avoid statements such as<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">C(4,2) = C(4,2) + 42 ;<\/pre>\n<p>in a loop. Create a list of triplets (i,j,x) and use <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/sparse.html\"><tt>sparse<\/tt><\/a> instead. This advice holds for any sparse matrix, not just finite-element ones.<\/p>\n<h3>Try a Faster Sparse Function<a name=\"14\"><\/a><\/h3>\n<p>CHOLMOD includes a <tt>sparse2<\/tt> mexFunction which is a replacement for <tt>sparse<\/tt>. It uses a linear-time bucket sort. The MATLAB 7.3 (R2006b) <tt>sparse<\/tt> accounts for about 3\/4ths the total run time of <tt>wathen2.m<\/tt>. For this matrix <tt>sparse2<\/tt> in CHOLMOD is about 10 times faster than the MATLAB <tt>sparse<\/tt>. CHOLMOD can be found in the SuiteSparse package, in MATLAB Central.<\/p>\n<p>If you would like to see a short and concise C mexFunction implementation of the method used by <tt>sparse2<\/tt>, take a look at CSparse, which was written for a concise textbook-style presentation. 
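The linear-time idea behind such a bucket sort can be sketched in pure Python. This is a simplified illustration with hypothetical names, not the CSparse or CHOLMOD code itself: count the entries in each column, turn the counts into column pointers by a cumulative sum, scatter each triplet into its column, then sum duplicates.

```python
# Sketch of linear-time triplet-to-compressed-column conversion
# (a simplified illustration of the bucket-sort idea; the real CSparse
# and CHOLMOD sources differ in detail). Indices are 0-based.

def triplets_to_csc(n, I, J, X):
    """Build column pointers p, row indices i, values x from triplets;
    duplicate (row, col) entries are summed, as sparse() does."""
    count = [0] * n
    for col in J:                        # 1. count entries per column
        count[col] += 1
    p = [0] * (n + 1)
    for col in range(n):                 # 2. cumulative sum -> pointers
        p[col + 1] = p[col] + count[col]
    i = [0] * len(X)
    x = [0.0] * len(X)
    nxt = p[:n]                          # next free slot in each column
    for r, c, v in zip(I, J, X):         # 3. scatter triplets into place
        k = nxt[c]
        i[k], x[k] = r, v
        nxt[c] += 1
    pp, ii, xx = [0], [], []             # 4. sum duplicates, sort rows
    for col in range(n):
        seen = {}
        for k in range(p[col], p[col + 1]):
            seen[i[k]] = seen.get(i[k], 0.0) + x[k]
        for r in sorted(seen):
            ii.append(r)
            xx.append(seen[r])
        pp.append(len(ii))
    return pp, ii, xx

# two triplets for entry (0,0) are summed, just as sparse(I,J,X) would
p, i, x = triplets_to_csc(2, [0, 1, 0], [0, 0, 0], [1.0, 2.0, 3.0])
print(p, i, x)   # [0, 2, 2] [0, 1] [4.0, 2.0]
```

Each phase touches every triplet only a constant number of times, so the conversion runs in essentially O(n + nnz) time, in contrast to the nnz*log(nnz) of a comparison sort.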
The <tt>cs_sparse<\/tt> mexFunction uses <tt>cs_compress.c<\/tt> to convert the triplets to a compressed column form of <tt>A'<\/tt>, <tt>cs_dupl.c<\/tt> to sum up duplicate entries, and then <tt>cs_transpose.c<\/tt> to transpose the result (which also sorts the columns).<\/p>\n<h3>When Fast is Not Fast Enough...<a name=\"15\"><\/a><\/h3>\n<p>Even with this dramatic improvement in constructing the matrix <tt>A<\/tt>, MATLAB could still use additional features for faster construction of sparse finite-element matrices. Constructing the<br \/>\nmatrix should be much faster than <tt>x=A\\b<\/tt>, since <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/chol.html\"><tt>chol<\/tt><\/a> is doing about 700 times as much work as <tt>sparse<\/tt> for this matrix (1.3 billion flops, vs 1.9 million nonzeros in <tt>A<\/tt>). The run times are not that different, however:<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">tic\r\nA = wathen2 (200,200) ;\r\ntoc\r\nb = rand (size(A,1),1) ;\r\ntic\r\nx=A\\b ;\r\ntoc<\/pre>\n<pre style=\"font-style: oblique;\">Elapsed time is 1.720397 seconds.\r\nElapsed time is 3.125791 seconds.\r\n<\/pre>\n<p>Published with MATLAB\u00ae 7.4<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>\nI'm pleased to introduce Tim Davis as this week's guest blogger. Tim is a professor at the University of Florida, and is the<br \/>\nauthor or co-author of many of our sparse matrix functions (lu, chol,... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2007\/03\/01\/creating-sparse-finite-element-matrices-in-matlab\/\">read more >><\/a><\/p>\n","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[16,10,15,36,12],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/79"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=79"}],"version-history":[{"count":5,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/79\/revisions"}],"predecessor-version":[{"id":2235,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/79\/revisions\/2235"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=79"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=79"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=79"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}