{"id":143,"date":"2008-06-25T06:46:37","date_gmt":"2008-06-25T11:46:37","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/2008\/06\/25\/speeding-up-matlab-applications\/"},"modified":"2017-05-23T13:40:56","modified_gmt":"2017-05-23T18:40:56","slug":"speeding-up-matlab-applications","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2008\/06\/25\/speeding-up-matlab-applications\/","title":{"rendered":"Speeding Up MATLAB Applications"},"content":{"rendered":"<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n   <introduction>\r\n      <p>Today I&#8217;d like to introduce a guest blogger, <a href=\"mailto:sarah.zaranek@mathworks.com\">Sarah Wait Zaranek<\/a>, who is an application engineer here at The MathWorks. Today she talks about speeding up code from a customer to get acceptable\r\n         performance.\r\n      <\/p>\r\n   <\/introduction>\r\n   <h3>Contents<\/h3>\r\n   <div>\r\n      <ul>\r\n         <li><a href=\"#1\">Introduction<\/a><\/li>\r\n         <li><a href=\"#2\">What Does This Code Do?<\/a><\/li>\r\n         <li><a href=\"#3\">The Initial Code<\/a><\/li>\r\n         <li><a href=\"#6\">Listen to M-Lint<\/a><\/li>\r\n         <li><a href=\"#11\">The Code with Preallocation<\/a><\/li>\r\n         <li><a href=\"#14\">Vectorization<\/a><\/li>\r\n         <li><a href=\"#15\">The Code with Preallocation and Vectorization of the Inner Two Loops<\/a><\/li>\r\n         <li><a href=\"#18\">The Code with Preallocation and Vectorization of the Inner Three Loops<\/a><\/li>\r\n         <li><a href=\"#22\">The Code with Preallocation, Vectorization and Only Calculating Once<\/a><\/li>\r\n         <li><a href=\"#25\">Additional Resources<\/a><\/li>\r\n         <li><a href=\"#26\">Your Examples<\/a><\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <h3>Introduction<a name=\"1\"><\/a><\/h3>\r\n   <p>A customer ran into slow performance issues with her code in MATLAB. She saw such slow performance in that she decided to\r\n      recode her algorithm in another language. I wanted to show her some simple techniques in MATLAB that could bring her code\r\n      down to a more reasonable running time.\r\n   <\/p>\r\n   <p>I enlisted the help of my fellow application engineers to suggest possible changes. Many of them chimed in with some really\r\n      great suggestions. I have integrated most of our thoughts into a few code versions to share with you. These versions illustrate\r\n      our thought processes as we worked through the code collaboratively to improve its speed.\r\n   <\/p>\r\n   <p>In this post, I focus on<\/p>\r\n   <div>\r\n      <ul>\r\n         <li>preallocation<\/li>\r\n         <li>vectorization<\/li>\r\n         <li>elimination of repeated calculations<\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <h3>What Does This Code Do?<a name=\"2\"><\/a><\/h3>\r\n   <p>For security reasons, the user couldn't give me her exact code. So the code we will use in this blog serves as an approximation\r\n      of the code she is using.\r\n   <\/p>\r\n   <p>The code generates locations on a 2D grid with dimensions <tt>nx1<\/tt> by <tt>nx2<\/tt>. These grid locations serve as values for initial and final positions used by the main part of the calculation. The code\r\n      iterates through all possible combinations of these initial and final positions.\r\n   <\/p>\r\n   <p>Given an initial and final position, the code calculates the quantity <tt>exponent<\/tt> . If <tt>exponent<\/tt> is less than a particular threshold value, <tt>gausThresh<\/tt>, <tt>exponent<\/tt> is then used to calculate <tt>out<\/tt>. <tt>subs<\/tt> contains the indices corresponding to the initial and final positions used to calculate <tt>out<\/tt>.\r\n   <\/p>\r\n   <p>For some of you, this probably is better understood just by looking at the code itself.<\/p>\r\n   <h3>The Initial Code<a name=\"3\"><\/a><\/h3><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\"><span style=\"color: #228B22\">% Initialize the grid and initial and final points<\/span>\r\nnx1 = 10; nx2 =  10;\r\nx1l =  0; x1u = 100;\r\nx2l =  0; x2u = 100;\r\n\r\nx1 = linspace(x1l,x1u,nx1+1);\r\nx2 = linspace(x2l,x2u,nx2+1);\r\n\r\nlimsf1 = 1:nx1+1; limsf2 = 1:nx2+1;\r\n\r\n<span style=\"color: #228B22\">% Initalize other variables<\/span>\r\nt = 1;\r\nsigmax1 = 0.5; sigmax2 = 1;\r\nsigma = t * [sigmax1^2 0; 0 sigmax2^2];\r\ninvSig = inv(sigma);\r\ndetSig = det(sigma);\r\n\r\nexpF = [1 0; 0 1];\r\nn = size (expF, 1);\r\ngausThresh = 10;\r\n\r\nsmall = 0; subs = []; vals = [];\r\n\r\n<span style=\"color: #228B22\">% Iterate through all possible initial<\/span>\r\n<span style=\"color: #228B22\">% and final positions and calculate<\/span>\r\n<span style=\"color: #228B22\">% the values of exponent and out<\/span>\r\n<span style=\"color: #228B22\">% if exponent &gt; gausThresh.<\/span>\r\n\r\n<span style=\"color: #0000FF\">for<\/span> i1 = 1:nx1+1\r\n    <span style=\"color: #0000FF\">for<\/span> i2 = 1:nx2+1\r\n        <span style=\"color: #0000FF\">for<\/span> f1 = limsf1\r\n            <span style=\"color: #0000FF\">for<\/span> f2 = limsf2\r\n\r\n                <span style=\"color: #228B22\">% Initial and final position<\/span>\r\n                xi = [x1(i1) x2(i2)]';\r\n                xf = [x1(f1) x2(f2)]';\r\n\r\n                exponent = 0.5 * (xf - expF * xi)'<span style=\"color: #0000FF\">...<\/span>\r\n                    * invSig * (xf - expF * xi);\r\n\r\n                <span style=\"color: #0000FF\">if<\/span> exponent &gt; gausThresh\r\n                    small = small + 1;\r\n                <span style=\"color: #0000FF\">else<\/span>\r\n                    out = 1 \/ (sqrt((2 * pi)^n * detSig))<span style=\"color: #0000FF\">...<\/span>\r\n                        * exp(-exponent);\r\n                    subs = [subs; i1 i2 f1 f2];\r\n                    vals = [vals; out];\r\n                <span style=\"color: #0000FF\">end<\/span>\r\n\r\n            <span style=\"color: #0000FF\">end<\/span>\r\n        <span style=\"color: #0000FF\">end<\/span>\r\n    <span style=\"color: #0000FF\">end<\/span>\r\n<span style=\"color: #0000FF\">end<\/span><\/pre><p>Here is a graphical representation of the results for <tt>nx1=nx2=100<\/tt>. Because the data is quite dense, I have magnified a portion of the total graph.  The red lines connect the initial and final\r\n      points where <tt>exponent &lt; gausThresh<\/tt> and the thickness of the line reflects the value of <tt>vals<\/tt>.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">open(<span style=\"color: #A020F0\">'results.fig'<\/span>);<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/143\/customerVectorizeSZ_01.png\"> <p>Here are the runtimes on my computer, a T60 Lenovo dual-core laptop with multithreading enabled.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">displayRunTimes(1)<\/pre><pre style=\"font-style:oblique\">    nx1      nx2      time\r\n    50       50       296 seconds\r\n    100      100      9826 seconds\r\n<\/pre><h3>Listen to M-Lint<a name=\"6\"><\/a><\/h3>\r\n   <p>The first and easiest step is to listen to the suggestions from <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/mlint.html\">M-Lint<\/a>, the static code analyzer that ships with core MATLAB. You can either access it via the editor or command line. In this case,\r\n      I use the command line and define a simple function <tt>displayMlint<\/tt> so that the display is compact.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">output = mlint(<span style=\"color: #A020F0\">'initial.m'<\/span>);\r\ndisplayMlint(output);<\/pre><pre style=\"font-style:oblique\">'subs' might be growing inside a\r\n loop. Consider preallocating\r\n for speed.\r\n     \r\n'vals' might be growing inside a\r\n loop. Consider preallocating\r\n for speed.\r\n     \r\n<\/pre><p>Following these suggestions, I know I need to preallocate some of the arrays.<\/p>\r\n   <p>Why?<\/p>\r\n   <p>MATLAB needs contiguous blocks of memory for each array. Therefore, by repeatedly resizing the array, I keep asking MATLAB\r\n      to look for a new contiguous blocks of memory and move the old array elements into that new block. If I preallocate the maximum\r\n      amount of space that would be required by the array, I do not have to suffer the additional cost to do this searching and\r\n      moving.\r\n   <\/p>\r\n   <p>However, since I am unsure of the final size of <tt>subs<\/tt> and <tt>vars<\/tt>, I don't know what size I should preallocate. I <i>could<\/i> make them the maximum possible size. However for the case of <tt>nx1 = nx2 = 100<\/tt>, that requires a vector of 100^4 elements, which gets me into trouble!\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\"><span style=\"color: #0000FF\">try<\/span>\r\n    zeros(1,100^4)\r\n<span style=\"color: #0000FF\">catch<\/span> ME\r\n<span style=\"color: #0000FF\">end<\/span>\r\ndisplay(ME.message)<\/pre><pre style=\"font-style:oblique\">Out of memory. Type HELP MEMORY for your options.\r\n<\/pre><p>Therefore, I need to block allocate the variable.<\/p>\r\n   <p>I do this by first preallocating a reasonable size (that doesn't create an out of memory error) for <tt>subs<\/tt> and <tt>vars<\/tt> and then increase that allocation by a reasonable amount, chunk by chunk, if <tt>subs<\/tt> and <tt>vars<\/tt> go beyond their preallocated sizes.\r\n   <\/p>\r\n   <p>This way, I only suffer the cost of resizing the array a few times instead of every time a new element is added.<\/p>\r\n   <p>I have also introduced a function, <tt>initialize<\/tt>, to do the variable initializations so that you don't have to repeatedly look at the part of the code that does not change.\r\n   <\/p>\r\n   <h3>The Code with Preallocation<a name=\"11\"><\/a><\/h3><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\"><span style=\"color: #228B22\">% Initialization<\/span>\r\nnx1 = 10;\r\nnx2 = 10;\r\n\r\n[x1,x2,limsf1,limsf2,expF,gausThresh,small,invSig,detSig,n]= <span style=\"color: #0000FF\">...<\/span>\r\n    initialize(nx1,nx2);\r\n\r\n<span style=\"color: #228B22\">% Initial guess for preallocation<\/span>\r\nmm = min((nx1+1)^2*(nx2+1)^2, 10^6);\r\nsubs = zeros(mm,4);\r\nvals = zeros(mm,1);\r\n\r\ncounter = 0;\r\n\r\n<span style=\"color: #228B22\">% Iterate through all possible initial<\/span>\r\n<span style=\"color: #228B22\">% and final positions<\/span>\r\n<span style=\"color: #0000FF\">for<\/span> i1 = 1:nx1+1\r\n    <span style=\"color: #0000FF\">for<\/span> i2 = 1:nx2+1\r\n        <span style=\"color: #0000FF\">for<\/span> f1 = limsf1\r\n            <span style=\"color: #0000FF\">for<\/span> f2 = limsf2\r\n\r\n                xi = [x1(i1) x2(i2)]'; <span style=\"color: #228B22\">%% Initial position<\/span>\r\n                xf = [x1(f1) x2(f2)]'; <span style=\"color: #228B22\">%% Final position<\/span>\r\n\r\n                exponent = 0.5 * (xf - expF * xi)'<span style=\"color: #0000FF\">...<\/span>\r\n                    * invSig * (xf - expF * xi);\r\n\r\n                <span style=\"color: #228B22\">% Increase preallocation if necessary<\/span>\r\n                <span style=\"color: #0000FF\">if<\/span> counter == length(vals)\r\n                    subs = [subs; zeros(mm, 4)];\r\n                    vals = [vals; zeros(mm, 1)];\r\n                <span style=\"color: #0000FF\">end<\/span>\r\n\r\n                <span style=\"color: #0000FF\">if<\/span> exponent &gt; gausThresh\r\n                    small = small + 1;\r\n                <span style=\"color: #0000FF\">else<\/span>\r\n                    <span style=\"color: #228B22\">% Counter introduced<\/span>\r\n                    counter=counter + 1;\r\n                    out = 1 \/ (sqrt((2 * pi)^n * detSig))<span style=\"color: #0000FF\">...<\/span>\r\n                        * exp(-exponent);\r\n                    subs(counter,:) = [i1 i2 f1 f2];\r\n                    vals(counter) = out;\r\n                <span style=\"color: #0000FF\">end<\/span>\r\n\r\n            <span style=\"color: #0000FF\">end<\/span>\r\n        <span style=\"color: #0000FF\">end<\/span>\r\n    <span style=\"color: #0000FF\">end<\/span>\r\n<span style=\"color: #0000FF\">end<\/span>\r\n\r\n<span style=\"color: #228B22\">% Remove zero components that came from preallocation<\/span>\r\nvals = vals(vals &gt; 0);\r\nsubs = subs(vals &gt; 0);<\/pre><p>The preallocation makes for slightly faster code.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">displayRunTimes(2)<\/pre><pre style=\"font-style:oblique\">    nx1      nx2      time\r\n    50       50       267 seconds\r\n    100      100      4228 seconds\r\n<\/pre><p>However, this isn't exactly the speedup I was looking for. So, let's try to tackle the code using vectorization.<\/p>\r\n   <h3>Vectorization<a name=\"14\"><\/a><\/h3>\r\n   <p>Because MATLAB is a array-based language, code written in terms of matrix and array operations is often faster than code which\r\n      relies on <tt>for<\/tt> loops, especially in the case of nested <tt>for<\/tt> loops. In vectorizing, matrix and vector operations substitute for the equivalent <tt>for<\/tt> loop calculations.\r\n   <\/p>\r\n   <p>There are several possibilities for vectorizing this code. I am going to explore two in this blog entry.<\/p>\r\n   <div>\r\n      <ul>\r\n         <li>Vectorize the Inner Two loops<\/li>\r\n         <li>Vectorize the Inner Three loops<\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <h3>The Code with Preallocation and Vectorization of the Inner Two Loops<a name=\"15\"><\/a><\/h3>\r\n   <p><b>Try 1: Vectorize the Inner Two loops<\/b><\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\"><span style=\"color: #228B22\">% Initialization<\/span>\r\n\r\nnx1 = 10;\r\nnx2 = 10;\r\n\r\n[x1,x2,limsf1,limsf2,expF,gausThresh,small,invSig,detSig,n]= <span style=\"color: #0000FF\">...<\/span>\r\n    initialize(nx1,nx2);\r\n\r\nvals = cell(nx1+1,nx2+1); <span style=\"color: #228B22\">% Cell preallocation<\/span>\r\nsubs = cell(nx1+1,nx2+1); <span style=\"color: #228B22\">% Cell preallocation<\/span>\r\n\r\n[xind,yind] = meshgrid(limsf1,limsf2);\r\nxyindices = [xind(:)' ; yind(:)'];\r\n\r\n[x,y] = meshgrid(x1(limsf1),x2(limsf2));\r\nxyfinal = [x(:)' ; y(:)'];\r\n\r\nexptotal = zeros(length(xyfinal),1);\r\n\r\n<span style=\"color: #228B22\">% Loop over all possible combinations of positions<\/span>\r\n<span style=\"color: #0000FF\">for<\/span> i1 = 1:nx1+1\r\n    <span style=\"color: #0000FF\">for<\/span> i2 = 1:nx2+1\r\n\r\n        xyinitial = repmat([x1(i1);x2(i2)],1,length(xyfinal));\r\n\r\n        expa = 0.5 * (xyfinal - expF * xyinitial);\r\n        expb = invSig * (xyfinal - expF * xyinitial);\r\n        exptotal(:,1) = expa(1,:).*expb(1,:)+expa(2,:).*expb(2,:);\r\n\r\n        index = find(exptotal &lt; gausThresh);\r\n        expreduced = exptotal(exptotal &lt; gausThresh);\r\n\r\n        out = 1 \/ (sqrt((2 * pi)^n * detSig)) * exp(-(expreduced));\r\n        vals{i1,i2} = out;\r\n        subs{i1,i2} = [i1*ones(1,length(index)) ; <span style=\"color: #0000FF\">...<\/span>\r\n            i2*ones(1,length(index)); xyindices(1,index); <span style=\"color: #0000FF\">...<\/span>\r\n            xyindices(2,index)]' ;\r\n\r\n    <span style=\"color: #0000FF\">end<\/span>\r\n<span style=\"color: #0000FF\">end<\/span>\r\n\r\n<span style=\"color: #228B22\">% Reshape and convert output so it is in a<\/span>\r\n<span style=\"color: #228B22\">% simple matrix format<\/span>\r\nvals = cell2mat(vals(:));\r\nsubs = cell2mat(subs(:));\r\n\r\nsmall = ((nx1+1)^2*(nx2+1)^2)-length(subs);<\/pre><p>Wow! With just this little bit of vectorization, the run time decreased tremendously.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">displayRunTimes(3)<\/pre><pre style=\"font-style:oblique\">    nx1      nx2      time\r\n    50       50       1.51 seconds\r\n    100      100      19.28 seconds\r\n<\/pre><p>Let's go over the main techniques I've illustrated here.<\/p>\r\n   <div>\r\n      <ul>\r\n         <li>Use of <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/meshgrid.html\"><tt>meshgrid<\/tt><\/a> and <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/repmat.html\"><tt>repmat<\/tt><\/a> to create matrices\r\n         <\/li>\r\n         <li>Use of vector and matrix operations<\/li>\r\n         <li>Use of element-wise operations when appropriate<\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <p><tt>meshgrid<\/tt> produces two vectors of the <tt>x<\/tt> and <tt>y<\/tt> positions for all possible combinations of the <tt>x<\/tt> and <tt>y<\/tt> input vectors. In other words, given the vectors <tt>x<\/tt> and <tt>y<\/tt>, <tt>meshgrid<\/tt> gives all possible positions in a mesh or grid defined on a side by those <tt>x<\/tt> and <tt>y<\/tt> input vectors. The n-dimensional version of <tt>meshgrid<\/tt> is <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/ndgrid.html\"><tt>ndgrid<\/tt><\/a>, which I use it in the next example.\r\n   <\/p>\r\n   <p>The function <tt>repmat<\/tt> replicates and tiles arrays.\r\n   <\/p>\r\n   <p>Generally I use matrices as part of the vectorization, so whenever possible I use vector and matrix operations.<\/p>\r\n   <p>However, in one segment of the code it is actually faster to explicitly write out a vector-vector multiply element by element.\r\n      This is somewhat unintuitive, so let's discuss this a bit more in detail.\r\n   <\/p>\r\n   <p>To calculate <tt>exponent<\/tt> in the original code, there is a series of vector and matrix multiplies with the original and final positions, each represented\r\n      by two-element vectors.  The values of the initial and final position are then changed every step through the loop.\r\n   <\/p>\r\n   <p>When the code is vectorized, all the initial positions and final positions are now contained in two matrices <tt>xyinitial<\/tt> and <tt>xyfinal<\/tt>. Therefore, to calculate the value of <tt>exponent<\/tt>, only the diagonal terms of the multiplication of <tt>expa<\/tt> and <tt>expb<\/tt> are relevant. However, in this case, taking the diagonal of a matrix multiply turns out to be a rather expensive operation.\r\n      Instead I choose to write it as an element-by-element operation.\r\n   <\/p>\r\n   <p>I have only shown you a few functions that are useful for vectorization in this post. <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/matlab_prog\/vectorization.html\">Here<\/a> is a more extensive list of useful functions for vectorization.\r\n   <\/p>\r\n   <h3>The Code with Preallocation and Vectorization of the Inner Three Loops<a name=\"18\"><\/a><\/h3>\r\n   <p><b>Try 2: Vectorize the Inner Three Loops<\/b><\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\"><span style=\"color: #228B22\">% Initialization<\/span>\r\nnx1 = 10;\r\nnx2 = 10;\r\n\r\n[x1,x2,limsf1,limsf2,expF,gausThresh,small,invSig,detSig,n]= <span style=\"color: #0000FF\">...<\/span>\r\n    initialize(nx1,nx2);\r\n\r\nlimsi1 = limsf1;\r\nlimsi2 = limsf2;\r\n\r\n<span style=\"color: #228B22\">% ndgrid gives a matrix of all the possible combinations<\/span>\r\n[aind,bind,cind] = ndgrid(limsi2,limsf1,limsf2);\r\n[a,b,c] = ndgrid(x2,x1,x2);\r\n\r\nvals = cell(nx1+1,nx2+1);  <span style=\"color: #228B22\">% Cell preallocation<\/span>\r\nsubs = cell(nx1+1,nx2+1);  <span style=\"color: #228B22\">% Cell preallocation<\/span>\r\n\r\n<span style=\"color: #228B22\">% Convert grids to single vector to use in a single loop<\/span>\r\nb = b(:); aind = aind(:); bind = bind(:); cind = cind(:);\r\n\r\nexpac = a(:)-c(:); <span style=\"color: #228B22\">% Calculate x2-x1<\/span>\r\n\r\n<span style=\"color: #228B22\">% Iterate through initial x1 positions (i1)<\/span>\r\n<span style=\"color: #0000FF\">for<\/span> i1 = limsi1\r\n\r\n    exbx1= b-x1(i1);\r\n    expaux = invSig(2)*exbx1.*expac;\r\n    exponent = 0.5*(invSig(1)*exbx1.*exbx1+expaux);\r\n\r\n    index = find(exponent &lt; gausThresh);\r\n    expreduced = exponent(exponent &lt; gausThresh);\r\n\r\n    vals{i1} = 1 \/ (sqrt((2 * pi)^n * detSig))<span style=\"color: #0000FF\">...<\/span>\r\n        .*exp(-expreduced);\r\n\r\n    subs{i1} = [i1*ones(1,length(index));\r\n        aind(index)' ; bind(index)';<span style=\"color: #0000FF\">...<\/span>\r\n        cind(index)']';\r\n\r\n<span style=\"color: #0000FF\">end<\/span>\r\n\r\nvals = cell2mat(vals(:));\r\nsubs = cell2mat(subs(:));\r\n\r\nsmall = ((nx1+1)^2*(nx2+1)^2)-length(subs);<\/pre><p>Now the code runs in even less time.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">displayRunTimes(4)<\/pre><pre style=\"font-style:oblique\">    nx1      nx2      time\r\n    50       50       0.658 seconds\r\n    100      100      8.77 seconds\r\n<\/pre><p>Why don't I try to vectorize all the loops?<\/p>\r\n   <p>I certainly could. In this case, I didn't see much of a benefit with total vectorization, and the code became a bit harder\r\n      to read. So, I am not going to show the fully vectorized code. Considering performance and readability of the code, try 2\r\n      seems to be the best solution.\r\n   <\/p>\r\n   <p>Once the code is vectorized, you can easily see that I repeatedly calculate values that only need to be calculated once. Also,\r\n      assuming some of the vectors keep the same form (such as <tt>sigma<\/tt>), other calculations (e.g., <tt>InvSig<\/tt>) are unnecessary.\r\n   <\/p>\r\n   <p>Some of these changes are quite easy to do, and some are a bit more advanced, but I wanted to finish this entry with the fastest\r\n      version of the code from the application engineering participants.\r\n   <\/p>\r\n   <h3>The Code with Preallocation, Vectorization and Only Calculating Once<a name=\"22\"><\/a><\/h3><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">nx1 =  100; nx2 = 100;\r\nx1l =   0; x1u = 100;\r\nx2l =   0; x2u = 100;\r\n\r\nx1 = linspace(x1l,x1u,nx1+1);\r\nx2 = linspace(x2l,x2u,nx2+1);\r\n\r\nlimsi1 = 1:nx1+1;\r\nlimsi2 = 1:nx2+1;\r\n\r\nlimsf1 = 1:nx1+1;\r\nlimsf2 = 1:nx2+1;\r\n\r\nt = 1;\r\n\r\nsigmax1 = 0.5;\r\nsigmax2 = 1;\r\n\r\nsigma = t * [sigmax1^2 sigmax2^2];\r\ndetSig = sigma(1)*sigma(2);\r\ninvSig = [1\/sigma(1) 1\/sigma(2)];\r\n\r\ngausThresh = 10;\r\nn=3;\r\nconst=1 \/ (sqrt((2 * pi)^n * detSig));\r\n\r\n<span style=\"color: #228B22\">% ndgrid gives a matrix of all the possible combinations<\/span>\r\n<span style=\"color: #228B22\">% of position, except limsi1 which we iterate over<\/span>\r\n\r\n[aind,bind,cind] = ndgrid(limsi2,limsf1,limsf2);\r\n[a,b,c] = ndgrid(x2,x1,x2);\r\n\r\nvals = cell(nx1+1,nx2+1);  <span style=\"color: #228B22\">% Cell preallocation<\/span>\r\nsubs = cell(nx1+1,nx2+1);  <span style=\"color: #228B22\">% Cell preallocation<\/span>\r\n\r\n<span style=\"color: #228B22\">% Convert grids to single vector to<\/span>\r\n<span style=\"color: #228B22\">% use in a single for-loop<\/span>\r\nb = b(:);\r\naind = aind(:);\r\nbind = bind(:);\r\ncind = cind(:);\r\n\r\nexpac= a(:)-c(:);\r\nexpaux = invSig(2)*expac.*expac;\r\n\r\n<span style=\"color: #228B22\">% Iterate through initial x1 positions<\/span>\r\n\r\n<span style=\"color: #0000FF\">for<\/span> i1 = limsi1\r\n\r\n    expbx1= b-x1(i1);\r\n    exponent = 0.5*(invSig(1)*expbx1.*expbx1+expaux);\r\n\r\n    <span style=\"color: #228B22\">% Find indices where exponent &lt; gausThresh<\/span>\r\n    index = find(exponent &lt; gausThresh);\r\n\r\n    <span style=\"color: #228B22\">% Find and keep values where exp &lt; gausThresh<\/span>\r\n\r\n    expreduced = exponent(exponent &lt; gausThresh);\r\n\r\n    vals{i1} = const.*exp(-expreduced);\r\n\r\n    subs{i1} = [i1*ones(1,length(index));\r\n        aind(index)' ; bind(index)';<span style=\"color: #0000FF\">...<\/span>\r\n        cind(index)']';\r\n<span style=\"color: #0000FF\">end<\/span>\r\n\r\nvals = cell2mat(vals(:));\r\nsubs = cell2mat(subs(:));\r\n\r\nsmall = ((nx1+1)^2*(nx2+1)^2)-length(subs);<\/pre><p>With this, here are the magical times.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">displayRunTimes(5)<\/pre><pre style=\"font-style:oblique\">    nx1      nx2      time\r\n    50       50       0.568 seconds\r\n    100      100      8.36 seconds\r\n<\/pre><p>Pretty good speedup. In fact, you see the run-time reduced to about 2% of the original times.<\/p>\r\n   <h3>Additional Resources<a name=\"25\"><\/a><\/h3>\r\n   <p>In this post, I focused on preallocation, vectorization, and only doing calculations a single time. However, there are also\r\n      additional techniques to help you speed up your MATLAB code such as using the profiler to diagnose bottlenecks, using <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/mex.html\">MEX-files<\/a>, using <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/bsxfun.html\"><tt>bsxfun<\/tt><\/a>, and enabling multithreading. The documentation for MATLAB has a very helpful section entitled Performance. I encourage you to go there to read more about these and other relevant topics.\r\n   <\/p>\r\n   <h3>Your Examples<a name=\"26\"><\/a><\/h3>\r\n   <p>Tell me about some of the ways you speed up MATLAB applications or feel free to post a example of a technique I didn't illustrate\r\n      <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=143#respond\">here<\/a>.\r\n   <\/p><script language=\"JavaScript\">\r\n<!--\r\n\r\n    function grabCode_f43173d740924cdc90adb76763a404d9() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='f43173d740924cdc90adb76763a404d9 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' f43173d740924cdc90adb76763a404d9';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        author = 'Loren Shure';\r\n        copyright = 'Copyright 2008 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n      \r\n      d.title = title + ' (MATLAB code)';\r\n      d.close();\r\n      }   \r\n      \r\n-->\r\n<\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_f43173d740924cdc90adb76763a404d9()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n            the MATLAB code \r\n            <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; 7.6<br><\/p>\r\n<\/div>\r\n<!--\r\nf43173d740924cdc90adb76763a404d9 ##### SOURCE BEGIN #####\r\n%% Speeding Up MATLAB Applications\r\n% Today I\u00e2\u20ac\u2122d like to introduce a guest blogger, \r\n% <mailto:sarah.zaranek@mathworks.com Sarah Wait Zaranek>, who is an application engineer here\r\n% at The MathWorks. Today she talks about speeding up code from a customer\r\n% to get acceptable performance.\r\n%% Introduction\r\n% A customer ran into slow performance issues with her code in\r\n% MATLAB. She saw such slow performance in that she decided to\r\n% recode her algorithm in another language. I wanted to show her some\r\n% simple techniques in MATLAB that could bring her code down to a more\r\n% reasonable running time. \r\n%\r\n% I enlisted the help of my fellow application engineers to suggest\r\n% possible changes. Many of them chimed in with some really great\r\n% suggestions. I have integrated most of our thoughts into a few code\r\n% versions to share with you. These versions illustrate our thought\r\n% processes as we worked through the code collaboratively to improve its\r\n% speed. \r\n%\r\n% In this post, I focus on \r\n%\r\n% * preallocation\r\n% * vectorization\r\n% * elimination of repeated calculations\r\n%% What Does This Code Do?\r\n% For security reasons, the user couldn't give me her exact code. So the\r\n% code we will use in this blog serves as an approximation of the code she\r\n% is using. \r\n%\r\n% The code generates locations on a 2D grid with dimensions |nx1| by |nx2|.\r\n% These grid locations serve as values for initial and final positions used\r\n% by the main part of the calculation. The code iterates through all\r\n% possible combinations of these initial and final positions.\r\n%\r\n% Given an initial and final position, the code calculates the quantity \r\n% |exponent| . If |exponent| is less than a particular threshold\r\n% value, |gausThresh|, |exponent| is then used to calculate |out|.\r\n% |subs| contains the indices corresponding to the initial and final\r\n% positions used to calculate |out|.\r\n%\r\n% For some of you, this probably is better understood just by looking at the\r\n% code itself. \r\n%% The Initial Code\r\n\r\n% Initialize the grid and initial and final points\r\nnx1 = 10; nx2 =  10; \r\nx1l =  0; x1u = 100; \r\nx2l =  0; x2u = 100;\r\n\r\nx1 = linspace(x1l,x1u,nx1+1);\r\nx2 = linspace(x2l,x2u,nx2+1);\r\n\r\nlimsf1 = 1:nx1+1; limsf2 = 1:nx2+1;\r\n\r\n% Initalize other variables\r\nt = 1;\r\nsigmax1 = 0.5; sigmax2 = 1;\r\nsigma = t * [sigmax1^2 0; 0 sigmax2^2];\r\ninvSig = inv(sigma);\r\ndetSig = det(sigma);\r\n\r\nexpF = [1 0; 0 1];\r\nn = size (expF, 1);\r\ngausThresh = 10;\r\n\r\nsmall = 0; subs = []; vals = [];\r\n\r\n% Iterate through all possible initial \r\n% and final positions and calculate\r\n% the values of exponent and out\r\n% if exponent > gausThresh.\r\n\r\nfor i1 = 1:nx1+1\r\n    for i2 = 1:nx2+1\r\n        for f1 = limsf1\r\n            for f2 = limsf2\r\n                \r\n                % Initial and final position\r\n                xi = [x1(i1) x2(i2)]'; \r\n                xf = [x1(f1) x2(f2)]'; \r\n                \r\n                exponent = 0.5 * (xf - expF * xi)'...\r\n                    * invSig * (xf - expF * xi);\r\n\r\n                if exponent > gausThresh\r\n                    small = small + 1;\r\n                else\r\n                    out = 1 \/ (sqrt((2 * pi)^n * detSig))...\r\n                        * exp(-exponent);\r\n                    subs = [subs; i1 i2 f1 f2];\r\n                    vals = [vals; out];\r\n                end\r\n\r\n            end\r\n        end\r\n    end\r\nend\r\n\r\n%%\r\n% Here is a graphical representation of the results for |nx1=nx2=100|.\r\n% Because the data is quite dense, I have magnified a portion of the total\r\n% graph.  The red lines connect the initial and final points where\r\n% |exponent < gausThresh| and the thickness of the line reflects the value\r\n% of |vals|. \r\n%\r\nopen('results.fig');\r\n%% \r\n% Here are the runtimes on my computer, a T60 Lenovo dual-core laptop with\r\n% multithreading enabled. \r\n\r\ndisplayRunTimes(1)\r\n\r\n%% Listen to M-Lint \r\n% The first and easiest step is to listen to the suggestions from \r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/mlint.html M-Lint>, \r\n% the static code analyzer that ships with core MATLAB. You can either\r\n% access it via the editor or command line. In this case, I use the command\r\n% line and define a simple function |displayMlint| so that the display is\r\n% compact.\r\n\r\n%%\r\noutput = mlint('initial.m');\r\ndisplayMlint(output);\r\n%%\r\n% Following these suggestions, I know I need to preallocate some of the\r\n% arrays. \r\n%\r\n% Why? \r\n% \r\n% MATLAB needs contiguous blocks of memory for each array. Therefore, by\r\n% repeatedly resizing the array, I keep asking MATLAB to look for a new\r\n% contiguous blocks of memory and move the old array elements into that\r\n% new block. If I preallocate the maximum amount of space that would be\r\n% required by the array, I do not have to suffer the additional cost\r\n% to do this searching and moving. \r\n%\r\n% However, since I am unsure of the final size of |subs| and |vars|, I\r\n% don't know what size I should preallocate. I _could_ make them the\r\n% maximum possible size. However for the case of |nx1 = nx2 = 100|, that\r\n% requires a vector of 100^4 elements, which gets me into trouble!  \r\n%%\r\ntry\r\n    zeros(1,100^4)\r\ncatch ME\r\nend\r\ndisplay(ME.message)\r\n%%\r\n% Therefore, I need to block allocate the variable. \r\n%\r\n% I do this by first preallocating a reasonable size (that doesn't create\r\n% an out of memory error) for |subs| and |vars| and then increase that\r\n% allocation by a reasonable amount, chunk by chunk, if |subs| and |vars|\r\n% go beyond their preallocated sizes. \r\n%\r\n% This way, I only suffer the cost of resizing the array a few times\r\n% instead of every time a new element is added. \r\n%\r\n% I have also introduced a function, |initialize|, to do the variable\r\n% initializations so that you don't have to repeatedly look at the part of\r\n% the code that does not change. \r\n\r\n%% The Code with Preallocation\r\n\r\n% Initialization\r\nnx1 = 10;\r\nnx2 = 10;\r\n\r\n[x1,x2,limsf1,limsf2,expF,gausThresh,small,invSig,detSig,n]= ...\r\n    initialize(nx1,nx2);  \r\n\r\n% Initial guess for preallocation\r\nmm = min((nx1+1)^2*(nx2+1)^2, 10^6);\r\nsubs = zeros(mm,4);\r\nvals = zeros(mm,1);\r\n\r\ncounter = 0;\r\n\r\n% Iterate through all possible initial \r\n% and final positions \r\nfor i1 = 1:nx1+1\r\n    for i2 = 1:nx2+1\r\n        for f1 = limsf1\r\n            for f2 = limsf2\r\n\r\n                xi = [x1(i1) x2(i2)]'; %% Initial position\r\n                xf = [x1(f1) x2(f2)]'; %% Final position\r\n\r\n                exponent = 0.5 * (xf - expF * xi)'...\r\n                    * invSig * (xf - expF * xi);\r\n\r\n                % Increase preallocation if necessary\r\n                if counter == length(vals)\r\n                    subs = [subs; zeros(mm, 4)]; \r\n                    vals = [vals; zeros(mm, 1)]; \r\n                end\r\n                \r\n                if exponent > gausThresh\r\n                    small = small + 1;\r\n                else\r\n                    % Counter introduced\r\n                    counter=counter + 1; \r\n                    out = 1 \/ (sqrt((2 * pi)^n * detSig))...\r\n                        * exp(-exponent);\r\n                    subs(counter,:) = [i1 i2 f1 f2];\r\n                    vals(counter) = out;\r\n                end\r\n\r\n            end\r\n        end\r\n    end\r\nend\r\n\r\n% Remove zero components that came from preallocation\r\nvals = vals(vals > 0); \r\nsubs = subs(vals > 0); \r\n%%\r\n% The preallocation makes for slightly faster code.\r\ndisplayRunTimes(2)\r\n%%\r\n% However, this isn't exactly the speedup I was looking for. So, let's \r\n% try to tackle the code using vectorization.\r\n%\r\n%% Vectorization\r\n% Because MATLAB is a array-based language, code written in terms of matrix\r\n% and array operations is often faster than code which relies on\r\n% |for| loops, especially in the case of nested |for| loops. In vectorizing, \r\n% matrix and vector operations substitute for the equivalent |for| loop \r\n% calculations.\r\n%\r\n% There are several possibilities for vectorizing this code. I \r\n% am going to explore two in this blog entry. \r\n% \r\n% * Vectorize the Inner Two loops\r\n% * Vectorize the Inner Three loops\r\n% \r\n%% The Code with Preallocation and Vectorization of the Inner Two Loops\r\n% *Try 1: Vectorize the Inner Two loops*\r\n\r\n% Initialization\r\n\r\nnx1 = 10;\r\nnx2 = 10;\r\n\r\n[x1,x2,limsf1,limsf2,expF,gausThresh,small,invSig,detSig,n]= ...\r\n    initialize(nx1,nx2);\r\n\r\nvals = cell(nx1+1,nx2+1); % Cell preallocation\r\nsubs = cell(nx1+1,nx2+1); % Cell preallocation\r\n\r\n[xind,yind] = meshgrid(limsf1,limsf2);\r\nxyindices = [xind(:)' ; yind(:)'];\r\n\r\n[x,y] = meshgrid(x1(limsf1),x2(limsf2));\r\nxyfinal = [x(:)' ; y(:)'];\r\n\r\nexptotal = zeros(length(xyfinal),1);\r\n\r\n% Loop over all possible combinations of positions\r\nfor i1 = 1:nx1+1\r\n    for i2 = 1:nx2+1\r\n\r\n        xyinitial = repmat([x1(i1);x2(i2)],1,length(xyfinal));\r\n\r\n        expa = 0.5 * (xyfinal - expF * xyinitial);\r\n        expb = invSig * (xyfinal - expF * xyinitial);\r\n        exptotal(:,1) = expa(1,:).*expb(1,:)+expa(2,:).*expb(2,:);\r\n\r\n        index = find(exptotal < gausThresh);\r\n        expreduced = exptotal(exptotal < gausThresh);\r\n\r\n        out = 1 \/ (sqrt((2 * pi)^n * detSig)) * exp(-(expreduced));\r\n        vals{i1,i2} = out;\r\n        subs{i1,i2} = [i1*ones(1,length(index)) ; ...\r\n            i2*ones(1,length(index)); xyindices(1,index); ...\r\n            xyindices(2,index)]' ;\r\n\r\n    end\r\nend\r\n\r\n% Reshape and convert output so it is in a \r\n% simple matrix format\r\nvals = cell2mat(vals(:));\r\nsubs = cell2mat(subs(:));\r\n\r\nsmall = ((nx1+1)^2*(nx2+1)^2)-length(subs);\r\n\r\n%% \r\n% Wow! With just this little bit of vectorization, the run time decreased\r\n% tremendously.\r\ndisplayRunTimes(3)\r\n%%\r\n% \r\n% Let's go over the main techniques I've illustrated here. \r\n% \r\n% * Use of <https:\/\/www.mathworks.com\/help\/matlab\/ref\/meshgrid.html |meshgrid|>\r\n% and <https:\/\/www.mathworks.com\/help\/matlab\/ref\/repmat.html |repmat|> \r\n% to create matrices\r\n% * Use of vector and matrix operations \r\n% * Use of element-wise operations when appropriate\r\n%\r\n% |meshgrid| produces two vectors of the |x| and |y| positions for all\r\n% possible combinations of the |x| and |y| input vectors. In other words,\r\n% given the vectors |x| and |y|, |meshgrid| gives all possible positions in\r\n% a mesh or grid defined on a side by those |x| and |y| input vectors. The\r\n% n-dimensional version of |meshgrid| is \r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/ndgrid.html |ndgrid|>,\r\n% which I use it in the next example. \r\n%\r\n% The function |repmat| replicates and tiles arrays.\r\n% \r\n% Generally I use matrices as part of the vectorization, so whenever\r\n% possible I use vector and matrix operations. \r\n% \r\n% However, in one segment of the code it is actually faster to explicitly\r\n% write out a vector-vector multiply element by element. This is somewhat \r\n% unintuitive, so let's discuss this a bit more in detail. \r\n%\r\n% To calculate |exponent| in the original code, there is a series\r\n% of vector and matrix multiplies with the original and final positions,\r\n% each represented by two-element vectors.  The values of the initial and\r\n% final position are then changed every step through the loop. \r\n%\r\n% When the code is vectorized, all the initial positions and final\r\n% positions are now contained in two matrices |xyinitial| and |xyfinal|.\r\n% Therefore, to calculate the value of |exponent|, only the diagonal terms\r\n% of the multiplication of |expa| and |expb| are relevant. However, in this\r\n% case, taking the diagonal of a matrix multiply turns out to be a rather\r\n% expensive operation. Instead I choose to write it as an\r\n% element-by-element operation. \r\n%\r\n% I have only shown you a few functions that are useful for vectorization in this post. \r\n% <https:\/\/www.mathworks.com\/access\/helpdesk\/help\/techdoc\/matlab_prog\/f8-784135.html#f8-783589 Here> \r\n% is a more extensive list of useful functions for vectorization.\r\n\r\n%% The Code with Preallocation and Vectorization of the Inner Three Loops\r\n% *Try 2: Vectorize the Inner Three Loops*\r\n\r\n% Initialization\r\nnx1 = 10;\r\nnx2 = 10;\r\n\r\n[x1,x2,limsf1,limsf2,expF,gausThresh,small,invSig,detSig,n]= ...\r\n    initialize(nx1,nx2);\r\n\r\nlimsi1 = limsf1;\r\nlimsi2 = limsf2;\r\n\r\n% ndgrid gives a matrix of all the possible combinations\r\n[aind,bind,cind] = ndgrid(limsi2,limsf1,limsf2);\r\n[a,b,c] = ndgrid(x2,x1,x2);\r\n\r\nvals = cell(nx1+1,nx2+1);  % Cell preallocation\r\nsubs = cell(nx1+1,nx2+1);  % Cell preallocation\r\n\r\n% Convert grids to single vector to use in a single loop\r\nb = b(:); aind = aind(:); bind = bind(:); cind = cind(:);\r\n\r\nexpac = a(:)-c(:); % Calculate x2-x1 \r\n\r\n% Iterate through initial x1 positions (i1)\r\nfor i1 = limsi1\r\n    \r\n    exbx1= b-x1(i1);\r\n    expaux = invSig(2)*exbx1.*expac;\r\n    exponent = 0.5*(invSig(1)*exbx1.*exbx1+expaux);\r\n\r\n    index = find(exponent < gausThresh);\r\n    expreduced = exponent(exponent < gausThresh);\r\n\r\n    vals{i1} = 1 \/ (sqrt((2 * pi)^n * detSig))...\r\n        .*exp(-expreduced);\r\n    \r\n    subs{i1} = [i1*ones(1,length(index));\r\n        aind(index)' ; bind(index)';...\r\n        cind(index)']';\r\n    \r\nend\r\n\r\nvals = cell2mat(vals(:));\r\nsubs = cell2mat(subs(:));\r\n\r\nsmall = ((nx1+1)^2*(nx2+1)^2)-length(subs);\r\n%% \r\n% Now the code runs in even less time.\r\ndisplayRunTimes(4)\r\n%%\r\n%\r\n% Why don't I try to vectorize all the loops?\r\n%\r\n% I certainly could. In this case, I didn't see much of a benefit with\r\n% total vectorization, and the code became a bit harder to read. So, I am\r\n% not going to show the fully vectorized code. Considering performance and\r\n% readability of the code, try 2 seems to be the best solution. \r\n%\r\n%%\r\n% Once the code is vectorized, you can easily see that I repeatedly\r\n% calculate values that only need to be calculated once. Also, assuming\r\n% some of the vectors keep the same form (such as |sigma|), other \r\n% calculations (e.g., |InvSig|) are unnecessary.\r\n% \r\n% Some of these changes are quite easy to do, and some are a bit more \r\n% advanced, but I wanted to finish this entry with the fastest version of\r\n% the code from the application engineering participants.  \r\n%\r\n%% The Code with Preallocation, Vectorization and Only Calculating Once\r\nnx1 =  100; nx2 = 100;\r\nx1l =   0; x1u = 100;\r\nx2l =   0; x2u = 100;\r\n\r\nx1 = linspace(x1l,x1u,nx1+1);\r\nx2 = linspace(x2l,x2u,nx2+1);\r\n\r\nlimsi1 = 1:nx1+1;\r\nlimsi2 = 1:nx2+1;\r\n\r\nlimsf1 = 1:nx1+1;\r\nlimsf2 = 1:nx2+1;\r\n\r\nt = 1;\r\nsigmax1 = 0.5;\r\nsigmax2 = 1;\r\n\r\nsigma = t * [sigmax1^2 sigmax2^2];\r\ndetSig = sigma(1)*sigma(2);\r\ninvSig = [1\/sigma(1) 1\/sigma(2)];\r\n\r\ngausThresh = 10;\r\nn=3;\r\nconst=1 \/ (sqrt((2 * pi)^n * detSig));\r\n\r\n% ndgrid gives a matrix of all the possible combinations\r\n% of position, except limsi1 which we iterate over\r\n\r\n[aind,bind,cind] = ndgrid(limsi2,limsf1,limsf2);\r\n[a,b,c] = ndgrid(x2,x1,x2);\r\n\r\nvals = cell(nx1+1,nx2+1);  % Cell preallocation\r\nsubs = cell(nx1+1,nx2+1);  % Cell preallocation\r\n\r\n% Convert grids to single vector to \r\n% use in a single for-loop\r\nb = b(:);\r\naind = aind(:);\r\nbind = bind(:);\r\ncind = cind(:);\r\n\r\nexpac= a(:)-c(:);\r\nexpaux = invSig(2)*expac.*expac;\r\n\r\n% Iterate through initial x1 positions \r\n\r\nfor i1 = limsi1\r\n    \r\n    expbx1= b-x1(i1); \r\n    exponent = 0.5*(invSig(1)*expbx1.*expbx1+expaux);\r\n    \r\n    % Find indices where exponent < gausThresh\r\n    index = find(exponent < gausThresh);\r\n    \r\n    % Find and keep values where exp < gausThresh\r\n\r\n    expreduced = exponent(exponent < gausThresh);\r\n    \r\n    vals{i1} = const.*exp(-expreduced);\r\n    \r\n    subs{i1} = [i1*ones(1,length(index));\r\n        aind(index)' ; bind(index)';...\r\n        cind(index)']';\r\nend\r\n\r\nvals = cell2mat(vals(:));\r\nsubs = cell2mat(subs(:));\r\n\r\nsmall = ((nx1+1)^2*(nx2+1)^2)-length(subs);\r\n\r\n%%\r\n% With this, here are the magical times. \r\ndisplayRunTimes(5)\r\n%%\r\n% Pretty good speedup. In fact, you see the run-time reduced to about\r\n% 2% of the original times. \r\n%\r\n%% Additional Resources\r\n% In this post, I focused on preallocation, vectorization, and only doing\r\n% calculations a single time. However, there are also additional techniques\r\n% to help you speed up your MATLAB code such as using the \r\n% <https:\/\/www.mathworks.com\/access\/helpdesk\/help\/techdoc\/learn_matlab\/f1-26972.html#f1-30972 profiler> to \r\n% diagnose bottlenecks, using <https:\/\/www.mathworks.com\/help\/matlab\/ref\/mex.html MEX-files>, using\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/bsxfun.html |bsxfun|>, and enabling\r\n% multithreading. The documentation for MATLAB has a very helpful section\r\n% entitled\r\n% <https:\/\/www.mathworks.com\/access\/helpdesk\/help\/techdoc\/matlab_prog\/f8-705906.html Performance.>\r\n% I encourage you to go there to read more about these and other relevant topics. \r\n% \r\n%% Your Examples\r\n% Tell me about some of the ways you speed up MATLAB applications or feel\r\n% free to post a example of a technique I didn't illustrate\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=143#respond here>. \r\n\r\n\r\n\r\n\r\n##### SOURCE END ##### f43173d740924cdc90adb76763a404d9\r\n-->","protected":false},"excerpt":{"rendered":"<p>\r\n   \r\n      Today I&#8217;d like to introduce a guest blogger, Sarah Wait Zaranek, who is an application engineer here at The MathWorks. Today she talks about speeding up code from a customer to get... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2008\/06\/25\/speeding-up-matlab-applications\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[10,12],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/143"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=143"}],"version-history":[{"count":5,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/143\/revisions"}],"predecessor-version":[{"id":2343,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/143\/revisions\/2343"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=143"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=143"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=143"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}