{"id":688,"date":"2013-05-04T09:47:31","date_gmt":"2013-05-04T14:47:31","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=688"},"modified":"2013-05-04T09:47:31","modified_gmt":"2013-05-04T14:47:31","slug":"recent-question-about-speed-with-subarray-calculations","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2013\/05\/04\/recent-question-about-speed-with-subarray-calculations\/","title":{"rendered":"Recent Question about Speed with Subarray Calculations"},"content":{"rendered":"\r\n<div class=\"content\"><!--introduction--><p>Recently someone asked me to explain the speed behavior doing a calculation using a loop and array indexing vs. getting the subarray first.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#993880b3-5843-4778-bc94-32d83ac85cac\">Example<\/a><\/li><li><a href=\"#6af5b04d-a4a2-4d87-87e3-80b5f5e558e1\">First Method<\/a><\/li><li><a href=\"#268c5a14-73fe-439d-8a70-9c107435afbe\">Second Method<\/a><\/li><li><a href=\"#9ccb7a79-b332-4e32-abe0-81a3ed286deb\">Same Results?<\/a><\/li><li><a href=\"#6ae5ef22-d2db-4d73-9f37-b3d80d85f82d\">Compare Runtime<\/a><\/li><li><a href=\"#aea57373-3b4c-4692-ad92-ac8c6dfe920c\">What's Happening?<\/a><\/li><li><a href=\"#b2f25522-87bc-4cc4-a85d-00bc6f60d4fc\">Your Results?<\/a><\/li><\/ul><\/div><h4>Example<a name=\"993880b3-5843-4778-bc94-32d83ac85cac\"><\/a><\/h4><p>Suppose I have a function of two inputs, the first input being the column (of a square array), the second, a scalar, and the output, a vector.<\/p><pre class=\"codeinput\">myfun = @(x,z) x'*x+z;\r\n<\/pre><p>And even though this may be calculated in a fully vectorized manner, let's explore what happens when we work on subarrays from the array input.<\/p><p>I am now creating the input array <tt>x<\/tt> and the results output arrays for doing the calculation two ways, with an additional intermediate step in one of the methods.<\/p><pre class=\"codeinput\">n = 500;\r\nx = randn(n,n);\r\nresult1 = zeros(n,1);\r\nresult2 = zeros(n,1);\r\n<\/pre><h4>First Method<a name=\"6af5b04d-a4a2-4d87-87e3-80b5f5e558e1\"><\/a><\/h4><p>Here we see and time the first method.  In this one, we create a temporary array for <tt>x(:,k)<\/tt> <tt>n<\/tt> times through the outer loop.<\/p><pre class=\"codeinput\">tic\r\n<span class=\"keyword\">for<\/span> k = 1:n\r\n    <span class=\"keyword\">for<\/span> z = 1:n\r\n        result1(z) = myfun(x(:,k), z);\r\n    <span class=\"keyword\">end<\/span>\r\n    result1 = result1+x(:,k);\r\n<span class=\"keyword\">end<\/span>\r\nruntime(1) = toc;\r\n<\/pre><h4>Second Method<a name=\"268c5a14-73fe-439d-8a70-9c107435afbe\"><\/a><\/h4><p>In this method, we extract the column of interest first in the outer loop, and reuse that temporary array each time through the inner loop. Again we see and time the results.<\/p><pre class=\"codeinput\">tic\r\n<span class=\"keyword\">for<\/span> k = 1:n\r\n    xt = x(:,k);\r\n    <span class=\"keyword\">for<\/span> z = 1:n\r\n        result2(z) = myfun(xt, z);\r\n    <span class=\"keyword\">end<\/span>\r\n    result2 = result2+x(:,k);\r\n<span class=\"keyword\">end<\/span>\r\nruntime(2) = toc;\r\n<\/pre><h4>Same Results?<a name=\"9ccb7a79-b332-4e32-abe0-81a3ed286deb\"><\/a><\/h4><p>First, let's make sure we get the same answer both ways.  You can see that we do.<\/p><pre class=\"codeinput\">theSame = isequal(result1,result2)\r\n<\/pre><pre class=\"codeoutput\">theSame =\r\n     1\r\n<\/pre><h4>Compare Runtime<a name=\"6ae5ef22-d2db-4d73-9f37-b3d80d85f82d\"><\/a><\/h4><p>Next, let's compare the times.  I want to remind you that doing timing from a script generally has more overhead than when the same code is run inside a function.  We just want to see the relative behavior so we should get some insight from this exercise.<\/p><pre class=\"codeinput\">disp([<span class=\"string\">'Run times are: '<\/span>,num2str(runtime)])\r\n<\/pre><pre class=\"codeoutput\">Run times are: 2.3936      1.9558\r\n<\/pre><h4>What's Happening?<a name=\"aea57373-3b4c-4692-ad92-ac8c6dfe920c\"><\/a><\/h4><p>Here's what's going on.  In the first method, we create a temporary variable <tt>n<\/tt> times through the outer loop, even though that array is a constant for a fixed column.  In the second method, we extract the relevant column once, and reuse it <tt>n<\/tt> times through the inner loop.<\/p><p>Be thoughtful if you do play around with this.  Depending on the details of your function, if the calculations you do each time are large compared to the time to extract a column vector, you may not see much difference between the two methods.  However, if the calculations are sufficiently short in duration, then the repeated creation of the temporary variable could add a tremendous amount of overhead to the calculation.  In general, you should not be worse off always capturing the temporary array the fewest number of times possible.<\/p><h4>Your Results?<a name=\"b2f25522-87bc-4cc4-a85d-00bc6f60d4fc\"><\/a><\/h4><p>Have you noticed similar timing \"puzzles\" when analyzing one of your algorithms?  I'd love to hear more <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=688#respond\">here<\/a>.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_97656ef0fbe04d9386fa579d56ff2ea6() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='97656ef0fbe04d9386fa579d56ff2ea6 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 97656ef0fbe04d9386fa579d56ff2ea6';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2013 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_97656ef0fbe04d9386fa579d56ff2ea6()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2013a<br><\/p><p class=\"footer\"><br>\r\n      Published with MATLAB&reg; R2013a<br><\/p><\/div><!--\r\n97656ef0fbe04d9386fa579d56ff2ea6 ##### SOURCE BEGIN #####\r\n%% Recent Question about Speed with Subarray Calculations\r\n% Recently someone asked me to explain the speed behavior doing a\r\n% calculation using a loop and array indexing vs. getting the subarray\r\n% first.\r\n%% Example\r\n% Suppose I have a function of two inputs, the first input being the column\r\n% (of a square array), the second, a scalar, and the output, a vector. \r\nmyfun = @(x,z) x'*x+z;\r\n\r\n%%\r\n% And even though this may be calculated in a fully vectorized manner,\r\n% let's explore what happens when we work on subarrays from the array\r\n% input.\r\n%%\r\n% I am now creating the input array |x| and the results output arrays for\r\n% doing the calculation two ways, with an additional intermediate step in\r\n% one of the methods.\r\nn = 500;\r\nx = randn(n,n);\r\nresult1 = zeros(n,1);\r\nresult2 = zeros(n,1);\r\n%% First Method\r\n% Here we see and time the first method.  In this one, we create a\r\n% temporary array for |x(:,k)| |n| times through the outer loop.\r\ntic\r\nfor k = 1:n\r\n    for z = 1:n\r\n        result1(z) = myfun(x(:,k), z);\r\n    end\r\n    result1 = result1+x(:,k);\r\nend\r\nruntime(1) = toc;\r\n\r\n%% Second Method\r\n% In this method, we extract the column of interest first in the outer\r\n% loop, and reuse that temporary array each time through the inner loop.\r\n% Again we see and time the results.\r\ntic\r\nfor k = 1:n\r\n    xt = x(:,k);\r\n    for z = 1:n\r\n        result2(z) = myfun(xt, z);\r\n    end\r\n    result2 = result2+x(:,k);\r\nend\r\nruntime(2) = toc;\r\n%% Same Results?\r\n% First, let's make sure we get the same answer both ways.  You can see\r\n% that we do.\r\ntheSame = isequal(result1,result2)\r\n\r\n%% Compare Runtime\r\n% Next, let's compare the times.  I want to remind you that doing timing\r\n% from a script generally has more overhead than when the same code is run\r\n% inside a function.  We just want to see the relative behavior so we\r\n% should get some insight from this exercise.\r\ndisp(['Run times are: ',num2str(runtime)])\r\n%% What's Happening?\r\n% Here's what's going on.  In the first method, we create a temporary\r\n% variable |n| times through the outer loop, even though that array is a\r\n% constant for a fixed column.  In the second method, we extract the\r\n% relevant column once, and reuse it |n| times through the inner loop.\r\n%\r\n% Be thoughtful if you do play around with this.  Depending on the details\r\n% of your function, if the calculations you do each time are large compared\r\n% to the time to extract a column vector, you may not see much difference\r\n% between the two methods.  However, if the calculations are sufficiently\r\n% short in duration, then the repeated creation of the temporary variable\r\n% could add a tremendous amount of overhead to the calculation.  In\r\n% general, you should not be worse off always capturing the temporary array\r\n% the fewest number of times possible.\r\n%% Your Results?\r\n% Have you noticed similar timing \"puzzles\" when analyzing one of your\r\n% algorithms?  I'd love to hear more\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=688#respond here>.\r\n\r\n##### SOURCE END ##### 97656ef0fbe04d9386fa579d56ff2ea6\r\n-->","protected":false},"excerpt":{"rendered":"<!--introduction--><p>Recently someone asked me to explain the speed behavior doing a calculation using a loop and array indexing vs. getting the subarray first.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2013\/05\/04\/recent-question-about-speed-with-subarray-calculations\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[16,10,29],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/688"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=688"}],"version-history":[{"count":4,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/688\/revisions"}],"predecessor-version":[{"id":692,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/688\/revisions\/692"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=688"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=688"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=688"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}