{"id":582,"date":"2012-11-29T14:50:16","date_gmt":"2012-11-29T19:50:16","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=582"},"modified":"2012-11-15T14:58:36","modified_gmt":"2012-11-15T19:58:36","slug":"understanding-array-preallocation","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2012\/11\/29\/understanding-array-preallocation\/","title":{"rendered":"Understanding Array Preallocation"},"content":{"rendered":"\r\n<!DOCTYPE html\r\n  PUBLIC \"-\/\/W3C\/\/DTD HTML 4.01 Transitional\/\/EN\">\r\n<style type=\"text\/css\">\r\n\r\nh1 { font-size:18pt; }\r\nh2.titlebg { font-size:13pt; }\r\nh3 { color:#4A4F55; padding:0px; margin:5px 0px 5px; font-family:Arial, Helvetica, sans-serif; font-size:11pt; font-weight:bold; line-height:140%; border-bottom:1px solid #d6d4d4; display:block; }\r\nh4 { color:#4A4F55; padding:0px; margin:0px 0px 5px; font-family:Arial, Helvetica, sans-serif; font-size:10pt; font-weight:bold; line-height:140%; border-bottom:1px solid #d6d4d4; display:block; }\r\n   \r\np { padding:0px; margin:0px 0px 20px; }\r\nimg { padding:0px; margin:0px 0px 20px; border:none; }\r\np img, pre img, tt img, li img { margin-bottom:0px; } \r\n\r\nul { padding:0px; margin:0px 0px 20px 23px; list-style:square; }\r\nul li { padding:0px; margin:0px 0px 7px 0px; background:none; }\r\nul li ul { padding:5px 0px 0px; margin:0px 0px 7px 23px; }\r\nul li ol li { list-style:decimal; }\r\nol { padding:0px; margin:0px 0px 20px 0px; list-style:decimal; }\r\nol li { padding:0px; margin:0px 0px 7px 23px; list-style-type:decimal; }\r\nol li ol { padding:5px 0px 0px; margin:0px 0px 7px 0px; }\r\nol li ol li { list-style-type:lower-alpha; }\r\nol li ul { padding-top:7px; }\r\nol li ul li { list-style:square; }\r\n\r\npre, tt, code { font-size:12px; }\r\npre { margin:0px 0px 20px; }\r\npre.error { color:red; }\r\npre.codeinput { padding:10px; border:1px solid #d3d3d3; background:#f7f7f7; }\r\npre.codeoutput { padding:10px 11px; margin:0px 
0px 20px; color:#4c4c4c; }\r\n\r\n@media print { pre.codeinput, pre.codeoutput { word-wrap:break-word; width:100%; } }\r\n\r\nspan.keyword { color:#0000FF }\r\nspan.comment { color:#228B22 }\r\nspan.string { color:#A020F0 }\r\nspan.untermstring { color:#B20000 }\r\nspan.syscmd { color:#B28C00 }\r\n\r\n.footer { width:auto; padding:10px 0px; margin:25px 0px 0px; border-top:1px dotted #878787; font-size:0.8em; line-height:140%; font-style:italic; color:#878787; text-align:left; float:none; }\r\n.footer p { margin:0px; }\r\n\r\n  <\/style><div class=\"content\"><!--introduction--><p>Today I would like to introduce guest blogger Jeremy Greenwald who works in the Development group here at MathWorks. Jeremy works on the Code Analyzer and will be discussing when preallocating MATLAB arrays is useful and when it should be avoided.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#a34d36ee-9644-4d7d-b0fd-b72969180fba\">Why Preallocation is Useful<\/a><\/li><li><a href=\"#e3eabc21-ef41-4f07-93fd-c45e284cac29\">The Code Analyzer and the MATLAB Editor<\/a><\/li><li><a href=\"#006dbd69-ef57-49e3-9cac-b223686de0cc\">A Common Misunderstanding<\/a><\/li><li><a href=\"#8f7275da-a697-4697-800e-8d8b5d6bb148\">Conclusions<\/a><\/li><\/ul><\/div><h4>Why Preallocation is Useful<a name=\"a34d36ee-9644-4d7d-b0fd-b72969180fba\"><\/a><\/h4><p>There are numerous resources that discuss preallocation, such as sections of our <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/matlab_prog\/techniques-for-improving-performance.html#f8-793781\">documentation<\/a> and articles discussing improvements to <a href=\"https:\/\/blogs.mathworks.com\/steve\/2011\/05\/16\/automatic-array-growth-gets-a-lot-faster-in-r2011a\/\">MATLAB allocation strategies<\/a>. 
While we will quickly review the topic of preallocation here, readers unfamiliar with this topic are encouraged to read some of the provided links.<\/p><p>Imagine we write the following small function to fetch our data from some external source. The function assigns to the variable <tt>data<\/tt> one element at a time and then returns it.<\/p><pre class=\"codeinput\"><span class=\"keyword\">function<\/span> data = fillData\r\n<span class=\"keyword\">for<\/span> idx = 1:100\r\n    data(idx) = fetchData();\r\n<span class=\"keyword\">end<\/span>\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><p>MATLAB will reallocate memory numerous times while executing this loop. After reallocating memory, MATLAB has to copy the old values to the new memory location. This memory allocation and copying of values can be very expensive in terms of computation time.  It also has the effect of increasing peak memory usage, since the old and new copies must both exist for a period of time.<\/p><p>In this example we know that the final size of the variable <tt>data<\/tt> is 1-by-100, so we can easily fix the issue by preallocating the variable with the <tt>zeros<\/tt> function. In this version of the function, there will only be a single memory allocation and the values of <tt>data<\/tt> never have to be copied from one location to another.<\/p><pre class=\"codeinput\"><span class=\"keyword\">function<\/span> data = fillDataWithPreallocation\r\ndata = zeros(1,100);\r\n<span class=\"keyword\">for<\/span> idx = 1:100\r\n    data(idx) = fetchData();\r\n<span class=\"keyword\">end<\/span>\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><p>While this may not be an important optimization for small data sizes (such as 1-by-100), it can be a significant improvement if the size of the data is large. For example, in an image processing application, the data may consist of thousands of high-resolution images, each image using hundreds of megabytes of memory. 
With such applications, correct usage of preallocation can lead to a significant improvement in execution time.<\/p><h4>The Code Analyzer and the MATLAB Editor<a name=\"e3eabc21-ef41-4f07-93fd-c45e284cac29\"><\/a><\/h4><p>The MATLAB Editor uses a feature called the <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/matlab_prog\/check-code-for-errors-and-warnings.html\">Code Analyzer<\/a> to detect certain programming patterns that may not be optimal. The Code Analyzer offers suggestions on how to rewrite these patterns. It then communicates with the Editor to underline such code. If you copy and paste the first function above into the MATLAB Editor, the variable <tt>data<\/tt> appears underlined in orange. Hovering over the variable with the cursor causes a tooltip to appear with the following message.<\/p><pre>The variable 'data' appears to change size on every loop iteration.\r\nConsider preallocating for speed.<\/pre><p>The tooltip also contains a button labeled <b>Details<\/b>. Clicking on that button causes the tooltip box to expand and contain a fuller explanation of the message. The fuller explanation also includes a link to the section of the MATLAB documentation already mentioned in this post. MATLAB tries to offer a lot of guidance on when and how to preallocate. For the first function shown above, the message appears as follows.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2012\/CodeAnalyzerMessage.png\" alt=\"\"> <\/p><p>There are other code patterns that can also cause the size of a variable to change in such a way that preallocation would help. The Code Analyzer can catch many of these common patterns. 
The function below contains several examples.<\/p><pre class=\"codeinput\"><span class=\"keyword\">function<\/span> data = fillLotsOfData\r\n<span class=\"comment\">% all three variables grow inside the loop<\/span>\r\n<span class=\"comment\">% and all three are underlined in the MATLAB Editor<\/span>\r\ndata2 = [];\r\ndata3 = [];\r\n<span class=\"keyword\">for<\/span> idx = 1:100\r\n    data1(idx) = fetchData();\r\n    data2(end+1) = fetchSomeOtherData();\r\n    data3 = [ data3 fetchYetMoreData() ];\r\n<span class=\"keyword\">end<\/span>\r\n\r\ndata = { data1, data2, data3 };\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><h4>A Common Misunderstanding<a name=\"006dbd69-ef57-49e3-9cac-b223686de0cc\"><\/a><\/h4><p>Users have been told so often to preallocate that we sometimes see code where variables are preallocated even when it is unnecessary. This not only complicates code, but can actually cause the very issues that preallocation is meant to alleviate, namely longer run times and higher peak memory usage. The unnecessary preallocation often looks something like this.<\/p><pre class=\"codeinput\"><span class=\"keyword\">function<\/span> data = fillDataWithUnnecessaryPreallocation\r\n<span class=\"comment\">% note the Code Analyzer message<\/span>\r\n<span class=\"comment\">%  The value assigned to variable 'data' might be unused.<\/span>\r\ndata = zeros(1,100);\r\ndata = fetchAllData();\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><p>The variable <tt>data<\/tt> is first preallocated with the <tt>zeros<\/tt> function. Then it is reassigned with the return value of <tt>fetchAllData<\/tt>. That second assignment would <b>not<\/b> have caused the issue preallocation is meant to avoid. The memory allocated by the call to <tt>zeros<\/tt> cannot be reused for the data that is returned from <tt>fetchAllData<\/tt>. Instead, it is thrown away once the call to <tt>fetchAllData<\/tt> successfully returns.  
This has the effect of requiring twice as much memory as needed: one chunk for the preallocated zeros and one chunk for the return value of <tt>fetchAllData<\/tt>.<\/p><p>Note that if you copy and paste the above code into the MATLAB Editor, the following Code Analyzer message appears.<\/p><pre>The value assigned to variable 'data' might be unused.<\/pre><p>This is an indication that the values (and hence the underlying memory) first assigned to <tt>data<\/tt> will never be used. The appearance of this message on a line of code that is preallocating a variable is a good sign that the preallocation is unneeded. Because the Code Analyzer can detect numerous patterns that would benefit from preallocation, the absence of a changing-size message combined with the presence of the unused-value message indicates a high likelihood that the preallocation is not needed.  While the Code Analyzer may occasionally miss code patterns that could benefit from preallocation, it can be relied on to catch the most common such patterns.<\/p><h4>Conclusions<a name=\"8f7275da-a697-4697-800e-8d8b5d6bb148\"><\/a><\/h4><p>Preallocating is not free. Therefore you should not preallocate all large variables by default. Instead, you should rely on the Code Analyzer to detect code that might benefit from preallocation. If a preallocation line causes the unused-value message to appear, try removing that line and checking whether the changing-size message appears. If it does not appear, then the original line likely had the opposite of the effect you were hoping for.<\/p><p>Did you see the unused-value message? Have you been confused by this message? What could the Code Analyzer have done to make it clearer that there was an issue?  
Let us know <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=582#respond\">here<\/a>.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_26dabee14d784ca0bdd85ed96c36d63b() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='26dabee14d784ca0bdd85ed96c36d63b ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 26dabee14d784ca0bdd85ed96c36d63b';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2012 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_26dabee14d784ca0bdd85ed96c36d63b()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2012b<br><\/p><p class=\"footer\"><br>\r\n      Published with MATLAB&reg; R2012b<br><\/p><\/div><!--\r\n26dabee14d784ca0bdd85ed96c36d63b ##### SOURCE BEGIN #####\r\n%% Understanding Array Preallocation\r\n% Today I would like to introduce guest blogger Jeremy Greenwald who works\r\n% in the Development group here at MathWorks. Jeremy works on the Code\r\n% Analyzer and will be discussing when preallocating MATLAB arrays is\r\n% useful and when it should be avoided.\r\n\r\n%% Why Preallocation is Useful\r\n% There are numerous resources that discuss preallocation, such as sections\r\n% of our\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/matlab_prog\/techniques-for-improving-performance.html#f8-793781\r\n% documentation> and articles discussing improvements to\r\n% <https:\/\/blogs.mathworks.com\/steve\/2011\/05\/16\/automatic-array-growth-gets-a-lot-faster-in-r2011a\/\r\n% MATLAB allocation strategies>. 
While we will quickly review the topic of\r\n% preallocation here, readers unfamiliar with this topic are encouraged to\r\n% read some of the provided links.\r\n%\r\n% Imagine we write the following small function to fetch our data from some\r\n% external source. The function returns the variable |data| after assigning\r\n% to it, one element at a time. \r\nfunction data = fillData\r\nfor idx = 1:100\r\n    data(idx) = fetchData();\r\nend\r\nend\r\n\r\n%%\r\n% MATLAB will reallocate memory numerous times while executing this loop.\r\n% After reallocating memory, MATLAB has to copy the old values to the new\r\n% memory location. This memory allocation and copying of values can be very\r\n% expensive in terms of computation time.  It also has the effect of\r\n% increasing peak memory usage, since the old and new copy must both exist\r\n% for a period of time.\r\n%\r\n% In this example we know that the final size of the variable |data| is\r\n% 1-by-100, so we can easily fix the issue by preallocating the variable\r\n% with the |zeros| function. In this version of the function, there will\r\n% only be a single memory allocation and the values of data never have to\r\n% be copied from one location to another.\r\nfunction data = fillDataWithPreallocation\r\ndata = zeros(1,100);\r\nfor idx = 1:100\r\n    data(idx) = fetchData();\r\nend\r\nend\r\n\r\n%%\r\n% While this may not be an important optimization for small data sizes\r\n% (such as 1-by-100), it can be a significant improvement if the size of\r\n% the data is large. For example, in an image processing application, the\r\n% data may consist of thousands of high resolution images, each image using\r\n% hundreds of megabytes of memory. 
With such applications, correct usage of\r\n% preallocation can lead to a significant improvement in execution time.\r\n\r\n%% The Code Analyzer and the MATLAB Editor\r\n% The MATLAB Editor uses a feature called the\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/matlab_prog\/check-code-for-errors-and-warnings.html\r\n% Code Analyzer> to detect certain programming patterns that may not be\r\n% optimal. The Code Analyzer offers suggestions on how to rewrite these\r\n% patterns. It then communicates with the Editor to underline such code. If\r\n% you copy-and-paste the first function above into the MATLAB Editor, the\r\n% variable |data| appears underlined in orange. Hovering over the variable\r\n% with the cursor causes a tooltip to appear with the following message.\r\n%\r\n%  The variable 'data' appears to change size on every loop iteration. \r\n%  Consider preallocating for speed.\r\n%\r\n% The tooltip also contains a button labeled *Details*. Clicking on that\r\n% button causes the tooltip box to expand and contain a fuller explanation\r\n% of the message. Finally, inside the fuller explanation is a link to the\r\n% section of the MATLAB documentation already mentioned in this post.\r\n% MATLAB tries to offer a lot of guidance on when and how to preallocate.\r\n% For the first function shown above\r\n% \r\n% <<CodeAnalyzerMessage.png>>\r\n% \r\n\r\n%%\r\n% There are other code patterns that can also cause the size of a\r\n% variable to change in such a way that preallocation would help. The Code\r\n% Analyzer can catch many of these common patterns. 
The function below\r\n% contains several examples.\r\n\r\nfunction data = fillLotsOfData\r\n% all three different variables are growing inside the loop\r\n% and all three are underlined in the MATLAB Editor\r\ndata2 = [];\r\ndata3 = [];\r\nfor idx = 1:100\r\n    data1(idx) = fetchData();\r\n    data2(end+1) = fetchSomeOtherData();\r\n    data3 = [ data3 fetchYetMoreData() ];\r\nend\r\n\r\ndata = { data1, data2, data3 };\r\nend\r\n\r\n%% A Common Misunderstanding\r\n% Users have been told so often to preallocate that we sometimes see code\r\n% where variables are preallocated even when it is unnecessary. This not\r\n% only complicates code, but can actually cause the very issues that\r\n% preallocation is meant to alleviate, i.e., runtime performance and peak\r\n% memory usage. The unnecessary preallocation often looks something like\r\n% this.\r\n\r\nfunction data = fillDataWithUnecessaryPreallocation\r\n% note the Code Analyzer message\r\n%  The value assigned to variable 'data' might be unused.\r\ndata = zeros(1,100); \r\ndata = fetchAllData();\r\nend\r\n\r\n%% \r\n% The variable |data| is first preallocated with the |zeros| function. Then\r\n% it is reassigned with the return value of |fetchAllData|. That second\r\n% assignment would *not* have caused the issue preallocation is meant to\r\n% avoid. The memory allocated by the call to |zeros| cannot be reused for\r\n% the data that is returned from |fetchAllData|. Instead, it is thrown away\r\n% once the call to |fetchAllData| successfully returns.  This has the\r\n% effect of requiring twice as much memory as needed, one chunk for the\r\n% preallocated zeros and one chunk for the return value of |fetchAllData|.\r\n%\r\n% Note that if you copy-and-paste the above code into the MATLAB Editor,\r\n% the following Code Analyzer message appears. 
\r\n% \r\n%  The value assigned to variable 'data' might be unused.\r\n% \r\n% This is an indication that the values (and hence the underlying memory)\r\n% first assigned to |data| will never be used. The appearance of this\r\n% message on a line of code that is preallocating a variable is a good sign\r\n% that the preallocation is unneeded. Since the Code Analyzer can detect\r\n% numerous patterns that would benefit from preallocation, if the Code\r\n% Analyzer does not detect such a pattern and it detects an unused\r\n% variable, together these indicate a high likelihood that the\r\n% preallocation is not needed.  While the Code Analyzer may occasionally\r\n% miss code patterns that could benefit from preallocation, it can be\r\n% relied on to catch the most common such patterns.\r\n\r\n%% Conclusions\r\n% Preallocating is not free. Therefore you should not preallocate all large\r\n% variables by default. Instead, you should rely on the Code Analyzer to\r\n% detect code that might benefit from preallocation. If a preallocation\r\n% line causes the unused message to appear, try removing that line and\r\n% seeing if the variable changing size message appears. If this message\r\n% does not appear, then the original line likely had the opposite effect\r\n% you were hoping for. \r\n%\r\n% Did you see the variable unused message? Have you been confused by this\r\n% message? What could the Code Analyzer have done to make it more clear\r\n% that there was an issue?  Let us know\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=582#respond here>.\r\n\r\n##### SOURCE END ##### 26dabee14d784ca0bdd85ed96c36d63b\r\n-->","protected":false},"excerpt":{"rendered":"<!--introduction--><p>Today I would like to introduce guest blogger Jeremy Greenwald who works in the Development group here at MathWorks. Jeremy works on the Code Analyzer and will be discussing when preallocating MATLAB arrays is useful and when it should be avoided.... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2012\/11\/29\/understanding-array-preallocation\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[16,14,39],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/582"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=582"}],"version-history":[{"count":5,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/582\/revisions"}],"predecessor-version":[{"id":586,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/582\/revisions\/586"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=582"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=582"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=582"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}