Understanding Array Preallocation

Today I would like to introduce guest blogger Jeremy Greenwald, who works in the Development group here at MathWorks. Jeremy works on the Code Analyzer and will be discussing when preallocating MATLAB arrays is useful and when it should be avoided.

Why Preallocation is Useful

There are numerous resources that discuss preallocation, such as sections of our documentation and articles discussing improvements to MATLAB allocation strategies. While we will quickly review the topic of preallocation here, readers unfamiliar with this topic are encouraged to read some of the provided links.

Imagine we write the following small function to fetch our data from some external source. The function returns the variable data after assigning to it, one element at a time.

function data = fillData
    for idx = 1:100
        data(idx) = fetchData();
    end
end


MATLAB will reallocate memory numerous times while executing this loop. After reallocating memory, MATLAB has to copy the old values to the new memory location. This memory allocation and copying of values can be very expensive in terms of computation time. It also has the effect of increasing peak memory usage, since the old and new copy must both exist for a period of time.

In this example we know that the final size of the variable data is 1-by-100, so we can easily fix the issue by preallocating the variable with the zeros function. In this version of the function, there will only be a single memory allocation and the values of data never have to be copied from one location to another.

function data = fillDataWithPreallocation
    data = zeros(1,100);
    for idx = 1:100
        data(idx) = fetchData();
    end
end


While this may not be an important optimization for small data sizes (such as 1-by-100), it can be a significant improvement if the size of the data is large. For example, in an image processing application, the data may consist of thousands of high-resolution images, each image using hundreds of megabytes of memory. With such applications, correct usage of preallocation can lead to a significant improvement in execution time.
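To get a feel for the difference on your own machine, a rough timing comparison along the following lines can help. This is only a sketch: the size n and the expression idx^2 (a stand-in for fetchData) are assumptions chosen for illustration, and the measured gap will vary with the MATLAB release and hardware, since newer releases handle growing arrays more efficiently.

```matlab
n = 1e5;   % assumed array length, chosen only for illustration

% Version 1: the array grows by one element on every iteration.
tic
grown = [];
for idx = 1:n
    grown(idx) = idx^2;      % idx^2 is a stand-in for fetchData()
end
tGrown = toc;

% Version 2: the array is allocated once, then filled in place.
tic
prealloc = zeros(1, n);
for idx = 1:n
    prealloc(idx) = idx^2;   % same stand-in computation
end
tPrealloc = toc;

fprintf('growing: %.4f s, preallocated: %.4f s\n', tGrown, tPrealloc);
```

Both loops produce the same values, so any difference in the reported times comes from how the memory is obtained. A single tic/toc measurement is noisy, so it is worth running the comparison several times before drawing conclusions.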

The Code Analyzer and the MATLAB Editor

The MATLAB Editor uses a feature called the Code Analyzer to detect certain programming patterns that may not be optimal. The Code Analyzer offers suggestions on how to rewrite these patterns. It then communicates with the Editor to underline such code. If you copy-and-paste the first function above into the MATLAB Editor, the variable data appears underlined in orange. Hovering over the variable with the cursor causes a tooltip to appear with the following message.

The variable 'data' appears to change size on every loop iteration.
Consider preallocating for speed.

The tooltip also contains a button labeled Details. Clicking on that button causes the tooltip box to expand and contain a fuller explanation of the message. Finally, inside the fuller explanation is a link to the section of the MATLAB documentation already mentioned in this post. MATLAB tries to offer a lot of guidance on when and how to preallocate.

There are other code patterns that can also cause the size of a variable to change in such a way that preallocation would help. The Code Analyzer can catch many of these common patterns. The function below contains several examples.

function data = fillLotsOfData
    % all three different variables are growing inside the loop
    % and all three are underlined in the MATLAB Editor
    data2 = [];
    data3 = [];
    for idx = 1:100
        data1(idx) = fetchData();
        data2(end+1) = fetchSomeOtherData();
        data3 = [ data3 fetchYetMoreData() ];
    end

    data = { data1, data2, data3 };
end
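One way the three growing variables above could be preallocated is sketched here. It assumes that each fetch function returns a single scalar double, which the original function does not guarantee, and it replaces the fetch functions with hypothetical anonymous-function stand-ins so that the snippet runs on its own as a script.

```matlab
% Hypothetical stand-ins for the three fetch functions.
% Assumption: each call returns one scalar double.
fetchData          = @() rand();
fetchSomeOtherData = @() rand();
fetchYetMoreData   = @() rand();

n = 100;

% Allocate each array once, at its final size.
data1 = zeros(1, n);
data2 = zeros(1, n);
data3 = zeros(1, n);

for idx = 1:n
    % Indexed assignment fills the existing memory instead of growing it.
    data1(idx) = fetchData();
    data2(idx) = fetchSomeOtherData();
    data3(idx) = fetchYetMoreData();
end

data = { data1, data2, data3 };
```

Note that the end+1 and concatenation forms both become plain indexed assignments once the final size is known up front.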


A Common Misunderstanding

Users have been told so often to preallocate that we sometimes see code where variables are preallocated even when it is unnecessary. This not only complicates code, but can actually cause the very issues that preallocation is meant to alleviate, i.e., runtime performance and peak memory usage. The unnecessary preallocation often looks something like this.

function data = fillDataWithUnnecessaryPreallocation
    % note the Code Analyzer message
    %  The value assigned to variable 'data' might be unused.
    data = zeros(1,100);
    data = fetchAllData();
end


The variable data is first preallocated with the zeros function. Then it is reassigned with the return value of fetchAllData. That second assignment would not have caused the issue preallocation is meant to avoid. The memory allocated by the call to zeros cannot be reused for the data that is returned from fetchAllData. Instead, it is thrown away once the call to fetchAllData successfully returns. This has the effect of requiring twice as much memory as needed, one chunk for the preallocated zeros and one chunk for the return value of fetchAllData.
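A small sketch can make it visible that the preallocated array plays no part in the final result. Here fetchAllData is a hypothetical stand-in that deliberately returns an array of a different size than the one preallocated.

```matlab
% Hypothetical stand-in: returns a complete 1-by-250 array in one call.
fetchAllData = @() ones(1, 250);

data = zeros(1, 100);    % unnecessary: this array is discarded below
data = fetchAllData();   % data now refers to a new 1-by-250 array

disp(size(data));

% The simpler version needs only the single assignment.
dataDirect = fetchAllData();
```

The size of data after the second assignment is determined entirely by what fetchAllData returns, so the call to zeros only adds an extra allocation.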

Note that if you copy-and-paste the above code into the MATLAB Editor, the following Code Analyzer message appears.

The value assigned to variable 'data' might be unused.

This is an indication that the values (and hence the underlying memory) first assigned to data will never be used. The appearance of this message on a line of code that is preallocating a variable is a good sign that the preallocation is unneeded. The Code Analyzer can detect numerous patterns that would benefit from preallocation. So if it does not flag such a pattern but does flag an unused value on the preallocation line, the two observations together indicate a high likelihood that the preallocation is not needed. While the Code Analyzer may occasionally miss code patterns that could benefit from preallocation, it can be relied on to catch the most common ones.
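The same messages can also be collected outside the Editor with the checkcode function, which runs the Code Analyzer on a file. The sketch below writes a small demonstration file to the temporary folder and lists what is reported. The file name unusedPreallocDemo.m and its contents are made up for illustration, and the exact message text and IDs may differ between releases.

```matlab
% Write a small function file so that checkcode has something to analyze.
% The file name and contents are hypothetical, for illustration only.
demoFile = fullfile(tempdir, 'unusedPreallocDemo.m');
fid = fopen(demoFile, 'w');
fprintf(fid, 'function data = unusedPreallocDemo\n');
fprintf(fid, 'data = zeros(1,100);\n');   % the unnecessary preallocation
fprintf(fid, 'data = rand(1,100);\n');    % the assignment that replaces it
fprintf(fid, 'end\n');
fclose(fid);

% Ask for a structure array of messages, including each message ID.
info = checkcode(demoFile, '-id');

for k = 1:numel(info)
    fprintf('line %d [%s]: %s\n', info(k).line, info(k).id, info(k).message);
end

delete(demoFile);   % clean up the temporary file
```

In the releases I am aware of, the unused-value message carries the ID NASGU and the growing-variable message carries the ID AGROW, so checking for those IDs in the returned structure is one way to automate the check described above.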

Conclusions

Preallocating is not free. Therefore you should not preallocate all large variables by default. Instead, you should rely on the Code Analyzer to detect code that might benefit from preallocation. If a preallocation line causes the unused-value message to appear, try removing that line and checking whether the message about the variable changing size appears. If it does not, then the original line likely had the opposite effect from the one you were hoping for.

Did you see the unused-value message? Have you been confused by this message? What could the Code Analyzer have done to make it clearer that there was an issue? Let us know here.

Published with MATLAB® R2012b
