Loren on the Art of MATLAB

Understanding Array Preallocation

Posted by Loren Shure,

Today I would like to introduce guest blogger Jeremy Greenwald who works in the Development group here at MathWorks. Jeremy works on the Code Analyzer and will be discussing when preallocating MATLAB arrays is useful and when it should be avoided.


Why Preallocation is Useful

There are numerous resources that discuss preallocation, such as sections of our documentation and articles discussing improvements to MATLAB allocation strategies. While we will quickly review the topic of preallocation here, readers unfamiliar with this topic are encouraged to read some of the provided links.

Imagine we write the following small function to fetch our data from some external source. The function assigns to the variable data one element at a time and then returns it.

function data = fillData
for idx = 1:100
    data(idx) = fetchData();
end
end

Because data grows by one element on every iteration, MATLAB may have to reallocate memory many times while executing this loop. Each time it reallocates, MATLAB has to copy the old values to the new memory location. This memory allocation and copying of values can be very expensive in terms of computation time. It also increases peak memory usage, since the old and new copies must both exist for a period of time.

In this example we know that the final size of the variable data is 1-by-100, so we can easily fix the issue by preallocating the variable with the zeros function. In this version of the function, there will only be a single memory allocation and the values of data never have to be copied from one location to another.

function data = fillDataWithPreallocation
data = zeros(1,100);
for idx = 1:100
    data(idx) = fetchData();
end
end
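The zeros call above creates a double array. If the loop will store some other kind of value, a constructor matching that type can reserve the space instead. A minimal sketch, assuming the element count n is known before the loop (n = 100 here is only illustrative):

```matlab
% Reserve space for n elements of several element types.
n = 100;
asDouble  = zeros(1, n);           % double, filled with 0
asInt32   = zeros(1, n, 'int32');  % int32, filled with 0
asNaN     = NaN(1, n);             % double, filled with NaN placeholders
asCell    = cell(1, n);            % cell array, each element is []
asLogical = false(1, n);           % logical, filled with false
```

Preallocating with the type the loop will actually store avoids a later conversion of the whole array when the first value of a different type is assigned.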

While this may not be an important optimization for small data sizes (such as 1-by-100), it can be a significant improvement if the size of the data is large. For example, in an image processing application, the data may consist of thousands of high resolution images, each image using hundreds of megabytes of memory. With such applications, correct usage of preallocation can lead to a significant improvement in execution time.
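One way to see the difference on your own machine is to time both versions. This is a rough benchmark, not a rigorous one: it substitutes idx^2 for the hypothetical fetchData, picks an arbitrary n, and the measured ratio will depend on your machine and MATLAB release, since MATLAB's allocation strategies have been improved over time.

```matlab
% Compare growing an array one element at a time with filling a
% preallocated array. n is an arbitrary, illustrative size.
n = 1e5;

tic
grown = [];
for idx = 1:n
    grown(idx) = idx^2;    % array grows by one element per iteration
end
tGrow = toc;

tic
filled = zeros(1, n);      % single allocation before the loop
for idx = 1:n
    filled(idx) = idx^2;
end
tPrealloc = toc;

fprintf('growing: %.4f s, preallocated: %.4f s\n', tGrow, tPrealloc);
```

Running it a few times, and with larger n, gives a better sense of the trend than any single timing.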

The Code Analyzer and the MATLAB Editor

The MATLAB Editor uses a feature called the Code Analyzer to detect certain programming patterns that may not be optimal. The Code Analyzer offers suggestions on how to rewrite these patterns, and the Editor underlines the affected code. If you copy and paste the first function above into the MATLAB Editor, the variable data is underlined in orange. Hovering the cursor over the variable displays a tooltip with the following message.

The variable 'data' appears to change size on every loop iteration.
Consider preallocating for speed.

The tooltip also contains a button labeled Details. Clicking that button expands the tooltip to show a fuller explanation of the message, and that explanation includes a link to the section of the MATLAB documentation already mentioned in this post. MATLAB tries to offer a lot of guidance on when and how to preallocate.

There are other code patterns that can also cause the size of a variable to change in such a way that preallocation would help. The Code Analyzer can catch many of these common patterns. The function below contains several examples.

function data = fillLotsOfData
% all three different variables are growing inside the loop
% and all three are underlined in the MATLAB Editor
data2 = [];
data3 = [];
for idx = 1:100
    data1(idx) = fetchData();
    data2(end+1) = fetchSomeOtherData();
    data3 = [ data3 fetchYetMoreData() ];
end

data = { data1, data2, data3 };
end
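For comparison, here is a sketch of the same loop with all three variables preallocated, written as a script so it can run on its own. The three fetch functions are hypothetical stand-ins that each return a single scalar; the rewrite of data3 is only equivalent to the concatenation if fetchYetMoreData returns exactly one element per call.

```matlab
% Hypothetical stand-ins for the data sources; each returns one scalar.
fetchData          = @() 1;
fetchSomeOtherData = @() 2;
fetchYetMoreData   = @() 3;

n = 100;
data1 = zeros(1, n);
data2 = zeros(1, n);
data3 = zeros(1, n);
for idx = 1:n
    data1(idx) = fetchData();          % index with the loop counter
    data2(idx) = fetchSomeOtherData(); % instead of data2(end+1) = ...
    data3(idx) = fetchYetMoreData();   % instead of data3 = [ data3 ... ]
end
data = { data1, data2, data3 };
```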

A Common Misunderstanding

Users have been told so often to preallocate that we sometimes see code where variables are preallocated even when it is unnecessary. This not only complicates code, but can actually cause the very problems that preallocation is meant to alleviate, namely slow execution and high peak memory usage. The unnecessary preallocation often looks something like this.

function data = fillDataWithUnnecessaryPreallocation
% note the Code Analyzer message
%  The value assigned to variable 'data' might be unused.
data = zeros(1,100);
data = fetchAllData();
end

The variable data is first preallocated with the zeros function. It is then reassigned with the return value of fetchAllData. Because that second assignment replaces data as a whole instead of growing it element by element, it would not have caused the problem that preallocation is meant to avoid. The memory allocated by the call to zeros cannot be reused for the data returned from fetchAllData. Instead, it is thrown away once the call to fetchAllData successfully returns. Until then, twice as much memory as needed is in use: one chunk for the preallocated zeros and one chunk for the return value of fetchAllData.

Note that if you copy and paste the above code into the MATLAB Editor, the following Code Analyzer message appears.

The value assigned to variable 'data' might be unused.

This is an indication that the values (and hence the underlying memory) first assigned to data will never be used. When this message appears on a line that preallocates a variable, it is a good sign that the preallocation is unneeded. The Code Analyzer detects many of the patterns that benefit from preallocation, so if it reports an unused value on a preallocation line and does not report that the variable changes size, the preallocation is very likely unnecessary. The Code Analyzer may occasionally miss a pattern that could benefit from preallocation, but it can be relied on to catch the most common ones.

Conclusions

Preallocating is not free. Therefore you should not preallocate all large variables by default. Instead, rely on the Code Analyzer to detect code that might benefit from preallocation. If a preallocation line triggers the "might be unused" message, try removing that line and checking whether the "appears to change size" message appears. If it does not, the original line most likely had the opposite of the effect you were hoping for.

Did you see the variable unused message? Have you been confused by this message? What could the Code Analyzer have done to make it more clear that there was an issue? Let us know here.



Published with MATLAB® R2012b

6 Comments

(1) I wish the Code Analyzer would suggest I preallocate x for this pattern:

x = '';
for idx=1:100
    x = sprintf('%s\n1',x);
end

I try to remember to rewrite it as:

x = cell(1,100);
for idx=1:100
    x{idx} = sprintf('\n1');
end
x = [x{:}];

but often forget.
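When every piece of the string has the same, known length, the character array itself can be preallocated and each piece written into its slot. A sketch under that assumption, using the two-character piece from the example above:

```matlab
% Preallocate a char row for n fixed-length pieces and fill slot by slot.
n = 100;
piece = sprintf('\n1');          % newline followed by '1' (2 characters)
len = numel(piece);
x = repmat(' ', 1, n*len);       % reserve n*len characters up front
for idx = 1:n
    x((idx-1)*len + (1:len)) = piece;   % write piece idx into its slot
end
```

For the special case where every piece is identical, repmat(piece, 1, n) builds the same result directly without a loop.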

(2) The one I always get frustrated with is when I get a suggestion to preallocate on this code:

x = [];
moreData = true;
while moreData
    [moreData,val] = fetchData();
    x(end+1) = val;
end

I feel like a bad person for not following the suggestion to preallocate x, but I just don’t know how.
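One common answer when the final length is unknown is to grow the array in chunks rather than one element at a time: start with a guessed capacity, double it whenever it fills, and trim the unused tail at the end. This is a general technique, not something the Code Analyzer suggests. The sketch replaces the hypothetical fetchData with a stand-in that yields the values 1 through 37 and then reports that no more data remain.

```matlab
% Stand-in data source: yields the values in 'source', then stops.
source = 1:37;
k = 0;

capacity = 16;                 % arbitrary initial guess
x = zeros(1, capacity);
count = 0;
moreData = true;
while moreData
    k = k + 1;                 % stand-in for [moreData,val] = fetchData();
    val = source(k);
    moreData = k < numel(source);

    count = count + 1;
    if count > capacity
        capacity = 2*capacity; % double the storage when it is full
        x(capacity) = 0;       % extends x with zeros in a single step
    end
    x(count) = val;
end
x = x(1:count);                % discard the unused slots
```

Because the capacity doubles, the number of reallocations grows only logarithmically with the final length instead of once per element.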

With this pattern:

x = [];
for idx=1:100
    if test(idx)
        x(end+1) = idx;
    end
end

I admit I often suppress the warning rather than rewrite it as:

x = zeros(1,100);
for idx=1:100
    if test(idx)
        x(idx) = idx;
    end
end
x = x(0 ~= x);
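A variant that avoids the final filter keeps a count of stored elements and trims once after the loop. It still preallocates for the largest possible size (100 elements). The predicate test below is a hypothetical stand-in for the one in the comment.

```matlab
% Hypothetical predicate: accept multiples of 3.
test = @(k) mod(k, 3) == 0;

x = zeros(1, 100);             % preallocate for the worst case
count = 0;                     % number of elements actually stored
for idx = 1:100
    if test(idx)
        count = count + 1;
        x(count) = idx;
    end
end
x = x(1:count);                % trim once, after the loop
```

Unlike x = x(0 ~= x), this does not rely on zero being impossible as a stored value. If test accepts a vector, as this stand-in does, the loop can be replaced entirely with idx = 1:100; x = idx(test(idx));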

@Bob, thanks for the suggestions. I just did some quick timings on my machine and I think you might be able to get some better results (about 3x) using brackets. Using the code below I see better results with the function withBrackets than the function withCellToString. Your example in 2) is also interesting, thanks for bringing it to our attention.

function testTime

n = 5e4;
tic, withSprint(n), toc
tic, withBrackets(n), toc
tic, withCellToString(n), toc

end

function withSprint(n)

x = '';
for idx=1:n
    x = sprintf('%s\nSome characters here and there',x);
end
size(x)

end

function withBrackets(n)

x = '';
nl = sprintf('\n');
for idx=1:n
    x = [ x nl 'Some characters here and there' ];
end

end

function withCellToString(n)

x = cell(1,n);
for idx=1:n
    x{idx} = sprintf('\nSome characters here and there');
end
x = [x{:}];

end

Hi,

I have a question about pre-allocation. What is the best procedure if you want the user to interactively select points and you don’t know in advance how many points they will select (and maybe the user doesn’t know either, for example they decide on the fly as they look at an image, so you can not ask for the number of points as an input parameter)?

For example:

but = 1;
n = 0;
while but == 1
    [xp,yp,but] = ginput(1);
    plot(xp,yp,'go')
    n = n+1;
    xydata(n,:) = [xp,yp];
    hold on
end

where the output is the matrix xydata.

Thanks!

@Karleen, in the context of your example I think it would be best not to worry about preallocation, for two reasons. First, since the code stores points that the user selects interactively, it seems very unlikely that the variable xydata will grow very large, and for small variables preallocation doesn't help very much.
Second, it seems very unlikely that the small amount of time spent managing the underlying memory of xydata will exceed the time spent waiting for user input (i.e., ginput).

I have an added reason to pre-allocate: with complicated for- or while-loops, I might accidentally skip some values. By pre-allocating with NaNs (or some other value which is obviously different from the real results), this becomes a lot more obvious. If I neglect to pre-allocate, Matlab automatically inserts zeros, which don’t stand out so much.

For instance, compare:

input = 1:5;
clear result;
% result = repmat(NaN,size(input));

for ind = 1:2:length(input)
    result(ind) = input(ind).^2;
end
result

with:

input = 1:5;
clear result;
result = repmat(NaN,size(input));

for ind = 1:2:length(input)
    result(ind) = input(ind).^2;
end
result

Also, when using 2D input with code which assumed vector input:

input = magic(3);
clear result;
% result = repmat(NaN,size(input));

for ind = 1:length(input)
    result(ind) = input(ind).^2;
end
result

(Of course, in this case, both results are wrong.)
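The NaN placeholders described in the comment above can also be checked programmatically after the loop. A short sketch reusing the first example's loop, with the variable renamed to vals so it does not shadow the built-in input function; it assumes NaN can never be a legitimate result:

```matlab
% Preallocate with NaN so that unassigned elements stand out.
vals = 1:5;
result = NaN(size(vals));
for ind = 1:2:length(vals)
    result(ind) = vals(ind).^2;   % only the odd indices are assigned
end

% Indices still holding NaN were skipped by the loop.
skipped = find(isnan(result));
disp(skipped)
```

Here skipped is [2 4], pointing directly at the elements the loop never touched.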

These postings are the author's and don't necessarily represent the opinions of MathWorks.