Once and for All
I was talking to my long-time colleague, Mike Croucher, who joined MathWorks team recently (yay!). About a bunch of interesting topics, some of which could be good fodder for a blog post. Today I want to talk about for-loops.
Table of Contents
Nested Loops
In my old Fortran programming days (yes, long gone), I learned early on that my code ran faster if I arranged my nested for-loops optimally. In Fortran, if I were computing one element at a time in a matrix, it meant making sure I filled the elements sequentially, which meant looping over each row, 1 column at a time, because that's the way Fortran (and MATLAB) stores arrays. This is not how arrays are stored in C.
I was curious what happens in MATLAB today if I try to run nested loops in different orders. So let's see!
Let's compute
Ar,c =r*(r+c3)+17
Straight-forward Loop Solution, Inner Loop Rows
Of course I preallocate my output array, Ar.
t = zeros(1,3);
n = 1000;
tic;
Ar = zeros(n,n);
[nrows,ncolumns] = size(Ar);
for rows = 1:nrows
for columns = 1:ncolumns
Ar(rows,columns) = rows*(rows+columns^3)+17;
end
end
t(1) = toc;
Straight-forward Loop Solution, Inner Loop Columns
And now the other order.
tic;
Ac = zeros(n,n);
[nrows,ncolumns] = size(Ac);
for columns = 1:ncolumns
for rows = 1:nrows
Ac(rows,columns) = rows*(rows+columns^3)+17;
end
end
t(2) = toc;
Vectorized Solution
And, of course, the vectorized version.
tic;
tmpr = (1:nrows).';
Av = tmpr .* (tmpr+(1:ncolumns).^3) + 17;
t(3) = toc;
Analysis
Let's make sure they also produce the same results.
areAnswerEqual = isequal(Ac,Ar,Av)
And what about the execution times?
t
Order Matters
In MATLAB, using nested for-loops, clearly the order of the loops matters from a timing perspective. So it would be wise to order them knowing that the data is stored column by column, if you can write your algorithm that way. And I did not show what the effects might be if we didn't preallocate the output matrices. You can try that for yourselves.
Despite getting a pretty good speedup by wrangling the nested loops, it's still often preferable, from a speed standpoint, to try to vectorize the code. Notice that I am using implicit expansion to compute Av. Also note that you won't always get the same ratio of speed difference between loops and the equivalent vectorized version - it will depend very much on how much work is going on for each computation and how much overhead arises.
Your Thoughts?
When using loops in MATLAB, from a speed viewpoint, it's best if you can set up the loops so the elements are accessed as close to sequentially as possible. You can see from the small example above that running down the columns gives superior performance vs. running across the rows. This is because of the way MATLAB stores arrays.
Have you run into a situation where reordering loops helped you out? Let us know here.
Copyright 2021 The MathWorks, Inc.
- Category:
- Performance,
- Vectorization