## Loren on the Art of MATLABTurn ideas into MATLAB

Note

Loren on the Art of MATLAB has been retired and will not be updated.

# Once and for All

I was talking to my long-time colleague, Mike Croucher, who joined MathWorks team recently (yay!). About a bunch of interesting topics, some of which could be good fodder for a blog post. Today I want to talk about for-loops.

## Nested Loops

In my old Fortran programming days (yes, long gone), I learned early on that my code ran faster if I arranged my nested for-loops optimally. In Fortran, if I were computing one element at a time in a matrix, it meant making sure I filled the elements sequentially, which meant looping over each row, 1 column at a time, because that's the way Fortran (and MATLAB) stores arrays. This is not how arrays are stored in C.
I was curious what happens in MATLAB today if I try to run nested loops in different orders. So let's see!
Let's compute Ar,c =r*(r+c3)+17

## Straight-forward Loop Solution, Inner Loop Rows

Of course I preallocate my output array, Ar.
t = zeros(1,3);
n = 1000;
tic;
Ar = zeros(n,n);
[nrows,ncolumns] = size(Ar);
for rows = 1:nrows
for columns = 1:ncolumns
Ar(rows,columns) = rows*(rows+columns^3)+17;
end
end
t(1) = toc;

## Straight-forward Loop Solution, Inner Loop Columns

And now the other order.
tic;
Ac = zeros(n,n);
[nrows,ncolumns] = size(Ac);
for columns = 1:ncolumns
for rows = 1:nrows
Ac(rows,columns) = rows*(rows+columns^3)+17;
end
end
t(2) = toc;

## Vectorized Solution

And, of course, the vectorized version.
tic;
tmpr = (1:nrows).';
Av = tmpr .* (tmpr+(1:ncolumns).^3) + 17;
t(3) = toc;

## Analysis

Let's make sure they also produce the same results.
1
And what about the execution times?
t
t = 1×3
0.1236 0.0190 0.0082

## Order Matters

In MATLAB, using nested for-loops, clearly the order of the loops matters from a timing perspective. So it would be wise to order them knowing that the data is stored column by column, if you can write your algorithm that way. And I did not show what the effects might be if we didn't preallocate the output matrices. You can try that for yourselves.
Despite getting a pretty good speedup by wrangling the nested loops, it's still often preferable, from a speed standpoint, to try to vectorize the code. Notice that I am using implicit expansion to compute Av. Also note that you won't always get the same ratio of speed difference between loops and the equivalent vectorized version - it will depend very much on how much work is going on for each computation and how much overhead arises.