Loren on the Art of MATLAB

Turn ideas into MATLAB

MATLAB Behavior: for

Posted by Loren Shure,

Conventional wisdom used to hold that using for loops in MATLAB automatically condemned a program to poor performance. Since MATLAB R13 (version 6.5), MATLAB has taken advantage of some innovations that accelerate many for loops, so the code can perform on par with either vectorized code or code written in a lower level language such as C or Fortran. Obviously, details matter here. One thing that most people, even at MathWorks (!), don't appreciate is that the for loop has richer behavior than simply looping over single elements one at a time. An informal hallway survey near my office found that even among experienced MATLAB programmers, far fewer than 50% knew about this behavior. Time to come clean.

Exploring the Behavior of 'for'

The MATLAB for statement is both more powerful and more subtle than many people realize because of the way it lets you iterate directly over an array rather than making use of explicit indices or subscripts. Here's an example that displays the logarithms of the positive numbers in a row vector:

x = [1 pi -17 1.3 289 -exp(1) -42];

for pnum = x(x > 0)
    disp('Iterating')
    log(pnum)
end
Iterating
ans =
     0
Iterating
ans =
           1.1447298858494
Iterating
ans =
         0.262364264467491
Iterating
ans =
          5.66642668811243

This code is streamlined in comparison to the following, which uses an explicit index ind, with equivalent results (except for the details on the output that I added here).

pnums = x(x > 0);
for ind = 1:numel(pnums)
    disp(['Iterating...   Value #',int2str(ind)])
    log(pnums(ind))
end
Iterating...   Value #1
ans =
     0
Iterating...   Value #2
ans =
           1.1447298858494
Iterating...   Value #3
ans =
         0.262364264467491
Iterating...   Value #4
ans =
          5.66642668811243

Behavior of for

To use for as in the first example, you need to understand that for does not iterate over the first dimension of an array. It iterates only over dimension 2 through the last array dimension, and on each pass the loop variable receives one full column. You can see this by transposing the input vector (turning it into a column).

xt = x.'
for pnum = xt(xt > 0)
    disp('Iterating')
    log(pnum)
end
xt =
                         1
          3.14159265358979
                       -17
                       1.3
                       289
         -2.71828182845905
                       -42
Iterating
ans =
                         0
           1.1447298858494
         0.262364264467491
          5.66642668811243

We see one iteration, with pnum taking on the 4-by-1 value [1 pi 1.3 289]'. In contrast, the version with the explicit index works the same way whatever the shape of x.
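If you want the compact form of for but also want it to visit every element whether the input is a row or a column, one option is to reshape the values into a row first. This is only a minimal sketch, not part of the original example; the names vals and visited are made up for illustration.

```matlab
x  = [1 pi -17 1.3 289 -exp(1) -42];
xt = x.';                          % column version of the same data

vals = xt(xt > 0);                 % 4-by-1 column; for would run only once over it
visited = 0;                       % hypothetical counter, for illustration only
for pnum = reshape(vals, 1, [])    % force a 1-by-N row so for sees N columns
    visited = visited + 1;
    fprintf('Value #%d: log = %g\n', visited, log(pnum));
end
```

The reshape costs little here and makes the loop behave the same for either orientation, at the price of hiding the column-at-a-time behavior this post is about.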

More Examples

This iterates over the elements of a row vector, with s taking each value in turn.

for s = [1,-2,8,pi,17], disp('Iterating'), disp(s), end
Iterating
     1
Iterating
    -2
Iterating
     8
Iterating
          3.14159265358979
Iterating
    17

This doesn't iterate element by element. Instead, it runs a single pass that processes the entire column as one entity.

for s = [1,-2,8,pi,17]', disp('Iterating'), disp(s), end
Iterating
                         1
                        -2
                         8
          3.14159265358979
                        17

What about Higher Dimensions?

This iterates over the 2nd and 3rd dimensions of A.

A = (1:12);
A = reshape(A,[2 2 3])
A(:,:,1) =
     1     3
     2     4
A(:,:,2) =
     5     7
     6     8
A(:,:,3) =
     9    11
    10    12
for k = A, disp('iterating'), disp(k), end
iterating
     1
     2
iterating
     3
     4
iterating
     5
     6
iterating
     7
     8
iterating
     9
    10
iterating
    11
    12

The documentation makes it clear that for does not iterate over the first dimension. It does iterate over all of the other dimensions of A, that is, everything except the first (row) dimension. You can predict the number of iterations by evaluating this.

numel(A,1,':')
ans =
     6

Returning to the first example, numel(x(x > 0),1,':') evaluates to 4 when x is a row vector and 1 when x is a column vector.
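Another way to look at the same count is sketched below for the example array. Indexing with two subscripts, A(:,:), folds all trailing dimensions into the second one, so the loop behaves as if it were walking the columns of that 2-D array. The names A2, nIter, and count are illustrative only.

```matlab
A  = reshape(1:12, [2 2 3]);
A2 = A(:,:);                   % 2-by-6: trailing dimensions folded into columns
nIter = size(A2, 2);           % predicted number of iterations

count = 0;
for k = A
    count = count + 1;
    % each k is a size(A,1)-by-1 column, matching column 'count' of A2
    assert(isequal(k, A2(:, count)));
end
disp([nIter count])            % the two numbers should agree
```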

Possible Uses

Why use this version of for? Suppose you have a large dataset and the vectorized calculations can't take full advantage of functions such as bsxfun. You may have a case where the memory tradeoff for vectorization is too high. But you don't want to make function calls for each array element since the function call overhead can get high as well. A possibly good compromise is to essentially process the data in chunks, perhaps by 'virtual' columns. That way the function call overhead is more limited as is the memory. To get the best outcome for this approach, you should preallocate the output array and assign into it with proper indexing as you loop.
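Here is a small sketch of that column-at-a-time pattern, under the assumption that each column can be processed independently. The tiny data matrix stands in for a large dataset, and the per-column computation (a Euclidean norm) as well as the names colNorm and idx are placeholders I chose for illustration.

```matlab
data = reshape(1:12, 3, 4);            % stand-in for a large dataset
colNorm = zeros(1, size(data, 2));     % preallocate one slot per column

idx = 0;                               % for hands us values, not positions
for col = data                         % col is a 3-by-1 column on each pass
    idx = idx + 1;
    colNorm(idx) = sqrt(sum(col.^2));  % one vectorized computation per column
end
```

Because for supplies the column itself rather than its position, you need your own counter to know where to store each result. If you need the index anyway, the explicit for idx = 1:size(data,2) form may read just as well.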

What Do You Use?

Did you know about this for behavior? Do you ever take advantage of it? What strategies do you use for trading off memory use and function call overhead? Let me know here.


Published with MATLAB® 7.9
