Recent Question about Speed with Subarray Calculations

Posted by Loren Shure, May 4, 2013


Recently someone asked me to explain the difference in speed between doing a calculation in a loop that indexes into an array on every pass and doing the same calculation after extracting the subarray first.

Example
First Method
Second Method
Same Results?
Compare Runtime
What's Happening?
Your Results?

Example

Suppose I have a function of two inputs: the first is a column (of a square array) and the second is a scalar. The output is a scalar, since x'*x is the inner product of the column with itself.

myfun = @(x,z) x'*x+z;

And even though this may be calculated in a fully vectorized manner, let's explore what happens when we work on subarrays from the array input.
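As a minimal sketch of what "fully vectorized" could look like for a single column, the inner loop over z can be replaced by one expression, because x'*x does not depend on z. The small size n = 5 and the names xt, looped, and vectorized are illustrative assumptions, not part of the original example.

```matlab
myfun = @(x,z) x'*x+z;

n  = 5;                 % small size, chosen only for illustration
xt = randn(n,1);        % stands in for one column of the square array

% Loop form: one call to myfun per value of z.
looped = zeros(n,1);
for z = 1:n
    looped(z) = myfun(xt, z);
end

% Vectorized form: compute the inner product once and add all z values.
vectorized = xt'*xt + (1:n)';

% Largest difference between the two forms (expected to be negligible).
maxDiff = max(abs(looped - vectorized));
```

The rest of this post stays with the loop form on purpose, since the point is the cost of indexing inside the loop.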

Next I create the input array x and preallocate an output array for each of the two ways of doing the calculation. One of the methods has an additional intermediate step.

n = 500;
x = randn(n,n);
result1 = zeros(n,1);
result2 = zeros(n,1);

First Method

Here we see and time the first method. In this one, x(:,k) appears inside the inner loop, so we create a temporary array for that column on every inner pass, n times for each value of k.

tic
for k = 1:n
    for z = 1:n
        result1(z) = myfun(x(:,k), z);   % column is extracted on every inner pass
    end
    result1 = result1+x(:,k);
end
runtime(1) = toc;

Second Method

In this method, we extract the column of interest first in the outer loop, and reuse that temporary array each time through the inner loop. Again we see and time the results.

tic
for k = 1:n
    xt = x(:,k);                         % extract the column once per outer pass
    for z = 1:n
        result2(z) = myfun(xt, z);
    end
    result2 = result2+x(:,k);
end
runtime(2) = toc;

Same Results?

First, let's make sure we get the same answer both ways. You can see that we do.

theSame = isequal(result1,result2)

theSame =
     1

Compare Runtime

Next, let's compare the times. I want to remind you that doing timing from a script generally has more overhead than when the same code is run inside a function. We just want to see the relative behavior so we should get some insight from this exercise.

disp(['Run times are: ',num2str(runtime)])

Run times are: 2.3936      1.9558
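To reduce the script overhead mentioned above, one option is to move each method into its own function and time it with timeit instead of tic and toc. The following is a sketch under assumptions: it needs a MATLAB release that ships timeit (R2013b or later, as far as I know), and the names compareSubarrayTiming, indexEachTime, and extractOnce are hypothetical. The default n is smaller than in the post because timeit calls each function several times.

```matlab
function runtime = compareSubarrayTiming(n)
% Time both loop variants from inside functions (hypothetical helper).
if nargin < 1
    n = 200;            % assumed default, smaller than the post's 500
end
x = randn(n,n);
myfun = @(x,z) x'*x+z;
runtime(1) = timeit(@() indexEachTime(x, myfun));
runtime(2) = timeit(@() extractOnce(x, myfun));
end

function result = indexEachTime(x, myfun)
% First method: index x(:,k) on every pass through the inner loop.
n = size(x,1);
result = zeros(n,1);
for k = 1:n
    for z = 1:n
        result(z) = myfun(x(:,k), z);
    end
    result = result + x(:,k);
end
end

function result = extractOnce(x, myfun)
% Second method: extract the column once, then reuse it.
n = size(x,1);
result = zeros(n,1);
for k = 1:n
    xt = x(:,k);
    for z = 1:n
        result(z) = myfun(xt, z);
    end
    result = result + x(:,k);
end
end
```

Saved as compareSubarrayTiming.m, this could be called as runtime = compareSubarrayTiming(500) to match the array size used here. The absolute numbers will differ from the script timings, so only the ratio between the two entries is worth comparing.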

What's Happening?

Here's what's going on. In the first method, we create a temporary variable on every pass through the inner loop, n times for each column, even though that array is constant for a fixed column. In the second method, we extract the relevant column once per outer pass, and reuse it n times through the inner loop.
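One way to see the difference in isolation is to strip out the call to myfun and keep only the column extraction. The sketch below counts how often each pattern creates the temporary and times the extraction alone. The variable names are illustrative, and the timings are only a rough guide, since MATLAB may optimize loops this simple differently from the real calculation.

```matlab
n = 500;
x = randn(n,n);

% Pattern of the first method: index the column on every inner pass.
copiesEveryPass = 0;
tic
for k = 1:n
    for z = 1:n
        col = x(:,k);                          % new temporary each time
        copiesEveryPass = copiesEveryPass + 1;
    end
end
tEveryPass = toc;

% Pattern of the second method: index the column once per outer pass.
copiesOnce = 0;
tic
for k = 1:n
    col = x(:,k);                              % one temporary per column
    copiesOnce = copiesOnce + 1;
    for z = 1:n
        % the inner loop would reuse col here
    end
end
tOnce = toc;

fprintf('Extractions: %d vs. %d\n', copiesEveryPass, copiesOnce)
fprintf('Extraction-only times: %g s vs. %g s\n', tEveryPass, tOnce)
```

The counts are n^2 for the first pattern and n for the second, which is the source of the extra work.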

Be thoughtful if you do play around with this. If the calculations you do on each pass take a long time compared to extracting a column vector, you may not see much difference between the two methods. However, if the calculations are short, then the repeated creation of the temporary variable can add a large amount of overhead. In general, you should not be worse off if you always create the temporary array as few times as possible.