Preparing to Run Code in Parallel

Posted by Loren Shure, April 15, 2021

6 views (last 30 days) | 0 Likes | 9 comments

In a recent post, I talked about for-loops in MATLAB and how to optimize their use knowing how MATLAB stores arrays in memory.  Today I want to talk about getting ready for parallel computation, specifically using parallel for-loops, via parfor. 

En route to creating code suitable for running in parallel, sometimes we take code with a for-loop and simply replace it with a parallel loop, using parfor. That is, if we can't vectorize the code well first.  This transformation from for->parfor works really well sometimes, but does not always work, and for very good reasons. 

For example, you can't simply replace for with parfor if loop iterations are not completely independent.  You can find out more conditions here.  There is a notable exception to this rule, reduction variables, (a reduction variable accumulates a value that depends on all the iterations together, but is independent of the iteration order).  The documentation for this is great.  Here's a quick pseudo-code that would parallelize just fine, even though we reuse the variable s on both the right and left-hand sides.  As long as the result is not numerically sensitive to the order in which the accumulation occurs, you will get the "right" answer here whether you use a for or a parfor loop.

s = 0;

parfor ind = 1:100

   s = s + fun(data(ind));

end

Testing for Parallelizability

I doubt parallelizability is word, but I think you know what I mean!  

One simple way to see if your loop might be suitable for parallelization is to try running the loop in another order to see if you get the same result.  If so, it is a candidate for parfor.  The easiest version of this might be running the loop backwards.  If you get the same results as the standard  sweep, you should be good to go.  Instead of backwards, you can also try replacing the loop with the counter in random order; this can be achieved easily using randperm.  

Note: clearly this calculation can be vectorized easily.  That's not my focus here.

k = randn(100,1);

meank = mean(k);

tol = sqrt(eps);

qforward = zeros(size(k));

for ind = 1:100

   qforward(ind) = k(ind) - meank;

end

meank = mean(k);

qbackward = zeros(size(k));

for ind = 100:-1:1

   qbackward(ind) = k(ind) - meank;

end

meank = mean(k);

qrandom = zeros(size(k));

for ind = randperm(100)

   qrandom(ind) = k(ind) - meank;

end

aretheyequal = isequal(qforward, qbackward, qrandom)  

aretheyequal = logical
   1

In case the answers aren't exactly equal, you could compare them in a way similar to this.

nearlyfb = norm(qforward-qbackward) < tol

nearlyfb = logical
   1

nearlyfr = norm(qforward-qrandom) < tol

nearlyfr = logical
   1

nearlybr = norm(qbackward-qrandom) < tol

nearlybr = logical
   1

So this calculation is ready to try with parfor.

qpar = zeros(size(k));

for ind = 1:100

   qpar(ind) = k(ind) - meank;

end

aretheyequal = isequal(qpar, qforward)

aretheyequal = logical
   1

What if I Have an Expression with Iteration Dependency

You may not always be able to parallelize your code.  However, there are some conditions under which you can convert your code to take advantage of parallelization by refactoring your calculations a little  Let me show you an example.  In this case, I will do some of the accumulation after the loop is complete.

%% The Code Analyzer gives this first chunk of code two red error messages.

x = [17 zeros(1,99)];

k = [0 randn(1,99)];

parfor ind = 2:100 

   x(ind) = x(ind-1) + k(ind)^2;

end

%% the error messages from above code:

The PARFOR loop cannot run due to the way variable 'x' is used.

In a PARFOR loop, variable 'x' is indexed in different ways, potentially 

causing dependencies between iterations.

Here's the for loop equivalent, no warning.

x = [17 zeros(1,99)];

k = [0 randn(1,99)];

for ind = 2:100  

   x(ind) = x(ind-1) + k(ind)^2;

end

xx = x;

And here's how you might parallelize it

x = [17 zeros(1,99)];

tmp = zeros(1,100);

parfor ind = 2:100

   tmp(ind) = k(ind)^2;

end

x = cumsum(x+tmp);

aretheyequal = isequal(x,xx)

aretheyequal = logical
   1

Thoughts?

Do you often work with large datasets or simulations where getting access to hardware suitable for running code in parallel helps your work scale?  Did your code fall into any of the simple categories I mentioned in this post?  I would love to hear from you about any issues you have wrangling your code into a form suitable for parallelization.  Post your thoughts here.

Run in your browser