Using parfor Loops: Getting Up and Running

Posted by Loren Shure, October 2, 2009


NOTE: matlabpool was removed in 2015; use parpool instead.

Today I’d like to introduce a guest blogger, Sarah Wait Zaranek, who is an application engineer here at The MathWorks. Sarah has previously written about speeding up code from a customer to get acceptable performance. She will again be writing about speeding up MATLAB applications, but this time her focus will be on using the parallel computing tools.

Introduction
Method
Background on parfor-loops
Opening the matlabpool
Independence
Globals and Transparency
Classification
Uniqueness
Your examples

Introduction

I wanted to write a post to help users better understand our parallel computing tools. In this post, I will focus on one of the more commonly used functions in these tools: the parfor-loop.

This post will focus on getting a parallel code using parfor up and running. Performance will not be addressed in this post. I will assume that the reader has a basic knowledge of the parfor-loop construct. Loren has a very nice introduction to using parfor in one of her previous posts. There are also some nice introductory videos.

Note for clarity: Since Loren's introductory post, the toolbox used for parallel computing has changed names from the Distributed Computing Toolbox to the Parallel Computing Toolbox. It is the same toolbox under a new name, not two separate toolboxes.

Method

In some cases, you may only need to change a for-loop to a parfor-loop to get your code running in parallel. However, in other cases you may need to slightly alter the code so that parfor can work. I decided to show a few examples highlighting the main challenges that one might encounter. I have separated these examples into four encompassing categories:

Independence
Globals and Transparency
Classification
Uniqueness

Background on parfor-loops

In a parfor-loop (just like in a standard for-loop), a series of statements known as the loop body is executed over a range of values. However, in a parfor-loop the iterations run in parallel on MATLAB workers rather than in the client MATLAB session.

Each worker has its own unique workspace. So, the data needed to do these calculations is sent from the client to workers, and the results are sent back to the client and pieced together. The cool thing about parfor is this data transfer is handled for the user. When MATLAB gets to the parfor-loop, it statically analyzes the body of the parfor-loop and determines what information goes to which worker and what variables will be returning to the client MATLAB. Understanding this concept will become important when understanding why particular constraints are placed on the use of parfor.
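To make this data flow concrete, here is a minimal sketch. The variable names and the computation are illustrative and do not come from the original post. The value of scale is defined on the client and copied to the workers, while each iteration fills in its own element of results, which is pieced back together on the client.

```matlab
n = 8;
scale = 3;                 % defined on the client, copied to each worker
results = zeros(1, n);     % assembled on the client from the workers' pieces

parfor ix = 1:n
    % Each iteration touches only results(ix), so the iterations can run
    % on any worker and in any order.
    results(ix) = scale * ix^2;
end

disp(results)
```

Because no iteration reads what another iteration wrote, changing parfor back to for gives the same values in results.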

Opening the matlabpool

Before looking at some examples, I will open up a matlabpool so I can run my loops in parallel. I will be opening up the matlabpool using my default local configuration (i.e. my workers will be running on the dual-core laptop machine where my MATLAB has been installed).

if matlabpool('size') == 0 % checking to see if my pool is already open
    matlabpool open 2
end

Starting matlabpool using the 'local' configuration ... connected to 2 labs.

Note : The 'size' option was new in R2008b.
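In releases where matlabpool is no longer available, a roughly equivalent check can be sketched with gcp and parpool. This assumes Parallel Computing Toolbox in R2013b or later. The choice of two workers simply mirrors the example above.

```matlab
% Requires Parallel Computing Toolbox, R2013b or later.
pool = gcp('nocreate');    % returns [] if no pool is open; does not start one
if isempty(pool)
    pool = parpool(2);     % assumption: two workers on the default profile
end
fprintf('Pool has %d workers.\n', pool.NumWorkers);
```

When you are finished, delete(gcp('nocreate')) shuts the pool down, which plays the role of matlabpool close.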

Independence

The parfor-loop is designed for task-parallel types of problems where each iteration of the loop is independent of every other iteration. This is a critical requirement for using a parfor-loop. Let's see an example where the iterations are not independent.

type dependentLoop.m

% Example of a dependent for-loop
a = zeros(1,10);

parfor it = 2:10 % start at 2 so that a(it-1) never indexes a(0)
    a(it) = someFunction(a(it-1));
end

Checking the above code using M-Lint (MATLAB's static code analyzer) gives a warning message that these iterations are dependent and will not work with the parfor construct. M-Lint can be accessed either via the Editor or from the command line. In this case, I use the command line and have defined a simple function, displayMlint, so that the display is compact.

output = mlint('dependentLoop.m');
displayMlint(output)

The PARFOR loop cannot run due to 
 the way variable 'a' is used. 

In a PARFOR loop, variable 'a' is 
 indexed in different ways, 
 potentially causing dependencies 
 between iterations.

Sometimes loops are intrinsically or unavoidably dependent, and therefore parfor is not a good fit for that type of calculation. However, in some cases it is possible to reformulate the body of the loop to eliminate the dependency or separate it from the main time-consuming calculation.
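As a sketch of separating the dependency from the time-consuming part (the running-total problem and the use of sqrt as a stand-in for an expensive function are assumptions made for illustration), the independent work can go in a parfor-loop and the dependent accumulation in a cheap serial loop:

```matlab
n = 10;
step = zeros(1, n);

% Independent part: each value depends only on its own index, so the
% expensive work can run in parallel. sqrt stands in for a costly function.
parfor it = 1:n
    step(it) = sqrt(it);
end

% Dependent part: each element needs the previous one, so this stays in a
% serial loop on the client.
a = zeros(1, n);
a(1) = step(1);
for it = 2:n
    a(it) = a(it-1) + step(it);
end
```

This only helps when the dependent part is cheap compared with the independent part. If the expensive function itself needs the previous result, the loop is intrinsically serial.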

Globals and Transparency

All variables within the body of a parfor-loop must be transparent. This means that all references to variables must occur in the text of the program. Since MATLAB is statically analyzing the loops to figure out what data goes to what worker and what data comes back, this seems like an understandable restriction.

Therefore, the following commands cannot be used within the body of a parfor-loop: evalc, eval, evalin, and assignin. load also cannot be used unless its output is assigned to a variable. These functions can be used inside a function called from the parfor-loop, because that function has its own workspace. I have found that this is often the easiest workaround for the transparency issue.
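Here is a small sketch of the load case. The file names, the temporary folder, and the data are made up for illustration, and the sketch assumes the workers can see the same folder as the client, as they can with a local pool. Assigning the output of load to a variable keeps every variable name visible in the text of the loop.

```matlab
% Write two small MAT-files to a temporary folder (hypothetical data).
tmpDir = tempname;
mkdir(tmpDir);
for k = 1:2
    x = k * (1:3);
    save(fullfile(tmpDir, sprintf('data%d.mat', k)), 'x');
end

total = zeros(1, 2);
parfor k = 1:2
    % load with an output argument is transparent: s appears in the text
    % of the loop body, and the file contents arrive as fields of s.
    s = load(fullfile(tmpDir, sprintf('data%d.mat', k)));
    total(k) = sum(s.x);
end

rmdir(tmpDir, 's');   % remove the temporary files
```

Calling load without an output argument inside the loop would create x without the name ever being assigned in the loop body, which is exactly what the transparency rule forbids.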

Additionally, you cannot declare global or persistent variables within the body of the parfor-loop. I would also suggest being careful with the use of globals in general, since changes to a global's value on a worker are not automatically reflected in the client's copy of that global.

Classification

A detailed description of the classification of variables in a parfor-loop is in the documentation. I think it is useful to view classification as representing the different ways a variable is passed between client and worker and the different ways it is used within the body of the parfor-loop.
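As an informal illustration of these classes (the variable names are invented, and the labels in the comments are my reading of the documented categories), a single small loop can contain a loop variable, sliced input and output variables, a broadcast variable, a temporary variable, and a reduction variable:

```matlab
n = 6;
data = (1:n) .^ 2;       % sliced input: iteration ix reads only data(ix)
offset = 10;             % broadcast: the whole value goes to every worker
out = zeros(1, n);       % sliced output: iteration ix writes only out(ix)
total = 0;               % reduction: accumulated across all iterations

parfor ix = 1:n          % ix is the loop variable
    tmp = data(ix) + offset;   % temporary: created afresh in each iteration
    out(ix) = tmp;
    total = total + tmp;       % the same operation on total every time
end
```

After the loop, out and total are available on the client, while tmp is not passed back.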

Challenges with Classification

Challenges often arise when first converting for-loops to parfor-loops due to issues with this classification. A common issue is the conversion of nested for-loops, where sliced variables are not indexed appropriately.

Sliced variables are variables for which each worker operates on a different part. They are sliced, or divided, amongst the workers, which avoids transferring the whole variable from the client to every worker.

Using parfor with Nested for-Loops

The loop below is nested and encounters some of the restrictions placed on parfor for sliced variables.

type parforNestTry.m

A1 = zeros(10,10); 

parfor ix = 1:10
    for jx = 1:10
        A1(ix, jx) = ix + jx;
    end
end

output = mlint('parforNestTry.m');
displayMlint(output);

The PARFOR loop cannot run due to 
 the way variable 'A1' is used. 

Valid indices for 'A1' are 
 restricted in PARFOR loops.

In this case, A1 is a sliced variable. For sliced variables, the restrictions are placed on the first-level variable indices. This allows parfor to easily distribute the right part of the variable to the right workers.

First-level indexing, in general, refers to indexing within the first set of parentheses or braces. This is explained in more detail in the same section as classification in the documentation.

One of these first-level indices must be the loop counter variable or the counter variable plus or minus a constant. Every other first-level index must be a constant, a non-loop counter variable, a colon, or an end.

In this case, A1 has a loop counter variable for both first-level indices (ix and jx).

The solution is to make sure that only one of the indices of A1 is a loop counter variable and to make the other index a colon. To implement this, the results of the inner loop can be saved to a temporary variable, which is then assigned to a row of the desired variable after the inner loop finishes.

A2 = zeros(10,10);

parfor ix = 1:10
    myTemp = zeros(1,10);
    for jx = 1:10
        myTemp(jx) = ix + jx;
    end
    A2(ix,:) = myTemp;
end

You can also solve this issue by using cells. Since jx is now in the second level of indexing, it can be a loop counter variable.

A3 = cell(10,1);

parfor ix = 1:10
    for jx = 1:10
        A3{ix}(jx) = ix + jx;
    end
end

A3 = cell2mat(A3);

I have found that both solutions have their benefits. While cells may be easier to implement in your code, they also result in A3 using more memory due to the additional memory requirements for cells. The call to cell2mat also adds processing time.

A similar technique can be used for several levels of nested for-loops.
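As a sketch of the temporary-variable approach with one more level of nesting (the array name B, the size n, and the formula are illustrative only), the two inner loops fill a temporary matrix, which is then assigned to one slice of the result:

```matlab
n = 4;
B = zeros(n, n, n);

parfor ix = 1:n
    slab = zeros(n, n);              % temporary result for this iteration
    for jx = 1:n
        for kx = 1:n
            slab(jx, kx) = ix + jx + kx;
        end
    end
    % Only ix and colons appear in the first-level indices of B.
    B(ix, :, :) = reshape(slab, [1 n n]);
end
```

The reshape makes the right-hand side match the 1-by-n-by-n slice explicitly.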

Uniqueness

Doing Machine Specific Calculations

There is a way, while using parfor-loops, to determine which machine you are on and run machine-specific instructions within the loop. For example, you might want this if different machines keep their data files in different directories and you need to get into the right one. Do be careful if you make the code machine-specific, since it will be harder to port.

% Getting the machine host name

[~,hostname] = system('hostname');

% If the loop iterations are the same as the size of matlabpool, the
% command is run once per worker.

parfor ix = 1:matlabpool('size')
    [~,hostnameID{ix}] = system('hostname');
end

% Can then do host/machine specific commands
hostnames = unique(hostnameID);
checkhost = hostnames(1);

parfor ix = 1:matlabpool('size')
    [~,myhost] = system('hostname');
    if strcmp(myhost,checkhost)
       display('On Machine 1')
    else
        display('NOT on Machine 1')
    end
end

On Machine 1
On Machine 1

In my case, since I am running locally, all of the workers are on the same machine.

Here's the same code running on a non-local cluster.

matlabpool close
matlabpool open speedy
parfor ix = 1:matlabpool('size')
    [~,hostnameID{ix}] = system('hostname');
end

% Can then do host/machine specific commands
hostnames = unique(hostnameID);
checkhost = hostnames(1);

parfor ix = 1:matlabpool('size')
    [~,myhost] = system('hostname');
    if strcmp(myhost,checkhost)
       display('On Machine 1')
    else
        display('NOT on Machine 1')
    end
end

Sending a stop signal to all the labs ... stopped.
Starting matlabpool using the 'speedy' configuration ... connected to 16 labs.
On Machine 1
On Machine 1
On Machine 1
NOT on Machine 1
On Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1

Note: The ~ feature is new in R2009b and discussed as a new feature in one of Loren's previous blog posts.

Doing Worker Specific Calculations

I would suggest using the new spmd functionality to do worker-specific calculations. For more information about spmd, check out the documentation.
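For a flavor of what that looks like, here is a minimal sketch that assumes a pool is already open. It uses labindex and numlabs, the names from the releases this post was written for. As far as I know, R2022b and later recommend spmdIndex and spmdSize instead.

```matlab
% Assumes a pool of workers is already open.
spmd
    % labindex is this worker's index; numlabs is the number of workers.
    workerId = labindex;
    fprintf('Worker %d of %d\n', labindex, numlabs);
end

% On the client, workerId is a Composite with one entry per worker.
for k = 1:length(workerId)
    fprintf('Entry %d holds %d\n', k, workerId{k});
end
```

Unlike a parfor-loop, the body of an spmd block runs exactly once on every worker, which is what makes it a natural fit for worker-specific work.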

Clean up

matlabpool close

Sending a stop signal to all the labs ... stopped.

Your examples

Tell me about some of the ways you have used parfor-loops, or feel free to post questions about non-performance-related issues that haven't been addressed here. Post your questions and thoughts here.

Published with MATLAB® 7.9