Run Workers Run!

Posted by Loren Shure, April 27, 2016


Today's guest post comes from Sean de Wolski, one of Loren's fellow Application Engineers. You might recognize him from MATLAB Answers and the Pick of the Week blog!

This blog post is going to focus on subtle differences in approaches to running commands on all Parallel Computing Toolbox workers.

Background
pctRunOnAll
SPMD
parfevalOnAll
Table of Tradeoffs
Lack of PARFOR
Comments and Feedback

Background

The Parallel Computing Toolbox allows you to open headless MATLAB workers that you can then distribute work to in order to speed up your code or offload it from the main MATLAB session. Oftentimes, there is some initial setup that you will want to run on each of the workers in order to prepare them to run. Some examples include loading libraries, changing the path, or opening a Simulink model. There are three different approaches to this, each with its own characteristics.

This has long been a source of confusion for me, so I finally researched it to figure out when to use each. Let's walk through the use cases for each one.

pctRunOnAll

pctRunOnAll runs synchronously, runs one command at a time, and cannot return outputs. It runs the command on all of the workers and the client.

pctRunOnAll should be used for any type of setup where no output is expected and none of the workers are busy. Control is not returned until the command has run on every worker, so a single busy worker can hold up the call until that worker finishes what it is doing.

Since it runs on all of the workers and the client, I like to use it when loading a Simulink model because I can then visually make sure the correct model opened on the client machine.

It uses command syntax so everything has to be defined as a string. For example:

pctRunOnAll load('Data.mat')

This can make it harder to parameterize inputs, i.e., you'll need to use sprintf and friends, so if the command being run will change a lot, it might pay to look at spmd. However, it is nice for quick, one-off operations at the command line. Here's how you would provide the MAT-file name in the above example:

file = 'Data.mat';
pctRunOnAll(sprintf('load(''%s'')',file))

pctRunOnAll will also not automatically start the parallel pool if one is not open. It will instead return an error. The other two approaches will spool up the parallel pool if you are using the default preference "Automatically create a parallel pool when certain language features detected".
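To avoid that error, one option is to check for a pool before calling pctRunOnAll. The snippet below is a minimal sketch of that idea, not the only way to do it. It uses gcp('nocreate'), which returns an empty array when no pool is open, and the addpath(tempdir) command is just a harmless stand-in (my assumption) for whatever setup you actually need.

```matlab
% Open a pool only if one is not already running.
pool = gcp('nocreate');        % empty if there is no current pool
if isempty(pool)
    pool = parpool;            % default profile
end

% Stand-in setup command: add the temp folder to the path everywhere.
setupCmd = sprintf('addpath(''%s'')', tempdir);
pctRunOnAll(setupCmd);         % runs on every worker and on the client
```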

SPMD

spmd stands for single program, multiple data. It will run the same set of commands synchronously on all of the workers in the parallel pool but not the client.

spmd has many uses besides running the same command on every worker: you can govern what happens on each worker based on the worker's index, which the function labindex returns. Thus it allows for full message passing parallelization.
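As a rough illustration of branching on the worker index, here is a small sketch. The message text and variable names are made up for the example. It uses labindex, numlabs, and gplus as named in the releases this post was written for; I believe newer releases also offer these as spmdIndex, spmdSize, and spmdPlus.

```matlab
spmd
    % Each worker takes a different branch based on its index.
    if labindex == 1
        msg = 'worker 1: coordinating';
    else
        msg = sprintf('worker %d of %d: computing', labindex, numlabs);
    end
    % Collective operation: sum the indices across all workers.
    total = gplus(labindex);
end
disp(msg{1})      % value stored on worker 1
disp(total{1})    % 1 + 2 + ... + number of workers
```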

In terms of running the same command on each worker, spmd is convenient for grouping multiple commands together, since everything in the spmd block runs on each worker, and it allows you to store output. Let's make a random 10000x1 vector on each worker and then take the mean to emulate a simple Monte Carlo simulation of the central limit theorem:

spmd
    x = rand(10000,1);
    xmean = mean(x);
end

x and xmean are Composite arrays, meaning that each worker has its own value and the data are stored in the workers' memory. You can index into a Composite with curly braces, like a cell array, to grab back a local copy of any specific element or to modify the element on a specific worker.

xmeans = [xmean{:}];
disp(xmeans)

      0.50508      0.50163

I have two workers, so xmeans is a 1x2 vector.
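The example above only reads from the Composite. For completeness, here is a sketch of writing to one worker's element from the client and then using the new value in a later spmd block. The variable names firstMean and isReset are mine, chosen for illustration.

```matlab
spmd
    x = rand(10000,1);
    xmean = mean(x);
end

firstMean = xmean{1};    % copy worker 1's value back to the client
xmean{1} = 0;            % overwrite the value held on worker 1 only

spmd
    % xmean keeps its value on each worker between spmd blocks.
    isReset = (xmean == 0);
end
disp(isReset{1})         % true on worker 1, where the value was overwritten
```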

Using spmd does require that the parallel pool be SpmdEnabled. This is a setting you can change when you start the pool, and it is on by default. One reason for turning it off is that, if one or more workers die, the remaining workers can still be used by parfor-loops or parfeval statements sans the dead worker(s).
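If you want to try a pool without spmd support, something along these lines should work. Treat it as a sketch: it shuts down any pool you already have open, and the parfor loop is only a placeholder workload.

```matlab
% Close any existing pool so a new one can be started with different settings.
delete(gcp('nocreate'));

% Start a pool on the default profile with spmd support turned off.
pool = parpool('SpmdEnabled', false);
disp(pool.SpmdEnabled)

% parfor (and parfeval) still work on this pool; spmd blocks will not.
squares = zeros(1, 4);
parfor k = 1:4
    squares(k) = k^2;
end
disp(squares)
```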

parfevalOnAll

parfevalOnAll, first released in R2013b with the new parallel computing API, is the latest and greatest in evaluating functions on workers. It runs asynchronously and allows you to return an output. The input has to be a function so you will need to encapsulate your commands into a function, be it in a file or anonymous.

The biggest feature here is that it is asynchronous; i.e., once you run the parfevalOnAll command and the futures are submitted, control is returned to the MATLAB command prompt so you can continue working and running other things. It should be strongly considered for anything that is compute intensive, or for any operation driven by a user interface where you want the interface to stay responsive while the operations are running.

Like spmd, parfevalOnAll will start a parallel pool and only runs on the workers.
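Before the longer example, here is a small sketch of the two calling patterns: a setup call that returns nothing, and an anonymous function that returns one output. The addpath(tempdir) setup and the random rows are placeholders I picked for illustration. As I understand it, fetchOutputs stacks the outputs from all workers vertically.

```matlab
% Setup with no outputs: add a folder to the path on every worker.
setupFuture = parfevalOnAll(@addpath, 0, tempdir);
wait(setupFuture);               % block until every worker has run it

% One output per worker from an anonymous function.
rowFuture = parfevalOnAll(@() rand(1, 3), 1);
rows = fetchOutputs(rowFuture);  % waits, then returns one row per worker
disp(size(rows))
```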

Here is a simple example to demonstrate the power of parfevalOnAll. Let's run a computationally expensive command locally to get a feel for how long it takes.

f = @optimizeCraneSwing;
tic
xs = f();
toc

Elapsed time is 28.676841 seconds.

And now offload that to run on both workers:

tic
futures = parfevalOnAll(f, 1);
tsubmit = toc
disp('Futures submitted; command line access returned!')
wait(futures);
tfinish = toc
disp('Futures finished')

tsubmit =
    0.0089678
Futures submitted; command line access returned!
tfinish =
       37.641
Futures finished

Notice how command line access returned almost immediately and the futures finished later. In order to get accurate timing, I waited for them to finish, and if you need the results before continuing, this is a necessary step. However, if you don't need the results right away, you can keep working, query the 'State' until it becomes finished, and then fetch the outputs.

futures.State

ans =
finished

If you want to automate this, a timer can be used to query the state periodically.
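Here is one way that polling could look. It is a sketch only: the two-second pause stands in for real work, and the period and number of polls are arbitrary choices of mine. The timer simply prints the state. A fuller version would use a callback function that calls fetchOutputs once the state is finished.

```matlab
% Placeholder workload: each worker just pauses for two seconds.
pollFuture = parfevalOnAll(@() pause(2), 0);

% Print the future's state every half second, ten times, then clean up.
pollTimer = timer( ...
    'ExecutionMode', 'fixedSpacing', ...
    'Period', 0.5, ...
    'TasksToExecute', 10, ...
    'TimerFcn', @(~, ~) fprintf('State: %s\n', pollFuture.State), ...
    'StopFcn', @(tmr, ~) delete(tmr));
start(pollTimer);   % control returns immediately; the timer runs in the background
```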

Table of Tradeoffs

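So finally, for quick reference, here is a table of the tradeoffs discussed above to help you in future decisions.

                     pctRunOnAll        spmd                  parfevalOnAll
Execution            synchronous        synchronous           asynchronous
Runs on client       yes                no                    no
Returns output       no                 yes (Composite)       yes (via futures)
Starts a pool        no (errors)        yes (default pref.)   yes (default pref.)
Input                command string     block of commands     function handle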

Lack of PARFOR

You may have noticed that parfor did not make an appearance in the above list. Parallel for-loops do their own internal load balancing, which does not guarantee that every worker is used, so parfor cannot be relied upon to run a command on every worker.
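If you are curious, you can see this for yourself with a quick experiment like the one below. It is only a sketch: it runs one iteration per worker and records which worker handled each, using getCurrentTask. The result can vary from run to run, which is exactly the point.

```matlab
pool = gcp;
n = pool.NumWorkers;

% Record the ID of the worker task that executes each iteration.
workerIds = zeros(1, n);
parfor k = 1:n
    t = getCurrentTask();
    workerIds(k) = t.ID;
end

% There may be fewer distinct IDs than workers.
fprintf('%d iterations ran on %d distinct worker(s)\n', ...
    n, numel(unique(workerIds)));
```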