Encouragingly Parallel (Part 2)

Posted by Andy Campbell, March 3, 2015

10 views (last 30 days) | 0 Likes | 1 comment

Last time we showed that using a simple parfor loop we could reduce the runtime of a representative test suite from 6 to 7 minutes (when run serially) down to a minute and a half. However, we still faced some problems:

A minute and a half is a huge improvement over 6 to 7 minutes, but will still lose my attention.
The test output was garbled together without rhyme or reason.
The approach was inefficiently using the computational resources available. While the wall-clock time was reduced due to parallelization, there was a significant jump in the overall testing time required because we weren't leveraging the efficiency gains from sharing a fixture across multiple tests.

OK, let's address these issues. First, we'll need to create the same representative suite used in the previous post:

import matlab.unittest.TestSuite;
classSuite = TestSuite.fromFile('aClassBasedTest.m');
fcnSuite = TestSuite.fromFile('aFunctionBasedTest.m');
scriptSuite = TestSuite.fromFile('aScriptBasedTest.m');

suite = [repmat(classSuite, 1, 50), repmat(fcnSuite, 1, 50), repmat(scriptSuite, 1, 50)];

Add some sophistication

Rather than using parfor to run these tests, how might this be done if we take matters a bit more into our own hands using parfeval?

First we can get some information from the parallel pool to tell us the number of workers we have at our disposal. We can use this information to split our suite into groups and run a single group per worker to reduce overhead and better utilize our TestClassSetup/setupOnce fixtures:

p = gcp();
numWorkers = p.NumWorkers;

numGroupsToSchedule = numWorkers; % Run one group/worker

Next we need to schedule our groups. We can do this by iterating over our suite and using parfeval to schedule the suite divided into the number of groups we plan to use. Note, since we need to take the ceiling of the group size, group sizes scheduled later may be smaller than the initial group sizes, which is why the groupLength is calculated for each iteration through the scheduling loop.

% Schedule the groups
while ~isempty(unscheduledSuite)
    groupLength = ceil(numel(unscheduledSuite)/numGroupsToSchedule);
    groups(numGroupsToSchedule) = ...
        parfeval(@run, 1, unscheduledSuite(1:groupLength)');
    % 99 groups of tests on the wall, take one down, pass it around,
    % 98 groups of tests on the wall...
    numGroupsToSchedule = numGroupsToSchedule - 1;
    unscheduledSuite(1:groupLength) = [];
end
% Remove any groups not needed (this happens when the suite size is
% smaller than the number of workers)
groups(1:numGroupsToSchedule) = [];
groups = flip(groups);

Once they are scheduled the workers are already off and running, but we can print a little information and fetch the results:

fprintf('Split suite across %d groups on %d workers.\n', numel(groups), numWorkers);
results = fetchOutputs(groups);

There's a bit of code here so why don't we put it into a function runWithParfeval. Here it is in its entirety:

function results = runWithParfeval(suite)

p = gcp();
numWorkers = p.NumWorkers;

numGroupsToSchedule = numWorkers; % Run one group/worker
unscheduledSuite = suite;

% Schedule the groups
while ~isempty(unscheduledSuite)
    groupLength = ceil(numel(unscheduledSuite)/numGroupsToSchedule);
    groups(numGroupsToSchedule) = ...
        parfeval(@run, 1, unscheduledSuite(1:groupLength)');
    % 99 groups of tests on the wall, take one down, pass it around,
    % 98 groups of tests on the wall...
    numGroupsToSchedule = numGroupsToSchedule - 1;
    unscheduledSuite(1:groupLength) = [];
end
% Remove any groups not needed (this happens when the suite size is
% smaller than the number of workers)
groups(1:numGroupsToSchedule) = [];
groups = flip(groups);

fprintf('Split suite across %d groups on %d workers.\n', numel(groups), numWorkers);
results = fetchOutputs(groups);
end

Let's run it!

tic;
results = runWithParfeval_v1(suite)
toc;

Split suite across 16 groups on 16 workers.

results = 

  300x1 TestResult array with properties:

    Name
    Passed
    Failed
    Incomplete
    Duration

Totals:
   300 Passed, 0 Failed, 0 Incomplete.
   376.1316 seconds testing time.

Elapsed time is 62.155109 seconds.

Where did it go?

Well, the good news is the output is gone and the bad news is the output is gone. Also, it seems we are fairly efficient in the total execution time since the actual testing time is close to that when run serially. What's more, we are a bit faster, running in about a minute's time. Let's see if we can add some more output to gain more insight into what is going on during this run. In order to add some output, let's create a simple local function that takes an index and the diary output from the parallel.FevalFuture and prints it:

function printResult(idx, output)
bar = '*****************';
fprintf('%s\nFinished Group %d\n%s\n%s', bar, idx, bar, output);
end

Now with this printed output we will be able to see what happened during each group. Using the fetchNext method on the parallel.FevalFuture array as each group is completed allows us to print each group's output as it finishes like so:

for idx = 1:numel(groups)
    % fetchNext blocks until next results are available.
    groupIdx = fetchNext(groups);
    printResult(groupIdx, groups(groupIdx).Diary);
end
results = fetchOutputs(groups);

tic;
results = runWithParfeval_v2(suite)
toc;

Split suite across 16 groups on 16 workers.
*****************
Finished Group 16
*****************
Running aScriptBasedTest
..........
........
Done aScriptBasedTest
__________

*****************
Finished Group 15
*****************
Running aScriptBasedTest
..........
........
Done aScriptBasedTest
__________

*****************
Finished Group 14
*****************
Running aScriptBasedTest
..........
........
Done aScriptBasedTest
__________

*****************
Finished Group 13
*****************
Running aScriptBasedTest
..........
........
Done aScriptBasedTest
__________

*****************
Finished Group 12
*****************
Running aScriptBasedTest
..........
.........
Done aScriptBasedTest
__________

*****************
Finished Group 7
*****************
Running aFunctionBasedTest
..........
.........
Done aFunctionBasedTest
__________

*****************
Finished Group 11
*****************
Running aFunctionBasedTest
..........
Done aFunctionBasedTest
__________

Running aScriptBasedTest
.........
Done aScriptBasedTest
__________

*****************
Finished Group 8
*****************
Running aFunctionBasedTest
..........
.........
Done aFunctionBasedTest
__________

*****************
Finished Group 9
*****************
Running aFunctionBasedTest
..........
.........
Done aFunctionBasedTest
__________

*****************
Finished Group 10
*****************
Running aFunctionBasedTest
..........
.........
Done aFunctionBasedTest
__________

*****************
Finished Group 6
*****************
Running aClassBasedTest
.....
Done aClassBasedTest
__________

Running aFunctionBasedTest
..........
....
Done aFunctionBasedTest
__________

*****************
Finished Group 1
*****************
Running aClassBasedTest
..........
.........
Done aClassBasedTest
__________

*****************
Finished Group 4
*****************
Running aClassBasedTest
..........
.........
Done aClassBasedTest
__________

*****************
Finished Group 2
*****************
Running aClassBasedTest
..........
.........
Done aClassBasedTest
__________

*****************
Finished Group 3
*****************
Running aClassBasedTest
..........
.........
Done aClassBasedTest
__________

*****************
Finished Group 5
*****************
Running aClassBasedTest
..........
.........
Done aClassBasedTest
__________


results = 

  300x1 TestResult array with properties:

    Name
    Passed
    Failed
    Incomplete
    Duration

Totals:
   300 Passed, 0 Failed, 0 Incomplete.
   407.0311 seconds testing time.

Elapsed time is 60.451688 seconds.

The Long Pole With this output you can observe that the first groups didn't finish very quickly at all. Doesn't it seem odd that the first groups scheduled were the last groups to finish? This occurs because the first groups are the long pole in our test execution. When we split our test suite into one group per worker one aspect we failed to consider is that these tests take differing amounts of time to execute. In our case, the class-based test has a test method which takes an order of magnitude longer to execute (remember aClassBasedTest/testLongRunningEndToEndWorkflow?). Furthermore we loaded these tests on the front of our test suite and executed them all serially on some of the workers. These tests have now become our bottleneck. Let's tweak our approach a bit to be robust to such "long poles". If instead of scheduling one group per worker, why don't we try to strike a balance between communication overhead, shared test fixtures, and varying test times by scheduling each worker to execute 3 groups instead of one. Let's change just one line as follows and try it out:

numGroups = numWorkers*3; % Run three groups/worker

tic;
results = runWithParfeval_final(suite)
toc;

Split suite across 48 groups on 16 workers.
*****************
Finished Group 16
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 6
*****************
Running aClassBasedTest
.......
Done aClassBasedTest
__________

*****************
Finished Group 12
*****************
Running aClassBasedTest
.......
Done aClassBasedTest
__________

*****************
Finished Group 17
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 18
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 8
*****************
Running aClassBasedTest
.......
Done aClassBasedTest
__________

*****************
Finished Group 7
*****************
Running aClassBasedTest
.......
Done aClassBasedTest
__________

*****************
Finished Group 13
*****************
Running aClassBasedTest
......
Done aClassBasedTest
__________

*****************
Finished Group 2
*****************
Running aClassBasedTest
.......
Done aClassBasedTest
__________

*****************
Finished Group 14
*****************
Running aClassBasedTest
......
Done aClassBasedTest
__________

*****************
Finished Group 9
*****************
Running aClassBasedTest
.......
Done aClassBasedTest
__________

*****************
Finished Group 22
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 10
*****************
Running aClassBasedTest
.......
Done aClassBasedTest
__________

*****************
Finished Group 19
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 21
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 15
*****************
Running aClassBasedTest
....
Done aClassBasedTest
__________

Running aFunctionBasedTest
..
Done aFunctionBasedTest
__________

*****************
Finished Group 27
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 25
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 20
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 11
*****************
Running aClassBasedTest
.......
Done aClassBasedTest
__________

*****************
Finished Group 1
*****************
Running aClassBasedTest
.......
Done aClassBasedTest
__________

*****************
Finished Group 32
*****************
Running aFunctionBasedTest
..
Done aFunctionBasedTest
__________

Running aScriptBasedTest
....
Done aScriptBasedTest
__________

*****************
Finished Group 30
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 26
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 24
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 5
*****************
Running aClassBasedTest
.......
Done aClassBasedTest
__________

*****************
Finished Group 33
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 34
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 37
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 36
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 35
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 38
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 3
*****************
Running aClassBasedTest
.......
Done aClassBasedTest
__________

*****************
Finished Group 4
*****************
Running aClassBasedTest
.......
Done aClassBasedTest
__________

*****************
Finished Group 23
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 44
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 39
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 41
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 46
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 40
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 42
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 29
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 31
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 28
*****************
Running aFunctionBasedTest
......
Done aFunctionBasedTest
__________

*****************
Finished Group 48
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 43
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 47
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________

*****************
Finished Group 45
*****************
Running aScriptBasedTest
......
Done aScriptBasedTest
__________


results = 

  300x1 TestResult array with properties:

    Name
    Passed
    Failed
    Incomplete
    Duration

Totals:
   300 Passed, 0 Failed, 0 Incomplete.
   441.8035 seconds testing time.

Elapsed time is 31.292908 seconds.

Beautiful! We get an order of magnitude speed up from the serial case, it executes in a third the wall-clock time as parfor and half the time as parfeval with one group per worker. Add to the mix the rational test output and we are gold! Here is the finished function:

function results = runWithParfeval(suite)

p = gcp();
numWorkers = p.NumWorkers;

numGroupsToSchedule = numWorkers*3; % Run three groups/worker
unscheduledSuite = suite;

% Schedule the groups
while ~isempty(unscheduledSuite)
    groupLength = ceil(numel(unscheduledSuite)/numGroupsToSchedule);
    groups(numGroupsToSchedule) = ...
        parfeval(@run, 1, unscheduledSuite(1:groupLength)');
    % 99 groups of tests on the wall, take one down, pass it around,
    % 98 groups of tests on the wall...
    numGroupsToSchedule = numGroupsToSchedule - 1;
    unscheduledSuite(1:groupLength) = [];
end
% Remove any groups not needed (this happens when the suite size is
% smaller than the number of workers)
groups(1:numGroupsToSchedule) = [];
groups = flip(groups);

fprintf('Split suite across %d groups on %d workers.\n', numel(groups), numWorkers);

for idx = 1:numel(groups)
    % fetchNext blocks until next results are available.
    groupIdx = fetchNext(groups);
    printResult(groupIdx, groups(groupIdx).Diary);
end
results = fetchOutputs(groups);
end

function printResult(idx, output)
bar = '*****************';
fprintf('%s\nFinished Group %d\n%s\n%s', bar, idx, bar, output);
end

Conclusion

As demonstrated we can leverage parallelism in our day to day testing workflows to significantly speed up the wall clock time of our code execution. This can be done trivially using parfor, but the power of parfeval can be utilized to further improve the execution time, better utilize shared fixtures on the workers, and control test output, among many other benefits not shown here.

Have you used any other methods for running tests in parallel? If so share your experiences in the comments!

Published with MATLAB® R2014b