{"id":97,"date":"2015-03-03T22:34:27","date_gmt":"2015-03-03T22:34:27","guid":{"rendered":"https:\/\/blogs.mathworks.com\/developer\/?p=97"},"modified":"2015-03-04T16:58:46","modified_gmt":"2015-03-04T16:58:46","slug":"encouragingly-parallel-part-2","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/developer\/2015\/03\/03\/encouragingly-parallel-part-2\/","title":{"rendered":"Encouragingly Parallel (Part 2)"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p><a href=\"https:\/\/blogs.mathworks.com\/developer\/2015\/02\/20\/encouragingly-parallel-part-1\/\">Last time<\/a> we showed that using a simple parfor loop we could reduce the runtime of a representative test suite from 6 to 7 minutes (when run serially) down to a minute and a half. However, we still faced some problems:<\/p><!--\/introduction--><div><ol><li>A minute and a half is a huge improvement over 6 to 7 minutes, but will still lose my attention.<\/li><li>The test output was garbled together without rhyme or reason.<\/li><li>The approach was inefficiently using the computational resources available. While the wall-clock time was reduced due to parallelization, there was a significant jump in the overall testing time required because we weren't leveraging the efficiency gains from sharing a fixture across multiple tests.<\/li><\/ol><\/div><p>OK, let's address these issues. 
First, we'll need to create the same representative suite used in the previous post:<\/p><pre class=\"codeinput\">import <span class=\"string\">matlab.unittest.TestSuite<\/span>;\r\nclassSuite = TestSuite.fromFile(<span class=\"string\">'aClassBasedTest.m'<\/span>);\r\nfcnSuite = TestSuite.fromFile(<span class=\"string\">'aFunctionBasedTest.m'<\/span>);\r\nscriptSuite = TestSuite.fromFile(<span class=\"string\">'aScriptBasedTest.m'<\/span>);\r\n\r\nsuite = [repmat(classSuite, 1, 50), repmat(fcnSuite, 1, 50), repmat(scriptSuite, 1, 50)];\r\n<\/pre><p><b>Add some sophistication<\/b><\/p><p>Rather than using <b><tt>parfor<\/tt><\/b> to run these tests, how might this be done if we take matters a bit more into our own hands using <a href=\"https:\/\/www.mathworks.com\/help\/distcomp\/parfeval.html\"><b><tt>parfeval<\/tt><\/b><\/a>?<\/p><p>First we can get some information from the parallel pool to tell us the number of workers we have at our disposal. We can use this information to split our suite into groups and run a single group per worker to reduce overhead and better utilize our TestClassSetup\/setupOnce fixtures:<\/p><pre class=\"language-matlab\">p = gcp();\r\nnumWorkers = p.NumWorkers;\r\n\r\nnumGroupsToSchedule = numWorkers; <span class=\"comment\">% Run one group\/worker<\/span>\r\n<\/pre><p>Next we need to schedule our groups. We can do this by iterating over our suite and using <b><tt>parfeval<\/tt><\/b> to schedule the suite divided into the number of groups we plan to use. 
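<\/p><p>To make the splitting concrete, here is a minimal sketch of the group-size arithmetic on its own, with no parallel pool involved. It assumes the 300-element suite built above and a pool of 16 workers; <tt>numTests<\/tt>, <tt>groupLengths<\/tt>, and <tt>remaining<\/tt> are illustrative names that are not part of the scheduling code itself:<\/p><pre class=\"language-matlab\">numTests = 300; <span class=\"comment\">% assumed suite size (matches the suite built above)<\/span>\r\nnumGroupsToSchedule = 16; <span class=\"comment\">% assumed: one group per worker on 16 workers<\/span>\r\n\r\ngroupLengths = zeros(1, numGroupsToSchedule);\r\nremaining = numTests;\r\n<span class=\"keyword\">for<\/span> k = 1:numel(groupLengths)\r\n    <span class=\"comment\">% Take the ceiling of what is left divided by the groups still to fill<\/span>\r\n    groupLengths(k) = ceil(remaining\/numGroupsToSchedule);\r\n    remaining = remaining - groupLengths(k);\r\n    numGroupsToSchedule = numGroupsToSchedule - 1;\r\n<span class=\"keyword\">end<\/span>\r\ndisp(groupLengths)\r\n<\/pre><p>Under these assumptions the first twelve groups get 19 tests each and the last four get 18, which adds up to all 300 tests.<\/p><p>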
Note, since we need to take the ceiling of the group size, group sizes scheduled later may be smaller than the initial group sizes, which is why the groupLength is calculated for each iteration through the scheduling loop.<\/p><pre class=\"language-matlab\"><span class=\"comment\">% Schedule the groups<\/span>\r\n<span class=\"keyword\">while<\/span> ~isempty(unscheduledSuite)\r\n    groupLength = ceil(numel(unscheduledSuite)\/numGroupsToSchedule);\r\n    groups(numGroupsToSchedule) = <span class=\"keyword\">...<\/span>\r\n        parfeval(@run, 1, unscheduledSuite(1:groupLength)');\r\n    <span class=\"comment\">% 99 groups of tests on the wall, take one down, pass it around,<\/span>\r\n    <span class=\"comment\">% 98 groups of tests on the wall...<\/span>\r\n    numGroupsToSchedule = numGroupsToSchedule - 1;\r\n    unscheduledSuite(1:groupLength) = [];\r\n<span class=\"keyword\">end<\/span>\r\n<span class=\"comment\">% Remove any groups not needed (this happens when the suite size is<\/span>\r\n<span class=\"comment\">% smaller than the number of workers)<\/span>\r\ngroups(1:numGroupsToSchedule) = [];\r\ngroups = flip(groups);\r\n<\/pre><p>Once they are scheduled the workers are already off and running, but we can print a little information and fetch the results:<\/p><pre class=\"language-matlab\">fprintf(<span class=\"string\">'Split suite across %d groups on %d workers.\\n'<\/span>, numel(groups), numWorkers);\r\nresults = fetchOutputs(groups);\r\n<\/pre><p>There's a bit of code here so why don't we put it into a function <b><tt>runWithParfeval<\/tt><\/b>. 
Here it is in its entirety:<\/p><pre class=\"language-matlab\"><span class=\"keyword\">function<\/span> results = runWithParfeval(suite)\r\n\r\np = gcp();\r\nnumWorkers = p.NumWorkers;\r\n\r\nnumGroupsToSchedule = numWorkers; <span class=\"comment\">% Run one group\/worker<\/span>\r\nunscheduledSuite = suite;\r\n\r\n<span class=\"comment\">% Schedule the groups<\/span>\r\n<span class=\"keyword\">while<\/span> ~isempty(unscheduledSuite)\r\n    groupLength = ceil(numel(unscheduledSuite)\/numGroupsToSchedule);\r\n    groups(numGroupsToSchedule) = <span class=\"keyword\">...<\/span>\r\n        parfeval(@run, 1, unscheduledSuite(1:groupLength)');\r\n    <span class=\"comment\">% 99 groups of tests on the wall, take one down, pass it around,<\/span>\r\n    <span class=\"comment\">% 98 groups of tests on the wall...<\/span>\r\n    numGroupsToSchedule = numGroupsToSchedule - 1;\r\n    unscheduledSuite(1:groupLength) = [];\r\n<span class=\"keyword\">end<\/span>\r\n<span class=\"comment\">% Remove any groups not needed (this happens when the suite size is<\/span>\r\n<span class=\"comment\">% smaller than the number of workers)<\/span>\r\ngroups(1:numGroupsToSchedule) = [];\r\ngroups = flip(groups);\r\n\r\nfprintf(<span class=\"string\">'Split suite across %d groups on %d workers.\\n'<\/span>, numel(groups), numWorkers);\r\nresults = fetchOutputs(groups);\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><p>Let's run it!<\/p><pre class=\"codeinput\">tic;\r\nresults = runWithParfeval_v1(suite)\r\ntoc;\r\n<\/pre><pre class=\"codeoutput\">Split suite across 16 groups on 16 workers.\r\n\r\nresults = \r\n\r\n  300x1 TestResult array with properties:\r\n\r\n    Name\r\n    Passed\r\n    Failed\r\n    Incomplete\r\n    Duration\r\n\r\nTotals:\r\n   300 Passed, 0 Failed, 0 Incomplete.\r\n   376.1316 seconds testing time.\r\n\r\nElapsed time is 62.155109 seconds.\r\n<\/pre><p><b>Where did it go?<\/b><\/p><p>Well, the good news is the output is gone and the bad news is the output is 
gone. Also, it seems we are fairly efficient in the total execution time since the actual testing time is close to that when run serially. What's more, we are a bit faster, running in about a minute's time. Let's see if we can add some more output to gain more insight into what is going on during this run. In order to add some output, let's create a simple local function that takes an index and the diary output from the <b><tt>parallel.FevalFuture<\/tt><\/b> and prints it:<\/p><pre class=\"language-matlab\"><span class=\"keyword\">function<\/span> printResult(idx, output)\r\nbar = <span class=\"string\">'*****************'<\/span>;\r\nfprintf(<span class=\"string\">'%s\\nFinished Group %d\\n%s\\n%s'<\/span>, bar, idx, bar, output);\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><p>Now with this printed output we will be able to see what happened during each group. Using the <b><tt>fetchNext<\/tt><\/b> method on the <b><tt>parallel.FevalFuture<\/tt><\/b> array as each group is completed allows us to print each group's output as it finishes like so:<\/p><pre class=\"language-matlab\"><span class=\"keyword\">for<\/span> idx = 1:numel(groups)\r\n    <span class=\"comment\">% fetchNext blocks until next results are available.<\/span>\r\n    groupIdx = fetchNext(groups);\r\n    printResult(groupIdx, groups(groupIdx).Diary);\r\n<span class=\"keyword\">end<\/span>\r\nresults = fetchOutputs(groups);\r\n<\/pre><pre class=\"codeinput\">tic;\r\nresults = runWithParfeval_v2(suite)\r\ntoc;\r\n<\/pre><pre class=\"codeoutput\">Split suite across 16 groups on 16 workers.\r\n*****************\r\nFinished Group 16\r\n*****************\r\nRunning aScriptBasedTest\r\n..........\r\n........\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 15\r\n*****************\r\nRunning aScriptBasedTest\r\n..........\r\n........\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 14\r\n*****************\r\nRunning 
aScriptBasedTest\r\n..........\r\n........\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 13\r\n*****************\r\nRunning aScriptBasedTest\r\n..........\r\n........\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 12\r\n*****************\r\nRunning aScriptBasedTest\r\n..........\r\n.........\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 7\r\n*****************\r\nRunning aFunctionBasedTest\r\n..........\r\n.........\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 11\r\n*****************\r\nRunning aFunctionBasedTest\r\n..........\r\nDone aFunctionBasedTest\r\n__________\r\n\r\nRunning aScriptBasedTest\r\n.........\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 8\r\n*****************\r\nRunning aFunctionBasedTest\r\n..........\r\n.........\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 9\r\n*****************\r\nRunning aFunctionBasedTest\r\n..........\r\n.........\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 10\r\n*****************\r\nRunning aFunctionBasedTest\r\n..........\r\n.........\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 6\r\n*****************\r\nRunning aClassBasedTest\r\n.....\r\nDone aClassBasedTest\r\n__________\r\n\r\nRunning aFunctionBasedTest\r\n..........\r\n....\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 1\r\n*****************\r\nRunning aClassBasedTest\r\n..........\r\n.........\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 4\r\n*****************\r\nRunning aClassBasedTest\r\n..........\r\n.........\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 2\r\n*****************\r\nRunning aClassBasedTest\r\n..........\r\n.........\r\nDone 
aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 3\r\n*****************\r\nRunning aClassBasedTest\r\n..........\r\n.........\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 5\r\n*****************\r\nRunning aClassBasedTest\r\n..........\r\n.........\r\nDone aClassBasedTest\r\n__________\r\n\r\n\r\nresults = \r\n\r\n  300x1 TestResult array with properties:\r\n\r\n    Name\r\n    Passed\r\n    Failed\r\n    Incomplete\r\n    Duration\r\n\r\nTotals:\r\n   300 Passed, 0 Failed, 0 Incomplete.\r\n   407.0311 seconds testing time.\r\n\r\nElapsed time is 60.451688 seconds.\r\n<\/pre><p><b>The Long Pole<\/b><\/p><p>With this output you can see that the first groups didn't finish very quickly at all. Doesn't it seem odd that the first groups scheduled were the last to finish? This happens because the first groups are the long pole in our test execution. When we split our test suite into one group per worker, we failed to consider that these tests take differing amounts of time to execute. In our case, the class-based test has a test method which takes an order of magnitude longer to execute (remember <b><tt>aClassBasedTest\/testLongRunningEndToEndWorkflow<\/tt><\/b>?). Furthermore, we loaded these tests at the front of our test suite, so they all ran serially on just a few of the workers. These tests have now become our bottleneck. Let's tweak our approach a bit to be robust to such \"long poles\". Instead of scheduling one group per worker, why don't we try to strike a balance between communication overhead, shared test fixtures, and varying test times by scheduling three groups per worker? 
Let's change just one line as follows and try it out:<\/p><pre class=\"language-matlab\">numGroupsToSchedule = numWorkers*3; <span class=\"comment\">% Run three groups\/worker<\/span>\r\n<\/pre><pre class=\"codeinput\">tic;\r\nresults = runWithParfeval_final(suite)\r\ntoc;\r\n<\/pre><pre class=\"codeoutput\">Split suite across 48 groups on 16 workers.\r\n*****************\r\nFinished Group 16\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 6\r\n*****************\r\nRunning aClassBasedTest\r\n.......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 12\r\n*****************\r\nRunning aClassBasedTest\r\n.......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 17\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 18\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 8\r\n*****************\r\nRunning aClassBasedTest\r\n.......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 7\r\n*****************\r\nRunning aClassBasedTest\r\n.......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 13\r\n*****************\r\nRunning aClassBasedTest\r\n......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 2\r\n*****************\r\nRunning aClassBasedTest\r\n.......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 14\r\n*****************\r\nRunning aClassBasedTest\r\n......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 9\r\n*****************\r\nRunning aClassBasedTest\r\n.......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 22\r\n*****************\r\nRunning 
aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 10\r\n*****************\r\nRunning aClassBasedTest\r\n.......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 19\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 21\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 15\r\n*****************\r\nRunning aClassBasedTest\r\n....\r\nDone aClassBasedTest\r\n__________\r\n\r\nRunning aFunctionBasedTest\r\n..\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 27\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 25\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 20\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 11\r\n*****************\r\nRunning aClassBasedTest\r\n.......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 1\r\n*****************\r\nRunning aClassBasedTest\r\n.......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 32\r\n*****************\r\nRunning aFunctionBasedTest\r\n..\r\nDone aFunctionBasedTest\r\n__________\r\n\r\nRunning aScriptBasedTest\r\n....\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 30\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 26\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone 
aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 24\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 5\r\n*****************\r\nRunning aClassBasedTest\r\n.......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 33\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 34\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 37\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 36\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 35\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 38\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 3\r\n*****************\r\nRunning aClassBasedTest\r\n.......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 4\r\n*****************\r\nRunning aClassBasedTest\r\n.......\r\nDone aClassBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 23\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 44\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 39\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 41\r\n*****************\r\nRunning 
aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 46\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 40\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 42\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 29\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 31\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 28\r\n*****************\r\nRunning aFunctionBasedTest\r\n......\r\nDone aFunctionBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 48\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 43\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 47\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n*****************\r\nFinished Group 45\r\n*****************\r\nRunning aScriptBasedTest\r\n......\r\nDone aScriptBasedTest\r\n__________\r\n\r\n\r\nresults = \r\n\r\n  300x1 TestResult array with properties:\r\n\r\n    Name\r\n    Passed\r\n    Failed\r\n    Incomplete\r\n    Duration\r\n\r\nTotals:\r\n   300 Passed, 0 Failed, 0 Incomplete.\r\n   441.8035 seconds testing time.\r\n\r\nElapsed time is 31.292908 seconds.\r\n<\/pre><p>Beautiful! 
We get an order of magnitude speedup over the serial case: it executes in a third of the wall-clock time of the <b><tt>parfor<\/tt><\/b> approach and half the time of <b><tt>parfeval<\/tt><\/b> with one group per worker. Add to the mix the orderly, per-group test output and we are gold! Here is the finished function:<\/p><pre class=\"language-matlab\"><span class=\"keyword\">function<\/span> results = runWithParfeval(suite)\r\n\r\np = gcp();\r\nnumWorkers = p.NumWorkers;\r\n\r\nnumGroupsToSchedule = numWorkers*3; <span class=\"comment\">% Run three groups\/worker<\/span>\r\nunscheduledSuite = suite;\r\n\r\n<span class=\"comment\">% Schedule the groups<\/span>\r\n<span class=\"keyword\">while<\/span> ~isempty(unscheduledSuite)\r\n    groupLength = ceil(numel(unscheduledSuite)\/numGroupsToSchedule);\r\n    groups(numGroupsToSchedule) = <span class=\"keyword\">...<\/span>\r\n        parfeval(@run, 1, unscheduledSuite(1:groupLength)');\r\n    <span class=\"comment\">% 99 groups of tests on the wall, take one down, pass it around,<\/span>\r\n    <span class=\"comment\">% 98 groups of tests on the wall...<\/span>\r\n    numGroupsToSchedule = numGroupsToSchedule - 1;\r\n    unscheduledSuite(1:groupLength) = [];\r\n<span class=\"keyword\">end<\/span>\r\n<span class=\"comment\">% Remove any groups not needed (this happens when the suite size is<\/span>\r\n<span class=\"comment\">% smaller than the number of groups)<\/span>\r\ngroups(1:numGroupsToSchedule) = [];\r\ngroups = flip(groups);\r\n\r\nfprintf(<span class=\"string\">'Split suite across %d groups on %d workers.\\n'<\/span>, numel(groups), numWorkers);\r\n\r\n<span class=\"keyword\">for<\/span> idx = 1:numel(groups)\r\n    <span class=\"comment\">% fetchNext blocks until next results are available.<\/span>\r\n    groupIdx = fetchNext(groups);\r\n    printResult(groupIdx, groups(groupIdx).Diary);\r\n<span class=\"keyword\">end<\/span>\r\nresults = fetchOutputs(groups);\r\n<span class=\"keyword\">end<\/span>\r\n\r\n<span 
class=\"keyword\">function<\/span> printResult(idx, output)\r\nbar = <span class=\"string\">'*****************'<\/span>;\r\nfprintf(<span class=\"string\">'%s\\nFinished Group %d\\n%s\\n%s'<\/span>, bar, idx, bar, output);\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><p><b>Conclusion<\/b><\/p><p>As demonstrated we can leverage parallelism in our day to day testing workflows to significantly speed up the wall clock time of our code execution. This can be done trivially using <b><tt>parfor<\/tt><\/b>, but the power of <b><tt>parfeval<\/tt><\/b> can be utilized to further improve the execution time, better utilize shared fixtures on the workers, and control test output, among many other benefits not shown here.<\/p><p>Have you used any other methods for running tests in parallel? If so share your experiences in the comments!<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_525556fbd97b471b98776fd30cefb98e() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='525556fbd97b471b98776fd30cefb98e ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 525556fbd97b471b98776fd30cefb98e';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2015 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_525556fbd97b471b98776fd30cefb98e()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2014b<br><\/p><\/div><!--\r\n525556fbd97b471b98776fd30cefb98e ##### SOURCE BEGIN #####\r\n%% Encouragingly Parallel (Part 2)\r\n% <https:\/\/blogs.mathworks.com\/developer\/2015\/02\/20\/encouragingly-parallel-part-1\/\r\n% Last time> we showed that using a simple parfor loop we could reduce the\r\n% runtime of a representative test suite from 6 to 7 minutes (when run\r\n% serially) down to a minute and a half. However, we still faced some\r\n% problems:\r\n%%\r\n% \r\n% # A minute and a half is a huge improvement over 6 to 7 minutes, but will\r\n% still lose my attention.\r\n% # The test output was garbled together without rhyme or reason.\r\n% # The approach was inefficiently using the computational resources\r\n% available. 
While the wall-clock time was reduced due to parallelization,\r\n% there was a significant jump in the overall testing time required because\r\n% we weren't leveraging the efficiency gains from sharing a fixture across\r\n% multiple tests.\r\n% \r\n%%\r\n%\r\n% OK, let's address these issues. First, we'll need to create the same\r\n% representative suite used in the previous post:\r\nimport matlab.unittest.TestSuite;\r\nclassSuite = TestSuite.fromFile('aClassBasedTest.m');\r\nfcnSuite = TestSuite.fromFile('aFunctionBasedTest.m');\r\nscriptSuite = TestSuite.fromFile('aScriptBasedTest.m');\r\n\r\nsuite = [repmat(classSuite, 1, 50), repmat(fcnSuite, 1, 50), repmat(scriptSuite, 1, 50)];\r\n\r\n%% \r\n% *Add some sophistication*\r\n%\r\n% Rather than using *|parfor|* to run these tests, how might this be done\r\n% if we take matters a bit more into our own hands using\r\n% <https:\/\/www.mathworks.com\/help\/distcomp\/parfeval.html *|parfeval|*>?\r\n%\r\n% First we can get some information from the parallel pool to tell us the\r\n% number of workers we have at our disposal. We can use this information to\r\n% split our suite into groups and run a single group per worker to reduce\r\n% overhead and better utilize our TestClassSetup\/setupOnce fixtures:\r\n%\r\n%   p = gcp();\r\n%   numWorkers = p.NumWorkers;\r\n%   \r\n%   numGroups = numWorkers; % Run one group\/worker\r\n%\r\n%% \r\n% Next we need to schedule our groups. We can do this by iterating over our\r\n% suite and using *|parfeval|* to schedule the suite divided into the\r\n% number of groups we plan to use. 
Note, since we need to take the ceiling\r\n% of the group size, group sizes scheduled later may be smaller than the\r\n% initial group sizes, which is why the groupLength is calculated for each\r\n% iteration through the scheduling loop.\r\n%\r\n%%\r\n%   % Schedule the groups\r\n%   while ~isempty(unscheduledSuite)\r\n%       groupLength = ceil(numel(unscheduledSuite)\/numGroupsToSchedule);\r\n%       groups(numGroupsToSchedule) = ...\r\n%           parfeval(@run, 1, unscheduledSuite(1:groupLength)');\r\n%       % 99 groups of tests on the wall, take one down, pass it around, \r\n%       % 98 groups of tests on the wall...\r\n%       numGroupsToSchedule = numGroupsToSchedule - 1;\r\n%       unscheduledSuite(1:groupLength) = [];\r\n%   end\r\n%   % Remove any groups not needed (this happens when the suite size is \r\n%   % smaller than the number of workers)\r\n%   groups(1:numGroupsToSchedule) = [];\r\n%   groups = flip(groups);\r\n%\r\n%\r\n% Once they are scheduled the workers are already off and running, but\r\n% we can print a little information and fetch the results:\r\n%\r\n%   fprintf('Split suite across %d groups on %d workers.\\n', numel(groups), numWorkers); \r\n%   results = fetchOutputs(groups);\r\n%\r\n%% \r\n% There's a bit of code here so why don't we put it into a function\r\n% *|runWithParfeval|*. 
Here it is in its entirety:\r\n%\r\n%   function results = runWithParfeval(suite)\r\n%\r\n%   p = gcp();\r\n%   numWorkers = p.NumWorkers;\r\n%   \r\n%   numGroupsToSchedule = numWorkers; % Run one group\/worker\r\n%   unscheduledSuite = suite;\r\n%   \r\n%   % Schedule the groups\r\n%   while ~isempty(unscheduledSuite)\r\n%       groupLength = ceil(numel(unscheduledSuite)\/numGroupsToSchedule);\r\n%       groups(numGroupsToSchedule) = ...\r\n%           parfeval(@run, 1, unscheduledSuite(1:groupLength)');\r\n%       % 99 groups of tests on the wall, take one down, pass it around, \r\n%       % 98 groups of tests on the wall...\r\n%       numGroupsToSchedule = numGroupsToSchedule - 1;\r\n%       unscheduledSuite(1:groupLength) = [];\r\n%   end\r\n%   % Remove any groups not needed (this happens when the suite size is \r\n%   % smaller than the number of workers)\r\n%   groups(1:numGroupsToSchedule) = [];\r\n%   groups = flip(groups);\r\n%   \r\n%   fprintf('Split suite across %d groups on %d workers.\\n', numel(groups), numWorkers); \r\n%   results = fetchOutputs(groups);\r\n%   end\r\n%\r\n%%\r\n% Let's run it!\r\ntic;\r\nresults = runWithParfeval_v1(suite)\r\ntoc;\r\n\r\n%% \r\n% *Where did it go?*\r\n%\r\n% Well, the good news is the output is gone and the bad news is the output\r\n% is gone. Also, it seems we are fairly efficient in the total execution\r\n% time since the actual testing time is close to that when run serially.\r\n% What's more, we are a bit faster, running in about a minute's time. Let's\r\n% see if we can add some more output to gain more insight into what is\r\n% going on during this run. 
In order to add some output, let's create a\r\n% simple local function that takes an index and the diary output from the\r\n% *|parallel.FevalFuture|* and prints it:\r\n%\r\n%   function printResult(idx, output)\r\n%   bar = '*****************';\r\n%   fprintf('%s\\nFinished Group %d\\n%s\\n%s', bar, idx, bar, output);\r\n%   end\r\n%\r\n%%\r\n%\r\n% Now with this printed output we will be able to see what happened\r\n% during each group. Using the *|fetchNext|* method on the\r\n% *|parallel.FevalFuture|* array as each group is completed allows us to\r\n% print each group's output as it finishes like so:\r\n%\r\n%\r\n%   for idx = 1:numel(groups)\r\n%       % fetchNext blocks until next results are available.\r\n%       groupIdx = fetchNext(groups);\r\n%       printResult(groupIdx, groups(groupIdx).Diary);\r\n%   end\r\n%   results = fetchOutputs(groups);\r\n%  \r\ntic;\r\nresults = runWithParfeval_v2(suite)\r\ntoc;\r\n\r\n%% \r\n% *The Long Pole*\r\n% With this output you can observe that the first groups didn't finish very\r\n% quickly at all. Doesn't it seem odd that the first groups scheduled were\r\n% the last groups to finish? This occurs because the first groups are the long\r\n% pole in our test execution. When we split our test suite into one group per\r\n% worker one aspect we failed to consider is that these tests take\r\n% differing amounts of time to execute. In our case, the class-based test\r\n% has a test method which takes an order of magnitude longer to execute\r\n% (remember *|aClassBasedTest\/testLongRunningEndToEndWorkflow|*?).\r\n% Furthermore we loaded these tests on the front of our test suite and\r\n% executed them all serially on some of the workers. These tests have now\r\n% become our bottleneck. Let's tweak our approach a bit to be robust to\r\n% such \"long poles\". 
Instead of scheduling one group per worker, why don't\r\n% we try to strike a balance between communication overhead, shared test\r\n% fixtures, and varying test times by scheduling each worker to execute three\r\n% groups instead of one? Let's change just one line as follows and try it out:\r\n%\r\n%   numGroupsToSchedule = numWorkers*3; % Run three groups\/worker\r\n%\r\ntic;\r\nresults = runWithParfeval_final(suite)\r\ntoc;\r\n\r\n%% \r\n% Beautiful! We get an order-of-magnitude speedup over the serial case: it\r\n% executes in a third of the wall-clock time of *|parfor|* and half\r\n% the time of *|parfeval|* with one group per worker. Add to the mix the\r\n% rational test output and we are gold! Here is the finished function:\r\n%\r\n%   function results = runWithParfeval(suite)\r\n%   \r\n%   p = gcp();\r\n%   numWorkers = p.NumWorkers;\r\n%   \r\n%   numGroupsToSchedule = numWorkers*3; % Run three groups\/worker\r\n%   unscheduledSuite = suite;\r\n%   \r\n%   % Schedule the groups\r\n%   while ~isempty(unscheduledSuite)\r\n%       groupLength = ceil(numel(unscheduledSuite)\/numGroupsToSchedule);\r\n%       groups(numGroupsToSchedule) = ...\r\n%           parfeval(@run, 1, unscheduledSuite(1:groupLength)');\r\n%       % 99 groups of tests on the wall, take one down, pass it around, \r\n%       % 98 groups of tests on the wall...\r\n%       numGroupsToSchedule = numGroupsToSchedule - 1;\r\n%       unscheduledSuite(1:groupLength) = [];\r\n%   end\r\n%   % Remove any groups not needed (this happens when the suite size is \r\n%   % smaller than the number of groups to schedule)\r\n%   groups(1:numGroupsToSchedule) = [];\r\n%   groups = flip(groups);\r\n%   \r\n%   fprintf('Split suite across %d groups on %d workers.\\n', numel(groups), numWorkers); \r\n%\r\n%   for idx = 1:numel(groups)\r\n%       % fetchNext blocks until next results are available.\r\n%       groupIdx = fetchNext(groups);\r\n%       printResult(groupIdx, groups(groupIdx).Diary);\r\n%   end\r\n%   results = 
fetchOutputs(groups);\r\n%   end\r\n%\r\n%   function printResult(idx, output)\r\n%   bar = '*****************';\r\n%   fprintf('%s\\nFinished Group %d\\n%s\\n%s', bar, idx, bar, output);\r\n%   end\r\n\r\n%% \r\n% *Conclusion*\r\n%\r\n% As demonstrated, we can leverage parallelism in our day-to-day testing\r\n% workflows to significantly reduce the wall-clock time of our test\r\n% execution. This can be done trivially using *|parfor|*, but the power of\r\n% *|parfeval|* can be utilized to further improve the execution time, better\r\n% utilize shared fixtures on the workers, and control test output, among\r\n% many other benefits not shown here.\r\n%\r\n% Have you used any other methods for running tests in parallel? If so,\r\n% share your experiences in the comments!\r\n##### SOURCE END ##### 525556fbd97b471b98776fd30cefb98e\r\n-->","protected":false},"excerpt":{"rendered":"<!--introduction--><p><a href=\"https:\/\/blogs.mathworks.com\/developer\/2015\/02\/20\/encouragingly-parallel-part-1\/\">Last time<\/a> we showed that using a simple parfor loop we could reduce the runtime of a representative test suite from 6 to 7 minutes (when run serially) down to a minute and a half. However, we still faced some problems:... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/developer\/2015\/03\/03\/encouragingly-parallel-part-2\/\">read more >><\/a><\/p>","protected":false},"author":90,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[8,7],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/posts\/97"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/users\/90"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/comments?post=97"}],"version-history":[{"count":9,"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/posts\/97\/revisions"}],"predecessor-version":[{"id":106,"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/posts\/97\/revisions\/106"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/media?parent=97"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/categories?post=97"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/tags?post=97"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}