{"id":1451,"date":"2016-04-27T09:45:33","date_gmt":"2016-04-27T14:45:33","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=1451"},"modified":"2019-02-01T09:46:34","modified_gmt":"2019-02-01T14:46:34","slug":"run-workers-run","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2016\/04\/27\/run-workers-run\/","title":{"rendered":"Run Workers Run!"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p><i>Today's guest post comes from <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/3208495\">Sean de Wolski<\/a>, one of Loren's fellow Application Engineers.  You might recognize him from <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/answers\/\">MATLAB answers<\/a> and the <a href=\"https:\/\/blogs.mathworks.com\/pick\/\">pick of the week<\/a> blog!<\/i><\/p><p>This blog post is going to focus on subtle differences in approaches to running commands on all <a href=\"https:\/\/www.mathworks.com\/products\/parallel-computing\/\">Parallel Computing Toolbox<\/a> workers.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#b93f694a-3d19-4dd4-8f7b-72469989552c\">Background<\/a><\/li><li><a href=\"#4a0ff83d-c302-4a00-b27a-4efce282ebab\">pctRunOnAll<\/a><\/li><li><a href=\"#75947e6d-1e1d-44b1-9f83-f95029ee9534\">SPMD<\/a><\/li><li><a href=\"#ed3d9971-5b1d-4e59-9997-9214cfd3df1a\">parfevalOnAll<\/a><\/li><li><a href=\"#06e9d796-2f78-4226-bf71-4a08111345e1\">Table of Tradeoffs<\/a><\/li><li><a href=\"#1fa16f1e-43a6-4f65-b173-2aa4b8428fc3\">Lack of PARFOR<\/a><\/li><li><a href=\"#9393b3cb-84db-4344-97a8-9e4edfb56650\">Comments and Feedback<\/a><\/li><\/ul><\/div><h4>Background<a name=\"b93f694a-3d19-4dd4-8f7b-72469989552c\"><\/a><\/h4><p>The Parallel Computing Toolbox allows you to open headless MATLAB workers that you can then distribute work to in order to speed up your code or offload it from the main MATLAB session.  
Oftentimes, there is some initial setup that you will want to run on each of the workers to prepare them for work.  Some examples include loading libraries, changing the path, or opening a Simulink model.  There are three different approaches to this, each with its own characteristics.<\/p><div><ul><li><a href=\"https:\/\/www.mathworks.com\/help\/distcomp\/pctrunonall.html\"><tt>pctRunOnAll<\/tt><\/a><\/li><li><a href=\"https:\/\/www.mathworks.com\/help\/distcomp\/spmd.html\"><tt>spmd<\/tt><\/a><\/li><li><a href=\"https:\/\/www.mathworks.com\/help\/distcomp\/parfevalonall.html\"><tt>parfevalOnAll<\/tt><\/a><\/li><\/ul><\/div><p>This has long been a source of confusion for me, so I finally researched it to figure out when to use each.  Let's walk through the use cases for each one.<\/p><h4>pctRunOnAll<a name=\"4a0ff83d-c302-4a00-b27a-4efce282ebab\"><\/a><\/h4><p><tt>pctRunOnAll<\/tt> runs synchronously, runs one command at a time, and cannot return outputs.  It runs the command on all of the workers <b>and<\/b> the client.<\/p><p><tt>pctRunOnAll<\/tt> should be used for any type of setup where no output is expected and none of the workers are busy.  If any worker is busy, control does not return until the command has run on every worker, so you could be waiting a while for the busy worker to finish.<\/p><p>Since it runs on all of the workers and the client, I like to use it when loading a Simulink model because I can then visually confirm that the correct model opened on the client machine.<\/p><p>The command to run is passed as a single string, so command syntax works well for fixed commands.  For example:<\/p><pre class=\"language-matlab\">pctRunOnAll <span class=\"string\">load('Data.mat')<\/span>\r\n<\/pre><p>This makes it harder to parameterize inputs, since you'll need to build the command with <tt>sprintf<\/tt> and friends, so if the command being run will change a lot, it might pay to look at <tt>spmd<\/tt>. 
However, it is nice for quick one-off operations at the command line. Here's how you would pass the MAT-file name from the above example as a variable:<\/p><pre class=\"language-matlab\">file = <span class=\"string\">'Data.mat'<\/span>;\r\npctRunOnAll(sprintf(<span class=\"string\">'load(''%s'')'<\/span>,file))\r\n<\/pre><p><tt>pctRunOnAll<\/tt> also does not automatically start a parallel pool if one is not open; it returns an error instead.  The other two approaches will spin up a parallel pool if you are using the default <a title=\"https:\/\/www.mathworks.com\/help\/distcomp\/parallel-pools.html#buepucj-1 (link no longer works)\">preference<\/a> \"Automatically create a parallel pool when certain language features detected\".<\/p><h4>SPMD<a name=\"75947e6d-1e1d-44b1-9f83-f95029ee9534\"><\/a><\/h4><p><tt>spmd<\/tt> stands for <i>single program, multiple data<\/i>.  It runs the same set of commands synchronously on all of the workers in the parallel pool, but not on the client.<\/p><p><tt>spmd<\/tt> has many uses beyond running the same command on every worker, because you can govern what happens on each worker based on the worker's index, returned by the function <tt>labindex<\/tt>.  Thus it allows for full message-passing parallelization.<\/p><p>For running the same commands on each worker, it is convenient for grouping multiple commands together, because everything in the <tt>spmd<\/tt> block runs on every worker, and it allows you to store output.  
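<\/p><p>Here is a rough sketch of both ideas, grouping several setup commands in one block and branching on <tt>labindex<\/tt>. It assumes an open, <i>SpmdEnabled<\/i> pool; seeding each worker's random number generator is just an illustrative setup task, and <tt>numlabs<\/tt> returns the number of workers running the block.<\/p><pre class=\"language-matlab\">spmd\r\n    % Every worker runs every command in this block, in order\r\n    rng(labindex, 'twister');   % illustrative setup: a different, reproducible seed per worker\r\n    workerId = labindex;        % assigned on each worker, so workerId is a Composite\r\n    % labindex lets each worker take a different branch\r\n    if labindex == 1\r\n        disp(['Worker 1 of ' num2str(numlabs) ' finished setup']);\r\n    end\r\nend\r\n% Copy the per-worker values back to the client as an ordinary row vector\r\nids = [workerId{:}];\r\ndisp(ids)\r\n<\/pre><p>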
Let's make a random 10000x1 vector on each worker and then take the mean to emulate a simple Monte Carlo simulation of the central limit theorem:<\/p><pre class=\"codeinput\"><span class=\"keyword\">spmd<\/span>\r\n    x = rand(10000,1);\r\n    xmean = mean(x);\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><p><i>x<\/i> and <i>xmean<\/i> are <a href=\"https:\/\/www.mathworks.com\/help\/distcomp\/composite.composite.html\">Composite<\/a> arrays, meaning that each worker has its own value and the data are stored in the workers' memory.  You can index into them with curly braces, like a cell array, to grab a local copy of the value from any specific worker or to modify the value on a specific worker.<\/p><pre class=\"codeinput\">xmeans = [xmean{:}];\r\ndisp(xmeans)\r\n<\/pre><pre class=\"codeoutput\">      0.50508      0.50163\r\n<\/pre><p>I have two workers, so <i>xmeans<\/i> is 1x2.<\/p><p>Using <tt>spmd<\/tt> does require that the parallel pool be <i>SpmdEnabled<\/i>.  This setting is on by default, and you can change it when you start the pool. One reason for turning off this feature is that if one or more workers die, the remaining workers can still be used by <tt>parfor<\/tt>-loops or <tt>parfeval<\/tt> statements without the dead worker(s).<\/p><h4>parfevalOnAll<a name=\"ed3d9971-5b1d-4e59-9997-9214cfd3df1a\"><\/a><\/h4><p><tt>parfevalOnAll<\/tt>, first released in R2013b with the new parallel computing API, is the latest and greatest way to evaluate functions on workers.  It runs asynchronously and allows you to return outputs.  The input has to be a function handle, so you will need to encapsulate your commands in a function, either in a file or anonymous.<\/p><p>The biggest feature here is that it is asynchronous; i.e., once you run the <tt>parfevalOnAll<\/tt> command and the futures are submitted, control returns to the MATLAB command prompt so you can continue working and running other things.  
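<\/p><p>Applied to the setup tasks from the Background section, a minimal sketch might look like this. It assumes a local pool whose workers share the client's file system; the folder name <tt>myHelpers<\/tt> is a hypothetical placeholder, created under <tt>tempdir<\/tt> only so the sketch runs anywhere. Inputs listed after the number of outputs are passed to the function, and the anonymous function captures <tt>codeDir<\/tt> from the client workspace.<\/p><pre class=\"language-matlab\">% Hypothetical folder of helper code that every worker should see\r\ncodeDir = fullfile(tempdir, 'myHelpers');\r\nif ~exist(codeDir, 'dir')\r\n    mkdir(codeDir);\r\nend\r\n% Request 0 outputs; codeDir is passed on as the input to addpath\r\nsetupFuture = parfevalOnAll(@addpath, 0, codeDir);\r\n% ... the client is free to do other work here ...\r\nwait(setupFuture);   % block only when the setup must be complete\r\n% The anonymous function returns one logical per worker\r\ncheckFuture = parfevalOnAll(@() ~isempty(strfind(path, codeDir)), 1);\r\nonPath = fetchOutputs(checkFuture);   % outputs from all workers, concatenated\r\ndisp(onPath')\r\n<\/pre><p>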
<tt>parfevalOnAll<\/tt> should be strongly considered for anything that is compute-intensive, or for any operation driven by a user interface where you want the interface to stay responsive while the operations are running.<\/p><p>Like <tt>spmd<\/tt>, <tt>parfevalOnAll<\/tt> will start a parallel pool if needed, and it runs only on the workers.<\/p><p>Here is a simple example to demonstrate the power of <tt>parfevalOnAll<\/tt>. Let's run a computationally expensive command locally to get a feel for how long it takes.<\/p><pre class=\"codeinput\">f = @optimizeCraneSwing;\r\ntic\r\nxs = f();\r\ntoc\r\n<\/pre><pre class=\"codeoutput\">Elapsed time is 28.676841 seconds.\r\n<\/pre><p>And now offload that to run on both workers:<\/p><pre class=\"codeinput\">tic\r\nfutures = parfevalOnAll(f, 1);\r\ntsubmit = toc\r\ndisp(<span class=\"string\">'Futures submitted; command line access returned!'<\/span>)\r\nwait(futures);\r\ntfinish = toc\r\ndisp(<span class=\"string\">'Futures finished'<\/span>)\r\n<\/pre><pre class=\"codeoutput\">tsubmit =\r\n    0.0089678\r\nFutures submitted; command line access returned!\r\ntfinish =\r\n       37.641\r\nFutures finished\r\n<\/pre><p>Notice how command-line access returned almost immediately and the futures finished later.  To get accurate timing, I waited for them to finish; if you need the results before continuing, this is a necessary step.  
However, if you don't need the results right away, you can query the <i>'State'<\/i> property until it becomes <i>finished<\/i> and then fetch the outputs with <tt>fetchOutputs<\/tt>.<\/p><pre class=\"codeinput\">futures.State\r\n<\/pre><pre class=\"codeoutput\">ans =\r\nfinished\r\n<\/pre><p>If you want to automate this, a <a title=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/timer-class.html (link no longer works)\"><tt>timer<\/tt><\/a> can be used to query the state periodically.<\/p><h4>Table of Tradeoffs<a name=\"06e9d796-2f78-4226-bf71-4a08111345e1\"><\/a><\/h4><p>So finally, for quick reference, here is a table of the tradeoffs to help you in future decisions.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/loren\/files\/tableofattributes.png\" alt=\"Table of tradeoffs between pctRunOnAll, spmd, and parfevalOnAll\"> <\/p><h4>Lack of PARFOR<a name=\"1fa16f1e-43a6-4f65-b173-2aa4b8428fc3\"><\/a><\/h4><p>You may have noticed that <tt>parfor<\/tt> did not make an appearance in the above list.  Parallel <tt>for<\/tt>-loops do their own internal load balancing, which does not guarantee that every worker gets an iteration, so <tt>parfor<\/tt> cannot be relied upon to run a command on every worker.<\/p><h4>Comments and Feedback<a name=\"9393b3cb-84db-4344-97a8-9e4edfb56650\"><\/a><\/h4><p>Have you used any of these approaches before?  Do you have a preference or a cautionary story?  
Share it <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=1451#respond\">here<\/a>.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_3a032aa8e9624d68bc3ee3f0a432adf5() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='3a032aa8e9624d68bc3ee3f0a432adf5 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 3a032aa8e9624d68bc3ee3f0a432adf5';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2016 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_3a032aa8e9624d68bc3ee3f0a432adf5()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2016a<br><\/p><\/div><!--\r\n3a032aa8e9624d68bc3ee3f0a432adf5 ##### SOURCE BEGIN #####\r\n%% Run Workers Run!\r\n% _Today's guest post comes from\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/3208495 Sean de\r\n% Wolski>, one of Loren's fellow Application Engineers.  
You might\r\n% recognize him from <https:\/\/www.mathworks.com\/matlabcentral\/answers\/ MATLAB\r\n% answers> and the <https:\/\/blogs.mathworks.com\/pick\/ pick of the week>\r\n% blog!_\r\n% \r\n% This blog post is going to focus on subtle differences in approaches to\r\n% running commands on all\r\n% <https:\/\/www.mathworks.com\/products\/parallel-computing\/ Parallel Computing\r\n% Toolbox> workers.\r\n%\r\n%% Background\r\n%\r\n% The Parallel Computing Toolbox allows you to open headless MATLAB workers\r\n% that you can then distribute work to in order to speed up your code or\r\n% offload it from the main MATLAB session.  Oftentimes, there is\r\n% some initial setup that you will want to run on each of the workers to\r\n% prepare them for work.  Some examples include loading libraries,\r\n% changing the path, or opening a Simulink model.  There are three\r\n% different approaches to this, each with its own characteristics.\r\n%\r\n% * <https:\/\/www.mathworks.com\/help\/distcomp\/pctrunonall.html |pctRunOnAll|>\r\n% * <https:\/\/www.mathworks.com\/help\/distcomp\/spmd.html |spmd|>\r\n% * <https:\/\/www.mathworks.com\/help\/distcomp\/parfevalonall.html |parfevalOnAll|>\r\n%\r\n% This has long been a source of confusion for me, so I finally researched\r\n% it to figure out when to use each.  Let's walk through the use cases for\r\n% each one.\r\n\r\n%% pctRunOnAll\r\n%\r\n% |pctRunOnAll| runs synchronously, runs one command at a time, and\r\n% cannot return outputs.  It runs the command on all of the workers\r\n% *and* the client. \r\n%\r\n% |pctRunOnAll| should be used for any type of setup where no\r\n% output is expected and none of the workers are busy.  
If any worker is\r\n% busy, control does not return until the command has run on every worker,\r\n% so you could be waiting a while for the busy worker to finish.\r\n%\r\n% Since it runs on all of the workers and the client, I like to use it when\r\n% loading a Simulink model because I can then visually confirm that the\r\n% correct model opened on the client machine.\r\n%\r\n% The command to run is passed as a single string, so command syntax works\r\n% well for fixed commands.  For example:\r\n%\r\n%   pctRunOnAll load('Data.mat')\r\n%\r\n% This makes it harder to parameterize inputs, since you'll need to build\r\n% the command with |sprintf| and friends, so if the command being run will\r\n% change a lot, it might pay to look at |spmd|.\r\n% However, it is nice for quick one-off operations at the command line.\r\n% Here's how you would pass the MAT-file name from the above example as a\r\n% variable:\r\n%\r\n%   file = 'Data.mat';\r\n%   pctRunOnAll(sprintf('load(''%s'')',file))\r\n%\r\n% |pctRunOnAll| also does not automatically start a parallel pool if one\r\n% is not open; it returns an error instead.  The other two approaches\r\n% will spin up a parallel pool if you are using the default\r\n% <https:\/\/www.mathworks.com\/help\/distcomp\/parallel-pools.html#buepucj-1\r\n% preference> \"Automatically create a parallel pool when certain language\r\n% features detected\".\r\n%\r\n\r\n%% SPMD\r\n% |spmd| stands for _single program, multiple data_.  It runs the same\r\n% set of commands synchronously on all of the workers in the parallel pool,\r\n% but not on the client.\r\n%\r\n% |spmd| has many uses beyond running the same command on every\r\n% worker, because you can govern what happens on each worker based on the\r\n% worker's index, returned by the function |labindex|.  
Thus it allows for full\r\n% message-passing parallelization.\r\n%\r\n% For running the same commands on each worker, it is convenient for\r\n% grouping multiple commands together, because everything in the |spmd|\r\n% block runs on every worker, and it allows you to store output.  Let's\r\n% make a random 10000x1 vector on each worker and then take the mean to\r\n% emulate a simple Monte Carlo simulation of the central limit theorem:\r\nspmd\r\n    x = rand(10000,1);\r\n    xmean = mean(x);\r\nend\r\n%%\r\n% _x_ and _xmean_ are\r\n% <https:\/\/www.mathworks.com\/help\/distcomp\/composite.composite.html\r\n% Composite> arrays, meaning that each worker has its own value and the data\r\n% are stored in the workers' memory.  You can index into them with curly\r\n% braces, like a cell array, to grab a local copy of the value from any\r\n% specific worker or to modify the value on a specific worker.\r\nxmeans = [xmean{:}];\r\ndisp(xmeans)\r\n%%\r\n% I have two workers, so _xmeans_ is 1x2.\r\n%\r\n% Using |spmd| does require that the parallel pool be _SpmdEnabled_.  This\r\n% setting is on by default, and you can change it when you start the pool.\r\n% One reason for turning off this feature is that if one or more workers\r\n% die, the remaining workers can still be used by |parfor|-loops or\r\n% |parfeval| statements without the dead worker(s).\r\n\r\n\r\n%% parfevalOnAll\r\n%\r\n% |parfevalOnAll|, first released in R2013b with the new parallel computing\r\n% API, is the latest and greatest way to evaluate functions on workers.  It\r\n% runs asynchronously and allows you to return outputs.  The input has to\r\n% be a function handle, so you will need to encapsulate your commands in a\r\n% function, either in a file or anonymous.\r\n%\r\n% The biggest feature here is that it is asynchronous; i.e., once you run\r\n% the |parfevalOnAll| command and the futures are submitted, control\r\n% returns to the MATLAB command prompt so you can continue working and\r\n% running other things.  
|parfevalOnAll| should be strongly considered for anything that\r\n% is compute-intensive, or for any operation driven by a user interface\r\n% where you want the interface to stay responsive while the\r\n% operations are running.\r\n% \r\n% Like |spmd|, |parfevalOnAll| will start a parallel pool if needed, and it\r\n% runs only on the workers.\r\n%\r\n% Here is a simple example to demonstrate the power of |parfevalOnAll|.\r\n% Let's run a computationally expensive command locally to get a feel for\r\n% how long it takes.\r\n\r\nf = @optimizeCraneSwing;\r\ntic\r\nxs = f();\r\ntoc\r\n%%\r\n% And now offload that to run on both workers:\r\n\r\ntic\r\nfutures = parfevalOnAll(f, 1);\r\ntsubmit = toc\r\ndisp('Futures submitted; command line access returned!')\r\nwait(futures);\r\ntfinish = toc\r\ndisp('Futures finished')\r\n\r\n%%\r\n% Notice how command-line access returned almost immediately and the\r\n% futures finished later.  To get accurate timing, I waited for them to\r\n% finish; if you need the results before continuing, this is a necessary\r\n% step.  However, if you don't need the results right away, you can query\r\n% the _'State'_ property until it becomes _finished_ and then fetch the\r\n% outputs with |fetchOutputs|.\r\n%\r\nfutures.State\r\n%%\r\n% If you want to automate this, a\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/timer-class.html |timer|> can\r\n% be used to query the state periodically.\r\n\r\n%% Table of Tradeoffs\r\n%\r\n% So finally, for quick reference, here is a table of the tradeoffs to\r\n% help you in future decisions.\r\n% \r\n% <<tableofattributes.png>>\r\n\r\n%% Lack of PARFOR\r\n% You may have noticed that |parfor| did not make an appearance in the\r\n% above list.  
Parallel |for|-loops do their own internal load balancing,\r\n% which does not guarantee that every worker gets an iteration, so |parfor|\r\n% cannot be relied upon to run a command on every worker.\r\n\r\n%% Comments and Feedback\r\n% Have you used any of these approaches before?  Do you have a preference\r\n% or a cautionary story?  
Share it\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=1451#respond here>.\r\n\r\n##### SOURCE END ##### 3a032aa8e9624d68bc3ee3f0a432adf5\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/loren\/files\/tableofattributes.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p><i>Today's guest post comes from <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/3208495\">Sean de Wolski<\/a>, one of Loren's fellow Application Engineers.  You might recognize him from <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/answers\/\">MATLAB answers<\/a> and the <a href=\"https:\/\/blogs.mathworks.com\/pick\/\">pick of the week<\/a> blog!<\/i>... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2016\/04\/27\/run-workers-run\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[34],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1451"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=1451"}],"version-history":[{"count":9,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1451\/revisions"}],"predecessor-version":[{"id":3214,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1451\/revisions\/3214"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=1451"}],"wp:term":[{"taxonomy":"category","embedd
able":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=1451"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=1451"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}