{"id":199,"date":"2009-10-02T16:30:13","date_gmt":"2009-10-02T16:30:13","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/2009\/10\/02\/using-parfor-loops-getting-up-and-running\/"},"modified":"2021-05-20T08:46:36","modified_gmt":"2021-05-20T12:46:36","slug":"using-parfor-loops-getting-up-and-running","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2009\/10\/02\/using-parfor-loops-getting-up-and-running\/","title":{"rendered":"Using parfor Loops: Getting Up and Running"},"content":{"rendered":"<div class=\"alert alert-info\"> <span class=\"alert_icon icon-alert-info-reverse\"><\/span><p class=\"alert_heading\"><strong>Note<\/strong><\/p><p>.<\/p><\/div>\r\n\r\n<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n   <introduction>\r\n      <p>\r\nNOTE: <tt>matlabpool<\/tt> was removed in 2015 and you should replace that with \r\n<a href=\"https:\/\/www.mathworks.com\/help\/parallel-computing\/parpool.html\"><tt>parpool<\/tt><\/a>\r\n instead.\r\n      <\/p>\r\n      <p>Today I&#8217;d like to introduce a guest blogger, <a href=\"mailto:sarah.zaranek@mathworks.com\">Sarah Wait Zaranek<\/a>, who is an application engineer here at The MathWorks. Sarah previously has <a href=\"https:\/\/blogs.mathworks.com\/loren\/2008\/06\/25\/speeding-up-matlab-applications\/\">written<\/a> about speeding up code from a customer to get acceptable performance. 
She again will be writing about speeding up MATLAB\r\n         applications, but this time her focus will be on using the parallel computing tools.\r\n      <\/p>\r\n   <\/introduction>\r\n   <h3>Contents<\/h3>\r\n   <div>\r\n      <ul>\r\n         <li><a href=\"#1\">Introduction<\/a><\/li>\r\n         <li><a href=\"#2\">Method<\/a><\/li>\r\n         <li><a href=\"#4\">Background on parfor-loops<\/a><\/li>\r\n         <li><a href=\"#5\">Opening the matlabpool<\/a><\/li>\r\n         <li><a href=\"#8\">Independence<\/a><\/li>\r\n         <li><a href=\"#11\">Globals and Transparency<\/a><\/li>\r\n         <li><a href=\"#12\">Classification<\/a><\/li>\r\n         <li><a href=\"#21\">Uniqueness<\/a><\/li>\r\n         <li><a href=\"#28\">Your examples<\/a><\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <h3>Introduction<a name=\"1\"><\/a><\/h3>\r\n   <p>I wanted to write a post to help users better understand our parallel computing tools. In this post, I will focus on one of\r\n      the more commonly used functions in these tools: the <tt>parfor<\/tt>-loop.\r\n   <\/p>\r\n   <p>This post will focus on getting a parallel code using <tt>parfor<\/tt> up and running. Performance will not be addressed in this post. I will assume that the reader has a basic knowledge of the\r\n      <tt>parfor<\/tt>-loop construct. Loren has a very nice introduction to using <tt>parfor<\/tt> in one of her previous <a href=\"https:\/\/blogs.mathworks.com\/loren\/2007\/10\/03\/parfor-the-course\/\">posts<\/a>. There are also some nice introductory videos.\r\n   <\/p>\r\n   <p><i>Note for clarity<\/i> : Since Loren's introductory post, the toolbox used for parallel computing has changed names from the Distributed Computing\r\n      Toolbox to the <a href=\"https:\/\/www.mathworks.com\/products\/parallel-computing\/\">Parallel Computing Toolbox<\/a>. 
These are not two separate toolboxes.\r\n   <\/p>\r\n   <h3>Method<a name=\"2\"><\/a><\/h3>\r\n   <p>In some cases, you may only need to change a <tt>for<\/tt>-loop to a <tt>parfor<\/tt>-loop to get your code running in parallel. However, in other cases you may need to slightly alter the code so that <tt>parfor<\/tt> can work. I decided to show a few examples highlighting the main challenges that you might encounter.  I have separated these\r\n      examples into four broad categories:\r\n   <\/p>\r\n   <div>\r\n      <ul>\r\n         <li>Independence<\/li>\r\n         <li>Globals and Transparency<\/li>\r\n         <li>Classification<\/li>\r\n         <li>Uniqueness<\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <h3>Background on parfor-loops<a name=\"4\"><\/a><\/h3>\r\n   <p>In a <tt>parfor<\/tt>-loop (just like in a standard <tt>for<\/tt>-loop) a series of statements known as the loop body is executed over a range of values. However, when using a <tt>parfor<\/tt>-loop the iterations run in parallel on MATLAB workers rather than on the client MATLAB machine.\r\n   <\/p>\r\n   <p>Each worker has its own unique workspace.  So, the data needed to do these calculations is sent from the client to the workers,\r\n      and the results are sent back to the client and pieced together. The cool thing about <tt>parfor<\/tt> is that this data transfer is handled for the user. When MATLAB gets to the <tt>parfor<\/tt>-loop, it statically analyzes the body of the <tt>parfor<\/tt>-loop and determines what information goes to which worker and which variables will be returned to the client MATLAB. This\r\n      concept becomes important for understanding why particular constraints are placed on the use of <tt>parfor<\/tt>.\r\n   <\/p>\r\n   <h3>Opening the matlabpool<a name=\"5\"><\/a><\/h3>\r\n   <p>Before looking at some examples, I will open up a matlabpool so I can run my loops in parallel.  
I will be opening up the\r\n      matlabpool using my default local configuration (i.e. my workers will be running on the dual-core laptop machine where my\r\n      MATLAB has been installed).\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\"><span style=\"color: #0000FF\">if<\/span> matlabpool(<span style=\"color: #A020F0\">'size'<\/span>) == 0 <span style=\"color: #228B22\">% checking to see if my pool is already open<\/span>\r\n    matlabpool <span style=\"color: #A020F0\">open<\/span> <span style=\"color: #A020F0\">2<\/span>\r\n<span style=\"color: #0000FF\">end<\/span><\/pre><pre style=\"font-style:oblique\">Starting matlabpool using the 'local' configuration ... connected to 2 labs.\r\n<\/pre><p><i>Note<\/i>: The <tt>'size'<\/tt> option was new in R2008b.\r\n   <\/p>\r\n   <h3>Independence<a name=\"8\"><\/a><\/h3>\r\n   <p>The <tt>parfor<\/tt>-loop is designed for task-parallel types of problems where each iteration of the loop is independent of every other iteration.\r\n      This is a critical requirement for using a <tt>parfor<\/tt>-loop. Let's see an example in which the iterations are not independent.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">type <span style=\"color: #A020F0\">dependentLoop.m<\/span><\/pre><pre style=\"font-style:oblique\">\r\n% Example of a dependent for-loop\r\na = zeros(1,10);\r\n\r\nparfor it = 2:10 \r\n    a(it) = someFunction(a(it-1));\r\nend\r\n<\/pre><p>Checking the above code using M-Lint (MATLAB's static code analyzer) gives a warning message that these iterations are dependent\r\n      and will not work with the <tt>parfor<\/tt> construct. M-Lint can be accessed either from the editor or from the command line. 
In this case, I use the command line and have defined\r\n      a simple function <tt>displayMlint<\/tt> so that the display is compact.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">output = mlint(<span style=\"color: #A020F0\">'dependentLoop.m'<\/span>);\r\ndisplayMlint(output)<\/pre><pre style=\"font-style:oblique\">The PARFOR loop cannot run due to \r\n the way variable 'a' is used. \r\n\r\nIn a PARFOR loop, variable 'a' is \r\n indexed in different ways, \r\n potentially causing dependencies \r\n between iterations. \r\n\r\n<\/pre><p>Sometimes loops are intrinsically or unavoidably dependent, and therefore <tt>parfor<\/tt> is not a good fit for that type of calculation. However, in some cases it is possible to reformulate the body of the loop\r\n      to eliminate the dependency or separate it from the main time-consuming calculation.\r\n   <\/p>\r\n   <h3>Globals and Transparency<a name=\"11\"><\/a><\/h3>\r\n   <p>All variables within the body of a <tt>parfor<\/tt>-loop must be transparent. This means that all references to variables must occur in the text of the program. Since MATLAB\r\n      statically analyzes the loops to figure out what data goes to which worker and what data comes back, this seems like an\r\n      understandable restriction.\r\n   <\/p>\r\n   <p>Therefore, the following commands cannot be used within the body of a <tt>parfor<\/tt>-loop: <tt>evalc<\/tt>, <tt>eval<\/tt>, <tt>evalin<\/tt>, and <tt>assignin<\/tt>. <tt>load<\/tt> also cannot be used unless its output is <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2009b\/techdoc\/ref\/load.html\">assigned<\/a> to a variable. It is possible to use the above functions within a function called from the <tt>parfor<\/tt>-loop, because the function has its own workspace. 
I have found that this is often the easiest workaround for the\r\n      transparency issue.\r\n   <\/p>\r\n   <p>Additionally, you cannot define global variables or persistent variables within the body of the <tt>parfor<\/tt>-loop. I would also suggest being careful with the use of globals, since changes in global values on workers are not automatically\r\n      reflected in the global values on the client.\r\n   <\/p>\r\n   <h3>Classification<a name=\"12\"><\/a><\/h3>\r\n   <p>A detailed description of the <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2009b\/toolbox\/distcomp\/index.html?\/access\/helpdesk\/help\/releases\/R2009b\/toolbox\/distcomp\/brdqtjj-1.html\">classification<\/a> of variables in a <tt>parfor<\/tt>-loop is in the documentation. I think it is useful to view classification as representing the different ways a variable is\r\n      passed between client and worker and the different ways it is used within the body of the <tt>parfor<\/tt>-loop.\r\n   <\/p>\r\n   <p><i>Challenges with Classification<\/i><\/p>\r\n   <p>Challenges often arise when first converting <tt>for<\/tt>-loops to <tt>parfor<\/tt>-loops due to issues with this classification.  A commonly seen issue is the conversion of nested <tt>for<\/tt>-loops, where sliced variables are not indexed appropriately.\r\n   <\/p>\r\n   <p>Sliced variables are variables for which each worker operates on a different part of the variable. In other words, sliced variables\r\n      are sliced or divided amongst the workers. 
Sliced variables are used to prevent unnecessary data transfer from client to worker.\r\n   <\/p>\r\n   <p><i>Using <tt>parfor<\/tt> with Nested <tt>for<\/tt>-Loops<\/i><\/p>\r\n   <p>The loop below is nested and encounters some of the restrictions placed on <tt>parfor<\/tt> for sliced variables.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">type <span style=\"color: #A020F0\">parforNestTry.m<\/span><\/pre><pre style=\"font-style:oblique\">\r\nA1 = zeros(10,10); \r\n\r\nparfor ix = 1:10\r\n    for jx = 1:10\r\n        A1(ix, jx) = ix + jx;\r\n    end\r\nend\r\n<\/pre><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">output = mlint(<span style=\"color: #A020F0\">'parforNestTry.m'<\/span>);\r\ndisplayMlint(output);<\/pre><pre style=\"font-style:oblique\">The PARFOR loop cannot run due to \r\n the way variable 'A1' is used. \r\n\r\nValid indices for 'A1' are \r\n restricted in PARFOR loops. \r\n\r\n<\/pre><p>In this case, <tt>A1<\/tt> is a sliced variable. For sliced variables, the restrictions are placed on the first-level variable indices. This allows\r\n      <tt>parfor<\/tt> to easily distribute the right part of the variable to the right workers.\r\n   <\/p>\r\n   <p>First-level indexing, in general, refers to indexing within the first set of parentheses or braces. This is <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2009b\/toolbox\/distcomp\/index.html?\/access\/helpdesk\/help\/releases\/R2009b\/toolbox\/distcomp\/brdqtjj-1.html\">explained<\/a> in more detail in the same section as classification in the documentation.\r\n   <\/p>\r\n   <p>One of these first-level indices must be the loop counter variable or the counter variable plus or minus a constant.  
<i>Every other first-level index must be a constant, a non-loop counter variable, a colon, or an <tt>end<\/tt>.<\/i><\/p>\r\n   <p>In this case, <tt>A1<\/tt> uses a loop counter variable for both first-level indices (<tt>ix<\/tt> and <tt>jx<\/tt>).\r\n   <\/p>\r\n   <p>The solution is to make sure that only one of the indices of <tt>A1<\/tt> is a loop counter variable and to make the other index a colon. To implement this, the results of the inner loop can be saved to a temporary variable, and then\r\n      that variable can be assigned to the desired variable outside the inner loop.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">A2 = zeros(10,10);\r\n\r\n<span style=\"color: #0000FF\">parfor<\/span> ix = 1:10\r\n    myTemp = zeros(1,10);\r\n    <span style=\"color: #0000FF\">for<\/span> jx = 1:10\r\n        myTemp(jx) = ix + jx;\r\n    <span style=\"color: #0000FF\">end<\/span>\r\n    A2(ix,:) = myTemp;\r\n<span style=\"color: #0000FF\">end<\/span><\/pre><p>You can also solve this issue by using cells. Since <tt>jx<\/tt> is now in the second level of indexing, it can be a loop counter variable.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">A3 = cell(10,1);\r\n\r\n<span style=\"color: #0000FF\">parfor<\/span> ix = 1:10\r\n    <span style=\"color: #0000FF\">for<\/span> jx = 1:10\r\n        A3{ix}(jx) = ix + jx;\r\n    <span style=\"color: #0000FF\">end<\/span>\r\n<span style=\"color: #0000FF\">end<\/span>\r\n\r\nA3 = cell2mat(A3);<\/pre><p>I have found that both solutions have their benefits. While cells may be easier to implement in your code, they also result\r\n      in <tt>A3<\/tt> using more memory due to the additional <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2009b\/techdoc\/matlab_prog\/brh72ex-2.html#brh72ex-14\">memory<\/a> requirements for cells. 
The call to <tt>cell2mat<\/tt> also adds processing time.\r\n   <\/p>\r\n   <p>A similar technique can be used for several levels of nested <tt>for<\/tt>-loops.\r\n   <\/p>\r\n   <h3>Uniqueness<a name=\"21\"><\/a><\/h3>\r\n   <p><i>Doing Machine Specific Calculations<\/i><\/p>\r\n   <p>There is a way, while using <tt>parfor<\/tt>-loops, to determine which machine you are on and execute machine-specific instructions within the loop. For example, you\r\n      might want to do this if different machines keep their data files in different directories and you need to make sure to get\r\n      into the right directory. Do be careful if you make the code machine-specific, since it will be harder to port.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\"><span style=\"color: #228B22\">% Getting the machine host name<\/span>\r\n\r\n[~,hostname] = system(<span style=\"color: #A020F0\">'hostname'<\/span>);\r\n\r\n<span style=\"color: #228B22\">% If the loop iterations are the same as the size of matlabpool, the<\/span>\r\n<span style=\"color: #228B22\">% command is run once per worker.<\/span>\r\n\r\n<span style=\"color: #0000FF\">parfor<\/span> ix = 1:matlabpool(<span style=\"color: #A020F0\">'size'<\/span>)\r\n    [~,hostnameID{ix}] = system(<span style=\"color: #A020F0\">'hostname'<\/span>);\r\n<span style=\"color: #0000FF\">end<\/span>\r\n\r\n<span style=\"color: #228B22\">% Can then do host\/machine specific commands<\/span>\r\nhostnames = unique(hostnameID);\r\ncheckhost = hostnames(1);\r\n\r\n<span style=\"color: #0000FF\">parfor<\/span> ix = 1:matlabpool(<span style=\"color: #A020F0\">'size'<\/span>)\r\n    [~,myhost] = system(<span style=\"color: #A020F0\">'hostname'<\/span>);\r\n    <span style=\"color: #0000FF\">if<\/span> strcmp(myhost,checkhost)\r\n       display(<span style=\"color: #A020F0\">'On Machine 1'<\/span>)\r\n    <span style=\"color: #0000FF\">else<\/span>\r\n        display(<span style=\"color: 
#A020F0\">'NOT on Machine 1'<\/span>)\r\n    <span style=\"color: #0000FF\">end<\/span>\r\n<span style=\"color: #0000FF\">end<\/span><\/pre><pre style=\"font-style:oblique\">On Machine 1\r\nOn Machine 1\r\n<\/pre><p>In my case, since I am running locally, all of the workers are on the same machine.<\/p>\r\n   <p>Here's the same code running on a non-local cluster.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">matlabpool <span style=\"color: #A020F0\">close<\/span>\r\nmatlabpool <span style=\"color: #A020F0\">open<\/span> <span style=\"color: #A020F0\">speedy<\/span>\r\n<span style=\"color: #0000FF\">parfor<\/span> ix = 1:matlabpool(<span style=\"color: #A020F0\">'size'<\/span>)\r\n    [~,hostnameID{ix}] = system(<span style=\"color: #A020F0\">'hostname'<\/span>);\r\n<span style=\"color: #0000FF\">end<\/span>\r\n\r\n<span style=\"color: #228B22\">% Can then do host\/machine specific commands<\/span>\r\nhostnames = unique(hostnameID);\r\ncheckhost = hostnames(1);\r\n\r\n<span style=\"color: #0000FF\">parfor<\/span> ix = 1:matlabpool(<span style=\"color: #A020F0\">'size'<\/span>)\r\n    [~,myhost] = system(<span style=\"color: #A020F0\">'hostname'<\/span>);\r\n    <span style=\"color: #0000FF\">if<\/span> strcmp(myhost,checkhost)\r\n       display(<span style=\"color: #A020F0\">'On Machine 1'<\/span>)\r\n    <span style=\"color: #0000FF\">else<\/span>\r\n        display(<span style=\"color: #A020F0\">'NOT on Machine 1'<\/span>)\r\n    <span style=\"color: #0000FF\">end<\/span>\r\n<span style=\"color: #0000FF\">end<\/span><\/pre><pre style=\"font-style:oblique\">Sending a stop signal to all the labs ... stopped.\r\nStarting matlabpool using the 'speedy' configuration ... 
connected to 16 labs.\r\nOn Machine 1\r\nOn Machine 1\r\nOn Machine 1\r\nNOT on Machine 1\r\nOn Machine 1\r\nNOT on Machine 1\r\nNOT on Machine 1\r\nNOT on Machine 1\r\nNOT on Machine 1\r\nNOT on Machine 1\r\nNOT on Machine 1\r\nNOT on Machine 1\r\nNOT on Machine 1\r\nNOT on Machine 1\r\nNOT on Machine 1\r\nNOT on Machine 1\r\n<\/pre><p><i>Note<\/i>: The <tt>~<\/tt> feature is new in R2009b and discussed as a new feature in one of Loren's previous blog <a href=\"https:\/\/blogs.mathworks.com\/loren\/2009\/09\/11\/matlab-release-2009b-best-new-feature-or\/\"> posts<\/a>.\r\n   <\/p>\r\n   <p><i>Doing Worker Specific Calculations<\/i><\/p>\r\n   <p>I would suggest using the new <tt>spmd<\/tt> functionality to do worker specific calculations. For more information about <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2009b\/toolbox\/distcomp\/index.html?\/access\/helpdesk\/help\/releases\/R2009b\/toolbox\/distcomp\/spmd.html\"><tt>spmd<\/tt><\/a>, check out the documentation.\r\n   <\/p>\r\n   <p>Clean up<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">matlabpool <span style=\"color: #A020F0\">close<\/span><\/pre><pre style=\"font-style:oblique\">Sending a stop signal to all the labs ... stopped.\r\n<\/pre><h3>Your examples<a name=\"28\"><\/a><\/h3>\r\n   <p>Tell me about some of the ways you have used <tt>parfor<\/tt>-loops or feel free to post questions regarding non-performance related issues that haven't been addressed here.  
Post your\r\n      questions and thoughts <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=199#respond\">here<\/a>.\r\n   <\/p><script language=\"JavaScript\">\r\n<!--\r\n\r\n    function grabCode_21027c06f45d49c38cb91403012a36e2() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='21027c06f45d49c38cb91403012a36e2 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 21027c06f45d49c38cb91403012a36e2';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        author = 'Loren Shure';\r\n        copyright = 'Copyright 2009 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n      \r\n      d.title = title + ' (MATLAB code)';\r\n      d.close();\r\n      }   \r\n      \r\n-->\r\n<\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_21027c06f45d49c38cb91403012a36e2()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n            the MATLAB code \r\n            <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; 7.9<br><\/p>\r\n<\/div>\r\n<!--\r\n21027c06f45d49c38cb91403012a36e2 ##### SOURCE BEGIN #####\r\n%% Using |parfor| Loops: Getting Up and Running\r\n% Today I'd like to introduce a guest blogger, \r\n% <mailto:sarah.zaranek@mathworks.com Sarah Wait Zaranek>, who is an\r\n% application engineer here at The MathWorks. Sarah previously has <https:\/\/blogs.mathworks.com\/loren\/2008\/06\/25\/speeding-up-matlab-applications\/ written>\r\n% about speeding up code from a customer to get acceptable performance.\r\n% She again will be writing about speeding up MATLAB applications, but this\r\n% time her focus will be on using the parallel computing tools. 
\r\n\r\n%% Introduction\r\n% I wanted to write a post to help users better understand our parallel\r\n% computing tools. In this post, I will focus on one of the more\r\n% commonly used functions in these tools: the |parfor|-loop. \r\n%\r\n% This post will focus on getting a parallel code using |parfor| up and \r\n% running. Performance will not be addressed in this post. I will assume\r\n% that the reader has a basic knowledge of the |parfor|-loop construct.\r\n% Loren has a very nice introduction to using |parfor| in one of her\r\n% previous <https:\/\/blogs.mathworks.com\/loren\/2007\/10\/03\/parfor-the-course\/ posts>.\r\n% There are also some nice introductory\r\n% <https:\/\/www.mathworks.com\/products\/parallel-computing\/demos.html videos>.\r\n%\r\n% _Note for clarity_ : Since Loren's introductory post, the toolbox used\r\n% for parallel computing has changed names from the Distributed Computing\r\n% Toolbox to the <https:\/\/www.mathworks.com\/products\/parallel-computing\/ Parallel Computing Toolbox>.  \r\n% These are not two separate toolboxes.\r\n\r\n%% Method\r\n% In some cases, you may only need to change a |for|-loop to a |parfor|-loop\r\n% to get their code running in parallel. However, in other cases you \r\n% may need to slightly alter the code so that |parfor| can work. I decided to show a\r\n% few examples highlighting the main challenges that one might encounter.  I\r\n% have separated these examples into four encompassing categories:\r\n%% \r\n% * Independence\r\n% * Globals and Transparency\r\n% * Classification\r\n% * Uniqueness\r\n\r\n%% Background on |parfor|-loops\r\n% In a |parfor|-loop (just like in a standard |for|-loop) a series of\r\n% statements known as the loop body are iterated over a range of values.\r\n% However, when using a |parfor|-loop the iterations are run not on the\r\n% client MATLAB machine but are run in parallel on MATLAB workers.\r\n%\r\n% Each worker has its own unique workspace.  
So, the data needed to do\r\n% these calculations is sent from the client to workers, and the results\r\n% are sent back to the client and pieced together. The cool thing about\r\n% |parfor| is this data transfer is handled for the user. When MATLAB gets to\r\n% the |parfor|-loop, it statically analyzes the body of the |parfor|-loop and\r\n% determines what information goes to which worker and what variables will\r\n% be returning to the client MATLAB. Understanding this concept will become\r\n% important when understanding why particular constraints are placed on\r\n% the use of |parfor|. \r\n\r\n%% Opening the matlabpool\r\n% Before looking at some examples, I will open up a matlabpool\r\n% so I can run my loops in parallel.  I will be opening up the matlabpool\r\n% using my default local configuration (i.e. my workers will be running on\r\n% the dual-core laptop machine where my MATLAB has been installed).\r\n%%\r\n\r\nif matlabpool('size') == 0 % checking to see if my pool is already open\r\n    matlabpool open 2\r\nend\r\n\r\n%%\r\n% _Note_ : The |'size'| option was new in R2008b.\r\n\r\n%% Independence\r\n% The |parfor|-loop is designed for task-parallel types of problems where\r\n% each iteration of the loop is independent of each other iteration.  \r\n% This is a critical requirement for using a |parfor|-loop.\r\n% Let's see an example of when each iteration is not independent.\r\n%\r\ntype dependentLoop.m\r\n%%\r\n% Checking the above code using M-Lint (MATLAB's static code analyzer) gives\r\n% a warning message that these iterations are dependent\r\n% and will not work with the |parfor| construct. M-Lint can either\r\n% be accessed via the editor or command line. 
In this case, I use the command\r\n% line and have defined a simple function |displayMlint| so that the display is\r\n% compact.\r\n%\r\noutput = mlint('dependentLoop.m');\r\ndisplayMlint(output)\r\n\r\n%%\r\n% Sometimes loops are intrinsically or unavoidably dependent, and therefore\r\n% |parfor| is not a good fit for that type of calculation. However, in some\r\n% cases it is possible to reformulate the body of the loop to eliminate the\r\n% dependency or separate it from the main time-consuming calculation.  \r\n%\r\n%% Globals and Transparency\r\n% All variables within the body of a |parfor|-loop must be transparent. This\r\n% means that all references to variables must occur in the text of the\r\n% program. Since MATLAB is statically analyzing the loops to figure out\r\n% what data goes to what worker and what data comes back, this seems like\r\n% an understandable restriction.\r\n%\r\n% Therefore, the following commands cannot be used within the body of a\r\n% |parfor|-loop : |evalc|, |eval|, |evalin|, and |assignin|. |load| can\r\n% also not be used unless the output of load is\r\n% <https:\/\/www.mathworks.com\/help\/releases\/R2009b\/techdoc\/ref\/load.html assigned> \r\n% to a variable name. It is possible to use the above functions\r\n% within a function called by |parfor|, due to the fact that the function\r\n% has its own workspace. I have found that this is often the easiest workaround for the\r\n% transparency issue. \r\n%\r\n% Additionally, you cannot define global variables or persistent variables within the body of the\r\n% |parfor| loop. I would also suggest being careful with the use of globals \r\n% since changes in global values on workers are not automatically reflected in local\r\n% global values. 
\r\n\r\n%% Classification\r\n%\r\n% A detailed description of the \r\n% <https:\/\/www.mathworks.com\/help\/releases\/R2009b\/toolbox\/distcomp\/index.html?\/access\/helpdesk\/help\/releases\/R2009b\/toolbox\/distcomp\/brdqtjj-1.html classification> \r\n% of variables in a |parfor|-loop is in the documentation.\r\n% I think it is useful to view classification as representing the different ways a\r\n% variable is passed between client and worker and the different ways it is\r\n% used within the body of the |parfor|-loop. \r\n%\r\n%%\r\n% _Challenges with Classification_\r\n%\r\n% Often challenges arise when first converting |for|-loops to |parfor|-loops\r\n% due to issues with this classification.  An often seen issue is the\r\n% conversion of nested |for|-loops, where sliced variables are not indexed\r\n% appropriately.   \r\n%\r\n% Sliced variables are variables where each worker is\r\n% calculating on a different part of that variable. Therefore, sliced\r\n% variables are sliced or divided amongst the workers. Sliced variables are\r\n% used to prevent unneeded data transfer from client to worker. \r\n% \r\n%%\r\n% _Using |parfor| with Nested |for|-Loops_\r\n%\r\n% The loop below is nested and encounters some of the restrictions placed\r\n% on |parfor| for sliced variables. \r\n\r\ntype parforNestTry.m\r\n\r\n%%\r\noutput = mlint('parforNestTry.m');\r\ndisplayMlint(output);\r\n\r\n%%\r\n% In this case, |A1| is a sliced variable. For sliced variables, the\r\n% restrictions are placed on the first-level variable indices.\r\n% This allows |parfor| to easily distribute the right part of the variable\r\n% to the right workers. \r\n%\r\n% The first level indexing ,in general, refers to indexing within the first set \r\n% of parenthesis or braces. 
This is \r\n% <https:\/\/www.mathworks.com\/help\/releases\/R2009b\/toolbox\/distcomp\/index.html?\/access\/helpdesk\/help\/releases\/R2009b\/toolbox\/distcomp\/brdqtjj-1.html explained>  \r\n% in more detail in the same section as classification in the documentation. \r\n%\r\n% One of these first-level indices must be the loop counter variable or the\r\n% counter variable plus or minus a constant.  _Every other first-level \r\n% index must be a constant, a non-loop counter variable, a colon, or an |end|._ \r\n\r\n%%\r\n% In this case, |A1| has an loop counter variable for both\r\n% first level indices (|ix| and |jx|).\r\n%\r\n% The solution to this is make sure a loop counter variable is only one of the indices\r\n% of |A1| and make the other index a colon. To implement this, the\r\n% results of the inner loop can be saved to a new variable and then that variable\r\n% can be saved to the desired variable outside the nested loop.\r\n%%\r\n\r\nA2 = zeros(10,10);\r\n\r\nparfor ix = 1:10\r\n    myTemp = zeros(1,10); \r\n    for jx = 1:10\r\n        myTemp(jx) = ix + jx;\r\n    end\r\n    A2(ix,:) = myTemp; \r\nend\r\n\r\n%%\r\n% You can also solve this issue by using cells. Since |jx| is now in the second\r\n% level of indexing, it can be an loop counter variable. \r\n\r\nA3 = cell(10,1); \r\n\r\nparfor ix = 1:10\r\n    for jx = 1:10\r\n        A3{ix}(jx) = ix + jx;\r\n    end\r\nend\r\n\r\nA3 = cell2mat(A3);\r\n\r\n%%  \r\n% I have found that both solutions have their benefits. While cells may be\r\n% easier to implement in your code, they also result in |A3| using more\r\n% memory due to the additional <https:\/\/www.mathworks.com\/help\/releases\/R2009b\/techdoc\/matlab_prog\/brh72ex-2.html#brh72ex-14 memory> \r\n% requirements for cells. The call to |cell2mat| also adds additional\r\n% processing time. \r\n%\r\n% A similar technique can be used for several levels of nested |for|-loops.  
\r\n\r\n%% Uniqueness\r\n% \r\n% _Doing Machine Specific Calculations_\r\n%\r\n% This is a way, while using |parfor|-loops, to determine which machine you\r\n% are on and do machine specific instructions within the loop. An example\r\n% of why you would want to do this is if different machines have data files\r\n% in different directories, and you wanted to make sure to get into the\r\n% right directory. Do be careful if you make the code machine-specific\r\n% since it will be harder to port.\r\n\r\n%% \r\n\r\n% Getting the machine host name\r\n\r\n[~,hostname] = system('hostname');\r\n\r\n% If the loop iterations are the same as the size of matlabpool, the\r\n% command is run once per worker. \r\n\r\nparfor ix = 1:matlabpool('size')\r\n    [~,hostnameID{ix}] = system('hostname');\r\nend   \r\n\r\n% Can then do host\/machine specific commands\r\nhostnames = unique(hostnameID);\r\ncheckhost = hostnames(1);\r\n\r\nparfor ix = 1:matlabpool('size')\r\n    [~,myhost] = system('hostname');\r\n    if strcmp(myhost,checkhost)\r\n       display('On Machine 1')\r\n    else\r\n        display('NOT on Machine 1')\r\n    end\r\nend\r\n\r\n%%\r\n% In my case since I am running locally REPLACE_WITH_DASH_DASH all of the workers are on the\r\n% same machine.\r\n%%\r\n% Here's the same code running on a non-local cluster.\r\nmatlabpool close\r\nmatlabpool open speedy\r\nparfor ix = 1:matlabpool('size')\r\n    [~,hostnameID{ix}] = system('hostname');\r\nend   \r\n\r\n% Can then do host\/machine specific commands\r\nhostnames = unique(hostnameID);\r\ncheckhost = hostnames(1);\r\n\r\nparfor ix = 1:matlabpool('size')\r\n    [~,myhost] = system('hostname');\r\n    if strcmp(myhost,checkhost)\r\n       display('On Machine 1')\r\n    else\r\n        display('NOT on Machine 1')\r\n    end\r\nend\r\n%%\r\n% _Note_: The |~| feature is new in R2009b and discussed as a new feature\r\n% in one of Loren's previous blog 
<https:\/\/blogs.mathworks.com\/loren\/2009\/09\/11\/matlab-release-2009b-best-new-feature-or\/  posts>. \r\n\r\n%%\r\n% _Doing Worker Specific Calculations_\r\n%\r\n% I would suggest using the new |spmd| functionality to do worker specific \r\n% calculations. For more\r\n% information about <https:\/\/www.mathworks.com\/help\/releases\/R2009b\/toolbox\/distcomp\/index.html?\/access\/helpdesk\/help\/releases\/R2009b\/toolbox\/distcomp\/spmd.html |spmd|>, \r\n% check out the documentation.  \r\n%%\r\n% Clean up\r\nmatlabpool close\r\n\r\n\r\n%% Your examples\r\n% Tell me about some of the ways you have used |parfor|-loops or feel free to\r\n% post questions regarding non-performance related issues that haven't been\r\n% addressed here.  Post your questions and thoughts\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=199#respond here>.\r\n\r\n\r\n\r\n\r\n##### SOURCE END ##### 21027c06f45d49c38cb91403012a36e2\r\n-->","protected":false},"excerpt":{"rendered":"<p> Note.\r\n\r\n\r\n   \r\n      \r\nNOTE: matlabpool was removed in 2015 and you should replace that with \r\nparpool\r\n instead.\r\n      \r\n      Today I&#8217;d like to introduce a guest blogger, Sarah Wait... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2009\/10\/02\/using-parfor-loops-getting-up-and-running\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[16,34],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/199"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=199"}],"version-history":[{"count":5,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/199\/revisions"}],"predecessor-version":[{"id":4276,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/199\/revisions\/4276"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=199"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=199"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=199"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}