{"id":109,"date":"2007-10-03T11:22:43","date_gmt":"2007-10-03T16:22:43","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/2007\/10\/03\/parfor-the-course\/"},"modified":"2016-11-10T20:35:16","modified_gmt":"2016-11-11T01:35:16","slug":"parfor-the-course","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2007\/10\/03\/parfor-the-course\/","title":{"rendered":"parfor the Course"},"content":{"rendered":"<div class=\"content\">\n<p>Starting with release <a href=\"https:\/\/www.mathworks.com\/products\/new_products\/latest_features.html\">R2007b<\/a>, there are multiple ways to take advantage of newer hardware in MATLAB. In MATLAB alone, you can benefit from using multithreading,<br \/>\ndepending on what kind of calculations you do. If you have access to <a href=\"https:\/\/www.mathworks.com\/products\/distribtb\/index.html?s_cid=HP_FP_ML_DistributedComputingToolbox\">Distributed Computing Toolbox<\/a>, you have an additional set of possibilities.<\/p>\n<p>&nbsp;<\/p>\n<h3>Contents<\/h3>\n<div>\n<ul>\n<li><a href=\"#1\">Problem Set Up<\/a><\/li>\n<li><a href=\"#5\">Parallel Version<\/a><\/li>\n<li><a href=\"#6\">Run Parallel Algorithm and Compare<\/a><\/li>\n<li><a href=\"#7\">Run in Parallel Locally<\/a><\/li>\n<li><a href=\"#8\">Comparison<\/a><\/li>\n<li><a href=\"#11\">Local Workers<\/a><\/li>\n<li><a href=\"#12\">parfor and matlabpool<\/a><\/li>\n<li><a href=\"#13\">Do You Have Access to a Cluster?<\/a><\/li>\n<\/ul>\n<\/div>\n<h3>Problem Set Up<a name=\"1\"><\/a><\/h3>\n<p>Let's compute the rank of magic square matrices of various sizes. 
Each of these rank computations is independent of the others.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">n = 400;\r\nranksSingle = zeros(1,n);<\/pre>\n<p>Because I want to compare some speeds, and I have a dual-core laptop, for now I will run on a single processor by using the<br \/>\nnew function <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/maxnumcompthreads.html\"><tt>maxNumCompThreads<\/tt><\/a>.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">maxNumCompThreads(1);\r\ntic\r\n<span style=\"color: #0000ff;\">for<\/span> ind = 1:n\r\n    ranksSingle(ind) = rank(magic(ind));\r\n<span style=\"color: #0000ff;\">end<\/span>\r\ntoc\r\nplot(1:n,ranksSingle, <span style=\"color: #a020f0;\">'b-o'<\/span>, 1:n, 1:n, <span style=\"color: #a020f0;\">'m--'<\/span>)<\/pre>\n<pre style=\"font-style: oblique;\">Elapsed time is 22.641646 seconds.\r\n<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/109\/parforTheCourse_01.png\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<p>Zooming in, you can see a pattern with the odd-order magic squares having full rank.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">axis([250 280 0 280])<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/109\/parforTheCourse_02.png\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<p>Since each of the rank calculations is independent of the others, we could have distributed these calculations to lots of<br \/>\nprocessors all at once.<\/p>\n<h3>Parallel Version<a name=\"5\"><\/a><\/h3>\n<p>With Distributed Computing Toolbox, you can use up to 4 local workers to prototype a parallel algorithm. Here's the<br \/>\nalgorithm for the rank calculations. <tt>parMagic<\/tt> uses <a href=\"https:\/\/www.mathworks.com\/help\/distcomp\/parfor.html\"><tt>parfor<\/tt><\/a>, a new construct for executing independent passes through a loop. 
It is part of the MATLAB language, but behaves essentially<br \/>\nlike a regular for loop if you do not have access to Distributed Computing Toolbox.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">dbtype <span style=\"color: #a020f0;\">parMagic<\/span><\/pre>\n<pre style=\"font-style: oblique;\">1     function ranks = parMagic(n)\r\n2     \r\n3     ranks = zeros(1,n);\r\n4     parfor (ind = 1:n)\r\n5         ranks(ind) = rank(magic(ind));  % last index could be ind,not n-ind+1\r\n6     end\r\n<\/pre>\n<h3>Run Parallel Algorithm and Compare<a name=\"6\"><\/a><\/h3>\n<p>Let's run the parallel version of the algorithm from <tt>parMagic<\/tt> still using a single process and compare results with the original for loop version.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">tic\r\n   ranksSingle2 = parMagic(n);\r\ntoc\r\nisequal(ranksSingle, ranksSingle2)<\/pre>\n<pre style=\"font-style: oblique;\">Elapsed time is 22.733663 seconds.\r\nans =\r\n     1\r\n<\/pre>\n<h3>Run in Parallel Locally<a name=\"7\"><\/a><\/h3>\n<p>Now let's take advantage of the two cores in my laptop, by creating a pool of workers on which to do the calculations using<br \/>\nthe <tt>matlabpool<\/tt> command.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">matlabpool <span style=\"color: #a020f0;\">local<\/span> <span style=\"color: #a020f0;\">2<\/span>\r\ntic\r\n   ranksPar = parMagic(n);\r\ntoc<\/pre>\n<pre style=\"font-style: oblique;\">To learn more about the capabilities and limitations of matlabpool, distributed\r\narrays, and associated parallel algorithms, use   doc matlabpool\r\n\r\nWe are very interested in your feedback regarding these capabilities.\r\nPlease send it to parallel_feedback@mathworks.com.\r\n\r\nSubmitted parallel job to the scheduler, waiting for it to start.\r\nConnected to a matlabpool session with 2 labs.\r\nElapsed time is 13.836088 
seconds.\r\n<\/pre>\n<h3>Comparison<a name=\"8\"><\/a><\/h3>\n<p>Did we get the same answer?<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">isequal(ranksSingle, ranksPar)<\/pre>\n<pre style=\"font-style: oblique;\">ans =\r\n     1\r\n<\/pre>\n<p>In fact, we did! And the wall clock time improved pretty decently as well, from about 22.6 seconds to about 13.8 seconds, though not by a full factor of 2.<\/p>\n<p>Let me close the <tt>matlabpool<\/tt> to finish off the example.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">matlabpool <span style=\"color: #a020f0;\">close<\/span><\/pre>\n<pre style=\"font-style: oblique;\">Sending a stop signal to all the labs...\r\nWaiting for parallel job to finish...\r\nPerforming parallel job cleanup...\r\nDone.\r\n<\/pre>\n<h3>Local Workers<a name=\"11\"><\/a><\/h3>\n<p>With Distributed Computing Toolbox, I can use up to 4 local workers. So why did I choose to use just 2? Because on a dual-core<br \/>\nmachine, using more workers than cores just doesn't make lots of sense. However, running without a pool, then with a pool of size 1, then 2,<br \/>\nand maybe 4 helps me ensure that my algorithm is ready to run in parallel, perhaps on a larger cluster next. Running on a cluster additionally<br \/>\nrequires MATLAB Distributed Computing Engine.<\/p>\n<h3>parfor and matlabpool<a name=\"12\"><\/a><\/h3>\n<p><tt>matlabpool<\/tt> started 2 local MATLAB workers in the background. <tt>parfor<\/tt> in the current \"regular\" MATLAB session decided how to divide the <tt>parfor<\/tt> range among the 2 <tt>matlabpool<\/tt> workers as the workers performed the calculations. To learn a bit more about the constraints on code that works in a <tt>parfor<\/tt> loop, I recommend you read the portion of the documentation on variable classifications.<\/p>\n<h3>Do You Have Access to a Cluster?<a name=\"13\"><\/a><\/h3>\n<p>I wonder if you have access to a cluster. 
Can you see places in your code that could take advantage of some parallelism if<br \/>\nyou had access to the right hardware? Let me know <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=109#respond\">here<\/a>.<\/p>\n<p><script>\/\/ <![CDATA[\nfunction grabCode_39cfeac328914b5ebeff1a19027de726() {\n        \/\/ Remember the title so we can use it in the new page\n        title = document.title;\n\n        \/\/ Break up these strings so that their presence\n        \/\/ in the Javascript doesn't mess up the search for\n        \/\/ the MATLAB code.\n        t1='39cfeac328914b5ebeff1a19027de726 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 39cfeac328914b5ebeff1a19027de726';\n    \n        b=document.getElementsByTagName('body')[0];\n        i1=b.innerHTML.indexOf(t1)+t1.length;\n        i2=b.innerHTML.indexOf(t2);\n \n        code_string = b.innerHTML.substring(i1, i2);\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\n\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \n        \/\/ in the XML parser.\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\n        \/\/ doesn't go ahead and substitute the less-than character. 
\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\n\n        author = 'Loren Shure';\n        copyright = 'Copyright 2007 The MathWorks, Inc.';\n\n        w = window.open();\n        d = w.document;\n        d.write('\n\n\n\n\n\n<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\n\n\n\n\n\n\n\\n');\n      \n      d.title = title + ' (MATLAB code)';\n      d.close();\n      }\n\/\/ ]]><\/script><\/p>\n<p style=\"text-align: right; font-size: xx-small; font-weight: lighter; font-style: italic; color: gray;\"><a><span style=\"font-size: x-small; font-style: italic;\">Get<br \/>\nthe MATLAB code<br \/>\n<noscript>(requires JavaScript)<\/noscript><\/span><\/a><\/p>\n<p>Published with MATLAB\u00ae 7.5<\/p>\n<\/div>\n<p><!--\n39cfeac328914b5ebeff1a19027de726 ##### SOURCE BEGIN #####\n%% parfor the Course\n% Starting with release <https:\/\/www.mathworks.com\/products\/new_products\/latest_features.html R2007b>,\n% there are multiple ways to take advantage of newer hardware in MATLAB.\n%  In MATLAB alone, you can benefit from using multithreading, depending on\n%  what kind of calculations you do.  If you have access to\n% <https:\/\/www.mathworks.com\/products\/distribtb\/index.html?s_cid=HP_FP_ML_DistributedComputingToolbox Distributed Computing Toolbox>,\n% you have an additional set of possibilities.\n%% Problem Set Up\n% Let's compute the rank of magic square matrices of various sizes.  
Each\n% of these rank computations is independent of the others.\nn = 400;\nranksSingle = zeros(1,n);\n%%\n% Because I want to compare some speeds, and I have a dual-core laptop,\n% for now I will run on a single processor by using the new function\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/maxnumcompthreads.html |maxNumCompThreads|>.\nmaxNumCompThreads(1);\ntic\nfor ind = 1:n\n    ranksSingle(ind) = rank(magic(ind));\nend\ntoc\nplot(1:n,ranksSingle, 'b-o', 1:n, 1:n, 'mREPLACE_WITH_DASH_DASH')\n%%\n% Zooming in, you can see a pattern with the odd-order magic squares\n% having full rank.\naxis([250 280 0 280])\n%%\n% Since each of the rank calculations is independent of the others, we\n% could have distributed these calculations to lots of processors\n% all at once.\n%% Parallel Version\n% With Distributed Computing Toolbox, you can use up to 4 local workers to\n% prototype a parallel\n% algorithm.  Here's the algorithm for the rank calculations.\n% |parMagic| uses\n% <https:\/\/www.mathworks.com\/help\/distcomp\/parfor.html |parfor|>,\n% a new construct for executing independent passes through a loop.  
It is\n% part of the MATLAB language, but behaves essentially like a regular for\n% loop if you do not have access to Distributed Computing Toolbox.\ndbtype parMagic\n%% Run Parallel Algorithm and Compare\n% Let's run the parallel version of the algorithm from |parMagic| still\n% using a single process and compare results with the original for loop\n% version.\ntic\nranksSingle2 = parMagic(n);\ntoc\nisequal(ranksSingle, ranksSingle2)\n%% Run in Parallel Locally\n% Now let's take advantage of the two cores in my laptop, by creating a\n% pool of workers on which to do the calculations using the\n% <https:\/\/www.mathworks.com\/access\/helpdesk\/help\/toolbox\/distcomp\/matlabpool.html |matlabpool|>\n% command.\nmatlabpool local 2\ntic\nranksPar = parMagic(n);\ntoc\n%% Comparison\n% Did we get the same answer?\nisequal(ranksSingle, ranksPar)\n%%\n% In fact, we did!  And the wall clock time improved pretty decently as\n% well, though not by a full factor of 2.\n%%\n% Let me close the |matlabpool| to finish off the example.\nmatlabpool close\n%% Local Workers\n% With Distributed Computing Toolbox, I can use up to 4 local workers.  So\n% why did I choose to use just 2?  Because on a dual-core machine, using\n% more workers than cores just doesn't make lots of sense.  However,\n% running without a pool, then with a pool of size 1, then 2, and maybe 4\n% helps me ensure that my algorithm is ready to run in parallel, perhaps\n% on a larger cluster next.  Running on a cluster additionally requires\n% <https:\/\/www.mathworks.com\/access\/helpdesk\/help\/toolbox\/mdce\/mdce_product_page.html MATLAB Distributed Computing Engine>.\n%% parfor and matlabpool\n% |matlabpool| started 2 local MATLAB workers in the background.  |parfor|\n% in the current \"regular\" MATLAB session\n% decided how to divide the |parfor| range among the 2 |matlabpool| workers\n% as the workers performed the calculations.  
To learn a bit more about the\n% constraints of code that works in a |parfor| loop, I recommend you read\n% the portion of documentation on\n% <https:\/\/www.mathworks.com\/access\/helpdesk\/help\/toolbox\/distcomp\/brdqtjj-1.html#brdsfkm-1 variable classifications>.\n%% Do You Have Access to a Cluster?\n% I wonder if you have access to a cluster.  Can you see places in your code\n% that could take advantage of some parallelism if you had access to the\n% right hardware?  Let me know\n% <https:\/\/blogs.mathworks.com\/loren\/?p=109#respond here>.\n\n##### SOURCE END ##### 39cfeac328914b5ebeff1a19027de726\n--><\/p>\n","protected":false},"excerpt":{"rendered":"<p>\nStarting with release R2007b, there are multiple ways to take advantage of newer hardware in MATLAB. In MATLAB alone, you can benefit from using multithreading,<br \/>\ndepending on what kind of... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2007\/10\/03\/parfor-the-course\/\">read more >><\/a><\/p>\n","protected":false},"author":39,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[6],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/109"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=109"}],"version-history":[{"count":3,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/109\/revisions"}],"predecessor-version":[{"id":2111,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/109\/revisions\/2111"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2
\/media?parent=109"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=109"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=109"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}