{"id":105,"date":"2010-10-17T11:00:48","date_gmt":"2010-10-17T11:00:48","guid":{"rendered":"https:\/\/blogs.mathworks.com\/seth\/2010\/10\/17\/parallel-computing-with-simulink-running-thousands-of-simulations\/"},"modified":"2017-12-21T17:01:55","modified_gmt":"2017-12-21T22:01:55","slug":"parallel-computing-with-simulink-running-thousands-of-simulations","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/simulink\/2010\/10\/17\/parallel-computing-with-simulink-running-thousands-of-simulations\/","title":{"rendered":"Parallel Computing with Simulink: Running Thousands of Simulations!"},"content":{"rendered":"<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n\r\n<p><strong><em>Update: In MATLAB R2017a the <a href=\"https:\/\/www.mathworks.com\/help\/simulink\/slref\/parsim.html\">PARSIM<\/a> function was introduced. For a better experience simulating models in parallel, we recommend using PARSIM instead of SIM inside parfor. See the more recent blog post <a href=\"https:\/\/blogs.mathworks.com\/simulink\/2017\/04\/14\/simulating-models-in-parallel-made-easy-with-parsim\">Simulating models in parallel made easy with parsim<\/a> for more details.<\/em><\/strong><\/p>\r\n\r\n<p>-------------------<\/p>\r\n\r\n   <introduction>\r\n      <p>If you have to run thousands of simulations, you will probably want to do it as quickly as possible.  While in some cases\r\n         this means looking for more efficient modeling techniques, or buying a more powerful computer, in this post I want to show\r\n         how to run thousands of simulations in parallel across all available MATLAB workers.  
Specifically, this post\r\n         is about how to use the <a href=\"https:\/\/www.mathworks.com\/products\/parallel-computing\/\">Parallel Computing Toolbox<\/a>, and <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2010b\/techdoc\/ref\/parfor.html\"><tt>PARFOR<\/tt><\/a> with Simulink.\r\n      <\/p>\r\n   <\/introduction>\r\n   <h3>Contents<\/h3>\r\n   <div>\r\n      <ul>\r\n         <li><a href=\"#1\">Embarrassingly Parallel Problems<\/a><\/li>\r\n         <li><a href=\"#2\">Computing the Basin of Attraction<\/a><\/li>\r\n         <li><a href=\"#6\">Forced Damped Pendulum Model<\/a><\/li>\r\n         <li><a href=\"#8\">Nested FOR loops<\/a><\/li>\r\n         <li><a href=\"#11\">Converting to a PARFOR loop<\/a><\/li>\r\n         <li><a href=\"#15\">Local MATLAB Workers<\/a><\/li>\r\n         <li><a href=\"#19\">Connecting to a Cluster<\/a><\/li>\r\n         <li><a href=\"#22\">Timing Summary<\/a><\/li>\r\n         <li><a href=\"#23\">A Quarter Million Simulations<\/a><\/li>\r\n         <li><a href=\"#25\">Performance Increase<\/a><\/li>\r\n         <li><a href=\"#26\">Now it's your turn<\/a><\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <h3>Embarrassingly Parallel Problems<a name=\"1\"><\/a><\/h3>\r\n   <p><tt>PARFOR<\/tt> is the parallel for-loop construct in MATLAB.  Using the Parallel Computing Toolbox, you can start a local pool of MATLAB\r\n      Workers, or connect to a cluster running the <a href=\"https:\/\/www.mathworks.com\/products\/distriben\/index.html\">MATLAB Distributed Computing Server<\/a>. Once connected, your <tt>PARFOR<\/tt> loops automatically switch from serial to parallel execution.\r\n   <\/p>\r\n   <p><tt>PARFOR<\/tt> enables parallelization of what are often referred to as <i>embarrassingly parallel problems<\/i>.  Each iteration of the loop must be able to run independently of every other iteration of the loop.  There can be no dependency\r\n      of the variables in one iteration on the results of a previous iteration.  
Some examples of this in Simulink are problems where you\r\n      need to perform a parameter sweep, or running simulations of a large number of models just to generate the data output for\r\n      later analysis.\r\n   <\/p>\r\n   <h3>Computing the Basin of Attraction<a name=\"2\"><\/a><\/h3>\r\n   <p>To illustrate an embarrassingly parallel problem, I will look at computing the basin of attraction for a simple differential equation.\r\n       (Thanks to my colleague Ned for giving me this example.)\r\n   <\/p>\r\n   <p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_eq50521.png\"> <\/p>\r\n   <p>The equation is from an article about <a href=\"http:\/\/www.scholarpedia.org\/article\/Basin_of_attraction\">basins of attraction on Scholarpedia<\/a>.  When the system is integrated forward, the <img decoding=\"async\"  src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_eq49435.png\">  state will approach one of two attractors.  Which attractor it approaches depends on the initial conditions of <img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_eq49435.png\">  and its derivative, <img decoding=\"async\"  src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_eq02930.png\"> .  The Scholarpedia article shows the following image as the fractal basin of attraction for this equation.\r\n   <\/p>\r\n   <p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/300px-Ott_fig3.gif\"> <\/p>\r\n   <p>(picture made by H.E. Nusse)<\/p>\r\n   <p>Each of the thousands of pixels in that plot is the result of a simulation.  
So, to make that plot you need to run thousands\r\n      of simulations.\r\n   <\/p>\r\n   <h3>Forced Damped Pendulum Model<a name=\"6\"><\/a><\/h3>\r\n   <p>Here is my version of that differential equation in Simulink.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">open_system(<span style=\"color: #A020F0\">'ForcedDampedPendulum'<\/span>)<\/pre>\r\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_Model.png\"> <\/p>\r\n\r\n<p>To see how <img decoding=\"async\"  src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_eq49435.png\">  evolves over time, here is a plot of 100 seconds of simulation.  (Notice, I'm using the <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2010b\/toolbox\/simulink\/rn\/br28su4.html\">single output SIM command<\/a>.)\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">simOut = sim(<span style=\"color: #A020F0\">'ForcedDampedPendulum'<\/span>,<span style=\"color: #A020F0\">'StopTime'<\/span>,<span style=\"color: #A020F0\">'100'<\/span>);\r\ny = simOut.get(<span style=\"color: #A020F0\">'yout'<\/span>);\r\nt = simOut.get(<span style=\"color: #A020F0\">'tout'<\/span>);\r\nplot(t,y)\r\nhold <span style=\"color: #A020F0\">on<\/span><\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_01.png\"> <h3>Nested FOR loops<a name=\"8\"><\/a><\/h3>\r\n   <p>Using nested <tt>for<\/tt> loops, we can sweep over a range of values for <img decoding=\"async\"  src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_eq49435.png\">  and <img decoding=\"async\"  src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_eq02930.png\"> .  
If we add that to the plot above, you can visualize the attractors at <img decoding=\"async\"  src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_eq47890.png\"> . I can use a grayscale <tt>colorbar<\/tt> to indicate the final value of <img decoding=\"async\"  src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_eq49435.png\">  at the end of the simulation to make a basin of attraction image.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">n = 5;\r\nthetaRange = linspace(-pi,pi,n);\r\nthetadotRange = linspace(-5,5,n);\r\ntic;\r\n<span style=\"color: #0000FF\">for<\/span> i = 1:n\r\n    <span style=\"color: #0000FF\">for<\/span> j = 1:n\r\n        thetaX0 = thetaRange(i);\r\n        thetadotX0 = thetadotRange(j);\r\n        simOut = sim(<span style=\"color: #A020F0\">'ForcedDampedPendulum'<\/span>,<span style=\"color: #A020F0\">'StopTime'<\/span>,<span style=\"color: #A020F0\">'100'<\/span>);\r\n        y = simOut.get(<span style=\"color: #A020F0\">'yout'<\/span>);\r\n        t = simOut.get(<span style=\"color: #A020F0\">'tout'<\/span>);\r\n        plot(t,y)\r\n    <span style=\"color: #0000FF\">end<\/span>\r\n<span style=\"color: #0000FF\">end<\/span>\r\nt1 = toc;\r\ncolormap <span style=\"color: #A020F0\">gray<\/span>\r\ncolorbar<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_02.png\"> <p>I have added <tt>TIC<\/tt>\/<tt>TOC<\/tt> timing to the code to measure the time spent running these simulations, and from this we can figure out the time required\r\n      per simulation.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">t1PerSim = t1\/n^2;\r\ndisp([<span style=\"color: #A020F0\">'Time per simulation with nested FOR loops = '<\/span> num2str(t1PerSim)])<\/pre><pre style=\"font-style:oblique\">Time per simulation with nested FOR loops = 
0.11862\r\n<\/pre><p>During the loops, I iterate over each combination of values only once. This represents <img decoding=\"async\"  src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_eq26263.png\">  simulations that need to be run.  Because the simulations are all independent of each other, this is a perfect candidate\r\n      for distribution to parallel MATLABs.\r\n   <\/p>\r\n   <h3>Converting to a PARFOR loop<a name=\"11\"><\/a><\/h3>\r\n   <p><tt>PARFOR<\/tt> loops can not be nested, so the iteration across the initial conditions needs to be rewritten as a single loop.  For this\r\n      problem I can re-imagine the output image as a grid of pixels.  If I create a corresponding <tt>thetaX0<\/tt> and <tt>thetadotX0<\/tt> matrix, I will be able to loop over the elements in a PARFOR loop.  An easy way to do this is with <tt>MESHGRID<\/tt>.\r\n   <\/p>\r\n   <p>Another important change to the loop is using the <tt>ASSIGNIN<\/tt> function to set the initial values of the states.  
Within <tt>PARFOR<\/tt>, the <tt>'base'<\/tt> workspace refers to the base workspace of the MATLAB worker where the simulation is run.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">n = 20;\r\n[thetaMat,thetadotMat] = meshgrid(linspace(-pi,pi,n),linspace(-5,5,n));\r\nmap = zeros(n,n);\r\ntic;\r\n<span style=\"color: #0000FF\">parfor<\/span> i = 1:n^2\r\n    assignin(<span style=\"color: #A020F0\">'base'<\/span>,<span style=\"color: #A020F0\">'thetaX0'<\/span>,thetaMat(i));\r\n    assignin(<span style=\"color: #A020F0\">'base'<\/span>,<span style=\"color: #A020F0\">'thetadotX0'<\/span>,thetadotMat(i));\r\n    simOut = sim(<span style=\"color: #A020F0\">'ForcedDampedPendulum'<\/span>,<span style=\"color: #A020F0\">'StopTime'<\/span>,<span style=\"color: #A020F0\">'100'<\/span>);\r\n    y = simOut.get(<span style=\"color: #A020F0\">'yout'<\/span>);\r\n    map(i)=y(end);\r\n<span style=\"color: #0000FF\">end<\/span>\r\nt2 = toc;<\/pre><p>I can use a scaled image to display the basin of attraction. (Of course, this is very low resolution.)<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">clf\r\nimagesc([-pi,pi],[-5,5],map)\r\ncolormap(gray), axis <span style=\"color: #A020F0\">xy<\/span> <span style=\"color: #A020F0\">square<\/span><\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/b106_parforSim_03.png\"> <p>The <tt>PARFOR<\/tt> loop acts as a normal <tt>for<\/tt>-loop when there are no MATLAB workers.  
We can see this from the timing analysis.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">t2PerSim = t2\/n^2;\r\ndisp([<span style=\"color: #A020F0\">'Time per simulation with PARFOR loop = '<\/span> num2str(t2PerSim)])<\/pre><pre style=\"font-style:oblique\">Time per simulation with PARFOR loop = 0.13012\r\n<\/pre><h3>Local MATLAB Workers<a name=\"15\"><\/a><\/h3>\r\n   <p>Most computers today have multiple processors, or multi-core architectures.  To take advantage of all the cores on this computer,\r\n      I can use the <tt>MATLABPOOL<\/tt> command to start a local MATLAB pool of workers.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">matlabpool <span style=\"color: #A020F0\">open<\/span> <span style=\"color: #A020F0\">local<\/span>\r\nnWorkers3 = matlabpool(<span style=\"color: #A020F0\">'size'<\/span>);<\/pre><pre style=\"font-style:oblique\">Starting matlabpool using the 'local' configuration ... 
connected to 4 labs.\r\n<\/pre><p>In order to remove any first time effects from my timing analysis, I can run commands using <tt>pctRunOnAll<\/tt> to load the model and simulate it.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">pctRunOnAll(<span style=\"color: #A020F0\">'load_system(''ForcedDampedPendulum'')'<\/span>) <span style=\"color: #228B22\">% warm-up<\/span>\r\npctRunOnAll(<span style=\"color: #A020F0\">'sim(''ForcedDampedPendulum'')'<\/span>)         <span style=\"color: #228B22\">% warm-up<\/span><\/pre><p>Now, let's measure the <tt>PARFOR<\/tt> loop again, using the full power of all the cores on this computer.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">n = 20;\r\n[thetaMat,thetadotMat] = meshgrid(linspace(-pi,pi,n),linspace(-5,5,n));\r\nmap = zeros(n,n);\r\ntic;\r\n<span style=\"color: #0000FF\">parfor<\/span> i = 1:n^2\r\n    assignin(<span style=\"color: #A020F0\">'base'<\/span>,<span style=\"color: #A020F0\">'thetaX0'<\/span>,thetaMat(i));\r\n    assignin(<span style=\"color: #A020F0\">'base'<\/span>,<span style=\"color: #A020F0\">'thetadotX0'<\/span>,thetadotMat(i));\r\n    simOut = sim(<span style=\"color: #A020F0\">'ForcedDampedPendulum'<\/span>,<span style=\"color: #A020F0\">'StopTime'<\/span>,<span style=\"color: #A020F0\">'100'<\/span>);\r\n    y = simOut.get(<span style=\"color: #A020F0\">'yout'<\/span>);\r\n    map(i)=y(end);\r\n<span style=\"color: #0000FF\">end<\/span>\r\nt3=toc;\r\n\r\nt3PerSim = t3\/n^2;\r\ndisp([<span style=\"color: #A020F0\">'Time per simulation with PARFOR and '<\/span> num2str(nWorkers3) <span style=\"color: #0000FF\">...<\/span>\r\n    <span style=\"color: #A020F0\">' workers = '<\/span> num2str(t3PerSim)])\r\ndisp([<span style=\"color: #A020F0\">'Speed up factor = '<\/span> num2str(t1PerSim\/t3PerSim)])<\/pre><pre style=\"font-style:oblique\">Time per simulation with PARFOR and 4 workers = 0.025087\r\nSpeed up 
factor = 4.7285\r\n<\/pre><p>Now that I am done with the local MATLAB Workers, I can issue a command to close them.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">matlabpool <span style=\"color: #A020F0\">close<\/span><\/pre><pre style=\"font-style:oblique\">Sending a stop signal to all the labs ... stopped.\r\n<\/pre><h3>Connecting to a Cluster<a name=\"19\"><\/a><\/h3>\r\n   <p>I am fortunate enough to have access to a small distributed computing cluster, configured with MATLAB Distributed Computing\r\n      Server.  The cluster consists of four quad-core computers with 6 GB of memory.  It is configured to start one MATLAB worker\r\n      for each core, giving a 16-node cluster.  I can repeat my timing experiments here.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">matlabpool <span style=\"color: #A020F0\">open<\/span> <span style=\"color: #A020F0\">Cluster16<\/span>\r\nnWorkers4 = matlabpool(<span style=\"color: #A020F0\">'size'<\/span>);\r\npctRunOnAll(<span style=\"color: #A020F0\">'load_system(''ForcedDampedPendulum'')'<\/span>) <span style=\"color: #228B22\">% warm-up<\/span>\r\npctRunOnAll(<span style=\"color: #A020F0\">'sim(''ForcedDampedPendulum'')'<\/span>)         <span style=\"color: #228B22\">% warm-up<\/span><\/pre><pre style=\"font-style:oblique\">Starting matlabpool using the 'Cluster16' configuration ... 
connected to 16 labs.\r\n<\/pre><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">n = 20;\r\n[thetaMat,thetadotMat] = meshgrid(linspace(-pi,pi,n),linspace(-5,5,n));\r\nmap = zeros(n,n);\r\ntic;\r\n<span style=\"color: #0000FF\">parfor<\/span> i = 1:n^2\r\n    assignin(<span style=\"color: #A020F0\">'base'<\/span>,<span style=\"color: #A020F0\">'thetaX0'<\/span>,thetaMat(i));\r\n    assignin(<span style=\"color: #A020F0\">'base'<\/span>,<span style=\"color: #A020F0\">'thetadotX0'<\/span>,thetadotMat(i));\r\n    simOut = sim(<span style=\"color: #A020F0\">'ForcedDampedPendulum'<\/span>,<span style=\"color: #A020F0\">'StopTime'<\/span>,<span style=\"color: #A020F0\">'100'<\/span>);\r\n    y = simOut.get(<span style=\"color: #A020F0\">'yout'<\/span>);\r\n    map(i)=y(end);\r\n<span style=\"color: #0000FF\">end<\/span>\r\nt4 = toc;\r\n\r\nt4PerSim = t4\/n^2;\r\ndisp([<span style=\"color: #A020F0\">'Time per simulation with PARFOR and '<\/span> num2str(nWorkers4) <span style=\"color: #0000FF\">...<\/span>\r\n    <span style=\"color: #A020F0\">' workers = '<\/span> num2str(t4PerSim)])\r\ndisp([<span style=\"color: #A020F0\">'Speed up factor = '<\/span> num2str(t1PerSim\/t4PerSim)])<\/pre><pre style=\"font-style:oblique\">Time per simulation with PARFOR and 16 workers = 0.0064637\r\nSpeed up factor = 18.3521\r\n<\/pre><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">matlabpool <span style=\"color: #A020F0\">close<\/span><\/pre><pre style=\"font-style:oblique\">Sending a stop signal to all the labs ... 
stopped.\r\n<\/pre><h3>Timing Summary<a name=\"22\"><\/a><\/h3>\r\n   <p>Here is a summary of the timing measurements:<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">disp([<span style=\"color: #A020F0\">'Time per simulation with nested FOR loops = '<\/span> num2str(t1PerSim)])\r\ndisp([<span style=\"color: #A020F0\">'Time per simulation with PARFOR loop = '<\/span> num2str(t2PerSim)])\r\ndisp([<span style=\"color: #A020F0\">'Time per simulation with PARFOR and '<\/span> num2str(nWorkers3) <span style=\"color: #0000FF\">...<\/span>\r\n    <span style=\"color: #A020F0\">' workers = '<\/span> num2str(t3PerSim)])\r\ndisp([<span style=\"color: #A020F0\">'Speed up factor for '<\/span> num2str(nWorkers3) <span style=\"color: #0000FF\">...<\/span>\r\n    <span style=\"color: #A020F0\">' local workers = '<\/span> num2str(t1PerSim\/t3PerSim)])\r\ndisp([<span style=\"color: #A020F0\">'Time per simulation with PARFOR and '<\/span> num2str(nWorkers4) <span style=\"color: #0000FF\">...<\/span>\r\n    <span style=\"color: #A020F0\">' workers = '<\/span> num2str(t4PerSim)])\r\ndisp([<span style=\"color: #A020F0\">'Speed up factor for '<\/span> num2str(nWorkers4) <span style=\"color: #0000FF\">...<\/span>\r\n    <span style=\"color: #A020F0\">' workers on the cluster = '<\/span> num2str(t1PerSim\/t4PerSim)])<\/pre><pre style=\"font-style:oblique\">Time per simulation with nested FOR loops = 0.11862\r\nTime per simulation with PARFOR loop = 0.13012\r\nTime per simulation with PARFOR and 4 workers = 0.025087\r\nSpeed up factor for 4 local workers = 4.7285\r\nTime per simulation with PARFOR and 16 workers = 0.0064637\r\nSpeed up factor for 16 workers on the cluster = 18.3521\r\n<\/pre><h3>A Quarter Million Simulations<a name=\"23\"><\/a><\/h3>\r\n   <p>Many months ago I tinkered with this problem, using a high performance desktop machine with 4 local workers, and generated\r\n      the following graph. 
This represents 250,000 simulations.\r\n   <\/p>\r\n   <p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/seth\/2010Q4\/basin500_1000.jpg\"> <\/p>\r\n   <h3>Performance Increase<a name=\"25\"><\/a><\/h3>\r\n   <p>There are many factors that contribute to the performance of parallel code execution.  Breaking up the loop and distributing\r\n      it across workers incurs some overhead.  For smaller problems with a small number of workers, this overhead\r\n      can swamp the actual computation.  Another major factor to consider is data transfer and network speed; transferring large amounts of\r\n      data adds overhead that doesn't normally exist in MATLAB.\r\n   <\/p>\r\n   <h3>Now it's your turn<a name=\"26\"><\/a><\/h3>\r\n   <p>Do you have a problem that could benefit from a <tt>PARFOR<\/tt> loop and a cluster of MATLAB Workers?  Leave a <a href=\"https:\/\/blogs.mathworks.com\/seth\/?p=105&amp;#comment\">comment here<\/a> and tell us all about it.\r\n   <\/p><script language=\"JavaScript\">\r\n<!--\r\n\r\n    function grabCode_5022bc0f483f42adb3d890e7a68befde() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='5022bc0f483f42adb3d890e7a68befde ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 5022bc0f483f42adb3d890e7a68befde';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML 
parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        author = 'Seth Popinchalk';\r\n        copyright = 'Copyright 2010 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n      \r\n      d.title = title + ' (MATLAB code)';\r\n      d.close();\r\n      }   \r\n      \r\n-->\r\n<\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_5022bc0f483f42adb3d890e7a68befde()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n            the MATLAB code \r\n            <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; 7.11<br><\/p>\r\n<\/div>\r\n<!--\r\n5022bc0f483f42adb3d890e7a68befde ##### SOURCE BEGIN #####\r\n%% Parallel Computing with Simulink: Running Thousands of Simulations!\r\n%\r\n% If you have to run thousands of simulations, you will probably want to do\r\n% it as quickly as possible.  While in some cases this means looking for\r\n% more efficient modeling techniques, or buying more powerful hardware,\r\n% in this post I want to show how to run thousands of simulations in\r\n% parallel across all of your available MATLAB workers.  
Specifically, this\r\n% post is about how to use the\r\n% <https:\/\/www.mathworks.com\/products\/parallel-computing\/ Parallel Computing\r\n% Toolbox>, and <https:\/\/www.mathworks.com\/help\/releases\/R2010b\/techdoc\/ref\/parfor.html\r\n% |PARFOR|> with Simulink.\r\n\r\n%% Embarrassingly Parallel Problems\r\n% |PARFOR| is the parallel for-loop construct in MATLAB.  Using the\r\n% Parallel Computing Toolbox, you can start a local pool of MATLAB Workers,\r\n% or connect to a cluster running the\r\n% <https:\/\/www.mathworks.com\/products\/distriben\/index.html MATLAB\r\n% Distributed Computing Server>. Once connected, your |PARFOR| loops\r\n% automatically switch from serial to parallel execution.\r\n%\r\n% |PARFOR| enables parallelization of what are often referred to as\r\n% _embarrassingly parallel problems_.  Each iteration of the loop must be able\r\n% to run independently of every other iteration of the loop.  There can be no\r\n% dependency of the variables in one iteration on the results of a previous\r\n% iteration.  Some examples of this in Simulink are problems where you need to\r\n% perform a parameter sweep, or running simulations of a large number of\r\n% models just to generate the data output for later analysis.\r\n\r\n%% Computing the Basin of Attraction\r\n% To illustrate an embarrassingly parallel problem, I will look at computing the\r\n% basin of attraction for a simple differential equation.  (Thanks to my\r\n% colleague Ned for giving me this example.)\r\n% \r\n% $$\\ddot{\\theta} + 0.1 \\dot{\\theta} + \\sin \\theta = 2.1 \\cos t$$\r\n%\r\n% The equation is from an article about\r\n% <http:\/\/www.scholarpedia.org\/article\/Basin_of_attraction basins of\r\n% attraction on Scholarpedia>.  When the system is integrated forward, the\r\n% $\\theta$ state will approach one of two attractors.  Which attractor it\r\n% approaches depends on the initial conditions of $\\theta$ and its\r\n% derivative, $\\dot{\\theta}$.  
The Scholarpedia article shows the following\r\n% image as the fractal basin of attraction for this equation.\r\n%%\r\n%\r\n% <<300px-Ott_fig3.gif>>\r\n%\r\n%%\r\n% (picture made by H.E. Nusse)\r\n\r\n%%\r\n% Each of the thousands of pixels in that plot is the result of a\r\n% simulation.  So, to make that plot you need to run thousands of\r\n% simulations.\r\n\r\n%% Forced Damped Pendulum Model\r\n% Here is my version of that differential equation in Simulink.\r\n\r\nopen_system('ForcedDampedPendulum')\r\n\r\n%%\r\n% To see how $\\theta$ evolves over time, here is a plot of 100 seconds of\r\n% simulation.  (Notice, I'm using the\r\n% <https:\/\/www.mathworks.com\/help\/releases\/R2010b\/toolbox\/simulink\/rn\/br28su4.html single\r\n% output SIM command>.)\r\n\r\nsimOut = sim('ForcedDampedPendulum','StopTime','100');\r\ny = simOut.get('yout');\r\nt = simOut.get('tout');\r\nplot(t,y)\r\nhold on\r\n\r\n%% Nested |FOR| loops\r\n% Using nested |for| loops, we can sweep over a range of values for\r\n% $\\theta$ and $\\dot{\\theta}$.  If we add that to the plot above, you can\r\n% visualize the attractors at $\\pm\\infty$. 
I can use a grayscale |colorbar|\r\n% to indicate the final value of $\\theta$ at the end of the simulation to\r\n% make a basin of attraction image.\r\n\r\nn = 5;\r\nthetaRange = linspace(-pi,pi,n);\r\nthetadotRange = linspace(-5,5,n);\r\ntic;\r\nfor i = 1:n\r\n    for j = 1:n\r\n        thetaX0 = thetaRange(i);\r\n        thetadotX0 = thetadotRange(j);\r\n        simOut = sim('ForcedDampedPendulum','StopTime','100');\r\n        y = simOut.get('yout');\r\n        t = simOut.get('tout');\r\n        plot(t,y)\r\n    end\r\nend\r\nt1 = toc;\r\ncolormap gray\r\ncolorbar\r\n\r\n%%\r\n% I have added |TIC|\/|TOC| timing to the code to measure the time spent\r\n% running these simulations, and from this we can figure out the time\r\n% required per simulation.\r\nt1PerSim = t1\/n^2;\r\ndisp(['Time per simulation with nested FOR loops = ' num2str(t1PerSim)])\r\n\r\n%% \r\n% During the loops, I iterate over each combination of values only once.\r\n% This represents $n^2$ simulations that need to be run.  Because the\r\n% simulations are all independent of each other, this is a perfect\r\n% candidate for distribution to parallel MATLABs.\r\n\r\n%% Converting to a PARFOR loop\r\n% |PARFOR| loops can not be nested, so the iteration across the initial\r\n% conditions needs to be rewritten as a single loop.  For this problem I\r\n% can re-imagine the output image as a grid of pixels.  If I create a\r\n% corresponding |thetaX0| and |thetadotX0| matrix, I will be able to loop\r\n% over the elements in a PARFOR loop.  An easy way to do this is with\r\n% |MESHGRID|.\r\n%%\r\n% Another important change to the loop is using the |ASSIGNIN| function\r\n% to set the initial values of the states.  
Within |PARFOR|, the |'base'|\r\n% workspace refers to the base workspace of the MATLAB worker where the\r\n% simulation is run.\r\n\r\nn = 20;\r\n[thetaMat,thetadotMat] = meshgrid(linspace(-pi,pi,n),linspace(-5,5,n));\r\nmap = zeros(n,n);\r\ntic;\r\nparfor i = 1:n^2\r\n    assignin('base','thetaX0',thetaMat(i));\r\n    assignin('base','thetadotX0',thetadotMat(i));\r\n    simOut = sim('ForcedDampedPendulum','StopTime','100');\r\n    y = simOut.get('yout');\r\n    map(i)=y(end);\r\nend\r\nt2 = toc;\r\n\r\n%%\r\n% I can use a scaled image to display the basin of attraction. (Of course,\r\n% this is very low resolution.)\r\nclf\r\nimagesc([-pi,pi],[-5,5],map)\r\ncolormap(gray), axis xy square\r\n%%\r\n% The |PARFOR| loop acts as a normal |for|-loop when there are no MATLAB\r\n% workers.  We can see this from the timing analysis.\r\n\r\nt2PerSim = t2\/n^2;\r\ndisp(['Time per simulation with PARFOR loop = ' num2str(t2PerSim)])\r\n\r\n\r\n%% Local MATLAB Workers\r\n% Most computers today have multiple processors, or multi-core\r\n% architectures.  
To take advantage of all the cores on this computer, I\r\n% can use the |MATLABPOOL| command to start a local MATLAB pool of workers.\r\n\r\nmatlabpool open local\r\nnWorkers3 = matlabpool('size');\r\n%%\r\n% In order to remove any first time effects from my timing analysis, I can\r\n% run commands using |pctRunOnAll| to load the model and simulate it.\r\npctRunOnAll('load_system(''ForcedDampedPendulum'')') % warm-up\r\npctRunOnAll('sim(''ForcedDampedPendulum'')')         % warm-up\r\n\r\n%%\r\n% Now, let's measure the |PARFOR| loop again, using the full power of all\r\n% the cores on this computer.\r\nn = 20;\r\n[thetaMat,thetadotMat] = meshgrid(linspace(-pi,pi,n),linspace(-5,5,n));\r\nmap = zeros(n,n);\r\ntic;\r\nparfor i = 1:n^2\r\n    assignin('base','thetaX0',thetaMat(i));\r\n    assignin('base','thetadotX0',thetadotMat(i));\r\n    simOut = sim('ForcedDampedPendulum','StopTime','100');\r\n    y = simOut.get('yout');\r\n    map(i)=y(end);\r\nend\r\nt3=toc;\r\n\r\nt3PerSim = t3\/n^2;\r\ndisp(['Time per simulation with PARFOR and ' num2str(nWorkers3) ...\r\n    ' workers = ' num2str(t3PerSim)])\r\ndisp(['Speed up factor = ' num2str(t1PerSim\/t3PerSim)])\r\n\r\n%%\r\n% Now that I am done with the local MATLAB Workers, I can issue a command\r\n% to close them.\r\n\r\nmatlabpool close\r\n\r\n%% Connecting to a Cluster\r\n% I am fortunate enough to have access to a small distributed computing\r\n% cluster, configured with MATLAB Distributed Computing Server.  The\r\n% cluster consists of four, quad-core computers with 6 GB of memory.  It is\r\n% configured to start one MATLAB worker for each core, giving a 16 node\r\n% cluster.  
I can repeat my timing experiments here.\r\n\r\nmatlabpool open Cluster16\r\nnWorkers4 = matlabpool('size');\r\npctRunOnAll('load_system(''ForcedDampedPendulum'')') % warm-up\r\npctRunOnAll('sim(''ForcedDampedPendulum'')')         % warm-up\r\n%%\r\nn = 20;\r\n[thetaMat,thetadotMat] = meshgrid(linspace(-pi,pi,n),linspace(-5,5,n));\r\nmap = zeros(n,n);\r\ntic;\r\nparfor i = 1:n^2\r\n    assignin('base','thetaX0',thetaMat(i));\r\n    assignin('base','thetadotX0',thetadotMat(i));\r\n    simOut = sim('ForcedDampedPendulum','StopTime','100');\r\n    y = simOut.get('yout');\r\n    map(i)=y(end);\r\nend\r\nt4 = toc;\r\n\r\nt4PerSim = t4\/n^2;\r\ndisp(['Time per simulation with PARFOR and ' num2str(nWorkers4) ...\r\n    ' workers = ' num2str(t4PerSim)])\r\ndisp(['Speed up factor = ' num2str(t1PerSim\/t4PerSim)])\r\n%%\r\nmatlabpool close\r\n\r\n%% Timing Summary\r\n% Here is a summary of the timing measurements:\r\ndisp(['Time per simulation with nested FOR loops = ' num2str(t1PerSim)])\r\ndisp(['Time per simulation with PARFOR loop = ' num2str(t2PerSim)])\r\ndisp(['Time per simulation with PARFOR and ' num2str(nWorkers3) ...\r\n    ' workers = ' num2str(t3PerSim)])\r\ndisp(['Speed up factor for ' num2str(nWorkers3) ...\r\n    ' local workers = ' num2str(t1PerSim\/t3PerSim)])\r\ndisp(['Time per simulation with PARFOR and ' num2str(nWorkers4) ...\r\n    ' workers = ' num2str(t4PerSim)])\r\ndisp(['Speed up factor for ' num2str(nWorkers4) ...\r\n    ' workers on the cluster = ' num2str(t1PerSim\/t4PerSim)])\r\n\r\n%% A Quarter Million Simulations\r\n% Many months ago I tinkered with this problem, using a high performance\r\n% desktop machine with 4 local workers, and generated the following graph.\r\n% This represents 250,000 simulations.\r\n\r\n%%\r\n% \r\n% <<basin500_1000.jpg>>\r\n% \r\n\r\n%% Performance Increase\r\n% There are many factors that contribute to the performance of parallel\r\n% code execution.  
Breaking up the loop and distributing it across workers\r\n% incurs some overhead.  For smaller problems with a\r\n% small number of workers, this overhead can swamp the actual computation.  Another\r\n% major factor to consider is data transfer and network speed; transferring\r\n% large amounts of data adds overhead that doesn't normally exist in\r\n% MATLAB.\r\n\r\n%% Now it's your turn\r\n% Do you have a problem that could benefit from a |PARFOR| loop and a\r\n% cluster of MATLAB Workers?  Leave a\r\n% <https:\/\/blogs.mathworks.com\/seth\/?p=105&#comment comment here> and tell\r\n% us all about it.\r\n##### SOURCE END ##### 5022bc0f483f42adb3d890e7a68befde\r\n-->","protected":false},"excerpt":{"rendered":"<p>\r\n\r\nUpdate: In MATLAB R2017a the PARSIM function was introduced. For a better experience simulating models in parallel, we recommend using PARSIM instead of SIM inside parfor. See the more recent... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/simulink\/2010\/10\/17\/parallel-computing-with-simulink-running-thousands-of-simulations\/\">read more 
>><\/a><\/p>","protected":false},"author":40,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[29,143,10],"tags":[148,161],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/posts\/105"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/users\/40"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/comments?post=105"}],"version-history":[{"count":3,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/posts\/105\/revisions"}],"predecessor-version":[{"id":6968,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/posts\/105\/revisions\/6968"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/media?parent=105"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/categories?post=105"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/simulink\/wp-json\/wp\/v2\/tags?post=105"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}