{"id":5121,"date":"2014-02-14T09:00:18","date_gmt":"2014-02-14T14:00:18","guid":{"rendered":"https:\/\/blogs.mathworks.com\/pick\/?p=5121"},"modified":"2014-02-14T01:00:11","modified_gmt":"2014-02-14T06:00:11","slug":"create-persistent-resources-on-parallel-workers","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/pick\/2014\/02\/14\/create-persistent-resources-on-parallel-workers\/","title":{"rendered":"Create Persistent Resources on Parallel Workers"},"content":{"rendered":"<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n   <introduction>\r\n      <p><a href=\"https:\/\/www.mathworks.com\/matlabcentral\/answers\/contributors\/3208495\">Sean<\/a>'s pick this week is <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/31972-worker-object-wrapper\">WorkerObjWrapper<\/a> by MathWorks' <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/authors\/410495\">Parallel Computing Team<\/a>.\r\n      <\/p>\r\n   <\/introduction>\r\n   <h3>Background<a name=\"1\"><\/a><\/h3>\r\n   <p>MATLAB's <a href=\"https:\/\/www.mathworks.com\/solutions\/parallel-computing.html\">Parallel Computing Toolbox<\/a> provides you with the ability to open a pool of MATLAB workers that you can distribute work to with high-level commands\r\n      like <a href=\"\"><tt>parfor<\/tt><\/a>.\r\n   <\/p>\r\n   <p>When communicating with the workers in this pool, there will always be an overhead in data communication.  The less data we\r\n      have to transmit to the workers, the better the speed improvements we'll see.  Keeping transfers small can be difficult when working with large arrays,\r\n      and the transfer cost can actually cause parallel computations to be slower than serial ones.  
<i>WorkerObjWrapper<\/i> provides a convenient way to make data persist on a worker; this could be large arrays, connections to databases or other\r\n      things that we need on each iteration of a <tt>parfor<\/tt> loop.\r\n   <\/p>\r\n   <h3>Let's See it In Action<a name=\"2\"><\/a><\/h3>\r\n   <p>We're going to pull some financial data from <a href=\"\">Yahoo!<\/a> using the connection from the <a href=\"https:\/\/www.mathworks.com\/products\/datafeed\/\">Datafeed Toolbox<\/a>.\r\n   <\/p>\r\n   <p>I have a list of securities and the corresponding fields I want from them:<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\"><span style=\"color: #228B22\">% Securities and desired fields<\/span>\r\nsecurities = {<span style=\"color: #A020F0\">'MAR'<\/span>,<span style=\"color: #A020F0\">'PG'<\/span>,<span style=\"color: #A020F0\">'MSFT'<\/span>,<span style=\"color: #A020F0\">'SAM'<\/span>,<span style=\"color: #0000FF\">...<\/span>\r\n              <span style=\"color: #A020F0\">'TSLA'<\/span>,<span style=\"color: #A020F0\">'YHOO'<\/span>,<span style=\"color: #A020F0\">'CMG'<\/span>,<span style=\"color: #A020F0\">'AAL'<\/span>};\r\n\r\nfields = {<span style=\"color: #A020F0\">'High'<\/span>,{<span style=\"color: #A020F0\">'low'<\/span>,<span style=\"color: #A020F0\">'High'<\/span>},<span style=\"color: #A020F0\">'High'<\/span>,<span style=\"color: #A020F0\">'High'<\/span>,<span style=\"color: #0000FF\">...<\/span>\r\n          {<span style=\"color: #A020F0\">'Low'<\/span>,<span style=\"color: #A020F0\">'high'<\/span>},<span style=\"color: #A020F0\">'Low'<\/span>,{<span style=\"color: #A020F0\">'low'<\/span>,<span style=\"color: #A020F0\">'Volume'<\/span>},<span style=\"color: #A020F0\">'Low'<\/span>};<\/pre><p>I first want to make sure there is an open parallel pool ( <a href=\"\">parpool<\/a> ) to distribute computations to.  
I have a two-core laptop, so I'll open two local workers by selecting the icon at the bottom\r\n      left-hand side of the desktop.\r\n   <\/p>\r\n   <p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/Sean\/mainwow\/gcp.PNG\"> <\/p>\r\n   <p>I've written three equivalent functions to pull the prices from Yahoo!<\/p>\r\n   <div>\r\n      <ul>\r\n         <li><tt>fetchFOR<\/tt> - uses a regular for-loop to fetch the prices\r\n         <\/li>\r\n         <li><tt>fetchPARFOR<\/tt> - uses a parallel for-loop\r\n         <\/li>\r\n         <li><tt>fetchWOWPARFOR<\/tt> - uses a parallel for-loop and WorkerObjWrapper to make the connection once on each worker.\r\n         <\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/Sean\/mainwow\/functions.PNG\"> <\/p>\r\n   <p>First, a sanity check to make sure they all do the same thing:<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">ff = fetchFOR(securities,fields);\r\nfp = fetchPARFOR(securities,fields);\r\nfw = fetchWOWPARFOR(securities,fields);\r\nassert(isequal(ff,fp,fw)); <span style=\"color: #228B22\">% Errors if they're not equal<\/span><\/pre><p>Since the assertion passed, meaning the functions return the same result, we can now do the timings.  
I'll use <a href=\"\"><tt>timeit<\/tt><\/a>.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">t = zeros(3,1);\r\n\r\n<span style=\"color: #228B22\">% Measure timings<\/span>\r\nt(1) = timeit(@()fetchFOR(securities,fields),1);\r\nt(2) = timeit(@()fetchPARFOR(securities,fields),1);\r\nt(3) = timeit(@()fetchWOWPARFOR(securities,fields),1);\r\n\r\n<span style=\"color: #228B22\">% Show results<\/span>\r\nfprintf(<span style=\"color: #A020F0\">'%.3fs %s\\n'<\/span>,t(1),<span style=\"color: #A020F0\">'for'<\/span>,t(2),<span style=\"color: #A020F0\">'parfor'<\/span>,t(3),<span style=\"color: #A020F0\">'parfor with WorkerObjWrapper'<\/span>)<\/pre><pre style=\"font-style:oblique\">8.631s for\r\n5.991s parfor\r\n4.255s parfor with WorkerObjWrapper\r\n<\/pre><p>So we can see that creating the connection once on each worker in the parallel pool and then using <tt>parfor<\/tt> gives us the best computation time.\r\n   <\/p>\r\n   <h3>Comments<a name=\"6\"><\/a><\/h3>\r\n   <p>Do you have to work with large data or repeat a process multiple times where parallel computing might help?  
I'm curious to\r\n      hear your experiences and the challenges that you've faced.\r\n   <\/p>\r\n   <p>Give it a try and let us know what you think <a href=\"https:\/\/blogs.mathworks.com\/pick\/?p=5121#respond\">here<\/a> or leave a <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/31972-worker-object-wrapper#comments\">comment<\/a> for our Parallel Computing Team.\r\n   <\/p><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br>\r\n      Published with MATLAB&reg; R2013b<br><\/p>\r\n<\/div>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/Sean\/mainwow\/gcp.PNG\" onError=\"this.style.display ='none';\" \/><\/div><p>\r\n   \r\n      Sean's pick this week is WorkerObjWrapper by MathWorks' Parallel Computing Team.\r\n      \r\n   \r\n   Background\r\n   MATLAB's Parallel Computing Toolbox provides you with the ability to... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/pick\/2014\/02\/14\/create-persistent-resources-on-parallel-workers\/\">read more 
>><\/a><\/p>","protected":false},"author":87,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[7,16],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/5121"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/users\/87"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/comments?post=5121"}],"version-history":[{"count":7,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/5121\/revisions"}],"predecessor-version":[{"id":5128,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/5121\/revisions\/5128"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/media?parent=5121"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/categories?post=5121"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/tags?post=5121"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}