File Exchange Pick of the Week

Our best user submissions

Create Persistent Resources on Parallel Workers

Sean's pick this week is WorkerObjWrapper by MathWorks' Parallel Computing Team.

Background

MATLAB's Parallel Computing Toolbox lets you open a pool of MATLAB workers and distribute work to them with high-level commands like parfor.

Communicating with the workers in this pool always carries some data-transfer overhead. The less data we have to transmit to the workers, the bigger the speedup we'll see. This can be difficult when working with large arrays, and the transfer cost can actually make a parallel computation slower than the serial one. WorkerObjWrapper provides a convenient way to make data persist on a worker; this could be large arrays, connections to databases, or other things that we need on each iteration of a parfor loop.
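As a rough illustration of the idea (a minimal sketch, not code from the submission), the snippet below wraps an array once and then reads it through the wrapper's Value property inside a parfor loop. It assumes WorkerObjWrapper.m is on the MATLAB path, that a pool is open, and that the constructor accepts a plain value; bigData and colSums are made-up names.

```matlab
% Sketch: keep a large array resident on the workers so that it does not
% have to be re-sent every time a parfor loop runs.
% Assumes WorkerObjWrapper.m is on the path and a parallel pool is open.
bigData = rand(2000);            % illustrative array, not from the post
w = WorkerObjWrapper(bigData);   % data is transferred to the workers here

colSums = zeros(1, 10);
parfor k = 1:10
    d = w.Value;                 % copy already held by this worker
    colSums(k) = sum(d(:, k));
end
```

The wrapper w can be reused in later parfor loops; only the small wrapper object travels with each loop, not the array itself.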

Let's See it In Action

We're going to pull some financial data from Yahoo! using the connection from the Datafeed Toolbox.

I have a list of securities and the corresponding fields I want from them:

% Securities and desired fields
securities = {'MAR','PG','MSFT','SAM',...
              'TSLA','YHOO','CMG','AAL'};

fields = {'High',{'low','High'},'High','High',...
          {'Low','high'},'Low',{'low','Volume'},'Low'};

I first want to make sure there is an open parallel pool (parpool) to distribute computations to. I have a two-core laptop, so I'll open two local workers by selecting the icon at the bottom left-hand side of the desktop.
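The same pool can also be opened from the command line. A minimal sketch, assuming R2013b or later (where gcp and parpool are available) and the default 'local' cluster profile:

```matlab
% Open a two-worker local pool only if no pool is running yet.
pool = gcp('nocreate');          % returns [] when no pool exists
if isempty(pool)
    pool = parpool('local', 2);  % two workers for a two-core machine
end
```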

I've written three equivalent functions to pull the prices from Yahoo!

  • fetchFOR - uses a regular for-loop to fetch the prices
  • fetchPARFOR - uses a parallel for-loop
  • fetchWOWPARFOR - uses a parallel for-loop and WorkerObjWrapper to make the connection on all workers.
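The bodies of these functions aren't listed here. As a hedged sketch only, the WorkerObjWrapper variant might look something like the following. It assumes the Datafeed Toolbox functions yahoo, fetch, and close as they existed around R2013b, and a three-argument WorkerObjWrapper constructor (creation function, cell array of arguments, cleanup function); check the submission's help text before relying on either.

```matlab
function data = fetchWOWPARFOR(securities, fields)
% Hypothetical reconstruction, not the author's actual function.
% Each worker opens one Yahoo! connection and reuses it for every
% iteration it is given.

% Assumed signature: WorkerObjWrapper(createFcn, argsCell, cleanupFcn)
conn = WorkerObjWrapper(@yahoo, {}, @close);

n = numel(securities);
data = cell(n, 1);
parfor ii = 1:n
    % conn.Value is the connection that lives on this worker
    data{ii} = fetch(conn.Value, securities{ii}, fields{ii});
end
end
```

The design point is that the connection is created once per worker rather than once per iteration.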

First, a sanity check to make sure they all do the same thing:

ff = fetchFOR(securities,fields);
fp = fetchPARFOR(securities,fields);
fw = fetchWOWPARFOR(securities,fields);
assert(isequal(ff,fp,fw)); % Errors if they're not equal

Since the assertion passed, meaning the functions return the same result, we can now do the timings. I'll use timeit.

t = zeros(3,1);

% Measure timings
t(1) = timeit(@()fetchFOR(securities,fields),1);
t(2) = timeit(@()fetchPARFOR(securities,fields),1);
t(3) = timeit(@()fetchWOWPARFOR(securities,fields),1);

% Show results
fprintf('%.3fs %s\n',t(1),'for',t(2),'parfor',t(3),'parfor with WorkerObjWrapper')

8.631s for
5.991s parfor
4.255s parfor with WorkerObjWrapper

So we can see that creating the connection once on each worker in the parallel pool and then using parfor gives us the best computation time.
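To put numbers on that, the timings above can be turned into speedups relative to the serial loop. This is just arithmetic on the three reported values; labels and speedup are illustrative names.

```matlab
% Timings reported above, in seconds
t = [8.631; 5.991; 4.255];
labels = {'for', 'parfor', 'parfor with WorkerObjWrapper'};

% Speedup relative to the serial for-loop
speedup = t(1) ./ t;

for k = 1:numel(t)
    fprintf('%-30s %.2fx\n', labels{k}, speedup(k));
end
```

With two workers, the WorkerObjWrapper version comes out at roughly a 2x speedup over the serial loop, against roughly 1.4x for plain parfor.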

Comments

Do you have to work with large data or repeat a process multiple times where parallel computing might help? I'm curious to hear your experiences and the challenges that you've faced.

Give it a try and let us know what you think here or leave a comment for our Parallel Computing Team.




Published with MATLAB® R2013b
