Stuart’s MATLAB Videos

Watch and Learn

Using parfor to Make Many Web Requests

My colleague asked me to access all the pages on a web server in order to populate its cache. I plan to use parfor to get through the more than 250k pages in a timely manner. I also need to avoid going so fast that I overload the server.
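One way to cap the request rate, beyond choosing a modest number of workers, is to give each request a minimum duration. The snippet below is only a sketch of that idea and is not taken from the video: minSecPerRequest is a hypothetical setting, and the actual webread call is replaced by a comment so the snippet runs without a server.

```matlab
minSecPerRequest = 0.2;   % ASSUMPTION: hypothetical floor on time per request
numRequests = 3;          % small count so the sketch runs quickly

totalTimer = tic;
for k = 1:numRequests
    requestTimer = tic;
    % A real loop would call webread(url,options) here.
    elapsed = toc(requestTimer);
    % Sleep for whatever remains of the minimum interval, if anything.
    pause(max(0, minSecPerRequest - elapsed));
end
totalElapsed = toc(totalTimer);
fprintf('%d paced requests took %.2f sec.\n', numRequests, totalElapsed);
```

Inside a parfor loop each worker would pace itself independently, so the overall ceiling would be roughly the number of workers divided by minSecPerRequest, in requests per second.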

I’ve used parfor before for web page access and found that it is one of the rare situations where you can use more MATLAB workers than available physical or even logical processors, which is not normally recommended. It works because web requests involve a lot of waiting, and the processing I need to do in MATLAB is often minimal.
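To get a feel for why extra workers help when most of the time is spent waiting, here is a rough back-of-the-envelope sketch. The two-second average per request and the list of worker counts are made-up figures for illustration, not measurements from the video.

```matlab
numPages = 250000;          % "more than 250k pages" from the post
secPerRequest = 2;          % ASSUMPTION: hypothetical average time per request
workerCounts = [1 4 12 24]; % ASSUMPTION: candidate pool sizes to compare

% If workers mostly wait on the network, wall time scales roughly as
% 1/workers (this ignores server limits and MATLAB-side overhead).
estHours = numPages * secPerRequest ./ workerCounts / 3600;

for k = 1:numel(workerCounts)
    fprintf('%2d workers: about %.1f hours\n', workerCounts(k), estHours(k));
end
```

The estimate only holds while the server and the network keep up, which is why the worker count still has to be tuned by experiment.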

I’ll admit this video gets a little boring near the end as I try out different numbers of workers. Remember that you can increase the playback speed of the video in the lower right corner of the player.

Features covered in this code-along style video include:

  • parfor

Play the video in full screen mode for a better viewing experience. Final code is here:

%% Make Requests to Set of URLs

% Assumes a spreadsheet with a "urls" variable/column
pagesFileName="FILEPATH\all-aem-pages.xlsx";
environments=["dev2" "dev3"];
environment=environments(2);
options=weboptions('Timeout',60);
totalStartTime=clock;
%% Get List of Pages
% Re-use table if already in base workspace
if ~exist('pages','var')
    pages=readtable(pagesFileName,'TextType','string');
end
%% Create list of URLs
% Convert environment
urls=replace(pages.urls,".mathworks","-" + environment + ".mathworks");
%% Start Workers
% 12 for dev server
startPool(12);
%% Make Requests
success=false(height(pages),1);
parfor k=1:height(pages)
    startTime=[]; % Initialize for parfor
    url=urls(k);
    try
        startTime=clock;
        content=webread(url,options);
        success(k)=true;
        fprintf('Succeeded accessing (%d of %d): %s(%2.1f sec).\n',k,height(pages),url,etime(clock,startTime));
    catch
        fprintf('Failed accessing (%d of %d): %s(%2.1f sec).\n',k,height(pages),url,etime(clock,startTime));
    end
end

%% Finish
fprintf('Finished %s\n',myETimeStr(totalStartTime))

%% Local functions
function p=startPool(numWorkers)
% Reuse the current pool if it already has numWorkers workers; otherwise (re)create it.
p = gcp('nocreate');
if isempty(p)
    p=parpool(numWorkers);
elseif p.NumWorkers~=numWorkers
    delete(p);
    p=parpool(numWorkers);
end
end

function y=myETimeStr(startTime)
% Return the time elapsed since startTime (a clock vector) as an mm:ss string.

y= char(duration(0,0,etime(clock,startTime),'Format','mm:ss'));

end
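
The success vector filled in by the parfor loop is not used after the loop, but it could be used to list the pages that failed. Here is a minimal sketch with stand-in data: the example.com URLs and the success values are hypothetical, not results from the video.

```matlab
% Hypothetical stand-ins for the script's urls and success variables
urls = ["https://example.com/a"; "https://example.com/b"; "https://example.com/c"];
success = [true; false; true];

% Logical indexing keeps only the URLs whose request did not succeed
failedUrls = urls(~success);
fprintf('%d of %d requests failed.\n', numel(failedUrls), numel(urls));
disp(failedUrls)
```

In the script itself, lines like these would belong in the Finish section, before the local functions, and failedUrls could then be fed through the same loop for a second attempt.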

  
|
  • print

Comments

To leave a comment, please click here to sign in to your MathWorks Account or create a new one.