My colleague asked me to access all the pages on a web server in order to populate its cache. I plan to use parfor to get through the more than 250k pages in a timely manner, while not going so fast that I overload the server.
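Besides limiting how many workers run at once, one common way to keep from overloading a server is to give every request a minimum time slot on its worker. This is a swapped-in technique, not necessarily what the video does. The sketch below is written under assumptions: `minInterval`, the timeout, and the two URLs are hypothetical placeholders. With W workers that each spend at least `minInterval` seconds per request, the aggregate rate is bounded by W / `minInterval` requests per second. For example, a hypothetical 12 workers at 0.5 s would allow at most 24 requests per second.

```matlab
% Sketch: pace each worker to at most one request per minInterval seconds.
% Hypothetical values: the URL list, minInterval, and the 10 s timeout.
urls = ["https://www.mathworks.com/help"; "https://www.mathworks.com/products"];
minInterval = 0.5;                 % seconds per request, per worker (hypothetical)
opts = weboptions('Timeout', 10);  % give up on a single request after 10 s
n = numel(urls);
elapsed = zeros(n, 1);             % time spent on each request slot

parfor k = 1:n
    t0 = tic;                      % tic/toc run on the same worker, so timing is local
    try
        [~] = webread(urls(k), opts);   % response discarded; only the server-side cache matters
    catch
        % Ignore failures in this sketch; a full script would log them.
    end
    pause(max(0, minInterval - toc(t0)));   % wait out the remainder of the slot
    elapsed(k) = toc(t0);
end
```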
I’ve used parfor for web page access before and found that it is one of the rare situations where you can run more MATLAB workers than there are physical or even logical processors, which is not normally recommended. It works because each web request spends most of its time waiting for the server to respond, and the processing I need to do in MATLAB for each page is small, so the processors would otherwise sit idle.
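The following is a minimal sketch of starting a pool that is larger than the core count. It assumes Parallel Computing Toolbox and the 'local' cluster profile, which is named 'Processes' in newer releases. The worker count of 24 is a hypothetical value, not the one used in the video. By default the local profile limits `NumWorkers` to the number of physical cores, so that limit has to be raised before `parpool` will accept a larger pool.

```matlab
% Sketch: run more workers than physical cores for I/O-bound web requests.
% Hypothetical: nWorkers = 24; tune it for your machine and target server.
nWorkers = 24;
c = parcluster('local');      % cluster object for workers on this machine
c.NumWorkers = nWorkers;      % raise the default cap (one worker per physical core)

p = gcp('nocreate');          % current pool, or empty if none is running
if ~isempty(p) && p.NumWorkers ~= nWorkers
    delete(p);                % shut down a pool of the wrong size
    p = [];
end
if isempty(p)
    p = parpool(c, nWorkers); % start the oversubscribed pool
end
```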
I’ll admit this video gets a little boring near the end as I try out different numbers of workers. Remember that you can increase the playback speed in the lower right corner of the video player.
Features covered in this code-along style video include:
Play the video in full screen mode for a better viewing experience. Final code is here:
%% Make Requests to Set of URLs
% Assumes a spreadsheet with a "urls" variable/column
%% Get List of Pages
% Re-use table if already in base workspace
%% Create list of URLs
% Convert environment
urls=replace(pages.urls,".mathworks","-" + environment + ".mathworks");
%% Start Workers
% 12 for dev server
%% Make Requests
startTime=clock; % Initialize for parfor
fprintf('Succeeded accessing (%d of %d): %s (%2.1f sec).\n',k,height(pages),url,etime(clock,startTime));
fprintf('Failed accessing (%d of %d): %s (%2.1f sec).\n',k,height(pages),url,etime(clock,startTime));
%% Local functions
p = gcp('nocreate');
% Return a string (mm:ss) from an elapsed time in seconds.
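The listing above survives only in fragments. The following sketch shows how the request step could fit together and is not a reconstruction of the original script. The two URLs, `environment = "dev"`, the 30 s timeout, and the `mmss` helper are hypothetical. The sketch keeps the listing's own pieces: the `replace` call that inserts the environment name into each URL, a `clock`/`etime` start time broadcast to the workers, and separate success and failure messages. Each request is wrapped in try/catch so that one bad page does not stop the whole parfor loop.

```matlab
% Sketch of the request step. Hypothetical: the URLs, environment,
% 30 s timeout, and the mmss helper.
pages = table(["https://www.mathworks.com/help"; "https://www.mathworks.com/products"], ...
    'VariableNames', {'urls'});
environment = "dev";

% Point each URL at the chosen environment: www.mathworks.com -> www-dev.mathworks.com
urls = replace(pages.urls, ".mathworks", "-" + environment + ".mathworks");

% Return a string (mm:ss) from an elapsed time in seconds
mmss = @(t) sprintf('%02d:%02d', floor(round(t)/60), mod(round(t), 60));

opts = weboptions('Timeout', 30);
n = height(pages);
ok = false(n, 1);                  % sliced output: one success flag per page
startTime = clock;                 % broadcast to workers for elapsed-time messages

parfor k = 1:n
    url = urls(k);
    succeeded = true;
    try
        [~] = webread(url, opts);  % discard the content; the server caches the page
    catch
        succeeded = false;
    end
    ok(k) = succeeded;
    if succeeded
        fprintf('Succeeded accessing (%d of %d): %s (%2.1f sec).\n', k, n, url, etime(clock, startTime));
    else
        fprintf('Failed accessing (%d of %d): %s (%2.1f sec).\n', k, n, url, etime(clock, startTime));
    end
end

fprintf('%d of %d pages succeeded (elapsed %s).\n', nnz(ok), n, mmss(etime(clock, startTime)));
```

Because failures are caught and counted rather than rethrown, the loop always finishes, and `ok` can be used afterward to retry only the pages that failed.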