Stuart’s MATLAB Videos

Watch and Learn

Using parfor to Make Many Web Requests

My colleague asked me to access all the pages on a web server in order to populate it’s cache. I plan to use parfor to get through the more than 250k pages in a timely manner. I will also need to not go too fast and overload the server.

I’ve used parfor before for web page access and found that it is one of the rare situations where you can use more MATLAB workers than available physical or even logical processors, which is not normally recommended. It works because web requests require a lot of waiting, and often the processing I need to do in MATLAB is little.

I’ll admit this video gets a little boring near the end as I try out differed numbers of workers. Remember that you can increase the playback speed of the video in the lower right corner of the player.

Features covered in this code-along style video include:

  • parfor

Video Player is loading.
Current Time 0:00
Duration 47:27
Loaded: 0.35%
Stream Type LIVE
Remaining Time 47:27
 
1x
  • Chapters
  • descriptions off, selected
  • captions off, selected
  • en (Main), selected

    Play the video in full screen mode for a better viewing experience. Final code is here:

    %% Make Requests to Set of URLs

    % Assumes a spreasdsheet with a "urls" variable/column
    pagesFileName="FILEPATH\all-aem-pages.xlsx";
    environments=["dev2" "dev3"];
    environment=environments(2);
    options=weboptions('Timeout',60);
    totalStartTime=clock;
    %% Get List of Pages
    % Re-use table if already in base workspace
    if ~exist('pages','var')
    pages=readtable(pagesFileName,'TextType','string');
    end
    %% Create list of URLs
    % Convert environment
    urls=replace(pages.urls,".mathworks","-" + environment + ".mathworks");
    %% Start Workers
    % 12 for dev server
    startPool(12);
    %% Make Requests
    success=false(height(pages),1);
    parfor k=1:height(pages)
    startTime=[]; % Initialize for parfor
    url=urls(k);
    try
    startTime=clock;
    content=webread(url,options);
    success(k)=true;
    fprintf('Succeeded accessing (%d of %d): %s(%2.1f sec).\n',k,height(pages),url,etime(clock,startTime));
    catch
    fprintf('Failed accessing (%d of %d): %s(%2.1f sec).\n',k,height(pages),url,etime(clock,startTime));
    end
    end

    %% Finish
    fprintf('Finished %s\n',myETimeStr(totalStartTime))

    %% Local functions
    function p=startPool(numWorkers)
    p = gcp('nocreate');
    if isempty(p)
    p=parpool(numWorkers);
    elseif p.NumWorkers~=numWorkers
    delete(p);
    p=parpool(numWorkers);
    end
    end

    function y=myETimeStr(startTime)
    % Return a string (mm:ss) from an elapsed time in seconds.

    y= char(duration(0,0,etime(clock,startTime),'Format','mm:ss'));

    end

      
    |
    • print

    Comments

    To leave a comment, please click here to sign in to your MathWorks Account or create a new one.

    Loading...
    Go to top of page