Loren on the Art of MATLAB

Getting Data from a Web API in Parallel

Posted by Loren Shure

I'd like to introduce this week's guest blogger Edric Ellis. Edric works for the Parallel Computing development team here at The MathWorks. In this post he will talk about using the parfeval command in Parallel Computing Toolbox.

In release R2013b, we added a new function, parfeval, to Parallel Computing Toolbox. It lets you run MATLAB® functions on workers in a parallel pool without blocking your desktop MATLAB. This allows you to create parallel programs that are more loosely structured than those using either parfor or spmd. You can also have your desktop MATLAB process the results as they become available, allowing you to create more interactive parallel programs.

We're going to take a look at using parfeval to speed up accessing a web service by making multiple requests in parallel, and then displaying each result as soon as it is available. We're going to query the Flickr image sharing site for astronomical images that have structured data associated with them. We'll approach the problem in three phases:

  1. Query the web service for relevant images
  2. Use parfeval to submit work to the parallel workers to read the images
  3. Use fetchNext to retrieve results and then display them

Query the web service for relevant images

Flickr provides a web-based API that allows you to query its large database of images. To use the API, you submit a structured query to their servers, and an XML document is returned to you. Retrieving the resulting image data from Flickr takes some time, and most of that time is spent waiting for the network, so performing the retrievals in parallel makes the whole job finish sooner. This is an example of an "I/O-bound" operation.

The Flickr web API is fully documented on Flickr's developer pages (http://www.flickr.com/services/api/). To use the API, you need a Flickr account, and you need to request an 'API Key', which is included in each request to the API. Once you've got your own key, you can place it in a function called flickrApiKey which simply returns the key as a string.

appKey = flickrApiKey;
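
As a minimal sketch, flickrApiKey could be a one-line function saved as flickrApiKey.m somewhere on the MATLAB path. The key shown here is a placeholder of our own invention, not a real key; substitute the one Flickr issues to you.

```matlab
function key = flickrApiKey
% flickrApiKey  Return the Flickr API key as a character vector.
% The value below is a placeholder (an assumption for illustration only);
% replace it with the key issued for your own Flickr account.
key = 'YOUR_FLICKR_API_KEY';
end
```

Keeping the key in its own function means the rest of the code can be shared without exposing the key itself.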

Build the search URL and interpret the XML response

We're going to search for images which have a specific tag: astrometrydotnet:status=solved. These images are submitted to a Flickr group, and then astrometry.net analyses and annotates the images. We build a search URL, and then let xmlread read the results from that URL.

url = ['http://api.flickr.com/services/rest/', ...       % API Base URL
       '?method=flickr.photos.search', ...               % API Method
       '&api_key=', appKey, ...                          % Our API key
       '&tags=astrometrydotnet%3Astatus%3Dsolved', ...   % Tag to search
       '&license=4', ...                                 % Creative Commons
       '&sort=interestingness-desc', ...                 % Sort order
       '&per_page=12', ...                               % Limit results to 12
       '&media=photos'];                                 % Only return photos
response = xmlread(url);

% Get all the 'photo' elements from the document.
photos    = response.getDocumentElement.getElementsByTagName('photo');
numPhotos = getLength(photos);

% Ensure our search query returned some results.
if numPhotos == 0
    error('Failed to retrieve photos from Flickr. XML response was: %s', ...
          xmlwrite(response));
end

% Otherwise, convert the search results into a cell array of structures
% containing the search result information.
photoInfo = cell(1, numPhotos);
for idx = 1:numPhotos
    node = photos.item(idx-1);
    photoInfo{idx}  = struct('farm',   char(node.getAttribute('farm')), ...
                             'server', char(node.getAttribute('server')), ...
                             'id',     char(node.getAttribute('id')), ...
                             'secret', char(node.getAttribute('secret')), ...
                             'owner',  char(node.getAttribute('owner')));
end
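
To see what the parsing code above does without calling the web service, here is a small offline sketch. The XML text is made up to resemble the shape of a photo search response (the attribute values are invented), and we write it to a temporary file first because xmlread reads from a file name or URL.

```matlab
% Made-up sample resembling a photo search response (values are invented).
xmlText = ['<rsp stat="ok"><photos page="1" pages="1" perpage="12" total="2">', ...
           '<photo id="111" owner="1@N00" secret="abc" server="42" farm="1"/>', ...
           '<photo id="222" owner="2@N00" secret="def" server="43" farm="2"/>', ...
           '</photos></rsp>'];

% Write the sample to a temporary file so that xmlread can parse it.
xmlFile = [tempname, '.xml'];
fid = fopen(xmlFile, 'w');
fprintf(fid, '%s', xmlText);
fclose(fid);
response = xmlread(xmlFile);
delete(xmlFile);

% Same DOM calls as above: collect the 'photo' elements, then read the
% attributes. Note that DOM items are indexed from zero.
photos    = response.getDocumentElement.getElementsByTagName('photo');
numPhotos = getLength(photos);
firstId   = char(photos.item(0).getAttribute('id'));
fprintf('Found %d photos; first id is %s\n', numPhotos, firstId);
```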

Define a function to retrieve and process the photos

For each search result, we need to run a function to retrieve the image data. The function also searches through the Flickr comments for the structured information from astrometry.net, and returns the direction the telescope was pointing. In the next stage, we'll submit multiple invocations of this function using parfeval.

function [photo, location, username] = getAstrometryPhotoFromFlickr(appKey, info)
% First, build up the photo URL as per the Flickr documentation
% here: <http://www.flickr.com/services/api/misc.urls.html>
url = sprintf('http://farm%s.staticflickr.com/%s/%s_%s.jpg', ...
              info.farm, info.server, info.id, info.secret);
try
    photo = imread(url);
catch E
    % Sometimes the read fails. Try one more time.
    photo = imread(url);
end
%
% Get the photo info to extract the username
url = ['http://api.flickr.com/services/rest/', ...
       '?method=flickr.photos.getInfo', ...
       '&api_key=', appKey, '&photo_id=', info.id];
response = xmlread(url);
owner = response.getDocumentElement.getElementsByTagName('owner');
if getLength(owner) == 1
    username = char(owner.item(0).getAttribute('username'));
else
    username = 'Unknown';
end
%
% Next, look through the comments for the photo to extract
% the annotation from astrometry.net.
url = ['http://api.flickr.com/services/rest/', ...
       '?method=flickr.photos.comments.getList', ...
       '&api_key=', appKey, '&photo_id=', info.id];
response = xmlread(url);
%
% Get all the actual comment elements from the XML response.
comments = response.getDocumentElement.getElementsByTagName('comment');
%
% Loop over all the comments, looking for the first by astrometry.net
% which contains information about the photo.
for idx = 0:(getLength(comments)-1)
    comment = comments.item(idx);
    if strcmp(char(comment.getAttribute('authorname')), 'astrometry.net')
        % We've found the comment, extract the comment text.
        commentText = char(comment.getTextContent());
        % Pick out the location information into a row vector by
        % using a regular expression.
        locationText = regexprep(...
            commentText, '.*center: *\(([-0-9,. ]+)\) *degrees.*', '$1');
        location     = sscanf(locationText, '%g,', [1, 2]);
        return
    end
end
%
% We didn't find the astrometry.net comment, so location is unknown.
location = [NaN, NaN];
end
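
The location extraction is the least obvious step in that function, so here it is in isolation as a small sketch. The comment text is a made-up stand-in (we are assuming only that the real annotation contains a "center: (..., ...) degrees" fragment, as the regular expression implies), and the final guard is our own addition rather than part of the function above.

```matlab
% Made-up comment text; only the 'center: (...) degrees' part matters here.
commentText = ['Hello, this is a sample annotation. ', ...
               'center: (83.82, -5.39) degrees, and more text follows.'];

% Same regular expression as in getAstrometryPhotoFromFlickr: replace the
% whole text with just the characters between the parentheses.
locationText = regexprep(...
    commentText, '.*center: *\(([-0-9,. ]+)\) *degrees.*', '$1');

% Convert the comma-separated text into a 1-by-2 numeric row vector.
location = sscanf(locationText, '%g,', [1, 2]);

% If the pattern is absent, regexprep returns the text unchanged and
% sscanf will typically not yield two numbers, so fall back to NaNs.
if numel(location) ~= 2
    location = [NaN, NaN];
end
```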

Use parfeval to submit work to the parallel workers to read the images

For each search result, we will make a parfeval request for that particular result. This will cause one of the workers in our parallel pool to invoke that function. If a parallel pool is not already open, then one will be created automatically. Each parfeval call returns a parallel.Future instance. The pool workers start on the parfeval requests in the order in which they were submitted. The syntax of parfeval is rather similar to that of the MATLAB function feval, except that because the evaluation doesn't take place immediately, you must specify how many output arguments you want to request. In this case, we want three outputs from getAstrometryPhotoFromFlickr.

for idx = 1:numPhotos
    futures(idx) = parfeval(@getAstrometryPhotoFromFlickr, 3, ...
                            appKey, photoInfo{idx});
end
Starting parallel pool (parpool) using the 'local' profile ... connected to 6 workers.
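
To make the comparison with feval concrete, here is a minimal sketch using the built-in max function in place of our Flickr function. The variable names are purely illustrative.

```matlab
% feval runs immediately; the number of outputs is implied by the
% left-hand side of the assignment.
[largest, position] = feval(@max, [3, 9, 4]);

% parfeval returns a future straight away, so we state up front that we
% want 2 outputs, then collect them later with fetchOutputs.
f = parfeval(@max, 2, [3, 9, 4]);
[largestP, positionP] = fetchOutputs(f);   % blocks until f has finished
```

Both calls produce the same values; the difference is that the desktop MATLAB is free to do other work between the parfeval call and the fetchOutputs call.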

Prepare a figure for displaying the results

While the workers are busy retrieving images from Flickr, we can set up a figure to display those images.

% Create a new figure, initially not visible.
figHandle = figure('Position', [200, 200, 600, 800], ...
                   'Units', 'pixels', 'Visible', 'off');

% Use FEX submission 'tight_subplot' to set up the axes
% https://www.mathworks.com/matlabcentral/fileexchange/27991-tight-subplot-nh--nw--gap--marg-h--marg-w-
axesHandles = tight_subplot(4, 3, [0.06, 0.01], [0.01, 0.06], 0.01);
axis(axesHandles, 'off');

Use fetchNext to retrieve results and then display them

We now enter a loop where we wait for results to become available on the workers. We use fetchNext to achieve this. We pass fetchNext our array futures, and it returns when another element has completed and a new result is available. We can also specify a maximum time to wait for each new result; if that time elapses with no new result, fetchNext returns empty outputs. This means that if the web service takes a really long time to respond, we can abort execution gracefully.

Note that the results from fetchNext can arrive in any order, and we use the first output argument originalIdx to work out which element of futures the result corresponds to.
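
Here is a small sketch of that out-of-order behaviour, using pause as a stand-in for a slow web request. The delays are arbitrary values chosen for illustration. Because we request zero outputs from pause, fetchNext returns only the index of the future that has just completed.

```matlab
% Each task pauses for a different time and returns no outputs.
delays = [3, 1, 2];   % seconds; arbitrary values for illustration
for idx = 1:numel(delays)
    pauseFutures(idx) = parfeval(@pause, 0, delays(idx));
end

% Collect the indices in the order in which the tasks complete.
completionOrder = zeros(1, numel(delays));
for k = 1:numel(delays)
    completionOrder(k) = fetchNext(pauseFutures);
end
disp(completionOrder);
```

With at least three idle workers we would expect the shortest pause to complete first, but the exact order depends on the size of the pool and on timing, which is precisely why the index output is needed.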

We're going to specify an upper bound on how long we're going to wait for all the futures to complete. If we exceed that time limit and there are still futures running, we can cancel them.

overallTimeLimit = 10; % seconds

t = tic;
set(figHandle, 'Visible', 'on');
numCompleted = 0;
while numCompleted < numPhotos
    try
        % This blocks for up to 1 second waiting for a result.
        [originalIdx, photo, location, user] = fetchNext(futures, 1);
    catch E
        % Sometimes, the URL simply cannot be read, so an
        % error is thrown by the worker. Let's display that
        % and carry on.
        warning('Failed to read an image: %s', getReport(E));
        originalIdx = [];
    end

    % If 'fetchNext' completed successfully, originalIdx
    % will be non-empty, and will contain the index into 'futures'
    % corresponding to the work that has just finished.
    if ~isempty(originalIdx)
        % Display attribution for photo and link to the original
        fprintf('Photo %2d by Flickr user: %s\n', originalIdx, user);
        info = photoInfo{originalIdx};
        fprintf('Original at: http://www.flickr.com/%s/%s/\n', info.owner, info.id);

        % Pick our axes:
        axesToUse = axesHandles(originalIdx);

        % Display the image
        image(photo, 'Parent', axesToUse);
        axis(axesToUse, 'off'); axis(axesToUse, 'equal');

        % Set the title of the axis to be the location
        title(axesToUse, num2str(location));

        numCompleted = numCompleted + 1;
    elseif toc(t) > overallTimeLimit
        % We have exceeded our time budget!
        disp('Time limit expired!');

        % Cancelling the futures stops execution of any running
        % futures, but has no effect on already-completed futures.
        cancel(futures);
        break;
    end
end
toc(t);
Photo  6 by Flickr user: s58y
Original at: http://www.flickr.com/45032885@N04/4145493112/
Photo  2 by Flickr user: davedehetre
Original at: http://www.flickr.com/22433418@N04/4950431138/
Photo  5 by Flickr user: s58y
Original at: http://www.flickr.com/45032885@N04/8541349810/
Photo  1 by Flickr user: s58y
Original at: http://www.flickr.com/45032885@N04/4553112538/
Photo  4 by Flickr user: Torben Bjørn Hansen
Original at: http://www.flickr.com/21067003@N07/6105409913/
Photo  3 by Flickr user: Harry Thomas Photography
Original at: http://www.flickr.com/84598277@N08/8882356507/
Photo  7 by Flickr user: s58y
Original at: http://www.flickr.com/45032885@N04/4144380775/
Photo  8 by Flickr user: Jody Roberts
Original at: http://www.flickr.com/8374715@N06/8719966332/
Photo  9 by Flickr user: s58y
Original at: http://www.flickr.com/45032885@N04/4145140448/
Photo 11 by Flickr user: davedehetre
Original at: http://www.flickr.com/22433418@N04/4954464378/
Photo 12 by Flickr user: s58y
Original at: http://www.flickr.com/45032885@N04/4535865643/
Photo 10 by Flickr user: cfaobam
Original at: http://www.flickr.com/33763963@N05/7060054229/
Elapsed time is 2.897368 seconds.
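
In this run everything finished inside the time limit, so the cancel branch was never reached. As a minimal sketch of what cancelling does, here we cancel a deliberately long task (again using pause as a stand-in) and then inspect the future.

```matlab
% Submit a task that would run for a minute, then cancel it.
f = parfeval(@pause, 0, 60);
cancel(f);

% After cancel, the future should report that it has finished, and its
% Error property should hold the reason.
disp(f.State);
disp(f.Error.message);
```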

Conclusions

We have seen how the parfeval command can be used to retrieve data from a web service, and to display the results interactively as they become available. parfeval allows you more flexibility than parfor to:

  1. Plot results while the parallel computations proceed
  2. Gracefully break out of parallel computations

If you've ever wanted to show a waitbar or plot partial results during a parfor loop, then you might be able to adapt your code to use parfeval. If you have any further ideas, let us know here.
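
As a rough sketch of the waitbar idea, the loop below advances a waitbar each time fetchNext hands back a result. The task (rand with an arbitrary size) and the variable names are stand-ins; substitute your own function and arguments.

```matlab
numTasks = 8;   % arbitrary number of tasks for illustration
for idx = 1:numTasks
    randFutures(idx) = parfeval(@rand, 1, 100);
end

% Advance the waitbar as each result arrives, storing the result in the
% slot that matches the future it came from.
h = waitbar(0, 'Waiting for results...');
results = cell(1, numTasks);
for k = 1:numTasks
    [completedIdx, value] = fetchNext(randFutures);
    results{completedIdx} = value;
    waitbar(k / numTasks, h);
end
close(h);
```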


Published with MATLAB® R2013b
