Getting Data from a Web API in Parallel
I'd like to introduce this week's guest blogger Edric Ellis. Edric works for the Parallel Computing development team here at The MathWorks. In this post he will talk about using the parfeval command in Parallel Computing Toolbox.
In release R2013b, we introduced a new function parfeval to Parallel Computing Toolbox which allows you to run MATLAB® functions on workers in a parallel pool without blocking your desktop MATLAB. This allows you to create parallel programs that are more loosely structured than those using either parfor or spmd. You can also have your desktop MATLAB process the results as they become available, allowing you to create more interactive parallel programs.
We're going to take a look at using parfeval to speed up accessing a web service by making multiple requests in parallel, and then displaying each result as soon as it is available. We're going to query the Flickr image sharing site for astronomical images that have structured data associated with them. We'll approach the problem in three phases:
- Query the web service for relevant images
- Use parfeval to submit work to the parallel workers to read the images
- Use fetchNext to retrieve results and then display them.
Contents
- Query the web service for relevant images
- Build the search URL and interpret the XML response
- Define a function to retrieve and process the photos
- Use parfeval to submit work to the parallel workers to read the images
- Prepare a figure for displaying the results
- Use fetchNext to retrieve results and then display them
- Conclusions
Query the web service for relevant images
Flickr provides a web-based API that allows you to query its large database of images. To use the API, you submit a structured query to their servers, and an XML document is returned to you. Retrieving the resulting image data from Flickr takes some time, most of it spent waiting on the network rather than computing, so the retrievals finish sooner when several of them are performed in parallel. This is an example of an "I/O-bound" operation.
The Flickr web API is fully documented here. To use the API, you need a Flickr account, and you need to request an 'API Key' which is included in each request to the API. Once you've got your own key, you can place it in a function called flickrApiKey which simply returns the key as a string.
appKey = flickrApiKey;
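As a rough sketch, flickrApiKey could look something like the following. The environment variable name FLICKR_API_KEY is an assumption made for this illustration (it is not part of the Flickr API or of MATLAB); hard-coding the key as a string inside the function would work just as well.

```matlab
function key = flickrApiKey()
% FLICKRAPIKEY Return the Flickr API key as a character vector.
% Assumption: the key is stored in an environment variable named
% FLICKR_API_KEY (a name chosen for this sketch only).
key = getenv('FLICKR_API_KEY');
if isempty(key)
    error('flickrApiKey:missingKey', ...
          'Set the FLICKR_API_KEY environment variable to your API key.');
end
end
```

Reading the key from the environment keeps it out of any code you share, which is the main reason to prefer this over a literal string.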
Build the search URL and interpret the XML response
We're going to search for images which have a specific tag: astrometrydotnet:status=solved. These images are submitted to a Flickr group, and then astrometry.net analyses and annotates the images. We build a search URL, and then let xmlread read the results from that URL.
url = ['http://api.flickr.com/services/rest/', ...     % API Base URL
       '?method=flickr.photos.search', ...             % API Method
       '&api_key=', appKey, ...                        % Our API key
       '&tags=astrometrydotnet%3Astatus%3Dsolved', ... % Tag to search
       '&license=4', ...                               % Creative Commons
       '&sort=interestingness-desc', ...               % Sort order
       '&per_page=12', ...                             % Limit results to 12
       '&media=photos'];                               % Only return photos
response = xmlread(url);

% Get all the 'photo' elements from the document.
photos = response.getDocumentElement.getElementsByTagName('photo');
numPhotos = getLength(photos);

% Ensure our search query returned some results.
if numPhotos == 0
    error('Failed to retrieve photos from Flickr. XML response was: %s', ...
          xmlwrite(response));
end

% Otherwise, convert the search results into a cell array of structures
% containing the search result information.
photoInfo = cell(1, numPhotos);
for idx = 1:numPhotos
    node = photos.item(idx-1);
    photoInfo{idx} = struct('farm', char(node.getAttribute('farm')), ...
                            'server', char(node.getAttribute('server')), ...
                            'id', char(node.getAttribute('id')), ...
                            'secret', char(node.getAttribute('secret')), ...
                            'owner', char(node.getAttribute('owner')));
end
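To experiment with the XML handling without an API key or a network connection, a sketch like the one below parses a small hand-written document. The XML text is made up for illustration and only imitates the attributes used in this post; a real Flickr response contains more fields. It relies on xmlread accepting a Java InputSource object in place of a file name or URL.

```matlab
% Hypothetical, hand-written stand-in for a Flickr search response.
xmlText = ['<rsp stat="ok"><photos>', ...
    '<photo id="111" owner="a@N00" secret="s1" server="10" farm="1"/>', ...
    '<photo id="222" owner="b@N00" secret="s2" server="20" farm="2"/>', ...
    '</photos></rsp>'];

% Wrap the text so that xmlread can parse it from memory.
source = org.xml.sax.InputSource(java.io.StringReader(xmlText));
response = xmlread(source);

% Same traversal as above: collect the 'id' attribute of each photo.
photos = response.getDocumentElement.getElementsByTagName('photo');
numPhotos = getLength(photos);
ids = cell(1, numPhotos);
for idx = 1:numPhotos
    node = photos.item(idx-1);   % DOM node lists are zero-based
    ids{idx} = char(node.getAttribute('id'));
end
```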
Define a function to retrieve and process the photos
For each search result, we need to run a function to retrieve the image data. The function also searches through the Flickr comments for the structured information from astrometry.net, and returns the direction the telescope was pointing. In the next stage, we'll submit multiple invocations of this function using parfeval.
function [photo, location, username] = getAstrometryPhotoFromFlickr(appKey, info)
% First, build up the photo URL as per the Flickr documentation
% here: <http://www.flickr.com/services/api/misc.urls.html>
url = sprintf('http://farm%s.staticflickr.com/%s/%s_%s.jpg', ...
              info.farm, info.server, info.id, info.secret);
try
    photo = imread(url);
catch E
    % Sometimes the read fails. Try one more time.
    photo = imread(url);
end
%
% Get the photo info to extract the username
url = ['http://api.flickr.com/services/rest/', ...
       '?method=flickr.photos.getInfo', ...
       '&api_key=', appKey, '&photo_id=', info.id];
response = xmlread(url);
owner = response.getDocumentElement.getElementsByTagName('owner');
if getLength(owner) == 1
    username = char(owner.item(0).getAttribute('username'));
else
    username = 'Unknown';
end
%
% Next, look through the comments for the photo to extract
% the annotation from astrometry.net.
url = ['http://api.flickr.com/services/rest/', ...
       '?method=flickr.photos.comments.getList', ...
       '&api_key=', appKey, '&photo_id=', info.id];
response = xmlread(url);
%
% Get all the actual comment elements from the XML response.
comments = response.getDocumentElement.getElementsByTagName('comment');
%
% Loop over all the comments, looking for the first by astrometry.net
% which contains information about the photo.
for idx = 0:(getLength(comments)-1)
    comment = comments.item(idx);
    if strcmp(char(comment.getAttribute('authorname')), 'astrometry.net')
        % We've found the comment, extract the comment text.
        commentText = char(comment.getTextContent());
        % Pick out the location information into a row vector by
        % using a regular expression.
        locationText = regexprep(...
            commentText, '.*center: *\(([-0-9,. ]+)\) *degrees.*', '$1');
        location = sscanf(locationText, '%g,', [1, 2]);
        return
    end
end
%
% We didn't find the astrometry.net comment, so location is unknown.
location = [NaN, NaN];
end
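The location parsing is the least obvious step in this function, so here it is in isolation. The comment text below is invented for the example and real astrometry.net comments may be worded differently. This sketch also uses regexp with the 'tokens' option in place of the regexprep call above, so that a comment without a "center:" entry can be detected rather than passed on to sscanf unchanged.

```matlab
% Hypothetical comment text, loosely modelled on the pattern matched above.
commentText = sprintf(['Hello, this is the blind astrometry solver.\n', ...
                       'center: (83.82, -5.39) degrees\n', ...
                       'field size: 2.1 x 1.4 degrees']);

% Capture the text between the parentheses that follow "center:".
expr = 'center: *\(([-0-9,. ]+)\) *degrees';
token = regexp(commentText, expr, 'tokens', 'once');

if isempty(token)
    % No match: report the location as unknown.
    location = [NaN, NaN];
else
    % '%g,' reads a number and then skips the comma that follows it.
    location = sscanf(token{1}, '%g,', [1, 2]);
end
disp(location)
```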
Use parfeval to submit work to the parallel workers to read the images
For each search result, we will make a parfeval request for that particular result. This will cause one of the workers in our parallel pool to invoke getAstrometryPhotoFromFlickr with that result's information. If a parallel pool is not already open, then one will be created automatically. Each parfeval call returns a parallel.Future instance. The parfeval requests are executed in order by the parallel pool workers. The syntax of parfeval is rather similar to that of the MATLAB function feval, except that because the evaluation doesn't take place immediately, you must specify how many output arguments you want to request. In this case, we want 3 outputs from getAstrometryPhotoFromFlickr.
for idx = 1:numPhotos
    futures(idx) = parfeval(@getAstrometryPhotoFromFlickr, 3, ...
                            appKey, photoInfo{idx});
end
Starting parallel pool (parpool) using the 'local' profile ... connected to 6 workers.
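To make the comparison with feval concrete, here is a minimal sketch that calls the built-in max function both ways. The input vector is arbitrary and the variable names are chosen just for this illustration.

```matlab
% Immediate evaluation: the number of outputs is implied by the left-hand side.
[largest, position] = feval(@max, [3 1 4]);

% Deferred evaluation: request 2 outputs up front and collect them later.
f = parfeval(@max, 2, [3 1 4]);
[largestP, positionP] = fetchOutputs(f);   % blocks until f has finished
```

fetchOutputs waits for the whole request to finish, which is fine for a single future. For an array of futures where you want each result as soon as it is ready, fetchNext is the better fit.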
Prepare a figure for displaying the results
While the workers are busy retrieving images from Flickr, we can set up a figure to display those images.
% Create a new figure, initially not visible.
figHandle = figure('Position', [200, 200, 600, 800], ...
                   'Units', 'pixels', 'Visible', 'off');
% Use FEX submission 'tight_subplot' to set up the axes
% https://www.mathworks.com/matlabcentral/fileexchange/27991-tight-subplot-nh--nw--gap--marg-h--marg-w-
axesHandles = tight_subplot(4, 3, [0.06, 0.01], [0.01, 0.06], 0.01);
axis(axesHandles, 'off');
Use fetchNext to retrieve results and then display them
We now enter a loop where we wait for results to become available on the workers. We use fetchNext to achieve this. We pass fetchNext our array futures, and it returns when another element has completed and a new result is available. We can also specify a maximum time to wait for each new result, so that if the web service takes a really long time to run, we can abort execution gracefully.
Note that the results from fetchNext can arrive in any order, and we use the first output argument originalIdx to work out which element of futures the result corresponds to.
We're going to specify an upper bound on how long we're going to wait for all the futures to complete. If we exceed that time limit and there are still futures running, we can cancel them.
overallTimeLimit = 10; % seconds
t = tic;
set(figHandle, 'Visible', 'on');
numCompleted = 0;
while numCompleted < numPhotos
    try
        % This blocks for up to 1 second waiting for a result.
        [originalIdx, photo, location, user] = fetchNext(futures, 1);
    catch E
        % Sometimes, the URL simply cannot be read, so an
        % error is thrown by the worker. Let's display that
        % and carry on.
        warning('Failed to read an image: %s', getReport(E));
        originalIdx = [];
    end
    % If 'fetchNext' completed successfully, originalIdx
    % will be non-empty, and will contain the index into 'futures'
    % corresponding to the work that has just finished.
    if ~isempty(originalIdx)
        % Display attribution for photo and link to the original
        fprintf('Photo %2d by Flickr user: %s\n', originalIdx, user);
        info = photoInfo{originalIdx};
        fprintf('Original at: http://www.flickr.com/%s/%s/\n', info.owner, info.id);
        % Pick our axes:
        axesToUse = axesHandles(originalIdx);
        % Display the image
        image(photo, 'Parent', axesToUse);
        axis(axesToUse, 'off');
        axis(axesToUse, 'equal');
        % Set the title of the axis to be the location
        title(axesToUse, num2str(location));
        numCompleted = numCompleted + 1;
    elseif toc(t) > overallTimeLimit
        % We have exceeded our time budget!
        disp('Time limit expired!');
        % Cancelling the futures stops execution of any running
        % futures, but has no effect on already-completed futures.
        cancel(futures);
        break;
    end
end
toc(t);
Photo 6 by Flickr user: s58y
Original at: http://www.flickr.com/45032885@N04/4145493112/
Photo 2 by Flickr user: davedehetre
Original at: http://www.flickr.com/22433418@N04/4950431138/
Photo 5 by Flickr user: s58y
Original at: http://www.flickr.com/45032885@N04/8541349810/
Photo 1 by Flickr user: s58y
Original at: http://www.flickr.com/45032885@N04/4553112538/
Photo 4 by Flickr user: Torben Bjørn Hansen
Original at: http://www.flickr.com/21067003@N07/6105409913/
Photo 3 by Flickr user: Harry Thomas Photography
Original at: http://www.flickr.com/84598277@N08/8882356507/
Photo 7 by Flickr user: s58y
Original at: http://www.flickr.com/45032885@N04/4144380775/
Photo 8 by Flickr user: Jody Roberts
Original at: http://www.flickr.com/8374715@N06/8719966332/
Photo 9 by Flickr user: s58y
Original at: http://www.flickr.com/45032885@N04/4145140448/
Photo 11 by Flickr user: davedehetre
Original at: http://www.flickr.com/22433418@N04/4954464378/
Photo 12 by Flickr user: s58y
Original at: http://www.flickr.com/45032885@N04/4535865643/
Photo 10 by Flickr user: cfaobam
Original at: http://www.flickr.com/33763963@N05/7060054229/
Elapsed time is 2.897368 seconds.
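If you want to try the same fetchNext pattern without a Flickr account, a stripped-down sketch such as the one below may help. The workload (the largest singular value of random matrices of a few arbitrary sizes) and the variable names are made up for illustration; only the loop structure mirrors the code above.

```matlab
sizes = [300 50 150];                        % arbitrary workloads
work(1:numel(sizes)) = parallel.FevalFuture; % preallocate the future array
for k = 1:numel(sizes)
    work(k) = parfeval(@(n) max(svd(rand(n))), 1, sizes(k));
end

results = nan(size(sizes));
timeLimit = 30;                              % seconds, overall budget
t = tic;
numDone = 0;
while numDone < numel(sizes)
    % Wait up to 1 second; idx is empty if nothing finished in that time.
    [idx, value] = fetchNext(work, 1);
    if ~isempty(idx)
        results(idx) = value;                % idx maps back into 'work'
        numDone = numDone + 1;
    elseif toc(t) > timeLimit
        cancel(work);                        % give up on unfinished futures
        break
    end
end
```

Storing each value at results(idx) puts the outputs back in submission order, even though they may arrive in any order.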
Conclusions
We have seen how the parfeval command can be used to retrieve data from a web service and display the results interactively as they become available. parfeval gives you more flexibility than parfor to:
- Plot results while the parallel computations proceed
- Gracefully break out of parallel computations
If you've ever wanted to show a waitbar or plot partial results during a parfor loop, then you might be able to adapt your code to use parfeval. If you have any further ideas, let us know here.