Developer Zone

Advanced Software Development with MATLAB

Accessing Twitter with HTTP Interface in MATLAB

Folks, today we have a nice collaboration between blogs. If you haven't seen it yet, Toshi Takeuchi just posted on Loren's blog to help you analyze fake news. He is also here to teach you how to fish, so to speak, so you are empowered to use the protocol to transfer hypertext all over the interweb! Here's Toshi...

When it comes to Twitter, our usual go-to tool is the venerable Twitty by Vladimir Bondarenko. Twitty hasn't been updated since July 2013, and you may want to take advantage of newer features in MATLAB, such as string arrays and the jsonencode and jsondecode functions.

One of the exciting new features introduced in R2016b is the HTTP Interface. If you want to roll your own tool, you can use it to access the Twitter API and handle its OAuth requirements without Twitty. Let's search for tweets related to NASA as an example.


Twitty as reference

Twitty is still a good reference for how to deal with OAuth authentication, which is taken care of in the callTwitterAPI method. I would like to walk through this method, replacing Java calls with the HTTP Interface where applicable. To follow along, please also check this diagram.

dbtype twitty_1.1.1/twitty.m 1816:1819
1816  %% The main API caller
1817      methods(Access = private)
1818          function S = callTwitterAPI(twtr,httpMethod,url,params,~)
1819          % Call to the twitter API. 

Twitter Credentials

I am assuming that you already have your developer credentials and have downloaded Twitty into your current folder. The workspace variable creds should contain your credentials in a struct in the following format:

creds = struct;                                             % example
creds.ConsumerKey = 'your consumer key';
creds.ConsumerSecret = 'your consumer secret';
creds.AccessToken = 'your token';
creds.AccessTokenSecret = 'your token secret';
load creds                                                  % load my real credentials

Percent Encoding

The Twitter documentation says it is important to use proper percent encoding conforming to RFC 3986, Section 2.1. Java's URLEncoder.encode gets almost everything right except for the space, which becomes '+' rather than '%20'. Let's fix that with an anonymous function, pctencode, just as Twitty does.

pctencode = @(str) ...                                      % anonymous function
    replace(char(java.net.URLEncoder.encode(str,'UTF-8') ),'+','%20');
query = 'matlab http interface (new in R2016b)';            % sample text to encode
disp(pctencode(query))                                      % encoded text
matlab%20http%20interface%20%28new%20in%20R2016b%29
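If you would rather not depend on Java for percent encoding, the RFC 3986 rule is simple enough to write directly. Below is a minimal sketch, not part of Twitty or the original workflow: it keeps the unreserved characters (A-Z, a-z, 0-9, '-', '.', '_', '~') and writes every other UTF-8 byte as %XX with uppercase hex digits. It differs slightly from URLEncoder.encode, which leaves '*' unencoded and turns '~' into '%7E'; for simple search terms like the one above, both produce the same result.

```matlab
% Pure-MATLAB RFC 3986 percent-encoder (sketch, no Java required).
% Unreserved characters (A-Z a-z 0-9 - . _ ~) are kept as-is; every
% other UTF-8 byte is written as %XX with uppercase hex digits.
str = 'matlab http interface (new in R2016b)';
bytes = double(unicode2native(str, 'UTF-8'));     % UTF-8 byte values
isUnreserved = (bytes >= 65 & bytes <= 90) | ...  % A-Z
               (bytes >= 97 & bytes <= 122) | ... % a-z
               (bytes >= 48 & bytes <= 57) | ...  % 0-9
               ismember(bytes, double('-._~'));   % - . _ ~
parts = cell(1, numel(bytes));
for k = 1:numel(bytes)
    if isUnreserved(k)
        parts{k} = char(bytes(k));                % keep unreserved byte
    else
        parts{k} = sprintf('%%%02X', bytes(k));   % e.g. space -> %20
    end
end
encoded = [parts{:}];
disp(encoded)
```

For this input the result matches the Java-based pctencode output shown above.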

Twitter Search API

Now let's try the Twitter Search API to search for tweets about NASA. I am also going to use string arrays to vectorize string handling.

base_url = 'https://api.twitter.com/1.1/search/tweets.json';% search API URL
params = struct;                                            % initialize params
params.q = pctencode('nasa');                               % search term
params.include_entities = urlencode('true');                % search param
keys = string(fieldnames(params));                          % get field names
values = string(struct2cell(params));                       % get field values

% vectorized string handling
keys = keys + '=';                                          % add =
values = values + '&';                                      % add &
queryStr = join([keys, values],'');                         % join columns
queryStr = join(queryStr(:),'');                            % convert to a single string
queryStr{1}(end) = [];                                      % remove extra &
disp(queryStr)                                              % print string
q=nasa&include_entities=true
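The same query string can also be built with cell arrays and strjoin, which inserts '&' only between pairs, so there is no trailing character to remove afterwards. This is a sketch with the sample values from above hard-coded; the values are assumed to be percent-encoded already.

```matlab
% Build "key1=value1&key2=value2" with strjoin (no trailing '&' to strip).
params = struct('q', 'nasa', 'include_entities', 'true');  % sample values
keys = fieldnames(params);                  % cell column of field names
values = struct2cell(params);               % cell column of field values
pairs = strcat(keys, '=', values);          % element-wise: 'q=nasa', ...
queryStr = strjoin(pairs', '&');            % strjoin takes a row cell array
disp(queryStr)
```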

Set up OAuth Parameters for Authentication

The Twitter documentation also covers a number of other requirements for implementing OAuth authentication properly.

  • Declare OAuth version
params.oauth_version = '1.0';
  • Create a unique token (nonce) for each HTTP request by generating a random alphanumeric sequence.
params.oauth_nonce = replace([num2str(now) num2str(rand)], '.', '');
  • Create a timestamp as the number of seconds since the Unix epoch, using datetime arrays.
epoch = datetime('1970-01-01T00:00:00Z','InputFormat', ...
    'yyyy-MM-dd''T''HH:mm:ssXXX','TimeZone','UTC');
curr = datetime('now','TimeZone','UTC','Format', ...
    'yyyy-MM-dd''T''HH:mm:ssXXX');
params.oauth_timestamp = int2str(seconds(curr - epoch));
  • Get user credentials.
params.oauth_consumer_key = creds.ConsumerKey;
params.oauth_token = creds.AccessToken;
  • Declare the signature method for the HTTP request, which must be HMAC-SHA1.
sigMethod = 'HMAC-SHA1';
params.oauth_signature_method = sigMethod;
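Two possible simplifications of the nonce and timestamp steps, offered as a sketch rather than what Twitty does: draw a fixed-length nonce from an alphanumeric alphabet with randi, and get the Unix timestamp directly with posixtime instead of subtracting an epoch datetime. randi is not a cryptographic random generator; the assumption here is that the nonce only needs to be unique per request. The 32-character length is an arbitrary choice.

```matlab
% Alphanumeric nonce: 32 characters drawn uniformly from [A-Za-z0-9].
% randi is not cryptographically secure; uniqueness per request is assumed
% to be sufficient here.
alphabet = ['A':'Z', 'a':'z', '0':'9'];
nonce = alphabet(randi(numel(alphabet), 1, 32));

% Whole seconds since the Unix epoch, computed with posixtime.
nowUtc = datetime('now', 'TimeZone', 'UTC');
timestamp = sprintf('%d', floor(posixtime(nowUtc)));

disp(nonce)
disp(timestamp)
```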

Now we can put it all together using vectorized string handling.

params = orderfields(params);                               % sort the field order
keys = string(fieldnames(params));                          % get field names
values = string(struct2cell(params));                       % get field values
keys = keys + '=';                                          % add =
values = values + '&';                                      % add &
oauthStr = join([keys, values],'');                         % join columns
oauthStr = join(oauthStr(:),'');                            % convert to a single string
oauthStr{1}(end) = [];                                      % remove extra &

Generate Signature with HMAC-SHA1

Create the signature base string and signature key.

httpMethod = string('GET');                                 % set http method to get
sigStr = upper(httpMethod);                                 % add it to sigStr
sigStr = [sigStr pctencode(base_url)];                      % add base Url
sigStr = [sigStr pctencode(char(oauthStr))];                % add oauthStr
sigStr = char(join(sigStr,'&'));                            % join them with &
sigKey = [creds.ConsumerSecret '&' creds.AccessTokenSecret];% create sigKey
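To make the structure of the signature base string concrete, here is a worked example with a toy, already-sorted parameter string (not the full set of OAuth parameters). The three parts are the upper-case HTTP method, the percent-encoded base URL, and the percent-encoded parameter string, joined with unencoded '&'. Note that the '=' and '&' inside the parameter string are encoded a second time, which is what the base string is supposed to contain.

```matlab
% Worked example of the signature base string layout:
%   METHOD & pctencode(base URL) & pctencode(sorted parameter string)
pctencode = @(str) ...
    replace(char(java.net.URLEncoder.encode(str, 'UTF-8')), '+', '%20');
httpMethod = 'GET';
base_url = 'https://api.twitter.com/1.1/search/tweets.json';
paramStr = 'include_entities=true&q=nasa';   % toy, already-sorted parameters
sigStr = strjoin({upper(httpMethod), pctencode(base_url), ...
    pctencode(paramStr)}, '&');              % separators stay unencoded
disp(sigStr)
```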

Generate the signature with HMAC-SHA1 using Java as Twitty does.

import javax.crypto.Mac                                     % import Java libraries
import javax.crypto.spec.SecretKeySpec;                     % key spec methods
import org.apache.commons.codec.binary.Base64               % base64 codec

algorithm = strrep(sigMethod,'-','');                       % use HMAC-SHA1
key = SecretKeySpec(int8(sigKey), algorithm);               % get key
mac = Mac.getInstance(algorithm);                           % message auth code
mac.init(key);                                              % initialize mac
mac.update(int8(sigStr));                                   % add base string
params.oauth_signature = ...                                % get signature
    char(Base64.encodeBase64(mac.doFinal)');
params = orderfields(params);                               % order fields
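Because a wrong signature only shows up as an authentication error from Twitter, it can help to check the HMAC-SHA1 step against a known answer first. This sketch uses the widely published test value HMAC-SHA1(key = 'key', message = 'The quick brown fox jumps over the lazy dog') = de7c9b85b8b78aa6bc8a7a36f70a90701c9db4d9. It converts text with unicode2native and typecast rather than int8(...), so non-ASCII characters become proper UTF-8 bytes instead of saturating at 127, and it encodes the result with matlab.net.base64encode from the matlab.net package instead of the Apache Commons codec. Treat it as an alternative sketch, not the way Twitty does it.

```matlab
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

% Known-answer check for HMAC-SHA1 using the Java crypto classes.
keyBytes = typecast(unicode2native('key', 'UTF-8'), 'int8');  % Java byte[] is signed
msgBytes = typecast(unicode2native( ...
    'The quick brown fox jumps over the lazy dog', 'UTF-8'), 'int8');

mac = Mac.getInstance('HmacSHA1');               % standard JCA algorithm name
mac.init(SecretKeySpec(keyBytes, 'HmacSHA1'));
digest = typecast(mac.doFinal(msgBytes), 'uint8');   % 20-byte MAC as uint8
digest = digest(:)';                             % make it a row vector

hexDigest = lower(reshape(dec2hex(double(digest), 2)', 1, []));
b64Digest = matlab.net.base64encode(digest);     % Base64 for oauth_signature
disp(hexDigest)
disp(b64Digest)
```

Once this matches, the same calls can be pointed at sigKey and sigStr.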

Build the OAuth Header

With the signature added to the parameters, we can now build the OAuth header.

keys = fieldnames(params);                                  % get field names
isOauth = contains(keys,'oauth');                           % oauth params
keys = cellfun(pctencode,keys,'UniformOutput',false);       % percent-encode keys
keys = string(keys) + '="';                                 % convert to string and add ="
values = struct2cell(params);                               % get field values
values = cellfun(pctencode,values,'UniformOutput',false);   % percent-encode values
values = string(values) +'", ';                             % convert to string and add ",
oauth_header = join([keys, values],'');                     % join columns
oauth_header = oauth_header(isOauth,:);                     % keep OAuth params only
oauth_header = 'OAuth ' + join(oauth_header(:),'');         % convert to a single string
oauth_header{1}(end-1:end) = [];                            % remove trailing ', '
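The header can also be assembled with cellfun and strjoin, which adds the ', ' separators only between entries, so nothing needs to be trimmed afterwards. This is a sketch with hypothetical dummy values; it selects parameters whose names start with 'oauth_' (a slightly stricter test than contains) and produces the same OAuth key="value", ... layout.

```matlab
% Build 'OAuth k1="v1", k2="v2", ...' from the oauth_* fields of a struct.
pctencode = @(str) ...
    replace(char(java.net.URLEncoder.encode(str, 'UTF-8')), '+', '%20');
params = struct('oauth_consumer_key', 'my-consumer-key', ...  % dummy values
                'oauth_nonce', 'abc123', ...
                'oauth_signature', 'sig+/=', ...
                'q', 'nasa');                      % non-OAuth param, excluded
params = orderfields(params);                      % sort by field name
keys = fieldnames(params);
values = struct2cell(params);
isOauth = strncmp(keys, 'oauth_', 6);              % keep only oauth_* params
pairs = cellfun(@(k, v) sprintf('%s="%s"', pctencode(k), pctencode(v)), ...
    keys(isOauth), values(isOauth), 'UniformOutput', false);
oauth_header = ['OAuth ' strjoin(pairs', ', ')];
disp(oauth_header)
```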

Send HTTP Request

This is where everything we have done so far comes together, and you get to see how to use HTTP Interface classes such as matlab.net.URI, matlab.net.http.RequestMethod, and matlab.net.http.field.AuthorizationField to send an HTTP request. I will just load a previously saved response object here for reproducibility.

import matlab.net.URI;                                      % shorten lib calls
import matlab.net.http.RequestMethod;
import matlab.net.http.RequestMessage;
import matlab.net.http.field.AuthorizationField;
import matlab.net.http.field.ContentTypeField;

connURI = URI(base_url,queryStr);                           % create URI obj
method = RequestMethod.(char(httpMethod));                  % create method obj
auth = AuthorizationField('Authorization',oauth_header);    % create auth field obj
ct_value = 'application/x-www-form-urlencoded';             % content type
content_type = ContentTypeField(ct_value);                  % create content type obj
header = [auth content_type];                               % combine header fields objs
request = RequestMessage(method,header);                    % create request obj
response = send(request,connURI);                           % send http request
load response                                               % load saved response (replaces the live one)
disp(response)
  ResponseMessage with properties:

    StatusLine: ' 200 OK'
    StatusCode: OK
        Header: [1×25 matlab.net.http.HeaderField]
          Body: [1×1 matlab.net.http.MessageBody]
     Completed: 0

Process Response

The Twitter API returns its response in JSON format, but the HTTP Interface in MATLAB automatically decodes it into a structure. Now you can process it with any MATLAB functions you are familiar with.

if ~isfield(response.Body.Data,'errors')                    % if no errors
    s = response.Body.Data.statuses;                        % get tweets
    if ~isempty(s)                                          % if not empty
        if isstruct(s)                                      % struct array if all tweets
            s = num2cell(s);                                % share fields; make it a cell
        end
        tweets = cellfun(@(x) x.text, s, 'UniformOutput',0);% extract text
        tweets = char(replace(tweets,char(10),' '));        % replace newlines with spaces
        tweets = string(tweets(:,1:min(70,end))) + '...';   % truncate to 70 characters
        disp(tweets)
    end
end
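Because the decoded statuses can come back either as a struct array (when every status has the same fields) or as a cell array of structs (when they differ), it is worth trying the processing logic offline on a small sample. The JSON below is hypothetical toy data, not a real Twitter response, and the 20-character limit is chosen only to make truncation visible. Unlike the char-padding approach above, this version appends '...' only to tweets that were actually cut.

```matlab
% Hypothetical toy JSON shaped like a search response; the two statuses
% have different fields, so jsondecode returns them as a cell array.
json = ['{"statuses":[{"text":"First tweet\nwith a newline"},', ...
        '{"text":"Second tweet","lang":"en"}],', ...
        '"search_metadata":{"count":2}}'];
data = jsondecode(json);                         % JSON text -> MATLAB struct

maxLen = 20;                                     % truncation length for display
tweets = {};
if ~isfield(data, 'errors')                      % failures come back in "errors"
    s = data.statuses;
    if isstruct(s)                               % identical fields -> struct array
        s = num2cell(s);                         % normalize to a cell array
    end
    tweets = cellfun(@(x) x.text, s, 'UniformOutput', false);
    tweets = replace(tweets, char(10), ' ');     % newlines -> spaces
    tooLong = cellfun(@numel, tweets) > maxLen;  % only truncate long tweets
    tweets(tooLong) = cellfun(@(t) [t(1:maxLen) '...'], ...
        tweets(tooLong), 'UniformOutput', false);
    disp(string(tweets))
end
```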
    "RT @RogueNASA: 10 days, 11 hours remaining. We've almost made it to $1..."
    "RT @gp_pulipaka: NASA Tests AI For Autonomous Exploration. #BigData #D..."
    "RT @BillP____: - Που πας, Παιδί μου ; - Στο Διάστημα με τον Αλέξη, Μάν..."
    "Camp Century: Put on Ice, But Only for So Long : Image of the Day http..."
    "RT @SAI: This super powerful telescope is looking at the first black h..."
    "RT @medsymeds: Minsan talaga mahirap intindihin ang mga magulang kahit..."
    "RT @hot_jmea: Haayy. naku Yuan...tlaga nga nmang nasa huli ang pagsisi..."
    "RT @WIRED: 🎥  The traditional, white NASA space suit just got a makeo..."
    "RT @ariscfu: Νάτος και ο Βέρνερ φον Μπράουν... #greek_NASA https://t.c..."
    "RT @LitsaXarou: Δυο πράγματα δεν τόλμησε να κανει κανει ο Γιώργος Παπα..."
    "Simula nung jan. 26 nung nalaman ko, lagi nasa isip ko "MAG IPON" gust..."
    "RT @Rika_Skafida: Όπου αεροπλάνο, βάζουμε διαστημόπλοιο #Διαστημικα_υπ..."
    "RT @Fezaari12: You Only Live Once ang sagot sa tanong nasa tv screen #..."
    "@mik3cap NASA's CI Newsletter.                                        ..."
    "RT @vasilisxatzis13: Μετά από τις περικοπές στην #greek_NASA https://t..."

Summary

In this post you saw how to use the HTTP Interface to access Twitter, and you now have a foundation to build your own tool to analyze Twitter feeds. The HTTP Interface offers other possibilities as well. For example, I used it to get expanded URLs from shortened URLs. How would you use it in your projects? Please share your ideas here.




Published with MATLAB® R2016b
