Accessing Twitter with HTTP Interface in MATLAB
Folks, today we have a nice collaboration between blogs. If you haven't seen it yet, Toshi Takeuchi just posted on Loren's blog about analyzing fake news. He is also here to teach you how to fish, so to speak, so you are empowered to use the protocol to transfer hypertext all over the interweb! Here's Toshi...
When it comes to Twitter, our usual go-to tool is the venerable Twitty by Vladimir Bondarenko. However, Twitty hasn't been updated since July 2013, and you may want to take advantage of newer features in MATLAB, such as string arrays and the jsonencode and jsondecode functions.
One of the exciting new features introduced in R2016b is the HTTP Interface. If you want to roll your own tool, you can use it to access the Twitter API and handle its OAuth requirements without Twitty. Let's search for tweets related to NASA as an example.
Twitty as reference
Twitty is still a good reference for how to deal with OAuth authentication, which is handled in the callTwitterAPI method. I would like to walk through this method, replacing Java calls with the HTTP Interface where applicable. To follow along, please also check this diagram.
dbtype twitty_1.1.1/twitty.m 1816:1819
1816      %% The main API caller
1817      methods(Access = private)
1818          function S = callTwitterAPI(twtr,httpMethod,url,params,~)
1819              % Call to the twitter API.
Twitter Credentials
I am assuming that you already have your developer credentials and have downloaded Twitty into your current folder. The workspace variable creds should contain your credentials in a struct in the following format:
creds = struct;                                 % example
creds.ConsumerKey = 'your consumer key';
creds.ConsumerSecret = 'your consumer secret';
creds.AccessToken = 'your token';
creds.AccessTokenSecret = 'your token secret';
load creds                                      % load my real credentials
Percent Encoding
The Twitter documentation says it is important to use proper percent encoding conforming to RFC 3986, Section 2.1. Java's URLEncoder.encode does almost everything right, but it turns a space into '+' rather than '%20', leaves '*' unencoded, and encodes '~' as '%7E'. Let's fix those cases with an anonymous function pctencode. Twitty applies the same fix for spaces.
pctencode = @(str) ...                                  % anonymous function
    replace(char(java.net.URLEncoder.encode(str,'UTF-8')), ...
    {'+','*','%7E'}, {'%20','%2A','~'});                % fix space, '*' and '~'
query = 'matlab http interface (new in R2016b)';       % sample text to encode
disp(pctencode(query))                                  % encoded text
matlab%20http%20interface%20%28new%20in%20R2016b%29
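If you would rather not depend on Java at all, the same encoding can be written in plain MATLAB. This is a minimal sketch, not Twitty's or MathWorks' implementation: it converts the text to UTF-8 bytes with unicode2native, passes through only the RFC 3986 unreserved characters (letters, digits, '-', '.', '_', '~'), and writes every other byte as %XX with uppercase hex digits. The variable names are illustrative.

```matlab
% Sketch: RFC 3986 percent-encoding without Java (hypothetical alternative to pctencode)
str = 'matlab http interface (new in R2016b)';          % sample text to encode
bytes = double(unicode2native(str, 'UTF-8'));           % UTF-8 bytes as doubles
isUnreserved = (bytes >= 'A' & bytes <= 'Z') | ...      % A-Z
               (bytes >= 'a' & bytes <= 'z') | ...      % a-z
               (bytes >= '0' & bytes <= '9') | ...      % 0-9
               ismember(bytes, double('-._~'));         % - . _ ~
parts = cell(1, numel(bytes));                          % one piece per byte
parts(isUnreserved) = num2cell(char(bytes(isUnreserved)));   % keep as-is
parts(~isUnreserved) = arrayfun(@(b) sprintf('%%%02X', b), ...
    bytes(~isUnreserved), 'UniformOutput', false);      % encode as %XX
encoded = [parts{:}];                                   % concatenate the pieces
disp(encoded)
```

For the sample query this yields the same text as pctencode; because it only trusts the unreserved set, '*' comes out as %2A and '~' is left alone without any special-case replacements.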
Twitter Search API
Now let's try the Twitter Search API to search for tweets about NASA. I am also going to use string arrays to vectorize string handling.
base_url = 'https://api.twitter.com/1.1/search/tweets.json'; % search API URL
params = struct;                                % initialize params
params.q = pctencode('nasa');                   % search term
params.include_entities = urlencode('true');    % search param
keys = string(fieldnames(params));              % get field names
values = string(struct2cell(params));           % get field values
% vectorized string handling
keys = keys + '=';                              % add =
values = values + '&';                          % add &
queryStr = join([keys, values],'');             % join columns
queryStr = join(queryStr(:),'');                % convert to a single string
queryStr{1}(end) = [];                          % remove extra &
disp(queryStr)                                  % print string
q=nasa&include_entities=true
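The trailing '&' cleanup can be avoided by building "key=value" pairs first and joining them with '&' as the delimiter. A sketch, assuming the values are already percent-encoded (here they contain only unreserved characters, so they need no encoding):

```matlab
% Sketch: the same query string with a delimiter-based join (string arrays, R2016b+)
params = struct('q', 'nasa', 'include_entities', 'true');
keys = string(fieldnames(params));        % "q"; "include_entities"
values = string(struct2cell(params));     % "nasa"; "true"
pairs = keys + '=' + values;              % element-wise: "q=nasa"; "include_entities=true"
queryStr = join(pairs', '&');             % join the row vector with '&'
disp(queryStr)
```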
Set up OAuth Parameters for Authentication
The Twitter documentation also covers a number of other requirements for implementing OAuth authentication properly.
- Declare OAuth version
params.oauth_version = '1.0';
- Create a unique token (nonce) for each HTTP request. Twitter suggests a random alphanumeric sequence; here we simply strip the decimal points from the current time and a random number.
params.oauth_nonce = replace([num2str(now) num2str(rand)], '.', '');
- Create a timestamp as the number of seconds since the Unix epoch using datetime arrays.
epoch = datetime('1970-01-01T00:00:00Z','InputFormat', ...
    'yyyy-MM-dd''T''HH:mm:ssXXX','TimeZone','UTC');
curr = datetime('now','TimeZone','UTC','Format', ...
    'yyyy-MM-dd''T''HH:mm:ssXXX');
params.oauth_timestamp = int2str(seconds(curr - epoch));
- Get user credentials.
params.oauth_consumer_key = creds.ConsumerKey;
params.oauth_token = creds.AccessToken;
- Declare the signature method for the HTTP request, which must be HMAC-SHA1.
sigMethod = 'HMAC-SHA1';
params.oauth_signature_method = sigMethod;
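For comparison, here is a sketch that follows the "random alphanumeric" wording more literally and gets the timestamp from posixtime instead of subtracting the epoch by hand. The 32-character nonce length is an arbitrary assumption, and randi is not a cryptographically secure generator; it only serves to make each request's nonce unique.

```matlab
% Sketch: alphanumeric nonce and Unix timestamp (nonce length is an assumption)
alphabet = ['A':'Z' 'a':'z' '0':'9'];                  % 62 allowed characters
nonce = alphabet(randi(numel(alphabet), 1, 32));       % 32 random characters
nowUTC = datetime('now', 'TimeZone', 'UTC');           % current time in UTC
timestamp = sprintf('%d', floor(posixtime(nowUTC)));   % whole seconds since 1970-01-01 UTC
disp(nonce)
disp(timestamp)
```

posixtime accounts for the datetime's time zone, so the result should match the int2str(seconds(curr - epoch)) value computed above, give or take rounding.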
Now we can put it all together using vectorized string handling.
params = orderfields(params);                   % sort the field order
keys = string(fieldnames(params));              % get field names
values = string(struct2cell(params));           % get field values
keys = keys + '=';                              % add =
values = values + '&';                          % add &
oauthStr = join([keys, values],'');             % join columns
oauthStr = join(oauthStr(:),'');                % convert to a single string
oauthStr{1}(end) = [];                          % remove extra &
Generate the Signature with HMAC-SHA1
Create the signature base string and signature key.
httpMethod = string('GET');                     % set HTTP method to GET
sigStr = upper(httpMethod);                     % add it to sigStr
sigStr = [sigStr pctencode(base_url)];          % add base URL
sigStr = [sigStr pctencode(char(oauthStr))];    % add oauthStr
sigStr = char(join(sigStr,'&'));                % join them with &
sigKey = [creds.ConsumerSecret '&' creds.AccessTokenSecret]; % create sigKey
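To see what a signature base string looks like, here is a worked example with hypothetical, already-sorted parameters (not real OAuth values). It shows the three parts (HTTP method, encoded base URL, encoded parameter string) joined with '&'; note that the '=' and '&' inside the parameter string are themselves encoded as %3D and %26. pctencode is redefined here in its simplest form so the block runs on its own.

```matlab
% Worked example of the three-part signature base string (hypothetical parameters)
pctencode = @(str) replace(char(java.net.URLEncoder.encode(str, 'UTF-8')), '+', '%20');
base_url = 'https://api.twitter.com/1.1/search/tweets.json';
paramStr = 'include_entities=true&oauth_nonce=123&q=nasa';   % sorted by key
sigStr = ['GET' '&' pctencode(base_url) '&' pctencode(paramStr)];
disp(sigStr)
```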
Generate the signature with HMAC-SHA1 using Java as Twitty does.
import javax.crypto.Mac                         % import Java libraries
import javax.crypto.spec.SecretKeySpec          % key spec methods
import org.apache.commons.codec.binary.Base64   % base64 codec
algorithm = strrep(sigMethod,'-','');           % use HMAC-SHA1
key = SecretKeySpec(int8(sigKey), algorithm);   % get key
mac = Mac.getInstance(algorithm);               % message auth code
mac.init(key);                                  % initialize mac
mac.update(int8(sigStr));                       % add base string
params.oauth_signature = ...                    % get signature
    char(Base64.encodeBase64(mac.doFinal)');
params = orderfields(params);                   % order fields
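Before trusting signatures built this way, you might check the Java Mac path against a published HMAC-SHA1 test vector. This sketch uses test case 2 from RFC 2202 (key 'Jefe', data 'what do ya want for nothing?') and prints the digest in hex rather than Base64 so it can be compared directly with the RFC, where the expected digest is effcdf6ae5eb2fa2d27416d5f184df9c259a7c79. It uses the standard JCA algorithm name 'HmacSHA1'.

```matlab
% Sanity check of the Java HMAC-SHA1 path against RFC 2202 test case 2
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec
key = SecretKeySpec(int8('Jefe'), 'HmacSHA1');            % ASCII key bytes
mac = Mac.getInstance('HmacSHA1');
mac.init(key);
raw = mac.doFinal(int8('what do ya want for nothing?'));  % Java byte[] -> int8 array
bytes = double(typecast(raw(:), 'uint8'));                % signed bytes -> 0..255
digestHex = lower(reshape(dec2hex(bytes, 2)', 1, []));    % 40 hex characters
disp(digestHex)
```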
Build the OAuth Header
With the signature added to the parameters, we can now build the OAuth header.
keys = fieldnames(params);                      % get field names
isOauth = contains(keys,'oauth');               % oauth params
keys = cellfun(pctencode,keys,'UniformOutput',false);     % percent-encode keys
keys = string(keys) + '="';                     % convert to string and add ="
values = struct2cell(params);                   % get field values
values = cellfun(pctencode,values,'UniformOutput',false); % percent-encode values
values = string(values) + '", ';                % convert to string and add ",
oauth_header = join([keys, values],'');         % join columns
oauth_header = oauth_header(isOauth,:);         % keep OAuth params only
oauth_header = 'OAuth ' + join(oauth_header(:),'');       % convert to a single string
oauth_header{1}(end-1:end) = [];                % remove the trailing ', '
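The same header can also be built by forming key="value" pairs and joining them with ', '. In this sketch the parameter values are hypothetical and contain only unreserved characters, so percent-encoding is skipped; with real values you would still apply pctencode to keys and values first.

```matlab
% Sketch: OAuth header assembly with startsWith and a single join (hypothetical values)
p = struct('oauth_nonce', 'abc123', 'oauth_version', '1.0', 'q', 'nasa');
keys = string(fieldnames(p));
values = string(struct2cell(p));
isOauth = startsWith(keys, 'oauth_');                    % skip search params such as q
pairs = keys(isOauth) + '="' + values(isOauth) + '"';    % key="value"
oauth_header = 'OAuth ' + join(pairs', ', ');            % comma-separated list
disp(oauth_header)
```

startsWith(keys,'oauth_') is a little stricter than contains(keys,'oauth'), which would also pick up a non-OAuth parameter whose name happens to contain "oauth".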
Send HTTP Request
This is where everything we have done so far comes together, and you get to see how to send an HTTP request with HTTP Interface classes such as matlab.net.URI, matlab.net.http.RequestMethod, and matlab.net.http.field.AuthorizationField. I will just load the previously saved response object here for reproducibility.
import matlab.net.URI                           % shorten lib calls
import matlab.net.http.RequestMethod
import matlab.net.http.RequestMessage
import matlab.net.http.field.AuthorizationField
import matlab.net.http.field.ContentTypeField
connURI = URI(base_url,queryStr);               % create URI obj
method = RequestMethod.(char(httpMethod));      % create method obj
auth = AuthorizationField('Authorization',oauth_header); % create auth field obj
ct_value = 'application/x-www-form-urlencoded'; % content type
content_type = ContentTypeField(ct_value);      % create content type obj
header = [auth content_type];                   % combine header field objs
request = RequestMessage(method,header);        % create request obj
response = send(request,connURI);               % send http request
load response                                   % load saved resp
disp(response)
  ResponseMessage with properties:
    StatusLine: ' 200 OK'
    StatusCode: OK
        Header: [1×25 matlab.net.http.HeaderField]
          Body: [1×1 matlab.net.http.MessageBody]
     Completed: 0
Process Response
The Twitter API returns its response in JSON format, and the HTTP interface in MATLAB automatically decodes it into MATLAB structures. Now you can process it with any familiar MATLAB functions of your choice.
if ~isfield(response.Body.Data,'errors')        % if not errors
    s = response.Body.Data.statuses;            % get tweets
    if isstruct(s)                              % struct array if all tweets share fields
        s = num2cell(s);                        % convert to cell for cellfun
    end
    if ~isempty(s)                              % if not empty
        tweets = cellfun(@(x) x.text, s, 'UniformOutput',0); % extract text
        tweets = char(replace(tweets,char(10),' '));  % remove new line
        tweets = string(tweets(:,1:70)) + '...';      % truncate tweets
        disp(tweets)
    end
end
    "RT @RogueNASA: 10 days, 11 hours remaining. We've almost made it to $1..."
    "RT @gp_pulipaka: NASA Tests AI For Autonomous Exploration. #BigData #D..."
    "RT @BillP____: - Που πας, Παιδί μου ; - Στο Διάστημα με τον Αλέξη, Μάν..."
    "Camp Century: Put on Ice, But Only for So Long : Image of the Day http..."
    "RT @SAI: This super powerful telescope is looking at the first black h..."
    "RT @medsymeds: Minsan talaga mahirap intindihin ang mga magulang kahit..."
    "RT @hot_jmea: Haayy. naku Yuan...tlaga nga nmang nasa huli ang pagsisi..."
    "RT @WIRED: 🎥 The traditional, white NASA space suit just got a makeo..."
    "RT @ariscfu: Νάτος και ο Βέρνερ φον Μπράουν... #greek_NASA https://t.c..."
    "RT @LitsaXarou: Δυο πράγματα δεν τόλμησε να κανει κανει ο Γιώργος Παπα..."
    "Simula nung jan. 26 nung nalaman ko, lagi nasa isip ko "MAG IPON" gust..."
    "RT @Rika_Skafida: Όπου αεροπλάνο, βάζουμε διαστημόπλοιο #Διαστημικα_υπ..."
    "RT @Fezaari12: You Only Live Once ang sagot sa tanong nasa tv screen #..."
    "@mik3cap NASA's CI Newsletter. ..."
    "RT @vasilisxatzis13: Μετά από τις περικοπές στην #greek_NASA https://t..."
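How the statuses array gets decoded matters for the cellfun call above: jsondecode turns an array of objects with differing fields into a cell array of structs, but an array of objects with identical fields into a struct array. The sketch below uses a made-up miniature JSON body (not real Twitter output) and assumes the HTTP interface decodes the body the same way jsondecode does.

```matlab
% Made-up miniature search response (not real tweets); the two objects have different fields
json = ['{"statuses":[{"text":"first\ntweet","id":1},' ...
        '{"text":"second","id":2,"truncated":false}],' ...
        '"search_metadata":{"count":2}}'];
data = jsondecode(json);
s = data.statuses;              % cell array here, because the objects differ
if isstruct(s)                  % objects with identical fields decode to a struct array
    s = num2cell(s);
end
tweets = cellfun(@(x) x.text, s, 'UniformOutput', false);
tweets = replace(tweets, newline, ' ');   % flatten line breaks
disp(tweets)
```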
Summary
In this post you saw how to use the HTTP Interface to access Twitter, and you now have a foundation for building your own tool to analyze Twitter feeds. The HTTP Interface offers other possibilities as well. For example, I used it to get expanded URLs from shortened URLs. How would you use this in your projects? Please share your ideas here.