{"id":920,"date":"2014-06-04T15:13:32","date_gmt":"2014-06-04T20:13:32","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=920"},"modified":"2017-01-06T11:04:54","modified_gmt":"2017-01-06T16:04:54","slug":"analyzing-twitter-with-matlab","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2014\/06\/04\/analyzing-twitter-with-matlab\/","title":{"rendered":"Analyzing Twitter with MATLAB"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>Whatever your opinion of social media these days, there is no denying it is now an integral part of our digital life. So much so, that social media metrics are now considered part of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Altmetrics\">altmetrics<\/a>, an alternative to the established metrics such as citations to measure the impact of scientific papers.<\/p><p>Today's guest blogger, Toshi, will show you how to access the Twitter API and analyze tweets with MATLAB.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#d35f871f-1c4b-4d4e-b4e3-6ba8f54a2a8f\">Why Twitter<\/a><\/li><li><a href=\"#1d9f294f-e18d-468e-b2fc-494f10def545\">Sentiment Analysis<\/a><\/li><li><a href=\"#b6c3c420-848a-41db-b498-0b7ea07a994a\">Tweet Content Visualization<\/a><\/li><li><a href=\"#8a8d773e-76dc-46f2-bab2-ceb3a61cab84\">Who Tweeted the News?<\/a><\/li><li><a href=\"#08d25693-9bb2-4393-be90-158102525c4c\">Does Follower Count Really Matter? 
Going Viral on Twitter<\/a><\/li><li><a href=\"#4e356d6f-d84c-47d7-bf6e-2ed599c22663\">Visualizing the Retweet Social Graph<\/a><\/li><li><a href=\"#dadf0a8b-12db-4902-b96a-f163a20ef083\">Getting Started with Twitter using Twitty<\/a><\/li><li><a href=\"#69cace70-53e1-4682-80e9-c66c765f6655\">Processing Tweets and Scoring Sentiments<\/a><\/li><li><a href=\"#2f9c9396-fd30-4764-a003-736abcc9d67c\">Processing Tweets for Content Visualization<\/a><\/li><li><a href=\"#b2caf181-37b5-4969-b324-e08841fbee72\">Get the Profile of Top 5 Users<\/a><\/li><li><a href=\"#a4f55b8e-89ba-4749-b2c1-38d368be7744\">Streaming API for High Volume Real Time Tweets<\/a><\/li><li><a href=\"#50f871fc-d8fd-4e93-8041-f16c853291fe\">Save an Edge List for Social Graph Visualization<\/a><\/li><li><a href=\"#18cac743-8cff-4184-8213-ce18753c3147\">Closing<\/a><\/li><\/ul><\/div><h4>Why Twitter<a name=\"d35f871f-1c4b-4d4e-b4e3-6ba8f54a2a8f\"><\/a><\/h4><p>Twitter is a good starting point for social media analysis because people openly share their opinions to the general public. This is very different from Facebook, where social interactions are often private. In this post, I would like to share simple examples of sentiment analysis and social graph visualization using Twitter's Search and Streaming APIs.<\/p><p>The first part of this post discusses analysis with Twitter, and the latter part shows the code that computes and creates plots, like those shown earlier.<\/p><h4>Sentiment Analysis<a name=\"1d9f294f-e18d-468e-b2fc-494f10def545\"><\/a><\/h4><p>One of the very common analyses you can perform on a large number of tweets is sentiment analysis. Sentiment is scored based on the words contained in a tweet. If you manage a brand or political campaign, for example, it may be important to keep track of your popularity, and sentiment analysis provides a convenient way to take the pulse of the tweeting public. 
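<\/p><p>To make word-based scoring concrete, here is a minimal sketch. It assumes a tiny, made-up lexicon (the words and scores below are illustrative only and are not taken from any real word list) and sums the scores of the words found in a sample tweet. The variable names are hypothetical, and this is not necessarily how the scoring used later in this post is implemented.<\/p><pre class=\"codeinput\">% hypothetical mini-lexicon: the words and scores are made up for illustration\r\nlexicon = containers.Map({'great','love','war','dispute'},[3 3 -2 -2]);\r\n\r\n% split a sample tweet into lowercase words\r\ntweet = 'Great news: the dispute is over';\r\ntokens = regexp(lower(tweet),'[a-z]+','match');\r\n\r\n% sum the scores of the words that appear in the lexicon\r\nscore = 0;\r\nfor k = 1:numel(tokens)\r\n    if isKey(lexicon,tokens{k})\r\n        score = score + lexicon(tokens{k});\r\n    end\r\nend\r\ndisp(score)\r\n<\/pre><p>A positive total suggests a positive tweet and a negative total a negative one; words that are not in the lexicon contribute nothing. 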
Here is an example of sentiment analysis comparing <a href=\"http:\/\/www.theatlantic.com\/business\/archive\/2014\/05\/how-the-amazon-hachette-fight-could-shape-the-future-of-ideas\/371756\/\">Amazon and Hachette<\/a> as of this writing, based on 100 tweets collected via the Twitter Search API.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/sentimentAnalysis.png\" alt=\"\"> <\/p><p>The sentiment distributions are nearly identical between the two brands, but you can see that tweets mentioning both are clearly skewed to the negative, since the news is about a war between Amazon and a publisher over ebook profit margins. Is there a single metric we can use to make this comparison easier? That's where <i>Net Sentiment Rate (NSR)<\/i> comes in.<\/p><pre>NSR = (Positive Tweets-Negative Tweets)\/Total<\/pre><p>Here is the result. You could keep taking this measurement periodically for ongoing sentiment monitoring, if interested. Perhaps you will discover that NSR is correlated with their stock prices!<\/p><pre>Amazon NSR  :  0.84\r\nHachette NSR:  0.58\r\nBoth NSR    : -0.30<\/pre><p>Last but not least, did sentiment scoring actually work? 
Check out the top 5 positive and negative tweets for Hachette for your own assessment.<\/p><pre>                               Top 5 positive tweets\r\n   ___________________________________________________________________________<\/pre><pre>   '@deckchairs @OccupyMyCat @aworkinglibrary but I think Hachette artists...'\r\n   '@emzleb Hachette has Rowling so they hold a lot of cards (A LOT of car...'\r\n   'Amazon Confirms Hachette Spat Is To \"Get a Better Deal\" http:\/\/t.co\/Ka...'\r\n   '@shaunduke @DarkMatterzine Yeah, Gollancz is owned by Orion Publishing...'\r\n   'MUST READ Book publisher Hachette says working to resolve Amazon dispu...'<\/pre><pre>                               Top 5 negative tweets\r\n   ___________________________________________________________________________<\/pre><pre>   'Reading into the Amazon vs. Hachette battle - May 28 - The war between...'\r\n   '#Vtech Reading into the Amazon vs. Hachette battle - May 28 - The war ...'\r\n   '#Vbnss Reading into the Amazon vs. Hachette battle - May 28 - The war ...'\r\n   'RT @text_publishing: Amazon war with Hachette over ebook profit margin...'\r\n   'RT @text_publishing: Amazon war with Hachette over ebook profit margin...'<\/pre><h4>Tweet Content Visualization<a name=\"b6c3c420-848a-41db-b498-0b7ea07a994a\"><\/a><\/h4><p>What were the main themes they tweeted about when those users mentioned both Amazon and Hachette? The word count plot shows that mostly those tweets repeated the news headlines like &#8220;Amazon admits dispute (with) Hachette&#8221;, perhaps with some commentary - showing that Twitter was being used for news amplification.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/content.png\" alt=\"\"> <\/p><h4>Who Tweeted the News?<a name=\"8a8d773e-76dc-46f2-bab2-ceb3a61cab84\"><\/a><\/h4><p>The 100 tweets collected came from 86 users. So on average each user only tweeted 1.16 times. 
Instead of frequency, let's find out who has a large number of followers (an indicator that they may be influential) and check their profiles. It appears that 2 or 3 out of the 5 top users (based on follower count) are writers, and the others are news syndication services.<\/p><pre>         Name          Followers                        Description\r\n   ________________    _________    ____________________________________________________<\/pre><pre>   'Daton L Fluker'    73578        '#Horror #Novelist of Death Keeper's Biological Wast...'\r\n   'WellbeingVigor'    22224        'Writer  - 10 years .here, Incurable music enthusiast #'\r\n   'E-Book Update'     10870        ''\r\n   'Michael Rosa'      10297        ''\r\n   'Net Tech News'      7487        'Latest internet and technology news headlines from ...'<\/pre><h4>Does Follower Count Really Matter? Going Viral on Twitter<a name=\"08d25693-9bb2-4393-be90-158102525c4c\"><\/a><\/h4><p>In the previous section, we checked out the top 5 users based on their follower count. The assumption was that, if you have a large number of followers, you are considered more influential because more people may see your tweets.<\/p><p>Now let's test this assumption. For that I need more than 100 tweets. So I collected a new batch of data - 1000 tweets from 4 trending topics from the UK, and plotted the users based on their follower counts vs. how often their tweets got retweeted. The size (and the color) of the bubbles shows how often those users tweeted.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/bubblechart.png\" alt=\"\"> <\/p><p>It looks like you do need some base number of followers to make it to the national level, but the correlation between follower count and the frequency of getting retweeted looks weak. 
The charts look like different stages of viral diffusion - the top two charts clearly show that one user broke away from the rest of the crowd, and in that process they may have also gained more followers. The bottom two charts show a number of users competing for attention but no one has a clear breakout yet. If this were an animation, it might look like boiling water. Is anyone interested in analyzing whether this is indeed how a tweet goes viral?<\/p><h4>Visualizing the Retweet Social Graph<a name=\"4e356d6f-d84c-47d7-bf6e-2ed599c22663\"><\/a><\/h4><p>Retweeting of one user's tweet by others creates a network of relationships that can be represented as a social graph. We can visualize such relationships with <a href=\"https:\/\/gephi.org\/\">Gephi<\/a>, a popular social network analysis tool.<\/p><p>\"I Can't Sing\" Social Graph <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/ICantSing.png\">Larger<\/a><\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/ICantSing-sm.png\" alt=\"\"> <\/p><p>\"#InABlackHousehold\" Social Graph <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/InABlackHousehold.png\">Larger<\/a><\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/InABlackHousehold-sm.png\" alt=\"\"> <\/p><p>You can see that, in the first case, two users formed large clusters of people retweeting their tweets, and everyone else was dwarfed. In the second case, we also see two dominant users, but they have not yet formed a large-scale cluster.<\/p><h4>Getting Started with Twitter using Twitty<a name=\"dadf0a8b-12db-4902-b96a-f163a20ef083\"><\/a><\/h4><p>Now that you have seen a simple analysis I did with Twitter, it is time to share how I did it in MATLAB. 
To get started with Twitter, you need to <a href=\"https:\/\/developer.twitter.com\/en\/docs\/basics\/authentication\/guides\/access-tokens\">get your developer credentials<\/a>. You also need Twitty by Vladimir Bondarenko. It is simple to use and comes with excellent documentation.<\/p><div><ol><li>Create a <a href=\"https:\/\/twitter.com\/\">Twitter  account<\/a> if you do not already have one<\/li><li>Create a  <a href=\"https:\/\/apps.twitter.com\/\">Twitter app<\/a> to obtain developer credentials<\/li><li>Download and install <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/34837-twitty\">Twitty<\/a> from the FileExchange, along with the <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/20565-json-parser\">JSON Parser<\/a> and optionally <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/33381-jsonlab--a-toolbox-to-encode-decode-json-files\">JSONLab<\/a><\/li><li>Create a structure array to store your credentials for Twitty<\/li><\/ol><\/div><p>Let's search for tweets that mention <tt>'amazon'<\/tt> and <tt>'hachette'<\/tt>.<\/p><pre class=\"codeinput\"><span class=\"comment\">% a sample structure array to store the credentials<\/span>\r\ncreds = struct(<span class=\"string\">'ConsumerKey'<\/span>,<span class=\"string\">'your-consumer-key-here'<\/span>,<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'ConsumerSecret'<\/span>,<span class=\"string\">'your-consumer-secret-here'<\/span>,<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'AccessToken'<\/span>,<span class=\"string\">'your-token-here'<\/span>,<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'AccessTokenSecret'<\/span>,<span class=\"string\">'your-token-secret-here'<\/span>);\r\n\r\n<span class=\"comment\">% set up a Twitty object<\/span>\r\naddpath <span class=\"string\">twitty_1.1.1<\/span>; <span class=\"comment\">% Twitty<\/span>\r\naddpath <span class=\"string\">parse_json<\/span>; <span 
class=\"comment\">% Twitty's default json parser<\/span>\r\naddpath <span class=\"string\">jsonlab<\/span>; <span class=\"comment\">% I prefer JSONlab, however.<\/span>\r\nload(<span class=\"string\">'creds.mat'<\/span>) <span class=\"comment\">% load my real credentials<\/span>\r\ntw = twitty(creds); <span class=\"comment\">% instantiate a Twitty object<\/span>\r\ntw.jsonParser = @loadjson; <span class=\"comment\">% specify JSONlab as json parser<\/span>\r\n\r\n<span class=\"comment\">% search for English tweets that mention 'amazon' and 'hachette'<\/span>\r\namazon = tw.search(<span class=\"string\">'amazon'<\/span>,<span class=\"string\">'count'<\/span>,100,<span class=\"string\">'include_entities'<\/span>,<span class=\"string\">'true'<\/span>,<span class=\"string\">'lang'<\/span>,<span class=\"string\">'en'<\/span>);\r\nhachette = tw.search(<span class=\"string\">'hachette'<\/span>,<span class=\"string\">'count'<\/span>,100,<span class=\"string\">'include_entities'<\/span>,<span class=\"string\">'true'<\/span>,<span class=\"string\">'lang'<\/span>,<span class=\"string\">'en'<\/span>);\r\nboth = tw.search(<span class=\"string\">'amazon hachette'<\/span>,<span class=\"string\">'count'<\/span>,100,<span class=\"string\">'include_entities'<\/span>,<span class=\"string\">'true'<\/span>,<span class=\"string\">'lang'<\/span>,<span class=\"string\">'en'<\/span>);\r\n<\/pre><h4>Processing Tweets and Scoring Sentiments<a name=\"69cace70-53e1-4682-80e9-c66c765f6655\"><\/a><\/h4><p>Twitty stores tweets in a structure array created from the API response in JSON format. I prefer using a <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/tables.html\">table<\/a> when it comes to working with heterogeneous data containing a mix of numbers and text. I wrote some code, <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/processTweets.m\"><tt>processTweets<\/tt><\/a>, to convert structure arrays into tables and compute sentiment scores.  
You can find the Amazon-Hachette data file <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/amazonHachette.mat\">here<\/a>.<\/p><p>For sentiment analysis, I used <a href=\"http:\/\/www2.imm.dtu.dk\/pubdb\/views\/publication_details.php?id=6010\">AFINN<\/a>, along with <a href=\"http:\/\/www.textfixer.com\/resources\/common-english-words.txt\">a list of English stop words<\/a>, so that we don't count frequent common words like \"a\" or \"the\".<\/p><pre class=\"codeinput\"><span class=\"comment\">% load supporting data for text processing<\/span>\r\nscoreFile = <span class=\"string\">'AFINN\/AFINN-111.txt'<\/span>;\r\nstopwordsURL = <span class=\"string\">'http:\/\/www.textfixer.com\/resources\/common-english-words.txt'<\/span>;\r\n<span class=\"comment\">% load previously saved data<\/span>\r\nload <span class=\"string\">amazonHachette.mat<\/span>\r\n\r\n<span class=\"comment\">% process the structure array with a utility method |extract|<\/span>\r\n[amazonUsers,amazonTweets] = processTweets.extract(amazon);\r\n<span class=\"comment\">% compute the sentiment scores with |scoreSentiment|<\/span>\r\namazonTweets.Sentiment = processTweets.scoreSentiment(amazonTweets, <span class=\"keyword\">...<\/span>\r\n    scoreFile,stopwordsURL);\r\n\r\n<span class=\"comment\">% repeat the process for hachette<\/span>\r\n[hachetteUsers,hachetteTweets] = processTweets.extract(hachette);\r\nhachetteTweets.Sentiment = processTweets.scoreSentiment(hachetteTweets, <span class=\"keyword\">...<\/span>\r\n    scoreFile,stopwordsURL);\r\n\r\n<span class=\"comment\">% repeat the process for tweets containing both<\/span>\r\n[bothUsers,bothTweets] = processTweets.extract(both);\r\nbothTweets.Sentiment = processTweets.scoreSentiment(bothTweets, <span class=\"keyword\">...<\/span>\r\n    scoreFile,stopwordsURL);\r\n\r\n<span class=\"comment\">% calculate and print NSRs; note that neutral tweets (score 0) are counted as positive here<\/span>\r\namazonNSR = (sum(amazonTweets.Sentiment&gt;=0) <span class=\"keyword\">...<\/span>\r\n    
-sum(amazonTweets.Sentiment&lt;0)) <span class=\"keyword\">...<\/span>\r\n    \/height(amazonTweets);\r\nhachetteNSR = (sum(hachetteTweets.Sentiment&gt;=0) <span class=\"keyword\">...<\/span>\r\n    -sum(hachetteTweets.Sentiment&lt;0)) <span class=\"keyword\">...<\/span>\r\n    \/height(hachetteTweets);\r\nbothNSR = (sum(bothTweets.Sentiment&gt;=0) <span class=\"keyword\">...<\/span>\r\n    -sum(bothTweets.Sentiment&lt;0)) <span class=\"keyword\">...<\/span>\r\n    \/height(bothTweets);\r\nfprintf(<span class=\"string\">'Amazon NSR  :  %.2f\\n'<\/span>,amazonNSR)\r\nfprintf(<span class=\"string\">'Hachette NSR:  %.2f\\n'<\/span>,hachetteNSR)\r\nfprintf(<span class=\"string\">'Both NSR    : %.2f\\n\\n'<\/span>,bothNSR)\r\n\r\n<span class=\"comment\">% plot the sentiment histogram of two brands<\/span>\r\nbinranges = min([amazonTweets.Sentiment; <span class=\"keyword\">...<\/span>\r\n    hachetteTweets.Sentiment; <span class=\"keyword\">...<\/span>\r\n    bothTweets.Sentiment]): <span class=\"keyword\">...<\/span>\r\n    max([amazonTweets.Sentiment; <span class=\"keyword\">...<\/span>\r\n    hachetteTweets.Sentiment; <span class=\"keyword\">...<\/span>\r\n    bothTweets.Sentiment]);\r\nbincounts = [histc(amazonTweets.Sentiment,binranges)<span class=\"keyword\">...<\/span>\r\n    histc(hachetteTweets.Sentiment,binranges)<span class=\"keyword\">...<\/span>\r\n    histc(bothTweets.Sentiment,binranges)];\r\nfigure\r\nbar(binranges,bincounts,<span class=\"string\">'hist'<\/span>)\r\nlegend(<span class=\"string\">'Amazon'<\/span>,<span class=\"string\">'Hachette'<\/span>,<span class=\"string\">'Both'<\/span>,<span class=\"string\">'Location'<\/span>,<span class=\"string\">'Best'<\/span>)\r\ntitle(<span class=\"string\">'Sentiment Distribution of 100 Tweets'<\/span>)\r\nxlabel(<span class=\"string\">'Sentiment Score'<\/span>)\r\nylabel(<span class=\"string\">'# Tweets'<\/span>)\r\n<\/pre><pre class=\"codeoutput\">Amazon NSR  :  0.84\r\nHachette NSR:  0.58\r\nBoth NSR    : 
-0.30\r\n\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/AnalyzeTwitter_01.png\" alt=\"\"> <h4>Processing Tweets for Content Visualization<a name=\"2f9c9396-fd30-4764-a003-736abcc9d67c\"><\/a><\/h4><p><tt>processTweets<\/tt> also has a function <tt>tokenize<\/tt> that parses tweets to calculate the word count.<\/p><pre class=\"codeinput\"><span class=\"comment\">% tokenize tweets with |tokenize| method of |processTweets|<\/span>\r\n[words, dict] = processTweets.tokenize(bothTweets,stopwordsURL);\r\n<span class=\"comment\">% create a dictionary of unique words<\/span>\r\ndict = unique(dict);\r\n<span class=\"comment\">% create a word count matrix<\/span>\r\n[~,tdf] = processTweets.getTFIDF(words,dict);\r\n\r\n<span class=\"comment\">% plot the word count<\/span>\r\nfigure\r\nplot(1:length(dict),sum(tdf),<span class=\"string\">'b.'<\/span>)\r\nxlabel(<span class=\"string\">'Word Indices'<\/span>)\r\nylabel(<span class=\"string\">'Word Count'<\/span>)\r\ntitle(<span class=\"string\">'Words contained in the tweets'<\/span>)\r\n<span class=\"comment\">% annotate high frequency words<\/span>\r\nannotated = find(sum(tdf)&gt;= 10);\r\njitter = 6*rand(1,length(annotated))-3;\r\n<span class=\"keyword\">for<\/span> i = 1:length(annotated)\r\n    text(annotated(i)+3, <span class=\"keyword\">...<\/span>\r\n        sum(tdf(:,annotated(i)))+jitter(i),dict{annotated(i)})\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/AnalyzeTwitter_02.png\" alt=\"\"> <h4>Get the Profile of Top 5 Users<a name=\"b2caf181-37b5-4969-b324-e08841fbee72\"><\/a><\/h4><p>Twitty also supports the 'users\/show' API to retrieve user profile information. 
Let's get the profile of the top 5 users based on the follower count.<\/p><pre class=\"codeinput\"><span class=\"comment\">% sort the user table by follower count in descending order<\/span>\r\n[~,order] = sortrows(bothUsers,<span class=\"string\">'Followers'<\/span>,<span class=\"string\">'descend'<\/span>);\r\n<span class=\"comment\">% select top 5 users<\/span>\r\ntop5users = bothUsers(order(1:5),[3,1,5]);\r\n<span class=\"comment\">% add a column to store the profile<\/span>\r\ntop5users.Description = repmat({<span class=\"string\">''<\/span>},height(top5users),1);\r\n<span class=\"comment\">% retrieve user profile for each user<\/span>\r\n<span class=\"keyword\">for<\/span> i = 1:5\r\n    userInfo = tw.usersShow(<span class=\"string\">'user_id'<\/span>, top5users.Id(i));\r\n    <span class=\"keyword\">if<\/span> ~isempty(userInfo{1}.description)\r\n        top5users.Description{i} = userInfo{1}.description;\r\n    <span class=\"keyword\">end<\/span>\r\n<span class=\"keyword\">end<\/span>\r\n<span class=\"comment\">% print the result<\/span>\r\ndisp(top5users(:,2:end))\r\n<\/pre><pre class=\"codeoutput\">          Name          Followers\r\n    ________________    _________\r\n    'Daton L Fluker'    73578    \r\n    'WellbeingVigor'    22224    \r\n    'E-Book Update'     10870    \r\n    'Michael Rosa'      10297    \r\n    'Net Tech News'      7487    \r\n\r\n                                    Description                                \r\n    ___________________________________________________________________________\r\n    '#Horror #Novelist of Death Keeper's Biological Wasteland, Finished Cri...'\r\n    'Writer  - 10 years .here, Incurable music enthusiast #'                   \r\n    ''                                                                         \r\n    ''                                                                         \r\n    'Latest internet and technology news headlines from news sources around...'\r\n<\/pre><h4>Streaming API for 
High Volume Real Time Tweets<a name=\"a4f55b8e-89ba-4749-b2c1-38d368be7744\"><\/a><\/h4><p>If you need more than 100 tweets to work with, then your only option is to use the Streaming API, which provides access to the sampled Twitter firehose in real time. That also means you need to access the tweets that are currently active. You typically start with a trending topic from a specific location.<\/p><p>You get local trends by specifying the geography with a WOEID (Where On Earth ID), available at <a href=\"http:\/\/woeid.rosselliot.co.nz\/\">WOEID Lookup<\/a>.<\/p><pre>uk_woeid = '23424975'; % UK\r\nuk_trends = tw.trendsPlace(uk_woeid);\r\nuk_trends = cellfun(@(x) x.name, uk_trends{1}.trends, 'UniformOutput',false)';<\/pre><p>Once you have the current trends (or download them from <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/uk_data.mat\">here<\/a>), you can use the Streaming API to retrieve the tweets that mention the trending topic. When you specify an output function with Twitty, the data is stored within Twitty. 
Twitty will process incoming tweets up to the sample size specified, and process data by the batch size specified.<\/p><pre>tw.outFcn = @saveTweets; % output function\r\ntw.sampleSize = 1000;  % default 1000\r\ntw.batchSize = 1; % default 20\r\ntic;\r\ntw.filterStatuses('track',uk_trends{1}); % Streaming API call\r\ntoc\r\nuk_trend_data = tw.data; % save the data<\/pre><pre class=\"codeinput\"><span class=\"comment\">% reload the previously saved search result for 4 trending topics in the UK<\/span>\r\nload(<span class=\"string\">'uk_data.mat'<\/span>)\r\n\r\n<span class=\"comment\">% plot<\/span>\r\nfigure\r\n<span class=\"keyword\">for<\/span> i = 1:4\r\n    <span class=\"comment\">% process tweets<\/span>\r\n    [users,tweets] = processTweets.extract(uk_data(i).statuses);\r\n\r\n    <span class=\"comment\">% get who are mentioned in retweets<\/span>\r\n    retweeted = tweets.Mentions(tweets.isRT);\r\n    retweeted = retweeted(~cellfun(<span class=\"string\">'isempty'<\/span>,retweeted));\r\n    [screen_names,~,idx] = unique(retweeted);\r\n    count = accumarray(idx,1);\r\n    retweeted = table(screen_names,count,<span class=\"string\">'VariableNames'<\/span>,{<span class=\"string\">'Screen_Name'<\/span>,<span class=\"string\">'Count'<\/span>});\r\n\r\n    <span class=\"comment\">% get the users who were mentioned in retweets<\/span>\r\n    match = ismember(users.Screen_Name,retweeted.Screen_Name);\r\n    retweetedUsers = sortrows(users(match,:),<span class=\"string\">'Screen_Name'<\/span>);\r\n    match = ismember(retweeted.Screen_Name,retweetedUsers.Screen_Name);\r\n    retweetedUsers.Retweeted_Count = retweeted.Count(match);\r\n    [~,order] = sortrows(retweetedUsers,<span class=\"string\">'Retweeted_Count'<\/span>,<span class=\"string\">'descend'<\/span>);\r\n\r\n    <span class=\"comment\">% plot each topic<\/span>\r\n    subplot(2,2,i)\r\n    scatter(retweetedUsers.Followers(order),<span class=\"keyword\">...<\/span>\r\n        
retweetedUsers.Retweeted_Count(order),retweetedUsers.Freq(order)*50,<span class=\"keyword\">...<\/span>\r\n        retweetedUsers.Freq(order),<span class=\"string\">'fill'<\/span>)\r\n\r\n    <span class=\"keyword\">if<\/span> ismember(i, [1,2])\r\n        ylim([-20,90]); xpos = 2; ypos1 = 50; ypos2 = 40;\r\n    <span class=\"keyword\">elseif<\/span> i == 3\r\n        ylim([-1,7])\r\n        xlabel(<span class=\"string\">'Follower Count (Log Scale)'<\/span>)\r\n        xpos = 1010; ypos1 = 0; ypos2 = -1;\r\n    <span class=\"keyword\">else<\/span>\r\n        ylim([-5,23])\r\n        xlabel(<span class=\"string\">'Follower Count (Log Scale)'<\/span>)\r\n        xpos = 110; ypos1 = 20; ypos2 = 17;\r\n    <span class=\"keyword\">end<\/span>\r\n\r\n    <span class=\"comment\">% set x axis to log scale<\/span>\r\n    set(gca, <span class=\"string\">'XScale'<\/span>, <span class=\"string\">'log'<\/span>)\r\n\r\n    <span class=\"keyword\">if<\/span> ismember(i, [1,3])\r\n        ylabel(<span class=\"string\">'Retweeted Count'<\/span>)\r\n    <span class=\"keyword\">end<\/span>\r\n    title(sprintf(<span class=\"string\">'UK Tweets for: \"%s\"'<\/span>,uk_data(i).query.name))\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/AnalyzeTwitter_03.png\" alt=\"\"> <h4>Save an Edge List for Social Graph Visualization<a name=\"50f871fc-d8fd-4e93-8041-f16c853291fe\"><\/a><\/h4><p>Gephi imports an edge list in CSV format. 
I added a new method <tt>saveEdgeList<\/tt> to <tt>processTweets<\/tt> that saves the screen names of the users as <tt>source<\/tt> and the hashtags and screen names they mention in their tweets as <tt>target<\/tt> in a <a href=\"https:\/\/gephi.org\/users\/supported-graph-formats\/csv-format\/\">Gephi-ready CSV file<\/a>.<\/p><pre class=\"codeinput\">processTweets.saveEdgeList(uk_data(1).statuses,<span class=\"string\">'edgeList.csv'<\/span>);\r\n<\/pre><pre class=\"codeoutput\">File \"edgeList.csv\" was successfully saved.\r\n\r\n<\/pre><h4>Closing<a name=\"18cac743-8cff-4184-8213-ce18753c3147\"><\/a><\/h4><p>It is quite easy to get started with Twitter Analytics with MATLAB and hopefully you got a taste of what kinds of analyses are possible.<\/p><p>We only scratched the surface. Twitter offers many of the most interesting opportunities for data analytics. How would you use Twitter Analytics? Check out some examples from <a href=\"http:\/\/journals.plos.org\/plosone\/search?from=globalSimpleSearch&filterJournals=PLoSONE&q=twitter&x=0&y=0\">this search result from PLOS ONE<\/a> that lists various papers that used Twitter in their studies.  
Tell us about your Twitty experiences <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=920#respond\">here<\/a>.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_72eef71aff1940849dd5f518c478b8af() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='72eef71aff1940849dd5f518c478b8af ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 72eef71aff1940849dd5f518c478b8af';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2014 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_72eef71aff1940849dd5f518c478b8af()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2014a<br><\/p><\/div><!--\r\n72eef71aff1940849dd5f518c478b8af ##### SOURCE BEGIN #####\r\n%% Analyzing Twitter with MATLAB\r\n% Whatever your opinion of social media these days, there is no denying \r\n% it is now an integral part of our digital life. So much so, that social \r\n% media metrics are now considered part of \r\n% <http:\/\/en.wikipedia.org\/wiki\/Altmetrics altmetrics>, an alternative to\r\n% the established metrics such as citations to measure the impact of \r\n% scientific papers. \r\n%\r\n% Today's guest blogger, Toshi, will show you how to access the Twitter API\r\n% and analyze tweets with MATLAB.\r\n%\r\n%% Why Twitter\r\n% Twitter is a good starting point for social media analysis because \r\n% people openly share their opinions to the general public. 
This is very \r\n% different from Facebook, where social interactions are often private.\r\n% In this post, I would like to share simple examples of sentiment\r\n% analysis and social graph visualization using Twitter's Search and \r\n% Streaming APIs.\r\n%\r\n% The first part of this post discusses analysis with Twitter, and the\r\n% latter part shows the code that computes and creates plots, like those\r\n% shown earlier.\r\n%\r\n%% Sentiment Analysis\r\n% One of the very common analyses you can perform on a large number of\r\n% tweets is sentiment analysis. Sentiment is scored based on the words\r\n% contained in a tweet. If you manage a brand or political campaign, for\r\n% example, it may be important to keep track of your popularity, and\r\n% sentiment analysis provides a convenient way to take the pulse of the\r\n% tweeting public. Here is an example of sentiment analysis between\r\n% <http:\/\/www.theatlantic.com\/business\/archive\/2014\/05\/how-the-amazon-hachette-fight-could-shape-the-future-of-ideas\/371756\/\r\n% Amazon and Hachette> as of this writing, based on 100 tweets collected\r\n% via the Twitter Search API.\r\n%\r\n% <<sentimentAnalysis.png>>\r\n% \r\n% The sentiment distributions are nearly identical between the two brands,\r\n% but you can see that tweets mentioning both have clearly skewed to the\r\n% negative, since the news is about a war between Amazon and a publisher\r\n% over ebook profit margin. Is there a single metric we can use to make\r\n% this comparison easier? That's where _Net Sentiment Rate (NSR)_ comes in.\r\n%\r\n%  NSR = (Positive Tweets-Negative Tweets)\/Total\r\n%\r\n% Here is the result. You could keep taking this measurement periodically \r\n% for ongoing sentiment monitoring, if interested. 
Perhaps you may discover\r\n% that NSR is correlated to their stock prices!\r\n%\r\n%  Amazon NSR  :  0.84\r\n%  Hachette NSR:  0.58\r\n%  Both NSR    : -0.30\r\n%\r\n% Last but not least, did sentiment scoring actually work?\r\n% Check out the top 5 positive and negative tweets for Hachette for your\r\n% own assessment.\r\n%  \r\n%                                 Top 5 positive tweets                                   \r\n%     ___________________________________________________________________________\r\n% \r\n%     '@deckchairs @OccupyMyCat @aworkinglibrary but I think Hachette artists...'\r\n%     '@emzleb Hachette has Rowling so they hold a lot of cards (A LOT of car...'\r\n%     'Amazon Confirms Hachette Spat Is To \"Get a Better Deal\" http:\/\/t.co\/Ka...'\r\n%     '@shaunduke @DarkMatterzine Yeah, Gollancz is owned by Orion Publishing...'\r\n%     'MUST READ Book publisher Hachette says working to resolve Amazon dispu...'\r\n% \r\n%  \r\n%                                 Top 5 negative tweets                                   \r\n%     ___________________________________________________________________________\r\n% \r\n%     'Reading into the Amazon vs. Hachette battle - May 28 - The war between...'\r\n%     '#Vtech Reading into the Amazon vs. Hachette battle - May 28 - The war ...'\r\n%     '#Vbnss Reading into the Amazon vs. Hachette battle - May 28 - The war ...'\r\n%     'RT @text_publishing: Amazon war with Hachette over ebook profit margin...'\r\n%     'RT @text_publishing: Amazon war with Hachette over ebook profit margin...'\r\n%\r\n%% Tweet Content Visualization\r\n% What were the main themes they tweeted about when those users mentioned\r\n% both Amazon and Hachette? The word count plot shows that mostly those \r\n% tweets repeated the news headlines like \u201cAmazon admits dispute (with) \r\n% Hachette\u201d, perhaps with some commentary - showing that Twitter was being \r\n% used for news amplification. 
\r\n%\r\n% <<content.png>>\r\n% \r\n%% Who Tweeted the News?\r\n% The 100 tweets collected came from 86 users. So on average each user only\r\n% tweeted 1.16 times. Instead of frequency, let's find out who has a large\r\n% number of followers (an indicator that they may be influential) and \r\n% check their profile. It appears that 2 or 3 out of the 5 top users\r\n% (based on follower count) are writers, and others are news syndication\r\n% services. \r\n% \r\n%           Name          Followers                        Description                     \r\n%     ________________    _________    ____________________________________________________\r\n% \r\n%     'Daton L Fluker'    73578        '#Horror #Novelist of Death Keeper's Biological Wast...'\r\n%     'WellbeingVigor'    22224        'Writer  - 10 years .here, Incurable music enthusiast #'\r\n%     'E-Book Update'     10870        ''                                       \r\n%     'Michael Rosa'      10297        ''                                          \r\n%     'Net Tech News'      7487        'Latest internet and technology news headlines from ...'\r\n%\r\n%% Does Follower Count Really Matter? Going Viral on Twitter\r\n% In the previous section, we checked out the top 5 users based on their\r\n% follower count. The assumption was that, if you have a large number \r\n% of followers, you are considered more influential because more people \r\n% may see your tweets. \r\n% \r\n% Now let's test this assumption. For that I need more than 100 tweets. \r\n% So I collected a new batch of data - 1000 tweets from 4 trending topics \r\n% from the UK, and plotted the users based on their follower counts vs. how \r\n% often their tweets got retweeted. The size (and the color) of the bubbles \r\n% show how often those users tweeted. 
\r\n% \r\n% <<bubblechart.png>>\r\n% \r\n% It looks like you do need some base number of followers to make it to\r\n% the national level, but the correlation between the follower counts and\r\n% the frequency of getting retweeted looks weak. Those charts look like\r\n% different stages of viral diffusion - the top two charts clearly\r\n% show one user broke away from the rest of the crowd, and in that process\r\n% they may have also gained more followers. The bottom two charts show\r\n% a number of users competing for attention but no one has a clear breakout \r\n% yet. If this were an animation, it might look like boiling water. Is anyone \r\n% interested in analyzing whether this is indeed how a tweet goes viral?\r\n%\r\n%%% Visualizing the Retweet Social Graph\r\n% Retweeting of one user's tweet by others creates a network of\r\n% relationships that can be represented as a social graph. We can visualize\r\n% such relationships with a popular social network analysis tool\r\n% <https:\/\/gephi.org\/ Gephi>.\r\n%\r\n% \"I Can't Sing\" Social Graph <https:\/\/blogs.mathworks.com\/loren\/2014\/ICantSing.png Larger>\r\n%\r\n% <<ICantSing-sm.png>> \r\n% \r\n% \"#InABlackHousehold\" Social Graph <https:\/\/blogs.mathworks.com\/loren\/2014\/InABlackHousehold.png Larger>\r\n%\r\n% <<InABlackHousehold-sm.png>> \r\n%\r\n% You can see that, in the first case, two users formed large clusters of \r\n% people retweeting their tweets, and everyone else was dwarfed. In the \r\n% second case, we also see two dominant users, but they have not yet \r\n% formed a large scale cluster.  \r\n%\r\n%% Getting Started with Twitter using Twitty\r\n% Now that you have seen a simple analysis I did with Twitter, it is time\r\n% to share how I did it in MATLAB. To get started with Twitter, you need to\r\n% <https:\/\/developer.twitter.com\/en\/docs\/basics\/authentication\/guides\/access-tokens get your \r\n% developer credentials>. 
You also need Twitty by Vladimir Bondarenko.\r\n% It is simple to use and comes with excellent documentation. \r\n% \r\n% # Create a <https:\/\/twitter.com\/ Twitter  account> if you do not already \r\n% have one\r\n% # Create a  <https:\/\/apps.twitter.com\/ Twitter app> to obtain developer\r\n% credentials\r\n% # Download and install \r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/34837-twitty Twitty>\r\n% from the FileExchange, along with the \r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/20565-json-parser\r\n% JSON Parser> and optionally \r\n% <http:\/\/www.mathworks.co.uk\/matlabcentral\/fileexchange\/33381-jsonlabREPLACE_WITH_DASH_DASHa-toolbox-to-encode-decode-json-files-in-matlab-octave\r\n% JSONLab>\r\n% # Create a structure array to store your credentials for Twitty \r\n%\r\n% Let's search for tweets that mention |'amazon'| and |'hachette'|.\r\n\r\n% a sample structure array to store the credentials\r\ncreds = struct('ConsumerKey','your-consumer-key-here',...\r\n    'ConsumerSecret','your-consumer-secret-here',...\r\n    'AccessToken','your-token-here',...\r\n    'AccessTokenSecret','your-token-secret-here');\r\n\r\n% set up a Twitty object\r\naddpath twitty_1.1.1; % Twitty\r\naddpath parse_json; % Twitty's default json parser\r\naddpath jsonlab; % I prefer JSONlab, however. \r\nload('creds.mat') % load my real credentials\r\ntw = twitty(creds); % instantiate a Twitty object \r\ntw.jsonParser = @loadjson; % specify JSONlab as json parser\r\n\r\n% search for English tweets that mention 'amazon' and 'hachette'\r\namazon = tw.search('amazon','count',100,'include_entities','true','lang','en');\r\nhachette = tw.search('hachette','count',100,'include_entities','true','lang','en');\r\nboth = tw.search('amazon hachette','count',100,'include_entities','true','lang','en');\r\n\r\n%% Processing Tweets and Scoring Sentiments\r\n% Twitty stores tweets in structure array created from the API response in\r\n% JSON format. 
I prefer using a\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/tables.html table> when it comes to\r\n% working with heterogeneous data containing a mix of numbers and text. I\r\n% wrote some code,\r\n% <https:\/\/blogs.mathworks.com\/images\/loren\/2014\/processTweets.m\r\n% |processTweets|>, to convert structure arrays into tables and compute\r\n% sentiment scores.  You can find the Amazon-Hachette data file\r\n% <https:\/\/blogs.mathworks.com\/images\/loren\/2014\/amazonHachette.mat here>.\r\n%\r\n% For sentiment analysis, I used \r\n% <http:\/\/www2.imm.dtu.dk\/pubdb\/views\/publication_details.php?id=6010 \r\n% AFINN>, along\r\n% with <http:\/\/www.textfixer.com\/resources\/common-english-words.txt a list\r\n% of English stop words> so that we don't count frequent common words like\r\n% \"a\" or \"the\". \r\n\r\n% load supporting data for text processing\r\nscoreFile = 'AFINN\/AFINN-111.txt';\r\nstopwordsURL ='http:\/\/www.textfixer.com\/resources\/common-english-words.txt';\r\n% load previously saved data\r\nload amazonHachette.mat\r\n\r\n% process the structure array with a utility method |extract|\r\n[amazonUsers,amazonTweets] = processTweets.extract(amazon);\r\n% compute the sentiment scores with |scoreSentiment|\r\namazonTweets.Sentiment = processTweets.scoreSentiment(amazonTweets, ...\r\n    scoreFile,stopwordsURL);\r\n\r\n% repeat the process for hachette\r\n[hachetteUsers,hachetteTweets] = processTweets.extract(hachette);\r\nhachetteTweets.Sentiment = processTweets.scoreSentiment(hachetteTweets, ...\r\n    scoreFile,stopwordsURL);\r\n\r\n% repeat the process for tweets containing both\r\n[bothUsers,bothTweets] = processTweets.extract(both);\r\nbothTweets.Sentiment = processTweets.scoreSentiment(bothTweets, ...\r\n    scoreFile,stopwordsURL);\r\n\r\n% calculate and print NSRs\r\namazonNSR = (sum(amazonTweets.Sentiment>=0) ...\r\n    -sum(amazonTweets.Sentiment<0)) ...\r\n    \/height(amazonTweets);\r\nhachetteNSR = (sum(hachetteTweets.Sentiment>=0) 
...\r\n    -sum(hachetteTweets.Sentiment<0)) ...\r\n    \/height(hachetteTweets);\r\nbothNSR = (sum(bothTweets.Sentiment>=0) ...\r\n    -sum(bothTweets.Sentiment<0)) ...\r\n    \/height(bothTweets);\r\nfprintf('Amazon NSR  :  %.2f\\n',amazonNSR)\r\nfprintf('Hachette NSR:  %.2f\\n',hachetteNSR)\r\nfprintf('Both NSR    : %.2f\\n\\n',bothNSR)\r\n\r\n% plot the sentiment histogram of two brands\r\nbinranges = min([amazonTweets.Sentiment; ...\r\n    hachetteTweets.Sentiment; ...\r\n    bothTweets.Sentiment]): ...\r\n    max([amazonTweets.Sentiment; ...\r\n    hachetteTweets.Sentiment; ...\r\n    bothTweets.Sentiment]);\r\nbincounts = [histc(amazonTweets.Sentiment,binranges)...\r\n    histc(hachetteTweets.Sentiment,binranges)...\r\n    histc(bothTweets.Sentiment,binranges)];\r\nfigure\r\nbar(binranges,bincounts,'hist')\r\nlegend('Amazon','Hachette','Both','Location','Best')\r\ntitle('Sentiment Distribution of 100 Tweets')\r\nxlabel('Sentiment Score')\r\nylabel('# Tweets')\r\n\r\n%% Processing Tweets for Content Visualization\r\n% |processTweets| also has a function\r\n% |tokenize| that parses tweets to calculate the word count.\r\n\r\n% tokenize tweets with |tokenize| method of |processTweets|\r\n[words, dict] = processTweets.tokenize(bothTweets,stopwordsURL);\r\n% create a dictionary of unique words\r\ndict = unique(dict);\r\n% create a word count matrix\r\n[~,tdf] = processTweets.getTFIDF(words,dict);\r\n\r\n% plot the word count\r\nfigure\r\nplot(1:length(dict),sum(tdf),'b.')\r\nxlabel('Word Indices')\r\nylabel('Word Count')\r\ntitle('Words contained in the tweets')\r\n% annotate high frequency words\r\nannotated = find(sum(tdf)>= 10);\r\njitter = 6*rand(1,length(annotated))-3;\r\nfor i = 1:length(annotated)\r\n    text(annotated(i)+3, ...\r\n        sum(tdf(:,annotated(i)))+jitter(i),dict{annotated(i)})\r\nend\r\n\r\n%% Get the Profile of Top 5 Users\r\n% Twitty also supports the 'users\/show' API to retrieve user profile\r\n% information. 
Let's get the profile of the top 5 users based on the\r\n% follower count.\r\n\r\n% sort the user table by follower count in descending order\r\n[~,order] = sortrows(bothUsers,'Followers','descend');\r\n% select top 5 users\r\ntop5users = bothUsers(order(1:5),[3,1,5]);\r\n% add a column to store the profile\r\ntop5users.Description = repmat({''},height(top5users),1);\r\n% retrieve user profile for each user\r\nfor i = 1:5\r\n    userInfo = tw.usersShow('user_id', top5users.Id(i));\r\n    if ~isempty(userInfo{1}.description)\r\n        top5users.Description{i} = userInfo{1}.description;\r\n    end\r\nend\r\n% print the result\r\ndisp(top5users(:,2:end))\r\n\r\n%% Streaming API for High Volume Real Time Tweets\r\n% If you need more than 100 tweets to work with, then your only\r\n% option is to use the Streaming API, which provides access to the sampled \r\n% Twitter fire hose in real time. That also means you need to access the\r\n% tweets that are currently active. You typically start with a trending\r\n% topic from a specific location. \r\n%\r\n% You get local trends by specifying the geography with WOEID (Where On \r\n% Earth ID), available at <http:\/\/woeid.rosselliot.co.nz\/ WOEID Lookup>.\r\n%\r\n%  uk_woeid = '23424975'; % UK\r\n%  uk_trends = tw.trendsPlace(uk_woeid);\r\n%  uk_trends = cellfun(@(x) x.name, uk_trends{1}.trends, 'UniformOutput',false)';\r\n%\r\n% Once you have the current trends (or download them from\r\n% <https:\/\/blogs.mathworks.com\/images\/loren\/2014\/uk_data.mat here>), you can\r\n% use the Streaming API to retrieve the tweets that mention the trending\r\n% topic. When you specify an output function with Twitty, the data is stored\r\n% within Twitty. 
Twitty will process incoming tweets up to the sample size\r\n% specified, and process data by the batch size specified.\r\n%\r\n%  tw.outFcn = @saveTweets; % output function\r\n%  tw.sampleSize = 1000;  % default 1000 \r\n%  tw.batchSize = 1; % default 20 \r\n%  tic;\r\n%  tw.filterStatuses('track',uk_trends{1}); % Streaming API call\r\n%  toc\r\n%  uk_trend_data = tw.data; % save the data\r\n\r\n% reload the previously saved search result for 4 trending topics in the UK\r\nload('uk_data.mat')\r\n\r\n% plot\r\nfigure\r\nfor i = 1:4\r\n    % process tweets\r\n    [users,tweets] = processTweets.extract(uk_data(i).statuses);\r\n    \r\n    % get who are mentioned in retweets\r\n    retweeted = tweets.Mentions(tweets.isRT);\r\n    retweeted = retweeted(~cellfun('isempty',retweeted));\r\n    [screen_names,~,idx] = unique(retweeted);\r\n    count = accumarray(idx,1);\r\n    retweeted = table(screen_names,count,'VariableNames',{'Screen_Name','Count'});\r\n\r\n    % get the users who were mentioned in retweets\r\n    match = ismember(users.Screen_Name,retweeted.Screen_Name);\r\n    retweetedUsers = sortrows(users(match,:),'Screen_Name');\r\n    match = ismember(retweeted.Screen_Name,retweetedUsers.Screen_Name);\r\n    retweetedUsers.Retweeted_Count = retweeted.Count(match);\r\n    [~,order] = sortrows(retweetedUsers,'Retweeted_Count','descend');\r\n    \r\n    % plot each topic\r\n    subplot(2,2,i)\r\n    scatter(retweetedUsers.Followers(order),...\r\n        retweetedUsers.Retweeted_Count(order),retweetedUsers.Freq(order)*50,...\r\n        retweetedUsers.Freq(order),'fill')\r\n\r\n    if ismember(i, [1,2])\r\n        ylim([-20,90]); xpos = 2; ypos1 = 50; ypos2 = 40;\r\n    elseif i == 3\r\n        ylim([-1,7])\r\n        xlabel('Follower Count (Log Scale)')\r\n        xpos = 1010; ypos1 = 0; ypos2 = -1;\r\n    else\r\n        ylim([-5,23])\r\n        xlabel('Follower Count (Log Scale)')\r\n        xpos = 110; ypos1 = 20; ypos2 = 17;\r\n    end\r\n    \r\n    % set x axis 
to log scale\r\n    set(gca, 'XScale', 'log')\r\n    \r\n    if ismember(i, [1,3])\r\n        ylabel('Retweeted Count')\r\n    end\r\n    title(sprintf('UK Tweets for: \"%s\"',uk_data(i).query.name))\r\nend\r\n\r\n%% Save an Edge List for Social Graph Visualization\r\n% Gephi imports an edge list in CSV format. I added a new method \r\n% |saveEdgeList| to |processTweets| that saves the screen names of the \r\n% users as |source| and the hashtags and screen names they mention in \r\n% their tweets as |target| in a \r\n% <https:\/\/gephi.org\/users\/supported-graph-formats\/csv-format\/ \r\n% Gephi-ready CSV file>.\r\n\r\nprocessTweets.saveEdgeList(uk_data(1).statuses,'edgeList.csv');\r\n\r\n%% Closing\r\n% It is quite easy to get started with Twitter Analytics with MATLAB \r\n% and hopefully you got a taste of what kind of analyses are possible.\r\n% \r\n% We only scratched the surface. Twitter offers many of the most \r\n% interesting opportunities for data analytics. How would you use Twitter \r\n% Analytics? Check out some examples from \r\n% <http:\/\/www.plosone.org\/search\/simple?from=globalSimpleSearch&filterJournals=PLoSONE&query=twitter&x=0&y=0 \r\n% this search result from PLOS ONE> that lists various papers that used \r\n% Twitter for their study.  Tell us about your Twitty experiences\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=920#respond here>.\r\n\r\n\r\n\r\n##### SOURCE END ##### 72eef71aff1940849dd5f518c478b8af\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/AnalyzeTwitter_03.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>Whatever your opinion of social media these days, there is no denying it is now an integral part of our digital life. 
So much so, that social media metrics are now considered part of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Altmetrics\">altmetrics<\/a>, an alternative to the established metrics such as citations to measure the impact of scientific papers.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2014\/06\/04\/analyzing-twitter-with-matlab\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[33,39,61],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/920"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=920"}],"version-history":[{"count":9,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/920\/revisions"}],"predecessor-version":[{"id":2183,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/920\/revisions\/2183"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=920"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=920"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=920"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}