{"id":2209,"date":"2017-02-07T08:36:12","date_gmt":"2017-02-07T13:36:12","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=2209"},"modified":"2017-02-01T15:57:32","modified_gmt":"2017-02-01T20:57:32","slug":"analyzing-fake-news-with-twitter","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2017\/02\/07\/analyzing-fake-news-with-twitter\/","title":{"rendered":"Analyzing Fake News with Twitter"},"content":{"rendered":"\r\n<div class=\"content\"><!--introduction--><p>Social media has become an important part of modern life, and Twitter is again a center of focus in recent events. Today's guest blogger, <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/951521\">Toshi Takeuchi<\/a> gives us an update on how you can use MATLAB to analyze a Twitter feed.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/fake_news.png\" alt=\"\"> <\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#0930276d-508a-456e-8487-a2f1ff5ef679\">Twitter Revisited<\/a><\/li><li><a href=\"#61ffe798-b97a-4941-b1e9-197237eac62b\">Load Tweets<\/a><\/li><li><a href=\"#8f752b5d-0720-40bf-bbc7-f09824f9226d\">Short Urls<\/a><\/li><li><a href=\"#d97e4d80-fa03-4b05-8700-6f6d72a8adea\">Tokenize Tweets<\/a><\/li><li><a href=\"#33df3cf7-ff49-49b8-8f0c-0a9a469116d8\">Sentiment Analysis<\/a><\/li><li><a href=\"#3304bad1-b282-4cc5-b641-93cdf3ba0052\">What Words Appear Frequently in Tweets?<\/a><\/li><li><a href=\"#a8aa9afc-ec28-4356-8e8a-bd496f86f95e\">What Hashtags Appear Frequently in Tweets?<\/a><\/li><li><a href=\"#0b6b3d74-60d5-4e3b-9c58-9a178d575972\">Who Got Frequent Mentions in Tweets?<\/a><\/li><li><a href=\"#3f762782-10f2-464b-9266-c6dbe020c30b\">Frequently Cited Web Sites<\/a><\/li><li><a href=\"#62dddba2-a319-4def-921b-2c9fc7878e8c\">Frequently Cited Sources<\/a><\/li><li><a href=\"#ba2f6710-91a9-44a4-bff1-c71a4882456a\">Generating a Social Graph<\/a><\/li><li><a 
href=\"#0c36ee71-bb24-48a3-8fb3-d7940ed8d8fd\">Handling Mentions<\/a><\/li><li><a href=\"#bc45889a-d785-449f-b5f0-c7fc54d012b7\">Creating the Edge List<\/a><\/li><li><a href=\"#b82f0ed0-b4c5-443a-849c-1567e0c2a5bd\">Creating the Graph<\/a><\/li><li><a href=\"#c2458dd3-af1f-47e5-a506-5a8a2e4fadfc\">Zooming into the Largest Subgraph<\/a><\/li><li><a href=\"#87471d3f-9915-4157-8215-4e8666c08aa6\">Using Twitty<\/a><\/li><li><a href=\"#4076e05e-3c6f-414c-81e8-4eac755371b1\">Twitter Search API Example<\/a><\/li><li><a href=\"#a6fb7cfd-12b3-447f-afaa-8dc0986784be\">Twitter Trending Topic API Example<\/a><\/li><li><a href=\"#dbf110d8-6928-48ac-9021-9820bfdc97cd\">Twitter Streaming API Example<\/a><\/li><li><a href=\"#20fb1dc3-39f7-42c3-8a8d-9d0e0bb29521\">Summary - Visit Andy's Developer Zone for More<\/a><\/li><\/ul><\/div><h4>Twitter Revisited<a name=\"0930276d-508a-456e-8487-a2f1ff5ef679\"><\/a><\/h4><p>When I wrote about <a href=\"https:\/\/blogs.mathworks.com\/loren\/2014\/06\/04\/analyzing-twitter-with-matlab\">analyzing Twitter with MATLAB<\/a> back in 2014, I didn't expect that 3 years later Twitter would come to play such a huge role in politics. There have been a lot of changes in MATLAB in those years as well. Perhaps it is time to revisit this topic. We have heard a lot about <a href=\"https:\/\/en.wikipedia.org\/wiki\/Fake_news_website\">fake news<\/a> since <a href=\"https:\/\/en.wikipedia.org\/wiki\/United_States_presidential_election,_2016\">the US Presidential Election of 2016<\/a>. Let's use Twitter to analyze this phenomenon. While fake news spreads mainly on Facebook, Twitter is the favorite social media platform for the journalists who discuss it.<\/p><h4>Load Tweets<a name=\"61ffe798-b97a-4941-b1e9-197237eac62b\"><\/a><\/h4><p>I collected 1,000 tweets that contain the term 'fake news' using the Streaming API and saved them in <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/fake_news.mat\">fake_news.mat<\/a><\/tt>. 
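<\/p><p>For readers outside MATLAB, the ranking step can be illustrated in a few lines of Python. This is a minimal sketch, not the code used in this post: the <tt>statuses<\/tt> sample below is made up, and only the nested user fields (<tt>name<\/tt>, <tt>screen_name<\/tt>, <tt>followers_count<\/tt>) mirror the Twitter status format.<\/p>

```python
# Rank tweet authors by follower count (hypothetical sample data).
statuses = [
    {"user": {"name": "Alice", "screen_name": "alice", "followers_count": 120}},
    {"user": {"name": "Bob", "screen_name": "bob", "followers_count": 950}},
    {"user": {"name": "Alice", "screen_name": "alice", "followers_count": 120}},
]

def top_users(statuses, n=10):
    # Deduplicate rows, then sort by follower count, descending.
    rows = {(s["user"]["name"], s["user"]["screen_name"],
             s["user"]["followers_count"]) for s in statuses}
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]

print(top_users(statuses))
```

<p>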
Let's start processing tweets by looking at the top 10 users based on the followers count.<\/p><pre class=\"codeinput\">load <span class=\"string\">fake_news<\/span>                                              <span class=\"comment\">% load data<\/span>\r\nt = table;                                                  <span class=\"comment\">% initialize a table<\/span>\r\nt.names = arrayfun(@(x) x.status.user.name, <span class=\"keyword\">...<\/span><span class=\"comment\">             % get user names<\/span>\r\n    fake_news.statuses, <span class=\"string\">'UniformOutput'<\/span>, false);\r\nt.names = regexprep(t.names,<span class=\"string\">'[^a-zA-Z .,'']'<\/span>,<span class=\"string\">''<\/span>);           <span class=\"comment\">% remove non-ascii<\/span>\r\nt.screen_names = arrayfun(@(x) <span class=\"keyword\">...<\/span><span class=\"comment\">                          % get screen names<\/span>\r\n    x.status.user.screen_name, fake_news.statuses, <span class=\"string\">'UniformOutput'<\/span>, false);\r\nt.followers_count = arrayfun(@(x)  <span class=\"keyword\">...<\/span><span class=\"comment\">                      % get followers count<\/span>\r\n    x.status.user.followers_count, fake_news.statuses);\r\nt = unique(t,<span class=\"string\">'rows'<\/span>);                                       <span class=\"comment\">% remove duplicates<\/span>\r\nt = sortrows(t,<span class=\"string\">'followers_count'<\/span>, <span class=\"string\">'descend'<\/span>);               <span class=\"comment\">% rank users<\/span>\r\ndisp(t(1:10,:))                                             <span class=\"comment\">% show the table<\/span>\r\n<\/pre><pre class=\"codeoutput\">           names              screen_names       followers_count\r\n    ____________________    _________________    _______________\r\n    'Glenn Greenwald'       'ggreenwald'         7.9605e+05     \r\n    'Soledad O'Brien'       'soledadobrien'      5.6769e+05     \r\n    'Baratunde'           
  'baratunde'          2.0797e+05     \r\n    'Kenneth Roth'          'KenRoth'            1.9189e+05     \r\n    'Stock Trade Alerts'    'AlertTrade'         1.1921e+05     \r\n    'SokoAnalyst'           'SokoAnalyst'        1.1864e+05     \r\n    'Tactical Investor'     'saul42'                  98656     \r\n    'Vladimir Bajic'        'trend_auditor'           70502     \r\n    'Marketing Gurus'       'MarketingGurus2'         68554     \r\n    'Jillian C. York '      'jilliancyork'            53744     \r\n<\/pre><h4>Short Urls<a name=\"8f752b5d-0720-40bf-bbc7-f09824f9226d\"><\/a><\/h4><p>Until recently Twitter had a 140-character limit per tweet, including links. Therefore, when people embed urls in their tweets, they typically use url shortening services. To identify the actual sources, we need to get the expanded urls that those short urls point to. To do this, I wrote a utility function <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/expandUrl.m\">expandUrl<\/a><\/tt> taking advantage of the new <a title=\"https:\/\/www.mathworks.com\/help\/matlab\/call-wsdl-web-services_bu9hx2b-1.html (link no longer works)\">HTTP interface<\/a> introduced in R2016b. 
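<\/p><p>The idea is to send a request with redirects disabled, read the <tt>Location<\/tt> header, and repeat while the result still points at a shortener. Here is a minimal Python sketch of that loop; the <tt>SHORTENERS<\/tt> set and the <tt>fetch_location<\/tt> callback (which would perform the real HTTP request) are assumptions for illustration, not part of <tt>expandUrl<\/tt>.<\/p>

```python
from urllib.parse import urlsplit

# Hypothetical subset of url-shortener domains.
SHORTENERS = {"bit.ly", "trib.al", "ow.ly", "t.co"}

def expand_url(url, fetch_location, max_hops=5):
    """Follow redirects manually; fetch_location(url) must return the
    Location header of a non-following request, or None."""
    for _ in range(max_hops):
        if urlsplit(url).netloc.lower() not in SHORTENERS:
            break                      # already an expanded url
        target = fetch_location(url)   # one redirect hop
        if not target:
            break
        url = target
    return url

# A fake fetch_location standing in for the real HTTP request:
hops = {"http://trib.al/abc": "http://bit.ly/xyz",
        "http://bit.ly/xyz": "https://example.org/story"}
print(expand_url("http://trib.al/abc", hops.get))
```

<p>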
You can see that I create <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/matlab.net.uri-class.html\">URI<\/a> and <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/matlab.net.http.requestmessage-class.html\">RequestMessage<\/a> objects and use the <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/matlab.net.http.requestmessage.send.html\">send<\/a><\/tt> method to get a <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/matlab.net.http.responsemessage-class.html\">ResponseMessage<\/a> object.<\/p><pre class=\"codeinput\">dbtype <span class=\"string\">expandUrl<\/span> <span class=\"string\">25:32<\/span>\r\n<\/pre><pre class=\"codeoutput\">\r\n25    import  matlab.net.* matlab.net.http.*                  % http interface libs\r\n26    for ii = 1:length(urls)                                 % for each url\r\n27        if contains(urls(ii),shorteners)                    % if shortened\r\n28            uri = URI(urls(ii));                            % create URI obj\r\n29            r = RequestMessage;                             % request object\r\n30            options = HTTPOptions('MaxRedirects',0);        % prevent redirect\r\n31            try                                             % try\r\n32                response = r.send(uri,options);             % send http request\r\n<\/pre><p>Let's give it a try.<\/p><pre class=\"codeinput\">expanded = char(expandUrl(<span class=\"string\">'http:\/\/trib.al\/ZQuUDNx'<\/span>));       <span class=\"comment\">% expand url<\/span>\r\ndisp([expanded(1:70) <span class=\"string\">'...'<\/span>])\r\n<\/pre><pre class=\"codeoutput\">https:\/\/hbr.org\/2017\/01\/the-u-s-medias-problems-are-much-bigger-than-f...\r\n<\/pre><h4>Tokenize Tweets<a name=\"d97e4d80-fa03-4b05-8700-6f6d72a8adea\"><\/a><\/h4><p>To get a sense of what was being discussed in those tweets and what sentiments were represented there, we need to process the text.<\/p><div><ul><li>Our first step is to turn tweets into 
tokens.<\/li><li>Once we have tokens, we can use them to compute sentiment scores based on lexicons like <a href=\"http:\/\/www2.imm.dtu.dk\/pubdb\/views\/publication_details.php?id=6010\">AFINN<\/a>.<\/li><li>You can also use them to visualize tweets as a word cloud.<\/li><\/ul><\/div><p>We also want to collect embedded links along the way.<\/p><pre class=\"codeinput\">delimiters = {<span class=\"string\">' '<\/span>,<span class=\"string\">'$'<\/span>,<span class=\"string\">'\/'<\/span>,<span class=\"string\">'.'<\/span>,<span class=\"string\">'-'<\/span>,<span class=\"string\">':'<\/span>,<span class=\"string\">'&amp;'<\/span>,<span class=\"string\">'*'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">          % remove those<\/span>\r\n    <span class=\"string\">'+'<\/span>,<span class=\"string\">'='<\/span>,<span class=\"string\">'['<\/span>,<span class=\"string\">']'<\/span>,<span class=\"string\">'?'<\/span>,<span class=\"string\">'!'<\/span>,<span class=\"string\">'('<\/span>,<span class=\"string\">')'<\/span>,<span class=\"string\">'{'<\/span>,<span class=\"string\">'}'<\/span>,<span class=\"string\">','<\/span>, <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'\"'<\/span>,<span class=\"string\">'&gt;'<\/span>,<span class=\"string\">'_'<\/span>,<span class=\"string\">'&lt;'<\/span>,<span class=\"string\">';'<\/span>,<span class=\"string\">'%'<\/span>,char(10),char(13)};\r\nAFINN = readtable(<span class=\"string\">'AFINN\/AFINN-111.txt'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">                % load score file<\/span>\r\n    <span class=\"string\">'Delimiter'<\/span>,<span class=\"string\">'\\t'<\/span>,<span class=\"string\">'ReadVariableNames'<\/span>,0);\r\nAFINN.Properties.VariableNames = {<span class=\"string\">'Term'<\/span>,<span class=\"string\">'Score'<\/span>};          <span class=\"comment\">% add var names<\/span>\r\nstopwordsURL =<span 
class=\"string\">'http:\/\/www.textfixer.com\/resources\/common-english-words.txt'<\/span>;\r\nstopWords = webread(stopwordsURL);                          <span class=\"comment\">% read stop words<\/span>\r\nstopWords = split(string(stopWords),<span class=\"string\">','<\/span>);                   <span class=\"comment\">% split stop words<\/span>\r\ntokens = cell(fake_news.tweetscnt,1);                       <span class=\"comment\">% cell array as accumulator<\/span>\r\nexpUrls = strings(fake_news.tweetscnt,1);                   <span class=\"comment\">% string array as accumulator<\/span>\r\ndispUrls = strings(fake_news.tweetscnt,1);                  <span class=\"comment\">% string array as accumulator<\/span>\r\nscores = zeros(fake_news.tweetscnt,1);                      <span class=\"comment\">% initialize accumulator<\/span>\r\n<span class=\"keyword\">for<\/span> ii = 1:fake_news.tweetscnt                              <span class=\"comment\">% loop over tweets<\/span>\r\n    tweet = string(fake_news.statuses(ii).status.text);     <span class=\"comment\">% get tweet<\/span>\r\n    s = split(tweet, delimiters)';                          <span class=\"comment\">% split tweet by delimiters<\/span>\r\n    s = lower(s);                                           <span class=\"comment\">% use lowercase<\/span>\r\n    s = regexprep(s, <span class=\"string\">'[0-9]+'<\/span>,<span class=\"string\">''<\/span>);                          <span class=\"comment\">% remove numbers<\/span>\r\n    s = regexprep(s,<span class=\"string\">'(http|https):\/\/[^\\s]*'<\/span>,<span class=\"string\">''<\/span>);            <span class=\"comment\">% remove urls<\/span>\r\n    s = erase(s,<span class=\"string\">'''s'<\/span>);                                     <span class=\"comment\">% remove possessive s<\/span>\r\n    s(s == <span class=\"string\">''<\/span>) = [];                                        <span class=\"comment\">% remove empty strings<\/span>\r\n    s(ismember(s, 
stopWords)) = [];                         <span class=\"comment\">% remove stop words<\/span>\r\n    tokens{ii} = s;                                         <span class=\"comment\">% add to the accumulator<\/span>\r\n    scores(ii) = sum(AFINN.Score(ismember(AFINN.Term,s)));  <span class=\"comment\">% add to the accumulator<\/span>\r\n    <span class=\"keyword\">if<\/span> ~isempty( <span class=\"keyword\">...<\/span><span class=\"comment\">                                        % if display_url exists<\/span>\r\n            fake_news.statuses(ii).status.entities.urls) &amp;&amp; <span class=\"keyword\">...<\/span>\r\n            isfield(fake_news.statuses(ii).status.entities.urls,<span class=\"string\">'display_url'<\/span>)\r\n        durl = fake_news.statuses(ii).status.entities.urls.display_url;\r\n        durl = regexp(durl,<span class=\"string\">'^(.*?)\\\/'<\/span>,<span class=\"string\">'match'<\/span>,<span class=\"string\">'once'<\/span>);      <span class=\"comment\">% get its domain name<\/span>\r\n        dispUrls(ii) = durl(1:end-1);                       <span class=\"comment\">% add to dispUrls<\/span>\r\n        furl = fake_news.statuses(ii).status.entities.urls.expanded_url;\r\n        furl = expandUrl(furl,<span class=\"string\">'RemoveParams'<\/span>,1);            <span class=\"comment\">% expand links<\/span>\r\n        expUrls(ii) = expandUrl(furl,<span class=\"string\">'RemoveParams'<\/span>,1);     <span class=\"comment\">% one more time<\/span>\r\n    <span class=\"keyword\">end<\/span>\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><p>Now we can create the document term matrix. 
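<\/p><p>In general, a document term matrix has one row per document and one column per dictionary word, with each cell holding how often that word occurs in that document. A minimal Python sketch of the construction, with made-up tokens:<\/p>

```python
from collections import Counter

def doc_term_matrix(token_lists):
    """Build a sorted dictionary and a dense count matrix (illustrative sketch)."""
    dictionary = sorted(set(w for doc in token_lists for w in doc))
    index = {w: j for j, w in enumerate(dictionary)}
    dtm = [[0] * len(dictionary) for _ in token_lists]
    for i, doc in enumerate(token_lists):
        for word, n in Counter(doc).items():  # count each word in this doc
            dtm[i][index[word]] = n
    return dictionary, dtm

docs = [["fake", "news", "fake"], ["news", "media"]]
dictionary, dtm = doc_term_matrix(docs)
print(dictionary)
print(dtm)
```

<p>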
We will also do the same thing for embedded links.<\/p><pre class=\"codeinput\">dict = unique([tokens{:}]);                                 <span class=\"comment\">% unique words<\/span>\r\ndomains = unique(dispUrls);                                 <span class=\"comment\">% unique domains<\/span>\r\ndomains(domains == <span class=\"string\">''<\/span>) = [];                                <span class=\"comment\">% remove empty string<\/span>\r\nlinks = unique(expUrls);                                    <span class=\"comment\">% unique links<\/span>\r\nlinks(links == <span class=\"string\">''<\/span>) = [];                                    <span class=\"comment\">% remove empty string<\/span>\r\nDTM = zeros(fake_news.tweetscnt,length(dict));              <span class=\"comment\">% Doc Term Matrix<\/span>\r\nDDM = zeros(fake_news.tweetscnt,length(domains));           <span class=\"comment\">% Doc Domain Matrix<\/span>\r\nDLM = zeros(fake_news.tweetscnt,length(links));             <span class=\"comment\">% Doc Link Matrix<\/span>\r\n<span class=\"keyword\">for<\/span> ii = 1:fake_news.tweetscnt                              <span class=\"comment\">% loop over tokens<\/span>\r\n    [words,~,idx] = unique(tokens{ii});                     <span class=\"comment\">% get unique words<\/span>\r\n    wcounts = accumarray(idx, 1);                           <span class=\"comment\">% get word counts<\/span>\r\n    cols = ismember(dict, words);                           <span class=\"comment\">% find cols for words<\/span>\r\n    DTM(ii,cols) = wcounts;                                 <span class=\"comment\">% update DTM with word counts<\/span>\r\n    cols = ismember(domains,dispUrls(ii));                  <span class=\"comment\">% find col for domain<\/span>\r\n    DDM(ii,cols) = 1;                                       <span class=\"comment\">% update DDM<\/span>\r\n    expanded = expandUrl(expUrls(ii));                      <span class=\"comment\">% expand 
links<\/span>\r\n    expanded = expandUrl(expanded);                         <span class=\"comment\">% one more time<\/span>\r\n    cols = ismember(links,expanded);                        <span class=\"comment\">% find col for link<\/span>\r\n    DLM(ii,cols) = 1;                                       <span class=\"comment\">% update DLM<\/span>\r\n<span class=\"keyword\">end<\/span>\r\nDTM(:,ismember(dict,{<span class=\"string\">'#'<\/span>,<span class=\"string\">'@'<\/span>})) = [];                       <span class=\"comment\">% remove # and @<\/span>\r\ndict(ismember(dict,{<span class=\"string\">'#'<\/span>,<span class=\"string\">'@'<\/span>})) = [];                        <span class=\"comment\">% remove # and @<\/span>\r\n<\/pre><h4>Sentiment Analysis<a name=\"33df3cf7-ff49-49b8-8f0c-0a9a469116d8\"><\/a><\/h4><p>One of the typical analyses you perform on a Twitter feed is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sentiment_analysis\">sentiment analysis<\/a>. The histogram shows, not surprisingly, that those tweets were mostly very negative. 
We can summarize this with the Net Sentiment Rate (NSR), the proportion of positive tweets minus the proportion of negative tweets.<\/p><pre class=\"codeinput\">NSR = (sum(scores &gt;= 0) - sum(scores &lt; 0)) \/ length(scores);<span class=\"comment\">% net sentiment rate<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nhistogram(scores,<span class=\"string\">'Normalization'<\/span>,<span class=\"string\">'probability'<\/span>)             <span class=\"comment\">% score distribution<\/span>\r\nline([0 0], [0 .35],<span class=\"string\">'Color'<\/span>,<span class=\"string\">'r'<\/span>);                           <span class=\"comment\">% reference line<\/span>\r\ntitle([<span class=\"string\">'Sentiment Score Distribution of \"Fake News\" '<\/span> <span class=\"keyword\">...<\/span><span class=\"comment\">   % add title<\/span>\r\n    sprintf(<span class=\"string\">'(NSR: %.2f)'<\/span>,NSR)])\r\nxlabel(<span class=\"string\">'Sentiment Score'<\/span>)                                   <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">'% Tweets'<\/span>)                                          <span class=\"comment\">% y-axis label<\/span>\r\nyticklabels(string(0:5:35))                                 <span class=\"comment\">% y-axis ticks<\/span>\r\ntext(-10,.25,<span class=\"string\">'Negative'<\/span>);text(3,.25,<span class=\"string\">'Positive'<\/span>);            <span class=\"comment\">% annotate<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/twitter_revisited_01.png\" alt=\"\"> <h4>What Words Appear Frequently in Tweets?<a name=\"3304bad1-b282-4cc5-b641-93cdf3ba0052\"><\/a><\/h4><p>Now let's plot the word frequency to visualize what was discussed in those tweets. 
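<\/p><p>The frequencies come straight from the document term matrix: summing each column gives the total count of a word across all tweets, and thresholding keeps the frequent ones. A minimal Python sketch with made-up data:<\/p>

```python
def word_counts(dictionary, dtm, min_count=2):
    """Sum each column of a document term matrix; keep frequent words."""
    totals = [sum(row[j] for row in dtm) for j in range(len(dictionary))]
    return {w: c for w, c in zip(dictionary, totals) if c >= min_count}

# Hypothetical dictionary and counts: 3 tweets x 3 words.
dictionary = ["fake", "media", "news"]
dtm = [[2, 0, 1],
       [0, 1, 1],
       [1, 0, 1]]
print(word_counts(dictionary, dtm))
```

<p>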
They seem to be about dominant news headlines at the time the tweets were collected.<\/p><pre class=\"codeinput\">count = sum(DTM);                                           <span class=\"comment\">% get word count<\/span>\r\nlabels = erase(dict(count &gt;= 40),<span class=\"string\">'@'<\/span>);                      <span class=\"comment\">% high freq words<\/span>\r\npos = [find(count &gt;= 40);count(count &gt;= 40)] + 0.1;         <span class=\"comment\">% x y positions<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nscatter(1:length(dict),count)                               <span class=\"comment\">% scatter plot<\/span>\r\ntext(pos(1,1),pos(2,1)+3,cellstr(labels(1)),<span class=\"keyword\">...<\/span><span class=\"comment\">             % place labels<\/span>\r\n    <span class=\"string\">'HorizontalAlignment'<\/span>,<span class=\"string\">'center'<\/span>);\r\ntext(pos(1,2),pos(2,2)-2,cellstr(labels(2)),<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'HorizontalAlignment'<\/span>,<span class=\"string\">'right'<\/span>);\r\ntext(pos(1,3),pos(2,3)-4,cellstr(labels(3)));\r\ntext(pos(1,3:end),pos(2,3:end),cellstr(labels(3:end)));\r\ntitle(<span class=\"string\">'Frequent Words in Tweets Mentioning Fake News'<\/span>)      <span class=\"comment\">% add title<\/span>\r\nxlabel(<span class=\"string\">'Indices'<\/span>)                                           <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">' Count'<\/span>)                                            <span class=\"comment\">% y-axis label<\/span>\r\nylim([0 150])                                               <span class=\"comment\">% y-axis range<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/twitter_revisited_02.png\" alt=\"\"> <h4>What Hashtags Appear Frequently in Tweets?<a 
name=\"a8aa9afc-ec28-4356-8e8a-bd496f86f95e\"><\/a><\/h4><p>Hashtags that start with \"#\" are often used to identify the main theme of tweets, and we see those related to the dominant news again as you would expect.<\/p><pre class=\"codeinput\">is_hash = startsWith(dict,<span class=\"string\">'#'<\/span>) &amp; dict ~= <span class=\"string\">'#'<\/span>;               <span class=\"comment\">% get indices<\/span>\r\nhashes = erase(dict(is_hash),<span class=\"string\">'#'<\/span>);                          <span class=\"comment\">% get hashtags<\/span>\r\nhash_count = count(is_hash);                                <span class=\"comment\">% get count<\/span>\r\nlabels = hashes(hash_count &gt;= 4);                           <span class=\"comment\">% high freq tags<\/span>\r\npos = [find(hash_count &gt;= 4) + 1; <span class=\"keyword\">...<\/span><span class=\"comment\">                       % x y positions<\/span>\r\n    hash_count(hash_count &gt;= 4) + 0.1];\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nscatter(1:length(hashes),hash_count)                        <span class=\"comment\">% scatter plot<\/span>\r\ntext(pos(1,1),pos(2,1)- .5,cellstr(labels(1)),<span class=\"keyword\">...<\/span><span class=\"comment\">           % place labels<\/span>\r\n    <span class=\"string\">'HorizontalAlignment'<\/span>,<span class=\"string\">'center'<\/span>);\r\ntext(pos(1,2:end-1),pos(2,2:end-1),cellstr(labels(2:end-1)));\r\ntext(pos(1,end),pos(2,end)-.5,cellstr(labels(end)),<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'HorizontalAlignment'<\/span>,<span class=\"string\">'right'<\/span>);\r\ntitle(<span class=\"string\">'Frequently Used Hashtags'<\/span>)                           <span class=\"comment\">% add title<\/span>\r\nxlabel(<span class=\"string\">'Indices'<\/span>)                                           <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span 
class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% y-axis label<\/span>\r\nylim([0 15])                                                <span class=\"comment\">% y-axis range<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/twitter_revisited_03.png\" alt=\"\"> <h4>Who Got Frequent Mentions in Tweets?<a name=\"0b6b3d74-60d5-4e3b-9c58-9a178d575972\"><\/a><\/h4><p>Twitter is also a communication medium, and people can direct their tweets to specific users by including those users' screen names, prefixed with \"@\", in the tweets. These are called \"mentions\". We can see there is one particular user who got a lot of mentions.<\/p><pre class=\"codeinput\">is_ment = startsWith(dict,<span class=\"string\">'@'<\/span>) &amp; dict ~= <span class=\"string\">'@'<\/span>;               <span class=\"comment\">% get indices<\/span>\r\nmentions = erase(dict(is_ment),<span class=\"string\">'@'<\/span>);                        <span class=\"comment\">% get mentions<\/span>\r\nment_count = count(is_ment);                                <span class=\"comment\">% get count<\/span>\r\nlabels = mentions(ment_count &gt;= 10);                        <span class=\"comment\">% high freq mentions<\/span>\r\npos = [find(ment_count &gt;= 10) + 1; <span class=\"keyword\">...<\/span><span class=\"comment\">                      % x y positions<\/span>\r\n    ment_count(ment_count &gt;= 10) + 0.1];\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nscatter(1:length(mentions),ment_count)                      <span class=\"comment\">% scatter plot<\/span>\r\ntext(pos(1,:),pos(2,:),cellstr(labels));                    <span class=\"comment\">% place labels<\/span>\r\ntitle(<span class=\"string\">'Frequent Mentions'<\/span>)                                  <span class=\"comment\">% add title<\/span>\r\nxlabel(<span 
class=\"string\">'Indices'<\/span>)                                           <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% y-axis label<\/span>\r\nylim([0 100])                                               <span class=\"comment\">% y-axis range<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/twitter_revisited_04.png\" alt=\"\"> <h4>Frequently Cited Web Sites<a name=\"3f762782-10f2-464b-9266-c6dbe020c30b\"><\/a><\/h4><p>You can also embed a link in a tweet, usually for citing sources and directing people to get more details from those sources. This tends to show where the original information came from.<\/p><p>Twitter was the most frequently cited source. This was interesting to me. Usually, if you want to cite other tweets, you retweet them. When you retweet, the original user gets credit. By embedding the link without retweeting it, people circumvent this mechanism. 
Very curious.<\/p><pre class=\"codeinput\">count = sum(DDM);                                           <span class=\"comment\">% get domain count<\/span>\r\nlabels = domains(count &gt; 5);                                <span class=\"comment\">% high freq citations<\/span>\r\npos = [find(count &gt; 5) + 1;count(count &gt; 5) + 0.1];         <span class=\"comment\">% x y positions<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nscatter(1:length(domains),count)                            <span class=\"comment\">% scatter plot<\/span>\r\ntext(pos(1,:),pos(2,:),cellstr(labels));                    <span class=\"comment\">% place labels<\/span>\r\ntitle(<span class=\"string\">'Frequently Cited Web Sites'<\/span>)                         <span class=\"comment\">% add title<\/span>\r\nxlabel(<span class=\"string\">'Indices'<\/span>)                                           <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% y-axis label<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/twitter_revisited_05.png\" alt=\"\"> <h4>Frequently Cited Sources<a name=\"62dddba2-a319-4def-921b-2c9fc7878e8c\"><\/a><\/h4><p>You can also see that many of the web sites are for url shortening services. 
Let's find out the real urls linked from those short urls.<\/p><pre class=\"codeinput\">count = sum(DLM);                                           <span class=\"comment\">% get link count<\/span>\r\nlabels = links(count &gt;= 15);                                <span class=\"comment\">% high freq citations<\/span>\r\npos = [find(count &gt;= 15) + 1;count(count &gt;= 15)];           <span class=\"comment\">% x y positions<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nscatter(1:length(links),count)                              <span class=\"comment\">% scatter plot<\/span>\r\ntext(ones(size(pos(1,:))),pos(2,:)-2,cellstr(labels));      <span class=\"comment\">% place labels<\/span>\r\ntitle(<span class=\"string\">'Frequently Cited Sources '<\/span>)                          <span class=\"comment\">% add title<\/span>\r\nxlabel(<span class=\"string\">'Indices'<\/span>)                                           <span class=\"comment\">% x-axis label<\/span>\r\nylabel(<span class=\"string\">'Count'<\/span>)                                             <span class=\"comment\">% y-axis label<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/twitter_revisited_06.png\" alt=\"\"> <h4>Generating a Social Graph<a name=\"ba2f6710-91a9-44a4-bff1-c71a4882456a\"><\/a><\/h4><p>Now let's think of a way to see the association between users and the entities included in their tweets to reveal their relationships. We have a matrix of words by tweets, and we can convert it into a matrix of users vs. 
entities, such as hashtags, mentions and links.<\/p><pre class=\"codeinput\">users = arrayfun(@(x) x.status.user.screen_name, <span class=\"keyword\">...<\/span><span class=\"comment\">        % screen names<\/span>\r\n    fake_news.statuses, <span class=\"string\">'UniformOutput'<\/span>, false);\r\nuniq = unique(users);                                       <span class=\"comment\">% remove duplicates<\/span>\r\ncombo = [DTM DLM];                                          <span class=\"comment\">% combine matrices<\/span>\r\nUEM = zeros(length(uniq),size(combo,2));                    <span class=\"comment\">% User Entity Matrix<\/span>\r\n<span class=\"keyword\">for<\/span> ii = 1:length(uniq)                                     <span class=\"comment\">% for unique user<\/span>\r\n    UEM(ii,:) = sum(combo(ismember(users,uniq(ii)),:),1);   <span class=\"comment\">% sum cols<\/span>\r\n<span class=\"keyword\">end<\/span>\r\ncols = is_hash | is_ment;                                   <span class=\"comment\">% hashtags, mentions<\/span>\r\ncols = [cols true(1,length(links))];                        <span class=\"comment\">% add links<\/span>\r\nUEM = UEM(:,cols);                                          <span class=\"comment\">% select those cols<\/span>\r\nent = dict(is_hash | is_ment);                              <span class=\"comment\">% select entities<\/span>\r\nent = [ent links'];                                         <span class=\"comment\">% add links<\/span>\r\n<\/pre><h4>Handling Mentions<a name=\"0c36ee71-bb24-48a3-8fb3-d7940ed8d8fd\"><\/a><\/h4><p>Some of the mentioned accounts are themselves authors of the tweets we collected, and others are not. When one user mentions another, that forms a user-user edge rather than a user-entity edge. 
To map such edges correctly, we want to treat mentioned users separately.<\/p><pre class=\"codeinput\">ment_users = uniq(ismember(uniq,mentions));                 <span class=\"comment\">% mentioned users<\/span>\r\nis_ment = ismember(ent,<span class=\"string\">'@'<\/span> + string(ment_users));           <span class=\"comment\">% their mentions<\/span>\r\nent(is_ment) = erase(ent(is_ment),<span class=\"string\">'@'<\/span>);                     <span class=\"comment\">% remove @<\/span>\r\nUUM = zeros(length(uniq));                                  <span class=\"comment\">% User User Matrix<\/span>\r\n<span class=\"keyword\">for<\/span> ii =  1:length(ment_users)                              <span class=\"comment\">% for each ment user<\/span>\r\n    row = string(uniq) == ment_users{ii};                   <span class=\"comment\">% get row<\/span>\r\n    col = ent == ment_users{ii};                            <span class=\"comment\">% get col<\/span>\r\n    UUM(row,ii) = UEM(row,col);                             <span class=\"comment\">% copy count<\/span>\r\n<span class=\"keyword\">end<\/span>\r\n<\/pre><h4>Creating the Edge List<a name=\"bc45889a-d785-449f-b5f0-c7fc54d012b7\"><\/a><\/h4><p>Now we can add the user-to-user matrix to the existing user-to-entity matrix, but we also need to remove the mentioned users from the entities since they are already included in the user-to-user matrix.<\/p><p>All we need to do then is to turn that into a <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/sparse.html\">sparse<\/a> matrix and <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/find.html\">find<\/a> the indices of nonzero elements. 
We can then use those indices as the edge list.<\/p><pre class=\"codeinput\">UEM(:,is_ment) = [];                                        <span class=\"comment\">% remove mentioned users<\/span>\r\nUEM = [UUM, UEM];                                           <span class=\"comment\">% add UUM to adj<\/span>\r\nnodes = [uniq; cellstr(ent(~is_ment))'];                    <span class=\"comment\">% create node list<\/span>\r\ns = sparse(UEM);                                            <span class=\"comment\">% sparse matrix<\/span>\r\n[i,j,s] = find(s);                                          <span class=\"comment\">% find indices<\/span>\r\n<\/pre><h4>Creating the Graph<a name=\"b82f0ed0-b4c5-443a-849c-1567e0c2a5bd\"><\/a><\/h4><p>Once you have the edge list, it is a piece of cake to make a social graph from it. Since our relationships have directions (user --&gt; entity), we will create a directed graph with <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/digraph.html\">digraph<\/a><\/tt>. The nodes are sized and colored based on the number of incoming edges, known as the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Directed_graph\">in-degree<\/a>. 
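To make the in-degree idea concrete, here is a tiny hypothetical graph (not from the tweet data) in which three edges all point at node 2:<\/p><pre class=\"codeinput\">H = digraph([1 3 4],[2 2 2]);                               <span class=\"comment\">% three edges into node 2<\/span>\r\ndisp(indegree(H)')                                          <span class=\"comment\">% in-degrees: 0 3 0 0<\/span>\r\n<\/pre><p>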
As you can see, most tweets are disjoint, but we see some large clusters of tweets.<\/p><pre class=\"codeinput\">G = digraph(i,j);                                           <span class=\"comment\">% directed graph<\/span>\r\nG.Nodes.Name = nodes;                                       <span class=\"comment\">% add node names<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\ncolormap <span class=\"string\">cool<\/span>                                               <span class=\"comment\">% set color map<\/span>\r\ndeg = indegree(G);                                          <span class=\"comment\">% get indegrees<\/span>\r\nmarkersize = log(deg + 2) * 2;                              <span class=\"comment\">% indeg for marker size<\/span>\r\nplot(G,<span class=\"string\">'MarkerSize'<\/span>,markersize,<span class=\"string\">'NodeCData'<\/span>,deg)             <span class=\"comment\">% plot graph<\/span>\r\nlabels = colorbar; labels.Label.String = <span class=\"string\">'In-degrees'<\/span>;               <span class=\"comment\">% add colorbar<\/span>\r\ntitle(<span class=\"string\">'Graph of Tweets containing \"Fake News\"'<\/span>)             <span class=\"comment\">% add title<\/span>\r\nxticklabels(<span class=\"string\">''<\/span>); yticklabels(<span class=\"string\">''<\/span>);                           <span class=\"comment\">% hide tick labels<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/twitter_revisited_07.png\" alt=\"\"> <h4>Zooming into the Largest Subgraph<a name=\"c2458dd3-af1f-47e5-a506-5a8a2e4fadfc\"><\/a><\/h4><p>Let's zoom into the largest subgraph to see the details. This gives a much clearer idea of what those tweets were about, because you can see who was mentioned and which sources were cited. 
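Under the hood, <tt>conncomp<\/tt> labels the weakly connected components and <tt>subgraph<\/tt> extracts one of them; as a hypothetical aside, here is the same trick on a 5-node toy graph:<\/p><pre class=\"codeinput\">G2 = digraph([1 2 4],[2 3 5]);                              <span class=\"comment\">% two weak components<\/span>\r\nbins2 = conncomp(G2,<span class=\"string\">'Type'<\/span>,<span class=\"string\">'weak'<\/span>);                       <span class=\"comment\">% component id per node<\/span>\r\nsubG2 = subgraph(G2,bins2 == mode(bins2));                  <span class=\"comment\">% keep the biggest one<\/span>\r\nnumnodes(subG2)                                             <span class=\"comment\">% 3 nodes survive<\/span>\r\n<\/pre><p>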
You can see a New York Times opinion column and an article from Sweden generated a lot of tweets along with those who were mentioned in those tweets.<\/p><pre class=\"codeinput\">bins = conncomp(G,<span class=\"string\">'OutputForm'<\/span>,<span class=\"string\">'cell'<\/span>,<span class=\"string\">'Type'<\/span>,<span class=\"string\">'weak'<\/span>);       <span class=\"comment\">% get connected comps<\/span>\r\nbinsizes = cellfun(@length,bins);                           <span class=\"comment\">% get bin sizes<\/span>\r\n[~,idx] = max(binsizes);                                    <span class=\"comment\">% find biggest comp<\/span>\r\nsubG = subgraph(G,bins{idx});                               <span class=\"comment\">% create sub graph<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\ncolormap <span class=\"string\">cool<\/span>                                               <span class=\"comment\">% set color map<\/span>\r\ndeg = indegree(subG);                                       <span class=\"comment\">% get indegrees<\/span>\r\nmarkersize = log(deg + 2) * 2;                              <span class=\"comment\">% indeg for marker size<\/span>\r\nh = plot(subG,<span class=\"string\">'MarkerSize'<\/span>,markersize,<span class=\"string\">'NodeCData'<\/span>,deg);     <span class=\"comment\">% plot graph<\/span>\r\nc = colorbar; c.Label.String = <span class=\"string\">'In-degrees'<\/span>;                <span class=\"comment\">% add colorbar<\/span>\r\ntitle(<span class=\"string\">'The Largest Subgraph (Close-up)'<\/span>)                    <span class=\"comment\">% add title<\/span>\r\nxticklabels(<span class=\"string\">''<\/span>); yticklabels(<span class=\"string\">''<\/span>);                           <span class=\"comment\">% hide tick labels<\/span>\r\n[~,rank] = sort(deg,<span class=\"string\">'descend'<\/span>);                             <span class=\"comment\">% get 
ranking<\/span>\r\ntop15 = subG.Nodes.Name(rank(1:15));                        <span class=\"comment\">% get top 15<\/span>\r\nlabelnode(h,top15,top15);                                   <span class=\"comment\">% label nodes<\/span>\r\naxis([-.5 2.5 -1.6 -0.7]);                                  <span class=\"comment\">% define axis limits<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/twitter_revisited_08.png\" alt=\"\"> <h4>Using Twitty<a name=\"87471d3f-9915-4157-8215-4e8666c08aa6\"><\/a><\/h4><p>If you want to analyze Twitter for different topics, you need to collect your own tweets. For this analysis I used <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/34837-twitty\">Twitty<\/a> by <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/1421693-vladimir-bondarenko\">Vladimir Bondarenko<\/a>. It hasn't been updated since July 2013, but it still works. Let's go over how you use Twitty. I am assuming that you already have <a href=\"https:\/\/developer.twitter.com\/en\/docs\/basics\/authentication\/guides\/access-tokens\">your developer credentials<\/a> and have downloaded Twitty into your current folder. 
The workspace variable <tt>creds<\/tt> should contain your credentials in a struct in the following format:<\/p><pre class=\"codeinput\">creds = struct;                                             <span class=\"comment\">% example<\/span>\r\ncreds.ConsumerKey = <span class=\"string\">'your consumer key'<\/span>;\r\ncreds.ConsumerSecret = <span class=\"string\">'your consumer secret'<\/span>;\r\ncreds.AccessToken = <span class=\"string\">'your token'<\/span>;\r\ncreds.AccessTokenSecret = <span class=\"string\">'your token secret'<\/span>;\r\n<\/pre><p>Twitty by default expects the <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/20565-json-parser\">JSON Parser<\/a> by <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/1286842-joel-feenstra\">Joel Feenstra<\/a>. However, I would like to use the built-in functions introduced in R2016b, <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/jsonencode.html\">jsonencode<\/a><\/tt> and <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/jsondecode.html\">jsondecode<\/a><\/tt>, instead. 
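For example (a minimal snippet independent of Twitty), <tt>jsondecode<\/tt> maps JSON text directly onto MATLAB types:<\/p><pre class=\"codeinput\">s = jsondecode(<span class=\"string\">'{\"name\":\"matlab\",\"count\":3}'<\/span>);            <span class=\"comment\">% JSON text to struct<\/span>\r\ndisp(s.name)                                                <span class=\"comment\">% prints: matlab<\/span>\r\n<\/pre><p>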
To suppress the warning Twitty generates, I will use <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/warning.html\">warning<\/a><\/tt>.<\/p><pre class=\"codeinput\">warning(<span class=\"string\">'off'<\/span>)                                              <span class=\"comment\">% turn off warning<\/span>\r\naddpath <span class=\"string\">twitty_1.1.1<\/span>;                                       <span class=\"comment\">% add Twitty folder to the path<\/span>\r\nload <span class=\"string\">creds<\/span>                                                  <span class=\"comment\">% load my real credentials<\/span>\r\ntw = twitty(creds);                                         <span class=\"comment\">% instantiate a Twitty object<\/span>\r\nwarning(<span class=\"string\">'on'<\/span>)                                               <span class=\"comment\">% turn on warning<\/span>\r\n<\/pre><h4>Twitter Search API Example<a name=\"4076e05e-3c6f-414c-81e8-4eac755371b1\"><\/a><\/h4><p>Since Twitty returns JSON as plain text if you don't specify the parser, you can use <tt>jsondecode<\/tt> once you get the output from Twitty.  The number of tweets you can get from the <a href=\"https:\/\/dev.twitter.com\/rest\/public\/search\">Search API<\/a> is limited to 100 per request. 
If you need more, you usually use the <a href=\"https:\/\/dev.twitter.com\/streaming\/overview\">Streaming API<\/a>.<\/p><pre class=\"codeinput\">keyword = <span class=\"string\">'nfl'<\/span>;                                            <span class=\"comment\">% keyword to search<\/span>\r\ntweets = tw.search(keyword,<span class=\"string\">'count'<\/span>,100,<span class=\"string\">'include_entities'<\/span>,<span class=\"string\">'true'<\/span>,<span class=\"string\">'lang'<\/span>,<span class=\"string\">'en'<\/span>);\r\ntweets = jsondecode(tweets);                                <span class=\"comment\">% parse JSON<\/span>\r\ntweet = tweets.statuses{1}.text;                            <span class=\"comment\">% index into text<\/span>\r\ndisp([tweet(1:70) <span class=\"string\">'...'<\/span>])                                   <span class=\"comment\">% show 70 chars<\/span>\r\n<\/pre><pre class=\"codeoutput\">RT @JBaezaTopDawg: .@NFL will be announcing a @Patriots v @RAIDERS mat...\r\n<\/pre><h4>Twitter Trending Topic API Example<a name=\"a6fb7cfd-12b3-447f-afaa-8dc0986784be\"><\/a><\/h4><p>If you want to find a high volume topic with thousands of tweets, one way to find such a topic is to use <a href=\"https:\/\/dev.twitter.com\/rest\/reference\/get\/trends\/place\">trending topics<\/a>. 
Those topics will give you plenty of tweets to work with.<\/p><pre class=\"codeinput\">us_woeid = 23424977;                                        <span class=\"comment\">% US as location<\/span>\r\nus_trends = tw.trendsPlace(us_woeid);                       <span class=\"comment\">% get trending topics<\/span>\r\nus_trends = jsondecode(us_trends);                          <span class=\"comment\">% parse JSON<\/span>\r\ntrends = arrayfun(@(x) x.name, us_trends.trends, <span class=\"string\">'UniformOutput'<\/span>,false);\r\ndisp(trends(1:10))\r\n<\/pre><pre class=\"codeoutput\">    'Beyonc&eacute;'\r\n    'Rex Tillerson'\r\n    '#NSD17'\r\n    '#PressOn'\r\n    'DeVos'\r\n    'Roger Goodell'\r\n    '#nationalsigningday'\r\n    'Skype'\r\n    '#wednesdaywisdom'\r\n    '#MyKindOfPartyIncludes'\r\n<\/pre><h4>Twitter Streaming API Example<a name=\"dbf110d8-6928-48ac-9021-9820bfdc97cd\"><\/a><\/h4><p>Once you find a high volume topic to work with, you can use the Streaming API to get tweets that contain it. Twitty stores the retrieved tweets in the <tt>'data'<\/tt> property. What you save is defined in an output function like <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/saveTweets.m\">saveTweets.m<\/a>. <tt>'S'<\/tt> in this case will be a character array of JSON formatted text and we need to use <tt>jsondecode<\/tt> to convert it into a struct since we didn't specify the JSON parser.<\/p><pre class=\"codeinput\">dbtype <span class=\"string\">twitty_1.1.1\/saveTweets.m<\/span> <span class=\"string\">17:24<\/span>\r\n<\/pre><pre class=\"codeoutput\">\r\n17    % Parse input:\r\n18    S = jsondecode(S);\r\n19    \r\n20    if length(S)== 1 &amp;&amp; isfield(S, 'statuses')\r\n21        T = S{1}.statuses;\r\n22    else\r\n23        T = S;\r\n24    end\r\n<\/pre><p>Now let's give it a try. By default, Twitty will get 20 batches of 1000 tweets = 20,000 tweets, but that will take a long time. 
We will just get 10 tweets in this example.<\/p><pre class=\"codeinput\">keyword = <span class=\"string\">'nfl'<\/span>;                                            <span class=\"comment\">% specify keyword<\/span>\r\ntw.outFcn = @saveTweets;                                    <span class=\"comment\">% output function<\/span>\r\ntw.sampleSize = 10;                                         <span class=\"comment\">% default 1000<\/span>\r\ntw.batchSize = 1;                                           <span class=\"comment\">% default 20<\/span>\r\ntw.filterStatuses(<span class=\"string\">'track'<\/span>,keyword);                         <span class=\"comment\">% Streaming API call<\/span>\r\nresult = tw.data;                                           <span class=\"comment\">% save the data<\/span>\r\nlength(result.statuses)                                     <span class=\"comment\">% number of tweets<\/span>\r\ntweet = result.statuses(1).status.text;                     <span class=\"comment\">% get a tweet<\/span>\r\ndisp([tweet(1:70) <span class=\"string\">'...'<\/span>])                                   <span class=\"comment\">% show 70 chars<\/span>\r\n<\/pre><pre class=\"codeoutput\">Tweets processed: 1 (out of 10).\r\nTweets processed: 2 (out of 10).\r\nTweets processed: 3 (out of 10).\r\nTweets processed: 4 (out of 10).\r\nTweets processed: 5 (out of 10).\r\nTweets processed: 6 (out of 10).\r\nTweets processed: 7 (out of 10).\r\nTweets processed: 8 (out of 10).\r\nTweets processed: 9 (out of 10).\r\nTweets processed: 10 (out of 10).\r\nans =\r\n    10\r\nRT @Russ_Mac876: Michael Jackson is still the greatest https:\/\/t.co\/BE...\r\n<\/pre><h4>Summary - Visit Andy's Developer Zone for More<a name=\"20fb1dc3-39f7-42c3-8a8d-9d0e0bb29521\"><\/a><\/h4><p>In this post you saw how you can analyze tweets using the more recent features in MATLAB, such as the HTTP interface to expand short urls. 
You also got a quick tutorial on how to use Twitty to collect tweets for your own purpose.<\/p><p>Twitty covers your basic needs. But you can go beyond Twitty and roll your own tool by taking advantage of the new HTTP interface. I show you how in a <a title=\"https:\/\/blogs.mathworks.com\/developer\/2017\/02\/07\/connect-to-twitter-with-oauth-over-http-interface\/ (link no longer works)\">second blog post<\/a> I wrote for Andy's Developer Zone.<\/p><p>Now that you understand how you can use Twitter to analyze social issues like fake news, tell us how you would put it to good use <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=2209#respond\">here<\/a>.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_5eba22c4e7a047d68e4b431a2457d392() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='5eba22c4e7a047d68e4b431a2457d392 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 5eba22c4e7a047d68e4b431a2457d392';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2017 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_5eba22c4e7a047d68e4b431a2457d392()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2016b<br><\/p><\/div><!--\r\n5eba22c4e7a047d68e4b431a2457d392 ##### SOURCE BEGIN #####\r\n%% Analyzing Fake News with Twitter\r\n% Social media has become an important part of modern life, and Twitter is\r\n% again a center of focus in recent events. Today's guest blogger,\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/951521 Toshi\r\n% Takeuchi> gives us an update on how you can use MATLAB to analyze a\r\n% Twitter feed.\r\n%\r\n% <<fake_news.png>>\r\n\r\n%% Twitter Revisited\r\n% When I wrote about\r\n% <https:\/\/blogs.mathworks.com\/loren\/2014\/06\/04\/analyzing-twitter-with-matlab\r\n% analyzing Twitter with MATLAB> back in 2014 I didn't expect that 3 years\r\n% later Twitter would come to play such a huge role in politics. There have\r\n% been a lot of changes in MATLAB in those years as well. Perhaps it is\r\n% time to revisit this topic. 
We have heard a lot about\r\n% <https:\/\/en.wikipedia.org\/wiki\/Fake_news_website fake news> since\r\n% <https:\/\/en.wikipedia.org\/wiki\/United_States_presidential_election,_2016\r\n% the US Presidential Election of 2016>. Let's use Twitter to analyze this\r\n% phenomenon. While fake news spreads mainly on Facebook, Twitter is the\r\n% favorite social media platform for journalists who discuss it. \r\n% \r\n%% Load Tweets\r\n% I collected 1,000 tweets that contain the term 'fake news' using the\r\n% Streaming API and saved them in\r\n% |<https:\/\/blogs.mathworks.com\/images\/loren\/2017\/fake_news.mat\r\n% fake_news.mat>|. Let's start processing tweets by looking at the top 10\r\n% users based on the followers count.\r\n\r\nload fake_news                                              % load data\r\nt = table;                                                  % initialize a table\r\nt.names = arrayfun(@(x) x.status.user.name, ...             % get user names\r\n    fake_news.statuses, 'UniformOutput', false);\r\nt.names = regexprep(t.names,'[^a-zA-Z .,'']','');           % remove non-ascii\r\nt.screen_names = arrayfun(@(x) ...                          % get screen names\r\n    x.status.user.screen_name, fake_news.statuses, 'UniformOutput', false);\r\nt.followers_count = arrayfun(@(x)  ...                      % get followers count\r\n    x.status.user.followers_count, fake_news.statuses);\r\nt = unique(t,'rows');                                       % remove duplicates\r\nt = sortrows(t,'followers_count', 'descend');               % rank users\r\ndisp(t(1:10,:))                                             % show the table\r\n\r\n%% Short Urls\r\n% Until recently Twitter had a 140-character limit per tweet, including\r\n% links. Therefore, when people embedded urls in their tweets, they typically\r\n% used url shortening services. To identify the actual sources, we need to\r\n% get the expanded urls that those short urls point to. 
To do it I wrote a\r\n% utility function\r\n% |<https:\/\/blogs.mathworks.com\/images\/loren\/2017\/expandUrl.m expandUrl>|\r\n% taking advantage of the new\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/call-wsdl-web-services_bu9hx2b-1.html\r\n% HTTP interface> introduced in R2016b. You can see that I create\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/matlab.net.uri-class.html\r\n% URI> and\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/matlab.net.http.requestmessage-class.html\r\n% RequestMessage> objects and used the\r\n% |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/matlab.net.http.requestmessage.send.html\r\n% send>| method to get a\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/matlab.net.http.responsemessage-class.html\r\n% ResponseMessage> object.\r\n\r\ndbtype expandUrl 25:32\r\n\r\n%%\r\n% Let's give it a try.\r\nexpanded = char(expandUrl('http:\/\/trib.al\/ZQuUDNx'));       % expand url\r\ndisp([expanded(1:70) '...'])\r\n\r\n%% Tokenize Tweets\r\n% To get a sense of what was being discussed in those tweets and what\r\n% sentiments were represented there, we need to process the text.\r\n% \r\n% * Our first step is to turn tweets into tokens.\r\n% * Once we have tokens, we can use them to compute sentiment scores based\r\n% on lexicons like\r\n% <http:\/\/www2.imm.dtu.dk\/pubdb\/views\/publication_details.php?id=6010\r\n% AFINN>.\r\n% * You can also use it to visualize tweets as a word cloud.\r\n%\r\n% We also want to collect embedded links along the way.\r\n\r\ndelimiters = {' ','$','\/','.','-',':','&','*', ...          % remove those\r\n    '+','=','[',']','?','!','(',')','{','}',',', ...\r\n    '\"','>','_','<',';','%',char(10),char(13)};\r\nAFINN = readtable('AFINN\/AFINN-111.txt', ...                
% load score file\r\n    'Delimiter','\\t','ReadVariableNames',0);\r\nAFINN.Properties.VariableNames = {'Term','Score'};          % add var names\r\nstopwordsURL = 'http:\/\/www.textfixer.com\/resources\/common-english-words.txt';\r\nstopWords = webread(stopwordsURL);                          % read stop words\r\nstopWords = split(string(stopWords),',');                   % split stop words\r\ntokens = cell(fake_news.tweetscnt,1);                       % cell array as accumulator\r\nexpUrls = strings(fake_news.tweetscnt,1);                   % string array as accumulator\r\ndispUrls = strings(fake_news.tweetscnt,1);                  % string array as accumulator\r\nscores = zeros(fake_news.tweetscnt,1);                      % initialize accumulator\r\nfor ii = 1:fake_news.tweetscnt                              % loop over tweets\r\n    tweet = string(fake_news.statuses(ii).status.text);     % get tweet\r\n    s = split(tweet, delimiters)';                          % split tweet by delimiters\r\n    s = lower(s);                                           % use lowercase\r\n    s = regexprep(s, '[0-9]+','');                          % remove numbers\r\n    s = regexprep(s,'(http|https):\/\/[^\\s]*','');            % remove urls\r\n    s = erase(s,'''s');                                     % remove possessive s\r\n    s(s == '') = [];                                        % remove empty strings\r\n    s(ismember(s, stopWords)) = [];                         % remove stop words\r\n    tokens{ii} = s;                                         % add to the accumulator\r\n    scores(ii) = sum(AFINN.Score(ismember(AFINN.Term,s)));  % add to the accumulator\r\n    if ~isempty( ...                                        
% if display_url exists\r\n            fake_news.statuses(ii).status.entities.urls) && ...\r\n            isfield(fake_news.statuses(ii).status.entities.urls,'display_url')\r\n        durl = fake_news.statuses(ii).status.entities.urls.display_url;\r\n        durl = regexp(durl,'^(.*?)\\\/','match','once');      % get its domain name\r\n        dispUrls(ii) = durl(1:end-1);                       % add to dispUrls\r\n        furl = fake_news.statuses(ii).status.entities.urls.expanded_url;\r\n        furl = expandUrl(furl,'RemoveParams',1);            % expand links\r\n        expUrls(ii) = expandUrl(furl,'RemoveParams',1);     % one more time\r\n    end\r\nend\r\n\r\n%% \r\n% Now we can create the document term matrix. We will also do the same\r\n% thing for embedded links. \r\n\r\ndict = unique([tokens{:}]);                                 % unique words\r\ndomains = unique(dispUrls);                                 % unique domains\r\ndomains(domains == '') = [];                                % remove empty string\r\nlinks = unique(expUrls);                                    % unique links\r\nlinks(links == '') = [];                                    % remove empty string\r\nDTM = zeros(fake_news.tweetscnt,length(dict));              % Doc Term Matrix\r\nDDM = zeros(fake_news.tweetscnt,length(domains));           % Doc Domain Matrix\r\nDLM = zeros(fake_news.tweetscnt,length(links));             % Doc Link Matrix\r\nfor ii = 1:fake_news.tweetscnt                              % loop over tokens\r\n    [words,~,idx] = unique(tokens{ii});                     % get unique words\r\n    wcounts = accumarray(idx, 1);                           % get word counts\r\n    cols = ismember(dict, words);                           % find cols for words\r\n    DTM(ii,cols) = wcounts;                                 % update DTM with word counts\r\n    cols = ismember(domains,dispUrls(ii));                  % find col for domain\r\n    DDM(ii,cols) = 1;                              
         % increment DDM\r\n    expanded = expandUrl(expUrls(ii));                      % expand links\r\n    expanded = expandUrl(expanded);                         % one more time\r\n    cols = ismember(links,expanded);                        % find col for link\r\n    DLM(ii,cols) = 1;                                       % increment DLM\r\nend\r\nDTM(:,ismember(dict,{'#','@'})) = [];                       % remove # and @\r\ndict(ismember(dict,{'#','@'})) = [];                        % remove # and @\r\n\r\n%% Sentiment Analysis\r\n% One of the typical analyses you perform on a Twitter feed is\r\n% <https:\/\/en.wikipedia.org\/wiki\/Sentiment_analysis sentiment analysis>.\r\n% The histogram shows, not surprisingly, that those tweets were mostly very\r\n% negative. We can summarize this with the Net Sentiment Rate (NSR): the\r\n% number of non-negative tweets minus the number of negative tweets,\r\n% divided by the total.\r\n\r\nNSR = (sum(scores >= 0) - sum(scores < 0)) \/ length(scores);% net sentiment rate\r\nfigure                                                      % new figure\r\nhistogram(scores,'Normalization','probability')             % score distribution\r\nline([0 0], [0 .35],'Color','r');                           % reference line\r\ntitle(['Sentiment Score Distribution of \"Fake News\" ' ...   % add title\r\n    sprintf('(NSR: %.2f)',NSR)])\r\nxlabel('Sentiment Score')                                   % x-axis label\r\nylabel('% Tweets')                                          % y-axis label\r\nyticklabels(string(0:5:35))                                 % y-axis ticks\r\ntext(-10,.25,'Negative');text(3,.25,'Positive');            % annotate\r\n\r\n%% What Words Appear Frequently in Tweets?\r\n% Now let's plot the word frequency to visualize what was discussed in\r\n% those tweets. 
They seem to be about dominant news headlines at the time\r\n% the tweets were collected.\r\n\r\ncount = sum(DTM);                                           % get word count\r\nlabels = erase(dict(count >= 40),'@');                      % high freq words\r\npos = [find(count >= 40);count(count >= 40)] + 0.1;         % x y positions\r\nfigure                                                      % new figure\r\nscatter(1:length(dict),count)                               % scatter plot\r\ntext(pos(1,1),pos(2,1)+3,cellstr(labels(1)),...             % place labels\r\n    'HorizontalAlignment','center');\r\ntext(pos(1,2),pos(2,2)-2,cellstr(labels(2)),...\r\n    'HorizontalAlignment','right'); \r\ntext(pos(1,3),pos(2,3)-4,cellstr(labels(3)));\r\ntext(pos(1,3:end),pos(2,3:end),cellstr(labels(3:end))); \r\ntitle('Frequent Words in Tweets Mentioning Fake News')      % add title\r\nxlabel('Indices')                                           % x-axis label\r\nylabel(' Count')                                            % y-axis label\r\nylim([0 150])                                               % y-axis range\r\n\r\n%% What Hashtags Appear Frequently in Tweets?\r\n% Hashtags that start with \"#\" are often used to identify the main theme\r\n% of tweets, and we see those related to the dominant news again as you\r\n% would expect.\r\n\r\nis_hash = startsWith(dict,'#') & dict ~= '#';               % get indices\r\nhashes = erase(dict(is_hash),'#');                          % get hashtags\r\nhash_count = count(is_hash);                                % get count\r\nlabels = hashes(hash_count >= 4);                           % high freq tags\r\npos = [find(hash_count >= 4) + 1; ...                       % x y positions\r\n    hash_count(hash_count >= 4) + 0.1];         \r\nfigure                                                      % new figure\r\nscatter(1:length(hashes),hash_count)                        % scatter plot\r\ntext(pos(1,1),pos(2,1)- .5,cellstr(labels(1)),...         
  % place labels\r\n    'HorizontalAlignment','center');\r\ntext(pos(1,2:end-1),pos(2,2:end-1),cellstr(labels(2:end-1)));\r\ntext(pos(1,end),pos(2,end)-.5,cellstr(labels(end)),...\r\n    'HorizontalAlignment','right');\r\ntitle('Frequently Used Hashtags')                           % add title\r\nxlabel('Indices')                                           % x-axis label\r\nylabel('Count')                                             % y-axis label\r\nylim([0 15])                                                % y-axis range\r\n\r\n%% Who Got Frequent Mentions in Tweets?\r\n% Twitter is also a communication medium and people can direct their\r\n% tweets to specific users by including their screen names in the tweets\r\n% starting with \"@\". These are called \"mentions\". We can see there is one\r\n% particular user who got a lot of mentions.\r\n\r\nis_ment = startsWith(dict,'@') & dict ~= '@';               % get indices\r\nmentions = erase(dict(is_ment),'@');                        % get mentions\r\nment_count = count(is_ment);                                % get count\r\nlabels = mentions(ment_count >= 10);                        % high freq mentions\r\npos = [find(ment_count >= 10) + 1; ...                      
% x y positions\r\n    ment_count(ment_count >= 10) + 0.1];     \r\nfigure                                                      % new figure\r\nscatter(1:length(mentions),ment_count)                      % scatter plot\r\ntext(pos(1,:),pos(2,:),cellstr(labels));                    % place labels\r\ntitle('Frequent Mentions')                                  % add title\r\nxlabel('Indices')                                           % x-axis label\r\nylabel('Count')                                             % y-axis label\r\nylim([0 100])                                               % y-axis range\r\n\r\n%% Frequently Cited Web Sites\r\n% You can also embed a link in a tweet, usually for citing sources and\r\n% directing people to get more details from those sources. This tends to\r\n% show where the original information came from.\r\n% \r\n% Twitter was the most frequently cited source. This was interesting to me.\r\n% Usually, if you want to cite other tweets, you retweet them. When you\r\n% retweet, the original user gets a credit. By embedding the link without\r\n% retweeting it, people circumvent this mechanism. Very curious.\r\n\r\ncount = sum(DDM);                                           % get domain count\r\nlabels = domains(count > 5);                                % high freq citations\r\npos = [find(count > 5) + 1;count(count > 5) + 0.1];         % x y positions    \r\nfigure                                                      % new figure\r\nscatter(1:length(domains),count)                            % scatter plot\r\ntext(pos(1,:),pos(2,:),cellstr(labels));                    % place labels\r\ntitle('Frequently Cited Web Sites')                         % add title\r\nxlabel('Indices')                                           % x-axis label\r\nylabel('Count')                                             % y-axis label\r\n\r\n%% Frequently Cited Sources\r\n% You can also see that many of the web sites are for url shortening\r\n% services. 
Let's find out the real urls linked from those short urls. \r\n\r\ncount = sum(DLM);                                           % get domain count\r\nlabels = links(count >= 15);                                % high freq citations\r\npos = [find(count >= 15) + 1;count(count >= 15)];           % x y positions    \r\nfigure                                                      % new figure\r\nscatter(1:length(links),count)                              % scatter plot\r\ntext(ones(size(pos(1,:))),pos(2,:)-2,cellstr(labels));      % place labels\r\ntitle('Frequently Cited Sources')                           % add title\r\nxlabel('Indices')                                           % x-axis label\r\nylabel('Count')                                             % y-axis label\r\n\r\n%% Generating a Social Graph\r\n% Now let's think of a way to see the associations between users and the\r\n% entities included in their tweets to reveal their relationships. We have\r\n% a matrix of words by tweets, and we can convert it into a matrix of users\r\n% vs. entities, such as hashtags, mentions and links.\r\n\r\nusers = arrayfun(@(x) x.status.user.screen_name, ...        
% screen names\r\n    fake_news.statuses, 'UniformOutput', false);\r\nuniq = unique(users);                                       % remove duplicates\r\ncombo = [DTM DLM];                                          % combine matrices\r\nUEM = zeros(length(uniq),size(combo,2));                    % User Entity Matrix\r\nfor ii = 1:length(uniq)                                     % for unique user\r\n    UEM(ii,:) = sum(combo(ismember(users,uniq(ii)),:),1);   % sum cols\r\nend\r\ncols = is_hash | is_ment;                                   % hashtags, mentions\r\ncols = [cols true(1,length(links))];                        % add links\r\nUEM = UEM(:,cols);                                          % select those cols\r\nent = dict(is_hash | is_ment);                              % select entities\r\nent = [ent links'];                                         % add links\r\n\r\n%% Handling Mentions\r\n% Some of the mentions refer to users who authored tweets in our dataset,\r\n% and others do not. When one user mentions another, that forms a\r\n% user-user edge rather than a user-entity edge. 
To map such edges correctly, we want to treat\r\n% mentioned users separately.\r\n\r\nment_users = uniq(ismember(uniq,mentions));                 % mentioned users\r\nis_ment = ismember(ent,'@' + string(ment_users));           % their mentions\r\nent(is_ment) = erase(ent(is_ment),'@');                     % remove @\r\nUUM = zeros(length(uniq));                                  % User User Matrix\r\nfor ii = 1:length(ment_users)                               % for each ment user\r\n    row = string(uniq) == ment_users{ii};                   % get row\r\n    col = ent == ment_users{ii};                            % get col\r\n    UUM(row,ii) = UEM(row,col);                             % copy count\r\nend\r\n\r\n%% Creating the Edge List\r\n% Now we can add the user to user matrix to the existing user to entity\r\n% matrix, but we also need to remove the mentioned users from entities\r\n% since they are already included in the user to user matrix.\r\n%\r\n% All we need to do then is to\r\n% turn that into a <https:\/\/www.mathworks.com\/help\/matlab\/ref\/sparse.html\r\n% sparse> matrix and <https:\/\/www.mathworks.com\/help\/matlab\/ref\/find.html\r\n% find> indices of nonzero elements. We can then use those indices as the\r\n% edge list.\r\n\r\nUEM(:,is_ment) = [];                                        % remove mentioned users\r\nUEM = [UUM, UEM];                                           % add UUM to adj\r\nnodes = [uniq; cellstr(ent(~is_ment))'];                    % create node list\r\ns = sparse(UEM);                                            % sparse matrix\r\n[i,j,s] = find(s);                                          % find indices\r\n\r\n%% Creating the Graph\r\n% Once you have the edge list, it is a piece of cake to make a social graph\r\n% from that. Since our relationships have directions (user --> entity), we\r\n% will create a directed graph with\r\n% |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/digraph.html digraph>|. 
The\r\n% nodes are sized and colored based on the number of incoming edges,\r\n% known as <https:\/\/en.wikipedia.org\/wiki\/Directed_graph\r\n% in-degrees>. As you can see, most tweets are disconnected, but we see\r\n% some large clusters of tweets.\r\n\r\nG = digraph(i,j);                                           % directed graph\r\nG.Nodes.Name = nodes;                                       % add node names\r\nfigure                                                      % new figure\r\ncolormap cool                                               % set color map\r\ndeg = indegree(G);                                          % get indegrees\r\nmarkersize = log(deg + 2) * 2;                              % indeg for marker size\r\nplot(G,'MarkerSize',markersize,'NodeCData',deg)             % plot graph\r\nc = colorbar; c.Label.String = 'In-degrees';                % add colorbar\r\ntitle('Graph of Tweets containing \"Fake News\"')             % add title\r\nxticklabels(''); yticklabels('');                           % hide tick labels\r\n\r\n%% Zooming into the Largest Subgraph\r\n% Let's zoom into the largest subgraph to see the details. This gives a\r\n% much clearer idea about what those tweets were about because you see who\r\n% was mentioned and what sources were cited. You can see a New York Times\r\n% opinion column and an article from Sweden generated a lot of tweets along\r\n% with those who were mentioned in those tweets. 
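\r\n% We find the clusters with |conncomp| using |'Type','weak'|, which\r\n% ignores edge directions when grouping nodes. As a minimal sketch on a\r\n% hypothetical toy digraph (the node names below are made up purely for\r\n% illustration), the edges A->B and C->B place A, B and C in one weak\r\n% component, while D->E forms another:\r\n%\r\n%   toyG = digraph({'A','C','D'},{'B','B','E'});  % toy directed graph\r\n%   toybins = conncomp(toyG,'Type','weak');       % component id per node\r\n%   max(toybins)                                  % 2 weak components\r\n%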
\r\n\r\nbins = conncomp(G,'OutputForm','cell','Type','weak');       % get connected comps\r\nbinsizes = cellfun(@length,bins);                           % get bin sizes\r\n[~,idx] = max(binsizes);                                    % find biggest comp\r\nsubG = subgraph(G,bins{idx});                               % create sub graph\r\nfigure                                                      % new figure\r\ncolormap cool                                               % set color map\r\ndeg = indegree(subG);                                       % get indegrees\r\nmarkersize = log(deg + 2) * 2;                              % indeg for marker size\r\nh = plot(subG,'MarkerSize',markersize,'NodeCData',deg);     % plot graph\r\nc = colorbar; c.Label.String = 'In-degrees';                % add colorbar\r\ntitle('The Largest Subgraph (Close-up)')                    % add title\r\nxticklabels(''); yticklabels('');                           % hide tick labels\r\n[~,rank] = sort(deg,'descend');                             % get ranking\r\ntop15 = subG.Nodes.Name(rank(1:15));                        % get top 15\r\nlabelnode(h,top15,top15);                                   % label nodes\r\naxis([-.5 2.5 -1.6 -0.7]);                                  % define axis limits\r\n\r\n%% Using Twitty\r\n% If you want to analyze Twitter for different topics, you need to collect\r\n% your own tweets. For this analysis I used\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/34837-twitty\r\n% Twitty> by\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/1421693-vladimir-bondarenko\r\n% Vladimir Bondarenko>. It hasn't been updated since July 2013, but it still\r\n% works. Let's go over how you use Twitty. I am assuming that you already\r\n% have <https:\/\/developer.twitter.com\/en\/docs\/basics\/authentication\/guides\/access-tokens your\r\n% developer credentials> and downloaded Twitty into your current folder. 
The\r\n% workspace variable |creds| should contain your credentials in a struct in\r\n% the following format:\r\n\r\ncreds = struct;                                             % example\r\ncreds.ConsumerKey = 'your consumer key';\r\ncreds.ConsumerSecret = 'your consumer secret';\r\ncreds.AccessToken = 'your token';\r\ncreds.AccessTokenSecret = 'your token secret';\r\n\r\n%% \r\n% Twitty by default expects the\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/20565-json-parser\r\n% JSON Parser> by\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/1286842-joel-feenstra\r\n% Joel Feenstra>. However, I would like to use the built-in functions\r\n% introduced in R2016b, |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/jsonencode.html\r\n% jsonencode>| and\r\n% |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/jsondecode.html jsondecode>|,\r\n% instead. To suppress the warning Twitty generates, I will use\r\n% |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/warning.html warning>|.\r\n\r\nwarning('off')                                              % turn off warning\r\naddpath twitty_1.1.1;                                       % add Twitty folder to the path\r\nload creds                                                  % load my real credentials\r\ntw = twitty(creds);                                         % instantiate a Twitty object \r\nwarning('on')                                               % turn on warning\r\n\r\n%% Twitter Search API Example\r\n% Since Twitty returns JSON as plain text if you don't specify the parser,\r\n% you can use |jsondecode| once you get the output from Twitty. The number\r\n% of tweets you can get from the\r\n% <https:\/\/dev.twitter.com\/rest\/public\/search Search API> is limited to 100\r\n% per request. 
If you need more, you usually use the\r\n% <https:\/\/dev.twitter.com\/streaming\/overview Streaming API>.\r\n\r\nkeyword = 'nfl';                                            % keyword to search\r\ntweets = tw.search(keyword,'count',100,'include_entities','true','lang','en');\r\ntweets = jsondecode(tweets);                                % parse JSON\r\ntweet = tweets.statuses{1}.text;                            % index into text\r\ndisp([tweet(1:70) '...'])                                   % show 70 chars\r\n\r\n%% Twitter Trending Topic API Example\r\n% If you want a high-volume topic with thousands of tweets, one way to\r\n% find one is to use\r\n% <https:\/\/dev.twitter.com\/rest\/reference\/get\/trends\/place trending\r\n% topics>. Those topics will give you plenty of tweets to work with.\r\n\r\nus_woeid = 23424977;                                        % US as location\r\nus_trends = tw.trendsPlace(us_woeid);                       % get trending topics\r\nus_trends = jsondecode(us_trends);                          % parse JSON\r\ntrends = arrayfun(@(x) x.name, us_trends.trends, 'UniformOutput',false);\r\ndisp(trends(1:10))\r\n\r\n%% Twitter Streaming API Example\r\n% Once you find a high-volume topic to work with, you can use the Streaming\r\n% API to get tweets that contain it. Twitty stores the retrieved tweets in\r\n% the |'data'| property. What you save is defined in an output function\r\n% like <https:\/\/blogs.mathworks.com\/images\/loren\/2016\/saveTweets.m\r\n% saveTweets.m>. |'S'| in this case will be a character array of\r\n% JSON-formatted text, and we need to use |jsondecode| to convert it into a\r\n% struct since we didn't specify the JSON parser.\r\n\r\ndbtype twitty_1.1.1\/saveTweets.m 17:24\r\n\r\n%% \r\n% Now let's give it a try. By default, Twitty will get 20 batches of 1000\r\n% tweets = 20,000 tweets, but that will take a long time. 
We will just get\r\n% 10 tweets in this example.\r\n\r\nkeyword = 'nfl';                                            % specify keyword\r\ntw.outFcn = @saveTweets;                                    % output function\r\ntw.sampleSize = 10;                                         % default 1000 \r\ntw.batchSize = 1;                                           % default 20 \r\ntw.filterStatuses('track',keyword);                         % Streaming API call\r\nresult = tw.data;                                           % save the data\r\nlength(result.statuses)                                     % number of tweets\r\ntweet = result.statuses(1).status.text;                     % get a tweet\r\ndisp([tweet(1:70) '...'])                                   % show 70 chars\r\n\r\n%% Summary - Visit Andy's Developer Zone for More\r\n% In this post you saw how you can analyze tweets using the more recent\r\n% features in MATLAB, such as the HTTP interface to expand short urls. You\r\n% also got a quick tutorial on how to use Twitty to collect tweets for\r\n% your own purpose.\r\n%\r\n% Twitty covers your basic needs. But you can go beyond Twitty and roll\r\n% your own tool by taking advantage of the new HTTP interface. 
I show you\r\n% how in a\r\n% <https:\/\/blogs.mathworks.com\/developer\/2017\/02\/07\/connect-to-twitter-with-oauth-over-http-interface\/\r\n% second blog post> I wrote for Andy's Developer Zone.\r\n%\r\n% Now that you understand how you can use Twitter to analyze social issues\r\n% like fake news, tell us how you would put it to good use\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=2209#respond here>.\r\n##### SOURCE END ##### 5eba22c4e7a047d68e4b431a2457d392\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2017\/twitter_revisited_08.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>Social media has become an important part of modern life, and Twitter is again a center of focus in recent events. Today's guest blogger, <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/951521\">Toshi Takeuchi<\/a> gives us an update on how you can use MATLAB to analyze a Twitter feed.... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2017\/02\/07\/analyzing-fake-news-with-twitter\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[62,33,61],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2209"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=2209"}],"version-history":[{"count":4,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2209\/revisions"}],"predecessor-version":[{"id":2232,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/2209\/revisions\/2232"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=2209"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=2209"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=2209"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}