{"id":1297,"date":"2016-02-03T09:30:45","date_gmt":"2016-02-03T14:30:45","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=1297"},"modified":"2016-01-07T09:34:34","modified_gmt":"2016-01-07T14:34:34","slug":"visualizing-facebook-networks-with-matlab","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2016\/02\/03\/visualizing-facebook-networks-with-matlab\/","title":{"rendered":"Visualizing Facebook Networks with MATLAB"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>When one of my guest bloggers, <a href=\"https:\/\/twitter.com\/toshi2fly\">Toshi<\/a> posted, <a href=\"https:\/\/blogs.mathworks.com\/loren\/2015\/09\/30\/can-we-predict-a-breakup-social-network-analysis-with-matlab\/\">Can We Predict a Breakup? Social Network Analysis with MATLAB<\/a>, he got several questions about the new <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/graph-and-network-algorithms.html\">graph and network algorithms<\/a> capabilities in MATLAB introduced in R2015b. 
He would like to do a follow-up post to address some of those questions.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#b04e3f64-0177-4812-a876-71274a9afa93\">Questions from the Readers<\/a><\/li><li><a href=\"#f4e88310-f1ef-4bf7-bff2-2f86c9c278f3\">Plotting Large Scale Network in MATLAB<\/a><\/li><li><a href=\"#2f8e209c-0487-437a-8ce7-4684fa8db77f\">Facebook Ego Network Dataset<\/a><\/li><li><a href=\"#81cca8ce-9b3b-4786-939d-ee725f306054\">Visualize Combined Ego Networks<\/a><\/li><li><a href=\"#75388816-e8ec-4245-a574-23a4df1ad9e0\">Visualize a Single Ego Network - Degree Centrality<\/a><\/li><li><a href=\"#2f6f62fc-a7f7-423f-b5c3-ef6dc3e99ba9\">Ego Network Degree Distribution<\/a><\/li><li><a href=\"#eaaa74cb-dd1a-4408-aacc-2915d1c0c5d3\">Shortest Paths<\/a><\/li><li><a href=\"#816fc598-830d-46ab-9876-25e7d4506401\">Closeness Centrality<\/a><\/li><li><a href=\"#5e185696-152a-4867-bafc-1bed514731a5\">Can You Use Your Own Facebook Data?<\/a><\/li><li><a href=\"#18e9b049-aa7a-427c-9495-b6c9468edcfa\">Summary<\/a><\/li><\/ul><\/div><h4>Questions from the Readers<a name=\"b04e3f64-0177-4812-a876-71274a9afa93\"><\/a><\/h4><p>In <a href=\"https:\/\/blogs.mathworks.com\/loren\/2015\/09\/30\/can-we-predict-a-breakup-social-network-analysis-with-matlab\/#comment-45661\">the comment section<\/a> of my recent post about social network analysis, QC asked if there was any way to plot a very large-scale network (&gt;10,000 nodes) with uniform degree, and <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/4781185-christine-tobler\">Christine Tobler<\/a> kindly provided an example and a link to a dataset collection. I would like to build on her comment in this post.<\/p><h4>Plotting Large Scale Network in MATLAB<a name=\"f4e88310-f1ef-4bf7-bff2-2f86c9c278f3\"><\/a><\/h4><p>Can MATLAB plot a large-scale network with more than 10,000 nodes? 
Let's start by reproducing Christine's example that plots a graph with 10,443 nodes and 20,650 edges, representing an L-shaped grid.<\/p><pre class=\"codeinput\">n = 120;\r\nA = delsq(numgrid(<span class=\"string\">'L'<\/span>,n));\r\nG = graph(A,<span class=\"string\">'OmitSelfLoops'<\/span>);\r\nplot(G)\r\ntitle(<span class=\"string\">'A graph with 10,443 nodes and 20,650 edges'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/facebook_01.png\" alt=\"\"> <h4>Facebook Ego Network Dataset<a name=\"2f8e209c-0487-437a-8ce7-4684fa8db77f\"><\/a><\/h4><p>For datasets, Christine suggested the Stanford Large Network Dataset Collection. I decided to take her up on her suggestion using its <a href=\"https:\/\/snap.stanford.edu\/data\/egonets-Facebook.html\">Facebook dataset<\/a>. This dataset contains anonymized personal networks of connections between friends of survey participants. Such personal networks represent friendships of a focal node, known as the \"ego\" node, and such networks are therefore called \"ego\" networks.<\/p><p>Let's download \"facebook.tar.gz\" and extract its content into a \"facebook\" directory in the current folder. Each file name starts with a node id and ends with a suffix like \".circles\" or \".edges\". Those node ids are the ids of the \"ego\" nodes. We can run <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/loadFBdata.m\">loadFBdata.m<\/a><\/tt> to load data from those files. 
I will just reload the <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/facebook.mat\">preprocessed data<\/a>.<\/p><pre class=\"codeinput\">clearvars                                           <span class=\"comment\">% clear workspace<\/span>\r\n<span class=\"comment\">% loadFBdata                                        % run script<\/span>\r\nload <span class=\"string\">facebook<\/span>                                       <span class=\"comment\">% or load mat file<\/span>\r\nwho\r\n<\/pre><pre class=\"codeoutput\">\r\nYour variables are:\r\n\r\ncircles    egofeat    feat       graphs     \r\nedges      egoids     featnames  \r\n\r\n<\/pre><h4>Visualize Combined Ego Networks<a name=\"81cca8ce-9b3b-4786-939d-ee725f306054\"><\/a><\/h4><p>Let's first combine all 10 ego networks into a <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/graph.html\">graph<\/a><\/tt> and visualize them in a single <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/graph.plot.html\">plot<\/a><\/tt>. This combined network has a little fewer than 4,000 nodes but with over 84,000 edges. 
It also includes some ego nodes, which means those survey participants were not entirely unrelated to one another.<\/p><pre class=\"codeinput\">comb = vertcat(edges{:});                           <span class=\"comment\">% combine edges<\/span>\r\ncomb = sort(comb, 2);                               <span class=\"comment\">% sort edge order<\/span>\r\ncomb = unique(comb,<span class=\"string\">'rows'<\/span>);                         <span class=\"comment\">% remove duplicates<\/span>\r\ncomb = comb + 1;                                    <span class=\"comment\">% convert to 1-indexing<\/span>\r\ncombG = graph(comb(:,1),comb(:,2));                 <span class=\"comment\">% create undirected graph<\/span>\r\nnotConnected = find(degree(combG) == 0);            <span class=\"comment\">% find unconnected nodes<\/span>\r\ncombG = rmnode(combG, notConnected);                <span class=\"comment\">% remove them<\/span>\r\n\r\nedgeC = [.7 .7 .7];                                 <span class=\"comment\">% gray color<\/span>\r\n\r\nfigure\r\nH = plot(combG,<span class=\"string\">'MarkerSize'<\/span>,1,<span class=\"string\">'EdgeColor'<\/span>,edgeC, <span class=\"keyword\">...<\/span><span class=\"comment\">% plot graph<\/span>\r\n    <span class=\"string\">'EdgeAlpha'<\/span>,0.3);\r\ntitle(<span class=\"string\">'Combined Ego Networks'<\/span>)                      <span class=\"comment\">% add title<\/span>\r\ntext(17,13,sprintf(<span class=\"string\">'Total %d nodes'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">            % add node metric<\/span>\r\n    numnodes(combG)))\r\ntext(17,12,sprintf(<span class=\"string\">'Total %d edges'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">            % add edge metric<\/span>\r\n    numedges(combG)))\r\ntext(17,11,<span class=\"string\">'Ego nodes shown in red'<\/span>)                <span class=\"comment\">% add edge metric<\/span>\r\n\r\negos = intersect(egoids + 1, unique(comb));      
   <span class=\"comment\">% find egos in the graph<\/span>\r\nhighlight(H,egos,<span class=\"string\">'NodeColor'<\/span>,<span class=\"string\">'r'<\/span>,<span class=\"string\">'MarkerSize'<\/span>,3)    <span class=\"comment\">% highlight them in red<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/facebook_02.png\" alt=\"\"> <h4>Visualize a Single Ego Network - Degree Centrality<a name=\"75388816-e8ec-4245-a574-23a4df1ad9e0\"><\/a><\/h4><p>One of the most basic analyses you can perform on a network is link analysis. Let's figure out who are the most well connected in this graph. To make it easy to see, we can change the color by number of connections, also known as <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/graph.degree.html\">degree<\/a>, and therefore this is a metric known as degree centrality. The top 3 nodes by degree are highlighted in the plot and they all belong to the same cluster. 
They are very closely connected friends!<\/p><p>Please note that the ego node is not included in this analysis as <a href=\"https:\/\/snap.stanford.edu\/data\/readme-Ego.txt\">readme-Ego.txt<\/a> says:<\/p><pre class=\"language-matlab\">The <span class=\"string\">'ego'<\/span> <span class=\"string\">node<\/span> <span class=\"string\">does<\/span> <span class=\"string\">not<\/span> <span class=\"string\">appear<\/span> <span class=\"string\">(in the edge list)<\/span>, but <span class=\"string\">it<\/span> <span class=\"string\">is<\/span> <span class=\"string\">assumed<\/span>\r\nthat <span class=\"string\">they<\/span> <span class=\"string\">follow<\/span> <span class=\"string\">every<\/span> <span class=\"string\">node<\/span> <span class=\"string\">id<\/span> <span class=\"string\">that<\/span> <span class=\"string\">appears<\/span> <span class=\"string\">in<\/span> <span class=\"string\">this<\/span> <span class=\"string\">file.<\/span>\r\n<\/pre><p>By nature the ego node will always be the top node, so there is no point including it.<\/p><pre class=\"codeinput\">idx = 2;                                            <span class=\"comment\">% pick an ego node<\/span>\r\negonode = num2str(egoids(idx));                     <span class=\"comment\">% ego node name as string<\/span>\r\nG = graphs{idx};                                    <span class=\"comment\">% get its graph<\/span>\r\ndeg = degree(G);                                    <span class=\"comment\">% get node degrees<\/span>\r\nnotConnected = find(deg &lt; 2);                       <span class=\"comment\">% weakly connected nodes<\/span>\r\ndeg(notConnected) = [];                             <span class=\"comment\">% drop them from deg<\/span>\r\nG = rmnode(G, notConnected);                        <span class=\"comment\">% drop them from graph<\/span>\r\n[~, ranking] = sort(deg,<span class=\"string\">'descend'<\/span>);                 <span class=\"comment\">% get ranking by degree<\/span>\r\ntop3 = 
G.Nodes.Name(ranking(1:3));                  <span class=\"comment\">% get top 3 node names<\/span>\r\n\r\nfigure\r\ncolormap <span class=\"string\">cool<\/span>                                       <span class=\"comment\">% set color map<\/span>\r\nH = plot(G,<span class=\"string\">'MarkerSize'<\/span>,log(deg), <span class=\"keyword\">...<\/span><span class=\"comment\">               % node size in log scale<\/span>\r\n    <span class=\"string\">'NodeCData'<\/span>,deg,<span class=\"keyword\">...<\/span><span class=\"comment\">                             % node color by degree<\/span>\r\n    <span class=\"string\">'EdgeColor'<\/span>,edgeC,<span class=\"string\">'EdgeAlpha'<\/span>,0.3);\r\nlabelnode(H,top3,{<span class=\"string\">'#1'<\/span>,<span class=\"string\">'#2'<\/span>,<span class=\"string\">'#3'<\/span>});                 <span class=\"comment\">% label top 3 nodes<\/span>\r\ntitle({sprintf(<span class=\"string\">'Ego Network of Node %d'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">        % add title<\/span>\r\n    egoids(idx)); <span class=\"string\">'colored by Degree Centrality'<\/span>})\r\ntext(-1,-3,[<span class=\"string\">'top 3 nodes: '<\/span>,strjoin(top3)])         <span class=\"comment\">% annotate<\/span>\r\nH = colorbar;                                       <span class=\"comment\">% add colorbar<\/span>\r\nylabel(H, <span class=\"string\">'degrees'<\/span>)                                <span class=\"comment\">% add metric as ylabel<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/facebook_03.png\" alt=\"\"> <h4>Ego Network Degree Distribution<a name=\"2f6f62fc-a7f7-423f-b5c3-ef6dc3e99ba9\"><\/a><\/h4><p>Let's check out the <a title=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/histogram.html (link no longer works)\">histogram<\/a> of degrees for the ego network we just looked at and for the combined ego networks. 
People who are active on Facebook have more edges than those who are not, but only a few people have a high degree while the majority have a low degree; the difference is large, and the distribution looks exponential.<\/p><pre class=\"codeinput\">figure\r\nhistogram(degree(combG))                            <span class=\"comment\">% plot histogram<\/span>\r\nhold <span class=\"string\">on<\/span>                                             <span class=\"comment\">% don't overwrite<\/span>\r\nhistogram(degree(G))                                <span class=\"comment\">% overlay histogram<\/span>\r\nhold <span class=\"string\">off<\/span>                                            <span class=\"comment\">% restore default<\/span>\r\nxlabel(<span class=\"string\">'Degrees'<\/span>)                                   <span class=\"comment\">% add x axis label<\/span>\r\nylabel(<span class=\"string\">'Number of Nodes'<\/span>)                           <span class=\"comment\">% add y axis label<\/span>\r\ntitle(<span class=\"string\">'Degree Distribution'<\/span>)                        <span class=\"comment\">% add title<\/span>\r\nlegend(<span class=\"string\">'The Combined Ego Networks'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">             % add legend<\/span>\r\n    sprintf(<span class=\"string\">'Ego Network of Node %d'<\/span>,egoids(idx)))\r\ntext(150,700,<span class=\"string\">'Median Degrees'<\/span>)                      <span class=\"comment\">% annotate<\/span>\r\ntext(160,650,sprintf(<span class=\"string\">'* Node %d: %d'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">           % annotate<\/span>\r\n    egoids(idx),median(degree(G))));\r\ntext(160,600,sprintf(<span class=\"string\">'* Combo    : %d'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">         % annotate<\/span>\r\n    median(degree(combG))));\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" 
src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/facebook_04.png\" alt=\"\"> <p>You'll notice that the median degrees seem a bit small. That's because each ego network contains only nodes that are connected to its ego node, so edges to people outside the ego network (friends of friends) are not counted. If you count the <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/graph.numnodes.html\">number of nodes<\/a> in each ego network, you get the degree of each ego node, because an ego node is supposed to be connected to all other nodes in its ego network. The median is now 1404. Is this larger or smaller than the number of your Facebook friends?<\/p><pre class=\"codeinput\">deg = cellfun(@(x) numnodes(x), graphs);            <span class=\"comment\">% degrees of all graphs<\/span>\r\nmedian_deg = median(deg)                            <span class=\"comment\">% median degrees<\/span>\r\n<\/pre><pre class=\"codeoutput\">median_deg =\r\n        1404\r\n<\/pre><h4>Shortest Paths<a name=\"eaaa74cb-dd1a-4408-aacc-2915d1c0c5d3\"><\/a><\/h4><p>We looked at degrees as a metric to evaluate nodes, and it makes sense - the more friends a node has, the better connected it is. Another common metric is <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/graph.shortestpath.html\">shortest paths<\/a>. While degrees measure direct connections only, shortest paths count the minimum number of hops you need to make to get from one node to another. 
Let's look at an example of the shortest path between the top node 1888 and another node 483.<\/p><pre class=\"codeinput\">[path, d] = shortestpath(G,top3{1},<span class=\"string\">'483'<\/span>);          <span class=\"comment\">% get shortest path<\/span>\r\n\r\nfigure\r\nH = plot(G,<span class=\"string\">'MarkerSize'<\/span>,1,<span class=\"string\">'EdgeColor'<\/span>,edgeC);       <span class=\"comment\">% plot graph<\/span>\r\nhighlight(H,path,<span class=\"string\">'NodeColor'<\/span>,<span class=\"string\">'r'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">               % highlight path<\/span>\r\n    <span class=\"string\">'MarkerSize'<\/span>,3,<span class=\"string\">'EdgeColor'<\/span>,<span class=\"string\">'r'<\/span>,<span class=\"string\">'LineWidth'<\/span>,2)\r\nlabelnode(H,path, [{<span class=\"string\">'Top node'<\/span>} path(2:end)])       <span class=\"comment\">% label nodes<\/span>\r\ntitle(<span class=\"string\">'Shortest Path between Top Node and Node 483'<\/span>)<span class=\"comment\">% add title<\/span>\r\ntext(1,-3,sprintf(<span class=\"string\">'Distance: %d hops'<\/span>,d))           <span class=\"comment\">% annotate<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/facebook_05.png\" alt=\"\"> <h4>Closeness Centrality<a name=\"816fc598-830d-46ab-9876-25e7d4506401\"><\/a><\/h4><p>Distances measured by shortest paths can be used to compute closeness centrality, as defined in <a href=\"https:\/\/en.wikipedia.org\/wiki\/Centrality\">Wikipedia<\/a>. Let's reload the pre-computed <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/dist.mat\">distances<\/a><\/tt> using the <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/spdist.m\">spdist<\/a><\/tt> function I wrote. 
Those with high closeness scores are the ones you want to start with when you want to spread news through your ego network.<\/p><pre class=\"codeinput\">load <span class=\"string\">dist<\/span>\r\n\r\ncloseness = 1.\/sum(dist);                           <span class=\"comment\">% compute closeness<\/span>\r\n[~, ranking] = sort(closeness, <span class=\"string\">'descend'<\/span>);          <span class=\"comment\">% get ranking by closeness<\/span>\r\ntop3 = G.Nodes.Name(ranking(1:3));                  <span class=\"comment\">% get top 3 node names<\/span>\r\n\r\nfigure\r\ncolormap <span class=\"string\">cool<\/span>                                       <span class=\"comment\">% set color map<\/span>\r\nH = plot(G,<span class=\"string\">'MarkerSize'<\/span>,closeness*10000, <span class=\"keyword\">...<\/span><span class=\"comment\">        % node size by closeness<\/span>\r\n    <span class=\"string\">'NodeCData'<\/span>,closeness,<span class=\"keyword\">...<\/span><span class=\"comment\">                       % node color by closeness<\/span>\r\n    <span class=\"string\">'EdgeColor'<\/span>,edgeC,<span class=\"string\">'EdgeAlpha'<\/span>,0.3);\r\nlabelnode(H,top3,{<span class=\"string\">'#1'<\/span>,<span class=\"string\">'#2'<\/span>,<span class=\"string\">'#3'<\/span>});                 <span class=\"comment\">% label top 3 nodes<\/span>\r\ntitle({sprintf(<span class=\"string\">'Ego Network of Node %d'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">        % add title<\/span>\r\n    egoids(idx)); <span class=\"string\">'colored by Closeness Centrality'<\/span>})\r\ntext(-1,-3,[<span class=\"string\">'top 3 nodes: '<\/span>,strjoin(top3)])         <span class=\"comment\">% annotate<\/span>\r\nH = colorbar;                                       <span class=\"comment\">% add colorbar<\/span>\r\nylabel(H, <span class=\"string\">'closeness'<\/span>)                              <span class=\"comment\">% add metric as 
ylabel<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/facebook_06.png\" alt=\"\"> <h4>Can You Use Your Own Facebook Data?<a name=\"5e185696-152a-4867-bafc-1bed514731a5\"><\/a><\/h4><p>Hopefully this post has provided a sufficient basis for further exploration with the SNAP Collection dataset. You may also want to try this on your own data. Unfortunately, you can't analyze your own Facebook friends graph because Facebook discontinued this API service. You can, however, use apps like <a href=\"https:\/\/apps.facebook.com\/netvizz\/\">Netvizz<\/a> to extract a \"page like network\", which represents Facebook pages connected through likes. Here is the plot that shows the network of Facebook pages connected to the MATLAB page through likes using a <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/fbpagegraph.mat\">pre-computed graph<\/a><\/tt>. Because this is a directed graph, we will use <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/digraph.indegree.html\">in-degree<\/a> as the metric. 
It means we only count when a page is liked by other pages, but not when it likes others.<\/p><pre class=\"codeinput\">load <span class=\"string\">fbpagegraph<\/span>                                    <span class=\"comment\">% reload data<\/span>\r\ndeg = indegree(G);                                  <span class=\"comment\">% get in-degrees<\/span>\r\n[~,ranking] = sort(deg,<span class=\"string\">'descend'<\/span>);                  <span class=\"comment\">% rank by in-degrees<\/span>\r\ntop5 = G.Nodes.Name(ranking(1:5));                  <span class=\"comment\">% get top 5<\/span>\r\n\r\nfigure\r\ncolormap <span class=\"string\">cool<\/span>                                       <span class=\"comment\">% set colormap to cool<\/span>\r\nH = plot(G,<span class=\"string\">'MarkerSize'<\/span>,log(deg+2)*2, <span class=\"keyword\">...<\/span><span class=\"comment\">           % log scale node size by in-degree<\/span>\r\n    <span class=\"string\">'NodeCData'<\/span>,deg, <span class=\"keyword\">...<\/span><span class=\"comment\">                            % color by in-degree<\/span>\r\n    <span class=\"string\">'EdgeColor'<\/span>,edgeC,<span class=\"string\">'EdgeAlpha'<\/span>, 0.3);\r\nlabelnode(H,<span class=\"string\">'MATLAB'<\/span>,<span class=\"string\">'MATLAB'<\/span>)                      <span class=\"comment\">% highlight MATLAB<\/span>\r\nlabelnode(H,top5,{<span class=\"string\">'Make: Magazine'<\/span>,<span class=\"string\">'NOAA'<\/span>,<span class=\"string\">'NWS'<\/span>,<span class=\"string\">'Maker Faire Rome'<\/span>,<span class=\"string\">'Maker Faire'<\/span>})\r\nH = colorbar;                                       <span class=\"comment\">% add colorbar<\/span>\r\nylabel(H, <span class=\"string\">'in-degrees'<\/span>)                             <span class=\"comment\">% add metric<\/span>\r\ntitle(<span class=\"string\">'Facebook Page Like Network Colored by In-Degree'<\/span>)\r\ntext(-2.8,3.5,<span class=\"string\">'a network of pages 
connected through likes (directed)'<\/span>)\r\nann = {lab,top5};                                   <span class=\"comment\">% generate label<\/span>\r\ntext(pos(:,1),pos(:,2),strcat(ann{:}))              <span class=\"comment\">% add annotations<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/facebook_07.png\" alt=\"\"> <h4>Summary<a name=\"18e9b049-aa7a-427c-9495-b6c9468edcfa\"><\/a><\/h4><p>We only scratched the surface of the SNAP Collection: we looked at just one of the 10 Facebook ego networks, and each comes with additional anonymized metadata, such as education and hometown, so you can figure out what binds those close-knit friends by analyzing common attributes. Furthermore, the <a href=\"https:\/\/snap.stanford.edu\/data\/index.html\">SNAP Collection<\/a> also includes datasets from other sources, such as Twitter and Google Plus. You can also use Netvizz to extract data on Facebook pages you liked. Play around with those datasets and <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=1297#respond\">let us know<\/a> what you find!<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_0362d178e33e4befb1a3dc794f4eb778() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='0362d178e33e4befb1a3dc794f4eb778 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 0362d178e33e4befb1a3dc794f4eb778';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g 
instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2016 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_0362d178e33e4befb1a3dc794f4eb778()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2015b<br><\/p><\/div><!--\r\n0362d178e33e4befb1a3dc794f4eb778 ##### SOURCE BEGIN #####\r\n%% Visualizing Facebook Networks with MATLAB\r\n% When one of my guest bloggers, <https:\/\/twitter.com\/toshi2fly Toshi>\r\n% posted,\r\n% <https:\/\/blogs.mathworks.com\/loren\/2015\/09\/30\/can-we-predict-a-breakup-social-network-analysis-with-matlab\/\r\n% Can We Predict a Breakup? Social Network Analysis with MATLAB>, he got\r\n% several questions about the new\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/graph-and-network-algorithms.html\r\n% graph and network algorithms> capabilities in MATLAB introduced in\r\n% R2015b. 
He would like to do a follow-up post to address some of those\r\n% questions.\r\n%\r\n%% Questions from the Readers\r\n% In\r\n% <https:\/\/blogs.mathworks.com\/loren\/2015\/09\/30\/can-we-predict-a-breakup-social-network-analysis-with-matlab\/#comment-45661\r\n% the comment section> of my recent post about social network analysis, QC\r\n% asked if there was any way to plot very large scale network (>10000\r\n% nodes) with uniform degree, and\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/4781185-christine-tobler\r\n% Christine Tobler> kindly provided an example and a link to a dataset\r\n% collection. I would like to build on her comment in this post.\r\n%\r\n%% Plotting Large Scale Network in MATLAB\r\n% Can MATLAB plot large scale network with more than 10,000 nodes? Let's\r\n% start by reproducing Christine's example that plots a graph with 10,443\r\n% nodes and 20,650 edges, representing an L-shaped grid.\r\n\r\nn = 120;\r\nA = delsq(numgrid('L',n));\r\nG = graph(A,'OmitSelfLoops');\r\nplot(G)\r\ntitle('A graph with 10,443 nodes and 20,650 edges')\r\n\r\n%% Facebook Ego Network Dataset\r\n% For datasets, Christine suggested the Stanford Large Network Dataset\r\n% Collection. I decided to take her up on her suggestion using its\r\n% <https:\/\/snap.stanford.edu\/data\/egonets-Facebook.html Facebook dataset>.\r\n% This dataset contains anonymized personal networks of connections between\r\n% friends of survey participants. Such personal networks represent\r\n% friendships of a focal node, known as \"ego\" node, and such networks are\r\n% therefore called \"ego\" networks.\r\n%\r\n% Let's download \"facebook.tar.gz\" and extract its content into a\r\n% \"facebook\" directory in the current folder. Each file starts with a node\r\n% id and ends with suffix like \".circle\", or \".edges\". Those are ids of the\r\n% \"ego\" nodes. 
We can run\r\n% |<https:\/\/blogs.mathworks.com\/images\/loren\/2016\/loadFBdata.m\r\n% loadFBdata.m>|\r\n% to load data from those files. I will just reload the\r\n% <https:\/\/blogs.mathworks.com\/images\/loren\/2016\/facebook.mat preprocessed\r\n% data>.\r\n\r\nclearvars                                           % clear workspace\r\n% loadFBdata                                        % run script\r\nload facebook                                       % or load mat file\r\nwho\r\n\r\n%% Visualize Combined Ego Networks\r\n% Let's first combine all 10 ego networks into a\r\n% |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/graph.html graph>| and visualize\r\n% them in a single\r\n% |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/graph.plot.html plot>|. This\r\n% combined network has a little fewer than 4,000 nodes but with over 84,000\r\n% edges. It also includes some ego nodes, which means those survey\r\n% participants were not entirely unrelated to one another.\r\n\r\ncomb = vertcat(edges{:});                           % combine edges\r\ncomb = sort(comb, 2);                               % sort edge order\r\ncomb = unique(comb,'rows');                         % remove duplicates\r\ncomb = comb + 1;                                    % convert to 1-indexing\r\ncombG = graph(comb(:,1),comb(:,2));                 % create undirected graph\r\nnotConnected = find(degree(combG) == 0);            % find unconnected nodes\r\ncombG = rmnode(combG, notConnected);                % remove them\r\n\r\nedgeC = [.7 .7 .7];                                 % gray color\r\n\r\nfigure\r\nH = plot(combG,'MarkerSize',1,'EdgeColor',edgeC, ...% plot graph\r\n    'EdgeAlpha',0.3); \r\ntitle('Combined Ego Networks')                      % add title\r\ntext(17,13,sprintf('Total %d nodes', ...            % add node metric\r\n    numnodes(combG)))     \r\ntext(17,12,sprintf('Total %d edges', ...            
% add edge metric\r\n    numedges(combG)))\r\ntext(17,11,'Ego nodes shown in red')                % add edge metric\r\n  \r\negos = intersect(egoids + 1, unique(comb));         % find egos in the graph\r\nhighlight(H,egos,'NodeColor','r','MarkerSize',3)    % highlight them in red \r\n\r\n%% Visualize a Single Ego Network - Degree Centrality\r\n% One of the most basic analyses you can perform on a network is link\r\n% analysis. Let's figure out who are the most well connected in this graph.\r\n% To make it easy to see, we can change the color by number of connections,\r\n% also known as <https:\/\/www.mathworks.com\/help\/matlab\/ref\/graph.degree.html\r\n% degree>, and therefore this is a metric known as degree centrality. The\r\n% top 3 nodes by degree are highlighted in the plot and they all belong to\r\n% the same cluster. They are very closely connected friends!\r\n% \r\n% Please note that the ego node is not included in this analysis as\r\n% <https:\/\/snap.stanford.edu\/data\/readme-Ego.txt readme-Ego.txt> says:\r\n% \r\n%   The 'ego' node does not appear (in the edge list), but it is assumed\r\n%   that they follow every node id that appears in this file.\r\n%\r\n% By nature the ego node will always be the top node, so there is no point\r\n% including it.\r\n\r\nidx = 2;                                            % pick an ego node\r\negonode = num2str(egoids(idx));                     % ego node name as string\r\nG = graphs{idx};                                    % get its graph\r\ndeg = degree(G);                                    % get node degrees\r\nnotConnected = find(deg < 2);                       % weakly connected nodes\r\ndeg(notConnected) = [];                             % drop them from deg           \r\nG = rmnode(G, notConnected);                        % drop them from graph\r\n[~, ranking] = sort(deg,'descend');                 % get ranking by degree\r\ntop3 = G.Nodes.Name(ranking(1:3));                  % get top 3 node 
names\r\n\r\nfigure\r\ncolormap cool                                       % set color map\r\nH = plot(G,'MarkerSize',log(deg), ...               % node size in log scale\r\n    'NodeCData',deg,...                             % node color by degree\r\n    'EdgeColor',edgeC,'EdgeAlpha',0.3);\r\nlabelnode(H,top3,{'#1','#2','#3'});                 % label top 3 nodes\r\ntitle({sprintf('Ego Network of Node %d', ...        % add title\r\n    egoids(idx)); 'colored by Degree Centrality'})\r\ntext(-1,-3,['top 3 nodes: ',strjoin(top3)])         % annotate                                    \r\nH = colorbar;                                       % add colorbar\r\nylabel(H, 'degrees')                                % add metric as ylabel\r\n\r\n%% Ego Network Degree Distribution\r\n% Let's compare the\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/histogram.html histogram> of\r\n% degrees for the ego network we just looked at with that of the combined\r\n% ego networks. People who are active on Facebook will have more edges than\r\n% those who are not, but only a few people have a large degree while the\r\n% majority have a small degree. The difference is large, and the\r\n% distribution looks exponential.\r\n\r\nfigure\r\nhistogram(degree(combG))                            % plot histogram\r\nhold on                                             % don't overwrite\r\nhistogram(degree(G))                                % overlay histogram\r\nhold off                                            % restore default\r\nxlabel('Degrees')                                   % add x axis label\r\nylabel('Number of Nodes')                           % add y axis label\r\ntitle('Degree Distribution')                        % add title\r\nlegend('The Combined Ego Networks', ...             % add legend\r\n    sprintf('Ego Network of Node %d',egoids(idx)))\r\ntext(150,700,'Median Degrees')                      % annotate\r\ntext(160,650,sprintf('* Node %d: %d', ...           
% annotate\r\n    egoids(idx),median(degree(G))));\r\ntext(160,600,sprintf('* Combo    : %d', ...         % annotate\r\n    median(degree(combG))));\r\n\r\n%%\r\n% You'll notice that the median degrees seem a bit small. That's because\r\n% each ego network contains only nodes that are connected to its ego node,\r\n% so we don't see other nodes that are not connected to the ego nodes\r\n% (friends of friends). If you count the\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/graph.numnodes.html number of\r\n% nodes> in each ego network, you get the degree of each ego node,\r\n% because ego nodes are supposed to be connected to all other nodes in\r\n% their respective ego networks. The median is now 1404. Is this larger or\r\n% smaller than the number of your Facebook friends?\r\n\r\ndeg = cellfun(@(x) numnodes(x), graphs);            % degrees of all graphs\r\nmedian_deg = median(deg)                            % median degrees\r\n\r\n%% Shortest Paths \r\n% We looked at degrees as a metric to evaluate nodes, and it makes sense -\r\n% the more friends a node has, the better connected it is. Another common\r\n% metric is\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/graph.shortestpath.html\r\n% shortest paths>. While degrees measure direct connections only, shortest\r\n% paths consider the minimum number of hops you need to make to travel from\r\n% one node to another. Let's look at an example of the shortest path\r\n% between the top node 1888 and another node 483.\r\n\r\n[path, d] = shortestpath(G,top3{1},'483');          % get shortest path \r\n\r\nfigure\r\nH = plot(G,'MarkerSize',1,'EdgeColor',edgeC);       % plot graph\r\nhighlight(H,path,'NodeColor','r', ...               
% highlight path\r\n    'MarkerSize',3,'EdgeColor','r','LineWidth',2)\r\nlabelnode(H,path, [{'Top node'} path(2:end)])       % label nodes\r\ntitle('Shortest Path between Top Node and Node 483')% add title\r\ntext(1,-3,sprintf('Distance: %d hops',d))           % annotate\r\n\r\n%% Closeness Centrality\r\n% Distances measured by shortest paths can be used to compute closeness\r\n% centrality, as defined in <https:\/\/en.wikipedia.org\/wiki\/Centrality\r\n% Wikipedia>. Let's reload the pre-computed\r\n% |<https:\/\/blogs.mathworks.com\/images\/loren\/2016\/dist.mat distances>| using\r\n% the |<https:\/\/blogs.mathworks.com\/images\/loren\/2016\/spdist.m spdist>|\r\n% function I wrote. Those with high closeness scores are the ones you want\r\n% to start with when you want to spread news through your ego network.\r\n\r\nload dist\r\n\r\ncloseness = 1.\/sum(dist);                           % compute closeness\r\n[~, ranking] = sort(closeness, 'descend');          % get ranking by closeness\r\ntop3 = G.Nodes.Name(ranking(1:3));                  % get top 3 node names              \r\n\r\nfigure\r\ncolormap cool                                       % set color map\r\nH = plot(G,'MarkerSize',closeness*10000, ...        % node size by closeness\r\n    'NodeCData',closeness,...                       % node color by closeness\r\n    'EdgeColor',edgeC,'EdgeAlpha',0.3);\r\nlabelnode(H,top3,{'#1','#2','#3'});                 % label top 3 nodes\r\ntitle({sprintf('Ego Network of Node %d', ...        % add title\r\n    egoids(idx)); 'colored by Closeness Centrality'})\r\ntext(-1,-3,['top 3 nodes: ',strjoin(top3)])         % annotate                                    \r\nH = colorbar;                                       % add colorbar\r\nylabel(H, 'closeness')                              % add metric as ylabel\r\n\r\n%% Can You Use Your Own Facebook Data?\r\n% Hopefully this post has provided a sufficient basis for further\r\n% exploration with the SNAP Collection dataset. 
You may also want to try\r\n% this on your own data. Unfortunately, you can't analyze your own Facebook\r\n% friends graph because Facebook discontinued this API service. You can,\r\n% however, use apps like <https:\/\/apps.facebook.com\/netvizz\/ Netvizz> to\r\n% extract a \"page like network\", which represents Facebook pages connected\r\n% through likes. Here is the plot that shows the network of Facebook pages\r\n% connected to the MATLAB page through likes using a\r\n% |<https:\/\/blogs.mathworks.com\/images\/loren\/2016\/fbpagegraph.mat\r\n% pre-computed graph>|. Because this is a directed graph, we will use\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/digraph.indegree.html\r\n% in-degree> as the metric. It means we only count when a page is liked by\r\n% other pages, but not when it likes others.\r\n\r\nload fbpagegraph                                    % reload data\r\ndeg = indegree(G);                                  % get in-degrees\r\n[~,ranking] = sort(deg,'descend');                  % rank by in-degrees\r\ntop5 = G.Nodes.Name(ranking(1:5));                  % get top 5\r\n\r\nfigure\r\ncolormap cool                                       % set colormap to cool\r\nH = plot(G,'MarkerSize',log(deg+2)*2, ...           % log scale node size by in-degree\r\n    'NodeCData',deg, ...                            
% color by in-degree\r\n    'EdgeColor',edgeC,'EdgeAlpha', 0.3);\r\nlabelnode(H,'MATLAB','MATLAB')                      % highlight MATLAB\r\nlabelnode(H,top5,{'Make: Magazine','NOAA','NWS','Maker Faire Rome','Maker Faire'})\r\nH = colorbar;                                       % add colorbar\r\nylabel(H, 'in-degrees')                             % add metric\r\ntitle('Facebook Page Like Network Colored by In-Degree')\r\ntext(-2.8,3.5,'a network of pages connected through likes (directed)')\r\nann = {lab,top5};                                   % generate label\r\ntext(pos(:,1),pos(:,2),strcat(ann{:}))              % add annotations\r\n\r\n%% Summary\r\n% We have only scratched the surface of the SNAP Collection - we looked\r\n% closely at just one ego network out of 10 for Facebook. Each comes with\r\n% additional anonymized metadata, such as education and hometown, so you\r\n% can figure out what binds those close-knit friends by analyzing common\r\n% attributes. Furthermore, the\r\n% <https:\/\/snap.stanford.edu\/data\/index.html SNAP Collection> also includes\r\n% datasets from other sources, such as Twitter and Google Plus. You can\r\n% also use Netvizz to extract data on Facebook pages you liked. Play around\r\n% with those datasets and <https:\/\/blogs.mathworks.com\/loren\/?p=1297#respond\r\n% let us know> what you find!\r\n\r\n##### SOURCE END ##### 0362d178e33e4befb1a3dc794f4eb778\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/facebook_07.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>When one of my guest bloggers, <a href=\"https:\/\/twitter.com\/toshi2fly\">Toshi<\/a> posted, <a href=\"https:\/\/blogs.mathworks.com\/loren\/2015\/09\/30\/can-we-predict-a-breakup-social-network-analysis-with-matlab\/\">Can We Predict a Breakup? 
Social Network Analysis with MATLAB<\/a>, he got several questions about the new <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/graph-and-network-algorithms.html\">graph and network algorithms<\/a> capabilities in MATLAB introduced in R2015b. He would like to do a follow-up post to address some of those questions.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2016\/02\/03\/visualizing-facebook-networks-with-matlab\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[63,66,61],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1297"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=1297"}],"version-history":[{"count":2,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1297\/revisions"}],"predecessor-version":[{"id":1300,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1297\/revisions\/1300"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=1297"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=1297"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=1297"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}