{"id":688,"date":"2013-06-10T12:48:46","date_gmt":"2013-06-10T17:48:46","guid":{"rendered":"https:\/\/blogs.mathworks.com\/cleve\/?p=688"},"modified":"2018-07-01T15:06:28","modified_gmt":"2018-07-01T20:06:28","slug":"lake-arrowhead-coauthor-graph","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/cleve\/2013\/06\/10\/lake-arrowhead-coauthor-graph\/","title":{"rendered":"Lake Arrowhead Coauthor Graph"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>Twenty years ago, during the Householder Symposium at the Lake Arrowhead conference center , John Gilbert carried out one of the world's first computational social network analyses.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#9f7001e9-5ac7-4ea5-b959-7877b19e22ac\">Householder XII<\/a><\/li><li><a href=\"#1cc01ee1-1d78-45f9-b93e-c829caa37255\">John Gilbert Remembers<\/a><\/li><li><a href=\"#71394432-39d6-4743-8887-1652bb678be0\">Coauthor Matrix<\/a><\/li><li><a href=\"#795aad32-9a4f-4652-98c6-9036487d5300\">Most Prolific<\/a><\/li><li><a href=\"#20fe731d-494e-40bd-bb5e-2996b68ed295\">Reorder the Authors<\/a><\/li><li><a href=\"#9a7d7140-bf22-471a-97da-3fdeed08f7f7\">Reverse Cuthill McKee<\/a><\/li><li><a href=\"#95747658-b5ad-4892-8675-c59edbb8f193\">Coauthor Graph<\/a><\/li><li><a href=\"#6d8175a8-b5fd-40f0-bdf9-30e37b577c00\">My Connectivity<\/a><\/li><li><a href=\"#452e0423-b532-4021-a9fa-5af47cc8de03\">Arrowhead<\/a><\/li><\/ul><\/div><h4>Householder XII<a name=\"9f7001e9-5ac7-4ea5-b959-7877b19e22ac\"><\/a><\/h4><p>Twenty years ago, in June, 1993, the UCLA Lake Arrowhead Conference Center in the San Bernardino Mountains, 90 miles east of Los Angeles, was the site of the Householder XII Symposium on Numerical Linear Algebra, organized by Gene Golub and Tony Chan.<\/p><h4>John Gilbert Remembers<a name=\"1cc01ee1-1d78-45f9-b93e-c829caa37255\"><\/a><\/h4><p>John Gilbert is now a Professor at U. C. Santa Barbara. A couple of years before the Lake Arrowhead meeting, in 1990 and 1991, John, Rob Schreiber, and I had developed the initial implementation of sparse matrices in MATLAB.  John's research activities now include sparse matrix techniques for graph and social network problems.  He recently wrote to me in an email, \"It was at Householder 1993 in Lake Arrowhead that we used MATLAB's sparse matrices to do the first computational social network analysis I ever saw.\"<\/p><p>John used to be at PARC, the Xerox Palo Alto Research Center.  A file containing his work from 1993 was archived at PARC, but is no longer available,(ftp:\/\/ftp.parc.xerox.com\/pub\/gilbert\/graph.html). Most of this blog post was derived from that archived work.<\/p><h4>Coauthor Matrix<a name=\"71394432-39d6-4743-8887-1652bb678be0\"><\/a><\/h4><p>Nick Trefethen posted a flip chart with Gene Golub's name in the center. He invited everyone present to add their name to the chart and draw lines connecting their name with the names of all their coauthors.  The diagram grew denser throughout the week.  At the end it was a graph with 104 vertices (or people) and 211 edges. We entered the names and coauthor connections into MATLAB, creating an adjacency matrix <tt>A<\/tt>.  Let's retrieve the names and matrix as they are stored in the PARC archive.<\/p><pre class=\"codeinput\">load <span class=\"string\">housegraph<\/span>\r\nsize(A)\r\n<\/pre><pre class=\"codeoutput\">ans =\r\n   104   104\r\n<\/pre><p>The matrix <tt>A<\/tt> is 104-by-104 and symmetric. Elements <tt>A(i,j)<\/tt> and <tt>A(j,i)<\/tt> are both equal to one if <tt>i<\/tt> and <tt>j<\/tt> are the indices of coauthors and equal to zero otherwise.  Most of the elements are zero because most pairs of attendees are not coauthors. So the matrix is sparse.  In fact, the sparsity, the fraction of nonzero elements, is less that five percent.<\/p><pre class=\"codeinput\">format <span class=\"string\">short<\/span>\r\nsparsity = nnz(A)\/prod(size(A))\r\n<\/pre><pre class=\"codeoutput\">sparsity =\r\n    0.0486\r\n<\/pre><p>Here is a picture of the matrix, a <tt>spy<\/tt> plot that shows the location of the nonzeros, with Golub in the first column and the other authors in the fairly arbitrary order in which they appeared on Trefethen's flip chart.<\/p><pre class=\"codeinput\">spy(A);\r\ntitle(<span class=\"string\">'Coauthor matrix'<\/span>);\r\nsnapnow\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/cleve\/arrowhead_01.png\" alt=\"\"> <h4>Most Prolific<a name=\"795aad32-9a4f-4652-98c6-9036487d5300\"><\/a><\/h4><p>As the adjacency matrix is loaded from the archive, the most prolific coauthor is already in the first column. It was not a surprise to those of us at the conference to find that the most prolific coauthor is Gene Golub.<\/p><pre class=\"codeinput\">m = find(sum(A) == max(sum(A)))\r\nname(m,:)\r\n<\/pre><pre class=\"codeoutput\">m =\r\n     1\r\nans =\r\nGolub               \r\n<\/pre><h4>Reorder the Authors<a name=\"20fe731d-494e-40bd-bb5e-2996b68ed295\"><\/a><\/h4><p>We want to create a circular plot with Golub in the center, the other authors around the circumference, and edges connecting the coauthors. If we were to place the authors around the circumference in the order we retrieve them from the chart, the edges would cross the circle pretty much randomly.  We want to rearrange the authors so that the coauthor connections are as close as possible to the circumference of the circle. This corresponds to a symmetric permutation of the matrix that minimizes, or at least reduces, its bandwidth.<\/p><h4>Reverse Cuthill McKee<a name=\"9a7d7140-bf22-471a-97da-3fdeed08f7f7\"><\/a><\/h4><p>The Reverse Cuthill McKee algorithm is the only algorithm I know that is usually used backwards.  In 1969 Elizabeth Cuthill and J. McKee described a heuristic for reordering the rows and columns of a matrix to reduce its bandwidth.  In 1991 J. Alan George pointed out that reversing the Cuthill McKee ordering almost always leads to fewer arithmetic operations in Gaussian elimination.  The algorithm applies to symmetric matrices, so MATLAB has a <tt>symrcm<\/tt> function, but no <tt>symcm<\/tt> function. Here we are reducing bandwidth; we are not doing any arithmetic, so the difference between Cuthill-McKee and Reverse-Cuthill-McKee is purely cosmetic.<\/p><h4>Coauthor Graph<a name=\"95747658-b5ad-4892-8675-c59edbb8f193\"><\/a><\/h4><pre class=\"codeinput\">r = symrcm(A(2:end,2:end));\r\nprcm = [1 r+1];\r\nspy(A(prcm,prcm))\r\ntitle(<span class=\"string\">'Coauthor matrix with reduced bandwidth'<\/span>);\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/cleve\/arrowhead_02.png\" alt=\"\"> <p>Now we are were able to actually plot the coauthor graph.  Fortunately, plotting text at arbitrary angles had just been provided in MATLAB.<\/p><pre class=\"codeinput\">drawit;\r\nsnapnow\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/cleve\/arrowhead_03.png\" alt=\"\"> <h4>My Connectivity<a name=\"6d8175a8-b5fd-40f0-bdf9-30e37b577c00\"><\/a><\/h4><p>You will find my name at about 5 o'clock on the coauthor graph.  There is no edge connecting me to Golub because Gene and I were (not yet) coauthors. But how many coauthors do we have in common? Taking the second power of the connectivity matrix gives the paths of length two.<\/p><pre class=\"codeinput\">A2 = A^2;\r\ncount = A2(Golub,Moler)\r\n<\/pre><pre class=\"codeoutput\">count =\r\n   (1,1)        2\r\n<\/pre><p>And who are those common coauthors?<\/p><pre class=\"codeinput\">twos = name( find ( A(:,Golub) .* A(:,Moler) ), : )\r\n<\/pre><pre class=\"codeoutput\">twos =\r\nWilkinson           \r\nVanLoan             \r\n<\/pre><h4>Arrowhead<a name=\"452e0423-b532-4021-a9fa-5af47cc8de03\"><\/a><\/h4><p>Finally, we find a minimum degree reordering of the matrix and discover its true nature as, of course, ....<\/p><pre class=\"codeinput\">pmd = amd(A);\r\nspy(A(pmd,pmd))\r\ntitle(<span class=\"string\">'The (Lake) Arrowhead Matrix'<\/span>);\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/cleve\/arrowhead_04.png\" alt=\"\"> <script language=\"JavaScript\"> <!-- \r\n    function grabCode_ed73abde325d4daaabfc5c9e97cf02a4() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='ed73abde325d4daaabfc5c9e97cf02a4 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' ed73abde325d4daaabfc5c9e97cf02a4';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2013 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_ed73abde325d4daaabfc5c9e97cf02a4()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2013b<br><\/p><p class=\"footer\"><br>\r\n      Published with MATLAB&reg; R2013b<br><\/p><\/div><!--\r\ned73abde325d4daaabfc5c9e97cf02a4 ##### SOURCE BEGIN #####\r\n%% Lake Arrowhead Coauthor Graph \r\n% Twenty years ago, during the Householder Symposium at the Lake Arrowhead\r\n% conference center , John Gilbert carried out one of the world's first\r\n% computational social network analyses. \r\n\r\n%% Householder XII\r\n% Twenty years ago, in June, 1993, the UCLA Lake Arrowhead Conference Center\r\n% in the San Bernardino Mountains, 90 miles east of Los Angeles, was the site\r\n% of the Householder XII Symposium on Numerical Linear Algebra, organized by\r\n% Gene Golub and Tony Chan.\r\n\r\n%% John Gilbert Remembers\r\n% John Gilbert is now a Professor at U. C. Santa Barbara.\r\n% A couple of years before the Lake Arrowhead meeting, in 1990 and 1991,\r\n% John, Rob Schreiber, and I had developed the initial implementation of\r\n% sparse matrices in MATLAB.  John's research activities now include sparse\r\n% matrix techniques for graph and social network problems.  He recently wrote\r\n% to me in an email, \"It was at Householder 1993 in Lake Arrowhead that we\r\n% used MATLAB's sparse matrices to do the first computational social network\r\n% analysis I ever saw.\"\r\n%\r\n% John used to be at PARC, the Xerox Palo Alto Research Center.  A file\r\n% containing his work from 1993 is still archived at PARC,\r\n% <ftp:\/\/ftp.parc.xerox.com\/pub\/gilbert\/graph.html\r\n% ftp:\/\/ftp.parc.xerox.com\/pub\/gilbert\/graph.html>.\r\n% Most of this blog post is derived from that archived work.\r\n\r\n%% Coauthor Matrix\r\n% Nick Trefethen posted a flip chart with Gene Golub's name in the center.\r\n% He invited everyone present to add their name to the chart and draw lines\r\n% connecting their name with the names of all their coauthors.  The diagram\r\n% grew denser throughout the week.  At the end it was a graph with 104 vertices\r\n% (or people) and 211 edges. \r\n% We entered the names and coauthor connections into MATLAB, creating an\r\n% adjacency matrix |A|.  Let's retrieve the names and matrix as they are\r\n% stored in the PARC archive.\r\n \r\nload housegraph\r\nsize(A)\r\n\r\n%%\r\n% The matrix |A| is 104-by-104 and symmetric.\r\n% Elements |A(i,j)| and |A(j,i)| are both equal to one if |i| and |j|\r\n% are the indices of coauthors and equal to zero otherwise.  Most of the\r\n% elements are zero because most pairs of attendees are not coauthors.\r\n% So the matrix is sparse.  In fact, the sparsity, the fraction of nonzero\r\n% elements, is less that five percent.\r\n\r\nformat short\r\nsparsity = nnz(A)\/prod(size(A))\r\n\r\n%%\r\n% Here is a picture of the matrix, a |spy| plot that shows the location of\r\n% the nonzeros, with Golub in the first column and the other authors\r\n% in the fairly arbitrary order in which they appeared on Trefethen's\r\n% flip chart.\r\n\r\nspy(A);\r\ntitle('Coauthor matrix');\r\nsnapnow\r\n\r\n%% Most Prolific\r\n% As the adjacency matrix is loaded from the archive, the most prolific\r\n% coauthor is already in the first column.\r\n% It was not a surprise to those of us at the conference to find that\r\n% the most prolific coauthor is Gene Golub.\r\n\r\nm = find(sum(A) == max(sum(A)))\r\nname(m,:)\r\n\r\n%% Reorder the Authors\r\n% We want to create a circular plot with Golub in the center, the\r\n% other authors around the circumference, and edges connecting the coauthors.\r\n% If we were to place the authors around the circumference in the order we\r\n% retrieve them from the chart, the edges would cross the circle pretty\r\n% much randomly.  We want to rearrange the authors so that the coauthor\r\n% connections are as close as possible to the circumference of the circle.\r\n% This corresponds to a symmetric permutation of the matrix that minimizes,\r\n% or at least reduces, its bandwidth.\r\n\r\n%% Reverse Cuthill McKee\r\n% The Reverse Cuthill McKee algorithm is the only algorithm I know\r\n% that is usually used backwards.  In 1969 Elizabeth Cuthill and J. McKee\r\n% described a heuristic for reordering the rows and columns\r\n% of a matrix to reduce its bandwidth.  In 1991 J. Alan George pointed out\r\n% that reversing the Cuthill McKee ordering almost always leads to fewer\r\n% arithmetic operations in Gaussian elimination.  The algorithm applies\r\n% to symmetric matrices, so MATLAB has a |symrcm| function, but no |symcm|\r\n% function.\r\n% Here we are reducing bandwidth; we are not doing any arithmetic,\r\n% so the difference between Cuthill-McKee and Reverse-Cuthill-McKee\r\n% is purely cosmetic.\r\n\r\n%% Coauthor Graph\r\n\r\nr = symrcm(A(2:end,2:end));\r\nprcm = [1 r+1];\r\nspy(A(prcm,prcm))\r\ntitle('Coauthor matrix with reduced bandwidth');\r\n\r\n%%\r\n% Now we are were able to actually plot the coauthor graph.  Fortunately,\r\n% plotting text at arbitrary angles had just been provided in MATLAB.\r\n\r\ndrawit;\r\nsnapnow\r\n\r\n%% My Connectivity\r\n% You will find my name at about 5 o'clock on the coauthor graph.  There is\r\n% no edge connecting me to Golub because Gene and I were (not yet) coauthors.\r\n% But how many coauthors do we have in common?\r\n% Taking the second power of the connectivity matrix gives the paths\r\n% of length two.\r\n\r\nA2 = A^2;\r\ncount = A2(Golub,Moler)\r\n\r\n%%\r\n% And who are those common coauthors?\r\n\r\ntwos = name( find ( A(:,Golub) .* A(:,Moler) ), : )\r\n\r\n%% Arrowhead\r\n% Finally, we find a minimum degree reordering of the matrix\r\n% and discover its true nature as, of course, ....\r\n\r\npmd = amd(A);\r\nspy(A(pmd,pmd))\r\ntitle('The (Lake) Arrowhead Matrix');\r\n\r\n##### SOURCE END ##### ed73abde325d4daaabfc5c9e97cf02a4\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/cleve\/arrowhead_01.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>Twenty years ago, during the Householder Symposium at the Lake Arrowhead conference center , John Gilbert carried out one of the world's first computational social network analyses.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/cleve\/2013\/06\/10\/lake-arrowhead-coauthor-graph\/\">read more >><\/a><\/p>","protected":false},"author":78,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[4],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/688"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/users\/78"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/comments?post=688"}],"version-history":[{"count":11,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/688\/revisions"}],"predecessor-version":[{"id":3552,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/posts\/688\/revisions\/3552"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/media?parent=688"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/categories?post=688"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/cleve\/wp-json\/wp\/v2\/tags?post=688"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}