Twenty years ago, during the Householder Symposium at the Lake Arrowhead conference center , John Gilbert carried out one of the world's first computational social network analyses.
Twenty years ago, in June, 1993, the UCLA Lake Arrowhead Conference Center in the San Bernardino Mountains, 90 miles east of Los Angeles, was the site of the Householder XII Symposium on Numerical Linear Algebra, organized by Gene Golub and Tony Chan.
John Gilbert is now a Professor at U. C. Santa Barbara. A couple of years before the Lake Arrowhead meeting, in 1990 and 1991, John, Rob Schreiber, and I had developed the initial implementation of sparse matrices in MATLAB. John's research activities now include sparse matrix techniques for graph and social network problems. He recently wrote to me in an email, "It was at Householder 1993 in Lake Arrowhead that we used MATLAB's sparse matrices to do the first computational social network analysis I ever saw."
John used to be at PARC, the Xerox Palo Alto Research Center. A file containing his work from 1993 is still archived at PARC, <ftp://ftp.parc.xerox.com/pub/gilbert/graph.html>. Most of this blog post is derived from that archived work.
Nick Trefethen posted a flip chart with Gene Golub's name in the center. He invited everyone present to add their name to the chart and draw lines connecting their name with the names of all their coauthors. The diagram grew denser throughout the week. At the end it was a graph with 104 vertices (or people) and 211 edges. We entered the names and coauthor connections into MATLAB, creating an adjacency matrix A. Let's retrieve the names and matrix as they are stored in the PARC archive.
load housegraph size(A)
ans = 104 104
The matrix A is 104-by-104 and symmetric. Elements A(i,j) and A(j,i) are both equal to one if i and j are the indices of coauthors and equal to zero otherwise. Most of the elements are zero because most pairs of attendees are not coauthors. So the matrix is sparse. In fact, the sparsity, the fraction of nonzero elements, is less that five percent.
format short sparsity = nnz(A)/prod(size(A))
sparsity = 0.0486
Here is a picture of the matrix, a spy plot that shows the location of the nonzeros, with Golub in the first column and the other authors in the fairly arbitrary order in which they appeared on Trefethen's flip chart.
spy(A); title('Coauthor matrix'); snapnow
As the adjacency matrix is loaded from the archive, the most prolific coauthor is already in the first column. It was not a surprise to those of us at the conference to find that the most prolific coauthor is Gene Golub.
m = find(sum(A) == max(sum(A))) name(m,:)
m = 1 ans = Golub
We want to create a circular plot with Golub in the center, the other authors around the circumference, and edges connecting the coauthors. If we were to place the authors around the circumference in the order we retrieve them from the chart, the edges would cross the circle pretty much randomly. We want to rearrange the authors so that the coauthor connections are as close as possible to the circumference of the circle. This corresponds to a symmetric permutation of the matrix that minimizes, or at least reduces, its bandwidth.
The Reverse Cuthill McKee algorithm is the only algorithm I know that is usually used backwards. In 1969 Elizabeth Cuthill and J. McKee described a heuristic for reordering the rows and columns of a matrix to reduce its bandwidth. In 1991 J. Alan George pointed out that reversing the Cuthill McKee ordering almost always leads to fewer arithmetic operations in Gaussian elimination. The algorithm applies to symmetric matrices, so MATLAB has a symrcm function, but no symcm function. Here we are reducing bandwidth; we are not doing any arithmetic, so the difference between Cuthill-McKee and Reverse-Cuthill-McKee is purely cosmetic.
r = symrcm(A(2:end,2:end)); prcm = [1 r+1]; spy(A(prcm,prcm)) title('Coauthor matrix with reduced bandwidth');
Now we are were able to actually plot the coauthor graph. Fortunately, plotting text at arbitrary angles had just been provided in MATLAB.
You will find my name at about 5 o'clock on the coauthor graph. There is no edge connecting me to Golub because Gene and I were (not yet) coauthors. But how many coauthors do we have in common? Taking the second power of the connectivity matrix gives the paths of length two.
A2 = A^2; count = A2(Golub,Moler)
count = (1,1) 2
And who are those common coauthors?
twos = name( find ( A(:,Golub) .* A(:,Moler) ), : )
twos = Wilkinson VanLoan
Finally, we find a minimum degree reordering of the matrix and discover its true nature as, of course, ....
pmd = amd(A); spy(A(pmd,pmd)) title('The (Lake) Arrowhead Matrix');
Get the MATLAB code
Published with MATLAB® R2013b
7 CommentsOldest to Newest
Along the same line, this just happened to be posted today.
“Using Metadata to find Paul Revere” by Kieran Healy
Cleve, you might be interested in taking a look at a current question on StackOverflow, regarding a type of plot very similar to the circular graph you use here:
It would be great to see the graph updated to use the great visualization code that one of the answerers has developed – it really shows how much MATLAB graphics have improved since 93 – especially since one of the answerers has applied HG2.
Thanks for the pointer, Sam. It’s great to see the interaction between this blog and StackOverflow. I agree that the new plots are gorgeous. But we can’t make one with an adjacency matrix; the edges aren’t weighted, as they are in a correlation matrix.
The Lake Arrowhead graph was the first time I’d seen computation on matrices and graphs used to analyze a social network, but the sociologists have been thinking about such things since the 1960s and 70s. An interesting read is W. W. Zachary’s 1977 paper on what’s become colloquially known as “Zachary’s Karate club”.
The junior analytical scribe’s post is dated almost halfway today and Dürer’s Melencoliah 🙂 It’s nice the way he just uses organization membership to get the Bigge Data, and how centrality highlights the key terr.., sorry, players. In the capacitated model for the karate club, the bottleneck nodes where used to (correctly :-|) predict the minimum cut.
I like the ending comment in runme.m (and the comment for symrcm, soothing for an untaught in linear algebra like me… thousands of rungs). I didn’t see it in the first run, until I noticed you changed p = symmmd(A) to amd(A). The earliest date the WayBack Machine could capture MATLAB manuals was Feb 29th 2000. The documentation for symmmd mentions the reference connecting you three (or e), and “Dedicated to Gene Golub on the occasion of his 60th birthday” (or 15). You also wrote the Dedication.
And I’ve finally found out who’s Pete in “Pete wanted it that way”! Re-reading your corner on Prof.SVD I cannot explain why I didn’t notice it then… well, maybe this time I was focused on identifying attendants to HHXII’93. So the junior scribe’s Nat.Phil. friends were right all the way since 1772 up to 1965-70! Now I see the expm manual has one of the length-2 paths (the dubious one) between you and Prof.SVD, and also understand one of the puzzle-poems, mentioned in the second page here. Now I see the second photo here and understand a little bit more what I’m looking at. I read the caption and remember the dialup connection and wireless card. Looking in this interview for the first coauthor mentioned in the Dedication, I’ve found: I got a call one day from Forsythe. He said, “Don’t you read your mail?”, and also: I was invited at that early age. I guess Varga was one of the organizers, so he was very nice to bring me in., which also makes me remember about if Moler could not come in a previous post.
I’ve had to force myself to stop following links. It becomes addictive! Thanks for a very inspiring article.
I ran a whole bunch of link predictions metrics on this network. I’d like to score them by how many links they predicted (including Golub and Moler!) I’ve made a quick list of Golub’s new links on my blog but we need the rest of the network too!
So let’s build the updated Lake Arrowhead matrix for 2013! Please post new links as a comment to this thread or as a comment to my blog post or email a CV to Cleve or myself!
David, I’m curious what the graphs would look like if you modified SHOWPREDS to color/widen the prediction lines by their respective prediction strength?