It was a great show at last night's Bay Area R User Group Meeting, with over 100 people in attendance at LinkedIn's fantastic digs down in Mountain View. Clearly there's a lot of interest in Social Network Analysis with R in Silicon Valley, as demonstrated by the attendance of folks from Twitter, Google, and LinkedIn, amongst many others.
Annie Wang started off with a lightning talk on the behavior of players in the massively multiplayer online game Everquest 2. Using R and one week's worth of chat data from the game, Annie and her collaborators investigated whether expert players communicate differently from other players through the chat feature. It turns out that the answer appears to depend on how you define "expert": players who have accumulated many experience points do appear to make more use of chat, but players who are rapidly accumulating experience points do not.
Drew Conway followed up with a reprise of his talk to the New York user group on Social Network Analysis in R. Drew cut his teeth analyzing chatter within terrorist networks for the US government, so his perspective on SNA is firmly grounded in reality, with plenty of practical tips (example: ignore the pendant chains of singleton relationships, and focus on the core). One surprising observation from a self-described Python junkie like Drew is that R (with the igraph package) is astonishingly fast compared to Python for many SNA tasks. Drew reports analyzing networks with 10,000 nodes in a matter of seconds using only a 4GB laptop. He also mentioned that one of the benefits of doing SNA in R is that you have immediate access to all of the "traditional" statistical methods: for example, if you want to do a regression and look at the residuals from the networks, you can use the lm command right in R instead of having to export the data to a different application.
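To make the "ignore the pendant chains" tip concrete, here is a minimal sketch in plain Python (not Drew's code, and not igraph). It assumes the tip can be read as "repeatedly strip nodes that have at most one neighbor", which leaves what graph theory calls the 2-core. The function name `prune_pendants` and the toy edge list are made up for illustration.

```python
from collections import defaultdict


def prune_pendants(edges):
    """Repeatedly remove nodes with at most one neighbor.

    `edges` is an iterable of undirected (u, v) pairs. Returns the sorted
    list of edges that survive, i.e. the edges of the 2-core.
    """
    # Build an undirected adjacency map, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Start with every node that is currently a leaf (or isolated).
    leaves = [n for n, nbrs in adj.items() if len(nbrs) <= 1]
    while leaves:
        n = leaves.pop()
        if n not in adj:  # already removed via another path
            continue
        for m in adj.pop(n):
            adj[m].discard(n)
            # Removing n may turn its neighbor into a new leaf.
            if len(adj[m]) <= 1:
                leaves.append(m)

    return sorted((u, v) for u in adj for v in adj[u] if u < v)


# Toy network: a triangle (a, b, c), a chain hanging off c, and a lone pair.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("d", "e"), ("f", "g")]
core_edges = prune_pendants(toy_edges)
```

On the toy network only the triangle survives: the chain c-d-e and the isolated pair f-g are peeled away one leaf at a time.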
But the real tour de force of Drew's talk was a live demonstration of social network analysis based on Twitter data. Drew asked audience members to Tweet from their smartphones using the hashtag #DrewSNA, and then ran an R script to interface with the Twitter API to download the tweets with that tag and the follow relationships between the audience members who posted them. Here's the chart he produced about 5 seconds after running the script:
Now that's Social Network Analysis on demand! There's me, @revodavid, above and to the left of center. (Despite not being at the meeting, #rstats mainstay @CMastication got in on the action too by retweeting the #DrewSNA tag.) Drew then followed up with several analyses of the network, including identifying the most distantly connected audience members and running a block model analysis of the connected groups.
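Finding the "most distantly connected" members of a network amounts to finding the pair of nodes with the longest shortest path between them. The sketch below is one plain-Python way to do that with breadth-first search. It is not Drew's script: it assumes follow relationships are treated as undirected, and the function names and the toy edge list are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def farthest_pair(edges):
    """Return (distance, u, v) for a most distant connected pair of nodes."""
    # Assumption: edges are undirected for the purpose of measuring distance.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    best = (0, None, None)
    for source in sorted(adj):
        for target, d in bfs_distances(adj, source).items():
            if d > best[0]:  # unreachable pairs never appear in the BFS result
                best = (d, source, target)
    return best


# Toy network: a path a-b-c-d with one extra node e attached to b.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e")]
distance, start, end = farthest_pair(toy_edges)
```

Running one BFS per node is fine for a network the size of a meeting audience; for much larger graphs a dedicated graph library would be the more sensible choice.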
Last night's session really opened my eyes to the power of social analysis in R. I'm sure we'll see much more activity in this area in the near future.
Update: Drew has made the R code he used for the live Twitter analysis available for download.
Update: Annie Wang (whose name I originally misspelled -- sorry) wanted to make sure that her collaborators on the large Everquest 2 project were recognized. They are from Northwestern University, University of Southern California, University of Illinois, and University of Minnesota. Happy to oblige, Annie!
Has anyone out there tried Drew's code, or attempted to get data from Twitter before, and gotten the following error:
Error: Rate limit exceeded. Clients may not make more than 150 requests per hour
If so, is there a way I can tweak his code to stay within this limit? Note that I am a brand new R user (SAS convert!) and probably shouldn't even be attempting this kind of thing, but I couldn't resist.
Posted by: Matt | November 12, 2009 at 18:52
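A limit of 150 requests per hour, as quoted in the error message above, works out to one request every 24 seconds (3600 / 150). One generic way to stay under such a limit is to pause between requests. The sketch below shows the idea in plain Python; it is not a change to Drew's R code, and the `Throttle` class and the commented-out `fetch` call are hypothetical names used only for illustration.

```python
import time


class Throttle:
    """Space calls out evenly so at most `max_requests` happen per window."""

    def __init__(self, max_requests, per_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        # Minimum gap between two consecutive requests, in seconds.
        self.min_interval = per_seconds / max_requests
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        """Block until enough time has passed since the previous request."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


# Usage sketch (fetch is a hypothetical function that makes one API request):
#
#     throttle = Throttle(max_requests=150, per_seconds=3600)
#     for user in users:
#         throttle.wait()
#         fetch(user)
```

The same idea carries over to R by calling `Sys.sleep()` between requests, at the cost of a longer-running script.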