Join this small group for a step-by-step approach to learn the language R. Each session will be filled with examples and participants are welcome to suggest and present topics. If you have just started with R this is the perfect chance to find out about data analysis with a group of people rather than on your own. The first sessions will be based on tutorials available on r-project.org.
The meetings are limited to a small number of participants, but there are still a couple of spots open for upcoming sessions on Data Manipulation in R on Feb 21 and Feb 27.
It turns out there's another local R user group in Cambridge, UK. It's called CambR, and organizing committee member Laurent Gatto described its history to me in an email:
After meeting repeatedly at several R related conferences (Bioconductor meetings, useR 2011), some R enthusiasts thought Cambridge deserved a local R user group and founded CambR in September 2011. Since then, we gathered 70+ members (and growing) that registered on our google group. In early 2012, we organised our first meeting, aiming at getting to know each other and set future plans. 16 people showed up; feedback of the participants was positive and we decided to organise a second, larger meeting in April/May 2012.
CambR is joining forces with the organizers of the nascent Cambridge RUG to be the "official" local R user group in Cambridge. If you're in the area, be sure to join their Google Group to be informed of upcoming meetings.
It's awesome to see so many local R user groups kicking off in 2011! Yet another is the Austin R User Group in Austin, Texas. They've already held their first informal get-together, and the first formal meeting on February 23 will be devoted to data management techniques in R. Props to Sandy Donlon for organizing the group!
And I'm so pleased to report that a local R user group has started in my old hometown of Adelaide, South Australia. Organized by Jonathan Tuke at the University of Adelaide (my alma mater), the Adelaide R-users Group has already had several successful meetups and their next meeting is on February 28. Australia now has an R user group in every major city except Perth, which I'd venture makes it the country with the most R groups on a per-capita basis. (And yet, ironically, still no groups in New Zealand -- c'mon Kiwis!)
Yet another new local R user group has launched this month, this time in Cambridge, UK. Cambridge RUG was created by data analyst Andrew Caines to promote the use of R in the Cambridge area.
The group aims to encourage people to try the R language, act as an advice centre to help people get where they want to go with R, and host an annual gathering for R-based talks and workshops. The group's just getting started; if you're interested, you can join the group email list to be notified of upcoming events.
Update Feb 10 2012: Cambridge RUG has joined forces with CambR, which is now the official local R user group in Cambridge.
Another new local R user group has just started up, this time in Cleveland, OH. The Cleveland R User Group is the brainchild of R user Nicholas Hermez, and their first meeting on February 22 is a get-together to plan future topics, presenters and venues. If you're in the Cleveland area why not drop by and contribute your ideas?
RBelgium is the latest local R user group to join the R community. Led by R user Jean-Baptiste Poullet, the group will host meetings on the first Friday of each month at the Alfot Hotel in Brussels, as well as weekly coffee get-togethers. The group also provides an on-line discussion forum on statistics and applications with R.
You can find more information and join the group at the Meetup site linked below.
One of Revolution Analytics' main missions is to support and foster the growth of the R community, and in 2011 we sponsored more than 25 local R user groups and meetings with cash donations for venue hire, meetup.com dues, refreshments, and other group needs. We also sent hundreds of "I love R" T-shirts and stickers to group members to share and spread the word about R.
Now, with over 75 local R user groups active worldwide, the Revolution Analytics User Group Sponsorship Program is open for applications for 2012 funding. For established user groups that have already had meetings, the deadline for applications is March 31. (If you're thinking about starting a new local R user group, seed funding for new groups is available through September 30, 2012.) Sponsored groups will receive a cash donation according to the size of the group, plus a box of R-related T-shirts, stickers, and other goodies from Revolution Analytics to distribute to members.
Group leaders can find more information about the sponsorship program, and the application form, at the link below.
Put up a poster that says something like “Data Mining with R” anywhere in the Bay Area and you will surely draw a crowd. But it was still a bit of a surprise that the monthly meeting of the Bay Area R User’s group was so well attended. At one point there were 160 people on the meetup list signed up to attend the event, and 79 people on the waiting list. (BARUG members are either excessively optimistic or they have some good models of the dynamics of waiting lists.)
George Roumeliotis, our host for the evening and Data Scientist at Intuit, began the meeting by welcoming the attendees. The announcements included a request for BARUG members to submit ideas for speakers and topics for 2012 to the BARUG organizers at firstname.lastname@example.org as well as an offer from BARUG sponsor Revolution Analytics to allow BARUG members to test-drive Revolution R Enterprise 5.0 free of charge for 90 days.
Sanjiv Das, a speaker who would make the lead-off spot on anybody’s R lecture team, delivered the first talk, a summary of his 60-page paper on the identification and analysis of Venture Capital communities. In 10 minutes, with skills no doubt honed by lecturing to twitter-tuned students, Professor Das presented simple R code wrapped around an implementation of the walktrap algorithm, showed how to identify the communities, and then glided through the econometric arguments that the communities are influential. (Sanjiv Das's excellent talk from the November meetup, Using R in Academic Finance, is also available online.)
Next up, Anthony Sabbadini, founder of Economic Risk Management, a San Francisco start-up, presented different ways to visualize a company’s supply chain. In addition to highlighting the ease with which R code can be made to work with other systems, Anthony’s mash-up of NOAA weather data with truck and rail shipments showed the kind of aesthetic sensibility that grabs your attention and draws you into the data.
The third speaker was Nicholas Lewin-Koh, a statistician from Genentech and one of the BARUG organizers. In his 10 minutes, Nicholas covered 10 years of the history of optimization algorithms in R with the authority of someone who has been grappling with the nitty-gritty details of optimization challenges in statistical applications for at least that long. The big takeaway for people not working in this area is that R now has a rich variety of easy-to-use optimizers to choose from.
Batting “clean-up”, Giovanni Seni, now a Data Scientist at Intuit, provided an introduction to the cutting-edge work on Rule Ensembles being done in R. Starting with an example of decision trees viewed as conjunctive rule ensembles from the book “Ensemble Methods in Data Mining” that he and John Elder co-authored, Giovanni moved quickly to more complex examples. He showed an example of the kinds of regularized models with mixed linear and non-linear terms that can be fit with the Stanford R package RuleFit, and The Toolkit for Multivariate Data Analysis with ROOT (TMVA). Low key, but eye opening, Giovanni’s presentation provided a window into this research area.
Houtao Deng, also a data scientist at Intuit, followed with an overview of the general problem of feature selection in classification models, the pros and cons of both univariate and multivariate filter methods, and R packages that implement these methods. Pointers to the work of Isabelle Guyon on support vector machines with recursive feature elimination and Ramon Diaz-Uriarte on random forests with recursive feature selection opened up a whole new area for me. I think that anyone trying to sort through the literature in this field will find Houtao’s guidance on which feature selection methods are appropriate for various types of data sets (linearly separable, non-linearly separable, etc.) to be very valuable.
Last up, Thomson Nguyen, Data Scientist from Lookout Mobile Security, delivered an engaging and informative talk on his work with the Heritage Health Care Kaggle competition. Any way you look at them, the raw HHC data sets are pretty ugly, but Thomson’s exposition of the preliminary data preparation and cleaning steps he worked through was so thoughtfully done and rationally laid out that it ought to be a paradigm for how to go about data cleaning. One might make different choices than Thomson on some of the big issues (throw away obviously bad observations or impute), but his overall process and parting advice to seek the help of domain experts in the cleaning process were spot on. Anybody serious about chasing the $3M prize should have a good look at Thomson’s work.
All of the speakers showed remarkable knowledge and discipline in presenting their topics within the very tight time limits, and Houtao Deng and his colleagues at Intuit did a first-class job of providing and preparing the venue.
The Orange County R Users Group is hosting a free webinar presented by Hadley Wickham, author of the ggplot2 graphics package for R. The webinar, "Advanced Visualizations in R with Hadley Wickham", is live from 6PM-7PM Pacific Time tomorrow, December 1. You can register at the LinkedIn event page below, as long as there are spaces left (it's limited to 100).