by Joseph Rickert
We can declare 2015 the year that R went mainstream at the JSM. There is no doubt about it: the calculations, visualizations and deep thinking of a great many of the world's statisticians are rendered or expressed in R, and the JSM is with the program. In 2013 I was happy to have stumbled into a talk where an FDA statistician confirmed that R was indeed a much-used and trusted tool. Last year, while preparing to attend the conference, I was delighted to find a substantial list of R and data science related talks. This year, talks did not merely mention R: they were about R.
The conference began with several R-focused pre-conference tutorials, including Statistical Analysis of Financial Data Using R, The Art and Science of Data Visualization Using R, and Hadley Wickham’s sold-out Advanced R. The Sunday afternoon session on Advances in R Software played to a full room. Highlights of that session included Gabe Becker’s presentation on the switchr package for reproducible research, Mark Seligman’s update on the new work being done on the Arborist implementation of the random forest algorithm, and my colleague Andrie de Vries’ presentation of some work we did on the network structure of R packages. (See yesterday’s post.)
The enthusiasm expressed by the overflowing crowd for Monday’s invited session on Recent Advances in Interactive Graphics for Data Analysis was contagious. Talks revolved around several packages that link R graphics to D3 and JavaScript to provide interactive graphics that are not only visually stunning but also open up new possibilities for exploratory data analysis. Hadley Wickham, the substitute chair for the session, characterized the various approaches to achieving interactive graphics in R with a bit of humor and much insight, which I think brings some clarity to this chaotic whirl of development. Hadley places current efforts to provide interactive R graphics in one of three categories:
- Speaking in tongues: interfacing to low level specialized languages (examples: iplots and rggobi)
- Hacking existing graphics (examples: Animint and using ggplot2 with Shiny)
- Abusing the browser (examples: R/qtlcharts, leaflet and htmlwidgets)
Other highlights of the session included Kenny Shirley’s presentation on interactively visualizing trees with his summarytrees package, which interfaces R to D3; Susan VanderPlas’ presentation of Animint (this package adds interactive aesthetics to ggplot2; here is a nice tutorial); and Karl Broman’s discussion of visualizing high-dimensional genomic data (see qtlcharts and d3examples).
In addition to visualization, education was another thread that stitched together various R-related topics. Waller's talk, Evaluating Data Science Contributions in Teaching and Research, in the invited paper session The Statistics Identity Crisis: Are We Really Data Scientists?, provided some advice on how software developed by academics could be “packaged” to look like the kind of work product traditionally valued for academic advancement. Progress along these lines would go a long way towards helping some of the most productive R contributors achieve career-advancing recognition. There was also considerable discussion about the kind of practical R and data science skills that should supplement the theoretical training of statisticians to help them be effective in academia as well as in industry. For some insight into the relevant issues, have a look at Jennifer Bryan’s slides for her talk Teach Data Science and They Will Come.
The following list contains 21 JSM talks with interesting R package, educational or application content.
- Animint: Interactive Web-Based Animations Using Ggplot2's Grammar of Graphics
  Susan Ruth VanderPlas, Iowa State University; Carson Sievert, Iowa State University; Toby Hocking, McGill University
- Applying the R Language in Streaming and Business Intelligence Applications
  Louis Bajuk, TIBCO Software Inc.
- A Bayesian Test of Independence of Two Categorical Variables with Covariates
  Dilli Bhatta, Truman State University
- Comparison of R and Vowpal Wabbit for Click Prediction in Display Advertising
  Jaimyoung Kwon, AOL Advertising; Bin Ren, AOL Platforms; Rajasekhar Cherukuri, AOL Platforms; Marius Holtan, AOL Platforms
- Demonstration of Statistical Concepts with Animated Graphics and Simulations in R
  Andrej Blejec, National Institute of Biology
- The Dendextend R Package for Manipulation, Visualization, and Comparison of Dendograms
  Tal Galili, Tel Aviv University
- Enhancing Reproducibility and Collaboration via Management of R Package Cohorts
  Gabriel Becker, Genentech Research; Cory Barr, Anticlockwork Arts; Robert Gentleman, Genentech Research; Michael Lawrence, Genentech Research
- GMM Versus GQL Logistic Regression Models for Multi-Level Correlated Data
  Bei Wang, Arizona State University; Jeffrey Wilson, W. P. Carey School of Business/Arizona State University
- Increasing the Accuracy of Gene Expression Classifiers by Incorporating Pathway Information: A Latent Group Selection Approach
  Yaohui Zeng, The University of Iowa; Patrick Breheny, The University of Iowa
- Learning statistics with R, from the Ground Up
  Xiaofei Wang
- Mining an R Bug Database with R
  Stephen Kaluzny, TIBCO Software Inc.
- Multinomial Regression for Correlated Data Using the Bootstrap in R
  Jennifer Thompson, Vanderbilt University; Timothy Girard, Vanderbilt University Medical Center; Pratik Pandharipande, Vanderbilt University Medical Center; E. Wesley Ely, Vanderbilt University Medical Center; Rameela Chandrasekhar, Vanderbilt University
- The Network Structure of R Packages
  Andrie de Vries, Revolution Analytics Limited; Joseph Rickert
- Online PCA in High Dimension: A Comparative Study
  David Degras, DePaul University; Hervé Cardot, Université de Bourgogne
- Perils and Solutions for Comparative Effectiveness Research in Massive Observational Databases
  Marc A. Suchard, UCLA
- R Package PRIMsrc: Bump Hunting by Patient Rule Induction Method for Survival, Regression, and Classification
  Jean-Eudes Dazard, Case Western Reserve University; Michael Choe, Case Western Reserve University; Michael LeBlanc, Fred Hutchinson Cancer Research Center; J. Sunil Rao, University of Miami
- An R Package That Collects and Archives Files and Other Details to Support Reproducible Computing
  Stan Pounds, St. Jude Children's Research Hospital; Zhifa Liu, St. Jude Children's Research Hospital
- SimcAusal R Package: Conducting Transparent and Reproducible Simulation Studies of Causal Effect Estimation with Complex Longitudinal Data
  Oleg Sofrygin, Kaiser Permanente Northern California/UC Berkeley; Mark Johannes van der Laan, UC Berkeley; Romain Neugebauer, Kaiser Permanente Northern California
- Statistical Computation Using Student Collaborative Work
  John D. Emerson, Middlebury College
- Teaching Introductory Regression with R Using Package Regclass
  Adam Petrie
- Using Software to Search for Optimal Cross-Over Designs
  Byron Jones
Will the full presentations be released?
Posted by: Neal C | August 13, 2015 at 10:51
I second Neal's comment/request. I have seen individual speakers release their slides on github and elsewhere, but it sure would be nice if someone brought all those links together.
Posted by: David Kane | August 14, 2015 at 06:24
Thanks for the excellent summary. I wonder if the link for #10 is broken because of the typo (staistics instead of statistics) or do I really need to log in to typepad? :-(
Posted by: Madeline | August 14, 2015 at 19:33