I just got back from the EARL conference in Boston, and just as with the London event back in September, there were many excellent presentations from companies using R for real-world applications. A few of the highlights I saw included:
- Verizon, the telecommunications giant, uses R to analyze cybersecurity data for its annual (and well-respected) Data Breach Investigations Report. According to presenter Bob Rudis, Verizon Enterprise Solutions' Security Data Scientist, R is used "soup to nuts" to create the report. Verizon even makes the data and code available for open research via the verisr package on GitHub. You can view Bob's slides here, and learn more in this Computerworld article by Sharon Machlis.
- Pfizer, a leading pharmaceutical company, uses R to review dose-response data from Phase 1 clinical trials, converting a laborious spreadsheet-based process into an automated process for reviewing R-generated charts. Presenter Bill Denney, Director of Clinical Pharmacology at Pfizer, noted that "many R analyses go into FDA submissions" (a point also reinforced by this 2014 presentation from Pfizer colleague Mike Smith).
- Vincent Warmerdam, a data mining scientist from GoDataDriven (an Amsterdam-based consulting company), used R and Spark to analyze the economy of the popular MMO World of Warcraft, and found that 1% of the players control 25% of the in-game economy.
- CARD.com, the custom prepaid credit card company, uses R to drive its advertising strategy on Facebook. Lead R Developer Gergely Daróczi described using the fbRads package to identify target segments and interface with the Facebook API to run campaigns (view the slides here).
- Oliver Keyes from Wikipedia described using the webreadr, urltools, and rgeolocate packages to analyze weblog data from one of the top-10 most trafficked websites in the world.
- The Educational Advisory Board uses R to predict the success of students at educational institutions. Director of Data Science Harlan Harris described a complete analytics workflow based on R and YHat Science Ops to deliver analysis from data scientists throughout the organization.
There were many other great talks, including from the likes of Google, Syngenta and Celgene, that I wasn't able to attend, since there were too many going on in parallel sessions! Kudos to the organizers Mango Solutions for putting together such a great program, and thanks to them also for inviting me to present one of the keynote sessions. My talk was about how open-source software (particularly R) and data science, coupled with the advent of big data platforms and cloud computing, have disrupted the economics of every industry and provided opportunities for brand-new applications that deliver data science as a service. I've included my slides on The Business Economics and Opportunity of Open Source Data Science below: