How do you summarize fashion? For New York Fashion Week, the New York Times used the idea of "Fashion Fingerprints", distilling a designer's collections into small fragments highlighting the palette. Here's what Marc Jacobs' current collection looks like:

Click through for an interactive version where you can explore each design, and scroll down to the bottom where you can see even greater distillation: each designer represented as abstract color blocks, with colors represented from head to toe.

R user Giuseppe Paleologo noted that R is ideally suited to a task like this. In fact, it took him less than 10 minutes to create a prototype function in R, and less than an hour to create a clean working function. The actual code is less than 20 lines, and you can use it to process 2000 images in less than a minute. Here's the result:
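Paleologo's code isn't reproduced here, but the core idea, clustering an image's pixels and keeping the dominant colors, fits in a few lines of base R. This is our own sketch, not his function; it assumes the image has already been read into an n x 3 matrix of RGB values in [0, 1] (e.g. via jpeg::readJPEG, reshaped), simulated below so the snippet stands alone:

```r
# Sketch: distill an image to its k dominant colors with k-means.
# `pixels` stands in for a real image's flattened RGB values.
set.seed(1)
pixels <- matrix(runif(3000), ncol = 3,
                 dimnames = list(NULL, c("r", "g", "b")))

k  <- 5                                   # size of the "fingerprint" palette
km <- kmeans(pixels, centers = k, nstart = 10)
palette <- rgb(km$centers[, 1], km$centers[, 2], km$centers[, 3])

# Draw each color as a swatch sized by how much of the image it covers:
swatch_share <- as.numeric(table(km$cluster)) / nrow(pixels)
barplot(rep(1, k), col = palette, width = swatch_share,
        space = 0, border = NA, axes = FALSE)
```

Run over a batch of collection photos, the per-image palettes stack into exactly the kind of color-block "fingerprint" shown above.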

You can find the R code that created this image, and more information about how it works, at the link below.

gappy3000: R is in fashion (via the author)

The militarization of local police departments here in the US has been much in the news lately, and in June the *New York Times* published an in-depth article on how materiel from wars has ended up in the hands of US counties. Beyond the traditional reporting, it's a fantastic piece of data journalism: the *Times* submitted a freedom-of-information request to the Defense Department for the items, their value, and the dates they were provided to each county, and published the data on GitHub. Here's a small snippet of the data:

The Times also published an interactive map of the data, aggregated by county. What that map doesn't show is the element of time, and the rate at which materials were being supplied from 2006 to 2012. Andrew Cooper, an associate professor at Simon Fraser University, used the Times data and the R programming language to create this animation of the supply of materials throughout the US over the past 8 years:
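Cooper's own script isn't shown here, but the frame-by-frame approach behind such an animation is easy to sketch in base R: draw one plot per year and stitch the PNGs together afterwards (e.g. with ImageMagick or the animation package). Everything below, including the `gear` data frame, is simulated for illustration:

```r
# Sketch: one plot frame per year of cumulative equipment transfers.
set.seed(2014)
gear <- data.frame(longitude = runif(300, -125, -67),   # stand-in locations
                   latitude  = runif(300, 25, 49),
                   year      = sample(2006:2012, 300, replace = TRUE))

for (yr in sort(unique(gear$year))) {
  png(sprintf("frame_%d.png", yr), width = 600, height = 400)
  with(subset(gear, year <= yr),                        # cumulative to date
       plot(longitude, latitude, pch = 16, col = "darkred",
            xlim = c(-125, -67), ylim = c(25, 49),
            main = paste("Transfers through", yr)))
  dev.off()
}
```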

You can find a link to the complete NYT article on the topic below.

New York Times: War Gear Flows to Police Departments

by Matt Sundquist, Plotly Co-founder

It's delightfully smooth to publish R code, plots, and presentations to the web. For example:

- Shiny makes interactive apps from R.
- Pretty R highlights R code for HTML.
- Slidify makes slides from R Markdown.
- Knitr and RPubs let you publish R Markdown docs.
- GitHub and devtools let you quickly release packages and collaborate.

Now, Plotly lets you collaboratively edit and publish interactive ggplot2 graphs using these tools. This post shows how. Find us on GitHub, by email at feedback@plot.ly, and @plotlygraphs. For more on our ggplot2 and R support, see our API docs.

You can copy and paste the code below — highlighted with Pretty R — into your R console to install Plotly and make an interactive, web-based plot. Or sign up and generate your own key to add to the script. You control the privacy of your data and plots, own your work, and public sharing is free and unlimited.

```
install.packages("devtools")  # so we can install from github
library("devtools")
install_github("ropensci/plotly")  # plotly is part of ropensci
library(plotly)
py <- plotly(username="r_user_guide", key="mw5isa4yqp")  # open plotly connection
ggiris <- qplot(Petal.Width, Sepal.Length, data = iris, color = Species)
py$ggplotly(ggiris)  # send to plotly
```

Adding

`py$ggplotly()`

to your ggplot2 plot creates a Plotly graph online, drawn with D3.js, a popular JavaScript visualization library. The plot, data, and code for making the plot in Julia, Python, R, and MATLAB are all online and editable by you and your collaborators. In this case, it's here: https://plot.ly/~r_user_guide/2; if you forked the plot and wanted to tweak and share it, a new version of the plot would be saved into your profile. We can share the URL over email or Twitter, add collaborators, export the image and data, or embed the plot in an iframe in this blog post. Click and drag to zoom or hover to see data.

Our iframe points to https://plot.ly/~r_user_guide/1.embed. For more, here is how to embed plots.

Plotting is interoperable, meaning you can make a plot with ggplot2, add data with Python or our Excel plug-in, and edit the plot with someone on your team who uses MATLAB. Your iframe will always show the most up to date version of your plots. For all plots you can edit, share, and download data and plots from within a web GUI, adding fits, styling, and more. Thus, if your ggplot2 plot doesn't precisely translate through to Plotly, you and your team can use the web app to tweak, edit, and style.

Now let's make a plot in a knitr doc (here's a knitr and RPubs tutorial). First, you'll want to open a new R Markdown doc within RStudio.

You can copy and paste this code into RStudio and press "Knit HTML":

## 1. Putting Plotly Graphs in Knitr

```{r}
library("knitr")
library("devtools")
url <- "https://plot.ly/~MattSundquist/1971"
plotly_iframe <- paste("<center><iframe scrolling='no' seamless='seamless' style='border:none' src='", url, "/800/1200' width='800' height='1200'></iframe></center>", sep = "")
```

`r I(plotly_iframe)`

You'll want to press the "publish" button on the generated RPub preview to push the RPub online with a live graph. A published RPub from the code above is here. You can see how the embedded plot looks in the screenshot below. The RPub shows a Plotly graph of the ggplot2 NBA heatmap from a Learning R post.

Thus, we have three general options to publish interactive plots with your favorite R tools. First, use iframes to embed in RPubs, blogs, and on websites. Or in slides, as seen in Karthik Ram's Slidify presentation from useR 2014.

Second, you can make plots as part of an `.Rmd` document or in IPython Notebooks using R. For a `.Rmd` doc, you specify the

``plotly=TRUE``

chunk option. Here is an example and source to see the process in action. Third, you can publish a plot in an iframe in a Shiny app, defining how users interact with your plot. Here is an example with the same plot, and here's how it looks:


A final note. For any Plotly graph, you can call the figure:

```
py <- plotly("ggplot2examples", "3gazttckd7")  # key and username for your call
figure <- py$get_figure("r_user_guide", 1)  # graph id for plot you want to access
str(figure)
```

or the data:

```
figure$data[]
```

That means you don't have to store data, plots, and code in different places. It's together and editable on Plotly.

by Joseph Rickert

The ancient Egyptians were a people with long memories. The lists of their pharaohs went back thousands of years, and we still have the names and tax assessments for certain persons and institutions from the time of Ramesses II. When Herodotus began writing about Egypt and the Nile (~ 450 BC), the Egyptians, who knew that their prosperity depended on the river’s annual overflow, had been keeping records of the Nile’s high-water mark for more than three millennia. So, it seems reasonable, and maybe even appropriate, that one of the first attempts to understand long memory in time series was motivated by the Nile.

The story of British hydrologist and civil servant H.E. Hurst, who earned the nickname “Abu Nil” (Father of the Nile) for his 62-year career of measuring and studying the river, is now fairly well known. Pondering an 847-year record of Nile overflow data, Hurst noticed that the series was persistent in the sense that heavy flood years tended to be followed by heavier-than-average flood years, while below-average flood years were typically followed by light flood years. Working from an ancient formula on optimal dam design, he devised the equation log(R/S) = K*log(N/2), where R is the range of the time series, S is the standard deviation of year-to-year flood measurements, and N is the number of years in the series. Hurst noticed that the value of K for the Nile series and other series related to climate phenomena tended to be about 0.73, consistently much higher than the 0.5 value that one would expect from independent observations and short autocorrelations.

Today, mostly due to the work of Benoit Mandelbrot, who rediscovered and popularized Hurst’s work in the early 1960s, Hurst’s rescaled range (R/S) analysis and the calculation of the Hurst exponent (Mandelbrot renamed “K” to “H”) mark the demarcation point for the modern study of long-memory time series. To investigate, let’s look at some monthly flow data taken at the Dongola measurement station, just upstream from the high dam at Aswan. (Look here for the data, and here for a map and a nice Python-based analysis that covers some of the same ground as is presented below.) The data consist of average monthly flow measurements from January 1869 to December 1984.

```
head(nile_dat)
  Year  Jan  Feb  Mar  Apr  May  Jun  Jul   Aug   Sep  Oct  Nov  Dec
1 1869   NA   NA   NA   NA   NA   NA 2606  8885 10918 8699   NA   NA
2 1870   NA   NA 1146  682  545  540 3027 10304 10802 8288 5709 3576
3 1871 2606 1992 1485  933  731  760 2475  8960  9953 6571 3522 2419
4 1872 1672 1033  728  605  560  879 3121  8811 10532 7952 4976 3102
5 1873 2187 1851 1235  756  556 1392 2296  7093  8410 5675 3070 2049
6 1874 1340  847  664  516  466  964 3061 10790 11805 8064 4282 2904
```

To get a feel for the data we plot a portion of the time series.

The pattern is very regular and the short term correlations are apparent. The following boxplots show the variation in monthly flow.

Herodotus clearly knew what he was talking about when he wrote (*The Histories*: Book 2, 19):

I was particularly eager to find out from them (the Egyptian priests) why the Nile starts coming down in a flood at the summer solstice and continues flooding for a hundred days, but when the hundred days are over the water starts to recede and decreases in volume, with the result that it remains low for the whole winter, until the summer solstice comes round again.

To construct a long memory time series we aggregate the monthly flows to produce a yearly time series of total flow (dropping the years 1869 and 1870 because of the missing values).
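The aggregation itself is a one-liner once incomplete years are dropped. A sketch, with a stand-in data frame shaped like the `nile_dat` shown earlier (a Year column followed by twelve monthly-flow columns):

```r
# Stand-in for nile_dat: Year plus 12 monthly flows, with 1869 incomplete.
set.seed(3)
nile_dat <- data.frame(Year = 1869:1874,
                       matrix(sample(500:11000, 72, replace = TRUE), nrow = 6))
nile_dat[1, 8:10] <- NA                      # mimic the missing 1869 months

keep <- complete.cases(nile_dat)             # drop years with any NA month
nile.yr.ts <- ts(rowSums(nile_dat[keep, -1]),
                 start = min(nile_dat$Year[keep]))
```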

Plotting the ACF, we see that the autocorrelations persist for nearly 20 years!

So, let's compute the Hurst exponent. For our first try, we use a simple function suggested by an example in Bernhard Pfaff's classic text *Analysis of Integrated and Cointegrated Time Series with R*.
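The function itself isn't reproduced in this post; a minimal reconstruction of such a rescaled-range calculation (our own sketch following the formula log(R/S) = K*log(N/2), not Pfaff's exact code) might look like:

```r
# simpleHurst: naive R/S estimate of the Hurst exponent.
simpleHurst <- function(y) {
  sd.y <- sd(y)                  # S: standard deviation of the series
  y <- y - mean(y)               # work with deviations from the mean
  max.y <- max(cumsum(y))
  min.y <- min(cumsum(y))
  RS <- (max.y - min.y) / sd.y   # R/S: the rescaled range
  log(RS) / log(length(y) / 2)   # K = log(R/S) / log(N/2)
}

# Applied to the yearly Nile series: simpleHurst(nile.yr.ts)
# On white noise, by contrast, the estimate should hover near 0.5:
set.seed(42)
simpleHurst(rnorm(1000))
```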

Bingo! 0.73 - just what we were expecting for a long memory time series. Unfortunately, things are not so simple. The function hurst() from the pracma package, which is a much more careful calculation than simpleHurst(), yields:

```
hurst(nile.yr.ts)
[1] 1.041862
```

This is mildly distressing, since H is supposed to be bounded above by 1. The function hurstexp() from the same package, which is based on Weron's MATLAB code and implements the small-sample correction, seems to solve that problem.

```
> hurstexp(nile.yr.ts)
Corrected R over S Hurst exponent:   1.041862
Theoretical Hurst exponent:          0.5244148
Corrected empirical Hurst exponent:  0.7136607
Empirical Hurst exponent:            0.6975531
```

0.71 is a more reasonable result. However, as a post on the Timely Portfolio blog pointed out a few years ago, computing the Hurst exponent is an estimation problem, not merely a calculation. So, where are the error bars?

I am afraid that confidence intervals and a look at several other methods available in R for estimating the Hurst exponent will have to wait for another time. In the meantime, the following references may be of interest. The first two are light reading from early champions of applying rescaled range analysis and the Hurst exponent to financial time series. The book by Mandelbrot and Hudson is especially interesting for its sketch of the historical background. The last two are relatively early papers on the subject.

- Edgar E. Peters - Fractal Market Analysis: Applying Chaos Theory to Investments and Economics
- Benoit B. Mandelbrot & Richard L Hudson - The (Mis)behavior of Markets
- Davies & Hart - Tests for the Hurst Effect
- Rafal Weron - Estimating long-range dependence: finite sample properties and confidence intervals

My code for this post may be downloaded here: Long_memory_Post.

*The latest in a series by Daniel Hanson*

**Introduction**

Correlations between holdings in a portfolio are of course a key component in financial risk management. Borrowing a tool common in fields such as bioinformatics and genetics, we will look at how to use heat maps in R for visualizing correlations among financial returns, and examine behavior in both a stable and down market.

While base R contains its own heatmap(.) function, the reader will likely find the heatmap.2(.) function in the R package gplots to be a bit more user friendly. A very nicely written companion article entitled A short tutorial for decent heat maps in R (Sebastian Raschka, 2013), which covers more details and features, is available on the web; we will also refer to it in the discussion below.

We will present the topic in the form of an example.

**Sample Data**

As in previous articles, we will make use of R packages Quandl and xts to acquire and manage our market data. Here, in a simple example, we will use returns from the following global equity indices over the period 1998-01-05 to the present, and then examine correlations between them:

S&P 500 (US)

RUSSELL 2000 (US Small Cap)

NIKKEI (Japan)

HANG SENG (Hong Kong)

DAX (Germany)

CAC (France)

KOSPI (Korea)

First, we gather the index values and convert to returns:

```
library(xts)
library(Quandl)

my_start_date <- "1998-01-05"
SP500.Q <- Quandl("YAHOO/INDEX_GSPC", start_date = my_start_date, type = "xts")
RUSS2000.Q <- Quandl("YAHOO/INDEX_RUT", start_date = my_start_date, type = "xts")
NIKKEI.Q <- Quandl("NIKKEI/INDEX", start_date = my_start_date, type = "xts")
HANG_SENG.Q <- Quandl("YAHOO/INDEX_HSI", start_date = my_start_date, type = "xts")
DAX.Q <- Quandl("YAHOO/INDEX_GDAXI", start_date = my_start_date, type = "xts")
CAC.Q <- Quandl("YAHOO/INDEX_FCHI", start_date = my_start_date, type = "xts")
KOSPI.Q <- Quandl("YAHOO/INDEX_KS11", start_date = my_start_date, type = "xts")

# Depending on the index, the final price for each day is either
# "Adjusted Close" or "Close Price". Extract this single column for each:
SP500 <- SP500.Q[, "Adjusted Close"]
RUSS2000 <- RUSS2000.Q[, "Adjusted Close"]
DAX <- DAX.Q[, "Adjusted Close"]
CAC <- CAC.Q[, "Adjusted Close"]
KOSPI <- KOSPI.Q[, "Adjusted Close"]
NIKKEI <- NIKKEI.Q[, "Close Price"]
HANG_SENG <- HANG_SENG.Q[, "Adjusted Close"]

# The xts merge(.) function will only accept two series at a time.
# We can, however, merge multiple columns by downcasting to *zoo* objects.
# Remark: "all = FALSE" uses an inner join to merge the data.
z <- merge(as.zoo(SP500), as.zoo(RUSS2000), as.zoo(DAX), as.zoo(CAC),
           as.zoo(KOSPI), as.zoo(NIKKEI), as.zoo(HANG_SENG), all = FALSE)

# Set the column names; these will be used in the heat maps:
myColnames <- c("SP500","RUSS2000","DAX","CAC","KOSPI","NIKKEI","HANG_SENG")
colnames(z) <- myColnames

# Cast back to an xts object:
mktPrices <- as.xts(z)

# Next, calculate log returns:
mktRtns <- diff(log(mktPrices), lag = 1)
head(mktRtns)
mktRtns <- mktRtns[-1, ]  # Remove resulting NA in the 1st row
```

**Generate Heat Maps**

As noted above, heatmap.2(.) is the function in the gplots package that we will use. For convenience, we’ll wrap this function inside our own generate_heat_map(.) function, as we will call this parameterization several times to compare market conditions.

As for the parameterization, the comments should be self-explanatory, but we’re keeping things simple by eliminating the dendrogram and leaving out the trace lines inside the heat map and the density plot inside the color legend. Note also the setting Rowv = FALSE; this ensures the ordering of the rows and columns remains consistent from plot to plot. We’re also just using the default color settings; for customized colors, see the Raschka tutorial linked above.

```
require(gplots)

generate_heat_map <- function(correlationMatrix, title) {
  heatmap.2(x = correlationMatrix,        # the correlation matrix input
            cellnote = correlationMatrix, # places correlation value in each cell
            main = title,                 # heat map title
            symm = TRUE,                  # configure diagram as standard correlation matrix
            dendrogram = "none",          # do not draw a row dendrogram
            Rowv = FALSE,                 # keep ordering consistent
            trace = "none",               # turns off trace lines inside the heat map
            density.info = "none",        # turns off density plot inside color legend
            notecol = "black")            # set font color of cell labels to black
}
```

Next, let’s calculate three correlation matrices using the data we have obtained:

- Correlations based on the entire data set from 1998-01-05 to the present
- Correlations of market indices during a reasonably calm period -- January through December 2004
- Correlations of falling market indices in the midst of the financial crisis - October 2008 through May 2009
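The subsetting code isn't shown in the post, but with xts objects it amounts to date-range indexing, e.g. `corr2 <- cor(mktRtns["2004"])`. Here is a self-contained base-R sketch of the same three calculations, with simulated returns standing in for mktRtns:

```r
# Simulated daily returns standing in for the mktRtns series built above.
set.seed(5)
dates <- seq(as.Date("1998-01-05"), as.Date("2009-05-29"), by = "day")
mktRtns <- matrix(rnorm(length(dates) * 3, sd = 0.01), ncol = 3,
                  dimnames = list(NULL, c("SP500", "DAX", "NIKKEI")))
in_range <- function(from, to) dates >= as.Date(from) & dates <= as.Date(to)

corr1 <- cor(mktRtns)                                          # full sample
corr2 <- cor(mktRtns[in_range("2004-01-01", "2004-12-31"), ])  # calm market
corr3 <- cor(mktRtns[in_range("2008-10-01", "2009-05-31"), ])  # crisis period
```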

Now, let’s call our heat map function using the total market data set:

generate_heat_map(corr1, "Correlations of World Market Returns, Jan 1998 - Present")

And then, examine the result:

As expected, we trivially have correlations of 100% down the main diagonal. Note that, as shown in the color key, the darker the color, the lower the correlation. Among the parameters of the heatmap.2(.) function, we set the title with the main = title setting and rendered the correlation values in black with the notecol="black" setting.

Next, let’s look at a period of relative calm in the markets, namely the year 2004:

generate_heat_map(corr2, "Correlations of World Market Returns, Jan - Dec 2004")

This gives us:


Note that in this case, a glance at the darker colors in each of the cells shows that the correlations are even lower than those from our entire data set. This may of course be verified by comparing the numerical values.

Finally, let’s look at the opposite extreme, during the upheaval of the financial crisis in 2008-2009:

generate_heat_map(corr3, "Correlations of World Market Returns, Oct 2008 - May 2009")

This yields the following heat map:

Note that in this case, again just at first glance, we can tell the correlations have increased compared to 2004, by the colors changing from dark to light nearly across the board. While there are some correlations that do not increase all that much, such as the SP500/Nikkei and the Russell 2000/Kospi values, there are others across international and capitalization categories that jump quite significantly, such as the SP500/Hang Seng correlation going from about 21% to 41%, and that of the Russell 2000/DAX moving from 43% to over 57%. So, in other words, portfolio diversification can take a hit in down markets.

**Conclusion**

In this example, we only looked at seven market indices, but for a closer look at how correlations were affected during 2008-09 -- and how heat maps among a greater number of market sectors compared -- this article, entitled *Diversification is Broken*, is a recommended and interesting read.

by Joseph Rickert

Last week, I posted a list of sessions at the Joint Statistical Meetings related to R. As it turned out, that list was only the tip of the iceberg. In some areas of statistics, such as graphics, simulation, and computational statistics, the use of R is so prevalent that people working in the field often don't think to mention it. For example, in the session New Approaches to Data Exploration and Discovery, which included the presentation on the Glassbox package that figured in my original list, R was important to the analyses underlying nearly all of the talks in one way or another. The following are synopses of the talks in that session along with some pointers to relevant R resources.

Exploring Huge Collections of Scatterplots

Statistics and visualization legend Leland Wilkinson of Skytree showed off ScagExplorer, a tool he built with Tuan Dang of the University of Illinois at Chicago to explore scagnostics (a contraction of “scatterplot diagnostics” coined by John and Paul Tukey in the 1980s). ScagExplorer makes it possible to look for anomalies and search for similar distributions in huge collections of scatter plots. (The example Leland showed contained 124K plots.) The ideas and many of the visuals for the talk can be found in the paper ScagExplorer: Exploring Scatterplots by Their Scagnostics. ScagExplorer is a Java-based tool, but R users can work with the scagnostics package written by Leland Wilkinson and Anushka Anand in 2007.

Glassbox: An R Package for Visualizing Algorithmic Models:

Google’s Max Ghenis presented work he did with fellow Googlers Ben Ogorek and Estevan Flores. Glassbox is an R application that attempts to provide transparency to “blackbox” algorithmic models such as random forests. Among other things, it calculates and plots the collective importance of groups of variables in such a model. The slides for the presentation are available, as is the package itself. Google is using predictive modeling and tools such as Glassbox to better understand the characteristics of its workforce and to ask important, reflective questions such as “How can we better understand diversity?” The company also does HR modeling to see if what it knows about people can give it a competitive edge in hiring. For example, Google uses data collected from people who have interviewed at the company in the past, but who have not received offers, to try to understand its future hiring needs. The coolest thing about this presentation was that these guys work for the Human Resources Department! If you think that you work for a tech company, go down to HR and see if you can get some help with random forests.

A Web Application for Efficient Analysis of Peptide Libraries

Eric Hare of Iowa State University introduced PeLica, work he did with colleagues Timo Sieber of University Medical Center Hamburg-Eppendorf and Heike Hofmann of Iowa State University. PeLica is an interactive, Shiny application to help assess the statistical properties of peptide libraries. PeLica’s creators refer to it as a Peptide Library Calculator that acts as a front end to the R package peptider which contains functions for evaluating the diversity of peptide libraries. The authors have done an exceptional job of using the documentation features available in Shiny to make their app a teaching tool.

To Merge or Not to Merge: An Interactive Visualization Tool for Local Merges of Mixture Model Components

Elizabeth Lorenzi of Carnegie Mellon showed the prototype for an interactive visualization tool that she is working on with Rebecca Nugent of Carnegie Mellon and Nema Dean of the University of Glasgow. The software calculates inter-component similarities of mixture model component trees and displays them as hierarchical dendrograms. Elizabeth and her colleagues are implementing this tool as an R package.

An Interactive Visualization Platform for Interpreting Topic Models

Carson Sievert of Iowa State University presented LDAvis, a general framework for visualizing topic models that he is building with Kenny Shirley of AT&T Labs. LDAvis is interactive R software that enables users to interpret and compare topics by highlighting keywords. The theory is nicely described in a recent paper, and the examples on Carson’s Github page are instructive and fun to play with. In this plot below, circle 26 representing a topic has been selected. The bar chart on the right displays the 30 most relevant terms for this topic. The red bars represent the frequency of a term in a given topic, (proportional to p(term | topic)), and the gray bars represent a term's frequency across the entire corpus, (proportional to p(term)).

Gravicom: A Web-Based Tool for Community Detection in Networks

Andrea Kaplan showed off an interactive application that she and her Iowa State University team members, Heike Hofmann and Daniel Nordman, are building. Gravicom is an interactive web application based on Shiny and the D3 JavaScript library that lets a user manually collect nodes into clusters in a social network graph and then save this grouping information for subsequent processing. The idea is that eyeballing a large social network and selecting “obvious” groups may be an efficient way to initialize a machine learning algorithm. Have a look at the live demo.

Human Factors Influencing Visual Statistical Inference

Mahbubul Majumder of the University of Nebraska presented joint work done with Heike Hofmann and Dianne Cook, both of Iowa State University, on identifying key factors, such as demographics, experience, training, or even the placement of figures in an array of plots, that may be important for the human analysis of visual data.

R is a very powerful language for creating custom data visualizations, but during the development process sometimes you make a mistake and things go horribly wrong. But sometimes serendipity intervenes, and the (unintended) result can be quite beautiful. Accidental aRt, if you will. Curated by Kara Woo and Erika Mudrak, this fantastic Tumblr captures beautiful but unintended examples from R and other data visualization tools. Here are a couple of recent favourites (click for the originals):

I could definitely see some of these hanging on my wall. (Hmm ... I have an idea for this impressive-looking gadget.) Check out the full gallery at the link below.

Tumblr: Accidental aRt

Statistics has many canonical data sets. For classification, we have Fisher's iris data. For Big Data statistics, the canonical data set used in many examples is the Airlines data. And for dotplots, we have the barley data, first popularized by Bill Cleveland in the landmark 1993 text *Visualizing Data*. Cleveland's innovations in data visualization were hugely influential in the S language and (later) R's lattice and ggplot2 packages, and the panel chart of the barley data shown below is one of the best known.

The chart above shows the yields for several different varieties of barley (Trebi, Glabron and so on) planted at each of six different sites in Minnesota (Duluth, Grand Rapids, etc.) in the years 1931 (pink) and 1932 (blue). The reason this data set has become legendary appears in the "Morris" panel, where unlike all other sites the yields in 1931 exceeded those in 1932 for all barley varieties. This is a great demonstration of the power of dotplots and panel graphics. In his book, Cleveland said that "either an extraordinary natural event, such as disease or a local weather anomaly, produced a strange coincidence, or the years for Morris were inadvertently reversed", and "on the basis of the evidence, the mistake hypothesis would appear to be the more likely."
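The classic chart is straightforward to reproduce, since the 1931-32 barley data ships with R's lattice package. The call below is a sketch of the usual rendering, not necessarily Cleveland's exact specification:

```r
library(lattice)   # the barley data frame is included with lattice

# One panel per site, varieties on the y-axis, years distinguished by color:
dotplot(variety ~ yield | site, data = barley, groups = year,
        layout = c(1, 6), aspect = 0.5,
        auto.key = list(space = "right"),
        xlab = "Barley Yield (bushels/acre)")
```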

But it now looks as though, despite Cleveland's suggestion, the data are correct after all. In a paper in *The American Statistician* published last year, Kevin Wright notes that in that time period the local effects of weather (especially drought), insects, and disease had a greater impact on barley yields than any overall year-to-year effects, and that the results at Morris were not surprising. Kevin offers as evidence extended barley yield data (available in his R package agridat) covering 10 years and 18 varieties. As you can see in the chart below, there is significant variation across years and within sites. Take a look at 1934, for example: a bounty of barley in Duluth, but a meagre crop in St Paul:

So it goes to show that in Cleveland's original example, it wasn't a data error that led to the "unusual" results at the Morris site. Rather, it's an expected consequence of the year-to-year variation of yields in each of the growing sites. But it's no less of an interesting data set to show off the power of dot plots and panel charts — as you can see from several other examples included in Kevin Wright's paper linked below. (With thanks to Kevin for describing this example to me at the useR! 2014 poster session. You can see a version of his poster here.)

American Statistician: Revisiting Immer's Barley Data. The American Statistician, 67(3), 129–133.

by Joseph Rickert

UseR! 2014 got under way this past Monday with a very impressive array of tutorials, delivered on a day when the conference organizers were struggling to cope with a record-breaking crowd. My guess is that conference attendance is somewhere in the 700 range. Moreover, this is the first year that I can remember that tutorials were free. The combination made for jam-packed tutorials.

The first thing that jumps out just by looking at the tutorial schedule is the effort that RStudio made to teach at the conference. Of the sixteen tutorials given, four were presented by the RStudio team: Winston Chang conducted an introduction to interactive graphics with ggvis; Yihui Xie presented Dynamic Documents with R and knitr; Garrett Grolemund taught interactive data display with Shiny and R; and Hadley Wickham taught data manipulation with dplyr. Shiny is still captivating R users. I was particularly struck by a conversation I had with an undergraduate Stats major who seemed to be genuinely pleased and excited about being able to build her own Shiny apps. Kudos to the RStudio team.

Bob Muenchen brought this same kind of energy to his introduction to managing data with R. Bob has extensive experience with SAS and Stata, and he seems to have a gift for anticipating areas where someone proficient in one of these packages, but new to R, might have difficulty.

Matt Dowle presented his tutorial on data.table and, from the chatter I heard, increased his growing list of data.table converts. Dirk Eddelbuettel presented An Example-Driven, Hands-on Introduction to Rcpp, and Romain Francois taught C++ and Rcpp11 for beginners. Both Dirk and Romain work hard to make interfacing to C++ a real option for people who are themselves willing to make an effort.

I came late to Martin Morgan’s tutorial on Bioconductor but couldn’t get in the room. Fortunately, Martin prepared an extensive set of materials for his course which I hope to be able to work through.

Max Kuhn taught an introduction to applied predictive modeling based on the recent book he wrote with Kjell Johnson. Both the slides and code are available. Virgilio Gomez Rubio presented a tutorial on applied spatial data analysis with R; his materials are available here. Ramnath Vaidyanathan presented Interactive Documents with R. Have a look at Ramnath's slides, the map embedded in his abstract, and his screencast.

Drew Schmidt taught a workshop on Programming with Big Data in R, based on the pbdR packages. The course page points to a rich set of introductory resources.

I very much wanted to attend Søren Højsgaard’s tutorial on Graphical Models and Bayesian Networks with R, but couldn’t make it. I did attend John Nash's tutorial on Nonlinear parameter optimization and modeling in R and I am glad that I did. This is a new field for me and I was fortunate to see something of John’s meticulous approach to the subject.

I was disappointed that I didn’t get to attend Thomas Petzoldt’s tutorial on Simulating differential equation models in R. This is an area that is not usually associated with R. The webpage for the tutorial is really worth a look.

I don't know if the conference organizers planned it that way, but as it turned out, the tutorial subjects chosen are an excellent showcase for the depth and diversity of applications that can be approached through R. Many thanks to the tutorial teachers and congratulations to the UseR! 2014 conference organizers for a great start to the week. I hope to have more to say about the conference in future posts.

Hadley Wickham's been working on the next-generation update to ggplot2 for a while, and now it's available on CRAN. The ggvis package is completely new, and combines a chaining syntax reminiscent of dplyr with the grammar of graphics concepts of ggplot2. The resulting charts are web-ready in scalable SVG format, and can easily be made interactive thanks to RStudio's shiny package.

For example, here's the code to create a scatterplot with a smoothing line from the mtcars data set:

```
mtcars %>%
ggvis(~wt, ~mpg) %>%
layer_points() %>%
layer_smooths()
```

And here's the corresponding SVG image:

SVG graphics are great online because they're compact (this one's just 25Kb) and look great at whatever size they're displayed (it's a vector format, so you never get pixellation). The one thing SVGs don't work well for is charts with millions of elements (points, lines, etc.), because then they can be large and slow to render. (The only other downside is that our blogging platform, TypePad, doesn't support SVG with its image tools, so I had to insert an <image> element into the HTML directly.)

You can easily add interactivity to a chart, by specifying parameters as input controls rather than numbers. Here's the code for the same chart, with a slider to specify the smoothing parameter and point size:

```
mtcars %>%
ggvis(~wt, ~mpg) %>%
layer_smooths(span = input_slider(0.5, 1, value = 1)) %>%
layer_points(size := input_slider(100, 1000, value = 100))
```

If you run that code in RStudio you'll get an interactive chart, or go here to see the same interactivity on a web page, rendered with RStudio's Shiny. For more details, check out the ggvis website linked below.

RStudio: ggvis 0.3 overview