It's not an overstatement to say that, at least for me personally, Edward Tufte's book The Visual Display of Quantitative Information was transformative. Reading this book got me and, I feel confident saying, many other data scientists passionate about visualizing data. This is the book that popularized Minard's chart depicting Napoleon's march on Russia, introduced the world to the concepts of chartjunk and the data-ink ratio, and demonstrated many times over the value of telling stories with data (as opposed to merely displaying it).
Tufte's book was also a direct influence on the graphics system of the S language (and its successor, R): S was the first statistical programming language where Tufte's concepts could easily be expressed in small amounts of code (even if his principles weren't fully adhered to in the default settings). So it's great to find Lukasz Piwek's Tufte in R page, where many of the examples from The Visual Display are recreated in base R code (and sometimes using lattice and ggplot2 as well). Here, for example, is Tufte's famous rugplot, where the axis tick marks are replaced by dashes at the data points, giving a sense of the marginal distributions while also marking the data:
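To give a flavor of how little code this takes, here is a minimal base R sketch of a rugplot. It is not Piwek's code: the choice of the built-in mtcars data set, the axis labels, and the tick size are my own assumptions, picked only to illustrate the idea of swapping regular tick marks for one dash per observation.

```r
# A rugplot sketch in base R (illustrative; data set and styling are assumptions)
x <- mtcars$wt   # car weight, in 1000 lbs
y <- mtcars$mpg  # fuel economy, in miles per gallon

# Draw the points only: no axes and no surrounding box
plot(x, y, axes = FALSE, pch = 16, cex = 0.8,
     xlab = "Car weight (1000 lbs)", ylab = "Miles per gallon")

# Keep the numeric axis labels, but drop the axis line and regular tick marks
axis(1, tick = FALSE)
axis(2, tick = FALSE, las = 2)

# Add one short dash per observation along each axis
rug(x, side = 1, ticksize = 0.02)
rug(y, side = 2, ticksize = 0.02)
```

The two rug() calls do the real work: each dash sits at an actual data value, so the margins double as a rough picture of how weight and mileage are distributed.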
And here is Tufte's original sparklines chart: minimal time series presented as small multiples of individual data units (and now all the rage in business intelligence tools and spreadsheets).
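Sparklines are similarly easy to rough out in base R. The sketch below is my own illustration rather than Piwek's code, and it assumes the built-in EuStockMarkets data set as a stand-in for the series; the margins, colors, and labels are arbitrary choices.

```r
# A sparklines sketch in base R (illustrative; data set and styling are assumptions)
series <- as.data.frame(EuStockMarkets)  # four daily stock index series

# Stack one small panel per series, leaving side margins for the labels
old_par <- par(mfrow = c(ncol(series), 1),
               mar = c(0.5, 5, 0.5, 5), oma = c(1, 1, 1, 1))

for (name in names(series)) {
  v <- series[[name]]

  # A bare line: no axes, no box, no axis titles
  plot(v, type = "l", axes = FALSE, xlab = "", ylab = "", lwd = 0.5)

  # Series name on the left, final value on the right
  mtext(name, side = 2, las = 1, line = 1, cex = 0.7)
  mtext(round(v[length(v)]), side = 4, las = 1, line = 1, cex = 0.7)

  # Mark the minimum and maximum of each series
  points(which.min(v), min(v), pch = 19, cex = 0.6, col = "blue")
  points(which.max(v), max(v), pch = 19, cex = 0.6, col = "red")
}

par(old_par)  # restore the previous graphics settings
```

Each panel gets its own vertical scale, which is what makes these small multiples readable as shapes rather than as precise values.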
Each of the examples comes with corresponding R code, usually just a dozen lines or so. Even the document itself is laid out in the style of Tufte's book, with footnotes presented as sidenotes in the margins of the text, right where they're referenced. (RStudio has an R Markdown style for Tufte handouts like this.) Check out all the examples at the link below.
Lukasz Piwek: Tufte in R
re:sparklines
no disrespect to Tufte, Piwek, or thyself, but Shewhart devised the control chart at Bell Labs in the 1920s. just a few miles down/up the road from Princeton.
Posted by: Robert Young | April 29, 2016 at 10:13