
October 25, 2016

Comments


Thanks for the helpful tips.

The one suggestion I have concerns the distinction between the piping operator of magrittr and dplyr, "%>%", and the layer addition operator of ggplot2, "+". Given their different semantics, I've taken to placing the ggplot() call and its added layers inside an expression block of its own within the flow of the "pipe". For example:

ds %>%
  group_by(location) %>%
  mutate(rainfall=cumsum(risk_mm)) %>%
  {
    ggplot(., aes(date, rainfall)) +
      geom_line() +
      facet_wrap(~location) +
      theme(axis.text.x=element_text(angle=90))
  }

Note that this requires explicit specification of the data as the first argument to ggplot() using the special ".".

Also, the pipe can simply continue with processing of the plot object after the expression block: add another "%>%" followed by whatever function call is required, or by another expression block that adds more layers.

(The indentation above follows your suggested style. To reproduce it, paste the example into RStudio's editor, select it, and type Ctrl-I.)
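
As a rough sketch of continuing the pipe past the plot block, the following assumes dplyr and ggplot2 are installed. The small ds data frame and the plot title are made up for illustration; the real data set is not shown in this thread.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `ds`; only the column names come from the example.
ds <- data.frame(
  location = rep(c("Canberra", "Sydney"), each = 5),
  date     = rep(as.Date("2016-10-01") + 0:4, times = 2),
  risk_mm  = c(0, 1.2, 0, 3.4, 0.6, 2.0, 0, 0.4, 5.1, 0)
)

p <- ds %>%
  group_by(location) %>%
  mutate(rainfall = cumsum(risk_mm)) %>%
  {
    # Inside the braces "+" builds the plot; "." is the piped data frame.
    ggplot(., aes(date, rainfall)) +
      geom_line() +
      facet_wrap(~location)
  } %>%
  {
    # A second block: "." is now the plot object, so more layers can be added.
    . + labs(title = "Cumulative rainfall by location")
  }

print(p)
```

Because magrittr does not insert the left-hand side as a first argument inside a braced block, the "." has to be written out in both blocks.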

Great stuff.

Although my background is largely as a SAS programmer in a variety of environments, I'm struck by the closeness of our evolutionary paths. For example, the use of ## prefixes to clearly indicate the "run-order" of a series of related modules, including the use of 00 to indicate the setup section, is exactly what I do.

One thing you don't explicitly say is that the use of single-character or cryptically abbreviated names is very bad practice, unless the context is one where the norms are well established (such as x, y as positional parms for a function). Every object should have a name that is self-evident to the next person who inherits the code. That person could be you, the original author, three years later. Self-documentation is the easiest and best approach.

Hi Michael,

Nice idea to include the ggplot() and layers within an expression block. When properly formatted (as you note) the plot story stands out nicely as a separate stanza amongst the narrative rather than otherwise getting lost amongst the trees!

Thanks for the suggestion.

Hi Doug,

Thanks for the feedback.

The ##_stepname.R concept has certainly been around for a long time. It is also useful in scripts and configuration files, and over the years it has been used to order system initialisations in Linux/Unix and no doubt elsewhere.
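
One way the numbering can pay off is sketched below: a small helper that sources every two-digit-prefixed script in a directory in run order. The function name run_in_order and the example file names are hypothetical, not something from the post.

```r
# Hypothetical helper: run 00_setup.R, 10_clean.R, 20_model.R, ... in order.
run_in_order <- function(dir = ".") {
  # Keep only files named like "NN_stepname.R"; other files are ignored.
  scripts <- sort(list.files(dir, pattern = "^[0-9]{2}_.*\\.R$",
                             full.names = TRUE))
  for (script in scripts) {
    message("Running ", basename(script))
    source(script)  # evaluated in the global environment by default
  }
  invisible(scripts)
}
```

Calling run_in_order("analysis") would then run the setup script first, simply because 00 sorts ahead of every other prefix.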

Good point about short variable names. I agree completely that each object should be named in a self-explanatory way, though without telling the whole story in the name itself (i.e., avoid overly long variable names).
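
To make the contrast concrete, here is a small illustration using base R only. The data frame and all the names in it are invented for the example.

```r
# Hypothetical data, reusing the column names from the plotting example.
ds <- data.frame(
  location = rep(c("Canberra", "Sydney"), each = 5),
  risk_mm  = c(0, 1.2, 0, 3.4, 0.6, 2.0, 0, 0.4, 5.1, 0)
)

# Cryptic: the reader has to work out what d2 holds.
d2 <- aggregate(risk_mm ~ location, data = ds, FUN = sum)

# Self-explanatory, without telling the whole story in the name.
rainfall_by_location <- aggregate(risk_mm ~ location, data = ds, FUN = sum)
```

Both objects hold the same totals; only the second tells the next reader what they are.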
