R is a functional language, which means that your code often contains a lot of ( parentheses ). And complex code often means nesting those parentheses together, which makes code hard to read and understand. But there's a very handy R package (magrittr, by Stefan Milton Bache) that lets you transform nested function calls into a simple pipeline of operations that's easier to write and understand.
Hadley Wickham's dplyr package benefits from the %>% pipeline operator provided by magrittr. At useR! 2014, Hadley showed an example of a data transformation written with traditional, nested R function calls.
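The original slide isn't reproduced here. The following is a rough sketch of that kind of nested call, assuming the dplyr package is installed. The flights data frame below is a small made-up stand-in (not the data from the talk) with columns date, hour and dep_delay:

```r
library(dplyr)

# Hypothetical stand-in for the flights data: 48 flights over two days,
# departing at hour 7 or 8, with departure delays in minutes.
flights <- data.frame(
  date      = rep(c("2011-01-01", "2011-01-02"), each = 24),
  hour      = rep(c(7, 8), times = 24),
  dep_delay = as.numeric(1:48)
)
flights$dep_delay[c(26, 28)] <- NA  # two missing delays on day 2, hour 8

# Nested calls: the first step (dropping NAs) is buried innermost,
# so the code has to be read from the inside out.
hourly_delay <- filter(
  summarise(
    group_by(
      filter(flights, !is.na(dep_delay)),
      date, hour
    ),
    delay = mean(dep_delay),
    n = n()
  ),
  n > 10
)
hourly_delay
```

With this toy data, day 2, hour 8 drops out of the result because only 10 of its flights have a recorded delay.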
The same code can be rewritten so that, rather than nesting one function call inside the next, data is passed from one function to the next using the %>% operator.
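A sketch of the pipeline form, assuming dplyr is installed (dplyr re-exports %>% from magrittr, so library(dplyr) is enough) and using a small made-up flights data frame rather than the data from the talk:

```r
library(dplyr)  # re-exports %>% from magrittr

# Hypothetical stand-in data: 48 flights over two days, hours 7 and 8,
# with two missing delays on day 2, hour 8.
flights <- data.frame(
  date      = rep(c("2011-01-01", "2011-01-02"), each = 24),
  hour      = rep(c(7, 8), times = 24),
  dep_delay = as.numeric(1:48)
)
flights$dep_delay[c(26, 28)] <- NA

# Pipeline: each step reads top to bottom, in the order it runs.
hourly_delay <- flights %>%
  filter(!is.na(dep_delay)) %>%
  group_by(date, hour) %>%
  summarise(delay = mean(dep_delay), n = n()) %>%
  filter(n > 10)
hourly_delay
```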
You can read this version aloud to get a sense of what it does: the flights data frame is filtered (to remove missing values of the dep_delay variable), grouped by hours within days, the mean delay is calculated within each group, and only the hours with more than 10 flights are kept.
You can use the %>% operator with standard R functions, and even with your own functions. The rules are simple: the object on the left-hand side is passed as the first argument to the function on the right-hand side. So:
- my.data %>% my.function is the same as my.function(my.data)
- my.data %>% my.function(arg=value) is the same as my.function(my.data, arg=value)
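A minimal sketch of those two rules, assuming magrittr is installed; add_offset is a made-up function used only for illustration:

```r
library(magrittr)

# Hypothetical user-defined function; the piped value lands in its first argument.
add_offset <- function(x, offset = 0) {
  x + offset
}

my.data <- c(1, 4, 9)

# my.data %>% sqrt is the same as sqrt(my.data)
piped_sqrt  <- my.data %>% sqrt
nested_sqrt <- sqrt(my.data)

# my.data %>% add_offset(offset = 10) is the same as add_offset(my.data, offset = 10)
piped_offset  <- my.data %>% add_offset(offset = 10)
nested_offset <- add_offset(my.data, offset = 10)

identical(piped_sqrt, nested_sqrt)      # TRUE
identical(piped_offset, nested_offset)  # TRUE
```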
It's even possible to pass the data to an argument other than the first by using a . (dot) placeholder to mark where the object goes; see the magrittr vignette for details.
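As a small sketch of the dot placeholder (assuming magrittr is installed; the data are made up): lm() expects the formula first and the data second, so the dot tells %>% where the data frame should go. When the dot appears as an argument of the right-hand call, the left-hand object is not also inserted as the first argument.

```r
library(magrittr)

# Made-up data for illustration.
df <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))

# Equivalent to lm(y ~ x, data = df)
fit <- df %>% lm(y ~ x, data = .)

# Equivalent to paste("value", c(1, 2, 3))
value_labels <- c(1, 2, 3) %>% paste("value", .)
value_labels  # "value 1" "value 2" "value 3"
```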
This new "pipelining" operation is a really useful addition to the R language, and R developers are starting to use it to make their code simpler to write and maintain. Hadley Wickham's newest R package, tidyr, makes it easy to clean up data sets for analysis by stringing together operations like "gather" and "spread" using the %>% operator.
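A small sketch of that style, assuming tidyr and magrittr are installed. gather() and spread() are the tidyr functions mentioned above; the wide data frame is made up for illustration:

```r
library(tidyr)
library(magrittr)

# Made-up wide data: one row per id, one column per measurement.
wide <- data.frame(id = 1:2, a = c(10, 20), b = c(30, 40))

# gather() stacks columns a and b into key/value pairs (long format) ...
long <- wide %>% gather(key = "variable", value = "value", a, b)

# ... and spread() turns those key/value pairs back into columns.
back <- long %>% spread(key = variable, value = value)

long  # 4 rows: one per (id, variable) pair
back  # same layout as wide
```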
And speaking of pipelining, you may have been wondering where the name "magrittr" comes from. Here's the answer:
The only other question is: will Stefan be making this coffee mug available?
magrittr vignette: Ceci n'est pas un pipe
R isn't a functional programming language. It may have functions but that doesn't make it a functional language, in the technical sense of that term.
Posted by: David Heffernan | July 23, 2014 at 15:34
This is an interesting package. For me (a self-styled intermediate R user without substantial knowledge of other programming languages), however, there would seem to be a large investment necessary to wrap my head around the fact that the order of functions is reversed. In the original code, the steps are written as Filter10, Summarize, Group, FilterNA; in the %>% version we have FilterNA, Group, Summarize, Filter10.
Posted by: Nate | July 24, 2014 at 03:13
@Heffernan, well, R is not purely functional, but mixed-paradigm. It certainly has functional elements; in particular, functions are first-class objects in R.
@Nate I'd say the "regular" code is reversed, as the pipe (%>%) aligns code with the order of execution.
Posted by: Stefan | July 24, 2014 at 09:10
Posted by: P2004r | August 15, 2014 at 07:43
This has the same power as extension methods in C#, allowing a fluent syntax. This style of programming really works in practice. Respect to the authors of this package!
Posted by: Gravitas | August 16, 2014 at 10:36