REvolution Computing has just released three new packages for R to CRAN (under the open-source Apache 2.0 license): foreach, iterators, and doMC. Together, they provide a simple, scalable parallel computing framework for R that lets you take advantage of your multicore or multiprocessor workstation to program loops that run faster than traditional loops in R.
The three packages build on each other to implement a new loop construct for R -- foreach -- where iterations may run in parallel on separate cores or processors.
iterators implements the iterator object familiar to users of languages like Java, C# and Python to make it easy to program useful sequences - from all the prime numbers to the columns of a matrix or the rows of an external database. Iterator objects are used as the index variable for the parallel loops.
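As a minimal sketch of what working with an iterator looks like (the variable names here are purely illustrative, and this assumes the "iterators" package is installed), iter() wraps an object, nextElem() pulls one value at a time, and icount() counts upward without building the whole index vector first:

```r
library(iterators)

# An iterator over a plain vector: each nextElem() call yields the next value.
it <- iter(c(2, 3, 5, 7))
first  <- nextElem(it)
second <- nextElem(it)

# An iterator over the columns of a matrix (by = "column").
m <- matrix(1:6, nrow = 2)
col_it <- iter(m, by = "column")
col1 <- nextElem(col_it)   # holds the first column of m

# icount(n) counts from 1 to n lazily, one value per nextElem() call.
counter <- icount(3)
counts <- c(nextElem(counter), nextElem(counter), nextElem(counter))
```

Once an iterator is exhausted, a further nextElem() call signals an error rather than returning a value, so code that drives an iterator by hand should be prepared to catch it.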
foreach builds on the "iterators" package to introduce a new way of programming loops in R. Unlike the traditional for loop, a foreach loop can run multiple iterations simultaneously, in parallel. If you only have a single-processor machine, foreach runs iterations sequentially. But with a multiprocessor workstation and a connection to a parallel programming backend, multiple iterations of the loop will run in parallel. This means that without any changes to your code, the loop can run faster, with a potential speedup that scales with the number of available cores or processors.
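Here is a small sketch of the loop construct, assuming the "foreach" package is installed. It uses the sequential %do% operator so it runs anywhere; in the versions of foreach I'm aware of, swapping in %dopar% is what hands the iterations to a registered parallel backend. Unlike a for loop, a foreach loop returns a value: the results of the individual iterations, gathered into a list or combined with the function given as .combine.

```r
library(foreach)

# Each iteration returns a value; .combine = c flattens the results into a vector.
squares <- foreach(i = 1:4, .combine = c) %do% {
  i^2
}

# Without .combine, foreach returns a list with one element per iteration.
roots <- foreach(i = c(1, 4, 9)) %do% sqrt(i)
```

Because the loop body is an expression whose value is collected, iterations should not rely on side effects on shared variables; that is what makes it safe to run them on separate cores.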
doMC provides such a link between "foreach" and a parallel programming backend -- in this case, the multicore package (from Simon Urbanek). With this connection, foreach loops on MacOS and Unix/Linux systems will make use of all the available cores/processors on the local workstation to run iterations in parallel.
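To make the connection concrete, here is a hedged sketch of registering the doMC backend and running a loop with %dopar%. It assumes a MacOS or Unix/Linux machine with "foreach" and "doMC" installed; the choice of two cores is an arbitrary value for illustration, not a recommendation.

```r
library(foreach)
library(doMC)

# Register the multicore backend; cores = 2 is an illustrative choice.
registerDoMC(cores = 2)

# Ask foreach how many workers the registered backend will use.
workers <- getDoParWorkers()

# %dopar% hands the iterations to the registered backend.
res <- foreach(i = 1:4, .combine = c) %dopar% sqrt(i)
```

If no backend has been registered, %dopar% still works but runs the iterations sequentially and issues a warning, so the same loop code can be shared between machines with and without doMC.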
The "iterators" and "foreach" packages were previously available only in REvolution R Enterprise as part of the ParallelR suite: now, we've made versions available on CRAN for all R users.
In REvolution R Enterprise, "iterators" and "foreach" connect with our NetworkSpaces parallel backend (which comes pre-installed and pre-configured in our distribution). Not only does this additionally enable parallel programming on Windows 32-bit and Windows 64-bit systems, it also allows iterations of foreach loops to run on separate machines on a cluster, or in a cloud environment like Amazon EC2. The cluster can even be a mixed collection of Windows, MacOS and/or Unix/Linux machines. REvolution R Enterprise with NetworkSpaces also provides some other benefits, including fault tolerance (so a parallel computation will complete correctly even if some nodes in the cluster fail), and high-performance mathematics libraries to make the computations run even faster.
You can see an example of foreach in action in this financial backtesting example. That example will run as written in REvolution R Enterprise, and in the latest version of R by installing the "multicore" package and replacing the call to registerDoNWS() with registerDoMC(). You'll need the relevant packages installed, which you can do easily in R as follows:
install.packages("foreach")
install.packages("doMC")
I'll be posting more examples using the "foreach" and "iterators" packages in the coming weeks. I'll also be talking about them at UseR! later this month. You can also read more in today's press release. In the meantime, feedback and comments are always welcome to packages@revolution-computing.com.
Iterators! Wow! I was working on an iterators package back in 2002. Nobody seemed interested at the time, and I was stuck between using S3, S4 or R.oo methods for it. Other things got in the way and I shelved it.
I started it because I didn't want to create a vector of 1:10000000 just as a loop control variable in an MCMC run. I wrote a simple iterator and then tried to sub-class it to make an MCMC loop index iterator class. This one had things like burn-in and thinning built in, so you could query the iterator to see if you were past the burn-in period or whether this iteration was thinned.
I also added some timing methods to my iterators so you could query when they expected to finish - it was just a simple linear time algorithm based on how many iterations had gone in how long, and how many more iterations were needed. You did "cat(predictEnd(iter))" and it showed you its best guess.
So, some ideas for your next version!
Posted by: Barry Rowlingson | July 01, 2009 at 10:40
Or even more easily: install.packages(c("foreach", "doMC"))
;)
Posted by: Hadley | July 01, 2009 at 12:40
These are great ... I saw Steve give a talk about the foreach library a while ago at the NY R meetup group and was totally waiting for these packages to land.
Thanks for all the good work and sharing it with the community!
Posted by: Steve Lianoglou | July 01, 2009 at 13:38
I am not able to install doMC:
> install.packages("doMC")
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
package ‘doMC’ is not available
Posted by: Roger J. Bos | September 15, 2009 at 12:11
Hmm, it works OK for me. Perhaps the mirror you're using isn't up-to-date? Another mirror might work (cran.revolution-computing.com works for me).
Posted by: David Smith | September 15, 2009 at 12:39
Just stumbled on this, seems like a big upgrade over previous solutions for R parallel.
Don't forget after installing: library(foreach);library(doMC);
Posted by: NS | May 12, 2010 at 12:45