
July 01, 2009


Simple, scalable parallel computing in R

Comments


Iterators! Wow! I was working on an iterators package back in 2002. Nobody seemed interested at the time, and I was stuck between using S3, S4 or R.oo methods for it. Other things got in the way and I shelved it.

I started it because I didn't want to create a vector of 1:10000000 just as a loop control variable in an MCMC run. I wrote a simple iterator and then tried to sub-class it to make an MCMC loop index iterator class. This one had things like burn-in and thinning built in, so you could query the iterator to see if you were past the burn-in period or whether this iteration was thinned.

I also added some timing methods to my iterators so you could ask when they expected to finish. It was just a simple linear extrapolation based on how many iterations had completed in how much time, and how many more were needed. You called "cat(predictEnd(iter))" and it showed you its best guess.

So, some ideas for your next version!
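The loop-index iterator described above can be sketched as a closure in base R. This is a minimal sketch under our own assumptions: make_mcmc_iter, has_next, next_elem, past_burn_in, keep and predict_end are hypothetical names, not the API of the iterators package, and the finish-time estimate is the simple linear extrapolation from the comment (average time per completed iteration, times the total number of iterations).

```r
# Hypothetical MCMC loop-index iterator built from a closure (base R only).
# No vector like 1:10000000 is allocated; the iterator only stores a counter.
make_mcmc_iter <- function(n_iter, burn_in = 0L, thin = 1L) {
  i <- 0L
  start <- Sys.time()
  list(
    has_next = function() i < n_iter,
    next_elem = function() {
      i <<- i + 1L  # update the counter in the shared enclosing environment
      i
    },
    # TRUE once the current iteration is beyond the burn-in period
    past_burn_in = function() i > burn_in,
    # TRUE if the current iteration is kept after thinning (counted from burn-in)
    keep = function() i > burn_in && (i - burn_in) %% thin == 0,
    # Linear extrapolation: average time per completed iteration times the total
    predict_end = function() {
      if (i == 0L) return(NA)
      elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
      start + elapsed / i * n_iter
    }
  )
}

it <- make_mcmc_iter(1000L, burn_in = 100L, thin = 10L)
kept <- 0L
while (it$has_next()) {
  it$next_elem()
  # ... one MCMC update would go here ...
  if (it$keep()) kept <- kept + 1L
}
kept                                  # iterations surviving burn-in and thinning
cat(format(it$predict_end()), "\n")   # estimated finish time
```

With 1,000 iterations, a burn-in of 100 and thinning by 10, the counter ends at 90 kept iterations.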

Or even more easily: install.packages(c("foreach", "doMC"))

;)

These are great ... I saw Steve give a talk about the foreach library a while ago at the NY R meetup group and was totally waiting for these packages to land.

Thanks for all the good work and sharing it with the community!

I am not able to install doMC:

> install.packages("doMC")
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
package ‘doMC’ is not available

Hmm, it works OK for me. Perhaps the mirror you're using isn't up-to-date? Another mirror might work (cran.revolution-computing.com works for me).

Just stumbled on this, seems like a big upgrade over previous solutions for R parallel.

Don't forget, after installing: library(foreach); library(doMC); registerDoMC()

I've written a simulation that re-runs a huge block (about fifteen pages) of code 128 times. Each iteration takes about eight minutes, but I'm currently only using one processor (and a simple for loop for everything). What's the easiest way to convert a giant for loop over to foreach and take advantage of my multiple processors?

Anthony,

That should be a pretty simple conversion. The trick is to encapsulate each simulation into a function (say, do_simulate) that returns the simulation. Then the code would be something like:

Nsims <- 128
sims <- foreach(i = 1:Nsims) %dopar% do_simulate()

and then sims[[77]] is the 77th simulation, for example.

If you're using Revolution R on Windows, then precede the above with:

library(foreach)
library(doSMP)
workers <- startWorkers(2)
registerDoSMP(workers)

then two simulations will run at a time in parallel. Set the number of workers to match the number of processor cores available.
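A minimal runnable version of this conversion, assuming a hypothetical do_simulate(i) that stands in for the real simulation code. It registers foreach's sequential backend, registerDoSEQ(), so it runs anywhere; registering a parallel backend such as doSMP (as above) or doMC instead is what actually spreads the runs across processors.

```r
library(foreach)

# Hypothetical stand-in for the fifteen pages of simulation code: everything
# one run needs happens inside the function, and its result is returned.
do_simulate <- function(i) {
  x <- rnorm(1000)
  list(run = i, mean = mean(x), sd = sd(x))
}

# Sequential backend so %dopar% runs without a warning; replace this line
# with a parallel backend registration to use several processors.
registerDoSEQ()

Nsims <- 128
sims <- foreach(i = 1:Nsims) %dopar% do_simulate(i)

length(sims)      # one list element per simulation
sims[[77]]$run    # the 77th simulation's results live in sims[[77]]
```

Because each run is self-contained inside the function, foreach can hand runs to workers in any order and still return them in a list indexed by i.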

Hi David, thanks for responding so quickly. I'm unfortunately not getting this code to work. I'm just using the standard R 2.11.1 x64 Windows build, not the Revolution R software.

When I try to install the doSMP or doMC packages, I get a 'cannot find package' error, even when I use the cran.revolution repo.

Then, assuming I don't need either of those packages and only need the foreach package, I'm trying this code out:


library(foreach)

b<-Sys.time()

Nsims<-24
z<-rep(0.5,Nsims)
for(i in 1:Nsims) {
x <- runif(10^8 , 0 , 1)
z[i] <- mean(x)
}
print(Sys.time() - b)


b<-Sys.time()

z<-rep(0.5,Nsims)
foreach(i=1:Nsims) %dopar% {
x <- runif(10^8 , 0 , 1)
z[i] <- mean(x)
}
print(Sys.time() - b)


on both a dual-core and a quad-core machine. Unfortunately, it's not any faster using the foreach operator.

I noticed that %*%dopar%*% gives an error, so I changed it to %dopar%.

Any idea what I'm doing wrong? This would be a tremendous time savings for me if I could get it working.

Thanks!

Hi Anthony,

You do need a parallel backend to make foreach run faster. It's not surprising that your "for" and "foreach" examples run in the same time, because foreach doesn't run in parallel until you install a parallel backend. doSMP works on Windows, but is only available in Revolution R. doMC works on Mac and Linux, but not Windows.

Hope this helps!
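Besides the missing backend, the foreach loop in the question has a second problem: foreach returns the value of each iteration rather than updating variables in the calling session, so z[i] <- mean(x) inside the loop body never changes z. A sketch of the intended loop, using .combine = c to collect the results into a vector. It assumes registerDoSEQ() in place of a parallel backend and uses 10^6 draws per iteration (instead of 10^8) so it runs quickly anywhere.

```r
library(foreach)

registerDoSEQ()  # sequential; register a parallel backend here for a real speed-up

Nsims <- 24
# Each iteration's last value is returned; .combine = c joins them into a vector.
z <- foreach(i = 1:Nsims, .combine = c) %dopar% {
  x <- runif(10^6, 0, 1)
  mean(x)
}
z  # 24 sample means, each close to 0.5
```

With a backend that uses separate worker processes, any assignment inside the loop body happens on the workers, so capturing the return value is the reliable way to get results back.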

Hi again,

Is there any other way to run a program in parallel on a 64-bit Windows build? I work for a non-profit, not in academia, so I can't use the Revolution Enterprise version right now.

Thanks :)

You could try inserting these two lines at the beginning of the program:

library(doMC)
registerDoMC()
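doMC relies on forking, which Windows does not support, so those two lines help only on Mac and Linux. On Windows, one option mentioned further down this thread is doSNOW, which runs the work on a socket cluster of separate R processes. A minimal sketch, assuming the snow and doSNOW packages are installed from CRAN; the cluster size of 2 is only an example.

```r
library(foreach)
library(doSNOW)  # attaches snow as well

# A "SOCK" cluster launches separate R worker processes and talks to them over
# sockets, so it does not need fork() and works on Windows.
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

res <- foreach(i = 1:4, .combine = c) %dopar% sqrt(i)
res

stopCluster(cl)  # shut the worker processes down when finished
```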

I'm really impressed with the Foreach package which I use with doSNOW. Works like a charm.

I was wondering if the good people @ REvolution are planning to do similar sort of stuff to take advantage of parallel processing power on GPUs.

Currently, I'm working the pants off my dual core CPU running 100% for the past 4 hours (expected to last another 16 hours) using Foreach & doSNOW; doing some exponential smoothing on 100,000 time series. Each takes about 1-1.5 secs.

Glad it's working out well for you, Sashi! We've got some other HPC stuff in the works, but GPU processing isn't in the near future. One suggestion: you might want to try doMC (or doSMP on Windows) instead of doSNOW, since it's a bit more efficient when you're working on a single machine. Another hint: try doing 10 or 100 of those smooths in each foreach iteration (with a regular for loop inside) to reduce overhead.
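The second hint, batching many small jobs into each foreach task, can be sketched like this. Assumptions: the data are 1,000 simulated series rather than the 100,000 real ones, each series gets simple exponential smoothing via stats::HoltWinters() with beta = FALSE and gamma = FALSE, and registerDoSEQ() stands in for whichever parallel backend is actually registered.

```r
library(foreach)
registerDoSEQ()  # swap in doMC, doSMP or doSNOW to run the chunks in parallel

set.seed(1)
# Hypothetical data: 1,000 short random-walk series
series <- replicate(1000, cumsum(rnorm(50)), simplify = FALSE)

# Group the series indices into chunks of 100, so each foreach task does 100 smooths
chunk_size <- 100
chunks <- split(seq_along(series), ceiling(seq_along(series) / chunk_size))

alphas <- foreach(idx = chunks, .combine = c) %dopar% {
  # An ordinary loop over the chunk (vapply here) spreads the per-task overhead
  vapply(idx, function(j) {
    # Simple exponential smoothing: no trend (beta) or seasonal (gamma) component
    HoltWinters(series[[j]], beta = FALSE, gamma = FALSE)$alpha
  }, numeric(1))
}
```

Here foreach dispatches 10 tasks instead of 1,000, which is where the overhead saving comes from; the smoothing parameters come back in the original series order.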

The foreach package is exactly what I was looking for, thanks a lot. I'm running plain old R (not Revolution R) on a Mac and have a little trouble when trying to use the foreach and mecdf packages together. As far as I understand it, there's some conflict between the iterators package (which is loaded with foreach) and the s3x package (which is loaded with mecdf).

Here's a little example:

library(foreach)
library(doMC)
registerDoMC()

#this works just fine
foreach(i=1:4) %dopar% {sqrt(i)}

library(mecdf)
#now this doesn't work
foreach(i=1:4) %dopar% {sqrt(i)}

detach(package:mecdf)
detach(package:ofp)
detach(package:s3x)

#without the s3x-package everything is OK again
foreach(i=1:4) %dopar% {sqrt(i)}

#strange enough, this works
foreach(i=1:4, .packages="s3x") %dopar% {sqrt(i)}
#but not the second time
foreach(i=1:4, .packages="s3x") %dopar% {sqrt(i)}

detach(package:mecdf)
detach(package:ofp)
detach(package:s3x)

#this works (but only once)
foreach(i=1:4, .packages=c("s3x","ofp","mecdf")) %dopar% {
thisdat <- matrix(rnorm(100000),ncol=10)
fn <- mecdf(thisdat, expand=NA)
sqrt(fn(rep(0.3,10)))
}

detach(package:mecdf)
detach(package:ofp)
detach(package:s3x)

detach(package:doMC)
detach(package:foreach)
detach(package:iterators)

#More problems when first loading
library(mecdf)
#and then
library(foreach)
library(doMC)
registerDoMC()
#this produces an error
thisdat <- matrix(rnorm(100000),ncol=10)
fn <- mecdf(thisdat)
sqrt(fn(rep(0.3,10)))

#so does this
foreach(i=1:4, .packages=c("s3x","ofp","mecdf")) %dopar% {
thisdat <- matrix(rnorm(100000),ncol=10)
fn <- mecdf(thisdat, expand=NA)
sqrt(fn(rep(0.3,10)))
}

Sorry, my example got a bit lengthy. I'm aware that most of the above is not what you would use foreach for, but I tried to stick to the simple hello-world examples from the (very useful) vignette PDF.

At least, as long as you run your foreach loop with the .packages="mecdf" option just once, everything seems to work.

Michael, not sure what's going on there; I can only guess that the s3x package is overloading something which causes foreach to fail as you describe. I'll pass this on to the package maintainers to take a look as well, but for now it seems like the .packages="mecdf" option is a good workaround.

I also reported my problem to the maintainer of the mecdf package, and she fixed it in the latest update. The syntax is now slightly different, but the combination with foreach (and my previous example) works just fine.

Would you mind posting an example that actually works?

Thanks!

I keep getting an error: Error in { : task 1 failed - "missing value where TRUE/FALSE needed"


Any ideas?
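That message means the code inside a task evaluated an if() condition that came out NA. One way to find which iterations fail is foreach's .errorhandling = "pass" option, which returns the error object in place of that task's result instead of stopping the whole loop. In this sketch the NA at i == 3 is contrived purely to trigger the same message, and registerDoSEQ() stands in for a parallel backend.

```r
library(foreach)
registerDoSEQ()

res <- foreach(i = 1:4, .errorhandling = "pass") %dopar% {
  x <- if (i == 3) NA else i     # contrived missing value in task 3
  if (x > 2) "big" else "small"  # NA > 2 is NA, so if() fails here
}

# Tasks that failed come back as condition objects rather than results
failed <- which(vapply(res, inherits, logical(1), what = "error"))
failed                              # index of each failing task
conditionMessage(res[[failed[1]]])  # the underlying error message
```

Once the failing indices are known, rerunning the body for one of them with an ordinary loop makes it easy to inspect the offending values.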

The comments to this entry are closed.



Got comments or suggestions for the blog editor?
Email David Smith.
Follow David on Twitter: @revodavid
