The doSMP package (and its companion package, revoIPC), previously bundled only with Revolution R, is now available on CRAN for use with open source R under the GPL2 license.
In short, doSMP makes it easy to do SMP parallel processing on a Windows box with multiple processors. (It works on Mac and Linux too, but it's been relatively easy to do parallel processing on those systems for a while with the doMC/multicore package combo. Windows, not so much.) Basically, you tell it how many processors you have, write a loop using the foreach function, and the iterations of the loop run in parallel, using multiple processors. For embarrassingly parallel problems like simulations and optimizations, with 2 processors you can get close to halving the processing time; with 4 processors you can reduce it to near 25%, and so on. (Whether these are true, independent CPUs or cores within a processor matters a little, but not much.)
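As a minimal sketch of that workflow (not taken from the vignette), the loop below uses foreach's built-in sequential backend, registerDoSEQ(), as a stand-in so it runs on any machine. The assumption is that swapping in a parallel backend such as doSMP changes only the registration step, not the loop itself.

```r
library(foreach)

# Stand-in backend for illustration only: registerDoSEQ() runs the
# iterations one at a time. With doSMP you would register a pool of
# worker processes here instead.
registerDoSEQ()

# foreach returns a value rather than filling a variable as a side effect;
# .combine = c concatenates the per-iteration results into one vector.
roots <- foreach(i = 1:4, .combine = c) %dopar% {
  sqrt(i)
}

print(roots)
```

Because the loop body is just an expression whose value is collected, each iteration has to be independent of the others for the parallel version to give the same answer.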
You can see some examples in the doSMP vignette, from which I adapted the following example. Suppose you want to bootstrap parameter estimates from a logistic regression using 10,000 samples:
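library(doSMP)
# Start a pool of worker processes and register it as the foreach backend
w <- startWorkers(workerCount = 4)
registerDoSMP(w)
# Two-species subset of iris (100 rows): sepal length and species
x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 10000
# Split the iterations into one large chunk per worker
chunkSize <- ceiling(trials / getDoParWorkers())
smpopts <- list(chunkSize = chunkSize)
# %dopar% must sit on the same line as the closing parenthesis of foreach()
r <- foreach(icount(trials), .combine = cbind, .options.smp = smpopts) %dopar% {
  ind <- sample(100, 100, replace = TRUE)  # resample row indices with replacement
  result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
  coefficients(result1)
}
stopWorkers(w)  # shut the workers down when finished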
Note the use of foreach to run the bootstrap models in parallel. On a 4-core machine, you could reduce processing time from 104 seconds with a regular for loop to 57 seconds. That's closer to halving the time than quartering it, but a significant reduction nonetheless. (Tip: if you're using Revolution R, you might want to try turning off MKL multithreading when using doSMP/foreach, to avoid contention between the small-grain threading of MKL and the large-grain parallelism of foreach.)
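To put those timings in perspective, the speedup and per-core efficiency can be worked out directly from the two figures quoted above. This is just arithmetic on the numbers in the post; the variable names are illustrative.

```r
# Timings quoted in the post, in seconds
t_sequential <- 104   # regular for loop
t_parallel   <- 57    # foreach with doSMP on a 4-core machine
cores        <- 4

speedup    <- t_sequential / t_parallel   # how many times faster
efficiency <- speedup / cores             # fraction of the ideal 4x achieved

cat(sprintf("speedup: %.2fx, efficiency: %.0f%%\n", speedup, 100 * efficiency))
# prints: speedup: 1.82x, efficiency: 46%
```

To measure your own runs, one option is to wrap the sequential and parallel versions of the loop in system.time() and compare the elapsed values.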
I've written about foreach several times before (here, here and here for example) using other parallel backends like doMC and doSNOW. Now you can use those same examples on Windows with open-source R and the doSMP package.
doSMP package: Getting Started with doSMP and foreach
Way to go David/REvolution!
Posted by: Tal Galili | March 04, 2011 at 23:17
w <- startWorkers(workerCount = 4)
registerDoSMP(w)
should be added or it will not run!
Posted by: ypan5 | March 05, 2011 at 19:49
Hi, does doSMP speed up everything or only foreach loops?
For example, would it speed up a complex GLM as well? In particular, I have a gamm() fit (package mgcv) that takes about 5 minutes (once it took 160 minutes), and I would like to make it much faster. :)
Do you think doSMP could help?
Posted by: Jens | July 09, 2011 at 06:27