It all started off as a simple question from Scott Chamberlain on Twitter:
Make m x n matrix with randomly assigned 0/1 -> apply(m, c(1,2), function(x) sample(c(0,1),1)) -- Better/faster solution? #rstats
— Scott Chamberlain (@recology_) August 28, 2012
The goal was to create a matrix with randomly selected binary elements, and a predetermined number of rows and columns, that looks something like this:
         [,1] [,2] [,3] [,4]
    [1,]    0    1    1    0
    [2,]    0    0    0    1
    [3,]    1    0    1    1
Many suggestions followed (including one from me). Several different ways of creating the random binary values were suggested:
- Use the runif function to create random numbers between 0 and 1, and round to the nearest whole number.
- Use ifelse on the output of runif, and assign 0 if it's below 0.5, and 1 otherwise.
- Use the rbinom function to sample from a binomial distribution with a size of 1 and probability 0.5.
- Use the sample function with the replace=TRUE option to simulate selections of 0 and 1.
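The four approaches above can be sketched side by side. This is a minimal illustration rather than the exact code from the Twitter thread; the variable names and the seed are chosen here for the example.

```r
set.seed(42)  # fixed seed (an assumption, only so the sketch is reproducible)
n <- 12       # number of 0/1 values to draw (illustrative)

# 1. Round uniform draws on (0, 1) to the nearest whole number
v_round <- round(runif(n))

# 2. Threshold uniform draws: 0 below 0.5, 1 otherwise
v_ifelse <- ifelse(runif(n) < 0.5, 0, 1)

# 3. Binomial draws with size 1 and probability 0.5
v_rbinom <- rbinom(n, size = 1, prob = 0.5)

# 4. Sample with replacement from the values 0 and 1
v_sample <- sample(c(0, 1), n, replace = TRUE)
```

Each call returns a vector of length n containing only 0s and 1s; the draws themselves differ between methods because each one consumes the random number stream differently.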
Several ways of building the matrix were suggested as well:
- Use a for loop to fill each element of the matrix individually.
- Generate random numbers row by row, and fill the matrix using apply.
- Generate all the random numbers at once, and use the "matrix" function to create the matrix directly.
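As a rough sketch, the three construction strategies might look like this, using sample as the generator in each case. The dimensions and object names are assumptions made for the example.

```r
set.seed(1)  # fixed seed (an assumption, for reproducibility only)
m <- 3       # rows (illustrative)
n <- 4       # columns (illustrative)

# 1. for loop: fill each element individually
mat_loop <- matrix(0, nrow = m, ncol = n)
for (i in seq_len(m)) {
  for (j in seq_len(n)) {
    mat_loop[i, j] <- sample(c(0, 1), 1)
  }
}

# 2. apply: generate one row at a time.
#    apply() returns the per-row results as columns, so transpose back to m x n.
mat_apply <- t(apply(matrix(0, nrow = m, ncol = n), 1,
                     function(row) sample(c(0, 1), length(row), replace = TRUE)))

# 3. Generate all m * n values at once and reshape with matrix()
mat_direct <- matrix(sample(c(0, 1), m * n, replace = TRUE), nrow = m, ncol = n)
```

All three produce an m x n matrix of 0s and 1s. The third version makes a single vectorized call, which is the pattern that tends to perform best in R.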
Luis Apiolaza reviews the suggested methods. Each has its merits: in clarity of code, in elegance, and especially in performance. On that front, Dirk Eddelbuettel benchmarked several of the solutions, including translating the code into C++ using Rcpp. One surprising outcome: translating the problem into C++ is only somewhat faster than using one call to sample. As Dirk says, this shows that "well-written R code can be competitive" with compiled code.
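To get a feel for the performance gap yourself, here is a small timing sketch using base R's system.time. It is not Dirk's benchmark: the function names and the matrix size are assumptions, it does not include the Rcpp version, and the timings will vary by machine.

```r
set.seed(123)  # fixed seed (an assumption, for reproducibility only)
m <- 200       # rows (illustrative size)
n <- 200       # columns (illustrative size)

# Element-by-element approach, as in the original tweet
by_element <- function(m, n) {
  apply(matrix(0, m, n), c(1, 2), function(x) sample(c(0, 1), 1))
}

# One vectorized call to sample, reshaped into a matrix
all_at_once <- function(m, n) {
  matrix(sample(c(0, 1), m * n, replace = TRUE), nrow = m, ncol = n)
}

# Elapsed wall-clock time in seconds for each approach
t_element <- system.time(a <- by_element(m, n))["elapsed"]
t_once    <- system.time(b <- all_at_once(m, n))["elapsed"]

print(c(by_element = unname(t_element), all_at_once = unname(t_once)))
```

For more careful comparisons, a dedicated benchmarking package that repeats each expression many times would give steadier numbers than a single system.time call.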
Update Sep 5: A late R-only solution from Josh Ulrich using sample.int is only 10% slower than the compiled C++ version. That's fast!
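A sample.int-based version might look something like the sketch below. This is an assumption about the general shape of such a solution, not a copy of Josh's code: sample.int(2, ...) draws from the integers 1 and 2, so subtracting 1 maps the result onto 0 and 1.

```r
set.seed(2012)  # fixed seed (an assumption, for reproducibility only)
m <- 3L         # rows (illustrative)
n <- 4L         # columns (illustrative)

# Draw m * n integers from {1, 2} with replacement, shift to {0, 1},
# and reshape into an m x n integer matrix
mat_int <- matrix(sample.int(2L, m * n, replace = TRUE) - 1L, nrow = m, ncol = n)
```

Because everything stays in integer storage and sample.int skips some of the dispatch overhead of sample, this kind of call is a plausible candidate for the fastest pure-R approach.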
Thinking inside the Box: Faster creation of binomial matrices