As a language for statistical analysis, R has a comprehensive library of functions for generating random numbers from various statistical distributions. In this post, I want to focus on the simplest of questions: How do I generate a random number?
The answer depends on what kind of random number you want to generate. Let's illustrate by example.
Generate a random number between 5.0 and 7.5
If you want to generate a decimal number where any value (including fractional values) between the stated minimum and maximum is equally likely, use the runif function. This function generates values from the Uniform distribution. Here's how to generate one random number between 5.0 and 7.5:
> x1 <- runif(1, 5.0, 7.5)
> x1
[1] 6.715697
Of course, when you run this, you'll get a different number, but it will definitely be between 5.0 and 7.5. You won't get the values 5.0 or 7.5 exactly, either.
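To see that behaviour for yourself, here's a minimal sketch (the seed value and the name draws are arbitrary choices for illustration). With many draws, every value should fall strictly inside the interval, and the average should land near the midpoint, (5.0 + 7.5)/2 = 6.25:

```r
set.seed(42)                        # arbitrary seed, only so the run is repeatable
draws <- runif(10000, min = 5.0, max = 7.5)

range(draws)                        # smallest and largest draw
mean(draws)                         # should be close to the midpoint, 6.25
all(draws > 5.0 & draws < 7.5)      # TRUE: the endpoints themselves never appear
```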
If you want to generate multiple random values, don't use a loop. You can generate several values at once by specifying the number of values you want as the first argument to runif. Here's how to generate 10 values between 5.0 and 7.5:
> x2 <- runif(10, 5.0, 7.5)
> x2
[1] 6.339188 5.311788 7.099009 5.746380 6.720383 7.433535 7.159988
[8] 5.047628 7.011670 7.030854
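If you're curious how the single call compares with the loop it replaces, here's a small sketch (the seed and the names vectorized and looped are assumptions for illustration). Resetting the seed before each approach should give the same ten values, so the loop buys you nothing except extra code:

```r
set.seed(10)
vectorized <- runif(10, 5.0, 7.5)   # one call, ten values

set.seed(10)                        # reset so both approaches start from the same state
looped <- numeric(10)
for (i in 1:10) {
  looped[i] <- runif(1, 5.0, 7.5)   # ten calls, one value each
}

all.equal(vectorized, looped)       # the two vectors should match
```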
Generate a random integer between 1 and 10
This looks like the same exercise as the last one, but now we only want whole numbers, not fractional values. For that, we use the sample function:
> x3 <- sample(1:10, 1)
> x3
[1] 4
The first argument is a vector of valid numbers to generate (here, the numbers 1 to 10), and the second argument indicates one number should be returned. If we want more than one random number, with each draw independent of the others (so the same value can come up again), we add the replace argument to indicate that repeats are allowed:
> x4 <- sample(1:10, 5, replace=TRUE)
> x4
[1] 6 9 7 6 5
Note the number 6 appears twice in the 5 numbers generated. (Here's a fun exercise: what is the probability of running this command and having no repeats in the 5 numbers generated?)
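One way to work that exercise, as a sketch: for no repeats, the first draw can be any of 10 values, the second any of the remaining 9, and so on, giving 10 x 9 x 8 x 7 x 6 favourable outcomes out of 10^5 equally likely ones. The simulation underneath is only a sanity check; the seed and the 20,000 trials are arbitrary assumptions.

```r
# Exact answer: 10 * 9 * 8 * 7 * 6 ordered draws without a repeat, out of 10^5
p_exact <- prod(10:6) / 10^5
p_exact                             # 0.3024

# Simulation check
set.seed(123)
no_repeats <- replicate(20000, {
  draw <- sample(1:10, 5, replace = TRUE)
  anyDuplicated(draw) == 0          # TRUE when all five values are distinct
})
mean(no_repeats)                    # should be close to p_exact
```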
Select 6 random numbers between 1 and 40, without replacement
If you wanted to simulate the lotto game common to many countries, where you randomly select 6 balls from 40 (each labelled with a number from 1 to 40), you'd again use the sample function, but this time without replacement:
> x5 <- sample(1:40, 6, replace=FALSE)
> x5
[1] 10 21 29 12 7 31
You'll get a different 6 numbers when you run this, but they'll all be between 1 and 40 (inclusive), and no number will repeat. Also, you don't actually need to include the replace=FALSE option -- sampling without replacement is the default -- but it doesn't hurt to include it for clarity.
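Here's a short sketch of those claims, relying on the default and checking the draw afterwards (the seed and the name ticket are assumptions for illustration). The last line counts how many distinct tickets are possible when the order of the balls is ignored:

```r
set.seed(2024)
ticket <- sample(1:40, 6)           # replace = FALSE is the default

sort(ticket)                        # sorted, the way lotto results are usually read
anyDuplicated(ticket)               # 0 means no number repeats
all(ticket %in% 1:40)               # TRUE: every ball is between 1 and 40

choose(40, 6)                       # 3838380 possible tickets
```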
Select 10 items from a list of 50
You can use this same idea to generate a random subset of any vector, even one that doesn't contain numbers. For example, to select 10 distinct states of the US at random:
> sample(state.name, 10)
[1] "Virginia" "Oklahoma" "Maryland" "Michigan"
[5] "Alaska" "South Dakota" "Minnesota" "Idaho"
[9] "Indiana" "Connecticut"
You can't sample more values than you have without allowing replacements:
> sample(state.name, 52)
Error in sample(state.name, 52) :
cannot take a sample larger than the population when 'replace = FALSE'
... but sampling exactly the number you do have is a great way to randomize the order of a vector. Here are the 50 states of the US, in random order:
> sample(state.name, 50)
[1] "California" "Iowa" "Hawaii"
[4] "Montana" "South Dakota" "North Dakota"
[7] "Louisiana" "Maine" "Maryland"
[10] "New Hampshire" "Rhode Island" "Texas"
[13] "Florida" "North Carolina" "Minnesota"
[16] "Arkansas" "Pennsylvania" "Colorado"
[19] "Idaho" "Connecticut" "Utah"
[22] "South Carolina" "Illinois" "Ohio"
[25] "New Jersey" "Indiana" "Wisconsin"
[28] "Mississippi" "Michigan" "Wyoming"
[31] "West Virginia" "Alaska" "Georgia"
[34] "Vermont" "Virginia" "Oklahoma"
[37] "Washington" "New Mexico" "New York"
[40] "Delaware" "Nevada" "Alabama"
[43] "Kentucky" "Missouri" "Oregon"
[46] "Tennessee" "Arizona" "Massachusetts"
[49] "Kansas" "Nebraska"
You could also have just used sample(state.name) to the same effect -- sampling as many values as provided is the default.
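A quick sketch to convince yourself that this really is a shuffle, with nothing lost or duplicated (the seed and the name shuffled are assumptions for illustration):

```r
set.seed(7)
shuffled <- sample(state.name)      # same as sample(state.name, length(state.name))

length(shuffled)                    # 50
anyDuplicated(shuffled)             # 0: no state appears twice
setequal(shuffled, state.name)      # TRUE: same states, only the order differs
```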
Further reading
For more information about how R generates random numbers, check out the following help pages:
> ?runif
> ?sample
> ?.Random.seed
The last of these provides technical detail on the random number generator R uses, and how you can set the random seed (with set.seed) to reproduce sequences of random numbers.
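As a minimal sketch of what setting the seed buys you (the seed value 2009 and the variable names are arbitrary assumptions): resetting the seed replays the same values, while carrying on without a reset continues the stream with new ones.

```r
set.seed(2009)
first <- runif(3, 5.0, 7.5)

set.seed(2009)                      # same seed again
second <- runif(3, 5.0, 7.5)
identical(first, second)            # TRUE: the sequence is reproduced exactly

third <- runif(3, 5.0, 7.5)         # no reset, so the generator just keeps going
identical(first, third)             # FALSE: these are fresh values
```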
great, really helpful. Is there a way to give the random numbers a mean? (ideally with a normal distribution as well). These functions don't seem to have arguments that can do that.
Cheers! Joel.
Posted by: Joel | July 10, 2009 at 10:10
@Joel: read the relevant help pages carefully. The arguments of rnorm() are n [# of values to pick], mean (default 0) and sd (1). If you want uniform deviates with a specified mean, you have to know that mean = (max+min)/2 and go from there ...
Posted by: Ben Bolker | July 11, 2009 at 07:11
thank you! very clear.
Posted by: Kim | May 26, 2010 at 14:53
Thank you! This helped me more to learn R!
Posted by: Steve | September 15, 2010 at 14:51
Thanks :D
Posted by: hadi | December 04, 2010 at 09:11
thanks. helped me a lot. well explained
Posted by: rmr | March 03, 2011 at 14:07
Algorithm A1. Central limit theorem method. (Generates a single value, X.)
1. For i = 1 to 12:
(a) Generate U_i ~ Uniform(0, 1)
2. Let X = (U_1 + U_2 + ... + U_12) - 6
3. Deliver X
Is this correct?
r <- function(x, y) {
  z <- 1:12
  x <- runif(z)
  sum(x) - 6
}
r(1, 2)
Posted by: darren | March 04, 2011 at 13:41
I have generated a sample of 100 items with values between zero and five, and I wish to create subsamples of length 5 from it. I tried
sample(c(runif(100, min=0, max=5)), size=5, replace=T) but that returns only a single sample. I wish to obtain 100 subsamples of size five from the pseudorandom sample of 100 generated earlier.
Posted by: Asonganyi | July 08, 2012 at 06:39
Thank you for posting this! Looking to see if you have one that generates column headers.
Posted by: Will Banks 3 | February 20, 2013 at 10:26
Assume there are 5 variables: x,y,z,q,w.
Assume you want to assign 2000 obs to each of the variables above. Hence: 5*2000 = 10000 obs.
Then the way to perform this in R is:
> a10000data<-as.vector(runif(10000))
> x<-a10000data[1:2000]
> y<-a10000data[2001:4000]
> z<-a10000data[4001:6000]
> q<-a10000data[6001:8000]
> w<-a10000data[8001:10000]
I do not know whether these assignments can be performed all at once or not. Anyway, my code above is just one way. It would be best to learn from the R community how to do a one-step assignment for 5 variables.
Erdogan CEVHER
The Ministry of Science, Industry and Technology of Turkey
Posted by: Erdogan CEVHER | March 16, 2013 at 04:21
I have a yearly time series of extreme events, i.e., a time series of extreme (largest) values. The problem is that the series has many missing values in it. How can I impute values there and perform EVA? Please help me with this; I am a beginner using R.
Posted by: NArun | September 07, 2013 at 05:41
How do I do a randomization test in R? To explain: I have two means of the number of fish captured. Since the data are not parametric, I would like to use randomization instead of one of the non-parametric tests. That is, I want to check whether the difference between the two means is an extreme value, by constructing a distribution of the statistic and comparing the observed difference in means against that distribution. Regards
Posted by: Aristide | September 08, 2013 at 06:20