If you regularly have to deal with specific versions of R, or different package combinations, or getting R set up to work with other databases or applications then, well, it can be a pain. You could dedicate a special machine for each configuration you need, I guess, but that's expensive and impractical. You could set up virtual machines in the cloud, which works well for one-off situations, but it gets tedious having to re-configure a new VM each time. Or, you could use Docker containers, which were expressly designed to make it quick and easy to configure and launch an independent and secure collection of software and services.
If you're new to the concept of Docker containers, here's a docker tutorial for data scientists. But the concepts are pretty simple. At Docker Hub, you can search for "images" (basically, bundles of software with pre-configured settings) contributed by the community and by vendors. (You'll be referring to the images by name, for example: rocker/r-base.) You can then create a "container" (a running instance of that image) on your machine with the Docker application, or in the cloud using the tools offered by your provider of choice.
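To make the image/container distinction concrete, here's a minimal sketch of pulling the rocker/r-base image and starting a container from it on your own machine. The r_console function name is just something I made up for illustration; the docker pull and docker run commands inside it are the standard Docker CLI.

```shell
# Hypothetical helper (not part of Docker or Rocker): pull an image from
# Docker Hub and open an interactive R console in a throwaway container.
r_console() {
  image="${1:-rocker/r-base}"   # image name as listed on Docker Hub
  # --rm deletes the container on exit; -ti attaches an interactive terminal
  docker pull "$image" &&
    docker run --rm -ti "$image"
}

# Usage (requires a running Docker daemon):
#   r_console
```

Quitting R ends the container, and because of --rm nothing is left behind except the downloaded image.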
For R users, a wide array of pre-configured Docker images for R has been available since 2014, thanks to the Rocker project. You can browse the rocker repository at Docker Hub to see everything available, but it includes:
- Simple images containing just the latest official R release or the latest daily R build.
- Images containing both R and RStudio Server.
- Images with the tidyverse suite of packages pre-installed.
- Version-stable images, snapshotted to specific R (and RStudio) versions and the R package ecosystem at specific points in time. If you retrieve one of these images using a tag, your Docker image will always include the same software, even months or years down the line. These are perfect for production instances, where reproducibility is paramount.
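As a rough sketch of what retrieving an image "using a tag" looks like, the tag goes after a colon in the image name. The example below assumes the version-stable images live under rocker/r-ver and uses 3.4.4 purely as an example tag; the two helper functions are hypothetical names of my own.

```shell
# Hypothetical helper: build a tagged image reference, e.g. rocker/r-ver:3.4.4
rocker_ref() {
  printf '%s:%s\n' "rocker/$1" "$2"
}

# Hypothetical helper: run one R expression in a version-pinned container.
# $1 = R version tag (assumed to exist on Docker Hub), $2 = R expression
r_pinned() {
  docker run --rm "$(rocker_ref r-ver "$1")" Rscript -e "$2"
}

# Usage (requires a running Docker daemon):
#   r_pinned 3.4.4 'cat(R.version.string)'
```

Because the tag is spelled out, the same command should start the same R version whenever it's run, rather than whatever "latest" happens to point to that day.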
I find the images containing RStudio Server super convenient whenever I need to try out something in a specific R version. All I need to do is provide the image name to Azure Container Instances, and make sure port 8787 is open.
That's it for the configuration, and after the instance is ready (about 2 minutes) I can use a web browser to visit http://40.121.205.121:8787/ to find a completely fresh R instance and the RStudio IDE. (The actual IP address will be provided for you by Container Instances, and can be found in the Overview section for your instance in the Azure Portal.)
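If you'd rather try this on your own machine than in Azure, the same two pieces of configuration (the image name and port 8787) carry over to a local docker run. This is only a sketch: rstudio_local is a made-up helper name, and it assumes the rocker/rstudio image, which reads the login password for its default "rstudio" user from the PASSWORD environment variable.

```shell
# Hypothetical helper: start RStudio Server locally from a Rocker image.
# $1 = password for the "rstudio" user, $2 = optional image tag (default: latest)
rstudio_local() {
  # -d runs in the background; -p maps host port 8787 to container port 8787
  docker run -d --rm \
    -p 8787:8787 \
    -e PASSWORD="$1" \
    "rocker/rstudio:${2:-latest}"
}

# Usage (requires a running Docker daemon; pick your own password):
#   rstudio_local 'choose-a-password'
```

Once the container is up, RStudio should be reachable at http://localhost:8787/ in a web browser.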
You can of course use other cloud providers as well: Andrew Heiss provides this guide for setting up a rocker image in DigitalOcean, and also provides some handy tips for writing Dockerfiles to create custom images of your own design. For more on the Rocker project, follow the link below.
The Rocker Project: Docker Containers for the R Environment
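On the custom-image idea mentioned above, here's a minimal sketch of what a Dockerfile built on a Rocker image might look like. Everything specific in it is an assumption for illustration: the write_dockerfile helper, the rocker/r-ver:3.4.4 base tag, the data.table package, and the my-r-image name. It also relies on the base image having a default CRAN repository configured.

```shell
# Hypothetical helper: write a minimal Dockerfile that extends a Rocker image.
# $1 = target directory; the base tag and the package are examples only.
write_dockerfile() {
  cat > "$1/Dockerfile" <<'EOF'
# Start from a version-pinned Rocker image
FROM rocker/r-ver:3.4.4
# Add one extra CRAN package on top of the base image
RUN R -e "install.packages('data.table')"
EOF
}

# Usage (requires a running Docker daemon; "my-r-image" is a made-up name):
#   write_dockerfile . && docker build -t my-r-image .
```

The resulting image can then be run just like the stock Rocker images, with your extra packages already installed.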
A question - I know RStudio Server won't run on a PowerPC architecture, which is what a couple of our high-end servers intended for research & development have. It has been suggested to me that Docker would be a way around this, e.g., via one of the pre-configured images you describe, but I can't see how as the s/w still has to run on the same hardware. Would using an appropriate Rocker image in fact allow us to circumvent the hardware restriction of RStudio Server?
Posted by: John Bowman | March 21, 2018 at 13:42
It might, but you'd need to get Docker running on the PowerPC servers first. Surprisingly (to me) there does seem to be some PowerPC support for Docker, so you might get lucky. If that doesn't work, you might have more luck with traditional VMs (which bring along a complete OS, unlike Docker containers).
Posted by: David Smith | March 21, 2018 at 13:59