R is designed to make it easy to clearly express statistical ideas in code, but when it comes to writing code that runs as fast as possible, there are a few tips, tricks and caveats to be aware of. As part of the BioConductor conference this past summer, Martin Morgan prepared a tutorial on efficient R programming. (Patrick Aboyoun presented the tutorial on the day.) The slides (PDF) include lots of handy guidelines, including:
- Common performance pitfalls, and solutions
- How to measure performance and memory use
- How to work with large data files (handy if you don't have RevoScaleR)
- How to use parallel computing to speed up "embarrassingly parallel" jobs (for example, use foreach/doSMP)
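As a minimal sketch of the first two points (this example is not taken from the slides, and the function names `grow_squares`, `prealloc_squares` and `vector_squares` are made up for illustration), here is one common pitfall, growing a vector inside a loop, compared against preallocating and vectorizing. It uses base R's `system.time()` and `object.size()` to measure time and memory:

```r
# Pitfall: grow the result one element at a time.
# Each c() call copies the whole vector, so the work grows quadratically.
grow_squares <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Fix 1: preallocate the result, then fill it in place.
prealloc_squares <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Fix 2: vectorize, so the loop runs in compiled code.
vector_squares <- function(n) seq_len(n)^2

n <- 20000  # arbitrary size chosen for illustration

# system.time() reports user, system and elapsed time for an expression.
t_grow <- system.time(a <- grow_squares(n))
t_pre  <- system.time(b <- prealloc_squares(n))
t_vec  <- system.time(d <- vector_squares(n))

# All three approaches must produce the same answer.
stopifnot(identical(a, b), identical(b, d))

# Elapsed seconds for each approach.
print(c(grow       = t_grow[["elapsed"]],
        prealloc   = t_pre[["elapsed"]],
        vectorized = t_vec[["elapsed"]]))

# object.size() gives an estimate of the memory used by an object.
print(object.size(d), units = "Kb")
```

The growing version should usually come out slowest, but exact timings depend on the machine and R version, so treat the numbers as indicative only.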
There's also a collection of exercises (PDF) you can use to test out your efficient programming skills. Many of the examples come from the Genomics domain (as befits the BioConductor conference), but the advice is relevant to any R user.
BioConductor.org: BioC 2010 Course Materials (see section Efficient R Programming)
Update Sep 28: Corrected spelling of Patrick Aboyoun.
Contrary to what the slides say, netcdf4 can be used on Windows. I found the following instructions, which worked well for me, at
http://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2008/msg00250.html
Starting from
http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#windows_netcdf4
download
ftp://ftp.unidata.ucar.edu/pub/netcdf/contrib/win32/netcdf-4.0_dlls_snapshot2008092909.zip
ftp://ftp.unidata.ucar.edu/pub/netcdf/contrib/win32/5-181-win-vs2005.zip
(or wherever it points to - I'm sure there will be updates)
Unzip those and copy the following dlls to a directory in your PATH or
LD_LIBRARY_PATH:
Netcdf4 DLLs:
netcdf.dll
HDF5 DLLs:
hdf5_hldll.dll
hdf5dll.dll
(also see http://www.hdfgroup.uiuc.edu/windows/faq.html)
Go to ftp://ftp.hdfgroup.org/lib-external/ and get the szip and zlib DLLS. I
took
ftp://ftp.hdfgroup.org/lib-external/szip/2.1/bin/windows/szip21-vs2005-enc.zip
ftp://ftp.hdfgroup.org/lib-external/zlib/1.2/bin/windows/?? (missing - I have
reported to THG)
Or get zlib from http://www.winimage.com/zLibDll/.
unzip to get:
szlibdll.dll
zlib1.dll
These also depend on Microsoft DLLs (typically found in C:\WINDOWS\system32),
but they will already be in your path:
kernel32.dll
msvcr80.dll
msvcrt.dll
ntdll.dll
All of these DLLs are pure win32 "unmanaged code", as opposed to .Net "managed
code". win32 will work on both 32-bit and 64-bit systems.
Posted by: E jjunju | September 23, 2010 at 23:54
Interesting
You can also have a look at
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Posted by: skan | September 24, 2010 at 00:47