The yhat blog lists 10 R packages they wish they'd known about earlier. Drew Conway calls them "10 reasons to always start your analysis in R". They're all very useful R packages that every data scientist should be aware of. They are:
- sqldf (for selecting from data frames using SQL)
- forecast (for easy forecasting of time series)
- plyr (data aggregation)
- stringr (string manipulation)
- Database connection packages: RPostgreSQL, RMySQL, RMongo, RODBC, RSQLite
- lubridate (time and date manipulation)
- ggplot2 (data visualization)
- qcc (statistical quality control and QC charts)
- reshape2 (data restructuring)
- randomForest (random forest predictive models)
You can find links to all of these packages, and tips on how to use them, at the link below.
yhat blog: 10 R packages I wish I knew about earlier
I would add data.table to that list, perhaps in place of plyr if it has to be one or the other. data.table is very fast.
Posted by: Jeremy | February 19, 2013 at 17:34
Very informative blog. I just used RODBC and sqldf for the first time. Are there any R packages to handle big data?
Posted by: Abhishek | February 19, 2013 at 19:48
I would add package ff to that list, perhaps in place of sqldf. sqldf lets you write a subset of native R queries in yet another language, at a huge cost in RAM (twice as much) and CPU (e.g. a factor of 40 slower). ff enhances R for big datasets and for operations that are not possible in pure R given its RAM needs. For Revolution R Enterprise users, ff is less attractive on that list, because Revolution R Enterprise has its own methods for big datasets on disk.
Posted by: Jens Oehlschlägel (maintainer of package ff) | February 20, 2013 at 10:41