Writing an R script is one thing. Organizing your process (where to put the data, how to refer to files in scripts, how to run the scripts, and how to produce, collect, and report the results) is quite another. Every R user has their own workflow for doing data analysis with R, but the best workflows achieve the following goals:
- Transparency: A good workflow organizes the elements of the project logically and clearly, to make it easy for an observer (including yourself) to understand how the pieces come together.
- Maintainability: A good workflow makes it easy to modify and adapt the project. Standardized script names and good commenting practices (in the code, as well as things like README files) are key here.
- Modularity: A good workflow encapsulates discrete tasks into separate components (e.g. scripts), so that it's always clear where modifications need to be made (and only made in one place), and components are re-usable for other projects.
- Portability: A good workflow makes it easy to move the project to another system, or hand it over to another person to work on, in such a way that it can still easily be run elsewhere. (Using relative (not absolute) pathnames, and remote access to shared data, are two examples.)
- Reproducibility: A good workflow makes it easy for you, or others, to reproduce your results.
- Efficiency: Here I'm referring to the efficiency of you, the programmer, not computational efficiency. A good workflow saves you time, by making it easier to work on the project, and by automating as much of the process as possible.
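To make the modularity and portability goals a little more concrete, here is a minimal sketch of one way a project could be laid out and run. The directory names (`data`, `R`, `output`), the script names, and the toy data are all hypothetical, chosen only for illustration; this is not a community standard. The sketch builds a throwaway project in a temporary directory so that it can be run anywhere.

```r
# Hypothetical layout (names are illustrative assumptions):
#   data/raw.csv        input data
#   R/01_load.R         step 1: read the data
#   R/02_analyze.R      step 2: summarize and write results
#   output/summary.csv  collected results

# Build a throwaway project so the sketch is self-contained.
project_dir <- file.path(tempdir(), "demo_project")
for (d in c("data", "R", "output")) {
  dir.create(file.path(project_dir, d), recursive = TRUE, showWarnings = FALSE)
}
write.csv(data.frame(group = c("a", "a", "b"), value = c(1, 3, 10)),
          file.path(project_dir, "data", "raw.csv"), row.names = FALSE)

# Each discrete task lives in its own script and uses only relative paths,
# so the project still runs after being moved to another machine.
writeLines('raw <- read.csv(file.path("data", "raw.csv"))',
           file.path(project_dir, "R", "01_load.R"))
writeLines(c(
  'result <- aggregate(value ~ group, data = raw, FUN = mean)',
  'write.csv(result, file.path("output", "summary.csv"), row.names = FALSE)'
), file.path(project_dir, "R", "02_analyze.R"))

# Master script: run every numbered step in order from the project root.
old_wd <- setwd(project_dir)
for (script in sort(list.files("R", pattern = "\\.R$", full.names = TRUE))) {
  source(script)
}
setwd(old_wd)
```

Numbering the scripts keeps the run order obvious to an observer, and because nothing refers to an absolute path, the whole directory can be handed to someone else and re-run from its root.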
Other than the package system (which is great, but can be overkill for many projects), R doesn't have any formal standards for designing a workflow. But here are a couple of suggestions from the R community:
- For projects that produce a complete report from R code, see the answers to the question "Workflow for statistical analysis and report writing" on StackOverflow.
- For more general development projects in R, John Myles White is developing the ProjectTemplate package to help standardize the structure of a project.
If you have other suggestions for organizing an R workflow, let us know in the comments.
I've organized some "best practices" guidelines in a post I recently published here:
http://www.r-statistics.com/2010/09/managing-a-statistical-analysis-project-guidelines-and-best-practices/
Cheers,
Tal
Posted by: Tal Galili | October 23, 2010 at 01:01
Use
http://www.stat.uni-muenchen.de/~leisch/Sweave/
or for interoperability
http://rss.acs.unt.edu/Rdoc/library/odfWeave/html/odfWeave.html
Posted by: Sergei | November 24, 2010 at 00:24