by Stephen Weller, Senior Support Engineer at Revolution Analytics, and Joseph Rickert
For someone trying to learn any new technology, getting help with a problem on a public forum can be stressful. Knowing where to go, deciding how to pose a question and figuring out how to deal with a response can be challenging. Moreover, an unpleasant interaction can bruise the ego and be a real setback to learning. Before posting a question on an internet forum, do everything you can to make it a positive experience for everyone involved. Here are some recommendations Steve and I have for getting help with R questions.
Two of the most novice-friendly places to go for help in the R world are the R-help mailing list and the R section of Stack Overflow. Both of these forums are monitored by experts who are very willing to patiently answer questions, but who are not always well disposed towards mind-reading. Maximize your chances of getting a quick, positive response by formulating your question or problem as clearly as possible, with minimum ambiguity. And then: do your homework. The R Project posting guide shows several ways to search for R help, lists the common mistakes people make when posting questions, and provides a host of details on the resources available for getting help and the mechanics of using the various R mailing lists.
Stack Overflow provides some excellent suggestions on posting questions. Doing the work to thoroughly research your question is also at the top of their list. Moreover, they point out that taking the trouble to do this makes you a valuable contributor to the R community. They write:
Sharing your research helps everyone. Tell us what you found ...and why it didn’t meet your needs. This demonstrates that you’ve taken the time to try to help yourself, it saves us from reiterating obvious answers, and above all, it helps you get a more specific and relevant answer!
Posting Your Question
When it comes time to post your question, you may find Steve's guidelines helpful. They are based on years of troubleshooting problems as a member of Revolution's Technical Support organization.
- Always include the version of R and the flavor of operating system you are running (see 'R.version' and 'Revo.version' in R).
- Be as specific as you can in describing your problem. Others will often need to duplicate the issue, even if they have worked with similar code in the past. Simply saying 'function xxx doesn't work' is not helpful enough. Ask yourself what someone else would need to reproduce the problem on their system.
- Include enough context that someone reading your question understands what you are trying to accomplish with your code or calculations in R.
- If the problem occurs in complicated or lengthy code, identify the problem function and, when possible, provide a simpler reproducible example for others.
- If you are having problems with an analytic function, either provide test data or explain how to reproduce the problem using built-in data in R (for example, 'kyphosis', 'airquality', etc.).
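The guidelines above can be sketched in a few lines of base R. The question being reproduced is hypothetical (why does mean() return NA for the Ozone column of the built-in airquality data?), and Revo.version is deliberately not called because it exists only in Revolution R; R.version.string, sessionInfo() and dput() are standard base R.

```r
# Hypothetical question: "mean() returns NA for the Ozone column. Why?"

# 1. Report the R version and platform; sessionInfo() also lists attached
#    packages. Revolution R users would add Revo.version, omitted here
#    because it only exists in Revolution R.
cat(R.version.string, "\n")
print(sessionInfo())

# 2. Use built-in data so anyone can rerun the example on their system.
data(airquality)

# 3. Share a small slice of the data in a form others can copy and paste.
dput(head(airquality[, c("Ozone", "Temp")], 5))

# 4. The smallest code that shows the problem, plus the observed result.
ozone_mean <- mean(airquality$Ozone)
print(ozone_mean)  # NA, because the Ozone column contains missing values
```

Pasting this script and its output into a post gives readers the version, the data and the exact call in one place.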
Finally, here are three examples from the Revolution Technical Support archives that illustrate good and bad posts. Two are what Steve calls "pretty well framed support questions", and one is a question that lacks needed information.
#1 - a good post
A few folks here are trying to load the rJava library, they have JAVA_HOME set to their 64 bit java (1.6) but are getting this error.
call: inDL(x, as.logical(local), as.logical(now), ...)
error: unable to load shared library 'z:/R/win64-library/2.11/rJava/libs/x64/rJava.dll':
LoadLibrary failure: The specified module could not be found.
We get this when trying to load the library in the console line. I am sure we’re missing something, but we are not sure what.
What makes this a good post is that it provides information on the versions of Java and R being run and provides a complete error message.
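The post states the Java version, the JAVA_HOME setting and, through the library path, the R version. A minimal base R sketch that gathers the same kind of details for your own post might look like this; it only reports facts about the session and does not load rJava or diagnose the error.

```r
# Collect environment details worth including in a post about a
# shared-library load failure.
r_version    <- R.version.string
r_arch       <- R.version$arch                # e.g. "x86_64" on a 64-bit build
pointer_bits <- 8 * .Machine$sizeof.pointer   # 64 on 64-bit R, 32 on 32-bit R
java_home    <- Sys.getenv("JAVA_HOME")       # "" when the variable is unset

cat("R version:   ", r_version, "\n")
cat("Architecture:", r_arch, "(", pointer_bits, "bit )\n")
cat("JAVA_HOME:   ", if (nzchar(java_home)) java_home else "<not set>", "\n")
```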
#2 - a good post
Platform: Windows (32-bit)
I am working with a 1.9 GB SPSS data file with 99 variables and 4,684,587 cases. When I try to read the file into Revolution Analytics R using the following command:
inDataFileR3C <- "D:/2012 Base Year/RevolutionR/RandomVariables3C.sav"
reaValExtData <- rxImport(inData = inDataFileR3C, outFile = "D:/2012 Base Year/RevolutionR/RandomVariables3C.xdf", stringsAsFactors = TRUE, rowsPerRead = 50000)
I get the following error message:
Rows Read: 50000, Total Rows Processed: 2550000, Total Chunk Time: 12.152 seconds
Rows Read: 50000, Total Rows Processed: 2600000
Failed to allocate 15300000 bytes.
Error in rxCall("Rx_ImportDataSource", params) : bad allocation
However, if I break the SPSS data file into 2 parts, one with 2,300,000 cases and the second with 2,384,857 cases, both parts can be read into R successfully.
This post provides very specific information on the error involved and on what the user did to troubleshoot the problem.
#3 - not a good post
On a number of occasions I have been importing fairly large csv's (2 - 3 million rows). I know these are properly formatted (e.g. data is encapsulated by double quotes) and have row counts from using the wc command in a Unix environment. When I import these using rxImport, fewer rows are imported. Is there any reason why this might occur? No errors are reported and the job seems to complete successfully. Changing the number of rowsPerRead doesn't seem to make any difference.
Thanks in advance for any advice.
This question is missing the key information required to reproduce and troubleshoot the problem:
- How is the data file being imported delimited (comma-delimited CSV or something else)?
- What operating system is involved (Linux, Windows)?
- What version of R is running?
- What R code led to the problem? The post does not include an example.
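The missing code could have been a small, self-contained reproduction. The sketch below is hypothetical and uses base R's read.csv() in place of rxImport(), which requires Revolution R. It writes a tiny CSV, then reports both the number of physical lines in the file (the quantity wc -l counts) and the number of rows actually imported. As a generic illustration, not a diagnosis of this post, it includes a quoted field containing a line break, which makes the two counts differ.

```r
# Hypothetical reproduction: a small CSV with a known number of records.
csv_file <- tempfile(fileext = ".csv")
demo <- data.frame(
  id   = 1:3,
  note = c("plain", "has a\nline break", "plain")
)
write.csv(demo, csv_file, row.names = FALSE)  # character fields are quoted

# Physical lines in the file, header included: what `wc -l` reports.
line_count <- length(readLines(csv_file))

# Records as parsed by R: the quoted line break stays inside one field.
imported      <- read.csv(csv_file)
rows_imported <- nrow(imported)

cat("Lines in file (incl. header):", line_count, "\n")
cat("Rows imported:", rows_imported, "\n")
```

Posting a script like this, together with the real import call and its output, lets readers rerun the problem instead of guessing at it.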
Steve estimates that roughly 50% of the time, the support engineers at Revolution Analytics have to ask for more information. When you post a request for help, do your best to become part of the solution.