One of the great strengths of R is that it promotes reproducible research: as an open-source system, you can easily send a script file to a colleague with the confidence that they'll be able to get the same results using R on their own system. Provided they have the same data, that is. If that's your goal, you can always send along a data file, but that can add some complications. You have to take care with the filenames in your script, and dealing with data that changes regularly can be annoying.
Google Docs offers a solution. As long as you have a Google account, you can store your data as a Google Spreadsheet, and then create a special URL for that spreadsheet that can be used as a CSV file source by R. The process is a bit complex, but it only needs to be done once, and then your data are freely available to anyone who wants to access it via R.
First, change the permissions for your spreadsheet on the main Google Docs page (the one that lists all of your Google Documents: spreadsheets, documents, presentations, etc) and configure your spreadsheet so that it can be viewed by everyone. (I'm not sure if this is strictly necessary for the export, but it does give your collaborators the ability to view the data directly as a spreadsheet.)
- Open your spreadsheet in Google Docs.
- Click the "Share" botton in the toolbar, and choose "Anyone with the link" for Visibility options
Now anyone can view your spreadsheet in Google Docs using the link given in the "Share" dialog. But we want a link for the CSV export version of the spreadsheet, not the spreadsheet itself. Here's how to get that.
- Open your spreadsheet in Google Docs.
- Click the drop-down menu (down arrow) nexy to the Share button in the upper-right corner of the spreadsheet) and choose "Publish as a Web Page".
- For "Sheets to Publish" choose "All Sheets", and check the box "Automatically republish when changes are made" if you want to dynamically update the data for R when you edit it.
- Click "Start Publishing". This will activate the options in the box "Get a link to the published data", below.
- Change the export type from "Web Page" to "CSV (comma-separated values)".
- Change "All sheets" to "Sheet1" (or select the sheet you want to export)
- Change "All Cells" to the specific range you want to export, beginning with the header row. Use Excel-style notation, like "A1:C6" for the first 3 columns and the first 6 rows.
- Click "Republish now".
The Publish box should look something like this.
You can now use the URL in the bottommost box with the getURL function (you'll need to install the RCurl package first -- required because of the HTTPS connection), and then read the data with read.csv in R:
> require(RCurl) > myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AkuuKBh0jM2TdGppUFFxcEdoUklCQlJhM2kweGpoUUE&single=true&gid=0&output=csv") > read.csv(textConnection(myCsv)) x y z 1 1 0.3 10 2 2 0.5 14 3 3 1.1 12 4 4 0.1 1 5 5 1.9 0
Better yet, if I ever change the data in the Google spreadsheet, the command above will always retrieve the updated data (provided I chose the "Automatically republish" option, above).
I believe you can do something similar with the RGoogleDocs package (currently in beta and not yet on CRAN), but this process requires no additional packages for you or the recipient of your data.
Update Sep 15 2011: Google spreadsheets now publish using secure https:// URLs, which read.csv doesn't support directly. Updated the code above to use getURL from the RCurl package, which fixes the problem. Also updated the sharing details, which have changed in Google Docs since this article was first published.
I just discovered this last night and have been using it continuously. Great post. The advantage (maybe difference is a better word) of RGoogleDocs is that the spreadsheet does not have to be made public to access it.
Posted by: Greg | September 28, 2009 at 11:22
This is great. What we need is a R-specific URL shorter, similar tiny.url or tr.im. This service would take any gdocs spreadsheet or other web based spreadsheet or data source and make it into http://r.da.ta/smith_matrix_example_1, for example. Much easier to remember and share.
Posted by: Michael Rutter | September 28, 2009 at 16:18
This is just way too cool. Aside from yet another reason to use google documents, it reminded me why google documents are worth using to begin with. I have two clouds: one I pay for, and one I get completely free. Guess which one is really the more useful. Duh. Thanks for the slap upside the head.
Posted by: John R. Vokey | September 28, 2009 at 22:52
David
I've been using Google docs to make my R scripts and source data files available to interested readers for a few months now. Works great.
Here's a link to a post that shows the R generated chart and gives links to my R script and source files.
It's a good way to provide reproducible research resources. And the cost is free.
Posted by: D Kelly O'Day | September 29, 2009 at 06:34
I am looking for Michael Rutter who is from Texas and attended Lon Morris. If he is out there, we are all looking fo him because there is a reunion in Feb. Mike, if this is you, please look on the LMC alum page or call Norm..............Sue Godsey
Posted by: Sue Godsey | October 21, 2009 at 13:50
I export my spreadsheets from Google Docs this way using CSV files.
Posted by: Mike | August 06, 2010 at 08:29
This service would take any gdocs spreadsheet or other web based spreadsheet or data source and source data files available to interested readers for a few months now. Works great.
Posted by: tibia gold | September 29, 2010 at 01:09
This is very informative and it has worked for me in the past. I have recently gone back to some old code that uses this approach, created a new spreadsheet and... now I get this error:
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'http://spreadsheets.google.com/pub?hl=en&hl=en&key=0Ah58q5aXNtBOdEUwVm12SDlTTGJYQ0FsV3dBc255Z1E&single=true&gid=0&output=csv'
... I wonder if Google has changed something. Here is a spreadsheet that works by setting up the csv the way that's mentioned here:
https://spreadsheets.google.com/ccc?key=0Ah58q5aXNtBOdEpBLVhuTGJocUtsRFVNbjBlRWVXMGc&hl=en
... and here's the new one that doesn't work:
https://spreadsheets.google.com/ccc?key=0Ah58q5aXNtBOdEUwVm12SDlTTGJYQ0FsV3dBc255Z1E&hl=en&authkey=COrukusN
... any help would be greatly appreciated. I am baffled.
Posted by: Dan Bowen | March 15, 2011 at 11:44
sorry... here is the one that works:
https://spreadsheets.google.com/ccc?key=0Ah58q5aXNtBOdEpBLVhuTGJocUtsRFVNbjBlRWVXMGc&hl=en&authkey=CIDHgd8D
... my apologies - the permissions had been changed
Posted by: Dan Bowen | March 15, 2011 at 11:45
I can't get these to work anymore. The https:// protocol isn't supported with the read.csv function. I can't figure how to get google docs to give only an http:// link, instead of the https:// link. Can anyone else get this to work?
Posted by: Scott | April 04, 2011 at 17:47
Hi,
I have the same problem - used to be able to use this fine (actually more than fine), but for the last year or so the google forms have been https
Paul
Posted by: Paul Hewson | September 15, 2011 at 12:11
read.csv doesn't support HTTPS directly, but you can use the getURL function from the RCurl package to create a HTTPS-based connection. Updated the article accordingly -- thanks!
Posted by: David Smith | September 15, 2011 at 13:28
Thanks David, this helps a lot. But it does point out one of the risks of posting your data on Google (and perhaps, a risk of the "cloud" in general): they can change things at any time, without notice.
Posted by: John Shonder | September 15, 2011 at 15:35
Here's another useful tool to link up R and Google products:
http://code.google.com/p/google-motion-charts-with-r/
It allows you to take data from R and graph it using the Google Motion charts API, without having to learn javascript!
It's a great tool, with a new version released just a few weeks ago (and available through CRAN, I believe).
Posted by: Zachary | September 16, 2011 at 07:53
Hello there, I tried you tutorial but it is not working. I keep receiving the "SSL Certificate Error" when I try the R comand. What could I be doing wrong?
Posted by: Joao | September 22, 2011 at 19:53
Run this command before read.csv to get rid of SSL error:
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
Posted by: Sanjay | October 04, 2011 at 17:18
David,
Thanks a million for the reading https update.
Posted by: Paul Hewson | November 01, 2011 at 09:13
You're welcome Paul, glad it was helpful.
Posted by: David Smith | November 01, 2011 at 09:17
The command setInternet2(use = TRUE)
makes it possible to open https connection like
one=read.csv("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=xxxxxxxxxxxxxx....xxxxx")
Posted by: Eugene | November 02, 2011 at 17:34
Only problem with this is that there is an un-specified delay between when you make changes to a spreadsheet in google docs, and when the changes appear in the published CSV. The automatically publish option can sometimes take 5 minutes to 30 minutes. Really annoying to have to go manually republish every time a tiny change is made. There has got to be a way to get the actual doc rather than a secondary published version.
Posted by: Jon | November 16, 2011 at 08:25
I also had issues with SSL and want to encourage the author or maintainer of this article to include something about the options that Joao shared:
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
Thanks!
Posted by: Aaryn | March 28, 2012 at 11:40
I suppose it probably should go without saying--but is still worth being said--that commas in the table cells will cause read.csv to choke.
Posted by: Aaryn | March 28, 2012 at 11:41
#Aaryn , thanks, you definitely solved the problem. working perfectly now. as for the commas, that's one of the first things i look for when i get an import error, good tip
Posted by: Ricardo | May 23, 2012 at 04:40
Instead of an Excel Spreadsheet, let's say I have a function or some code I want to read. How would you for example read all this code stored on google docs?
https://docs.google.com/document/edit?id=1f11rcVs9AtVcgOyiP0lV04yq9hshVDIPx93lA0ItFvQ
I've tried
source('https://docs.google.com/document/edit?id=1f11rcVs9AtVcgOyiP0lV04yq9hshVDIPx93lA0ItFvQ')
but it's not working
any ideas?
Posted by: Marco | August 09, 2012 at 06:08