At the New York R User Group* last night, 100 R users heard Ni Wang and Max Lin explain how "R is one of the important tools used by analysts and engineers at Google for analyzing data". During the talk, Lin revealed that Google plans to make "R more integrated with internal machine learning algorithms and infrastructure", and one component of that plan was announced at the meeting: a new library for R to build and score models using the Google Prediction API.
The Google Prediction API is a black-box system for building predictive models. Given a set of training data (a set of continuous and/or categorical explanatory variables and a dependent variable), Google's algorithms automatically select from several available machine learning techniques to create a model from the training data. Later, given a new set of explanatory variables, you can predict the value of the dependent variable under this model.
Now with the googlepredictionapi R package (which you can download from Google Code), you can create such models based on data stored in a local CSV file or in the Google Storage system. The model is represented as an object in R, which you can then use to make predictions using the standard predict function, as illustrated in the following code:
## Make a training call to the Prediction API against data in the Google Storage.
## Replace MYBUCKET and MYDATA with your data.
my.model <- PredictionApiTrain(data="gs://MYBUCKET/MYDATA")

## Alternatively, make a training call against training data stored locally as a CSV file.
## Replace MYPATH and MYFILE with your data.
my.model <- PredictionApiTrain(data="MYPATH/MYFILE.csv")

## Read the summary of the trained model
summary(my.model)

## Make a prediction call for text data using the trained model
predict(my.model, "This is a new piece of text")

## Similarly, predict() works for numeric features
predict(my.model, c(6, 3, 5, 2))
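If your training data is sitting in an R data frame rather than in a file, it has to be written out as CSV first. Here is a minimal sketch in base R. It rests on an assumption worth checking against the Prediction API documentation: that the training file has no header row and carries the dependent variable in the first column. The helper name write_training_csv is made up for illustration and is not part of the googlepredictionapi package.

```r
## Hypothetical helper (NOT part of googlepredictionapi): write a data frame
## as a header-less CSV with the dependent variable in the first column.
## ASSUMPTION: this is the layout the Prediction API expects for training data.
write_training_csv <- function(df, target, path) {
  stopifnot(is.data.frame(df), target %in% names(df))
  ## Move the dependent variable to the front; keep the other columns in order
  ordered <- df[, c(target, setdiff(names(df), target)), drop = FALSE]
  write.table(ordered, file = path, sep = ",",
              row.names = FALSE, col.names = FALSE, qmethod = "double")
  invisible(path)
}

## Example with a built-in data set: Species is the dependent variable
csv.path <- file.path(tempdir(), "iris_training.csv")
write_training_csv(iris, target = "Species", path = csv.path)

## Inspect the first couple of rows of the file that was written
readLines(csv.path, n = 2)
```

The resulting path is what you would pass as the data argument for a local file.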
You need to request access to the Google Prediction API to use this package (instructions on how to request access are here). Has anyone tried this out yet? Given that all the standard statistical (as distinct from machine learning) models are in R, this package would make it easy to compare the performance of the automated Prediction API with more traditional statistical techniques.
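One way such a comparison could be set up is to hold out part of the data, fit a conventional model on the rest, and score both approaches on the same held-out rows. The sketch below does this in base R with an ordinary linear regression on the built-in mtcars data. Only the lm() baseline actually runs; the Prediction API side is left as comments because it needs API access, and the format of what predict() returns for an API model is an assumption, not something confirmed here.

```r
## Holdout comparison sketch: the lm() baseline runs as written, the
## Prediction API part is commented out because it requires API access.
set.seed(42)                                  # reproducible split
test.rows <- sample(nrow(mtcars), 10)         # hold out 10 of the 32 rows
train <- mtcars[-test.rows, ]
test  <- mtcars[test.rows, ]

## Root mean squared error of predictions against observed values
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

## Traditional baseline: ordinary least squares on the training rows
baseline <- lm(mpg ~ wt + hp, data = train)
baseline.rmse <- rmse(test$mpg, predict(baseline, newdata = test))

## Prediction API side (ASSUMPTION, untested: uses only the two calls shown
## in the post, with the training rows already uploaded to Google Storage,
## and assumes predict() returns something as.numeric() can convert):
##   api.model <- PredictionApiTrain(data = "gs://MYBUCKET/MYDATA")
##   api.pred  <- apply(test[, c("wt", "hp")], 1,
##                      function(x) as.numeric(predict(api.model, x)))
##   api.rmse  <- rmse(test$mpg, api.pred)

baseline.rmse
```

Because both models would be scored on exactly the same held-out rows, the two RMSE values are directly comparable.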
[*] The New York R User Group is proudly sponsored by Revolution Analytics.
New York R User Group: R at Google (via)
Hi David, have you tested this package? I always get an error when I try to train on my local data with my.model <- PredictionApiTrain(data="MYPATH/MYFILE.csv"). It says:
Error in PredictionApiTrain(data = "test.csv") : 'remote.file' should be character
That seems weird: I put test.csv in my R working directory, and I can read it with read.table(). Any idea? Thanks.
Posted by: Quant | December 13, 2010 at 16:22
Sorry, haven't tried it myself (I need to apply for access to the API). Perhaps others can share their experiences...
Posted by: David Smith | December 13, 2010 at 16:26
Thanks anyway. I checked the code of PredictionApiTrain just now, and at the beginning there are these lines:
if (missing(remote.file) || !is.character(remote.file))
stop("'remote.file' should be character")
However, remote.file is an argument without a default value: function (data, remote.file, verbose = FALSE).
Posted by: Quant | December 13, 2010 at 16:31
@Quant, please add remote.file when you call PredictionApiTrain() with a training file stored locally.
## Replace MYPATH and MYFILE with your local training file, and MYBUCKET with your own Google Storage bucket.
my.model <- PredictionApiTrain(data="MYPATH/MYFILE.csv", remote.file="gs://MYBUCKET/MYOBJECT")
Posted by: Max Lin | December 15, 2010 at 07:27
I have the same issue with my.model <- PredictionApiTrain(data="MYPATH/MYFILE.csv"). How can I resolve this? I've been looking for an answer but I can't find one. Hopefully you can help me. Thanks.
Posted by: Pamela Dickerson | February 23, 2012 at 10:28
I have loaded the libraries:
>library(rjson)
>library(RCurl)
>library(googlepredictionapi)
my.model <- PredictionApiTrain(data="gs:// folder name/file_name.csv")
output is:
> ~/.auth-token does not exist, let's create one for you
Please input your email account to access Google Prediction API
(eg. [email protected]): Please input your password for this account
: Requesting Authentication from Google ClientLogin for user:
Error in strsplit(url, "")[[1L]] : subscript out of bounds.
In what format should I enter the Gmail user ID and password? This is very confusing.
Please help me out. Thanks a lot.
Posted by: Ramesh | April 01, 2013 at 08:31