Josh Reich has created a "statistical learning web service" using R. The basic idea is that you can visit predict.i2pi.com and upload a data set (in CSV format). The only meta-information you provide is which variables in the data set are predictors and which are responses. The service will then choose a statistical model, estimate it, and return the model's predictions for the response variables. You can leave some of the response values as NA (missing) to create a prediction set; rows with response values will act as the training set.
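To make the NA convention concrete, here is a minimal sketch in Python of how an uploaded CSV could be split into a training set and a prediction set. This is not the service's actual code (which is written in R); the function name `split_training_and_prediction`, the column names `x1`, `x2` and `y`, and the sample values are all hypothetical and chosen only for illustration.

```python
import csv
import io


def split_training_and_prediction(csv_text, response):
    """Split rows by whether the response column holds a value or NA.

    Hypothetical helper: rows whose response is the literal string "NA"
    go to the prediction set; all other rows go to the training set.
    """
    training, to_predict = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[response].strip() == "NA":
            to_predict.append(row)
        else:
            training.append(row)
    return training, to_predict


# Made-up data: x1 and x2 are predictors, y is the response.
example = """x1,x2,y
1.0,2.0,3.1
2.0,0.5,2.4
3.0,1.5,NA
4.0,2.5,NA
"""

training, to_predict = split_training_and_prediction(example, "y")
print(len(training), "training rows;", len(to_predict), "rows to predict")
```

With the sample data above, the first two rows would be used for fitting and the last two would receive predictions.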
The model estimation is implemented in R, and currently covers a range of common classification and regression methods. Better yet, the system is extensible: you can provide new models (including transformations of the variable space) as R code, and Josh will incorporate them into the suite of models that are tested on uploaded data sets. R has a wealth of machine learning algorithms to draw on, so I'd expect the range of methods to expand significantly over time. The details on how models are evaluated and chosen, and how new models are added to the system, can all be found at the i2pi blog (along with some good discussions of the engineering, performance and security implications that follow).
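The idea of testing a suite of models on each data set and keeping the best one can be sketched as follows. The post does not say how the service scores its models, so this example assumes k-fold cross-validation with mean squared error, and it uses two toy candidates (a constant mean predictor and a one-variable least-squares line) in place of the real R model suite. All names here (`fit_mean`, `fit_line`, `cv_mse`, `choose_model`, `CANDIDATES`) are hypothetical.

```python
def fit_mean(xs, ys):
    """Candidate 1: always predict the mean of the training responses."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # all x identical: fall back to the mean
        return lambda x: mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cv_mse(fit, xs, ys, k=3):
    """Mean squared error of `fit` under k-fold cross-validation."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th row is held out
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit(train_x, train_y)
        total += sum((model(xs[i]) - ys[i]) ** 2 for i in held_out)
    return total / n


# Assumed registry of candidates; a contributed model would be added here.
CANDIDATES = {"mean": fit_mean, "line": fit_line}


def choose_model(xs, ys):
    """Score every candidate and return the name with the lowest error."""
    scores = {name: cv_mse(fit, xs, ys) for name, fit in CANDIDATES.items()}
    return min(scores, key=scores.get), scores


# Made-up, roughly linear training data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
best, scores = choose_model(xs, ys)
print("chosen model:", best)
```

Keeping the candidates in a plain dictionary mirrors the extensibility described above: a new model only needs a fitting function that returns a predictor, and it is then scored alongside the others.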
More than anything, I think this provides an excellent example of integrating R analytics into a web-based application. As an experiment in machine learning, color me intrigued: it will be interesting to see whether this becomes a practical and useful service for predicting from data without human intervention. If so, I await the howls of protest from data miners, echoing similar howls from statisticians (vis-à-vis data mining) at the growth of data mining 20 years ago.
Ah. So this explains the new flow of incoming datasets. Thanks for the write-up.
For those of you who have recently uploaded data, I'm looking to secure some more computing resources. Currently the back end is running on a heavily taxed server.
Thanks again.
Posted by: josh | June 22, 2009 at 21:47