
August 22, 2013



Perhaps this is obvious, but if you need a big data set to play with, a solution is to generate the data set randomly. Specify a multivariate stochastic process and draw random samples for each variable. The size of your hard drive is the limit.
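As a rough illustration of that suggestion, here is a minimal sketch in Python using only the standard library. It assumes the simplest possible "multivariate process": two standard normal variables with a chosen correlation. The function names, the `rho` parameter, and the CSV layout are all made up for this example; a real use case would pick its own distributions and columns.

```python
import csv
import math
import random


def generate_correlated_rows(n_rows, rho=0.6, seed=42):
    """Yield (x, y) pairs from a bivariate standard normal with correlation rho.

    Hypothetical helper for illustration: y is built from x plus independent
    noise, which is the 2-D case of a Cholesky factorization.
    """
    rng = random.Random(seed)  # private generator so results are reproducible
    noise_scale = math.sqrt(1.0 - rho * rho)
    for _ in range(n_rows):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        yield z1, rho * z1 + noise_scale * z2


def write_dataset(path, n_rows, rho=0.6, seed=42):
    """Stream n_rows samples to a CSV file without holding them in memory."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])  # header row
        for x, y in generate_correlated_rows(n_rows, rho, seed):
            writer.writerow([x, y])
```

Calling something like `write_dataset("big.csv", 100_000_000)` would then produce a file as large as disk space allows; because rows are generated and written one at a time, memory use stays flat regardless of the row count.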

For economic data:


Great NYC datasets available at

The Lahman package is also pretty nice if you want a set of larger, interlinked tables.

