August 22, 2013

Comments

Perhaps this is obvious, but if you need a big data set to play with, one solution is to generate it randomly: specify a multivariate stochastic process and draw random samples for each variable. The only limit is the size of your hard drive.
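As a rough illustration of that idea, here is a minimal Python sketch that streams draws from a two-variable normal process to a CSV file in chunks, so the data set can be far larger than memory. The function name `write_correlated_normals`, the column names `x` and `y`, the correlation `rho` and the chunk size are all assumptions made up for this example; any other stochastic process could be substituted.

```python
import csv
import math
import random


def write_correlated_normals(path, n_rows, rho=0.6, seed=42, chunk_size=10_000):
    """Stream n_rows draws from a bivariate standard normal with
    correlation rho to a CSV file. Returns the number of rows written."""
    rng = random.Random(seed)  # seeded so the data set is reproducible
    # Cholesky factor of [[1, rho], [rho, 1]] is [[1, 0], [rho, sqrt(1 - rho^2)]],
    # so y = rho * z1 + scale * z2 has unit variance and correlation rho with x = z1.
    scale = math.sqrt(1.0 - rho * rho)
    written = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])  # hypothetical column names
        while written < n_rows:
            # Build one chunk in memory, write it, then discard it.
            batch = min(chunk_size, n_rows - written)
            rows = []
            for _ in range(batch):
                z1 = rng.gauss(0.0, 1.0)
                z2 = rng.gauss(0.0, 1.0)
                rows.append((z1, rho * z1 + scale * z2))
            writer.writerows(rows)
            written += batch
    return written
```

Calling, say, `write_correlated_normals("sim.csv", 10_000_000)` would produce a ten-million-row file; because only one chunk is held in memory at a time, disk space is the practical limit, as the comment says.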

For economic data:

http://www.aeaweb.org/RFE/toc.php?show=complete
http://research.stlouisfed.org/fred2/

Great NYC datasets available at
http://www.nyc.gov/html/dcp/html/bytes/applbyte.shtml#lion

The Lahman package for R (baseball statistics) is also pretty nice if you want a set of larger interlinked tables.

