DataMarket, a portal that provides access to more than 14,000 data sets from various public and private sector organizations, has more than 100 million time series available for download and analysis. (Check out this presentation for more info about DataMarket.) And now, with the new rdatamarket package, it's trivially easy to import those time series into R for charting, analysis, or anything else. Here's what you need to do:
- Register an account on DataMarket.com (it's free)
- Install the rdatamarket package in R with install.packages("rdatamarket")
- Browse DataMarket.com for a time series of interest (I found this series on unemployment)
- Copy the URL of the page you're on (the short URL works too, I used "http://data.is/qb61uf")
- Use the dmseries function with the URL to extract the time series as a zoo object
Here's an example:
> library(rdatamarket)
> dminfo("http://data.is/qb61uf")
Title: "Persons Unemployed 15 weeks or longer, as a percent of the civilian labor force"
Provider: "Federal Reserve Bank of St. Louis" (citing "U.S. Department of Labor: Bureau of Labor Statistics")
Dimensions:
> unemp <- dmseries("http://data.is/qb61uf")
> plot(unemp)
> str(unemp)
‘zoo’ series from Jan 1948 to Jul 2011
  Data: num [1:763, 1] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:763] "1" "2" "3" "4" ...
  ..$ : chr "Persons.Unemployed.15.weeks.or.longer..as.a.percent.of.the.civilian.labor.force"
  Index: Class 'yearmon' num [1:763] 1948 1948 1948 1948 1948 ...
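Once the series is a zoo object, the usual zoo tools apply. Here's a rough sketch of two common next steps: trimming to a sub-period and collapsing monthly values to annual means. So that it runs without a network connection, it builds a stand-in object with the same shape as the str() output above (763 monthly observations, yearmon index, one column). The values and the column name pct_unemployed_15wk are made up for illustration; they are not the real DataMarket data.

```r
library(zoo)

# Stand-in for what dmseries() returned above: a one-column zoo matrix
# indexed by yearmon. Values and column name are invented for illustration.
idx <- as.yearmon(1948 + (0:762) / 12)   # Jan 1948 through Jul 2011
set.seed(1)
unemp <- zoo(
  matrix(round(runif(763, 0.5, 4), 1), ncol = 1,
         dimnames = list(NULL, "pct_unemployed_15wk")),
  order.by = idx
)

# Keep only observations from January 2000 onward
recent <- window(unemp, start = as.yearmon("2000-01"))

# Collapse monthly values to one mean per calendar year;
# a yearmon is stored as year + (month - 1)/12, so floor() gives the year
annual <- aggregate(unemp, by = function(i) floor(as.numeric(i)), FUN = mean)

plot(annual, xlab = "Year", ylab = "Annual mean (%)")
```

With a real series, you would skip the stand-in and apply window() and aggregate() directly to the object returned by dmseries().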
With this package, you can go from finding interesting data on DataMarket to working with it in R in less than a minute. With such a wealth of data so easily available to the power of R, this will be a fantastic tool for all data scientists and data journalists.
CRAN: rdatamarket package
Thanks for the write-up and kind words, David. More information about this will be coming soon to our blog, but we welcome any early feedback, ideas and questions.
Oh, and the 100 million is old news. We're at about 130 million time series now, with approximately 17,500 data sets already published ;)
Posted by: Hjalmar Gislason, founder and CEO DataMarket | August 26, 2011 at 02:19
Scratch that, the latest number of available time series is just over 107 million (from approx 17,500 data sets). Still a decent amount of data to play with :)
Posted by: Hjalmar Gislason, founder and CEO DataMarket | August 26, 2011 at 07:05
How free is the data?
More specifically: could I use a small dataset I found via DataMarket as an example dataset in my CRAN package? With attribution, of course, but *NOT* fetched through an explicit internet connection and download from DataMarket.
Posted by: Martin Maechler | August 27, 2011 at 02:07
@Martin, Hjalmar can probably provide a better answer than I can, but it looks like each dataset has its own license. For example, the unemployment data set from this post has the following statement: "You are allowed to copy and redistribute the data as long as you clearly indicate the data provider (Federal Reserve Bank of St. Louis) as the original source". So it seems like you could download that one, at least, and use it in a CRAN package with attribution.
Posted by: David Smith | August 27, 2011 at 09:05
This article went ahead and made my day.
Posted by: Cheyenne | February 01, 2012 at 01:02