Many public agencies release data as fixed-width format (FWF) ASCII files. But with the data all packed together without separators, you need a "data dictionary" defining the column widths (and metadata about the variables) to make sense of them. Unfortunately, many agencies make such information available only as a SAS script, with the column information embedded in the INPUT statement of a DATA step.
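To make the problem concrete, here is a minimal sketch in base R of why the widths matter. The records, column names, and widths are all made up for illustration; they do not come from any real agency file.

```r
# Three records packed together with no separators (made-up data).
raw_lines <- c(
  "001CA2345",
  "002NY0678",
  "003TX1290"
)

# The "data dictionary": column names and widths (hypothetical layout).
dictionary <- data.frame(
  name  = c("id", "state", "income"),
  width = c(3, 2, 4),
  stringsAsFactors = FALSE
)

# read.fwf() needs the widths to know where each field starts and ends.
con <- textConnection(raw_lines)
survey <- read.fwf(
  con,
  widths     = dictionary$width,
  col.names  = dictionary$name,
  colClasses = c("character", "character", "numeric")
)
close(con)

print(survey)
```

Without the `widths` vector there is no way to tell where `id` ends and `state` begins, which is exactly the information that tends to be locked inside the SAS script.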
To solve this problem, R user Anthony Damico (who's also the energetic voice behind the excellent R Twotorials series) created the SAScii package. It parses the SAS script (or even an unstructured text instructions file that includes a SAS INPUT statement) and uses that information to read the associated fixed-format ASCII file. Just provide the name or URL of the script/instructions file and the data file, and SAScii does the rest. There are several examples in the read.SAScii help file, including the R commands to read the following public data sets:
- 2009 Medical Expenditure Panel Survey Emergency Room Visits File (Medical Expenditure Panel Survey)
- 2010 National Health Interview Survey Persons file (CDC)
- IPUMS - American Community Survey Extract (Minnesota Population Center)
- 2008 Survey of Income and Program Participation Wave 1 (US Census)
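For a rough idea of what turning a SAS script into column positions involves, here is a deliberately simplified sketch in base R. It is not SAScii's actual implementation: the helper names `parse_input_block` and `read_fixed` are hypothetical, the SAS script and data are made up, and the sketch assumes the layout sits in a single SAS INPUT statement that uses only simple `name start-end` column specs (with `$` marking character variables).

```r
# A tiny SAS-style script (made up for illustration).
sas_script <- c(
  "DATA survey;",
  "  INFILE 'survey.dat';",
  "  INPUT",
  "    id     $ 1-3",
  "    state  $ 4-5",
  "    income   6-9",
  "  ;",
  "RUN;"
)

# Hypothetical helper: pull "name [$] start-end" specs out of the INPUT block.
parse_input_block <- function(script_lines) {
  script <- paste(script_lines, collapse = " ")
  # Keep only the text between the INPUT keyword and the next semicolon.
  block <- sub("^.*\\bINPUT\\b([^;]*);.*$", "\\1", script,
               ignore.case = TRUE, perl = TRUE)
  pattern <- "([A-Za-z_][A-Za-z0-9_]*)\\s*(\\$?)\\s*([0-9]+)\\s*-\\s*([0-9]+)"
  specs <- regmatches(block, gregexpr(pattern, block))[[1]]
  parts <- do.call(rbind, regmatches(specs, regexec(pattern, specs)))
  data.frame(
    name    = parts[, 2],
    is_char = parts[, 3] == "$",
    start   = as.integer(parts[, 4]),
    end     = as.integer(parts[, 5]),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: cut each record at the parsed column positions.
read_fixed <- function(data_lines, layout) {
  cols <- lapply(seq_len(nrow(layout)), function(i) {
    field <- trimws(substring(data_lines, layout$start[i], layout$end[i]))
    if (layout$is_char[i]) field else as.numeric(field)
  })
  names(cols) <- layout$name
  as.data.frame(cols, stringsAsFactors = FALSE)
}

data_lines <- c("001CA2345", "002NY0678", "003TX1290")  # made-up records
layout <- parse_input_block(sas_script)
survey <- read_fixed(data_lines, layout)
print(layout)
print(survey)
```

Real SAS scripts use many more input styles than this sketch handles, which is the work the package takes on. With SAScii installed, the corresponding call takes the form `read.SAScii(fn, sas_ri)`, where `fn` is the data file and `sas_ri` is the SAS instructions file; see the read.SAScii help file for the working examples.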
Many thanks to Anthony for creating this package and pointing it out to me at the useR! 2012 conference last month. It unlocks many useful public data sets for those of us without SAS licenses.
SAScii package: read.SAScii documentation
David,
This is so very awesome. I thought I was in your debt before, now I am in pretty deep. :)
Hopefully the attention will ultimately spur these survey administrators within the government to release more technical examples using the R language. And having feedback / bug reports from anyone else won't hurt. ;)
In case anyone else is interested, I've uploaded my current work on the project to https://github.com/ajdamico/SAScii
And I'm also starting work on a repository of R scripts titled, 'Importation and Analysis of US Government Survey Data with R' at https://github.com/ajdamico/usgsd
Thank you again!!
Anthony
Posted by: Anthony Damico | July 10, 2012 at 14:53
Very cool. I've run across this a number of times, and it (until now) was a true PITA.
Posted by: John Johnson | July 10, 2012 at 17:42
great work Anthony!
Posted by: nikhil | July 11, 2012 at 11:59