The latest in the O'Reilly "Short Cuts" series, and the first devoted to R, is Data Mashups in R. Written by Jeremy Leipzig and Xiao-Yi Li, this 30-page article is an excellent and very practical example of integrating messy data from varied sources, using R or REvolution R. The steps it covers include:
- Downloading an HTML file of foreclosed addresses from a public website (with download.file) and extracting addresses in messy formats (with grep);
- Downloading geolocation data from a Yahoo web service and parsing the XML result (with xmlTreeParse);
- Downloading an ESRI shape file of Philadelphia and its census tracts, and plotting a map (with the maptools package);
- Matching the individual addresses to census tracts and counting the number of foreclosures in each (with plotPolys).
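
To give a flavor of the first and last steps, here is a minimal sketch in base R. It is not taken from the article: the HTML lines, the address pattern, and the tract labels are all made up for illustration, and the download, geocoding and map-matching steps are skipped so the snippet runs without a network connection or extra packages.

```r
# Hypothetical lines standing in for a downloaded HTML page of foreclosure listings.
html_lines <- c(
  "<tr><td>1234 Market St</td><td>Sheriff Sale</td></tr>",
  "<p>No listings on this row</p>",
  "<tr><td> 56 N. Broad Street </td><td>Sheriff Sale</td></tr>",
  "<tr><td>789 S 4th Ave</td><td>Postponed</td></tr>"
)

# Illustrative pattern only: a street number, some words, then a street-type suffix.
address_pattern <- "[0-9]+ [A-Za-z0-9. ]+ (Street|St|Avenue|Ave|Road|Rd)"

# grep() keeps only the lines that contain something address-like.
address_lines <- grep(address_pattern, html_lines, value = TRUE)

# regexpr() locates the first match in each line; regmatches() extracts it.
addresses <- regmatches(address_lines, regexpr(address_pattern, address_lines))

# Hypothetical tract labels, as if each address had already been matched
# to a census tract in the map-matching step.
tracts <- c("Tract 1", "Tract 2", "Tract 1")

# table() counts how many foreclosures fall in each tract.
foreclosures_per_tract <- table(tracts)
```

Real listings are far messier than this toy pattern allows for, which is exactly why the article's worked example is worth reading.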