Oh my, that was fast! Less than 24 hours after the Choropleth Map Challenge was laid down, no fewer than 5 hackers responded with complete solutions for plotting the US unemployment data on a color-coded map, each in less than 20 lines of R code.
Overall, the results were very close to the FlowingData original. There were some data-matching problems, which were solved either by using a shape file and matching FIPS codes, or by doing some data cleaning to match by county name.
I'm always amazed by the resourcefulness and ingenuity of R programmers. Each solution takes a slightly different approach to finding and loading the map data, and matching it to the census data. Some are geared more towards exploration of the data, and some more towards presentation. But all solve the problem effectively, and in my opinion much more straightforwardly than in Python. Great job, all.
Anyway, my discussion of the various solutions is below the break.
R graphics guru Hadley Wickham was first out of the gate with a solution that relied, naturally, on ggplot2. After reading in the CSV file, he extracted the map itself from the maps library (with the handy map_data extraction tool from ggplot2). He then merged the map data with the unemployment data by county and state, and used ggplot2 to create the chart:
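# county_df (county map data) and unemp (unemployment rates) come from the data prep code in the complete script
state_df <- map_data("state")
# Join the map data to the rates, then restore the polygon drawing order
choropleth <- merge(county_df, unemp, by = c("state", "county"))
choropleth <- choropleth[order(choropleth$order), ]
# Discretise the rate into six bins for the Brewer colour scale
choropleth$rate_d <- cut(choropleth$rate, breaks = c(seq(0, 10, by = 2), 35))
ggplot(choropleth, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = rate_d), colour = alpha("white", 1/2), size = 0.2) +
  geom_polygon(data = state_df, colour = "white", fill = NA) +
  scale_fill_brewer(pal = "PuRd")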
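The cut() call in that script is what turns a continuous rate into the handful of color classes on the map. Here's a minimal sketch of what it does, using made-up rates (the `rate` vector is purely illustrative, not taken from the real data):

```r
# Hypothetical unemployment rates (percent); not from the real data set.
rate <- c(1.5, 2.0, 4.3, 9.9, 10.0, 12.7, 30.1)

# Same breaks as in the script above: five 2-point bins, then 10 to 35.
breaks <- c(seq(0, 10, by = 2), 35)

# cut() returns a factor. Intervals are open on the left and closed on the
# right by default, so a rate of exactly 2.0 lands in (0,2], not (2,4].
rate_d <- cut(rate, breaks = breaks)

print(levels(rate_d))
print(table(rate_d))
```

One thing to watch with these defaults: a rate of exactly 0 would fall outside the first interval and become NA unless `include.lowest = TRUE` is passed.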
You can download the complete script (including 8 lines of data prep code) from his GitHub page (with some bonus representations, too), but the final chart is below:
The chart's not perfect: there are a few mismatches between the county names in the map data (e.g. "winn") and the unemployment data ("winn parish, la"), so for example the entire state of Louisiana is missing.
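To see why a name mismatch makes counties vanish rather than raise an error, here's a small sketch with two made-up tables standing in for the map data and the unemployment data. The rows and rates are hypothetical, and the suffix-stripping pattern is just one possible clean-up, not the fix any of the entrants necessarily used:

```r
# Hypothetical miniature versions of the two tables; values are made up.
county_df <- data.frame(
  state  = c("LA", "LA", "TX"),
  county = c("winn", "caddo", "travis"),
  stringsAsFactors = FALSE
)
unemp <- data.frame(
  state  = c("LA", "LA", "TX"),
  county = c("winn parish, la", "caddo parish, la", "travis"),
  rate   = c(11.2, 8.4, 6.1),
  stringsAsFactors = FALSE
)

# merge() does an inner join by default, so rows without a partner are
# silently dropped. Here only the Texas row survives.
inner <- merge(county_df, unemp, by = c("state", "county"))
print(nrow(inner))

# Keeping all map rows exposes the unmatched counties as NA rates.
outer <- merge(county_df, unemp, by = c("state", "county"), all.x = TRUE)
unmatched <- outer[is.na(outer$rate), c("state", "county")]
print(unmatched)

# One possible clean-up: strip the " parish, la" style suffix, then re-merge.
unemp$county <- sub(" parish, [a-z]{2}$", "", unemp$county)
fixed <- merge(county_df, unemp, by = c("state", "county"), all.x = TRUE)
print(fixed)
```

Merging with `all.x = TRUE` first and listing the NA rows is a quick way to find out which names still need cleaning before drawing anything.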
Another strong entry came from Barry Rowlingson. He avoids the data-matching problem Hadley encountered by downloading shape files from the US Census site, so his map data includes FIPS codes that can be matched directly with the unemployment data. He uses readOGR from the rgdal package to read in the shape file, then merges on the FIPS code.
county$fips = paste(county$STATE,county$COUNTY,sep="")
m = match(county$fips,unem$fips)
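Here's a self-contained sketch of that paste-and-match step on toy data. The two data frames are hypothetical stand-ins for the shapefile attributes and the unemployment table, and I'm assuming the state and county codes arrive as zero-padded strings:

```r
# Hypothetical shapefile attributes; STATE and COUNTY are zero-padded strings.
county <- data.frame(
  STATE  = c("22", "22", "48"),
  COUNTY = c("127", "017", "453"),
  stringsAsFactors = FALSE
)

# Hypothetical unemployment table keyed by the 5-digit FIPS code.
unem <- data.frame(
  fips = c("48453", "22127"),
  rate = c(6.1, 11.2),
  stringsAsFactors = FALSE
)

# Build the 5-digit key: 2-digit state code followed by 3-digit county code.
county$fips <- paste(county$STATE, county$COUNTY, sep = "")

# match() gives, for each polygon, the row of unem with the same FIPS code,
# or NA when the unemployment table has no such county.
m <- match(county$fips, unem$fips)
county$rate <- unem$rate[m]

print(county)
```

If the codes had been read in as numbers, the leading zeros would be lost; something like `sprintf("%02d%03d", state, county)` on integer codes would restore the padding before matching.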
After using RColorBrewer and his own colourscheme package to replicate the original color scheme, it's simply a matter of making the plot:
Note that the Census shapefile has Hawaii, Alaska and Puerto Rico in their correct geographic locations, so he's plotted just the lower 48 to keep the map from being tiny. Using the FIPS codes makes for much better data matching (but even then, there are a few counties that don't match up). Barry also notes that the border lines are a bit heavy in this version, taken from a PDF. Here's Barry's code.
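Restricting a map to the lower 48 is easy once state FIPS codes are available. This is only a sketch of the idea, not Barry's actual code: the `state_fips` vector is a hypothetical stand-in for the state code attached to each county polygon.

```r
# Hypothetical vector of 2-digit state FIPS codes, one per county polygon.
state_fips <- c("01", "02", "06", "15", "22", "72", "48")

# State-level FIPS codes for the areas left off the plot:
# Alaska = 02, Hawaii = 15, Puerto Rico = 72.
outside <- c("02", "15", "72")

# Logical index that is TRUE for polygons in the contiguous states.
keep <- !(state_fips %in% outside)
print(state_fips[keep])
```

The same logical index could then be used to subset the rows of whatever object holds the polygons before plotting.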
Jason H from Offensive Politics took a slightly different tack, using the basic maps library to plot the data. Like Hadley's solution, this left him without the FIPS codes to match on, but he managed to solve most of the data-matching problems with some clever regexp work. By working only in lower-case and deleting any references to place descriptors like "City" and "Parish" he was able to match almost all of the data to the map. Here's the code after reading the data (or download the complete file from GitHub).
unemp$mpname <- tolower(paste(state.name[match(
sub("^(.*) (County|[Cc]ity|Parish), ..$","\\1", unemp$name),sep=","))
unemp$ri <- as.numeric(cut(unemp$unemppct,
cols <- c("#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043")
mp <- map("county", plot=FALSE,namesonly=TRUE)
Personally, I wouldn't have gone with the dark border lines that obscure most of the South, but it's an excellent effort for just 6 lines of code.
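For anyone who wants to see the name-cleaning idea in isolation, here's a rough sketch on three made-up rows. The regular expression for the county name is the one quoted above; the state-abbreviation pattern and the cut() breaks are my own assumptions (the breaks are borrowed from Hadley's script, since the excerpt above is cut off before they appear), and the rates are hypothetical:

```r
# Hypothetical rows in the style of the unemployment file's name column.
unemp <- data.frame(
  name     = c("Winn Parish, LA", "Travis County, TX", "Cook County, IL"),
  unemppct = c(11.2, 6.1, 9.5),
  stringsAsFactors = FALSE
)

# Strip the trailing descriptor and state abbreviation (pattern from the post).
county <- sub("^(.*) (County|[Cc]ity|Parish), ..$", "\\1", unemp$name)

# Assumption: the state is the two-letter abbreviation at the end of the name.
abb   <- sub("^.*, ([A-Z]{2})$", "\\1", unemp$name)
state <- state.name[match(abb, state.abb)]

# The maps package names counties as "state,county" in lower case.
unemp$mpname <- tolower(paste(state, county, sep = ","))

# Bin the rates (assumed breaks) and use the bin number to index the palette.
cols <- c("#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043")
unemp$ri <- as.numeric(cut(unemp$unemppct, c(seq(0, 10, by = 2), 35)))

print(unemp[, c("mpname", "ri")])
print(cols[unemp$ri])
```

Because the bin number is just an integer from 1 to 6, indexing `cols` with it yields one fill color per county, which is the kind of vector a plotting call can take directly.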
J from This Is The Green Room used much the same approach as Jason H, using the maps library and some name cleaning to improve the data matching by county. He's provided a detailed description of his work, including all the code. In particular, he's used white borders and the polyconic map projection to create a chart that's almost identical to the original from FlowingData:
Still a few missing counties (not to mention Hawaii and Alaska), but that's one good-looking chart. J shows how to produce a similar version with ggplot, too. Nonetheless, he still prefers the original to his versions:
On the whole, I’d still take Nathan’s map over these as a finished product. However, I don’t think R can be beat for ease of use and all-in-one packageability – if I wanted, I could run regressions on the data, overlay my chart with more colors or new metrics, explode out certain counties or states… the possibilities are endless. With just a couple lines of code, I could overlay states that voted for Obama in blue, or highlight counties starting with the letter “C”. The static SVG method doesn’t allow any of that flexibility. Also, I’m completely confident that if I had any experience with these mapping packages – rather than using them for the first time tonight – I could mimic Nathan’s image perfectly.
Finally, Eduardo Leoni produced a script also using the Census data, but using the maptools library to read the shape file. Unfortunately, his site seems to be down right now, so I can't see the results. Update: here's Eduardo's map: