Last week, Joe Rickert used R and four years of US Census data to create an image plot of the relative probabilities of being born on a given day of the year:
Chris Mulligan also tackled this problem with R, but this time using 20 years of Census data from 1969 to 1988. Chris extracted the birthday frequencies using Google BigQuery, and charted the results with the time series below using this R script.
My apologies to Joe, but I much prefer this representation to the heat map. Not only is the February 29 frequency multiplied by 4 (where we see that it's not a particularly surprising birthday to have given the overall seasonal trend), but the unusual days really stand out (and are annotated). You're relatively unlikely to find someone born on January 1, July 4, Christmas Eve or Christmas Day (most likely because fewer Caesarean births are scheduled, or more induced natural births are avoided, on those days). December 30 is a more likely birthday than you'd otherwise expect (maybe this has something to do with getting kids into an earlier school year?). Andrew Gelman shares a model of the seasonal trend that defines these outliers.
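To see why 4 is the right multiplier for February 29, note that the 20 years from 1969 to 1988 contain five leap years (1972, 1976, 1980, 1984 and 1988), so that date appears only a quarter as often as every other date. Here is a minimal sketch of the adjustment in Python. It is not Chris's R script; the function names and the `(month, day)` dictionary layout are made up for illustration.

```python
import calendar


def feb29_scale_factor(first_year, last_year):
    """Return (years in range) / (leap years in range), both ends inclusive."""
    years = range(first_year, last_year + 1)
    leap_years = sum(1 for year in years if calendar.isleap(year))
    if leap_years == 0:
        raise ValueError("no leap years in the given range")
    return len(years) / leap_years


def adjust_counts(counts, first_year, last_year):
    """Scale the February 29 total so it is comparable with other dates.

    `counts` is a hypothetical dict mapping (month, day) to total births
    summed over all years in the range. A new dict is returned.
    """
    factor = feb29_scale_factor(first_year, last_year)
    adjusted = dict(counts)
    if (2, 29) in adjusted:
        adjusted[(2, 29)] = adjusted[(2, 29)] * factor
    return adjusted
```

Calling `feb29_scale_factor(1969, 1988)` gives 4.0, matching the multiplier used in the chart. For a different span of years the factor would generally not be exactly 4, which is why the sketch computes it instead of hard-coding it.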
chmullig.com: Births by Day of Year
Thank you for putting up this chart. It answers several of my comments about shortcomings in Rickert's set of graphs.
Posted by: Carl Witthoft | June 15, 2012 at 08:05
The end of year peak trend is well documented: It is not so much school years as taxes. Getting an extra year of tax deduction is a pretty good motivator for induction...
Posted by: Josh | June 15, 2012 at 08:40
Ditto Josh. Also, school deadlines vary, but are often in September or October. It could be interesting to compare the heatmaps in states with different deadlines. I doubt the differences would be very visible, but I imagine you'd see something.
Posted by: Sarah | June 19, 2012 at 19:16