Last night, the Chicago Cubs made history by winning the World Series for the first time in 108 years, beating the Cleveland Indians in a nail-biting extra-innings game. As a result, today's blog post is a little late as I, along with most of Chicago, was celebrating into the wee hours of this morning. In recognition of the event, and the fact that simple data analysis is all I can muster today, I thought I'd use the excellent Lahman package, which provides a trove of baseball statistics for R, to have a look at the historical performance of the two teams.
First, let's tidy up the Teams data set to look at only the Cubs and Indians:
library(checkpoint)
checkpoint("2016-10-15")
library(Lahman)
data(Teams)

## Filter data to just the Cubs and Indians, from 1901 on (when both were playing)
## Teams didn't play the same number of games each year, so rescale to per-game rates
## Use team names instead of codes, and clean up unused teams
library(dplyr)
library(forcats)
library(magrittr)
Teams %>%
  filter(teamID %in% c("CHN", "CLE") & yearID > 1900) %>%
  mutate(Team = fct_drop(fct_recode(teamID, Cubs = "CHN", Indians = "CLE"))) %>%
  mutate(RunsPerGame = R / G) %>%    # R = runs scored, G = games played
  mutate(HitsPerGame = HA / G) ->    # HA = hits allowed
  CubsInd.team
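To see why the per-game rescaling matters, here is a minimal sketch in base R. The two seasons are invented for illustration (they are not real Lahman records), and the `toy` data frame and `HitsAllowedPerGame` column name are assumptions of this example.

```r
# Hypothetical toy seasons: values are made up for illustration only
toy <- data.frame(
  yearID = c(1901, 1962),
  G      = c(140, 162),    # games played
  R      = c(560, 648),    # runs scored (season total)
  HA     = c(1260, 1296)   # hits allowed (season total)
)

# Divide season totals by games played so that seasons of
# different lengths can be compared on the same scale
toy$RunsPerGame        <- toy$R / toy$G
toy$HitsAllowedPerGame <- toy$HA / toy$G

print(toy)
```

The longer season has the larger run total, yet both seasons work out to the same runs per game, which is the distortion the rescaling removes.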
Now, let's take a look at the average number of runs per game each team has scored in every season since 1901, when the Indians joined the league:
library(ggplot2)
p <- ggplot(data = CubsInd.team, aes(yearID, RunsPerGame)) +
  geom_point(aes(color = Team)) +
  geom_smooth(aes(color = Team), method = "loess") +
  xlab("Year") + ylab("Runs per game") +
  ggtitle("Runs per Game")
print(p)
Historically, the Indians have generally scored more runs per game than the Cubs. On the other hand, the Cubs' pitching has been better in recent years, allowing fewer hits per game:
p <- ggplot(data = CubsInd.team, aes(yearID, HitsPerGame)) +
  geom_point(aes(color = Team)) +
  geom_smooth(aes(color = Team), method = "loess") +
  xlab("Year") + ylab("Hits allowed per game") +
  ggtitle("Average hits allowed per game")
print(p)
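To put rough numbers behind the two plots, here is a sketch that averages both per-game rates by decade. Grouping by decade is an assumption of this example rather than part of the analysis above, as are the `decade_summary`, `Decade`, and `HitsAllowedPerGame` names. The block starts again from `Teams` so that it runs on its own.

```r
library(Lahman)
library(dplyr)

# Assumption: decade-level averages are an illustrative choice,
# not something the original analysis computed
decade_summary <- Teams %>%
  filter(teamID %in% c("CHN", "CLE"), yearID > 1900) %>%
  mutate(
    Team   = ifelse(teamID == "CHN", "Cubs", "Indians"),
    Decade = 10 * (yearID %/% 10)   # e.g. 1987 -> 1980
  ) %>%
  group_by(Team, Decade) %>%
  summarise(
    RunsPerGame        = mean(R / G),    # unweighted mean of season rates
    HitsAllowedPerGame = mean(HA / G)
  ) %>%
  ungroup()

print(as.data.frame(decade_summary))
```

Reading down the two columns for each team gives a coarse, tabular check on what the loess curves suggest.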
For some other examples of using the Lahman package, check out this post by Joe Rickert. The plots above were created using the ggplot2 package; here's a handy cheat sheet from RStudio.
Congratulations to the Cubs!!!