It's been a long time since I watched The Simpsons, but I was always under the impression that Bart was the primary character. Perhaps it was all the "Do the Bartman" and "Cowabunga!" nonsense from the 90s. Anyway, data scientist Todd W. Schneider used R to analyze the scripts of the first 26 seasons and found that Homer speaks twice as much as the next most represented character, Marge. Bart comes a close third.
Marge and Lisa are represented in orange (the color of Lisa's dress, in fact) as the only 2 female characters that make the top 10. Female representation isn't much better in the supporting cast either; only 7 characters of the top 60 (12%) are female.
Todd's R code behind the blog post is available on GitHub (in the analysis folder). Of note to R programmers: Todd used the ggplot2 package to create the charts, styling them with a custom ggplot2 theme (theme_tws_simpsons) that uses the Simpsons skin yellow and the Akbar font.
For more data analysis of the Simpsons, including a look at the ratings over the last 27 years, check out Todd's blog post linked below.
Todd W. Schneider: The Simpsons by the Data (via Jenny Bryan)
David, I thought I'd missed the introduction of gggplot2, so I checked rseek.org. Voila!
https://www.r-bloggers.com/fft-power-spectrum-box-and-whisker-plot-with-gggplot2/
But alas! That actually uses ggplot2, not gggplot2.
Posted by: Madeline | October 03, 2016 at 13:45
Thanks Madeline :). Corrected above.
Posted by: David Smith | October 03, 2016 at 13:47