Every year since her accession in 1952, Queen Elizabeth II has delivered a Christmas Broadcast to her subjects. Dominic Nyhuis used R to analyze the transcripts of the speeches, and found some interesting trends in speech length and word choice. Here, for example, are word clouds of the speeches from the first half (1952-1976) and second half (1977-2001) of ER2's reign.
If you'd like to analyze similar transcripts yourself, Dominic's R code provides a good place to start. He used Selenium (controlled by RWebDriver) to automate a browser and scrape the transcripts from the Official Website of the British Monarchy. He then used the XML package to extract each transcript from the page source, and stringr to break the speeches down into individual words. Finally, he generated the word clouds with the wordcloud package (using a Wes Anderson-inspired color palette).
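As a rough illustration of the word-splitting and counting step, here is a minimal sketch in R. It is not Dominic's code: the sample text, the tiny stop-word list, and the variable names are all assumptions made up for this example, and the regular expression is just one simple way to tokenize English text.

```r
library(stringr)

# Hypothetical stand-in text; the real input would be a scraped transcript.
speech <- "Christmas is a time for family. At Christmas, family and friends gather."

# A tiny illustrative stop-word list (an assumption, not the list used in the analysis).
stop_words <- c("a", "is", "for", "and", "at")

# Lower-case the text and extract runs of letters/apostrophes as words.
words <- str_extract_all(str_to_lower(speech), "[a-z']+")[[1]]

# Drop the stop words so that only content words are counted.
words <- words[!words %in% stop_words]

# Count occurrences and sort from most to least frequent.
freq <- sort(table(words), decreasing = TRUE)
print(freq)

# With the wordcloud package installed, the counts could be drawn like this:
# wordcloud::wordcloud(names(freq), as.integer(freq), min.freq = 1)
```

For a real analysis you would probably want a fuller stop-word list and one frequency table per time period, so that the clouds for different periods can be compared side by side.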
To see more analysis and the complete R code used to generate it, follow the link below.
Automated Data Collection with R Blog: 50 years of Christmas at the Windsors
I have been wondering for a long time what the most suitable visualization of the evolution of topics over the years might be. For instance, Google Trends gives you a line plot comparing the occurrence of two words (or expressions) over time, but it is far from appealing, and it does not seem well suited to a situation with dozens of topics.
A word cloud evolving over time may be suitable, but it seems far from a trivial project. Any other suggestion?
Posted by: Jose Maria Gomez Hidalgo | December 26, 2014 at 02:18