The Enron email dataset, collected during the FERC investigation of the Enron financial scandal, is the largest publicly available set of emails, which makes it an ideal testbed for sentiment analysis algorithms. Ikanow's Andrew Strite used the open-source Infinit.e framework and a Hadoop cluster to generate sentiment scores for all of the Enron emails, and then used R to manipulate and analyze the resulting data. In a visualization of just a few of the email accounts, red marks flag the emails where the sender's sentiment suddenly turned sharply negative; these would be good places to start looking for evidence.
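The post doesn't say how a "sudden" negative turn was defined. As a minimal sketch, assume each sender's scores are in chronological order on a roughly -1 to +1 scale, and flag any email whose score falls more than a chosen `threshold` below the mean of that sender's previous `window` emails. The function name, the rolling-mean baseline, and both default parameters are hypothetical choices rather than Ikanow's method, and the sample scores are made up for illustration.

```python
from statistics import mean
from typing import List


def flag_sharp_drops(scores: List[float], window: int = 3,
                     threshold: float = 0.5) -> List[int]:
    """Return indices of emails whose score is more than `threshold` below
    the mean of the previous `window` scores (hypothetical rule)."""
    flagged = []
    # Start once there are enough earlier emails to form a baseline.
    for i in range(window, len(scores)):
        baseline = mean(scores[i - window:i])
        if baseline - scores[i] > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Made-up, chronologically ordered scores for a single sender.
    sender_scores = [0.4, 0.5, 0.3, 0.45, -0.6, -0.1, 0.2]
    print(flag_sharp_drops(sender_scores))  # prints [4]
```

Comparing against a short rolling mean rather than only the previous email keeps a single odd message from setting the baseline; widening `window` or raising `threshold` makes the flags more conservative.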
Andrew used the rjson package to interface with the Ikanow REST API, the plyr package to restructure the incoming data, and the ggplot2 package to visualize the results. In a subsequent analysis he also used the zoo package to interpolate and analyze time series of sentiment scores, which you can read about in the full blog post below.
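Interpolating a sentiment time series first requires a regular timeline. The sketch below uses Python as a stand-in for the R/zoo workflow and is not a port of Andrew's code: it buckets each email's `(date, score)` pair into daily means, leaves days with no email as gaps (`None`), and then fills interior gaps by straight-line interpolation, similar in spirit to zoo's `na.approx`. The record format and the function names `daily_series` and `interpolate_gaps` are assumptions for illustration.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict, List, Optional, Tuple


def daily_series(records: List[Tuple[date, float]], start: date,
                 end: date) -> List[Optional[float]]:
    """Average scores per calendar day from start to end (inclusive);
    days with no emails become None."""
    by_day: Dict[date, List[float]] = {}
    for day, score in records:
        by_day.setdefault(day, []).append(score)
    series: List[Optional[float]] = []
    d = start
    while d <= end:
        scores = by_day.get(d)
        series.append(mean(scores) if scores else None)
        d += timedelta(days=1)
    return series


def interpolate_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly fill None values that lie between two known values;
    leading and trailing gaps are left as None."""
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk each pair of consecutive known points and fill the days between.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            result[i] = values[left] + frac * (values[right] - values[left])
    return result


if __name__ == "__main__":
    # Made-up records: two emails on Oct 1, none on Oct 2-3, one on Oct 4.
    records = [(date(2001, 10, 1), 1.0), (date(2001, 10, 1), 0.0),
               (date(2001, 10, 4), -1.0)]
    raw = daily_series(records, date(2001, 10, 1), date(2001, 10, 4))
    print(raw)
    print(interpolate_gaps(raw))
```

Leaving the edges unfilled avoids extrapolating a sender's mood before their first or after their last email; whether to interpolate across long silences at all is a judgment call for the analysis.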
Ikanow blog: Making the most of sentiment scores using Ikanow and R