Unstructured data (free-text documents, reports, articles and the like) is a rich source of information, but for analysis in R the first step is often to convert it into a structured, tabular format that lends itself to traditional statistical analysis and visualization techniques. We saw the other day how you can use Google Insights to convert volumes of search terms into a CSV of scaled counts. Now, you can accomplish a similar task by counting keywords in articles published in the New York Times.
Stanford professor Claudia Engel shows us how. It's a bit trickier than using Google Insights, because you first need to get a key for the NYT Developer API (it's free). With that key, you can adapt the code provided on the page linked below to count the number of articles matching one or more selected keywords. The data is then available for analysis in R, where you can create charts like this one:
Claudia Engel: Scraping New York Times Articles with R
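To give a flavor of the keyword-counting step, here is a minimal sketch in R. It is not the code from the linked page: the helper names (`build_nyt_url`, `extract_hits`, `count_articles`) are hypothetical, and the endpoint and the `response$meta$hits` field are assumptions based on the current Article Search API (v2), which may differ from the version the linked post uses. The network call is passed in as a function, so the sketch can be tried offline with a stand-in that returns made-up counts.

```r
# Assumed endpoint of the NYT Article Search API (v2).
nyt_base <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"

# Build the query URL for one keyword and one calendar year.
build_nyt_url <- function(keyword, year, api_key) {
  paste0(
    nyt_base,
    "?q=", URLencode(keyword, reserved = TRUE),
    "&begin_date=", year, "0101",
    "&end_date=", year, "1231",
    "&api-key=", api_key
  )
}

# Pull the total number of matching articles out of a parsed response
# (assumes the count lives at response$meta$hits).
extract_hits <- function(parsed) {
  parsed$response$meta$hits
}

# Count matching articles per year and return a tidy data frame.
# `fetch` is any function that takes a URL and returns the parsed JSON
# as a list, for example jsonlite::fromJSON if that package is installed.
count_articles <- function(keyword, years, api_key, fetch) {
  hits <- vapply(years, function(y) {
    as.numeric(extract_hits(fetch(build_nyt_url(keyword, y, api_key))))
  }, numeric(1))
  data.frame(year = years, articles = hits)
}

# Offline demonstration: a stand-in for the network call that always
# reports the same made-up count, so no API key is needed to try it.
fake_fetch <- function(url) list(response = list(meta = list(hits = 100)))
counts <- count_articles("climate change", 2005:2007, "YOUR_API_KEY", fake_fetch)
print(counts)
```

With a real key, you would swap `fake_fetch` for something like `function(u) jsonlite::fromJSON(u)`, probably with a `Sys.sleep()` between requests to stay within the API's rate limits, and then chart the result with `plot(counts$year, counts$articles, type = "b")`.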