There's an interesting article in the NYT today about the emerging discipline of "digital humanities": extracting digital data from historical archives to answer questions from the Arts and Humanities. From the article:
Members of a new generation of digitally savvy humanists argue it is time to stop looking for inspiration in the next political or philosophical “ism” and start exploring how technology is changing our understanding of the liberal arts. This latest frontier is about method, they say, using powerful technologies and vast stores of digitized materials that previous humanities scholars did not have.
These researchers are digitally mapping Civil War battlefields to understand what role topography played in victory, using databases of thousands of jam sessions to track how musical collaborations influenced jazz, and searching through large numbers of scientific texts and textbooks to track where concepts first appeared and how they spread.
Unsurprisingly, this has kicked off a debate over the role of quantification and analysis in the liberal arts, versus the more traditional approach of individual interpretation of documents and artifacts. While many old-school scholars dismiss such quantitative analysis as "whimsical", there are some great discoveries to be made through data science. Read the full article for some great examples.
New York Times: Digital Keys for Unlocking the Humanities’ Riches
I didn't know that the digital humanities field was as new as it sounds here. Booklamp.org has been around for 7 years now. I've been working as an analyst for Booklamp for over 2 years, and it is astonishing what sorts of information can be derived from simple text.
As for the debate between individual and quantitative analysis: they're complementary, period. It depends on your question. I will say that the sort of information we (mass-)produce on a daily basis at Booklamp is nowhere close to possible using the human mind alone. There's a lot to be learned, and (apparently) the push is just getting started.
PS The technology that can be experienced at Booklamp.org for public (pre-beta) consumption is about 2.5 years old now - FYI. We've kept everything to ourselves since then, but it'll make it to the public in time, and you'll never read a bad book again.
Posted by: Dan Bowen | November 16, 2010 at 19:18
I thought I'd add a rough-n-tumble bit of R code to this Humanities blog post:
text1 <- c('a', 'b', 'c', 'd', '2')   ## a bare 2 here would be coerced to "2" anyway
text2 <- c('1', 'b', 'e', '2', 'a', 'f')
length1 <- length( text1 ) ## 5
length2 <- length( text2 ) ## 6
overlap <- length( text1[ text1 %in% text2 ] ) ## 3 ('a', 'b', '2')
overlap_percentage <- ( 2 * overlap ) / ( length1 + length2 )
## overlap_percentage = 0.5454..., about 55%
... rough-n-tumble fer sure; but fun nonetheless.
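For what it's worth, the quantity that snippet computes is the Dice coefficient of the two token sets. A minimal sketch of the same idea wrapped in a reusable function (the name `overlap_dice` is my own, not anything Booklamp uses):

```r
## Dice overlap between two character vectors:
## twice the count of shared elements, divided by the total length of both.
overlap_dice <- function(a, b) {
  overlap <- sum(a %in% b)               ## elements of a that also occur in b
  ( 2 * overlap ) / ( length(a) + length(b) )
}

text1 <- c('a', 'b', 'c', 'd', '2')
text2 <- c('1', 'b', 'e', '2', 'a', 'f')
overlap_dice(text1, text2)               ## 0.5454545
```

The same function would then work unchanged on real tokenized texts of any length.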
Posted by: Dan Bowen | November 23, 2010 at 15:13