Today's guest post comes from Revolution Analytics data scientist Luba Gloukhov — ed.
As a big fan of R’s ever-expanding geospatial plotting capabilities, I jumped at the chance to create a map using the Million Song Dataset (MSD). For each song in the 10,000-song Million Song Subset with a non-missing latitude and longitude and a non-zero artist familiarity score, I added a song ‘break-out’ score: the ratio of song hotttnesss to artist familiarity (similar to that introduced by Echo Nest’s Hottt or Nottt blog post). I then subset the data further to include only songs with a break-out score greater than 1.
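The filtering and scoring steps described above could be sketched in base R roughly as follows. This is a minimal illustration rather than the code behind the map: the toy values are invented, and the column names (`artist_latitude`, `artist_longitude`, `artist_familiarity`, `song_hotttnesss`) are assumed to mirror the MSD field names.

```r
# Toy stand-in for the subset; all values are invented for illustration.
# Column names are assumed to follow the MSD field names.
songs <- data.frame(
  title              = c("Song A", "Song B", "Song C", "Song D"),
  artist_latitude    = c(37.80, NA, 40.71, 51.51),
  artist_longitude   = c(-122.27, -0.13, -74.01, -0.13),
  artist_familiarity = c(0.50, 0.60, 0.00, 0.80),
  song_hotttnesss    = c(0.60, 0.70, 0.30, 0.40),
  stringsAsFactors   = FALSE
)

# Keep songs with a known location and a non-zero familiarity score.
keep <- !is.na(songs$artist_latitude) &
  !is.na(songs$artist_longitude) &
  !is.na(songs$artist_familiarity) &
  songs$artist_familiarity > 0
located <- songs[keep, ]

# Break-out score: song hotttnesss relative to artist familiarity.
located$breakout <- located$song_hotttnesss / located$artist_familiarity

# Retain only songs whose hotttnesss exceeds the artist's familiarity.
breakouts <- located[!is.na(located$breakout) & located$breakout > 1, ]
print(breakouts[, c("title", "breakout")])
```

Dropping zero-familiarity rows before dividing keeps infinite ratios out of the final subset.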
Exploring the map, I found myself YouTubing songs. Many of them I had never heard of before, a sign of either generally low artist familiarity scores or my own tanking hipppnesss. I decided against exploring the “why?”, fearing it might just prove the latter. In an effort to simplify my process of exploration (and “what’s hottt” self-education), I used R to embed YouTube videos in the map’s marker info windows.
Explore the interactive map here.
For each song in my subset, I added a new variable containing code that embeds the first YouTube video search result for the artist name and song title. For plotting, I used Milan Kilibarda’s plotGoogleMaps package. You can access my code and data via GitHub.
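As a rough illustration of what such an embed variable might look like, the sketch below builds an HTML snippet per song. It assumes a video ID has already been found for each song (the search step is not shown). The helper `youtube_embed()`, the `video_id` and `info` columns, and the placeholder IDs are all hypothetical and not taken from the original code.

```r
# Hypothetical helper: build the HTML for an embedded YouTube player.
# `video_id` is assumed to have been looked up already, for example by
# searching for the artist name and song title; that lookup is not shown.
youtube_embed <- function(video_id, width = 420, height = 315) {
  sprintf(
    '<iframe width="%d" height="%d" src="https://www.youtube.com/embed/%s" frameborder="0" allowfullscreen></iframe>',
    as.integer(width), as.integer(height), video_id
  )
}

# Toy data; the video IDs are placeholders, not real videos.
songs <- data.frame(
  artist   = c("Artist A", "Artist B"),
  title    = c("Song A", "Song B"),
  video_id = c("VIDEO_ID_1", "VIDEO_ID_2"),
  stringsAsFactors = FALSE
)

# One HTML snippet per song, suitable for display in a marker info window.
songs$info <- paste0(
  "<b>", songs$artist, "</b>: ", songs$title, "<br/>",
  youtube_embed(songs$video_id)
)
cat(songs$info[1], "\n")
```

Because `sprintf()` is vectorized, the helper produces one snippet per row without an explicit loop. A column like `info` could then travel with the spatial data handed to the mapping package.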
Oakland’s own Del tha Funkee Homosapien’s 1991 hit Mistadobalina came up as a recent break-out song (with a ratio of 1.04). Since 1991, Del has had numerous successes: as a member of Hieroglyphics, with the album Deltron 3030 and, perhaps most prominently, as a member of the Gorillaz. One would think that, by now, Del’s familiarity would surpass Mistadobalina’s hotttnesss, generating a ratio of less than 1. How does this ratio compare to those of other Del songs? Is it really the case that Mistadobalina remains Del’s biggest break-out hit, or do the hotttnesss scores of Del’s songs in general surpass his familiarity? Perhaps more enlightening than momentary snapshots of these metrics would be an investigation of how the variables have changed over time.
So much data exploration, so little time! I’ll have to save these topics for another R play date.
Luba Gloukhov is a Pre-Sales Engineer at Revolution Analytics. When not playing with R or helping others do the same, she can be found lifting heavy objects, thinking light thoughts or eating delicious food.
This is SO EXCELLENT! How did you get the data set, please?
Thanks,
Erin
Posted by: Erin Hodgess | September 24, 2012 at 10:06
Thanks Erin! You can obtain the subset I used as well as the full Million Song Dataset here: http://labrosa.ee.columbia.edu/millionsong/pages/getting-dataset
-Luba
Posted by: Luba Gloukhov | September 24, 2012 at 12:16