Last night I hosted a Birds of a Feather session on R at OSCON, and despite competition from a nearby party with an open bar and free video games, about 10 people turned up. To my surprise, all but one had never used R before, but all were very keen to learn more. With such an engaged crowd, it was a really enjoyable session.
Everyone had a different motivation for learning R, and given the venue most of the interests were fairly technical: how do I analyze web stats, how can I figure out what causes degradations in computer performance, how can I improve quality at my manufacturing job, that kind of thing. But one participant had a really, really interesting application in mind. He has sleep apnea, and plans to collect data on his activities before sleep, and quality measures of the sleep itself (collected by a CPAP machine).
Lots of people ask me what's the best way to learn R, and my usual suggestions are to take a course or to check out the manuals and tutorials aimed at beginners. That's great for learning the fundamentals, but to really learn R (or indeed any language) the best way is to find a data set or problem that's meaningful to you and just dive right in. There's nothing more personally important than figuring out how to get a good night's sleep, and collecting and analyzing those data would be an ideal problem to tackle.
The bottom line (for me, at least) is that interesting data is just fun to work with. There's nothing like discovering a relationship or trend you hadn't suspected through a simple (but revealing) plot or data summary. Another great example I stumbled across today comes from Sean Carmody, a father-of-three in Sydney who used R to analyze the "Hottest 100" songs of all time as judged by listeners of Triple J (the national indie/youth station run by the ABC). The analysis isn't complicated, but the charts are revealing and (for indie music lovers like me) quite interesting.
So, find a data set that's interesting to you, and get cracking!
I'm glad you liked the Triple J Hottest 100 charts. I've now followed it up with some charts based on the Guardian's "1000 songs to hear before you die" list. Again, light on analysis, but a bit of fun. As usual, everything is done using R.
Posted by: Sean Carmody | July 24, 2009 at 14:00
This post makes a great point. Self-directed learning benefits greatly when you are interested in the topic at hand. However, do you have any recommendations for public data sets to start with for people who aren't in a position to have their own to work on but still want to learn R?
Posted by: James | July 30, 2009 at 09:34
James: funny you should ask that, I've recently begun compiling a wiki page of publicly available economics and finance data. If you have any interest in that sort of data, feel free to sign up to pworks (free) and help contribute.
Another good place to look around for interesting data is the Guardian's DataStore.
Posted by: Sean Carmody | July 30, 2009 at 14:09
I should probably also add that many of the posts on my blog involve charts produced using R and publicly available data.
Posted by: Sean Carmody | July 30, 2009 at 14:11
Totally agree with you! The best way to learn R is to find some funny data to examine. But it is sometimes difficult...
Posted by: TodosLogoso | July 31, 2009 at 02:15
That's sort of like saying the best way to learn to play an instrument is to get a piece and then learn the technique required to play each measure. Perhaps the best way to learn a language (on a reading level only) is to get a paper and translate each sentence. As you encounter each new point of grammar, you would then look it up and apply it in the context of translating your paper.
Yes, it's possible in theory, but some time spent with the fundamentals would be time well spent.
Posted by: mft | August 03, 2009 at 07:56