"Machine Learning for Hackers" is a new book from O'Reilly Media by Drew Conway and John Myles White. A "hacker", here, is "someone who likes to solve problems and experiment with new technologies", and "Machine Learning" is usually thought of as a black-box, algorithmic approach to producing predictions or classifications from data.
This book, however, takes a pleasingly statistical approach to real-life prediction and classification problems. Rather than merely providing a "cookbook" approach to, say, building a "who to follow" recommendation system for Twitter, it takes the time to explain the methodology behind the algorithms and give the reader a better basis for understanding why these methods work (and, equally importantly, how they can go wrong).
An analysis of author Drew Conway's Twitter network, classified by topic area favored by each Twitter user.
The book assumes familiarity with command-line scripting, programming, and algorithms in general. It does, however, give a gentle introduction to the R programming language, which is used to implement all of the examples. (The R scripts and associated data are also available for download.) In fact, this section also does double duty as an introduction to some of the basics of statistical thinking (moments, distributions, visualization, etc.), which is a very welcome addition in a "machine learning" book. The book is also rich with data visualizations (mostly created with the ggplot2 package), which not only help explain the algorithms but also serve as a useful demonstration in their own right of the value of data visualization in the data modeling process.
Machine Learning for Hackers is available for purchase now in hardcopy or digital format from the link below. I recommend it to any programmer who needs to generate predictions or classifications from data -- using R and learning more about the statistical techniques behind the methods will help you to create better data hacking applications in the long run.
O'Reilly Media: Machine Learning for Hackers, Case Studies and Algorithms to Get You Started
How you can review this book without mentioning the *astonishing* number of typos is beyond me. The core ideas contained in the book are solid enough but it, like the authors' previous book "Machine Learning for Email", clearly hasn't been edited by someone with even a basic grasp of English; many of the code samples clearly won't run either.
It's potentially a seven or eight out of ten given the content. As it stands, however, it barely merits a two. This is possibly the worst proofed book that O'Reilly have ever released.
Posted by: Ben Martin | February 17, 2012 at 00:12
Love the article, would love to have you publish some articles on seczine.com.
Check out the below link if interested:
http://www.seczine.com/article/contribute.html
Posted by: Bob Jones | June 11, 2012 at 17:28