A new book by Jeffrey Stanton from Syracuse University's School of Information Studies, An Introduction to Data Science, is now available for free download. The book, developed for Syracuse's Certificate for Data Science, is available under a Creative Commons License as a PDF (20 MB) or as an interactive eBook from iTunes.
The book begins with the following clear definition of Data Science:
Data Science refers to an emerging area of work concerned with the collection, preparation, analysis, visualization, management and preservation of large collections of information. Although the name Data Science seems to connect most strongly with areas such as databases and computer science, many different kinds of skills - including non-mathematical skills - are needed.
Throughout the book, you'll find many examples of data science applications implemented in the R language. For R beginners, a Getting Started with R chapter is included, but the book also gets into some fairly in-depth topics, including sentiment analysis of Twitter data, working with data in Hadoop via RHadoop, and creating information maps. R code is sprinkled liberally throughout for your own use, and is available to download (also under an open-source license) from GitHub.
You can find more details about the book at its companion website, linked below.
Introduction to Data Science: Teach Data Science (via Guillermo Santos)
[Updated 23 Apr 2014 to repair linkrot.]
Is there a table of contents?
Posted by: brian | February 24, 2013 at 18:26
Google "fake data science" or click on my name to read my comments about this book.
Posted by: Vincent Granville | March 01, 2013 at 23:16
This will be a good resource of information on R.
Posted by: Dr.Shailendra Singh | March 02, 2013 at 11:12
Vincent, not sure I agree with your assessment of this book. Hadoop/NoSQL is a part of data science to be sure, but not a *necessary* part. (You can do data science on many different data platforms, including small-data platforms.) Statistics *is* a necessary part though, and I wish more practitioners labelling themselves "data scientist" had a better grounding in the statistical basics. That's why I think this is a valuable book, especially given the price tag.
Posted by: David Smith | March 02, 2013 at 13:08
@Shailendra: The book can mislead people into thinking that data science = statistics + R. Data science also includes graph models and databases, processes for big data (read my article on the curse of big data to see why traditional statistics fail with big data), computer science, business analytics and more.
Statistics + R alone is not data science. It's like saying that gastronomy is French cuisine.
Posted by: Vincent Granville | March 02, 2013 at 13:53
The book is copyrighted under a CC license that restricts its use to non-commercial endeavors. Apparently no one at a company can look at the book.
Posted by: DTS | March 22, 2013 at 06:13
As a complete beginner, I found this book quite interesting, but yes, it's more like a Stats + R programming book.
However, the Twitter examples are not working.
Posted by: ML | April 24, 2013 at 12:32
Please make it available on the Indian iTunes store.
Thank you
Posted by: Milind Sattur | June 01, 2013 at 05:56