Tim O'Reilly's keynote talk at OSBC this evening was thought-provoking to say the least. The title of the talk was "The Real Open Source Opportunity", and the surprise for me was that he wasn't talking about Open Source software. Tim's insight, and it's a profound one, is that the next frontier for freedom and openness -- and indeed, the way we'll live our lives -- lies with data.
Why? The world in which open source software was born is very different from the world we're heading to. Less than a decade ago, a major concern of the computing world was that much of the capability and innovation was locked up in closed software held by major corporations, like Microsoft. Open-source software addressed that. But look where innovation lies today: companies like Google (and a few others) -- built on the backs of open-source technology, mind you -- can now perform tasks that not long ago were the realm of science fiction. Today, you can speak a question into a tiny handheld gadget, and find out where to get good pizza. But think for a moment how Google can do this reliably and quickly: it's their data. They've amassed a massive, proprietary database of search queries, written text, and voice samples that allows the Google Voice Search app on the iPhone (and algorithms in Google's cloud servers) to distinguish "pizza" (said on a noisy street in a Jersey accent) from "piece of" or the city "Pisa". Tim was careful to point out: it's not the closed algorithms that make this work. Peter Norvig from Google has said it himself: Google doesn't have better algorithms than everyone else. They just have more data.
Tim put a question to the audience: "Could anyone in the Open Source community build the infrastructure to deliver Google Voice Search?" The response: a stony silence. The implication? Vendor lock-in is no longer about proprietary source code. It's about massive, hard-to-replicate data sets -- making Google a potential Microsoft of the next decade. The corollary? The future will be about who has the most data, and who is able to extract meaning from it and deliver it in real time.
So how can we avoid data lock-in in the years to come? Tim suggests that it may be the underdogs of these new cloud-based tools that become the allies of open source data applications. Ironically, it may be Microsoft, lagging today in the domains of search, maps, and speech recognition, that becomes the biggest ally in making the associated data services available openly. Google certainly has no motivation to do so; on the contrary, in areas like local search where they once linked to third parties like Yelp, they're now providing their own data exclusively. Another opening likely comes from open standards for data sharing, like the Gov2.0 initiative.
The implications are profound, not just in terms of lock-in but also in the areas of privacy. (Interesting privacy implication: did you know that it's possible to identify a specific appliance, like a Kenwood dishwasher, from the "DNA" of its power draw signal? Consider that when your power company tracks your power usage with e-meters.) But when the operating system of the future is the entire internet, which license you use for open source software somehow seems like small beans compared to the open data issue.
Update: The slides from Tim O'Reilly's keynote are now available: Open Source in the Cloud Computing Era
Very interesting. O'Reilly's slides are interesting, but it's only with your comments that I can understand all the nuances of what he claims. Thanks!
Posted by: NotMe | March 18, 2010 at 01:58
A wonderfully important post David, many thanks for sharing!
(And how ironic that when I tried posting this comment I got "we're sorry - cannot accept this data" :D)
Posted by: Tal Galili | March 18, 2010 at 15:04
You did a good job connecting the slides, but I believe the thesis is false.
It is all about the software. We already have more data than we know what to do with. The challenge has always been in making sense of it. The fact that Hadoop and Lucene exist is proof that Google's approach to free software is wrong. I can see how we are repeating the Microsoft mistake with Google, and yet Tim is focused only on easy import / export of scraps from Google. Is Google's map database open? How is that going?
Once you explain free software, the free maps is a very short conversation.
He also seems to think hardware is a limiting factor even though you can get a terabyte drive for $100 at Best Buy.
Posted by: KeithCu | March 20, 2010 at 13:20
Keith, not sure I follow your points there. In particular, I don't think export of scraps of data from Google would solve the problem that Tim described -- indeed, the very fact that the Google Maps database is *not* open is an illustration of it. In fact, I'd say it's broader -- it's the integration of maps, contact, imagery and all the databases *together* that makes these new unique applications possible for Google and Google alone.
Posted by: David Smith | March 22, 2010 at 13:48
Hi David;
My point is that the free software movement will also guarantee "open data". Find me a piece of data about the Linux kernel that isn't "open". That is why focusing on open data is fighting the wrong battle.
And I don't agree that only Google can create data and put it on the Internet. I agree there is a lot of data to put up there, but I disagree that only Google can put it up there. There are many companies that have made maps before, for example. The whole point of "mashups" is to string stuff together. It doesn't need to be all Google doing this.
Posted by: KeithCu | March 22, 2010 at 18:24
Interesting post, I agree the future is definitely in open source, take a look at applications such as Ccleaner and Apache. All open source applications which leave behind commercial products in the same categories.
Posted by: Tech Avenue | June 28, 2011 at 07:16