Google Labs has just released Google Squared. Unlike a Google web search, which returns an unstructured list of web pages, Google Squared is designed to return structured data. Searching for US States returns a "square", much like an Excel spreadsheet or a data frame in R. The rows are states, and the columns are "facts" about those states: Name, Image, Population, etc. You can customize the columns returned to add new variables.
My first thought was that this would be a great source of data for examples in R. Just the other day, I was looking for a list of the populations of the largest US cities to illustrate Zipf's law -- could Google Squared have helped me? Sadly, no -- at least not yet.
The first problem is data quality. That search for US States included Georgia in the top 10 ... but if you add "Capital" to the list of variables, the capital is listed as T'bilisi, not Atlanta. To be fair, Google Squared lets you click on a data value and select from other possibilities, so I can change it to Atlanta if I want. But I was hoping that Google Squared would draw on the consensus of the Web, in context with my search, to produce a table of good data values. Instead, the intent seems to be to use Google Squared as an alternative to Excel for collecting data you've found and verified yourself on the Web.
Even if you can find the right variables, getting the right records is tricky, too. Let's say I want to generate data for the 50 US States. First of all, I have to keep clicking "Add next 10 items" until the Square is full of all 53 rows Google generates. (Why can't I get all the rows in one fell swoop?) Then I have to delete DC, Virgin Islands, Afghanistan and Harvard University: that leaves me with 49 rows. One state is missing, but which one? You can't sort the rows by state name, which might have helped.
My next thought was to export the Square to R, and match the names against state.name to find the missing one. But, alas, you can't export the data. C'mon Google, why not a simple CSV export? I have to spend all this time creating and verifying the data, and now you're not going to let me use it? Grr.
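Had an export existed (or with the row names copied out by hand), the check I had in mind is just a set difference against a reference list; in R that reference is the built-in `state.name`. Here is a minimal sketch of the same idea in Python, with the 50 names written out. The function names `missing_states` and `extra_rows` are made up for this illustration, and the sample rows are invented: they are not the actual contents of my Square.

```python
# Reference list of the 50 US states (the same names R ships as state.name).
US_STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California",
    "Colorado", "Connecticut", "Delaware", "Florida", "Georgia",
    "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa",
    "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland",
    "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri",
    "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey",
    "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio",
    "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina",
    "South Dakota", "Tennessee", "Texas", "Utah", "Vermont",
    "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming",
]


def missing_states(square_rows, reference=US_STATES):
    """Return reference names that do not appear among the Square's rows."""
    # Compare case-insensitively and ignore stray whitespace in the rows.
    seen = {name.strip().casefold() for name in square_rows}
    return [state for state in reference if state.casefold() not in seen]


def extra_rows(square_rows, reference=US_STATES):
    """Return rows in the Square that are not in the reference list."""
    known = {state.casefold() for state in reference}
    return [row for row in square_rows if row.strip().casefold() not in known]


if __name__ == "__main__":
    # Invented sample: every state except one, plus two stray rows.
    rows = [s for s in US_STATES if s != "Wyoming"]
    rows += ["Afghanistan", "Harvard University"]
    print(missing_states(rows))
    print(extra_rows(rows))
```

The second function covers the other half of the cleanup: it flags rows such as Afghanistan or Harvard University that should not be in the table at all.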
I know this is only a Labs feature, and it does show promise. But with the data quality issues and the inability to export, sadly it doesn't seem like it's going to be a useful source of datasets anytime soon.
What about Wolfram Alpha?
Posted by: Alexandre Rademaker | June 04, 2009 at 15:30
Wolfram Alpha fares better with the question, although one can download the data only as PDF or Mathematica notebook. On the positive side, it even plots the data for you.
Google Squared is a shocker, where 'problem of data quality' is an understatement. The data presented in each column are very heterogeneous and would require quite a bit of fiddling to turn into something usable.
Both Wolfram and Google perform really badly when looking at data from other countries (even English-speaking ones).
Posted by: Luis | June 04, 2009 at 17:09
I quite like this search -- "mountain heights," the first thing that popped in my head: http://www.google.com/squared/search?q=mountain+heights
Still in Beta I suppose.
Posted by: Jay Porter | June 04, 2009 at 21:14
I agree with your criticisms - "next 10" is annoying, but I think they did it because Google^2 is compute/data intensive. The export to CSV would have been nice....
I disagree with your pessimism about the product - this is a Labs release, so they are throwing it out there to let people play with it. Let's hope that enough people use it to motivate Google to improve it....thank you Google for another amazing product!
Posted by: arbitrage | June 04, 2009 at 23:02
Google Squared appears to be similar to my patent application:
Frankly, I am getting a sense of déjà vu while going through the “Google Squared” application, because it appears to be very similar in function to my United States patent application, which was filed on April 12, 2007 and publicly disclosed by the United States Patent and Trademark Office on October 16, 2008, when the application was published.
My patent application is titled “Method And System For Research Using Computer Based Simultaneous Comparison And Contrasting Of A Multiplicity Of Subjects Having Specific Attributes Within Specific Contexts”, with Document Number “20080256023” and Inventor name “Nair Satheesh”. It may be viewed at http://patft.uspto.gov/ under Patent Applications: Quick Search.
Google Squared appears to be using at least some, if not many, of the same methods and systems that I set forth more than two years ago in my patent application. In fact, there are many more methods and systems disclosed in my patent application which I believe would help resolve certain inaccuracies found in the current Google Squared application.
I have issued legal notices to Google through my patent attorney in the US, but Google has not yet responded to any of them.
Posted by: Nair Satheesh | August 20, 2009 at 11:04