Wednesday, August 31, 2011

Google Sets Experiment: Cities 1337 & L4m3

Google has recently decided to terminate their super-kewl Google Sets service, a "mechanism for quickly and efficiently generating lists of items given one or more example". If you've never played with it before, we'd say give it a try as you may not get another chance. Here's an example of a recent experiment we ran on Sets using one of our favorite topics: cities.

Two cities where your editor has had a lot of fun are San Francisco and Berlin. Conversely, we've always found Brussels to be particularly boring and unpleasant, and have been cautioned by many folks to skip Dallas if possible. We used Google Sets to generate this list of cities which fit the first category but aren't in the second:

  • barcelona
  • paris
  • são paulo
  • shanghai
  • taipei
  • tokyo
  • toronto
  • zürich

Except for Taipei and Toronto, this squares pretty well with a list of cities we'd like to spend time in. Contrast that with the list of Brussels- and Dallas-like cities:

  • atlanta
  • boston
  • chicago
  • cleveland
  • columbus
  • frankfurt
  • houston
  • washington dc
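
For the curious, the "like the first pair but not in the second" filtering is just a set difference. Here is a minimal Python sketch, assuming you already have the two expanded lists in hand. The function name `like_but_not` is our own invention and the `liked_raw` list is a made-up stand-in for unfiltered output; the Brussels- and Dallas-like list is the one shown above.

```python
def like_but_not(liked_expansion, disliked_expansion):
    """Keep items from liked_expansion that do not appear in disliked_expansion.

    Order of liked_expansion is preserved; matching ignores case.
    """
    excluded = {city.casefold() for city in disliked_expansion}
    return [city for city in liked_expansion if city.casefold() not in excluded]


# Hypothetical unfiltered expansion of the San Francisco / Berlin seeds.
# This input is illustrative only; it is not real Google Sets output.
liked_raw = ["barcelona", "chicago", "paris", "frankfurt", "tokyo", "zürich"]

# The Brussels- and Dallas-like list from the post.
disliked = [
    "atlanta", "boston", "chicago", "cleveland",
    "columbus", "frankfurt", "houston", "washington dc",
]

print(like_but_not(liked_raw, disliked))
```

Anything that turns up in both expansions (here, the made-up chicago and frankfurt entries) gets dropped, and the rest keeps its original order.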

What can we infer from these lists? Whatever it is we like about SF and Berlin can hopefully be found in Barcelona and Zürich as well, be it art, technology, or general excitement. Maybe these cities were all on Damien Hirst's last world tour, or are hometowns of Roboexotica participants. Conversely, the second list looks like places where people are because they have to be (no offense, Boston, but outside your universities you don't have much going on). A first guess might be that these are all cities with major airports, where people go to get a connecting flight to somewhere good.

If you're interested in helping to build an open source replacement for Google Sets, please contact us through the website or email EmbeddedLinuxGuy [at] GMail [punkt] com. We are especially interested in folks who want to work with machine learning, text processing, and large databases. Which, according to Google Sets, means we're also looking for "natural language processing", "data mining", and "fun".