Update: The following was written when this weblog was called Kevin’s Worklog (and my personal journal was called The Bruised Edge). Now this is called The Bruised Edge and my personal journal is Kevin’s Journal. Confusing, I know… anyway, here it is.
Interesting… I recently changed the URL of my worklog from “/worklog” to “/weblog”. I’ve set up redirects in my .htaccess file, so everything that was pointing to “/worklog” should now go to its new home. What is interesting, though, is that my worklog’s new home used to be the home of my personal weblog (which, obviously, has a new location now too). Stay with me… this is where it gets interesting…
Search Google for The Bruised Edge (the name of my personal journal) and “Kevin’s Worklog” now shows up as the number one hit. I assume this is because the worklog has taken over the journal’s old location. Nothing in the worklog, though, mentions the journal. Also keep in mind that the Google results have already been updated to show the worklog’s name, not TBE’s.
What this means is that Google is finding the phrase “The Bruised Edge” in its cache and ranking my worklog as the number one hit because of that cached data, not the current data it has (even though the worklog has nothing to do with the journal aside from replacing it). All this brings me back to the Google patent (titled, appropriately enough, “Information retrieval based on historical data”).
It seems that this historical approach to data is how Google intends to defeat the abuse of its ranking system by spammers (who have latched onto weblogs and the linking aspect of Google’s algorithms to drive up their Google rankings). For more on this new approach on Google’s part, see the Buzzle story that brought the patent to my attention.
The Buzzle story was posted earlier today to one of the lists I subscribe to… I can’t remember which one. It is all very interesting: the conditions used to determine relevance (or should we say, to take a stab at relevance). Those Google folks are frighteningly smart.
It reminds me of when the Google folks spoke at Stanford. I didn’t see them, but Dick reported that one of the most interesting points of the talk was their observation that once you have tons of data, it is amazing the types of things you can do. The same algorithms that didn’t return good results on smaller data sets worked much better when the data set was massive. It seems that once you get past a certain point, you get a new perspective.
I wonder if the same is true for library organizations (like OCLC) that just have reams and reams of data?