Zoo Station

Wednesday, November 10, 2004

The new Microsoft search engine

I just checked out Microsoft's new so-called algorithmic search engine, now in beta. I've got to admit that the first thing I tried was searching for my own name. I have heard this referred to as ego surfing, but it at least yields search results that I am somewhat familiar with. The sets of pages indexed by Google and Microsoft appear to overlap, but neither contains the other. The top hits were often the same, as seen in this example. Is the "algorithm" in question the PageRank algorithm, by some chance? I would be surprised if it were not PageRank, appropriately "Microsoft-ised". Read that as "existing technology unaccountably made slower".

PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page. One important variation is to only add the damping factor d to a single page, or a group of pages. This allows for personalization and can make it nearly impossible to deliberately mislead the system in order to get a higher ranking. We have several other extensions to PageRank, again see Page 98.
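For the curious, here is a minimal sketch of the random-surfer idea in Python. It is only an illustration under my own assumptions, not how Google or Microsoft actually implement anything. I use the normalized form in which the ranks sum to 1, and I take d (0.85 here) to be the probability that the surfer follows a link, so 1 - d is the chance of getting bored and jumping elsewhere. The function name `pagerank`, the `personalization` parameter, and the toy link graph are all made up for the example.

```python
def pagerank(links, d=0.85, personalization=None, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a dict mapping page -> list of linked pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)

    # Where a bored surfer lands: uniformly at random by default, or only
    # on the chosen pages when a personalization dict is supplied.
    if personalization is None:
        teleport = {p: 1.0 / n for p in pages}
    else:
        total = sum(personalization.values())
        teleport = {p: personalization.get(p, 0.0) / total for p in pages}

    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Pages with no out-links hand their rank back via the teleport distribution.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - d + d * dangling) * teleport[p] for p in pages}
        # Every other page splits its rank evenly among the pages it links to.
        for p in pages:
            targets = links.get(p, [])
            for t in targets:
                new_rank[t] += d * rank[p] / len(targets)
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page web, purely for illustration.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
uniform_ranks = pagerank(toy_web)
personal_ranks = pagerank(toy_web, personalization={"d": 1.0})
```

Passing a `personalization` dict is my rough reading of the variation mentioned above: the random jumps land only on the chosen page or group of pages, so a page nobody links to can still earn rank if it is in that group.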

Based on my two minutes of testing, Google seems to offer more options within regular search (such as definition searches) and more kinds of search (such as image search). Best of all, Google is blazingly fast. I would stick with Google.

UPDATE: It looks like I stumbled upon the search engine while they were still ironing out a few kinks. The latest version is nothing like what I saw Wednesday night: the UI is different, and it is not noticeably slower than Google. Worth checking out. That said, a key metric for search engines is the quality of results. It would be interesting to see how good that is.

[link] posted by Anand Manikutty : 9:32 PM