Joho the Blog
An Entry from the Archives


October 31, 2006

Ethanz on Google Coop

Ethan Zuckerman discovers that Google Coop's roll-your-own search engine has high precision but poor recall, i.e., it returns few irrelevant results but misses stuff it should find. He explains:

A little poking solves the mystery pretty quickly. Google Coop Search works by searching against the main Google search catalog, retrieving 1000 results and filtering them against the sites you've included in your catalog. This makes sense, computationally - these searches are fast, almost as fast as normal Google searches. Rather than conducting 3000 "site:" searches and collating and reranking the results, Google is sacrificing recall, getting 1000 results and discarding those not in your set of chosen sites, which requires one call to the index and a really big regular expression match.

...

...In other words, the little engine I've built is useful only if the sites I've chosen are relatively high ranking and authoritative sites on the topics I'm searching on.
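To make the trade-off Ethan describes concrete, here is a minimal Python sketch of the "retrieve, then filter" approach. It is a toy model, not Google's actual implementation: the function names, the example URLs, and the tiny cutoff of 3 (standing in for the 1000 results) are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def host(url):
    """Return the lowercase host part of a URL."""
    return urlparse(url).netloc.lower()


def filtered_search(ranked_urls, allowed_sites, cutoff=1000):
    """Keep only results from allowed sites among the top `cutoff` hits.

    `ranked_urls` stands in for the general index's ranked results
    (hypothetical data); anything ranked below the cutoff is never seen.
    """
    allowed = {site.lower() for site in allowed_sites}
    return [url for url in ranked_urls[:cutoff] if host(url) in allowed]


def recall(found, relevant):
    """Fraction of the relevant URLs that were actually found."""
    relevant = set(relevant)
    if not relevant:
        return 1.0
    return len(set(found) & relevant) / len(relevant)


# Toy ranking: the chosen site is not very high ranking, so one of its
# pages falls below the cutoff.
ranked = [
    "http://big.example/a",
    "http://small.example/b",
    "http://big.example/c",
    "http://small.example/d",
]
my_sites = ["small.example"]

found = filtered_search(ranked, my_sites, cutoff=3)
relevant_on_my_sites = [u for u in ranked if host(u) in set(my_sites)]

print(found)
print(recall(found, relevant_on_my_sites))
```

Everything that comes back is from a chosen site, so precision stays high, but the page ranked below the cutoff is lost. That matches the point above: the filter only works well when the chosen sites already rank near the top.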

[Tags: ethan_zuckerman google ]

Posted by D. Weinberger at October 31, 2006 02:37 PM

