The Personal Blog of Stephen Sekula

Oh Yahoo!, you cad…

Yahoo! recently claimed that they had more searchable content than Google. I’ve used Google for many, many years, and I’ve avoided Yahoo! for searches for the opposite reason that I *do* use Google: historically, its results have been poor by comparison.

Looks like “Yahoo’s claims have been challenged by an independent check”:http://vburton.ncsa.uiuc.edu/indexsize.html. Scientists at the NCSA performed the same random searches on each service, and conclude that (with some uncertainty) you can expect to obtain 166.9% more results from Google than from Yahoo!.

Neato!

However, this study misses a very important question: what is the purity of the search result content? For instance, Google might return 100 results, of which only 10 are relevant links answering your query. That gives a purity of 10% (10 useful out of 100 returned). Yahoo! might return 20, including the same 10 useful links, for a purity of 50%. Clearly, purity is an important factor in search engine output: you hope to see the relevant links! The authors do not consider this, as they look only at random searches.
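
For the curious, here is a quick sketch of that purity arithmetic in Python. The function name `purity` is my own invention, and the numbers are the made-up ones from the example above, not measurements from the NCSA study.

```python
def purity(relevant: int, returned: int) -> float:
    """Fraction of returned results that are actually relevant."""
    if returned <= 0:
        raise ValueError("returned must be a positive number of results")
    if not 0 <= relevant <= returned:
        raise ValueError("relevant must be between 0 and returned")
    return relevant / returned


# Hypothetical counts from the example above (not real measurements).
google_purity = purity(relevant=10, returned=100)
yahoo_purity = purity(relevant=10, returned=20)

print(f"Google purity: {google_purity:.0%}")  # prints "Google purity: 10%"
print(f"Yahoo! purity: {yahoo_purity:.0%}")   # prints "Yahoo! purity: 50%"
```

Of course, the hard part in real life is deciding which links count as relevant, and a simple ratio like this can't do that for you.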