
Monday, August 17, 2009

More information, a smaller fraction


I'm still thinking about my last post.

The paradoxical thing about information and searching is that the more of it there is, the less of it we will see. The results we retrieve will be a smaller and smaller sample of what's actually available. And I don't see how this trend can be reversed.

Today when I search Google for the phrase "information fluency" it reports that about 39,800 results are found. But if I had the time and wanted to look at all those returns, I could only get to 981 of them, or 2.5%.

When I ask workshop participants if they've ever gone to the end of the list retrieved, I've never encountered anyone who has. Most searchers stop after the first page; the number who look at two pages is much smaller. The 8 people in 100 who go beyond the third page have access to roughly 0.1% of the information theoretically available. For the majority who never look beyond page one, that number falls to 0.025%.

Keep in mind these are the stats for "information fluency." If the search were for "civil war" (90,200,000 results), single-page searchers would be able to reach about 0.00001% (roughly one ten-millionth) of the information theoretically available. That's a very small percentage and it's going to get smaller.
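For anyone who wants to check the arithmetic, here is a minimal Python sketch of the percentages above. The function name percent_visible and the figure of ten results per page are assumptions made for illustration; the result counts are the ones Google reported for the two queries.

```python
# Assumption: a results page shows ten results, Google's usual default.
RESULTS_PER_PAGE = 10


def percent_visible(results_seen: int, results_reported: int) -> float:
    """Share of the reported results a searcher actually sees, as a percentage."""
    return 100.0 * results_seen / results_reported


INFORMATION_FLUENCY = 39_800   # results reported for "information fluency"
CIVIL_WAR = 90_200_000         # results reported for "civil war"

# Everything Google would actually serve for "information fluency" (981 results)
print(f"{percent_visible(981, INFORMATION_FLUENCY):.1f}%")                   # 2.5%

# Three pages of results, then one page
print(f"{percent_visible(3 * RESULTS_PER_PAGE, INFORMATION_FLUENCY):.3f}%")  # 0.075%
print(f"{percent_visible(RESULTS_PER_PAGE, INFORMATION_FLUENCY):.3f}%")      # 0.025%

# One page of results for "civil war"
print(f"{percent_visible(RESULTS_PER_PAGE, CIVIL_WAR):.5f}%")                # 0.00001%
```

Three pages work out to 0.075%, which rounds up to the 0.1% quoted above. As a plain fraction, the "civil war" figure is about one result in nine million.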

One of the changes in Google I noticed over the last year was greater variation in the results on the top pages. It used to be that 21cif.com (previously 21cif.imsa.edu) took 8 of the top 10 spots. Today, 21cif shares the top ten places with four other organizations. By eliminating similar results, Google is able to show more choices for the query. That's better in terms of available information--there's no reason why 21cif.com should have a monopoly on the content--but the results remain limited to a handful of major players in the field of information fluency.

There are other ways to serve up a broader array of results, such as clustering. Polymeta is one example. But nothing digests more than a fraction of the information out there. That places the responsibility for gaining a broader perspective on the searcher. And that takes work: multiple queries to get at information buried too deep to be retrieved by a single query.

If we want to stay on top of information, it's not going to be easy with today's tools.