Monday, September 28, 2009

Revisiting Stop Words and Clutter Words

The final item on the Query Checklist that I'm revisiting is #7: Did I use any stop words or clutter words?

Briefly, stop words are terms the search engine ignores: common words, such as pronouns, prepositions and conjunctions, that don't add significant content. Google lists some of its exceptions to the "every word counts" rule. Here's a more complete list of overlooked words.

One way to tell whether a word is being overlooked is to examine the query results. Consider the query here are all the stop words (without quotes). In Google, all the words will appear in bold if the exact phrase is found (you don't need quotes to return the exact phrase). If only some of the query words were used to find a matching result, only those words will be shown in bold. In my query example, the second snippet contains the word 'the', but it does not appear in bold. Yahoo behaves similarly.

There are two ways to guarantee that ALL the words are used: link them with the AND operator (returning results that contain all the words, but not necessarily in the order you typed them), or put quotes around the phrase (returning results that contain the exact phrase you submitted).
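The difference between the two approaches can be sketched with a toy matcher in Python. This is only an illustration under simplifying assumptions, not how Google or Yahoo actually match queries: the function names are hypothetical, text is split on spaces, and punctuation, stemming and ranking are ignored.

```python
def matches_all_words(query, text):
    """AND-style match: every query word appears somewhere, in any order."""
    words = set(text.lower().split())
    return all(w in words for w in query.lower().split())


def matches_phrase(query, text):
    """Phrase-style match: the query words appear consecutively, in order."""
    q = query.lower().split()
    t = text.lower().split()
    # Slide a window the length of the query across the text.
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


doc = "All the stop words are here"
query = "here are all the stop words"
print(matches_all_words(query, doc))  # True
print(matches_phrase(query, doc))     # False
```

The AND-style check succeeds because every query word occurs in the text; the phrase check fails because the words are out of order.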

Stop words are so common that they add little to the uniqueness of a query, and it is uniqueness that drives you to better-matched, more meaningful results. Students might be tempted to use stop words in a natural language query (e.g., I want a list of all the stop words), thinking this means something to the search engine. The query list stop words gets to the point.
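As a rough illustration of that trimming, here is a toy filter in Python. Both word lists are assumptions made up for this example (they are not Google's actual stop-word list), and strip_query is a hypothetical helper; "want" is placed in a separate clutter set because it is a verb rather than a true stop word.

```python
# Toy stop-word list: an illustrative assumption, not any engine's real list.
STOP_WORDS = {"i", "a", "an", "the", "of", "all", "to", "in", "and", "or"}

# Hypothetical clutter words for this example: verbs that add nothing.
CLUTTER_WORDS = {"want", "need", "find"}


def strip_query(query, drop):
    """Return the query with every word in `drop` removed (case-insensitive)."""
    kept = [w for w in query.split() if w.lower() not in drop]
    return " ".join(kept)


natural = "I want a list of all the stop words"
print(strip_query(natural, STOP_WORDS))                  # want list stop words
print(strip_query(natural, STOP_WORDS | CLUTTER_WORDS))  # list stop words
```

Removing stop words alone still leaves the verb "want"; dropping it as well yields the same concise query, list stop words.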

In a similar vein, clutter words are less common than stop words but likewise add no value to the query. In fact, they may detract from it, forcing the search engine to look for words you think are important but that may not appear alongside the information you are seeking. Clutter words include unnecessary redundancies (as in earthquake AND damage, where damage is redundant because it's hard to write about earthquakes without mentioning damage, destruction or many other words you might not have used). Verbs, adjectives and adverbs are often clutter terms as well. A good rule of thumb to keep in mind is "if you can't clearly see it, don't use the word." Stick to objects: nouns and numbers.

All in all, the Query Checklist has held up well over the past few years. Once the list is internalized, it can help you cut down on search time and produce more relevant results.

Next time: It's probably time for another Search Challenge!
