Showing posts with label stop words. Show all posts

Monday, September 28, 2009

Revisiting Stop Words and Clutter Words

The final item on the Query Checklist that I'm revisiting is #7: Did I use any stop words or clutter words?

Briefly, stop words are terms ignored by the search engine: common parts of speech that don't add significant content such as pronouns, prepositions and conjunctions. Google lists some of its exceptions to the "every word counts" rule. Here's a more complete list of overlooked words.

One way to tell if a word is being overlooked is to examine the query results. Consider the query here are all the stop words, entered without quotes. In Google, every word appears in bold when the exact phrase is found (you don't need quotes to return the exact phrase). When only some of the query words were used to find a match, only those words are shown in bold. In my example, the second snippet contains the word 'the', but it does not appear in bold. Yahoo behaves similarly.

There are two ways to guarantee ALL the words are used: link them with the AND operator (returning results containing all the words but not necessarily in the order you used them) or put quotes around the phrase (returning results containing the exact phrase you submitted).
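As a rough illustration of those two options, here is a minimal Python sketch that builds both query strings. The function names are my own invention for this example, and the exact operator syntax a given search engine accepts is an assumption worth checking against that engine's help pages.

```python
def all_words_query(words):
    """Join the words with AND so every word must appear, in any order."""
    return " AND ".join(words)


def exact_phrase_query(phrase):
    """Wrap the phrase in double quotes to ask for an exact match."""
    return f'"{phrase}"'


phrase = "here are all the stop words"

# Every word required, order not important.
print(all_words_query(phrase.split()))

# The exact phrase, in the order typed.
print(exact_phrase_query(phrase))
```

The first form is the looser of the two: it insists on every word but lets them appear anywhere in the result.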

Stop words are so common that they add little to the uniqueness of a query, and it is uniqueness that drives you to better-matched, more meaningful results. Students might be tempted to use stop words in a natural language query (e.g., I want a list of all the stop words), thinking the extra words mean something to the search engine. The shorter query, list stop words, gets to the point.

In a similar vein, clutter words are less common than stop words but don't add value to the query. In fact, they may detract from it, forcing the search engine to look for words you think are important but do not occur with the information you are seeking. Clutter words include unnecessary redundancies (like earthquake AND damage--in which case damage is redundant: it's hard to write about earthquakes without referring to damage or destruction or a bunch of other words you might not have used). Verbs, adjectives and adverbs are often clutter terms as well. A good rule of thumb to keep in mind is "if you can't clearly see it, don't use the word." Stick to objects--nouns and numbers.

All in all, the Query Checklist has held up well over the past few years. Once the list is internalized it can help you cut down on search time and produce more relevant results.

Next time: It's probably time for another Search Challenge!

Sunday, July 12, 2009

World's Fastest Animal


"What is the top speed of earth's fastest animal?"

Seems simple enough. But just letting students search for an answer shortcuts an opportunity for learning. In my workshops for elementary teachers and librarians, I hand out half sheets of paper, each with a different word from the challenge written on it. I have the participants stand and ask them, "Which of these words do we need for a query?" Prior to this we've looked at the Question to Query checklist.

With adults, the stop words automatically sit down without any question (what, is, the, of).

The individuals holding earth's and top realize they aren't necessary. Earth's is redundant (where else would one look for an animal except on earth?), and top is redundant because fastest is already one of the words.

The last one to sit is often speed. Fastest usually makes the point that speed is unnecessary as long as she's there.

That leaves fastest and animal. These two form the optimal query for the challenge. I should point out that one of these is an adjective--not usually a good "as is" word (nouns and numbers are better).
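For anyone who wants to replay the sit-down exercise on a computer, here is a minimal Python sketch. The two word sets are simply the words that sat down in the workshop; they are hand-picked for this one question, not a general stop word list, and the function name is hypothetical.

```python
import string

# Words that "sat down" in the workshop (specific to this challenge).
STOP_WORDS = {"what", "is", "the", "of"}
REDUNDANT_WORDS = {"earth's", "top", "speed"}


def question_to_query(question, stop_words=STOP_WORDS, redundant=REDUNDANT_WORDS):
    """Return the words left standing after stop and redundant words sit down."""
    # Lowercase each word and trim punctuation from its ends only,
    # so the apostrophe inside "earth's" is kept.
    words = [w.strip(string.punctuation).lower() for w in question.split()]
    return [w for w in words if w and w not in stop_words and w not in redundant]


challenge = "What is the top speed of earth's fastest animal?"
print(question_to_query(challenge))  # ['fastest', 'animal']
```

Deciding which words are redundant still takes human judgment; the code only records the decisions the group already made.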

The exercise appeals to language arts teachers because it reinforces understanding parts of speech and the making of meaning--in this case understanding about redundancy. With younger audiences, it's possible to lead them to these discoveries by the use of questioning. In either case, the activity intersects two valuable lessons: one about language and the other about searching.

Over the course of the last month, the answer to the search challenge has become more interesting. Most people discover that the best answer depends on whether the animal flies, runs or swims. (As I wrote in the previous post, this points out the inadequacy of the question and the knack some students have for incorrectly assuming they know what the question is about--and why many will say the answer is a cheetah.)

If you think the Peregrine Falcon is the speediest animal, you now need to defend your choice. A faster animal shows up in the results. Again, this points out the inadequacy of the search challenge question, but it forces you to decide what makes something the fastest.

Curious? Try the challenge.