Showing posts with label query.

Saturday, January 25, 2020

Where does information fluency intersect with entrepreneurship?

The latest edition of the Full Circle Kit for winter 2020 looks at information through the eyes of an entrepreneur. Startups live and die in markets. Gathering intel about one's market includes knowing about the size of one's market, market demographics, competitors and their market share. Much of this information is available online, if you know where to look.

This seven-step tutorial includes interactive challenges and explanations, starting with optimizing keyword queries, then finding and navigating specialized databases, browsing, and using search strategies to find accurate market sizes, competitors and their share of the market. Online searching is only one of the ways to research a market, but it's one of the first tools used by lean startups that can't afford expensive market reports.

For a limited time, this tutorial is available without a subscription. Take a free tour!

Market Research Information Fluency

Friday, August 24, 2018

Keyword Games

In the newest Full Circle Resource Kit (Fall 2018), the featured article is about using games to change search behaviors. Telling students not to use full sentences and to pay attention to their keywords, while instructive, is not very effective.

Student success in searching isn't quite as bad as the illustration shown, but all it takes is a couple of queries that hit the target to satisfy a poor searcher.

We invented a simple search game that serves as a game-making structure to reinforce good search techniques. A Keyword Game is basically a riddle: Who Am I? Three clues are provided: Latina, Bronx, Tarzan. Most people can't guess the identity without using a search engine.

The combination of the terms is what makes them effective, plus they are very specific terms--two are proper nouns.

For more on the games, see the latest edition of the Full Circle Kits (accessible with a school subscription).

The real value in the games is for students to create their own keyword challenges. Using one of the prompts, "Who/What/Where/When am I?" the challenge is to find 3-5 keywords that can be entered into a search engine to reveal the answer. To do so successfully, students engage in these effective search techniques:

  • Search with a few words (between two and five keywords)
  • Include only effective keywords and unique combinations
  • Avoid verbs, pronouns, articles, conjunctions, adverbs and adjectives (and avoid complete sentences)
Have them play each other's games to see whether they can be solved. If a set of clues doesn't work, this opens up an opportunity to understand what's wrong with the query.
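The keyword rules above can be sketched as a toy filter that strips stop words from a natural-language question. This is only a minimal illustration; the stop-word list is a small invented sample, not a complete one.

```python
# Strip common stop words (articles, conjunctions, pronouns, question words)
# from a question and keep at most five candidate keywords.
# The stop-word list is a small illustrative sample.

STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "is", "are", "was", "were",
    "i", "you", "he", "she", "it", "we", "they", "who", "what", "where",
    "when", "how", "many", "there", "in", "on", "of", "to", "from", "am",
}

def extract_keywords(question, limit=5):
    """Return up to `limit` non-stop-word terms, in original order."""
    words = [w.strip("?.,!").lower() for w in question.split()]
    keywords = [w for w in words if w and w not in STOP_WORDS]
    return keywords[:limit]

print(extract_keywords("How many buffalo are there today in North America?"))
# → ['buffalo', 'today', 'north', 'america']
```

A student-made keyword challenge is essentially the reverse of this filter: pick the few surviving terms that uniquely identify the answer.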

By the way, the answer to the riddle above is Sonia Sotomayor. In Google there are other search results at the top of the list, but these don't answer the question because they don't contain all the keywords. This emphasizes critical reading to make sure the results match expectations.


Tuesday, July 19, 2016

Refining and Finding Keywords

The keywords you start with are often not the keywords you need.

A good example of this occurred recently in a summer program I was leading. It wasn't an Information Fluency workshop, but it did give me an opportunity to show some middle schoolers how to find the information they were seeking.

The program was "Lifecycle of a Startup" at the Illinois Mathematics and Science Academy in Aurora, Illinois. Middle school students attend who want to experience being in a startup. We compress the first year of a startup into five days as a simulation game. Most of it is real--they pitch their ideas to real investors in a shark tank experience to raise (simulated) capital to get their business off the ground.

One team was having trouble developing its idea. It was Day Three and they hadn't firmed up what their new product was going to be. They had been toying with the problem of CO2 emissions and wanted to develop an ink that could absorb CO2 from the atmosphere. They just hadn't found a way to do this.

As I watched them search, this was a typical query:
how to remove carbon dioxide from the air using ink
The first article they found was one about using carbon nanoscale fibers to remove CO2 from the air. But since this didn't have anything to do with ink, they moved on, growing frustrated. Fortunately, improvements to search engines allow students to use a long natural language string and get results (it wasn't always that way).

They missed a couple of better keywords in the reading, which I pointed out: carbon sequestration--the name of the process.

I suggested they query: carbon sequestration ink

I'm not sure the students had ever heard of sequestration before, but it's an effective term to query. Would they have used it on their own? Doubtful. But students should be encouraged to look for better terms in the results, even (especially) if the words aren't familiar. 

This produced a link to some Google Scholar articles which opened doors to what they were looking for. Of course, the girls had to skim the articles to see if they were relevant. Another search term popped out of the first article: reduced carbon-footprint concrete.

The girls eventually found a connection between what makes concrete absorb CO2 and what could be added to ink. It took persistence. They changed their idea to carbon sequestering paint, since that covers more surface area.
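The habit of looking for better terms in the results can be sketched as a simple word count over a retrieved snippet: frequent, unfamiliar content words are candidates for the next query. The snippet text and the common-word list below are invented for the example.

```python
# Count content words in a result snippet and surface the frequent ones
# as candidate keywords for the next query. Illustrative sketch only.

from collections import Counter

COMMON = {"the", "a", "an", "of", "to", "from", "and", "in", "is", "can",
          "be", "that", "this", "it", "as", "for", "by", "with", "are"}

def candidate_terms(snippet, top_n=5):
    """Return the top_n most frequent content words in the snippet."""
    words = [w.strip(".,()").lower() for w in snippet.split()]
    counts = Counter(w for w in words if w and w not in COMMON and len(w) > 3)
    return [term for term, _ in counts.most_common(top_n)]

snippet = ("Carbon sequestration is the process of capturing carbon dioxide "
           "from the air. Carbon nanofibers offer one route to sequestration.")
print(candidate_terms(snippet))
```

Run on a real article about removing CO2, a term like sequestration would rise to the top of such a list, even if the searcher had never heard the word before.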

See if you can find the compound or chemical that may be added to paint to suck CO2 out of the air.

Tuesday, March 18, 2014

Finding a Publication Date

Here's a common problem: You find a resource online that you want to cite, but the publication date is missing.

The case in question here is this article by Joseph Renzulli: A Practical System for Identifying Gifted and Talented Students.

The article references a number of studies and articles from the 1980's. How recent is the article itself?

It's not a hard problem to solve. Normally, start on the page itself; if clues don't reveal themselves, truncate the URL to see if there's a directory with date information, or use Page Info to see when the page was last updated. In this case, there doesn't immediately seem to be a listing of articles on the site, and the last update for this page was in 2013. That date doesn't seem accurate, since the article is mostly about older findings.
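The URL truncation technique can be sketched mechanically: drop the last path segment one step at a time and check each resulting directory URL for a date listing. The example URL below is invented for illustration.

```python
# Successively truncate a URL's path to generate candidate directory URLs
# worth checking for article listings or date information.

from urllib.parse import urlparse

def truncations(url):
    """Return directory URLs from deepest to shallowest, ending at the root."""
    parsed = urlparse(url)
    base = f"{parsed.scheme}://{parsed.netloc}"
    segments = [s for s in parsed.path.split("/") if s]
    urls = []
    for i in range(len(segments) - 1, 0, -1):
        urls.append(base + "/" + "/".join(segments[:i]) + "/")
    urls.append(base + "/")
    return urls

for u in truncations("https://example.com/articles/2013/renzulli/identify.html"):
    print(u)
```

Each printed URL is one truncation step; any of them might expose a directory index with a publication date.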

So one investigation technique is to query Google (or another database) with the title of the article.

Try it and see what date you find.

Wednesday, March 13, 2013

High Cost of Being Gullible

The price of cyber crime is astounding.

  • UK Guardian: Consumers and businesses in the UK lost an estimated £27 billion in 2012 due to cybercrime.[i] 
  • Ponemon Institute: The average annualized cost of cybercrime for 56 benchmarked U.S. organizations is $8.9 million per year.[ii]  
  • People’s Public Security University of China: In 2012, economic losses from Internet crimes in China totaled an estimated $46.4 billion (RMB 289 billion).[iii]
And it's growing annually.

So what does being gullible cost the average American?

See if you can find the cost to the average Senior Citizen in the US today.

What does this say about the need to investigate online information?


[i] John Burn-Murdoch, “UK was the world’s most phished country in 2012 – why is it being targeted?”, www.guardian.co.uk, last modified on February 27, 2013, http://www.guardian.co.uk/news/datablog/2013/feb/27/uk-most-phishing-attacks-worldwide.
[ii] “2012 Cost of Cyber Crime Study: United States” Ponemon Institute, October 2012, http://www.ponemon.org/local/upload/file/2012_US_Cost_of_Cyber_Crime_Study_FINAL6%20.pdf
[iii] “Internet crimes cost China over $46 billion in 2012, report claims”, thenextweb.com, last modified January 29, 2013, http://thenextweb.com/asia/2013/01/29/china-suffered-46-4b-in-internet-crime-related-losses-in-2012-report/.

Friday, January 18, 2013

Invisible Query

Time flies! I've neglected this blog for about 6 weeks.

Dennis O'Connor and I are deep into authoring a book on Teaching Information Fluency. Our deadline is the end of April.

Writing a book is a discovery activity for me. Last time I wrote this much was my dissertation and I discovered plenty about flow and mathematics while doing that.

This time, while it would seem I've traversed the topic of information fluency through this blog and the 21st Century Information Fluency Project website, there are still Aha! moments.

As I was thinking about the process of querying, it occurred to me that there's a lot more to it than translating a natural language question into a query. That's just the visible query--the one the search engine responds to. There's also an invisible query, the one you don't enter into the text box: the keywords or concepts that remain in your head.

These help you filter the results of the query. Some results are more relevant than others, not due to their ranking, but because you have some priorities in mind the search engine is unaware of.

It's generally ineffective to enter everything you're looking for in a search box.  It constrains the search and produces fewer results--sometimes none. It's better to submit two or more (keeping it a small number) keywords and scan the results for your invisible query.

Using one of our classic examples, "How many buffalo are there in North America today?", a good query is buffalo north america (bison is better than buffalo). Yet that's not really enough information to answer the question which is going to be 1) a number and 2) as recent as possible. That's the invisible part that you have to remember throughout the process. You choose results that satisfy 1 and 2; if not, you're probably not answering the question.
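The buffalo example above can be sketched as a filter: the visible query stays short, while the invisible criteria (the answer must contain a count, and be recent) are applied when scanning results. The result snippets below are invented for illustration.

```python
# Apply "invisible query" criteria to result snippets: keep only results
# that contain a count-like number AND a recent year. Illustrative sketch.

import re

def satisfies_invisible_query(snippet, min_year=2010):
    """True if the snippet offers a count and a sufficiently recent year."""
    has_count = bool(re.search(
        r"\b\d{1,3}(?:,\d{3})+\b|\b\d+(?:\.\d+)?\s*million\b", snippet))
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", snippet)]
    is_recent = any(y >= min_year for y in years)
    return has_count and is_recent

results = [
    "Buffalo Bills roster announced for the 2023 season.",
    "A 2021 survey estimated about 420,000 bison in North America.",
    "Bison nearly vanished by 1900, with only a few hundred left.",
]
relevant = [r for r in results if satisfies_invisible_query(r)]
print(relevant)
```

Only the second snippet survives: it's the only one offering both a number and a recent date, which is exactly the filtering a searcher does mentally.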

One premise of the Filter Bubble is that the machine is learning from us and will hone its output to our preferences. This becomes a harder task when we are not feeding the machine everything we have in mind. It may be a pretty good way to keep the Filter Bubble from encompassing us.

Think about what you're not querying that you are still looking for next time you search.

Tuesday, March 13, 2012

Slinky Challenge (#006)

The Toys in Space Challenge started with Apollo 8 taking Silly Putty into Space. After the Challenge had been up for a year or so, the answer (Apollo 8) was no longer difficult to find.

So the next iteration was "On what NASA mission was Slinky first taken into space?" That was an intermediate challenge that required searching the NASA database. With the profusion of information, it was only a matter of time until a simple word query was able to find an answer without searching anything other than Google's database.

Today, the challenge reads: "Who commanded the first NASA mission to take a Slinky toy into space?"  This no longer requires finding the right database to query, but it does require some strategy and careful skimming of results. Using the FIND command is also helpful in sorting through pages of content looking for a relevant term--i.e., one of the keywords you used. Using fewer rather than more keywords is also helpful. It may also help to think of how an answer might be worded and use those keywords. I'd be interested in hearing what strategies or techniques you use in finding the answer.

This is the sixth 'refreshed' Challenge. My goal is to keep at least 10 Challenges up-to-date that focus on slightly different search skills.

Try the Challenge

Friday, January 28, 2011

Semantic Searching

[Photo: the crew of Challenger's last mission]
Improvements to search engines make searching easier.

Take the semantic application hakia.com, for example. According to their site information, hakia is a semantic search technology company whose mission is to "deploy semantic search solutions to meet the challenges of elevated user expectation, business efficiency and lowest cost."  [source]

To no small extent, "elevated user expectation" stems from the frustration users experience when they can't quite figure out the right keywords to use or right databases to search in Web 1.0 and 2.0 worlds. Remember when search engines ONLY performed literal searches? There are still some of those around, but a new generation of engines is getting to the place where natural language (or free text) searches start to be meaningful to machines.

With improvements in programming and programming languages it is now possible to type in a question as a query and have a search engine interpret the meaning and return relevant results. There will continue to be advances, but this is well beyond the capabilities of the old "ask Jeeves" engine.

This has obvious implications for teaching students how to turn questions into queries, which has always been a challenge.  In time, questions could be the standard for queries.  Keyword searching will still work, but the user won't have to figure out what the important concepts are, which keywords might need to be replaced by more powerful search terms and what stop words not to include.

These advances don't solve the problem of whether the information returned can be trusted, but I have a feeling that isn't too far away. In the not too distant future, search engines will be able to provide a credibility rating based on authorship information, publication date, inbound links and a host of other factors. The semantic web makes establishing and checking these kinds of data connections behind the scenes possible. Authors could have a "credibility score." Not sure how that would be determined, but the technology exists to do it.

A challenging thought, to be sure.

Despite "information crunching" advances that will make (supposedly reliable) information retrieval easier and easier, something tells me being a critical reader will never go out of fashion.

In the meantime, try a question search using hakia.com!  (e.g., what caused the Challenger disaster -- what do the results tell you?)

Monday, January 17, 2011

Flight Status Challenge

My daughter is flying home to the West Coast tomorrow morning. The weather today has caused delays at the Chicago airports.

The question is: Could weather be a factor that could delay the plane before it even gets to Chicago?

It's one thing to know your flight number, it's another thing to figure out where that plane has been or where it's going after you de-plane.  My son-in-law used this information last week to avoid a delay that seemed inevitable: his plane was coming in from Albany, NY. New York was about to be hammered by winter weather. He managed to reschedule his flight for a day earlier and avoided the delays.

Challenge: If you know your flight number, how can you find information about all the legs of that flight (the cities where it stops)?  Where would you search? What keywords would you use?


Post your answers for Southwest Flight 1048, Jan. 18, 2011.

Wednesday, May 26, 2010

A Query for Curry


Here's a good example of what can happen when facts aren't checked adequately.
It is reported that Ann Curry, news anchor for the NBC Today Show, mixed up her list of noteworthy graduates from Wheaton College while speaking at the Norton, MA college last Saturday. There happen to be two Wheaton Colleges (I attended the one in Illinois) and it's easy to get them mixed up.

Identical or similar names are easy to confuse online. Several Internet Search Challenges are built on this premise. The earliest of these, the Buffalo Challenge, was created to require additional keywords besides 'buffalo' ("How many buffalo are there today in North America?"). Search engines do a better job with that phrase than they once did. A few years ago you'd have to sort out Buffalo, New York, the Buffalo Bills and Buffalo wings from the bison statistics you wanted.

The query, famous graduates wheaton college, leads to a Wikipedia list of notable alumni, possibly the source of Curry's information. The results suggest there could be a problem with the information: Wheaton College (Illinois). Overlooking that part about Illinois could be someone's undoing. I wouldn't call this a failure of fact-checking as much as a failure of reading.

When you don't know the existence of rival information, what are you to do? Taking the first return and stopping there is rarely a good idea, although the alternative, checking several returns, takes time.  When the accuracy of the information matters, spending the time to look at several returns makes good sense.  It's probably less time-consuming than writing an apology.

The query checklist doesn't ask this question, but it may be a good one to add: "is there more than one of the person, place or thing I am looking for?"

If you've ever encountered a similar confusion between (or among) search objects, share your story!

Sunday, May 9, 2010

Poop for Power

My Call for Reviewers was a little too successful last week. Before I knew it, 50 people had applied and I only needed 15. In case you went to the website to apply and the application was already taken down, my apologies. Everyone who applied was well-qualified. It was a painful selection process--I hated to turn anyone away.

So here's a glimpse of one of the items that was originally developed for Information Investigator 2.0 that didn't quite make it (because it was too hard).

First, a little about Information Investigator. Part of the package consists of a pretest and posttest designed to measure information fluency, in particular, investigative competencies. Those competencies include knowledge and skills to use techniques like truncation, browsing, skimming, querying, special operators, etc., to help determine the credibility of information.

The pretest consists of 10 performance items. The posttest has 10 different items, measuring the same sets of skills. It's hard to guess the answers, and as a result, students tend to score in the 45-55% range on the pretest. After training, scores go up by about 15 points on average.

Here's one of the performance items that I developed but didn't use for the posttest. I felt it was too challenging. But it's a good challenge for this blog, nonetheless--one that really brings out the reference librarian in a person.

The back story is related to the use of animal waste to produce energy. There are lots of examples: L'Oreal powering a cosmetics plant with cow manure, the Dutch recycling chicken waste to power nearly 100,000 homes, and then there are some stories that seem a little harder to believe.

One of these is a story dating back to 2006 about San Francisco exploring the possibility of turning dog poop into methane to power households in that city. Here's a sample news report about it. The people of San Francisco have a lot of dogs. Dogs produce a lot of waste. Waste can be turned into methane. Did they ever do it? 

The challenge is:  
Fact check to see if San Francisco is using dog poop for power in 2010.  

Rather than just make it a yes or no question, here are some possibilities (multiple choice):


1. This is a hoax. There is no evidence that San Francisco ever considered using dog poop as a power source. 

2. This was never more than a proposal. Development never started. 

3. San Francisco has not yet started the program but still plans to do so.  

4. San Francisco started to collect dog poop for methane but later discontinued the practice.

5. San Francisco continues to use dog poop as a power source today. 


What do you conclude?  (If you live in San Francisco, this might be easier)

Tuesday, February 9, 2010

Mining eBay

A search this morning led me through eBay. This has happened before, so I think it's worth writing about.

Thinking that the brand name and model number would instantly fetch me the humidifier replacement filters I was looking for, I queried Google using sunbeam scm2412. Normally such a unique combination of keywords would be powerful enough to return the product and (I assumed) related parts.  That was not the case.

I did find the unit but nothing about replacement parts or filters. Expanding the query to include the words filter and replace didn't help. I ended up taking the humidifier apart to see the object I was searching for. No part number, but I could clearly see that the filter did not match the types I was finding online. Even a 'deep web' search of filter databases didn't turn it up. As far as I could determine, Sunbeam doesn't have a filter associated with the model scm2412.

The only alternative, short of picking up the phone to a supplier, was to reconsider and browse some of the sites selling the unit.  I had already browsed amazon.com, www.appliancefactoryparts.com, www.filters-now.com, shopping.yahoo.com, www.marbeck.com/humidifier_filters_sunbeam.html, www.nextag.com/sunbeam-humidifier/shop-html, www.householdappliance.com/, www.hardwareandtools.com/ and probably others without much success. I did notice that the name Jarden was often paired with Sunbeam, but I wasn't sure what to do with that.

Then there was eBay. I had exhausted the majority of the "official" distribution sites that might have proprietary information about the filter. From past experience I know that people who sell on eBay often provide their own descriptions. I found someone selling the unit and there in the description (I had to scan the page) were the words replacement filter: Jarden SW2002.

With those keywords I was quickly able to find the cheapest price online and place an order.

It occurred to me that I had mined information from eBay before when I wasn't looking to buy something. For example, I've done research on 'beer engines' -- professional language for the equipment used to draw a pint (a fact I discovered on eBay) -- to determine how they work. I also learned a good deal about concertinas on eBay. Obviously, product-related information is going to be most prolific, so I wouldn't recommend going there to research the causes of WWI. But if you want to find out about specific WWI hand weapons, I'm sure someone is offering that information for free.

There remains the matter of vetting the information. In my case, I fact-checked it and found the replacement filter. It's always a smart idea to see if the information found in an unvetted database can be verified or replicated.

I'm curious if you or your students have also found useful information on eBay or similar sites that you care to write about.

Friday, January 1, 2010

Blue Moons

If you live in the Western Hemisphere, New Year's Eve featured a blue moon. (Readers in the Eastern Hemisphere still have a month before their blue moon occurs.)

Lots of news sources (example) acknowledged the blue moon, explaining that it is 'blue' because it is the second full moon of the month. There's no doubt this is a very popular answer, but its trendiness doesn't mean it is completely accurate. Blue moons have occurred on the 20th day of a month.

Fact: it takes the moon 29.5 days to go from full to full. Hmmm. There's no way a second full moon can happen in 20 days. What's going on?

The modern definition of blue moon is the result of an interpretive mistake, one made long before the Internet made the transmission of such errors immediate and widespread. The challenge is to use the Internet to track down the name of the individual who reinterpreted the definition and the year it happened.

This is a good example of how erroneous information, when picked up by a reputable source, becomes entrenched.

Good hunting!

Tuesday, November 3, 2009

Basic vs Advanced Searching


You probably don't use advanced search options very often.

You're not alone. Advanced searchers--such as the members of the search group at Google--use just the basic search functions more than 95% of the time. In practice, no more than 1 search in 20 requires special or Boolean operators other than AND, which is nothing more than using the space key.

You might expect that very experienced searchers would use specialized search tools more than that. Knowing how to use special operators is only a small part of becoming a search expert.

After leading search strategy workshops for a couple of years, I came to realize how little I depended on anything other than good keywords. Boolean operators (except for AND) are not needed most of the time. In fact, unless you really know how to use them, they either limit your search in ways you don't want or yield results you don't expect.

So the first point is that you really don't need to use them. This goes for "", OR, NOT, inurl: and a host of others. Concentrate on the quality of your keywords; that's what does the heavy lifting.

The second point is to know when you really do need an operator. My advice is never to use double quotes ("") around a phrase unless you know for certain that is the exact string you need to query. "Carl Heine" will return all the references to my name, but won't return any occurrences of Heine, Carl (which is just as likely) or Carl A. Heine, which includes my middle initial. Quotes are a good device for cutting down on the number of keywords in a longer query, for example: bison statistics 2008 "North America" (when you are confident that North America will be included in the information you want).

The OR operator is helpful when you want to cut down on the length of a query and believe there are multiple terms (usually rival nouns or adjectives) that might be in the information. For example: bison statistics OR population OR research 2008 "North America". That's still a four-term query.
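The two operators above can be sketched as a small query builder that joins plain keywords, OR-separated alternates, and a quoted exact phrase. This is just an illustration of how such a query string is assembled, using the bison example from the text.

```python
# Assemble a query string from plain keywords, OR-joined alternate terms,
# and an exact phrase wrapped in double quotes. Illustrative sketch.

def build_query(keywords, phrase=None, alternates=None):
    """Join keywords, optional OR-joined alternates, and an optional quoted phrase."""
    parts = list(keywords)
    if alternates:
        parts.append(" OR ".join(alternates))
    if phrase:
        parts.append(f'"{phrase}"')
    return " ".join(parts)

q = build_query(["bison", "2008"],
                phrase="North America",
                alternates=["statistics", "population", "research"])
print(q)  # → bison 2008 statistics OR population OR research "North America"
```

The point stands either way: the keywords do the heavy lifting, and the operators only earn a place when you're confident about the exact phrase or the rival terms.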

If you've discovered a time when NOT was essential to a search, let me know. Most of the time it eliminates results that may be valuable without giving you a chance to see them.

I don't believe I've ever needed to use inurl:, intitle: or any of the other in_: operators to find information. Keywords get the job done quite nicely.

Can you be an expert searcher without knowing operators? Over 95% of time, yes.

The real advanced part of being an expert searcher relies on the ability to use words sparingly and recognize words that are more powerful than others in the context being searched. That takes quite a bit of experience using words and predicting how they may be used in the type of search being conducted. For most children that presents a real challenge. They will need help while their linguistic skills mature.

Read the comments for more...

Wednesday, September 23, 2009

Revisiting words with multiple meanings


Of the items on the Query Checklist, one that could be dropped is #6: avoiding words that have multiple meanings.

If your query includes an adequate number of keywords--and not more than necessary--a word with multiple meanings does little to prevent you from finding what you seek. From my perspective, today's search engines (the ones that continue to develop) are less sensitive to multiple meanings and more sensitive to contextual clues provided by the other keywords. This is why a search for roman spears no longer returns top results about Britney Spears. A few years back this type of search challenge was pretty easy to construct: find a word with a very popular or common usage and use it in a search for a less common object or idea.

Nowadays, the pairing of words in a meaningful context excludes other uses of the terms. As long as the accompanying term is sufficiently unique, using a word with multiple meanings is not a problem. The challenge is to find the word that uniquely modifies the more ambiguous term.

If some of the 21cif Search Challenges seem easier than they once were, credit search engines for producing more focused results.

It's still a good idea to be mindful of words with multiple meanings and pair them with unique terms. If you are looking for information on a disc jockey whose name is "Bill Gates" you definitely need some unique terms to ferret out someone other than the Bill Gates of Microsoft fame. This one still may be challenging.

Finally, does it need to be stated that one-word searches are confounded by words that have more than one meaning?

Next time: revisiting stop words and clutter words

Thursday, September 17, 2009

Revisiting Optimal Number of Keywords


When building a query, how many keywords is enough and what number is too many?

There really are no absolutes here because so many variables are involved, but there are some guiding principles which I've found consistently helpful.

The first item on the Query Checklist remains highly relevant: 'How many key concepts are contained in the question?' If you are merely interested in mp3 players, there's one concept contained in two words. On the other hand, if you want to know 'How many buffalo are there today in North America?', then you have four key concepts with which to contend:
  • what - many (a number), buffalo
  • where - North America
  • when - today
Generally, the more defined the objective, the more concepts there are. Searching for just one concept, or for more than three, may be problematic because of literal matching.

Literal matching: Too few terms
One-word queries are often ineffective because they match so much information that is irrelevant. The reason results are irrelevant is that the search wasn't defined sufficiently to begin with. I may want to find information on buffalo, but if I search only using that word, I will have to browse through a lot of information I may not care about. Interestingly, about half the college-aged subjects taking College Board's ICT test a couple years ago used one-word queries (citation needed--anyone up for the challenge?). One-word or single concept queries are probably good enough if you want to do a broad scan of the information landscape pertaining to a product, a person, an idea, etc., but they tend to cast a very wide net and consequently slow you down.

Part of the problem with a word like buffalo is that it has more than one meaning. You only had one meaning in mind but the search engine doesn't know that because it looks for literal matches. This is where I usually introduce the 1 in 5 rule (although it's more of a phenomenon of language than it is a rule). On average, there are five terms that may be used for the concept you have in mind. You say buffalo, others say bison. There are probably only a couple more (ungulate anyone?), but in some cases there could be many more alternate terms (this happens especially with verbs).
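The 1 in 5 rule can be sketched by generating query variants from per-concept synonym lists: even small synonym sets multiply quickly, which is why matching an author's exact wording gets harder with every added term. The synonym sets below are illustrative.

```python
# Generate all query variants from per-concept synonym lists to show how
# the 1 in 5 rule multiplies possible wordings. Illustrative sketch.

from itertools import product

concepts = [
    ["buffalo", "bison"],                      # the animal
    ["population", "statistics", "count"],     # the number concept
    ["north america"],                         # the place (proper noun, stable)
]

variants = [" ".join(combo) for combo in product(*concepts)]
print(len(variants))  # → 6 variants from just two- and three-term synonym sets
for v in variants[:3]:
    print(v)
```

With five synonyms per concept (the rule's average) and three concepts, that's already 125 possible wordings, which is why short queries plus careful scanning beat long literal ones.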

Literal matching: Too many terms
Trying to match all the same words used by an author becomes increasingly difficult the more words you use. The beauty of search engines is that when you use words in a meaningful context, they tend to retrieve the meaning you have in mind. That's why a search for many buffalo North America doesn't yield information about buffalo wings or the Buffalo Bills football team. But those may not be the exact words an expert used when writing on the subject I'm researching. He or she may have used population instead of many, bison instead of buffalo. Proper nouns such as North America are more likely to be matched. The more terms you use, the less likely it becomes you will find an exact match.

I've had a lot of success searching for two or three concepts in my career. It requires keeping important concepts in mind that aren't really needed in the query--such as today in the buffalo example. Scanning the results, I look for current data, not information from the 1800's.

Sure, you can use queries containing more than three concepts, but unless you have a good idea what words an expert used, you're pushing your luck. Probability is against you. You're better off keeping your query simple.

Tuesday, September 15, 2009

Revisiting Word Order


When does the order of keywords matter?

The ninth item of the query checklist was always last because keyword order mattered the least. This remains largely the case.

Take a query I used today while doing some IMSA program planning: business ethics simulation. There are five other ways to order the terms. But does it make any difference?

Analyzing the top ten results in Google, Bing and Yahoo, here's how many different results were obtained when the order was switched (a total of 60 different results per engine is theoretically possible):
14 - Google
15 - Bing
15 - Yahoo
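The bookkeeping behind those counts is just permutations and set arithmetic. Here's a hedged sketch of the method, using made-up single-letter stand-ins for result URLs rather than the actual pages the engines returned:

```python
from itertools import permutations

terms = ["business", "ethics", "simulation"]
orderings = [" ".join(p) for p in permutations(terms)]
print(len(orderings))  # 3 terms -> 6 possible queries

# Hypothetical top-result sets per query ordering (illustrative data,
# not the real returns from any engine).
results_by_query = {
    "business ethics simulation": {"a", "b", "c"},
    "business simulation ethics": {"a", "b", "d"},
    "ethics business simulation": {"a", "c", "e"},
    "ethics simulation business": {"a", "b", "c"},
    "simulation business ethics": {"a", "d", "f"},
    "simulation ethics business": {"a", "b", "c"},
}

# Union of all result sets = distinct pages across every ordering.
unique_pages = set().union(*results_by_query.values())
print(len(unique_pages))
```

Run against real top-ten lists, the size of that union is the number reported above for each engine (14 for Google, 15 for Bing and Yahoo, out of a theoretical 60).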
A few other insights are worth mentioning:

Google returned the identical top result no matter the keyword order. The second and third slots were filled consistently by the same two pages with minimal alternation. In all, six returns were common across all possible keyword combinations. Queries that returned the most diverse results were: business ethics simulation, ethics simulation business and ethics business simulation. I'm not sure what to make of this observation, but I thought I'd mention it nonetheless. Any ideas?

Compared to Google, Bing was more varied in its ranking of results. No page was consistently the top result, although five pages appeared in the top ten on all trials. While Bing produced one more unique page than Google, several pages were from the same site. Of greater interest, Bing and Google returned a number of pages not replicated by the other (see below).

Yahoo, like Google, consistently returned the identical top page no matter what the query order. The second return was also identical across all queries, although this page was related to the first, so not entirely a unique return. Again, five of the same results were found with every query. Yahoo did not return Google's top return at all, but both Google and Bing included Yahoo's top result.

All three search engines combined produced a total of 31 unique returns. If I had stopped after entering the first query--business ethics simulation--the three search engines would have yielded 21 different pages. Fifteen additional queries netted only 10 additional, unique pages. Probably not worth the effort.

Pages unique to each search engine:
7 - Google
4 - Bing
9 - Yahoo
What to make of this? The biggest lesson, it seems to me, is that searching different databases is more worthwhile than playing with word order. Without looking past the first page of each, I netted twice as many highly ranked results as I would have using Google alone. (Whether the results are all that relevant is a matter for investigation.) By contrast, I netted only 4-5 new pages by sticking with one search engine and varying the keyword order.

Based on the number of unique results, if you're not using Yahoo, you might consider adding it to your list of go-to search engines.

Some differences are obtained by changing the word order, but maybe not enough (in this case) to warrant going through all the permutations. In general, stick with the natural language order of the words. It seems natural to say business ethics simulation. The other forms seem a bit awkward or forced. Since search engines look for words in relationship to one another, and this is the order most people might use when writing about business ethics simulations, it's good enough. I'm sure there are cases you can think of when a particular order works better. If there are, post your reply.

There's one case when order is highly important: when operators are used. An operator modifies the keywords around it, so if it's placed in the wrong position, the results may be wildly unpredictable. For example: business OR ethics OR simulation (a student favorite when they stumble upon the OR operator) retrieves pages containing any one of the three terms, not all of them.
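The damage an unguarded OR does is easy to simulate. In the sketch below, a toy "index" treats the default all-terms query as set intersection and the OR query as set union--an assumption about how engines interpret these queries in general, not a specification of any particular engine, and the page data is invented:

```python
# Toy inverted index: term -> set of page ids containing it (made-up data).
index = {
    "business":   {1, 2, 3, 4},
    "ethics":     {2, 3, 5},
    "simulation": {3, 6},
}

# Default query: pages must contain ALL terms (intersection).
all_terms = index["business"] & index["ethics"] & index["simulation"]

# "business OR ethics OR simulation": ANY single term is enough (union).
any_term = index["business"] | index["ethics"] | index["simulation"]

print(sorted(all_terms))  # a single focused hit
print(sorted(any_term))   # nearly every page in the index
```

The intersection narrows the results to the one page about all three concepts; the chained ORs sweep in every page that mentions any of them, which is why students who discover OR suddenly get mountains of irrelevant returns.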

Next time: revisiting the optimal number of keywords.

Saturday, September 12, 2009

Query Checklist Revisited


A number of years ago, we published the Query Checklist, a guide for turning questions into queries. I seldom think about the checklist anymore--I guess I've internalized the list, so checking off the items as I search really isn't necessary.

I thought it would be helpful to revisit the checklist to see if it's still relevant. As search engines have evolved, maybe something has changed.

The original list was the combined search wisdom of the 21cif team back when there were 7 of us and IMSA was the publisher. Now there are just two of us and 21cif is privately owned. That's what happens when federal funding runs out. At least the program survived, thanks to IMSA's decision to release it to its authors.

Here's the list, if you're not already familiar with it:

1. How many key concepts (important ideas) are found in the question?
2. How many key concepts will I search for?
3. What keywords are probably effective “as is?”
4. For which concepts are more effective keywords probably needed?
5. Are there hyponyms or professional language for any of the intermediate words?
6. Are there words that have multiple meanings?
7. Did I use any stop words or clutter words?
8. Did I spell the words correctly?
9. Did I put the most important words first?


There's too much here to cram into one blog posting, so I'll spread it out over a series.

Let me start with number 8, which seems pretty obvious. The importance of this question depends on the search engine being used. Google, for instance, has a built-in spell checker, so you might think spelling no longer matters much. It does matter, however, when the misspelled word turns out to be a bona fide yet different word.

Example 1: If you are looking for information on bear tracks but mistakenly type bare tracks, Google thinks there's nothing wrong with your spelling and returns information on an Australian nudist colony. An honest mistake, but not one you'd likely want to make in front of a class of students.

When words have more than one spelling, or a different word happens to match the misspelling, then spelling counts.

Example 2: When a search engine lacks the capacity to spell check, then literal matching is all you've got. If you search for Mississipi using the Farmers Almanac, you won't get any results.
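The difference between the two engines comes down to whether a fuzzy-matching layer sits in front of the literal index. Python's standard library can sketch the idea--this illustrates similarity-based suggestion in general, not how Google actually implements its spell checker, and the term list is invented:

```python
import difflib

# A tiny stand-in for an engine's vocabulary (illustrative terms only).
known_terms = ["Mississippi", "Missouri", "Minnesota", "bear", "bare"]

# A literal-match engine: the misspelling simply finds nothing.
print("Mississipi" in known_terms)  # False -> zero results

# An engine with a spell checker: suggest the closest known term.
suggestion = difflib.get_close_matches("Mississipi", known_terms, n=1)
print(suggestion)
```

The literal lookup fails outright, while the similarity match recovers Mississippi--and note that this same mechanism is what cheerfully accepts bare, since bare is a perfectly valid entry in the vocabulary.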

Challenge: Take the top ten search engines you use and see which ones check spelling. Better yet, have your students do this for practice. Don't have 10 search engines that you use? Time to branch out.

Tuesday, August 25, 2009

Search Challenge in the Toilet?


The title may be too graphic, but here's a search challenge that belongs in the bathroom. I could format this into a flash version of a search challenge, but first I thought I'd see if anyone can solve it or encourage me to develop it further.

It's really not as bad as it sounds.

What's the name of the company that makes a toilet that performs a personal health exam each time it is used?

I think the wording of the challenge could still be improved. Suggestions welcome.

Wednesday, July 22, 2009

Yes, There's a Need - Part 3

When people aren't sure what else to do, they resort to browsing.

While using a query or truncating a url might be a faster solution to a search problem, if a person isn't sure what else to do, he or she will browse. Even the best searchers do this. (And almost every search ends by browsing.)

Browsing is typically the least efficient of the three main search methods. Using a search engine is the quickest, and using a subject directory (or menu) can get one closer to the target in fewer clicks. But there's something fundamentally satisfying or comforting about browsing that makes it a preferred method.

In terms of satisfaction, browsing provides immediate feedback. You still have to scan the surroundings to determine what the feedback says about getting closer or not to your objective, but it's a bit like low stakes gambling and pretty addictive.

Nonetheless, browsing is not a good substitute technique much of the time. For example, I got an email recently about a link being changed on one of the pages associated with a particular search challenge. The page to be investigated really didn't call for browsing, but that's what this individual was doing when he or she discovered the changed link (it wasn't dead, it now pointed to something unrelated). The optimal technique is a string search of a statement to see if it is considered truthful by external sources. Following page links will not achieve this. In fact, browsing tends to confirm the truthfulness of the statement because the links provided on the page reflect the bias, not the objectivity, of the author.

Here's my advice: think before you browse. Ask yourself, is there another technique I know that might be more efficient or suited to the task? If not, ask yourself, what keywords am I looking for that will tell me I'm getting closer? You don't need to compile such a list first. Just being sensitive to the question will help you evaluate the keywords in the links you come across. Some will bear a closer relationship to your target than others.
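That self-check can be made concrete: score each link's anchor text by how many of your target keywords it shares, and follow the strongest match first. A minimal sketch, using the buffalo example with invented link text:

```python
def keyword_score(anchor_text, target_keywords):
    """Count how many target keywords appear in a link's anchor text."""
    words = set(anchor_text.lower().split())
    return len(words & {k.lower() for k in target_keywords})

# Keywords the searcher is watching for (from the buffalo example).
target = ["buffalo", "population", "North", "America"]

# Hypothetical links encountered while browsing.
links = [
    "Buffalo Bills 2009 schedule",
    "Bison population recovery in North America",
    "Buffalo wings recipe",
]

ranked = sorted(links, key=lambda t: keyword_score(t, target), reverse=True)
print(ranked[0])  # the link sharing the most target keywords
```

Even an informal version of this scoring, done mentally, keeps a browse on target: the bison link wins despite not containing the word buffalo at all, because it shares more of the concepts you care about.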

I ended up removing the page with the (misleading) links from the tutorial challenge. It hadn't occurred to me that anyone would try to follow them, so I hadn't vetted them. Some led to objectionable content. Now the page has no links. The only way to answer the question is to use the preferred technique. Of course, if you don't know what that technique is, you're sunk.

Here's the challenge: http://untaughtgeneration.com/obama-quote.html