Showing posts with label cache query search.

Wednesday, October 5, 2011

Medical Approval Challenge

News articles often introduce incomplete information that makes for good search challenges.

Here's an example from the Wichita Eagle posted on Sept. 6: Robot surgery now offered for head, neck procedures

The part of the article that pinged my search radar was this:
"The da Vinci surgical system is now approved for use in surgeries of the head and neck."
Search for the missing information. Who approved it and when?

This search requires a keyword query, browsing, a deep web query of a specialized database, and careful reading.


Post your answers in the comments: Who approved the device (I left a clue in this post) and (here's the real challenge) WHEN was approval granted? Provide an official date.

What problems do you encounter in this search?

Monday, March 14, 2011

Earthquake Challenge

The tragedies in Japan triggered by last Friday's earthquake might have been much worse if national preparedness for such events were not a way of life in that country.

Effective building construction and earthquake drills put many people in better circumstances than they would have otherwise experienced. While the death toll continues to rise and our thoughts go out to the people affected, we can also learn from this how better to be prepared ourselves.

Several years ago we created an Earthquake Challenge to test searchers' abilities to select the best keywords for finding this information:

Which toy demonstrates a construction principle that can reduce damage from an earthquake?


It's not a very difficult challenge, since all the information needed is in the question.

So what happens if you query the question "as is" (without eliminating any words)? The first two results in Google are from Answers.com. Interestingly, a lot of Internet Search Challenge questions have been posted on Answers.com. As with most of them, the answer given for this challenge is incorrect (sorry, Legos is not the name of the toy).
The next result is the original article I wrote introducing the Earthquake Challenge. It also does not contain the solution.

Down the page is a link to a patent page for a device described as "an object protection system." The page references a trampoline toy as "prior art." Patents have to disclose artifacts that may be considered similar to the new invention. This is not the answer.

Another result, this one from Blurtit.com, quotes the question verbatim and provides another incorrect answer: "Toys can be perfectly used as models that can reduce damage from earthquake." Too general.
Life123.com has the same incorrect answer as Answers.com. Makes you wonder where Life123.com gets its information.

What's going on?

With improvements to search engines, I thought it would become easier and easier to locate a correct answer without having to decide which keywords to use. That is not yet the case.

Not until you start to eliminate extraneous keywords does the correct result start to show up.
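The keyword-elimination idea can be sketched in a few lines of code. This is a minimal illustration, not a real search tool: the stop-word list below is my own abbreviated assumption, and a genuine search engine uses far more sophisticated processing.

```python
# Minimal sketch: pare a full question down to content keywords by
# dropping common stop words. The STOP_WORDS set is illustrative only.
STOP_WORDS = {"which", "a", "an", "the", "that", "can", "from", "of",
              "to", "for", "is", "in", "with", "what", "how"}

def to_keywords(question: str) -> str:
    words = question.lower().split()
    # strip trailing punctuation from each word, then drop stop words
    kept = [w.strip("?.,") for w in words if w.strip("?.,") not in STOP_WORDS]
    return " ".join(kept)

print(to_keywords("Which toy demonstrates a construction principle "
                  "that can reduce damage from an earthquake?"))
# -> toy demonstrates construction principle reduce damage earthquake
```

Querying just those content words is the kind of trimmed-down search that surfaces the correct result.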

If you are looking for an information fluency challenge that ties in with recent news, the earthquake challenge offers both. You can make the following points:
  • it is possible, using the right construction principles, to reduce damage from earthquakes;
  • just because you can find an answer to the question online doesn't mean the answer is correct (it takes fact-checking);
  • choosing a limited number of keywords rather than querying a whole question is a more powerful way to search;
  • if you use this challenge in a science class, you could explore why the principles seen modeled in the toy are effective in reducing damage caused by earthquake forces.
While I'm not the one responsible for the wrong answers that are posted online, I ask that you not fix them. The lesson is much better with inaccurate information floating in the stream.

P.S. I changed the wording of the question today, so the challenge you see online differs from the question shown above.

Friday, November 20, 2009

Search Queries Unpacked



I'm at the Illinois Educational Technology Conference in Springfield, IL, waiting for my presentation videos to upload. The upload over the wireless network is excruciatingly slow, so I thought I'd check my blog stats.

One of the free services I use with this blog is Sitemeter. One of its most interesting features is "referrals": the URL where readers were just before coming to this blog. I haven't tallied the numbers, but it appears that about half of those drawn to this blog arrive as the result of a Google search.

For example, here's the url that brought one of the last readers to the Internet Search Challenge blog:
http://www.google.com/search?hl=en&source=hp&q=internet%20searches%20for%20students&rlz=1R2ADSA_enUS343&aq=f&oq=
If you unpack that url (or click it), you can tell what query the person used to find the blog: internet searches for students. You can see the blog listed in the search results for that query. Currently it's #8 although that can change as time goes on.
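Unpacking a referral URL like this one is easy to automate. Here's a small sketch using Python's standard library to pull the typed query out of the `q` parameter (the URL is the actual referral quoted above):

```python
# Sketch: recover the query a visitor typed from a search-referral URL.
from urllib.parse import urlparse, parse_qs

referral = ("http://www.google.com/search?hl=en&source=hp"
            "&q=internet%20searches%20for%20students"
            "&rlz=1R2ADSA_enUS343&aq=f&oq=")

# parse_qs splits the query string and decodes the %20 escapes
params = parse_qs(urlparse(referral).query)
query = params["q"][0]   # Google puts the typed query in the q parameter
print(query)             # -> internet searches for students
```

Run over a log of referral URLs, a few lines like these will tally every query that brought readers to a blog.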

That's a pretty good query, even though the stop word "for" is unnecessary. Here's a larger sampling of recent queries:

Detecting biases in writing
challenges searching by keywords
detecting bias activity
query two common interpretations internet
challenges in searching with keywords
optimal number of search terms
how has getting info easier with internet
"they formed an angel band" lyrics

Several observations occur to me in unpacking these queries.

First, even though these queries are not particularly well formed, they still led to relevant information (I'm assuming this blog was relevant to the searcher's original intent!). Better queries are possible, but if natural language queries work, is it pedantic to insist on more elegant ones?

Second, and closely related, there are no failed queries in this list. Everyone came to this blog as a result of a query. No matter what I might insist makes a good or optimal query, a lot of variation and clutter (unnecessary terms) still produces effective results.

One of the techniques I like to use with students is to display a list of queries they propose for a specific type of search. One way to collect their queries is to create a Google form and have them post their queries to it. You can then display the results in a spreadsheet without names attached. That works well because there are almost always queries that will not work, and it's informative for students to see how others create queries. Better queries are pretty easy to identify, which I think you can do with the list above. However, poor queries can be effective too, so it's always a good idea to try the queries to see if they connect to the information that was sought.

Third, looking at queries reveals trends. Looking at more than 30 queries shows me what topics brought people here. There's quite a bit of interest in detecting bias, which will prompt me to develop more resources on bias/objectivity in evaluation. If you have a blog, I recommend you unpack the queries that brought readers to you. You may be surprised.

Thursday, January 8, 2009

Coolhunting Web 2.0


I've been busy preparing for and facilitating an intersession course here at IMSA called "coolhunting." Essentially, this is a trendy term for "trend prediction." The basis of coolhunting is that you can use Web 2.0 tools to locate creative swarms of individuals who are developing new ideas before those ideas reach a tipping point. Web 2.0 forums, chat, bulletin boards, etc. (even emails) afford a window into the communication patterns of people who are engaged in creative swarms.

My purpose in blogging about this is related to searching and evaluation. There are two ways to search for coolfarmers (those creative swarms). You can do what everyone on Web 1.0 does: lurk. I suppose browse is a better term for this, but lurking is really what the majority is doing when they are searching a Web 2.0 world. A fraction of the people who view blogs, conversations and other posts actually participate: something less than 1 in 5 gets involved.

That means when searching Web 2.0 for information, at least 80% of people are at a distinct disadvantage. Without being involved in a conversation, they don't earn the trust of the individuals who are involved. The opposite is also true: you don't know if you can trust the people you are reading. That's the main obstacle to determining credibility in Web 2.0 circles.

Here's an example I used in the workshop this week. In Twitter, at present, I am following only 2 people. One of them is Scott Swanson, a colleague at IMSA who has leadership responsibilities for Second Life and One Laptop Per Child (OLPC). Let's say I wanted to find out about new developments in the OLPC movement. If I didn't know Scott, I could search Twitter for OLPC and I would find Scott along with a host of other people I didn't know. How can I tell if what they are talking about is worth following? Aside from reading their posts and becoming really familiar with OLPC, as a lurker I am at a real disadvantage.

With my own Twitter account (it's free) I can see who Scott is following and who is following Scott. But I can't tell who in this crowd knows anything about OLPC without lurking for hours, reading posts from hundreds of individuals. I can see from posts that Scott is attending an OLPC conference in MA with students from IMSA. If I didn't already know him, this would make him appear somewhat trustworthy--the institution let him take students on a cross-country field trip to learn more about this subject.

I think it would save a lot of time to take a chance on writing to Scott and ask him about OLPC, explain my interest and see what happens. If he responds and it seems like a trust relationship develops, I have made a huge leap forward. Scott can introduce me to people with high opinion leadership in OLPC that he's already vetted. Now I'm using Web 2.0 as it was intended: to build networks, in this case, my own personal learning network in OLPC.

Try it yourself. The next time you have to use Web 2.0 for searching--a really good place to find creative projects, by the way--find a 'prospective expert' and get to know him or her. Then use this relationship as a springboard into their networks that you can trust.

I highly recommend reading the book Coolhunting by Peter Gloor and Scott Cooper if you want to know more about social network analysis, swarm creativity, collaborative innovation networks and so on. There are some very powerful search tools in this field that are like Google on steroids. I'll blog about that later.

Thursday, September 4, 2008

Find it fast(er)


Information retrieved by a search engine comes from an archive. When you read a snippet (the title, abstract and URL of a ranked Web page), you are viewing a copy of information gathered at an earlier date. A copy of the original page is usually available by clicking the cache link at the end of the snippet.

Knowing that this information is archived can save time when you view the full page. No doubt you've clicked on pages that looked promising only to find that the information you wanted was nowhere to be found. The reason is simple: the link goes to the current version of the page, not the archived version the search engine indexed. The only way to see the archived version, where the information you want is located, is to click the cache link.

Cached pages are temporary. The next time a search engine crawler or spider visits the live page, the current version replaces the previous version. If the live page ceases to exist altogether (meaning the crawler can't find the page at all), the cached version is deleted. In order to find versions of pages older than the last crawl, you'll need to search a database that stores copies of pages, such as archive.org.
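For those older versions, archive.org's Wayback Machine offers a simple "availability" API that reports the closest archived snapshot to a date you supply. A quick sketch of building such a lookup URL (no request is actually sent here, and `example.com` is just a placeholder):

```python
# Sketch: build a query against the Internet Archive's Wayback Machine
# availability API, which returns the closest archived snapshot of a page.
from urllib.parse import urlencode

def wayback_lookup_url(page_url: str, timestamp: str = "") -> str:
    params = {"url": page_url}
    if timestamp:
        # YYYYMMDD: ask for the snapshot closest to this date
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)

print(wayback_lookup_url("example.com", "20090101"))
# -> https://archive.org/wayback/available?url=example.com&timestamp=20090101
```

Fetching that URL returns JSON describing the closest snapshot, if one exists; it's a handy fallback when the cache link is gone.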

One nice thing about a cached page is that the database often highlights the words you used in your query. You still have to scroll and scan the page, but the highlighting makes finding the critical terms much easier and faster.
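That term highlighting is easy to picture in code. Here's a minimal sketch of the idea using a regular expression; the `**` markers stand in for the colored highlighting a cache view applies:

```python
# Sketch: mimic a cache view's term highlighting by wrapping each
# query word found in a page's text (case-insensitive) in markers.
import re

def highlight(text: str, terms: list, mark: str = "**") -> str:
    # \b ensures whole-word matches; re.escape guards special characters
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(lambda m: mark + m.group(0) + mark, text)

print(highlight("Granite countertops and radon in granite.",
                ["granite", "radon"]))
# -> **Granite** countertops and **radon** in **granite**.
```

The same scan-and-wrap idea is what lets your eye jump straight to the critical terms on a cached page.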

So consider clicking the cache link instead of the live one: you'll save time searching, especially if the owner of the page has updated it since the last crawl.

Here's the challenge: to test this advice, divide students into two teams. Have both conduct the same search, but (secretly, if you want) instruct one group to go to the cached information instead of the live page. See who finds the information more quickly. After a few rounds, have the winning team explain how they were able to find the information more quickly.

Try these Google searches (Reading level is middle school and up):

1. Google won't recrawl my site. Answer this question: What are four things this developer can do to encourage Google to recrawl the overlooked site?
2. Myst Cheat Codes. Answer this question: After locating diagram 158 in the library book and going to the chimney, what should you do next?
3. Radioactive Granite--a myth? Answer this question: according to Tim Cordova's Rock Blog, do granite countertops pose a health threat?

________________________________________________

Answers to the Previous Browsing Challenges

1. Plans for building a tree house: Home > Homeowners > Treehouses
2. Sites about Will Smith: Arts > People > S > Smith, Will
3. A fashion model agency in Minnesota: Business > Arts and Entertainment > Models > Agents and Agencies > North America > United States > Minnesota
4. Information about the World Game of Economics: Science > Social Sciences > Education > Software

Of course, all of these are EASIER if you use the Directory Search Box. What does that tell you about the difference between searching by querying and searching by browsing?