Showing posts with label Yahoo. Show all posts

Friday, February 17, 2012

The Slow Death of the Link: Operator?

For a number of years, one of the best investigative tools for seeing which websites link to a given site has been the link: operator. Over the past couple of years, its effectiveness has diminished. I hope it's not on its way out as a free tool.

Here's how it works: Let's say you want to see all (actually a sample) of the webpages that link to a site you want to investigate. Martinlutherking.org is a common example for this purpose. The query is
link:martinlutherking.org
...and the results would reveal the URLs of pages that link to the martinlutherking.org home page. This list would contain links from the martinlutherking.org site itself (most sites link to their home page), from educational institutions warning about bias on the site (a red flag) and from hate groups (another red flag).
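To make sense of a list like this, it helps to sort the referring pages by who is doing the linking. Here is a minimal Python sketch of that sorting step. The function name classify_referrer and the referring URLs are hypothetical, invented for illustration; they are not actual results for martinlutherking.org, and the sketch does not query any search engine.

```python
from urllib.parse import urlparse

def classify_referrer(url, target_domain):
    """Return a rough category for a page that links to target_domain."""
    host = (urlparse(url).hostname or "").lower()
    # Links from the site to itself say nothing about outside opinion.
    if host == target_domain or host.endswith("." + target_domain):
        return "self"
    if host.endswith(".edu"):
        return "educational"
    if host.endswith(".gov"):
        return "governmental"
    return "other"

# Hypothetical referring pages, made up for this example only.
referrers = [
    "http://www.martinlutherking.org/about.html",
    "http://library.example.edu/evaluating-websites",
    "http://www.example.gov/resources",
    "http://forum.example.com/thread/42",
]

counts = {}
for url in referrers:
    category = classify_referrer(url, "martinlutherking.org")
    counts[category] = counts.get(category, 0) + 1

print(counts)
```

The categories only tell you where a link comes from, not why it was made. Pages in the "other" group, and the educational ones too, still have to be read to see whether they endorse the site or warn against it.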

Google was the go-to search engine for this until a couple of years ago, when it sharply pared back the number of results it returned. At that time, I advised using Yahoo.com as the search engine. Yahoo's Site Explorer would return many more results than Google.

Within the past few months that has changed. Yahoo merged its Site Explorer with Bing. Now that search capacity is part of the Bing Webmaster set of tools. If I want to see who links to my site 21cif.com, I have to create a Webmaster account and download an XML file from Bing (BingSiteAuth.xml), which the search engine uses to collect data on my traffic, etc. That's 1) not as user friendly as it used to be, and 2) of little use for investigating other sites: if a site doesn't include the XML file, I doubt any information could be retrieved.

So here's what happens today if I search for link:21cif.com:

Google: 48 results, many of which are from other 21cif.com pages
Yahoo: 1 result

From using other webmaster sites like majesticseo.com, where I had to create yet another account, I know there are 394 referring domains, of which 193 are educational and 16 are governmental. Too bad I can't see what they are without a paid subscription. This seriously impairs one's ability to check the 'references' of a site by looking to see who voluntarily links to it.

Checking inbound links is not only of interest to a webmaster; it also helps searchers be more critical consumers of information.

For investigative purposes, a readily available link: operator has become a staple. Now the challenge is: what is out there that can replace it? For the time being, I recommend going back to Google.

Monday, September 28, 2009

Revisiting Stop Words and Clutter Words

The final item on the Query Checklist that I'm revisiting is #7: Did I use any stop words or clutter words?

Briefly, stop words are terms ignored by the search engine: common parts of speech that don't add significant content such as pronouns, prepositions and conjunctions. Google lists some of its exceptions to the "every word counts" rule. Here's a more complete list of overlooked words.

One way to tell if a word is being overlooked is to examine the query results. Consider the query here are all the stop words (not using quotes). In Google, all the words will appear in bold if the exact phrase is found (you don't need quotes to return the exact phrase). If only certain words from the query were used to find a matching result, those words will be shown in bold. In my query example, the second snippet contains the word 'the' but it does not appear in bold. Yahoo is similar.

One way to guarantee ALL the words are used is to link them with the AND operator (returning results containing all the words but not necessarily in the order you used them) or to put quotes around the phrase (returning results containing the exact phrase you submitted).

Stop words are so common that they add little to the uniqueness of a query, and uniqueness is what drives you to more well-matched, meaningful results. Students might be tempted to use stop words in a natural language query (e.g., I want a list of all the stop words), thinking this means something to the search engine. The query list stop words gets to the point.
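The trimming described above can be sketched in a few lines of Python. The word list here is an illustrative assumption, mixing a few common stop words with filler like "want"; it is not any search engine's actual stop list, and the function name trim_query is made up for the example.

```python
# Illustrative list only: an assumption, not a real search engine's stop list.
STOP_AND_FILLER_WORDS = {
    "i", "want", "a", "an", "of", "all", "the",
    "to", "for", "and", "in", "on", "is", "are",
}

def trim_query(query):
    """Drop stop and filler words, keeping the remaining words in order."""
    kept = [word for word in query.split()
            if word.lower() not in STOP_AND_FILLER_WORDS]
    return " ".join(kept)

print(trim_query("I want a list of all the stop words"))
# prints: list stop words
```

A fixed list is a blunt instrument. A word like "the" can matter inside a title or an exact phrase, which is where quotes come in.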

In a similar vein, clutter words are less common than stop words but don't add value to the query. In fact, they may detract from it, forcing the search engine to look for words you think are important but do not occur with the information you are seeking. Clutter words include unnecessary redundancies (like earthquake AND damage--in which case damage is redundant: it's hard to write about earthquakes without referring to damage or destruction or a bunch of other words you might not have used). Verbs, adjectives and adverbs are often clutter terms as well. A good rule of thumb to keep in mind is "if you can't clearly see it, don't use the word." Stick to objects--nouns and numbers.

All in all, the Query Checklist has held up well over the past few years. Once the list is internalized it can help you cut down on search time and produce more relevant results.

Next time: It's probably time for another Search Challenge!

Tuesday, September 15, 2009

Revisiting Word Order


When does the order of keywords matter?

The ninth item of the query checklist was always last because keyword order mattered the least. This remains largely the case.

Take a query I used today while doing some IMSA program planning: business ethics simulation. There are five other ways to order the terms. But does it make any difference?
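For anyone who wants to see all six orderings without writing them out by hand, here is a short Python sketch. It only generates the query strings; running them against a search engine is left to you. The ten-results-per-query figure is an assumption that matches looking at the first page of results only.

```python
from itertools import permutations

keywords = ["business", "ethics", "simulation"]

# Every possible ordering of the three keywords (3 x 2 x 1 = 6).
orderings = [" ".join(p) for p in permutations(keywords)]
for query in orderings:
    print(query)

# Assumption: only the top ten results of each query are examined.
results_per_query = 10
print(len(orderings), "orderings,",
      len(orderings) * results_per_query, "results at most per engine")
```

The number of orderings grows quickly: four keywords give 24 orderings and five give 120, which is one more reason not to test them all.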

Analyzing the top ten results in Google, Bing and Yahoo, here's how many different results were obtained when the order was switched (a total of 60 different results per engine is theoretically possible):
14 - Google
15 - Bing
15 - Yahoo
A few other insights are worth mentioning:

Google returned the identical top result no matter the keyword order. The second and third slots were filled consistently by the same two pages with minimal alternation. In all, six returns were common across all possible keyword combinations. Queries that returned the most diverse results were: business ethics simulation, ethics simulation business and ethics business simulation. I'm not sure what to make of this observation, but I thought I'd mention it nonetheless. Any ideas?

Compared to Google, Bing was more varied in its ranking of results. No page was consistently the top result, although five pages appeared in the top ten on all trials. While Bing produced one more unique page than Google, several pages were from the same site. Of greater interest, Bing and Google returned a number of pages not replicated by the other (see below).

Yahoo, like Google, consistently returned the identical top page no matter what the query order. The second return was also identical across all queries, although this page was related to the first, so not entirely a unique return. Again, five of the same results were found with every query. Yahoo did not return Google's top return at all, but both Google and Bing included Yahoo's top result.

All three search engines combined produced a total of 31 unique returns. If I had stopped after entering the first query--business ethics simulation--the three search engines would have yielded 21 different pages. Fifteen additional queries netted only 10 additional, unique pages. Probably not worth the effort.

Pages unique to each search engine:
7 - Google
4 - Bing
9 - Yahoo
What to make of this? The biggest lesson, it seems to me, is that searching different databases is more worthwhile than playing with word order. Without looking past the first page of each, I netted twice as many highly ranked results as I would have using only Google. (Whether the results are all that relevant is a matter for investigation.) By contrast, I netted only 4-5 new pages by sticking with one search engine and varying the keyword order.
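The bookkeeping behind a comparison like this is simple set arithmetic. Here is a minimal Python sketch. The page names are placeholders invented for illustration, not the pages returned in my test, and the helper name unique_to is made up for the example.

```python
# Placeholder page IDs, invented for illustration; not the actual results.
top_results = {
    "Google": {"page_a", "page_b", "page_c", "page_d"},
    "Bing": {"page_a", "page_c", "page_e", "page_f"},
    "Yahoo": {"page_c", "page_e", "page_g", "page_h"},
}

# All distinct pages returned by any of the engines.
combined = set().union(*top_results.values())

def unique_to(engine):
    """Pages returned by this engine and by none of the others."""
    others = set().union(
        *(pages for name, pages in top_results.items() if name != engine)
    )
    return top_results[engine] - others

print(len(combined), "distinct pages across all engines")
for engine in top_results:
    print(engine, sorted(unique_to(engine)))
```

Comparing by exact URL will count two pages from the same site as different results, so a stricter comparison might group pages by domain first.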

Based on the number of unique results, if you're not using Yahoo, you might consider adding it to your list of go-to search engines.

Some differences are obtained by changing the word order, but maybe not enough (in this case) to warrant going through all the permutations. In general, stick with the natural language order of the words. It seems natural to say business ethics simulation. The other forms seem a bit awkward or forced. Since search engines look for words in relationship to one another, and this is the order most people might use when writing about business ethics simulations, it's good enough. I'm sure you can think of cases when a particular order works better. If so, post a reply.

There's one case when order is highly important: when operators are used. An operator modifies the keywords around it, so if it is placed in the wrong position, the results may be wildly unpredictable. For example: business OR ethics OR simulation (a student favorite when they stumble upon the OR operator).

Next time: revisiting the optimal number of keywords.