Joho the Blog » search

October 22, 2012

[internet librarian] Search tools

Gary Price from Infodocket is moderating a panel on what’s new in search. It’s a panel of vendors.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

The first speaker is from Blekko, which he says “is thought of as the third search engine” in the US market. It features info from authoritative sources. “You don’t want your health information to come from some blog.” When you search for “kate spade” you get authenticated Kate Spade fashion stuff. Slashtags let you facet within a topic, based on expert curation. Users can create their own slashtags. At /webgrep you can ask questions about the corpus that, if upvoted, the techies at Blekko will answer.

The next vendor describes itself on its site as “Natural Language Processing Tools and Customizable Knowledge Bases for Semantic Search and Discovery Applications.” Thomas talks about OntoFind and semantic search, which is a search that produces “meaningful results even when the retrieved pages” contain none of the search terms [latent semantic search!]. He points to Google’s Freebase, which has info about 500M entities and their relationships. In a week you’ll be able to try OntoFind, I believe. Searching for big brother and privacy first asks you to disambiguate and then pulls together results.

The next product is designed to help scientists follow science. It diagrams publications on a topic, and applies article-level metrics. It’s focused on the undergrad and graduate research markets. It integrates genomic knowledge plus much more. It lets you see the history of science top down, and browse e.g. by date. You can share what you’ve found.

[I couldn't hear the Q&A well enough to blog it.]


January 7, 2012

Does Google’s use of ‘social signals’ break the Web?

There’s a fascinating post at ReadWriteWeb by Scott M. Fulton III about the effect that “social signals,” such as posts by people within your Google+ Circles, have on search results. It is not an easy article to skim :) Here’s the conclusion:

It is obvious from our test so far, which spanned a 48-hour period, that there may be an unintended phenomenon of the infusion of social signals into all Google searches: the reduction in visibility in search results of the original article that generated all the discussion in the first place. This may have a counter-balancing effect on the popularity of any article…


October 7, 2011

[2b2k] How we assess credibility

Soo Young Rieh is an associate professor at the University of Michigan School of Information. She recently finished a study (funded in part by MacArthur) on how people assess the credibility of sources when they are just searching for information and when they are actually posting information. Her study didn’t focus on a particular age or gender, and found [SPOILER] that we don’t take extra steps to assess the credibility of information when we are publishing it.


January 24, 2011

Grimmelmann on search neutrality

James Grimmelmann, whose writing on the Google Books settlement I’ve found helpful, has written an article about the incoherence of the concept of “search neutrality”: “the idea that search engines should be legally required to exercise some form of even-handed treatment of the websites they rank.” (He blogs about it here.) He finds eight different possible meanings of the term, and doesn’t think any of them hold up.

Me neither. Relevancy is not an objective criterion. And too much transparency allows spammers to game the system. I would like to be assured that companies aren’t paying search engine companies to have their results ranked higher (unless the results are clearly marked as pay-for-position, which Google does but not clearly enough).


December 17, 2010

The Annals of Searching: Cluetrain circa 1505

Confine your search at Google Books to only the 19th-century Cluetrain references, and you get four hits. In fact, the earliest reference to Cluetrain indexed by Google Books was in the 1505 business best-seller Extravagantes com[m]unes, in which appears the sentence “Markets are conversations…with that lying bastard Roger the Offal Merchant.”


November 18, 2010

[defrag] Semantic 10 minute sessions

Ann Hunt is describing Primal‘s ability to let people create what she calls “idiosyncratic ontologies.” It wants to let two people have differing tags and ontologies about the same objects, and see the shared and social point of view. From the Primal site: “The Primal Semantics API helps users find material of interest in a larger collection of information. It organizes responses into hierarchies of concepts, with broad topics leading to more specific ones.” Ann stresses that it’s cool to bring together individual points of view and semantic networks.

Bob Smith of ISYS Search Software says that most people don’t find what they’re looking for on Google the first time they search. Google is an ad company, not a search company, so “you shouldn’t buy your next search service from an ad company.” Today, we need search everywhere, for everything. Bob then pitches us on Isys.

Brian Cheek of TigerLogic says he’s in the search enhancement business. Links make problems for searches, he says. Google instant preview helps a little, he says, if it’s for a site you’ve been to already. He focuses on YoLink, which provides more intelligent searching and browsing within particular domains. It’s a browser add-on that’s available for incorporation into apps by developers. YoLink mines links, extracting content from them based on your key terms. You can check off the returns of interest and publish them directly into a Google Doc or tweet them. You can explore a set of links without having to browse to each of them.


September 1, 2010

OED goes paperless

The Oxford English Dictionary has announced that it will not print new editions on paper. Instead, there will be Web access and mobile apps.

According to the article in the Telegraph, “A team of 80 lexicographers has been working on the third edition of the OED – known as OED3 – for the past 21 years.”

It has been a long trajectory toward digitization for the OED. In the 1990s, the OED’s desire to produce a digital version (remember books on CD?) stimulated search engine innovation. To search the OED intelligently, the search engine would have to understand the structure of entries, so that it could distinguish the use of a word as that which is being defined, the use of it within a definition, the use of it within an illustrative quote, etc. SGML was perfect for this type of structure, and the Open Text SGML search engine came out of that research. Tim Bray [twitter:timbray] was one of the architects of that search engine, and went on to become one of the creators of XML. I’m going to assume that some of what Tim learned from the OED project was formative of his later thinking… (Disclosure: I worked at Open Text in the mid-1990s.)
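To make the structural point concrete, here’s a toy sketch in Python. It uses a made-up XML-style entry (the element names are my assumptions for illustration, not the OED’s actual markup, and this is not how the Open Text engine worked internally) and reports which part of an entry a word turns up in.

```python
import xml.etree.ElementTree as ET

# A made-up entry; the element names are assumptions for illustration,
# not the OED's real markup.
ENTRY = """
<entry>
  <headword>blog</headword>
  <definition>A frequently updated web site, often a personal journal.</definition>
  <quote>I read it on her blog this morning.</quote>
</entry>
"""


def find_roles(entry_xml, word):
    """Return the element names whose text contains `word` as a whole word."""
    root = ET.fromstring(entry_xml.strip())
    roles = []
    for element in root:
        # Strip simple punctuation so "journal." still matches "journal".
        tokens = [t.strip(".,;:!?").lower() for t in (element.text or "").split()]
        if word.lower() in tokens:
            roles.append(element.tag)
    return roles


print(find_roles(ENTRY, "blog"))     # as the word being defined, and in a quote
print(find_roles(ENTRY, "journal"))  # within the definition only
```

A flat full-text index would just say “blog” appears in this entry. Once the structure is marked up, the same lookup can tell you in what role it appears.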

On the other hand, initially, the OED didn’t want to attribute the origins of the word “blog” to Peter Merholz because he coined it in his own blog, and the OED would only accept print attributions. (See here, too.) The OED eventually got over this prejudice for printed sources, however, and gave Peter proper credit.


August 14, 2009

Search Pidgin

I know I’m not the only one who’s finding WolframAlpha sometimes frustrating because I can’t figure out the magic words to use to invoke the genii. To give just one example, I can’t figure out how to see the frequency of the surnames Kumar and Weinberger compared side-by-side in WolframAlpha’s signature fashion. It’s a small thing because “surname Kumar” and “surname Weinberger” will get you info about each individually. But over and over, I fail to guess the way WolframAlpha wants me to phrase the question.

Search engines are easier because they have already trained us how to talk to them. We know that we generally get the same results whether or not we use stop words (“when,” “the,” etc.) and question marks. We eventually learn that quoting a phrase searches for exactly that phrase. We may even learn that in many engines, putting a dash in front of a word excludes pages containing it from the results, or that we can do marvelous and magical things with prefaces that end in a colon, such as site: and define:. We also learn the semantics of searching: If you want to find out the name of that guy who’s Ishmael’s friend in Moby-Dick, you’ll do best to include some words likely to be on the same page, so “What was the name of that guy in Moby-Dick who was the hero’s friend?” is way worse than “Moby-Dick harpoonist.” I have no idea what the curve of query sophistication looks like, but most of us have been trained to one degree or another by the search engines who are our masters and our betters.
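For the curious, here’s a rough sketch of that pidgin written down as code: a little Python helper that assembles a query from the operators mentioned above (quoted phrases, a leading dash to exclude a word, and a site: prefix). The function name, its parameters, and example.org are my own inventions for illustration, not any search engine’s API, and engines vary in which operators they honor.

```python
def build_query(terms=(), phrases=(), exclude=(), site=None):
    """Assemble a search query string using common operator conventions.

    Hypothetical helper for illustration; not any search engine's API.
    """
    parts = list(terms)
    # Quoting a phrase asks for exactly that phrase.
    parts += ['"{}"'.format(p) for p in phrases]
    # A leading dash excludes pages containing the word.
    parts += ["-" + w for w in exclude]
    # A prefix ending in a colon restricts the search, here to one site.
    if site:
        parts.append("site:" + site)
    return " ".join(parts)


query = build_query(
    terms=["harpoonist"],
    phrases=["Moby-Dick"],
    exclude=["movie"],
    site="example.org",
)
print(query)
```

The point is how little grammar there is: a handful of prefixes and punctuation marks layered on top of a bag of likely-to-co-occur words.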

In short, we’re being taught a pidgin language — a simplified language for communicating across cultures. In this case, the two cultures are human and computers. I only wish the pidgin were more uniform and useful. Google has enough dominance in the market that its syntax influences other search engines. Good! But we could use some help taking the next step, formulating more complex natural language queries in a pidgin that crosses application boundaries, and that isn’t designed for standard database queries.

Or does this already exist?



July 19, 2009

Britannica: #1 at Google

Today, for the very first time in my experience, The Encyclopedia Britannica was the #1 result at Google for a query.

It’s good to see the EB making progress with its online offering, but I’m actually puzzled in this case. The query was “horizontal hold” (without quotes), and the EB page that’s #1 is pretty much worthless. It’s a stub that gives a snippet of the article on the topic, but the snippet oddly begins with definition #4. The page then points us into actual articles in the EB, but they’re articles you have to pay for (although the EB offers a “no risk” free trial).

So, how did Google’s special sauce float this especially unhelpful page to the surface? And why isn’t there a Wikipedia page on “horizontal hold”? And does this mean that if there’s no Wikipedia page for a topic, Google gets the vapors and just doesn’t know what to recommend? Nooooo………



July 17, 2009

Search matchups

Google vs. Yahoo

Google vs. WolframAlpha

Google vs. Bing

(via Keith Dawson)


