Search Pidgin

Posted on August 14th, 2009

I know I’m not the only one who sometimes finds WolframAlpha frustrating because I can’t figure out the magic words that invoke the genie. To give just one example, I can’t figure out how to see the frequency of the surnames Kumar and Weinberger compared side-by-side in WolframAlpha’s signature fashion. It’s a small thing because “surname Kumar” and “surname Weinberger” will get you info about each individually. But over and over, I fail to guess the way WolframAlpha wants me to phrase the question.

Search engines are easier because they have already trained us how to talk to them. We know that we generally get the same results whether or not we use stop words (“when,” “the,” etc.) and question marks. We eventually learn that quoting a phrase searches for exactly that phrase. We may even learn that in many engines, putting a dash in front of a word excludes pages containing it from the results, or that we can do marvelous and magical things with prefixes that end in a colon, such as site: and define:. We also learn the semantics of searching: If you want to find out the name of that guy who’s Ishmael’s friend in Moby-Dick, you’ll do best to include some words likely to be on the same page, so “What was the name of that guy in Moby-Dick who was the hero’s friend?” is way worse than “Moby-Dick harpoonist.” I have no idea what the curve of query sophistication looks like, but most of us have been trained to one degree or another by the search engines who are our masters and our betters.
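To make those conventions concrete, here is a minimal Python sketch of how a query in this pidgin might be pulled apart. It is an illustration only, not how Google or any other engine actually parses queries: the `parse_query` function, the tiny stop-word list, and the choice to recognize only the `site:` and `define:` prefixes are all my own assumptions.

```python
import re

# Illustrative stop-word list (an assumption; real engines use their own).
STOP_WORDS = {"a", "an", "the", "of", "in", "was", "what", "when", "who", "that"}

# Only the two colon prefixes mentioned in the post are recognized here.
OPERATORS = {"site", "define"}

# One token: an optional leading dash, then either a double-quoted phrase
# or a run of non-space characters.
TOKEN = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')


def parse_query(query):
    """Split a query into phrases, exclusions, operators, and plain terms."""
    parsed = {"phrases": [], "excluded": [], "operators": {}, "terms": []}
    for dash, phrase, word in TOKEN.findall(query):
        if phrase:
            # A quoted phrase is kept whole; a leading dash excludes it.
            parsed["excluded" if dash else "phrases"].append(phrase.lower())
            continue
        name, sep, value = word.partition(":")
        if sep and value and not dash and name.lower() in OPERATORS:
            parsed["operators"][name.lower()] = value
            continue
        # Drop surrounding punctuation such as a trailing question mark.
        cleaned = word.strip("?.,!;:'").lower()
        if not cleaned:
            continue
        if dash:
            parsed["excluded"].append(cleaned)
        elif cleaned not in STOP_WORDS:
            parsed["terms"].append(cleaned)
    return parsed


result = parse_query('"white whale" Moby-Dick harpoonist -movie site:example.org')
print(result)
```

Under these assumptions, a chatty question and its terse equivalent reduce to the same plain terms, which is roughly the lesson the engines have taught us: `parse_query("when was the Pequod launched?")` and `parse_query("Pequod launched")` both leave only `pequod` and `launched`.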

In short, we’re being taught a pidgin language: a simplified language for communicating across cultures. In this case, the two cultures are humans and computers. I only wish the pidgin were more uniform and useful. Google has enough dominance in the market that its syntax influences other search engines. Good! But we could use some help taking the next step, formulating more complex natural language queries in a pidgin that crosses application boundaries, and that isn’t designed for standard database queries.

Or does this already exist?

Tags: search pidgin nlp natural_language_processing google everything_is_miscellaneous

3 Responses to “Search Pidgin”

  1. Mirek Sopek, on August 14th, 2009 at 5:23 pm Said:

    David,

    I think there is another problem with Wolfram’s processing of input: the answers it returns are misleading.

    I tried ‘surname Sopek’ and it told me the (in)famous “Wolfram|Alpha isn’t sure what to do with your input.” So I tried ‘surname Weinberger’ and got the same! However, ‘surname Kumar’, ‘surname Smith’, and ‘surname Johnson’ return what we should get.

    I found this problem before on other queries as well.
    In essence, you just do not know whether your query was badly formatted or the result is not possible to compute.

    I think this could be easily corrected by Wolfram, but I agree: in principle, Wolfram was not able to invent or practice a pidgin dialect for its possibly innovative service, which was clearly the great step made by “classical” search engines.

  2. Brad, on August 14th, 2009 at 10:04 pm Said:

    At least you don’t get this:

    “Interpreting ‘Turcotte’ as ‘turcite’”

  3. Gardner, on August 16th, 2009 at 2:19 pm Said:

    I passed on your query . . . not that you didn’t . . . along with my own requests for both a vocabulary and a “grammar incorrect”/“information not in db” response, so Wolfram’s love of stats has now nearly doubled your chance of a solution.
    G

Creative Commons License
Joho the Blog by David Weinberger is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. Share it freely, but attribute it to me, and don't use it commercially without my permission.