Joho the Blog: search Archives - Page 2 of 3

July 17, 2009

Search matchups

Google vs. Yahoo

Google vs. WolframAlpha

Google vs. Bing

(via Keith Dawson)


July 7, 2009

Free book on search interfaces

Berkeley’s Marti Hearst, who was way ahead of everyone else in faceted classification (e.g., flamenco), has written a definitive book on user interfaces to search engines. And it’s up on the Web for free, if that’s the way you roll. Thanks, Marti!


June 5, 2009

Bing, Google … and Kayak

I’ve been poking around Microsoft’s Bing. The short answer is that it’s not going to move me off of Google. Of course, my Google inertia is pretty much sleeping-hippopotamus-like at this point. Plus, Bing’s ripping off of Kayak.com (see below) has me pretty cheesed.

Bing does some useful and clever things. But, I think some of the coverage has actually undersold Google. For example, Hiawatha Bray in the Boston Globe, whose writing I like a lot, today opens his review with the clever idea of searching for “google” at Bing and for “bing” at Google. He says Bing gives you a concentrated dosage of stuff about Google, while Google is all over the map with its “bing” results. Well, sure! “Google” is a made-up word with only one dominant meaning, so of course Bing gives you concentrated Google goodness. But “Bing” has lots of meanings, so Google’s right to return a mix of bingy words…with Microsoft Bing as the top result. Now, it is true that, as Hiawatha says, Microsoft gives its “Google” results in convenient tabs about Microsoft the corporate entity as well as listing sub-pages within the google domain, while Google’s top return on “Microsoft” only gives you a set of sub-pages. Microsoft looks more like WolframAlpha in that regard, and that’s a good way to look. But, Google also recently added easier ways to refine and expand searches (by timeline, by WonderWheel), etc., as Hiawatha points out. So, it really depends on what you’re trying to do. As always. (Type MSFT into either and you’ll get similar boxed stock data.)

Hiawatha writes: “Say you want the latest weather or traffic data. Google will tell you where to get it. Bing will just give it to you.” Not exactly. Type “weather” into either site and you get your local weather at the top, in pretty much identical displays. Google’s been doing that for quite a while. Likewise, type an airline and flight number and Google will tell you if it’s on time. But the “traffic” trick doesn’t work for Google. For that you have to go to Google Maps and click on the “Traffic” button (assuming you’re signed in). I wonder how long it’ll take Google to add Bing’s way of responding.

When it comes to shopping, Bing has some very nice touches. Well, primarily it has faceted classification (like NewEgg.com, and also using Endeca’s engine?) that lets you sort a big list based on multiple criteria, using any of them in any order. Also, Bing has separate ratings by users and experts. On the other hand, Google found many many more copies of “splinter cell double agent” for sale than Bing did.
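For readers who haven't met faceted classification, here is a minimal Python sketch of the idea. It is not how Bing, NewEgg, or Endeca actually implement it; the catalog, the facet names, and the helper functions `refine` and `facet_counts` are all invented for illustration. The point it shows is that each facet is an independent filter, so you can apply them in any order and land on the same list.

```python
# Hypothetical catalog: the products, facets, and prices are invented for illustration.
CATALOG = [
    {"name": "Game A", "platform": "Xbox 360", "condition": "new", "price": 30},
    {"name": "Game A", "platform": "PC", "condition": "used", "price": 12},
    {"name": "Game B", "platform": "PC", "condition": "new", "price": 25},
    {"name": "Game B", "platform": "PC", "condition": "used", "price": 10},
]


def refine(items, facet, value):
    """Keep only the items whose `facet` equals `value`."""
    return [item for item in items if item.get(facet) == value]


def facet_counts(items, facet):
    """Count the remaining items per value of `facet`, as a sidebar would show."""
    counts = {}
    for item in items:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts


# Apply the same two facets in either order; the result is the same.
pc_then_used = refine(refine(CATALOG, "platform", "PC"), "condition", "used")
used_then_pc = refine(refine(CATALOG, "condition", "used"), "platform", "PC")
print(pc_then_used == used_then_pc)

# Cheapest first, plus the counts a shopper would see for the next facet.
print(sorted(pc_then_used, key=lambda item: item["price"]))
print(facet_counts(refine(CATALOG, "platform", "PC"), "condition"))
```

A real shopping engine would presumably precompute those counts rather than rescan the list, but the any-order property is the part that matters to the user.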

As many have noted, Bing’s handling of video searches is stellar. Hover over any of the thumbnails and the thumbnail starts to play. But, as someone pointed out — sorry, I lost the link — there seems to be no way to keep users from turning off the adult filter, which means that every school and library now has the greatest multi-screen porn browser ever invented. You can browse for porn videos on Google, of course, but with Bing it’s like watching all of them all at once. Well, maybe this will be like catching a kid smoking and making him smoke an entire pack all at once.

And now we come to Bing’s travel searches. OMG. Bing blatantly ripped off Kayak.com. [Disclosure: I’m old friends with the Kayak folks.] Just take a look at this post. If you’re going to rip off an innovative design, then at least innovate on top of it! Grrrr…



Later that morning: I just came across a very amusing article in Ars Technica about the new Google Squared app that puts info into tables. It makes clear why some sites (e.g., WolframAlpha) are willing to pay the price to gain the benefits of curation. (via Lee Baker, Berkman summer intern)


May 17, 2009

WolframAlpha’s big problem

After a day of poking at the awesome WolframAlpha and watching some of the reactions around the Web, a major problem has emerged. WA is fantastic if it has what you’re looking for. But if it doesn’t, it looks like it’s failed, as in: “What? It can’t tell me how much energy it would take to move Henry VIII one kilometer, expressed in cheeseburger-calories? What a piece of crap!”

Google doesn’t have this problem. If you get no hits, it’s almost always because you’ve so egregiously mistyped something that no one else on the planet has ever posted anything with that same typo. Or, it’s because you’ve put an odd phrase in quotes, which requires taking the special action of, well, putting things in quotes. Almost always, Google succeeds at what it does (find pages that contain particular text), even when it fails at doing what you want (find a particular answer).

WolframAlpha, on the other hand, is like a roomful of idiot savants. Each knows a scary amount about a topic. And, unlike such a roomful, WA also knows how to recombine and compute what each of the savants knows. But if the room doesn’t have the savant you’re looking for, you get back nothing but a “Huh?”

The eclecticism of WolframAlpha is its selling point. But the delight that it knows things you would never have guessed at means that you can have trouble guessing what it knows about. The question is whether general users will go back enough times to be trained on the sorts of questions it can answer. If not, WA will remain an awesome tool for specialists but will not become the broad, general-purpose tool it wants to be.

It would, however, be a completely awesome addition to Google…a path I suspect Stephen Wolfram does not want to take.


May 7, 2009

Wolfram podcast

My interview with Stephen Wolfram about WolframAlpha is now available. Some other me-based resources:

The unedited version weighs in at a full 55 minutes. The edited version will spare you some of my throat-clearing, and some dumb questions.

A post about what I think the significance of WolframAlpha will be.

Live blog of Wolfram’s presentation at Harvard.

Wolfram’s presentation at Harvard.


May 4, 2009

How important is WolframAlpha?

The Independent calls WolframAlpha “An invention that could change the Internet forever.” It concludes: “Wolfram Alpha has the potential to become one of the biggest names on the planet.”

Nova Spivack, a smart Semantic Web guy, says it could be as important as Google.

Ton Zijlstra, on the other hand, who knows a thing or two about knowledge and knowledge management, feels like it’s been overhyped. After seeing the video of Wolfram talking at Harvard, Ton writes:

No crawling? Centralized database, adding data from partners? Manual updating? Adding is tricky? Manually adding metadata (curating)? For all its coolness on the front of WolframAlpha, on the back end this sounds like it’s the mechanical turk of the semantic web.

(“The mechanical turk of the semantic web.” Great phrase. And while I’m in parentheses, ReadWriteWeb has useful screenshots of WolframAlpha, and here’s my unedited 55-minute interview with Wolfram.)

I am somewhere in between, definitely over in the Enthusiastic half of the field. I think WolframAlpha [WA] will become a standard part of the Internet’s tool set, but is not transformative.

WA works because it’s curated. Real human beings decide what topics to include (geography but not 6 Degrees of Courtney Love), which data to ingest, what metadata is worth capturing, how that metadata is interrelated (= an ontology), which correlations to present to the user when she queries it (daily tonnage of fish captured by the French compared to daily production of garbage in NYC), and how that information should be presented. Wolfram insists that an expert be present in each data stream to ensure the quality of the data. Given all that human intervention, WA then performs its algorithmic computations … which are themselves curated. WA is as curated as an almanac.

Curation is a source of its strength. It increases the reliability of the information, it enables the computations, and it lets the results pages present interesting and relevant information far beyond the simple factual answer to the question. The richness of those pages will be a big factor in the site’s success.

Curation is also WA’s limitation. If it stays purely curated, without areas in which the Big Anyone can contribute, it won’t be able to grow at Internet speeds. Someone with a good idea (provide info on meds and interactions, or add recipes so ingredients can be mashed up with nutritional and ecological info) will have to suggest it to WolframAlpha, Inc. and hope they take it up. (You could do this sorta kinda through the API, but not get the scaling effects of actually adding data to the system.) And WA will suffer from the perspectival problems inevitable in all curated systems: WA reflects Stephen Wolfram’s interests and perspective. It covers what he thinks is interesting. It covers it from his point of view. It will have to make decisions on topics for which there are no good answers: Is Pluto a planet? Does Scientology go on the list of religions? Does the page on rabbits include nutritional information about rabbit meat? (That, by the way, was Wolfram’s example in my interview of him. If you look at the site from Europe, a “rabbit” query does include the nutritional info, but not if you log in from a US IP address.) But WA doesn’t have to scale up to Internet Supersize to be supersized useful.

So, given those strengths and limitations, how important is WA?

Once people figure out what types of questions it’s good at, I think it will become a standard part of our tools, and for some areas of inquiry, it may be indispensable. I don’t know those areas well enough to give an example that will hold up, but I can imagine WA becoming the first place geneticists go when they have a question about a gene sequence or chemists who want to know about a molecule. I think it is likely to be so useful within particular fields that it becomes the standard place to look first…Like IMDB.com for movies, except for broad, multiple fields, with the ability to cross-compute.

But more broadly, is WA the next Google? Does it transform the Internet?

I don’t think so. Its computational abilities mean it does something not currently done (or not done well enough for a crowd of users), and the aesthetics of its responses make it quite accessible. But how many computational questions do you have a day? If you want to know how many tons of fish France catches, WA will work as an almanac. But that’s not transformational. If you want to know how many tons divided by the average weight of a French person, WA is for you. But the computational uses that are distinctive of WA and for which WA will frequently be an astounding tool are not frequent enough for WA to be transformational on the order of a Google or Wikipedia.

There are at least two other ways it could be transformational, however.

First, its biggest effect may be on metadata. If WA takes off, as I suspect it will, people and organizations will want to get their data into it. But to contribute their data, they will have to put it into WA’s metadata schema. Those schema then become a standard way we organize data. WA could be the killer app of the Semantic Web … the app that gives people both a motive for putting their data into ontologies and a standardized set of ontologies that makes it easy to do so.

Second, a robust computational engine with access to a very wide array of data is a new idea on the Internet. (Ok, nothing is new. But WA is going to bring this idea to mainstream awareness.) That transforms our expectations, just as Wikipedia is important not just because it’s a great encyclopedia but because it proved the power of collaborative crowds. But, WA’s lesson — there’s more that can be computed than we ever imagined — isn’t as counter-intuitive as Wikipedia’s, so it is not as apple-cart-upsetting, so it’s not as transformational. Our cultural reaction to Wikipedia is to be amazed by what we’ve done. With WA, we are likely to be amazed by what Wolfram has done.

That is the final reason why I think WA is not likely to be as big a deal as Google or Wikipedia, and I say this while being enthusiastic — wowed, even — about WA. WA’s big benefit is that it answers questions authoritatively. WA nails facts down. (Please take the discussion about facts in a postmodern age into the comments section. Thank you.) It thus ends conversation. Google and Wikipedia aim at continuing and even provoking conversation. They are rich with links and pointers. Even as Wikipedia provides a narrative that it hopes is reliable, it takes every opportunity to get you to go to a new page. WA does have links — including links to Wikipedia — but most are hidden one click below the surface. So, the distinction I’m drawing is far from absolute. Nevertheless, it seems right to me: WA is designed to get you out of a state of doubt by showing you a simple, accurate, reliable, true answer to your question. That’s an important service, but answers can be dead-ends on the Web: you get your answer and get off. WA as question-answerer bookends WA’s curated creation process: A relatively (not totally) closed process that has a great deal of value, but keeps it from the participatory model that generally has had the biggest effects on the Net.

Providing solid, reliable answers to difficult questions is hugely valuable. WolframAlpha’s approach is ambitious and brilliant. WolframAlpha is a genius. But that’s not enough to fundamentally alter the Net.

Nevertheless, I am wowed.


April 29, 2009

Wolfram interview

The Berkman Center has posted the raw audio of my 55-minute interview with Stephen Wolfram, about his deeply cool WolframAlpha program (which he talked about here yesterday). On the other hand, if you wait a few days, you can skip some throat-clearing on my part, as well as my driving him down an alley based on my not seeing where WolframAlpha puts links to other pieces of information. As is so often the case, the edited version will be better.


April 28, 2009

[berkman] Stephen Wolfram – WolframAlpha.com

Stephen Wolfram is giving a talk at Harvard/Berkman about his WolframAlpha site, which will launch in May. Aim: “Find a way to make computable the systematic knowledge we’ve accumulated.” The two big projects he’s worked on have made this possible. Mathematica (he’s worked on it for 23 yrs) makes it possible to do complex math and symbolic language manipulation. A New Kind of Science (NKS) has shown that it’s possible to understand much about the world computationally, often with very simple rules. So, WA uses NKS principles and the Mathematica engine. He says he’s in this project for the long term.

NOTE: Live-blogging. Posted without re-reading.

Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

You type in a question and you get back answers. You can type in math and get back plots, etc. Type in “gdp france” and get back the answer, plus a graph of the history of GDP.

“GDP of france / italy”: The GDP of France divided by the GDP of Italy

“internet users in europe” shows histogram, list of highest and lowest, etc.

“Weather in Lexington, MA” “Weather lexington,ma 11/17/92” “Weather lexington, MA moscow” shows comparison of weather and location.

“5 miles/sec” returns useful conversions and comparisons.

“$17/hr” converts to per week, per month, etc., plus conversion to other currencies.

“4000 words” gives a list of typical typing speeds, the length in characters, etc.

“333 gm gold” gives the mass, the commodity price, the heat capacity, etc.

“H2SO4” gives an illustration of the molecule, as well as the expected info about mass, etc.

“Caffeine mol wt/ water” gives the molecular weight of caffeine divided by that of water.

“decane 2 atm 50 C” shows what decane is like at two atmospheres and at 50 C, e.g., phase, density, boiling point, etc.

“LDL 180”: Where your cholesterol level is against the rest of the population.

“life expectancy male age 40 italy”: distribution of survival curve, history of that life expectancy over time. Add “1933” and it gets more specific.

“5’8″ 160 lbs”: Where that falls in the distribution of body mass index

“ATTGTATACTAA”: Where that sequence matches the human genome

“MSFT”: Real time Microsoft quote and other financial performance info. “MSFT sun” assumes that “sun” refers to stock info about Sun Microsystems.

“ARM 20 yr mortgage”: tables of monthly payments, etc. Lets you input the loan amount.

“D# minor”: Musical notation, plays the D# minor scale

“red + yellow”: Color swatch, html notation

“www.apple.com”: Info about Apple, history of page views

“lawyers”: Number employed, average wage

“France fish production”: How many metric tons produced, pounds per second, which is 1/5 the rate trash is produced in NYC

“france fish production vs. poland”: charts and diagrams

“2 c orange juice”: nutritional info

“2 c orange juice + 1 slice cheddar cheese”: nutritional label

“a__a__n”: English words that match

“alan turing kurt godel”: Table of info about them

“weather princeton, day when kurt godel died”: the answer

“uncle’s uncle’s grandson’s grandson”: family tree, probability of those two sharing genetic material

“5th largest country in europe”

“gdp vs. railway length in europe”:

“hurricane andrew”: Data, map

“andrew”: Popularity of the name, diagrammed.

“president of brazil in 1922”

“tide NYC 11/5/2015”

“ten flips 4 heads”: probability

“3,7,15,31,63…”: Figures out and plots next in the sequence and possible generating function

“4,1 knot”: diagram of knot

“next total solar eclipse chicago”: Next one visible in Chicago

“ISS”: International Space Station info and map

It lets you select alternatives in case of ambiguities.
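To give a sense of the arithmetic behind a couple of the demo queries above (“ten flips 4 heads” and “3,7,15,31,63…”), here is a minimal Python sketch. It is emphatically not how WolframAlpha works; the function names `prob_heads` and `next_term` are made up, the coin is assumed to be fair, and the sequence rule is hard-coded as an assumption rather than discovered the way WA apparently does it.

```python
from math import comb


def prob_heads(flips, heads):
    """Probability of exactly `heads` heads in `flips` tosses of a fair coin."""
    return comb(flips, heads) / 2 ** flips


def next_term(seq):
    """Guess the next term, assuming the rule a(n+1) = 2*a(n) + 1.

    Raises ValueError if the given terms do not follow that assumed rule.
    """
    for prev, cur in zip(seq, seq[1:]):
        if cur != 2 * prev + 1:
            raise ValueError("sequence does not follow a(n+1) = 2*a(n) + 1")
    return 2 * seq[-1] + 1


print(prob_heads(10, 4))              # 210 ways out of 1024
print(next_term([3, 7, 15, 31, 63]))  # continues the doubling-plus-one pattern
```

The hard part, of course, is everything this sketch skips: figuring out from a terse input which computation you meant in the first place.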

“We’re trying to compute things.” We have tools that let us find things. But when you have a particular question, it’s unlikely that you’ll find that specific answer written down. WA therefore tries to compute answers. “The objective is to reach expert level knowledge across a very wide range of domains.”

Four big pieces to WA:

1. Data curation. WA has trillions of pieces of curated data. It gets it from free data or licensed data. A partially human, partially automated system cleans it up and tries to correlate it. “A lot can be done automatically…At some point, you need a human domain expert in the middle of it.” There are people inside the company and a network of others who do the curation.

2. The algorithms. Take equations, etc., from all over. “There are finite numbers of methods that have been discovered in the history of science.” There are 5-6 million lines of Mathematica code at work.

3. Linguistic analysis to understand the inputs. “There’s no manual, no documentation. You get to interact with it just how you think about things.” They’re doing the opposite of natural language processing, which usually tries to understand millions of pages. WA’s problem is mapping a relatively small set of short human inputs to what the system knows about. NKS helps with this. It turns out that ambiguity is not nearly as big a problem as we thought.

4. Automated presentation. What do you show people so they can cognitively grasp it? “Algorithmic presentation technology … tries to pick out what is important.” Mathematica has worked on “computational aesthetics” for years.

He says they have at least a reasonable start on about 90% of the shelves in a typical reference library.

Q: (andy orem) What do you do about the inconsistencies of data? We don’t know how inconsistent it was and what algorithms you used.
A: We give source info. “We’re trying to create an authoritative source for data.” We know about ranges of values; we’ll make that information available. “But by the time you have a lot of footnotes on a number, there’s not a lot you can do with that number.” “We do try to give footnotes.”

Q: How do you keep current?
A: Lots of people want to make their data available. We hope to make a streamlined, formalized way for people to contribute the data. We want to curate it so we can stand by it.

Q: [me] Openness? Of API, of metadata, of contributions of interesting comparisons, etc.
A: We’ll do a variety of levels of API. First: presentation level: put output on their pages. Second, XML-level so people can mash it up. Third level: individual results from the databases and from the computations. [He shows a first draft of the API] You can get results as the symbolic expressions that Mathematica is based on. We hope to have a personalizable version. Metadata: When we open up our data repository mechanisms so people can contribute, some of our ontology will be exposed.

Q: How about in areas where people disagree? If a new universe model comes out from Stanford, does someone at WolframAlpha have to say yes and put it in?
A: Yes
Q: How many people?
A: It’s been 150 for a long time. Now it’s 250. It’s probably going to be a thousand people.

Q: Who is this for?
A: It’s for expert knowledge for anyone who needs it.

Q: Business model?
A: The site will be free. Corporate sponsors will put ads on the side. We’re trying to figure out how to ingest vendor info when it’s relevant, and how to present it on the site. There will also be a professional version for people who are doing a lot of computation, want to put in their own data…

Q: Can you combine the medical and population databases to get the total mass of people in England?
A: We could integrate those databases, but we don’t have that now. We’re working on “splat pages” you get when it doesn’t work. It should tell you what it does know.

Q: What happens when there is no answer, e.g., 55th largest state in the US?
A: It says it doesn’t know.

Q: [eszter] For some data, there are agreed-upon sources. For some there aren’t. How do you choose sources?
A: That’s a key problem in doing data curation. “How do we do it? We try to do the best job we can.” Use experts. Assess. Compare. [This is a bigger issue than Wolfram apparently thinks where data models are political. E.g., Eszter Hargittai, who is sitting next to me, points out “How many Internet users are there?” is a highly controversial question.] We give info about what our sources are.

Q: Technologically, where do you want to focus in the future?
A: All 4 areas need to be pushed forward.

Q: How does this compare to the Semantic Web?
A: Had the Web already been semantically tagged, this product would have been far far easier, although keep in mind that much of the data in WA comes from private databases. We have a sophisticated ontology. We didn’t create the ontology top-down. It’s mostly bottom-up. We have domains. We have ontologies for them. We merge them together. “I hope as we expose some of our data repository methods, it will make it easier to do some Semantic Web kind of things. People will be able to line data up.”

Q: When can we look at the formal specifications of these ontologies? When can we inject our own?
A: It’s all represented in clean Mathematica code. Knitting new knowledge into the system is tricky because our UI is natural language, which is messy. E.g., “There’s a chap who goes by the name Fifty Cent.” You have to be careful.

Q: What reference source tells you if Palestine exists…?
A: In cases like this, we say “Assuming Case A or B.” There are holes in the data. I’m hoping people will be motivated to fill them in. Then there’s the question of the extent to which we can build expert communities. We don’t know the best way to do this. Lots of interesting ideas.

Q: How about pop culture?
A: Pop culture info is much shallower computationally. (“Britney Spears” just gets her name, birthdate, and birthplace. No music, no photos, nothing about her genre, etc.) (“Meaning of life” does answer “42”)

Q: Compare with CYC? (A common sense reasoning system)
A: CYC deals with human reasoning. That’s not the best method for figuring out physics, etc. “We can do the non-human parts of reasoning really well.”

Q: [couldn’t hear the question]
A: The best way to debug it is not necessarily to inspect the code but to inspect the results. People reading code is less efficient than automated systems.

Q: Will it be integrated into Mathematica?
A: A future version will let you type WA data into Mathematica.

Q: How much work do you have to do on the NLP side? Your searches used a special lexicon…
A: We don’t know. We have a daily splat call to see what types of queries have failed. We’re pretty good at removing linguistic fluff. People drop the fluff pretty quickly after they’ve been using WA for a while.

Q: (free software foundation) How does this change the landscape for open access? There’s info in commercial journals…
A: When there’s a proprietary database, the challenge is making the right deals. People will not be able to take out of our system all the data that we put into it. We have yet to learn all of the issues that will come up.

Q: Privacy?
A: We’re dealing with public data. We could do people search, but, personally, I don’t want to.

Q: What would you think of a more Wikipedia-like model? Do you worry about a competitor making a wiki of data that is completely open and grows faster?
A: That’d be great. Making WA is hard. It’s not just a matter of shoveling data in. Wikipedia is fantastic and I use it all the time, but it’s gone in particular directions. When you’re looking for systematic data there, even if people put in systematic data — e.g., 300 pages about chemicals — over the course of time, the data gets dirty. You can’t compute from it.

Q: How about if Google starts presenting your results in response to queries?
A: We’re looking for synergies. But we’re generating these on the fly; it won’t get indexed.

Q: I wonder how universities will find a place for this.
A: Very interesting question. Generating hard data is hard and useful, although universities often prefer higher levels of synthesis and opinion. [Loose paraphrase!] Leibniz had this nailed: Take any human argument and find a way to mechanically compute it.


April 16, 2009

WolframAlpha alpha

Seb Schmoller went to a webinar put on by Stephen Wolfram about the upcoming WolframAlpha search engine (well, answering engine) and came away impressed…


March 8, 2009

Wolfram computes it all

Stephen Wolfram is promising “A new paradigm for using computers and the web.” It involves “a mixture of many clever algorithms and heuristics, lots of linguistic discovery and linguistic curation, and what probably amount to some serious theoretical breakthroughs.” He doesn’t lay it out explicitly, but says “…armed with Mathematica and NKS I realized there’s another way: explicitly implement methods and models, as algorithms, and explicitly curate all data so that it is immediately computable.”

Wolfram is very very very very very smart. No one doubts that. He’s smart enough that he would not be posting and hyping this site unless there’s something there. I don’t understand it, but, frankly I’m looking forward to it.

The site is called WolframAlpha, and it opens in May.
