Joho the Blog
January 15, 2006
North Carolina State University now lets you search its libraries' collections using faceted classification. Go to the libraries' search page and search for any term. For example:
On the search page, enter "military music" (no quotes). It returns a straightforward list of works about military music. But on the left of the search page are the facets under which the works on the returns list are classified, e.g., Topic and Format. Under each of those is a list of the types available under that facet. E.g., under Format we see there are 146 books and 16 DVDs about military music. (Think of a facet as a column in a database and the types as the contents of the cells in that column: At an online music site, a column might be Genre and the types might be rock, jazz, classical, etc.)
Under the Topic facet, click on "United States." You are now shown a list of all the holdings about military music and the US. Notice that the list of facets on the left has changed. For example, DVDs have vanished from the Format facet because the libraries have no DVDs about US military music. Faceted classification systems don't show you impossible or irrelevant options. No dead ends, no branches without fruit.
Under the Era facet, click "20th Century." The list of results shrinks and the facet list narrows. For example, the Format facet now shows only Books and E-Books.
Under the Genre facet, click on "Songs and music." We're down to three results and the facet list has narrowed dramatically. The Format facet is entirely gone because all three results are books.
Suppose, however, you decide the list of returns is too small. You're willing to consider books about more than just the US. Toward the top of the page, on the left, the NCSU site shows you the facets you've selected already, with a little red X box next to each. Click on the X next to "United States," removing it as a selection criterion. Not only does the list of returns expand, but a bunch of facets are back because they're legitimate choices again. Notice that you don't have to walk back up the tree in the order in which you created it.
Why is this a big deal? Unlike parametric searches that let you enter specifications for your search, a faceted search doesn't simply apply search criteria. Instead, in a faceted classification system (in this instance, called a "guided navigation" system by Endeca, the company behind this implementation), the browsable interface changes with every choice so that it never shows you parameters that would result in an empty results list. So you don't have to keep randomly banging on it and then backing up, trying to find the one book you want, or the one left-threaded, chrome-plated, 15mm, Phillips-headed, round-capped screw you need as you build the specs for your new aircraft engine. And when it turns out there are no screws exactly like that, you can decide you could do without the chrome-plating, or the Phillips-headedness, until you get something that works. Endeca has a customer with a library of 25 million engineering parts for whom this type of interactive search is a tremendous time and money saver.
As Endeca will tell you, there's another advantage as well: Faceted systems know a lot about their contents. That's why they're able to show you how many entries there are in each branch before you click on it. Endeca uses this information to build data reporting systems that let you click facets in and out, interactively revealing patterns that might otherwise have been hard to find.
(Non-disclosure: I am not involved with Endeca, although I'm using them as an example in my book because I think what they're doing is way cool.)
(Thanks for the link to Peter Morville, who also points to this thread about the implementation.)
[Tags: facetedClassification taxonomy EverythingIsMiscellaneous libraries endeca]
Posted by D. Weinberger at January 15, 2006 10:08 AM
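The walkthrough in the post can be sketched in a few lines of Python. This is only an illustration of the general idea, not Endeca's implementation: the records, titles, and facet values are invented, and the function names `search` and `facet_counts` are hypothetical. It shows the two behaviors the post describes: options are counted from the current results only, so a choice that would lead to an empty list never appears, and any selection can be dropped without undoing the others.

```python
from collections import Counter

# Hypothetical catalog records, invented for illustration (not NCSU data).
RECORDS = [
    {"title": "Marches of the Union Army", "Format": "Book",
     "Topic": "United States", "Era": "19th Century"},
    {"title": "Songs of Two World Wars", "Format": "Book",
     "Topic": "United States", "Era": "20th Century"},
    {"title": "British Regimental Bands", "Format": "Book",
     "Topic": "Great Britain", "Era": "20th Century"},
    {"title": "Tattoo: A Concert Film", "Format": "DVD",
     "Topic": "Great Britain", "Era": "20th Century"},
]
FACETS = ("Format", "Topic", "Era")


def search(records, selections):
    """Return the records that match every selected facet value."""
    return [r for r in records
            if all(r[facet] == value for facet, value in selections.items())]


def facet_counts(results, selections):
    """Count the values of each not-yet-selected facet in the current results.

    Counts are built from the current results only, so a value that would
    produce an empty list is never offered as an option.
    """
    return {facet: Counter(r[facet] for r in results)
            for facet in FACETS if facet not in selections}


# Select Topic = United States: DVD is not offered under Format.
selections = {"Topic": "United States"}
results = search(RECORDS, selections)
print(dict(facet_counts(results, selections)["Format"]))

# Add Era, then remove Topic, without walking back in selection order.
selections["Era"] = "20th Century"
del selections["Topic"]
results = search(RECORDS, selections)
print(dict(facet_counts(results, selections)["Format"]))
```

The selections are kept in a dictionary rather than as a path through a tree, which is why removing "United States" does not require undoing the later Era choice first.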
Comments
Interesting search capability at NC State...I've been trying to locate a book from the early 1980s, can't remember the author or the title...only remember that it is about cliches and how they develop...I entered the keyword cliche and their search engine "corrected" it to cloche...real useful...
Posted by: Larry Irons | January 15, 2006 11:30 AM
Look up Cluetrain.
Depressing.
Posted by: Doc Searls | January 15, 2006 09:24 PM
Hey Doc, yea it changes it to clue train, finds four books and then returns cluetrain manifesto as the fifth...seems to have a mind of its own until proven wrong...
Posted by: Larry Irons | January 15, 2006 11:35 PM
Seems overwhelming to the common user. I'm sure the librarians love it.
Posted by: Davezilla | January 16, 2006 02:55 PM
It's cluttered, but it's pretty fast and the faceted stuff is interesting. If I was NCSU/Endeca I'd improve the stemming (see Cluetrain above) and hide more of the clutter behind an 'Advanced Search' option a la Google... simplicity is what 80% of users want. At any rate, it's a start and I'm sure things will get better.
Posted by: Adster | January 16, 2006 05:17 PM
'cliche' does not occur in any of the titles that NCSU holds, so it is corrected to something that NCSU does hold. That's one of the nice features of Endeca: unlike other dictionaries in OPACs, which simply suggest words that are equally unlikely to be found, it only corrects or offers "did you mean..." if the data support it.
=======
Try "cluetrain" in quotes and you will find what you're looking for. Endeca corrects spellings based on actual keywords in the data when there are fewer than 3 hits on the words in the query. And the one you're looking for is still in the results without quotes. Hardly depressing.
========
Librarians and users alike love it. Yes, we have more work to do, but for now, it is unlike any other online catalog, so the territory is somewhat uncharted. Uncharted but hardly depressing. (I tried to relate this part off-blog, but Mr. Irons's emailer only works if I have an i-name too, which I don't, so I could not reach him directly.)
Cheers,
Andrew
Posted by: Andrew Pace | January 17, 2006 06:34 AM
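The data-driven correction described in the comment above might look roughly like the following. This is a guess at the general approach, not Endeca's algorithm: the term index is invented, `suggest` is a hypothetical name, and the standard library's `difflib` similarity stands in for whatever matching Endeca actually uses. The only detail taken from the comment is the threshold of fewer than 3 hits and the rule that suggestions come from terms actually present in the data.

```python
import difflib

# Hypothetical index: term -> number of records containing it.
TERM_HITS = {"cloche": 4, "cliff": 12, "music": 300}
MIN_HITS = 3  # threshold mentioned in the comment above


def suggest(term, term_hits=TERM_HITS, min_hits=MIN_HITS):
    """Return a replacement drawn from indexed terms, or None to keep the term.

    A term is only corrected when it has too few hits, and only to a term
    that the data can actually satisfy.
    """
    if term_hits.get(term, 0) >= min_hits:
        return None
    candidates = [t for t, hits in term_hits.items()
                  if t != term and hits >= min_hits]
    # difflib is an assumption here, used as a stand-in similarity measure.
    matches = difflib.get_close_matches(term, candidates, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(suggest("cliche"))  # closest well-supported indexed term
print(suggest("music"))   # enough hits already, so no correction
```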
Andrew, you can reach me at lirons@charter.net
I would simply note that needing quotes around a single word cuts against folk knowledge about how to search...people are not accustomed to needing to specify that one word they are search for is in fact one word...good luck with your users...
Posted by: Larry Irons | January 17, 2006 11:26 AM
of course, I meant "searching for" in that last post...
Posted by: Larry Irons | January 17, 2006 11:27 AM
My company is working with Endeca as well. In our implementation, we're offering 4 faceted points of entry into the browse schema. I found NCSU's implementation difficult to get started on because it allows only one entry point (LCSH), and if a user is not terribly familiar with LCSH, it can be difficult to decide where to start. I wonder if increasing available facets would increase use. I feel pushed into search because LCSH is so kludgey.
Posted by: Melissa Riesland | January 17, 2006 11:54 AM
I agree in re cluetrain. The SE should bring up cluetrain as cluetrain, and then if it wants to be helpful, present other results as suggestions. Endeca is slick and it ain't cheap; it oughter be able to do that. Otherwise you get penalized for seeking unique items... as for quotes, since cluetrain is one word, who would think of that anyway?
NCSU, having decided to use faceting, has a problem. LCSH is the only faceting they have. The website I manage has a natural-language folksonomy generated by real people. I'm not bragging, just explaining. LCSH is the ultimate pig in a silk dress. You cannot make it better than what it is and it is not a collection-level language suitable for web browsing. I wrote about this at ALA Techsource this week.
I think what NCSU is doing is very cutting-edge, though maybe so much that they will bleed a little. Most OPACs can't do simple relevance searching, let alone stemming or spell-check. Tuning both requires time, patience, and expertise... and iteration.
As for the faceting being a bit busy... well, a usability expert I know had that immediate reaction. I did too. I imagine that some follow-on modifications will happen.
Posted by: K.G. Schneider | January 17, 2006 08:33 PM
Regarding K.G. Schneider's comment:
"NCSU, having decided to use faceting, has a problem. LCSH is the only faceting they have. "
This is incorrect. There are 10 facets currently in use in the NCSU Endeca implementation. Most of these facets are not LCSH. For example: "Library", "Language", "Format" and "Author" will appear for many searches in the lefthand column. Also at the top of the results there is a link that reads "Limit results to _currently available items_". This is an "Availability" facet.
Personally I don't think the LC classification block at the top is very useful for the types of searches that *I* do, but some people really like it. I like using the "Availability" and "Library" facets because it helps trim the results to a set I am interested in. Surely some users will not find the "Subject" facets useful, just as some users will not find the "Language" facets useful.
As for Melissa Riesland's comment:
"I found NCSU's implementation difficult to get started on because it allows only one entry point (LCSH), and if a user is not terribly familiar with LCSH, it can be difficult to decide where to start."
I don't think you are looking at the right page. Most of our users will go here:
http://www.lib.ncsu.edu/catalog/
...and enter some search terms. The LCSH will appear for you based on your search terms. Users will ignore them if they don't find them useful.
Regarding clutter, it is quite possible that we will remove some of these facets if we find that no one is using them. But you have to give them a try first. Refinement will follow based on some actual use data.
As for the comments regarding the "cluetrain" search, this is a matter of optimizing the auto-correction and relevance ranking features. Thanks for pointing out the problem.
Tito
Posted by: Tito Sierra | January 18, 2006 12:51 PM
You're right, Tito, I was overlooking the other facets. The format facet is particularly nifty. I don't know what the usability tests showed, but I'd hazard faux-heuristically that the browse by format works well. The way most ILS's show format is so random that it's useless.
Author is also good, of course.
I do take issue with this statement: "Users will ignore [x feature] if they don't find it useful." Where I work, we found in usability testing that users get distracted and confused if there are too many features. Rather than ignore a feature, they might assign it too much importance.
Posted by: kgs | January 18, 2006 02:24 PM
In defense of Endeca, there are a multitude of configuration options that can be explored. We still have tweaking left to do.
The 'cliche' example is a case in point. The reason the search returned no valid cliche results originally (and performed the auto-correction) is that the word is actually spelled 'cliché' (note the diacritic) in our records. Endeca was differentiating between cliche with and without the diacritic. With a bit of investigation, we found that it could be configured to equate common Latin-1 characters and their ASCII equivalents when searching. If you retry your search, you'll see lots of useful results. Thanks for pointing this out!
As for the 'cluetrain' example, I suspect that a few tweaks to our relevance ranking algorithms could improve that situation, as well. Don't give up on the product because all these options haven't been explored yet!
Emily
Posted by: Emily Lynema | January 19, 2006 08:37 AM
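Equating accented characters with their plain equivalents, as described in the comment above, can be approximated with Python's standard `unicodedata` module. This is a minimal sketch of the general technique, not Endeca's configuration option; the function names `fold_diacritics` and `matches` are hypothetical, and the sample title is invented.

```python
import unicodedata


def fold_diacritics(text):
    """Strip combining marks so that 'cliché' and 'cliche' compare equal."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query, title):
    """Case-insensitive, accent-insensitive substring match."""
    return fold_diacritics(query).casefold() in fold_diacritics(title).casefold()


print(fold_diacritics("cliché"))
print(matches("cliche", "A Dictionary of Clichés"))
```

Letters that have no decomposition, such as "ø" or "ß", pass through unchanged, so a production system would likely need an explicit mapping table for those.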
Directory navigation is giving way to a more search-oriented interaction. Statistics show that users are becoming more and more sophisticated in exploiting multiple keyword features. Faceted navigation, or guided navigation, a term on which Endeca has been granted a patent (24th of May 2006), is a natural form of interaction at this stage.
My experience is that more and more ecommerce sites are following the large library institutions in implementing this guided site search for their users. I guess it's trickle-down technology, with Google spearheading "The Search".
Posted by: Paul Salber | June 9, 2006 05:07 AM