March 3, 2005
Trees and tags - An introduction: What are taxonomies, tags, faceted classification, folksonomies...? And do they matter?
My life as a Berkperson: I've been at the Harvard Berkman Center since last summer. Here's what it's like.
Larry Summers and the Web as world: The blogosphere practically demands that Harvard-related bloggers say something — something! — about their President's comments...and that's evidence that the Web is a world, not just a medium
Low-Value, Repetitive Issue
I'm away at conferences most of March, so I figured I'd take the opportunity to put out this issue even though the first two articles in it already ran on my blog. Plus, the first one is on the topic I've been harping on and the second is so positive that you might want to suck on a horseradish root while you read it.
(BTW, the current issue of Harvard Business Review (March) is running an essay of mine about why businesses are finding benefit in keeping their information miscellaneous...yes, it's the same old thing.)
Nightline Thursday March 8
We've been told that ABC's Nightline — the on-its-way-out Ted Koppel vehicle — is going to air a piece on blogging including footage taped at the Thursday night blogging confabulation at the Berkman Center. Because the cameras were there, we all tried to be extra-special quotable.
The ABC folks came in thinking they were doing a story about how bloggers have knocked down media idols — the Hit Squad view of blogging. If a broader picture of blogging comes across, then I'll count the Thursday session as successful. (Here's my live blogging of the event. And if you can't wait for Nightline, Steve Garfield, a video blogger, did his own excellent 5-minute version of the session.)
Trees and tags - Introduction
This is the introductory section of the new issue of Esther Dyson's Release 1.0. The article goes on to talk about some companies doing interesting things in this area, including Yahoo, Corbis, ClearForest, Chandler, the Dewey Decimal Classification system, Endeca, Siderean, NYTimes.com, del.icio.us, Flickr, Wikipedia, frassle and Technorati. If you'd like the issue, you can buy it here. (It's $80.) If you're feeling flush, you can subscribe here. And if you are flush and have a few free days, I'll see you at PCForum, one of the best conferences around. (Thanks to Esther and Christina Koukkos for permission to post this.)
The Three Orders
The narrative that tells of the first man and woman encountering the tree of knowledge focuses on its tempting fruit. But after we took the bite, we apparently looked up and got the idea that knowledge is shaped like the tree's branching structure: Big concepts contain smaller ones that contain smaller ones yet. Over the millennia, we have fashioned the structures of knowledge in just such tree-like ways, from the departmental organization of universities (liberal arts contains history and history contains ancient Chinese history) to the hierarchy of species. The idea that knowledge is shaped like a tree is perhaps our oldest knowledge about knowledge.
Now autumn has come to the forest of knowledge, thanks to the digital revolution. The leaves are falling and the trees are looking bare. We are discovering that traditional knowledge hierarchies that have served us so well are unnecessarily restricted when it comes to organizing information in the digital world. The principles of organization themselves are changing now that they are being freed from the constraints of the physical world. For example:
In the physical world, a fruit can hang from only one branch. In the digital world, objects can easily be classified in dozens or even hundreds of different categories.
In the real world, multiple people use any one tree. In the digital world, there can be a different tree for each person.
In the real world, the person who owns the information generally also owns and controls the tree that organizes that information. In the digital world, users can control the organization of information owned by others. (Exception to the rule: Westlaw owns the standard organization of case law even though the case law itself is in the public domain.)
These differences are so substantial that we can think of intellectual order as entering a third age. In the first, we organized the things themselves: We put books on shelves and silverware into drawers. In the second, we physically separated the metadata from the data: We built card catalogs and drew diagrams. In the third, the data and the metadata are digital, untying organization from the strictures of the physical world. In response, we are rapidly inventing new principles and tools of organization. When it comes to innovation on the Internet, metadata is becoming the new content.
But traditional taxonomic trees aren't something we can throw away without a thought. They are an amazingly efficient way of organizing complexity because they enable us to focus on one aspect (e.g., that's an apple) while keeping a universe of context (it's a fruit, part of a plant, a type of living thing) in the background, ready for access. Tree structures are built into our institutions. They may even be built into our genes. So we are in a confusing and fertile period as we try to sort out what works and what doesn't. Without trees, how would we organize college curricula, business org charts, the local library, and the order of species? How will we organize knowledge itself?
We may be on the path to finding out.
Webogeny recapitulates ontogeny
The tree of knowledge has roots, of course. They go back to Aristotle, who figured out how knowledge could be nested without having to claim that the container (say, the concept of human-ness) is the same sort of thing as what it contains (all existing humans). The individual items in a hierarchy inherit the properties of all the categories above them, so that if you know that Alcibiades is a human, you also know that he is a mammal and an animal. Inheritance provides a context by which the individual accretes the accumulated wisdom of the tree just by hanging on a particular branch, an amazingly efficient way of expressing knowledge.
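For the programmers in the audience, here's a rough sketch of that kind of inheritance. The tiny tree and the function name are mine, made up for illustration; nothing here comes from Aristotle or from any real classification software.

```python
# Hypothetical parent links for a tiny tree; each item hangs from exactly one branch.
PARENT = {
    "Alcibiades": "human",
    "human": "mammal",
    "mammal": "animal",
    "animal": None,  # the root has no category above it
}

def categories_above(item):
    """Return every category the item inherits from, nearest first."""
    chain = []
    parent = PARENT.get(item)
    while parent is not None:
        chain.append(parent)
        parent = PARENT.get(parent)
    return chain

print(categories_above("Alcibiades"))  # ['human', 'mammal', 'animal']
```

Knowing one fact (which branch Alcibiades hangs from) gets you everything above it for free.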
Five hundred years later the Syrian philosopher Porphyry first drew Aristotle's system of nested concepts as a tree. That notion stuck, implicitly endorsed by Carl Linnaeus and Charles Darwin in the sciences, Francis Bacon in philosophy, and by libraries and academic departments just about everywhere.
The next stop in this story is Postmodernism's insistence that trees of knowledge are reflections of particular cultural assumptions and, importantly, conflate knowledge and power. You can't read Michel Foucault's The Order of Things and believe that order itself has no history. And not just French philosophers have given up on the old dream of finding a single, universal, comprehensive way of organizing the world's knowledge. You can't come out of Geoffrey C. Bowker and Susan Leigh Star's study of the International Classification of Diseases, Sorting Things Out, thinking that classification systems are value-free and objectively true. Nor can you look at the US Census' 2000 decision to expand the number of possible races without seeing that taxonomies can have enormous political and budgetary consequences.
The brief history of the Web has recapitulated Western culture's ontogeny of trees. Yahoo!'s directory tree became the early center of the Web, each leaf hand-selected and placed into categories designed initially by two computer science grad students at Stanford. But text search engines — AltaVista, HotBot, Google — dethroned Yahoo! as the Monarch of Search, and Yahoo! in turn has moved its browsable tree below the fold on its home page.
When text search isn't the right solution — for example, at e-commerce sites where people may not know the names of the products they're looking for — a more dynamic way of creating and presenting trees, called faceted classification, is coming into its own. Invented in the early 1930s by Shiyali Ranganathan, an Indian librarian, it applies a pre-defined set of parameters (or facets) to its objects. For example, watches might have facets such as manufacturer, digital or analog, men's or women's, price, and electric or spring-driven. Some facets are a set of possible values (such as a pick-list of available manufacturers); others are a range of numerical values (such as price range). Users can then browse by selecting first on, say, digital or analog and then by price, or first by price and then by men's or women's. Users can drill down as they do with a normal tree, but the arrangement of the branches is dynamic and reflects the users' interests, not the store's. The store may not like it that you've routed around the $25,000 Rolex they're offering on sale for a mere $24,000, but you've found your $50, waterproof, analog watch much faster.
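Here's a minimal sketch of faceted browsing, assuming a made-up catalog of four watches. The item names, makers, prices and the helper functions are all hypothetical; real faceted engines do a great deal more than this.

```python
# Hypothetical watch catalog; every item and value is invented for illustration.
WATCHES = [
    {"name": "A", "maker": "Acme", "display": "analog", "price": 50},
    {"name": "B", "maker": "Acme", "display": "digital", "price": 30},
    {"name": "C", "maker": "Rolex", "display": "analog", "price": 24000},
    {"name": "D", "maker": "Bolt", "display": "analog", "price": 45},
]

def pick(items, facet, value):
    """Narrow by a pick-list facet (one value out of a fixed set)."""
    return [item for item in items if item[facet] == value]

def in_range(items, facet, low, high):
    """Narrow by a numeric-range facet such as price."""
    return [item for item in items if low <= item[facet] <= high]

def names(items):
    return sorted(item["name"] for item in items)

# Price first and then display type...
by_price_first = pick(in_range(WATCHES, "price", 0, 100), "display", "analog")
# ...or display type first and then price. The user picks the order.
by_display_first = in_range(pick(WATCHES, "display", "analog"), "price", 0, 100)

print(names(by_price_first))    # ['A', 'D']
print(names(by_display_first))  # ['A', 'D']
```

Either order lands on the same cheap analog watches, and the $24,000 one never gets in the way.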
Faceted classification still presents users with a hierarchical tree, making it easy for them to browse to what they want. But unlike traditional trees, faceted systems don't decide beforehand how the branches are arranged. For example, if an ice cream stand organized its "customer experience" around a traditional hierarchical taxonomy — a tree — it might have a customer first choose between two flavors, then among three sizes, and finally between a cup or cone. There are 12 potential paths and exactly one path to a large cup of chocolate ice cream. In a faceted system, you could browse first by flavor, size, or container, resulting in 36 potential paths and three ways of getting to your large cup of chocolate. Faceted systems, like trees, enable users to navigate by continually focusing their interests, but users get to decide how their interests are structured. This makes faceted systems very useful where there are lots of items with easily specifiable properties and users whose ways of browsing are difficult to predict, such as a parts catalog.
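If you want to check the ice cream arithmetic, here's a sketch that enumerates the paths. The second flavor and the two smaller size names are my assumptions. The 36-and-three figures correspond to letting the customer choose only which facet comes first; if every ordering of the three facets is allowed, the same counting gives 72 paths and six routes to the large cup of chocolate.

```python
from itertools import permutations, product

FACETS = {
    "flavor": ["chocolate", "vanilla"],    # second flavor is an assumption
    "size": ["small", "medium", "large"],  # size names are assumptions
    "container": ["cup", "cone"],
}
TARGET = {"flavor": "chocolate", "size": "large", "container": "cup"}

def paths(orderings):
    """List every path: an ordering of facets plus one choice per facet."""
    result = []
    for order in orderings:
        for choice in product(*(FACETS[facet] for facet in order)):
            result.append(tuple(zip(order, choice)))
    return result

def routes_to_target(all_paths):
    """Count the paths that end at the large cup of chocolate."""
    return sum(1 for path in all_paths if dict(path) == TARGET)

fixed_tree = paths([("flavor", "size", "container")])
first_facet_free = paths([
    ("flavor", "size", "container"),
    ("size", "flavor", "container"),
    ("container", "flavor", "size"),
])
any_order = paths(permutations(FACETS))

print(len(fixed_tree), routes_to_target(fixed_tree))              # 12 1
print(len(first_facet_free), routes_to_target(first_facet_free))  # 36 3
print(len(any_order), routes_to_target(any_order))                # 72 6
```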
The long tail of tags
Tags have become the meme of the year, at least so far, writing another chapter in the history of classification systems. Tagging is an old idea, but it seems to be taking off now because some applications provide end-users with immediate benefits. For example, at del.icio.us, users enter bookmarks (URLs) they want to remember, adding a word or two — tags — so they can sort them later. Del.icio.us users can see not only everyone else's bookmarks, but also all the bookmarks tagged with a particular word. For example, if you care about Emily Dickinson, you can see all the Web pages del.icio.us users have tagged with "Dickinson" or "Emily Dickinson," a great tool for researchers.
Traditionally, people have been loath to attach metadata to objects, because it felt like a chore without immediate benefit. At del.icio.us and other sites such as Flickr, a photo-sharing site, there is a strong social benefit to tagging: We get to contribute to, and benefit from, the tagging done by others. To lower the hurdle and encourage tagging, both sites allow us to type in any word we want, rather than forcing us to navigate some hierarchical, controlled vocabulary. Of course, that also makes it far harder to find relevant objects: There's no immediate way to tell whether a photo tagged with "apple" shows a fruit or a computer. Plus, a search for photos tagged with "apple" will miss relevant photos tagged as "GrannySmith."
Tags are a break from previous ways of categorizing. Both trees and faceted systems specify the categories, or facets, ahead of time. They both present users with tree-like structures for navigation, letting us climb down branches to get to the leaf we're looking for. Tagging instead creates piles of leaves in the hope that someone will figure out ways of putting them to use — perhaps by hanging them on trees, but perhaps creating other useful ways of sorting, categorizing and arranging them.
Even in these early days of tagging, we're seeing self-organizing taxonomies emerge from the piles. For example, if you're tagging a page about an Apple computer, you may notice that far more people use the tag "Mac" than "Macintosh." So, if you want lots of people to find the page, you will tag it "Mac." By using that tag, you have also increased the popularity and momentum of the "Mac" tag. The resulting bottom-up clusters of tags have been called a folksonomy. (It's also been called a "tagsonomy," but that's harder to differentiate from "taxonomy" when spoken aloud.)
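Here's a toy version of a pile of tags, to show both the benefit and the sloppiness. This is only a sketch: the users, URLs and function names are hypothetical, and it is not how del.icio.us or Flickr actually store anything.

```python
from collections import Counter, defaultdict

# Hypothetical bookmarks: (user, url, free-form tags).
BOOKMARKS = [
    ("ann", "http://example.com/powerbook", ["Mac", "apple"]),
    ("bob", "http://example.com/powerbook", ["Mac"]),
    ("cat", "http://example.com/powerbook", ["Macintosh"]),
    ("dan", "http://example.com/pie", ["apple", "GrannySmith"]),
    ("eve", "http://example.com/orchard", ["GrannySmith"]),
]

def build_index(bookmarks):
    """Map each tag to the set of URLs anyone has filed under it."""
    index = defaultdict(set)
    for _user, url, tags in bookmarks:
        for tag in tags:
            index[tag].add(url)
    return index

def tag_counts(bookmarks):
    """Count how often each tag has been used, across all users."""
    return Counter(tag for _user, _url, tags in bookmarks for tag in tags)

index = build_index(BOOKMARKS)
counts = tag_counts(BOOKMARKS)

# "Mac" is already ahead of "Macintosh", so a newcomer will probably pick "Mac".
print(counts["Mac"], counts["Macintosh"])       # 2 1
# The "apple" pile mixes a computer page with a pie page...
print(sorted(index["apple"]))
# ...and misses the orchard page tagged only "GrannySmith".
print("http://example.com/orchard" in index["apple"])  # False
```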
Folksonomies stand in sharp contrast to both trees and faceted systems. First, folksonomies tend to be clusters of tags, not hierarchies: There's a pile of "apple" tags and another pile of "GrannySmith" tags, but the folksonomy may not recognize that the latter is a subset of the former. Hierarchies can sometimes be derived from folksonomies, but they don't have to be. Second, trees and faceted systems are designed ahead of time, usually by information professionals. Folksonomies grow organically. Third, trees and faceted systems are usually owned and controlled by the people who own the information being organized, whereas folksonomies are (so far) unowned and not centrally controlled. Fourth, trees and faceted systems drive out ambiguity. For example, take a page that in a tagging system carries the ambiguous tag "apple." In a tree or faceted system, the branch it hangs from would tell you whether the page is about computers or fruit — inheritance at work. Tagging systems are inherently ambiguous. Trees are neat; piles of leaves are messy.
Because of these differences, the three approaches are useful in different circumstances:
Because they are unambiguous, trees work well where information can be sharply delineated and is centrally controlled. Users are accustomed to browsing trees, so little or no end-user training is required. But trees are expensive to build and maintain and require the user to understand the subject area well: How do you find the recipe for bread soup if you don't know to look in the "Tuscan Cooking" category?
Faceted systems work splendidly where an application is being used by such a wide range of users that no one tree is going to match everyone's way of thinking. They are also easier to maintain than trees because adding a new item requires only filling in the information about facets, rather than having to make a decision about exactly which category it should go into.
Tagging systems are possible only if people are motivated to do more of the work themselves, for individual and/or social reasons. They are necessarily sloppy systems, so if it's crucial to find each and every object that has to do with, say, apples, tagging won't work. But for an inexpensive, easy way of using the wisdom of the crowd to make resources visible and sortable, there's nothing like tags.
The craft of creating and maintaining trees and faceted systems is well advanced and well understood. Businesses have been built around them. But we don't yet know the outcome of the current infatuation with tags. The potential is real: If tag-mania continues, it will provide a layer of new metadata, generated by humans for other humans, that will invoke innovation and businesses — and problems — we necessarily cannot anticipate.
My life as a Berkperson
I struggled to make this a hard-hitting piece that rips the lid off Harvard Law's Berkman Center for the Internet and Society. But the fact is that I'm really happy here, and my fellowship was renewed before I published this. It's a stimulating and kind environment. So, you'll have to make up your own snarky comments.
Before I applied for a Berkman fellowship, I had to ask John Palfrey and Ethan Zuckerman, neither of whom I knew, a whole bunch of damn fool questions. I had no living sense of what it meant to be a Berkman fellow. Do you drink sherry at 4? Just how witty is the banter? Would I get a discount on ascots?
I've been a fellow since July. Here's what it's like.
The Berkman Center for the Internet and Society is a Center within Harvard Law. The professors affiliated with it are all with the Law School, and so are many of the students who take part in the various activities, but I find the overall interests have more to do with policy than law; I spend little of my time listening to lawyers discussing cases in an argot I don't understand.
When you apply for a fellowship, you have to state what project you want supported, and that determines what your activities will be. The site lists five project areas, each prefixed with the word "open": Law, governance, education, commerce and content. Some of the actual projects underway are:
Documenting Internet "filtering" (=censorship) by various governments
Trying to increase international awareness in the blogosphere by facilitating "bridge bloggers"
Encouraging and facilitating the growth of blogs in rights-challenged countries
Aggregating information about all the groups aiming at establishing international governance of the Internet
Building software to encourage classroom cross-discipline and cross-border conversation
The Digital Media Project, looking at the legal, social and economic effects of five possible "scenarios" describing the development of digital media tech and law
The Center combines research and advocacy, which is always a tough balance. While the Center doesn't enunciate official stands on issues, it comes down consistently in favor of keeping the Internet an open space for ideas and innovation.
What it's like
The Berkman Center has its own house, a three-story Victorian on Mass Ave a few blocks (but on a cold day, a very long few blocks) from Harvard Square. It's a funky place, furnished with a dog pound of furniture, just the way your college apartment was. There's not a lot of space, so only a few people have offices there. The rest of us come in as appropriate and hang around the small-ish downstairs meeting room or perhaps grab a spare computer in a hallway or cranny. (You've gotta like a house with crannies.) I have a home office, so I don't come into the Center to write. I come to hang out with people.
Last year, the Center started a new semi-policy: Tuesdays are fellows days. That's the day to show up. In the morning, fellows hang out in the downstairs meeting room around a table. There are bagels, fruit and coffee, and no topic. It's usually only a handful of us. I think I most see Rebecca MacKinnon, Ethan Zuckerman, Zephyr Teachout, Mary Rundle, Derek Bambauer, Henrik Schneider and Wendy Koslow there. There's never a problem getting a conversation going. Jezoos Carruthers, I learn a lot.
Most Tuesdays there's a lunchtime speaker. It's in the same small room, often with an overflow crowd of twenty or so. The speakers range pretty much all over the lot, from a Microsoft lawyer talking about copyright to a report on connectivity in Uzbekistan. Typically the speaker doesn't get through her presentation entirely. The Center provides sandwiches.
Tuesdays are the most structured, but any day of the week you will find interesting people from whom you will learn gobs. Plus, there are speakers, meetings and get-togethers at random times.
What you have to do
Each fellow is expected to present her research at a Tuesday lunch or equivalent and to write something for the Center's journal. The rest of your duties are determined by the project the Center is supporting.
My case is a bit unusual because my project — working on a book about the effect the digital organization of stuff is having on the nature of knowledge (I really have to find a more interesting way of describing it) — is a bit off-topic for the Center. So, I'm supposed to work on the book and also lead a series of Wednesday night discussions.
Fellowships are usually for one year.
What you get
A stipend that ranges from $0 to $42,000. (I'm way at the low end of the scale, and certainly need to keep my day job.)
A Harvard ID that lets you use just about any of its resources
A Harvard business card that impresses the hell out of people
The opportunity to participate in the life of the Center
No parking privileges
I've been in a variety of academic environments, and the Berkman is the most collegial of them. Much of that is due to the personalities of the law professors in charge. The Center's first instinct, in my limited experience, is to support you in your project or line of thought. There is an air of sweetness about the Center, which I did not expect. I mean, these are Harvard law professors. Didn't they see The Paper Chase, fer pete's sake?
The Center is multi-partisan in theory. In practice, the Center's heart is clearly pro-grassroots. It's unlikely to file a friend-of-the-court brief supporting the RIAA. (If you're from the RIAA and give a lunchtime talk, you'll be treated with respect, but you'll also be asked tough questions by Harvard lawyers.)
I personally love the mix of scholarship and activism. These are folks passionate about the Internet both intellectually and practically. And it's a "learning community": I have yet to be laughed at (to my face, anyway) for asking dumb questions. The ethos is one of generosity: People will spend forever helping me to understand things.
I see more women there than men.
The gender balance feels about right in practice among the fellows (yes, I'm aware of the irony of using the word "fellows" in this sentence), although it's way off at the professorial level. And the atmosphere is definitely not one of macho competition and oneupmanship. There's a fair bit of international presence, and most discussions occur within a global perspective. The racial balance sucks.
It is an academic environment, which often informs the discourse. If that's not your cup of tea, then the Berkman Center is probably not for you. It is, however, also an activist center. I like the balance. You may not.
The range of political and policy opinions among the fellows is fairly narrow. More diversity would help.
I'm having trouble coming up with other negatives. (Oy, that sentence sits there like bait!)
If you can't tell, I'm enjoying my time as a Berkperson. I'm meeting people I care about and, unsurprisingly, you can't hardly walk through the doors without falling into a conversation that changes the way you think. What more could I ask for? Besides a parking space.
Larry Summers and the Web as world
I've posted a couple of blog entries about the Larry Summers affair — he's the president of Harvard and, as you may have heard, he shot off his mouth about the genetic inferiority of women when it comes to math and the sciences. I posted this before Summers released the transcript of his remarks, and these longer remarks afterwards. Today, The Boston Globe ran a column by Robert Kuttner that I like a whole lot better than what I wrote.
But what's most interesting to me is the fact that as a blogger and a member of the Harvard community (fellows are not faculty members) I felt that I should say something about it. The blogosphere is becoming a moral space, not in the sense that it's all goodness, but in the sense that the failure to post is itself a statement.
This I take to be evidence that the Internet is becoming a world, not just a medium. A world is coherent enough that its absences themselves have force.
I noticed this also when I examined how I came to believe that during the first debate Bush was wearing an Unidentified Rectangular Object that probably (certainly not definitely) was a receiver of some sort: I got there in part by thinking that if there were a better explanation, it probably would have surfaced on the Net.
Note that I am not recommending this way of thinking, and I'm certainly not recommending my conclusion. I'm merely observing that it shows that the silence of the Net is itself becoming observable, a sign that an environment is becoming a world.
OnFolio's new version supports Firefox. Yay! I had bought rev 1 a couple of months before I switched from Microsoft IE, and I've missed it.
OnFolio does something really simple: When you come across a Web page you want to save, it makes a copy and puts it into a folder of your choosing. Of course you can do that yourself, but you end up with component parts all over. OnFolio gives you its own foldering system, lets you add keywords and descriptions, and makes the whole thing hassle-free. Of course, it does more than that, but that's the functionality that got me to buy it. It's a great way to organize your research.
But here's my worry about OnFolio's fate. If other people are using OnFolio for the same basic service as I am, how long will it be before someone writes a free add-in to Firefox that saves Web pages into MHT format? I'm not convinced that the extra features in OnFolio are going to be attractive to enough people...
(Neattricks has an interesting set of reviews of what seems to be v1, with responses from OnFolio.)
What I'm playing
I had been playing Doom 3. Great graphics and leap-out-of-the-seat scary, but ultimately, I'm sorry to say, I got bored. Plus, I couldn't kill the boss monster at the end, even after going into god mode and giving myself all the weapons. (Yes, I know what the trick is to doing damage to it.) So, I'm not 100% sure I actually finished it.
Now I've moved on to Half Life 2, the greatest game ever created. It's got a semi-involving narrative, fantastic graphics, and perfectly balanced game play.
JOHO is a free, independent newsletter written and produced by David Weinberger. If you write him with corrections or criticisms, it will probably turn out to have been your fault.
To unsubscribe, send an email to firstname.lastname@example.org with "unsubscribe" in the subject line. If you have more than one email address, you must send the unsubscribe request from the email address you want unsubscribed. In case of difficulty, let me know: email@example.com
There's more information about subscribing, changing your address, etc., at www.hyperorg.com/forms/adminhome.html. In case of confusion, you can always send mail to me at firstname.lastname@example.org. There is no need for harshness or recriminations. Sometimes things just don't work out between people.
Dr. Weinberger is represented by a fiercely aggressive legal team who responds to any provocation with massive litigatory procedures. This notice constitutes fair warning.
Any email sent to JOHO may be published in JOHO and snarkily commented on unless the email explicitly states that it's not for publication.
The Journal of the Hyperlinked Organization is a publication of Evident Marketing, Inc. "The Hyperlinked Organization" is trademarked by Open Text Corp. For information about trademarks owned by Evident Marketing, Inc., please see our Preemptive Trademarks™™ page at http://www.hyperorg.com/misc/trademarks.html
This work is licensed under a Creative Commons License.