The Berkman Center has a lunchtime speaker every Tuesday, and this week it’s my turn. I’m talking about — guess what? — taxonomies and tags. It’s an informal venue, and with luck I’ll be interrupted after ten minutes, but I need to have a full talk prepared, just in case. I’ve been having trouble structuring it. Here are the notes I have so far. Comments? Criticisms? Rude suggestions?

Why Tags Matter

I want to talk about three ways tags matter.

If necessary: Brief explanation of tags. Show and Flickr. [Yes, I’m confident Berkpeople know what tags are, but these talks draw a broader audience.]

First, tags may not matter:

We’re in an early adopter phase. Historically, people have resisted adding metadata to objects.

Why is there such enthusiasm now?
A. We get individual value from tagging.
B. No one is telling us to do it or how to do it.

First reason: Aristotle

For Aristotle, to be is to be a type of thing. Types = categories. He gave us genus-species definitions: X is a type of P and is different from other members of P. I.e., X is what it is because of the category it’s in.

Aristotle’s implications/assumptions:

Knowledge and world are one

Categories are defined by principles (e.g., “rational animal”): These principles are rational, can be known by experts who have authority, exist independent of our awareness, and are precise. (Every member of a category is an equally good example of that category.)

Aristotle’s principles of organization come from how we organize physical things in the real world: Lumping and splitting. So, ideas are assumed to be subject to the same limitations as physical things: X can only be “shelved” in one spot at a time. (Law of Identity — ((A=A) and ~(A = ~A)) — becomes true for ideas as well as for physical objects.)

Challenges to Aristotle:

Postmodernism (brief!): Disputes that categories are independent of us and are rational. Points to relation of knowledge, authority and power.

Eleanor Rosch: Not all members of a category are equally good examples. Her theory of classification by prototype. Prototype classification says our conceptual organization is far fuzzier and messier than Aristotle thought.

Tagging: Categories are driven by convenience not principle, are relative and relevant to the individual, and are non-authoritative

Lack of special status for author’s own tags indicates just how non-authoritative tagging is

Why does disputing Aristotle matter? Aristotelianism affects us when we think of the world as something that starts with definitions, that consists of topics that persist through history, that enable domain-specific authority.

Second reason: Nature of topics

Frank Miksa, professor at the University of Texas, Austin: We all tend to believe that “there exists a realm of knowledge that grows through individual contributions and is transmitted from generation to generation such that its existence is thought to be continuous and is capable of being examined.”

Example of the breakdown of that idea: Wikipedia

Topics are whatever someone is interested in, so long as it can be verified

450,000 entries in English so far (60,000 in Encyc. Britannica)

Categories (like tags) are assigned by readers. Hierarchy also. E.g., Tori Amos is a top-level category because someone assigned her sub-categories. This isn’t a statement about what’s important but about how to make it easy to find the new Tori Amos CD.

Topics are becoming more like interests than self-standing, transgenerational slots. Also, finer-grained.

Third reason: Re-meaning

We have been born into taxonomies. Now we’re making our own. It’s messy, but, well, so are we.

The fact that the basic principles of taxonomies — lumping and splitting — have reflected physical limitations means that our alienation from categories is an alienation from the physical world??

Most exciting thing: We don’t know where this is going. A new infrastructure of human meaning. What will emerge?

27 Responses to "Why tagging matters — Notes"

  1. Hierarchies aren’t going to go away. Whether we start with a concept and refine top down or work the other way, we are organizing symbols into some kind of hierarchy. It seems to me that chunking is a part of language and of thinking. Tags, in this sense, are a form of symbolic chunking.

    Taxonomies are a kind of organized chunking that seems to treat Aristotle’s concepts as dogma. But taxonomies are problematic. One problem with taxonomies is that they are often just wrong or misleading. Another is that they are subject to change. I am a recreational mycologist and, like many others, I have reverted to calling mushrooms by their common names rather than their botanical ones because the taxonomic classifications changed. Thanks to DNA analyses, a lot of mushrooms have been reclassified. Matsutake, which used to be “Armillaria ponderosa”, are now officially “Tricholoma magnivelare”. Likewise, Shaggy Parasol, which used to be “Lepiota rachodes”, is now “Macrolepiota rachodes”. And so on.

    I don’t know about other branches of biology but it seems to me that using DNA rather than structural features as a basis for classification has to be changing things there too. And that, almost by definition, challenges any strictly hierarchical organizational scheme. Maybe tags are an answer to resolving many to many relationships.


  2. I agree that hierarchies aren’t going away, although some are.

    Your use of common names is fascinating. You might check out uBio for the life sciences. (I wrote about it here:

    Thanks for the comments.

  3. “Historically, people have resisted adding metadata to objects. Why is there such enthusiasm now?”

    (Putting aside my contention with the term “metadata” altogether):

    One way metadata becomes attractive to folk is when it is made usable as data.

    Folks like usable data–and folk can be very enthusiastic about creating data that is usable to themselves or to others who might reciprocate.

  4. A couple remarks:

    * The feedback loop inherent in, Flickr, and the like seems to be a vital component of the process. Over time I have tuned my tags to more closely align with those of others who seem to be in the same “concept neighborhood”. That makes it more useful both to me and others, as far as I can tell. I’d be surprised if I’m alone in that behavior.

    * Tags enable a network of concepts in a way that hierarchies don’t. Hierarchies are an instance of a concept network, but they are a special case. Hierarchy implies a frame-of-reference on a corpus of knowledge. But there’s no reason to believe that any given perspective is more “right” than another. Tags relax that restriction and provide the potential for multiple simultaneous frames-of-reference to exist, thereby providing a much richer way of navigating knowledge.

    That said, the tag-based systems currently held up as examples aren’t quite there yet. Tagging tags seems like a logical next step which can then start to enable synonym mapping, multiple inheritance (borrowing the programming term) hierarchies, etc.
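One way to picture "tagging tags" is as a small graph in which a tag can itself carry tags. The sketch below is only an illustration under assumptions: the tag names and the `TAG_PARENTS` mapping are hypothetical, not taken from any real tagging service. It shows how a tag with several parents gives the "multiple inheritance" the comment mentions, and how a synonym can be treated as just another tag on a tag.

```python
# Hypothetical tags-on-tags data: each key is a tag, each value is the set
# of tags that have been applied to that tag. All names are invented.
TAG_PARENTS = {
    "nyc": {"new-york"},               # a synonym expressed as a tag on a tag
    "new-york": {"cities", "usa"},     # two parents: multiple inheritance
    "cities": {"places"},
    "usa": {"places"},
}


def expand(tag, parents):
    """Return the tag plus every tag reachable by following tags-on-tags."""
    seen = set()
    stack = [tag]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; this also guards against cycles
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


if __name__ == "__main__":
    print(sorted(expand("nyc", TAG_PARENTS)))
```

Under these assumptions, something tagged only "nyc" could also be found by a search for "cities" or "places", without anyone having agreed on a single hierarchy in advance.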

  6. In the edventure article you reference above, you lead in with the question: “Does anyone still believe that the line between facts and values is clear, distinct and easy to maintain? The Web is only making the line more blurry…as well as making it more urgent that information technology take that into account.”

    I guess I’m simple minded, but I do think so. A rock is a rock, with a hardness, a crystalline structure, a cleavage, and so forth. Science in general is fact based. Philosophy perhaps is value based, except where it denies values of course. The distinction is clear to me, but then I’m narrowly informed.

    And I disagree about these media blurring the lines between facts and values. We may have plenty of people who have special interests in denying facts or warping facts to their own interests, but life was ever thus. Post Darwinists and geologists understand each other very well, and while Linnaean taxonomy has given way to something richer and deeper, this is due to a richer and deeper set of facts. A new paradigm emerges as we are ready to grasp it.

    The Getty can point you to 35 feet of material relating to Bern Porter, and none of it is his found work. The web can point me to the pointer to the 35 feet, but I have to dig deeper to find the original material. Tags will not help in this research. Tags will clutter up the joint. Don’t get me wrong, I love tags. Hell, I love the net, I love podcasting… that’s all some kind of fishing gear, right?

    Tags are new and they are not replacements for orderly catalogs or scientific classification schemes. Rather, their significance remains to be expressed. I hope to read your current article one day soon when i can borrow a copy!

  8. Actually Frank, the crystal form of a mineral has a hardness, a cleavage, and so on. A rock is an aggregate of one or more minerals. Since we’re focusing on meaning and classification…

    David, Wikipedia doesn’t really operate that loosely. For instance, it’s not unusual for one topic to be subsumed into another, and a redirect put in its place. This organization can happen from just some average Joe, but usually happens based on actions from the Wikipedia ‘sys-ops’, or super users. In addition, one can’t just add a topic–there has to be enough substantive information about the topic to make it a legitimate encyclopedic entry. There are topics deleted every day that are vanity topics, or topics on a subject that’s not appropriate to the venue.

    Now flickr and delicious are purely open tag-based systems, but we don’t see anywhere near the level of organization and multi-user use with these, because most people use the tags for their own classification and memory retrieval. A few people might choose tags deliberately in order to start enforcing a rigor on the tags, but most people could care less.

    The two types of entities are completely different. The concept of topic pages in Wikipedia is not equivalent to the use of ‘tags’ in flickr or delicious. For instance, you wouldn’t see a deletion of a tag in flickr or delicious (maybe in delicious, this is a little more restricted), but you would in wikipedia, frequently, and this tells you much about the differences between the two environments.

    I don’t understand why people keep mixing the two.

  9. Thanks for your elaboration, Shelley.

    In the actual presentation, I tried to make clear that Wikipedia isn’t a perfect example of a tagging system, but it’s fascinating as an example of what’s happening with topics.

  10. Hooray! The bird is back. I think. I mean, I’m sure about the hooray part. Is the spring break over, I wonder?

    Sorry I got into all that hardness and cleavage nonsense… call it a Beavis moment.

  11. Hierarchies seem to be useful when there is inheritance, hardcoded common denominators. Say for species.

    Hierarchies are counterproductive when no inheritance exists and even traits are dynamic.
    Take our most commonly used hierarchy, the organisational hierarchy: Skills, expertise, location, propensity to get out of bed early or other ‘useful’ properties applied to an employee are all dynamic and far too many to be justly represented by the mere two dimensional model of a hierarchy. Here tags would do nicely.
    But – tags are kind of moot unless they replace the hierarchy; the hierarchy will always limit.
    Say I need somebody who knows Cobol, speaks French, likes to travel – sure there’s one, but oops, he’s not in my department.
    So why not bye-bye organisational hierarchies – hello tags :-)
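The Cobol/French/travel example can be sketched as a set lookup. This is a minimal illustration, assuming each employee record carries a free-form set of tags alongside a department; the names, departments, and tags below are invented for the example.

```python
# Hypothetical staff records: every name, department, and tag is invented.
EMPLOYEES = {
    "alice": {"department": "finance", "tags": {"cobol", "french", "travel"}},
    "bob": {"department": "engineering", "tags": {"cobol", "early-riser"}},
    "carol": {"department": "engineering", "tags": {"french", "travel"}},
}


def find_by_tags(employees, wanted):
    """Return names of employees whose tags include every wanted tag."""
    wanted = set(wanted)
    return sorted(
        name for name, record in employees.items() if wanted <= record["tags"]
    )


def find_in_department(employees, department, wanted):
    """Same query, but restricted to one branch of the org chart."""
    wanted = set(wanted)
    return sorted(
        name
        for name, record in employees.items()
        if record["department"] == department and wanted <= record["tags"]
    )


if __name__ == "__main__":
    query = ["cobol", "french", "travel"]
    print(find_by_tags(EMPLOYEES, query))                       # whole company
    print(find_in_department(EMPLOYEES, "engineering", query))  # one department
```

In this toy data the company-wide tag query finds a match, while the same query limited to the engineering branch comes back empty, which is the "oops, he’s not in my department" problem in miniature.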

  12. Sig, well put. And between hierarchical taxonomies and tags are faceted classification systems, useful where the type and range of parameters can be known, e.g., a parts catalog: I need a bolt made of brass that’s .25″ and reverse-threaded, although you may be looking for brass pieces that are reverse threaded, .25″ and are bolts.

  13. Cobol/French/Likes to travel… isn’t there a place for a relational database in here? Is that another name for “faceted classification system?”

  14. Frank, a relational database can certainly undergird a faceted system. Not all parameters will apply to all objects in the system (nails are neither left- nor right-threaded) and the faceted system will quickly do the calculations required to guide the user only down paths of relevant parameters that lead to positive results. E.g., if there are no wines rated greater than 98 that cost less than $25, the user will not be presented with “under $25” in the dropdown list if she first asks to see wines rated greater than 98. Likewise, if she asks to see wines under $25, the wine quality list will only show relevant wines, and the list of available years will adjust also.

    E.g., Endeca has a client that provides parts for the petrol industry. It has 25 million items to deal with. So, yeah, there’s a database underneath it — or, likely several databases — but the faceted system sucks the data from those databases in and expresses them in a hugely flexible user interface that lets engineers drill down as they want and never drill down to null result sets.
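The "never drill down to null result sets" behavior can be sketched in a few lines. This is only an illustration of the general idea, not a description of how Endeca or any real product works; the wine records, facet names, and value buckets are all hypothetical.

```python
# Hypothetical wine catalog; every record and facet value here is invented.
WINES = [
    {"name": "A", "rating": "over 98", "price": "over $25", "year": 2001},
    {"name": "B", "rating": "90 to 98", "price": "under $25", "year": 2003},
    {"name": "C", "rating": "90 to 98", "price": "over $25", "year": 2001},
    {"name": "D", "rating": "under 90", "price": "under $25", "year": 2004},
]


def matching(items, selections):
    """Return the items that agree with every facet value chosen so far."""
    return [
        item
        for item in items
        if all(item[facet] == value for facet, value in selections.items())
    ]


def available_options(items, selections):
    """For each facet not yet chosen, list only values that still have results."""
    options = {}
    for item in matching(items, selections):
        for facet, value in item.items():
            if facet == "name" or facet in selections:
                continue  # skip the label and facets the user already fixed
            options.setdefault(facet, set()).add(value)
    return options


if __name__ == "__main__":
    print(available_options(WINES, {"rating": "over 98"}))
    print(available_options(WINES, {"price": "under $25"}))
```

Because the option lists are computed from the items that still match, a choice that would lead to an empty result is never offered. In this toy catalog, picking the top rating bucket first removes "under $25" from the price choices, and picking "under $25" first narrows both the rating and year lists.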

    Why Tag?

    Tagging is as natural as Adam naming things in Genesis. We like to describe things, and naming things is our shortcut to description. It gives us feelings of control over our world, and an investment in our future. When you look to tagging motivation, that’s first.

    A second order motivation: social rewards. These range from easier discourse (participation) and peer recognition of leadership (ego) to trend-following (affiliation).

    The third order motivation: self discovery. Have you run into things that are hard to tag? Even with two terms? Notice which tags are earning your attention now, vs. last month? vs. last year? What your colleagues are tagging?

    These Tags and Those Tags.

    Like tagging, Ordering is also an investment in the future. But it happens in time: before tagging and after.

    When it happens before tagging, you have imperfect information (you haven’t tagged everything yet), so your structures (groupings, taxonomies, maps, whathaveyou) must undergo recursive change. See the DNA example above.

    When you order before use, you always infuse (impose?) that structure with the points of view of the orderer. The order of a botanist is not the same as that of a shaman (mystical properties dominate), or a hunter (what various prey eat), or a florist (what shapes and colors and fragrances form an aesthetic).

    So it is radically important to separate the activities of naming and describing things from their ordering.


    But the net lets us do something new. Data mining. Order found or produced Just In Time, as we need it.

    JIT Ordering is important at three moments:

    Last, when searching. Google produces 8 gazillion results for a search, so providing some context and shape to those results may be useful.

    Before, when surfing a knowledge space. Like wandering through a MUD or a first person shooter, we often walk down paths of organization. So when I’m at the brass objects, I have a bunch of choices, trails blazed by others, including surprises.

    And First, when tagging. Suggest tags. Scour what I’m about to tag for clues. Get me in the neighborhood. If I’m the 50,000th person to tag the IBM logo, maybe I don’t need to invest more than 1 second (or any time) tagging it differently than the rest.

    So, order discovered, order imposed, and order made.

    There’s a downside to JIT Order. Organization helps us remember. It also directs investigation, revealing gaps in knowledge we can try to fill. Half of schooling is imparting structures of knowledge to assist in memory of human and natural things that are messy, like history or art.

    See you.

  18. Just a quick persnickety moment, if I may, inspired by this tiny section:
    “Lack of special status for author’s own tags indicates just how non-authoritative tagging is”

    – I think that an awful lot of the recent talk around tagging, ethnoclassification, folksonomy, whatever, has been confused by people’s desire to imagine, flickr, furl, metafilter, etc., as all doing the same thing, whereas in fact none of them are doing the same thing as each other at all and have only their use of tags in common.

    A case in point: in, it is difficult for an author to decide what tags should be used to describe their content (they get one vote, and it’s rarely casting), while in flickr, users largely generate their own content and apply tags to it (the opposite approach, although friends/contacts can have a say in what tags are appropriate, as they’re able to add tags of their own). If you look at a blog entry via*, the author is the only person who categorises their content, and the only way a reader can tag it is by bookmarking it in

    I think I’m making two points. One is that when you talk about “the individual” it makes a difference whether that individual is the author or consumer of a certain resource, insofar as it may be the case that sometimes authors really do know best when they provide a hierarchical structure, and sometimes you want the wisdom of the crowd to help you examine an otherwise confusing dataset. And two: just because a system uses tags, it doesn’t necessarily follow that the social benefits that follow will be what you expect, or even exist at all (I’m very glad my Gmail tags aren’t social, for example, although I prefer using them to folders).

    Anyway. Bit tangential, really, and the rest of the talk sounds like something I’ll be sorry to miss: hope it’s a fun time for everyone there.



