Joho the Blog
An Entry from the Archives


November 13, 2005

Me in the Globe about Google Print and book metadata

This short piece in the Boston Globe Ideas section started out as an article about the Dewey Decimal system in the digital age, with Google Print as a hook. But the hook ate the fish. [Tags: EverythingIsMiscellaneous google GooglePrint metadata taxonomy]


Glenn Fleishman has a fascinating post about this very issue today. What a coincidence!

The question "What is a book?" just gets harder and harder the more you look at it. I'd interviewed a bunch of folks on this topic for the Globe piece, but it all got cut as my allotted length went from 1200 words to 750 due to reshuffling of ads or some such thing. (A good place to start: FRBR - Functional Requirements for Bibliographic Records.)

Posted by D. Weinberger at November 13, 2005 12:24 PM


Comments

Really great article! As someone who has designed cataloging systems for digital libraries, I thought you did a nice job getting into the "dimensional paradox" of digital books (i.e., when a book can itself be thought of as a collection of data about the separate pages / chapters / etc that are traditionally only inside a book).

However, I do think that the "metadata" term over-complicates the article, which otherwise does a great job providing a simple explanation of a complicated topic.

Even if the metadata term has more to it than just being tech jargon, it's still a pretty convoluted way to think and talk about data (in general).

For example, you write:

"Second, we're going to need massive collections of metadata about each book."

Do you mean: data about data about each book? Or, just: data about each book?

(Sorry for being nitpicky--I have to talk with people about "metadata" all day...)

Personally, I think the metadata concept (as it's often used now) is wrongly tied to the physical, via the object concept (i.e., some data is contained as an object, and some data outside of that containment can be metadata on the object).

A digital book need not be an object (like a physical book), and this makes it complicated to enforce its having an is-ness that is separate from some other-ness.

Posted by: Jay Fienberg | November 13, 2005 11:02 PM


Jay, I share some of your hesitancy about the term "metadata." I used it consciously. I think it's defensible (but not necessarily the right choice) because it's easy to think of books as informational objects. Thus info about them is metadata. (Yes, anything can be considered an informational object (if you're sufficiently abstract about it), but books are more clearly information than lollipops. On the other hand, I also refer to the nutritional labels on lollipops as metadata. So never mind.) In this context, the relevance of "metadata," though, is that it's what we use to _find_ the data, and to understand it in context. ("Non-fiction novel" is important metadata about "In Cold Blood.") That seems to me to be an appropriate use of the term "metadata."

I do agree with your bigger point, though. In fact, the whole strength of Google Print is that it uses a book's contents as metadata: Find the author by searching for the quote. That was a main point in the original version of my article. I think it got removed around Wednesday. (Damn space limitations! Damn physical newsprint!)

Not all metadata is physically separate from the data, but the physical separation when it occurs is crucial. E.g., it shapes what constitutes knowledge in our culture. My book, "Everything is Miscellaneous," argues (well, "argues" is too strong) that the digitizing of info overcomes the limitations of physical metadata and thus affects the nature, shape and authority of knowledge. Something like that.

Posted by: David Weinberger | November 14, 2005 07:43 AM


I find what you're saying very interesting, and your usage of the "metadata" term now makes more sense to me than before.

Still, it seems to me this usage of "metadata" requires forcing a physical concept into the digital realm as if it is still required, in order to look at how it is no longer required.

Historically, a lot of technologists and technologies have succeeded in forcing that physical concept into the digital realm as if it were required. Maybe this has even been an important part of our embrace of the computer, e.g., its virtualized versions of "pages" and "files" and "documents" have been reassuring to all of us office workers.

So, I think it's great that you're getting into this and looking at how things like Google Print might change our perception and conceptual understanding of things.

In your comment, you say:

"'Non-fiction novel' is important metadata about 'In Cold Blood.'"

I'm not sure I've ever heard that kind of relationship referred to as one of metadata and data. Why isn't it just: "Non-fiction novel" is important data about "In Cold Blood"?

I wonder if, with the web, we've moved in both directions:

* we're getting more interconnected and breaking beyond physical models

* we're trying harder to force physical models on the virtual as a way of preserving walls of separation

Posted by: Jay Fienberg | November 14, 2005 02:09 PM

