October 15, 2004
The future of facts (and the rise of fact servers): Are facts going to become as cheap and uninteresting as styrofoam peanuts?
The end of data: In the new world of classification and categorization, data and metadata are indistinguishable.
Walking the walk: O'Reilly's foo camp is brilliant marketing in which the product is never mentioned
Cool tool: Open source Audacity sounds good
What I'm playing: Far Cry
Email: How much of an anti-Semitic misogynist was Melvil Dewey?
Bogus contest: Name the metadata bundles discussed in "The end of data" article
Wednesdays at the Berkman Center
I've started a series of discussions at the Harvard Berkman Center (Baker House), approximately every other Wednesday evening at 6pm. The first one was on facts and the locus of authority on the Web. The next one will be on November 3rd and will undoubtedly be on something about the Web's effect on democracy, with a side conversation on how to apply for a Canadian work visa.
The series is free and open to the public. Plus, we serve pizza. See you there?
I set up an IRC chat during the last presidential debate, and about 50 people jumped on board, out-snarking one another about the candidates. (Note: This was not a fair and balanced crowd.) Kevin Marks, one of the participants, then surprised us by posting a QuickTime movie that plays the audio of the debate and shows the chat, synchronized with the audio. Lots of bad language and comments we regret. Ulp.
I plan on setting up another chat on election night. You're invited. Check my blog for details.
The future of facts (and the rise of fact servers)
The Wikipedia had to freeze the George W. Bush entry a few weeks ago because people were altering it to suit their political viewpoints at an alarming rate. So, the editors pared the page down to the non-controversial "core" of facts. There was still a lot of information there — much more than merely "He was born, he drank, he became president" — and occasional acknowledgements of controversies, such as whether Bush satisfactorily completed his National Guard service.
But, most interesting to me, towards the top, on the right, the Wikipedia ran one of the staples of its biographical entries: A fact box.
I find this two-tiered view of facts, quite common in reference works, fascinating. And in the context of a bottom-up work such as the Wikipedia, in the midst of a dust-up over what constitutes a factual account of the life of W, you have to ask: What's happening to facts?
I don't like facts and I never have. Psychologically, metaphysically and sociologically, I'm uncomfortable in their stern, disapproving, Cheney-like presence.
Psychologically, I freeze when I have to recite one. They are, for me, simply opportunities to be wrong in public. My hesitation is noticeable, leading people to think I must be struggling to make up the fact, which actually is frequently the case. That's why JOHO has been 100% fact free since its inception. That's my pledge to you.
I also have a metaphysical problem with facts. Of course I understand that there's a real world that existed before I was born and into which I will be buried (or smudged, depending on the cause of my demise). But facts aren't the same thing as reality. They are one way reality — the way the world is apart from our awareness of it — shows itself to us. Without us, the universe would carry on fine, but facts wouldn't emerge from the darkness. Because experience is cultural, facts are cultural artifacts: They're expressed in language, they have a grammar, they are deeply contextual. Facts don't like us saying that, but it's true: "The Titanic sank in 1912" is only a fact because of a context that implicitly includes an understanding of how names stand for things, a decision to mark time by trips around the sun, a convention that numbers years from the birth of a guy I don't care much about, and a historical-cultural context that says that the sinking of a large ship is worth making an explicit proposition about.
Now, you probably snort at that line of thought because you think I'm running from the pure, brutal "Look, it happened!" that facts express. But I'm not. It was sad when the great ship went down (down to the bottom of the...), and it happened on a date we agree on. But facts are not context-free meteors that slam into our planet unbidden. They are instead a way of conjuring up the world in one of its infinite facets. They are a way of speaking, a form of rhetoric, and thus should not be treated as if they are the end-all of thought and discussion. But, sociologically, that's often how they're used: They are the knuckle sandwich of rhetoric. Facts are, of course, peculiarly important, but they are not the only peculiar and important things we say to one another. And they are not quite as reality-based, muscular and manly as they pretend. Inside every fact is a value struggling to get out.
I Love Facts
To forestall rants about how I don't believe in facts and think that, for example, the date the Titanic went down is subject to debate, let me state for the record: The Titanic sank on April 15, 1912. We should reject any explanation of facts that lets someone claim that the date of its sinking is up for grabs, relative or unknowable. Facts are crucial in disciplines I care a lot about, including science and journalism. Nevertheless, facts are a form of understanding and a form of rhetoric, and thus they are always infected with slimy humanity.
So, when the Web started heating up the Internet, I was among those who thought that we were going to see a merging of voice and facts, and, more particularly, voice and objectivity. (Objectivity is the mood in which we get all factual.) To a greater extent than I'd hoped, that's happening: Just read your 50 favorite blogs. Many Big-Time Journalists go to absurd lengths to hide their political sympathies — one editor boasts he doesn't even vote — but it's reversed on blogs: If we don't know who you're voting for, how can we trust what you write?
And yet...There are classes of facts I don't want wrapped in voice. If I post a question about the battery life of a laptop, I'll trust the people who write in response more than I trust the computer company's site, but I trust the company site more for the dimensions of the machine. The company is liable for its answer in a way that a random blogger isn't; if I have to buy a new carrying case because the number was wrong, the blogger can say, "Sorry, dude, I misread the measuring tape," whereas I'll expect the company to compensate me one way or another.
Similarly, I count on mainstream newspapers to provide fact-based stories that "cover" an event: I don't expect in the foreseeable future to be counting on webloggers to tell me how many troops attacked Samara, how this was coordinated with other simultaneous battles, or how many civilians were killed. Of course I expect bloggers to fact check the media's ass but good, which implies that I don't have full confidence in the media's ability to deliver the facts. (PS: there's no such thing as "the" facts because which facts are relevant is not itself a matter of fact.) But covering events seems to require the type of centralization that only a news bureau can provide. (Hint: Any sentence of mine that is of the form "only a _____ can provide" is likely to turn false particularly quickly.) Further, news organizations stand behind their stories in a way that someone talking over the virtual back fence doesn't have to. (Of course, sometimes the news media stand behind their stories Rather longer than they should.)
The role of facts in discourse may look immutable, but it is exactly the sort of thing that can change; I've been reading Foucault recently and it's startling how such deep structures can transform rapidly. (It's also startling how unbelievably brilliant Foucault was.) I don't know what will happen, but my hunch is that we are heading towards commoditizing facts, driving down their value so that they don't provide differentiating value. For example, take the table of Bush facts at the Wikipedia. With the right API, the Wikipedia could become a Fact Server that delivers the undisputed facts about any of its 1,000,000+ topics to any application that asks politely, making facts cheaper than popcorn.
Now, it would be irresponsible for a fact server to serve up dubious or putative facts, but if it only serves the commoditized facts, it won't have all that much value. So, perhaps fact servers will deliver facts along with metadata about how reliable the facts are: It's 0.99 certain that Bush was born in 1946 but it's 0.4 that he completed his National Guard duty. Will this sharpen the line between the two tiers of facts — the reliability of lower-class facts will always be subject to argument while 0.99s are beyond serious dispute — or will it tar all facts with the welcome brush of human fallibility?
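Here's a minimal sketch of what "facts plus reliability metadata" might look like in code. Everything in it is an assumption made for illustration: the Fact class, the serve_facts function and the in-memory list are hypothetical, and nothing here is an actual Wikipedia API. The two confidence numbers are the ones used above; how a real server would arrive at such numbers is exactly the open question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    claim: str
    confidence: float  # 0.0 to 1.0; how this gets assigned is left open


# Hypothetical in-memory store; a real fact server would sit on a database.
FACTS = [
    Fact("George W. Bush", "born in 1946", 0.99),
    Fact("George W. Bush", "completed his National Guard duty", 0.4),
]


def serve_facts(subject, min_confidence=0.0):
    """Return facts about subject at or above min_confidence, most certain first."""
    matches = [
        f for f in FACTS
        if f.subject == subject and f.confidence >= min_confidence
    ]
    return sorted(matches, key=lambda f: f.confidence, reverse=True)


if __name__ == "__main__":
    # An app that only wants the commoditized tier asks for 0.9 and up.
    for fact in serve_facts("George W. Bush", min_confidence=0.9):
        print(f"{fact.confidence:.2f}  {fact.claim}")
```

The threshold belongs to the requesting app, not the server. One app can stick to the 0.99s while another asks for everything and shows the doubt.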
There are bunches of other questions, many of which take on an Hegelian cast. For example, the Wikipedia fact box gives Bush's date of birth but not his race. That's because our culture does not count race as relevant (haha!), and, no, you can't always tell from the photo. The Wikipedia fact box also does not state who W's parents are, yet in some cultures knowing your parentage is as important as knowing the year you were born. But, if Wikipedia acts as a fact server, it won't have to decide which 0.99 facts to include in the fact box. It will simply serve up all facts the requesting app wants. Thus, Bush's date of birth, race and parentage will show up as equal; if your culture values parentage, your app will make a big deal of that. If some other culture considers listing the date of birth to be a type of ageism, its apps will ignore that datum. Undoubtedly, some app will find intense value in the 0.99 fact that Bush is white. So, the commoditization of facts may result in the formation of cultural fact boxes that divide us on the basis of a consensus core of 0.99s that we all agree on: Cultures united in a core of commoditized facts from which they select the fact boxes that divide us. Weird. Or is it the way the world has always implicitly worked?
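The "cultural fact box" idea can be sketched in a few lines: one shared pool of facts, with each requesting app deciding which ones to show and in what order. The field names, the fact_box function and the two apps are hypothetical, invented only to illustrate the point.

```python
# One shared pool of facts; field names here are illustrative assumptions.
FACT_POOL = {
    "George W. Bush": {
        "date_of_birth": "July 6, 1946",
        "race": "white",
        "parents": "George H. W. Bush and Barbara Bush",
    }
}


def fact_box(subject, wanted_fields):
    """Build one app's fact box: same pool, app-chosen fields, in the app's order."""
    facts = FACT_POOL[subject]
    return {field: facts[field] for field in wanted_fields if field in facts}


# Two hypothetical apps with different ideas of what matters.
GENEALOGY_APP = ["parents", "date_of_birth"]  # parentage comes first
AGE_BLIND_APP = ["race", "parents"]           # treats date of birth as ageist, so skips it

if __name__ == "__main__":
    print(fact_box("George W. Bush", GENEALOGY_APP))
    print(fact_box("George W. Bush", AGE_BLIND_APP))
```

The server never decides what goes in the box. Both apps draw on the same core, and the division happens entirely at the requesting end.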
The delivery of facts with probabilities as part of them could lead to unpredictable consequences. Building doubt into facts could transform their rhetorical and social role. Will we recognize facts as being as perpetually subject to argument as are opinions? Will their source of authority become an integral part of them, as opposed to being an outside reference? Will the recognition that they're socially conditioned degrade them so that all facts are equal, no matter how contradictory or stupid — appending a huge "Whatever!" to all factual discussions? Are we heading towards a more sophisticated, nuanced way of thinking that will put facts in their place, or towards a new age of stupidity and obstinacy? And in the new world of facts, what will be the sound of voices conversing and voices testifying?
I believe we are currently inventing a new and important life for facts. We just don't yet know what it will be.
The end of data?
Here's an idea for the book I am perpetually working on working on. (No, that's not a typo. I've been working for over a year on a proposal that would enable me to work on the book.)
There used to be a difference between data and metadata. Data was the suitcase and metadata was the name tag on it. Data was the folder and metadata was its label. Data was the contents of the book and metadata was the Dewey Decimal number on its spine. But, in the Third Age of Order (see the previous issue), everything is becoming metadata.
For example, imagine you're at a large corporation doing a Third Order treatment of its digital library of research articles. Instead of (or, in addition to) designing a large, complex, hierarchical taxonomy, you focus on adding enough metadata to each article so that people will be able to sort and classify them any which way they want. If someone wants to find all the articles that talk about hydrocarbons written in Italian in 1965 and that have more than 30 footnotes, they'll be able to. If someone wants to make a browsable hierarchy based not on topic but on gender or on the number of co-authors, they'll be able to. You build enriched objects first so your users can forever after taxonomize the way they want to, instead of the way you think they'll want to.
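A rough sketch of the "enrich first, taxonomize later" approach, assuming each article is a record with a handful of metadata fields. The Article class, the search and browse_by helpers and the three sample records are all made up for illustration. The sketch counts authors rather than co-authors, which is a simplification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    topics: set
    language: str
    year: int
    footnotes: int
    authors: list


# Made-up sample records, not real articles.
LIBRARY = [
    Article("Sample article A", {"hydrocarbons"}, "Italian", 1965, 42, ["Author One"]),
    Article("Sample article B", {"hydrocarbons"}, "English", 1965, 12,
            ["Author Two", "Author Three"]),
    Article("Sample article C", {"polymers"}, "Italian", 1971, 35,
            ["Author One", "Author Four"]),
]


def search(articles, **criteria):
    """Keep articles for which every criterion (a predicate on one field) holds."""
    return [
        a for a in articles
        if all(test(getattr(a, field)) for field, test in criteria.items())
    ]


def browse_by(articles, key):
    """Group titles into a one-level hierarchy using any function of an article."""
    groups = defaultdict(list)
    for a in articles:
        groups[key(a)].append(a.title)
    return dict(groups)


# Hydrocarbons, in Italian, from 1965, with more than 30 footnotes.
hits = search(
    LIBRARY,
    topics=lambda t: "hydrocarbons" in t,
    language=lambda lang: lang == "Italian",
    year=lambda y: y == 1965,
    footnotes=lambda n: n > 30,
)

# A browsable grouping by number of authors instead of by topic.
by_author_count = browse_by(LIBRARY, key=lambda a: len(a.authors))
```

Neither the query nor the grouping was anticipated when the records were built. Any field, or any function of the fields, can become the organizing principle after the fact.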
Now take a closer look at these information objects. They look like contents tagged with lots of metadata, but in fact they're all metadata. If I'm looking for an article about hydrocarbons written by Barbara Rodriguez, then the article's topic ("hydrocarbons") and author's name ("Rodriguez, Barbara") are metadata, and the content is the data. But, I could just as well be trying to remember the name of the author who wrote an article that included the phrase "Hydrocarbons are the burros of the cosmos" sometime in the 1960s, in which case the content and date are metadata and the author's name is the data. What's data and what's metadata depends on the person doing the asking.
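To make the role swap concrete, here is a small sketch in which the same record answers both questions. The ask function and the record layout are hypothetical, and the year 1965 is an assumption, since the example only says "sometime in the 1960s."

```python
RECORDS = [
    {
        "author": "Rodriguez, Barbara",
        "topic": "hydrocarbons",
        "year": 1965,  # assumed; the example only says "sometime in the 1960s"
        "content": "... Hydrocarbons are the burros of the cosmos ...",
    },
]


def ask(records, want, **known):
    """Return the `want` field of every record matching all the `known` predicates."""
    return [
        r[want] for r in records
        if all(test(r[field]) for field, test in known.items())
    ]


# Question 1: author and topic act as the labels; content is what we're after.
texts = ask(
    RECORDS, "content",
    author=lambda a: a == "Rodriguez, Barbara",
    topic=lambda t: t == "hydrocarbons",
)

# Question 2: content and date act as the labels; the author is what we're after.
authors = ask(
    RECORDS, "author",
    content=lambda c: "burros of the cosmos" in c,
    year=lambda y: 1960 <= y <= 1969,
)
```

Nothing in the record marks any field as "the data." Which field plays that role is decided by the `want` argument, that is, by whoever is asking.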
So, in the Third Age of Order, all data is metadata. Contents are labels. Data is all surface and no insides. It's all handles and no suitcase. It's a folder whose content is just another label. It's all sticker and no bumper.
Why does this matter? It changes the primary job of information architects. It makes stores of information more useful to users. It enables research that otherwise would be difficult, thus making our culture smarter overall. But, most interestingly (at least to me), this does the ol' Einsteinian reverse flip to Aristotle. Aristotle assumed that of the 10 categories by which one could understand a thing, one must be primary: Where that thing fits into the tree of knowledge. So, you could say that Alcibiades is made of flesh or lived in Greece, but if you really want to understand him, you have to say that he is an animal of a particular kind. But, now that everything is metadata, no particular way of understanding something is any more inherently valuable than any other; it all depends on what you're trying to do. The old framework of knowledge — and authority — is getting a pretty good shake.
Right? Wrong? Old? Obvious? Pointless? Stop me before I make a fool of myself to someone not as nice as you...
My friend Robert Morris, who teaches computer science at U. Mass Boston, and who has always been unnecessarily generous to me with what he knows, says that the above is pretty much old news:
The short answer is that in the business, nobody anymore contends there is a difference between data and metadata other than in a context such as you mention, namely the metadata is usually that part which helps you locate and use the other part and which you can often ignore if you already know those things.
Bob points to Life Science IDs (LSIDs) as an example of a standard that does sort of distinguish data from metadata.
An LSID is an immutable, permanent, globally unique key to a piece of information. The LSID spec requires that getData always return the same bytes for the entire future of the universe, whereas getMetadata may return things about the information that could change.
LSIDs are being supported by the Interoperable Informatics Infrastructure Consortium (I3C). An LSID server sits in front of your database or application so you can continue to use your existing infrastructure.
Sounds like the architecture for a life sciences fact server...
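The getData/getMetadata split described above can be mimicked with a toy in-memory store. This is not the actual LSID interface: the class, its method names and the identifier are invented for illustration, and the only property the sketch tries to capture is that data is write-once while metadata may change.

```python
class ToyLsidStore:
    """In-memory stand-in for an LSID-style resolver (not the real LSID API)."""

    def __init__(self):
        self._data = {}      # identifier -> bytes, write-once
        self._metadata = {}  # identifier -> dict, freely updatable

    def register(self, lsid, data, metadata=None):
        if lsid in self._data:
            raise ValueError(f"{lsid} already exists; its data can never change")
        self._data[lsid] = bytes(data)
        self._metadata[lsid] = dict(metadata or {})

    def get_data(self, lsid):
        """Always the same bytes for a given identifier."""
        return self._data[lsid]

    def get_metadata(self, lsid):
        """A copy of the current metadata, which may differ between calls."""
        return dict(self._metadata[lsid])

    def update_metadata(self, lsid, **changes):
        self._metadata[lsid].update(changes)


store = ToyLsidStore()
LSID = "urn:lsid:example.org:articles:42"  # made-up identifier
store.register(LSID, b"Hydrocarbons ...", {"curator": "unassigned"})

before = store.get_data(LSID)
store.update_metadata(LSID, curator="assigned")
```

After the metadata update, get_data still returns the same bytes as before. An attempt to register different bytes under the same identifier is refused rather than silently overwriting them.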
Middle World Resources
Walking the Walk
It's no surprise that O'Reilly Publications is a cool company. It's geeky and Tim O'Reilly is, IMO, a hero of the Web. Even so, I'm often impressed with just how right they get just about everything they touch. I don't want to rave about Foo Camp, the free-form weekend camp-out for nerds and geeks, but one of the many reasons it succeeds is that even though the company pays for it and lets us occupy its building and grounds, Tim keeps the weekend free of overt O'Reilly commercial messages. In true end-to-end fashion, O'Reilly gets out of the center and allows the ends — the attendees — to connect.
As a result, we love O'Reilly all the more.
For the Hyperlinked Organization
Bit by bit, I'm replacing my desktop apps with open source ones. The latest one to go has been PolderBits, a fine sound recorder/editor for which I was happy to pay $29. But Audacity is at least as good for my minimal needs. And Audacity is open source and free.
I'm not doing sound editing, so I have no opinion about how Audacity stacks up in that regard. But it's terrific for recording onto your computer off a microphone and — more important — for recording whatever sounds you're streaming. So, if you're listening to radio over the Internet and they're playing a song you'd like to keep, just press the Audacity "record" button. (That's known as the "analog hole" to people who want to plug your every orifice with Digital Rights Management controls.)
Then you can do a whole bunch of manipulation of the sounds, but I don't.
In order to postpone the pleasure of Doom 3, I'm playing Far Cry, yet another shooter. Lots of people like it more than I do. And there are many elements to admire: The graphics are detailed and the island on which it's set is beautifully drawn. The enemy AI is the best I've seen; not only don't they get stuck running into palm trees, but when you shoot at them from a distance, they do things you might do, assuming you're not a pants-wetting civilian like me. I even don't mind their we-know-best save system, especially since you can save anywhere you want if you look up the code on the Web. But, I'm just not finding it all that engaging. Painkiller, which I finally and regretfully finished, has humor and imagination going for it. Far Cry has a beautifully rendered tropical isle and not enough imagination.
Email corrections, additions, detractions and refutations
I got great mail from bunches of you about the piece in the previous issue about Dewey. If I try to respond to it all here, I'll never get this issue out, and in the new fast-paced world of the Web, I'm trying to pare JOHO down so that it can come out more often than Punxsutawney Phil.
Several of you took issue with my statement that Melvil Dewey was "a progressive on social issues." Of course, you're right. But you're not as right as some of you think. Although it's certainly true that he was forced to resign as NY State Librarian in 1905, Wayne A. Wiegand paints a complex picture in his biography, Irrepressible Reformer (1996). The anti-Semitism that got him fired seems to have been rather conventional: He and his wife created a gated, semi-utopian community at Lake Placid that casually excluded everyone except white Christians. In his day-to-day dealings, he seems to have expressed no hatred of Jews. He also was accused of being a sexual harasser, although — in part because the language of the day was so circumspect — it's hard to tell from Wiegand's book just how lecherous he was; that he made at least some of the women who worked for him intensely uncomfortable seems certain, and it may have been much worse than that. And, indeed, the fact that women worked cheaper than men undoubtedly was important to him as he staffed up. The reality seems at best disturbing.
Wiegand argues that Dewey's forced resignation was due not just to his anti-Semitism and his abuse of women but also to the fact that he was egomaniacal and a shady bookkeeper who made lots of enemies for good and bad reasons.
In short: Dewey was complex.
(Thanks for the correction.)
Bogus Contest: Name that entity!
You know those objects I talked about in the article above, the ones that are all metadata and no data? I want to give them a name. It should be something that businesspeople can talk about without embarrassment. At the moment, believe it or not, the best I've come up with is extradata; at least that would let me talk about data, metadata and extradata. So, you do better. You might take it in a completely different direction. For example, you might suggest "i-objects," "data monads" or "chrontent," which I'd then reject and possibly laugh at.
So, go ahead. I could use a good laugh.
That's it for JOHO. Sorry for the delay. There's just too much going on. And don't forget that I'm writing absurd amounts of paranoid drivel over at my blog. Just think how much worse it's going to get after November 2 when I am terminally depressed. So, read my blog now, before the Great Depression begins.
And, if you're American, don't forget to vote. Depending, of course.
JOHO is a free, independent newsletter written and produced by David Weinberger. If you write him with corrections or criticisms, it will probably turn out to have been your fault.
To unsubscribe, send an email to [email protected] with "unsubscribe" in the subject line. If you have more than one email address, you must send the unsubscribe request from the email address you want unsubscribed. In case of difficulty, let me know: [email protected]
There's more information about subscribing, changing your address, etc., at www.hyperorg.com/forms/adminhome.html. In case of confusion, you can always send mail to me at [email protected]. There is no need for harshness or recriminations. Sometimes things just don't work out between people.
Dr. Weinberger is represented by a fiercely aggressive legal team who responds to any provocation with massive litigatory procedures. This notice constitutes fair warning.
Any email sent to JOHO may be published in JOHO and snarkily commented on unless the email explicitly states that it's not for publication.
The Journal of the Hyperlinked Organization is a publication of Evident Marketing, Inc. "The Hyperlinked Organization" is trademarked by Open Text Corp. For information about trademarks owned by Evident Marketing, Inc., please see our Preemptive Trademarks™™ page at http://www.hyperorg.com/misc/trademarks.html
This work is licensed under a Creative Commons License.