February 20, 2012

Culture is an echo chamber: We all hate echo chambers in which a bunch of yahoos convince one another that they're right. But, our fear of echo chambers can blind us to their important social role. Just take a look at

In love with linked data: The Semantic Web requires a lot of engineering. So along comes this scrappy contender that says we ought to just make our data public and see what happens. Brilliant!

Too Big to Know: I worked on a book for a couple of years, and now it's out. Yay?

Report from the DPLA platform: Surprisingly, I'm interim head of the project building the software platform for the Digital Public Library of America. Here's what's going on.

Bogus Contest: #Stories If history were written in hashtags.


Where the heck has Joho been?

Place a 55 gallon drum of maple syrup (or, for the Brits, a 20 kilo drum of marmite) in an 8x8x8 room, and blow it up with whatever explosives the Mythbusters haven't used up. Now ask: "Where is the maple syrup/marmite?" The Law of the Conservation of Syrup guarantees that none of it has been lost. It's just been redistributed.

And that, my friends, is where Joho has been: In Joho the Blog, tweets, Google Plus, the Library Lab blog, the DPLA dev blog, Radio Berkman podcasts, Library Lab podcasts... I'm afraid that Joho the 'Zine has been so thoroughly distributed that it now exists only as a mist.

The good news? The aerosol version of Joho has a surprisingly pleasant piney smell.


Why is Joho back now?

For two reasons.

First, I was writing a piece about Reddit and echo chambers, and it felt Joholic to me.

Second, let me be blunt about this: I thought I should let you know that my book was published, because, well, no need to go into that.


Moi moi moi

Since it's been such a long time, here's an update on me:

I'm still at the Berkman Center.

I still co-direct the Harvard Library Innovation Lab (at the Law School Library), where a small but fabulous team of developers explore the future of libraries by creating prototypes and apps.

And since January I have been the interim head of the development group building the software platform for the Digital Public Library of America. More about that further down the page...

Some recent writing 'n' stuff? just posted an op-ed on Web fame, and although my article on commons-based science in the December Scientific American is locked, here's an interview

In the Too Big to Know post in this issue, there are links to some material specific to that book...


Enough with the preliminaries! Can't Joho start already?


Culture is an echo chamber

I have a friend in the media business who is making a good-faith effort to understand how the Internet media ecology works. I decided that Reddit would be a good case study, and that the Woody Harrelson Affair would be a useful example of how to go wrong on the Net and, by inverting it, how perhaps to go right.

What started as a brief message got longer and longer as I tried to unpack the self-references and multiple layers of irony in the Reddit thread. (To those unfamiliar with the event, The Observer has done a reasonable job telling the story.) I found myself having to explain everything at once: Y'see, Woody offered to do an IAMA AMA ("I am a ___, ask me anything"), but gave media-trained answers, and bolted when someone accused him of being a sexual cad. Reddit was well-disposed to Woody before the IAMA because he is a cultural fit with the basic Reddit demographic/psychographic, and besides he was in Zombieland, which itself was a self-referential, ironic movie that even had Bill Murray in a self-knowingly self-referential cameo. And that starts to peel the Reddit onion. There's the particular form of banter characteristic at Reddit, and the community's own vocabulary (e.g., "FTFY") and references back to previous threads with only occasional pity for newcomers, and the use of particular types of annotated graphic memes (Scumbag Steve becomes Scumbag Woody), and the use of annotated gifs as a form of commentary along the lines of a Nelson "Ha ha," and and and...

Reddit has suddenly been noticed by the mainstream because it was an important catalyst in the SOPA blackouts, and because the Woody Affair has piqued the interest of other snarky Hollywood sites, including Movieline, and even Gawker which has been highly critical of Reddit for its occasional adolescent boy's sensibilities and its permitting the presence of user-generated topic pages that were beyond the pale of decency even if within the pale of legality. (As I'm writing this, has started a campaign against these topic pages. Here's the Reddit community's response. Update: The site has, thankfully, taken down the offending pages). As the mainstream becomes more aware of Reddit, some of Reddit's homegrown mechanisms — IAMA's are the most likely candidate — are on their way out into the mainstream. And the mainstream is going to be washing into Reddit. Would a Tarantino IAMA surprise you? Can a Brian Williams IAMA be far away? Will we soon be dividing our public figures into those who would be willing to do an IAMA and those whom we imagine would not? Conan IAMA, yes — Leno IAMA, never. Natalie Portman yes — Katherine Heigl, nope. Lady Gaga, yes — Madonna, not so likely.

This is how culture develops: insular groups create their own vernaculars, sensibilities, and rhetorical forms. These then spread. But this is also an example of why we need echo chambers.

An echo chamber is a social cluster in which people only hear from others who agree with them. This can cause people to become more closed to outside ideas, and to become more extreme in their beliefs. Because the Net makes it so easy to find people with whom you agree, the fear is that the Net is encouraging the echo chamber effect.

Let me say clearly and firmly at the outset that the echo chamber phenomenon is something to worry about. (See Cass Sunstein and Eli Pariser.) No matter how extensive and common echo chambers may or may not be, we need to be bending every effort to keeping ourselves open to differences. We need to do this as individuals, as parents, as teachers, and at the institutional level.

But one consequence of the useful airing of the echo chamber argument is that I frequently hear people apologizing for getting news from a site that reflects their own partisan point of view, or for having a conversation with people with whom they agree. We need to keep in mind that talking with people with whom you share assumptions and outlooks is not a failure of conversation. It is what makes conversation possible.

For a conversation to happen, people need to share a language, interests, values, a knowledge base, conversational norms, and more. Conversation consists of iterating on small differences on top of a huge ground of similarity. Inside an echo chamber, people are not literally repeating what everyone else is saying. They're differentiating themselves by finding slightly new spins and additions: We're both Obama supporters, but you think he needs to act more folksy, or we're both long-distance bicyclists but we disagree about the best fluids to drink along the way. If we're echoing each other on these topics, they're exactly what we don't talk about. Most of the progress in our thinking is made through that sort of iteration. Echo chambers may not be where deep ideas are overturned — although I'm not sure that's right — but they are where ideas progress.

The same is true of understanding. If I want to understand the consequences of some new FCC ruling that is over my head (as they all are), I turn to sites that share my basic values about the value of the Internet — Harold Feld, for example — rather than a group lobbying against my interests. Likewise, if I want to understand if some new policy advances or retards the legalization of same sex marriage, I'm going to go to a source that shares my commitments, not to a right-wing, fundamentalist site where I have to flip their meanings — their "Latest outrage from the Courts" is probably my "Another victory!" — to get a sense of what's going on. Again, this is not a failure of understanding but a condition for it. We understand the new by assimilating it to the familiar. We may not like it, but that's the way it works.

Likewise, hanging out with people who are like you is not a failure of culture. It is its requirement. Culture results from groups using references to shared works and forms. The question is how we manage to form cultures that do not excessively isolate themselves from other cultures, just as we don't want to wall conversation or understanding off from ever encountering substantially different ideas. How do we form cultures that appreciate the differentness of others?

Here again Reddit provides an example, both negative and positive.

Negatively, the use of insider lingo and references can be a way of excluding others, which has been a persistent criticism of Reddit: if you don't get that today's meme is a twist on yesterday's, you've shown you're not really a member of the tribe. And the more obscure the references, the more cohesive the community feels. Reddit indeed gets quite obscure in its self-references, not only as a form of clubbishness, but because insider jokes are funnier the more insider they are.

Positively, Reddit provides an example of how some fresh air can be let in to a community tightly bound by explicit and implicit markers of cultural sameness. Amidst the puppy videos, the pop culture obsessions, and the horny boyisms, there are occasional IAMA's from across cultures and classes: a male host at a Tokyo club, a college student who lived in his car for six months to afford school, an African-American whose family is teetering on homelessness, a 22-year-old born without a right arm, a nurse in a burn unit, an 18-year-old with cystic fibrosis, someone living near Fukushima, an elephant trainer in Ghana, an illegal immigrant who's lived here for twenty years.

Now, lest you accuse me of being a mere Reddit fanboy, I acknowledge that these air holes in the Reddit echo chamber rarely have front-page rankings, are skimpy on reports from outside the US, and are the exception in the daily roster of IAMAs. (I would love to see a rhetorical form emerge at Reddit in which engagement with people from other cultures became routine. IAMFROM AMA?) Even so, these IAMAs nevertheless suggest several lessons:

First, we will care about other cultures when someone makes them interesting to us. We're unlikely to look up, say, Nigeria in the encyclopedia simply out of a sense of duty. We'll learn about Nigeria if someone presents a good story about it — don't underestimate the importance of good writing! But Reddit also suggests that our politeness can hold us back. The AMA form — Ask Me Anything — gives people permission to ask the questions they really want answered, even if those questions threaten to tread on norms and sensibilities. The aim is frankness, not rudeness. Maybe we would be more interested in that which is unlike us if we felt more free to actually get into it.

Second, the way out of echo chambers is through echo chambers. We're probably not going to decide to go out of our way to encounter that which is good for us; we come to Reddit for the variety it offers, into which are stuck occasional IAMA's. Besides, we need an echo chamber of shared beliefs and values so we can assimilate (= understand) that which is foreign to us.

Third, we should not let our fear of echo chambers hide the crucial role echo chambers play in conversation, understanding, and culture.

Culture is an echo chamber.


Linked Data vs. the Semantic Web

My prior book, Everything Is Miscellaneous, expressed discomfort with a direction the Semantic Web was being taken by some. My new book is all lovey-dovey about Linked Data, which is also part of the Semantic Web. What's the diff? Well, I'll tell you how I "understand" it.

The Semantic Web attempts to make the Web yield up more of its smarts. In the canonical example (which comes from Tim Berners-Lee's 2001 Scientific American article, with James Hendler and Ora Lassila), a Web app should be able not only to enter your mother's doctor's appointment into her calendar, it should be able to look up the local transportation options and figure out the best way to get her there. It might also check the weather report for that day and decide to take her by subway instead of having her walk. To do this, the sites with the weather and transportation information would have to make that information available in ways that make sense not only to humans but also to machines. The Semantic Web suggested a particular way to do that: Ontologies.

An ontology is a conceptual map of a domain: calendars, mass transport, weather, etc. It specifies the vocabulary conforming sites are to use ("taxi" vs. "cab"), and the relations among all the terms (taxis charge by the mile, taxis have no schedules, taxis can go to any spot, taxis are covered so passengers don't get wet). My problem with ontologies in Everything Is Miscellaneous was that large and complex domains don't reduce well to such maps. There are too many ways to think about things.

Linked data takes a different approach to adding to the semantics (or meaningfulness) of the Web. Rather than having to create an ontology and then get sites to agree to it, the linked data approach says: Just make your data available and we'll figure out how to deal with it afterwards. But (says Linked Data) there are some ways you can make your data immensely more useful: Use a standard format that expresses data as "triples," and make each of those triples a link. A triple is two terms connected by a relationship: "The platypus" "lives in" "Tasmania." But (continues Linked Data) instead of expressing this in text, use a link for each of these three terms. So, "Platypus" might actually be a link to the page for the platypus in the Encyclopedia of Life, "Tasmania" might point to a Wikipedia page, and "lives in" might point to one of the standard vocabularies that are springing up on the Net. That way when a computer is trolling for information about platypuses and it comes across the triple "watermoles have venomous claws," if "watermoles" is a link to the same page in the Encyclopedia of Life, the computer will know that it's talking about the same thing as the platypus triples. If two triples are not pointing at the same online resource, someone may do a mapping that says that this page in the EoL refers to the same creature as this page in another online species site. And even if the mapping isn't done, as more clouds are released, you can swing through more links, making associations within and across domains.
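For the code-minded, here is a minimal sketch of that idea in Python. It is an illustration under assumptions, not how any real Linked Data store works: the example.org addresses are made-up stand-ins for the Encyclopedia of Life page, the Wikipedia page, and a shared vocabulary, and the helper names (aliases, facts_about) are hypothetical. The one real identifier is owl:sameAs, the standard term for saying that two links point at the same thing.

```python
# Triples are (subject, relationship, object), and every term is a link.
# All example.org URLs below are hypothetical placeholders.
EOL_PLATYPUS = "http://example.org/eol/platypus"
OTHER_PLATYPUS = "http://example.org/species/ornithorhynchus-anatinus"
TASMANIA = "http://example.org/wiki/Tasmania"
LIVES_IN = "http://example.org/vocab/livesIn"
HAS = "http://example.org/vocab/has"
VENOMOUS_CLAWS = "http://example.org/vocab/venomousClaws"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # real OWL term

triples = [
    # "The platypus" "lives in" "Tasmania"
    (EOL_PLATYPUS, LIVES_IN, TASMANIA),
    # "Watermoles have venomous claws": the text says "watermoles",
    # but the link points at the same page, so the subject is identical.
    (EOL_PLATYPUS, HAS, VENOMOUS_CLAWS),
    # Someone's mapping: another site's page is about the same creature.
    (OTHER_PLATYPUS, SAME_AS, EOL_PLATYPUS),
]


def aliases(resource, triples):
    """Collect every link declared (directly or indirectly) the same as resource."""
    known = {resource}
    changed = True
    while changed:
        changed = False
        for subject, relation, obj in triples:
            if relation != SAME_AS:
                continue
            if subject in known and obj not in known:
                known.add(obj)
                changed = True
            if obj in known and subject not in known:
                known.add(subject)
                changed = True
    return known


def facts_about(resource, triples):
    """Return (relationship, object) pairs for resource and all of its aliases."""
    names = aliases(resource, triples)
    return [
        (relation, obj)
        for subject, relation, obj in triples
        if subject in names and relation != SAME_AS
    ]


# Asking about the other site's page still finds both platypus facts.
platypus_facts = facts_about(OTHER_PLATYPUS, triples)
```

Nobody had to agree on an ontology first. The two sites only had to use links, and one person had to say that two of those links mean the same thing.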

Linked Data is a brilliant idea. It enables vast clouds of data to be released without first having to build a large, complex ontology. It allows these clouds to talk without even having to agree on their terms of reference. It is messy, sloppy, and imperfect. In other words, it's exactly as it needs to be if we want knowledge to grow very big.

Too Big to Know

So, I wrote this book about how knowledge is taking on the characteristics of its new medium, just as it had taken on the characteristics of its old one.

It came out in the beginning of January, and I find myself feeling awkward about writing about it to you, probably because I can't do so without pitching it.

So, how about if I tell you two ways I think it's different from my other books, other than in its topic?

1. Although overall I am optimistic about the networking of knowledge, I think Too Big to Know does a better job accounting for the negatives of the Net.

2. I think I take the charge of technodeterminism — the criticism that technology by itself does not determine what we will make of that technology — more seriously, although I still end up believing that some of the characteristics of the Net are likely to have predictable effects, especially within a culture.

Criminies. I should probably do a better job telling you what's new and exciting about Too Big to Know. Well, it's about knowledge! And our new strategies for knowing the world now that our medium doesn't require us to strip knowledge down to fit little paper containers!! And it's about other stuff that deserves even more exclamation points!!! Exclamation points!!!!

If you're interested, there's a bunch of videos and radio interviews up on my shameless video page. You might try the Berkman talk at the top, or the Spark radio interview (26 mins) by Nora Young. The Atlantic ran an excerpt, Salon ran an interview, and I like the BoingBoing review. Also, this week I was on On the Media, one of my very favoritest shows. Google around and you'll find more, including some folks who think the book is quite wrong.

You might disagree.

The Digital Public Library of America

The Digital Public Library of America is a bit like a book that started with little more than a really good title. Only as it's being written (so to speak) is it becoming clear exactly what it's about.

The DPLA originated from a meeting of major libraries and other institutions in the fall of 2010, and it's got a whole bunch of things going for it, particularly the interest and support of major libraries and other institutions. Although Robert Darnton — one of the meeting's conveners — has written beautifully about his vision of it [video], as far as I can tell there is not yet full agreement about some of the basics. Should it be a source of licensed digital content the way public libraries are? Should it assemble its own index and catalog, or should it query online collections on the fly? What should be on its home page? What are the guidelines that will determine which digital collections it's going to include?

Since our little group has been charged with developing the software platform for what the DPLA might be, we're building it even while those issues are being sorted out. We're providing it with open APIs so independent developers can create whatever they want, putting to use the data, metadata, and content the DPLA gathers. (I'm the interim head of the project; we'll see how that pans out.)

There are, of course, areas in which the DPLA's supporters agree: It should support open access, the software should be available as Open Source, it should be gathering metadata from a wide swath of institutions (libraries, archives, museums, and more), and it should launch in April 2013. We would like it to enable developers to create applications such as content browsers, recommendation engines, library analytics, integrations into existing Web sites and services, and apps no one has yet imagined.

So, we have engaged in a rapid, agile development process, hoping to post new builds every week. We also hope to be releasing a set of prototype applications that illustrate some of the utility of such a platform. That means, however, that we're not first building data models before architecting the system. We're plunging right in. We're hoping that the DPLA community will be ok with our basic data strategy: normalize the incoming complex data models to a simple but useful one (based on the Dublin Core) while preserving the incoming schemas in all their complexity for developers, via a noSQL database. (The latest build is running on MongoDB.) If you disagree, let us know. We're open to learning and changing.
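To make that data strategy concrete, here is a rough Python sketch. It is not the DPLA platform's code: the incoming field names, the FIELD_MAP table, and the normalize function are all hypothetical, and a plain dictionary stands in for the document that would be saved in a document database such as MongoDB. The only borrowed facts are Dublin Core element names (title, creator, subject, date).

```python
import copy

# Hypothetical mapping from one contributor's field names to Dublin Core elements.
FIELD_MAP = {
    "main_title": "title",
    "author_name": "creator",
    "topics": "subject",
    "year_published": "date",
}


def normalize(record, field_map, source):
    """Build one document: simple Dublin Core fields plus the untouched original."""
    doc = {
        "source": source,
        # Preserve the incoming schema in all its complexity for developers.
        "original": copy.deepcopy(record),
    }
    for incoming_field, dc_element in field_map.items():
        if incoming_field not in record:
            continue
        value = record[incoming_field]
        # Dublin Core elements are repeatable, so store every value as a list.
        values = value if isinstance(value, list) else [value]
        doc.setdefault(dc_element, []).extend(values)
    return doc


# An invented incoming record, including a field the simple model ignores.
incoming = {
    "main_title": "Letters from Gettysburg",
    "author_name": "Example, Author",
    "topics": ["Gettysburg, Battle of", "Correspondence"],
    "shelf_code": "X-12",
}

doc = normalize(incoming, FIELD_MAP, source="example-library")
```

The simple fields are what a cross-collection search would use; the copy under "original" keeps anything the mapping did not understand, such as shelf_code here, available to developers who want it.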

At the same time, we're working on a technical specification, which we're developing on the project's wiki. We just don't see any way to make an April 2013 deadline except by developing schemas, code, sample apps, and the specification simultaneously.

This undertaking is especially foolhardy because the issues are so wicked. For example, if someone searches for materials about the Battle of Gettysburg, the platform ideally should respond with all the relevant books, photos, curated Web pages, DVDs, etc., that the DPLA knows about. But doing a subject search across multiple types of data that have come in without a uniform schema, much less a uniform set of subject headings, requires magic ahead of our current technology. Fortunately, perfection here is impossible. We're hoping for something that is more useful than nothing. (And linked data — see above — offers some exciting prospects.)

We're a tiny team. We're working in the open, hoping that the community — a particularly expert set of folks — will pitch in. So, please consider jumping into the DPLA at every level, from joining the mailing list to helping us write some code. If we succeed even a little, we will have brought into the Web ecology at least a modicum of what libraries and librarians know.


Bogus Contest: #Stories

As you all know, a hashtag is a tag put into a tweet so that it can be clustered with all the other tweets talking about the same thing. As a story develops its hashtags might tell the narrative. In fact, all of history can be reproduced in hashtags. For example:

  • #election2000

  • #getOutTheVote

  • #floridaResults

  • #recount

  • #canadianRealEstate

  • #manOnTheMoon

  • #nasa

  • #moonShot

  • #onTheMoooooon

  • #fakeFakeFake

  • #ImWithJulius

  • #Rubicon

  • #juliusRulz

  • #etTu

  • #ImWithAugustus

  • #dadsIdolsAreBroken

  • #monotheism

  • #covenant

  • #whatHappensInSodom

  • #YouveGotMyAttention

Send your #stories to me, and you may (but won't) win entirely bogus prizes!

