October 21, 2011

[dpla] First session

Moderated by John Palfrey.

Deanna Marcum of the Library of Congress says the LC has 148M objects and has digitized 28M of them. [I may have gotten that last number wrong. Sorry.] The LC wants to make these resources as available as possible. “That is what brings us to the table of the DPLA. It seems to be the type of organization that will help us fulfill our mission in a very important way.” [Tying the DPLA to the LC's mission is a big deal.]

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Deanna says that from the beginning, the Librarian of Congress, James Billington, has asked how the LC can serve libraries better. The answer consistently has been: We want your content where we are, not where you are. This was pre-Net, so they looked to CD-ROMs, digitizing collections starting in the early 1990s, beginning with materials useful to K-12. They checked in with the 44 pilots and were amazed to find it was useful all the way down to third grade; students were making “incredibly innovative” uses of this digital content. In 1995, Congress said that it’d match private funds 1:3 (Congress pays $1 for each $3 raised) for digitization efforts. The LC began to think about what in its collections should be digitized first. Sloan funded digitizing of public domain works. Those efforts continue.

Susan Hildreth is director of the Institute of Museum and Library Services, a federal grant-making agency. She wonders what resources already exist that the DPLA can use, and which resources need to be created. This is vital to the IMLS’ contribution to the effort. The IMLS already has invested heavily in digitization projects. Also: metadata collection and cleanup programs. Also: training librarians. Also: conversations on these topics. So, there are already digitized items, best practices and policies, etc. for digital collections. Also, IMLS has reports of 20 years of international discussions about what digital libraries can be. And, some lessons learned: 1. Collaboration is key to long-term success in digitization. 2. The traditional relation between info providers and consumers is changing. 3. Digital libraries can reduce administrative costs, although we’re just at the beginning of this.

Also, Susan says we should learn some lessons from the IMLS: Support interoperability and the preservation of digital resources. Make it sustainable. Find new ways to measure the impact. Ultimately how will this make a difference to the person going on the Web to find information? The IMLS can be strategic in the DPLA’s efforts. [We like the "strategic" commitment.]

John Palfrey reinforces her statement about the excitement this is generating among librarian students.

David Ferriero, the Archivist [coolest title ever], talks. He comes to this position after heading the NY Public Library. He explains that the National Archives is the nation’s record keeper. For all federal agencies, and “courtesy preservation” for Congress. It began only in 1935. The records go back to the Continental Congress, and include White House tweets. 12B pages of textual records. Billions of electronic records, which is the fastest growing area. 8M emails from Reagan, 200+M from the GW Bush era. And, as Bush tells David, “Not one of those is mine.” He wants every item in the Archives to be online. He remembers discussions with librarians in which they worried about how to get students to use paper. “Get over it.”

The massive amount of material they have has made the Archives “rather creative” in getting it out. E.g., the Citizen Archivist program to give opportunities to the people to help digitize and process records. DocsTeach is online, loaded with lesson plans, etc.

When he was at NYPL, they worked with Google to digitize 1M works, and David saw how it transformed scholarship. In Dec. 2009, Pres. Obama signed a declassification order requiring the Archives to review and declassify. They’ve gone through 1M pages and have released 91% to the public shelves. The CIA “finally caved on the oldest secret documents” — German docs on creating secret ink. This happened because the Archivist staff used Google Books to discover that the ink formulas had been published in 1931.

Q: Accessibility and findability? Not enough to simply put things online.
A: Deanna: It’s important. But you’re looking at three people who don’t know how to do this.
A: David: Josh Greenberg taught me that we should talk about where the people are and get our stuff out there. That’s why we use YouTube and Flickr. It’s a problem for the Archives because our records are so large and complex. Plus, kids today can’t read cursive. So we’re going to be creating ways for the public to help us transcribe cursive docs.
A: Susan: It’s a broad issue, including making our materials available to those with disabilities, in multiple languages, etc. IMLS is interested in supporting platforms for effective discovery.
A: David: Serendipity is important.

Q: Director of the Smithsonian Institution Libraries: We also are very interested in participating in the DPLA with our 137M objects (although 124M are natural history specimens, so how many mosquitoes do we want in the DPLA?). But we have 6.4M digitized objects and are in a unique position to pull in museum, library and archive objects. We’re eager to continue to cooperate.

Q: Are there mechanisms in place to avoid reverse engineering of CIA documents?
A: The Archivist does not have the authority to release. We just facilitate the process.
Q: Are you going to do more?
A: We’ve done a million. There are 400M to go. We have a deadline in 2013. I hosted a meeting about the priorities and the room was evenly split between releasing the JFK assassination docs and UFOs.

Q: [British Library] One of the real challenges is the difference between a digital library and a wonderful but confusing random set of resources. Public-private partnerships are essential. And we have just opened up all our metadata on a CC0 license. No one can know what this will be used for, and that is its value. Also, there’s a challenge finding and developing modern librarians/curators.

Q: John Mayer: Imagine it’s 2016 and all your collections have been digitized. How does society improve once that’s in place? What’s the sf scenario of the DPLA?
A: Deanna: If we assume people benefit from having access to info resources — better decisions, better understanding where they come from and where they’re going, understand world cultures better — we want to make these resources available any way they want. That’s what librarians have always dreamed about and we finally have a mechanism for doing that. American citizens have paid for these resources with their tax dollars.
A: David: Better informed citizenry. Hold our government accountable. Understand our future by learning from our history.
A: Susan: If all is digitized, what happens to our physical facilities? By providing all that info, it will create a greater need and desire for people to work together, in the virtual and real worlds. It’s a very exciting and liberating future. And if we have all that data, we have to have strong connectivity to our homes, schools, libraries…

Q: Bob Darnton: Many of the questions have been testimonials. Wonderful! We rejected the name “National Digital Library” because there’s nothing national about it. Getting bigger means getting more international, and that is certainly going to happen. The national library director of France has expressed support. So has Europeana. This support is a movement that goes back to the international Republic of Letters. We’re getting the feeling we can make real a dream at the founding of this country.

[It is so ineffably cool and inspiriting to have these great institutions sharing a stage and a vision.]

[dpla] DPLA plenary

I’m at what is in effect the public launch of the Digital Public Library of America — “in effect” because the DPLA has been open to all from the beginning. But today we’re in the theater of the National Archives and have just been greeted by the Archivist of the United States, David Ferriero.

I spent yesterday at the “workstream” meetings of the DPLA. The openness of the DPLA has meant that there has been no moment at which all have agreed on precisely what the DPLA should be. Yesterday could have been a day that had people walking apart from one another or walking toward a center as yet to be fully located. It was a day of walking toward that emergent center. Given the continuing significant differences in the group, my sense is that the convergence was enabled by a shared sense of the value of what we could build, by shared interests and backgrounds (a bunch of librarians and admirers of librarians), and by the careful crafting of the day’s events and processes. (That last goes to the credit of the Berkman Center.)

I am very excited. (I’m also at maximum stress because I am giving an 8.5-minute demo this afternoon…talking to a screencast I did in my hotel room last night, leaving no room for temporal variance. You can see the live prototype here.)

Doron Weber of the Sloan Foundation is now briefly recounting the history of the DPLA, which started with a workshop a year ago. Doron today announced the beginning of a “two year grass roots effort” to build the DPLA. The DPLA is intended to be a platform for discovering our rich shared cultural heritage, he says (approximately). He sketches a very broad agenda, including discovering collections, building them, partnering with other nations, sharing metadata, and exploring doing some form of collective licensing of in-copyright material. (Excellent. I personally don’t want this to become the Digital Public Library of Jane Austen.)

Doron announces that Sloan and Arcadia are each contributing $2.5M to support the DPLA over the next 18 months. Woohoo! Peter Baldwin from Arcadia gives a gracious short talk.

October 18, 2011

[berkman] Yochai Benkler on his new book

Yochai Benkler is giving a talk about his new and wonderful book, The Penguin and the Leviathan. (I interviewed him about it here.)

Yochai begins by pointing to Occupy Wall Street as teaching us much about cooperation and collaboration.

On Oct. 23, 2008, Alan Greenspan acknowledged to Rep. Henry Waxman that his model of the world was wrong. “I made a mistake in presuming that the self interest of organizations…was such that they were best capable of protecting their own shareholders.” We live in a world built around a mistaken model of human motivation, Yochai says. The basic error is not that we are sometimes self-interested, for we are. The mistake is thinking we could build our systems assuming that we are more or less uniformly self-interested. We’ve built systems that try to get incentives right, or that try to get punishment right. But now scientific selfishness has retreated, and we should model our systems on this new knowledge.

In 1968 Gary Becker said that we could model crime by thinking of it as a pay-off model: the benefits of the crime vs. the cost of the penalty. So, we get Three Strikes laws. In another domain, the Jensen and Murphy paper on incentive pay for top management assumes that every level of the enterprise will try to shirk and put more in their pockets, so (the theory goes) you should increase the stock options at the top. But that hasn’t worked very well for companies in terms of return to stockholders; you get misalignment from this model. This model is like Becker’s: it’s about getting the incentives and penalties right. Yochai tells of a mother trying to get her three year old into a car by threatening to take five cents off the child’s allowance. “This model penetrates everywhere,” he says.

This intellectual arc is everywhere. Evolutionary biology has moved from group selection to selfish gene through kin altruism and direct reciprocity. Economics also: strong assumptions of self-interest. Political theory, from Downs, to Olson, to Hardin: all assume the inability to come together on a shared set of goals. Management science and organizational sociology: From Taylor to Weber to Schumpeter through Williamson. Although there are counter narratives in each of these fields, selfishness is the dominant model.

And yet online we see how easily we cooperate. “Things that shouldn’t have worked, have worked.” He draws a 2×2: market based and non-market based vs. decentralized and centralized. In each, there have been huge successes of social production. This is in fact a new solution space.

In each of the aforementioned disciplines, there is now a development of more complex models that take account of cooperation. E.g., evolution: indirect reciprocity; cooperation emerges much more easily in the new models. Economics: shift to experimental and modeling away from self-interest, and the development of neuroeconomics. Political: Elinor Ostrom on the commons. Management science: Work on team production and networks; high commitment, high-performance organizations.

The core insight of all of these fields is that the model of uniform self-interest is inadequate. Then there’s debate.

Yochai compares Dawkins in The Selfish Gene (1976) and Martin Nowak (2006). Dawkins says we are born selfish. Nowak says: “Perhaps the most remarkable aspect of evolution is its ability to generate cooperation in a competitive world.” It’s an old debate, Yochai says, citing Kropotkin vs. Spencer vs. Boas vs. Margaret Mead. The debate is now swinging toward Kropotkin, e.g., neural research that shows empathy via brain scans: a partner’s brain lights up in the same way when s/he sees the other person undergoing pain. He points to the effect of oxytocin on trust, and for the first time in Berkman history makes a reference to monogamous voles.

Why does this matter, Yochai asks. He refers to an experiment by Lee Ross et al. Take a standard Prisoner’s dilemma. All predictions say that everyone should defect. Take the same game and give it to American students, Israeli fighter pilots, etc., and tell them either “You’re going to play the Community Game” or “The Wall Street Game.” In the former, 70% opened cooperatively and kept cooperating through the 7 rounds. The latter opened at 30% cooperative. The 30% in the Community Game who did not cooperate represent a significant segment that has to be dealt with in a cooperative system. But there’s a big middle that will go one way or the other depending on what they understand their context to be. So, concludes Yochai, it’s important to design systems that let the middle understand the system as cooperative.

So, we move from tough on crime to community policing. That changes all sorts of systems, including technical, organizational, institutional, and social. Community policing has been widely adopted because it’s generally successful. We see that we have success with actual practices that depend not on reward and punishment and monitoring, but on cooperation. We’re finding out about this online, but it’s not happening just online.

Yochai says that he’s just at the beginning of an investigation about this. There’s a limit to how much we can get out of evolution, he says. It’s hard to design systems on the basis of evolution. Instead, we see a lot of work across many different systems.

But we still want to know: Won’t money help? The answer is what’s called “crowding out.” We care about material interests, but we also care about fairness. We have emotional needs. We have social motivations. What if these interests don’t align? The Titmuss-Arrow debate 1970/1 about the motivations for donating blood. A 2008 study (Mellström and Johannesson) paid people money to give blood. When participants were allowed to give the money away, the number of people who gave blood increased. Adding money can suppress an activity more than it increases it. That’s crowding out. It’s not uniform in the population. Designing systems is much harder than coming up with a material reward that appeals to people’s self-interest. We do not have full answers here.

Think of cooperative human systems in three vectors. 1. Conceptual: from rationality as universal self-interest to diversity of motivations. 2. Design: Cooperative human systems designed on behaviorally realistic, evidence-based design. 3. Politics: We cannot separate out incentives from fairness, ethics, empathy, solidarity.

Yochai points to a number of factors, but focuses on fairness: of outcomes, of intentions, and of processes.

Outcomes: What counts as fair is different in different cultures, especially when you move outside of market economies. In market societies, 50:50 is the norm for fairness. Once it gets to 30:70, people will walk away. But you can change that if you change the framing, e.g., “You got lucky.” But there is no single theory of justice. Yochai looks at a study of the cement trucking industry. It turns out that there are large pay disparities. Companies also differ in what they say they pay for: performance, or equally time. They don’t always do what they say, though. But when you look at real performance measures, you have fewer accidents and out-of-service events if the company is accurate in what it says, no matter what it says.

We don’t have an agreed upon theory of justice, he says. This explains the 99% vs. 53% debate around Occupy Wall Street. This is a debate over basic moral commitments without which a system cannot function. There is no way to resolve it either through neutral principles or by efficiency arguments.

Intentions also matter to fairness. Where bad intentions are excluded (e.g., it was just a roll of the dice), there’s much less negative reciprocity.

Processes: Tyler (2003) showed that procedural justice correlated with internalized compliance. Yochai points to the militarization of the police as they deal with OWS. The image projected to the crowd is one of lack of regard for process. He compares this to a massive demonstration in Israel in which the police stood a good distance away, and a different relationship was fostered.

We can see a revival of the “sharing nicely” idea we teach our children. In science. In business. Science is beginning to push back against the assumption of selfishness. It turns out that we aren’t universally self-interested. Different people respond differently, and each person responds differently in different contexts.

We need a new field of cooperative human systems design that accounts for the diversity of motivation, and that takes seriously the issue of “crowding out”: adding incentives can result in worse outcomes.

And, Yochai concludes, we need a renewed view of our shared humanity.

Q: Fascinating. But: The passage from evolution to the social sciences has long been discredited. Also, it’s too simple to say that the solution to the banking problem is that we need more cooperation. The banks are supported by a set of interests bigger than that.
A: You say sociobiology has been discredited. That’s true of the early to mid 1980s but is no longer a good description. The social sciences and anthro have been moving to evolutionary models. Economics too. What was resolved in the 1980s is now, especially in the social sciences, unresolved. Second, sure, bankers self-select and control the system. The real answer is that it’s a lot of work. When you have a system optimized for money, and money is the social signal, it self-selects for people driven by that. We need long-term interventions to increase cooperation. E.g., the person who can work with Open Source at, say, IBM, is different than the person who can work her/his way up a hierarchy; the company therefore has to train itself to value those who cooperate.

Q: I just went through MIT’s tutorial that instructed me how my ideas would be licensed. I said that maybe there should be information in your office about how to contribute more openly. How do we systematize open, collaborative forms across the entire educational system?
A: Lots of people in this room are working on this problem in different ways. We fight, we argue, we persuade. Look at university open access publication. We use our power within the hierarchy of universities to raise a flag and to say we can do it a new way. That allows the next person to use us as an example. After I released Wealth of Networks for free on the Web, I got emails from all sorts of people wanting to know how to negotiate that deal for themselves. Universities should be easy.

Q: What are the burning policy implications of this shift in the way we rule the world? What would you change first?
A: I should note that I don’t address that in the book. We need an assessment of community policing and the big board [?] approach. The basic question is whether we continue to build a society based on maximizing total growth, or one that trades off some growth for a more equitable distribution of outcomes. The point is much broader than open access, patent, copyright, etc. The deregulatory governance model is based on an erroneous model of interests. But all of my work is done on the micro level, not the level of organizations. But we know that the idea that musicians need the payoffs afforded by infinite copyright is false; we have empirical data about that. So there are places where the relation between the micro interests and institutional interventions is tight. But I don’t talk about that much in the book.

Q: I’ve looked at pay inequality in Japan and the US. The last thing that matters to the level of compliance with regulations is the gap between CEO and workers. The deterrents are very effective in the US, explaining [couldn't hear it]. Compliance is much better in the US because the penalties are effective deterrents.
A: First, once you’re talking about the behavior of an organization, we don’t have the same kind of data on what happens within a corporate decision. When people see themselves as agents, there can be conflicts between the individual and the organization. For that you need external enforcement.
Q: Jail time makes a huge difference.
A: Then how do you explain the findings that the amount of stock options predicts the probability of tax fraud? Same baseline enforcement, but whether you had stock options predicts tax fraud. Adding money and punishment certainly has an effect on behavior. But it depends on whether that intervention has better effects than other interventions. But we only have a little bit of data.

Q: If a high school principal came to you who serves many interests and types of people, how could your ideas influence her or him?
A: My mother founded two schools and a volunteer organization. The lessons are relatively straightforward: Higher degrees of authority and trust, structure with clearly set goals, teamwork, less hierarchical distance between students and teachers, less high-stress testing.


Chris Poole is so right about identity

Jon Mitchell at ReadWriteWeb reports on a ten-minute talk Chris Poole (founder of 4chan and Canvas) gave at Web 2.0. Chris argues that Facebook and Google are getting identity wrong. “Identity is prismatic.”

Being confined to a single identity on the Web is like a wiki accepting only a single final draft, only far more tragic.

October 17, 2011

[2b2k] Why this article?

A possible explanation of the observation of neutrinos traveling faster than light has been posted by Ronald van Elburg. I of course don’t have any of the conceptual apparatus to be able to judge that explanation, but I’m curious about why, among all the explanations, this is the one I’ve now heard about.

In a properly working knowledge ecology, the most plausible explanations would garner the most attention, because to come to light an article would have to pass through competent filters. In the new ecology, it may well be that what gets the most attention are articles that appeal to our lizard brains in various ways: they make overly-bold claims, they over-simplify, they confirm prior beliefs, they are more comprehensible to lay people than are ideas that require more training to understand, they have an interesting backstory (“Ashton Kutcher tweets a new neutrino explanation!”)…

By now we are all familiar with the critique of the old idea of a “properly working knowledge ecology”: Its filters were too narrow and were prone to preferring that which was intellectually and culturally familiar. There is a strong case to be made that a more robust ecology is wilder in its differences and disagreements. Nevertheless, it seems to me to be clearly true (i.e., I’m not going to present any evidence to support the following) that to our lizard brains the Internet is a flat rock warmed by a bright sun.

But that is hardly the end of the story. The Internet isn’t one ecology. It’s a messy cascade of intersecting environments. Indeed, the ecology metaphor doesn’t suffice, because each of us pins together our own Net environments by choosing which links to click on, which to bookmark, and which to pass along to our friends. So, I came across the possible neutrino explanation at Metafilter, which I was reading embedded within Netvibes, a feed aggregator that I use as my morning newspaper. A comment at Metafilter pointed to the top comment at Reddit’s AskScience forum on the article, which I turned to because on this sort of question I often find Reddit comment threads helpful. (I also had a meta-interest in how articles circulate.) If you despise Reddit, you would have skipped the Metafilter comment’s referral to that site, but you might well have pursued a different trail of links.

If we take the circulation of Ronald van Elburg’s article as an example, what do we learn? Well, not much because it’s only one example. Nevertheless, I think it at least helps make clear just how complex our “media environment” has become, and some of the effects it has on knowledge and authority.

First, we don’t yet know how ideas achieve status as centers of mainstream contention. Is van Elburg’s article attaining the sort of reliable, referenceable position that provides a common ground for science? It was published at Arxiv, which lets any scientist with an academic affiliation post articles at any stage of readiness. On the other hand, among the thousands of articles posted every day, the Physics Arxiv blog at Technology Review blogged about this one. (Even who’s blogging about what where is complex!) If over time van Elburg’s article is cited in mainstream journals, then, yes, it will count as having vaulted the wall that separates the wannabes from the contenders. But, to what extent are articles not published in the prestigious journals capable of being established as touchpoints within a discipline? More important, to what extent does the ecology still center around controversies about which every competent expert is supposed to be informed? How many tentpoles are there in the Big Tent? Is there a Big Tent any more?

Second, as far as I know, we don’t yet have a reliable understanding of the mechanics of the spread of ideas, much less an understanding of how those mechanics relate to the worth of ideas. So, we know that high-traffic sites boost awareness of the ideas they publish, and we know that the mainstream media remain quite influential in either the creation or the amplification of ideas. We know that some community-driven sites (Reddit, 4chan) are extraordinarily effective at creating and driving memes. We also know that a word from Oprah used to move truckloads of books. But if you look past the ability of big sites to set bonfires, we don’t yet understand how the smoke insinuates its way through the forest. And there’s a good chance we will never understand it very fully because the Net’s ecology is chaotic.

Third, I would like to say that it’s all too complex and imbued with value beliefs to be able to decide if the new knowledge ecology is a good thing. I’d like to be perceived as fair and balanced. But the truth is that every time I try to balance the scales, I realize I’ve put my thumb on the side of traditional knowledge to give it heft it doesn’t deserve. Yes, the new chaotic ecology contains more untruths and lies than ever, and they can form a self-referential web that leaves no room for truth or light. At the same time, I’m sitting at breakfast deciding to explore some discussions of relativity by wiping the butter off my finger and clicking a mouse button. The discussions include some raging morons, but also some incredibly smart and insightful strangers, some with credentials and some who prefer not to say. That’s what happens when a population actually engages with its culture. To me, that engagement itself is more valuable than the aggregate sum of stupidity it allows.

(Yes, I know I’m having some metaphor problems. Take that as an indication of the unsettled nature of our thought. Or of bad writing.)

[2b2k] Bookbinding and the Digital Bible

Avi Solomon at BoingBoing has a terrific interview with Michael Greer about the appeal of bookbinding, and about Michael’s “Digital Bible.”

I love the photo:

Digital Bible: Book with ones and zeroes as text

October 13, 2011

Berkman Center applications

The Berkman Center is accepting applications for fellowships. Good luck!


October 11, 2011

[2b2k] Retraction system creaking under the load

According to a post at Nature by Richard Van Noorden, the rate of retracted scientific articles is growing far faster than the rate of published or posted articles. No one is sure why, but it is exposing inconsistencies in policies for dealing with retracted articles.

Suggested reforms include better systems for linking papers to their retraction notices or revisions, more responsibility on the part of journal editors and, most of all, greater transparency and clarity about mistakes in research.

It’s encouraging that it’s taken as obvious that the proper response is links and transparency. Gotta love science.


Classifying folktales

Via Metafilter:

The Aarne-Thompson Classification System

Originally published by Finnish folklorist Antti Aarne and expanded by American Stith Thompson and German Hans-Jörg Uther, the Aarne-Thompson Classification System is a system for classifying folktales based on motifs.

Some Examples:
Beauty and the Beast: Type 425C
Bluebeard: Type 312
The Devil Building a Bridge: Type 1191
The Foolish Use of Magic Wishes: Type 750A
Hansel and Gretel and other abandoned children: Type 327
Women forced to marry hogs: Type 441
The Runaway Pancake: Type 2025
Wikipedia has a complete breakdown and here has examples of most of the tale types.


October 10, 2011

Erik Martin on what makes Reddit special

Erik Martin, the general manager of Reddit, explains what’s so special about the discussion site. I’m particularly interested in the nature of authority on the site, and its introduction of new journalistic rhetorical forms.


