June 15, 2013

[2b2k][eim] My Stuttgart syllabus

I’ve just finished leading two days of workshops at the University of Stuttgart as part of my fellowship at the Internationales Zentrum für Kultur- und Technikforschung. (No, I taught in English.) This was a wonderful experience for me. First of all, the students were engaged, smart, and fun, and they spoke from diverse standpoints. Second, it reminded me how to teach. I had so much trouble trying to structure the sessions, feeling totally unsure how one does so. But the eight 1.5-hour sessions reminded me why I loved teaching.

For my own memory, here are the sessions (and if any of you were there and took notes, I’d love to see them):


#1 Cyberutopianism, technodeterminism, and Internet exceptionalism defined, with JP Barlow’s Declaration of the Independence of Cyberspace as an example. Class introductions.

#2 Information Age to Age of Connected. Why Ted Nelson’s Xanadu did not succeed the way the Web did. Rough technical architecture of the Net and (perhaps) its embedded political values. Hyperlinks.

#3 Digital order. Everything is miscellaneous? From information retrieval to search engines. Schema-based databases to tagging.

#4 Networked knowledge. What knowledge looks like once it’s been freed of paper. Four challenges to networked knowledge (with many more added by the students).

On Saturday we talked about topics that the students decided were interesting:

#1 Mobile net. Is Facebook making us more or less social? Why do we fill up every interstice by using Facebook on mobiles? What does this say about us and the notion of the self?

#2 Downloading. Do you download music illegally? What is your justification? How might artists respond? Why is the term “intellectual property” so loaded?

#3 Education. What makes a great in-person course? What makes for a miserable one? Oddly, many of the characteristics of miserable classes are also characteristics of MOOCs. What might we do about that? How much of this is caused by the fact that MOOCs are construed as courses in the traditional sense?

#4 Internet culture. Is there such a thing? If there are many, is any particular one to be privileged? How does the Net look to a culture that is dedicated to warding off what it sees as corrupting influences? End with LolCatBible and the astounding TheJohnnyCashProject.

Thank you, students. This experience meant a great deal to me.


May 17, 2013

Lobby for FaceBook, Yahoo, NewsCorp and Elsevier opposes the White House Open Access order, among others

Peter Suber points out that FaceBook, Yahoo, NewsCorp and Elsevier have joined the lobby that has issued a clarion call against open access that blurs the line between lies and gibberish. Peter blows the statements apart, leaving nothing but clean air and a whiff of ozone. NetChoice is publicizing its monthly “iAWFUL” (Internet advocates watchlist for ugly laws) list of policies that it doesn’t like. The list has little to do with advocating for the Internet, and everything to do with supporting the interests of Internet businesses (“committed to tearing down barriers to e-commerce”). For example, this month’s iAWFUL list includes data breach notification bills and a CT bill that “would force publishers to sell digital books at ‘reasonable’ prices to state libraries.” That’s in addition to opposing actions (including the recent epochal White House Memorandum) that support public access to research, often research that the public has paid for. But they have it all bollixed up.

What makes it more distressing, then, is that reputable journals, including Computerworld, CIO and PC World, are running NetChoice’s iAWFUL PR puffery.

Thankfully, Peter Suber is on the case.


April 9, 2013

Elsevier acquires Mendeley + all the data about what you read, share, and highlight

I liked the Mendeley guys. Their product is terrific: read your scientific articles, annotate them, be guided by the reading behaviors of millions of other people. I’d met with them several times over the years about whether our LibraryCloud project (still very active but undergoing revisions) could get access to the incredibly rich metadata Mendeley gathers. I also appreciated Mendeley’s internal conflict about the urge to openness and the need to run a business. They were making reasonable decisions, I thought. At the very least they felt bad about the tension :)

Thus I was deeply disappointed by their acquisition by Elsevier. We could have a fun contest to come up with the company we would least trust with detailed data about what we’re reading and what we’re attending to in what we’re reading, and maybe Elsevier wouldn’t win. But Elsevier would be up there. The idea of my reading behaviors adding economic value to a company making huge profits by locking scholarship behind increasingly expensive paywalls is, in a word, repugnant.

In tweets back and forth with Mendeley’s William Gunn [twitter: mrgunn], he assures us that Mendeley won’t become “evil” so long as he is there. I do not doubt Bill’s intentions. But there is no more perilous position than standing between Elsevier and profits.

I seriously have no interest in judging the Mendeley folks. I still like them, and who am I to judge? If someone offered me $45M (the minimum estimate that I’ve seen) for a company I built from nothing, and especially if the acquiring company assured me that it would preserve the values of that company, I might well take the money. My judgment is actually on myself. My faith in the ability of well-intentioned private companies to withstand the brute force of money has been shaken. After all this time, I was foolish to have believed otherwise.

MrGunn tweets: “We don’t expect you to be joyous, just to give us a chance to show you what we can do.” Fair enough. I would be thrilled to be wrong. Unfortunately, the real question is not what Mendeley will do, but what Elsevier will do. And in that I have much less faith.


I’ve been getting the Twitter handles of Mendeley and Elsevier wrong. Ack. The right ones: @Mendeley_com and @ElsevierScience. Sorry!


April 1, 2013

Podcast about the DPLA’s status and its relation to public libraries

The latest podcast in the Digital Campus series focuses solely on the current state of the Digital Public Library of America. The discussion includes Dan Cohen, who has just accepted the position of Executive Director of the DPLA, which is just wonderful news. Not only does he have a rare combination of skills and experiences — ever hear of Zotero, hmm? — but he is also — and there’s no other way of putting this — nice.

Also on the podcast is Nicholas Carr, who wrote an excellent, skeptical (or at least questioning) article for MIT Tech Review on the DPLA a year ago. Also, Mills Kelly and Tom Scheinfeldt. And me.

Dan explains what the DPLA is. Nick wonders if the DPLA will hurt public libraries. I try to explain why I think it won’t. Amanda suggests the DPLA is the Mr. Potato Head of libraries. I thought it was a good discussion.

March 28, 2013

[annotations][2b2k] Rob Sanderson on annotating digitized medieval manuscripts

Rob Sanderson [twitter:@azaroth42] of Los Alamos is talking about annotating Medieval manuscripts.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

He says many Medieval manuscripts are being digitized. The Mellon Foundation is funding many such projects. But these have tended to reinvent the same tech, and have not been designed for interoperability with other projects. So the Digital Medieval Initiative was founded, with a long list of prestigious partners. They thought about what they’d like: distributed, linked data, interoperable, etc. For this they need a shared description format.

The traditional approach is to annotate an image of a page. But it can be very difficult to know which images to annotate; he gives as an example a page that has fold-outs. “The naive assumption is that an image equals a page.” But there may be fragments, or only portions of the page may have been digitized (e.g., the illuminations), etc. There may be multiple images on a page, revealed by multi-spectral imaging. There may be multiple orientations of the page, etc.

The solution? The canvas paradigm. A canvas is an empty space corresponding to the rectangle (or whatever) of the page. You allow rich resources to be associated with it, and allow users to comment. For this, they use Open Annotation. You can specify a choice of images. You can associate text with an area of the canvas. There are lots of different ways to visualize those comments: overlays, side-by-side, etc.

You can build hybrid pages. For example, an old scan might have a new color scan of its illustrations pointing at it. Or you could have a recorded performance of a piece of music pointing at the musical notation.
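To make the canvas idea concrete, here is a minimal Python sketch that builds annotation records as plain JSON-style dictionaries. It assumes property names loosely modeled on the Open Annotation vocabulary (`hasBody`, `hasTarget`, `motivatedBy`, `oa:Choice`, plus a `sc:painting` motivation) and uses W3C Media Fragment syntax (`#xywh=`) to point at a rectangle of the canvas. The URIs and the helpers `region`, `annotation`, and `on_canvas` are hypothetical; this is not the SharedCanvas project’s own code or exact serialization.

```python
import json

# Placeholder URIs for illustration only; not real repository addresses.
CANVAS = "http://example.org/ms/canvas/f1r"

def region(canvas: str, x: int, y: int, w: int, h: int) -> str:
    """Target a rectangle of the canvas with a Media Fragment (#xywh=x,y,w,h)."""
    return f"{canvas}#xywh={x},{y},{w},{h}"

def annotation(anno_id: str, body, target: str, motivation: str) -> dict:
    """One annotation: associate 'body' with 'target' for a stated motivation."""
    return {
        "@id": anno_id,
        "@type": "oa:Annotation",
        "motivatedBy": motivation,
        "hasBody": body,
        "hasTarget": target,
    }

# A choice of images for the whole page: an old scan by default,
# with a multispectral capture offered as an alternative.
page_images = {
    "@type": "oa:Choice",
    "default": "http://example.org/ms/img/f1r-old-scan.jpg",
    "item": ["http://example.org/ms/img/f1r-multispectral.tif"],
}

illumination = region(CANVAS, 120, 300, 800, 650)

annotations = [
    # Paint the page image(s) onto the entire, otherwise empty canvas.
    annotation("http://example.org/anno/1", page_images, CANVAS, "sc:painting"),
    # Hybrid page: a newer color scan of just the illumination, placed on its region.
    annotation("http://example.org/anno/2",
               "http://example.org/ms/img/f1r-illumination-color.jpg",
               illumination, "sc:painting"),
    # A scholar's comment attached to the same region.
    annotation("http://example.org/anno/3",
               {"@type": "cnt:ContentAsText", "chars": "Gold leaf retouched later?"},
               illumination, "oa:commenting"),
]

def on_canvas(annos: list, canvas: str) -> list:
    """Collect every annotation whose target is the canvas or a region of it."""
    return [a for a in annos if a["hasTarget"].split("#")[0] == canvas]

print(json.dumps(on_canvas(annotations, CANVAS), indent=2))
```

Because the page image is itself just one annotation on the canvas, a viewer could swap in the multispectral capture or overlay the color detail without touching the canvas. The same pattern would cover the music example: the body would be an audio recording and the target the region holding the notation.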

In summary, the SharedCanvas model uses open standards (HTML5, Open Annotation, TEI, etc.) and can be implemented in a distributed way across repositories, encouraging engagement by domain experts.

March 16, 2013

What The New Yorker doesn’t say about Aaron

I first read Larissa MacFarquhar’s New Yorker article on Aaron Swartz too quickly. But it doesn’t skim well. I found that encouraging.

I finally sat down to read it thoroughly a couple of days ago, and liked it very much. It’s beautifully written. More important, she does not have an hypothesis to bolster or an explanation to flog. She begins and ends with long quotes from the people around Aaron, without commenting on them. She is not arguing that he killed himself because he was clinically depressed — with the political subtext that comes from that reading — and she is not arguing that he killed himself because grownups overly burdened him, or even because of prosecutorial overreach. Larissa lets Aaron, his friends, and his family speak for themselves.

You come out of the piece with the idea that Aaron was complex, and that life wasn’t easy for him. The writer of the article’s subhead over-simplifies this into the sort of simple Theory of Aaron that the article itself avoids: “Aaron Swartz was brilliant and beloved. But the people who knew him best saw a darker side.” The article is more about complexity than darkness. It ties Aaron’s path from idea to cause to his moral commitments and to his many-cornered personality. Given the impossibility of capturing any human life in words, it does a good job.

But in the days after reading it, I’ve been bothered by something that Larissa leaves out. You don’t come out of it with a sense of what Aaron accomplished or of the impact of those accomplishments. I understand that Larissa was not attempting to write the definitive biography, and that she was more interested in exploring Aaron’s character in the context of those who loved him. But I’m afraid that a reader who comes to her article without knowing what Aaron actually did will leave with the impression that Aaron was too feckless and inconstant to translate his passions into achievement.

If Larissa decides to turn her article into a book, it will be important to bring readers to understand the maturity of Aaron’s achievements. Without that, a portrait of Aaron — no matter how open and beautiful — is necessarily misleading.


March 6, 2013

[2b2k] Cliff Lynch on preserving the ever-expanding scholarly record

Cliff Lynch is giving a talk this morning to the extended Harvard Library community on information stewardship. Cliff leads the Coalition for Networked Information, a project of the Association of Research Libraries and Educause, that is “concerned with the intelligent uses of information technology and networked information to enhance scholarship and intellectual life.” Cliff is helping the Harvard Library with the formulation of a set of information stewardship principles. Originally he was working with IT and the Harvard Library on principles, services, and initial projects related to digital information management. Given that his draft set of principles is broader than digital asset management, Cliff has been asked to address the larger community (says Mary Lee Kennedy).

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Cliff begins by saying that the principles he’s drafted are for discussion; how they apply to any particular institution is always a policy issue, with resource implications, that needs to be discussed. He says he’ll walk us through these principles, beginning with some concepts that underpin them.

When it comes to information stewardship, “university community” should include grad students whose research materials the university supports and maintains. Undergrads, too, to some extent. The presence of a medical school here also extends and smudges the boundaries.

Cliff then raises the policy question of the relation of the alumni to the university. There are practical reasons to keep the alumni involved, but particularly for grads of the professional schools, access to materials can be crucial.

He says he uses “scholarly record” for human-created things that convey scholarly ideas across time and space: books, journals, audio, web sites, etc. “This is getting more complicated and more diverse as time goes on.” E.g., authors’ software can be part of that record. And there is a growing set of data, experimental records, etc., that are becoming part of the scholarly record.

Research libraries need to be concerned about things that support scholarship but are not usually considered part of the scholarly record. E.g., newspapers, popular novels, movies. These give insight into the scholarly work. There are also datasets that are part of the evidentiary record, e.g., data about the Earth gathered from sensors. “It’s so hard to figure out when enough is enough.” But as more of it goes digital, it requires new strategies for acquisition, curation and access. “What are the analogs of historical newspapers for the 21st century?” he asks. They are likely to be databases from corporations that may merge and die and that have “variable and often haphazard policies about how they maintain those databases.” We need to be thinking about how to ensure that data’s continued availability.

Provision of access: Part of that is being able to discover things. This shouldn’t require knowing which Harvard-specific access mechanism to come to. “We need to take a broad view of access” so that things can be found through the “key discovery mechanisms of the day,” beyond the institution’s. (He namechecks the Digital Public Library of America.)

And access isn’t just for “the relatively low-bandwidth human reader.” [APIs, platforms and linked data, etc., I assume.]

Maintaining a record of the scholarly work that the community does is a core mission of the university. So, he says, in his report he’s used the vocabulary of obligation; that is for discussion.

The 5 principles

1. The scholarly output of the community should be captured, preserved, organized, and made accessible. This should include the evidence that underlies that output. E.g., the experimental data that underlies a paper should be preserved. This takes us beyond digital data to things like specimens and cell lines, and requires including museums and other partners. (Congress is beginning to delve into this, Cliff notes, especially with regard to preserving the evidence that enables experiments to be replicated.)

The university is not alone in addressing these needs.

2. A university has the obligation to provide its community with the best possible access to the overall scholarly record. This is something to be done in partnership with research libraries around the world. But Harvard has a “leadership role to play.”

Here we need to think about providing alumni with continued access to the scholarly record. We train students and then send them out into the world and cut off their access. “In many cases, they’re just out of luck. There seems to be something really wrong there.”

Beyond the scholarly record, there are issues about providing access to the cultural record and sources. No institution alone can do this. “There’s a rich set of partnerships” to be formed. It used to be easier to get that cultural record by buying it from book jobbers, DVD suppliers, etc. Now it’s data with differing license terms and subscription limitations. A lot of it is out on the public Web. “We’re all hoping that the Internet Archive will do a good job,” but most of our institutions of higher learning aren’t contributing to that effort. Some research libraries are creating interesting partnerships with faculty, collecting particular parts of the Web in support of particular research interests. “Those are signposts toward a future where the engagement to collect and preserve the cultural records scholars need is going to get much more complex” and require much more positive outreach by libraries, and much more discussion with the community (and the faculty in particular) about which elements are going to be important to preserve.

“Absolutely the desirable thing is to share these collections broadly,” as broadly as possible.

3. “The time has come to recognize that good stewardship means creating digital records of physical objects” in order to preserve them and make them accessible. They should be stored away from the physical objects.

4. A lot goes on here in addition to faculty research. People come through putting on performances, talks, colloquia. “You need a strategy to preserve these and get them out there.”

“The stakes are getting much higher” when it comes to archives. The materials are not just papers and graphs. They include old computers and storage materials, “a microcosm of all of the horrible consumer recording technology of the 20th century,” e.g., 8mm film, Sony Betamax, etc.

We also need to think about what to archive of the classroom. We don’t have to capture every calculus discussion section, but you want to get enough to give a sense of what went on in the courses. The documentation of teaching and learning is undergoing a tremendous change. The new classroom tech and MOOCs are creating lots of data, much of it personally identifiable. “Most institutions have little or no policies around who gets to see it, how long they keep it, what sort of informed consent they need from students.” It’s important data and very sensitive data. Policy and stewardship discussions are needed. There are also records management issues.

5. We know that scholarly communication is…being transformed by the affordances of digital technology (though not as fast as some of us would like; online scientific journals often look like paper versions). “Create an ongoing partnership with the community and with other institutions to extend and broaden the way scholarly communication happens. The institutional role is terribly important in this. We need to find the balances between innovation and sustainability.”


Q: Providing alumni with remote access is expensive. Harvard has about 100,000 living alumni, which includes people who spent one semester here. What sort of obligation does a university have to someone who, for example, spent a single semester here?

A: It’s something to be worked out. You can define alumnus as someone who has gotten a degree. You may ask for a co-payment. At some institutions, active members of the alumni association get some level of access. Also, grads of different schools may get access to different materials. Also, the most expensive items are typically those for which there is a commercial market. For example, professional-grade resources for the financial industry probably won’t allow licensing to alumni because it would cannibalize their market. On the other hand, it’s probably not expensive to make JSTOR available to alumni.

Q: [Robert Darnton] Very helpful. We’re working on all 5 principles at Harvard. But there is a fundamental problem: we have to advance simultaneously on the digital and analog fronts. More printed books are published each year, and the output of the digital increases even faster. The pressures on our budget are enormous. What do you recommend as a strategy? And do you think Harvard has a special responsibility since our library is so much bigger than any other except the Library of Congress? Smaller libraries can rely on Hathi etc. to acquire works.

A: “Those are really tough questions.” [audience laughs] It’s a large task but a finite one. Calculating how far a given amount of money would take an institution “is a really good opportunity for fund raising.” Put in place measures that talk about the percentage of the collection that’s available, rather than a raw number of images. But we are in a bad situation: continuing growth of traditional media (e.g., books), enormous expansion of digital resources. “My sense is…that for Harvard to be able to navigate this, it’s going to have to get more interdependent with other research libraries.” It’s ironic, because Harvard has been willing to shoulder enormous responsibility, and so has become a resource for other libraries. “It’s made life easier for a lot of the other research libraries” because they know Harvard will cover around the margins. “I’m afraid you may have to do that a little more for your scholars, and we are going to see more interdependence in the system. It’s unavoidable given the scope of the challenge.” “You need to be able to demonstrate that by becoming more interdependent, you’re getting more back than you’re giving up.” It’s a hard, core problem, and “the institutional traditions make the challenge here unique.”

February 17, 2013

DPLA does metadata right

The Digital Public Library of America’s policy on metadata was discussed during the recent board of directors call, and the DPLA is, in my opinion, getting it exactly and admirably right. (See Infodocket for links.) The metadata that the DPLA aggregates will be openly available and in the public domain. But just so there won’t be any doubt or confusion, the policy begins by saying that it does not believe that most metadata is subject to copyright in the first place. Then, to make sure, it adds:

To the extent that the DPLA’s own contributions to selecting and arranging such metadata may be protected by copyright, the DPLA dedicates such contributions to the public domain pursuant to a CC0 license.

And then, clearly and plainly:

Given the purposes of the policy and the copyright status of the metadata, and pursuant to the DPLA’s terms of service, the DPLA’s users are free to harvest, collect, modify, and/or otherwise use any metadata contained in the DPLA.



January 27, 2013

Alfred Russel Wallace’s letters go online, with a very buried CC license that maybe doesn’t apply anyway

The letters of Alfred Russel Wallace, co-discoverer of the theory of evolution by natural selection, are now online. As the Alfred Russel Wallace Correspondence Project explains, the collection consists of 4,000 letters gathered from about 100 different institutions, with about half in the British Natural History Museum and British Library.

The Correspondence Project has, admirably, been releasing the scans without waiting for transcription; more faster is better! Predictably and annoyingly, the letters, written by a man who died ten years before the Perpetual Copyright date of 1923, seem to be (but are they?) carefully obstructed by copyright: The Natural History Museum, which houses the collection, asserts copyright over “data held in the Wallace Letters Online database (including letter summaries)” [pdf oddly unreadable in Mac Preview]. Beyond the summaries, exactly what data is this referring to? Not sure. Don’t know.

But that isn’t the full story anyway, for the NHM sends us to the Wallace Fund for more information about the copyright. That page tells us that the unpublished letters are copyrighted until 2039, with this very helpful footnote:

Unless the work was published with the permission of his Literary Estate before 1 August 1989, in which case the work will be in copyright for 70 years after Wallace’s death, unless he died more than 20 years before the work’s publication, in which case copyright would expire 50 years after publication.
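The footnote is easier to follow as a decision procedure. Here is a minimal Python sketch that encodes it exactly as quoted, assuming Wallace’s 1913 death year and treating terms as whole calendar years; the function `copyright_ends` and its parameter are made up for illustration, and it models this one footnote, not UK copyright law generally.

```python
from typing import Optional

# Wallace died in 1913. Treating terms as whole calendar years is a
# simplifying assumption of this sketch.
DEATH_YEAR = 1913

def copyright_ends(estate_published_year: Optional[int] = None) -> int:
    """Year protection ends, following the Wallace Fund footnote as quoted.

    estate_published_year: year the work was published with the Literary
    Estate's permission before 1 August 1989, or None if that never happened.
    """
    if estate_published_year is None:
        # The page's default rule: unpublished letters are protected until 2039.
        return 2039
    if not DEATH_YEAR <= estate_published_year <= 1989:
        raise ValueError("expected an estate-authorized publication, 1913-1989")
    if estate_published_year - DEATH_YEAR > 20:
        # He died more than 20 years before publication:
        # copyright expires 50 years after publication.
        return estate_published_year + 50
    # Otherwise the work is in copyright for 70 years after his death.
    return DEATH_YEAR + 70

for year in (None, 1916, 1950, 1980):
    print(year, "->", copyright_ends(year))
```

Run as is, it prints 2039 for an unpublished letter, 1983 for one published with permission in 1916, 2000 for 1950, and 2030 for 1980. So, on this reading, a letter the estate let someone publish in 1980 would leave copyright nine years before its never-published siblings.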


Eventually it gets to some good news:

Authors wishing to publish such works would ordinarily need to obtain permission from the copyright holder before doing so. However, on July 31st 2011, in an attempt to facilitate the scholarly study of ARW’s writings, the co-executors of ARW’s Literary Estate agreed to allow third parties to publish ARW’s copyright works non-commercially without first having to ask the Literary Estate for permission, under the terms and conditions of Creative Commons license “Attribution-NonCommercial-ShareAlike 3.0 Unported”

So, are the letters published on the NHM site actually available under a Creative Commons non-commercial license? The Wallace Fund that aggregated them seems to think so. The NHM that published them maybe thinks not.

Because copyright is just so magical.


TWO HOURS LATER: Please see the first comment, from George Beccaloni, Director of the Wallace Correspondence Project. Thanks, George.

He explains that the transcribed text is available under a Creative Commons non-commercial license, but the digitized images are not. Plus some further complications, such as the content of the database being under copyright, although it is not clear from the site what data that is.

Since the aim of CC is to make it easier for people to re-use material, may I suggest (in the friendliest of fashions) that this be prominently clarified on the sites themselves?


January 15, 2013

Why we mourn

CNN asked me to write 600-800 words about Aaron Swartz. I demurred at first, suggested some other people who knew Aaron better — I met Aaron when he was young, stayed in touch, had the occasional meal with him, admired him and loved him more than he knew — and agreed when CNN came back to me.

I have trepidation about what I wrote, which CNN has now posted. I don’t like the implication that we can sum up any life so glibly. But I also wanted to do a little to nudge attention away from seeing Aaron solely as a champion of open information. I also decided not to assess the blame that is so well deserved, because that’s well discussed already.

A handful of better sources and expressions:

There’s so much more, because no life can be told.

Here is Aaron in his own words, in a presentation at the Freedom to Connect conference last May.

And here is Larry Lessig on Democracy Now:


