Joho the Blog » open access

May 3, 2015

NPR frees up 800,000 stories

NPR has announced that it’s making 800,000 pieces of audio embeddable anywhere you want, including on this blog.

When you browse their site you’ll find an “embed” button to the right of a story’s “Play” button. Click ‘n’ paste. (And at the bottom of the widget that you embed you’ll see a tiny, gray copyright notice.)

Thank you, NPR.


February 19, 2015

The joy of the public domain

When Doc Searls and I published our New Clues, we put it into the public domain. Even two months later, it feels good. In fact, seeing it reprinted in its entirety on someone else’s site fills me with an irrational exuberance.

Normally we would have put it under a Creative Commons BY license that entitles anyone to reuse it in whole or in part so long as they attribute it to us. CC BY is great. It takes the “#1. Ask permission” step out of the process by which what you write can be absorbed by your culture. Or anyone’s culture.

The public domain is different. A CC-BY license keeps a work copyrighted, but permits use without first asking permission. Works in the public domain are not copyrighted. Ok, so it’s more complex than that, but that’s basically it. A work in the public domain is like a folk song: you can sing it, you can change the words, you can record it and charge for the recording, you can print the lyrics on the front of your ice cream containers. You can even claim that you wrote it, although that would be wrong of you.

In practical terms, putting New Clues into the public domain [here’s how] really doesn’t do much that CC BY doesn’t do. Yes, someone could reprint our public domain document without crediting Doc and me, but they could do that with CC BY also — we’d have the right to insist that they provide attribution, but Doc and I are likely to use moral suasion in either case, by which I mean that we’d write a polite email to the evil doer. So, pragmatically, there isn’t much difference.

So why does putting it into the public domain make me happier? I get as close to smiling as my stony visage permits when I see a site that’s copied and pasted the whole thing. It makes it feel that what Doc and I wrote was really about what it says and less about what the writing says about Doc and me. The focus is where it should be.

And it feels deeply good to know that we have created something that can spread as far and deeply into the culture — and thus into people’s lives — as our culture wants. The only barriers are those of interest. And we’re not going to try to tease you with a snippet, with a taste. Not interested? Fine. It’s still there for anyone who is.

I expressed this to Peter Suber, who is dedicated full time to expanding the sphere and influence of Open Access works. Peter pointed out that my reaction rests in part on the privileged position I occupy: I can do some writing for free, and because Doc and I are known a bit within the domain of people who blab about the Internet, there’s a disincentive for people who might want to pass off our words as their own. If we were, say, unknown high school students it’d be easier for someone to get away with crudely plagiarizing our work. True enough.

Even so, putting work into the public domain feels good. I recommend you try it.


Peter Hirtle points out that Creative Commons 0 isn’t exactly the same as public domain, although functionally it’s identical. The whole question of trying to eliminate all copyright interests in a work is vexed. Peter points here for details and evidence of the complexity of the issue. Thanks, Peter!


February 2, 2015

Future of libraries, Kenya style

This video will remind you, if you happen to have forgotten, what libraries mean to much of the world:

Internet, mesh, people eager to learn, the same people eager to share. A future for libraries.

You can contribute here.


November 24, 2014

[siu] Accessing content

Alex Hodgson of ReadCube is leading a panel called “Accessing Content: New Thinking and New Business Models for Accessing Research Literature” at the Shaking It Up conference.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Robert McGrath is from ReadCube, a platform for managing references. You import your PDFs, read them with their enhanced reader, and can annotate them and discover new content. You can click on references in the PDF and go directly to the sources. If you hit a pay wall, they provide a set of options, including a temporary “checkout” of the article for $6. Academic libraries can set up a fund to pay for such access.

Eric Hellman talks about Unglue.it. Everyone in the book supply chain wants a percentage. But free e-books break the system because there are no percentages to take. “Even libraries hate free ebooks.” So, how do you give access to Oral Literature in Africa in Africa? Unglue.it ran a campaign, raised money, and liberated it. How do you get free textbooks into circulation? Teachers don’t know what’s out there. Unglue.it is creating MARC records for these free books to make it easy for libraries to include them. The novel Zero Sum Game is a great book that the author put out under a Creative Commons license, but how do you find out that it’s available? Likewise for Barbie: A Computer Engineer, which is a legal derivative of a much worse book. Unglue.it has over 1,000 Creative Commons licensed books in its collection. One of Unglue.it’s projects: an author pledges to make the book available for free after a revenue target has been met. [Great! A bit like the Library License project from the Harvard Library Innovation Lab.] They’re now doing Thanks for Ungluing, which aggregates free ebooks and lets you download them for free or pay the author for it. [Plug: John Sundman’s Biodigital is available there. You definitely should pay him for it. It’s worth it.]

Marge Avery, ex of MIT Press and now at MIT Library, says the traditional barriers to access are price, time, and format. There are projects pushing on each of these. But she mainly wants to talk about format. “What does content want to be?” Academic authors often have research that won’t fit in the book. Univ presses are experimenting with shorter formats (MIT Press Bits), new content (Stanford Briefs), and publishing developing, unfinished content that will become a book (U of Minnesota). Cambridge Univ Press published The History Manifesto, which was created start to finish in four months and is available as Open Access as well as for a reasonable price; they’ve sold as many copies as free copies have been downloaded, which is great.

William Gunn of Mendeley talks about next-gen search. “Search doesn’t work.” Paul Kedrosky was looking for a dishwasher and all he found was spam. (Dishwashers, and how Google Eats Its Own Tail). Likewise, Jeff Atwood of StackExchange: “Trouble in the House of Google.” And we have the same problems in scholarly work. E.g., Google Scholar includes this as a scholarly work. Instead, we should be favoring push over pull, as at Mendeley. Use behavior analysis, etc. “There’s a lot of room for improvement” in search. He shows a Mendeley search. It auto-suggests keyword terms and then lets you facet.

Jenn Farthing talks about JSTOR’s “Register and Read” program. JSTOR has 150M content accesses per year, 9,000 institutions, 2,000 archival journals, 27,000 books. Register and Read: Free limited access for everyone. Piloted with 76 journals. Up to 3 free reads over a two-week period. Now there are about 1,600 journals, and 2M users who have checked out 3.5M articles. (The journals are opted in to the program by their publishers.)


Q: What have you learned in the course of these projects?

ReadCube: UI counts. Tracking onsite behavior is therefore important. Iterate and track.

Marge: It’d be good to have more metrics outside of sales. The circ of the article is what’s really of importance to the scholar.

Mendeley: Even more attention to the social relationships among the contributors and readers.

JSTOR: You can’t search for only content that’s available to you through Register and Read. We’re adding that.

Unglue.it started out as a crowdfunding platform for free books. We didn’t realize how broken the supply chain is. Putting a book on a Web site isn’t enough. If we were doing it again, we’d begin with what we’re doing now, Thanks for Ungluing, gathering all the free books we can find.

Q: How to make it easier for beginners?

Unglue.it: The publishing process is designed to prevent people from doing stuff with ebooks. That’s a big barrier to the adoption of ebooks.

ReadCube: Not every reader needs a reference manager, etc.

Q: Even beginning students need articles to interoperate.

Q: When ReadCube negotiates prices with publishers, how does it go?

ReadCube: In our pilots, we haven’t seen any decline in the PDF sales. Also, the cost per download in a site license is a different sort of thing than a $6/day cost. A site license remains the most cost-effective way of acquiring access, so what we’re doing doesn’t compete with those licenses.

Q: The problem with the pay model is that you can’t appraise the value of the article until you’ve paid. Many pay models don’t recognize that barrier.

ReadCube: All the publishers have agreed to first-page previews, often to seeing the diagrams. We also show a blurred out version of the pages that gives you a sense of the structure of the article. It remains a risk, of course.

Q: What’s your advice for large legacy publishers?

ReadCube: There’s a lot of room to explore different ways of brokering access — different potential payers, doing quick pilots, etc.

Mendeley: Make sure your revenue model is in line with your mission, as Geoff said in the opening session.

Marge: Distinguish the content from the container. People will pay for the container for convenience. People will pay for a book in Kindle format, while the content can be left open.

Mendeley: Reading a PDF is of human value, but computing across multiple articles is of emerging value. So we should be getting past the single reader business model.

JSTOR: Single article sales have not gone down because of Register and Read. They’re different users. Traditional publishers should cut their cost basis. They have fancy offices in expensive locations. They need to start thinking about how they can cut the cost of what they do.


[siu] Panel: Capturing the research lifecycle

It’s the first panel of the morning at Shaking It Up. Six men from six companies give brief overviews of their products. The session is led by Courtney Soderberg from the Center for Open Science, which sounds great. [Six panelists means that I won’t be able to keep up. Or keep straight who is who, since there are no name plates. So, I’ll just distinguish them by referring to them as “Another White Guy,” ‘k?]

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Riffyn: “Manufacturing-grade quality in the R&D process.” This can easily double R&D productivity “because you stop missing those false negatives.” It starts with design

GitHub: “GitHub is a place where people do software development together.” 10M people. 15M software repositories. He points to Zenodo, a repository for research outputs. Open source communities are better at collaborating than most academic research communities are. The principles of open source can be applied to private projects as well. A key principle: everything has a URL. Also, the processes should be “lock-free” so they can be done in parallel and the decision about branching can be made later.

Texas Advanced Computing Center: Agave is a Science-as-a-Service platform. It’s a platform that provides lots of services as well as APIs. “It’s SalesForce for science.”

CERN is partnering with GitHub. “GitHub meets Zenodo.” But it also exports the software into INSPIRE which links the paper with the software. [This might be the INSPIRE he’s referring to. Sorry. I know I should know this.]

Overleaf was inspired by Etherpad, the collaborative editor. But Etherpad doesn’t do figures or equations. Overleaf does that and much more.

Publiscize helps researchers translate their work into terms that a broader audience can understand. He sees three audiences: intradisciplinary, interdisciplinary, and the public. The site helps scientists create a version readable by the public, and helps them disseminate it through social networks.


Some white guys provided answers I couldn’t quite hear to questions I couldn’t hear. They all seem to favor openness, standards, users owning their own data, and interoperability.

[They turned on the PA, so now I can hear. Yay. I missed the first couple of questions.]

GitHub: Libraries have uploaded 100,000 open access books, all for free. “Expect the unexpected. That happens a lot.” “Academics have been among the most abusive of our platform…in the best possible way.”

Zenodo: The most unusual users are the ones who want to install a copy at their local institutions. “We’re happy to help them fork off Zenodo.”

Q: Where do you see physical libraries fitting in?

AWG: We keep track of some people’s libraries.

AWG: People sometimes accidentally delete their entire company’s repos. We can get them back for you easily if you do.

AWG: Zenodo works with Chris Erdmann at Harvard Library.

AWG: We work with FigShare and others.

AWG: We can provide standard templates for Overleaf so, for example, your grad students’ theses can be managed easily.

AWG: We don’t do anything particular with libraries, but libraries are great.

Courtney: We’re working with ARL on a shared notification system.

Q: Mr. GitHub (Arfon Smith), you said in your comments that reproducibility is a workflow issue?

GitHub: You get reproducibility as a by-product of using tools like the ones represented on this panel. [The other panelists agree. Reproducibility should be just part of the infrastructure that you don’t have to think about.]


November 15, 2013

[2b2k] Big Data and the Commons

I’m at the Engaging Big Data 2013 conference put on by Senseable City Lab at MIT. After the morning’s opener by Noam Chomsky (!), I’m leading one of 12 concurrent sessions. I’m supposed to talk for 15-20 mins and then lead a discussion. Here’s a summary of what I’m planning on saying:

Overall point: To look at the end state of the knowledge network/Commons we want to get to

Big Data started as an Info Age concept: magnify the storage and put it on a network. But you can see how the Net is affecting it:

First, there are a set of values that are being transformed:
– From accuracy to scale
– From control to innovation
– From ownership to collaboration
– From order to meaning

Second, the Net is transforming knowledge, which is changing the role of Big Data
– From filtered to scaled
– From settled to unsettled and under discussion
– From orderly to messy
– From done in private to done in public
– From a set of stopping points to endless links

If that’s roughly the case, then we can see a larger Net effect. The old Info Age hope (naive, yes, but it still shows up at times) was that we’d be able to create models that ultimately interoperate and provide an ever-increasing and ever-more detailed integrated model of the world. But in the new Commons, we recognize that not only won’t we ever derive a single model, there is tremendous strength in the diversity of models. This Commons then is enabled if:

  • All have access to all
  • There can be social engagement to further enrich our understanding
  • The conversations default to public

So, what can we do to get there? Maybe:

  • Build platforms and services
  • Support Open Access (and, as Lewis Hyde says, “beat the bounds” of the Commons regularly)
  • Support Linked Open Data

Questions if the discussion needs kickstarting:

  • What Big Data policies would help the Commons to flourish?
  • How can we improve the diversity of those who access and contribute to the Commons?
  • What are the personal and institutional hesitations that are hindering the further development of the Commons?
  • What role can and should Big Data play in knowledge-focused discussions? With participants who are not mathematically or statistically inclined?
  • Does anyone have experience with Linked Data? Tell us about it?


I just checked the agenda, which of course I should have done earlier, and discovered that of the 12 sessions today, 1211 are being led by men. Had I done that homework, I would not have accepted their invitation.


November 9, 2013

Aaron Swartz and the future of libraries

I was unable to go to our local Aaron Swartz Hackathon, one of twenty around the world, because I’d committed (very happily) to give the after dinner talk at the University of Rhode Island Graduate Library and Information Studies 50th anniversary gala last night.

The event brought together an amazing set of people, including Senator Jack Reed, the current and most recent presidents of the American Library Association, Joan Ress Reeves, 50 particularly distinguished alumni (out of the three thousand (!) who have been graduated), and many, many more. These are heroes of libraries. (My cousin’s daughter, Alison Courchesne, also got an award. Yay, Alison!)

Although I’d worked hard on my talk, I decided to open it differently. I won’t try to reproduce what I actually said because the adrenalin of speaking in front of a crowd, especially one as awesome as last night’s, wipes out whatever short term memory remains. But it went very roughly something like this:

It’s awesome to be in a room with teachers, professors, researchers, a provost, deans, and librarians: people who work to make the world better…not to mention the three thousand alumni who are too busy do-ing to be able to be here tonight.

But it makes me remember another do-er: Aaron Swartz, the champion of open access, open data, open metadata, open government, open everything. Maybe I’m thinking about Aaron tonight because today is his birthday.

When we talk about the future of libraries, I usually promote the idea of libraries as platforms — platforms that make openly available everything that libraries know: all the data, all the metadata, what the community is making of what they get from the library (privacy accommodated, of course), all the guidance and wisdom of librarians, all the content especially if we can ever fix the insane copyright laws. Everything. All accessible to anyone who wants to write an application that puts it to use.

And the reason for that is that in my heart I don’t think librarians are going to invent the future of libraries. It’s too big a job for any one group. It will take the world to invent the future of libraries. It will take 14 year olds like Aaron to invent the future of libraries. We need to supply them with platforms that enable them.

I should add that I co-direct a Library Innovation Lab where we do work that I’m very proud of. So, of course libraries will participate in the invention of their future. But it’ll take the world — a world that contains people with the brilliance and commitment of an Aaron Swartz — to invent that future fully.


Here are wise words delivered at an Aaron Hackathon last night by Carl Malamud: Hacking Authority. For me, Carl is reminding us that the concept of hacking over-promises when the changes threaten large institutions that represent long-held values and assumptions. Change often requires the persistence and patience that Aaron exhibited, even as he hacked.


October 24, 2013


The Emily Dickinson archive went online today. It’s a big deal not only because of the richness of the collection, and the excellent technical work by the Berkman Center, but also because it is a good sign for Open Access. Amherst, one of the major contributors, had open accessed its Dickinson material earlier, and now the Harvard University Press has open accessed some of its most valuable material. Well done!

The collection makes available in one place the great Dickinson collections held by Amherst, Harvard, and others. The metadata for the items is (inevitably) inconsistent in terms of its quantity, but the system has been tuned so that items with less metadata are not systematically overwhelmed by its search engine.

The Berkman folks tell me that they’re going to develop an open API. That will be extra special cool.


October 20, 2013

[templelib] Charles Watkinson: “The Library in the Digital Age”

At Temple University’s symposium in honor of the inauguration of the University’s new president, on Oct. 18, 2013.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Charles Watkinson is Director, Purdue Univ. Press. He says he wishes everyone were like Bryn [see prior post]. But univ. presses generally only receive 15% of their income from the university. So, Bryn’s model isn’t generally applicable.

His toddlers watch Dinosaur Train. “I know you perceive university presses as dinosaurs” but as in the show, some dinosaurs are different from others.

John Thompson in Books in the Digital Age talks about “publishing fields.” He says it’s complex but not without order. We’re seeing the emergence of several different mission-driven publishers: university presses, scholarly societies, library presses. He will talk about univ and library presses. (He points to Envisioning Emancipation as a univ. press at its best.) He goes through some of the similarities and differences between the two presses.

He takes as a case study the Purdue U Press and Purdue Scholarly Publishing Services as an example of how these types of presses can be complementary. (He mentions Anne Kenney’s partnering of Cornell Library with Duke U Press on Project Euclid.)

The aim, Charles says, is to meet the full spectrum of needs, ranging from pre-print to published books. He points to the differences in brand styles of the two and how they can be merged.

So, “What can we do together that we couldn’t do apart?”

“We can serve campus needs better.” He points to the Journal of Purdue Undergraduate Research, which combines library skills (instruction, assessment, institutional outreach) with publisher skills (solicitation for content, project management, editing, design).

Also, together they can support disciplines. E.g., Habri Central. Library skills: bibliographic research, taxonomy, metadata, licensing, preservation. Publisher skills: financial management, acquisition of original content, marketing.

Also, solve issues in the system. E.g., the underlying data behind tech reports, e.g., JTRP. Library skills: digitization, metadata, online hosting, linked data, preservation. Publisher skills: peer review administration, process redesign, project management.

Questions for these merged entities: What disciplines can best be served together? How to build credibility? How to turn projects into programs? What is the future role of earned revenues? Will all products be Open Access? What is the sustainability plan for OA?

Maybe libraries should turn to university presses for advice and help with engagement since “that’s what university presses do.”


Bryn Geffert: Libraries as publishers

At Temple University’s symposium in honor of the inauguration of the University’s new president, on Oct. 18, 2013.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Bryn Geffert is College Librarian at Amherst.

Imagine a biologist at Amherst who writes a science article. Who paid for her to write that article? Amherst. But who paid Amherst? Students. Alumni and donors. US funds.

Now it’s accepted by Elsevier. The biologist gives it to Elsevier as a gift, in effect. Elsevier charges Amherst $24,000/year for a subscription to this particular journal. It’s Looney Tunes, Bryn says. There isn’t a worse imaginable model.

Since 1986, serial [= journal] prices have increased 400%. Why? Because a few publishers have a monopoly: Wiley, Elsevier, Springer. With increasing prices for serials, libraries have less money for books. In 1986, academic libraries spent 46% of budgets on books. Now it’s down to 22%. And the effect on book publishers is even worse: when they can’t sell books to libraries, they shut down publishing in entire disciplinary fields. The average sales per academic book is now 200 copies. Since 1993, 5 disciplines have lost presses. E.g., the number of presses serving British Lit has dropped by about half. More and more academic works are going to bad commercial presses — bad in that they don’t improve what they get.

These are just the problems of wealthy institutions. How about the effect on developing countries? He gives three examples of work of direct relevance to local cultures where the local culture cannot afford to buy the work.

University presses are dying. Money to purchase anything except journals is dying. Academic presses are dying. And we’re paying no attention to the world around us.

Why does Amherst care? Their motto is “terras irradient”: light the world. But nothing in this model supports that model.

What do we have to do? He goes through these quickly because, he says, we are familiar with them:

  1. Open Access policies
  2. Legislation that mandates that federally supported research be Open Access
  3. Go after the monopolies that are violating anti-trust
  4. Libraries have to boycott offenders.

But even so, we need to design a new system.

Amherst is asking what the mission of a university press is. Part of it: make good work even better and make it as widely available as possible.

What is the mission of the academic libraries? Make good info as widely available as possible.

So, combine forces. U of Mich put its press under the library. This inspired Amherst. But Amherst doesn’t have a press. So, they’re creating one.

  • Everything will be online, Open Access (Creative Commons)

  • They will hustle to get manuscripts

  • All will be peer reviewed and rigorously edited

But how will they pay for it? Amherst’s Frost Library is giving two positions to the press. In return, those editors will solicit manuscripts. The President will raise money to endow a chair of the editor of the press. They’ll take some money from the Library to pay freelancers for copy-editing. Some other units at Amherst are kicking in other services, including design and building an online platform.

People say this is too small to make a difference. But other schools are starting to do similar things. This means that Amherst is a recipient of free content from them. Bryn can imagine a time when there’s so much OA content that the savings realized offset the costs of publishing OA content.

The goal is to move away from individual presses looking out for their own interests to one in which there’s free sharing. “I want to see a world in which the students at a university in Nairobi have access to the same information as students at Columbia.”



Joho the Blog by David Weinberger is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

Creative Commons license: Share it freely, but attribute it to me, and don't use it commercially without my permission.
