Joho the Blog » libraries

June 13, 2013

[eim][misc] Tagging rises

Both Facebook and Apple have announced the use of tags. Yay!

Tags have continued to percolate through the ecosystem after their most auspicious introduction in Delicious.com. (Note the phrase “most auspicious”; tags have always been with us.) It’s great to see them increase both because they are a great way to get use out of the craziness while preserving it in its original form for others, and because there is great value in scaling tags, as Flickr has shown.

So, yay for tags. And yay for the crazy.

May 20, 2013

[misc] The loneliness of the long distance ISBN

NOTE on May 23: OCLC has posted corrected numbers. I’ve corrected them in the post below; the changes are mainly fractional. So you can ignore the note immediately below.

NOTE a couple of hours later: OCLC has discovered a problem with the analysis. So please ignore the following post until further notice. Apologies from the management.

Ever since the 1960s, publishers have used ISBN numbers as identifiers of editions of books. Since the world needs unique ways to refer to unique books, you would think that ISBN would be a splendid solution. Sometimes and in some instances it is. But there are problems, highlighted in the latest analysis run by OCLC on its database of almost 300 million records.

Number of ISBNs    Percentage of the records
0                  77.71%
2                  18.77%
1                  1.25%
4                  1.44%
3                  0.21%
6                  0.14%
8                  0.04%
5                  0.02%
10                 0.02%
12                 0.01%

So, 78% of OCLC’s humungous collection of book records have no ISBN, and only 1.6% have the single ISBN that God intended.

As Roy Tennant [twitter: royTennant] of OCLC points out (and thanks to Roy for providing these numbers), many works in this collection of records pre-date the 1960s. Even so, the books with multiple ISBNs reflect the weakness of ISBNs as unique identifiers. ISBNs are essentially SKUs to identify a product. The assigning of ISBNs is left up to publishers, and they assign a new one whenever they need to track a book as an inventory item. This does not always match how the public thinks about books. When you want to refer to, say, Moby-Dick, you probably aren’t distinguishing between one with illustrations, a large-print edition, and one with an introduction by the Deadliest Catch guys. But publishers need to make those distinctions, and that’s who ISBN is intended to serve.

This reflects the more general problem that books are complex objects, and we don’t have settled ways of sorting out all the varieties allowed within the concept of the “same book.” Same book? I doubt it!

Still, these numbers from OCLC exhibit more confusion within the ISBN number space than I’d expected.

MINUTES LATER: Folks on a mailing list are wondering if the very high percentage of records with two ISBNs is due to the introduction of 13-digit ISBNs to supplement the initial 10-digit ones.
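
To make that hypothesis concrete, here is a minimal Python sketch (mine, not OCLC’s analysis) of how a 10-digit ISBN maps onto its 13-digit form: prefix 978, drop the old check digit, and compute a new one using alternating weights of 1 and 3. The function names and the sample record are hypothetical, and the ISBN shown is a commonly cited textbook example, not one drawn from the OCLC data.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-digit ISBN to its 13-digit equivalent (978 prefix)."""
    compact = isbn10.replace("-", "").replace(" ", "")
    if len(compact) != 10:
        raise ValueError("expected a 10-character ISBN")
    # Keep the first nine digits; the old check digit (possibly 'X') is dropped.
    core = "978" + compact[:9]
    if not core.isdigit():
        raise ValueError("ISBN contains non-digit characters")
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... across 12 digits.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


def count_distinct_isbns(isbns):
    """Count ISBNs on a record after folding 10-digit forms into 13-digit ones."""
    normalized = set()
    for isbn in isbns:
        compact = isbn.replace("-", "").replace(" ", "")
        if len(compact) == 10:
            compact = isbn10_to_isbn13(compact)
        normalized.add(compact)
    return len(normalized)


# Hypothetical record that lists the same edition under both forms.
record_isbns = ["0-306-40615-2", "978-0-306-40615-7"]
print(isbn10_to_isbn13(record_isbns[0]))
print(count_distinct_isbns(record_isbns))
```

If a record carries both forms of the same number, a count like this collapses them into one, which is the effect the list members suspect is inflating the two-ISBN row.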

May 15, 2013

<no_sarcasm>Lucky me</no_sarcasm>

I had a lovely time at the University of Toronto Faculty of Information yesterday afternoon. About twenty of us talked for two hours about library innovation. It reminded me: how much I like hanging out with librarians; how eager people are to invent, collaborate, and play; how lucky I am to work in an open space for innovation (the Harvard Library Innovation Lab) with such a talented, creative group; how much I love Toronto.

April 18, 2013

[misc] StackLife goes live – visually browse millions of books

I’m very proud to announce that the Harvard Library Innovation Lab (which I co-direct) has launched what we think is a useful and appealing way to browse books at scale. This is timed to coincide with the launch today of the Digital Public Library of America. (Congrats, DPLA!!!)

StackLife (née ShelfLife) shows you a visualization of books on a scrollable shelf, which we turn sideways so you can read the spines. It always shows you books in a context, on the grounds that no book stands alone. You can shift the context instantly, so that you can (for example) see a work on a shelf with all the other books classified under any of the categories professional cataloguers have assigned to it.

We also heatmap the books according to various usage metrics (“StackScore”), so you can get a sense of the work’s community relevance.

There are lots more features, and lots more to come.

We’ve released two versions today.

StackLife DPLA mashes up the books in the Digital Public Library of America’s collection (from the Biodiversity Heritage Library) with books from The Internet Archive’s Open Library and the Hathi Trust. These are all online, accessible books, so you can just click and read them. There are 1.7M in the StackLife DPLA metacollection. (Development was funded in part by a Sprint grant from the DPLA. Thank you, DPLA!)

StackLife Harvard lets you browse the 12.3M books and other items in the Harvard Library system’s 73 libraries and off-campus repository. This is much less about reading online (unfortunately) than about researching what’s available.

Here are some links:

StackLife DPLA: http://stacklife-dpla.law.harvard.edu
StackLife Harvard: http://stacklife.law.harvard.edu
The DPLA press release: http://library.harvard.edu/stacklife-browse-read-digital
The DPLA version FAQ: http://stacklife-dpla.law.harvard.edu/#faq/

The StackLife team has worked long and hard on this. We’re pretty durn proud:

Annie Cain
Paul Deschner
Kim Dulin
Jeff Goldenson
Matthew Phillips
Caleb Troughton

March 6, 2013

[2b2k] Cliff Lynch on preserving the ever-expanding scholarly record

Cliff Lynch is giving a talk this morning to the extended Harvard Library community on information stewardship. Cliff leads the Coalition for Networked Information, a project of the Association of Research Libraries and Educause, that is “concerned with the intelligent uses of information technology and networked information to enhance scholarship and intellectual life.” Cliff is helping the Harvard Library with the formulation of a set of information stewardship principles. Originally he was working with IT and the Harvard Library on principles, services, and initial projects related to digital information management. Given that his draft set of principles is broader than digital asset management, Cliff has been asked to address the larger community (says Mary Lee Kennedy).

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Cliff begins by saying that the principles he’s drafted are for discussion; how they apply to any particular institution is always a policy issue, with resource implications, that needs to be discussed. He says he’ll walk us through these principles, beginning with some concepts that underpin them.

When it comes to information stewardship, “university community” should include grad students whose research materials the university supports and maintains. Undergrads, too, to some extent. The presence of a medical school here also extends and smudges the boundaries.

Cliff then raises the policy question of the relation of the alumni to the university. There are practical reasons to keep the alumni involved, but particularly for grads of the professional schools, access to materials can be crucial.

He says he uses “scholarly record” for human-created things that convey scholarly ideas across time and space: books, journals, audio, web sites, etc. “This is getting more complicated and more diverse as time goes on.” E.g., author’s software can be part of that record. And there is a growing set of data, experimental records, etc., that are becoming part of the scholarly record.

Research libraries need to be concerned about things that support scholarship but are not usually considered part of the historical record. E.g., newspapers, popular novels, movies. These give insight into the scholarly work. There are also datasets that are part of the evidentiary record, e.g., data about the Earth gathered from sensors. “It’s so hard to figure out when enough is enough.” But as more of it goes digital, it requires new strategies for acquisition, curation and access. “What are the analogs of historical newspapers for the 21st century?” he asks. They are likely to be databases from corporations that may merge and die and that have “variable and often haphazard policies about how they maintain those databases.” We need to be thinking about how to ensure that data’s continued availability.

Provision of access: Part of that is being able to discover things. This shouldn’t require knowing which Harvard-specific access mechanism to come to. “We need to take a broad view of access” so that things can be found through the “key discovery mechanisms of the day,” beyond the institution’s. (He namechecks the Digital Public Library of America.)

And access isn’t just for “the relatively low-bandwidth human reader.” [APIs, platforms and linked data, etc., I assume.]

Maintaining a record of the scholarly work that the community does is a core mission of the university. So, he says, in his report he’s used the vocabulary of obligation; that is for discussion.

The 5 principles

1. The scholarly output of the community should be captured, preserved, organized, and made accessible. This should include the evidence that underlies that output. E.g., the experimental data that underlies a paper should be preserved. This takes us beyond digital data to things like specimens and cell lines, and requires including museums and other partners. (Congress is beginning to delve into this, Cliff notes, especially with regard to preserving the evidence that enables experiments to be replicated.)

The university is not alone in addressing these needs.

2. A university has the obligation to provide its community with the best possible access to the overall scholarly record. This is something to be done in partnership with research libraries around the world. But Harvard has a “leadership role to play.”

Here we need to think about providing alumni with continued access to the scholarly record. We train students and then send them out into the world and cut off their access. “In many cases, they’re just out of luck. There seems to be something really wrong there.”

Beyond the scholarly record, there are issues about providing access to the cultural record and sources. No institution alone can do this. “There’s a rich set of partnerships” to be formed. It used to be easier to get that cultural record by buying it from book jobbers, DVD suppliers, etc. Now it’s data with differing license terms and subscription limitations. A lot of it’s out on the public Web. “We’re all hoping that the Internet Archive will do a good job,” but most of our institutions of higher learning aren’t contributing to that effort. Some research libraries are creating interesting partnerships with faculty, collecting particular parts of the Web in support of particular research interests. “Those are signposts toward a future where the engagement to collect and preserve the cultural records scholars need is going to get much more complex” and require much more positive outreach by libraries, and much more discussion with the community (and the faculty in particular) about which elements are going to be important to preserve.

“Absolutely the desirable thing is to share these collections broadly,” as broadly as possible.

3. “The time has come to recognize that good stewardship means creating digital records of physical objects” in order to preserve them and make them accessible. They should be stored away from the physical objects.

4. A lot goes on here in addition to faculty research. People come through putting on performances, talks, colloquia. “You need a strategy to preserve these and get them out there.”

“The stakes are getting much higher” when it comes to archives. The materials are not just papers and graphs. They include old computers and storage materials, “a microcosm of all of the horrible consumer recording technology of the 20th century,” e.g., 8mm film, Sony Betamax, etc.

We also need to think about what to archive of the classroom. We don’t have to capture every calculus discussion section, but you want to get enough to give a sense of what went on in the courses. The documentation of teaching and learning is undergoing a tremendous change. The new classroom tech and MOOCs are creating lots of data, much of it personally identifiable. “Most institutions have little or no policies around who gets to see it, how long they keep it, what sort of informed consent they need from students.” It’s important data and very sensitive data. Policy and stewardship discussions are needed. There are also record management issues.

5. We know that scholarly communication is…being transformed (not as fast as some of us would like; online scientific journals often look like paper versions) by the affordances of digital technology. “Create an ongoing partnership with the community and with other institutions to extend and broaden the way scholarly communication happens.” The institutional role is terribly important in this. We need to find the balances between innovation and sustainability.

Q&A

Q: Providing alumni with remote access is expensive. Harvard has about 100,000 living alumni, which includes people who spent one semester here. What sort of obligation does a university have to someone who, for example, spent a single semester here?

A: It’s something to be worked out. You can define alumnus as someone who has gotten a degree. You may ask for a co-payment. At some institutions, active members of the alumni association get some level of access. Also, grads of different schools may get access to different materials. Also, the most expensive items are typically those for which there is a commercial market. For example, professional grade resources for the financial industry probably won’t allow licensing to alumni because it would cannibalize their market. On the other hand, it’s probably not expensive to make JSTOR available to alumni.

Q: [Robert Darnton] Very helpful. We’re working on all 5 principles at Harvard. But there is a fundamental problem: we have to advance simultaneously on the digital and analog fronts. More printed books are published each year, and the output of the digital increases even faster. The pressures on our budget are enormous. What do you recommend as a strategy? And do you think Harvard has a special responsibility since our library is so much bigger, except for the Library of Congress? Smaller libraries can rely on Hathi etc. to acquire works.

A: “Those are really tough questions.” [audience laughs] It’s a large task but a finite one. Calculating how much money would take an institution how far “is a really good opportunity for fund raising.” Put in place measures that talk about the percentage of the collection that’s available, rather than a raw number of images. But, we are in a bad situation: continuing growth of traditional media (e.g., books), enormous expansion of digital resources. “My sense is…that for Harvard to be able to navigate this, it’s going to have to get more interdependent with other research libraries.” It’s ironic, because Harvard has been willing to shoulder enormous responsibility, and so has become a resource for other libraries. “It’s made life easier for a lot of the other research libraries” because they know Harvard will cover around the margins. “I’m afraid you may have to do that a little more for your scholars, and we are going to see more interdependence in the system. It’s unavoidable given the scope of the challenge.” “You need to be able to demonstrate that by becoming more interdependent, you’re getting more back than you’re giving up.” It’s a hard core problem, and “the institutional traditions make the challenge here unique.”

February 17, 2013

DPLA does metadata right

The Digital Public Library of America’s policy on metadata was discussed during the recent board of directors call, and the DPLA is, in my opinion, getting it exactly and admirably right. (See Infodocket for links.) The metadata that the DPLA aggregates will be openly available and in the public domain. But just so there won’t be any doubt or confusion, the policy begins by saying that it does not believe that most metadata is subject to copyright in the first place. Then, to make sure, it adds:

To the extent that the DPLA’s own contributions to selecting and arranging such metadata may be protected by copyright, the DPLA dedicates such contributions to the public domain pursuant to a CC0 license.

And then, clearly and plainly:

Given the purposes of the policy and the copyright status of the metadata, and pursuant to the DPLA’s terms of service, the DPLA’s users are free to harvest, collect, modify, and/or otherwise use any metadata contained in the DPLA.

Nice!

December 19, 2012

DPLA looking for an executive director

The Digital Public Library of America is looking for an executive director. This is an incredible opportunity to make a difference.

I think it’d be fantastic if this person were to come out of the large, community-based Web collaboration space, but there are many other ways for the DPLA to go right. The search committee is pretty fabulous, so I have confidence that this is going to be an amazing hire.

The DPLA team gave a presentation at Berkman yesterday, and has been showing some initial work, including a collaboration with Europeana and wireframes of a front page. It’s looking very good for the April launch date.

Our little group, the Harvard Library Innovation Lab, is working on a visual browser for books within the DPLA collection, so we’re pretty excited.

September 5, 2012

[2b2k] Library as platform

Library Journal just posted my article “Library as Platform.” It’s likely to show up in their print version in October.

It argues that there are reasons why libraries ought to think of themselves not as portals but as open platforms that give access to all the information and metadata they can, through human readable and computer readable forms.

July 30, 2012

E-book licensing by libraries: an overview

The Berkman Center’s David O’Brien, Urs Gasser, and John Palfrey have just posted a 29-page “briefing paper” on the various models and licenses by which libraries are providing access to e-books.

It’s not just facts ‘n’ stats by any means, but here are some anyway:

“According to the 2011 Library Journal E-Book Survey, 82% of libraries currently offer access to e-books, which reflects an increase of 10 percentage points from 2010. … Libraries maintain an average of 4,350 e-book copies in a collection.”

“[T]he publisher-to-library market across all formats and all libraries (e.g., private, public, governmental, academic, research, etc.) is approximately $1.9B; of this, the market for public libraries is approximately $850M”

92% of libraries use OverDrive as their e-book dealer

Of the major publishers, only Random House allows unrestricted lending of e-books.

I found the section on business models to be particularly clarifying.

July 24, 2012

[preserve] Lightning Talks

A series of 5-min lightning talks.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Christie Moffatt of the National Library of Medicine talks about a project collecting blogs talking about health. It began in 2011. The aim is to understand Web archiving processes and how this could be expanded. Three examples: Wheelchair Kamikaze. Butter Compartment. Doctor David’s Blog. They were able to capture them pretty well, but with links to outside, outside of scope content, and content protected by passwords, there’s a question about what it means to “capture” a blog. The project has shown the importance of test crawls, and attending to the scope, crawling frequency and duration. The big question is which blogs they capture. Doctors who cook? Surgeons who quilt? Other issues: Permissions. Monitoring when the blogs end, change focus, or move to a new url. E.g., a doctor retired and his blog changed focus to fishing.

Terry Plum from Simmons GSLIS talks about a digital curriculum lab. It was set up to pull in students and faculty around a few different areas. They maintain a collection of open source applications for archives, museums, and digital libraries. There are a variety of teaching aids. The DCL is built into a Cultural Heritage Informatics track at Simmons.

Daniel Krech of Library of Congress works at the Repository Development Center. The RDC works with people managing collections. The RDC works on human-machine interfaces. One project involves “sets” (collections). “We’ve come up with some new and interesting ways to think about data.” They use knot, set, and hyper theory, but they also sometimes use a physical instantiation of a set — it looks like knotted yarn — to help understand some very abstract ideas.

Kelsey [Keley?] Shepherd of Amherst represents the Five College Digital Task Force. (She begins by denying that the Scooby Gang was based on the five colleges.) They don’t share a digital library but want to collaborate on digital preservation. They are creating shared guidelines for preservation-ready digital objects. They are exploring models for funding and organizational structure. And they are collaborating on implementing a trusted digital preservation repository. But each develops its own digital preservation policy.

Jefferson Bailey talks about Personal Digital Archiving at the Library of Congress. He talks about the source diary for The Midwife’s Tale. That diary sat on a shelf for 200 years before being discovered as an invaluable window on the past. Often these archives are the responsibility of the record creators. The LoC therefore wants to support community archives, enthusiasts, and citizen archivists. They are out and about, promoting this. See digitalpreservation.gov

Carol Minton Morris with DuraSpace and the NDSA (National Digital Stewardship Alliance) talks about funding archiving through “hip pocket resources.” They’re looking into Kickstarter.com. Technology and publishing projects at Kickstarter have only raised $9M out of the $100M raised there; most of it goes to the arts. She points to some other microfinance sites, including IndieGoGo and DonorsChoose.org. She encourages the audience to look into microfinancing.

Kristopher Nelson from LoC Office of Strategic Initiatives talks about the National Digital Stewardship Residency, which aims at building a community of professionals who will advance digital archiving. It wants to bridge classroom education and professional experience, and some real world experience. It will start in June 2013 with 10 residents participating in the 9-month program.

Moryma Aydelott, program specialist at LoC, talks about Tackling Tangible Metadata. The LoC’s digital data is on lots of media: 300T on everything from DVDs to DAT tapes and Zip disks. Her group provides a generic workflow for dealing with this stuff: any division, any medium. They have a wheeling cart for getting at this data. They make the data available “as is.” It can be hard to figure out what type of file it is, and what application is needed to read it. Right now, it’s about getting it on the server. They’ve done about 6.5T of material, 700-800 titles, so far. But the big step forward is in training and in documenting processes.
