Joho the Blog » libraries

July 9, 2014

Request for app: Annotation inhaler

During this seemingly-endless interregnum when we have e-books that suck at letting us take notes, I buy paper books when I’m doing research. I have a complex little application I’ve endlessly developed over the years that lets me type notes into a plain text editor or OPML-based outliner using a minimal markup. The app turns the notes into a database that I can then slice ‘n’ dice. Someday I’ll get it stable and done enough to publish. And that day is never.

A couple of years ago I wrote a Chrome extension (“Kindle Highlights Exporter”) that scrapes all of the passages you’ve highlighted with your Kindle, exporting them as a csv, xml, or json file. The only problem is that I seem to be the only person it works for. More precisely, it crashed for the only person I ever showed it to, my supersmart developer nephew. It still works for me, though. If you want (yet another) chance to laugh at me, feel free to download it and install it. Suckers.

So, how about if someone were to write some software that lets me import photographs of the pages of a book that I’ve highlighted in, say, yellow. The app finds the highlighted portions of each page, looks for the page number, does the requisite OCR, and returns a well-marked-up set of those annotations. (These days, outputting in the Open Annotation standard, as well as the usual suspects, would be extra cool.) That way, when I’m done with a book, I could snap images of all the pages with highlights and get a list at the end, instead of doing what I do now: type them in as I read.
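For what it's worth, the "find the highlighted portions" step might start out something like the sketch below. It is only a rough illustration under loud assumptions: the page photo has already been decoded into rows of RGB tuples, "yellow" is defined by thresholds I made up, and the function names are hypothetical. Finding the page number and doing the OCR are left out entirely.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def is_highlight_yellow(pixel: Pixel) -> bool:
    """Rough test for highlighter yellow: strong red and green, weak blue.

    The thresholds are guesses for illustration, not calibrated values.
    """
    r, g, b = pixel
    return r > 180 and g > 180 and b < 140


def highlighted_row_spans(
    image: List[List[Pixel]], min_fraction: float = 0.2
) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) spans of consecutive rows in which at
    least min_fraction of the pixels look highlighter-yellow."""
    spans = []
    start = None
    for y, row in enumerate(image):
        yellow = sum(1 for p in row if is_highlight_yellow(p))
        marked = bool(row) and yellow / len(row) >= min_fraction
        if marked and start is None:
            start = y  # a highlighted band begins here
        elif not marked and start is not None:
            spans.append((start, y - 1))  # the band ended on the previous row
            start = None
    if start is not None:
        spans.append((start, len(image) - 1))  # band runs to the bottom edge
    return spans


# A tiny synthetic "page": rows 1-2 fully highlighted, row 4 half highlighted.
WHITE, YELLOW = (255, 255, 255), (250, 240, 90)
page = [
    [WHITE] * 10,
    [YELLOW] * 10,
    [YELLOW] * 10,
    [WHITE] * 10,
    [WHITE] * 5 + [YELLOW] * 5,
]
print(highlighted_row_spans(page))  # [(1, 2), (4, 4)]
```

Each span would then be cropped out and handed to whatever OCR engine you trust. Real photos would also need handling for skew, shadows, and uneven lighting, which this ignores.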

I’d give it a try, but processing images is waaay beyond my hobbyist-programmer capabilities. As for the possible copyright violation: OH FOR HEAVENS SAKE WHAT THE HELL IS WRONG WITH US? (Note: The previous sentence should not be construed as legal advice.)

In any case, as the digital/networked world continues to develop its superpowers, the mud wall that confines the physical becomes more and more aggravating.

June 29, 2014

[aif] Re-imagining public libraries

I’m at an early Sunday morning (7:45am) session on re-imagining libraries with John Palfrey of the DPLA, Brian Bannon (Commissioner of the Chicago Public Library), and Tessie Guillermo (Zero Divide). It’s moderated by Sommer Mathis (editor of CityLab.com). My seat-mate tells me that many of the people here are from the local library and its board. The audience is overwhelmingly female.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

SM: Libraries are being more used even though people can download books. Are libraries shifting away from being book collections to becoming community centers?

BB: Our missions are so much bigger than our traditional format for distributing knowledge. Over the past 144 years of Chicago Library’s history, we’ve been innovating all along. How many 144-year-old institutions are experiencing record-breaking use?

TG: I was on the Aspen Institute’s sessions on libraries that wrote a report around three pillars of libraries: People, place and platform. Platforms are emerging now. They’re gathering up networks of people that can join together to continue to add value.

JP: I agree with Brian’s historical take and Tessie’s theoretical one. I’m not as sanguine, though. Libraries are more important in the digital age, but support for libraries could erode. Turning into community centers is risky for libraries. A community center is an open space that can be anything, but libraries are specific: in access to knowledge, in what immigrants need to find their way into a new country, in what people seeking jobs need [and more]. And all these are bound to the specifics of the community.

BB: When I think of community space, we’re Chicago’s largest provider of access to open, free technology, helping new economies, etc., but we do it through the lens of the library. It starts with the idea that everyone should have free and open access to the leading ideas of the day. We think about our communities and how we can support these aspirations in very specific ways.

TG: Libraries are central to ideas that are shared across communities. We work with Web Junction as a content pusher to libraries around the enrolment of people in the Affordable Care Act. First, people need to enroll and can go to the library for the computer access. They need insurance literacy. Once you choose your plan and see a doctor, you might find out about a health problem, and you can come back to the library to get info curated for you, and then find out where to get community services. All at the library.

JP: Libraries should be the center of communities, but not be community centers.

BB: People are reading more today and in lots of different formats, and libraries have been great conveners of those conversations. On the other side, as the world of information changes, we’ve been experimenting with learning through experience. We wanted to explore the importance of manufacturing to the city. We opened up a lab that exposed people to ideas that would have been hard to understand simply through print.

JP: I saw your awesome innovation lab. Will you have 3D printers in there perpetually or always have the latest tech?

BB: It was supposed to be a 6 month experiment that’s been extended. We do not believe that Chicago Public Library [CPL] should be the city’s hub for 3D printing. We’re now starting to do experiments in data visualization to help people understand Big Data.

TG: It’s hard to talk about the future of libraries without talking about what places in the future will be like. Zoos, museums, etc., are all changing. There will be a lot of experimentation about how residents and community members organize themselves. At yesterday’s Market Future there was a lot of joking about librarians and the sense was that you can only get recommendations through algorithms. [Ack. That was my session. See this Atlantic post, and my comment there.]

Q: Atlanta libraries are helping people complete GEDs and LA libraries are going one step further and are granting HS diplomas. What innovative programming are you hearing about?

JP: Libraries helping people complete GEDs makes total sense. I like the model where libraries are connecting to learning — connected learning like at CPL. A lot of the learning that kids do is interstitial on mobile devices, and libraries can help with that. Hybrid spaces that connect what’s going on online to the real world is a great model.

TG: The use of libraries is increasing but not always the funding. Libraries have to find new sources of revenue…

SM: … not just revenues but being able to quantify the value they bring.

JP: CPL has led in this.

BB: We worked with Mission Measurements to do that. We looked at the core mission of the library. We’re about supporting democracy but also helping to make our city competitive. So we looked at how we’re supporting the local economy.

BB: We don’t always recognize that there’s a large portion of the world, and parts of Chicago, where people have limited or no access to tech. So we are experimenting with ways to bring the Internet home. We’re launching a program that will let you check out laptops and a hotspot. But that’s less about the tech than about the support to understand what programs are out there to sustain it and to gain the skills they need.

Q: Both CPL and NYPL won the Knight News Challenge to enable them to do this.

BB: We’ll be lending them for a three week term. NYPL is lending for months. It’s an experiment. But it’s not just about shiny objects. CPL has been acknowledged for experiments, for R&D. The buzz is important to elevating your brand.

JP: There will have to be trade-offs. Maybe libraries will have to spend less on books, on the marginal acquisitions, in order to support these hardware lending programs. That’s controversial but we have to talk about the trade-offs.

BB: Our model for sharing knowledge is changing dramatically because of the law. Our ability to lend physical books vs. digital materials …

JP: In the physical realm we have the right of first sale that lets you do whatever you want with a book, including resell it or lend it. But for digital there’s no first sale. Libraries acquire the digital under a contract that may limit the number of lends. Libraries are in a less good position with e-works.

TG: I’m not in the library world, but maybe librarians become facilitators of networked learning. People are becoming networked through their library cards, which becomes a platform for creating and curating knowledge that’s shared across the library system. If you create a platform where card holders in the virtual space are able to come together to say, e.g., that there are transportation issues in the city that need solving, the librarian can facilitate the coming together of that conversation. The library can be a link to other institutions.

BB: Librarians are moving away from being the experts in finding stuff (research librarians excepted) and becoming more facilitators.

SM: What about curation? Is that more the job of the librarian than ever?

BB: In the traditional sense, no. Curating programs, etc.: yes.

SM: When you were in SF, you were involved in the renovation of 24 neighborhood libraries. What are the challenges?

BB: Part of it is flexibility. We renovated beautiful Carnegie libraries, but they’re not well designed for the modern flow. As the environment changes, so will the spaces. So we were concerned with designing both for today’s needs and for the future. In Chicago we’re designing spaces to support simultaneous activities. E.g., many people using our libraries are coming because they’re a single person running their own business out of the library. How do we support that? And we have huge usage by families and children, so we need to support that as well. So we’re trying to design spaces that support creative play.

TG: In one instance, a young parent kept hearing people saying they were going to the library. She was curious. It turns out that the local library has lots of family spaces, not little chairs and little books and someone reading to a group. Rather, it’s an extension of the neighborhood. She’s learning parenting and her children are learning how to play together.

JP: In St. Paul they set up a library space right off a basketball court. I think that’s a great idea.

JP: I was director of Harvard Law Library [Disclosure: where he was my boss] which had a reading room the size of a football stadium that was always filled, but I never saw a kid take a book off a shelf. They were there to study. They have good wifi in the dorms. There’s something about coming to a common space, with librarians there who could help them if they got in trouble. But they’re there using digital materials. We need to figure out how the physical and digital coalesce, but mainly we need to figure out how to build collaborative spaces. Boston Public Library is renovating the historic Johnson Building. They’re putting the teens and tweens on the second floor to make the space attractive to them but also to keep them a bit out of the way.

TG: We work with a teen center in the East Bay area of SF. When you walk into the teen center the first thing you see is the library within the center: the library’s services are embedded in the space that they think of as their space.

Q: [Fred Kent, Project for Public Spaces] Different African cultures are coming into Winnipeg. They put an African market outside the library. Richmond BC had to move out of their library into a large Wal-mart-like space along with other services. In Perth, the state library took all the library materials off the ground floor and put in cultural activities. The main library in Houston is sponsoring an activation event with SW Airlines. Libraries could become an integral part of the community services. The future of libraries may not be in their own buildings. The architecture of libraries may be very different.

JP: Yes. E.g., the basketball court example.

Q: I hear about the bond problems in Chicago. I don’t hear that in your comments, Brian.

BB: Chicago has been struggling financially and hopefully is coming out of it. CPL saw significant reductions in 2009 and 2011, resulting in a reduction in hours. We’ve brought many of those hours back through a restructuring. It costs about $100M to run the library, but it costs $6B to run the schools. We’re a tiny piece. That tiny investment in libraries as community anchors and for after-school learning has been an important argument for keeping funding in place. Our collections budget is a little less than what we had in SF and we’re three times the size. So, we definitely have issues.

JP: But you’re a cheap date. Our high school costs $100M to run and you’re running the entire library system on that.

Q: The Koolhaas-designed library in Seattle has the problem of being filled with homeless people. They’ve thought about relegating a space with showers and bathrooms and washing machines within the library. WDYT?

BB: Homelessness is part of the urban challenge. It’s important that we see libraries as public spaces open to all regardless of their background. We should not create rules to encourage some and discourage others. In SF we experimented with bringing in people to work with the homeless on finding services that can help them. So rather than creating a shelter within the library, I’d rather that we become a resource helping people to find resources.

Q: How can we make these presidential libraries less a monument and more a way to engage the populace?

BB: Presidential libraries are called libraries, but I’m very excited about the prospect of the Obama library aspiring to being a place to learn about democracy and see it in action. I think it’d be great if it happened in an urban space. We’ve been talking with all three organizations trying to bring the Obama Library to Chicago about what role the public library might play.

TG: It’s an opportunity to think about this as being more of a digital, virtual library. The discussion of democracy should not be confined to one physical place.

JP: I’d argue strongly for the blended approach especially with this president. His election combined beautifully the digital with knocking on doors. Also, the DPLA attempts to build a national digital library, backed by National Archives and the Smithsonian among others. We could do something incredibly cool by connecting the digital and the physical.

Q: In tough budgetary times how are acquisitions affected and how is that being used to shape publishers’ behaviors?

BB: Patron driven acquisitions has us buying books when users want them. The question of publishers is tough. Each library on its own doesn’t have much power. Some big city libraries have cut their own deals. We want to make materials available and also for the publishers to be successful.

JP: We haven’t talked here about the role libraries play in preserving knowledge. If all you were to do is provide what people want at that moment, you’d lose. Patron driven acquisition is a good idea in some respects, and libraries and publishers should be making common cause, but we also should recall that publishers go out of business. Two or three times, major publishers came to Harvard Law Library asking for copies of their books so they could digitize them; they didn’t have copies.

TG: That’s where you have to be careful about these decisions made by the analytics of usage.

May 13, 2014

Full-text searching Harvard Library: a hacky mashup

Harvard Library has 13M items in its collection. Harvard is digitizing many of them, but as of now you cannot do a full text search of them.

Google Books had 30M books digitized as of a year ago. You can do full-text searches of them.

So, I wrote a little app [Note: I've corrected this url.] that lets you search Google Books for text, and then matches up the results with books in Harvard Library. It’s a proof of concept, and I’m counting the concept as proved, or at least as promising. On the other hand, my API key for Google Books only allows 2,000 queries a day, so it’s not practical on the licensing front.

This project runs on top of LibraryCloud, an open source library metadata server created by the Harvard Library Innovation Lab that I co-direct (until Sept.). LibraryCloud provides an API to Harvard’s open library metadata and more. (We’re building a new, more scalable version now. It is, well, super-cool.)

But please note that this HOLLIS full-text search thingy is NOT a project done by our highly innovative and highly skilled developers. I did it, which means if you look at the code (github) you will have a good laugh. Also, this service will fail in dull and interesting ways. I am a horrible programmer. (But I enjoy it.)

Some details below the clickable screenshot…


[Screenshot: googleHollis screen capture]

Click here to go to the app.

The Google Books results are on the left (only ten for now), and HOLLIS on the right.

If a Google result is yellow, there’s a match with a book in HOLLIS. Gray means no match. HOLLIS book titles are prefaced by a number that refers to the Google results number. Clicking on the Google results number (in the circle) hides or shows those works in the stack on the right; this is because some Google books match lots of items in HOLLIS. (Harvard has a lot of copies of King Lear, for example.)

There are two types of matches. If an item matched on a firm identifier (ISBN, OCLC, LCCN), then there’s a checkmark before the title in the HOLLIS stack, and there’s a “Stacklife” button in the Google list. Clicking on the Stacklife button displays the book in Harvard StackLife, a very cool (and prize-winning!) library browser created by our Lab. The StackLife stack colorizes items based on how much they’re used by the Harvard community. The thickness of the book indicates its page count and its length indicates its actual physical height.

If there’s no match on the identifiers, then the page looks for a keyword match on the title and an exact match on the author’s last name. This can result in multiple results, not all of which may be right. So, on the Google result there’s a “Feeling lucky” button that will take you to the first match’s entry in StackLife.
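The two-tier matching described above can be sketched roughly as follows. This is not the app's actual code (that's on github, and you were promised a laugh). The record layout, the field names, and the reading of "keyword match" as "the titles share at least one word" are all assumptions made for illustration, and the identifier values in the usage example are made up.

```python
from typing import Dict, Optional

# Hypothetical field names for the firm identifiers.
ID_FIELDS = ("isbn", "oclc", "lccn")


def last_name(author: str) -> str:
    """Lower-cased last name from "Last, First" or "First Last"."""
    author = author.strip()
    if "," in author:
        return author.split(",")[0].strip().lower()
    parts = author.split()
    return parts[-1].lower() if parts else ""


def match_type(google: Dict[str, str], hollis: Dict[str, str]) -> Optional[str]:
    """Return "firm", "fuzzy", or None for a Google Books / HOLLIS pair."""
    # Tier 1: any shared firm identifier counts as a definite match.
    for field in ID_FIELDS:
        value = google.get(field)
        if value and value == hollis.get(field):
            return "firm"

    # Tier 2: exact match on author last name plus a crude keyword match
    # on the title (assumed here to mean: at least one word in common).
    g_last = last_name(google.get("author", ""))
    same_author = bool(g_last) and g_last == last_name(hollis.get("author", ""))
    g_words = set(google.get("title", "").lower().split())
    h_words = set(hollis.get("title", "").lower().split())
    if same_author and g_words & h_words:
        return "fuzzy"
    return None


# Usage with made-up records:
google_hit = {"title": "King Lear", "author": "William Shakespeare", "isbn": "123"}
print(match_type(google_hit, {"title": "King Lear", "author": "Shakespeare, William", "isbn": "123"}))  # firm
print(match_type(google_hit, {"title": "The Tragedy of King Lear", "author": "Shakespeare, William"}))  # fuzzy
print(match_type(google_hit, {"title": "King Lear", "author": "Jane Smiley"}))  # None
```

A keyword test this loose will happily match on a shared "the", which is one reason the fuzzy tier can return multiple results, not all of them right.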

The “Google” button takes you to that item’s page at Google Books, filtered by your search terms for your full-texting convenience.

The “View” button pops up the Google Books viewer for that book, if it’s available.

The “Clear stack” button deselects all the items in the Google results, hiding all the items in the HOLLIS stack.

Let me know how this breaks or sucks, but don’t expect it ever to be a robust piece of software. Remember its source.

May 2, 2014

[2b2k] Digital Humanities: Ready for your 11AM debunking?

The New Republic continues to favor articles debunking claims that the Internet is bringing about profound changes. This time it’s an article on the digital humanities, titled “The Pseudo-Revolution,” by Adam Kirsch, a senior editor there. [This seems to be the article. Tip of the hat to Jose Afonso Furtado.]

I am not an expert in the digital humanities, but it’s clear to the people in the field who I know that the meaning of the term is not yet settled. Indeed, the nature and extent of the discipline is itself a main object of study of those in the discipline. This means the field tends to attract those who think that the rise of the digital is significant enough to warrant differentiating the digital humanities from the pre-digital humanities. The revolutionary tone that bothers Adam so much is a natural if not inevitable consequence of the sociology of how disciplines are established. That of course doesn’t mean he’s wrong to critique it.

But Adam is exercised not just by revolutionary tone but by what he perceives as an attempt to establish claims through the vehemence of one’s assertions. That is indeed something to watch out for. But I think it also betrays a tin-eared reading by Adam. Those assertions are being made in a context the authors I think properly assume readers understand: the digital humanities is not a done deal. The case has to be made for it as a discipline. At this stage, that means making provocative claims, proposing radical reinterpretations, and challenging traditional values. While I agree that this can lead to thoughtless triumphalist assumptions by the digital humanists, it also needs to be understood within its context. Adam calls it “ideological,” and I can see why. But making bold and even over-bold claims is how discourses at this stage proceed. You challenge the incumbents, and then you challenge your cohort to see how far you can go. That’s how the territory is explored. This discourse absolutely needs the incumbents to push back. In fact, the discourse is shaped by the assumption that the environment is adversarial and the beatings will arrive in short order. In this case, though, I think Adam has cherry-picked the most extreme and least plausible provocations in order to argue against the entire field, rather than against its overreaching. We can agree about some of the examples and some of the linguistic extensions, but that doesn’t dismiss the entire effort the way Adam seems to think it does.

It’s good to have Adam’s challenge. Because his is a long and thoughtful article, I’ll discuss the thematic problems with it that I think are the most important.

First, I believe he’s too eager to make his case, which is the same criticism he makes of the digital humanists. For example, when talking about the use of algorithmic tools, he talks at length about Franco Moretti’s work, focusing on the essay “Style, Inc.: Reflections on 7,000 Titles.” Moretti used a computer to look for patterns in the titles of 7,000 novels published between 1740 and 1850, and discovered that they tended to get much shorter over time. “…Moretti shows that what changed was the function of the title itself.” As the market for novels got more crowded, the typical title went from being a summary of the contents to a “catchy, attention-grabbing advertisement for the book.” In addition, says Adam, Moretti discovered that sensationalistic novels tend to begin with “The” while “pioneering feminist novels” tended to begin with “A.” Moretti tenders an explanation, writing “What the article ‘says’ is that we are encountering all these figures for the first time.”

Adam concludes that while Moretti’s research is “as good a case for the usefulness of digital tools in the humanities as one can find” in any of the books under review, “its findings are not very exciting.” And, he says, you have to know which questions to ask the data, which requires being well-grounded in the humanities.

That you need to be well-grounded in the humanities to make meaningful use of digital tools is an important point. But here he seems to me to be arguing against a straw man. I have not encountered any digital humanists who suggest that we engage with our history and culture only algorithmically. I don’t profess expertise in the state of the digital humanities, so perhaps I’m wrong. But the digital humanists I know personally (including my friend Jeffrey Schnapp, a co-author of a book, Digital_Humanities, that Adam reviews) are in fact quite learned lovers of culture and history. If there is indeed an important branch of digital humanities that says we should entirely replace the study of the humanities with algorithms, then Adam’s criticism is trenchant…but I’d still want to hear from less extreme proponents of the field. In fact, in my limited experience, digital humanists are not trying to make the humanities safe for robots. They’re trying to increase our human engagement with and understanding of the humanities.

As to the point that algorithmic research can only “illustrate a truism rather than discovering a truth,” — a criticism he levels even more fiercely at the Ngram research described in the book Uncharted — it seems to me that Adam is missing an important point. If computers can now establish quantitatively the truth of what we have assumed to be true, that is no small thing. For example, the Ngram work has established not only that Jewish sources were dropped from German books during the Nazi era, but also the timing and extent of the erasure. This not only helps make the humanities more evidence-based —remember that Adam criticizes the digital humanists for their argument-by-assertion —but also opens the possibility of algorithmically discovering correlations that overturn assumptions or surprise us. One might argue that we therefore need to explore these new techniques more thoroughly, rather than dismissing them as adding nothing. (Indeed, the NY Times review of Uncharted discusses surprising discoveries made via Ngram research.)

Perhaps the biggest problem I have with Adam’s critique I’ve also had with some digital humanists. Adam thinks of the digital humanities as being about the digitizing of sources. He then dismisses that digitizing as useful but hardly revolutionary: “The translation of books into digital files, accessible on the Internet around the world, can be seen as just another practical tool…which facilitates but does not change the actual humanistic work of thinking and writing.”

First, that underplays the potential significance of making the works of culture and scholarship globally available.

Second, if you’re going to minimize the digitizing of books as merely the translation of ink into pixels, you miss what I think is the most important and transformative aspect of the digital humanities: the networking of knowledge and scholarship. Adam in fact acknowledges the networking of scholarship in a twisty couple of paragraphs. He quotes the following from the book Digital_Humanities:

The myth of the humanities as the terrain of the solitary genius…— a philosophical text, a definitive historical study, a paradigm-shifting work of literary criticism — is, of course, a myth. Genius does exist, but knowledge has always been produced and accessed in ways that are fundamentally distributed…

Adam responds by name-checking some paradigm-shifting works, and snidely adds “you can go to the library and check them out…” He then says that there’s no contradiction between paradigm-shifting works existing and the fact that “Scholarship is always a conversation…” I believe he is here completely agreeing with the passage he thinks he’s criticizing: genius is real; paradigm-shifting works exist; these works are not created by geniuses in isolation.

Then he adds what for me is a telling conclusion: “It’s not immediately clear why things should change just because the book is read on a screen rather than on a page.” Yes, that transposition doesn’t suggest changes any more worthy of research than the introduction of mass market paperbacks in the 1940s [source]. But if scholarship is a conversation, might moving those scholarly conversations themselves onto a global network raise some revolutionary possibilities, since that global network allows every connected person to read the scholarship and its objects, lets everyone comment, provides no natural mechanism for promoting any works or comments over any others, inherently assumes a hyperlinked rather than sequential structure of what’s written, makes it easier to share than to sequester works, is equally useful for non-literary media, makes it easier to transclude than to include so that works no longer have to rely on briefly summarizing the other works they talk about, makes differences and disagreements much more visible and easily navigable, enables multiple and simultaneous ordering of assembled works, makes it easier to include everything than to curate collections, preserves and perpetuates errors, is becoming ubiquitously available to those who can afford connection, turns the Digital Divide into a gradient while simultaneously increasing the damage done by being on the wrong side of that gradient, is reducing the ability of a discipline to patrol its edges, and a whole lot more.

It seems to me reasonable to think that it is worth exploring whether these new affordances, limitations, relationships and metaphors might transform the humanities in some fundamental ways. Digital humanities too often is taken simply as, and sometimes takes itself as, the application of computing tools to the humanities. But it should be (and for many, is) broad enough to encompass the implications of the networking of works, ideas and people.

I understand that Adam and others are trying to preserve the humanities from being abandoned and belittled by those who ought to be defending the traditional in the face of the latest. That is a vitally important role, for as a field struggling to establish itself digital humanities is prone to over-stating its case. (I have been known to do so myself.) But in my understanding, that assumes that digital humanists want to replace all traditional methods of study with computer algorithms. Does anyone?

Adam’s article is a brisk challenge, but in my opinion he argues too hard against his foe. The article becomes ideological, just as he claims the explanations, justifications and explorations offered by the digital humanists are.

More significantly, focusing only on the digitizing of works and ignoring the networking of their ideas and the people discussing those ideas, glosses over the locus of the most important changes occurring within the humanities. Insofar as the digital humanities focus on digitization instead of networking, I intend this as a criticism of that nascent discipline even more than as a criticism of Adam’s article.

April 20, 2014

[2b2k] In defense of the library Long Tail

Two percent of Harvard’s library collection circulates every year. A high percentage of the works that are checked out are the same as the books that were checked out last year. This fact can cause reflexive tsk-tsking among librarians. But, with some heavy qualifications to come, this is as it should be. The existence of a Long Tail is not a sign of failure or waste. To see this, consider what it would be like if there were no Long Tail.

Harvard’s 73 libraries have 16 million items [source]. There are 21,000 students and 2,400 faculty [source]. If we guess that half of the library items (8 million) are available for check-out, which seems conservative, and that two percent of those circulate, that would mean that 160,000 different items are checked out every year. If there were no Long Tail, then no book would be checked out more than any other. In that case, it would take the Harvard community an even fifty years (8 million items at 160,000 a year) before anyone would have read the same book as anyone else. And a university community in which across two generations no one has read the same book as anyone else is not a university community.
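Here is the back-of-the-envelope arithmetic spelled out, as a small sketch. The one assumption beyond the text is that the two percent circulation rate applies to the half of the collection available for check-out, which is what the 160,000 figure implies. The variable names are just for illustration.

```python
# Figures from the paragraph above.
total_items = 16_000_000

# Guess: half of the items are available for check-out.
available_items = total_items // 2  # 8,000,000

# Assumption: 2% of the available items circulate each year.
checkouts_per_year = available_items * 2 // 100  # 160,000

# With no Long Tail, every checkout is a different item until all
# available items have been checked out exactly once.
years_without_overlap = available_items // checkouts_per_year

print(available_items, checkouts_per_year, years_without_overlap)
# 8000000 160000 50
```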

I know my assumptions are off. For example, I’m not counting books that are read in the library and not checked out. But my point remains: we want our libraries to have nice long tails. Library long tails are where culture is preserved and discovery occurs.

And, having said that, it is perfectly reasonable to work to lower the difference between the Fat Head and the Long Tail, and it is always desirable to help people to find the treasures in the Long Tail. Which means this post is arguing against a straw man: no one actually wants to get rid of the Long Tail. But I prefer to put it that this post argues against a reflex of thought I find within myself and have encountered in others. The Long Tail is a requirement for the development of culture and ideas, and at the same time, we should always help users to bring riches out of the Long Tail.


March 22, 2014

Biblioteca Malatestiana – The world’s oldest public library

I’m in Cesena, Italy for the first holding of the Web Economic Forum. Because I’m only here for a day, I didn’t bother to look up the local attractions until I arrived this afternoon. At TripAdvisor, the #1 Attraction is the Biblioteca Malatestiana, so I walked there. (It turns out the WEF is in the adjoining building.)

The 400-year-old Biblioteca lays claim to being the world’s oldest public library. And it’s worth a visit, although the tour is in Italian, which I listened to attentively with my 1% Italian comprehension that consists almost entirely of false cognates and pizza toppings. Nevertheless, you can get the gist that this is a damn old library, that it’s got some very old books, including one from the 11th century, and that it was managed jointly by a monastery and the city government. (The intricate doors to the reading room require a key from each to be unlocked.)

The reading room looks like a chapel. There are two rows of pews that turn out to be reading desks designed for people to stand at. The books are stored underneath, like prayer books in a church, except they’re not and they’re chained to the shelf. The books on the right side of the chapel are religious, and the ones on the left are civic and classics. (The Greek classics are Latin translations.) The collection of 353 books includes seven Jewish works.

Reading room
Photo by Ivano Giovannini, from here

Then you are taken into Pope Pius VII’s library, a well-lit room with 15th century music books on display. They are nicely illuminated. There’s also a small display of small books, including one that they claim is the smallest that is legible without a magnifier. I couldn’t read it, but my eyesight isn’t as good as it never was.

Chorale books
Photo by Sally Zuckerman, from here

I wish they had shown us more of the Library, but you can hear very old voices there, and they’re mainly saying, “Printed books are going to kill reading! Everyone’s a reader now! You don’t need any special skills or training. And the books are so much uglier than they were in my day. Hey you kids, get off of my fiefdom!”

 


The Wikipedia article isn’t very good. There’s better info on this Consortium of European Research Libraries page, and this Travel Through History page by Sally Zuckerman. (The photos are from Sally’s post.)


March 18, 2014

Dean Krafft on the Linked Data for Libraries project

Dean Krafft, Chief Technology Strategist for Cornell University Library, is at Harvard to talk about the Mellon-funded Linked Data for Libraries (LD4L) project he leads. The grantees include Cornell, Stanford, and the Harvard Library Innovation Lab (which is co-sponsoring the talk with ABCD). (I provide nominal leadership for the Harvard team working on this.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Dean will talk about the LD4L project by talking about its building blocks. [Dean had lots of information and a lot on the slides. I did a particularly bad job of capturing it.]

LD4L

Mellon last December put up $1M for a 2-year project that will end in Dec. 2015. The participants are Cornell, Stanford, and the Harvard Library Innovation Lab.

Cornell: Dean Krafft, Jon Corso-Rickert, Brian Lowe, Simeon Warner

Stanford: Tom Cramer, Lynn McRae, Naomi Dushay, Philip Schreur

Harvard: Paul Deschner, Paolo Ciccarese, me

Aim: Create a Scholarly Resource Semantic Info Store model that works within and across institutions to create a network of Linked Open Data to capture the intellectual value that librarians and other domain experts add to info, patterns of usage, and more.

LD4L wants to have a common language for talking about scholarly materials.

Outcomes:

  • Create a SRSIS ontology sufficiently expressive to encompass catalog metadata and other contextual elements

  • Create a SRSIS semantic editing, display, and discovery system based on Vitro to support the incremental ingest of semantic data from multiple info sources

  • Create a Project Hydra-compatible interface to SRSIS, an active triples software component to facilitate easy use of the data

Why use Linked Data?

LD puts the emphasis on the relationships. Everything is related.

Benefits: The connections have meaning. And it supports “many dimensions of nearness”

Dean explains RDF triples. They connect subjects with objects via a consistent set of relationships.
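[A toy illustration of the triple idea, not anything Dean showed. It keeps subject-predicate-object statements as plain Python tuples and answers a question by pattern matching. The `ex:` identifiers and relationship names are made up for the example, and a real project would use an RDF library and a triple store rather than a list.]

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers; real Linked Data would use full URIs.
triples = [
    Triple("ex:book1", "ex:hasAuthor", "ex:person1"),
    Triple("ex:book2", "ex:hasAuthor", "ex:person1"),
    Triple("ex:person1", "ex:worksAt", "ex:cornell"),
    Triple("ex:book1", "ex:hasSubject", "ex:linkedData"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Which things were authored by person1?"
books_by_person1 = [
    t.subject for t in match(triples, predicate="ex:hasAuthor", obj="ex:person1")
]
print(books_by_person1)
```

Because every statement has the same three-part shape, adding a new kind of relationship is just adding more rows. Nothing about the store has to be redesigned.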

A nice feature of LOD is that the same URL that points to a human-readable page can also be taken as a query to show the machine-readable data.
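[One common way this works is HTTP content negotiation: the client asks for the same URL but says in the Accept header which format it wants. I'm assuming that is the mechanism meant here. The sketch below only builds the two requests and does not send them. The URL is a placeholder, and whether a given server honors `text/turtle` depends on the server.]

```python
from urllib.request import Request

# Placeholder URL for some resource in a Linked Data store (hypothetical).
RESOURCE_URL = "http://example.org/individual/n1234"


def build_request(url: str, want_data: bool) -> Request:
    """Build a request for the human page or the machine-readable data."""
    accept = "text/turtle" if want_data else "text/html"
    return Request(url, headers={"Accept": accept})


page_request = build_request(RESOURCE_URL, want_data=False)
data_request = build_request(RESOURCE_URL, want_data=True)

# Same address, different requested representation.
print(page_request.full_url, page_request.get_header("Accept"))
print(data_request.full_url, data_request.get_header("Accept"))
```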

There’s commonality among references: shared types, shared relationships, shared instances defined as types and linked by relationships.

LOD is great for sharing data. There’s a startup cost, but as you share more data repositories and types, the cost and effort go up linearly, not at the steeper rate of traditional approaches.

Dean shows the mandatory graphic of a cloud of LOD sources.

Building Blocks

VIVO: VIVO was the inspiration for LD4L. It makes info about researchers discoverable. It’s software, data, a standard, and a community. It connects scientists and scholars through their research and scholarship. It provides self-describing data via shared ontologies. It provides search results enhanced by what it knows. And it does simple reasoning.

VIVO is built on the VIVO/Vitro platform. It has ingest tools, ontology editing tools, instance editing tools, and a display system. It models people, organizations, grants, etc., the relationships among them, and links to URIs elsewhere. It describes people in the process of doing research. It’s discipline-neutral. It uses existing domain terminology to describe the content of research. It’s modular, flexible, and extensible.

VIVO harvests much of its data automatically from verified sources.

It takes a complexity of inputs and makes them discoverable and usable.

All the data in VIVO is public and visible.

Dean shows us a page, and then traverses the network of interrelated authors.

He points out that other institutions are able to mash up their data with VIVO. E.g., the ICTS has info about 1.2M publications that they’ve integrated with VIVO’s data. E.g., you can see research papers created with federal funding but not deposited in PubMed Central.

VIVO is extensible. LASP extended VIVO to include spacecraft. Brown U. is extending it to support the humanities and artistic works, adding “performances,” for example.

The LD4L ontology will use components of the VIVO-ISF ontology. When new ontologies are needed, it will draw upon VIVO design patterns. The basis for SRSIS implementations will be Vitro plus LD4L ontologies. The multi-institution LD4L demo search will adapt VIVOsearch.org.

The 8M items at Cornell have generated billions of triples.

Project Hydra. Hydra is a tech suite and a partnership. You put your data there and can have many different apps. 22 institutions are collaborating.

Fundamental assumption: No single system can provide the full range of repository-based solutions for a given institution’s needs, yet sustainable solutions do require a common repository. Hydra is now building a set of “heads” (UI’s) for media, special collections, archives, etc.

Fundamental assumption: No single institution can build the full range of what it needs, so you need to work with others.

Hydra has an open architecture with many contributors to a common core. There are collaboratively built solution bundles.

Fedora, Ruby on Rails for Blacklight, Solr, etc.

LD4L will create an ActiveTriples Hydra component to mimic ActiveFedora.

Our Lab’s LibraryCloud/ShelfRank is another core element. It provides a model for access to library data, and a concrete example for creating an ontology for usage.

LD4L – the project

We’re now developing use cases. We have 32 on the wiki. [See the wiki for them]

We’re identifying data sources: Biblio, person (VIVO), usage (LibCloud, circ data, BorrowDirect circ), collections (EAD, IRs, SharedShelf, Olivia, arbitrary OAI-PMH), annotations (CuLLR, Stanford DMW, Bloglinks, DBpedia LibGuides), subjects and authorities (external sources). Imagine being able to look at usage across 50 research libraries…

Assembling the Ontology:

VIVO, Open Annotation, SKOS

BibFrame, BIBO, FaBIO

PROV-O, PAV

FOAF, PROVE, Schema.org

CreativeCommons, Dublin Core

etc.

Whenever possible the project will use existing ontologies.

Timeline: By the end of the year we hope to be piloting initial ingests.

Workshop: Jan. 2015. 10-12 institutions. Aim: get feedback, make a “sales pitch” to other organizations to join in.

June 2015: Pilot SRSIS instances at Harvard and Stanford. Pilot gather info across all three instances.

Dec. 2015: Instances implemented.

wiki: http://wiki.duraspace.org/display/ld4l

Q&A

Q: Who anointed VIVO a standard?

A: It’s a de facto.

Q: SKOS is considered a great start, but to do anything real with it you have to modify it, and if it changes you’re screwed.

A: (Paolo) I think VIVO uses SKOS mainly for terms, not hierarchies. But I’m not sure.

Q: What are ActiveTriples?

A: ActiveFedora is a Ruby Gem that serves as an interface for Hydra into a Fedora repository. ActiveTriples will serve the same function for a backend triple store. So you can swap different triple stores into the Fedora repository. This is Simeon Warner’s project.

Q: Does this mean you wouldn’t have to have a Fedora backend to take advantage of Hydra?

A: Yes, that’s part of it.

Q: Are you bringing in GIS linked data?

A: Yes, to the extent that we can and it makes sense to.

A: David Siegel: We have 6M data points from 1.1M Hollis records. LibraryCloud is ingesting them.

Q: What’s the product at the end?

A: We promised Mellon the ontology and instances of LOD based on the ontology at each of the 3 institutions, and search across the three.

Q: Harvard doesn’t have a Fedora backend…

A: We’d like to pull from non-catalog sources. That might well be an OAI-PMH ingest, or some other non-Fedora source.

Q: What is Simeon interested in with regard to Arxiv.org?

A: There isn’t a direct relationship.

Q: He’s also working on ORCID.

A: We have funding to do some level of integration of ORCID and VIVO.

Q: What is the bibliographic scope? BibFrame isn’t really defining items, etc. They’ve pushed it into annotations.

A: We’re interested in capturing some of that. BibFrame is offering most of what we need, but we have to look at each case. Then we communicate with them and hope that BibFrame does most of the work.

Q: Do any of your use cases posit tagging of contents, including by users perhaps with a controlled vocabulary?

A: We’ll be doing tagging at the object level. I’m unsure whether we’re willing to do tagging within the object.

A: [Paolo] We assume we don’t have access to the full text.

A: You could always point into our data.

Q: How can we help?

A: We’re accumulating use cases and data sources. If you’re aware of any, let us know.

Q: It’s been hard for libraries to put enough effort into authority control, to associate values comparable across different subject schemes…there’s a lot of work to make things work together. What sort of vocabulary or semantic links will you be using? The hard part is getting values to work across domains.

A: One way to deal with that is to bring together the disparate info. By pulling together enough info, you can sometimes use the network to help you figure that out. But in general the disambiguation challenge (and text fields are even worse) is not something we’re going to solve.

Q: Are the working groups institutionally based?

A: No. They’re cross-institution.

[I'm very excited about this project, and about the people working on it.]


March 6, 2014

Report from Denmark: Designing the new public library at Aarhus, and the People’s Lab

Knud Schulze, manager of the main library in Aarhus, Denmark, and Jane Kunze of the People’s Lab in Denmark are giving talks, hosted by the Harvard Library Innovation Lab. (Here are his slides.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Knud begins by reminding us how small Denmark is: 5.5M people. Aarhus has a population of about 330,000. [My account is very choppy. The talk was not.]

Now that the process of digitizing all information is well underway, the focus is on what can only be experienced in the library. Before, the library was a space for media. Now the space is a medium. Seriousness was prized in libraries. Now a sense of humor is. We’ve built libraries with books and other media to serve an industrial society. Some are truly beautiful, but they’re under-used. Now we’re moving to libraries for a networked society.

Three and a half years ago, the Danes wrote a report on public libraries in the knowledge society, and went looking for partnerships, which is unusual for the Danes, says Knud. The new model of the library intersects four spaces: inspiration, learning, performative, and meeting spaces. But the question is what people are going to do in those spaces. Recognition/experience, empowerment, learning, innovation. Knud shows pictures of those activities currently going on in the library.

Two hundred of Denmark’s 500 public libraries are “open libraries” — open 24 hours a day, with staffing only about 12 hours a week. If you have a library card, you can open the door. You can check media in and out, use the Internet, use a PC, read newspapers, study, arrange study circles. “The point is to let users take control.”

A law in 2007 said there had to be one-stop shopping for govt services. Most libraries offer these services. You go to the library for a passport, drivers license, health insurance, etc. Every citizen needs to have a personal account for communication with banks, from the state (e.g., about taxes). Libraries have helped educate the citizenry about this.

Often libraries are community centers that involve public and private sectors and a wide range of services. Sometimes the other services overwhelm the library services. “People ask me, ‘Where is the public library in this?’, and I say, ‘Think about the library as the glue.’”

There have to be innovation spaces in the local libraries.

The Danish Digital Library (Danskernes Digitale Bibliotek) is an open source infrastructure for digital objects, including a resource management system for the whole country and a way to purchase digital content. All its digital services are accessible anywhere in the world. 86 of the 98 municipal library systems have contributed to a shared contract for a new library system based on Open Source. They share operations and development. “There’s a very good business case.”

So, why Dokk1, the new library?

Libraries are symbols of development and innovation in the society. They drive city development. They add new stories about the town. All public libraries are examples of the citizens’ interest in innovation. E.g., the Opera, Munch museum and library in Oslo have transformed the waterfront and brought a new identity to the city. Helsinki, Birmingham (UK), and others as well. “The same will happen in Aarhus, we hope.”

DOKK1 is being built into the harbor, “transforming it into an open sea front.” There’s 200,000 sq. feet of library, parking for 1,000 cars, two new urban harbor squares, and a light rail station. Cost: US$390M. It will open in early 2015.

The front of the current library features new programs every few months, rather than the entrance being a way of controlling the users. They’ve run projects like iFloor (social interaction), a news lab (producing TV), AI robots, displays that capture and freeze images of people interacting with it, and much more. The building needs to interact with its surroundings and adapt to it, says Knud.

DOKK1 is “no building with an advanced roof.”

“It’s all about facilitating relations.” “The library of the future is all about people.” It will be a user-driven process: “From tradition to transcendence so users can deconstruct their old knowledge about libraries.” Knud shows a photo of children doing searches by interacting with blocks on the floor. They paid no attention to the info on the screens.

They have partnerships with the Gates Foundation, Chicago Public Libraries, IDEO, and the Aarhus Public Library.

Another project: “Intelligent Libraries”: how to “work smart” by improving logistics. The project knows where all the books are in all the nation’s libraries, and how often they’re used. They use “media hotels”: “local or remote storage of overflow, slow moving materials.”

The name “DOKK1” came from a competition. 1,250 proposals. Seven were considered by a jury. “It’s about branding the library.” 90% of all city inhabitants should know about the new project. In August 2013 75% did. In the existing library, users are invited to engage in the “mental construction” of the new one.

Now Jane Kunze talks about People’s Lab. She begins with a sign: “Shut up and hack.”

They’ve been setting up labs for the past two years to test different ways of interacting with users. Innovation is important to the Danish govt. (Denmark was just rated the most innovative country in Europe.) How can the public library be part of this?

They were inspired by Maker culture. Fab labs and maker spaces have been popping up everywhere. There’s also a trend in Denmark to repair rather than replace. And a focus on hand skills and not just academic knowledge. Also rapid prototyping, with inspiration from design thinking (as per IDEO).

The People’s Lab is a result of a collaboration among the library, community, and partners. Partners include public libraries, Aarhus School of Architecture, Moesgaard Museum, Roskilde festival, Orange Innovation, and more.

When they began, it was about kick-ass technology. But, while tech is fun, it’s really about people and community-building. “Don’t wait to involve people until your grand opening.” People will see your imperfections “but that’s part of what will make them committed to the place.”

The six labs:

  • TechLab: having a maker in residence is powerful. See Valdemar’s hovercraft:

  • Guitar Lab. Use local people and their passions.

  • Dreamcity: A maker space at the Roskilde rock festival. “You have to put yourself into play. You have to be there with your whole personality, and not just your professional side.”

  • WasteLab: Trash from dump “spiced up with specially selected trash.” “Creativity comes from chaos — stop tidying!”

  • Magnetic Groove Memories: cut your own vinyl records and fix up old radios

  • The first maker faire in Aarhus will be in 2014

They’ve been building a ladder of involvement, so people can come in for something basic and find themselves increasingly engaged — “small steps that make it possible for people to become more and more free in their thinking.”

They’ve learned that when the community already has hacker spaces and maker spaces, maybe the library should just be a gate to this ecosystem, opening them up to a broader public. Maybe the library is a place where people are introduced to making and working more creatively with their hands. “You can work with maker culture without having a makerspace.” You don’t have to have a room dedicated to machinery, especially for the smaller communities.

Q&A [with six of the Danes responding]

Q: Is this like a library plus the SF Exploratorium?

A: Yes.

A: We’re looking at how to create relationships among the patrons, staff, the media…

A: We want to make a place where people get involved in different kinds of competencies.

Q: Many of the other libraries you showed are on the edge of the city. Are you trying to make the library a destination? In Boston I wouldn’t let my 14 yr old grandchild go down to the harbor by himself.

A: In Aarhus, children move through the city at 10-12 yrs old. They can get to the new library by public transportation or bike. But we are trying to transform the city so that it is looking out, not in.

Q: We’re seeing more random innovation in library spaces in this city, as opposed to your carefully planned and articulated change. (1) You’re designers, but it’s about designing the interaction. (2) How can you bring unique, local materials into this interactive environment. (3) At archives, people are now curating their own memories, with a community collective approach. (4) We have generations of professionals, so just building new locations may not change things.

A: In Denmark we have a long tradition of collecting local historical materials. E.g., we have lots of photos of cattle and farms, so we crowd-sourced geolocating them and put them on Google Maps. We have a lot of materials that could be used.

A: We have a new project. When you get your grandparents’ old documents, you digitize them and load them on a national server. You’re in control of how open they should be. That’s in test now.

A: We have a lot of projects that focus on seniors.

A: At the WasteLab, one of the most active participants was a 70 year old woman. She made herself into the welcoming host. One day she came in with a smart phone she had won. People at the WasteLab sat with her and helped her learn how to use it; she’d found a community to ask. Creating a variety of offers — from more traditional to the newer — involves everyone.

A: We see the library as a space for those kinds of relationships.

Q: Are you getting any support from the Royal Library?

A: It has no relationship to public libraries.

Q: Design is crucial. It can signal to people that there’s more here than you expect. Modern libraries send a signal that it’s not only a place for research or study. Putting up those popup labs in your lobby is one of the most useful devices; people are in the experience without having to look for it. It’s the best of what Disney is trying to accomplish. The popup libraries are the gateway drug.

Q: How might this fit into an academic library space?

A: We collaborate with a couple of universities, but they’re two different worlds. University libraries generally see users as people to whom they provide services, rather than as people who can contribute to the library. It’s a question of what the academic libraries want students to do in the library. To read? To learn from other students? You might experiment with a common space to bring together these different communities.

A: You have a lifelong relationship with your local library, but only for a few years with your university library.

Q: Ultimately all libraries are shared resources, whatever those resources are. That’s a great argument for sharing access to all the tools we’ve heard about. Not every library needs its own 3D printer, but they could use access to one.

A: In Norway, a particular university library is divided into five areas, but with big shared spaces with tables, chairs, and menus. Then they put in empty shelves. The room was totally over-crowded and totally re-arranged.

Q: At Tisch Library at Tufts they’re renovating and creating group study space for people working alone but in a public space. Also, they’ve installed a media lab. At the Northeastern U Library, it felt like I was at an airport. There were fixed spaces and terminals, but there must have been 500 students in there. It was like a beehive. At the Madison Public Library they have The Bubbler, a media lab and performance space. These are blurring the lines.

[Loved these talks. These folks are taking deep principles and embodying them in their spaces.]


Dan Cohen on the DPLA’s cloud proposal to the FCC

I’ve posted a podcast interview with Dan Cohen, the executive director of the Digital Public Library of America about their proposal to the FCC.

The FCC is looking for ways to modernize the E-Rate program that has brought the Internet to libraries and schools. The DPLA is proposing DPLA Local, which will enable libraries to create online digital collections using the DPLA’s platform.

I’m excited about this for two reasons beyond the service it would provide.

First, it could be a first step toward providing cloud-based library services, instead of the proprietary, closed, expensive systems libraries typically use to manage their data. (Evergreen, I’m not talking about you, you open source scamp!)

Second, as libraries build their collections using DPLA Local, their metadata is likely to assume normalized forms, which means that we should get cross-collection discovery and semantic riches.

Here’s the proposal itself. And here’s where you can comment to the FCC about it.


February 1, 2014

Linked Data for Libraries: And we’re off!

I’m just out of the first meeting of the three universities participating in a Mellon grant (Cornell, Harvard, and Stanford, with Cornell as the grant instigator and leader) to build, demonstrate, and model using library resources expressed as Linked Data as a tool for researchers, students, teachers, and librarians. (Note that I’m putting all this in my own language, and I was certainly the least knowledgeable person in the room. Don’t get angry at anyone else for my mistakes.)

This first meeting, two days long, was very encouraging indeed: it’s a superb set of people, we are starting out on the same page in terms of values and principles, and we enjoyed working with one another.

The project is named Linked Data for Libraries (LD4L) (minimal home page), although that doesn’t entirely capture it, for the actual beneficiaries of it will not be libraries but scholarly communities taken in their broadest sense. The idea is to help libraries make progress with expressing what they know in Linked Data form so that their communities can find more of it, see more relationships, and contribute more of what the communities learn back into the library. Linked Data is not only good at expressing rich relations, it makes it far easier to update the dataset with relationships that had not been anticipated. This project aims at helping libraries continuously enrich the data they provide, and making it easier for people outside of libraries — including application developers and managers of other Web sites — to connect to that data.

As the grant proposal promised, we will use existing ontologies, adapting them only when necessary. We do expect to be working on an ontology for library usage data of various sorts, an area in which the Harvard Library Innovation Lab has done some work, so that’s very exciting. But overall this is the opposite of an attempt to come up with new ontologies. Thank God. Instead, the focus is on coming up with implementations at all three universities that can serve as learning models, and that demonstrate the value of having interoperable sets of Linked Data across three institutions. We are particularly focused on showing the value of the high-quality resources that libraries provide.

There was a great deal of emphasis in the past two days on partnerships and collaboration. And there was none of the “We’ll show ‘em where they got it wrong, by gum!” attitude that in my experience all too often infects discussions on the pioneering edge of standards. So, I just got to spend two days with brilliant library technologists who are eager to show how a new generation of tech, architecture, and thought can amplify the already immense value of libraries.

There will be more coming about this effort soon. I am obviously not a source for tech info; that will come soon and from elsewhere.

