Joho the Blog » libraries

March 22, 2014

Biblioteca Malatestiana – The world’s oldest public library

I’m in Cesena, Italy for the first holding of the Web Economic Forum. Because I’m only here for a day, I didn’t bother to look up the local attractions until I arrived this afternoon. At TripAdvisor, the #1 Attraction is the Biblioteca Malatestiana, so I walked there. (It turns out the WEF is in the adjoining building.)

The 400-year-old Biblioteca lays claim to being the world’s oldest public library. And it’s worth a visit, although the tour is in Italian, which I listened to attentively with my 1% Italian comprehension that consists almost entirely of false cognates and pizza toppings. Nevertheless, you can get the gist that this is a damn old library, that it’s got some very old books, including one from the 11th century, and that it was managed jointly by a monastery and the city government. (The intricate doors to the reading room require a key from each to be unlocked.)

The reading room looks like a chapel. There are two rows of pews that turn out to be reading desks designed for people to stand at. The books are stored underneath, like prayer books in a church, except they’re not and they’re chained to the shelf. The books on the right side of the chapel are religious, and the ones on the left are civic and classics. (The Greek classics are Latin translations.) The collection of 353 books includes seven Jewish works.

Reading room
Photo by Ivano Giovannini, from here

Then you are taken into Pope Pius VII’s library, a well-lit room with 15th century music books on display. They are nicely illuminated. There’s also a small display of small books, including one that they claim is the smallest that is legible without a magnifier. I couldn’t read it, but my eyesight isn’t as good as it never was.

Chorale books
Photo by Sally Zuckerman, from here

I wish they had shown us more of the Library, but you can hear very old voices there, and they’re mainly saying, “Printed books are going to kill reading! Everyone’s a reader now! You don’t need any special skills or training. And the books are so much uglier than they were in my day. Hey you kids, get off of my fiefdom!”

 


The Wikipedia article isn’t very good. There’s better info on this Consortium of European Research Libraries page, and this Travel Through History page by Sally Zuckerman. (The photos are from Sally’s post.)

March 18, 2014

Dean Krafft on the Linked Data for Libraries project

Dean Krafft, Chief Technology Strategist for Cornell University Library, is at Harvard to talk about the Mellon-funded Linked Data for Libraries (LD4L) project he leads. The grantees include Cornell, Stanford, and the Harvard Library Innovation Lab (which is co-sponsoring the talk with ABCD). (I provide nominal leadership for the Harvard team working on this.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Dean will talk about the LD4L project by talking about its building blocks. [Dean had lots of information and a lot on the slides. I did a particularly bad job of capturing it.]

LD4L

Mellon last December put up $1M for a 2-year project that will end in Dec. 2015. The participants are Cornell, Stanford, and the Harvard Library Innovation Lab.

Cornell: Dean Krafft, Jon Corson-Rikert, Brian Lowe, Simeon Warner

Stanford: Tom Cramer, Lynn McRae, Naomi Dushay, Philip Schreur

Harvard: Paul Deschner, Paolo Ciccarese, me

Aim: Create a Scholarly Resource Semantic Info Store model that works within and across institutions to create a network of Linked Open Data to capture the intellectual value that librarians and other domain experts add to info, patterns of usage, and more.

LD4L wants to have a common language for talking about scholarly materials.

Outcomes:

  • Create a SRSIS ontology sufficiently expressive to encompass catalog metadata and other contextual elements

  • Create a SRSIS semantic editing, display, and discovery system based on Vitro to support the incremental ingest of semantic data from multiple info sources

  • Create a Project Hydra-compatible interface to SRSIS, an ActiveTriples software component to facilitate easy use of the data

Why use Linked Data?

LD puts the emphasis on the relationships. Everything is related.

Benefits: The connections have meaning. And it supports “many dimensions of nearness.”

Dean explains RDF triples. They connect subjects with objects via a consistent set of relationships.
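
[A minimal sketch of the triple idea, in Python with only the standard library. The prefixes (ex:, dc:, foaf:), the sample records, and the match helper are all invented for illustration; none of this is LD4L's data or API.]

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical sample data; the identifiers are invented for this sketch.
TRIPLES = [
    Triple("ex:book/1", "dc:title", "An Example Book"),
    Triple("ex:book/1", "dc:creator", "ex:person/1"),
    Triple("ex:book/2", "dc:creator", "ex:person/1"),
    Triple("ex:person/1", "foaf:name", "Jane Example"),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Everything created by ex:person/1, found by following one relationship.
works = [t.subject
         for t in match(TRIPLES, predicate="dc:creator", obj="ex:person/1")]
print(works)
```

Because every statement has the same three-part shape, the same match call can walk from a book to its author or from an author to all of their books.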

A nice feature of LOD is that the same URL that points to a human-readable page can also be taken as a query to show the machine-readable data.
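
[One common way to get both views from a single URL is HTTP content negotiation: the client names the format it wants in the Accept header. The sketch below only builds the two requests and sends nothing. The URL is hypothetical, and whether a given LOD server honors text/turtle is an assumption, not something Dean said.]

```python
import urllib.request

# Hypothetical resource URL, used only for illustration.
RESOURCE_URL = "http://example.org/individual/n1234"


def build_request(url: str, machine_readable: bool) -> urllib.request.Request:
    """Build a request for the HTML page or for an RDF (Turtle) view of it."""
    accept = "text/turtle" if machine_readable else "text/html"
    return urllib.request.Request(url, headers={"Accept": accept})


human = build_request(RESOURCE_URL, machine_readable=False)
machine = build_request(RESOURCE_URL, machine_readable=True)

# Same URL, different requested representation. Nothing is sent here;
# urllib.request.urlopen(machine) would perform the actual fetch.
print(human.full_url == machine.full_url)
print(human.get_header("Accept"), "|", machine.get_header("Accept"))
```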

There’s commonality among references: shared types, shared relationships, shared instances defined as types and linked by relationships.

LOD is great for sharing data. There’s a startup cost, but as you share more data repositories and types, the cost and effort go up linearly, not at the steeper rate of traditional approaches.
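
[A back-of-the-envelope illustration of that scaling claim. It assumes, and this is my assumption rather than anything on Dean's slides, that the "traditional" approach means writing a separate mapping between every pair of data sources, while the Linked Data approach means mapping each source once to a shared model. The function names are made up.]

```python
def pairwise_mappings(n_sources: int) -> int:
    """Mappings needed if every pair of sources is connected directly."""
    return n_sources * (n_sources - 1) // 2


def shared_model_mappings(n_sources: int) -> int:
    """Mappings needed if each source is mapped once to a shared model."""
    return n_sources


# Compare the two counts as the number of sources grows.
for n in (3, 10, 50):
    print(n, pairwise_mappings(n), shared_model_mappings(n))
```

Under that assumption, ten sources need 45 pairwise mappings but only 10 mappings to a shared model, and the gap widens quickly from there.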

Dean shows the mandatory graphic of a cloud of LOD sources.

Building Blocks

VIVO: VIVO was the inspiration for LD4L. It makes info about researchers discoverable. It’s software, data, a standard, and a community. It connects scientists and scholars through their research and scholarship. It provides self-describing data via shared ontologies. It provides search results enhanced by what it knows. And it does simple reasoning.

VIVO is built on the VIVO/Vitro platform. It has ingest tools, ontology editing tools, instance editing tools, and a display system. It models people, organizations, grants, etc., the relationships among them, and links to URIs elsewhere. It describes people in the process of doing research. It’s discipline-neutral. It uses existing domain terminology to describe the content of research. It’s modular, flexible, and extensible.

VIVO harvests much of its data automatically from verified sources.

It takes a complexity of inputs and makes them discoverable and usable.

All the data in VIVO is public and visible.

Dean shows us a page, and then traverses the network of interrelated authors.

He points out that other institutions are able to mash up their data with VIVO. E.g., the ICTS has info about 1.2M publications that they’ve integrated with VIVO’s data. E.g., you can see research papers created with federal funding but not deposited in PubMed Central.

VIVO is extensible. LASP extended VIVO to include spacecraft. Brown U. is extending it to support the humanities and artistic works, adding “performances,” for example.

The LD4L ontology will use components of the VIVO-ISF ontology. When new ontologies are needed, it will draw upon VIVO design patterns. The basis for SRSIS implementations will be Vitro plus LD4L ontologies. The multi-institution LD4L demo search will adapt VIVOsearch.org.

The 8M items at Cornell have generated billions of triples.

Project Hydra. Hydra is a tech suite and a partnership. You put your data there and can have many different apps. 22 institutions are collaborating.

Fundamental assumption: No single system can provide the full range of repository-based solutions for a given institution’s needs, yet sustainable solutions do require a common repository. Hydra is now building a set of “heads” (UI’s) for media, special collections, archives, etc.

Fundamental assumption: No single institution can build the full range of what it needs, so you need to work with others.

Hydra has an open architecture with many contributors to a common core. There are collaboratively built solution bundles.

Fedora, Ruby on Rails for Blacklight, Solr, etc.

LD4L will create an ActiveTriples Hydra component to mimic ActiveFedora.

Our Lab’s LibraryCloud/ShelfRank is another core element. It provides a model for access to library data, and a concrete example for creating an ontology for usage.

LD4L – the project

We’re now developing use cases. We have 32 on the wiki. [See the wiki for them]

We’re identifying data sources: Biblio, person (VIVO), usage (LibCloud, circ data, BorrowDirect circ), collections (EAD, IRs, SharedShelf, Olivia, arbitrary OAI-PMH), annotations (CuLLR, Stanford DMW, Bloglinks, DBpedia, LibGuides), subjects and authorities (external sources). Imagine being able to look at usage across 50 research libraries…

Assembling the Ontology:

VIVO, Open Annotation, SKOS

BibFrame, BIBO, FaBIO

PROV-O, PAV

FOAF, PROVE, Schema.org

CreativeCommons, Dublin Core

etc.

Whenever possible the project will use existing ontologies.

Timeline: By the end of the year we hope to be piloting initial ingests.

Workshop: Jan. 2015. 10-12 institutions. Aim: get feedback, make a “sales pitch” to other organizations to join in.

June 2015: Pilot SRSIS instances at Harvard and Stanford. Pilot gathering info across all three instances.

Dec. 2015: Instances implemented.

wiki: http://wiki.duraspace.org/display/ld4l

Q&A

Q: Who anointed VIVO a standard?

A: It’s a de facto standard.

Q: SKOS is considered a great start, but to do anything real with it you have to modify it, and if it changes you’re screwed.

A: (Paolo) I think VIVO uses SKOS mainly for terms, not hierarchies. But I’m not sure.

Q: What are ActiveTriples?

A: ActiveFedora is a Ruby Gem that serves as an interface for Hydra into a Fedora repository. ActiveTriples will serve the same function for a backend triple store. So you can swap different triple stores into the Fedora repository. This is Simeon Warner’s project.

Q: Does this mean you wouldn’t have to have a Fedora backend to take advantage of Hydra?

A: Yes, that’s part of it.

Q: Are you bringing in GIS linked data?

A: Yes, to the extent that we can and it makes sense to.

A: David Siegel: We have 6M data points from 1.1M Hollis records. LibraryCloud is ingesting them.

Q: What’s the product at the end?

A: We promised Mellon the ontology and instances of LOD based on the ontology at each of the 3 institutions, and search across the three.

Q: Harvard doesn’t have a Fedora backend…

A: We’d like to pull from non-catalog sources. That might well be an OAI-PMH ingest, or some other non-Fedora source.

Q: What is Simeon interested in with regard to Arxiv.org?

A: There isn’t a direct relationship.

Q: He’s also working on ORCID.

A: We have funding to do some level of integration of ORCID and VIVO.

Q: What is the bibliographic scope? BibFrame isn’t really defining items, etc. They’ve pushed it into annotations.

A: We’re interested in capturing some of that. BibFrame is offering most of what we need, but we have to look at each case. Then we communicate with them and hope that BibFrame does most of the work.

Q: Do any of your use cases posit tagging of contents, including by users, perhaps with a controlled vocabulary?

A: We’ll be doing tagging at the object level. I’m unsure whether we’re willing to do tagging within the object.

A: (Paolo) We assume we don’t have access to the full text.

A: You could always point into our data.

Q: How can we help?

A: We’re accumulating use cases and data sources. If you’re aware of any, let us know.

Q: It’s been hard for libraries to put enough effort into authority control, to associate values comparable across different subject schemes…there’s a lot of work to make things work together. What sort of vocabulary or semantic links will you be using? The hard part is getting values to work across domains.

A: One way to deal with that is to bring together the disparate info. By pulling together enough info, you can sometimes use the network to figure that out. But in general the disambiguation challenge (and text fields are even worse) is not something we’re going to solve.

Q: Are the working groups institutionally based?

A: No. They’re cross-institution.

[I'm very excited about this project, and about the people working on it.]

March 6, 2014

Report from Denmark: Designing the new public library at Aarhus, and the People’s Lab

Knud Schulze, manager of the main library in Aarhus, Denmark and Jane Kunze of the People’s Lab in Denmark are giving talks, hosted by the Harvard Library Innovation Lab. (Here are his slides.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Knud begins by reminding us how small Denmark is: 5.5M people. Aarhus has a population of about 330,000. [My account is very choppy. The talk was not.]

Now that the process of digitizing all information is well underway, the focus is on what can only be experienced in the library. Before, the library was a space for media. Now the space is a medium. Seriousness was prized in libraries. Now a sense of humor is. We’ve built libraries with books and other media to serve an industrial society. Some are truly beautiful, but they’re under-used. Now we’re moving to libraries for a networked society.

Three and a half years ago, the Danes wrote a report on public libraries in the knowledge society, and went looking for partnerships, which is unusual for the Danes, says Knud. The new model of the library intersects four spaces: inspiration, learning, performative, and meeting spaces. But the question is what people are going to do in those spaces. Recognition/experience, empowerment, learning, innovation. Knud shows pictures of those activities currently going on in the library.

Two hundred of Denmark’s 500 public libraries are “open libraries” — open 24 hours a day, with staffing only about 12 hours a week. If you have a library card, you can open the door. You can check media in and out, use the Internet, use a PC, read newspapers, study, arrange study circles. “The point is to let users take control.”

A law in 2007 said there had to be one-stop shopping for govt services. Most libraries offer these services. You go to the library for a passport, drivers license, health insurance, etc. Every citizen needs to have a personal account for communication with banks, from the state (e.g., about taxes). Libraries have helped educate the citizenry about this.

Often libraries are community centers that involve public and private sectors and a wide range of services. Sometimes the other services overwhelm the library services. “People ask me, ‘Where is the public library in this?’, and I say, ‘Think about the library as the glue.’”

There have to be innovation spaces in the local libraries.

The Danish Digital Library (Danskernes Digitale Bibliotek) is an open source infrastructure for digital objects, including a resource management system for the whole country and a way to purchase digital content. All its digital services are accessible anywhere in the world. 86 of the 98 municipal library systems have contributed to a shared contract for a new library system based on Open Source. They share operations and development. “There’s a very good business case.”

So, why Dokk1, the new library?

Libraries are symbols of development and innovation in the society. They drive city development. They add new stories about the town. All public libraries are examples of the citizens’ interest in innovation. E.g., the Opera, Munch museum and library in Oslo have transformed the waterfront and brought a new identity to the city. Helsinki, Birmingham (UK), and others as well. “The same will happen in Aarhus, we hope.”

DOKK1 is being built into the harbor, “transforming it into an open sea front.” There’s 200,000 sq. feet of library, parking for 1,000 cars, two new urban harbor squares, a light rail station. Cost: US$390M. It will open in early 2015.

The front of the current library features new programs every few months, rather than the entrance being a way of controlling the users. They’ve run projects like iFloor (social interaction), a news lab (producing TV), AI robots, displays that capture and freeze images of people interacting with them, and much more. The building needs to interact with its surroundings and adapt to them, says Knud.

DOKK1 is “no building with an advanced roof.”

“It’s all about facilitating relations.” “The library of the future is all about people.” It will be a user-driven process: “From tradition to transcendence so users can deconstruct their old knowledge about libraries.” Knud shows a photo of children doing searches by interacting with blocks on the floor. They paid no attention to the info on the screens.

They have partnerships with the Gates Foundation, Chicago Public Libraries, IDEO, and the Aarhus Public Library.

Another project: “Intelligent Libraries”: how to “work smart” by improving logistics. The project knows where all the books are in all the nation’s libraries, and how often they’re used. They use “media hotels”: “local or remote storage of overflow, slow moving materials.”

The name “DOKK1” came from a competition. 1,250 proposals. Seven were considered by a jury. “It’s about branding the library.” 90% of all city inhabitants should know about the new project. In August 2013 75% did. In the existing library, users are invited to engage in the “mental construction” of the new one.

Now Jane Kunze talks about People’s Lab. She begins with a sign: “Shut up and hack.”

They’ve been setting up labs for the past two years to test different ways of interacting with users. Innovation is important to the Danish govt. (Denmark was just rated the most innovative country in Europe.) How can the public library be part of this?

They were inspired by Maker culture. Fab labs and maker spaces have been popping up everywhere. There’s also a trend in Denmark to repair rather than replace. And a focus on hand skills and not just academic knowledge. Also rapid prototyping, with inspiration from design thinking (as per IDEO).

The People’s Lab is a result of a collaboration among the library, community, and partners. Partners include public libraries, Aarhus School of Architecture, Moesgaard Museum, Roskilde festival, Orange Innovation, and more.

When they began, it was about kick-ass technology. But, while tech is fun, it’s really about people and community-building. “Don’t wait to involve people until your grand opening.” People will see your imperfections “but that’s part of what will make them committed to the place.”

The six labs:

  • TechLab: having a maker in residence is powerful. See Valdemar’s hovercraft:

  • Guitar Lab. Use local people and their passions.

  • Dreamcity: A maker space at the Roskilde rock festival. “You have to put yourself into play. You have to be there with your whole personality, and not just your professional side.”

  • WasteLab: Trash from dump “spiced up with specially selected trash.” “Creativity comes from chaos — stop tidying!”

  • Magnetic Groove Memories: cut your own vinyl records and fix up old radios

  • The first maker faire in Aarhus will be in 2014

They’ve been building a ladder of involvement, so people can come in for something basic and find themselves increasingly engaged — “small steps that make it possible for people to become more and more free in their thinking.”

They’ve learned that when the community already has hacker spaces and maker spaces, maybe the library should just be a gate to this ecosystem, opening them up to a broader public. Maybe the library is a place where people are introduced to making and working more creatively with their hands. “You can work with maker culture without having a makerspace.” You don’t have to have a room dedicated to machinery, especially for the smaller communities.

Q&A [with six of the Danes responding]

Q: Is this like a library plus the SF Exploratorium?

A: Yes.

A: We’re looking at how to create relationships among the patrons, staff, the media…

A: We want to make a place where people get involved in different kinds of competencies.

Q: Many of the other libraries you showed are on the edge of the city. Are you trying to make the library a destination? In Boston I wouldn’t let my 14 yr old grandchild go down to the harbor by himself.

A: In Aarhus, children move through the city at 10-12 yrs old. They can get to the new library by public transportation or bike. But we are trying to transform the city so that it is looking out, not in.

Q: We’re seeing more random innovation in library spaces in this city, as opposed to your carefully planned and articulated change. (1) You’re designers, but it’s about designing the interaction. (2) How can you bring unique, local materials into this interactive environment. (3) At archives, people are now curating their own memories, with a community collective approach. (4) We have generations of professionals, so just building new locations may not change things.

A: In Denmark we have a long tradition of collecting local historical materials. E.g., we have lots of photos of cattle and farms, so we crowd-sourced geolocating them and put them on Google Maps. We have a lot of materials that could be used.

A: We have a new project. When you get your grandparents’ old documents, you digitize them and load them on a national server. You’re in control of how open they should be. That’s in test now.

A: We have a lot of projects that focus on seniors.

A: At the WasteLab, one of the most active participants was a 70 year old woman. She made herself into the welcoming host. One day she came in with a smart phone she had won. People at the WasteLab sat with her and helped her learn how to use it; she’d found a community to ask. Creating a variety of offers — from more traditional to the newer — involves everyone.

A: We see the library as a space for those kinds of relationships.

Q: Are you getting any support from the Royal Library?

A: It has no relationship to public libraries.

Q: Design is crucial. It can signal to people that there’s more here than you expect. Modern libraries send a signal that it’s not only a place for research or study. Putting up those popup labs in your lobby is one of the most useful devices; people are in the experience without having to look for it. It’s the best of what Disney is trying to accomplish. The popup libraries are the gateway drug.

Q: How might this fit into an academic library space?

A: We collaborate with a couple of universities, but they’re two different worlds. University libraries generally see users as people to whom they provide services, rather than as people who can contribute to the library. It’s a question of what the academic libraries want students to do in the library. To read? To learn from other students? You might experiment with a common space to bring together these different communities.

A: You have a lifelong relationship with your local library, but only for a few years with your university library.

Q: Ultimately all libraries are shared resources, whatever those resources are. That’s a great argument for sharing access to all the tools we’ve heard about. Not every library needs its own 3D printer, but they could use access to one.

A: In Norway, a particular university library is divided into five areas, but with big shared spaces with tables, chairs, and menus. Then they put in empty shelves. The room was totally over-crowded and totally re-arranged.

Q: At Tisch Library at Tufts they’re renovating and creating group study space for people working alone but in a public space. Also, they’ve installed a media lab. At the Northeastern U Library, it felt like I was at an airport. There were fixed spaces and terminals, but there must have been 500 students in there. It was like a beehive. At the Madison Public Library they have The Bubbler, a media lab and performance space. These are blurring the lines.

[Loved these talks. These folks are taking deep principles and embodying them in their spaces.]

Dan Cohen on the DPLA’s cloud proposal to the FCC

I’ve posted a podcast interview with Dan Cohen, the executive director of the Digital Public Library of America about their proposal to the FCC.

The FCC is looking for ways to modernize the E-Rate program that has brought the Internet to libraries and schools. The DPLA is proposing DPLA Local, which will enable libraries to create online digital collections using the DPLA’s platform.

I’m excited about this for two reasons beyond the service it would provide.

First, it could be a first step toward providing cloud-based library services, instead of the proprietary, closed, expensive systems libraries typically use to manage their data. (Evergreen, I’m not talking about you, you open source scamp!)

Second, as libraries build their collections using DPLA Local, their metadata is likely to assume normalized forms, which means that we should get cross-collection discovery and semantic riches.

Here’s the proposal itself. And here’s where you can comment to the FCC about it.

February 1, 2014

Linked Data for Libraries: And we’re off!

I’m just out of the first meeting of the three universities participating in a Mellon grant (Cornell, Harvard, and Stanford, with Cornell as the grant instigator and leader) to build, demonstrate, and model using library resources expressed as Linked Data as a tool for researchers, students, teachers, and librarians. (Note that I’m putting all this in my own language, and I was certainly the least knowledgeable person in the room. Don’t get angry at anyone else for my mistakes.)

This first meeting, two days long, was very encouraging indeed: it’s a superb set of people, we are starting out on the same page in terms of values and principles, and we enjoyed working with one another.

The project is named Linked Data for Libraries (LD4L) (minimal home page), although that doesn’t entirely capture it, for the actual beneficiaries of it will not be libraries but scholarly communities taken in their broadest sense. The idea is to help libraries make progress with expressing what they know in Linked Data form so that their communities can find more of it, see more relationships, and contribute more of what the communities learn back into the library. Linked Data is not only good at expressing rich relations, it makes it far easier to update the dataset with relationships that had not been anticipated. This project aims at helping libraries continuously enrich the data they provide, and making it easier for people outside of libraries — including application developers and managers of other Web sites — to connect to that data.

As the grant proposal promised, we will use existing ontologies, adapting them only when necessary. We do expect to be working on an ontology for library usage data of various sorts, an area in which the Harvard Library Innovation Lab has done some work, so that’s very exciting. But overall this is the opposite of an attempt to come up with new ontologies. Thank God. Instead, the focus is on coming up with implementations at all three universities that can serve as learning models, and that demonstrate the value of having interoperable sets of Linked Data across three institutions. We are particularly focused on showing the value of the high-quality resources that libraries provide.

There was a great deal of emphasis in the past two days on partnerships and collaboration. And there was none of the “We’ll show ‘em where they got it wrong, by gum!” attitude that in my experience all too often infects discussions on the pioneering edge of standards. So, I just got to spend two days with brilliant library technologists who are eager to show how a new generation of tech, architecture, and thought can amplify the already immense value of libraries.

There will be more coming about this effort soon. I am obviously not a source for tech info; that will come soon and from elsewhere.

December 14, 2013

Are tags over-rated?

Jeff Atwood [twitter:codinghorror], a founder of Stack Overflow and Discourse.org (two of my favorite sites), is on a tear about tags. Here are his two tweets that started the discussion:

I am deeply ambivalent about tags as a panacea based on my experience with them at Stack Overflow/Exchange. Example: pic.twitter.com/AA3Y1NNCV9

Here’s a detweetified version of the four-part tweet I posted in reply:

Jeff’s right that tags are not a panacea, but who said they were? They’re a tool (frequently most useful when combined with an old-fashioned taxonomy), and if a tool’s not doing the job, then drop it. Or, better, fix it. Because tags are an abstract idea that exists only in particular implementations.

After all, one could with some plausibility claim that online discussions are the most overrated concept in the social media world. But still they have value. That indicates an opportunity to build a better discussion service. … which is exactly what Jeff did by building Discourse.org.

Finally, I do think it’s important — even while trying to put tags into a less over-heated perspective [do perspectives overheat??] — to remember that when first introduced in the early 2000s, tags represented an important break with an old and long tradition that used the authority to classify as a form of power. Even if tagging isn’t always useful and isn’t as widely applicable as some of us thought it would be, tagging has done the important work of telling us that we as individuals and as a loose collective now have a share of that power in our hands. That’s no small thing.

December 11, 2013

Americans love themselves some libraries

Here’s the summary from a new Pew Internet & American Life survey of 6,224 Americans 16 years and older:

Some 90% of Americans ages 16 and older said that the closing of their local public library would have an impact on their community, with 63% saying it would have a “major” impact. Asked about the personal impact of a public library closing, two-thirds (67%) of Americans said it would affect them and their families, including 29% who said it would have a major impact. Moreover, the vast majority of Americans ages 16 and older say that public libraries play an important role in their communities:

  • 95% of Americans ages 16 and older agree that the materials and resources available at public libraries play an important role in giving everyone a chance to succeed;

  • 95% say that public libraries are important because they promote literacy and a love of reading;

  • 94% say that having a public library improves the quality of life in a community;

  • 81% say that public libraries provide many services people would have a hard time finding elsewhere.

I find it encouraging that while only 54% of Americans have used a public library in the past 12 months, 95% think libraries play an important social role. The half of Americans who don’t use public libraries still see the importance of maintaining them. The Pew report confirms that “Libraries are also particularly valued by those who are unemployed, retired, or searching for a job, as well as those living with a disability and internet users who lack home internet access.”

The full report is available online for free because Pew.

December 9, 2013

A day at the Bogota Library

The Luis Ángel Arango Library and the National Library of Colombia have been bringing in speakers this year to talk about the future of libraries, the relation of digital and cultural worlds, and library innovation, partially sponsored by the U.S. State Dept. through our local embassy. Friday was my turn. What a privilege!

And I mean privilege. After spending the morning wandering around a section of Bogota that we really enjoyed (so interesting and lively, and people were so kind to us) but that we were later told we should have avoided, my wife and I met with Alexis de Greiff [twitter: ahdegreiffa], the director of the Luis Ángel Arango Library (web site here). He and I had had lunch this summer when he was touring library innovation labs in the US. Alexis has the opportunity to re-do some of the space in the National Library. It was fascinating to hear the sorts of possibilities he’s considering, including creating an innovation lab somewhat along the lines of the Harvard Library Innovation Lab; we talked about the wisdom of including researchers in the mix, something we’d love to do here at our Lab.

Alexis has a huge opportunity because the Library is such an important part of the life of the city and of the country. It has well over a million books, making it the largest library in Latin America. And it gets an astonishing 5,000 visitors a day, making it more used than the New York Public Library. Many of the patrons are university students, so the Library has elements of both a research library (including the most complete collection in the world of materials about Colombian heritage) and a public library. It is physically located in the midst of an amazing cultural complex that includes an absolutely stunning concert hall and art museums. If you think there’s value in having art, culture, education, and knowledge intersect, come to the Luis Ángel Arango Library.

Luis Ángel Arango Library

The Luis Ángel Arango Library, and the church up the street.

We got a tour of the Library from top to bottom, led by Diana Restrepo Torres, the director of technology (collections development and cataloguing), and Juan Pablo Siza Ramírez, the coordinator of digital resources. The current building was created in the 1950s and still exhibits that era’s openness and cleanliness of line. Significant additions and modifications have been made over the years. It all feels light, airy, and inviting.

About 20% of the works are on open stacks. The rest are in the basement and are fetched via a conveyor system. (The works are shelved according to the Dewey Decimal System, but have a color stripe on the spine to indicate the rough area in which they are shelved, a quick way to get works to their first stop.)

Books with color-coded stripes at the bottom

Books in basement with conveyor system

Books in basement, with conveyor upwards

The Library includes sound-proofed rooms where people can practice playing musical instruments, a room for playing board games (with a balcony overlooking Mount Monserrate), lots of computers (but only designated areas with wifi), a rare books room, cafes, and much more. It is a big library.

Game Room at the Bogota Library

Game Room

Mt. Monserrate from the game room window

And it is a library completely committed to serving its community however it can.

The Library is funded by the Central Bank (like our Federal Reserve). The Bank spends about a third of its budget supporting Colombian culture, and for this reason the Library is able to purchase cultural items that otherwise might slip out of Colombian hands. For example, Diana and Juan Pablo showed us a set of volumes from the 1880s that were simply astonishing. They are part of a ten-volume set the Library recently purchased, hand-written and illustrated by José María Gutierrez de Alba, a Spanish spy who traveled through Colombia for about 13 years. The Library is going to digitize the volumes and post them online in its Virtual Library under an Open Access license. The thought that everyone will be able to see these pages makes me happy.

1880 manuscript

Juan Pablo tells me that the Spanish equivalent to "awesome!" is "grandiose!" This is a grandiose library.

November 15, 2013

[liveblog] Noam Chomsky and Bart Gellman at Engaging Data

I’m at the Engaging Data 2013 conference where Noam Chomsky and Pulitzer Prize winner (twice!) Barton Gellman are going to talk about Big Data in the Snowden Age, moderated by Ludwig Siegele of the Economist. (Gellman is one of the three people Snowden entrusted his documents to.) The conference aims at having us rethink how we use Big Data and how it’s used.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

LS: Prof. Chomsky, what’s your next book about?

NC: Philosophy of mind and language. I’ve been writing articles that are pretty skeptical about Big Data. [Please read the orange disclaimer: I'm paraphrasing and making errors of every sort.]

LS: You’ve said that Big Data is for people who want to do the easy stuff. But shouldn’t you be thrilled as a linguist?

NC: When I got to MIT in 1955, I was hired to work on a machine translation program. But I refused to work on it. “The only way to deal with machine translation at the current stage of understanding was by brute force, which after 30-40 years is how it’s being done.” A principled understanding based on human cognition is far off. Machine translation is useful but you learn precisely nothing about human thought, cognition, language, anything else from it. I use the Internet. Glad to have it. It’s easier to push some buttons on your desk than to walk across the street to use the library. But the transition from no libraries to libraries was vastly greater than the transition from libraries to Internet. [Cool idea and great phrase! But I think I disagree. It depends.] We can find lots of data; the problem is understanding it. And a lot of data around us go through a filter so it doesn’t reach us. E.g., the foreign press reports that Wikileaks released a chapter about the secret TPP (Trans Pacific Partnership). It was front page news in Australia and Europe. You can learn about it on the Net but it’s not news. The chapter was on Intellectual Property rights, which means higher prices for less access to pharmaceuticals, and rams through what SOPA tried to do, restricting use of the Net and access to data.

LS: For you Big Data is useless?

NC: Big data is very useful. If you want to find out about biology, e.g. But why no news about TPP? As Sam Huntington said, power remains strongest in the dark. [approximate] We should be aware of the long history of surveillance.

LS: Bart, as a journalist what do you make of Big Data?

BG: It’s extraordinarily valuable, especially in combination with shoe-leather, person-to-person reporting. E.g., a colleague used traditional reporting skills to get the entire data set of applicants for presidential pardons. Took a sample. More reporting. Used standard analytics techniques to find that white people are 4x more likely to get pardons, that campaign contributors are also more likely. It would be likely in urban planning [which is Senseable City Labs' remit]. But all this leads to more surveillance. E.g., I could make the case that if I had full data about everyone’s calls, I could do some significant reporting, but that wouldn’t justify it. We’ve failed to have the debate we need because of the claim of secrecy by the institutions in power. We become more transparent to the gov’t and to commercial entities while they become more opaque to us.

LS: Does the availability of Big Data and the Internet automatically mean we’ll get surveillance? Were you surprised by the Snowden revelations?

NC: I was surprised at the scale, but it’s been going on for 100 years. We need to read history. E.g., the counter-insurgency “pacification” of the Philippines by the US. See the book by McCoy [maybe this]. The operation used the most sophisticated tech at the time to get info about the population to control and undermine them. That tech was immediately used by the US and Britain to control their own populations, e.g., Woodrow Wilson’s Red Scare. Any system of power (the state, Google, Amazon) will use the best available tech to control, dominate, and maximize their power. And they’ll want to do it in secret. Assange, Snowden and Manning, and Ellsberg before them, are doing the duty of citizens.

BG: I’m surprised how far you can get into this discussion without assuming bad faith on the part of the government. For the most part what’s happening is that these security institutions genuinely believe most of the time that what they’re doing is protecting us from big threats that we don’t understand. The opposition comes when they don’t want you to know what they’re doing because they’re afraid you’d call it off if you knew. Keith Alexander said that he wishes that he could bring all Americans into this huddle, but then all the bad guys would know. True, but he’s also worried that we won’t like the plays he’s calling.

LS: Bruce Schneier says that the NSA is copying what Google and Yahoo, etc. are doing. If the tech leads to snooping, what can we do about it?

NC: Govts have been doing this for a century, using the best tech they had. I’m sure Gen. Alexander believes what he’s saying, but if you interviewed the Stasi, they would have said the same thing. Russian archives show that these monstrous thugs were talking very passionately to one another about defending democracy in Eastern Europe from the fascist threat coming from the West. Forty years ago, RAND released Japanese docs about the invasion of China, showing that the Japanese had heavenly intentions. They believed everything they were saying. I believe these are universals. We’d probably find it for Genghis Khan as well. I have yet to find any system of power that thought it was doing the wrong thing. They justify what they’re doing for the noblest of objectives, and they believe it. The CEOs of corporations as well. People find ways of justifying things. That’s why you should be extremely cautious when you hear an appeal to security. It literally carries no information, even in the technical sense: it’s completely predictable and thus carries no info. I don’t doubt that the US security folks believe it, but it is without meaning. The Nazis had their own internal justifications.

BG: The capacity to rationalize may be universal, but you’ll take the conversation off track if you compare what’s happening here to the Stasi. The Stasi were blackmailing people, jailing them, preventing dissent. As a journalist I’d be very happy to find that our govt is spying on NGOs or using this power for corrupt self-enriching purposes.

NC: I completely agree with that, but that’s not the point: The same appeal is made in the most monstrous of circumstances. The freedom we’ve won sharply restricts state power to control and dominate, but they’ll do whatever they can, and they’ll use the same appeals that monstrous systems do.

LS: Aren’t we all complicit? We use the same tech. E.g., Prof. Chomsky, you’re the father of natural language processing, which is used by the NSA.

NC: We’re more complicit because we let them do it. In this country we’re very free, so we have more responsibility to try to control our govt. If we do not expose the plea of security and separate out the parts that might be valid from the vast amount that’s not valid, then we’re complicit because we have the oppty and the freedom.

LS: Does it bug you that the NSA uses your research?

NC: To some extent, but you can’t control that. Systems of power will use whatever is available to them. E.g., they use the Internet, much of which was developed right here at MIT by scientists who wanted to communicate freely. You can’t prevent the powers from using it for bad goals.

BG: Yes, if you use a free online service, you’re the product. But if you use a for-pay service, you’re still the product. My phone tracks me and my social network. I’m paying Verizon about $1,000/year for the service, and VZ is now collecting and selling my info. The NSA couldn’t do its job as well if the commercial entities weren’t collecting and selling personal data. The NSA has been tapping into the links between their data centers. Google is racing to fix this, but a cynical way of putting this is that Google is saying “No one gets to spy on our customers except us.”

LS: Is there a way to solve this?

BG: I have great faith that transparency will enable the development of good policy. The more we know, the more we can design policies to keep power in place. Before this, you couldn’t shop for privacy. Now a free market for privacy is developing as the providers now are telling us more about what they’re doing. Transparency allows legislation and regulation to be debated. The House Repubs came within 8 votes of prohibiting call data collection, which would have been unthinkable before Snowden. And there’s hope in the judiciary.

NC: We can do much more than transparency. We can make use of the available info to prevent surveillance. E.g., we can demand the defeat of TPP. And now hardware in computers is being designed to detect your every keystroke, leading some Americans to be wary of Chinese-made computers, but the US manufacturers are probably doing it better. And manufacturers for years have been trying to design fly-sized drones to collect info; that’ll be around soon. Drones are a perfect device for terrorists. We can learn about this and do something about it. We don’t have to wait until it’s exposed by Wikileaks. It’s right there in mainstream journals.

LS: Are you calling for a political movement?

NC: Yes. We’re going to need mass action.

BG: A few months ago I noticed a small gray box with an EPA logo on it outside my apartment in NYC. It monitors energy usage, useful for preventing brownouts. But it measures down to the apartment level, which could be useful to the police trying to establish your personal patterns. There’s no legislation or judicial review of the use of this data. We can’t turn back the clock. We can try to draw boundaries, and then have sufficient openness so that we can tell if they’ve crossed those boundaries.

LS: Bart, how do you manage the flow of info from Snowden?

BG: Snowden does not manage the release of the data. He gave it to three journalists and asked us to use our best judgment (he asked us to correct for his bias about what the most important stories are) and to avoid direct damage to security. The documents are difficult. They’re often incomplete and can be hard to interpret.

Q&A

Q: What would be a first step in forming a popular movement?

NC: Same as always. E.g., the women’s movement began in the 1960s (at least in the modern movement) with consciousness-raising groups.

Q: Where do we draw the line between transparency and privacy, given that we have real enemies?

BG: First you have to acknowledge that there is a line. There are dangerous people who want to do dangerous things, and some of these tools are helpful in preventing that. I’ve been looking for stories that elucidate big policy decisions without giving away specifics that would harm legitimate action.

Q: Have you changed the tools you use?

BG: Yes. I keep notes encrypted. I’ve learned to use the tools for anonymous communication. But I can’t go off the grid and be a journalist, so I’ve accepted certain trade-offs. I’m working much less efficiently than I used to. E.g., I sometimes use computers that have never touched the Net.

Q: In the women’s movement, at least 50% of the population stood to benefit. But probably a large majority of today’s population would exchange their freedom for convenience.

NC: The trade-off is presented as being for security. But if you read the documents, the security issue is how to keep the govt secure from its citizens. E.g., Ellsberg kept a volume of the Pentagon Papers secret to avoid affecting the Vietnam negotiations, although I thought the volume really only would have embarrassed the govt. Security is in fact not a high priority for govts. The US govt is now involved in the greatest global terrorist campaign that has ever been carried out: the drone campaign. Large regions of the world are now being terrorized. If you don’t know if the guy across the street is about to be blown away, along with everyone around, you’re terrorized. Every time you kill an Al Qaeda terrorist, you create 40 more. It’s just not a concern to the govt. In 1950, the US had incomparable security; there was only one potential threat: the creation of ICBM’s with nuclear warheads. We could have entered into a treaty with Russia to ban them. See McGeorge Bundy’s history. It says that he was unable to find a single paper, even a draft, suggesting that we do something to try to ban this threat of total instantaneous destruction. E.g., Reagan tested Russian nuclear defenses that could have led to horrible consequences. Those are the real security threats. And it’s true not just of the United States.

November 13, 2013

Protecting library privacy with a hard opt-in

Marshall Breeding gave a talk today to the Harvard Library system as part of its Discoverability Day. Marshall is an expert in discovery systems, i.e., technology that enables library users to find what they need and what they didn’t know they needed, across every medium and metadata boundary.

It’s a stupendously difficult problem, not least because the various providers of the metadata about non-catalog items — journal articles, etc. — don’t cooperate. On top of that, there’s a demand for “single searchbox solutions,” so that you can not only search everything the Googley way, but the results that come back will magically sort themselves in the order of what’s most useful to you. To bring us closer to that result, Marshall said that systems are beginning to use personal profiles and usage data. The personal profile lets the search engine know that you’re an astronomer, so that when you search for “mercury” you’re probably not looking for information about the chemical, the outboard motor company, or Queen. The usage data will let the engine sort based on what your community has voted on with its checkouts, recommendations, etc.

Marshall was careful to stipulate that using profiles or usage data will require user consent. I’m very interested in this because the Library Innovation Lab where I work has created an online library browser — StackLife — that sorts results based on a variety of measures of Harvard community usage. StackLife computes a “stackscore” based on a simple calculation of the number of checkouts by faculty, grad students or undergrads, how many copies are in Harvard’s 73 libraries, and potentially other metrics such as how often it’s put on reserve or called back early. The stackscores are based on 10-year aggregates without any personal identifiers, and with no knowledge of which books were checked out together. And our Awesome Box project, now in more than 40 libraries, provides a returns box into which users can deposit books that they thought were “awesome,” generating particularly delicious user-based (but completely anonymized) data.
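To make the idea of a usage-based score concrete, here is a minimal sketch in Python. It is not StackLife's actual formula, which this post doesn't spell out: the weights, the field names, and the helper functions `raw_score` and `rank_by_usage` are all hypothetical. The only things taken from the description above are the kinds of inputs (checkouts by patron type, copies held, reserves, early recalls) and the fact that they are aggregate counts with no personal identifiers.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical weights, chosen only for illustration.
WEIGHTS = {
    "faculty_checkouts": 3.0,
    "grad_checkouts": 2.0,
    "undergrad_checkouts": 1.0,
    "copies_held": 1.0,
    "times_on_reserve": 2.0,
    "early_recalls": 2.0,
}


@dataclass(frozen=True)
class UsageAggregate:
    """Aggregate counts for one title; no patron identifiers are stored."""
    faculty_checkouts: int = 0
    grad_checkouts: int = 0
    undergrad_checkouts: int = 0
    copies_held: int = 0
    times_on_reserve: int = 0
    early_recalls: int = 0


def raw_score(usage: UsageAggregate) -> float:
    """Weighted sum of the aggregate counts (an assumed scoring rule)."""
    return sum(weight * getattr(usage, name) for name, weight in WEIGHTS.items())


def rank_by_usage(titles: Dict[str, UsageAggregate]) -> List[str]:
    """Return title keys ordered from most used to least used."""
    return sorted(titles, key=lambda t: raw_score(titles[t]), reverse=True)
```

Because the input is already a per-title aggregate, nothing in this sketch can say who checked out what, or which books were checked out together.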

Marshall is right: usage data is insanely useful for a community, and I’d love for us to be able to get our hands on more of it. But, I got into a Twitter discussion about the danger of re-identification with Mark Ockerbloom [twitter:jmarkockerbloom] and John Wilbanks [twitter:wilbanks], two people I greatly respect, and I agree that a simple opt-in isn’t enough, because people may not fully recognize the possibility that their info may be made public. So, I had an idea.

Suppose you are not allowed to do a “soft” opt-in, by which I mean an opt-in that requires you to read some terms and tick a box that permits the sharing of information about what you check out from the library. Instead, you would be clearly told that you are opting in to publishing your check-outs. Not to letting your checkouts be made public if someone figures out how to get them, or even to making your checkouts public to anyone who asks for them. No, you’d be agreeing to having a public page with your name on it that lists your checkouts. This is a service a lot of people want anyway, but the point would be to make it completely clear to you that ticking the checkbox means that, yes, your checkouts are so visible that they get their own page. And if you want to agree to the “soft” opt-in, but don’t want that public page posted, you can’t.

Presumably the library checkout system would allow you to exempt particular checkouts, but by default they all get posted. That would, I think, drive home what the legal language expressed in the “soft” version really entails.
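Here is a rough sketch of how that rule might look in a checkout system, in Python. Everything in it is hypothetical (the `Patron` class, its fields, and its methods are made up for illustration and don't describe any real library system). It only tries to capture the three properties above: there is a single consent flag, consenting means a public page, and individual checkouts can be exempted while everything else is posted by default.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Patron:
    name: str
    # The only consent flag on offer: there is deliberately no separate
    # "soft" flag that shares data without also publishing the page.
    hard_opt_in: bool = False
    checkouts: List[str] = field(default_factory=list)
    exempted: Set[str] = field(default_factory=set)

    def may_share_usage_data(self) -> bool:
        """Usage data is shareable only for patrons who took the hard opt-in."""
        return self.hard_opt_in

    def public_page(self) -> List[str]:
        """Titles listed on the patron's public page.

        Everything is posted by default; only explicitly exempted
        titles are withheld. Patrons who have not opted in get no page.
        """
        if not self.hard_opt_in:
            return []
        return [title for title in self.checkouts if title not in self.exempted]
```

The design choice worth noticing is what is missing: there is no way to set "share my data" without also getting the public page.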

 


Here are a couple of articles by Marshall Breeding: 1. Infotoday 2. Digital Shift
