Joho the Blog
October 23, 2007
Aaron Swartz is giving a Berkman talk on the Open Library project. [As always, I'm typing quickly, missing stuff, getting things wrong. You can hear the whole thing at MediaBerkman.]
The basic idea is to give each book a Web page that collects all the information about that book. Books have never had "a first class place on the web." Information about them has been scattered across publishers' Web sites, etc. The book pages are a "structured wiki." Wikipedia lacks the structure required to let computers access it, so each OL wiki page has separate fields for all of the book's metadata. E.g., click on the author's name and you get a list of all the books the author has written.
It has to be really open, Aaron says. "This is something that has to be a collaboration among a lot of different people." They've brought in publishers, reviews, authors, etc. It's all available for free, for download or reuse. Anyone can use it. When books are out of copyright, OL brings in the full text, when available. But that raises issues about how people want to read books online, he says. OL also wants to be able to point people to libraries that have copies of books. There are "Buy, borrow or download" options for every book (when possible). Readers can review books on the site.
The first thing librarians argued about when they saw OL was which subject classification system to use. "We don't have to choose on the Internet. We can store all the category systems and let people choose which ones they want." Likewise with all the different identifiers, e.g., ISBNs, OCLC numbers, OL identifiers. ("We have to make our own identifier system because we're going to have more books.") FRBRization (from FRBR, the Functional Requirements for Bibliographic Records model) means connecting physical books to all the different abstractions, e.g., print runs, editions, translations, etc. The library world has focused primarily on the physical books on the shelves.
"We're going to have to come up with new ways of expressing the relationships," including allowing people to create new relationships, e.g., this book is based on that one, this book refutes that one, this one replaces that one. They'd like to be able to do print on demand, and mail you a physical copy. Also scan on demand: You pay some money and someone goes and scans it. Amazon is doing something similar to OL. But Amazon is trying to sell you stuff and doesn't have good info about books that are out of print. Google Books has very few community features. And there's WorldCat from OCLC, but their business model depends on selling information. OL wants to be a public group available to everyone.
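To make the "structured wiki" idea a bit more concrete, here is a minimal Python sketch of a book record that keeps every identifier scheme and every classification scheme side by side rather than choosing one, plus open-ended, user-created relationships ("based on," "refutes," "replaces"). The class name, field names, and relationship labels are hypothetical, made up for illustration; this is not Open Library's actual schema or code.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BookRecord:
    # Hypothetical structured fields; NOT Open Library's real schema.
    title: str
    authors: list[str]
    # Keep every identifier scheme (e.g. "isbn", "oclc", "ol") instead of picking one.
    identifiers: dict[str, str] = field(default_factory=dict)
    # Keep every classification scheme; readers choose the one they want.
    subjects: dict[str, list[str]] = field(default_factory=dict)
    # Open-ended, user-created relationships as (kind, target title) pairs.
    relations: list[tuple[str, str]] = field(default_factory=list)


def books_by_author(records: list[BookRecord]) -> dict[str, list[str]]:
    """Index titles by author, so 'click the author' can list all their books."""
    index: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        for author in rec.authors:
            index[author].append(rec.title)
    return dict(index)


def subjects_in_scheme(rec: BookRecord, scheme: str) -> list[str]:
    """Return a record's subjects under one chosen scheme (empty if none stored)."""
    return rec.subjects.get(scheme, [])


def related(rec: BookRecord, kind: str) -> list[str]:
    """Return the targets of one relationship kind, e.g. 'based_on'."""
    return [target for k, target in rec.relations if k == kind]


if __name__ == "__main__":
    # Illustrative records only; identifiers are placeholders, not real numbers.
    wss = BookRecord(
        "West Side Story",
        ["Arthur Laurents"],
        identifiers={"ol": "hypothetical-ol-id"},
        subjects={"user_tags": ["musical"]},
        relations=[("based_on", "Romeo and Juliet")],
    )
    rj = BookRecord("Romeo and Juliet", ["William Shakespeare"])
    print(books_by_author([wss, rj]))
    print(related(wss, "based_on"))
```

Storing identifiers and subjects as scheme-keyed dictionaries is one simple way to avoid privileging a single system; a real catalog would also need the FRBR-style layering of works, editions, and copies, which this sketch leaves out.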
Q: English language only?
Q: (terry martin - law school librarian) Journals?
Q: (wendy) Fuzzy connections? Is West Side Story an adaptation of Romeo and Juliet?
Q: User-generated categories?
Q: (jpalfrey) We'd love to hear what you say about how a huge library, such as Harvard Law School Library, could contribute...
Aaron now talks about the current status of the project. The software is working well, he says. They worried about it because it combines a database and a wiki in new ways. They have about 10 million catalog records, including 6M from the Library of Congress and 5M from U of NC. They have about 400,000 full-text copies, mainly from the Internet Archive. Publishers have been good about providing info. They're looking for collections of reviews. Publishing on demand works well; they have machines that print and assemble books in about 5 mins. They're going to repopulate the New Orleans public library with the 400,000 books OL has. OL wants more data. Also, they need more programmers. "If you love books, we'd love your help soon curating and annotating them."
Q: (sj klein) Interlibrary loan for books in copyright?
Q: (gene koo) The publishers are ok with it but the non-profit book association has problems with it?
Q: International?
Q: Are you working with delicious library, etc., to see if they can contribute?
Q: What are you doing to reach the social tipping point?
Q: (oliver goodenough) Money?
Q: What is the glue? I don't see a unique ID...
Q: (me) FRBR is pretty structured. But the number of ways we might want to connect things is open-ended. How are you going to figure out the right balance between structured and unstructured?
Q: (tim spalding - LibraryThing) Tagging allows for multiple categorizations and relationships. E.g., at LibraryThing we got pressure to include more choices under gender. How to resolve?
Q: (sj) Are you hotlinking to any databases? I.e., not importing but doing calls.
Q: Frequently, Wikipedia will put in a note to clarify ambiguous categorizations, e.g., a gender categorization that isn't right. But OL is more constrained.
Q: (terry martin) Greg Crane, 25 yrs ago you did something like this for a closed domain. Would you do it this way now?
Q: (sj) What about unpublished works?
Q: (me) And then doesn't it get spammed as people link their self-published book to existing books?
Q: Why won't OCLC give you the data?
Q: (tim) The greatest thing about OL is that it's an OCLC killer. Libraries shouldn't pay for it. Why not just explicitly say that the enormous value is that libraries won't have to pay for cataloging records?
Q: (sj) OCLC culls and curates. OL will need this.
A: Why not just give OL the records?
Even if contracts allow you to distribute your records, wouldn't that annoy OCLC?
Q: (sjklein) What happened to Wikicat?
Q: How do you plan on promoting it once you open it up?
Q: You will have solved the age old problem of where the ISBN number points to.
Q: (me) What do you need to succeed?
Posted by D. Weinberger at October 23, 2007 03:26 PM
Comments
Good meeting f2f, David. I'm pretty excited by Open Library, from an end-user perspective, anyway. The librarians at the meeting seemed a little leery.
Posted by: Josh Glenn | October 23, 2007 08:16 PM
awesome.
hoping they get connected with the Distributed Proofreaders project, which is another source of data and metadata and scans.
Posted by: Edward Vielmetti | October 24, 2007 01:45 AM
Awesome project. I've thought for a while that there needs to be an equivalent project for audio and video. Something that documents, collates and provides download links for every bit of audio and video that's ever been recorded. Both areas have content that is on media that is self destructing, that is out of print and which is falling out of copyright. It needs to be preserved and made available again.
Posted by: Julian Bond | October 24, 2007 02:47 AM
I won't argue either way about the closed-ness of OCLC's records as an aggregate set, but individually, libraries can share their records. See:
http://www.oclc.org/support/documentation/worldcat/records/guidelines/default.htm
Posted by: K.G. Schneider | October 24, 2007 09:17 PM
Oct. 3, 2007:
Me: Aaron, we should talk. OCLC wants to see how it can participate in OL.
Aaron: I'd love to do a call sometime; unfortunately I need to pack for a trip right now.
Posted by: eric hellman | October 25, 2007 04:15 PM