Joho the Blog » [2b2k][everythingismisc] “Big data for books”: Harvard puts metadata for 12M library items into the public domain

(Here’s a version of the text of a submission I just made to BoingBoing through their “Submitterator.”)

Harvard University has today put into the public domain (CC0) full bibliographic information about virtually all the 12M works in its 73 libraries. This is (I believe) the largest and most comprehensive such contribution. The metadata, in the standard MARC21 format, is available for bulk download from Harvard. The University also provided the data to the Digital Public Library of America’s prototype platform for programmatic access via an API. The aim is to make rich data about this cultural heritage openly available to the Web ecosystem so that developers can innovate, and so that other sites can draw upon it.
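For the curious, here is a rough sketch of what reading records in that format can look like. It is only an illustration of the general MARC21 binary layout (a 24-byte leader, a directory of 12-byte entries, then the field data), not Harvard's or the DPLA's code. The function names and the sample record are made up for the example, and real-world files have wrinkles (character encodings, malformed records) that this ignores.

```python
# Minimal sketch of parsing binary MARC21 (ISO 2709) records with only the
# standard library. All names and the sample record below are hypothetical.

FIELD_END = b"\x1e"   # terminates the directory and every field
RECORD_END = b"\x1d"  # terminates a record
SUBFIELD = b"\x1f"    # precedes each subfield code


def parse_record(raw):
    """Return {tag: [value, ...]} for one binary MARC21 record."""
    leader = raw[:24]
    base = int(leader[12:17].decode("ascii"))  # offset where field data starts
    directory = raw[24:base - 1]               # skip the directory terminator
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop terminator
        if tag.startswith("00"):
            # Control fields (001-009) hold plain text, no subfields.
            value = data.decode("utf-8")
        else:
            # Data fields: two indicator bytes, then coded subfields.
            parts = data[2:].split(SUBFIELD)[1:]
            value = [(p[:1].decode("ascii"), p[1:].decode("utf-8")) for p in parts]
        fields.setdefault(tag, []).append(value)
    return fields


def iter_records(blob):
    """Yield parsed records from a byte string of concatenated records."""
    pos = 0
    while pos < len(blob):
        length = int(blob[pos:pos + 5].decode("ascii"))  # leader bytes 0-4
        yield parse_record(blob[pos:pos + length])
        pos += length


def build_record(fields):
    """Assemble a tiny record from (tag, bytes) pairs, for demonstration only."""
    directory, body = b"", b""
    for tag, data in fields:
        payload = data + FIELD_END
        directory += f"{tag}{len(payload):04d}{len(body):05d}".encode("ascii")
        body += payload
    base = 24 + len(directory) + 1
    total = base + len(body) + 1
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_END + body + RECORD_END


# A made-up record: a control number (001) and a title (245, subfield a).
sample = build_record([
    ("001", b"demo0001"),
    ("245", b"10" + SUBFIELD + b"aAn example title"),
])

for record in iter_records(sample + sample):
    print(record["001"][0], record["245"][0])
```

In practice you would probably reach for an existing MARC library rather than hand-rolling this, but the sketch shows why the format is workable for bulk processing: each record announces its own length, so a large file can be walked one record at a time.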

This is part of Harvard’s new Open Metadata policy, which is VERY COOL.

Speaking for myself (see disclosures), I think this is a big deal. Library metadata has been jammed up by licenses and fear. Not only does this make accessible the metadata for a very high percentage of the most consulted library items, but I hope it will also help open the floodgates.

(Disclosures: 1. I work in the Harvard Library and have been a very minor player in this process. The credit goes to the Harvard Library’s leaders and the Office of Scholarly Communication, who made this happen. Also: Robin Wendler. (Next day:) Also, John Palfrey, who initiated this entire thing. 2. I am the interim head of the DPLA prototype platform development team. So, yeah, I’m conflicted out the wazoo on this. But my wazoo and all the rest of me are very, very happy today.)

Finally, note that Harvard asks that you respect community norms, including attributing the source of the metadata as appropriate. This holds as well for the data that comes from the OCLC, which is a valuable part of this collection.

16 Responses to “[2b2k][everythingismisc] “Big data for books”: Harvard puts metadata for 12M library items into the public domain”

  1. That would be Robin Wendler, Metadata Analyst for the Harvard Library: http://sylvia.harvard.edu/~robin/wendler.html

  2. Ack! Of course it is, Roy. I’ve fixed it. Thanks! (And sorry, Robin! My only two excuses: I wrote it early on a plane. And I’m an idiot. I do know your name!)

  3. Thanks, Roy, but now I have to update that page, which was last touched in 2007!

  4. Mr. David Weinberger: There is a rumor going around the internet that you have cloven hooves instead of feet. Could you confirm or deny that you have cloven hooves instead of feet?

  5. […] the rest here:  Joho the Blog » [2b2k][everythingismisc]“Big data for books … « #10: Prague Winter: A Personal Story of Remembrance and War, 1937-1948 Highlights Of LA […]

  6. […] boingboing via Hyperorg; photo via […]

  7. […] Weinberger writes, “Harvard University has today put into the public domain (CC0) full bibliographic […]

  8. […] have implications for the organizations which sell library metadata. Joho the Blog reports, “‘Big Data for Books’: Harvard Puts Metadata for 12M Library Items into the Public Domain.” We learn from the write up: “Harvard University has today put into the public domain […]

  9. […] [2b2k][everythingismisc]“Big data for books”: Harvard puts metadata for 12M library items into t… (Here’s a version of the text of a submission I just made to BoingBong through their “Submitterator”). Harvard University has today put into the public domain (CC0) full bibliographic information about virtually all the 12M works in its 73 libraries. This is (I believe) the largest and most comprehensive such contribution. MORE >> […]

  10. […] Learn more about this project […]

  11. […] The library catalog dataset comprises bibliographic records of almost all of Harvard Library’s gigantic collection. It’s available under a CC 0 public domain license for bulk download, and can be accessed through an API via the DPLA’s prototype platform. More info here. […]

  15. […] [2b2k][everythingismisc]“Big data for books”: Harvard puts metadata for 12M library items into t…DAVID WEINBERGER |  TUESDAY, APRIL 24, 2012 […]

  16. […] release (“big data for books,” as David Weinberger calls it) is, to put it mildly, a Very Big Deal. Harvard’s collections are both deep and broad, […]
