[2b2k][everythingismisc] “Big data for books”: Harvard puts metadata for 12M library items into the public domain
(Here’s a version of the text of a submission I just made to BoingBoing through their “Submitterator.”)
Harvard University has today put into the public domain (CC0) full bibliographic information about virtually all the 12M works in its 73 libraries. This is (I believe) the largest and most comprehensive such contribution. The metadata, in the standard MARC21 format, is available for bulk download from Harvard. The University also provided the data to the Digital Public Library of America’s prototype platform for programmatic access via an API. The aim is to make rich data about this cultural heritage openly available to the Web ecosystem so that developers can innovate, and so that other sites can draw upon it.
This is part of Harvard’s new Open Metadata policy which is VERY COOL.
Speaking for myself (see disclosures), I think this is a big deal. Library metadata has been jammed up by licenses and fear. This not only makes a very high percentage of the most consulted library items accessible; I hope it will also help open the floodgates.
(Disclosures: 1. I work in the Harvard Library and have been a very minor player in this process. The credit goes to the Harvard Library’s leaders and the Office of Scholarly Communication, who made this happen. Also: Robin Wendler. (Next day:) Also, John Palfrey, who initiated this entire thing. 2. I am the interim head of the DPLA prototype platform development team. So, yeah, I’m conflicted out the wazoo on this. But my wazoo and all the rest of me are very, very happy today.)
Finally, note that Harvard asks that you respect community norms, including attributing the source of the metadata as appropriate. This holds as well for the data that comes from the OCLC, which is a valuable part of this collection.