1,000 downloads - Joho the Blog


I learned yesterday from Robin Wendler (who worked mightily on the project) that Harvard’s library catalog dataset of 12.3M records has been bulk downloaded a thousand times, excluding the Web spiderings. That seems like an awful lot to me, and makes me happy.

The library catalog dataset comprises bibliographic records of almost all of Harvard Library’s gigantic collection. It’s available for bulk download under a CC0 public domain dedication, and can be accessed through an API via the DPLA’s prototype platform. More info here.

One Response to “1,000 downloads”

  1. The phrase “excluding the Web spiderings” may really mean “excluding a list of known Web spiders (but not some other, unknown Web spiders).” Not all Web spiders declare themselves or are otherwise easy to identify, which can be a problem with download statistics.
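
To make the commenter's point concrete, here is a minimal sketch of how a download count might exclude known spiders by user-agent string. Everything in it is an assumption for illustration: the marker list, the function names, and the sample log are hypothetical, and nothing here describes how Harvard actually counted its downloads.

```python
# Hypothetical substrings that mark self-declared spiders (an assumed list,
# not the one used for the Harvard statistics).
KNOWN_SPIDER_MARKERS = ("googlebot", "bingbot", "slurp")


def is_known_spider(user_agent: str) -> bool:
    """Return True if the user-agent string contains a known spider marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_SPIDER_MARKERS)


def count_downloads(user_agents) -> int:
    """Count requests whose user agent is not on the known-spider list."""
    return sum(1 for ua in user_agents if not is_known_spider(ua))


# Made-up sample log: one declared spider, two browser-like agents, one script.
sample_log = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",  # a person's browser
    "Mozilla/5.0 (X11; Linux x86_64)",            # could be an undeclared spider
    "curl/8.0.1",                                 # a script; person or spider?
]

print(count_downloads(sample_log))
```

Only the request that names itself as a spider is dropped. A spider that sends a browser-like user agent is indistinguishable from a person in this scheme, so the count can only be read as "downloads excluding known spiders."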
