Joho the Blog | lil Archives - Joho the Blog

July 26, 2014

Why I have not been blogging much: it’s my book’s fault and more

My blogging has gone way down in frequency and probably in quality. I think there are two reasons.

First, I’ve been wrapped up in trying to plot a new book. I’ve known for about three years the set of things I want to write about, but I’ve had my usual difficult time figuring out what the book is actually about. For example, when I was planning Everything is Miscellaneous, I knew that I wanted to write about the importance of metadata, but it took a couple of years to figure out that it wasn’t a book about metadata, or a book about the virtue of messiness, or two dozen other attempts at a top line.

I’m going through the same process now. The process itself consists of me writing a summary of each chapter. Except they’re not summaries. They’re like the article version of each chapter and usually work out to about 2,000 words. That’s because a chapter is more like a path than a list, and I can’t tell what’s on the path until I walk it. Given that I work for a living, each complete iteration can take me 2-3 months. And then I realize that I have it all wrong.

I don’t feel comfortable going through this process in public. My investment of time into these book summaries is evidence of how seriously I take them, but my experience shows that nineteen times out of twenty, what I thought was a good idea is a very bad idea. It’s embarrassing. So, I don’t show these drafts even to the brilliant, warm and forgiving Berkman Book Club — a group of Berkfolk writing books — not only because it’s embarrassing but because I don’t want to inflict 10,000 words on them when I know the odds are that I’m going to do a thorough re-write starting tomorrow. The only people who see these drafts are my literary agents and friends David Miller and Lisa Adams, who are crucial critics in helping me to see what’s wrong and right in what I’ve done, and working out the next approach.

Anyway, I’ve been very focused for the past couple of months on figuring out this next book. I think I’m getting closer. But I always think that.

The second reason I haven’t been blogging much: I’ve been mildly depressed. No cause for alarm. It’s situational and it’s getting better. I’ve been looking for a new job because the Harvard Library Innovation Lab that I’ve co-directed, with the fabulous Kim Dulin, for almost five years has been given a new mission. I’m very proud of what we — mainly the amazing developers who are actually more like innovation fellows — have done, and I’m very sorry to leave. Facing unemployment hasn’t helped my mood. There have been some other stresses as well. So: somewhat depressed. And that makes it harder for me to post to my blog for some reason.

I thought you might want to know, not that anyone cares [Sniffles, idly kicks at a stone in the ground, waits for a hug].


May 13, 2014

Full-text searching Harvard Library: a hacky mashup

Harvard Library has 13M items in its collection. Harvard is digitizing many of them, but as of now you cannot do a full text search of them.

Google Books had 30M books digitized as of a year ago. You can do full-text searches of them.

So, I wrote a little app [Note: I’ve corrected this url.] that lets you search Google Books for text, and then matches up the results with books in Harvard Library. It’s a proof of concept, and I’m counting the concept as proved, or at least as promising. On the other hand, my API key for Google Books only allows 2,000 queries a day, so it’s not practical on the licensing front.

This project runs on top of LibraryCloud, an open source library metadata server created by the Harvard Library Innovation Lab that I co-direct (until Sept.). LibraryCloud provides an API to Harvard’s open library metadata and more. (We’re building a new, more scalable version now. It is, well, super-cool.)

But please note that this HOLLIS full-text search thingy is NOT a project done by our highly innovative and highly skilled developers. I did it, which means if you look at the code (github) you will have a good laugh. Also, this service will fail in dull and interesting ways. I am a horrible programmer. (But I enjoy it.)

Some details below the clickable screenshot…


[Image: googleHollis screen capture]

Click here to go to the app.

The Google Books results are on the left (only ten for now), and HOLLIS on the right.
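A minimal sketch of what the Google Books side of such a query might look like, assuming the public Google Books volumes endpoint and its `q`, `maxResults`, and `key` parameters. The function name `build_search_url` and the placeholder key are hypothetical, and the app's actual request may well differ.

```python
from urllib.parse import urlencode

# Public Google Books API endpoint for volume searches.
GOOGLE_BOOKS_VOLUMES = "https://www.googleapis.com/books/v1/volumes"


def build_search_url(text, api_key, max_results=10):
    """Build a search URL asking for at most `max_results` volumes.

    Hypothetical helper: the default of ten mirrors the "only ten for now"
    limit described above; it is not taken from the app's code.
    """
    params = {"q": text, "maxResults": max_results, "key": api_key}
    return GOOGLE_BOOKS_VOLUMES + "?" + urlencode(params)


if __name__ == "__main__":
    # "MY_KEY" is a placeholder, not a real API key.
    print(build_search_url("king lear", "MY_KEY"))
```

Each call to a URL like this counts against the daily quota, so anything beyond a proof of concept would probably want to cache results.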

If a Google result is yellow, there’s a match with a book in HOLLIS. Gray means no match. HOLLIS book titles are prefaced by a number that refers to the Google results number. Clicking on the Google results number (in the circle) hides or shows those works in the stack on the right; this is because some Google books match lots of items in HOLLIS. (Harvard has a lot of copies of King Lear, for example.)

There are two types of matches. If an item matched on a firm identifier (ISBN, OCLC, LCCN), then there’s a checkmark before the title in the HOLLIS stack, and there’s a “Stacklife” button in the Google list. Clicking on the Stacklife button displays the book in Harvard StackLife, a very cool (and prize-winning!) library browser created by our Lab. The StackLife stack colorizes items based on how much they’re used by the Harvard community. The thickness of the book indicates its page count and its length indicates its actual physical height.

If there’s no match on the identifiers, then the page looks for a keyword match on the title and an exact match on the author’s last name. This can result in multiple results, not all of which may be right. So, on the Google result there’s a “Feeling lucky” button that will take you to the first match’s entry in StackLife.
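The two-tier matching described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the app's code: the `Book` record, the helper names, the stop-word list, and the reading of "keyword match" as "every title keyword from the Google result appears in the library title" are all hypothetical.

```python
import re
from dataclasses import dataclass, field

ID_KEYS = ("isbn", "oclc", "lccn")  # the "firm" identifiers named above
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to"}  # assumed list


@dataclass
class Book:
    # Hypothetical record shape, for illustration only.
    title: str
    author_last: str
    ids: dict = field(default_factory=dict)  # e.g. {"isbn": "978-..."}


def normalize_id(value):
    """Strip hyphens and spaces so differently formatted IDs compare equal."""
    return re.sub(r"[^0-9A-Za-z]", "", value).upper()


def shares_identifier(a, b):
    """True if both books carry the same value for any firm identifier."""
    return any(
        a.ids.get(k) and b.ids.get(k)
        and normalize_id(a.ids[k]) == normalize_id(b.ids[k])
        for k in ID_KEYS
    )


def title_keywords(title):
    """Lowercased title words minus stop words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w for w in words if w not in STOP_WORDS}


def match(google_book, library_books):
    """Return ("firm" | "loose" | "none", list of matching library books)."""
    firm = [b for b in library_books if shares_identifier(google_book, b)]
    if firm:
        return "firm", firm
    keywords = title_keywords(google_book.title)
    loose = [
        b for b in library_books
        if keywords
        and b.author_last.lower() == google_book.author_last.lower()
        and keywords <= title_keywords(b.title)
    ]
    return ("loose", loose) if loose else ("none", [])
```

In this sketch a "Feeling lucky" button would simply take the first item of a loose result, which is why it may land on the wrong edition.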

The “Google” button takes you to that item’s page at Google Books, filtered by your search terms for your full-texting convenience.

The “View” button pops up the Google Books viewer for that book, if it’s available.

The “Clear stack” button deselects all the items in the Google results, hiding all the items in the HOLLIS stack.

Let me know how this breaks or sucks, but don’t expect it ever to be a robust piece of software. Remember its source.


August 4, 2011

LibraryLab funded project librapalooza

The Harvard Library Lab, which issues grants for library innovation at the University, is holding a forum in which all the projects get 5 mins to introduce themselves. (The names prefacing these blurbs are of the presenters, who are not always the project leads or developers.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Sebastian Diaz: Slideshow generator. Makes it easy to create slideshows out of images from image repositories. It initially is using the VIA repository. You can search by keyword, select the slides, set the delay between slides, and publish it. It’s intended for classroom use, or, of course, for anyone.

Sebastian Diaz: Enhanced Social Tagging for Classification and Current Awareness. It’s currently under development. (The code is at Github.) It enables the merging of tag sets that use different vocabularies without having to define a dictionary ahead of time. The tool produces a filter, “and you aggregate based on that filter,” renaming tags (or associating them?) based on the filter. People can make their own aggregated feed out of these multiple tag sets. It’s a form of behavior-driven development.

Sebastian Diaz: Deposit@Harvard. This tool eases the process of adding open access material to open access repositories, including Harvard DASH. This is an issue because not all repositories have the same APIs or metadata definitions.

Abigail Bourdeaux: The Copyright and Fair Use Tool: An interactive workflow tool for those trying to determine the copyright status, and fair use status, of materials, particularly for use in the classroom. (Coding has not yet begun.)

Abigail Bourdeaux: Online Digital Atlas Viewer. This is a viewer designed specifically for viewing historical atlases online. These atlases may have overlap from page to page, may switch scales, etc. ODAV will help to reconcile maps through OpenLayers, to overlap and scale them seamlessly. (Coding has not yet begun.)

Marc McGee and Dave Siegel: Enhanced Catalog Searching with Geospatial Technology. They’re working on ways to spatially search information in the Harvard Library system. They’re using PRESTO Web Services tools. They’ve taken 1,700 MARC records and sent them to Metacarta, a geocoding company. Metacarta assigns lat/long to words it’s extracted from text. They then put markers on a map to show docs relevant to those places.

Bobbi Fox: Library Application Collaboration Development Tools and Resources: How we can better coordinate library innovation at Harvard. They’ve reactivated the ABCD Library discussion group, which has been a “roaring success.” They’ve also been talking with groups all across the library system about what would help. They’re also coordinating with the new University CTO. From the small group discussions they’ve confirmed that everyone wants simple and convenient ways to keep up with the various projects, but we tend to disagree about what “simple and convenient” means :) Also, it’s clear we need to work to get over the cultural barriers against sharing what we’re doing. Most people are not all that excited about centrally provided services such as bug tracking or source code management.

Justin Daost, Chris Erdmann: Wolbach User Experience Lab. The Center for Astrophysics got a Microsoft Surface, which interacts with objects near its surface via infrared cameras. They’ve been working with Microsoft Research to see how it could be used in the Library. Microsoft also connected them with Andy van Dam at Brown U. where they’re working on the Garibaldi Project, a way of browsing a set of related content. They’ve been working on the LADS project that lets people scroll through a timeline, zoom in on high res images (without using much memory), click on hotspots that display related metadata, etc. They are using this to give access to special collections. Also, they created an interface to enable librarians to update it easily.

Andy Wilson: QR Codes in the Library: This project would put QR codes in the stacks that would load onto a mobile device research guides relevant to that area of the stacks. They will spend the fall semester gathering more usage data before going to full implementation; they want to make sure people will actually use it.

Skip Kendall and Andrea Goethals: Zone 1 Rescue Repository: 1. Working with faculty members to look at their own personal archive (personal papers, etc.), and to think about policy recommendations. 2. The Rescue Repository is a place to put content the final destination of which is not yet known; it’s a type of staging area, for use by anyone at Harvard, with very low barriers to getting content in. People can nominate content for long-term preservation. Content can be exported into other repositories. It will be open source software. (MIT is collaborating on this project.)

Carli Spina and Kim Dulin: Library Analytics Toolkit: An open source, highly configurable dashboard for viewing library statistics. It will be configurable for individuals, departments, entire libraries, etc. By having it in similar formats, libraries will be able to compare their data. It will be widget-based and extensible, drawing data from standard data collectors, and will be built on existing dashboards (e.g. NCSU, Brown U., and the Watson Library at the Met). It is at the wireframe stage.

Cheryl McGrath: Interactive Carrel Seating App: Currently getting a carrel requires a bunch of paperwork and staff time. People have a wide variety of requests: Near a bathroom, in sunlight, no glare at sunset, are there crumbs in it, etc. This open source app lets users browse and search, and reserve the carrel. Carrel users can also post msgs to one another. The team thinks this app may save 5 weeks of labor for a staff member per year.

Library Innovation Podcasts: That’s my project: http://librarylab.law.harvard.edu/blog/category/podcast/

Chip Goines: DRS Access for Mobile Devices: Creating an API to enable mobile devices to locate items in a “page-turned digital research” object, returning info about that particular page. [pdf]

Kimberly Hall: The Connected Scholar: “Building ideas and exploring sources within an online culture of attribution.” It lets researchers track what they’re looking at/copying/jotting down, and enables collaboration in the management of information resources. This should help scholars see where their ideas are coming from, to better understand their creative process. It should also help students develop the habit of attributing sources. Students will be able to see their research process through the tool.

Reinhard Engels: Highbrow: A textual annotation browser that displays the density of references to a text. E.g., you can plot the Biblical references in Aquinas, St. Augustine, Martin Luther, and Maimonides. (Augustine is more interested in Psalms than Aquinas was, and no one is interested in Mark.) You can zoom in on the line chart until you get to the actual text. The source text preferably should have a clear coordinate system (e.g., chapter and verse, or numbered lines of poetry). In working with Dante references, Reinhard has hit scaling issues: one set of commentators has almost 300,000 annotations. So, he slices them by century, or by various other facets. Or you can browse by line and see how many annotations there are, and what they are. He’s now working on interactive annotations, enabling students and researchers to enter annotations.
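To make the idea of "density of references" with slicing by century concrete, here is a minimal sketch. It is not Highbrow's code: the tuple layout `(commentator, year, book, chapter)`, the function names, and the sample rows are all invented for illustration.

```python
from collections import Counter


def century_of(year):
    """Map a year to its century number, e.g. 1265 -> 13, 1520 -> 16."""
    return (year - 1) // 100 + 1


def density(annotations, century=None):
    """Count annotations per referenced book, optionally for one century.

    Each annotation is a hypothetical (commentator, year, book, chapter)
    tuple; a real tool would count per chapter, verse, or line as well.
    """
    counts = Counter()
    for _commentator, year, book, _chapter in annotations:
        if century is None or century_of(year) == century:
            counts[book] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample rows, not real annotation data.
    sample = [
        ("A", 1265, "Psalms", 23),
        ("B", 1270, "Psalms", 51),
        ("C", 1520, "Romans", 3),
    ]
    print(density(sample))              # counts across all centuries
    print(density(sample, century=13))  # only 13th-century annotations
```

Slicing before counting keeps each view small, which is presumably the point when one set of commentators alone has hundreds of thousands of annotations.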

Tom Dawson: Yana: “an open source template for scholarly journals to develop mobile apps.” (“Yana” is Sanskrit for “vehicle.”) “The goal of the Yana project is to provide a light-weight, modular, open source template within which open access publishers can develop their own mobile applications.” The aim is to make it easier for journals to do open access publishing on mobiles.

I talked about LibraryCloud, and Matt Phillips did a demo. LibraryCloud is an open library metadata server. It’s coming along well.

James Burns, Jesse Shapins: extraMUROS. The aim is to provide a multimedia library without walls. It will bring together collections from all over and let users browse and search, curate in their own fashion, and be able to publish collections. James and Jesse show an early build of their browser that lets you quickly scan multiple collections. (Very cool.) You can drag objects into a scratch space — either collections or individual items. It can look at the items you’re choosing in order to refine your search. There’s a map view that is also very cool. It even has a 3D view (No, no glasses required :) And a timeline view.

Q: Will you fund non-tech-heavy proposals?
A: Yes!

Q: Could these be sources of revenue for the Library?
A: Nope. It’s open source for the greater good of libraries.


July 21, 2011

Why I’ve been quiet

There’s been just so much to do. I’ve been on double deadlines (which, btw, is the direct opposite of double rainbows), while the Library Innovation Lab project for the DPLA beta sprint has been roaring forward. But, as of two minutes ago, I have reached a moment when I can breathe…for a minute.

I turned in the final copy-edited version of Too Big to Know a few minutes ago. The copy editor, Christine Arden, was a dream, finding errors and infelicities at every level of the book. Plus, she occasionally put in a note about something she liked; that matters a lot to me. Anyway, it was due in today and I hit the send button at 5:10.

So, sure, yay and congratulations. But from here on in, the book only gets worse. Let me put it like this: It sure isn’t gonna get any better. It’s a relief to be done, of course, but it is anxiety-making to watch the world change as the book stays the same.

I also was on deadline to submit a Scientific American article, which I did on Monday. I’m excited to have something considered by them. (They can always say no, even though it was their idea, and I’ve been working with a really good editor there.)

As for the Library Innovation Lab, we are doing this amazing project for DPLA that is coming together. There are some gigantic, chewy issues we’ve had to work through, which we have been working with some fantastic people on. If we get this even close to right — and I’m confident we will — it will make some very hard problems look so easy that they’re invisible. It’s going to be cool. I am learning so much watching my colleagues work through these issues at a level I can barely hang on to. And then there are all the fascinating problems of building an app that makes people think it’s easy to navigate through tens of millions of works.

It’s been a busy summer. And despite sending off the two large writing projects that have occupied me for a while, I don’t anticipate it getting any less busy.


May 11, 2011

James Bridle – first Library Innovation Lab podcast

James Bridle is the interviewee in the first in a series of podcasts I’m doing for the Harvard Library Innovation Lab.

I met James at a conference in Israel a few weeks ago, and had the great pleasure of getting to hang out with him. He’s a British book-lover and provocateur, who expresses his deep insights through his wicked sense of humor.

Thanks to Daniel Dennis “Magnificent” Jones [twitter:blanket] for producing the series, doing the intros, choosing the music, writing the page…
