Joho the Blogeverythingismisc Archives - Joho the Blog

August 8, 2017

Messy meaning

Steve Thomas [twitter: @stevelibrarian] of the Circulating Ideas podcast interviews me about the messiness of meaning, library innovation, and educating against fake news.

You can listen to it here.

Be the first to comment »

March 8, 2016

Making library miscellaneousness awesome

Sitterwerk Art Library in St. Gallen, Switzerland, has 25,000 items on its shelves in no particular order. This video explains why that is a brilliant approach. And then the story just gets better and better.

Werkbank from Astrom / Zimmer on Vimeo.

That the shelves have no persistent order doesn’t mean they have no order. Rather, works are reshelved by users in the clusters the users have created for their research. All the items have RFID tags in them, and the shelves are automatically scanned so that the library can always tell users where items are located.

As a result, if you look up a particular item, you will see it surrounded by works that some other user thought were related to it in some way. This creates a richer browsing experience because it is shaped and reshaped by how its community of users sees the items’ inter-relationships.

The library has now installed Werkbank, which is a plain old table where you can spread out a pile of books and do your research. But, unlike truly plain old tables, this one combines RFID sensors and cameras with recognition software so it knows which works you’ve put on the table and how you’ve organized them. Werkbench notes those associations, and stores them, creating a rich network of related works.

It also lets the individual save a research set, and even compile a booklet documenting those items, with notes. It can be printed on the spot and taken home … or put into the shelves as a user-generated lib guide.

This is awesome.

Here’s a bit more about it:

…the new table sports a grid of 12 an­ten­nas. It also has two cam­eras at­tached: one for scan­ning the tab­letop and through cus­tom image re­cog­ni­tion soft­ware de­term­ine the exact po­s­i­tion and ro­ta­tion of books; one for mak­ing high-res­ol­u­tion scans of pages, notes or ob­jects not yet in the Sit­ter­werk cata­logue. Just like be­fore, the new server and its in­ter­face provides a real-time di­gital ren­der­ing of the table and its con­tents, but in two di­men­sions in­stead of one. It also lets you at­tach scans, pho­tos and texts to in­di­vidual ob­jects, and to the vir­tual table it­self. Once you save your col­lec­tion, it merges with a grow­ing net­work of other col­lec­tions, books, ma­ter­i­als, thoughts and people

Anthon Astrom tells me that the project currently runs against an internal API, and they are planning to create a public API at some point. That way, the world can benefit from what Sitterwerk’s users are teaching it.



At the Harvard Library Innovation Lab, we wanted to do something that touches on some elements of this. With 73 libraries and 13 million items in Harvard Library it never even crossed our minds to install continuous RFID scanners in the stacks. So, our StackLife project and the LibraryCloud platform underneath it wanted simply to record which books were checked out with others, on the grounds that those clusters often have meaning. But, Harvard cyber-security researchers warned that this could be used to identify who took the books out. We thought about ways of smudging the data, and about making it opt-in, but it was not a fight we could win at that point. Werkbank might have the same issues when recording clusters but because it’s an art library, there may be less concern about the government demanding to know who was researching The Scream, Delacroix’s Liberty Leading the People, and Guernica because that person is clearly up to no good.

In any case the Sitterwerk library and Werkbank have far exceeded our imagination. More than that: it’s real. Awesomely real.


August 1, 2015

Restoring the Network of Bloggers

It’s good to have Hoder — Hossein Derakhshan— back. After spending six years in an Iranian jail, his voice is stronger than ever. The changes he sees in the Web he loves are distressingly real.

Hoder was in the cohort of early bloggers who believed that blogs were how people were going to find their voices and themselves on the Web. (I tried to capture some of that feeling in a post a year and a half ago.) Instead, in his great piece in Medium he describes what the Web looks like to someone extremely off-line for six years: endless streams of commercial content.

Some of the decline of blogging was inevitable. This was made apparent by Clay Shirky’s seminal post that showed that the scaling of blogs was causing them to follow a power law distribution: a small head followed by a very long tail.

Blogs could never do what I, and others, hoped they would. When the Web started to become a thing, it was generally assumed that everyone would have a home page that would be their virtual presence on the Internet. But home pages were hard to create back then: you had to know HTML, you had to find a host, you had to be so comfortable with FTP that you’d use it as a verb. Blogs, on the other hand, were incredibly easy. You went to one of the blogging platforms, got yourself a free blog site, and typed into a box. In fact, blogging was so easy that you were expected to do it every day.

And there’s the rub. The early blogging enthusiasts were people who had the time, skill, and desire to write every day. For most people, that hurdle is higher than learning how to FTP. So, blogging did not become everyone’s virtual presence on the Web. Facebook did. Facebook isn’t for writers. Facebook is for people who have friends. That was a better idea.

But bloggers still exist. Some of the early cohort have stopped, or blog infrequently, or have moved to other platforms. Many blogs now exist as part of broader sites. The term itself is frequently applied to professionals writing what we used to call “columns,” which is a shame since part of the importance of blogging was that it was a way for amateurs to have a voice.

That last value is worth preserving. It’d be good to boost the presence of local, individual, independent bloggers.

So, support your local independent blogger! Read what she writes! Link to it! Blog in response to it!

But, I wonder if a little social tech might also help. . What follows is a half-baked idea. I think of it as BOAB: Blogger of a Blogger.

Yeah, it’s a dumb name, and I’m not seriously proposing it. It’s an homage to Libby Miller [twitter:LibbyMiller] and Dan Brickley‘s [twitter:danbri ] FOAF — Friend of a Friend — idea, which was both brilliant and well-named. While social networking sites like Facebook maintain a centralized, closed network of people, FOAF enables open, decentralized social networks to emerge. Anyone who wants to participate creates a FOAF file and hosts it on her site. Your FOAF file lists who you consider to be in your social network — your friends, family, colleagues, acquaintances, etc. It can also contain other information, such as your interests. Because FOAF files are typically open, they can be read by any application that wants to provide social networking services. For example, an app could see that Libby ‘s FOAF file lists Dan as a friend, and that Dan’s lists Libby, Carla and Pete. And now we’re off and running in building a social network in which each person owns her own information in a literal and straightforward sense. (I know I haven’t done justice to FOAF, but I hope I haven’t been inaccurate in describing it.)

BOAB would do the same, except it would declare which bloggers I read and recommend, just as the old “blogrolls” did. This would make it easier for blogging aggregators to gather and present networks of bloggers. Add in some tags and now we can browse networks based on topics.

In the modern age, we’d probably want to embed BOAB information in the HTML of a blog rather than in a separate file hidden from human view, although I don’t know what the best practice would be. Maybe both. Anyway, I presume that the information embedded in HTML would be similar to what does: information about what a page talks about is inserted into the HTML tags using a specified vocabulary. The great advantage of is that the major search engines recognize and understand its markup, which means the search engines would be in a position to constructdiscover the initial blog networks.

In fact, has a blog specification already. I don’t see anything like markup for a blogroll, but I’m not very good a reading specifications. In any case, how hard could it be to extend that specification? Mark a link as being to a blogroll pal, and optionally supply some topics? (Dan Brickley works on

So, imagine a BOAB widget that any blogger can easily populate with links to her favorite blog sites. The widget can then be easily inserted into her blog. Hidden from the users in this widget is the appropriate markup. Not only could the search engines then see the blogger network, so could anyone who wanted to write an app or a service.

I have 0.02 confidence that I’m getting the tech right here. But enhancing blogrolls so that they are programmatically accessible seems to me to be a good idea. So good that I have 0.98 confidence that it’s already been done, probably 10+ years ago, and probably by Dave Winer :)

Ironically, I cannot find Hoder’s personal site; is down, at least at the moment.

More shamefully than ironically, I haven’t updated this blog’s blogroll in many years.

My recent piece in The Atlantic about whether the Web has been irremediably paved touches on some of the same issues as Hoder’s piece.


June 1, 2015

[misc][liveblog] Alex Wright: The secret history of hypertext

I’m in Oslo for Kunnskapsorganisasjonsdagene, which my dear friend Google Translate tells me is Knowledge Organization Days. I have been in Oslo a few times before — yes, once in winter, which was as cold as Boston but far more usable — and am always re-delighted by it.

Alex Wright is keynoting this morning. The last time I saw him was … in Oslo. So apparently Fate has chosen this city as our Kismet. Also coincidence. Nevertheless, I always enjoy talking with Alex, as we did last night, because he is always thinking about, and doing, interesting things. He’s currently at Etsy , which is a fascinating and inspiring place to work, and is a professor interaction design,. He continues to think about the possibilities for design and organization that led him to write about Paul Otlet who created what Alex has called an “analog search engine”: a catalog of facts expressed in millions of index cards.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Alex begins by telling us that he began as a librarian, working as a cataloguer for six years. He has a library degree. As he works in the Net, he finds himself always drawn back to libraries. The Net’s fascination with the new brings technologists to look into the future rather than to history. Alex asks, “How do we understand the evolution of the Web and the Net in an historical context?” We tend to think of the history of the Net in terms of computer science. But that’s only part of the story.

A big part of the story takes us into the history of libraries, especially in Europe. He begins his history of hypertext with the 16th century Swiss naturalist Conrad Gessner who created a “universal bibliography” by writing each entry on a slip of paper. Leibniz used the same technique, writing notes on slips of paper and putting them in an index cabinet he had built to order.

In the 18th century, the French started using playing cards to record information. At the beginning of the 19th, the Jacquard loom used cards to guide weaving patterns, inspiring Charles Babbage to create what many [but not me] consider to be the first computer.

In 1836, Isaac Adams created the steam powered printing press. This, along with economic and social changes, enabled the mass production of books, newspapers, and magazines. “This is when the information explosion truly started.”

To make sense of this, cataloging systems were invented. They were viewed as regimented systems that could bring efficiencies … a very industrial concept, Alex says.

“The mid-19th century was also a period of networking”: telegraph systems, telephones, internationally integrated postal systems. “Goods, people, and ideas were flowing across national borders in a way they never had before.” International journals. International political movements, such as Marxism. International congresses (conferences). People were optimistic about new political structures emerging.

Alex lists tech from the time that spread information: a daily reading of the news over copper wires, pneumatic tubes under cities (he references Molly Wright Steenson‘s great work on this), etc.

Alex now tells us about Paul Otlet, a Belgian who at the age of 15 started designing his own cataloging system. He and a partner, Henri La Fontaine, started creating bibliographies of disciplines, starting with the law. Then they began a project to create a universal bibliography.

Otlet thought libraries were focused on the wrong problem. Getting readers to the right book isn’t enough. People also need access to the information in the books. At the 1900 [?] world’s fair in Paris, Otlet and La Fontaine demonstrated their new system. They wanted to provide a universal language for expressing the connections among topics. It was not a top-down system like Dewey’s.

Within a few years, with a small staff (mainly women) they had 15 million cards in their catalog. You could buy a copy of the catalog. You could send a query by telegraphy, and get a response telegraphed back to you, for a fee.

Otlet saw this in a bigger context. He and La Fontaine created the Union of International Associations, an association of associations, as the governing body for the universal classification system. The various associations would be responsible for their discpline’s information.

Otlet met a Scotsman named Patrick Geddes who worked against specialization and the fracturing of academic disciplines. He created a camera obscura in Edinburgh so that people could see all of the city, from the royal areas and the slums, all at once. He wanted to stitch all this information together in a way that would have a social effect. [I’ve been there as a tourist and had no idea!] He also used visual forms to show the connections between topics.

Geddes created a museum, the Palais Mondial, that was organized like hypertext., bringing together topics in visually rich, engaging displays. The displays are forerunners of today’s tablet-based displays.

Another collaborator, Hendrik Christian Andersen, wanted to create a world city. He went deep into designing it. He and Otlet looked into getting land in Belgium for this. World War I put a crimp in the idea of the world joining in peace. Otlet and Andersen were early supporters of the idea of a League of Nations.

After the War, Otlet became a progressive activist, including for women’s rights. As his real world projects lost momentum, in the 1930s he turned inward, thinking about the future. How could the new technologies of radio, television, telephone, etc., come together? (Alex shows a minute from the documentary, The Man who wanted to Classify the World.”) Otlet imagines a screen and television instead of books. All the books and info are in a separate facility, feeding the screen. “The radiated library and the televised book.” 1934.

So, why has no one ever heard of Otlet? In part because he worked in Belgium in the 1930s. In the 1940s, the Nazis destroyed his work. They replaced his building, destrooying 70 tons of materials, with an exhibit of Nazi art.

Although there are similarities to the Web, how Otlet’s system worked was very different. His system was a much more controlled environment, with a classification system, subject experts, etc. … much more a publishing system than a bottom-up system. Linked Data and the Semantic Web are very Otlet-ish ideas. RDF triples and Otlet’s “auxiliary tables” are very similar.

Alex now talks about post-Otlet hypertext pioneers.

H.G. Wells’ “World Brain” essay from 1938. “The whole human memory can be, and probably in a shoirt time will be, made accessibo every individual.” He foresaw a complete and freely avaiable encyclopedia. He and Otlet met at a conference.

Emanuel Goldberg wanted to encode punchcard-style information on microfilm for rapid searching.

Then there’s Vannevar Bush‘s Memex that would let users create public trails between documents.

And Liklider‘s idea that different types of computers should be able to share infromation. And Engelbart who in 1968’s “Mother of all Demos” had a functioning hypertext system.

Ted Nelson thought computer scientists were focused on data computation rather than seeing computers as tools of connection. He invnted the term “hypertext,” the Xanadu web, and “transclusion” (embedding a doc in another doc). Nelson thought that links always should be two way. Xanadu= “intellectual property” controls built into it.

The Internet is very flat, with no central point of control. It’s self-organizing. Private corporations are much bigger on the Net than Otlet, Engelbart, and Nelson envisioned “Our access to information is very mediated.” We don’t see the classification system. But at sites like Facebook you see transclusion, two-way linking, identity management — needs that Otlet and others identified. The Semantic Web takes an Otlet-like approach to classification, albeit perhaps by algorithms rather than experts. Likewise, the Google “knowledge vaults” project tries to raise the ranking of results that come from expert sources.

It’s good to look back at ideas that were left by the wayside, he concludes, having just decisively demonstrated the truth of that conclusion :)

Q: Henry James?

A: James had something of a crush on Anderson, but when he saw the plan for the World City told him that it was a crazy idea.

[Wonderful talk. Read his book.]


March 21, 2014

Reading Emily Dickinson’s metadata

There’s a terrific article by Helen Vendler in the March 24, 2014 New Republic about what can learn about Emily Dickinson by exploring her handwritten drafts. Helen is a Dickinson scholar of serious repute, and she finds revelatory significance in the words that were crossed out, replaced, or listed as alternatives, in the physical arrangement of the words on the page, etc. For example, Prof. Vendler points to the change of the line in “The Spirit” : “What customs hath the Air?” became “What function hath the Air?” She says that this change points to a more “abstract, unrevealing, even algebraic” understanding of “the future habitation of the spirit.”

Prof. Vendler’s source for many of the poems she points to is Emily Dickinson: The Gorgeous Nothings, by Marta Werner and Jen Bervin, the book she is reviewing. But she also points to the new online Dickinson collection from Amherst and Harvard. (The site was developed by the Berkman Center’s Geek Cave.)

Unfortunately, the New Republic article is not available online. I very much hope that it will be since it provides such a useful way of reading the materials in the online Dickinson collection which are themselves available under a CreativeCommons license that enables
non-commercial use without asking permission.

Be the first to comment »

December 14, 2013

Are tags over-rated?

Jeff Atwood [twitter:codinghorror] , a founder of Stackoverflow and — two of my favorite sites — is on a tear about tags. Here are his two tweets that started the discussion:

I am deeply ambivalent about tags as a panacea based on my experience with them at Stack Overflow/Exchange. Example:

Here’s a detweetified version of the four-part tweet I posted in reply:

Jeff’s right that tags are not a panacea, but who said they were? They’re a tool (frequently most useful when combined with an old-fashioned taxonomy), and if a tool’s not doing the job, then drop it. Or, better, fix it. Because tags are an abstract idea that exists only in particular implementations.

After all, one could with some plausibility claim that online discussions are the most overrated concept in the social media world. But still they have value. That indicates an opportunity to build a better discussion service. … which is exactly what Jeff did by building

Finally, I do think it’s important — even while trying to put tags into a less over-heated perspective [do perspectives overheat??] — to remember that when first introduced in the early 2000s, tags represented an important break with an old and long tradition that used the authority to classify as a form of power. Even if tagging isn’t always useful and isn’t as widely applicable as some of us thought it would be, tagging has done the important work of telling us that we as individuals and as a loose collective now have a share of that power in our hands. That’s no small thing.


July 6, 2013

[misc][2b2k] Why ontologies make me nervous

A few days ago there was a Twitter back and forth between two people I deeply respect: Dan Brickley [twitter:danbri] and Ed Summers [twitter:edsu]. It started with Ed responding to a tweet about a brief podcast I did with Kevin Ford [twitter:3windmills], who is on the team working on BibFrame:

After a couple of tweets, Dan tweeted the following:

There followed some agreement that it's often helpful to have apps driving the development of standards. (Kevin agrees with this, and points to BibFrame's process.) But, Dan's comment clarified my understanding of why ontologies make me nervous.

Over the past hundred years or so, we've come to a general recognition that all classifications and categorizations are tools, not representations of The Real Order. The periodic table of the elements is a useful way of organizing information, and manifests real relationships among the elements, but it is not the single "real" way the elements are arranged; if you're an economist or an industrialist, a chart that arranges the elements based on where they exist on our planet might be just as valid. Likewise, Linneaus' classification scheme is useful and manifests some real relationships, but if you're a chef you might have a different way of carving up the animal kingdom. Linneaus chose to organize species based upon visible differences — which might not be the "essential" differences — so that his scheme would be useful to scientists in the field. Although he was sometimes ambiguous about this, he seems not to have thought that he was discerning God's own order. Since Linnaeus we have become much more explicit in our understanding that how we classify depends on what we're trying to accomplish.

For example, a DTD (document type definition) typically is designed not to capture the eternal essence of some type of document, but to make the document more usable by systems that automate the document's production and processing. For example, an industry might agree on a DTD for parts catalogs that specifies that a parts catalog must have an element called "part" and that a part must have a type, part number, length, height, weight, material, and a description, and optionally can note whether it turns clockwise or counterclockwise. Each of these elements would have a standard name (e.g., "part_number," not "part#"). The result is a document that describes parts in a standard way so that a company can receive descriptions from all of its suppliers and automatically build a database of the parts it uses.

A DTD therefore is designed with an eye toward what properties are going to be useful. In some industries, it might include a term that captures how shiny the part is, but if it's a DTD for surgical equipment, that may not be relevant enough to include...although "sanitary_packaging" might be. Likewise, how quickly a bolt transfers heat might seem irrelevant, at least until NASA places an order. In this DTD's are much like forms: You don't put a field for earlobe length in the college application form you're designing.

Ontologies are different. They can try to express the structure of a domain independent of any particular use, so that the widest variety of applications can share data, including apps from domains outside of the one that's been mapped. So, to use Dan's example, your ontology of jobs would note that jobs have employers and workers, that they may have a salary or other form of compensation, that they can be part-time, full-time, seasonal, etc. As an ontology designer, because you're trying to think beyond whatever applications you already can imagine, your aim (often, not always) is to provide the fullest possible set of slots just in case someone sometime needs that info. And you will carefully describe the relationships among the elements so that apps and researchers can use knowledge that is implicit in the model.

The line between DTD's and ontologies is fuzzy. Many ontologies are designed with classes of apps in mind, and some DTD's have tried to be hugely general purpose. My discomfort really comes down to a distrust of the concept of "knowledge representation" that underlies some ontologies (especially earlier ones). The complexity of the relationships among parts will always outstrip our attempts to capture and codify those relationships. Further, knowledge cannot be fully represented because it isn't a thing apart from our continuous invention, discovery, and engagement with it.

What it comes down to is that if you talk about ontologies as knowledge representations I'll mutter something under my breath and change the topic.


June 19, 2013

[lodlam] Dean Krafft on VIVO

Dean Krafft of Cornell talks about the status of VIVO, an interdisciplinary tool to help researchers discover one another.

This is from the LODLAM conference in Montreal.

1 Comment »

April 25, 2013

[eim][misc] Too big to categorize

Amanda Filipacchi has a great post at the New York Times about the problem with classifying American female novelists as American female novelists. That’s been going on at Wikipedia, with the result that the category American novelist was becoming filled predominantly with male novelists.

Part of this is undoubtedly due to the dumb sexism that thinks that “normal” novelists are men, and thus women novelists need to be called out. And even if the category male novelist starts being used, it still assumes that gender is a primary way of dividing up novelists, once you’ve segregated them by nation. Amanda makes both points.

From my point of view, the problem is inherent in hierarchical taxonomies. They require making decisions not only about the useful ways of slicing up the world, but also about which slices come first. These cuts reflect cultural and political values and have cultural and political consequences. They also get in the way of people who are searching with a different way of organizing the topic in mind. In a case like this, it’d be far better to attach tags to Wikipedia articles so that people can search using whatever parameters they need. That way we get better searchability, and Wikipedia hasn’t put itself in the impossible position of coming up with a taxonomy that is neutral to all points of view.

Wikipedia’s categories have been broken for a long time. We know this in the Library Innovation Lab because a couple of years ago we tried to find every article in Wikipedia that is about a book. In theory, you can just click on the “Book” category. In practice, the membership is not comprehensive. The categories are inconsistent and incomplete. It’s just a mess.

It may be that a massive crowd cannot develop a coherent taxonomy because of the differences in how people think about things. Maybe the crowd isn’t massive enough. Or maybe the process just needs far more guidance and regulation. But even if the crowd can bring order to the taxonomy, I don’t believe it can bring neutrality, because taxonomies are inherently political.

There are problems with letting people tag Wikipedia articles. Spam, for example. And without constraints, people can lard up an object with tags that are meaningful only to them, offensive, or wrong. But there are also social mechanisms for dealing with that. And we’ve been trained by the Web to lower our expectations about the precision and recall afforded by tags, whereas our expectations are high for taxonomies.

Go tags.


April 18, 2013

[misc] StackLife goes live – visually browse millions of books

I’m very proud to announce that the Harvard Library Innovation Lab (which I co-direct) has launched what we think is a useful and appealing way to browse books at scale. This is timed to coincide with the launch today of the Digital Public Library of America. (Congrats, DPLA!!!)

StackLife (nee ShelfLife) shows you a visualization of books on a scrollable shelf, which we turn sideways so you can read the spines. It always shows you books in a context, on the ground that no book stands alone. You can shift the context instantly, so that you can (for example) see a work on a shelf with all the other books classified under any of the categories professional cataloguers have assigned to it.

We also heatmap the books according to various usage metrics (“StackScore”), so you can get a sense of the work’s community relevance.

There are lots more features, and lots more to come.

We’ve released two versions today.

StackLife DPLA mashes up the books in the Digital Public Library of America’s collection (from the Biodiversity Heritage Library) with books from The Internet Archive‘s Open Library and the Hathi Trust. These are all online, accessible books, so you can just click and read them. There are 1.7M in the StackLife DPLA metacollection. (Development was funded in part by a Sprint grant from the DPLA. Thank you, DPLA!)

StackLife Harvard lets you browse the 12.3M books and other items in the Harvard Library systems 73 libraries and off-campus repository. This is much less about reading online (unfortunately) than about researching what’s available.

Here are some links:

StackLife DPLA:
StackLife Harvard:
The DPLA press release:
The DPLA version FAQ:

The StackLife team has worked long and hard on this. We’re pretty durn proud:

Annie Cain
Paul Deschner
Kim Dulin
Jeff Goldenson
Matthew Phillips
Caleb Troughton


Next Page »