A few times in the course of Derek Attig’s really interesting talk on the history of bookmobiles yesterday, he pointed out how the route map of the early bookmobiles (and later ones, too) resembles a network map. He did this to stress that the history of bookmobiles is not simply a history of vehicles, but rather should be understood in terms of those vehicles’ social effect: creating and connecting communities.
I like this point, and I don’t mean to suggest that Derek carried the analogy too far. Not at all. But, it is an excellent example of how we are reinterpreting everything in terms of networks, just as we had previously interpreted everything in terms of computers and programs and information, and before that in terms of telephone networks, and before that…and before that…and before that….
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
Matthew Battles begins by inviting us all to visit the Harvard center for Renaissance studies in Florence, Italy. [Don't toy with us, Matthew!] There’s a collection there, curated by Bernard Berenson, of 16,000 photos documenting art that can’t be located, which Berenson called “Homeless Paintings of the Italian Renaissance.” A few years ago, Mellon sponsored the digitization of this collection, to be made openly available. One young man, Chris Daley [sp?] has since found about 120 of the works. [This is blogged at the metaLab site.]
These 16,000 images are available at Harvard’s VIA image manager [I think]. VIA is showing its age. It doesn’t support annotation, etc. There are some cultural crowdsourcing projects already underway, e.g., Zooniverse’s Ancient Lives project for transcribing ancient manuscripts. metaLab is building a different platform: Curarium.com.
Matthew hands off to Jeffrey Schnapp. He says Curarium will allow a diverse set of communities (archivist, librarian, educator, the public, etc.) to animate digital collections by providing tools for doing a multiplicity of things with those collections. We’re good at making collections, he says, but not as good at making those collections matter. Curarium should help take advantage of the expertise of distributed communities.
What sort of things will Curarium allow us to do? (A beta should be up in about a month.) Add metadata, add meaning to items…but also work with collections as aggregates. VIA doesn’t show relations among items. Curarium wants to make collections visible and usable at the macro and micro levels, and to tell stories (“spotlights”).
Jeffrey hands off to Pablo, who walks us through the wireframes. Curarium will ingest records, and make them interoperable. They take in records in JSON format, and extract the metadata they want. (They save the originals.) They’re working on how to give an overview of the collection; “When you have 11,000 records, thumbnails don’t help.” So, you’ll see a description and visualizations of the cloud of topic tags and items. (The “Homeless” collection has 2,000 tags.)
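[A rough sketch of what that ingestion step might look like, in Python. This is my guess at the shape of it, not Curarium's actual code: the field names, the sample records, and the function names are all made up for illustration. It parses each JSON record, pulls out the fields it cares about, keeps the original alongside, and counts tags as raw material for a tag cloud.]

```python
import json
from collections import Counter

# Hypothetical field names; the talk did not say which metadata gets extracted.
WANTED_FIELDS = ("title", "creator", "tags")

def ingest(raw_json: str) -> dict:
    """Parse one JSON record, keep the wanted fields, and retain the original."""
    original = json.loads(raw_json)
    extracted = {field: original.get(field) for field in WANTED_FIELDS}
    return {"extracted": extracted, "original": original}

def tag_counts(records: list[dict]) -> Counter:
    """Count how often each tag occurs across the collection."""
    counts = Counter()
    for record in records:
        # A record with no tags contributes nothing to the cloud.
        counts.update(record["extracted"].get("tags") or [])
    return counts

# Made-up sample records, standing in for whatever a source collection exports.
raw_records = [
    '{"title": "Madonna and Child", "creator": "Unknown", '
    '"tags": ["madonna", "panel"], "shelfmark": "A1"}',
    '{"title": "Saint Jerome", "tags": ["saint", "panel"]}',
]

records = [ingest(raw) for raw in raw_records]
cloud = tag_counts(records)
print(cloud.most_common(3))
```

[Keeping the untouched original next to the extracted fields means a later change of mind about which metadata matters does not require going back to the source.]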
At the item level, you can annotate, create displays of selected content (“‘Spotlights’ are selections of records organized as thematized content”) in various formats (e.g., slideshow, more academic style, etc.). There will be a rich way of navigating and visualizing. There will be tools for the public, researchers, and teachers.
Q: [me] How will you make the enhanced value available outside of Curarium? And, have you considered using Linked Data?
A: We’re looking into access. The data we have is coming from other places that have their own APIs, but we’re interested in this.
Q: You could take the Amazon route by having your own system use APIs, and then make those APIs open.
Q: How important is the community building? E.g., Zooniverse succeeds because people have incentives to participate.
A: Community-building is hugely important to us. We’ll be focusing on that over the next few months as we talk with people about what they want from this.
A: We want to expand the scope of conversation around cultural history. We’re just beginning. We’d love teachers in various areas — everything from art history to history of materials — to start experimenting with it as a teaching tool.
Q: The spotlight concept is powerful. Can it be used to tell the story of an individual object? E.g., suppose an object has been used in 200 different spotlights, and there might be a story in this fact.
A: Great question. Some of the richness of the prospect is perhaps addressed by expectations we have for managing spotlights in the context of classrooms or networked teaching.
Q: To what extent are you thinking differently than a standard visual library?
A: On the design side, what’s crucial about our approach is the provision for a wide variety of activities, within the platform itself: curate, annotate, tell a story, present it… It’s a CMS or blogging platform as well. The annotation process includes bringing in content from outside of the environment. It’s a porous platform.
Q: To what extent can users suggest changes to the data model? E.g., Europeana has a very rigid data model.
A: We’d like a significant user contribution to metadata. [Linked Data!]
Q: Are we headed for a bifurcation of knowledge? Dedicated experts and episodic amateurs. Will there be a curator of curation? Am I unduly pessimistic?
A: I don’t know. If we can develop a system, maybe with Linked Data, we can have a more self-organizing space that is somewhere in between harmony and chaos. E.g., Wikimedia Loves Monuments is a wonderful crowd curatorial project.
Q: Is there anything this won’t do? What’s out of scope?
A: We’re not providing tools for creating animated gifs. We don’t want to become a platform for high-level presentations. [metaLab's Zeega project does that.] And there’s a spectrum of media we’ll leave alone (e.g., audio) because integrating them with other media is difficult.
Q: How about shared search, i.e., searching other collections?
A: Great idea. We haven’t pursued this yet.
Q: Custodianship is not the same as meta-curation. Chris Daly could become a meta-curator. Also, there’s a lot of great art curation at Pinterest. Maybe you should be doing this on top of Pinterest? Maybe build spotlight tools for Pinteresters?
A: Great idea. We already do some work along those lines. This project happens to emerge from contact with a particular collection, one that doesn’t have an API.
Q: The fact that people are re-uploading the same images to Pinterest is due to the lack of standards.
Q: Are you going to be working on the vocabulary, or let someone else worry about that?
A: So far, we’re avoiding those questions…although it’s already a problem with the tags in this collection.
[Looks really interesting. I'd love to see it integrate with the work the Harvard Library Interoperability Initiative is doing.]
A mailing list I’m on is discussing GenderAvenger.com. Here’s the text from the home page:
Be A Gender Avenger Don’t Accept It. Change It.
Panel of all men? Conference with no women speakers? Book of essays with no women authors? Do something, something simple: Point it out. Opportunities — sadly — abound. How could that be in 2013? They can be found among iconic institutions and in seemingly small bore infractions.
Seeing can be believing. Everywhere possible when women are unrepresented or underrepresented, a gender avenger will take note, take action or ask someone else to take action. No excuses. This effort requires speaking out even when it is uncomfortable. Try it. The outcome could make you smile or groan. Either way you will have a story to tell that could influence others.
The site does a poor job of explaining exactly what it wants by way of input and what the outcome will be, but the email you receive if you decide to sign up anyway cites a HuffPo article about the idea, encourages you to publicize male-dominated conferences, etc., and asks for your participation in a discussion about how to make the idea work.
a·venge [uh-venj] verb (used with object), a·venged, a·veng·ing. 1. to take vengeance or exact satisfaction for: to avenge a grave insult. 2. to take vengeance on behalf of: He avenged his brother.
This person knows that we know (and Gina Glanz, the site’s creator, knows) what the word “avenger” means. He’s not correcting a misuse, the way he might if she’d used “revenge” as a verb. So why is he telling us what he knows we all already know?
Very likely he’s saying that the way people take a word is how the word is defined in a dictionary. But since this mailing list has been together for well over a decade, and since no one on it has ever recommended violent action (it’s moderated by a pacifist), and since the language of the site itself talks about “speaking out even when it’s uncomfortable,” to think that the site or its supporters mean “vengeance” in its dictionary sense requires dropping a whole lot of context in favor of a slavish devotion to Mr. Webster. It would be perfectly reasonable to push back on the word because it carries bad connotations or because it doesn’t quite fit the intended meaning, but neither of those conversations is advanced by citing the dictionary definition of a common word. Rather, the argument is over territory beyond the sovereignty of a dictionary.
In short (or as the kids say, TL;DR), if you’re citing a definition of a word that everyone understands, you’re probably missing the point.
Hanan Cohen points me to a blog post by an MLIS student at Haifa U., named Shir, in which she discourses on the term “paradata.” Shir cites Mark Sample, who in 2011 posted a talk he had given at an academic conference. Mark notes the term’s original meaning:
In the social sciences, paradata refers to data about the data collection process itself—say the date or time of a survey, or other information about how a survey was conducted.
Mark intends to give it another meaning, without claiming to have worked it out fully:
…paradata is metadata at a threshold, or paraphrasing Genette, data that exists in a zone between metadata and not metadata. At the same time, in many cases it’s data that’s so flawed, so imperfect that it actually tells us more than compliant, well-structured metadata does.
His example is We Feel Fine, a collection of tens of thousands (or more … I can’t open the site because Amtrak blocks access to what it intuits might be intensive multimedia) of sentences that begin “I feel” from many, many blogs. We Feel Fine then displays the stats in interesting visualizations. Mark writes:
…clicking the Age visualizations tells us that 1,223 (of the most recent 1,500) feelings have no age information attached to them. Similarly, the Location visualization draws attention to the large number of blog posts that lack any metadata regarding their location.
Unlike many other massive datamining projects, say, Google’s Ngram Viewer, We Feel Fine turns its missing metadata into a new source of information. In a kind of playful return of the repressed, the missing metadata is colorfully highlighted—it becomes paradata. The null set finds representation in We Feel Fine.
So, that’s one sense of paradata. But later Mark makes it clear (I think) that We Feel Fine presents paradata in a broader sense: it is sloppy in its data collection. It strips out HTML formatting, which can contain information about the intensity or quality of the statements of feeling the project records. It’s lazy in deciding which images from a target site it captures as relevant to the statement of feeling. Yet, Mark finds great value in We Feel Fine.
His first example, where the null set is itself metadata, seems unquestionably useful. It applies to any unbounded data set. For example, that no one chose answer A on a multiple choice test is not paradata, just as the fact that no one has checked out a particular item from a library is not paradata. But that no one used the word “maybe” in an essay test is paradata, as would be the fact that no one has checked out books in Aramaic and Klingon in one bundle. Getting a zero in a metadata category is not paradata; getting a null in a category that had not been anticipated is paradata. Paradata should therefore include which metadata categories are missing from a schema. E.g., that Dublin Core does not have a field devoted to reincarnation says something about the fact that it was not developed by Tibetans.
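The zero-versus-null difference can be made concrete with a small Python sketch. The schema, the records, and the function name here are all hypothetical, and it simplifies the point: it only separates an anticipated category that nobody landed in (a zero) from records that carry no value for the field at all (a null, like the feelings with no age information attached).

```python
from collections import Counter

# Hypothetical schema: the categories the collector anticipated in advance.
SCHEMA = {"age": ["under 30", "30 and over"]}

def summarize(records, field):
    """Tally a field, keeping anticipated-but-empty values apart from absent ones."""
    # Start every anticipated value at zero so an unused category still shows up.
    counts = Counter({value: 0 for value in SCHEMA[field]})
    missing = 0
    for record in records:
        value = record.get(field)
        if value is None:
            missing += 1  # no value at all: a null, not a zero
        else:
            counts[value] += 1
    return dict(counts), missing

# Made-up records: one has an age, two carry no age information.
records = [{"age": "under 30"}, {}, {"age": None}]
counts, missing = summarize(records, "age")
print(counts, missing)
```

A report that shows only the counts hides the nulls. Reporting the missing tally next to them is the move We Feel Fine makes.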
But I don’t think that’s at the heart of what Mark means by paradata. Rather, the appearance of the null set is just one benefit of considering paradata. Indeed, I think I’d call this “implicit metadata” or “derived metadata,” not “paradata.”
The fuller sense of paradata Mark suggests, “data that exists in a zone between metadata and not metadata,” is both useful and, as he cheerfully acknowledges, “a big mess.” It immediately raises questions about the differences between paradata and pseudodata: if We Feel Fine were being sloppy without intending to be, and if it were presenting its “findings” as rigorously refined data at, say, the biennial meeting of the Society for Textual Analysis, I don’t think Mark would be happy to call it paradata.
Mark concludes his talk by pointing at four positive characteristics of the We Feel Fine site: It’s inviting, paradata, open, and juicy. (“Juicy” means that there’s lots going on and lots to engage you.) It seems to me that the site’s only an example of paradata because of the other three. If it were a jargon-filled, pompous site making claims to academic rigor, the paradata would be pseudodata.
This isn’t an objection or a criticism. In fact, it’s the opposite. Mark’s post, which is based on a talk that he gave at the Society for Textual Analysis, is a plea for research that is inviting, open, juicy, and is willing to acknowledge that its ideas are unfinished. Mark’s post is, of course, paradata.
On Wednesday and Thursday I went to the second LODLAM (linked open data for libraries, archives, and museums) unconference, in Montreal. I’d attended the first one in San Francisco two years ago, and this one was almost as exciting — “almost” because the first one had more of a new car smell to it. This is a sign of progress and by no means is a complaint. It’s a great conference.
But, because it was an unconference with up to eight simultaneous sessions, there was no possibility of any single human being getting a full overview. Instead, here are some overall impressions based upon my particular path through the event.
Serious progress is being made. E.g., Cornell announced it will be switching to a full LOD library implementation in the Fall. There are lots of great projects and initiatives already underway.
Some very competent tools have been developed for converting to LOD and for managing LOD implementations. The development of tools is obviously crucial.
There isn’t obvious agreement about the standard ways of doing most things. There’s innovation, re-invention, and lots of lively discussion.
Some of the most interesting and controversial discussions were about whether libraries are being too library-centric and not web-centric enough. I find this hugely complex and don’t pretend to understand all the issues. (Also, I find myself — perhaps unreasonably — flashing back to the Standards Wars in the late 1980s.) Anyway, the argument crystallized to some degree around BIBFRAME, the Library of Congress’ initiative to replace and surpass MARC. The criticism raised in a couple of sessions was that Bibframe (I find the all caps to be too shouty) represents how libraries think about data, and not how the Web thinks, so that if Bibframe gets the bib data right for libraries, Web apps may have trouble making sense of it. For example, Bibframe is creating its own vocabulary for talking about properties that other Web standards already have names for. The argument is that if you want Bibframe to make bib data widely available, it should use those other vocabularies (or, more precisely, namespaces). Kevin Ford, who leads the Bibframe initiative, responds that you can always map other vocabs onto Bibframe’s, and while Richard Wallis of OCLC is enthusiastic about the very webby Schema.org vocabulary for bib data, he believes that Bibframe definitely has a place in the ecosystem. Corey Harper and Debra Riley-Huff, on the other hand, gave strong voice to the cultural differences. (If you want to delve into the mapping question, explore the argument about whether Bibframe’s annotation framework maps to Open Annotation.)
I should add that although there were some strong disagreements about this at LODLAM, the participants seem to be genuinely respectful.
LOD remains really really hard. It is not a natural way of thinking about things. Of course, neither are old-fashioned database schemas, but schemas map better to a familiar forms-based view of the world: you fill in a form and you get a record. Linked data doesn’t even think in terms of records. Even with the new generation of tools, linked data is hard.
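To make the record-versus-linked-data contrast a bit more concrete, here is a toy sketch in Python. Everything in it is illustrative: real linked data uses URIs and shared vocabularies rather than the plain strings and made-up predicate names used here, and the helper functions are hypothetical.

```python
# A forms-style record: one row, fixed fields, everything hangs off the record.
record = {"id": "book1", "title": "Moby-Dick", "author": "Herman Melville"}

def record_to_triples(rec, id_field="id"):
    """Flatten a record into (subject, predicate, object) statements."""
    subject = rec[id_field]
    return [(subject, key, value) for key, value in rec.items() if key != id_field]

triples = record_to_triples(record)

# In the linked view there is no record boundary: a statement can be about
# anything already mentioned, including what used to be just a field value.
triples.append(("Herman Melville", "bornIn", "New York City"))

def about(subject, statements):
    """Collect every (predicate, object) pair asserted about a subject."""
    return [(p, o) for s, p, o in statements if s == subject]

print(about("book1", triples))
print(about("Herman Melville", triples))
```

The record answers "what fields does this form have?" The triples answer "what has anyone said about this thing?", which is more open-ended and, as noted above, harder to think in.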
LOD is the future for library, archive, and museum data.
Here’s a list of brief video interviews I did at LODLAM:
…as long as there are bands of violent Islamic radicals anywhere in the world who find it attractive to call themselves Al Qaeda, a formal state of war may exist between Al Qaeda and America. The Hundred Years War could seem a brief skirmish in comparison.
This is a different category of issue than the oft-criticized “war on terror,” which is a war against a tactic, not against an enemy. The war against Al Qaeda implies that there is a structurally unified enemy organization. How do you declare victory against a group that refuses to enforce its trademark?
In this, the war against Al Qaeda (which is quite preferable to a war against terror, and I think Steve agrees) is similar to the war on cancer. Cancer is not a single disease and the various things we call cancer are unlikely to have a single cause and thus are unlikely to have a single cure (or so I have been told). While this line of thinking would seem to reinforce politicians’ referring to terrorism as a “cancer,” the same applies to dessert. Each of these terms probably does have a single identifying characteristic, which means they are not classic examples of Wittgensteinian family resemblances: all terrorism involves a non-state attack that aims at terrifying the civilian population, all cancers involve “unregulated cell growth” [thank you Wikipedia!], and all desserts are designed primarily for taste not nutrition and are intended to end a meal. In fact, the war on Al Qaeda is actually more like the war on dessert than like the war on cancer, because just as there will always be some terrorist group that takes up the Al Qaeda name, there will always be some boundary-pushing chef who declares that beef jerky or glazed ham cubes are the new dessert. You can’t defeat an enemy that can just rebrand itself.
I think that Steve Coll comes to the wrong conclusion, however. He ends his piece this way:
Yet the empirical case for a worldwide state of war against a corporeal thing called Al Qaeda looks increasingly threadbare. A war against a name is a war in name only.
I agree with the first sentence, but I draw two different conclusions. First, this has little bearing on how we actually respond to terrorism. The thinking that has us attacking terrorist groups (and at times their family gatherings) around the world is not made threadbare by the misnomer “war against Al Qaeda.” Second, isn’t it empirically obvious that a war against a name is not a war in name only?
A New Yorker article that profiles John Quijada, the inventor of a language (and a double-dotter!), mentions the first artificial language we know about, Lingua Ignota. The article’s author, Joshua Foer, tells us it was invented by Hildegard von Bingen (totally fun to say out loud) in the 12th century. “All that remains of her language is a short passage and a dictionary of a thousand and twelve words listed in hierarchical order, from the most important (Aigonz, God) to the least (Cauiz, cricket).” There’s more about Lingua Ignota over at our friend, Wikipedia. (And did you remember to kick in a few bucks to keep Wikipedia in booze and cigarettes?)
Ordering a list by cosmic importance (remember the Great Chain of Being?) makes sense if everyone agrees on what that order is. And it expresses respect for the order. That’s why some clergyfolk objected to the fact that Diderot’s Encyclopedia in the 18th century alphabetized its contents. Imagine Cows coming before God!
Before we sneer, we should keep in mind that we do the same thing when we make lists to be seen by others. For example, lists of donors put the Big Money folk first. For another example, we wouldn’t post a list of New Year’s resolutions in the following order:
My New Year’s Resolutions
Bring in an apple instead of snacking from the vending machine
Don’t let the ironing back up for more than a week
Refill the bird-feeder before it’s empty.
Get those birthday cards in the mail on time!
And there are rhetorical rules for the order in which we give reasons to support an argument. For example, we often give the easiest reason to accept first, and lead up to the most serious reason: “It’s easy, it’ll save money, people will feel good about it, and it’s the right thing to do.” The phrase “most important,….” is not permitted to appear in the middle of a sentence.
Now, when I say that this thread is “shockingly informative,” I don’t mean that it gives sufficient or even relevant information about the leaders it discusses. After all, it focuses on their personal combat skills. Rather, it is an interesting example of the haphazard way information spreads when that spreading is participatory. So, we are unlikely to have sent around the Wikipedia article on Kabila or Borisov simply because we all should know about the people leading the nations of the world. Further, while there is more information about world leaders available than ever in human history, it is distributed across a huge mass of content from which we are free to pick and choose. That’s disappointing at the least and disastrous at its worst.
On the other hand, information is now passed around if it is made interesting, sometimes in jokey, demeaning ways, like an article that steers us toward beefcake (although the president of Ireland does make it up quite high in the Reddit thread). The information that gets propagated through this system is thus spotty and incomplete. It only becomes an occasion for serendipity if it is interesting, not simply because it’s worthwhile. But even jokey, demeaning posts can and should have links for those whose interest is piqued.
So, two unspectacular conclusions.
First, in our despair over the diminishing of a shared knowledge-base of important information, we should not ignore the off-kilter ways in which some worthwhile information does actually propagate through the system. Indeed, it is a system designed to propagate that which is off-kilter enough to be interesting. Not all of that “news,” however, is about water-skiing cats. Just most.
Second, we need to continue to have the discussion about whether there is in fact a shared news/knowledge-base that can be gathered and disseminated, whether there ever was, whether our populations ever actually came close to living up to that ideal, the price we paid for having a canon of news and knowledge, and whether the networking of knowledge opens up any positive possibilities for dealing with news and knowledge at scale. For example, perhaps a network is well-informed if it has experts on hand who can explain events at depth (and in interesting ways) on demand, rather than assuming that everyone has to be a little bit expert at everything.
I’m not sure how I came into possession of a copy of The Indexer, a publication by the Society of Indexers, but I thoroughly enjoyed it despite not being a professional indexer. Or, more exactly, because I’m not a professional indexer. It brings me joy to watch experts operate at levels far above me.
The issue of The Indexer I happen to have (Vol. 30, No. 1, March 2012) focuses on digital trends, with several articles on the Semantic Web and XML-based indexes as well as several on broad trends in digital reading and digital books, and on graphical visualizations of digital indexes. All good.
I also enjoyed a recurring feature: Indexes reviewed. This aggregates snippets of book reviews that mention the quality of the indexes. Among the positive reviews, the Sunday Telegraph thinks that for the book My Dear Hugh, “the indexer had a better understanding of the book than the editor himself.” That’s certainly going on someone’s resumé!
I’m not sure why I enjoy works of expertise in fields I know little about. It’s true that I know a little about indexing because I’ve written about the organization of digital information, and even a little about indexing. And I have a lot of interest in the questions about the future of digital books that happen to be discussed in this particular issue of The Indexer. That enables me to make more sense of the journal than might otherwise be the case. But even so, what I enjoy most are the discussions of topics that exhibit the professionals’ deep involvement in their craft.
But I think what I enjoy most of all is the discovery that something as seemingly simple as generating an index turns out to be indefinitely deep. There are endless technical issues, but also fathomless questions of principle. There’s even indexer humor. For example, one of the index reviews notes that Craig Brown’s The Lost Diaries “gives references with deadpan precision (‘Greer, Germaine: condemns Queen, 13-14…condemns pineapple, 70…condemns fat, thin and medium sized women, 93…condemns kangaroos, 122’).”
As I’ve said before, everything is interesting if observed at the right level of detail.