lodlam Archives - Page 2 of 2 - Joho the Blog

June 19, 2013

[lodlam] Focus on helping users

Corey Harper [twitter:chrpr] starts a session by giving a terrific presentation of the problem: Linked data discussions and apps have focused too much on resources instead of on topics, narratives, etc. — what users are using resources to explore. We are not extracting all the value from librarians’ controlled vocabulary.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Some notes from the open discussion. Very sketchy, much choppier than in life, and highly incomplete.

Why not use Solr, the Lucene-based search indexer? In part because Solr doesn’t know enough about the context, so a search for “silver” comes back with all sorts of hits without recognizing that some refer to the mineral, some to geographic places with “silver” in the name, etc. E.g., if you say “john constable artist birthdate,” linked data can get you the answer. [I typed that into Google. It came back with the answer in big letters.]
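A toy sketch in plain Python (invented URIs and data, not how either Solr or any triple store is actually implemented) of the point above: a flat keyword index only sees the string “silver,” while typed triples let you ask the disambiguated question, or pull a structured fact like Constable’s birthdate:

```python
# Hypothetical triples: each resource carries an explicit type, so "silver"
# the mineral and "Silver City" the place are distinct, typed nodes.
triples = [
    ("ex:Silver",        "rdf:type",     "ex:Mineral"),
    ("ex:Silver",        "rdfs:label",   "silver"),
    ("ex:SilverCity",    "rdf:type",     "ex:Place"),
    ("ex:SilverCity",    "rdfs:label",   "Silver City"),
    ("ex:JohnConstable", "rdf:type",     "ex:Artist"),
    ("ex:JohnConstable", "ex:birthDate", "1776-06-11"),
]

def search(label_fragment, rdf_type=None):
    """Find subjects whose label contains the fragment, optionally filtered by type."""
    fragment = label_fragment.lower()
    labeled = {s for s, p, o in triples
               if p == "rdfs:label" and fragment in o.lower()}
    if rdf_type is None:
        return labeled  # flat keyword search: all senses mixed together
    typed = {s for s, p, o in triples if p == "rdf:type" and o == rdf_type}
    return labeled & typed

# A flat keyword search returns both senses of "silver"...
print(search("silver"))
# ...while the typed graph can answer the disambiguated question.
print(search("silver", rdf_type="ex:Place"))

# And a structured fact query ("john constable artist birthdate"):
birthdate = next(o for s, p, o in triples
                 if s == "ex:JohnConstable" and p == "ex:birthDate")
print(birthdate)
```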

Linked data can do the sort of thing that reference librarians do: Here’s what you’re looking for, but have you also seen this and this and that?

How do we evaluate the user interfaces we come up with? How do we know if it’s helped someone find something, put something into context, tell a story…?

We have two weird paradigms in the library community: Lucene-based indexes of metadata (e.g., Blacklight) vs. exhibit makers (e.g., Omeka). How to bring those together so exhibits are made through an index, and the flow through them is itself indexed and made findable and re-usable. (And then there’s the walking through a room and discovering relationships among things.)

How do we preserve the value of the subject classifications? [Here’s one idea: Stacklife :) ]

It’s important to keep one of the core functions of catalog: to identify and create identities for resources. A lot of our examples are facts, but in the Humanities what’s our role in maintaining identities around which we can hang relationships and maintain the disagreements among people. How do you help people navigate that problem space?

The Web’s taught us that the only way to find things is through search, but let’s remember the “link” in “linked data”: the ability to find the relationship between things you’ve found. E.g., the Google Knowledge Graph and Google fact panel are doing this to some degree. We’ve lost that, thanks to computers.

People want to have debates and find conflicting information. It’s hard to know how to bring this into a search interface.

The Digital Mellini project digitized a specialized manuscript and opened it up. Once something is digitized, you can see details that the human eye cannot — e.g., marginal notes.

Other examples of the sort of thing that Corey is talking about:

  • Linking Lives, which uses EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families).

  • SNAC (Social Networks and Archival Context, “Facebook for dead people”) mines finding aids to find social relationships.

  • LinkSailor (RIP) traversed many owl:sameAs relationships.

  • CultureSampo (Finnish)

  • Tim Sherratt’s group has something coming out soon

People think that museum web sites are boring. At LODLAM we’re a bunch of data geeks and are the wrong people to be talking about user interfaces. Response: We should take the Apple route and give people what they don’t know they want. We should also be testing our models against how people think about the world.

“I have a lot of data. It’s very sparse and sometimes very concentrated. It’s hard to know what users want from it. I don’t know what’s going to be important to you. So we generate video games, using geodata to create the playing field.” That’s not a retrieval engine, but it’s a way to make use of the factoids.

Read “The Lean Startup.” The Minimum Viable Product is an important idea. Don’t underrate the role of the product owner in shaping a great project. (Me:) Having strong, usable, graphs that take advantage of what libraries know would be helpful.

Who are our clients? Users? Scholars? Developers? A: All of them. Response: Then we’ll fail. Response: Catalogs were designed to manage collections, not for the general public. People have been forced to learn how to use them; you have to understand the collection’s abstraction. And that’s not sustainable.

Our library wants to build the graph. We build simple interfaces to demonstrate the power, but our value is in building the graph.

We don’t want to deliver linked data to users. We want to build the layer between the linked data and the apps. If we do it well, users won’t know or care that there’s linked data underneath it.

We tend to focus on what we think our users should want. It’s an “eat your broccoli” approach to search. E.g., users want social networks, but many scholars resist them because they seem too non-rigorous.


[lodlam] The state of Linked Data

Jon Voss, an organizer of the LODLAM conference in Montreal, talks about what we can learn about the current state of Linked Data for libraries, archives, and museums by looking at the topics proposed at this unconference:


[lodlam] Convert to RDF with KARMA

KARMA, from the University of Southern California, takes data from a wide variety of sources, maps it to your ontologies, and generates linked data. It is open source and free. [I have not even re-read this post. Running to the next session.]

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

They are demoing KARMA using a folder full of OWL ontology files. [OWL files contain the rules that define ontologies.] KARMA runs in your browser. The mapping format is R2RML, which was designed for relational databases, but they’ve extended it to handle more types of sources. You can import from a database, files, or a service. For the demo, they’re using CSV files from a Smithsonian database that consist of display names, IDs representing unique people, and a variant or married name. They want to map it to the Europeana ontology.

KARMA shows the imported CSV and lets you (for example) create a URI for every person’s name in the table. You can use Python to transform the variant names into a standard name ontology, e.g., transforming “married name” into aac-ont:married (American Art Consortium). You can model the data and it learns it. E.g., it asks if you want to map the original’s ConstituentID to saam-ont:constituentID or saam-ont:objectId. (It recognizes that the ID is all numerals.) There’s an advanced option that lets you map it to, for example, a URI for aac-ont:Person1.

He clicks on the “display name” and KARMA suggests that it’s a SKOS altLabel, or a FOAF name, etc. If there are no useful suggestions, you can pick one that’s close and then edit it. You can browse the ontologies in the folders you’ve configured it to load. You can have synonyms (“a FOAF person can be a SKOS person”). [There’s yet more functionality, but this is where I topped out.]

You can save this as a process that can be run in batch mode.
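The workflow described above might be sketched roughly like this. This is not KARMA’s actual code, just a toy illustration in plain Python of the CSV-to-triples mapping, with invented sample data and column names (the saam-ont/aac-ont prefixes are the ones mentioned in the demo):

```python
import csv
import io

# Toy stand-in for the Smithsonian CSV described above; the rows and the
# exact column names here are invented for illustration.
SAMPLE = """ConstituentID,DisplayName,VariantName,VariantType
1234,Mary Cassatt,Mary Stevenson Cassatt,birth name
5678,Georgia O'Keeffe,Georgia Totto O'Keeffe,married name
"""

def person_uri(constituent_id):
    # Mint one URI per person, keyed on the source's numeric ID.
    return f"saam:person/{constituent_id}"

def map_row(row):
    """Map one CSV row to subject-predicate-object triples."""
    uri = row_uri = person_uri(row["ConstituentID"])
    triples = [
        (uri, "rdf:type", "foaf:Person"),
        (uri, "saam-ont:constituentID", row["ConstituentID"]),
        (uri, "foaf:name", row["DisplayName"]),
    ]
    # Transform step (KARMA lets you write these in Python): route variant
    # names to different properties depending on the variant type.
    prop = {"married name": "aac-ont:married"}.get(row["VariantType"],
                                                   "skos:altLabel")
    triples.append((uri, prop, row["VariantName"]))
    return triples

# Run the mapping over the whole file, as the batch mode would.
graph = [t for row in csv.DictReader(io.StringIO(SAMPLE)) for t in map_row(row)]
```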


[lodlam] Topics

I’m at LODLAM (linked open data for libraries, archives, and museums) in Montreal. It’s an unconference with 100 people from 16 countries. Here are the topics being suggested at the opening session. (There will be more added to the agenda board.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

(Because this is an unconference, I probably will not be doing much more liveblogging.)

  • Taxonomy alignment

  • How to build a case for LOD

  • How to build a pattern library (a clear articulation for a problem, the context where the problem appears, and a pattern for its solution) for cultural linked open data

  • How to take PDF to the next level, integrating triples to make it open data? How to make it into a “portable data format”

  • How can we efficiently convert our data to LOD? USC has Karma and would like to convene a workshop about tools.

  • How to convert simple data to LOD? How to engage users in making that data better?

  • A cultural heritage standard.

  • User interfaces. What do we do after we create all of this data? [applause]

  • Progress since the prior LODLAM (in San Francisco)? BIBFRAME?

  • Preserving linked data

  • The NSA has built the ultimate linked data tool chain. What can we learn?

  • Internal use cases for linked data.

  • How to make use of dirty metadata

  • A draft ontology for MODS metadata (MODSRDF)

  • Collaborating on a harvesting/enrichment tool

  • Getty Vocabulary is being released as LOD [applause], but they need help building a community making sure they have the right ontologies, early adopters, etc.

  • The data exhaust from DSpace and linking it to world problems — finding the disconnects between the people who have problems and people with info helpful for those problems

  • Identities and authorities — linked data as an app-independent way of doing identity control and management

  • RDF cataloging interface

  • Curation and social relationships

  • Linked Open Data ecosystems

  • A new understanding of search — ways LODers search isn’t familiar to most people


  • Open Annotation tools enabling end users to enrich the graph

  • Our collections are different for a reason. That manifests itself in the data structure. We should talk about this.

  • In the business writ large, maybe we need the confidence to be invisible. What does that mean?

  • Feedback loops once data has been exposed

  • Wikidata — the database that supports Wikipedia

  • Forming an international group to discuss archival data, particularly in LOD


June 8, 2011

MacKenzie Smith on open licenses for metadata

MacKenzie Smith of MIT and Creative Commons talks about the new 4-star rating system for open licenses for metadata from cultural institutions:

The draft is up on the LOD-LAM site.

Here are some comments on the system from open access guru Peter Suber.


June 6, 2011

Peter Suber on the 4-star openness rating

One of the outcomes of the LOD-LAM conference was a draft of an idea for a 4-star classification of openness of metadata from cultural institutions. The classification is nicely counter-intuitive, which is to say that it’s useful.

I asked Peter Suber, the Open Access guru, what he thought of it. He replied in an email:

First, I support the open knowledge definition and I support a star system to make it easy to refer to different degrees of openness.

* I’m not sure where this particular proposal comes from. But I recommend working with the Open Knowledge Foundation, which developed the open knowledge definition. The more key players who accept the resulting star system, the more widely it will be used.

* This draft overlooks some complexity in the 3-star entry and the 2-star entry. Currently it suggests that attribution through linking is always more open than attribution by other means (say, by naming without linking). But this is untrue. Sometimes one is more difficult than the other. In a given case, the easier one is more open by lowering the barrier to distribution.

If you or your software had both names and links for every datasource you wanted to attribute, then attribution by linking and attribution by naming would be about equal in difficulty and openness. But if you had names without links, then obtaining the links would be an extra burden that would delay or impede distribution.

The disparity in openness grows as the number of datasources increases. On this point, see the Protocol for Implementing Open Access Data (by John Wilbanks for Science Commons, December 2007).

Relevant excerpt: “[T]here is a problem of cascading attribution if attribution is required as part of a license approach. In a world of database integration and federation, attribution can easily cascade into a burden for scientists….Would a scientist need to attribute 40,000 data depositors in the event of a query across 40,000 data sets?” In the original context, Wilbanks uses this (cogently) as an argument for the public domain, or for shedding an attribution requirement. But in the present context, it complicates the ranking system. If you *did* have to attribute a result to 40,000 data sources, and if you had names but not links for many of those sources, then attribution by naming would be *much* easier than attribution by linking.

Solution? I wouldn’t use stars to distinguish methods of attribution. Make CC-BY (or the equivalent) the first entry after the public domain, and let it cover any and all methods of attribution. But then include an annotation explaining that some methods of attribution increase the difficulty of distribution, and that increasing the difficulty will decrease openness. Unfortunately, however, we can’t generalize about which methods of attribution raise and lower this barrier, because it depends on what metadata the attributing scholar may already possess or have ready to hand.

* The overall implication is that anything less open than CC-BY-SA deserves zero stars. On the one hand, I don’t mind that, since I’d like to discourage anything less open than CC-BY-SA. On the other, while CC-BY-NC and CC-BY-ND are less open than CC-BY-SA, they’re more open than all-rights-reserved. If we wanted to recognize that in the star system, we’d need at least one more star to recognize more species.

I responded with a question: “WRT to your naming vs. linking comments: I assumed the idea was that it’s attribution-by-link vs. attribution-by-some-arbitrary-requirement. So, if I require you to attribute by sticking in a particular phrase or mark, I’m making it harder for you to just scoop up and republish my data: Your aggregating sw has to understand my rule, and you have to follow potentially 40,000 different rules if you’re aggregating from 40,000 different databases.”

Peter responded:

You’re right that “if I require you to attribute by sticking in a particular phrase or mark, I’m making it harder for you to just scoop up and republish my data.” However, if I already have the phrases or marks, but not the URLs, then requiring me to attribute by linking would be the same sort of barrier. My point is that the easier path depends on which kinds of metadata we already have, or which kinds are easier for us to get. It’s not the case that one path is always easier than another.

But it might be the case that one path (attribution by linking) is *usually* easier than another. That raises a nice question: should that shifting, statistical difference be recognized with an extra star? I wouldn’t mind, provided we acknowledged the exceptions in an annotation.


June 5, 2011

How to digitize a million books

Brewster Kahle gives a tour of one of the Internet Archive‘s book scanning facilities. This one is part of the Archive’s San Francisco headquarters:

Recorded during a tour of the facilities, as part of the LOD-LAM conference.


June 2, 2011

[lodlam] The rise of Linked Open Data

At the Linked Open Data in Libraries, Archives and Museums conf [LODLAM], Jonathan Rees casually offered what I thought was a useful distinction. (Also note that I am certainly getting this a little wrong, and could possibly be getting it entirely wrong.)

Background: RDF is the basic format of data in the Semantic Web and LOD; it consists of statements of the form “A is in some relation to B.”

My paraphrase: Before LOD, we were trying to build knowledge representations of the various realms of the world. Therefore, it was important that the RDF triples expressed were true statements about the world. In LOD, triples are taken as a way of expressing data; take your internal data, make it accessible as RDF, and let it go into the wild…or, more exactly, into the commons. You’re not trying to represent the world; you’re just trying to represent your data so that it can be reused. It’s a subtle but big difference.
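To make the “A is in some relation to B” shape concrete, here’s a toy sketch in plain Python (invented URIs and properties): two institutions publish their statements independently, and because they share a URI, the statements merge into one graph and can be traversed together — the kind of reuse described above.

```python
# Each statement is a (subject, predicate, object) triple:
# "A is in some relation to B."

# A library's data, published as-is, no claim to model the whole world:
library_data = [
    ("ex:book/42", "dc:title",   "The Cloud of Unknowing"),
    ("ex:book/42", "dc:creator", "ex:person/anon14c"),
]

# A museum's data, published separately, sharing one URI with the library:
museum_data = [
    ("ex:person/anon14c", "rdfs:label", "Anonymous, 14th century"),
]

# "Letting it go into the commons": anyone can union the statements.
graph = library_data + museum_data

def objects_of(subject, predicate, g):
    """All objects O such that (subject, predicate, O) is in the graph."""
    return [o for s, p, o in g if s == subject and p == predicate]

# Follow the link from the book's creator to the museum's label for it.
creator = objects_of("ex:book/42", "dc:creator", graph)[0]
label = objects_of(creator, "rdfs:label", graph)[0]
print(label)
```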

I also like John Wilbanks’ provocative tweet-length explanation of LOD: “Linked open data is duct tape that some people mistake for infrastructure. Duct tape is awesome.”

Finally, it’s pretty awesome to be at a techie conference where about half the participants are women.

