
October 16, 2012

[eim][semtechbiz] Enterprise Linked Data

David Wood of 3RoundStones.com is talking about Callimachus, an open source project that is also available through his company. [NOTE: Liveblogging. All bets on accuracy are off.]

We’re moving from PCs to mobile, he says, and that shift is rapidly changing the Internet. 51% of Internet traffic is non-human (as of Feb. 2012), and 35 hours of video are uploaded to YouTube every minute. Traditionally we dealt with this type of demand via data warehousing: put it all in one place for easy access. But that promise was never kept: we never really got it all in one place, accessible through one interface. Jeffrey Pollock says we should be talking not about data integration but about interoperability, because the latter implies a looser coupling.

He gives some use cases:

  • BBC wanted a Web presence for each of its 1,500 broadcasts per day. They couldn’t create those pages manually, so they decided to grab data from the linked open data cloud and assemble the pages automatically. They hired full-time editors to curate Wikipedia, and RDF enabled them to assemble the pages. (A sketch of the general pattern follows this list.)

  • O’Reilly Media switched to RDF reluctantly, for purely pragmatic reasons.

  • BestBuy, too: they used RDFa to embed metadata in their pages to improve their SEO.

  • Elsevier uses Linked Data to manage their assets, from acquisition to delivery.

This is not science fiction, he says. It’s happening now.
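
To make the pattern concrete: none of this code comes from the talk, but a minimal, hypothetical sketch of the approach behind the BBC case might look like the following, using Python's rdflib with an invented vocabulary and invented URIs. Facts go in as triples; a query, rather than hand-edited HTML, feeds the page template.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

# Hypothetical vocabulary and URIs, invented for this sketch.
# Real deployments use published ontologies (e.g. BBC Programmes).
EX = Namespace("http://example.org/schema/")

g = Graph()
show = URIRef("http://example.org/broadcasts/wildlife-special")
g.add((show, RDF.type, EX.Broadcast))
g.add((show, RDFS.label, Literal("Wildlife Special")))
g.add((show, EX.synopsis, Literal("A one-line synopsis pulled from curated data.")))

# The page template is fed by a query instead of hand-edited HTML.
results = g.query(
    """SELECT ?label ?synopsis WHERE {
           ?s a ex:Broadcast ;
              rdfs:label ?label ;
              ex:synopsis ?synopsis .
       }""",
    initNs={"ex": EX, "rdfs": RDFS},
)
for label, synopsis in results:
    print(f"<h1>{label}</h1>\n<p>{synopsis}</p>")
```

The same triples could just as easily be serialized as RDFa and embedded in the page markup, which is the SEO move the BestBuy example describes.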

Then two negative examples:

  • David says that Microsoft adopted RDF in the late 90s. But Netscape came out with a portal technology based on RDF that scared Microsoft out of the standards effort. They still needed the tech, though, so they’ve reinvented it three times in proprietary ways.

  • Borders was too late in changing its tech.

Then he does a product pitch for Callimachus Enterprise: a content management system for enterprises.


July 3, 2012

[2b2k] The inevitable messiness of digital metadata

This is cross-posted at the Harvard Digital Scholarship blog.

Neil Jeffries, research and development manager at the Bodleian Libraries, has posted an excellent op-ed at Wikipedia Signpost about how to best represent scholarly knowledge in an imperfect world.

He sets out two basic assumptions: (1) Data has meaning only within context; (2) We are not going to agree on a single metadata standard. In fact, we could connect those two points: Contexts of meaning are so dependent on the discipline and the user's project and standpoint that it is unlikely that a single metadata standard could suffice. In any case, the proliferation of standards is simply a fact of life at this point.

Given those constraints, he asks, what's the best way to increase the interoperability of the knowledge and data that are accumulating online at a pace that provokes extremes of anxiety and joy in equal measure? He sees a useful consensus emerging on three points: (a) There are some common and basic types of data across almost all aggregations. (b) There is increasing agreement that these data types have some simple, common properties that suffice to identify them and to give us humans an idea of whether we want to delve deeper. (c) Aggregations themselves are useful for organizing data, even when they are loose webs rather than tight hierarchies.

Neil then proposes RDF and linked data as appropriate ways to capture the very important relationships among ideas, pointing to the Semantic MediaWiki as a model. But, he says, we need to capture additional metadata that qualifies the data, including who made the assertion, links to differences of scholarly opinion, omissions from the collection, and the quality of the evidence. "Rather than always aiming for objective statements of truth we need to realise that a large amount of knowledge is derived via inference from a limited and imperfect evidence base, especially in the humanities," he says. "Thus we should aim to accurately represent the state of knowledge about a topic, including omissions, uncertainty and differences of opinion."
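
As a hedged illustration (this is not Neil's specific proposal), RDF named graphs are one way to record such qualified assertions: each claim gets its own graph, and the graph itself can then be described, saying who asserted it and what the evidence is like. The sketch below uses Python's rdflib; the URIs and the evidenceBasis property are invented, while PROV is the real W3C provenance vocabulary.

```python
from rdflib import Dataset, Literal, Namespace, URIRef

# Invented example namespace; PROV is the real W3C provenance vocabulary.
EX = Namespace("http://example.org/")
PROV = Namespace("http://www.w3.org/ns/prov#")

ds = Dataset()

# The scholarly claim lives in its own named graph...
claim = URIRef("http://example.org/claims/1")
g = ds.graph(claim)
g.add((EX.manuscriptX, EX.attributedTo, EX.scribeY))

# ...so the claim itself can be described in the default graph:
# who asserted it and what the evidence base looks like.
ds.add((claim, PROV.wasAttributedTo, EX.drSmith))
ds.add((claim, EX.evidenceBasis, Literal("inferred from marginalia; contested")))

print(ds.serialize(format="trig"))
```

The point of the design is that the qualification travels with the assertion: a consumer can accept the triple, inspect who stands behind it, or follow the disagreement, without the aggregation having to pretend to a single objective statement of truth.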

Neil's proposals have the strengths of acknowledging the imperfection of any attempt to represent knowledge, and of recognizing that the value of representing knowledge lies mainly in getting it linked to its sources, its context, its controversies, and to other disciplines. It seems to me that such a system would not only have tremendous pragmatic advantages; for all its messiness and lack of coherence, it would in fact be a more accurate representation of knowledge than a system that is fully neatened up and nailed down. That is, messiness is not only the price we pay for scaling knowledge aggressively and collaboratively; it is a property of networked knowledge itself.

 
