David Wood of 3RoundStones.com is talking about Callimachus, an open source project that is also available through his company. [NOTE: Liveblogging. All bets on accuracy are off.]
We’re moving from PCs to mobile, he says. This is rapidly changing the Internet. 51% of Internet traffic is non-human, he says (as of Feb 2012). 35hrs of video are uploaded to YouTube every minute. Traditionally we dealt with this type of demand via data warehousing: put it all in one place for easy access. But that’s not true: we never really got it all in one place accessible through one interface. Jeffrey Pollock says we should be talking not about data integration but interoperability because the latter implies a looser coupling.
He gives some use cases:
BBC wanted to have a Web presence for all of its 1500 broadcasts per day. They couldn’t do it manually. So, they decided to grab data from the linked open data cloud and assemble the pages automatically. They hired fulltime editors to curate Wikipedia. RDF enabled them to assemble the pages.
O’Reilly Media switched to RDF reluctantly but for purely pragmatic reasons.
BestBuy, too. They used RDFa to embed metadata into their pages to improve their SEO.
Elsevier uses Linked Data to manage their assets, from acquisition to delivery.
This is not science fiction, he says. It’s happening now.
Then two negative examples:
David says that Microsoft adopted RDF in the late 90s. But Netscape came out with a portal tech based on RDF that scared Microsoft out of the standards effort. But they needed the tech, so they’ve reinvented it three times in proprietary ways.
Borders was too late in changing its tech.
Then he does a product pitch for Callimachus Enterprise: a content management system for enterprises.
I’m at the Semantic Technology & Business conference in NYC. Matthew Degel, Senior Vice President and Chief Architect at Viacom Media Networks is talking about “Modeling Media and the Content Supply Chain Using Semantic Technologies.” [NOTE: Liveblogging. Getting things wrong. Mangling words. Missing points. Over- and under-emphasizing the wrong things. Not running a spellpchecker. You are warned!]
Matthew says that the problem is that we’re “drowning in data but starved for information.” There is a “thirst for asset-centric views.” And of course, Viacom needs to “more deeply integrate how property rights attach to assets.” And everything has to be natively local, all around the world.
Viacom has to model the content supply chain in a holistic way. So, how to structure the data? To answer, they need to know what the questions are. Data always has some structure. The question is how volatile those structures are. [I missed about 5 mins; had to duck out.]
He shows an asset tree, “relating things that are different yet the same,” with SpongeBob as his example: TV series, characters, the talent, the movie, consumer products, etc. Stations are not allowed to air a commercial with the voice actor behind Spongey, Tom Kenny, during the showing of the SpongeBob show, so they need to intersect those datasets. Likewise, the video clip you see on your set-top box’s guide is separate from, but related to, the original. For doing all this, Viacom is relying on inferences: A prime time version of a Jersey Shore episode, which has had the bad language censored out of it, is a version of the full episode, which is part of the series, which has licensing contracts within various geographies, etc. From this Viacom can infer that the censored episode is shown in some geography under some licensing agreements, etc.
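To make that chain of inference concrete, here is a minimal sketch in Python. It is my own illustration, not Viacom's model: the asset names, the predicates (versionOf, partOf, licensedIn) and the helper function are all hypothetical.

```python
# Hypothetical triples; the names and predicates are illustrative only.
TRIPLES = {
    ("episode_censored", "versionOf", "episode_full"),
    ("episode_full", "partOf", "jersey_shore_series"),
    ("jersey_shore_series", "licensedIn", "geography_A"),
}


def infer_licensed_geographies(asset, triples):
    """Follow versionOf/partOf links upward and collect licensedIn values."""
    seen = set()
    frontier = [asset]
    geographies = set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue  # guard against cycles in the data
        seen.add(node)
        for subject, predicate, obj in triples:
            if subject != node:
                continue
            if predicate == "licensedIn":
                geographies.add(obj)
            elif predicate in ("versionOf", "partOf"):
                frontier.append(obj)
    return geographies


licensed = infer_licensed_geographies("episode_censored", TRIPLES)
```

Nothing in the data says directly where the censored episode may be shown; the answer falls out of walking the version and series links. A real triplestore would presumably do this with a reasoner or a query rather than a hand-written loop.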
“We’ve tried to take a realistic approach to this.” As excited as they are about the promise, “we haven’t dived in with a huge amount of resources.” They’re solving immediate problems. They began by making diagrams of all of the apps and technologies. It was a mess. So, they extracted and encoded into a triplestore all the info in the diagram. Then they overlaid the DR data. [I don’t know what DR stands for. I’m guessing the D stands for Digital, and the R might be Resource.] Further mapping showed that some apps that they weren’t paying much attention to were actually critical to multiple systems. They did an ontology graph as a London Underground map. [By the way, Gombrich has a wonderful history and appreciation of those maps in Art and Representation, I believe.]
What’s worked? They’re focusing on where they’re going, not where they’ve been. This has let them “jettison a lot of intellectual baggage” so that they can model business processes “in a much cleaner and effective way.” Also, OWL has provided a rich modeling language for expressing their Enterprise Information Model.
What hasn’t worked?
“The toolsets really aren’t quite there yet.” He says that based on the conversations he’s had today, he doesn’t think anyone disagrees with him.
Also, the modeling tools presume you already know the technology and the approach. Also, the query tools presume you have a user at a keyboard, rather than running as the backend of a Web service capable of handling sufficient volume. He’d like a “Crystal Reports for SPARQL,” as an example of a usable tool.
Visualization tools are focused on interactive use. You pick a class and see the relationships, etc. But if you want to see a traditional ERD diagram, you can’t.
Also, the modeling tools present a “forward-bias.” E.g., there are tools for turning schemas into ontologies, but not for turning ontologies into a reference model for schema.
Matthew makes some predictions:
They will develop into robust tools
Semantic tech will enable queries such as “Show me all Madonna interviews where she sings, where the footage has not been previously shown, and where we have the license to distribute it on the Web in Australia in Dec.”
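To spell out what that query has to combine, here is a rough Python sketch of the conditions. Everything in it is an assumption for illustration: the Footage fields, the sample catalog entries and the find_footage helper are invented, and a real system would presumably express this as a query over a triplestore rather than a filter over a list.

```python
from dataclasses import dataclass, field


@dataclass
class Footage:
    title: str
    artist: str
    kind: str                   # e.g. "interview"
    contains_singing: bool
    previously_shown: bool
    # Hypothetical rights model: (territory, month) pairs cleared for the Web.
    web_licenses: set = field(default_factory=set)


def find_footage(catalog, artist, territory, month):
    """Unshown interviews where the artist sings and Web rights are cleared."""
    return [
        f for f in catalog
        if f.artist == artist
        and f.kind == "interview"
        and f.contains_singing
        and not f.previously_shown
        and (territory, month) in f.web_licenses
    ]


# Invented placeholder records, not real catalog data.
CATALOG = [
    Footage("clip-1", "Madonna", "interview", True, False, {("Australia", "Dec")}),
    Footage("clip-2", "Madonna", "interview", True, True, {("Australia", "Dec")}),
    Footage("clip-3", "Madonna", "interview", False, False, {("Australia", "Dec")}),
    Footage("clip-4", "Madonna", "interview", True, False, {("Canada", "Dec")}),
]

matches = find_footage(CATALOG, "Madonna", "Australia", "Dec")
```

The hard part is not the filter but getting content, usage history and rights into one place where they can be intersected at all.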
This is cross posted at the Harvard Digital Scholarship blog
Neil Jeffries, research and development manager at the Bodleian Libraries, has posted an excellent op-ed at Wikipedia Signpost about how to best represent scholarly knowledge in an imperfect world.
He sets out two basic assumptions: (1) Data has meaning only within context; (2) We are not going to agree on a single metadata standard. In fact, we could connect those two points: Contexts of meaning are so dependent on the discipline and the user's project and standpoint that it is unlikely that a single metadata standard could suffice. In any case, the proliferation of standards is simply a fact of life at this point.
Given those constraints, he asks, what's the best way to increase the interoperability of the knowledge and data that are accumulating online at a pace that provokes extremes of anxiety and joy in equal measures? He sees a useful consensus emerging on three points: (a) There are some common and basic types of data across almost all aggregations. (b) There is increasing agreement that these data types have some simple, common properties that suffice to identify them and to give us humans an idea about whether we want to delve deeper. (c) Aggregations themselves are useful for organizing data, even when they are loose webs rather than tight hierarchies.
Neil then proposes RDF and linked data as appropriate ways to capture the very important relationships among ideas, pointing to the Semantic MediaWiki as a model. But, he says, we need to capture additional metadata that qualifies the data, including who made the assertion, links to differences of scholarly opinion, omissions from the collection, and the quality of the evidence. "Rather than always aiming for objective statements of truth we need to realise that a large amount of knowledge is derived via inference from a limited and imperfect evidence base, especially in the humanities," he says. "Thus we should aim to accurately represent the state of knowledge about a topic, including omissions, uncertainty and differences of opinion."
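As a rough sketch of what such qualifying metadata might look like, here is a small Python example. It is my own illustration, not Neil's design: the Assertion fields, the evidence scale and the placeholder names are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Assertion:
    subject: str
    predicate: str
    obj: str
    asserted_by: str            # who made the claim
    evidence_quality: str       # hypothetical scale: "strong", "weak", "inferred"
    disputed_by: list = field(default_factory=list)


def state_of_knowledge(assertions, subject, predicate):
    """Return every recorded claim about a topic, not a single 'winner'."""
    return [
        a for a in assertions
        if a.subject == subject and a.predicate == predicate
    ]


# Placeholder claims: two scholars disagree about who wrote a manuscript.
CLAIMS = [
    Assertion("manuscript_X", "writtenBy", "author_A", "scholar_1", "weak",
              disputed_by=["scholar_2"]),
    Assertion("manuscript_X", "writtenBy", "author_B", "scholar_2", "inferred",
              disputed_by=["scholar_1"]),
]

views = state_of_knowledge(CLAIMS, "manuscript_X", "writtenBy")
```

The point of the design is that the query returns both claims along with who made them and how strong the evidence is, rather than forcing the data to pick one.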
Neil's proposals have the strengths of acknowledging the imperfection of any attempt to represent knowledge, and of recognizing that the value of representing knowledge lies mainly in getting it linked to its sources, its context, its controversies, and to other disciplines. It seems to me that such a system would not only have tremendous pragmatic advantages; for all its messiness and lack of coherence, it is in fact a more accurate representation of knowledge than a system that is fully neatened up and nailed down. That is, messiness is not only the price we pay for scaling knowledge aggressively and collaboratively, it is a property of networked knowledge itself.
At the Linked Open Data in Libraries, Archives and Museums conf [LODLAM], Jonathan Rees casually offered what I thought was a useful distinction. (Also note that I am certainly getting this a little wrong, and could possibly be getting it entirely wrong.)
Background: RDF is the basic format of data in the Semantic Web and LOD; it consists of statements of the form “A is in some relation to B.”
My paraphrase: Before LOD, we were trying to build knowledge representations of the various realms of the world. Therefore, it was important that the RDF triples expressed were true statements about the world. In LOD, triples are taken as a way of expressing data; take your internal data, make it accessible as RDF, and let it go into the wild…or, more exactly, into the commons. You’re not trying to represent the world; you’re just trying to represent your data so that it can be reused. It’s a subtle but big difference.
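Here is a minimal sketch of the "take your internal data, make it accessible as RDF" move, in Python. The base URI, the field-to-predicate naming scheme and both helper functions are assumptions of mine for illustration; real projects would normally reuse existing vocabularies and an RDF library, and the string escaping here is simplified.

```python
def row_to_triples(base_uri, row_id, row):
    """Turn one internal record into (subject, predicate, literal) triples."""
    subject = f"{base_uri}/{row_id}"
    return [
        (subject, f"{base_uri}/field/{key}", str(value))
        for key, value in row.items()
    ]


def to_ntriples_line(triple):
    """Serialize one triple in N-Triples style, with a plain string literal."""
    subject, predicate, literal = triple
    # Simplified escaping: backslashes, double quotes and newlines only.
    escaped = (
        literal.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'<{subject}> <{predicate}> "{escaped}" .'


# Hypothetical internal record and base URI.
record = {"title": "Moby-Dick", "year": 1851}
triples = row_to_triples("http://example.org/books", "42", record)
lines = [to_ntriples_line(t) for t in triples]
```

Note that nothing here claims to model the world. Each statement only says "our record 42 has this value in this field," which is the distinction Jonathan was drawing.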
I also like John Wilbanks’ provocative tweet-length explanation of LOD: “Linked open data is duct tape that some people mistake for infrastructure. Duct tape is awesome.”
Finally, it’s pretty awesome to be at a techie conference where about half the participants are women.
Have semantic technologies reached the tipping point? Rene Reinsberg at the MIT Entrepreneurship Review says yes.
I’m not sure what exactly a “tipping point” would be here, but it seems incontestable that semantic technologies are an important part of the Web and of business, and, taken broadly enough, always have been. I wonder, though, if the term “the semantic web” has reached a negative tipping point. Rene seems to use it pretty much interchangeably with “semantic technologies,” although the Semantic Web seems to promise something more world-wide-ish and systemic than the increasing use of semantic technologies.
Anyway, it’s an interesting post, with lots of links.
Tagged with: semantic web
Date: July 19th, 2010 dw
Data.gov has announced that it’s making some data sets available as RDF triples so Semantic Webbers can start playing with it. There’s an index of data here. The site says that even though only a relative handful of datasets have been RDF’ed, there are 6.4 billion triples available. They’ve got some examples of RDF-enabled visualizations here and here, and some more as well.
Data.gov also says they’re working with RPI to come up with a proposal for “a new encoding of datasets converted from CSV (and other formats) to RDF” to be presented for worldwide consideration: “We’re looking forward to a design discussion to determine the best scheme for persistent and dereferenceable government URI naming with the international community and the World Wide Web Consortium to promote international standards for persistent government data (and metadata) on the World Wide Web.” This is very cool. A Uniform Resource Identifier points to a resource; it is dereferenceable if there is some protocol for getting information about that resource. So, Data.gov and RPI are putting together a proposal for how government data can be given stable Web addresses that will predictably yield useful information about that data.
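As a rough illustration of dereferencing, here is a Python sketch that prepares an HTTP request asking for RDF about a resource. It assumes plain HTTP with content negotiation, which is one common way of doing this; the post does not say what scheme Data.gov and RPI will propose, and the URI and the helper name are made up.

```python
from urllib.request import Request


def build_dereference_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) a GET asking for RDF about the resource."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


# Hypothetical government dataset URI, for illustration only.
request = build_dereference_request("http://example.org/id/dataset/1")
```

Passing the request to urllib.request.urlopen would actually send it. If the URI is dereferenceable in this sense, the server answers with (or redirects to) a description of the dataset in the requested format.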
Kate Ray has created a short, clear video about the promise of the Semantic Web. It consists mainly of snippets of interviews with various folks (including briefly and vapidly me). It’s got snazzy graphics and a balance of views. Nicely done.
Tagged with: semantic web
Date: May 8th, 2010 dw