Joho the Blog

July 6, 2013

[misc][2b2k] Why ontologies make me nervous

A few days ago there was a Twitter back and forth between two people I deeply respect: Dan Brickley [twitter:danbri] and Ed Summers [twitter:edsu]. It started with Ed responding to a tweet about a brief podcast I did with Kevin Ford [twitter:3windmills], who is on the team working on BibFrame:

@3windmills @dweinberger @danbri doing data representation independent of apps that use it is bordering on a waste of time imho.

— Ed Summers (@edsu) July 2, 2013

After a couple of tweets, Dan tweeted the following:

@dweinberger @edsu with say a DTD you know what you're getting. A JobPosting XML doc must have an employer, a salary, etc.

— Dan Brickley (@danbri) July 2, 2013

...whereas in rdf/onto you just get to say 'jobs have employers, salaries' without making rules for what each data doc has

— Dan Brickley (@danbri) July 2, 2013

There followed some agreement that it's often helpful to have apps driving the development of standards. (Kevin agrees with this, and points to BibFrame's process.) But Dan's comment clarified my understanding of why ontologies make me nervous.

Over the past hundred years or so, we've come to a general recognition that all classifications and categorizations are tools, not representations of The Real Order. The periodic table of the elements is a useful way of organizing information, and manifests real relationships among the elements, but it is not the single "real" way the elements are arranged; if you're an economist or an industrialist, a chart that arranges the elements based on where they exist on our planet might be just as valid. Likewise, Linnaeus' classification scheme is useful and manifests some real relationships, but if you're a chef you might have a different way of carving up the animal kingdom. Linnaeus chose to organize species based upon visible differences (which might not be the "essential" differences) so that his scheme would be useful to scientists in the field. Although he was sometimes ambiguous about this, he seems not to have thought that he was discerning God's own order. Since Linnaeus we have become much more explicit in our understanding that how we classify depends on what we're trying to accomplish.

For example, a DTD (document type definition) typically is designed not to capture the eternal essence of some type of document, but to make the document more usable by systems that automate the document's production and processing. An industry might, for instance, agree on a DTD for parts catalogs that specifies that a parts catalog must have an element called "part" and that a part must have a type, part number, length, height, weight, material, and a description, and optionally can note whether it turns clockwise or counterclockwise. Each of these elements would have a standard name (e.g., "part_number," not "part#"). The result is a document that describes parts in a standard way so that a company can receive descriptions from all of its suppliers and automatically build a database of the parts it uses.
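To make the parts-catalog idea concrete, here is a rough Python sketch of the kind of rule such a DTD would enforce. It is not a DTD validator (a real one would also check element order and content types); it only checks that each part carries the agreed-upon elements. The element names other than "part" and "part_number" (including "parts_catalog" and "rotation") are my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, following the list of things a part must have.
REQUIRED = ["type", "part_number", "length", "height", "weight",
            "material", "description"]
OPTIONAL = ["rotation"]  # assumed name for clockwise/counterclockwise


def check_catalog(xml_text):
    """Return a list of problems; an empty list means the catalog conforms."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "parts_catalog":
        problems.append(f"root element is <{root.tag}>, expected <parts_catalog>")
    parts = root.findall("part")
    if not parts:
        problems.append("catalog has no <part> elements")
    for i, part in enumerate(parts, start=1):
        present = {child.tag for child in part}
        # Every required element must appear in every part.
        for name in REQUIRED:
            if name not in present:
                problems.append(f"part {i}: missing <{name}>")
        # Non-standard names are rejected, so suppliers cannot improvise.
        for name in sorted(present - set(REQUIRED) - set(OPTIONAL)):
            problems.append(f"part {i}: unexpected <{name}>")
    return problems


# A supplier's document that forgot to include the weight.
SUPPLIER_DOC = """
<parts_catalog>
  <part>
    <type>bolt</type><part_number>B-100</part_number>
    <length>40</length><height>8</height>
    <material>steel</material><description>Hex bolt</description>
  </part>
</parts_catalog>
"""

if __name__ == "__main__":
    for problem in check_catalog(SUPPLIER_DOC):
        print(problem)
```

The point of the sketch is that the rules are about each document: a catalog either has what the receiving system needs or it gets bounced.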

A DTD therefore is designed with an eye toward what properties are going to be useful. In some industries, it might include a term that captures how shiny the part is, but if it's a DTD for surgical equipment, that may not be relevant enough to include...although "sanitary_packaging" might be. Likewise, how quickly a bolt transfers heat might seem irrelevant, at least until NASA places an order. In this, DTDs are much like forms: You don't put a field for earlobe length in the college application form you're designing.

Ontologies are different. They can try to express the structure of a domain independent of any particular use, so that the widest variety of applications can share data, including apps from domains outside of the one that's been mapped. So, to use Dan's example, your ontology of jobs would note that jobs have employers and workers, that they may have a salary or other form of compensation, that they can be part-time, full-time, seasonal, etc. As an ontology designer, because you're trying to think beyond whatever applications you already can imagine, your aim (often, not always) is to provide the fullest possible set of slots just in case someone sometime needs that info. And you will carefully describe the relationships among the elements so that apps and researchers can use knowledge that is implicit in the model.
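A toy Python sketch may help show the difference Dan was pointing at. Here the model says what jobs can have, and how some terms relate, without making rules about what any particular record must contain. This is only an illustration using plain tuples; it is not RDF or any real ontology language, and every name in it is hypothetical.

```python
# Ontology-style statements as (subject, predicate, object) triples.
# All names are hypothetical and exist only for this illustration.
ONTOLOGY = [
    ("Job", "has_property", "employer"),
    ("Job", "has_property", "worker"),
    ("Job", "has_property", "compensation"),
    ("Job", "has_property", "schedule"),
    ("salary", "is_a_kind_of", "compensation"),
]


def properties_of(kind):
    """All properties the model says this kind of thing can have."""
    return {o for s, p, o in ONTOLOGY if s == kind and p == "has_property"}


def generalize(prop):
    """Follow is_a_kind_of links, so 'salary' can be read as 'compensation'."""
    for s, p, o in ONTOLOGY:
        if s == prop and p == "is_a_kind_of":
            return generalize(o)
    return prop


def interpret(kind, record):
    """Map a record's fields onto the model's properties.

    Nothing is required: a record may fill any subset of the slots.
    Returns the fields that were understood and the slots left unstated.
    """
    known = properties_of(kind)
    understood = {}
    for field, value in record.items():
        prop = generalize(field)
        if prop in known:
            understood[prop] = value
    unstated = sorted(known - set(understood))
    return understood, unstated


if __name__ == "__main__":
    posting = {"employer": "Acme", "salary": 50000}
    print(interpret("Job", posting))
```

A posting that mentions only an employer and a salary is not rejected. The app learns that the salary is a form of compensation, which is knowledge implicit in the model, and that the worker and schedule simply haven't been stated.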

The line between DTDs and ontologies is fuzzy. Many ontologies are designed with classes of apps in mind, and some DTDs have tried to be hugely general purpose. My discomfort really comes down to a distrust of the concept of "knowledge representation" that underlies some ontologies (especially earlier ones). The complexity of the relationships among parts will always outstrip our attempts to capture and codify those relationships. Further, knowledge cannot be fully represented because it isn't a thing apart from our continuous invention, discovery, and engagement with it.

What it comes down to is that if you talk about ontologies as knowledge representations I'll mutter something under my breath and change the topic.

Categories: misc Tagged with: 2b2k • dtd • everythingismisc • ontologies • sgml Date: July 6th, 2013 dw

December 30, 2011

Quirky html

In the recent — and probably unabated — unpleasantness attending the launch of the update of this blog’s look, I have learned a little about Quirks mode. I learned this because Internet Explorer 9 was not displaying rounded corners or laying out divs (blocks of content) the way Firefox and Chrome were. Once I switched off Quirks mode in my blog pages, it worked much better.

There’s a good explanation and some very detailed info here. But as I “understand” the story, quirks mode was introduced to handle the problem that different browsers were expecting different sorts of markup (particularly for CSS style information). Then, once the browsers realized it would be helpful to everyone if they agreed to support truly standardized standards, they had to decide what to do with the old code written in the particularities of each browser. So, they agreed to allow the HTML developer to specify whether the page she’s created should be interpreted according to the modern standardized standards, or if it’s quirky and ought to be interpreted according to the idiosyncratic expectations of the various browsers.

You, the HTML developer, specify your intentions in the DOCTYPE declaration (the document type declaration, which can point to a DTD) at the very beginning of your HTML pages. This page will help you figure out exactly how that line should read.
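As a quick sanity check, here is a small Python sketch that tests whether a page opens with the short `<!DOCTYPE html>` line, which modern browsers treat as a request for standards mode; a page with no doctype at all gets quirks mode. The function name is my own, and this is only a heuristic: browsers' actual mode-switching rules cover more cases (older, longer doctypes among them) than this handles.

```python
import re


def has_short_doctype(html_text):
    """True if the page begins, after optional whitespace, with <!DOCTYPE html>.

    The match is case-insensitive. Longer, older doctypes (for example the
    HTML 4.01 ones with a PUBLIC identifier) deliberately return False here,
    even though some of them also select standards mode.
    """
    pattern = r"\s*<!doctype\s+html\s*>"
    return re.match(pattern, html_text, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(has_short_doctype("<!DOCTYPE html>\n<html><body></body></html>"))
    print(has_short_doctype("<html><body></body></html>"))
```

Running something like this over a blog's templates is a cheap way to spot pages that are inviting quirks mode by leaving the line out.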

Meanwhile, the shame and humiliation the launch of the new look of this blog has brought upon me only deepens, for I have given up on controlling the placement of divs by getting my floats in order. Screw it. I’ve plunked them into a table. Yeah, I’m CSS-ing like it’s 1995.

Categories: tech Tagged with: css • dtd • html • quirk • tech Date: December 30th, 2011 dw
