Joho the Blog

[misc][2b2k] Why ontologies make me nervous

A few days ago there was a Twitter back and forth between two people I deeply respect: Dan Brickley (@danbri) and Ed Summers (@edsu). It started with Ed responding to a tweet about a brief podcast I did with Kevin Ford (@3windmills), who is on the team working on BibFrame:

After a couple of tweets, Dan tweeted the following:

There followed some agreement that it's often helpful to have apps driving the development of standards. (Kevin agrees with this, and points to BibFrame's process.) But Dan's comment clarified for me why ontologies make me nervous.

Over the past hundred years or so, we've come to a general recognition that all classifications and categorizations are tools, not representations of The Real Order. The periodic table of the elements is a useful way of organizing information, and manifests real relationships among the elements, but it is not the single "real" way the elements are arranged; if you're an economist or an industrialist, a chart that arranges the elements based on where they exist on our planet might be just as valid. Likewise, Linnaeus' classification scheme is useful and manifests some real relationships, but if you're a chef you might have a different way of carving up the animal kingdom. Linnaeus chose to organize species based upon visible differences, which might not be the "essential" differences, so that his scheme would be useful to scientists in the field. Although he was sometimes ambiguous about this, he seems not to have thought that he was discerning God's own order. Since Linnaeus we have become much more explicit in our understanding that how we classify depends on what we're trying to accomplish.

For example, a DTD (document type definition) typically is designed not to capture the eternal essence of some type of document, but to make the document more usable by systems that automate the document's production and processing. An industry might, for instance, agree on a DTD for parts catalogs that specifies that a parts catalog must have an element called "part" and that a part must have a type, part number, length, height, weight, material, and a description, and optionally can note whether it turns clockwise or counterclockwise. Each of these elements would have a standard name (e.g., "part_number," not "part#"). The result is a document that describes parts in a standard way so that a company can receive descriptions from all of its suppliers and automatically build a database of the parts it uses.

A DTD therefore is designed with an eye toward what properties are going to be useful. In some industries, it might include a term that captures how shiny the part is, but if it's a DTD for surgical equipment, that may not be relevant enough to include, although "sanitary_packaging" might be. Likewise, how quickly a bolt transfers heat might seem irrelevant, at least until NASA places an order. In this respect, DTDs are much like forms: You don't put a field for earlobe length in the college application form you're designing.
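To make the parts-catalog case concrete, here is a minimal sketch in Python. It is not a real industry DTD: the element names (including "rotation" for the clockwise/counterclockwise note) are assumptions drawn from the description above, and because Python's standard library does not validate against DTDs, the check is done by hand and ignores element order. The DTD text is included only to show roughly what such a declaration could look like.

```python
import xml.etree.ElementTree as ET

# Roughly what the DTD might declare (hypothetical; shown for reference only,
# ElementTree does not validate against it).
DTD_SKETCH = """
<!ELEMENT parts_catalog (part+)>
<!ELEMENT part (type, part_number, length, height, weight,
                material, description, rotation?)>
"""

# Assumed element names, taken from the prose rather than from any real standard.
REQUIRED_FIELDS = ["type", "part_number", "length", "height",
                   "weight", "material", "description"]
OPTIONAL_FIELDS = ["rotation"]


def check_part(part):
    """Return a list of problems found in one <part> element."""
    problems = []
    present = [child.tag for child in part]
    for name in REQUIRED_FIELDS:
        if name not in present:
            problems.append(f"missing required element: {name}")
    for tag in present:
        if tag not in REQUIRED_FIELDS and tag not in OPTIONAL_FIELDS:
            problems.append(f"unexpected element: {tag}")
    return problems


def check_catalog(xml_text):
    """Map each part's position in the catalog to its list of problems."""
    root = ET.fromstring(xml_text)
    return {i: check_part(part) for i, part in enumerate(root.findall("part"))}


SAMPLE = """
<parts_catalog>
  <part>
    <type>bolt</type>
    <part_number>B-1001</part_number>
    <length>40mm</length>
    <height>8mm</height>
    <weight>12g</weight>
    <material>steel</material>
    <description>Hex bolt</description>
    <rotation>clockwise</rotation>
  </part>
  <part>
    <type>bolt</type>
    <part_no>B-1002</part_no>
    <length>50mm</length>
    <height>8mm</height>
    <weight>15g</weight>
    <material>steel</material>
    <description>Hex bolt, long</description>
  </part>
</parts_catalog>
"""

if __name__ == "__main__":
    for index, problems in check_catalog(SAMPLE).items():
        print(index, problems or "ok")
```

The second part is flagged because a supplier used "part_no" instead of the agreed "part_number." Note what the check leaves out: there is no slot for shininess or heat transfer, because nobody using this catalog needs one.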

Ontologies are different. They can try to express the structure of a domain independent of any particular use, so that the widest variety of applications can share data, including apps from domains outside of the one that's been mapped. So, to use Dan's example, your ontology of jobs would note that jobs have employers and workers, that they may have a salary or other form of compensation, that they can be part-time, full-time, seasonal, etc. As an ontology designer, because you're trying to think beyond whatever applications you already can imagine, your aim (often, not always) is to provide the fullest possible set of slots just in case someone sometime needs that info. And you will carefully describe the relationships among the elements so that apps and researchers can use knowledge that is implicit in the model.
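Here is a rough sketch of what "knowledge that is implicit in the model" can mean, using the jobs example. Everything in it is an assumption made for illustration: the vocabulary (SeasonalJob, hasSalary, hasCompensation and so on) is invented, and the two inference rules are only loosely modeled on RDFS subclass and subproperty entailment, not taken from any actual jobs ontology.

```python
# Statements as (subject, predicate, object) triples. All names are hypothetical.
TRIPLES = {
    # The model: how the concepts relate to one another.
    ("SeasonalJob", "subClassOf", "Job"),
    ("hasSalary", "subPropertyOf", "hasCompensation"),
    # The data: one particular job.
    ("job42", "type", "SeasonalJob"),
    ("job42", "hasEmployer", "AcmeOrchards"),
    ("job42", "hasWorker", "pat"),
    ("job42", "hasSalary", "18 USD/hour"),
}


def infer(triples):
    """Apply two rules repeatedly until no new statements appear.

    Rule 1: x type A, A subClassOf B        =>  x type B
    Rule 2: x p y,    p subPropertyOf q     =>  x q y
    """
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if p == "type" and p2 == "subClassOf" and s2 == o:
                    new.add((s, "type", o2))
                if p2 == "subPropertyOf" and s2 == p:
                    new.add((s, o2, o))
        if new <= facts:
            return facts
        facts |= new


if __name__ == "__main__":
    # Print only the statements that were never written down explicitly.
    for triple in sorted(infer(TRIPLES) - TRIPLES):
        print(triple)
```

Nobody stated that job42 is a Job or that it has compensation, yet an app that only knows how to ask about "Job" and "hasCompensation" can still find it. That is the appeal, and also the ambition that gives me pause.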

The line between DTDs and ontologies is fuzzy. Many ontologies are designed with classes of apps in mind, and some DTDs have tried to be hugely general purpose. My discomfort really comes down to a distrust of the concept of "knowledge representation" that underlies some ontologies (especially earlier ones). The complexity of the relationships among parts will always outstrip our attempts to capture and codify those relationships. Further, knowledge cannot be fully represented because it isn't a thing apart from our continuous invention, discovery, and engagement with it.

What it comes down to is that if you talk about ontologies as knowledge representations I'll mutter something under my breath and change the topic.

6 Responses to “[misc][2b2k] Why ontologies make me nervous”

  1. I don’t think you’re right about the typical ambitions of working ontology developers. Or even of KR theorists. Personally, I teach[1] in my KR classes that KRs are context sensitive and interest relative and thus need to be built in a manner that is mindful of their use. Indeed, the benefits that (we think) derive from using ontology languages are not limited to reuse, but include speed of construction, verification, and other things.

    Now, we do hope that by working at a more abstracted layer (i.e., closer to Chen’s “cognitive layer”) we can achieve better reuse. But we have to ask what “reuse” means there. If we think about classic relational database/ER accounts (or any layered architecture), you hope that you can alter the lower level (the “physical” layer, or the implementation) without affecting the higher levels and things written to it. So, ideally, the meaning of my SQL queries shouldn’t change because my friendly DBA has added an index somewhere. But note that this doesn’t mean that the *system* designer is insensitive to application needs, just that, *ideally*, we can separate and modularize concerns to some extent.

    I wouldn’t get hung up on the term KR per se. If to be a KR we must fully represent all of complex reality, then yes, it won’t work. But then we can’t really do anything :)

    [1] (see slide 5; note that this is more or less derived from a fairly popular article, “What is a Knowledge Representation?”)

  2. That’s certainly a fair comment, Bijan. And useful. I have KR stuck in AI days.

  3. Thanks very much for writing this up, David. It is good to know that Bijan is teaching classes where he stresses that the value of a particular knowledge representation is dependent on how well fitted it is to a particular use, or context. You said it nicely with “knowledge cannot be fully represented because it isn’t a thing apart from our continuous invention, discovery, and engagement with it.” So nicely in fact that it reminded me that I still need to read Too Big To Know, which is now waiting to be opened on my Kindle :-)

  4. DTDs and RDF ontologies are *orthogonal* – they do not necessarily overlap.

    A DTD defines the *internal shape* of an XML document (resource): it “declares precisely which elements and references may appear where in the document of the particular type” (Wikipedia), whereas an RDF ontology defines the *properties and relations* about and external to a resource: it “represents knowledge as a set of concepts within a domain, and the relationships between pairs of concepts”.

    Extensible MARKUP Language. Resource DESCRIPTION Framework. Markup and description are different.

    Part of the confusion comes from the fact that historically we have (mis)used XML as a format for conveying data about a document (the whole “self-describing” idiom), because something like RDF wasn’t available. So a JobPosting DTD’s ultimate goal is to define what structure a JobPosting document has — for example the body content of the posting (like HTML) — and it can enforce that structure through validation. That structure *may* include elements that describe a JobPosting, but those definitions only exist within the domain of the DTD.

    A JobPosting RDF ontology’s goal is solely to describe a JobPosting document, but the internal structure and shape of the document are invisible to the ontology. This is why RDF works for any kind of resource (not just XML), it treats them opaquely; all an ontology can do is *contextualize* a resource. Could a JobPosting document include a element as well as possess a job:hasSalary property? Absolutely, but they are not the same, even if they have the same literal value.

    This becomes much more relevant as we move beyond XML and into other data-centric document formats like JSON that don’t have DTD-like validation. RDF lives happily independent of the document’s structure, so one can represent *the same* document in various serializations without the context changing.

  5. To be fair, there is a strain of the “capture the knowledge in an ideal form that’s usable everywhere” view throughout KR’s history, though finding people who seriously and systematically argue for it is rather difficult. For example, Pat Hayes’ Second Naive Physics Manifesto could be read this way, since he proposes not to worry about applications or (primarily reasoning) implementations. However, this proposal is primarily *methodological* rather than *prescriptive* or *ideal*. (Barry Smith-style realism is far closer to the silly view with real proponents…and that’s modern day.)

    The flip side of the silly “representation without application” view is the “overfitted to the application” tangle that besets a lot of data. A critical danger is when the representation of the information gets primarily (or has some key portion) embodied in application code. This is generally unavoidable, but it turns even analyzing your data into the problem of analyzing an arbitrary program.

