
July 6, 2013

[misc][2b2k] Why ontologies make me nervous

A few days ago there was a Twitter back-and-forth between two people I deeply respect: Dan Brickley [twitter:danbri] and Ed Summers [twitter:edsu]. It started with Ed responding to a tweet about a brief podcast I did with Kevin Ford [twitter:3windmills], who is on the team working on BibFrame:

@3windmills @dweinberger @danbri doing data representation independent of apps that use it is bordering on a waste of time imho.

— Ed Summers (@edsu) July 2, 2013

After a couple of tweets, Dan tweeted the following:

@dweinberger @edsu with say a DTD you know what you're getting. A JobPosting XML doc must have an employer, a salary, etc.

— Dan Brickley (@danbri) July 2, 2013

...whereas in rdf/onto you just get to say 'jobs have employers, salaries' without making rules for what each data doc has

— Dan Brickley (@danbri) July 2, 2013

There followed some agreement that it's often helpful to have apps driving the development of standards. (Kevin agrees with this, and points to BibFrame's process.) But Dan's comment clarified my understanding of why ontologies make me nervous.

Over the past hundred years or so, we've come to a general recognition that all classifications and categorizations are tools, not representations of The Real Order. The periodic table of the elements is a useful way of organizing information, and manifests real relationships among the elements, but it is not the single "real" way the elements are arranged; if you're an economist or an industrialist, a chart that arranges the elements based on where they exist on our planet might be just as valid. Likewise, Linnaeus' classification scheme is useful and manifests some real relationships, but if you're a chef you might have a different way of carving up the animal kingdom. Linnaeus chose to organize species based upon visible differences, which might not be the "essential" differences, so that his scheme would be useful to scientists in the field. Although he was sometimes ambiguous about this, he seems not to have thought that he was discerning God's own order. Since Linnaeus we have become much more explicit in our understanding that how we classify depends on what we're trying to accomplish.

A DTD (document type definition), for instance, typically is designed not to capture the eternal essence of some type of document, but to make the document more usable by systems that automate the document's production and processing. Suppose an industry agreed on a DTD for parts catalogs specifying that a parts catalog must have an element called "part" and that a part must have a type, part number, length, height, weight, material, and a description, and optionally can note whether it turns clockwise or counterclockwise. Each of these elements would have a standard name (e.g., "part_number," not "part#"). The result is a document that describes parts in a standard way so that a company can receive descriptions from all of its suppliers and automatically build a database of the parts it uses.

A DTD therefore is designed with an eye toward what properties are going to be useful. In some industries, it might include a term that captures how shiny the part is, but if it's a DTD for surgical equipment, that may not be relevant enough to include...although "sanitary_packaging" might be. Likewise, how quickly a bolt transfers heat might seem irrelevant, at least until NASA places an order. In this, DTDs are much like forms: You don't put a field for earlobe length in the college application form you're designing.
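To make the parts-catalog example concrete, here is a minimal Python sketch of what such an agreement buys the receiving company. It assumes suppliers send XML using the element names from the example (`part`, `type`, `part_number`, and so on, plus an optional `rotation`); those names, the sample data, and the `load_parts` helper are hypothetical. Python's standard-library ElementTree does not validate against DTDs, so this sketch checks the required and permitted elements by hand rather than performing real DTD validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names taken from the parts-catalog example; no real industry DTD is implied.
# The DTD declaration this check imitates would look roughly like:
#   <!ELEMENT part (type, part_number, length, height, weight, material, description, rotation?)>
REQUIRED = ("type", "part_number", "length", "height", "weight", "material", "description")
OPTIONAL = ("rotation",)  # "clockwise" or "counterclockwise"


def load_parts(catalog_xml: str) -> list[dict[str, str]]:
    """Reject any <part> that breaks the agreed structure; return the rest as database rows.

    Unlike a real DTD content model, this hand-written check ignores element order.
    """
    root = ET.fromstring(catalog_xml)
    rows = []
    for i, part in enumerate(root.findall("part")):
        missing = [name for name in REQUIRED if part.find(name) is None]
        if missing:
            raise ValueError(f"part {i} is missing required elements: {missing}")
        undeclared = [child.tag for child in part if child.tag not in REQUIRED + OPTIONAL]
        if undeclared:
            raise ValueError(f"part {i} has undeclared elements: {undeclared}")
        rows.append({child.tag: (child.text or "").strip() for child in part})
    return rows


catalog = """<catalog>
  <part>
    <type>bolt</type><part_number>B-1001</part_number>
    <length>40mm</length><height>8mm</height><weight>12g</weight>
    <material>steel</material><description>Hex bolt</description>
    <rotation>clockwise</rotation>
  </part>
</catalog>"""

print(load_parts(catalog)[0]["part_number"])  # B-1001
```

Anything the agreement didn't anticipate, such as a `shininess` element in a surgical-equipment catalog, is rejected as undeclared: the form-like closedness the paragraph describes.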

Ontologies are different. They can try to express the structure of a domain independent of any particular use, so that the widest variety of applications can share data, including apps from domains outside of the one that's been mapped. So, to use Dan's example, your ontology of jobs would note that jobs have employers and workers, that they may have a salary or other form of compensation, that they can be part-time, full-time, seasonal, etc. As an ontology designer, because you're trying to think beyond whatever applications you already can imagine, your aim (often, not always) is to provide the fullest possible set of slots just in case someone sometime needs that info. And you will carefully describe the relationships among the elements so that apps and researchers can use knowledge that is implicit in the model.
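Dan's contrast can be sketched the same way. This hedged Python example represents an ontology and a data document as subject-predicate-object triples. The vocabulary (`SeasonalJob`, `employer`, `salary`, `job42`) is hypothetical, and the relations `subClassOf` and `domain` are simplified stand-ins for RDFS's `rdfs:subClassOf` and `rdfs:domain`; no real RDF library is used. The data document omits a salary without anything complaining, and a small inference loop derives facts that are only implicit in the model.

```python
# A triple is (subject, predicate, object); all names below are hypothetical.
Triple = tuple[str, str, str]

ontology: set[Triple] = {
    ("SeasonalJob", "subClassOf", "Job"),
    ("PartTimeJob", "subClassOf", "Job"),
    ("employer", "domain", "Job"),  # whatever has an employer is a Job
    ("salary", "domain", "Job"),    # whatever has a salary is a Job
}

data: set[Triple] = {
    ("job42", "type", "SeasonalJob"),
    ("job42", "employer", "Acme Orchards"),
    # No salary is stated; under an open-world reading that is not an error.
}


def infer(triples: set[Triple]) -> set[Triple]:
    """Apply two simplified RDFS-style rules until nothing new is derived:
    (1) x type C  and  C subClassOf D  =>  x type D
    (2) x p y     and  p domain C      =>  x type C
    """
    facts = set(triples)
    while True:
        sub = {(c, d) for c, p, d in facts if p == "subClassOf"}
        dom = {(prop, c) for prop, p, c in facts if p == "domain"}
        new: set[Triple] = set()
        for s, p, o in facts:
            if p == "type":
                new |= {(s, "type", d) for c, d in sub if c == o}
            new |= {(s, "type", c) for prop, c in dom if prop == p}
        if new <= facts:
            return facts
        facts |= new


facts = infer(ontology | data)
print(("job42", "type", "Job") in facts)  # True
print(any(s == "job42" and p == "salary" for s, p, o in facts))  # False
```

The design choice here is the opposite of a DTD-style check: instead of rejecting a document for what it lacks, the model adds what follows from what the document does say.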

The line between DTDs and ontologies is fuzzy. Many ontologies are designed with classes of apps in mind, and some DTDs have tried to be hugely general purpose. My discomfort really comes down to a distrust of the concept of "knowledge representation" that underlies some ontologies (especially earlier ones). The complexity of the relationships among the parts of any domain will always outstrip our attempts to capture and codify those relationships. Further, knowledge cannot be fully represented because it isn't a thing apart from our continuous invention, discovery, and engagement with it.

What it comes down to is that if you talk about ontologies as knowledge representations I'll mutter something under my breath and change the topic.


Categories: misc Tagged with: 2b2k • dtd • everythingismisc • ontologies • sgml Date: July 6th, 2013 dw


December 18, 2012

[misc] I bet your ontology never thought of this one!

Paul Deschner and I had a fascinating conversation yesterday with Jeffrey Wallman, head of the Tibetan Buddhist Resource Center, about perhaps getting his group’s metadata to interoperate with the library metadata we’ve been gathering. The TBRC has a fantastic collection of Tibetan books. So we were talking about the schemas we use, a schema being the set of slots you create for the data you capture. For example, if you’re gathering information about books, you’d have a schema that has slots for title, author, date, publisher, etc. Depending on your needs, you might also include slots for whether the book has color illustrations, whether the original cover is still on it, and whether anyone has underlined any passages. It turns out that the Tibetan concept of a book is quite a bit different from the West’s, which raises interesting questions about how to capture and express that data in ways that remain useful when mashed up with other collections' data.
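A schema in this sense maps naturally onto a record type. As a minimal sketch, here is a hypothetical `BookRecord` in Python whose fields are simply the slots listed in the paragraph; the field names are illustrative assumptions, not any library standard. The optional slots default to `None` so a collection can leave them blank, but a slot nobody thought to declare cannot be filled at all.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BookRecord:
    """Hypothetical book schema: one field per slot."""
    # Slots nearly every book schema has.
    title: str
    author: str
    date: str
    publisher: str
    # Optional slots a particular collection might decide it needs.
    has_color_illustrations: Optional[bool] = None
    original_cover_present: Optional[bool] = None
    has_underlined_passages: Optional[bool] = None


record = BookRecord(title="An Example Title", author="An Author", date="1901",
                    publisher="A Publisher", original_cover_present=True)
print([f.name for f in fields(BookRecord)])

try:
    # "shelf_mark" is a hypothetical stand-in for any slot the designer did not anticipate.
    BookRecord(title="T", author="A", date="1901", publisher="P", shelf_mark="X-12")
except TypeError as err:
    print("rejected:", err)
```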


But it was when we moved on to talking about our author schemas that Jeffrey listed one type of metadata that I would never, ever have thought to include in a schema: reincarnation. It is important for Tibetans to know that Author A is a reincarnation of Author B. And I can see why that would be a crucial bit of information.


So, let this be a lesson: attempts to anticipate all metadata needs are destined to be surprised, sometimes delightfully.


Categories: everythingIsMiscellaneous, libraries Tagged with: everythingIsMiscellaneous • metadata • ontologies • tibet Date: December 18th, 2012 dw


April 29, 2009

Wolfram interview

The Berkman Center has posted the raw audio of my 55-minute interview with Stephen Wolfram about his deeply cool WolframAlpha program (which he talked about here yesterday). If you wait a few days, though, you can skip some throat-clearing on my part, as well as my leading him down a blind alley because I hadn't seen where WolframAlpha puts links to other pieces of information. As is so often the case, the edited version will be better.

[Tags: wolfram wolframalpha metadata search google semantic_web ontologies taxonomy everything_is_miscellaneous ]


Categories: Uncategorized Tagged with: everythingIsMiscellaneous • everything_is_miscellaneous • expertise • google • knowledge • libraries • metadata • ontologies • podcasts • search • semantic_web • taxonomy • wolfram • wolframalpha Date: April 29th, 2009 dw


April 3, 2008

[topicmaps] e-government

Petter Thorsrud is a senior advisor to the Norwegian government and responsible for the government’s Web site. He’s going to talk about the “State of the Nation” with regard to semantic interoperability. There was a forum last fall with many governmental groups participating, including education, municipal services, parliament, tax services, etc. Things are moving along.


Marit Lofnes Mellingen [maybe — that’s who’s listed in the program, but they didn’t introduce her by name] gives some examples of semantic interoperability. Semantics is about agreeing on names, she says. The agreement should be minimal so you don’t have to agree on the entire universe.

She points to examples in the health sector. In one case, 400 subjects are organized into two levels of categories, with synonyms, along with document type, date, and organizational relation, which serve as facets. It uses Dublin Core for documents. “MyPage” is a personalized info portal for citizens. It uses the LOS ontology.
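The health-sector setup she describes (a two-level subject hierarchy, synonyms, and facets) is a common pattern, and a tiny Python sketch shows how the pieces cooperate in a search. Everything in it is invented for illustration: the category and subject names, the synonyms, the sample documents, and the helper functions are hypothetical, and it does not model Dublin Core or the LOS ontology.

```python
from dataclasses import dataclass

# Hypothetical two-level vocabulary: top-level category -> subjects.
CATEGORIES = {
    "Health services": ["Vaccination", "Dental care"],
    "Family": ["Child benefit"],
}
# Hypothetical synonyms, each pointing at a preferred subject term.
SYNONYMS = {"immunization": "Vaccination", "shots": "Vaccination", "dentist": "Dental care"}


@dataclass
class Doc:
    title: str
    subject: str       # a preferred subject term from CATEGORIES
    doc_type: str      # facet
    year: int          # facet
    organization: str  # facet


def resolve(term: str) -> str:
    """Map a user's word onto a preferred subject term, case-insensitively."""
    for subjects in CATEGORIES.values():
        for subject in subjects:
            if subject.lower() == term.lower():
                return subject
    return SYNONYMS.get(term.lower(), term)


def search(docs: list[Doc], term: str, **facets) -> list[Doc]:
    """Documents on the resolved subject that also match every facet passed as a keyword."""
    subject = resolve(term)
    return [d for d in docs
            if d.subject == subject and all(getattr(d, k) == v for k, v in facets.items())]


def in_category(docs: list[Doc], category: str) -> list[Doc]:
    """Browse by the upper level: every document whose subject sits under the category."""
    subjects = set(CATEGORIES[category])
    return [d for d in docs if d.subject in subjects]


docs = [
    Doc("Vaccination schedule for children", "Vaccination", "guide", 2007, "Health agency"),
    Doc("Dental care for adults", "Dental care", "guide", 2008, "Health agency"),
    Doc("Vaccination statistics", "Vaccination", "report", 2008, "Statistics office"),
]
print([d.title for d in search(docs, "shots", doc_type="guide")])
# ['Vaccination schedule for children']
print(len(in_category(docs, "Health services")))  # 3
```

Resolving synonyms before filtering lets a citizen's everyday word reach the official term; the facets then narrow the result set.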

Challenges: Extending the adoption of the common ontologies, merging them with others, driving the categories down to the right level of granularity (so users don’t get too much info). To do this, she thinks we should identify “semantic glue” on a lower level. Also, she’d like to see the ontologies published and made free to use, to enable mashups.

Robert Keil (formerly of Razorfish) says behavior is shifting: People now enter pages through searches, not only through the home page. And the number of portals is increasing: users want info from the government, but there are many portals to the government.

He shows the Parliament portal, Stortinget.no. It tries to create semantic interoperability around topics. They try to make sure all the retrieved documents are relevant to the query. They use topic maps for this. The status of a matter is presented graphically, with the relevant documents arranged via the info in a topic map. They want to be able to show every parliamentary question with all the relevant info.
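For readers unfamiliar with topic maps, the core idea is topics linked by typed associations, with occurrences pointing at resources about each topic. The Python sketch below is a toy in-memory illustration of that idea only; it is not Stortinget's implementation or any topic-map interchange format, and the topic IDs, association type, and example.org URLs are hypothetical. It gathers the documents relevant to a parliamentary matter by following the matter's associations.

```python
from collections import defaultdict


class TopicMap:
    """Toy topic map: topics, typed associations between topics, and occurrences (resources about a topic)."""

    def __init__(self) -> None:
        self.names: dict[str, str] = {}
        self.associations: list[tuple[str, str, str]] = []  # (association type, topic a, topic b)
        self.occurrences: defaultdict[str, list[str]] = defaultdict(list)

    def add_topic(self, tid: str, name: str) -> None:
        self.names[tid] = name

    def associate(self, kind: str, a: str, b: str) -> None:
        self.associations.append((kind, a, b))

    def add_occurrence(self, tid: str, resource: str) -> None:
        self.occurrences[tid].append(resource)

    def related(self, tid: str) -> list[tuple[str, str]]:
        """(association type, other topic) for every association that involves tid."""
        return [(kind, b if a == tid else a) for kind, a, b in self.associations if tid in (a, b)]

    def documents_for(self, tid: str) -> list[str]:
        """Resources on the topic itself, then resources on directly associated topics."""
        docs = list(self.occurrences.get(tid, []))
        for _, other in self.related(tid):
            docs.extend(self.occurrences.get(other, []))
        return docs


tm = TopicMap()
tm.add_topic("matter-17", "Hypothetical bill")
tm.add_topic("question-5", "Hypothetical written question")
tm.associate("raised-in", "question-5", "matter-17")
tm.add_occurrence("matter-17", "https://example.org/bill.pdf")
tm.add_occurrence("question-5", "https://example.org/question.html")
print(tm.documents_for("matter-17"))
# ['https://example.org/bill.pdf', 'https://example.org/question.html']
```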

Altinn is an Internet portal for “public reporting.” It offers forms and services from 20 Norwegian government agencies. The information portal is based on topic maps. It’s smart about how forms depend on one another.

Status: Robert quotes Petter: “Before systems can exchange data, the people behind the systems need to exchange information.” Robert says there’s a lot of enthusiasm in the government for semanticizing its information. “We are past the tipping point.”

[Tags: topic_maps semantic_web norway ontologies everything_is_miscellaneous ]


Categories: Uncategorized Tagged with: conference coverage • everythingIsMiscellaneous • norway • ontologies • taxonomy Date: April 3rd, 2008 dw



This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
TL;DR: Share this post freely, but attribute it to me (name me, David Weinberger, and link to it), and don't use it commercially without my permission.

Joho the Blog uses WordPress blogging software.
Thank you, WordPress!