Joho the Blog
An Entry from the Archives

April 29, 2006

Degrees of RDF

I have an undoubtedly dumb question about the Semantic Web.

Let's say I want to express in an RDF triple not simply that A relates to B, but the degree of A's relationship to B. E.g.:

Bill is 85% committed to Mary

The tint of paint called Purple Dawn is 30% red

Frenchie is 75% likely to beat Lefty

Niagara Falls is 80% in Canada

Other than making up a set of 100 different relationships (e.g., "is in 1%," "is in 2%," etc.), how can that crucial bit of metadata about the relationship be captured in RDF?

Note that I am not asking for practical reasons. I have a theoretical interest in the topic. [Tags: semantic_web rdf dumb_questions]

Posted by D. Weinberger at April 29, 2006 03:54 PM


Comments

You can't. What you have to do is to introduce an intermediate node with a value. So instead of A-B we have A-C-B where C has a value of the link between A and B.

eg Bill -> Commitment(85%) -> Mary
instead of
Bill ->85%-> Mary
This turned out to be a bit of a bitch in FOAF.
Simple case: A Knows B
Sub-class: A friendOf B
Intermediate node: A knows KnowsValue knows B
Alternate:
A Knows B,
A knowsvalue(85%) B

It doesn't feel right!
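
A minimal sketch of the intermediate-node idea, assuming triples are stored as plain Python tuples. The names `hasCommitment`, `towards`, `degree` and `commitment1` are made up for illustration and do not come from FOAF or any real vocabulary.

```python
# Triples are plain (subject, predicate, object) tuples.
# All predicate and node names here are hypothetical.
triples = {
    ("Bill", "hasCommitment", "commitment1"),  # A -> C
    ("commitment1", "towards", "Mary"),        # C -> B
    ("commitment1", "degree", 0.85),           # C carries the value
}

def objects(subject, predicate):
    """Return every object of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def degree_of_commitment(person, partner):
    """Walk person -> intermediate node -> partner and return the degree, if any."""
    for node in objects(person, "hasCommitment"):
        if partner in objects(node, "towards"):
            degrees = objects(node, "degree")
            if degrees:
                return degrees[0]
    return None

print(degree_of_commitment("Bill", "Mary"))  # 0.85
```

The cost is visible in the lookup: what was one hop (A to B) becomes two hops plus a value read.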

Posted by: julian bond | April 29, 2006 07:21 PM


I suspect there are multiple ways, but one is with identifiers for, say, part-of-niagara-falls (ponf) with triples like:

<ponf-1> <is-part-of> <Niagara Falls>
<ponf-2> <is-part-of> <Niagara Falls>

<ponf-1> <is-in> <Canada>
<ponf-1> <is-part-of-whole-to-degree> 80%

<ponf-2> <is-in> <United States>
<ponf-2> <is-part-of-whole-to-degree> 20%
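
A rough sketch of how such part-of triples might be queried, assuming they are held as Python tuples and that the degrees follow the figure in the original post (80% in Canada). The helper `degree_in` is hypothetical, not part of any RDF library.

```python
# Part-of triples as (subject, predicate, object) tuples.
# Degrees follow the post's figure of 80% in Canada (an assumption of this sketch).
triples = [
    ("ponf-1", "is-part-of", "Niagara Falls"),
    ("ponf-2", "is-part-of", "Niagara Falls"),
    ("ponf-1", "is-in", "Canada"),
    ("ponf-1", "is-part-of-whole-to-degree", 80),
    ("ponf-2", "is-in", "United States"),
    ("ponf-2", "is-part-of-whole-to-degree", 20),
]

def degree_in(whole, place):
    """Sum the degrees of all parts of `whole` that are located in `place`."""
    parts = {s for s, p, o in triples if p == "is-part-of" and o == whole}
    located = {s for s, p, o in triples if p == "is-in" and o == place}
    return sum(
        o for s, p, o in triples
        if p == "is-part-of-whole-to-degree" and s in parts & located
    )

print(degree_in("Niagara Falls", "Canada"))         # 80
print(degree_in("Niagara Falls", "United States"))  # 20
```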

.micah

Posted by: Micah Dubinko | April 29, 2006 07:25 PM


:bill :loves :mary .
:loves :in { :bill :loves :mary } ;
:hasDegree "85%" .

Not sure though.
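
One standard way to say something about a statement is RDF reification: a node of type rdf:Statement with rdf:subject, rdf:predicate and rdf:object properties. A minimal sketch in Python tuples follows; the node id `_:s1`, the property `:hasDegree` and the helper names are assumptions for illustration, and the degree is stored as a number rather than "85%".

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(statement_id, triple):
    """Describe `triple` with the four standard RDF reification triples."""
    s, p, o = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]

loves = (":bill", ":loves", ":mary")
graph = [loves]
graph += reify("_:s1", loves)               # "_:s1" is a hypothetical blank node id
graph.append(("_:s1", ":hasDegree", 0.85))  # ":hasDegree" is a made-up property

def degree_of(triple):
    """Find a reified description of `triple` and return its :hasDegree value."""
    statements = {
        s for s, p, o in graph
        if p == RDF + "type" and o == RDF + "Statement"
    }
    for node in statements:
        described = {p: o for s, p, o in graph if s == node}
        key = (
            described.get(RDF + "subject"),
            described.get(RDF + "predicate"),
            described.get(RDF + "object"),
        )
        if key == triple:
            return described.get(":hasDegree")
    return None

print(degree_of(loves))  # 0.85
```

Five extra triples to qualify one statement is part of why this tends to feel heavy in practice.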

Posted by: Mike | April 29, 2006 07:31 PM


There's a good document about this kind of concern: Defining N-ary Relations on the Semantic Web (http://www.w3.org/TR/2006/NOTE-swbp-n-aryRelations-20060412/#useCase1)

It describes how to handle n-ary relations and use additional attributes for a relation.

Posted by: Juan Manuel Caicedo | April 29, 2006 09:50 PM


As you noticed, it's not something you can express directly in a (useful) statement in RDF. To get the benefit of machines the compromise is that you have to put things in terms that can be handled using easy logic. IANAL, but I believe it's a computational nightmare to be creative with the relationship.

But as Julian, Micah and Mike suggest, the same information can be expressed by twisting the sentences around a little - there is something called Commitment involving Bill, Mary and a percentage.

It's not a problem of RDF per se: consider how you would (again *usefully*) express the same information in a SQL DB or in JavaScript.
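
For comparison, a small sketch of the SQL side using Python's built-in sqlite3 module. The table and column names are invented for illustration; the point is only that the relationship becomes its own row with a degree column, much like the Commitment node.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE commitment (person TEXT, partner TEXT, degree REAL)"
)
conn.execute(
    "INSERT INTO commitment VALUES (?, ?, ?)", ("Bill", "Mary", 0.85)
)

# The "relationship" is a row, and the degree is just another column on it.
row = conn.execute(
    "SELECT degree FROM commitment WHERE person = ? AND partner = ?",
    ("Bill", "Mary"),
).fetchone()

print(row[0])  # 0.85
```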

There's a related write-up:

http://www.w3.org/TR/swbp-n-aryRelations/

....

Ok, must try one:

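# Prefixes added so the snippet parses; the default namespace is a placeholder.
@prefix : <http://example.org/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://colors.org/pd> a :Tint;
:name "Purple Dawn";
:red "0.3"^^xsd:float .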

Posted by: Danny | April 29, 2006 09:56 PM


There are some other approaches to this, including using fuzzy set theory and using probabilities. Yun Peng and his students in the UMBC ebiquity lab have been doing some interesting work on integrating Bayesian reasoning and OWL -- the probabilistic approach. Here are some recent papers.

Posted by: tim finin | April 29, 2006 11:08 PM


Just a quick pointer to a related thread at Shelley Powers'...

Oh, and a sidebar comment: what do these percentages mean anyway? Does the Purple Dawn paint contain thirty percent of a specific colorizing agent, or does it reflect 30% of the red light when it is dry and on the wall? And how much water flows over Niagara Falls, and is that the measurement you're looking for when you say 80%, or are you looking for some simple geometric assertion? And the fourth time Frenchie assaults Lefty, does he just slap him/her around a little, thus not "beat" him/her, and how do you quantize a beating anyway?

Regarding Bill's commitment to Mary, I suspect that "greatly but not totally committed" describes the condition better than 85% committed.

All of which goes to say that I like it that the programmers ran upstairs to chop code while the analysts are still assessing the requirements. You get a lot more work done that way.

Posted by: fp | April 30, 2006 01:17 PM


In general, this problem is similar to how a web of pages isn't good at representing certain database structures. The web is made of autonomous chunks connecting in lots of autonomous ways.

But it's sometimes harder to represent more strictly / tightly related large chunks on the web.

(IMHO, RDF is designed for a web of data to have the same freedom of autonomy as the web of pages--so it's more like the web than like an RDBMS.)

Your examples could be interpreted as shorthands that imply a bunch of logical relationships. These relationships can be broken up into smaller logical chunks.

(And RDBMSs are often designed specifically to efficiently store these bunches of relationships together, e.g., as n-ary relations. RDF is binary relations.)

So, breaking one example down into smaller chunks:

"Niagara Falls is 80% in Canada"

1. Niagara Falls is an area on a map
2. Canada is an area on a map
3. US is an area on a map

4. Canada intersects with Niagara Falls
5. US intersects with Niagara Falls

6. [4] is an area on a map
7. [5] is an area on a map

8. [1] has dimension 100 square miles
9. [6] has dimension 80 square miles
10. [7] has dimension 20 square miles

This is a lot more complicated than saying "Niagara Falls is 80% in Canada", but the smaller chunks have two redeeming qualities:

a. each statement uses the same data structure (i.e., you don't have to create a new structure every time you want to deal with different kinds of data)

b. most statements are totally independent of others (e.g., they could be spread out in time or place on the web)
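
The ten statements above can be sketched as binary triples from which the percentage is derived rather than stated. This is only an illustration: the node names `NF-in-Canada` and `NF-in-US`, the predicates, and the helper functions are assumptions, and the areas are the made-up figures from the list.

```python
# Each statement from the list becomes one or two binary triples.
# All names are hypothetical; areas are the illustrative figures from the list.
triples = [
    ("NiagaraFalls", "is-a", "MapArea"),            # 1
    ("Canada", "is-a", "MapArea"),                  # 2
    ("US", "is-a", "MapArea"),                      # 3
    ("NF-in-Canada", "part-of", "NiagaraFalls"),    # 4: the intersection...
    ("NF-in-Canada", "part-of", "Canada"),          #    ...lies in both areas
    ("NF-in-US", "part-of", "NiagaraFalls"),        # 5
    ("NF-in-US", "part-of", "US"),
    ("NF-in-Canada", "is-a", "MapArea"),            # 6
    ("NF-in-US", "is-a", "MapArea"),                # 7
    ("NiagaraFalls", "area-sq-miles", 100),         # 8
    ("NF-in-Canada", "area-sq-miles", 80),          # 9
    ("NF-in-US", "area-sq-miles", 20),              # 10
]

def value(subject, predicate):
    """Return the first object for (subject, predicate), or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None

def percent_in(whole, region):
    """Derive how much of `whole` lies in `region` from the area triples."""
    for s, p, o in triples:
        if p == "part-of" and o == whole and (s, "part-of", region) in triples:
            return 100 * value(s, "area-sq-miles") / value(whole, "area-sq-miles")
    return None

print(percent_in("NiagaraFalls", "Canada"))  # 80.0
print(percent_in("NiagaraFalls", "US"))      # 20.0
```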

Of course, you could also just create an RDF-compatible percent-in-Canada vocabulary, e.g.:

Niagara Falls percent-in-Canada 80

In other words, if you don't need things to be flexible or general, you can always be super specific.

(Then, if someone needed to convert to something different, they could create an ontology that indicates how to map your percent-in-Canada vocabulary to a less specific vocab like the area-based one I suggested above.)

Posted by: Jay Fienberg | April 30, 2006 04:08 PM


NF is-partly-in Canada

(NF is-partly-in Canada) has-degree 80%

Posted by: Anon | April 30, 2006 07:17 PM


You've heard of Microformats, right?

You may want to drop the group a line...

Web standards that are community driven...;)

Posted by: Tara 'Miss Rogue' Hunt | May 8, 2006 09:22 PM

