This work is copyrighted by Pat Hayes.  

Catching the Dreams

By Pat Hayes

 

If the semantic web needed a symbol, a good one to use would be a

Navaho dream-catcher: a small web, lovingly hand-crafted, lovely to look

at, and rumored to catch dreams; but really more of a symbol than a

reality.

 

There are many visions of the semantic web, some of them more

interesting, some more likely to make money, some more likely to

happen in the near future. The excitement of these visions has

attracted many people to the concept from a variety of different

intellectual backgrounds - databases, logic programming, AI knowledge

representation, description logics and programming languages, among

others. The result is that there have been many different forces

pulling the language designs in different directions.

 

On the whole, the description logics seem to be winning. OIL -

arguably the first proposed web-based standard - and DAML are

essentially the same language written in different syntactic forms,

and they are both quintessential description logics. Now, description

logics - DLs - have some very fine features. They can be seen as a

kind of hybrid of industrial-strength data modelling tools with a

limited form of conventional logics, located at a particularly nice

place on the trade-off curve slung between the extremes of a highly

expressive - but computationally intractable - full logic, and a

highly efficient - but almost autistic - database notation. DLs have

become a standard tool for professional ontology builders in

industrial and commercial settings.

 

But is this kind of strength needed for the semantic web? My own view

is that this expressiveness/efficiency trade-off, which has dominated

the professional ontology field's thinking for so long, is far less

relevant to the semantic web vision - or at any rate, the most

exciting versions of that vision - than it has been for the

traditional tasks that ontologies have been designed and used for;

and that the overhead required by DLs, particularly the conceptual

overhead, is now a barrier and an impediment to progress.

 

Considered as content languages, description logics are like logics

with safety guards all over them. They come covered with warnings and

restrictions: you cannot say things of this form, you cannot write

rules like that, you cannot use arbitrary disjunctions, you cannot

use negation freely, you cannot speak of classes of literals, and so

on. A beginning user might ask, why all the restrictions? It's not as

if any of these things are mysterious or meaningless or paradoxical,

so why can't I be allowed to write them down on my web page as

markup?  The answer is quite revealing: if we let you do that, you

could write things that our reasoning engines might be unable to

handle. As long as you obey our rules, we can guarantee that the

inference engines will be able to generate the answers within some

predetermined bounds. That is what DLs are for, to ensure that

large-scale industrial ontologies can be input to inference machinery

while it remains possible to provide a guarantee that answers will be

found, that inferential search spaces will not explode, and in

general that things will go well. Providing the guarantee is part of

the game: DLs can typically be rigorously proven to be at least

decidable, and preferably to be in some tractable complexity class.

 

There is also enough experience with deployed DL use to give our

humble beginner some advice: instead of using negation, you can

rephrase your problem in terms of disjointness of classes, and then

you can do it this way...; or, instead of saying that a equals b

(sorry, we can't let you use "equals", that is far too dangerous),

you can say that the class whose members are a and b and nothing else

has a cardinality of one... And so on. The result is that users of

DAML+OIL need to take a course in how to say things in peculiar and

unintuitive ways, because the safety guards prevent them from saying

things naturally.

 

Now, this is not an insurmountable barrier to a determined

professional user: it's not harder than learning, say, AppleScript.

Once you get used to the rather odd way of thinking, writing DAML+OIL

can even be kind of fun. But it is a huge barrier to widespread

acceptance of a web language for markup; and, more to the point, it

is fundamentally unnecessary. The semantic web doesn't need all these

DL guards and limitations, because it doesn't need to provide the

industrial-quality guarantees of inferential performance. Using DLs

as a semantic web content markup standard is a failure of

imagination: it presumes that the Web is going to be something like a

giant corporation, with the same requirements of predictability and

provable performance. In fact (if the SW ever becomes a reality) it

will be quite different from current industrial ontology practice in

many ways. It will be far 'scruffier', for a start;  people will use

ingenious tricks to scrape partly-ill-formed content from

ill-structured sources, and there is no point in trying to prevent

them doing so, or tutting with disapproval. But aside from that, it

will be on a scale that will completely defeat any attempt to

restrict inference to manageable bounds. If one is dealing with 10^9

assertions, the difference between a polynomial complexity class and

something worse is largely irrelevant. And, further, almost all of

this content will be extremely simple and shallow, seen from a

logical perspective. Worrying about the complexity class of the few

intricate ontologies on the web is like being obsessed with the

quality of the salt in a supermarket. It is notable that almost all

of the DAML so far written uses only a small part of the vocabulary

of the language, and is almost entirely concerned with simple class

inheritance. Constructs like daml:minCardinalityQ (a restriction on a

property defining the class of things which have a minimum number of

values of that property in another class...what? Yes, precisely my

point) are rarely, if ever, used.

 

If the entire world were happy using description logics, then carping

would be irrelevant. But it is not. The limitations of DAML are

already a burden to progress, before the language has even been

seriously deployed. The DAML-S effort to express services in DAML is

chafing at the expressive limitations it imposes, and efforts to

develop a 'rules' extension for DAML are being stymied by the

methodological requirement, imposed by the description logicians,

that any ability to add rules that would increase the expressiveness

too far would run the risk of allowing people to say too much.

 

It may be worth making this point in some detail. Like many other

academic research fields, description logics have their own 'ground

rules'. One of the basic assumptions of work in this field is that

full logical expressiveness is to be avoided at all costs. (If one is

trying to find the low point of the expressiveness/efficiency curve,

then one place to definitely avoid is the far left-hand end, since we

know that is as high as it can get.) But this reaction seems

ludicrous when it is used to reject what would be otherwise quite

reasonable proposals. For example, it is easy to imagine what an RDF

rules language would be like; one could just marry together a

Prolog-style Horn-clause reasoner with an RDF triples engine. Several

people have already written such programs and they are in routine use

in research settings. So why the delay? Because these allow one to

express arbitrary logical implications. That sounds to me (and to

logic programmers) like a plus, but to someone trained in the

description logic world, this is a cardinal sin. Some way must be

found to limit, constrain, or otherwise box in, such an ability; if

we allow this kind of expressiveness to leak out, then there is no

telling what our inference engines might do. The proper reaction is

to agree, but learn to be happy about it. Indeed, there would be no

guarantees that answers will always come back, or that inference

engines will never time-out. But one should not expect such global

guarantees on the web. If the semantic web becomes real, then the

economic pressure on both content providers and content users will be

quite sufficient to ensure that practical methods will be found to

avoid a state of permanent disaster. We do not need to worry about

protecting the integrity of our theoretical guarantees before the

business even gets started, particularly when those worries are

impeding progress.
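
To make the "Horn-clause reasoner married to a triples engine" idea concrete, here is a minimal sketch in Python. It is not any of the existing programs mentioned above; the rule format (a head pattern plus a list of body patterns), the "?x" convention for variables, the ex: names and the max_rounds budget are all assumptions made for illustration. The budget stands in for the kind of practical cut-off an engine might use in place of a global termination guarantee.

```python
def is_var(term):
    """Variables are written as strings starting with '?' (an assumed convention)."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple; return None on a clash."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def solve(body, triples, bindings=None):
    """Yield every set of bindings that satisfies all body patterns."""
    bindings = {} if bindings is None else bindings
    if not body:
        yield bindings
        return
    first, rest = body[0], body[1:]
    for triple in triples:
        extended = match(first, triple, bindings)
        if extended is not None:
            yield from solve(rest, triples, extended)


def substitute(pattern, bindings):
    """Replace bound variables in a pattern with their values."""
    return tuple(bindings.get(term, term) for term in pattern)


def forward_chain(triples, rules, max_rounds=100):
    """Apply Horn rules until nothing new appears or the round budget runs out."""
    known = set(triples)
    for _ in range(max_rounds):
        new = set()
        for head, body in rules:
            for b in solve(body, known):
                fact = substitute(head, b)
                if fact not in known:
                    new.add(fact)
        if not new:
            break
        known |= new
    return known


# Hypothetical data: three triples and two rules for class membership.
facts = {
    ("ex:Rex", "rdf:type", "ex:Dog"),
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
}
rules = [
    (("?x", "rdf:type", "?c2"),
     [("?x", "rdf:type", "?c1"), ("?c1", "rdfs:subClassOf", "?c2")]),
    (("?a", "rdfs:subClassOf", "?c"),
     [("?a", "rdfs:subClassOf", "?b"), ("?b", "rdfs:subClassOf", "?c")]),
]
closure = forward_chain(facts, rules)
```

Nothing in this sketch restricts what a rule may say; the only protection is the round budget, which is roughly the trade the paragraph above argues for.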

 

I think that what the semantic web needs is two rather different

things, put together in a new way. It needs a content language whose

sole function is to express, transmit and store propositions in a

form that permits easy use by engines of one kind and another. There

is no need to place restrictions or guards on this language, and it

should be compact, easy to use, expressive and syntactically simple.

The W3C basic standard is RDF, which is a good start, but nowhere

near expressive enough. The best starting-point for such a content

language is something like a simple version of KIF, though with an

XML-style syntax instead of KIF's now archaic (though still elegant)

LISP-based format. Subsets of this language can be described which

are equivalent to DLs, but there really is no need to place elaborate

syntactic boundaries on the language itself to prevent users from

saying too much. Almost none of them will, in any case.
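
One way to picture "subsets described after the fact" is a checker that inspects a formula and reports whether it stays inside a chosen fragment, leaving the language itself unrestricted. The sketch below is only an illustration: the nested-tuple encoding, the operator names and the RESTRICTED set are hypothetical, and a set of allowed operators is a crude stand-in for a real DL characterization.

```python
def operators(formula):
    """Return the operator names used anywhere in a nested-tuple formula.

    Assumed encoding: ("atom", predicate, arg, ...) for atomic sentences,
    (operator, subformula, ...) otherwise; quantifiers carry their variable
    as a plain string, e.g. ("forall", "?x", body).
    """
    op, *args = formula
    if op == "atom":
        return set()
    used = {op}
    for arg in args:
        if isinstance(arg, tuple):
            used |= operators(arg)
    return used


def within_fragment(formula, allowed):
    """True if the formula uses only operators from the allowed set."""
    return operators(formula) <= set(allowed)


# A hypothetical restricted fragment: no negation, no disjunction.
RESTRICTED = {"and", "implies", "forall"}

rule = ("forall", "?x",
        ("implies", ("atom", "Dog", "?x"), ("atom", "Mammal", "?x")))
free = ("or", ("atom", "Dog", "rex"), ("not", ("atom", "Cat", "rex")))

rule_ok = within_fragment(rule, RESTRICTED)
free_ok = within_fragment(free, RESTRICTED)
```

An engine that wants guarantees can run such a check and decline what falls outside its fragment; the author of the content is never stopped from writing it.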

 

An aside on logic. There is a widespread misapprehension that logic

is 'difficult' - like calculus is supposed to be in American high

schools. In fact, basic logic is easier to use and understand than

description logics; it has a simpler syntax, it has simpler inference

processes, and it is closer to natural language. While there are some

subtle aspects of logic, one is not obliged to use them or even to

consider them.

 

The second thing that the semantic web needs is a programming

language; or perhaps even a suite of tools in an existing programming

language, for manipulating the content. The current DAML/OIL/WOL

standards get these two aspects jumbled up with one another: the

content is all tangled up with limitations that are in place to

protect the code (which is hidden inside the inference engines, but

still needs protecting). What we need to do is find a way to give the

code to the world as well as the content, so that the planet-wide

community of programmers can get started on making ingenious tools to

manipulate content. I confess that I do not know how to do this, but

I am sure we are going about it the wrong way at present. And if

anyone has any good ideas, I'd love to hear about them.

 

Pat Hayes

IHMC, University of West Florida

--

---------------------------------------------------------------------

IHMC                                    (850)434 8903   home

40 South Alcaniz St.                    (850)202 4416   office

Pensacola,  FL 32501                    (850)202 4440   fax

phayes@ai.uwf.edu

http://www.coginst.uwf.edu/~phayes

Last changes, July 29, Steffen Staab