By Pat Hayes
If the semantic web needed a symbol, a good one to use would be a Navaho dream-catcher: a small web, lovingly hand-crafted, lovely to look at, and rumored to catch dreams; but really more of a symbol than a reality.
There are many visions of the semantic web, some of them more interesting, some more likely to make money, some more likely to happen in the near future. The excitement of these visions has attracted many people to the concept from a variety of different intellectual backgrounds - databases, logic programming, AI knowledge representation, description logics and programming languages, among others. The result is that there have been many different forces pulling the language designs in different directions.
On the whole, the description logics seem to be winning. OIL - arguably the first proposed web-based standard - and DAML are essentially the same language written in different syntactic forms, and they are both quintessential description logics. Now, description logics - DLs - have some very fine features. They can be seen as a kind of hybrid of industrial-strength data modelling tools with a limited form of conventional logics, located at a particularly nice place on the trade-off curve slung between the extremes of a highly expressive - but computationally intractable - full logic, and a highly efficient - but almost autistic - database notation. DLs have become a standard tool for professional ontology builders in industrial and commercial settings.
But is this kind of strength needed for the semantic web? My own view is that this expressiveness/efficiency tradeoff, which has dominated the professional ontology field's thinking for so long, is far less relevant to the semantic web vision - or at any rate, the most exciting versions of that vision - than it has been for the traditional tasks that ontologies have been designed and used for; and that the overhead required by DLs, particularly the conceptual overhead, is now a barrier and an impediment to progress.
Considered as content languages, description logics are like logics with safety guards all over them. They come covered with warnings and restrictions: you cannot say things of this form, you cannot write rules like that, you cannot use arbitrary disjunctions, you cannot use negation freely, you cannot speak of classes of literals, and so on. A beginning user might ask, why all the restrictions? It's not as if any of these things are mysterious or meaningless or paradoxical, so why can't I be allowed to write them down on my web page as markup? The answer is quite revealing: if we let you do that, you could write things that our reasoning engines might be unable to handle. As long as you obey our rules, we can guarantee that the inference engines will be able to generate the answers within some predetermined bounds. That is what DLs are for: to ensure that large-scale industrial ontologies can be input to inference machinery while it is still possible to provide a guarantee that answers will be found, that inferential search spaces will not explode, and in general that things will go well. Providing the guarantee is part of the game: DLs typically can be rigorously proven to be at least decidable, and preferably to be in some tractable complexity class.
There is also enough experience with deployed DL use to give our humble beginner some advice: instead of using negation, you can rephrase your problem in terms of disjointness of classes, and then you can do it this way...; or, instead of saying that a equals b (sorry, we can't let you use "equals", that is far too dangerous), you can say that the class whose members are a and b and nothing else has a cardinality of one... And so on. The result is that users of DAML+OIL need to take a course in how to say things in peculiar and unintuitive ways, because the safety guards prevent them from saying things naturally.
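The two workarounds in the advice above can be checked against the direct forms they replace. What follows is a minimal Python sketch. It is not DAML+OIL syntax or any reasoner's API. It assumes a hypothetical three-element domain and brute-forces every possible pair of class extensions and every possible interpretation of the names a and b. Under those assumptions it confirms two things. First, "no Cat is a Dog" (stated with negation) holds exactly when the two classes are disjoint. Second, a and b denote the same individual exactly when the class containing just a and b has one member; two members would instead mean they are distinct.

```python
from itertools import product

# Hypothetical three-element domain; any finite domain would do.
DOMAIN = ["d1", "d2", "d3"]

def subsets(items):
    """Every subset of `items`, used as a possible class extension."""
    for mask in range(2 ** len(items)):
        yield {x for i, x in enumerate(items) if (mask >> i) & 1}

def no_cat_is_a_dog(cat, dog):
    """Negation form: for every x, Cat(x) implies not Dog(x)."""
    return all(not (x in cat and x in dog) for x in DOMAIN)

def cat_disjoint_with_dog(cat, dog):
    """Disjointness form: the two class extensions have no common member."""
    return not (cat & dog)

def a_equals_b(interp):
    """Direct form: the names a and b denote the same individual."""
    return interp["a"] == interp["b"]

def one_of_a_b_has_cardinality_one(interp):
    """Workaround form: the class whose members are a and b has exactly one member."""
    return len({interp["a"], interp["b"]}) == 1

# Compare the direct and workaround forms in every interpretation over the domain.
negation_ok = all(
    no_cat_is_a_dog(cat, dog) == cat_disjoint_with_dog(cat, dog)
    for cat in subsets(DOMAIN)
    for dog in subsets(DOMAIN)
)
equality_ok = all(
    a_equals_b({"a": x, "b": y}) == one_of_a_b_has_cardinality_one({"a": x, "b": y})
    for x, y in product(DOMAIN, repeat=2)
)
print(negation_ok, equality_ok)  # True True
```

The comparison supports the point in the text: each workaround is logically equivalent to the direct form and only less direct to write.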
Now, this is not an insurmountable barrier to a determined professional user: it's no harder than learning, say, AppleScript. Once you get used to the rather odd way of thinking, writing DAML+OIL can even be kind of fun. But it is a huge barrier to widespread acceptance of a web language for markup; and, more to the point, it is fundamentally unnecessary. The semantic web doesn't need all these DL guards and limitations, because it doesn't need to provide the industrial-quality guarantees of inferential performance. Using DLs as a semantic web content markup standard is a failure of imagination: it presumes that the Web is going to be something like a giant corporation, with the same requirements of predictability and provable performance. In fact (if the SW ever becomes a reality) it will be quite different from current industrial ontology practice in many ways. It will be far 'scruffier', for a start; people will use ingenious tricks to scrape partly ill-formed content from ill-structured sources, and there is no point in trying to prevent them from doing so, or tutting with disapproval. But aside from that, it will be on a scale that will completely defeat any attempt to restrict inference to manageable bounds. If one is dealing with 10^9 assertions, the difference between a polynomial complexity class and something worse is largely irrelevant. And, further, almost all of this content will be extremely simple and shallow, seen from a logical perspective. Worrying about the complexity class of the few intricate ontologies on the web is like being obsessed with the quality of the salt in a supermarket. It is notable that almost all of the DAML so far written uses only a small part of the vocabulary of the language, and is almost entirely concerned with simple class inheritance.
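Some rough numbers help with the point about 10^9 assertions. Here is a back-of-the-envelope Python calculation. It assumes, purely for illustration, a machine that performs one billion elementary inference steps per second, and it treats each complexity bound as an exact step count with a constant factor of 1. Real constants and hardware will differ.

```python
import math

N = 10 ** 9                          # number of assertions, as in the text
STEPS_PER_SECOND = 1e9               # assumed machine speed (illustrative only)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Step counts for a few growth rates, taking every constant factor as 1.
steps = {
    "n log n": N * math.log2(N),
    "n^2": float(N) ** 2,
    "n^3": float(N) ** 3,
}

def years_needed(step_count):
    """Wall-clock years to perform `step_count` steps at the assumed speed."""
    return step_count / STEPS_PER_SECOND / SECONDS_PER_YEAR

for name, count in steps.items():
    seconds = count / STEPS_PER_SECOND
    print(f"{name:8} {seconds:10.3g} seconds  ({years_needed(count):.3g} years)")
```

With these assumptions, n log n takes about half a minute. A quadratic algorithm, comfortably inside a polynomial class, needs about 31.7 years. A polynomial guarantee alone therefore says little about whether an answer will arrive in practice at this scale.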
Constructs like daml:minCardinalityQ (a restriction on a property defining the class of things which have a minimum number of values of that property in another class... what? Yes, precisely my point) are rarely, if ever, used.
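The definition of daml:minCardinalityQ in the parenthesis above is easier to follow when run. The Python below is a minimal sketch of what membership in such a restriction amounts to over a set of triples. It is not a DAML+OIL reasoner, and it rests on two simplifications. Class membership is taken directly from rdf:type triples, with no inference. Distinct identifiers are treated as distinct individuals, which DAML+OIL itself does not assume. The property and class names (ex:hasChild, ex:Doctor, and so on) are made up.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

def min_cardinality_q(triples: Set[Triple], prop: str, cls: str, n: int) -> Set[str]:
    """Subjects with at least n distinct values of `prop` that are members of `cls`.

    Simplifications: class membership comes only from explicit rdf:type triples,
    and different identifiers are counted as different individuals.
    """
    if n < 1:
        # With n = 0 every individual qualifies; that case is left out of this sketch.
        raise ValueError("n must be at least 1")
    members = {s for (s, p, o) in triples if p == "rdf:type" and o == cls}
    values: Dict[str, Set[str]] = {}
    for (s, p, o) in triples:
        if p == prop and o in members:
            values.setdefault(s, set()).add(o)
    return {s for s, vals in values.items() if len(vals) >= n}

# Hypothetical data: who has at least two children who are doctors?
triples = {
    ("alice", "ex:hasChild", "bob"),
    ("alice", "ex:hasChild", "carol"),
    ("alice", "ex:hasChild", "dan"),
    ("erin", "ex:hasChild", "carol"),
    ("bob", "rdf:type", "ex:Doctor"),
    ("carol", "rdf:type", "ex:Doctor"),
    ("dan", "rdf:type", "ex:Lawyer"),
}
print(min_cardinality_q(triples, "ex:hasChild", "ex:Doctor", 2))  # {'alice'}
```

Lowering n to 1 also admits erin, who has one child who is a doctor.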
If the entire world were happy using description logics, then carping would be irrelevant. But it is not. The limitations of DAML are already a burden to progress, before the language has even been seriously deployed. The DAML-S effort to express services in DAML is chafing at the expressive limitations it imposes, and efforts to develop a 'rules' extension for DAML are being stymied by the methodological requirement, imposed by the description logicians, that rules must not increase the expressiveness too far, since that would run the risk of allowing people to say too much.
It may be worth making this point in some detail. Like many other academic research fields, description logics have their own 'ground rules'. One of the basic assumptions of work in this field is that full logical expressiveness is to be avoided at all costs. (If one is trying to find the low point of the expressiveness/efficiency curve, then one place to definitely avoid is the far left-hand end, since we know that is as high as it can get.) But this reaction seems ludicrous when it is used to reject what would otherwise be quite reasonable proposals. For example, it is easy to imagine what an RDF rules language would be like; one could just marry together a Prolog-style Horn-clause reasoner with an RDF triples engine. Several people have already written such programs and they are in routine use in research settings. So why the delay? Because these allow one to express arbitrary logical implications. That sounds to me (and to logic programmers) like a plus, but to someone trained in the description logic world, this is a cardinal sin. Some way must be found to limit, constrain, or otherwise box in such an ability; if we allow this kind of expressiveness to leak out, then there is no telling what our inference engines might do. The proper reaction is to agree, but learn to be happy about it. Indeed, there would be no guarantees that answers will always come back, or that inference engines will never time out. But one should not expect such global guarantees on the web. If the semantic web becomes real, then the economic pressure on both content providers and content users will be quite sufficient to ensure that practical methods will be found to avoid a state of permanent disaster. We do not need to worry about protecting the integrity of our theoretical guarantees before the business even gets started, particularly when those worries are impeding progress.
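The RDF rules language imagined above, a Horn-clause reasoner married to a triples store, can be illustrated in a few dozen lines. The Python below is a minimal sketch. The rule format and function names are invented here and are not taken from any proposed standard or existing system. Rules are (body, head) pairs of triple patterns, terms beginning with ? are variables, and the rules are applied forward until no new triples appear.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Tuple[List[Triple], Triple]  # (body patterns, head pattern)

def is_var(term: str) -> bool:
    return term.startswith("?")

def match(pattern: Triple, fact: Triple, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `binding` so that `pattern` equals `fact`; None if impossible."""
    result = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result

def body_bindings(body: List[Triple], facts: Set[Triple]) -> List[Dict[str, str]]:
    """All bindings satisfying every body pattern (a join across the patterns)."""
    bindings: List[Dict[str, str]] = [{}]
    for pattern in body:
        extended = []
        for b in bindings:
            for fact in facts:
                m = match(pattern, fact, b)
                if m is not None:
                    extended.append(m)
        bindings = extended
    return bindings

def forward_chain(facts: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    """Apply every rule repeatedly until no new triple is derived."""
    known = set(facts)
    while True:
        new = set()
        for body, head in rules:
            for b in body_bindings(body, known):
                derived = tuple(b.get(t, t) for t in head)
                if derived not in known:
                    new.add(derived)
        if not new:
            return known
        known |= new

# Hypothetical rules: parents are ancestors, ancestry is transitive,
# and types are inherited along rdfs:subClassOf.
rules: List[Rule] = [
    ([("?x", "ex:parentOf", "?y")], ("?x", "ex:ancestorOf", "?y")),
    ([("?x", "ex:ancestorOf", "?y"), ("?y", "ex:ancestorOf", "?z")],
     ("?x", "ex:ancestorOf", "?z")),
    ([("?x", "rdf:type", "?c"), ("?c", "rdfs:subClassOf", "?d")],
     ("?x", "rdf:type", "?d")),
]
facts = {
    ("ann", "ex:parentOf", "bea"),
    ("bea", "ex:parentOf", "cal"),
    ("rex", "rdf:type", "ex:Dog"),
    ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
}
closure = forward_chain(facts, rules)
print(("ann", "ex:ancestorOf", "cal") in closure)   # True
print(("rex", "rdf:type", "ex:Animal") in closure)  # True
```

These rules never introduce new terms, so the set of possible triples is finite and the loop must stop. That guarantee can disappear if the rules are allowed function terms, or if a recursive rule like the second one is run Prolog-style by backward chaining. This is the kind of risk the text argues is worth accepting.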
I think that what the semantic web needs is two rather different things, put together in a new way. It needs a content language whose sole function is to express, transmit and store propositions in a form that permits easy use by engines of one kind or another. There is no need to place restrictions or guards on this language, and it should be compact, easy to use, expressive and syntactically simple. The W3C basic standard is RDF, which is a good start, but nowhere near expressive enough. The best starting-point for such a content language is something like a simple version of KIF, though with an XML-style syntax instead of KIF's now archaic (though still elegant) LISP-based format. Subsets of this language can be described which are equivalent to DLs, but there really is no need to place elaborate syntactic boundaries on the language itself to prevent users from saying too much. Almost none of them will, in any case.
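As a rough illustration of the KIF suggestion, the Python sketch below takes one abstract sentence and renders it two ways: in KIF's LISP-style prefix notation and in an XML-style form. The XML element names (forall, apply, var, term) are invented for illustration and belong to no standard. The first sentence is the first-order reading of a simple DL subclass axiom ("every Dog is an Animal"), the kind of sentence that lies inside a DL-equivalent subset. The second is a two-variable rule defining one relation from others, which goes beyond simple class axioms.

```python
import xml.etree.ElementTree as ET
from typing import Union

# A sentence is a string (constant, ?variable, or name) or a tuple whose first
# element is a connective, quantifier, or relation name.
Sentence = Union[str, tuple]

def to_kif(s: Sentence) -> str:
    """Render a sentence in KIF-style prefix notation."""
    if isinstance(s, str):
        return s
    return "(" + " ".join(to_kif(part) for part in s) + ")"

def to_xml(s: Sentence) -> ET.Element:
    """Render the same sentence in a hypothetical XML encoding."""
    if isinstance(s, str):
        elem = ET.Element("var" if s.startswith("?") else "term")
        elem.text = s
        return elem
    head = s[0]
    if head in ("forall", "exists"):
        # (forall (?x ...) body): a variable list followed by one body sentence.
        elem = ET.Element(head)
        for var in s[1]:
            elem.append(to_xml(var))
        elem.append(to_xml(s[2]))
        return elem
    elem = ET.Element("apply", {"op": head})
    for arg in s[1:]:
        elem.append(to_xml(arg))
    return elem

# "Every Dog is an Animal": the first-order reading of a DL subclass axiom.
dog_axiom = ("forall", ("?x",), ("=>", ("Dog", "?x"), ("Animal", "?x")))
# A rule defining one relation from others.
mother_rule = ("forall", ("?x", "?y"),
               ("=>", ("and", ("parent", "?x", "?y"), ("female", "?x")),
                      ("mother", "?x", "?y")))

for sentence in (dog_axiom, mother_rule):
    print(to_kif(sentence))
    print(ET.tostring(to_xml(sentence), encoding="unicode"))
```

In this representation, restricting authors to a DL-equivalent subset would be a separate check layered on top. The syntax itself does not need to build it in.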
An aside on logic. There is a widespread misapprehension that logic is 'difficult' - like calculus is supposed to be in American high schools. In fact, basic logic is easier to use and understand than description logics; it has a simpler syntax, it has simpler inference processes, and it is closer to natural language. While there are some subtle aspects of logic, one is not obliged to use them or even to consider them.
The second thing that the semantic web needs is a programming language, or perhaps even a suite of tools in an existing programming language, for manipulating the content. The current DAML/OIL/WOL standards get these two aspects jumbled up with one another: the content is all tangled up with limitations that are in place to protect the code (which is hidden inside the inference engines, but still needs protecting). What we need to do is find a way to give the code to the world as well as the content, so that the planet-wide community of programmers can get started on making ingenious tools to manipulate content. I confess that I do not know how to do this, but I am sure we are going about it the wrong way at present. And if anyone has any good ideas, I'd love to hear about them.
Pat Hayes
IHMC, University of West Florida
--
IHMC                      (850)434 8903 home
40 South Alcaniz St.      (850)202 4416 office
Pensacola, FL 32501       (850)202 4440 fax
phayes@ai.uwf.edu
http://www.coginst.uwf.edu/~phayes