Semantic Web Standards
Slides based on Ian Horrocks' class
Where we are Today: the Syntactic Web
[Hendler & Miller 02]
The Syntactic Web is
  • A hypermedia, a digital library
  • A library of documents (called web pages)
    interconnected by a hypermedia of links
  • A database, an application platform
  • A common portal to applications accessible
    through web pages, and presenting their results
    as web pages
  • A platform for multimedia
  • BBC Radio 4 anywhere in the world! Terminator 3
  • A naming scheme
  • Unique identity for those documents
  • A place where computers do the presentation
    (easy) and people do the linking and interpreting
  • Why not get computers to do more of the hard
    work?

[Goble 03]
Hard Work using the Syntactic Web
Find images of Peter Patel-Schneider, Frank van
Harmelen and Alan Rector
Rev. Alan M. Gates, Associate Rector of the
Church of the Holy Spirit, Lake Forest, Illinois
Impossible (?) using the Syntactic Web
  • Complex queries involving background knowledge
  • Find information about animals that use sonar
    but are not either bats or dolphins
  • Locating information in data repositories
  • Travel enquiries
  • Prices of goods and services
  • Results of human genome experiments
  • Finding and using web services
  • Visualise surface interactions between two
  • Delegating complex tasks to web agents
  • Book me a holiday next weekend somewhere warm,
    not too far away, and where they speak French or

What is the Problem?
  • Consider a typical web page
  • Markup consists of
  • rendering information (e.g., font size and
  • Hyper-links to related content
  • Semantic content is accessible to humans but not
    (easily) to computers

What information can we see?
  • WWW2002
  • The eleventh international world wide web
  • Sheraton waikiki hotel
  • Honolulu, hawaii, USA
  • 7-11 may 2002
  • 1 location 5 days learn interact
  • Registered participants coming from
  • australia, canada, chile denmark, france,
    germany, ghana, hong kong, india, ireland, italy,
    japan, malta, new zealand, the netherlands,
    norway, singapore, switzerland, the united
    kingdom, the united states, vietnam, zaire
  • Register now
  • On the 7th May Honolulu will provide the backdrop
    of the eleventh international world wide web
    conference. This prestigious event
  • Speakers confirmed
  • Tim berners-lee
  • Tim is the well known inventor of the Web,
  • Ian Foster
  • Ian is the pioneer of the Grid, the next
    generation internet

What information can a machine see?
  • WWW2002
  • The eleventh international world wide web
  • Sheraton waikiki hotel
  • Honolulu, hawaii, USA
  • 7-11 may 2002
  • 1 location 5 days learn interact
  • Registered participants coming from
  • australia, canada, chile denmark, france,
    germany, ghana, hong kong, india, ireland, italy,
    japan, malta, new zealand, the netherlands,
    norway, singapore, switzerland, the united
    kingdom, the united states, vietnam, zaire
  • Register now
  • On the 7th May Honolulu will provide the backdrop
    of the eleventh international world wide web
    conference. This prestigious event
  • Speakers confirmed
  • Tim berners-lee
  • Tim is the well known inventor of the Web,
  • Ian Foster
  • Ian is the pioneer of the Grid, the next
    generation internet

Solution: XML markup with meaningful tags?
  • <name>WWW2002
  • The eleventh international world wide
  • <location>Sheraton waikiki hotel
  • Honolulu, hawaii, USA</location>
  • <date>7-11 may 2002</date>
  • <slogan>1 location 5 days learn interact</slogan>
  • <participants>Registered participants coming from
  • australia, canada, chile denmark, france,
    germany, ghana, hong kong, india, ireland, italy,
    japan, malta, new zealand, the netherlands,
    norway, singapore, switzerland, the united
    kingdom, the united states, vietnam,
  • <introduction>Register now
  • On the 7th May Honolulu will provide the backdrop
    of the eleventh international world wide web
    conference. This prestigious event
  • Speakers confirmed</introduction>
  • <speaker>Tim berners-lee</speaker>
  • <bio>Tim is the well known inventor of the

But What About
  • <conf>WWW2002
  • The eleventh international world wide
  • <place>Sheraton waikiki hotel
  • Honolulu, hawaii, USA</place>
  • <date>7-11 may 2002</date>
  • <slogan>1 location 5 days learn interact</slogan>
  • <participants>Registered participants coming from
  • australia, canada, chile denmark, france,
    germany, ghana, hong kong, india, ireland, italy,
    japan, malta, new zealand, the netherlands,
    norway, singapore, switzerland, the united
    kingdom, the united states, vietnam,
  • <introduction>Register now
  • On the 7th May Honolulu will provide the backdrop
    of the eleventh international world wide web
    conference. This prestigious event
  • Speakers confirmed</introduction>
  • <speaker>Tim berners-lee</speaker>
  • <bio>Tim is the well known inventor of the Web,

Machine sees
  • <name>WWW2002
  • The eleventh international world wide webc</name>
  • <location>Sheraton waikiki hotel
  • Honolulu, hawaii, USA</location>
  • <date>7-11 may 2002</date>
  • <slogan>1 location 5 days learn interact</slogan>
  • <participants>Registered participants coming from
  • australia, canada, chile denmark, france,
    germany, ghana, hong kong, india, ireland, italy,
    japan, malta, new zealand, the netherlands,
    norway, singapore, switzerland, the united
    kingdom, the united states, vietnam,
  • <introduction>Register now
  • On the 7th May Honolulu will provide the backdrop
    of the eleventh international world wide web
    conference. This prestigious event
  • Speakers confirmed</introduction>
  • <speaker>Tim berners-lee</speaker>
  • <bio>Tim is the well known inventor of the
  • <speaker>Ian Foster</speaker>
  • <bio>Ian is the pioneer of the Grid, the ne</bio>

Need to Add Semantics
  • External agreement on meaning of annotations
  • E.g., Dublin Core
  • Agree on the meaning of a set of annotation tags
  • Problems with this approach
  • Inflexible
  • Limited number of things can be expressed
  • Use Ontologies to specify meaning of annotations
  • Ontologies provide a vocabulary of terms
  • New terms can be formed by combining existing
    ones
  • Meaning (semantics) of such terms is formally
    specified
  • Can also specify relationships between terms in
    multiple ontologies

History of the Semantic Web
  • Web was invented by Tim Berners-Lee (amongst
    others), a physicist working at CERN
  • TBL's original vision of the Web was much more
    ambitious than the reality of the existing
    (syntactic) Web
  • TBL (and others) have since been working towards
    realising this vision, which has become known as
    the Semantic Web
  • E.g., article in May 2001 issue of Scientific
    American

Scientific American, May 2001
Beware of the Hype
  • Hype seems to suggest that Semantic Web means
    semantics + web = AI
  • A new form of Web content that is meaningful to
    computers will unleash a revolution of new
  • More realistic to think of it as meaning
    semantics + web + AI = more useful web
  • Realising the complete vision is too hard for
    now (probably)
  • But we can make a start by adding semantic
    annotation to web resources

Images from Christine Thompson and David Booth
Web Schema Languages
  • Existing Web languages extended to facilitate
    content description
  • XML → XML Schema (XMLS)
  • RDF → RDF Schema (RDFS)
  • XMLS not an ontology language
  • Changes format of DTDs (document schemas) to be
  • Adds an extensible type hierarchy
  • Integers, Strings, etc.
  • Can define sub-types, e.g., positive integers
  • RDFS is recognisable as an ontology language
  • Classes and properties
  • Sub/super-classes (and properties)
  • Range and domain (of properties)

  • RDF stands for Resource Description Framework
  • It is a W3C candidate recommendation
  • RDF is a graphical formalism (plus an XML syntax)
  • for representing metadata
  • for describing the semantics of information in a
    machine-accessible way
  • RDFS extends RDF with schema vocabulary, e.g.
  • Class, Property
  • type, subClassOf, subPropertyOf
  • range, domain

The RDF Data Model
  • Statements are <subject, predicate, object> triples
  • E.g., <Ian,hasColleague,Uli>
  • Can also be represented using an XML serialisation
  • Statements describe properties of resources
  • A resource is a URI representing a (class of)
  • a document, a picture, a paragraph on the Web
  • http//
  • a book in the library, a real person (?)
  • isbn//5031-4444-3333
  • Properties themselves are also resources (URIs)
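
The data model above can be sketched in a few lines of Python. This is only an illustration, not part of any RDF library: the `Triple` type, the `properties_of` helper, and the `some.uri/vocab/hasColleague` URI are hypothetical names chosen to match the "some.uri/person/..." style used elsewhere in these slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: <subject, predicate, object>."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs for illustration; the predicate is itself a resource (a URI).
IAN = "some.uri/person/ian_horrocks"
ULI = "some.uri/person/uli_sattler"
HAS_COLLEAGUE = "some.uri/vocab/hasColleague"

statements = {Triple(IAN, HAS_COLLEAGUE, ULI)}


def properties_of(resource, triples):
    """Return the (predicate, object) pairs of every statement about `resource`."""
    return sorted((t.predicate, t.obj) for t in triples if t.subject == resource)
```

Calling `properties_of(IAN, statements)` lists the single property asserted about Ian, while `properties_of(ULI, statements)` is empty, because nothing has been stated with Uli as the subject.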

  • URI = Uniform Resource Identifier
  • "The generic set of all names/addresses that are
    short strings that refer to resources"
  • URIs may or may not be dereferenceable
  • URLs (Uniform Resource Locators) are a particular
    type of URI, used for resources that can be
    accessed on the WWW (e.g., web pages)
  • In RDF, URIs typically look like normal URLs,
    often with fragment identifiers to point at
    specific parts of a document
  • http//

Linking Statements
  • The subject of one statement can be the object of
    another
  • Such collections of statements form a directed,
    labeled graph
  • Note that the object of a triple can also be a
    literal (a string)
  • Note also that RDF triples don't by themselves
    give meaning
  • You know that (1) Ian and Carole are most likely
    colleagues (barring multiple jobs for Uli) and (2)
    (Uli hasColleague Ian) holds (colleagueness,
    unlike love, is symmetric). But DOES YOUR
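
A minimal sketch of this point, assuming triples are stored as plain Python tuples with short names instead of full URIs. The helper names `build_graph` and `symmetric_closure` are illustrative only. The stored graph contains exactly what was asserted; the reverse statement appears only once a symmetry rule is applied explicitly.

```python
from collections import defaultdict

# The two asserted statements, with short names standing in for URIs.
triples = {
    ("Ian", "hasColleague", "Uli"),
    ("Carole", "hasColleague", "Uli"),
}


def build_graph(triples):
    """Directed, labelled graph: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for s, p, o in sorted(triples):
        graph[s].append((p, o))
    return dict(graph)


def symmetric_closure(triples, symmetric_predicates):
    """Add <o, p, s> for every <s, p, o> whose predicate is declared symmetric."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in symmetric_predicates:
            inferred.add((o, p, s))
    return inferred
```

Without the explicit rule, `("Uli", "hasColleague", "Ian")` is simply absent from `triples`. Concluding that Ian and Carole are colleagues would need yet another rule, which this sketch does not attempt.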

RDF Syntax
  • RDF has an XML syntax that has a specific
    meaning
  • Every Description element describes a resource
  • Every attribute or nested element inside a
    Description is a property of that Resource with
    an associated object resource
  • Resources are referred to using URIs
  • <Description about="some.uri/person/ian_horrocks">
  • <hasColleague resource="some.uri/person/uli_sa
  • </Description>
  • <Description about="some.uri/person/uli_sattler">
  • <hasHomePage>http//
  • </Description>
  • <Description about="some.uri/person/carole_goble">
  • <hasColleague resource="some.uri/person/uli_sa
  • </Description>
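
The reading rules above (each Description describes a resource, and each nested element is a property of it) can be sketched with Python's standard XML parser. This is a simplified illustration under stated assumptions: the wrapping `<RDF>` element, the completed `uli_sattler` resource value, and the `http://example.org/uli` home page are placeholders, and real RDF/XML uses namespaced `rdf:Description` and `rdf:about`, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Simplified, un-namespaced document modelled on the slide (placeholder values).
DOC = """
<RDF>
  <Description about="some.uri/person/ian_horrocks">
    <hasColleague resource="some.uri/person/uli_sattler"/>
  </Description>
  <Description about="some.uri/person/uli_sattler">
    <hasHomePage>http://example.org/uli</hasHomePage>
  </Description>
</RDF>
"""


def to_triples(xml_text):
    """Turn each nested property element into a (subject, predicate, object) tuple."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for desc in root.findall("Description"):
        subject = desc.get("about")
        for prop in desc:
            # A `resource` attribute names another resource; otherwise the
            # element text is taken as a literal value.
            obj = prop.get("resource")
            if obj is None:
                obj = (prop.text or "").strip()
            triples.append((subject, prop.tag, obj))
    return triples
```

Each `Description` contributes one triple per nested element, so the document above yields two statements.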

RDF Schema (RDFS)
  • RDF gives a formalism for meta data annotation,
    and a way to write it down in XML, but it does
    not give any special meaning to vocabulary such
    as subClassOf or type
  • Interpretation is an arbitrary binary relation
  • I.e., <Person,subClassOf,Animal> has no special
    meaning
  • RDF Schema defines schema vocabulary that
    supports definition of ontologies
  • gives extra meaning to particular RDF
    predicates and resources (such as subClassOf)
  • this extra meaning, or semantics, specifies how
    a term should be interpreted

Background Theory
RDF Schema is really RDF background knowledge!
RDF/RDFS vs. General Knowledge Rep & Reasoning
  • We noted that RDF can be seen as base level
    facts and RDFS can be seen as background
    knowledge
  • At this level, inference with RDF/RDFS seems to
    be just a special case of Knowledge
    Representation & Reasoning
  • This is good (CSE471 Ahoy!) and bad (reasoning
    over most non-trivial logics is NP-hard or much
    much worse).
  • RDF/RDFS can be seen as an attempt to limit the
    complexity of reasoning by limiting the
    expressiveness of the language
  • RDF/RDFS together can be seen as capturing a
    certain tractable subset of First Order Logic
  • ...already there is trouble in paradise, with
    people complaining that the expressiveness is not
    enough
  • Enter OWL, which attempts to provide
    expressiveness equivalent to description logics
    (a sort of inheritance reasoning in first-order
    logic)
Problems with RDFS
  • RDFS too weak to describe resources in sufficient
    detail
  • No localised range and domain constraints
  • Can't say that the range of hasChild is person
    when applied to persons and elephant when applied
    to elephants
  • No existence/cardinality constraints
  • Can't say that all instances of person have a
    mother that is also a person, or that persons
    have exactly 2 parents
  • No transitive, inverse or symmetrical properties
  • Can't say that isPartOf is a transitive property,
    that hasPart is the inverse of isPartOf or that
    touches is symmetrical
  • Difficult to provide reasoning support
  • No native reasoners for non-standard semantics
  • May be possible to reason via FO axiomatisation

RDFS Examples
  • RDF Schema terms (just a few examples)
  • Class
  • Property
  • type
  • subClassOf
  • range
  • domain
  • These terms are the RDF Schema building blocks
    (constructors) used to create vocabularies
  • <Person,type,Class>
  • <hasColleague,type,Property>
  • <Professor,subClassOf,Person>
  • <Carole,type,Professor>
  • <hasColleague,range,Person>
  • <hasColleague,domain,Person>
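
The extra meaning RDFS gives to subClassOf, domain and range can be sketched as a few inference rules applied until nothing new follows. This is an illustrative subset of RDFS reasoning, not a complete reasoner. The function name `rdfs_closure` is hypothetical, and the fact `<Carole,hasColleague,Uli>` is added here only so that the domain and range rules have something to apply to.

```python
def rdfs_closure(triples):
    """Apply a small subset of RDFS-style rules until a fixpoint is reached."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "subClassOf":
                for s2, p2, o2 in facts:
                    # subClassOf is transitive
                    if p2 == "subClassOf" and s2 == o:
                        new.add((s, "subClassOf", o2))
                    # instances of a subclass are instances of the superclass
                    if p2 == "type" and o2 == s:
                        new.add((s2, "type", o))
            elif p == "domain":
                # the subject of any statement using property `s` has type `o`
                new.update((s2, "type", o) for s2, p2, _ in facts if p2 == s)
            elif p == "range":
                # the object of any statement using property `s` has type `o`
                new.update((o2, "type", o) for _, p2, o2 in facts if p2 == s)
        if new <= facts:
            return facts
        facts |= new


schema_and_facts = {
    ("Person", "type", "Class"),
    ("hasColleague", "type", "Property"),
    ("Professor", "subClassOf", "Person"),
    ("Carole", "type", "Professor"),
    ("hasColleague", "range", "Person"),
    ("hasColleague", "domain", "Person"),
    ("Carole", "hasColleague", "Uli"),  # added fact, for illustration only
}
entailed = rdfs_closure(schema_and_facts)
```

Here `entailed` additionally contains `("Carole", "type", "Person")` (via subClassOf and via domain) and `("Uli", "type", "Person")` (via range), neither of which was stated directly.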

RDF/RDFS Liberality
  • No distinction between classes and instances
  • <Species,type,Class>
  • <Lion,type,Species>
  • <Leo,type,Lion>
  • Properties can themselves have properties
  • <hasDaughter,subPropertyOf,hasChild>
  • <hasDaughter,type,familyProperty>
  • No distinction between language constructors and
    ontology vocabulary, so constructors can be
    applied to themselves/each other
  • <type,range,Class>
  • <Property,type,Class>
  • <type,subPropertyOf,subClassOf>

RDF Schema is now being superseded by OWL
Web Ontology Language Requirements
  • Desirable features identified for Web Ontology
    Language
  • Extends existing Web standards
  • Such as XML, RDF, RDFS
  • Easy to understand and use
  • Should be based on familiar KR idioms
  • Formally specified
  • Of adequate expressive power
  • Possible to provide automated reasoning support

From RDF to OWL
  • Two languages developed to satisfy above
    requirements
  • OIL developed by group of (largely) European
    researchers (several from EU OntoKnowledge
    project)
  • DAML-ONT developed by group of (largely) US
    researchers (in DARPA DAML programme)
  • Efforts merged to produce DAML+OIL
  • Development was carried out by Joint EU/US
    Committee on Agent Markup Languages
  • Extends (DL subset of) RDF
  • DAML+OIL submitted to W3C as basis for
    standardisation
  • Web-Ontology (WebOnt) Working Group formed
  • WebOnt group developed OWL language based on
    DAML+OIL
  • OWL language now a W3C Recommendation (i.e., a
    standard like HTML and XML)

OWL Language
  • Three species of OWL
  • OWL Full is union of OWL syntax and RDF
  • OWL DL restricted to FOL fragment (≈ DAML+OIL)
  • OWL Lite is an easier-to-implement subset of OWL DL
  • Semantic layering
  • OWL DL ≈ OWL Full within DL fragment
  • DL semantics officially definitive
  • OWL DL based on SHIQ Description Logic
  • In fact it is equivalent to SHOIN(Dn) DL
  • OWL DL Benefits from many years of DL research
  • Well defined semantics
  • Formal properties well understood (complexity,
  • Known reasoning algorithms
  • Implemented systems (highly optimised)

Layer 4½ Mapping Between Ontologies
  • Taxonomy Crisis
  • How can your agent know that my title is your
  • How can my agent know that some of your address
    objects are post-boxes, not physical addresses?!
  • How can my agent know that many Asian first names
    correspond to Western surnames?
  • Semantic Web Solution: Services for
    translating/mapping between related ontologies.
  • Suppose uses Dublin Core (title),
    while Fred Hanna uses its own document ontology
    (name). So far my agent is forced to choose
    an ontology, or must be carefully crafted to
    understand both languages
  • A better solution: A niche now exists for an
    independent entity that
    maps title → name etc.
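
A mapping service of this kind can be sketched as a simple term-to-term translation table. Only the title/name pair comes from the slide; the `translate` helper, the `invert` helper and the sample record are hypothetical, and a real mapping between ontologies would need far more than renaming keys.

```python
# Only this pair is taken from the slide: Dublin Core "title" vs. the shop's "name".
DUBLIN_CORE_TO_SHOP = {"title": "name"}


def invert(mapping):
    """Build the reverse mapping (assumes the mapping is one-to-one)."""
    return {target: source for source, target in mapping.items()}


def translate(record, mapping):
    """Rename the keys of `record` using `mapping`; unknown keys pass through."""
    return {mapping.get(key, key): value for key, value in record.items()}


# Hypothetical record described with the Dublin Core term.
dc_record = {"title": "War and Peace"}
shop_record = translate(dc_record, DUBLIN_CORE_TO_SHOP)
```

With the table held by an independent service, neither agent has to be crafted to understand both vocabularies; each translates through the shared mapping, and `invert` gives the translation in the other direction.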

[Figure: Nick wants to buy War & Peace. Diagram labels: Nick's very complicated agent, programmer's bank account, Fred Hanna; then Nick's agent, Joe's agent, Jane's agent, Fred Hanna, bank account.]

(In)famous Layer Cake
  • Semantics + reasoning
  • Relational Data
  • Data Exchange
  • Relationship between layers is not clear
  • OWL DL extends DL subset of RDF

Who will annotate the data?
  • Semantic web works if the users annotate their
    pages using some existing ontology (or their own
    ontology, but with mapping to other ontologies)
  • But users typically do not conform to standards...
  • and are not patient enough for delayed
    gratification
  • Three Solutions
  • 1. Intercede in the way pages are created (act as
    if you are helping them write web-pages)
  • What if we change the MS Frontpage/Claris
    Homepage so that they (slyly) add annotations?
  • E.g. The Mangrove project at U. Wash.
  • Help user in tagging their data (allow graphical
  • Provide instant gratification by running services
    that use the tags.
  • 2. Collaborative tagging!
  • Folksonomies (look at Wikipedia article)
  • FLICKR, Technorati, etc
  • 3. Automated information extraction (next topic)

Folksonomies: The good
  • Bottom-up approach to taxonomies/ontologies
  • In systems like Furl, Flickr and
    people classify their pictures/bookmarks/web
    pages with tags (e.g. wedding), and then the most
    popular tags float to the top (e.g. Flickr's tags
    or on the right)....
  • Folksonomies can work well for certain kinds of
    information because they offer a small reward for
    using one of the popular categories (such as your
    photo appearing on a popular page). People who
    enjoy the social aspects of the system will
    gravitate to popular categories while still
    having the freedom to keep their own lists of

Classic case of research playing catch-up with
practice :-)
Works best when Many people Tag the same Info
Folksonomies: The bad
  • On the other hand, not hard to see a few reasons
    why a folksonomy would be less than ideal in a
    lot of cases
  • None of the current implementations have synonym
    control (e.g. "selfportrait" and "me" are
    distinct Flickr tags, as are "mac" and
    "macintosh" on
  • Also, there's a certain lack of precision
    involved in using simple one-word tags--like
    which Lance are we talking about? (Though this is
    great for discovery, e.g. hot or Edmonton)
  • And, of course, there's no hierarchy and the
    content types (bookmarks, photos) are fairly
  • For indexing and library people, folksonomies are
    about as appealing as Wikipedia is to
    encyclopedia editors.
  • But.. there's some interesting stuff happening
    around them.
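
The "popular tags float to the top" behaviour and the missing synonym control can both be sketched with a tag counter. This is a hedged illustration: the `popular_tags` function, the sample items and the synonym table are hypothetical, and it does not describe how any of the named sites actually work.

```python
from collections import Counter


def popular_tags(tagged_items, synonyms=None, top=3):
    """Count how many items carry each tag, after optional synonym merging."""
    synonyms = synonyms or {}
    counts = Counter()
    for tags in tagged_items.values():
        # Map each tag to its canonical form and count it once per item.
        counts.update({synonyms.get(tag, tag) for tag in tags})
    return counts.most_common(top)


# Hypothetical items, reusing the tag names mentioned on the slides.
items = {
    "photo1": ["wedding", "me"],
    "photo2": ["wedding", "selfportrait"],
    "photo3": ["mac"],
    "bookmark1": ["macintosh"],
}

raw = dict(popular_tags(items, top=10))
merged = dict(
    popular_tags(items, synonyms={"me": "selfportrait", "macintosh": "mac"}, top=10)
)
```

In `raw`, "me" and "selfportrait" (and "mac" and "macintosh") are counted as separate tags; in `merged`, the hand-written synonym table folds each pair into one tag. Someone still has to supply that table, which is exactly the control a pure folksonomy lacks.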

Collaborative Computing AKA Brain Cycle
Stealing AKA Computizing Eyeballs
  • A lot of exciting research related to web
    currently involves co-opting the masses to help
    with large-scale tasks
  • It is like cycle stealing, except we are
    stealing human brain cycles (the most idle of
    the computers if there is ever one :-)
  • Remember the mice in the Hitchhiker's Guide to
    the Galaxy? (..who were running a mass-scale
    experiment on the humans to figure out the
  • Collaborative knowledge compilation (wikipedia!)
  • Collaborative Curation
  • Collaborative tagging
  • Many big open issues
  • How do you pose the problem such that it can be
    solved using collaborative computing?
  • How do you incentivize people into letting you
    steal their brain cycles?
