Semantic Basics: Markup, Querying, and Reasoning

1
Semantic Basics: Markup, Querying, and Reasoning
  • Marlon Pierce
  • Community Grids Lab
  • Indiana University
  • With Slides and Help from Sean Bechhofer, Carole
    Goble, Line Pouchard, and Dave De Roure

2
Preface: Beyond XML
3
Reductio ad Absurdum
  • Physics is the study of the harmonic
    oscillator.
  • H. L. Richards
  • Statistical Mechanics is the study of the Ising
    Model
  • H. L. Richards
  • Web Service standards are the study of <xsd:any>
    sequences
  • M. E. Pierce, soon to be anonymous

4
Which Web Service Specs?
Left:
    <xs:element name="Header" type="tns:Header" />
    <xs:complexType name="Header">
      <xs:sequence>
        <xs:any namespace="##any" processContents="lax"
                minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:anyAttribute namespace="##other"
                       processContents="lax" />
    </xs:complexType>
Right:
    <xsd:complexType name="SecurityHeaderType">
      <xsd:sequence>
        <xsd:any processContents="lax"
                 minOccurs="0"
                 maxOccurs="unbounded">
        </xsd:any>
      </xsd:sequence>
      <xsd:anyAttribute namespace="##other"
                        processContents="lax" />
    </xsd:complexType>

5
Which, What, and Why?
  • Which is what?
  • Left is the definition of the SOAP header.
  • Right is taken from Web Service Secure Messaging
    Specification.
  • You will find this pattern repeated pretty often
    in web service specifications.
  • Why?
  • We have limited ways of linking several XML
    schema data models.
  • Imagine schemas for science applications and
    computing resources.
  • XML maps relationships to trees.
  • Link application and computer schemas with
    <xsd:any>.
  • In my application+computer schema, does
    application contain computer as a child node, or
    vice versa?
  • Graphs are a more natural way of expressing many
    inter-relationships of concepts.

6
XML is not enough
The Creator of the Resource
http://www.w3.org/Home/Lassila is Ora Lassila
  • XML defines grammars to verify and structure
    documents
  • The grammar enforces constraints on tags
  • Different grammars define the same content
  • XML lacks a semantic model; it only has a
    surface model, which is a tree.

7
XML is not enough
  • The meaning of XML documents is intuitively clear
    to people
  • "Semantic" markup: tags are domain terms
  • But computers do not have intuition
  • Tag names per se do not provide semantics
  • The semantics are encoded outside the XML
    specification
  • XML makes no commitment on
  • (1) Domain-specific ontological vocabulary
  • (2) Ontological modeling primitives
  • So XML requires pre-arranged agreement on (1) and (2)
  • Feasible for closed collaboration
  • agents in a small stable community
  • pages on a small stable intranet
  • Semantic Web Markups often are expressed in XML
    but they carry extra meaning.

8
Enter the Semantic Web/Grid
  • The Semantic Web is the representation of data
    on the World Wide Web. It is a collaborative
    effort led by W3C with participation from a large
    number of researchers and industrial partners. It
    is based on the Resource Description Framework
    (RDF), which integrates a variety of applications
    using XML for syntax and URIs for naming.

9
The Semantic Stack
10
Semantic Markups
  • All semantic markup languages should be
    understood as assertion languages.
  • We will assert that certain relationships between
    resources exist.
  • We will express these assertions in RDF, RDFS,
    and OWL, serialized as XML.
  • We must still provide tools for processing (and
    verifying) the assertions.

11
Resource Description Framework
  • Overview of RDF basic ideas and XML encoding.

12
Resource Description Framework (RDF)
  • RDF is the simplest of the semantic languages.
  • Basic Idea 1: Triples
  • RDF is based on a subject-verb-object statement
    structure.
  • RDF subjects are called resources.
  • Verbs (predicates) are called properties.
  • Objects (values) may be simple literals or other
    resources.
  • Basic Idea 2: Everything is a resource that is
    named with a URI
  • RDF subjects, verbs, and non-literal objects are
    all labeled with URIs
  • Recall that a URI is just a name for a resource.
  • It may be a URL, but not necessarily.
  • A URI can name anything that can be described
  • Web pages, creators of web pages, organizations
    that the creator works for, ...
13
RDF Graph Model
  • RDF is defined by a graph model.
  • Resources are denoted by ovals (nodes).
  • Lines (arcs) indicate properties.
  • Squares indicate string literals (no URI).
  • Resources and properties are labeled by a URI.

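[Figure: the resource http://.../CMCS/Entries/X has two arcs. The arc
http://purl.org/dc/elements/1.1/creator points to the resource
http://.../CMCS/People/DrY; the arc http://purl.org/dc/elements/1.1/title
points to the string literal "H2O".]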
14
Encoding RDF in XML
  • The graph represents two statements.
  • Entry X has a creator, Dr. Y.
  • Entry X has a title, H2O.
  • In RDF XML, we have the following tags
  • <RDF> </RDF> denote the beginning and end of the
    RDF description.
  • <Description>'s about attribute identifies the
    subject of the sentence.
  • <Description></Description> enclose the
    properties and their values.
  • We import Dublin Core conventional properties
    (creator, title) from outside RDF proper.

15
RDF XML: The Gory Details
    <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
             xmlns:dc='http://purl.org/dc/elements/1.1/'>
      <rdf:Description rdf:about='http://.../X'>
        <dc:creator rdf:resource='http://.../people/MEP'/>
        <dc:title>H2O</dc:title>
      </rdf:Description>
    </rdf:RDF>
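To make the mapping from RDF XML back to statements concrete, here is a minimal sketch that reads a flat `rdf:Description` block with Python's standard `xml.etree.ElementTree`. It is not a real RDF parser (it ignores nesting, containers, and most of the syntax), and the `example.org` URIs and the `simple_triples` helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical document in the same shape as the slide's example.
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/CMCS/Entries/X">
    <dc:creator rdf:resource="http://example.org/CMCS/People/DrY"/>
    <dc:title>H2O</dc:title>
  </rdf:Description>
</rdf:RDF>
"""


def simple_triples(rdf_xml):
    """Yield (subject, predicate, object) for flat rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree stores tags as "{namespace}local"; joining the
            # two parts gives the property URI.
            namespace, local = prop.tag[1:].split("}")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # rdf:resource means the object is a resource; otherwise the
            # element text is a literal value.
            value = resource if resource is not None else (prop.text or "").strip()
            yield subject, namespace + local, value


triples = list(simple_triples(RDF_XML))
```

Each child element of the description becomes one triple whose predicate is the namespace URI joined with the tag name.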

16
Encoding RDF as Triplets
  • In addition to graphs and XML, RDF may be written
    as triple sentences.
  • A triple is just the subject, predicate, and
    object (in that order) of a graph segment.
  • <http://.../CMCS/Entries/X>
    <http://purl.org/dc/elements/1.1/creator>
    <http://.../CMCS/People/DrY>
  • This structure may look trivial but is useful in
    expressing queries (more later).

17
Creating RDF Documents
  • Writing RDF XML (or DAML or OWL) by hand is not
    easy.
  • It's a good way to learn to read/write, but after
    you understand it, automate it.
  • Authoring tools are available
  • OntoMat: buggy
  • Protégé: preferred by CGL grad students
  • IsaViz: another nice tool with very good
    graphics.
  • You can also generate these programmatically
    using Hewlett-Packard Labs' Jena toolkit for
    Java.
  • This is what I did in the previous example.

18
What is the Advantage?
  • So far, properties are just conventional URI
    names.
  • All semantic web properties are conventional
    assertions about relationships between resources.
  • RDFS and OWL will offer more precise property
    capabilities.
  • But there is a powerful feature we are about to
    explore
  • Properties provide a powerful way of linking
    different RDF resources
  • Nuggets of information.
  • For example, a publication is a resource that can
    be described by RDF
  • Author, publication date, URL are all metadata
    property values.
  • But publications have references that are just
    other publications
  • DC's hasReference can be used to point from one
    publication to another.
  • Publications also have authors
  • An author is more than a name
  • Also an RDF resource with collections of
    properties
  • Name, email, telephone number, ...
19
Graph Model Depicting vCard and DC Linking
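[Figure: the resource http://.../CMCS/Entry/1 has dc:title "H2O" and
dc:creator http://.../People/DrY. The resource http://.../People/DrY has
vcard:EMAIL dry_at_stateu.edu and vcard:N, which in turn has vcard:Family
and vcard:Given.]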
20
What Else Does RDF Do?
  • Collections: typically used as the object of an
    RDF statement
  • Bag: unordered collection of resources or
    literals.
  • Sequence: ordered collection of resources or
    literals.
  • Alternative: collection of resources or literals,
    from which only one value may be chosen
  • And that's about it. RDF does not define
    properties, it just tells you where to put them.
  • Definitions are done by specific groups for
    specific fields (Dublin Core Metadata Initiative,
    for example).
  • RDF Schema provides the rules for defining
    specific resource classes and properties.
  • But the graph model has opened some doors
  • Linked querying across data models.
  • Reasoning about information

21
RDF Schema
22
RDF Schema
  • RDF Schema is a rules system for building RDF
    languages.
  • RDF and RDFS are defined in terms of RDFS
  • DAML+OIL and OWL are defined by RDFS.
  • Take our Dublin Core RDF encoding as an example
  • Can we formalize this process, defining a
    consistent set of rules?
  • Previous example was valid RDF but how do I
    formalize the process of writing sentences about
    creators of entries?
  • Can we place restrictions and use inheritance to
    define resources?
  • What really is the value of creator? Can I
    derive it from another class, like person?
  • Can we provide restrictions and rules for
    properties?
  • How can I express the fact that title should
    only appear once?
  • Current DC encoding in fact is defined by RDFS.

23
Some RDFS Classes (Subjects and Values)
24
Some RDFS Properties
25
Sample RDFS: Defining <Property>
    <rdfs:Class rdf:ID="Property">
      <rdfs:isDefinedBy rdf:resource="http://.../some/uri"/>
      <rdfs:label>Property</rdfs:label>
      <rdfs:comment>The class of RDF properties.</rdfs:comment>
      <rdfs:subClassOf rdf:resource="http://.../Resource"/>
    </rdfs:Class>
  • This is the definition of <Property>, taken from
    the RDF schema.
  • The rdf:ID attribute names this nugget.
  • <Property> has several properties
  • <label> and <comment> are self-explanatory.
  • <subClassOf> means <Property> is a subclass of
    <Resource>
  • <isDefinedBy> points to the human-readable
    documentation.

26
Property Relationships and Simple Reasoning
  • subClassOf
  • Carole is a member of the class <Professor>
  • <Professor> is a subclass of <UniversityEmployee>
  • So Carole works for a university.
  • subPropertyOf
  • Marlon hasSibling Susan
  • hasSibling is a subproperty of hasRelative
  • So Marlon and Susan are related.
  • Domain and Range
  • hasSibling applies to animal subjects and animal
    objects, so Marlon is a member of the class
    <Animal>.
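The three kinds of inference on this slide can be sketched as simple rules applied until nothing new appears. This is a toy forward-chaining loop, not how any particular RDFS reasoner is implemented; the dictionaries and the short names standing in for URIs are assumptions for illustration.

```python
# Schema-level statements, keyed by short names instead of full URIs.
SUBCLASS = {"Professor": "UniversityEmployee"}   # class -> superclass
SUBPROPERTY = {"hasSibling": "hasRelative"}      # property -> superproperty
DOMAIN = {"hasSibling": "Animal"}                # property -> class of subjects
RANGE = {"hasSibling": "Animal"}                 # property -> class of objects

facts = {
    ("Carole", "type", "Professor"),
    ("Marlon", "hasSibling", "Susan"),
}


def infer(facts):
    """Apply the four rules repeatedly until no new triple is produced."""
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            if p == "type" and o in SUBCLASS:
                new.add((s, "type", SUBCLASS[o]))   # subClassOf
            if p in SUBPROPERTY:
                new.add((s, SUBPROPERTY[p], o))     # subPropertyOf
            if p in DOMAIN:
                new.add((s, "type", DOMAIN[p]))     # domain
            if p in RANGE:
                new.add((o, "type", RANGE[p]))      # range
        if new <= known:
            return known
        known |= new


closure = infer(facts)
```

The resulting `closure` contains the original facts plus the derived ones: Carole is a UniversityEmployee, Marlon hasRelative Susan, and both Marlon and Susan are Animals.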

27
Web Ontology Language (OWL)
  • Eeyore: W-O-L. That spells owl.
  • Owl: Bless my soul! So it does!
  • (Many Slides Courtesy of Sean Bechhofer)

28
What's an Ontology?
  • English definitions tend to be vague to
    non-specialists
  • "A formal, explicit specification of a shared
    conceptualization"
  • Clearer definition: an ontology is a taxonomy
    combined with inference rules
  • T. Berners-Lee, J. Hendler, O. Lassila
  • But really, if you sit down to describe a subject
    in terms of its classes and their relationships,
    you are creating an Ontology.

29
RDFS Limitations
  • RDFS is too weak to describe resources in
    sufficient detail
  • No localised range and domain constraints
  • Can't say that the range of hasChild is person
    when applied to persons and elephant when applied
    to elephants
  • No existence/cardinality constraints
  • Can't say that all instances of person have a
    mother that is also a person, or that persons
    have exactly 2 parents
  • No transitive, inverse or symmetrical properties
  • Can't say that isPartOf is a transitive property,
    that hasPart is the inverse of isPartOf or that
    touches is symmetrical
  • Difficult to provide reasoning support
  • No native reasoners for non-standard semantics
  • May be possible to reason via first-order (FO)
    axiomatisation
30
OWL Semantic Layering
  • Three language layers
  • OWL Lite
  • A subset of OWL useful for expressing
    classifications and simple relationships
  • OWL DL (Description Logic)
  • Contains all OWL constructions but with
    limitations that guarantee computational
    completeness and decidability.
  • OWL Full
  • All OWL constructs with no restrictions but no
    guaranteed processibility.
  • Syntactic Layering
  • Semantic Layering
  • Layers should agree on semantics.
  • All legal Lite ontologies are legal DL
    ontologies.
  • All legal DL ontologies are legal Full ontologies

[Figure: nested layers, with Lite inside DL inside Full.]
31
OWL Lite Synopsis
  • Built on RDFS, with usual RDFS classes (see
    previous table in these slides).
  • Includes a special class, <Thing>, that is the
    superclass of all OWL classes.
  • Built-in class <Nothing> that is the most
    specific class (has no instances or subclasses).
  • Built-in class <Individual> for instances of
    classes.
  • In OWL, properties may apply to either
    individuals or to all members of a class.
  • So <worksForIU> applies to Marlon but not Dave.
  • Expresses concepts such as equivalent classes,
    synonymous properties.
  • Allows you to assert that properties can be
    inverse, transitive, and symmetric.

32
Some OWL DL and OWL Full Extensions
  • Class Axioms
  • oneOf: a class can be defined by its members
    (e.g., daysOfWeek defined by its members)
  • An Enumeration class
  • disjointWith
  • More Boolean Relationships
  • unionOf, complementOf, intersectionOf
  • Unrestricted cardinality
  • E.g., daysOfWeek has a cardinality of 7

33
Differences Between DL and Full
  • Both DL and Full use the same OWL vocabulary
  • See previous slide.
  • Difference 1: DL classes and properties cannot
    also be individuals (instances), and vice versa.
  • That is, there is a strict separation between
    type and subClassOf.
  • So if you declare <Merlot> with <rdf:type> <Wine>
    (that is, as an instance of Wine), you can't
    subclass <Merlot> to add additional properties
    in OWL DL.
  • subClass versus instance decisions should be
    made based on the intended use of the ontology.
  • Don't make Merlot an instance if you are
    developing an ontology to describe your wine
    collection, which consists of many bottles of
    Merlot (instances), and you want to use OWL DL
  • Difference 2: All DL properties are required to
    be either
  • owl:ObjectProperty: used to connect instances of
    two classes.
  • owl:DatatypeProperty: used to connect class
    instances with XML schema types and RDF literal
    strings.
  • (OWL Full allows us to tag DatatypeProperties as
    owl:InverseFunctionalProperty, so we can create a
    string literal instance that uniquely identifies
    a class instance.)

34
An OWL Example
  • An Earth Systems Grid example
  • (Courtesy of Line Pouchard)

35
An Example Ontology: Climate Data
  • The example shows how to construct a really
    simple ontology and instance.
  • We don't use it to encode all data but rather to
    encode metadata about data files.
  • Where is the data file (URI) that has the
    temperature associated with this dataset?
  • Two classes
  • dataset
  • parameter
  • One property
  • hasParameter
  • Several parameters: cloud_medium,
    bounds_latitude, temperature
  • Line Pouchard (ORNL) created this for ESG using
    Protégé and OilEd.

36
Let's Begin
  • Front matter: OWL ontologies begin with the
    <Ontology> header.
  • A useful place to put metadata about the
    document.
  • Line uses the Dublin Core to establish
    authorship.
  • Next, define two classes: dataset and parameter.
  • Class definitions are almost trivial.
  • We really state what something is by its
    properties.
  • Deep philosophical arguments here, I'm sure.
  • Most of the work will go into defining the
    property, hasParameter.
  • Begins on bottom of next slide
  • But the full extent of the definition requires a
    separate slide.

37
Ontology header With Dublin Core Parameters.
Class Definitions
hasParameter Definition
38
Defining hasParameter
  • hasParameter domain: it applies to the dataset
    class.
  • hasParameter range: it applies to a list of 3 OWL
    Things
  • cloud_medium, bounds_latitude, and temperature.
  • This is done using the awkward RDF list
    structure.
  • "Give me the first of the rest," recursively,
    until I get to nil
  • These three OWL Things are then defined.
  • They are each of type parameter
  • That is, members of the parameter class.
  • Each may also be further defined by additional
    properties and classes.
  • Temperature has units, for example;
    bounds_latitude needs starting and stopping
    values in decimal degrees, etc.
  • Or it may be out of scope. I may just need to
    know that the bounds_latitude for a particular
    dataset is located in some resource with a
    specific URI.
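The "first of the rest until nil" structure is easier to see in code. The sketch below walks an RDF list stored as plain triples; the blank-node labels `_:b1` to `_:b3` and the `read_list` helper are assumptions for illustration, and the parameter names are used as bare strings rather than full URIs.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

# Each list cell has one rdf:first (the item) and one rdf:rest (the next
# cell). The labels _:b1.._:b3 are hypothetical blank-node names.
triples = [
    ("_:b1", FIRST, "cloud_medium"),    ("_:b1", REST, "_:b2"),
    ("_:b2", FIRST, "bounds_latitude"), ("_:b2", REST, "_:b3"),
    ("_:b3", FIRST, "temperature"),     ("_:b3", REST, NIL),
]


def read_list(head, triples):
    """Follow rdf:first / rdf:rest from `head` until rdf:nil is reached."""
    lookup = {(s, p): o for s, p, o in triples}
    items = []
    node = head
    while node != NIL:
        items.append(lookup[(node, FIRST)])
        node = lookup[(node, REST)]
    return items


parameters = read_list("_:b1", triples)
```

Three items cost six triples, which is why the slide calls the structure awkward to write by hand.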

39
Parameter Cloud_medium
Parameter Bounds_latitude
Parameter temperature
40
Finally, Apply It to Something
  • What is the file PCM.B06.10.dataset1?
  • It's a member of the dataset class, which we have
    defined.
  • What properties does it have?
  • bounds_latitude and cloud_medium, as all such
    members do.
  • Where can I get the bounds_latitude for this data
    set?
  • It's in the file indicated by the rdf:resource.

41
OWL Enriched RDF Metadata about
PCM.B06.10.dataset1
42
Is It Lite, DL, or Full?
  • Our ontology example is (at least) DL because we
    include the oneOf property.

43
OWL Equivalence and Inheritance
    <owl:Class rdf:ID="user">
      <owl:equivalentClass rdf:resource="#person"/>
    </owl:Class>
    <owl:Class rdf:about="#magneticSpectrometer">
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#hasMagnets"/>
          <owl:allValuesFrom rdf:resource="#Spectrometer"/>
        </owl:Restriction>
      </rdfs:subClassOf>
    </owl:Class>
  • Other logical relationships that can be asserted
  • inverseOf,
  • TransitiveProperty,
  • SymmetricProperty,
  • FunctionalProperty,
  • InverseFunctionalProperty
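What a reasoner may conclude from inverse, transitive, and symmetric properties can be sketched with sets of (subject, object) pairs. This is a toy closure computation under assumed names: `apply_characteristics` and `inverse` are hypothetical helpers, and the individuals (`magnet`, `spectrometer`, `beamline`, `yoke`) are invented for the example.

```python
def apply_characteristics(pairs, *, symmetric=False, transitive=False):
    """Return the closure of `pairs` under the requested characteristics."""
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        extra = set()
        if symmetric:
            # a P b implies b P a
            extra |= {(b, a) for a, b in closed}
        if transitive:
            # a P b and b P d imply a P d
            extra |= {(a, d) for a, b in closed for c, d in closed if b == c}
        if not extra <= closed:
            closed |= extra
            changed = True
    return closed


def inverse(pairs):
    """Pairs of the inverse property: a P b becomes b P' a."""
    return {(b, a) for a, b in pairs}


is_part_of = apply_characteristics(
    {("magnet", "spectrometer"), ("spectrometer", "beamline")},
    transitive=True,
)
has_part = inverse(is_part_of)
touches = apply_characteristics({("magnet", "yoke")}, symmetric=True)
```

Declaring `isPartOf` transitive adds the magnet-to-beamline pair, declaring `hasPart` its inverse gives the reversed pairs for free, and declaring `touches` symmetric adds the yoke-to-magnet pair.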

44
Illustration of Inverse Properties
45
Querying Semantic Data
  • The Data Access Working Group (DAWG)

46
What Is Semantic Querying?
  • Don't confuse querying with inference.
  • Querying just means retrieving data from Semantic
    data models.
  • Post a query to the world of distributed RDF data
    nuggets.
  • For RDF-like structures, this amounts to querying
    triples
  • Examples
  • Finding an email address from a person's vCard.
  • Searching across subgraphs: get me the email of
    the author of this document (Dublin Core +
    vCard).
  • Persistent/scheduled queries on updates to
    several multimedia databases.

47
The Data Access Working Group (DAWG)
  • Unfortunately, there are not yet any official
    standards for querying RDF, etc.
  • There are solutions, like RDQL/SquishQL
  • These are just not official
  • The W3C Data Access Working Group (DAWG) is
    filling the query gap.
  • Formed Feb 2004.
  • This is a work in progress
  • Use Cases and Requirements:
    http://www.w3.org/TR/rdf-dawg-uc/
  • BRQL Query Language:
    http://www.w3.org/2001/sw/DataAccess/rq23/

48
A Simple Query
  • Consider the following RDF triple
  • <http://example.org/book/book1>
    <http://purl.org/dc/elements/1.1/title> "BRQL
    Tutorial"
  • Recall this is equivalent to the sentence "book1
    has title BRQL Tutorial"
  • We may have a large set of such triples in our
    data store.
  • We want to make a query on this data like this:
    "What is the title of book1?"

49
The Query and the Results
  • We can construct queries on any of the parts of
    the triple, such as
  • SELECT ?title
  • WHERE <http://example.org/book/book1>
    <http://purl.org/dc/elements/1.1/title> ?title .
  • This just means "what is the title of book1?"
  • Result: ?title = "BRQL Tutorial"
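A query like this is just a triple with a variable in one or more positions. The sketch below shows the matching idea in plain Python; it is not BRQL or RDQL, the `match` function is a hypothetical helper, and the second book in the store is invented so that the query has something to filter out.

```python
def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern.

    Pattern terms starting with "?" are variables; all other terms must
    equal the corresponding part of the triple.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable would need two different values
            elif term != value:
                break      # constant term does not match
        else:
            results.append(binding)
    return results


TITLE = "http://purl.org/dc/elements/1.1/title"
store = [
    ("http://example.org/book/book1", TITLE, "BRQL Tutorial"),
    ("http://example.org/book/book2", TITLE, "Another Title"),  # hypothetical
]

rows = match(("http://example.org/book/book1", TITLE, "?title"), store)
```

Putting the variable in the subject or predicate position instead works the same way, which is what makes queries on any part of the triple possible.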

50
So What?
  • This was a trivial example in which we posed a
    query on the triple's object, which was a string.
  • But the object of the triple may be a URI (an RDF
    resource), not just a literal.
  • Or we may construct queries against subjects or
    verbs of triples.
  • For complicated graphs, this means that the query
    returns a pointer to another section of the
    graph.
  • This means that we can make linked queries that
    allow us to navigate graphs.
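A linked query can be sketched as a chain of single-pattern lookups in which each answer becomes the subject of the next lookup. The graph below follows the vCard and Dublin Core example from these slides, with prefixed names instead of full URIs; the `_:name` node label and the given and family name values are hypothetical, as are the `objects` and `follow` helpers.

```python
graph = [
    ("entry:1", "dc:title", "H2O"),
    ("entry:1", "dc:creator", "people:DrY"),
    ("people:DrY", "vcard:EMAIL", "dry_at_stateu.edu"),
    ("people:DrY", "vcard:N", "_:name"),
    ("_:name", "vcard:Given", "Yolanda"),  # hypothetical value
    ("_:name", "vcard:Family", "Y"),       # hypothetical value
]


def objects(subject, predicate, triples):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow(start, path, triples):
    """Chain lookups: the answers to one step are the subjects of the next."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier
                    for o in objects(node, predicate, triples)]
    return frontier


# "What is the given name of the creator of Entry 1?"
given_names = follow("entry:1", ["dc:creator", "vcard:N", "vcard:Given"], graph)
```

The first step returns a URI rather than a string, and that URI is the pointer into the vCard part of the graph.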

51
Linked Queries Across Graph Sections
[Figure: the same graph as before. The resource http://.../CMCS/Entry/1 has
dc:title "H2O" and dc:creator http://.../People/DrY, which has vcard:EMAIL
dry_at_stateu.edu and vcard:N with vcard:Family and vcard:Given.]
What is the given name of the creator of Entry 1?
52
What If You Can't Wait?
  • BRQL is still a work in progress.
  • If you need something now, there is Jena's RDQL.
  • RDQL allows you to pose triple queries similar to
    BRQL's
  • Jena has a programming interface that allows you
    to construct and execute these queries against
    RDF.

53
Tools for Playing with Things
  • Jena Toolkit: Java packages from HP Labs for
    building Semantic Web applications.
  • http://www.hpl.hp.com/semweb/
  • Both IsaViz and Protégé use this.
  • IsaViz: A nice authoring/graphing tool
  • http://www.w3.org/2001/11/IsaViz/
  • Protégé: Another ontology authoring tool
  • http://protege.stanford.edu/
  • SiRPAC
  • Allows you to parse RDF, convert RDF/XML into
    graphs and triples.
  • http://www.w3.org/RDF/Validator/

54
Other Tutorials
  • Original Semantic Grid GGF tutorial material is
    here
  • http://www.semanticgrid.org/presentations/ontologies-tutorial/
  • Beginner and Advanced OWL tutorials are here
  • http://www.co-ode.org/resources/
  • Lectures cover working examples (pizza ontology)
    built with Protégé.

55
Advanced OWL Tutorial
  • Courtesy of Sean Bechhofer

56
OWL Syntaxes
  • Abstract Syntax
  • Used in the definition of the language and the
    DL/Lite semantics
  • OWL as RDF triples (and thus as, e.g. RDF/XML or
    N3)
  • the official concrete syntax
  • mapping rules describe how to translate from
    abstract syntax to triples.
  • XML Presentation Syntax
  • XML Schema definition

57
OWL Ontologies
  • An OWL ontology consists of a number of Classes,
    Properties and Individuals
  • All identified via URIs.
  • Classes
  • Have definitions providing their
    characteristics
  • Properties
  • Characteristics such as transitivity or
    functionality
  • Domains and Ranges
  • Individuals
  • Class membership
  • Relationships to other individuals
  • Concrete values.

58
XML Datatypes in OWL
  • OWL supports XML Schema primitive datatypes
  • Clean separation between object classes and
    datatypes
  • Philosophical reasons
  • Datatypes structured by built-in predicates
  • Not appropriate to form new datatypes using
    ontology language
  • Practical reasons
  • Ontology language remains simple and compact
  • Implementability not compromised can use hybrid
    reasoner

59
OWL Class constructors
  • OWL has a number of operators for constructing
    class expressions.
  • Boolean operators
  • and, or, not
  • Restrictions
  • slot fillers with explicit quantification
  • Enumerated Classes.
  • explicit enumerations of the class members

60
OWL Class Constructors
61
OWL Class constructors
  • The operators have an associated semantics
  • Given in terms of a domain $\Delta$
  • and an interpretation function $I$
  • $I : \text{concepts} \to \wp(\Delta)$
  • $I : \text{properties} \to \wp(\Delta \times \Delta)$
  • $I : \text{individuals} \to \Delta$
  • $I$ is then extended to concept expressions.

62
OWL Constructor Semantics
63
OWL Constructor Semantics
64
OWL Axioms
  • Axioms allow us to add further statements about
    arbitrary concept expressions and properties
  • Disjointness, equivalence, transitivity of
    properties etc.
  • An interpretation is then a model of the axioms
    iff it satisfies every axiom in the ontology.

65
Basic Inference Tasks
  • Inference can now be defined w.r.t.
    interpretations/models.
  • C subsumes D w.r.t. K iff for every model I of K,
    $I(D) \subseteq I(C)$
  • C is equivalent to D w.r.t. K iff for every model
    I of K, $I(C) = I(D)$
  • C is satisfiable w.r.t. K iff there exists some
    model I of K s.t. $I(C) \neq \emptyset$
  • Querying knowledge
  • x is an instance of C w.r.t. K iff for every
    model I of K, $I(x) \in I(C)$
  • $\langle x, y \rangle$ is an instance of R w.r.t. K
    iff for every model I of K, $(I(x), I(y)) \in I(R)$
66
Why Reasoning?
  • Why do we want it?
  • Semantic Web aims at machine understanding
  • Understanding closely related to reasoning
  • Given key role of ontologies in the Semantic Web,
    it will be essential to provide tools and
    services to help users
  • Design and maintain high quality ontologies,
    e.g.
  • Meaningful: all named classes can have instances
  • Correct: captured intuitions of domain experts
  • Minimally redundant: no unintended synonyms
  • Richly axiomatised: (sufficiently) detailed
    descriptions
  • Answer queries over ontology classes and
    instances, e.g.
  • Find more general/specific classes
  • Retrieve annotations/pages matching a given
    description
  • Integrate and align multiple ontologies

67
Why Decidable Reasoning?
  • OWL DL constructors/axioms restricted so
    reasoning is decidable
  • Consistent with Semantic Web's layered
    architecture
  • XML provides syntax transport layer
  • RDF(S) provides basic relational language and
    simple ontological primitives
  • OWL DL provides powerful but still decidable
    ontology language
  • Further layers may (will) extend OWL
  • Will almost certainly be undecidable
  • Facilitates provision of reasoning services
  • Known practical algorithms
  • Several implemented systems
  • Evidence of empirical tractability
  • Understanding dependent on reliable consistent
    reasoning

68
Other Links
69
XML Primer
  • General characteristics of XML

70
Basic XML
  • XML consists of human readable tags
  • Schemas define rules for a particular dialect.
  • XML Schema is the root, defines the rules for
    making other XML schemas.
  • Tree structure: tags must be closed in the reverse
    order that they are opened.
  • Tags can be modified by attributes
  • name, minOccurs
  • Tags enclose either strings or structured XML
    <complexType name="FaultType">
      <sequence>
        <element name="FaultName" type="xsd:string" />
        <element name="MapView" />
        <element name="CartView" />
        <element name="MaterialProps" minOccurs="0" />
        <choice>
          <element name="Slip" />
          <element name="Rate" />
        </choice>
      </sequence>
    </complexType>
71
Namespaces and URIs
  • XML documents can be composed of several
    different schemas.
  • Namespaces are used to identify the source schema
    for a particular tag.
  • Resolves name conflicts, much like a full path
  • Values of namespaces are URIs.
  • URIs are just structured names.
  • May point to something not electronically
    retrievable
  • URLs are special cases.
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:gem="http://commgrids.indiana.edu/GCWS/Schema/GEMCodes/Faults">
      <xsd:annotation>
      </xsd:annotation>
      <gem:fault>
      </gem:fault>
    </xsd:schema>

72
Metadata and the Dublin Core
  • Define metadata and describe its use in physical
    and computer science.

73
What is Metadata?
  • Common definition: data about data
  • Traditional Examples
  • Prescriptions of database structure and contents.
  • File names and permissions in a file system.
  • HDF5 metadata describes scientific/numerical
    data set characteristics such as array sizes,
    data formats, etc.
  • Metadata may be queried to learn the
    characteristics of the data it describes.
  • Traditional metadata systems are functionally
    tightly coupled to the data they describe.
  • Prescriptive, needed to interact directly with
    data.

74
Descriptive Metadata and the Web
  • Traditional metadata concepts must be extended as
    systems become more distributed, information
    becomes broader
  • Tight functional integration not as important
  • Metadata used for information, becomes
    descriptive.
  • Metadata may need to describe resources, not just
    data.
  • Everything is a resource
  • People, computers, software, conference
    presentations, conferences, activities, projects.
  • We'll next look at several examples that use
    metadata, featuring
  • Dublin Core: digital libraries
  • CMCS: chemistry

75
The Dublin Core Metadata for Digital Libraries
  • The Dublin Core is a set of simple name/value
    properties that can describe online resources.
  • Usually Web content but generally usable (CMCS)
  • Intended to help classify and search online
    resources.
  • DC elements may be either embedded in the data or
    in a separate repository.
  • Initial set defined by 1995 Dublin, Ohio meeting.

76
Thought Experiment: Construct Your Own Metadata
Set
  • Describe yourself: your occupation, your
    interests, your place of residence, your parents,
    spouse, children, ...
  • Take each sentence
  • The verbs become properties
  • The verbs' objects are property values.
  • Metadata is just a collection of these name/value
    pairs.
  • For particular fields (like publishing), we can
    define a conventional set of property names.

77
The Dublin Core Metadata for Digital Libraries
  • The Dublin Core is a set of simple name/value
    properties that can describe online resources.
  • Usually Web content but generally usable (CMCS)
  • Intended to help classify and search online
    library resources.
  • Digital library card catalog.
  • DC elements may be either embedded in the data or
    in a separate repository.
  • Initial set defined by 1995 Dublin, Ohio meeting.

78
Dublin Core Elements
  • Content elements
  • Subject, title, description, type, relation,
    source, coverage.
  • Intellectual property elements
  • Contributor, creator, publisher, rights
  • Instantiation elements
  • Date, format, identifier, language
  • In RDF, these are called properties.

79
Encoding the Dublin Core
  • DC elements are independent of the encoding
    syntax.
  • Rules exist to map the DC into
  • HTML
  • RDF/XML
  • We provide more detailed info on RDF/XML encoding
    in this seminar.

80
Sample RDF/HTML
    <head>
      <title>Expressing Dublin Core in HTML/XHTML meta
        and link elements</title>
      <meta name="DC.title" content="Expressing Dublin
        Core in HTML/XHTML meta and link elements" />
      <meta name="DC.creator" content="Andy Powell,
        UKOLN, University of Bath" />
      <meta name="DC.type" content="Text" />
    </head>
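A DC-aware application has to pull these elements back out of the page. A minimal sketch using Python's standard `html.parser` follows; the `DublinCoreExtractor` class is a hypothetical helper written for this example, not part of any Dublin Core toolkit, and it only handles the simple `DC.name` meta form shown above.

```python
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by the
        # default handle_startendtag implementation.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC."):
            self.elements[name[3:]] = attributes.get("content")


PAGE = """
<head>
  <title>Expressing Dublin Core in HTML/XHTML meta and link elements</title>
  <meta name="DC.title" content="Expressing Dublin Core in HTML/XHTML meta and link elements" />
  <meta name="DC.creator" content="Andy Powell, UKOLN, University of Bath" />
  <meta name="DC.type" content="Text" />
</head>
"""

parser = DublinCoreExtractor()
parser.feed(PAGE)
dc = parser.elements
```

The result is a plain dictionary of element names and values, which is the name/value view of the Dublin Core described earlier.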

81
Where Do I Put the Dublin Core Metadata?
  • Dublin core elements may be placed directly in
    HTML pages.
  • Still need DC-aware crawlers or applications to
    find and use them.
  • Or you may have a large database of DC entries
    that are used by DC-aware applications.
  • We'll examine a WebDAV-based scheme for chemistry
    in a second.

82
Dublin Core Element Refinements
  • Many of these, and extensible
  • See http://dublincore.org/documents/dcmi-terms/
    for the comprehensive list of elements and
    refinements
  • Examples
  • isVersionOf, hasVersion, isReplacedBy,
    references, isReferencedBy.

83
OWL DL
  • Use of OWL vocabulary restricted
  • Can't be used to do nasty things (i.e., modify
    OWL)
  • No classes as instances
  • Standard DL/FOL model theory (definitive)
  • Direct correspondence with (first order) logic
  • Reasoning via DL engines
  • Some problems with oneOf/inverse
  • Reasoning for full language via FOL engines
  • Would need built in datatypes for performance

84
OWL Full
  • No restriction on use of OWL vocabulary (as long
    as legal RDF)
  • Classes as instances
  • Assertions about vocabulary
  • RDF style model theory
  • Reasoning using FOL engines
  • via axiomatisation
  • Semantics should correspond with OWL DL for
    suitably restricted KBs

85
XML for Knowledge Representation
  • Definition of self-describing data in worldwide
    standardized, non-proprietary format.
  • Structured data and knowledge exchange for
    enterprises in various industries.
  • Integration of information from different sources
    to uniform documents.
  • Exchange of knowledge bases between different AI
    languages, knowledge bases and databases,
    application systems, etc.
  • But...

86
History: From RDF to OWL
  • Two languages developed by extending (part of)
    RDF
  • OIL developed by group of (largely) European
    researchers
  • DAML-ONT developed by group of (largely) US
    researchers (in DARPA DAML programme)
  • Efforts merged to produce DAML+OIL
  • Development was carried out by Joint EU/US
    Committee on Agent Markup Languages
  • Extends (subset of) RDF
  • DAML+OIL submitted to W3C as basis for
    standardisation
  • Web-Ontology (WebOnt) Working Group formed
  • WebOnt group developed OWL language based on
    DAML+OIL
  • OWL language now a W3C Recommendation (Feb 2004)

87
RDFS Takeaway
  • RDFS defines a set of classes and properties that
    can be used to define new RDF-like languages.
  • RDFS actually bootstraps itself.
  • You can express inheritance, restriction
  • If you want to learn more, see the specification
  • http://www.w3.org/TR/2003/WD-rdf-schema-20030123/
  • But don't trust the write up
  • Concepts are best understood by looking at the
    RDF XML. English descriptions get convoluted.
  • If you want to see RDFS in action, see the DC
  • http://dublincore.org/2003/03/24/dces
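
To make the inheritance point concrete, here is a small sketch of the two RDFS entailments involved: rdfs:subClassOf is transitive, and an instance of a class is also an instance of its superclasses. The ex: class and instance names are hypothetical, and the function rdfs_closure is an illustrative helper, not a real reasoner API.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical schema: a simulation is an application, which is a resource.
FACTS = {
    ("ex:Simulation", SUBCLASS_OF, "ex:Application"),
    ("ex:Application", SUBCLASS_OF, "ex:Resource"),
    ("ex:run42", TYPE, "ex:Simulation"),
}


def rdfs_closure(triples):
    """Apply subclass transitivity and type inheritance until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS_OF:
                continue
            for s2, p2, o2 in inferred:
                # (s subClassOf o) and (o subClassOf o2) => (s subClassOf o2)
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, SUBCLASS_OF, o2))
                # (s2 type s) and (s subClassOf o) => (s2 type o)
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        changed = not new <= inferred
        inferred |= new
    return inferred


closure = rdfs_closure(FACTS)
for triple in sorted(closure):
    print(triple)
```

The loop is a naive fixed-point computation; it is fine for a handful of triples but a real RDFS engine would index the graph.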

88
Web Ontology Language Requirements
  • Desirable features identified for Web Ontology
    Language
  • Extends existing Web standards
  • Such as XML, RDF, RDFS
  • Easy to understand and use
  • Should be based on familiar KR idioms
  • Of adequate expressive power
  • Formally specified
  • Possible to provide automated reasoning support

89
Short History of Description Logics
  • Phase 1
  • Incomplete systems (Back, Classic, Loom, . . . )
  • Based on structural algorithms
  • Phase 2
  • Development of tableau algorithms and complexity
    results
  • Tableau-based systems for Pspace logics (e.g.,
    Kris, Crack)
  • Investigation of optimisation techniques
  • Phase 3
  • Tableau algorithms for very expressive DLs
  • Highly optimised tableau systems for ExpTime
    logics (e.g., FaCT, DLP, Racer)
  • Relationship to modal logic and decidable
    fragments of FOL

90
Latest Developments
  • Phase 4
  • Mature implementations
  • Mainstream applications and Tools
  • Databases
  • Consistency of conceptual schemata (EER, UML
    etc.)
  • Schema integration
  • Query subsumption (w.r.t. a conceptual schema)
  • Ontologies and Semantic Web (and Grid)
  • Ontology engineering (design, maintenance,
    integration)
  • Reasoning with ontology-based markup (meta-data)
  • Service description and discovery
  • Commercial implementations
  • Cerebra system from Network Inference Ltd

91
What Does This Have to Do with Grid Computing?
  • RDF resources aren't just web pages
  • Can be computer codes, simulation and
    experimental data, hardware, research groups,
    algorithms, ...
  • Consider the CMCS chemistry example: they
    needed to describe the provenance, annotation,
    and curation of chemistry data.
  • Compound X's properties were calculated by Dr. Y.
  • CMCS maps all of their metadata to the Dublin
    Core.
  • The Dublin Core is encoded quite nicely as RDF.

92
vCard: Representing People with RDF Properties
  • The Dublin Core tags are best used to represent
    metadata about published content
  • Documents, published data
  • vCards are an IETF standard for representing
    people
  • Typical properties include name, email,
    organization membership, mailing address, title,
    etc.
  • See http://www.ietf.org/rfc/rfc2426.txt
  • Like the DC, vCards are independent of (and
    predate) RDF but map naturally into RDF.
  • Each of these maps naturally to an RDF property
  • See http://www.w3.org/TR/2001/NOTE-vcard-rdf-20010222/

93
Example: A vCard in RDF/XML
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:vcard='http://www.w3.org/2001/vcard-rdf/3.0#'>
  <rdf:Description
      rdf:about='http://cgl.indiana.edu/people/GCF'
      vcard:EMAIL='gcf_at_indiana.edu'>
    <vcard:FN>Geoffrey Fox</vcard:FN>
    <vcard:N vcard:Given='Geoffrey'
             vcard:Family='Fox'/>
  </rdf:Description>
</rdf:RDF>
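
The vCard description can be read as a set of (subject, property, value) triples. The sketch below extracts them with the Python standard library only; it handles just the constructs used in this one example (property attributes, literal child elements, and an empty child element with attributes, which RDF/XML treats as an anonymous node). It is not a general RDF/XML parser, and the helper names clark_to_uri and vcard_triples are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
VCARD_NS = "http://www.w3.org/2001/vcard-rdf/3.0#"

VCARD_XML = f"""<rdf:RDF xmlns:rdf='{RDF_NS}' xmlns:vcard='{VCARD_NS}'>
  <rdf:Description rdf:about='http://cgl.indiana.edu/people/GCF'
                   vcard:EMAIL='gcf_at_indiana.edu'>
    <vcard:FN>Geoffrey Fox</vcard:FN>
    <vcard:N vcard:Given='Geoffrey' vcard:Family='Fox'/>
  </rdf:Description>
</rdf:RDF>"""


def clark_to_uri(name):
    """Turn ElementTree's '{namespace}local' form into a plain URI."""
    namespace, local = name[1:].split("}", 1)
    return namespace + local


def vcard_triples(xml_text):
    """Return (subject, predicate, object) tuples for each rdf:Description."""
    root = ET.fromstring(xml_text)
    triples = []
    blank_count = 0
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        # Non-RDF attributes on the description are properties of the subject.
        for name, value in description.attrib.items():
            if not name.startswith("{" + RDF_NS + "}"):
                triples.append((subject, clark_to_uri(name), value))
        for child in description:
            predicate = clark_to_uri(child.tag)
            if child.attrib:
                # Attributes on a property element describe an anonymous node.
                node = f"_:b{blank_count}"
                blank_count += 1
                triples.append((subject, predicate, node))
                for name, value in child.attrib.items():
                    triples.append((node, clark_to_uri(name), value))
            else:
                triples.append((subject, predicate, (child.text or "").strip()))
    return triples


triples = vcard_triples(VCARD_XML)
for triple in triples:
    print(triple)
```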
94
Linking vCard and Dublin Core Resources
  • The real power of RDF is that you can link two
    independently specified resources through the use
    of properties.
  • We do this using URIs as universal pointers
  • Identify specific resources (nouns) and
    specifications for properties (verbs)
  • The URIs may optionally be URLs that can be used
    to fetch the information.
  • Linking these resource nuggets allows us to pose
    queries like
  • What is the email address of the creator of this
    entry in the chemical database?
  • What other entries reference this data entry,
    directly or indirectly?
  • Linkages can be made at any time
  • Don't have to be designed into the system
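
The two queries above can be sketched over a plain set of triples. The dc:creator, dcterms:references, and vCard EMAIL property URIs are the real ones; the http://example.org/chem/... entry URIs and the helper function names are hypothetical, made up for this illustration.

```python
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
DCTERMS_REFERENCES = "http://purl.org/dc/terms/references"
VCARD_EMAIL = "http://www.w3.org/2001/vcard-rdf/3.0#EMAIL"

PERSON = "http://cgl.indiana.edu/people/GCF"
ENTRY1 = "http://example.org/chem/entry1"  # hypothetical database entries
ENTRY2 = "http://example.org/chem/entry2"
ENTRY3 = "http://example.org/chem/entry3"

TRIPLES = {
    (ENTRY1, DC_CREATOR, PERSON),         # Dublin Core side
    (PERSON, VCARD_EMAIL, "gcf_at_indiana.edu"),  # vCard side
    (ENTRY2, DCTERMS_REFERENCES, ENTRY1),
    (ENTRY3, DCTERMS_REFERENCES, ENTRY2),
}


def objects(triples, subject, predicate):
    """All values of a property for one subject."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def creator_emails(triples, entry):
    """Follow dc:creator to a person URI, then vcard:EMAIL to the address."""
    emails = []
    for creator in objects(triples, entry, DC_CREATOR):
        emails.extend(objects(triples, creator, VCARD_EMAIL))
    return emails


def referencing_entries(triples, entry):
    """Entries that reference `entry` directly or through a chain of references."""
    found = set()
    frontier = [entry]
    while frontier:
        target = frontier.pop()
        for s, p, o in triples:
            if p == DCTERMS_REFERENCES and o == target and s not in found:
                found.add(s)
                frontier.append(s)
    return found


print(creator_emails(TRIPLES, ENTRY1))
print(sorted(referencing_entries(TRIPLES, ENTRY1)))
```

The person URI is the only thing the two vocabularies share; neither the DC record nor the vCard had to know about the other in advance.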

95
A Simple Jena RDQL Example
  • Model model = new ModelMem();
  • model.read(new FileReader("a.rdf"));
  • String queryString = "SELECT ?x, ?fname WHERE
    (?x, <http://www.w3.org/2001/vcard-rdf/3.0#EMAIL>,
    ?fname)";
  • Query query = new Query(queryString);
  • query.setSource(model);
  • QueryExecution qe = new QueryEngine(query);
  • QueryResults results = qe.exec();
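
Conceptually, the WHERE clause is a triple pattern in which names starting with "?" are variables. The sketch below shows what matching such a pattern means, in plain Python rather than Jena; match_pattern is a hypothetical helper, not the RDQL engine, and it handles a single pattern only.

```python
VCARD_EMAIL = "http://www.w3.org/2001/vcard-rdf/3.0#EMAIL"
VCARD_FN = "http://www.w3.org/2001/vcard-rdf/3.0#FN"

GRAPH = [
    ("http://cgl.indiana.edu/people/GCF", VCARD_EMAIL, "gcf_at_indiana.edu"),
    ("http://cgl.indiana.edu/people/GCF", VCARD_FN, "Geoffrey Fox"),
]


def match_pattern(triples, pattern):
    """Return one variable binding (a dict) per triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable binds on first use and must agree afterwards.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # a constant term that does not match
        else:
            results.append(binding)
    return results


# Same shape as the RDQL query: SELECT ?x, ?fname WHERE (?x, <...EMAIL>, ?fname)
rows = match_pattern(GRAPH, ("?x", VCARD_EMAIL, "?fname"))
for row in rows:
    print(row["?x"], row["?fname"])
```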

96
Building Semantic Markup Languages
  • XML essentially defines syntax rules for markup
    languages.
  • Human readable means humans provide meaning
  • We also would like some limited ability to encode
    meaning directly within markup languages.
  • The semantic markup languages attempt to do that,
    with increasing sophistication.
  • Stack indicates direct dependencies: OWL is
    defined in terms of RDF, RDFS.

Eric Miller, http://www.w3.org/2002/Talks/www2002-w3ct-swintro-em/
97
Other Semantic Markup Languages
  • RDF Schema (RDFS)
  • Provides formal definitions of RDF
  • Also provides language tools for writing more
    specialized languages.
  • We'll examine in more detail.
  • DARPA Agent Markup Language (DAML)
  • DAML+OIL is the language component of the DAML
    project.
  • Defined using RDF/RDFS.
  • Web-Ontology Language (OWL)
  • Developed by the W3C's Web-Ontology Working Group
  • Based on/replaces DAML+OIL

98
What Are Description Logics?
  • A family of logic based Knowledge Representation
    formalisms
  • Descendants of semantic networks and KL-ONE
  • Describe domain in terms of concepts (classes),
    roles (relationships) and individuals
  • Distinguished by
  • Formal semantics (typically model theoretic)
  • Decidable fragments of FOL
  • Closely related to Propositional Modal and Dynamic
    Logics
  • Provision of inference services
  • Sound and complete decision procedures for key
    problems
  • Implemented systems (highly optimised)