COMPSCI 732: Semantic Web Technologies - PowerPoint PPT Presentation

Slides: 91
Provided by: csAuckla9
1
COMPSCI 732: Semantic Web Technologies
  • Semantic Web Architecture

2
Where are we?
Title
1 Introduction
2 Semantic Web Architecture
3 Resource Description Framework (RDF)
4 Web of Data
5 Generating Semantic Annotations
6 Storage and Querying
7 Web Ontology Language (OWL)
8 Rule Interchange Format (RIF)
3
Overview
  • Introduction and motivation
  • Technical solutions
  • Semantic Web architecture
  • Uniform Resource Identifier
  • eXtensible Markup Language (XML)
  • XML Schema
  • Namespaces
  • Extensions
  • Illustration by a large example
  • Summary
  • References

4
INTRODUCTION AND MOTIVATION
5
A Semantic Web Scenario From Today
  • Queries
  • Which type of music is played by UK radio
    stations?
  • Which UK radio station is playing titles by
    Swedish composers?
  • Information to answer query is available on the
    Web
  • Web search engines analyze Web content one page
    at a time
  • The Semantic Web provides better framework to
    answer such queries
  • combines data
  • distributed across different sources, and
  • described in machine-interpretable manner

6
Steps in Answering Queries
  • Playlists of BBC radio shows published online in
    Semantic Web formats
  • Music groups such as ABBA have an identifier:
    http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist
  • Identifier can relate music group to information
    at MusicBrainz
  • Music community portal exposing data on Semantic
    Web
  • http://musicbrainz.org
  • Knows about band members (e.g. Benny Andersson)
  • Aligns its information with Wikipedia
  • Information on UK radio stations may be found in
    lists on Web pages
  • Can be translated into similar Semantic Web
    representation

7
Describing Things and Their Relationships
  • Meaning of Relationships, e.g., band memberships
    explained online, too
  • Using collections of Ontologies available on the
    Web
  • Dublin Core (general properties of information
    resources)
  • http://dublincore.org/
  • SKOS (covering taxonomic descriptions)
  • http://www.w3.org/2004/02/skos/
  • Specialized ontologies (covering the music
    domain)
  • Data at the BBC currently use at least nine
    different ontologies
  • http://www.bbc.co.uk/ontologies/programmes
  • Availability of data in these formats enables
    queries to be answered
  • Based on a query language

8
Towards the Required Infrastructure
  • What infrastructure is required to implement the
    scenario from before?
  • Generic software components, Languages, Protocols
  • Their seamless interaction to satisfy requests
  • Purpose of Lecture
  • Investigate Semantic Web Architecture
  • Analyze requirements from technical need to
    identify and relate data
  • Analyze organizational needs to maintain Semantic
    Web as a whole

9
Web Architecture
  • The Semantic Web is an evolution of the Web
  • Important for the fast growth and adoption of the
    Web are
  • Many people can set up Web servers easily and
    independently from each other
  • More people can create documents, put them
    online, and link them to each other
  • Even more people can browse and access any Web
    server to retrieve documents
  • Web architecture allows graceful degradation of
    user experience when
  • Network is partially slow (World Wide Wait),
    while other parts still operate at full speed
  • Single Web servers break, because others still
    work
  • Hyperlinks are broken, because other links still
    lead somewhere
  • Separation of concerns justifies lower-quality
    outputs
  • Users can easily create and access documents
  • Distributed nature of system, without need of
    central coordinator, results in robustness

10
Web Architecture Principles
  • Explicit simple data representation
  • Common data representation hides underlying
    technologies (e.g. HTML)
  • Distributed system
  • Data sources without centralized instance
    controlling who owns what type of info
  • Distributed ownership and control can facilitate
    adoption and scalability
  • E.g. Web pages are under full control of their
    producers
  • Cross-referencing
  • Reuse of existing data and data definitions from
    different authorities (e.g. hyperlinks)
  • Loose coupling achieved by common language layers
  • Communication in standardized languages
  • These must be easy to customize
  • Overall communication must not be jeopardized by
    such specialization
  • E.g. coupling of Web clients/servers: HTTP for
    transport, HTML for Web content
  • Ease of publishing and consumption
  • Easy publishing and consumption of simple data
  • Comprehensive publishing and consumption of
    complex data, e.g. HTML: simple to convey textual
    info; powerful browsers/content management systems

11
Semantic Web Requirements and Examples
  • Must be able to represent entities and their
    relationships (1)
  • A person, the birthday of a person, the name of a
    person (Benny Andersson)
  • Must be serializable in standardized manner to
    easily exchange data between different computing
    nodes (1,2,4)
  • Ease of joining information from MusicBrainz,
    BBC, DBPedia
  • Entities must be referable across borders of
    ownership or computing systems to allow for
    cross-linking of data (1,2,3,4)
  • Otherwise ABBA's Benny Andersson becomes hard to
    distinguish from other Benny Anderssons
  • Expressive, machine-understandable data
    description language (1,4,5)
  • Manual inspection is not scalable; refinements of
    the basic model are impossible
  • BBC Data involves radio stations, shows, their
    versions, songs and their artists
  • A query and manipulation language to select and
    aggregate data (5)
  • The number of Swedish composers being broadcast
    on a specific program
  • Reasoning desirable to facilitate querying (5)
  • Direct relationship between a program and a song
    using inference
  • Transport of data and query and their results by
    agreed-upon protocols (HTTP)
  • May involve encrypted data requests and
    transports (HTTPS); signature of data items to
    ensure authenticity of user requests and control
    access to resources

12
Additional Requirements
  • Core requirements not yet included in language
    architecture
  • Versatile means for user interaction
  • Broad accessibility requires viewing, searching,
    browsing, querying of data
  • While at the same time abstracting from
    intricacies underlying their distributed origin
  • On-the-fly data integration of multiple data
    sources: assemble information from a multitude of
    sources without a priori knowledge about domain
    or structure of data
  • Facilitation of data production and publishing:
    metadata creation and migration of data must be
    made convenient, independent from origin of data
  • Provenance and Trust
  • Authorship and ownership get lost during data
    processing and aggregation
  • Origin, Reliability, Trustworthiness must be
    rethought to apply them for individual and
    aggregated data items, to establish faithful
    authentication at Semantic Web scale
  • Alignment of unconnected sets of data
  • Interlinking implies capability to suggest
    alignments between identifiers or concepts from
    different sets of data, beyond mere use of
    identifiers such as URI/IRIs
  • Such alignment may be necessary to enable a real
    Web of Data

13
Semantic Web Architecture
  • Formalized components and their relationships
  • What technologies make up the Semantic Web
  • What are the dependencies between components
  • Roadmap for steps of developing the Semantic Web

14
TECHNICAL SOLUTION
  • The Semantic Web architecture and its foundations

15
Search and Query the Web I
  • The Web is a constantly growing network of
    distributed resources
  • More than 1 trillion unique URLs
  • More than 100 billion pages
  • More than 200 million web sites
  • Check most updated data on
    http://news.netcraft.com/archives/web_server_survey.html
  • User needs to be able to efficiently search
    resources/content over the Web
  • When I Google "Milan", do I find info on the city
    or the soccer team?
  • User needs to be able to perform query over
    largely distributed resources
  • When is the next performance of the rock band
    U2, where it will be located, what are the best
    ways to reach the location, what are the
    attractions nearby

16
Search and Query the Web II
  • On2Broker is the evolution of Ontobroker, a
    system that aims to solve the problems discussed
    in the previous slides by adopting Semantic
    Technologies
  • On2Broker is a system that processes distributed
    information sources and that provides intelligent
    information retrieval, query answering
  • On2Broker relies on components of the Semantic
    Web Architecture
  • D. Fensel, S. Decker, M. Erdmann, R. Studer:
    Ontobroker in a Nutshell. ECDL 1998: 663-664

17
On2Broker Architecture
18
On2Broker Components I
  • Query Interface
  • Provides a structured input that enables users to
    define their queries without any knowledge of the
    query language
  • Input queries are then transformed to the query
    language (e.g. SPARQL)
  • Repository
  • Decouples query answering, information retrieval
    and reasoning
  • Provide support for materialization of inferred
    knowledge

19
On2Broker Components II
  • Crawlers and Wrappers (or Info Agent)
  • Extract knowledge from different distributed and
    heterogeneous data sources
  • RDFa pages and RDF repositories can be included
    directly
  • HTML and XML data sources require processing by
    wrappers to derive RDF data
  • Inference Engine
  • Relies on knowledge imported from the crawlers
    and axioms contained in the repository to support
    query answers
  • Adopts Horn logic and closed world assumption

20
On2Broker Example
  1. Tim Berners-Lee knows Christian Bizer and Tom
    Heath

1. Whom does Tim Berners-Lee know?
2. SELECT DISTINCT ?s ?o WHERE ?s foafknows
?o .
  1. Extract RDF from http//www.w3.org/People/Berners
    -Lee/dblp
  1. Extract RDF from fensel.comdblp
  • Extends KBif x dblpcoauthor y then x
    foafknows y
  • if y foafknows x then x foafknows y
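
The two rules on this slide can be tried out with a few lines of Python. This is only a toy sketch of forward chaining over (subject, predicate, object) tuples, not how On2Broker itself is implemented; the two input facts and the helper name `infer_knows` are assumptions made up for illustration.

```python
# Toy forward chaining over (subject, predicate, object) triples.
# The facts below are illustrative assumptions, not crawled data.

def infer_knows(triples):
    """Apply both rules repeatedly until no new triple appears (fixpoint)."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            # Rule 1: if x dblp:coauthor y then x foaf:knows y
            if p == "dblp:coauthor":
                new.add((s, "foaf:knows", o))
            # Rule 2: if y foaf:knows x then x foaf:knows y
            if p == "foaf:knows":
                new.add((o, "foaf:knows", s))
        if new <= facts:
            return facts
        facts |= new

kb = {
    ("Tim Berners-Lee", "dblp:coauthor", "Christian Bizer"),
    ("Tom Heath", "foaf:knows", "Tim Berners-Lee"),
}
closure = infer_knows(kb)

# Rough equivalent of asking: whom does Tim Berners-Lee know?
known = sorted(o for s, p, o in closure
               if s == "Tim Berners-Lee" and p == "foaf:knows")
print(known)
```

Neither input fact says directly that Tim Berners-Lee knows anyone; both answers come from the rules, which is the role the inference engine plays in the architecture.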

21
SemWeb Architecture Requirements
  • Extensibility
  • Each layer should extend the previous one(s)
  • Support for data interchange
  • Using data from one source in other applications
  • Support for ontology description with different
    complexity
  • Including rules
  • Support for data query
  • Support for data provenance and trust evaluation

See the Semantic Web Roadmap: http://www.w3.org/DesignIssues/Semantic.html
22
Semantic Web Stack
[Figure: Semantic Web stack; rules layer: RIF]
Adapted from http://en.wikipedia.org/wiki/Semantic_Web_Stack
23
UNICODE, URI and XML
  • UNICODE is the standard international character
    set
  • E.g. used to encode the data in the repository
  • Uniform Resource Identifiers (URIs) identify
    things and concepts
  • E.g. used to identify resources on the Web and in
    the repository
  • Take care to distinguish between information and
    non-information resources
  • http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist
    vs. http://dbpedia.org/resource/ABBA
  • Data publishers on the Semantic Web use Linked
    data principles
  • Use URIs as names for things
  • Use HTTP URIs so that people can look up those
    names
  • When someone looks up a URI, provide useful
    information, using standards (RDF, SPARQL)
  • Include links to other URIs, so that they can
    discover more things.
  • eXtensible Markup Language (XML) used for data
    exchange
  • Used on the Semantic Web to exchange the
    description of resources
  • E.g. format that can be transformed into RDF and
    imported into the repository

24
RDF, RDFS and OWL
  • Resource Description Framework (RDF)
  • is the HTML of the Semantic Web
  • Simple way to describe resources on the Web
  • Based on triples <subject, predicate, object>
  • Various serializations, including one based on
    XML
  • A simple ontology language (RDFS)
  • E.g. language used to store the data in the
    repository
  • More in lecture 3
  • Web Ontology Language (OWL)
  • Is a more complex ontology language than RDFS
  • Layered language based on Description Logics
  • Overcomes some RDF(S) limitations
  • E.g. ontology language used to define the schemas
    used in repository
  • More in lecture 7

25
RDF Graph Encoding a Description of ABBA
26
RDF Serialized in RDF/XML
  • <?xml version="1.0"?>
  • <!DOCTYPE rdf:RDF [
  • <!ENTITY bbca "http://www.bbc.co.uk/music/artists/">
  • <!ENTITY bbci "http://www.bbc.co.uk/music/images/artists/">
  • <!ENTITY mba "http://musicbrainz.org/artist/">]>
  • <rdf:RDF
  • xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  • xmlns:owl="http://www.w3.org/2002/07/owl#"
  • xmlns:foaf="http://xmlns.com/foaf/0.1/"
  • xmlns:mo="http://purl.org/ontology/mo/">
  • <mo:MusicArtist rdf:about="http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist">
  • <rdf:type rdf:resource="http://purl.org/ontology/mo/MusicGroup"/>
  • <foaf:name>ABBA</foaf:name>
  • <foaf:homepage rdf:resource="http://www.abbasite.com/"/>
  • <mo:image rdf:resource="&bbci;542x305/d87e52c5-bb8d-4da8-b941-9f4928627dc8.jpg"/>
  • <mo:member rdf:resource="&bbca;042c35d3-0756-4804-b2c2-be57a683efa2#artist"/>
  • <mo:member rdf:resource="&bbca;2f031686-3f01-4f33-a4fc-fb3944532efa#artist"/>

27
RDF Serialized in Turtle
  • @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  • @prefix owl: <http://www.w3.org/2002/07/owl#> .
  • @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  • @prefix mo: <http://purl.org/ontology/mo/> .
  • <http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist>
  • rdf:type mo:MusicArtist, mo:MusicGroup ;
  • foaf:name "ABBA" ;
  • foaf:homepage <http://www.abbasite.com/> .

28
RDFS and OWL Example
  • Reasoning example in RDFS
  • rdfs:subClassOf can model class hierarchies
  • mo:MusicGroup and mo:MusicArtist specify two
    classes
  • Axiom: <mo:MusicGroup, rdfs:subClassOf,
    mo:MusicArtist>
  • Stating that ABBA is an instance of type
    MusicGroup enables reasoners to conclude that
    ABBA is also an instance of type MusicArtist
  • When query asks for all MusicArtists, then ABBA
    will be contained in query result, even though
    there is no explicit assertion of this
  • Reasoning example in OWL
  • owl:sameAs can be used to specify that two
    resources are identical
  • To consolidate information about ABBA from
    multiple sources we can specify that
    http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist
    and http://dbpedia.org/resource/ABBA are the same
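
Both reasoning examples can be imitated with a small fixpoint loop in Python. This is a sketch only: it covers just the subclass rule and a simplified owl:sameAs (statements are copied for the subject position only), the `rdfs:label` fact is an assumed piece of data from the second source, and the function name `entail` is made up for illustration.

```python
RDF_TYPE, SUBCLASS, SAME_AS = "rdf:type", "rdfs:subClassOf", "owl:sameAs"
BBC_ABBA = ("http://www.bbc.co.uk/music/artists/"
            "d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist")
DBP_ABBA = "http://dbpedia.org/resource/ABBA"

triples = {
    ("mo:MusicGroup", SUBCLASS, "mo:MusicArtist"),
    (BBC_ABBA, RDF_TYPE, "mo:MusicGroup"),
    (BBC_ABBA, SAME_AS, DBP_ABBA),
    (DBP_ABBA, "rdfs:label", "ABBA"),  # assumed fact, for illustration only
}

def entail(facts):
    """Add inferred triples until nothing new can be derived."""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            if p == RDF_TYPE:
                # RDFS: an instance of a class is an instance of its superclasses
                new |= {(s, RDF_TYPE, sup)
                        for c, q, sup in facts if q == SUBCLASS and c == o}
            if p == SAME_AS:
                # owl:sameAs is symmetric ...
                new.add((o, SAME_AS, s))
                # ... and what is stated about one name holds for the other
                new |= {(o, q, v)
                        for x, q, v in facts if x == s and q != SAME_AS}
        if new <= facts:
            return facts
        facts |= new

inferred = entail(triples)
artists = sorted(s for s, p, o in inferred
                 if p == RDF_TYPE and o == "mo:MusicArtist")
print(artists)
```

A query for all MusicArtists now returns ABBA under both identifiers, although neither source asserted that type explicitly.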

29
SPARQL and Rule Languages
  • SPARQL
  • Query language for RDF triples
  • A protocol for querying RDF data over the Web
  • E.g. language used to query the repository from
    the user interface
  • Can also be used for Updates
  • More in lecture 6
  • Rule languages (esp. Rule Interchange Format, RIF)
  • W3C recommendation for exchanging rule sets
    between rule engines
  • Extend ontology languages with proprietary axioms
  • Based on different types of logics
  • Description Logic
  • Logic Programming
  • E.g. used to enable reasoning over data to infer
    new knowledge
  • More in lecture 8

30
SPARQL Example
  • SPARQL query for other music groups that members
    of ABBA sing in
  • PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  • PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  • PREFIX mo: <http://purl.org/ontology/mo/>
  • SELECT ?memberName ?groupName
  • WHERE {
  • <http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist>
  • mo:member ?m .
  • ?x mo:member ?m .
  • ?x rdf:type mo:MusicGroup .
  • ?m foaf:name ?memberName .
  • ?x foaf:name ?groupName
  • FILTER (?groupName <> "ABBA") }
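
To see how the WHERE clause is evaluated, the pattern matching can be sketched in plain Python. This is a teaching sketch, not a SPARQL engine: the short names such as `bbc:abba` stand in for the long URIs, the seven data triples are assumptions for illustration, and `match` and `query` are hypothetical helpers.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; None if impossible."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):            # variable: bind it or check it
            if result.setdefault(term, value) != value:
                return None
        elif term != value:                 # constant: must be identical
            return None
    return result

def query(patterns, triples):
    """Join the triple patterns one after another, like a basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

ABBA = "bbc:abba"  # stand-in for the BBC artist URI
data = [
    (ABBA, "mo:member", "bbc:benny"),
    (ABBA, "rdf:type", "mo:MusicGroup"),
    (ABBA, "foaf:name", "ABBA"),
    ("bbc:hepstars", "mo:member", "bbc:benny"),
    ("bbc:hepstars", "rdf:type", "mo:MusicGroup"),
    ("bbc:hepstars", "foaf:name", "Hep Stars"),
    ("bbc:benny", "foaf:name", "Benny Andersson"),
]
where = [
    (ABBA, "mo:member", "?m"),
    ("?x", "mo:member", "?m"),
    ("?x", "rdf:type", "mo:MusicGroup"),
    ("?m", "foaf:name", "?memberName"),
    ("?x", "foaf:name", "?groupName"),
]
rows = [(b["?memberName"], b["?groupName"])
        for b in query(where, data)
        if b["?groupName"] != "ABBA"]       # FILTER (?groupName <> "ABBA")
print(rows)
```

Without the filter, ABBA itself would also be reported, because ?x may bind to the same group the query started from.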

31
SPARQL Example
  • SPARQL query for other music groups that members
    of ABBA sing in
  • Graphical representation of WHERE clause

<http://www.bbc.co.uk/music/artists/d87e52c5-bb8d-4da8-b941-9f4928627dc8#artist> mo:member ?m .
?x mo:member ?m .
?x rdf:type mo:MusicGroup .
?m foaf:name ?memberName .
?x foaf:name ?groupName
32
Two RIF rules for mapping FOAF predicates
  • True statements in antecedent of rule mean true
    statements in its conclusion
  • if { ?x foaf:firstName ?first ;
  • foaf:surname ?last }
  • then
  • { ?x foaf:family_name ?last ;
  • foaf:givenname ?first ;
  • foaf:name func:string-join(?first " " ?last) }
  • if { ?x foaf:name ?name } and
  • pred:contains(?name, " ")
  • then
  • { ?x foaf:firstName func:substring-before(?name, " ") ;
  • foaf:surname func:substring-after(?name, " ") }
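
What the two mapping rules do can be sketched in Python for a single person. This is an illustration, not a RIF engine: the dictionary layout and the function name `apply_rules` are assumptions, and the string builtins are replaced by their closest Python counterparts.

```python
def apply_rules(person):
    """person: dict from FOAF property name to a string value."""
    out = dict(person)
    while True:
        previous = dict(out)
        # Rule 1: firstName and surname give family_name, givenname, name
        if "foaf:firstName" in out and "foaf:surname" in out:
            first, last = out["foaf:firstName"], out["foaf:surname"]
            out.setdefault("foaf:family_name", last)
            out.setdefault("foaf:givenname", first)
            out.setdefault("foaf:name", " ".join([first, last]))
        # Rule 2: a name containing a space gives firstName and surname
        name = out.get("foaf:name", "")
        if " " in name:
            head, _, tail = name.partition(" ")   # split at the first space
            out.setdefault("foaf:firstName", head)
            out.setdefault("foaf:surname", tail)
        if out == previous:                        # nothing new: stop
            return out

print(apply_rules({"foaf:name": "Benny Andersson"}))
```

Starting from the full name only, rule 2 fires first and its conclusions then make rule 1 applicable, so the loop runs until no rule adds anything.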

33
Logics, Proof and Trust
  • Security and Encryption
  • HTTPS provides data integrity and confidentiality
    when transmitting data and queries
  • Digital signing of RDF graphs provides
    authenticity and non-repudiation
  • Unifying logic
  • Bring together the various ontology and rule
    languages
  • Connect unlinked data to provide more meaning to
    data, and drive data integration
  • E.g. identity management and alignment via
    http://sameas.org
  • Proof
  • Explanation of inference results, data provenance
  • Trust
  • Trust that the system performs correctly
  • Trust that the system can explain what it is
    doing
  • Network of trust for data sources and services
  • Technology and user interface
  • Many open problems, topics for future research

34
Foundations
[Figure: Semantic Web stack; rules layer: RIF]
35
UNICODE
  • More than a-z, A-Z

36
Character Sets
  • ASCII: 7 bit, 128 characters (a-z, A-Z, 0-9,
    punctuation)
  • Extension: code pages, 128 more chars (ß, Ä, ñ, ø, Š,
    etc.)
  • Different systems, many different code pages
  • ISO Latin 1, CP1252: Western languages (197 = Å)
  • ISO Latin 2, CP1250: East Europe (197 = Ĺ)
  • Code page is an interpretation, not a property of
    text
  • Swedish programmer would have to write
    ä aÄiÜ = 'Ön'; ü instead of { a[i] = '\n'; }
  • Thus, if we interpret the code page incorrectly,
    the displayed result will not be the expected one
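
The byte value 197 from the slide makes a quick experiment. The sketch below uses Python's built-in codecs for the code pages named above; the variable names are arbitrary.

```python
raw = bytes([197])                      # one byte, 0xC5

# The same byte under the Western code pages ...
as_latin1 = raw.decode("latin-1")       # 'Å'  (U+00C5)
as_cp1252 = raw.decode("cp1252")        # 'Å'  (U+00C5)

# ... and under the East European ones
as_cp1250 = raw.decode("cp1250")        # 'Ĺ'  (U+0139)
as_latin2 = raw.decode("iso8859-2")     # 'Ĺ'  (U+0139)

# The byte is identical; only the interpretation differs.
print(as_latin1 == as_cp1250)           # False
```

This is why text must always travel together with its encoding, or be in an encoding that every party assumes by default.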

37
UNICODE: an unambiguous code
  • We need a solution that can be unambiguously
    interpreted, i.e. where each code corresponds to a
    single character and vice versa
  • That's why UNICODE was created!
  • $ Å Ĺ Æ ή
  • U+0024 U+00C5 U+0139 U+00C6 U+03AE
  • ك ⅝ ♥ Ж ญ
  • U+0643 U+215D U+2665 U+0416 U+0E0D
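
The code points listed on this slide can be inspected with Python's standard `unicodedata` module. A small sketch; the helper `describe` is a made-up name, and the UTF-8 column is added only to show that a code point and its byte encoding are different things.

```python
import unicodedata

CODE_POINTS = [0x0024, 0x00C5, 0x0139, 0x00C6, 0x03AE,
               0x0643, 0x215D, 0x2665, 0x0416, 0x0E0D]

def describe(cp):
    """Return (U+XXXX label, character, official name, UTF-8 bytes as hex)."""
    ch = chr(cp)
    return f"U+{cp:04X}", ch, unicodedata.name(ch), ch.encode("utf-8").hex()

table = [describe(cp) for cp in CODE_POINTS]
for label, _, name, utf8_hex in table:
    print(label, name, utf8_hex)    # ASCII-only output, safe on any terminal
```

Each code point maps to exactly one character and back (`ord(chr(cp)) == cp`), which is the unambiguity the slide asks for.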

38
UNICODE
  • ISO standard
  • About 100,000 characters, space for 1,000,000
  • Unique code points from U+0000 through U+FFFF to
    U+10FFFF
  • Well-defined process for adding characters
  • When dealing with any text, simply use UNICODE
  • Character code charts: http://www.unicode.org/charts/
  • See also
  • http://www.tbray.org/talks/rubyconf2006.pdf
  • http://tbray.org/ongoing/When/200x/2003/04/06/Unicode

39
URI: UNIFORM RESOURCE IDENTIFIERS
  • How to identify things on the Web

40
Identifier, Resource, Representation
Taken from http://www.w3.org/TR/webarch/
41
URI, URN, URL
  • A Uniform Resource Identifier (URI) is a string
    of characters used to identify a name or a
    resource on the Internet
  • A URI can be a URL or a URN
  • A Uniform Resource Name (URN) defines an item's
    identity
  • the URN urn:isbn:0-395-36341-1 is a URI that
    specifies the identifier system, i.e.
    International Standard Book Number (ISBN), as
    well as the unique reference within that system
    and allows one to talk about a book, but doesn't
    suggest where and how to obtain an actual copy of
    it
  • A Uniform Resource Locator (URL) provides a
    method for finding it
  • the URL http://www.auckland.ac.nz identifies a
    resource (UoA's home page) and implies that a
    representation of that resource (such as the home
    page's current HTML code, as encoded characters)
    is obtainable via HTTP from a network host named
    www.auckland.ac.nz

42
URI Syntax
  • Examples
  • http://www.ietf.org/rfc/rfc3986.txt
  • mailto:John.Doe@example.com
  • news:comp.infosystems.www.servers.unix
  • telnet://melvyl.ucop.edu/
  • URI syntax: scheme://authority/path?query#fragid
  • The scheme distinguishes different kinds of URIs
  • Authority normally identifies a server
  • Path normally identifies a directory and a file
  • Query adds extra parameters
  • Fragment ID identifies a secondary resource
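
The five components can be taken apart with `urllib.parse.urlsplit` from the Python standard library. The first URI below is a made-up example on `www.example.org`; the other two are taken from these slides.

```python
from urllib.parse import urlsplit

# Hypothetical URI that exercises all five components
parts = urlsplit("http://www.example.org:8080/music/artists/abba?format=rdf#artist")
print(parts.scheme)     # http
print(parts.netloc)     # www.example.org:8080   (the authority)
print(parts.path)       # /music/artists/abba
print(parts.query)      # format=rdf
print(parts.fragment)   # artist

# URIs without "//" have no authority; everything after the scheme is the path
mail = urlsplit("mailto:John.Doe@example.com")
urn = urlsplit("urn:isbn:0-395-36341-1")
print(mail.scheme, mail.path)
print(urn.scheme, urn.path)
```

Only the scheme is mandatory; which of the remaining parts are meaningful depends on the scheme.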

43
URI Syntax (cont'd)
  • Reserved characters (like : / ? # @)
  • Many allowed characters
  • Rest percent-encoded as UTF-8
  • http://google.com/search?q=technikerstra%C3%9Fe
  • IRI: Internationalized Resource Identifier
  • Allows whole UNICODE
  • Specifies transformation into URI, mostly UTF-8
    encoding
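
Percent-encoding is easy to reproduce with `urllib.parse.quote`, which encodes as UTF-8 by default. A short sketch using the search term from the slide; the variable names are arbitrary.

```python
from urllib.parse import quote, unquote

term = "technikerstra\u00dfe"            # "technikerstraße"; \u00df is ß
encoded = quote(term)                    # ß becomes the UTF-8 bytes C3 9F
print(encoded)                           # technikerstra%C3%9Fe
print("http://google.com/search?q=" + encoded)

# Decoding restores the original IRI-style text
assert unquote(encoded) == term
```

The same step, applied to every non-ASCII character, is essentially how an IRI is mapped to a URI.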

44
URI Schemes
Scheme   Description                              RFC
file     Host-specific file names                 1738
ftp      File Transfer Protocol                   1738
http     Hypertext Transfer Protocol              2616
https    Hypertext Transfer Protocol Secure       2818
im       Instant Messaging                        3860
imap     Internet Message Access Protocol         5092
ipp      Internet Printing Protocol               3510
iris     Internet Registry Information Service    3981
ldap     Lightweight Directory Access Protocol    4516
mailto   Electronic mail address                  2368
mid      Message identifier                       2392
  • Schemes partition the URI space into subspaces
  • Schemes can add or clarify properties of
    resources
  • Ownership (how authorities are formed)
  • Persistence (how stable the URIs should be)
  • Protocol (default access protocol)

From http://www.iana.org/assignments/uri-schemes.html
45
XML: EXTENSIBLE MARKUP LANGUAGE
  • How to exchange structured data on the Web

46
eXtensible Markup Language
  • Language for creating languages
  • Meta-language
  • XHTML is a language: HTML expressed in XML
  • W3C Recommendation (standard)
  • XML is, for the information industry, what the
    container is for international shipping
  • For structured and semistructured data
  • Main plus: wide support, interoperability
  • Platform-independent
  • Applying new tools to old data

47
Structure of XML Documents
  • Elements, attributes, content
  • One root element in document
  • Characters, child elements in content

48
XML Element
  • Syntax: <name>contents</name>
  • <name> is called the opening tag
  • </name> is called the closing tag
  • Examples
  • <gender>Female</gender>
  • <story>Once upon a time there was. </story>
  • Element names case-sensitive

49
Attributes to XML Elements
  • Name/value pairs, part of element contents
  • Syntax
  • <name attribute_name="attribute_value">contents</name>
  • Values surrounded by single or double quotes
  • Example
  • <temperature unit="F">64</temperature>
  • <swearword language='fr'>con</swearword>

50
Empty Elements
  • Empty element: <name></name>
  • This can be shortened: <name/>
  • Empty elements may have attributes
  • Example
  • <grade value='A'/>

51
Comments
  • May occur anywhere in element contents or outside
    the root element
  • Start with <!--
  • End with -->
  • May not contain a double hyphen
  • Comments cannot be nested
  • Example: <element>content <!-- a comment, will
    be ignored in processing --></element><!--
    comment outside the root element -->

52
Nesting Elements
  • Elements may contain other (child) elements
  • The containing element is the parent element
  • Elements must be properly nested
  • Example with improper nesting
  • <b>bold <i>bold-italic</b> italic?</i>
  • The above is not XML (not well-formed)

53
Special Characters in XML
  • < and > are obviously reserved in content
  • Written as &lt; and &gt;
  • Same for ' and " in attribute values
  • Written as &apos; and &quot;
  • Now & is also reserved
  • Written as &amp;
  • Any character: &#223; or &#xdf; for ß
  • Decimal or hexadecimal UNICODE code point
  • Elements and attributes whose name starts with
    xml are also special
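
In practice, escaping is left to a library. A sketch with `xml.sax.saxutils.escape` and `xml.etree.ElementTree` from the Python standard library; the sample strings are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# & < > are always escaped; quotes only if we ask for them
QUOTES = {'"': "&quot;", "'": "&apos;"}
safe = escape('Tom & "Jerry" <3', QUOTES)
print(safe)      # Tom &amp; &quot;Jerry&quot; &lt;3

# The parser undoes the escaping when the text is read back
element = ET.fromstring("<note>" + safe + "</note>")
assert element.text == 'Tom & "Jerry" <3'

# Decimal and hexadecimal character references name the same code point
sharp_s = ET.fromstring("<c>&#223;&#xdf;</c>").text
assert sharp_s == "\u00df\u00df"    # "ßß"
```

The ampersand has to be escaped first, otherwise the ampersands introduced by the other replacements would be escaped again; `escape` takes care of that order.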

54
Uses of XML
  • Document mark-up: XHTML
  • HTML is a language, so it can be expressed in XML
  • Exchanged data
  • Scalable vector graphics: SVG
  • E-commerce: ebXML
  • Messaging in general: SOAP
  • And many more standards
  • Internal data
  • Databases
  • Configuration files
  • Etc.

55
Why XML?
  • For semistructured data
  • Loose but constrained structure
  • Unspecified content length
  • For structured data
  • Table(s) or similar rows
  • Well-defined structure, data types
  • Good interoperability
  • But requirements for quick access, processing

56
XML Parsers
  • Document Object Model (DOM) builder
  • Creates an object model of XML document,
    tree-traversal API
  • In-memory representation, random access
  • DOM complex, simpler JDOM etc.
  • Simple API for XML parsing (SAX)
  • Views XML as stream of events
  • el_start("date"), attribute("day", "10"),
    el_end("date")
  • Content reported as callbacks to methods on a
    handler object of your own design
  • DOM builder can use SAX
  • Pull parsers
  • Intermediate parsed results can be accessed as
    local variables
  • StAX (Java), XMLReader (PHP), System.Xml.XmlReader
    (.NET)
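
The three parser styles can be compared on one tiny document using only the Python standard library. A sketch: the sample document and the `Recorder` class are made up, and `iterparse` is used here as a stand-in for a pull-style reader, since the loop asks for the next event instead of being called back.

```python
import io
import xml.dom.minidom
import xml.sax
import xml.etree.ElementTree as ET

DOC = '<event><date day="10">October</date></event>'

# 1. DOM: the whole tree is in memory and can be accessed in any order
dom = xml.dom.minidom.parseString(DOC)
day_dom = dom.getElementsByTagName("date")[0].getAttribute("day")

# 2. SAX: the parser pushes events into callbacks on our handler object
class Recorder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("el_start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("el_end", name))

recorder = Recorder()
xml.sax.parseString(DOC.encode("utf-8"), recorder)

# 3. Pull style: our own loop asks for the next event when it is ready
stream = io.BytesIO(DOC.encode("utf-8"))
pulled = [(event, elem.tag)
          for event, elem in ET.iterparse(stream, events=("start", "end"))]

print(day_dom)
print(recorder.events)
print(pulled)
```

DOM is the most convenient for small documents; the streaming styles matter once a document no longer fits comfortably in memory.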

57
NAMESPACES
  • How to distinguish categories of resources

58
The Problem
  • Documents use different vocabularies
  • Example 1: CD music collection
  • Example 2: online order transaction
  • Merging multiple documents together
  • Name collisions can occur
  • Example 1: albums have a <name>
  • Example 2: customers have a <name>
  • How do you differentiate between the two?

59
The Solution: Namespaces!
  • What is a namespace?
  • A syntactic way to differentiate similar names in
    an XML document
  • Binding namespaces
  • Uses Uniform Resource Identifier (URI)
  • e.g. http://example.com/NS
  • Can bind to a named or default prefix

60
Namespace Binding Syntax
  • Use xmlns attribute
  • Named prefix
  • <a:foo xmlns:a="http://example.com/NS"/>
  • Default prefix
  • <foo xmlns="http://example.com/NS"/>
  • Element and attribute names are qualified
  • URI, local part (or local name) pair
  • e.g. { http://example.com/NS , foo }

61
Example Document I
  • Namespace binding
  • <?xml version="1.0" encoding="UTF-8"?>
  • <order>
  • <item code="BK123">
  • <name>Care and Feeding of Wombats</name>
  • <desc xmlns:html="http://www.w3.org/1999/xhtml">
  • The <html:b>best</html:b> book ever written!
  • </desc>
  • </item>
  • </order>

62
Example Document II
  • Namespace scope
  • <?xml version="1.0" encoding="UTF-8"?>
  • <order>
  • <item code="BK123">
  • <name>Care and Feeding of Wombats</name>
  • <desc xmlns:html="http://www.w3.org/1999/xhtml">
  • The <html:b>best</html:b> book ever written!
  • </desc>
  • </item>
  • </order>

63
Example Document III
  • Bound elements
  • <?xml version="1.0" encoding="UTF-8"?>
  • <order>
  • <item code="BK123">
  • <name>Care and Feeding of Wombats</name>
  • <desc xmlns:html="http://www.w3.org/1999/xhtml">
  • The <html:b>best</html:b> book ever written!
  • </desc>
  • </item>
  • </order>
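
A namespace-aware parser reports each bound element as a (URI, local name) pair. The sketch below parses the order document with Python's standard `xml.etree.ElementTree`, which writes such a pair as `{URI}local`; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
ORDER = """<?xml version="1.0" encoding="UTF-8"?>
<order>
  <item code="BK123">
    <name>Care and Feeding of Wombats</name>
    <desc xmlns:html="http://www.w3.org/1999/xhtml">
      The <html:b>best</html:b> book ever written!
    </desc>
  </item>
</order>"""

root = ET.fromstring(ORDER.encode("utf-8"))
tags = [element.tag for element in root.iter()]
print(tags)

# The prefix is only a local abbreviation; lookups go by namespace URI
bold = root.find(".//x:b", {"x": XHTML})
print(bold.text)
```

Only the `b` element is in the XHTML namespace: `order`, `item`, `name` and `desc` carry no prefix, and no default namespace is declared, so they stay unqualified even inside the scope of the binding.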

64
XML SCHEMA
  • How to define XML document structures

65
What is it?
  • A grammar definition language
  • More expressive than Document Type Definitions
    (DTDs)
  • Uses XML syntax
  • Defined by W3C
  • Primary features
  • Datatypes
  • e.g. integer, float, date, etc
  • More powerful content models
  • e.g. namespace-aware, type derivation, etc

66
XML Schema Types
  • Simple types
  • Basic datatypes
  • Can be used for attributes and element text
  • Extendable
  • Complex types
  • Defines structure of elements
  • Extendable
  • Types can be named or anonymous

67
Simple Types
  • DTD datatypes
  • Strings, ID/IDREF, NMTOKEN, etc
  • Numbers
  • Integer, long, float, double, etc
  • Other
  • Binary (base64, hex)
  • QName, URI, date/time
  • etc

68
Deriving Simple Types
  • Apply facets
  • Specify enumerated values
  • Add restrictions to data
  • Restrict lexical space
  • Allowed length, pattern, etc
  • Restrict value space
  • Minimum/maximum values, etc
  • Extend by list or union

69
A Simple Type Example
  • Integer with value in (1234, 5678]
  • <xsd:simpleType name="MyInteger">
  • <xsd:restriction base="xsd:integer">
  • <xsd:minExclusive value="1234"/>
  • <xsd:maxInclusive value="5678"/>
  • </xsd:restriction>
  • </xsd:simpleType>

70
A Simple Type Example II
  • Validating integer with value in (1234, 5678]
  • <data xsi:type='MyInteger'></data> INVALID
  • <data xsi:type='MyInteger'>Andy</data> INVALID
  • <data xsi:type='MyInteger'>-32</data> INVALID
  • <data xsi:type='MyInteger'>1233</data> INVALID
  • <data xsi:type='MyInteger'>1234</data> INVALID
  • <data xsi:type='MyInteger'>1235</data>
  • <data xsi:type='MyInteger'>5678</data>
  • <data xsi:type='MyInteger'>5679</data> INVALID
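
The two restrictions behind these results, the lexical space of an integer and the value range, can be mimicked by hand. This Python sketch is not an XML Schema validator; `is_my_integer` is a hypothetical helper that applies the same two checks to the text content of `<data>`.

```python
import re

# Lexical space of xsd:integer: optional sign followed by decimal digits
_INTEGER = re.compile(r"[+-]?[0-9]+")

def is_my_integer(text):
    """True if `text` is an integer literal with 1234 < value <= 5678."""
    text = text.strip()                 # surrounding whitespace is ignored
    if not _INTEGER.fullmatch(text):
        return False                    # "" and "Andy" fail here
    return 1234 < int(text) <= 5678     # minExclusive / maxInclusive

for sample in ["", "Andy", "-32", "1233", "1234", "1235", "5678", "5679"]:
    print(repr(sample), "valid" if is_my_integer(sample) else "INVALID")
```

The first two samples fail on the lexical check, the others on the value range; 1234 is rejected because the lower bound is exclusive, 5678 is accepted because the upper bound is inclusive.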

71
Complex Types
  • Element content models
  • Simple
  • Mixed
  • Unlike DTDs, elements in mixed content can be
    ordered
  • Sequences and choices
  • Can contain nested sequences and choices
  • All
  • All elements required but order is not important

72
A Complex Type Example I
  • Mixed content that allows <b>, <i>, and <u>
  • <xsd:complexType name="RichText" mixed="true">
  • <xsd:choice minOccurs="0" maxOccurs="unbounded">
  • <xsd:element name="b" type="RichText"/>
  • <xsd:element name="i" type="RichText"/>
  • <xsd:element name="u" type="RichText"/>
  • </xsd:choice>
  • </xsd:complexType>

73
A Complex Type Example II
  • Validation of RichText
  • <content xsi:type='RichText'></content>
  • <content xsi:type='RichText'>Andy</content>
  • <content xsi:type='RichText'>XML is
    <i>awesome</i>.</content>
  • <content xsi:type='RichText'><B>bold</B></content>
    INVALID
  • <content xsi:type='RichText'><foo/></content>
    INVALID
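
The RichText content model is recursive: text may appear anywhere, and every child element must be `b`, `i` or `u`, each again of type RichText. The Python sketch below checks exactly that and nothing more; `is_rich_text` is a made-up helper, and the `xsi:type` attribute is left out of the samples so that they parse without a namespace declaration.

```python
import xml.etree.ElementTree as ET

ALLOWED = {"b", "i", "u"}       # element names are case-sensitive

def is_rich_text(xml_text):
    """True if every element below the root is one of b, i, u."""
    root = ET.fromstring(xml_text)
    # Since b, i and u are themselves RichText, checking all descendants
    # is the same as checking the content model level by level.
    return all(el.tag in ALLOWED for el in root.iter() if el is not root)

samples = [
    "<content></content>",
    "<content>Andy</content>",
    "<content>XML is <i>awesome</i>.</content>",
    "<content><B>bold</B></content>",
    "<content><foo/></content>",
]
for sample in samples:
    print(sample, "valid" if is_rich_text(sample) else "INVALID")
```

`<B>` fails because XML names are case-sensitive, `<foo/>` because it is not among the declared choices.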

74
EXTENSIONS
75
Building On The Foundations
  • RDF for semantic data
  • Graphs of linked data
  • Semantic Web
  • Any XML or HTML can support translation to RDF
  • GRDDL: a pointer to a transformation
  • RDFa: RDF in XHTML
  • Makes existing data part of the Semantic Web
  • XML has encryption and digital signature
  • Necessary technologies for data provenance, trust

76
Web of Linked Data
77
RDF Teaser
  • Resource Description Framework
  • Metadata about Web resources
  • But also any other data
  • Graphs of resources interlinked with properties
  • Rafael Nadal plays Tennis
  • knows Shakira
  • Shakira sings Waka waka
  • Ontology languages for data schemas
  • Various properties: knows, plays, sings
  • Classes of resources: Person, Athlete, Singer,
    Sport, Song
  • SPARQL for querying the data

78
ILLUSTRATION BY A LARGER EXAMPLE
  • The Semantic Web architecture in practice

79
Semantic Conference
  • All the data about the conference is part of the
    Semantic Web
  • Date, location
  • Organizers, peer-review committees
  • Articles (papers), their authors
  • Detailed program schedule
  • Each Semantic Web architecture layer plays a
    role
  • ISWC is annotating conference data using Semantic
    Web technologies
  • http://data.semanticweb.org/conference/iswc/2011/html
  • Currently available data covers only papers and
    authors
  • This could be extended to support features
    discussed above

80
Foundation Layers
  • UNICODE
  • All participants' names should be in UNICODE
    because they are international: Denny Vrandecic,
    Diego Meroño, François Maué
  • Same for paper titles: "α-decay and β-decay of
    heavy atoms"
  • URI: All important things must have identifiers,
    for example
  • Conference: http://data.semanticweb.org/conference/iswc/2011
  • Participant: http://data.semanticweb.org/person/piero-bonatti
  • Participant's affiliation: http://data.semanticweb.org/organization/talis-information-limited
  • Paper: http://data.semanticweb.org/conference/iswc/2011/paper/tutorial/7

81
Data Layers
  • XML
  • The HTML pages should be in XHTML
  • The RDF data (below) should be in RDF/XML
  • News feed should be in Atom (an XML format)
  • RDF
  • The conference dataset, and any useful subsets,
    should be published in RDF for download, for
    example
  • http://data.semanticweb.org/conference/iswc/2011/rdf

82
Ontologies, Query
  • RDFS, OWL
  • The conference would use various vocabularies and
    ontologies, such as
  • FOAF (Friend of a friend) for talking about the
    attendees and authors/presenters
  • Dublin Core for paper metadata
  • Calendar ontology for the program
  • SPARQL
  • The conference server should have a public SPARQL
    endpoint that can be used for queries over the
    conference data
  • http://data.semanticweb.org/snorql/

83
Browsing ISWC Data
http://data.semanticweb.org/person/tom-heath/html
84
Querying ISWC Data
http://data.semanticweb.org/snorql/
85
SUMMARY
  • That's almost all for today

86
Things to Keep in Mind
  • Semantic Web builds on the Web
  • For any text, use UNICODE, probably UTF-8
  • URIs can identify anything
  • Not only documents on the Web
  • XML helps with data exchange, interoperability
  • XML languages are distinguished with namespaces

87
References
  • Mandatory
  • http://www.w3.org/TR/webarch/
  • http://www.w3.org/DesignIssues/Architecture.html
  • Further reading
  • http://www.w3.org/Provider/Style/URI
  • http://www.ietf.org/rfc/rfc3986.txt
  • http://www.unicode.org/charts/
  • http://www.tbray.org/talks/rubyconf2006.pdf
  • http://tbray.org/ongoing/When/200x/2003/04/06/Unicode
  • http://www.w3.org/TR/xml/
  • http://www.w3.org/TR/xml-names/
  • http://www.w3.org/TR/xmlschema-1/
  • Fensel et al.: On2broker: Semantic-Based Access
    to Information Sources at the WWW
  • Fensel et al.: Ontobroker in a Nutshell
  • http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
  • http://www.w3.org/DesignIssues/Semantic.html

88
References
  • Wikipedia links
  • http://en.wikipedia.org/wiki/Semantic_Web_Stack
  • http://en.wikipedia.org/wiki/URI
  • http://en.wikipedia.org/wiki/Unicode
  • http://en.wikipedia.org/wiki/XML
  • http://en.wikipedia.org/wiki/XML_Namespaces
  • http://en.wikipedia.org/wiki/Resource_Description_Framework
  • http://en.wikipedia.org/wiki/RDF_Schema
  • http://en.wikipedia.org/wiki/Web_Ontology_Language
  • http://en.wikipedia.org/wiki/SPARQL
  • http://en.wikipedia.org/wiki/Rule_Interchange_Format

89
Next Lecture
Title
1 Introduction
2 Semantic Web Architecture
3 Resource Description Framework (RDF)
4 Web of Data
5 Generating Semantic Annotations
6 Storage and Querying
7 Web Ontology Language (OWL)
8 Rule Interchange Format (RIF)
90
Questions?