Title: DataMining versus SemanticWeb
1DataMining versus SemanticWeb
- Veljko Milutinovic, vm@etf.bg.ac.yu
- http://galeb.etf.bg.ac.yu/vm
This material was developed with financial help
of the WUSA fund of Austria.
2DataMining versus SemanticWeb
- Two different avenues leading to the same goal!
- The goal: Efficient retrieval of knowledge from large compact or distributed databases, or the Internet
- What is knowledge? Synergistic interaction of information (data) and their relationships (correlations).
- The major difference: Placement of complexity
3Essence of DataMining
- Data and knowledge represented with simple mechanisms (typically HTML) and without metadata (data about data).
- Consequently, relatively complex algorithms have to be used (complexity migrated into the retrieval request time).
- In return, low complexity at system design time!
4Essence of SemanticWeb
- Data and knowledge represented with complex mechanisms (typically XML) and with plenty of metadata (a byte of data may be accompanied by a megabyte of metadata).
- Consequently, relatively simple algorithms can be used (low complexity at the retrieval request time).
- However, large metadata design and maintenance complexity at system design time.
5Major Knowledge Retrieval Algorithms (for
DataMining)
- Neural Networks
- Decision Trees
- Rule Induction
- Memory Based Reasoning
- etc.
- Consequently, the stress is on algorithms!
6Major Metadata Handling Tools (SemanticWeb)
- XML
- RDF
- Ontology Languages
- Verification (Logic, Trust): Efforts in Progress
- Consequently, the stress is on tools!
7Semantic Web Tutorial Structure (Overview)
- Introduction to the Semantic Web
- XML Technologies for the Semantic Web
- Defining vocabularies with RDF
- Ontologies and ontology languages
- Challenges for the Semantic Web
- References
8World Wide Web - Today
[Diagram labels: Information consumer (preferences); Information request; Search engines (e.g. Google), information portals; Indexing, references, collections; Information and service providers]
9Semantic Web - Vision
[Diagram labels: User (preferences, calendar); Information and Service Provider (calendar, preferences)]
10A Definition of the Semantic Web
- "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."
- Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001
11Why?
- To use the large amount of information on the Web more effectively
- To enable more advanced automated processing on the Web: machines can understand the content
- Intelligent browsers to help you find what you are looking for
- To derive new information from existing information (reasoning)
- Virtual global database
- Advanced applications and services become possible, e.g. in
- - e-business
- - e-government
- - e-learning
12Examples
- Context-awareness -- linking based on the meaning of the information elements
- Filtering -- you could rate the pages you visit, and this is later used for automatic general recommendations
- Annotations -- you could add comments to the information on the Web, and these comments can be shown to other visitors
- Privatization -- you can create your own database
of information from the Web
13How? - Semantic Web layer model
14Trusted Web Resources
Timeline diagram, newest layer first:
- 2010: DAML+OIL, OWL - Shared Terminology (machine <-> machine)
- 2000: XML, RDF - Self Describing Documents
- 1990: HTTP, HTML - Foundation of Web today (Human <-> Machine)
- 1985: SGML, HyTime - Document Exchange Format
15Building Blocks
Building blocks of the Semantic Web:
- Metadata: data about data; labeling and structuring information in a document
- URI: Universal Resource Identifier; a universal and unique name for any resource, e.g. http://www.something.com/one
16Minimalist Design
- Making it as simple as possible
- Simplicity helps future evolution of Semantic Web
17Inference
- Deriving new data from the existing ones
- Merging data repositories gives new information
- Allows the creation of more powerful applications (intelligent agents)
- Unfortunately, inference can be achieved completely only when the semantics is defined formally in a language (e.g. first-order predicate logic languages)
18Tutorial Structure
- Introduction to the Semantic Web
- XML Technologies for the Semantic Web
- Defining vocabularies with RDF
- Ontologies and ontology languages
- Challenges for the Semantic Web
- References
19XML Technologies for the Semantic Web
- Overview
- XML Instances
- XML Document Type Definition
- XML Linking
- XML Schema
- XML Query Language
20What is an XML-Document ?
File Format (Instance):
<?xml version="1.0"?>
<a>
  <b id="x1">
    <c>David</c>
    <c>Marie</c>
  </b>
  <d/>
  <b id="x2">
    <c>John</c>
  </b>
</a>
Tree Structure (Instance): a has children b (id=x1), d, b (id=x2); the first b has children c (David) and c (Marie); the second b has child c (John)
Schema (Document Type Definition, DTD): a contains b and d; b carries the attribute id and contains c
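A minimal sketch of how the file format maps to the tree structure, assuming Python's standard xml.etree.ElementTree module (the slides do not name a parser). The helper name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

# The instance document from this slide.
DOC = """<?xml version="1.0"?>
<a>
  <b id="x1"><c>David</c><c>Marie</c></b>
  <d/>
  <b id="x2"><c>John</c></b>
</a>"""


def outline(element, depth=0):
    """Return one indented line per element: tag, attributes, and text."""
    text = (element.text or "").strip()
    attrs = "".join(f" {key}={value}" for key, value in element.attrib.items())
    lines = ["  " * depth + element.tag + attrs + (f": {text}" if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(DOC))
print("\n".join(tree_lines))
```

Each nesting level of the printed outline corresponds to one level of the tree drawn on the slide.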
21The XML Stack
Specific Applications
Standardized Applications - XHTML, SVG, SMIL, P3P, MathML
Metadata - RDF, RDFS
Hyperlinks - XLink - XPointer
Layout - XSL - CSS
Schemas - XSD - Namespaces
API - DOM - SAX
Queries - XPath - XQuery
XML 1.0
Unicode
Locators (URI)
DTDs
22Example of songs.xml
- Example of describing a song in songs.xml using music.dtd
<song>
  <title>Gipsy song</title>
  <artist>Vlatko Stefanovski</artist>
  <type class="ETHNO"/>
  <download class="YES"/>
  <comments/>
</song>
song: parent element defined in music.dtd
title, artist, type, download, comments: child elements defined in music.dtd
23Music.dtd
<!ELEMENT song (title, artist, album?, type, format?, download, comments?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT artist (#PCDATA)>
<!ELEMENT type EMPTY>
<!ATTLIST type
  class (CLASSICAL | ROCK | POP | RAP | JAZZ | TECHNO | ETHNO) #REQUIRED>
<!ELEMENT download EMPTY>
<!ATTLIST download
  class (YES | NO) "YES">
<!ELEMENT comments (#PCDATA)>
Callouts:
- Parent element: song
- Child elements: title, artist, album, type, format, download, comments
- Attributes describe content: class on type and on download
- List of values for download: YES, NO (default "YES")
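The content model of song reads like a regular expression over the child element names. The sketch below illustrates that idea in Python. It is a hand-written check under that assumption, not a DTD validator (the standard library has none), and the names SONG_MODEL and is_valid_song are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Content model of <song> from music.dtd, rewritten as a regular expression
# over the child tag names (each name followed by one space).
SONG_MODEL = re.compile(
    r"title artist (album )?type (format )?download (comments )?"
)
TYPE_CLASSES = {"CLASSICAL", "ROCK", "POP", "RAP", "JAZZ", "TECHNO", "ETHNO"}
DOWNLOAD_CLASSES = {"YES", "NO"}


def is_valid_song(xml_text):
    """Check one <song> element against the rules sketched above."""
    song = ET.fromstring(xml_text)
    if song.tag != "song":
        return False
    child_names = "".join(child.tag + " " for child in song)
    if not SONG_MODEL.fullmatch(child_names):
        return False
    # class on <type> is #REQUIRED; class on <download> defaults to "YES".
    type_class = song.find("type").get("class")
    download_class = song.find("download").get("class", "YES")
    return type_class in TYPE_CLASSES and download_class in DOWNLOAD_CLASSES


GIPSY_SONG = """<song>
  <title>Gipsy song</title>
  <artist>Vlatko Stefanovski</artist>
  <type class="ETHNO"/>
  <download class="YES"/>
  <comments/>
</song>"""

print(is_valid_song(GIPSY_SONG))
```

A song with its children in the wrong order, or with a type class outside the enumerated list, is rejected by the same function.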
24XML Linking
Simple Link
Extended Link
XPointer
Link Group
25XPath
- A language that enables us to address parts of an XML document (elements, attributes, ...)
- Select the title elements of the song elements of the catalog element and all the artist elements in the document
- /catalog/song/title | //artist
- Select the titles of all the song elements of the catalog element that have a download element with a value of yes
- /catalog/song[download='yes']/title
Callouts:
- // selects any element in the document
- / selects the child element
- | selects several paths
26Also
- Use * to select unknown XML elements
- /catalog/*/artist
- Use @attribute_name to specify an attribute
- //song[@type='classical']
- XPath expressions: logical, arithmetical
- /catalog/song[duration<5]
- XPath functions: count(), id(), last(), name(), concat(), string(), translate(), sum(), round(), false(), not(), ...
- /catalog/song[last()]
- To select nodes from the XML document (IE)
- xmlDoc.selectNodes("/catalog/song/title/text()")
Callout: the argument of selectNodes is the path
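The path expressions above can be tried out with Python's standard xml.etree.ElementTree, which supports only a subset of XPath. This is a sketch under assumptions: the catalog content (durations, download values) is invented for illustration, and the numeric comparison duration<5 is done in Python because ElementTree does not support it.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; only the element names come from the slides.
CATALOG = """<catalog>
  <song type="classical"><title>Sonata XY</title><artist>Mozart</artist>
    <duration>4</duration><download>yes</download></song>
  <song type="ethno"><title>Gipsy song</title><artist>Vlatko Stefanovski</artist>
    <duration>6</duration><download>no</download></song>
</catalog>"""

catalog = ET.fromstring(CATALOG)  # the root element is <catalog>

# /catalog/song/title (paths are written relative to the root element)
titles = [t.text for t in catalog.findall("./song/title")]
# //artist
artists = [a.text for a in catalog.findall(".//artist")]
# /catalog/song[download='yes']/title
downloadable = [t.text for t in catalog.findall("./song[download='yes']/title")]
# //song[@type='classical']
classical = [s.findtext("title") for s in catalog.findall(".//song[@type='classical']")]
# /catalog/song[last()]
last_title = catalog.find("./song[last()]").findtext("title")
# /catalog/song[duration<5], filtered in Python instead of in the path
short = [s.findtext("title") for s in catalog.findall("./song")
         if float(s.findtext("duration")) < 5]

print(titles, artists, downloadable, classical, last_title, short)
```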
27XPointer
- Locates portions of other XML documents (elements, attributes), without the need to place anchors inside those documents (as in HTML)
- More robust to the changes in the target document
- URL + XPath
- http://www.music.org/first.xml#xpointer(//song/title[1])
- - http://www.music.org/first.xml: URL of the document we point into
- - xpointer(//song/title[1]): XPointer expression (XPath language)
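A small sketch of the "URL + XPath" idea, assuming the pointer is attached to the URL as a fragment of the form #xpointer(...). The content of first.xml is a hypothetical stand-in and nothing is downloaded.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

POINTER = "http://www.music.org/first.xml#xpointer(//song/title[1])"

# Split into the document URL and the fragment that carries the XPointer.
document_url, fragment = urldefrag(POINTER)
xpath = re.fullmatch(r"xpointer\((.*)\)", fragment).group(1)

# Stand-in for the content of first.xml (hypothetical).
FIRST_XML = """<catalog>
  <song><title>Gipsy song</title></song>
  <song><title>Sonata XY</title></song>
</catalog>"""

root = ET.fromstring(FIRST_XML)
# ElementTree paths are relative to the root element, hence the leading ".".
selected = [node.text for node in root.findall("." + xpath)]
print(document_url, xpath, selected)
```

The target document needs no anchors: the path alone identifies the first title of each song.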
28XML Schema
- XML Schema defines a class of XML documents
- Defines (explains) the datatypes, elements, and attributes
- Defines and catalogues vocabularies for classes of XML documents
- The document described by an XML schema can be called an instance (parallel to OOP)
- The schema language considerably extends the capabilities of XML 1.0 document type definitions (DTDs), most importantly with datatypes
29Limitations of DTDs
- Syntax: not XML
- Practically no reuse of content models
- Constructors: element set with content model
- Datatypes: essentially only "String"
<!ELEMENT song (title, artist, album?, type, format?, download, comments?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT artist (#PCDATA)>
<!ELEMENT type EMPTY>
<!ATTLIST type
  class (CLASSICAL | ROCK | POP | RAP | JAZZ | TECHNO | ETHNO) #REQUIRED>
<!ELEMENT download EMPTY>
<!ATTLIST download
  class (YES | NO) "YES">
<!ELEMENT comments (#PCDATA)>
30XML Schema Components
- An XML Schema is comprised of a set of schema components
- There are three groups of components
- Primary components: Simple type definitions, Complex type definitions, Attribute declarations, Element declarations
- Secondary components: Attribute group definitions, Identity-constraint definitions, Model group definitions, Notation declarations
- Helper components: Annotations, Model groups, Particles, Wildcards, Attribute Uses
31Example song
Type definition (complex type):
<xsd:complexType name="song">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="artist" type="xsd:string"/>
  </xsd:sequence>
  <xsd:attribute name="length" type="xsd:duration"/>
</xsd:complexType>
- xsd is used to denote the XML Schema namespace
[Diagram labels: Type declarations; Complex type; Simple type; <xsd:choice> ... </xsd:choice>]
32Reusability of schemas
- xs:include: to include a schema from another document (copy-paste)
- <xs:include schemaLocation="collection.xsd"/>
- xs:redefine: same, plus it lets you redefine the schema
- xs:import: reusing definitions from other namespaces (a system of libraries)
- <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="myxml.xsd"/>
- Now we can reference an external element from the imported namespace in our schema
33Tutorial Structure
- Introduction to the Semantic Web
- XML Technologies for the Semantic Web
- Defining vocabularies with RDF
- Ontologies and ontology languages
- Challenges for the Semantic Web
- References
34Defining vocabularies with RDF
- Motivation for RDF
- RDF Instances
- Basic concepts and building blocks
- Syntax options
- Reification
- Collections
- RDF Schema: Defining your own Vocabularies
- Supporting Interoperability with RDF
35What do we NOT get from XML?
- Superimposing (meta) information
- XML combines metainformation and content
- Datatypes that we can reason about
- Example: (CLASSICAL | ROCK | POP | RAP | JAZZ | TECHNO | ETHNO) is just a choice of allowed strings. We cannot represent that DIXIE is a subclass of JAZZ, or that BLUES overlaps with ROCK, ETHNO
- Bottom up reuse of vocabularies
- Independently evolved XML Schemas for one and the same thing
- How do you model an address?
36RDF Defining Semantics on the Web
- There is a need to describe resources on the Web in a form that can be interpreted by machines across the Web
- Interpretation depends on the context of a resource, e.g. Jaguar (car vs. beast)
- Using their experience and cognitive abilities, humans may infer the context of a resource in many ways, even if it is not made explicit
- Software can interpret context only if it is described explicitly and formally
- RDF and the ontology languages building upon RDF provide means to explicate (part of) this context
37RDF-Resource Description Framework
- Defines a framework for structuring and describing resources like documents in the Semantic Web
- Enables the definition of vocabularies for the description of resources in an application domain
- Goals
- Extensibility, interoperability, and reuse of vocabularies
- Improved support for interpretation of data by machines
38The RDF Data Model
- Simple but powerful data model for the description of resources and the creation of metadata
- Consists of three core concepts
- Resource
- Property
- Statement
- Class (in RDF Schema)
- Similar to other modeling approaches (e.g.
object-oriented modeling), but property-centric,
not class-centric
39RDF Statement and Graph
- Each triple (S, P, O), drawn as node - arc - node, represents an RDF statement
- Example: "Gipsy song is performed by Vlatko Stefanovski."
- - subject (resource): http://www.music.org/songs/g/gipsySong (song represented by its entry in a fictitious song directory)
- - predicate (property): Performed by
- - object (resource or literal): http://www.artist.org/stefanovski (artist represented by his homepage)
40Arcs in the RDF Graph
- An Arc
- represents the predicate of an RDF statement
- is labeled with a URI referring to an RDF property
- is directed, pointing from the subject of a statement to the object of a statement
[Diagram: subject http://www.music.org/songs/g/gipsySong --music:performedby (predicate)--> object http://www.artist.org/stefanovski]
41RDF Resource
- The Resource forms the central concept in RDF
- Anything that can be described can act as a resource
- Web page, part of web page, web site, book, photograph, persons, ...
- Resources are identified by a resource identifier: URI (plus optional anchor IDs)
- Compare to an entity (in the Entity Relationship model) or an object (in an object-oriented model)
42RDF Property
- An RDF Property is used to express
- A characteristic of a resource or
- A binary relation between resources
- A predicate in a statement
- A property can be compared to a (binary) relationship among entities (in the Entity Relationship model)
43Example
- The individual whose name is Vlatko Stefanovski and whose email is V.Stefanovski@artists.org, is the artist of http://www.music.org/songs/g/gipsySong
[Diagram: http://www.music.org/songs/g/gipsySong (URI reference) --music:artist--> blank node; blank node --person:name--> "Vlatko Stefanovski" (literal); blank node --person:homepage--> http://www.artists.org/stefanovski]
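The graph on this slide can be held as a plain set of triples. The following is a minimal sketch in Python, with no RDF library assumed. The blank node label _:artist1 and the helper objects are hypothetical names chosen for illustration.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")

SONG = "http://www.music.org/songs/g/gipsySong"
ARTIST = "_:artist1"  # blank node: a resource without a URI of its own

graph = {
    Triple(SONG, "music:artist", ARTIST),
    Triple(ARTIST, "person:name", "Vlatko Stefanovski"),
    Triple(ARTIST, "person:homepage", "http://www.artists.org/stefanovski"),
}


def objects(graph, subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return sorted(t.object for t in graph
                  if t.subject == subject and t.predicate == predicate)


# Follow two arcs: song -> artist (blank node) -> name (literal).
artist_node = objects(graph, SONG, "music:artist")[0]
artist_names = objects(graph, artist_node, "person:name")
print(artist_names)
```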
44XML Serialization
- How to translate the RDF graph structure into XML's tree-oriented notation?
<rdf:Description rdf:about="http://www.music.org/songs/g/gipsySong">
  <music:performedby>
    <rdf:Description>
      <person:name>Vlatko Stefanovski</person:name>
      <person:homepage>
        <rdf:Description about="http://www.artists.org/stefanovski">
        </rdf:Description>
      </person:homepage>
    </rdf:Description>
  </music:performedby>
</rdf:Description>
[Diagram: http://www.music.org/songs/g/gipsySong --music:performedby--> blank node; blank node --person:name--> "Vlatko Stefanovski"; blank node --person:homepage--> http://www.artists.org/stefanovski]
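One possible way to produce such a serialization programmatically is sketched below with Python's standard xml.etree.ElementTree. The rdf namespace URI is the W3C one. The music and person namespace URIs are hypothetical, since the slides do not give them, and the homepage is written with the rdf:resource shorthand instead of a nested description.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MUSIC = "http://www.music.org/schema#"    # hypothetical namespace URI
PERSON = "http://www.person.org/schema#"  # hypothetical namespace URI
for prefix, uri in (("rdf", RDF), ("music", MUSIC), ("person", PERSON)):
    ET.register_namespace(prefix, uri)


def q(namespace, name):
    """Build the {namespace}name form that ElementTree uses for qualified names."""
    return f"{{{namespace}}}{name}"


root = ET.Element(q(RDF, "RDF"))
song = ET.SubElement(root, q(RDF, "Description"),
                     {q(RDF, "about"): "http://www.music.org/songs/g/gipsySong"})
performed_by = ET.SubElement(song, q(MUSIC, "performedby"))
artist = ET.SubElement(performed_by, q(RDF, "Description"))  # blank node
name = ET.SubElement(artist, q(PERSON, "name"))
name.text = "Vlatko Stefanovski"
ET.SubElement(artist, q(PERSON, "homepage"),
              {q(RDF, "resource"): "http://www.artists.org/stefanovski"})

rdf_xml = ET.tostring(root, encoding="unicode")
print(rdf_xml)
```

Each arc of the graph becomes a property element, and each node becomes an rdf:Description (or an rdf:resource attribute when the node has no further properties).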
45Reification
- Latin: Res ... Thing -> Reification ... Thing Making
- Statements themselves can be considered as resources (things) in RDF. Thus, it is possible to make statements about statements (Reification).
- Possible applications
- Definition of a context for a statement with respect to time, place, validity, ...
- Embed a statement into a discourse (claims, doubts, proofs of statements)
- Example
- Statement A: <sonata XY> <composer> <Mozart>
- Statement B: <music expert A> <claims> <statement A>
- <music expert C> <doubts> <statement A>
46Reification Syntax
- The statement to be reified has to be modeled as an RDF resource
- The RDF vocabulary provides special constructs for this purpose
- The class rdf:Statement, which is the type of all RDF statements.
- The property rdf:type, which is used to associate an RDF resource with a class.
- The property rdf:subject refers to the subject of the modeled statement (i.e. to the described resource)
- The property rdf:predicate refers to the property used as a predicate in the modeled statement
- The property rdf:object refers to the object of the modeled statement (i.e. the property value)
47How to create a reified statement?
- Associate the subject, predicate and object of the statement with a resource of type rdf:Statement. This is done by using the rdf:subject, rdf:predicate and rdf:object properties
[Diagram, original statement: www.operas.org/Zauberflöte --music:composer--> www.artists.org/Mozart]
[Diagram, reified statement: statement node --rdf:type--> rdf:Statement; --rdf:subject--> www.operas.org/Zauberflöte; --rdf:predicate--> music:composer; --rdf:object--> www.artists.org/Mozart]
48How to create a reified statement?
- Now the created node which represents the statement can be used as an object or subject of another RDF statement
- The statement becomes a resource
[Diagram: statement node --rdf:type--> rdf:Statement; --rdf:subject--> www.operas.org/SonataXY; --rdf:predicate--> music:composer; --rdf:object--> www.artists.org/Mozart; --music:claimedBy--> www.musicExperts.org/ExpertA]
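The construction can be sketched in a few lines of Python over plain triples. The function name reify and the node label _:stmtA are hypothetical. The property and class names are the RDF reification vocabulary.

```python
def reify(statement_node, subject, predicate, obj):
    """Describe the statement (subject, predicate, obj) as a resource."""
    return {
        (statement_node, "rdf:type", "rdf:Statement"),
        (statement_node, "rdf:subject", subject),
        (statement_node, "rdf:predicate", predicate),
        (statement_node, "rdf:object", obj),
    }


statement_a = ("www.operas.org/SonataXY", "music:composer", "www.artists.org/Mozart")
graph = reify("_:stmtA", *statement_a)

# The reified statement is now a resource, so it can be the subject of a claim.
graph.add(("_:stmtA", "music:claimedBy", "www.musicExperts.org/ExpertA"))

for triple in sorted(graph):
    print(triple)
```

Note that the graph records who claims statement A without asserting statement A itself: the original triple is not a member of the set.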
49XML Syntax for Reification
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:music="http://ipsi.fhg.de/music-schema#">
  <rdf:Description>
    <rdf:type resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
    <rdf:subject resource="http://www.operas.org/SonataXY"/>
    <rdf:predicate resource="http://ipsi.fhg.de/music-schema#Composer"/>
    <rdf:object resource="http://www.artists.org/Mozart"/>
    <music:claimedBy resource="http://www.musicExperts.org/ExpertA"/>
  </rdf:Description>
</rdf:RDF>
Callout: music:claimedBy is a property of the statement
50RDF Collections
- An RDF Container models a collection of resources.
- The RDF model supports three types of containers
- Bag - an unordered list of resources or literals.
- Sequence - an ordered list of resources or literals.
- Alternative - a list of resources or literals that represent alternatives for the (single) value of a property.
- Bag and Sequence can be used for multivalued properties
51Container - RDF Graph Syntax
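- Example: Collection of Arias
[Diagram: http://www.opera.org/Zauberflöte --music:arias--> container node; container node --rdf:type--> rdf:Sequence; container node --rdf:_1--> /Aria1, --rdf:_2--> /Aria2, --rdf:_3--> /Aria3, --rdf:_4--> /Aria4]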
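The numbered arcs rdf:_1, rdf:_2, ... follow mechanically from the order of the members. The sketch below shows this in Python. The function name container_triples and the node label _:arias are hypothetical, and the type is written as rdf:Seq, the name the W3C vocabulary uses for the Sequence container.

```python
def container_triples(owner, prop, members, node="_:arias"):
    """Expand an ordered member list into rdf:_1, rdf:_2, ... statements."""
    triples = [
        (owner, prop, node),
        # rdf:Seq is the W3C vocabulary name for the Sequence container.
        (node, "rdf:type", "rdf:Seq"),
    ]
    for position, member in enumerate(members, start=1):
        triples.append((node, f"rdf:_{position}", member))
    return triples


arias = container_triples(
    "http://www.opera.org/Zauberflöte",
    "music:arias",
    ["/Aria1", "/Aria2", "/Aria3", "/Aria4"],
)
for triple in arias:
    print(triple)
```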
52Container - XML Syntax
<rdf:RDF>
  <rdf:Description about="http://www.operas.org/Zauberflöte">
    <music:arias>
      <rdf:Sequence>
        <rdf:li resource="/Aria1"/>
        <rdf:li resource="/Aria2"/>
        <rdf:li resource="/Aria3"/>
        <rdf:li resource="/Aria4"/>
      </rdf:Sequence>
    </music:arias>
  </rdf:Description>
</rdf:RDF>
53Statements about a Container and its members
- rdf:about is used to make a statement about the container as a whole
- rdf:aboutEach is used to make a statement about each of the members of the container
- rdf:aboutEachPrefix makes a statement about each member resource of a Bag that is only implicitly defined. This Bag contains all the resources whose fully resolved resource identifiers begin with the character string given as the value of the attribute
54RDF Schema
- The RDF vocabulary description language, RDF Schema, stresses
- Reuse and extension of existing schemata
- Semantic enrichment by concept hierarchies
- Enables statements on the schema level to
- define classes of resources
- define relationships between these classes
- define the kinds of properties that instances of those classes have
- define relationships between properties
- restrict possible combinations of classes and relationships/properties
- Allows mixing of schemata
55Defining an RDF Class (Example)
- Class
<rdfs:Class rdf:about="http://www.ipsi.fhg.de/music-schema#MusicComposition">
  <rdfs:subClassOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
  <rdfs:label>MusicComposition</rdfs:label>
</rdfs:Class>
- Instance
<rdf:Description about="http://www.operas.org/Zauberflöte">
  <rdf:type rdf:resource="http://www.ipsi.fhg.de/music-schema#MusicComposition"/>
</rdf:Description>
[Diagram: www.operas.org/Zauberflöte --rdf:type--> music:MusicComposition --rdfs:subClassOf--> rdfs:Resource]
56Class-centric vs. Property-centric
- Class-centric
- Attributes as part of the class definition
- Stresses common structure
- Property-centric
- Property as first-class object
- Stresses extensibility and flexibility with
respect to properties
[Diagram labels: music:composer with rdfs:domain and rdfs:range arcs; music:MusicTitle; person:Person; rdf:label with rdfs:domain and rdfs:range arcs; rdfs:Literal]
57Defining Concept Hierarchies
- rdfs:subClassOf represents a specialization relationship between RDF classes (transitive).
[Diagram: classes music:Genre, music:Modern, music:Classic, music:Sonata, music:Rock, music:Opera, music:RockOpera and music:MusicTitle, connected by rdfs:subClassOf arcs; the resource http://www.operas.org/Zauberflöte is attached by rdf:type arcs]
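Because rdfs:subClassOf is transitive, every class inherits all ancestors of its direct superclasses. A short Python sketch of that closure follows. The arcs in SUBCLASS_OF are an assumed reading of the diagram (the exact arcs are not recoverable from the slide text), and superclasses is a hypothetical helper name.

```python
# Assumed (subclass, superclass) arcs, for illustration only.
SUBCLASS_OF = {
    ("music:Modern", "music:Genre"),
    ("music:Classic", "music:Genre"),
    ("music:Sonata", "music:Classic"),
    ("music:Opera", "music:Classic"),
    ("music:Rock", "music:Modern"),
    ("music:RockOpera", "music:Rock"),
    ("music:RockOpera", "music:Opera"),
}


def superclasses(cls, subclass_of):
    """Return every class reachable from cls via rdfs:subClassOf arcs."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


rock_opera_supers = superclasses("music:RockOpera", SUBCLASS_OF)
print(sorted(rock_opera_supers))
```

Under these assumed arcs, an instance typed as music:Opera is also an instance of music:Classic and music:Genre.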
58RDF Property Hierarchies
- rdfs:subPropertyOf
- Is used to specify that one property is a specialization of another property
- If a resource r has value v for property p1 and property p1 is a subproperty of p2, then r also has value v for property p2.
[Diagram: sings --rdfs:subPropertyOf--> performs; Cher --sings--> SomeSong (SomeSong is a value for sings); hence Cher --performs--> SomeSong (SomeSong is also a value of performs)]
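The rule on this slide can be written as a small forward-chaining loop. This is a minimal Python sketch over plain triples, and the function name apply_subproperty is hypothetical.

```python
def apply_subproperty(triples, sub_property_of):
    """Add (r, p2, v) whenever (r, p1, v) holds and p1 is a subproperty of p2."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new statement can be derived
        changed = False
        for subject, predicate, value in list(inferred):
            for p1, p2 in sub_property_of:
                new_triple = (subject, p2, value)
                if predicate == p1 and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


facts = {("Cher", "sings", "SomeSong")}
hierarchy = {("sings", "performs")}
closure = apply_subproperty(facts, hierarchy)
print(sorted(closure))
```

The loop also handles chains of subproperties, since each derived statement is fed back into the next pass.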
59Tutorial Structure
- Introduction to the Semantic Web
- XML Technologies for the Semantic Web
- Defining vocabularies with RDF
- Ontologies and ontology languages
- Challenges for the Semantic Web
- References
60Ontologies and Ontology Languages
- What is an Ontology?
- The Ontology Language OWL
- Taxonomies
- Property Restrictions
61Ontology
- "An ontology is a specification of a conceptualization."
- A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose.
- T. R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2):199-220, 1993
62Ontology Languages
- Ontology languages are semantic markup languages for defining ontologies
- DAML+OIL is a combination of the two predecessor ontology languages
- DAML: DARPA Agent Markup Language
- OIL: Ontology Inference Layer
- OWL (Web Ontology Language) is the successor of DAML+OIL, currently developed by the W3C Web Ontology Group (Status: Working Draft)
- Building on RDF ideas
- OWL Lite is a subset of OWL
63OWL Characteristics
- OWL enables the definition of
- various types of relationships between classes (in addition to subclass hierarchies)
- additional restrictions for property values
- additional types of relationships between properties
- different kinds of properties
- OWL distinguishes between classes and instances (objects) on the one side, and data types and values on the other side (XML Schema datatypes)
64Ontology Definition
- The body of the ontology consists of
- classes
- properties
- instances (for use in class definitions)
- The main component of an ontology is a taxonomy
i.e. a class hierarchy
65Further Class Relationships
- A class definition may also contain other class relationships
- owl:disjointWith: this property is used to express that a class is disjoint with another class (no instances in common)
- owl:sameClassAs: this property is used to express that a class is equivalent to another class (same instances)
- The values of these properties are defined by class expressions, which in the simplest case are the name (URI) of a class
66Properties in OWL
- OWL properties are derived from RDF properties
- It is possible to define
- different types of properties, where several property types can be combined with each other
- relationships between properties
67Tutorial Structure
- Introduction to the Semantic Web
- XML Technologies for the Semantic Web
- Defining vocabularies with RDF
- Ontologies and ontology languages
- Challenges for the Semantic Web
- References
68Logic and Proof
- Deduction: checking a document against a set of rules
- Add predicate logic and quantifiers
- Logic + Digital Signature -> Proof
Dialogue: "You owe me 30." "Oh yeah! Prove it." "The check is in the email!"
Facts and rule, each followed by its source:
- Purchased(user1.book1.AOL) www.confirm.com t12211 22
- Priceof(book1, 30) AOL-historyDB t29293910
- Purchase(a,b,c) and Priceof(b,d) => Owes(a,c,d) www.ont.com/prodont
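The deduction step on this slide can be sketched in Python as a join of the two kinds of facts. The tuple encoding and the function name derive_owes are assumptions made for illustration. Sources and signatures of the facts are left out.

```python
# Facts as tuples: Purchased(buyer, item, seller) and Priceof(item, price).
purchased = {("user1", "book1", "AOL")}
price_of = {("book1", 30)}


def derive_owes(purchased, price_of):
    """Apply Purchased(a,b,c) and Priceof(b,d) => Owes(a,c,d) to all facts."""
    return {
        (buyer, seller, price)
        for buyer, item, seller in purchased
        for priced_item, price in price_of
        if item == priced_item
    }


owes = derive_owes(purchased, price_of)
print(owes)
```

A real proof would additionally carry the origin of each fact, so that the conclusion Owes(user1, AOL, 30) can be checked by the other party.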
69Instead of a Conclusion IPSI Projects
- Scalable Technology and Applications for the Semantic Web
70Individualized Electronic Newspaper
71Dictionary of Art
72XML Broker Integrating Web Resources via XML
<golfdemo>
  <golfplatz>
    <adresse> ... </adresse>
    <greenfee> ... </greenfee>
    ...
  </golfplatz>
  <wetter> ... </wetter>
  <route> ... </route>
</golfdemo>
XSL
XML Broker
Query
73Virtual Gallery