Title: The Semantic Web
1The Semantic Web
- David Cornforth
- School of ITEE
- University of NSW @ ADFA
- Partially based on previous material by Bob
McKay, Yin Shan and Biao Wang, also of UNSW @
ADFA, and material by Jim Hendler at
http://www.cs.umd.edu/hendler/presentations/
2Outline
- Current Search Technologies
- Why a Semantic Web?
- Metadata and RDF
- Ontologies
3Search Engines
- Definition
- Huge databases of web page files that have been
assembled automatically by machine.
- One of the primary ways that Internet users find
information
- "search engine" often used generically of both
- crawler-based search engines
- Google, AltaVista, ...
- human-powered directories
- Yahoo, ...
4Crawler-Based Search Engines
- "Crawl" or "spider" the web, and create listings
automatically
- People search through what they have found
- Web page changes are found based on regular
re-crawl
- A crawler-based search engine has three parts
- Spider
- Crawler - a robot program that wanders the web
- Index
- Catalog - built up by the spider
- Search software
- Sifts indexed page database to find matches
- Can be spammed by careful tailoring of website
information (e.g. extra text same colour as
background)
- Constant wars between search engines and spammers
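The index and search-software parts can be illustrated with a minimal sketch. It assumes the spider has already fetched two pages; the URLs and page text are hypothetical, and a real engine's index is far more elaborate than this word-to-URL mapping.

```python
from collections import defaultdict

# Hypothetical pages standing in for what a spider would have fetched.
PAGES = {
    "http://example.org/twain": "mark twain was an american humorist",
    "http://example.org/insurance": "call mark twain insurance today",
}


def build_index(pages):
    """Index step: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Search step: return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(sorted(search(index, "mark twain")))
print(sorted(search(index, "humorist")))
```

Both pages match "mark twain", which is why a ranking step is needed on top of plain matching.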
5Human-Powered Directories
- The original version of Yahoo was a human-powered
directory, and depended on humans for its
listings
- A search looks for matches only in the
descriptions submitted
- Changing web pages has no effect on existing
listings
- Almost impossible to spam, because of the human
gateway
- Impractical today: only a minute portion of the
web can be indexed this way
6Meta-Search Engines
- E.g. Chubba, Copernic and MetaCrawler
- No database for Web pages
- Submit users' queries to major search engines
- Collect and display results to user
- maybe re-ranking
- aggregating into one list
- Advantage
- maximized coverage
- Disadvantage
- hard to handle complex queries
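The "re-ranking" and "aggregating into one list" steps might look like the sketch below. The reciprocal-rank scoring rule and the result lists are illustrative assumptions; the slides do not say how any particular meta-search engine merges results.

```python
def merge_results(result_lists):
    """Aggregate several ranked URL lists into one ranked list.

    Each URL earns 1/rank in every list where it appears, and the
    scores are summed. This rule is an assumption chosen for brevity.
    """
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical answers from two underlying search engines.
engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "a.example", "d.example"]

merged = merge_results([engine_a, engine_b])
print(merged)
```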
7Ranking Web Pages
- Ranking rules (or algorithms) are the core of
search engines and the main point of competition
between search engines
- Older-style algorithms base the ranking on the
relevance of the contents of the page
- if a page contains the exact query term in the
title
- if the term appears early in the document
(location)
- if the term is repeated in the document
(frequency)
8Ranking Web Pages
- More recent engines (especially Google) use
variants of the PageRank algorithm
- Ranking is based on the connectivity of the page
- Especially, on the number of pages which refer to
this page
- And on their ranking
- Can be spammed
- Search engines have (usually secret) algorithms
for detecting and punishing spamming
- Usually combined with relevance ranking
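A simplified textbook form of PageRank can be sketched by power iteration. The link graph, damping factor and iteration count are assumptions for illustration; the variants actually used by search engines are not described in these slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to. Every page
    must appear as a key. A page with no out-links spreads its rank
    evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            # A page passes an equal share of its rank to each page it cites.
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: A, B and D all refer to C; C refers to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

C ranks highest because the most pages refer to it, and A outranks B and D because its single incoming link comes from the highly ranked C.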
9Ranking Relevance
- <html>
- <head>
- <title>Mark Twain</title>
- </head>
- <body>
- <h1>Mark Twain</h1>
- Nationality: American<p>
- Genre: Fiction<p>
- Summary: Mark Twain was the pen name of Samuel
Clemens, an American humorist who lived from
1835-1910
- Work:
- <ul>
- <li>Adventures of Huckleberry Finn 1884
- <li>...
- </body>
- </html>
<html> <head> <title>Mark Twain Insurance
Company</title> </head> <body> <h1>Call Mark
Twain Insurance today</h1> The Mark Twain
Insurance has been in business since 1956. During
that time, the folks at Mark Twain have
... </body> </html>
10Ranking and Relevance
- Obviously, most people who search for Mark Twain on
the Web hope to get page 1 in return.
- Unfortunately, most relevance rules would rank
page 2 as more relevant
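The effect can be reproduced with a toy scorer built from the title, location and frequency rules of the older-style algorithms. The weights are arbitrary assumptions and the page texts are abridged from the two example pages, so this is only a sketch of why the insurance page can win.

```python
def relevance(query, title, body):
    """Toy relevance score: title match + early location + frequency."""
    q = query.lower()
    title_l, body_l = title.lower(), body.lower()
    score = 0.0
    if q in title_l:
        score += 2.0                  # exact query term in the title
    first = body_l.find(q)
    if first != -1:
        score += 1.0 / (1 + first)    # earlier first occurrence scores higher
    score += body_l.count(q)          # one point per occurrence in the body
    return score


# Abridged text of the two example pages (title, body).
page1 = ("Mark Twain",
         "Mark Twain Nationality: American Genre: Fiction Summary: Mark Twain "
         "was the pen name of Samuel Clemens, an American humorist")
page2 = ("Mark Twain Insurance Company",
         "Call Mark Twain Insurance today The Mark Twain Insurance has been in "
         "business since 1956. During that time, the folks at Mark Twain have")

score1 = relevance("Mark Twain", *page1)
score2 = relevance("Mark Twain", *page2)
print(score1, score2)
```

Under these assumed weights page 2 scores higher, simply because it repeats the name more often.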
11Other Ranking Rules
- Most commercial search engines have their own
additional ranking rules.
- Infoseek and Hotbot factor meta-content into
their formula
- Hotbot promotes pages that can attract visitors
by watching what results someone selects for a
particular search.
- But they only incrementally improve search results
12Why aren't Search Engines Enough?
- The scale and dynamicity of Web information
- Requires Machine-dependent information searching
- The majority of resources are designed for human
browsing rather than machine browsing
- Imprecise search results
- Current web searches turn up lots of totally
irrelevant pages
- You have to search through the search results
- One approach is Artificial Intelligence
- More sophisticated searching techniques
13The Semantic Web: A Solution?
- Vision
- Data on the Web defined and linked in a way that
it can be used by machines
- Not just for display purposes
- For automation, integration and reuse of data
across various applications
- Establish a machine-understandable Web. More:
- Homogeneous
- Data-like
- Amenable to search
- Approach
- Establish metadata architecture for Web resources
14What is Metadata?
- For traditional database
- Data about data
- For Web
- Data describing Web resources
- Tim Berners-Lee
- Metadata is machine understandable information
about
- web resources
- other things
- The distinction between "data" and "metadata" is
not absolute
- many times the same resource will be interpreted
in both ways simultaneously
15Features of Metadata
- Metadata is data
- Metadata can describe metadata
- Metadata may refer to any resource which has a
URI
- Metadata may be stored in any resource
- no matter to which resource it refers
- Metadata can be regarded as a set of assertions
- each assertion being about a resource
- Assertions which state a named relationship
between two resources are known as links
16Main Principles of Semantic Web
- Everything can be identified by URIs
- Resources and links can have types
- Partial information is tolerated
- There is no need for absolute truth
- Evolution is supported
- Minimalist design
17Main Purpose of Semantic Web
- providing an infrastructure that enables not just
web pages, but
- Databases
- Services
- Programs
- Sensors
- personal devices
- even household appliances
- to both consume and produce data on the web
18Layers of Semantic Web
19Roles of Building Blocks
- Unicode
- Means for use of international character sets
- URI
- Means for identifying the objects in Semantic Web
- XML
- Interoperable syntactical foundation
- Upon which the more important issue of
representing relationships and meaning can be
built
- Resource Description Framework (RDF) and RDF
Schema: system for
- Making statements about objects with URIs
- Defining vocabularies that can be referred to by
URIs
20Roles of Building Blocks (cont)
- Ontology
- Supports the evolution of vocabularies
- To allow definition of relations between
different concepts
- Digital Signature
- Detecting alterations to documents
- Logic Layer
- Permits the writing of rules
- Proof layer
- Execution of the rules
- Evaluation together with the Trust layer
mechanism
- Whether to trust the given proof
- Hence whether to trust the data
21An Example of Metadata
- <html>
- <head>
- <title>Document</title>
- <meta name="keywords" content="web search, RDF,
metadata">
- </head>
- ...
- The most widely known is probably the simple
keywords and descriptions embedded into HTML
META tags
- Collected and indexed by the large Web search
engines like AltaVista.
- Only useful if everyone uses the same standards
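As a rough illustration of how an indexer could collect such keywords, the sketch below uses Python's standard `html.parser` module on the example page. The class name `MetaKeywordParser` is hypothetical, and real search engines' handling of META tags is not specified in the slides.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collect the comma-separated keywords from <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Split on commas and drop surrounding whitespace and empties.
            self.keywords.extend(
                k.strip() for k in content.split(",") if k.strip()
            )


PAGE = """<html>
<head>
<title>Document</title>
<meta name="keywords" content="web search, RDF, metadata">
</head>
</html>"""

parser = MetaKeywordParser()
parser.feed(PAGE)
print(parser.keywords)
```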
22What is RDF?
- Resource Description Framework
- A framework for metadata
- provides interoperability between applications
that exchange machine-understandable information
on the Web
- emphasizes facilities to enable automated
processing of Web resources
- provides the basic building blocks for supporting
the Semantic Web
23How can RDF Help?
- Resource discovery - provide better search engine
capabilities
- Cataloguing
- Describing content, content relationships
- available at a particular Web site, page, or
digital library
- Intelligent software agents
- knowledge sharing
- Exchange
- Content rating (e.g. PICS - Platform for Internet
Content Selection)
- Describing collections of pages
- Representing a single logical document
- Describing intellectual property rights of Web
pages
24Simple RDF Statement (assertion)
Tom is the author of the paper. Pictorially:
[Diagram: Paper (subject) --AuthorOf (predicate)--> Tom (object)]
25RDF Model
- A model is a set of statements
- A statement
- Predicate(subject,object)
- Predicate is a resource
- Subject is a resource
- Object is either a resource or a literal
- Object = Predicate(subject)
- Previous Example
- Tom = AuthorOf(Paper)
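The model can be sketched as a set of (subject, predicate, object) tuples. This follows the reading in which Paper is the subject and Tom the object; the `Statement` type and the `objects` helper are hypothetical names, and plain strings stand in for resource identifiers.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str    # a resource
    predicate: str  # a resource naming the relationship
    obj: str        # a resource or a literal value


# A model is a set of statements; here it holds the single example.
model = {
    Statement("Paper", "AuthorOf", "Tom"),
}


def objects(model, subject, predicate):
    """Return every object o such that predicate(subject, o) is in the model."""
    return {
        s.obj for s in model
        if s.subject == subject and s.predicate == predicate
    }


print(objects(model, "Paper", "AuthorOf"))
```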
26RDF Resources
- A resource can be anything that has identity.
Familiar examples include an electronic document,
an image, a service (e.g., "today's weather
report for Los Angeles"), and a collection of
other resources. Not all resources are network
"retrievable"; e.g., human beings, corporations,
and bound books in a library can also be
considered resources. The resource is the
conceptual mapping to an entity or set of
entities, not necessarily the entity which
corresponds to that mapping at any particular
instance in time. Thus, a resource can remain
constant even when its content---the entities to
which it currently corresponds---changes over
time, provided that the conceptual mapping is not
changed in the process.
- (Quoted from RFC 2396)
27RDF Resources (cont)
- A resource is identified by a Resource Identifier
- URI + optional anchor id (sub-component)
- The resource identified by a URI may be abstract
- not network retrievable
- Even the link can have a URI
- Making a statement about another statement is called reification
28RDF Syntax
- An RDF data model needs a concrete representation
syntax so that metadata can be created and exchanged
- XML is an obvious choice
- The RDF specification uses it
- BUT
- The RDF data model is not tied to a particular
syntax
29Basic RDF XML Syntax
- Multiple statements for the same resource are grouped
- into a Description element
- The Description has an attribute about
- Names the resource
- Description contains other elements
- cause the creation of statements in the model
instance
- Other descriptions or property values can refer
to it
- Using the value of the resource's id attribute
in their own about attribute
30XML Syntax for First Example
- <rdf:RDF>
- <rdf:Description rdf:href="dcorn/Publications/Paper01.pdf">
- <Title>Building your own perpetual motion
machine</Title>
- <Author>David Cornforth</Author>
- </rdf:Description>
- </rdf:RDF>
- (Description has an initial uppercase)
31Example
- A set of statements can be visualized as a graph.
[Graph: dcorn/Publications/Paper01.pdf --Title--> "Building your own perpetual motion machine"; dcorn/Publications/Paper01.pdf --Author--> "David Cornforth"]
32Namespaces in RDF
- BUT
- Is my Title the same as your Title?
- XML namespaces can be used to uniquely identify
elements
- Each namespace has a URI associated with it.
- RDF schemas identify allowable property types
33Namespaces in RDF
- Predicates must also be labelled by URI
- To eliminate ambiguities that arise from using
only word identifiers.
- E.g. vocabulary providers can define different
versions of the predicate hasHomepage.
- The XML-namespace syntax can be used to
abbreviate URIs in statements
- E.g. we can define the substitution of the
namespace-prefix w6 for http://www.w6.org/schema/
- Then use simply w6:hasHomepage.
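The prefix substitution amounts to simple string expansion, as in this sketch. The `w6` prefix comes from the example above; the second prefix `ex` and its URI are hypothetical, added only to show two providers defining their own hasHomepage.

```python
# Prefix table: w6 is from the slide, ex is a hypothetical second vocabulary.
PREFIXES = {
    "w6": "http://www.w6.org/schema/",
    "ex": "http://example.org/vocab/",
}


def expand(qname, prefixes=PREFIXES):
    """Expand an abbreviated name such as w6:hasHomepage into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local


print(expand("w6:hasHomepage"))
print(expand("ex:hasHomepage"))
```

The two predicates share the word hasHomepage but expand to different URIs, so they are never confused.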
34XML for Decker Example
- <?xml version="1.0"?>
- <rdf:RDF
- xmlns="http://www.w6.org/schema/"
- xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
- xmlns:so="http://www.semantic.org/elements/"
>
- <rdf:Description about="http://www.corn.org/research">
- <hasHomepage>
- <rdf:Description about="http://www.corn.org/homepage">
- <so:Author>Tom Decker</so:Author>
- </rdf:Description>
- </hasHomepage>
- </rdf:Description>
- </rdf:RDF>
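To see which statements this document asserts, it can be walked with Python's standard `xml.etree.ElementTree`. This is only a sketch for this one nesting pattern, not a general RDF/XML parser. It keeps the unqualified `about` attribute used in the example (current RDF/XML writes `rdf:about`), and the `statements` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DESCRIPTION = "{%s}Description" % RDF_NS

DOC = """<?xml version="1.0"?>
<rdf:RDF
  xmlns="http://www.w6.org/schema/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:so="http://www.semantic.org/elements/">
  <rdf:Description about="http://www.corn.org/research">
    <hasHomepage>
      <rdf:Description about="http://www.corn.org/homepage">
        <so:Author>Tom Decker</so:Author>
      </rdf:Description>
    </hasHomepage>
  </rdf:Description>
</rdf:RDF>"""


def statements(description):
    """Yield (subject, predicate, object) triples for one Description."""
    subject = description.get("about")
    for prop in description:
        nested = prop.find(DESCRIPTION)
        if nested is not None:
            # The property value is another described resource.
            yield (subject, prop.tag, nested.get("about"))
            yield from statements(nested)
        else:
            # The property value is a literal string.
            yield (subject, prop.tag, (prop.text or "").strip())


root = ET.fromstring(DOC)
triples = [t for d in root.findall(DESCRIPTION) for t in statements(d)]
for triple in triples:
    print(triple)
```

ElementTree reports each predicate as `{namespace-URI}localname`, which shows the namespace prefixes being resolved to full URIs.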
35Graphical view is
[Graph: http://www.corn.org/research --hasHomepage--> http://www.corn.org/homepage --Author--> "Tom Decker"]
36Alternative Form
- <rdf:RDF>
- <rdf:Description about="http://www.corn.org/research/group1">
- <hasHomepage rdf:resource="http://www.corn.org/homepage"/>
- </rdf:Description>
- <rdf:Description about="http://www.corn.org/homepage">
- <so:Author>Ora Lassila</so:Author>
- </rdf:Description>
- </rdf:RDF>
37Compared to XLink
- RDF is able to attach URIs to the link properties
themselves. Example of XLink:
- <supervisor xlink:href="ITEE/staff/fstein.xml"
...>
- <supervisorname xml:lang="en">
- <name>
- <title>Dr.</title>
- <given>Frankie</given>
- <family>Stein</family>
- </name>
- </supervisorname>
- </supervisor>
38RDF Application: Dublin Core Metadata
- A set of elements for describing documents
- Intended use
- internet resource discovery tools
- Readily implemented in RDF
- Although the original implementation was in HTML
meta elements
39Dublin Core Properties
- Title
- A name for the resource
- Creator
- Entity primarily responsible for creating the
content
- Subject
- The topic of the content
- Description
- A summary of the content
- Publisher
- Entity responsible for making the resource
available
40Dublin Core Properties
- Contributor
- An entity responsible for making contributions to
the content
- Date
- A date associated with an event in the life cycle
- Type
- The nature or genre of the content
- Format
- The physical or digital manifestation
- Identifier
- An unambiguous reference to the resource in a
context
- E.g. a URI
41Dublin Core Properties
- Source
- A reference to a resource from which the current
resource is derived
- Language
- A language of the content
- Relation
- A reference to a related resource
- Coverage
- Extent or scope of the content
- Rights
- Information about rights held in/over the resource
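Since Dublin Core was first expressed in HTML meta elements, a record can be rendered that way with a few lines of code. This sketch assumes the common `DC.<element>` naming convention for the `name` attribute. The title and creator reuse the earlier paper example, the language value is hypothetical, and `dublin_core_meta` is a hypothetical helper.

```python
from html import escape

# Title and creator reuse the earlier paper example; "en" is an assumption.
record = {
    "title": "Building your own perpetual motion machine",
    "creator": "David Cornforth",
    "language": "en",
}


def dublin_core_meta(record):
    """Render each Dublin Core property as an HTML meta element."""
    return [
        '<meta name="DC.%s" content="%s">' % (escape(name), escape(value))
        for name, value in record.items()
    ]


meta_lines = dublin_core_meta(record)
print("\n".join(meta_lines))
```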
42What is an Ontology?
- Philosophy
- A systematic account of Existence
- AI
- A Knowledge-Based System
- Definition
- An explicit formal specification of how to
represent the objects, concepts and other
entities that are assumed to exist in some area
of interest and the relationships that hold among
them.
- Purpose
- Enabling knowledge sharing and reuse
43Why Web Ontology?
- To solve the problem: is my Title the same as
your Title?
- Explicitly represent
- the meaning of terms in vocabularies
- the relationships between those terms
- Knowledge management in large/distributed
organisations
- Explicitly represent semantics of semi-structured
information
- Support for Information
- Acquisition
- Maintenance
- Access
44Ontology Layer
- More meta information, such as
- Transitive property
- Unique, Unambiguous, Cardinality, etc
- Ontology definition languages
- DL
- OIL
- SHOE
- OWL
- etc.
- Huge extra usage for extra functionality
- Not Turing complete or tractable
- Wide interoperability: interconversion required
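To make "transitive property" concrete: once an ontology declares a property transitive, a reasoner may derive statements that were never written down. The sketch below computes those derived pairs for a hypothetical "is a kind of" relation; the class names are assumptions, and real ontology reasoners are far more general.

```python
def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until stable."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical stated facts for a transitive "is a kind of" property.
stated = {("Poodle", "Dog"), ("Dog", "Mammal"), ("Mammal", "Animal")}
inferred = transitive_closure(stated) - stated
print(sorted(inferred))
```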
45OWL
- Web Ontology Language
- Intended to extend RDF/Schema
- to permit more powerful reasoning about resources
- Three sub-languages
- OWL Lite
- OWL DL
- OWL Full
- Only OWL Full is actually a superset of RDF/Schema
46OWL Full
- Permits maximum expressiveness
- For example, handles classes of classes
- Reasoning in OWL Full is undecidable
- No complete reasoning system can be built
47OWL DL
- Computational completeness
- All conclusions can be computed
- Decidability
- Computations will finish in finite time
- Some limits on expressivity
- Based on Description Logics
- Restrictions from OWL Full
- For example, a class cannot be an instance of
another class
48OWL Lite
- Lightweight reasoning
- Hence easy to support
- Efficient
- Supports
- Classification hierarchy
- Simple constraints
- E.g. cardinality constraints
- But only 0 or 1
49Search in the Future Semantic Web
- Example: try to find a medical specialist by
invoking a software agent
- Agent needs to take into account
- Potential providers
- Insurance coverages
- Location maps
- Your schedule
- How can Google do it for you?
50What are Agents?
- Common sense definition
- Agent does something on behalf of another
- Real Estate agent
- Flight agent
- Computer system
- variety of definitions
- "Autonomous agents are computational systems that
inhabit some complex dynamic environment, sense
and act autonomously in this environment, and by
doing so realize a set of goals or tasks for
which they are designed."
- Maes 1995
- "Intelligent agents are software entities that
carry out some set of operations on behalf of a
user or another program with some degree of
independence or autonomy, and in so doing, employ
some knowledge or representation of the user's
goals or desires."
- IBM
51Characteristics of Agents
- The following basic characteristics distinguish agents
from normal programs (most agents don't have all
of them)
- Autonomous - requiring limited outside direction
- Reactive - sensing, acting
- Goal-oriented - pro-active, purposeful
- Communicative - social
- Learning - adaptive
- Persistent - continuous
- Personality
52Categories of Internet Agents
- Web Search Agents
- MetaCrawler
- Information-filtering Agents
- NewsPage Direct
- Off-line Delivery Agents
- PointCast Network
- Notification Agents
- BotSpot
- Negotiation Agents
- Kasbah
- AuctionBot
- eMediator
- Many others: Job Agents, Book Agents, ...
53Questions?