Title: Finding and Ranking Knowledge on the Semantic Web
1 Finding and Ranking Knowledge on the Semantic Web
- Li Ding, Rong Pan, Tim Finin, Anupam Joshi, Yun Peng and Pranam Kolari
- University of Maryland, Baltimore County
http://creativecommons.org/licenses/by-nc-sa/2.0
This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.
2 This talk
- Motivation
- Swoogle overview
- Bots navigate the Semantic Web
- Ranking Semantic Web content
- Use cases and applications
- Conclusions
3 Google has made us smarter
4 But what about our agents?
- A Google for knowledge on the Semantic Web is needed by people and software agents
5 This talk
- Motivation
- Swoogle overview
- Bots navigate the Semantic Web
- Ranking Semantic Web content
- Use cases and applications
- Conclusions
6 (No Transcript)
7 Swoogle Architecture
[Figure: Swoogle architecture. Components: SWD discovery (The Web, Candidate URLs, Web Crawler); metadata creation (SWD Reader, SWD Cache, SWD Metadata); data analysis (SWD analyzer, IR analyzer); interface (Web Server, Web Service, Agent Service).]
- Swoogle 2 (4/05): 340K SWDs (Semantic Web documents), 48M triples, 5K SWOs (Semantic Web ontologies), 97K classes, 55K properties, 7M individuals
- Swoogle 3 (11/05): 700K SWDs, 135M triples, 7.7K SWOs
8 Find Time Ontology
Demo1
We can use a set of keywords to search for ontologies. For example, time, before and after are basic concepts for a time ontology.
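The keyword search in this demo can be pictured as a toy "bag of URIref" matcher, the kind of IR search listed on the Summary slide. The sketch below is a minimal illustration, assuming each ontology is indexed by the URIrefs of the terms it defines and that local names are split at camelCase boundaries (so `intAfter` yields `int` and `after`). The tokenizer, the scoring rule and the example.org URIs are hypothetical; this is not Swoogle's indexer.

```python
import re


def tokenize_uriref(uri: str) -> set[str]:
    """Split the local name of a URIref into lowercase keywords (hypothetical tokenizer)."""
    local = re.split(r"[#/]", uri.rstrip("/#"))[-1]
    # camelCase and acronym boundaries: "intAfter" -> "int", "After"
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", local)
    return {p.lower() for p in parts}


def keyword_score(term_uris: list[str], keywords: list[str]) -> int:
    """Number of query keywords found among the tokens of a document's terms."""
    tokens: set[str] = set()
    for uri in term_uris:
        tokens |= tokenize_uriref(uri)
    return sum(1 for k in keywords if k.lower() in tokens)


def search_ontologies(index: dict[str, list[str]], keywords: list[str]) -> list[str]:
    """Rank ontology URLs by keyword matches, best first; drop non-matches."""
    scored = [(keyword_score(terms, keywords), url) for url, terms in index.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for score, url in scored if score > 0]


# Hypothetical index: ontology URL -> URIrefs of the terms it defines.
INDEX = {
    "http://example.org/time": [
        "http://example.org/time#TimeZone",
        "http://example.org/time#before",
        "http://example.org/time#intAfter",
    ],
    "http://example.org/people": [
        "http://example.org/people#Person",
        "http://example.org/people#name",
    ],
}

print(search_ontologies(INDEX, ["time", "before", "after"]))
# → ['http://example.org/time']
```

A real engine would weight matches (for example by term popularity) rather than simply counting them; plain counting keeps the idea visible.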
9 Digest Time Ontology (document view)
Demo2(a)
10 Digest Time Ontology (term view)
Demo2(b)
[Screenshot: terms such as TimeZone, before and intAfter]
11 Find Term Person
Demo3
Not capitalized! URIrefs are case-sensitive!
12 Digest Term Person
Demo4
167 different properties
562 different properties
13 Demo5(a)
Swoogle Today
14 Demo5(b)
Swoogle Statistics
[Screenshot labels: FOAF, Trustix, W3C, Stanford]
15 Swoogle's Triple Store lets you shop
And check out your triples into any of several reasoners
16 Summary
Swoogle (Mar, 2004)
- Automated SWD discovery
- SWD metadata creation and search
- Ontology rank (rational surfer model)
- Swoogle watch
- Web Interface
Swoogle2 (Sep, 2004)
- Ontology dictionary
- Swoogle statistics
- Web service interface (WSDL)
- Bag of URIref IR search
- Triple shopping cart
Swoogle3 (July 2005)
- Better (re-)crawling strategies
- Better navigation models
- Index instance data
- More metadata (ontology mapping and OWL-S services)
- Better web service interfaces
- IR component for string literals
17 This talk
- Motivation
- Swoogle overview
- Bots navigate the Semantic Web
- Ranking Semantic Web content
- Use cases and applications
- Conclusions
18 The Semantic Web Onion
- Universal RDF graph: the Semantic Web (about 10M documents)
- RDF document: physically hosts knowledge (about 100 triples per SWD on average)
- Class-instance: triples modifying the same subject
- Molecule: finest lossless set of triples
- Triple: atomic knowledge block
Swoogle maintains metadata about objects in different layers of the Semantic Web Onion.
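As a rough illustration of the two inner layers, the sketch below groups triples by subject (the class-instance layer) and splits a graph into molecules with a simple rule we assume here: triples connected through shared blank nodes must stay together, because separating them would lose the link between them, while a triple with no blank node stands alone. This yields a lossless decomposition; the finest one can be smaller when more is known (for example about functional properties), a refinement omitted here. Triples are plain string tuples, blank nodes use a `_:` prefix, and the example.org URI is hypothetical.

```python
from collections import defaultdict


def is_blank(node: str) -> bool:
    """Blank nodes are written with the N-Triples style '_:' prefix here."""
    return node.startswith("_:")


def class_instance_groups(triples):
    """Class-instance layer: triples grouped by the subject they modify."""
    groups = defaultdict(list)
    for triple in triples:
        groups[triple[0]].append(triple)
    return dict(groups)


def molecules(triples):
    """Simple molecule decomposition: triples sharing a blank node are kept
    together (union-find); a ground triple is a molecule on its own."""
    parent = list(range(len(triples)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_use = {}  # blank node -> index of the first triple mentioning it
    for i, (s, _, o) in enumerate(triples):
        for node in (s, o):
            if is_blank(node):
                if node in first_use:
                    parent[find(i)] = find(first_use[node])
                else:
                    first_use[node] = i
    buckets = defaultdict(list)
    for i, triple in enumerate(triples):
        buckets[find(i)].append(triple)
    return list(buckets.values())


TRIPLES = [
    ("http://example.org/me", "rdf:type", "foaf:Person"),
    ("http://example.org/me", "foaf:knows", "_:b1"),
    ("_:b1", "rdf:type", "foaf:Person"),
    ("_:b1", "foaf:mbox", "mailto:finin_at_umbc.edu"),
]

for molecule in molecules(TRIPLES):
    print(len(molecule), molecule[0])
```

The first triple forms its own molecule; the other three share `_:b1` and form a single molecule, even though they modify two different subjects.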
19 Semantic Web Navigation Model
Navigating the HTML web is simple: there's just one kind of link. The SW has more kinds of links and hence more navigation paths.
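20 Semantic Web Navigation Model
[Figure: navigation paths, numbered 1 to 7, between the term search side (RDF graph, Resource, SWT, literal) and the document search side (Web, SWD, SWO). Links shown include sameNamespace, sameLocalname, extends and class-property bond between terms; defines, uses, populates, isDefinedBy, isUsedBy, isPopulatedBy and officialOnto between documents and terms; rdfs:seeAlso, rdfs:isDefinedBy and owl:imports between documents; and rdfs:subClassOf.]
Relations in paths 1 and 3, and parts of path 4, require a global view to discover.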
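The document-to-term relations (defines, uses, populates) can be read off a single document's triples. The sketch below is a minimal version under heuristics we made up for illustration: a term is *defined* if it is typed as a class or property, a class is *populated* if something is typed with it, a property is *populated* if it appears as a predicate, and a term is merely *used* if a schema statement references it without defining it. RDF, RDFS and OWL vocabulary is skipped. These rules and the constant names are hypothetical, not Swoogle's exact definitions.

```python
# Hypothetical heuristics, for illustration only.
META_TYPES = {
    "rdfs:Class", "owl:Class", "rdf:Property", "owl:ObjectProperty",
    "owl:DatatypeProperty", "owl:InverseFunctionalProperty",
}
SCHEMA_PREDICATES = {"rdfs:subClassOf", "rdfs:subPropertyOf", "rdfs:domain", "rdfs:range"}
LANGUAGE_PREFIXES = ("rdf:", "rdfs:", "owl:")  # RDF/RDFS/OWL vocabulary is skipped


def term_relations(triples):
    """Classify the terms of one SWD as defined, populated or (merely) used."""
    defines, populates, uses = set(), set(), set()
    for s, p, o in triples:
        if p == "rdf:type":
            if o in META_TYPES:
                defines.add(s)      # s is declared as a class or property here
            else:
                populates.add(o)    # o is a class that gets an instance here
        elif p in SCHEMA_PREDICATES:
            uses.update((s, o))     # terms referenced by schema statements
        else:
            populates.add(p)        # p is a property instantiated in a triple
    uses -= defines                 # defined terms are not counted as merely used
    return {
        name: {t for t in terms if not t.startswith(LANGUAGE_PREFIXES)}
        for name, terms in (("defines", defines), ("populates", populates), ("uses", uses))
    }


FOAF_ONTOLOGY = [
    ("foaf:Person", "rdf:type", "owl:Class"),
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    ("foaf:mbox", "rdf:type", "owl:InverseFunctionalProperty"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
]
FOAF_INSTANCE = [
    ("_:tim", "rdf:type", "foaf:Person"),
    ("_:tim", "foaf:mbox", "mailto:finin_at_umbc.edu"),
]

print(term_relations(FOAF_ONTOLOGY))
print(term_relations(FOAF_INSTANCE))
```

Inverse relations such as isUsedBy and isPopulatedBy only appear once these per-document results are aggregated over every crawled SWD, which is one reason some relations need a global view.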
21 An Example
[Figure: http://xmlns.com/foaf/0.1/index.rdf, linked by owl:imports to http://www.w3.org/2002/07/owl, defines foaf:Person, foaf:Agent and foaf:mbox using rdf:type, rdfs:subClassOf, rdfs:domain and rdfs:range together with owl:Class, owl:InverseFunctionalProperty and owl:Thing. Two instance documents, http://www.cs.umbc.edu/finin/foaf.rdf and http://www.cs.umbc.edu/dingli1/foaf.rdf, each contain a foaf:Person; the figure also shows foaf:mbox mailto:finin_at_umbc.edu and an rdfs:seeAlso link to http://www.cs.umbc.edu/finin/foaf.rdf.]
We navigate the Semantic Web via links in the
physical layer of RDF documents and also via
links in the logical layer defined by the
semantics of RDF and OWL.
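The two kinds of links in this example can be told apart in a few lines. Physical links such as owl:imports or rdfs:seeAlso name another document. A logical link follows from OWL semantics: because foaf:mbox is an owl:InverseFunctionalProperty, two resources with the same mailbox denote the same person. The sketch below assumes a simplified reading of the figure: dingli1's document describes a person with Tim's mailbox and an rdfs:seeAlso to Tim's file, and Tim's file describes a person with the same mailbox. The blank node names are made up.

```python
from collections import defaultdict

# Links in the physical layer: they point from one document to another.
PHYSICAL_LINKS = {"owl:imports", "rdfs:seeAlso", "rdfs:isDefinedBy"}


def document_links(triples):
    """Documents referenced by physical-layer links."""
    return {o for _, p, o in triples if p in PHYSICAL_LINKS}


def coreferent_groups(triples, inverse_functional):
    """Logical layer: subjects sharing a value of an owl:InverseFunctionalProperty
    denote the same individual under OWL semantics."""
    holders = defaultdict(set)
    for s, p, o in triples:
        if p in inverse_functional:
            holders[(p, o)].add(s)
    return [subjects for subjects in holders.values() if len(subjects) > 1]


# Simplified, assumed reading of the figure; blank node names are made
# document-specific so that the two files can be merged safely.
DINGLI1_DOC = [
    ("_:dingli1_p", "rdf:type", "foaf:Person"),
    ("_:dingli1_p", "foaf:mbox", "mailto:finin_at_umbc.edu"),
    ("_:dingli1_p", "rdfs:seeAlso", "http://www.cs.umbc.edu/finin/foaf.rdf"),
]
FININ_DOC = [
    ("_:finin_p", "rdf:type", "foaf:Person"),
    ("_:finin_p", "foaf:mbox", "mailto:finin_at_umbc.edu"),
]

print(document_links(DINGLI1_DOC))
print(coreferent_groups(DINGLI1_DOC + FININ_DOC, {"foaf:mbox"}))
```

Each group here comes from one shared property value; merging individuals across several inverse functional properties would also need the transitive union of these groups.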
22 This talk
- Motivation
- Swoogle overview
- Bots navigate the Semantic Web
- Ranking Semantic Web content
- Use cases and applications
- Conclusions
23 Rank has its privilege
- Google introduced a new approach to ranking query results using a simple popularity metric.
- It was a big improvement!
- Swoogle ranks its query results too
- When searching for an ontology, class or property, wouldn't one want to see the most used ones first?
- Ranking SW content requires different algorithms for different kinds of SW objects
- For SWDs, SWTs, individuals, assertions, molecules, etc.
24 Google's PageRank
- A page's rank is a function of how many links point to it and the rank of the pages hosting those links.
- The random surfer model provides the intuition:
- (1) Jump to a random page
- (2) Select and follow a random link on the page and repeat until bored
- (3) If bored, go to (1)
- Rank pages by the relative frequency with which they are visited.
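The random surfer can be simulated exactly by power iteration: at each step, every page passes its current visit probability along its outgoing links, and a fixed fraction is spread uniformly to model the bored surfer's random jump. The sketch below is the textbook formulation, assuming the commonly used damping factor of 0.85 (not stated on the slide) and treating pages with no outgoing links as jumping anywhere.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Random-surfer PageRank by power iteration.

    links maps each page to the pages it links to. With probability
    1 - damping the surfer is bored and jumps to a random page; otherwise
    it follows a random outgoing link.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}  # random jumps
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)  # follow a random link
                for t in targets:
                    new_rank[t] += share
            else:
                for t in pages:  # dead end: the surfer jumps anywhere
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

C ranks highest because it receives links from both A and B; the scores sum to 1 and are the long-run visit frequencies of the surfer.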
25 Ranking Semantic Web Documents
- Target: a pure SW dataset
- Nodes: a collection of online SWDs (330K SWDs, 1.5% are labeled as ontologies)
- Links: in addition to hyperlinks, term-level relations are generalized into TM (uses terms from), EX (extends) and IM (imports)
- Rational surfer model (extension of weighted PageRank)
- Semantic content (term-level relations) is encoded into links
- The rank of a node is iteratively spread via links
- The weight (capacity) of a link varies according to its semantics
- Weight is propagated to imported ontologies
- Evaluation
- Method: compare OntoRank with PageRank for promoting ontologies, using the same pure SW dataset
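The slide describes the rational surfer only at a high level. A minimal sketch of its weighted PageRank part follows, assuming typed links (TM, EX and IM, plus a hypothetical HREF type for ordinary hyperlinks) and made-up weights: the surfer follows an outgoing link with probability proportional to that link's weight. It is not Swoogle's OntoRank implementation and omits the step that propagates weight to imported ontologies.

```python
from collections import defaultdict

# Hypothetical link weights: the slide says only that weights vary with link
# semantics; these numbers are made up for illustration.
LINK_WEIGHTS = {"IM": 3.0, "EX": 2.0, "TM": 1.0, "HREF": 0.5}


def weighted_pagerank(edges, damping=0.85, iterations=100):
    """Rational-surfer style PageRank over typed links.

    edges is a list of (source, target, link_type). The surfer follows an
    outgoing link with probability proportional to its weight.
    """
    nodes = {s for s, _, _ in edges} | {t for _, t, _ in edges}
    out = defaultdict(list)
    for s, t, kind in edges:
        out[s].append((t, LINK_WEIGHTS[kind]))
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out.get(v)
            if targets:
                total = sum(w for _, w in targets)
                for t, w in targets:
                    new_rank[t] += damping * rank[v] * w / total
            else:  # no outgoing links: jump to a random document
                for t in nodes:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank


# A hypothetical instance document that imports one ontology and borrows
# a term from another: the imported ontology should come out ahead.
EDGES = [("doc", "ontoA", "IM"), ("doc", "ontoB", "TM")]
print(weighted_pagerank(EDGES))
```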
26 An Example
[Figure: weighted PageRank (wPR) and OntoRank of four SWDs connected by TM and EX links:
- http://www.w3.org/2000/01/rdf-schema: wPR 300, OntoRank 403
- http://xmlns.com/wordnet/1.6/: wPR 3, OntoRank 103
- http://xmlns.com/foaf/1.0/: wPR 100, OntoRank 100
- http://www.cs.umbc.edu/finin/foaf.rdf: wPR 0.2, OntoRank 0.2]
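The numbers in this example are consistent with a simple reading: an ontology's OntoRank is its wPR plus the wPR of every ontology that links to it directly or indirectly. That gives 403 = 300 + 3 + 100 for rdf-schema and 103 = 3 + 100 for wordnet, while foaf stays at 100 because only an instance document (finin/foaf.rdf) points to it. The sketch below checks that arithmetic. The propagation rule, the link directions, which documents count as ontologies, and the short names are our assumptions, inferred from the figure rather than quoted from it.

```python
from collections import defaultdict


def ontorank(wpr, links, ontologies):
    """OntoRank(a) = wPR(a) + sum of wPR(x) over every ontology x that reaches a
    through one or more TM/EX/IM links (each x counted once).

    This rule is our reading of the example figure, not a quoted definition.
    """
    out = defaultdict(set)
    for source, target in links:
        out[source].add(target)

    def reachable(start):
        seen, stack = set(), [start]
        while stack:
            for nxt in out[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen - {start}

    rank = dict(wpr)
    for x in ontologies:
        for a in reachable(x):
            rank[a] += wpr[x]
    return rank


# Short names for the figure's documents:
#   rdf-schema -> http://www.w3.org/2000/01/rdf-schema
#   wordnet    -> http://xmlns.com/wordnet/1.6/
#   foaf       -> http://xmlns.com/foaf/1.0/
#   finin-foaf -> http://www.cs.umbc.edu/finin/foaf.rdf
WPR = {"rdf-schema": 300, "wordnet": 3, "foaf": 100, "finin-foaf": 0.2}
# Assumed link directions (source uses or extends target).
LINKS = [("wordnet", "rdf-schema"), ("foaf", "rdf-schema"),
         ("foaf", "wordnet"), ("finin-foaf", "foaf")]
ONTOLOGIES = {"rdf-schema", "wordnet", "foaf"}  # finin-foaf is instance data

print(ontorank(WPR, LINKS, ONTOLOGIES))
# → {'rdf-schema': 403, 'wordnet': 103, 'foaf': 100, 'finin-foaf': 0.2}
```

Counting each upstream ontology once (a set of reachable nodes, not a sum over paths) is what keeps rdf-schema at 403 even though foaf reaches it both directly and through wordnet.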
27 Ontology Dictionary
- Motivation
- One ontology does not always provide all the needed vocabulary
- Many scenarios require assembling terms from multiple ontologies
- DIY ontology engineering
- (1) Search for an appropriate class C
- (2) Search for popular properties used to modify instances of C
- (3) Go back to step 1 if more classes are needed
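Step (2) is essentially a counting query: among resources typed with C, which properties occur most often? Swoogle answers it from its crawled corpus; the sketch below just counts over an in-memory list of string triples. The function name and the instance data are hypothetical, and this is not Swoogle's API.

```python
from collections import Counter


def popular_properties(triples, cls, top=5):
    """Step (2): properties most often used on instances of cls."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    counts = Counter(p for s, p, _ in triples if s in instances and p != "rdf:type")
    return counts.most_common(top)


# Hypothetical instance data
TRIPLES = [
    ("_:a", "rdf:type", "foaf:Person"),
    ("_:a", "foaf:name", "Tim Finin"),
    ("_:a", "foaf:mbox", "mailto:finin_at_umbc.edu"),
    ("_:b", "rdf:type", "foaf:Person"),
    ("_:b", "foaf:name", "Li Ding"),
    ("_:c", "rdf:type", "foaf:Document"),
    ("_:c", "dc:title", "Tim's FOAF File"),
]

print(popular_properties(TRIPLES, "foaf:Person"))
# → [('foaf:name', 2), ('foaf:mbox', 1)]
```

Repeating the call for each class picked in step (1) assembles a vocabulary from whichever ontologies those properties come from.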
28 Ranking Semantic Web Terms
- Pr(Term|Doc) can be measured by the normalized product of two factors:
- Popularity: how many SWDs use the term
- Frequency: how many times the term is used in the SWD
- SWDs are accessed non-uniformly, as modeled by OntoRank
- TermRank estimates a term's importance as
- $\mathrm{TermRank}(\mathit{Term}) = \sum_{\mathit{Doc}} \Pr(\mathit{Term} \mid \mathit{Doc}) \cdot \mathrm{OntoRank}(\mathit{Doc})$
- Evaluation
- Compare TermRank with term popularity for the top 10 highest-rated terms and compose an analytical evaluation.
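One way to make this concrete is to read "normalized" as normalization over the terms of each SWD: Pr(t|d) = pop(t)·freq(t,d) / Σ_t' pop(t')·freq(t',d). That normalization is our assumption. With it, TermRank distributes each document's OntoRank among the terms the document uses, so total TermRank equals total OntoRank. The document names, counts and OntoRank values below are hypothetical.

```python
from collections import Counter, defaultdict


def term_probabilities(term_freq):
    """Pr(t|d) = popularity(t) * frequency(t, d), normalized within each document.

    term_freq maps each SWD to {term: number of times the term is used in it}.
    """
    popularity = Counter()
    for counts in term_freq.values():
        popularity.update(counts.keys())  # +1 per SWD that uses the term
    probs = {}
    for doc, counts in term_freq.items():
        weights = {t: popularity[t] * f for t, f in counts.items()}
        total = sum(weights.values())
        probs[doc] = {t: w / total for t, w in weights.items()}
    return probs


def term_rank(term_freq, onto_rank):
    """TermRank(t) = sum over documents d of Pr(t|d) * OntoRank(d)."""
    rank = defaultdict(float)
    for doc, dist in term_probabilities(term_freq).items():
        for term, pr in dist.items():
            rank[term] += pr * onto_rank[doc]
    return dict(rank)


TERM_FREQ = {"d1": {"foaf:Person": 3, "foaf:name": 1}, "d2": {"foaf:Person": 1}}
ONTO_RANK = {"d1": 2.0, "d2": 1.0}
print(term_rank(TERM_FREQ, ONTO_RANK))
```

Here foaf:Person is used by both documents (popularity 2) and three times in d1, so it takes 6/7 of d1's OntoRank and all of d2's: 12/7 + 1 = 19/7.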
29 Class-Property Bonds
[Figure: class-property bonds of foaf:Person across three documents (SWD1, SWD2, SWD3).
- Class definition: foaf:Person rdf:type owl:Class; rdfs:subClassOf foaf:Agent; rdfs:label "Person"; rdfs:comment "a human being"
- Class-property bonds introduced by ontologies (via rdfs:domain): foaf:mbox, foaf:name
- Class-property bonds introduced by instances: foaf:name, dc:title (an instance of foaf:Person with foaf:name "Tim Finin" and dc:title "Tim's FOAF File")]
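The two sources of bonds can be separated mechanically. Ontology-introduced bonds come from rdfs:domain declarations; instance-introduced bonds come from properties actually used on resources typed with the class. The sketch below is a minimal version over string triples, using data simplified from the figure; the function name is hypothetical.

```python
from collections import defaultdict


def class_property_bonds(triples):
    """Return (ontology_bonds, instance_bonds) as sets of (class, property) pairs.

    Ontology bonds come from rdfs:domain declarations; instance bonds come
    from properties actually used on resources typed with the class.
    """
    ontology_bonds = {(o, s) for s, p, o in triples if p == "rdfs:domain"}
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
    instance_bonds = {
        (cls, p)
        for s, p, _ in triples
        if p != "rdf:type"
        for cls in types.get(s, ())
    }
    return ontology_bonds, instance_bonds


# Simplified from the figure
TRIPLES = [
    # ontology: rdfs:domain declarations
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
    ("foaf:name", "rdfs:domain", "foaf:Person"),
    # instance data
    ("_:tim", "rdf:type", "foaf:Person"),
    ("_:tim", "foaf:name", "Tim Finin"),
    ("_:tim", "dc:title", "Tim's FOAF File"),
]

onto, inst = class_property_bonds(TRIPLES)
print(sorted(inst - onto))  # bonds seen only in instance data
```

The difference `inst - onto` contains only (foaf:Person, dc:title): a bond no ontology declares but the data uses, which is the kind of usage (or misuse) worth surfacing to developers.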
30 This talk
- Motivation
- Swoogle overview
- Bots navigate the Semantic Web
- Ranking Semantic Web content
- Use cases and applications
- Conclusions
31 Supporting Semantic Web Developers
- Finding SW content
- Ontologies, classes, properties, molecules, triples, partial ontology mappings, authoritative copies
- Ad hoc data collection
- Exploring how the SW is being used, e.g.
- Computing basic statistics
- Ranking properties used with foaf:Person
- And misused
- Finding common typos
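One common typo class is the capitalization slip from the Find Term Person demo: since URIrefs are case-sensitive, foaf:person is a different (and undefined) term from foaf:Person. The sketch below flags observed terms that are not known but match a known term when case is ignored. The function name and data are hypothetical; catching transpositions such as foaf:nmae would need an edit-distance check, which is not attempted here.

```python
from collections import defaultdict


def case_variant_typos(observed, known):
    """Map each observed term that is not a known term but matches one when
    case is ignored to the known term(s) it probably meant."""
    by_lower = defaultdict(set)
    for term in known:
        by_lower[term.lower()].add(term)
    return {
        term: by_lower[term.lower()]
        for term in observed
        if term not in known and term.lower() in by_lower
    }


OBSERVED = ["foaf:Person", "foaf:person", "foaf:name", "foaf:nmae"]
KNOWN = {"foaf:Person", "foaf:name"}
print(case_variant_typos(OBSERVED, KNOWN))
# → {'foaf:person': {'foaf:Person'}}
```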
32 Applications and use cases
- Supporting Semantic Web developers, e.g.,
- Ontology designers
- Vocabulary discovery
- Who's using my ontologies or data?
- Etc.
- Searching specialized collections, e.g.,
- Proofs in Inference Web
- Text Meaning Representations of news stories in SemNews
- Supporting SW tools, e.g.,
- Discovering mappings between ontologies
33 (No Transcript)
34 (No Transcript)
35 (No Transcript)
36 This talk
- Motivation
- Swoogle overview
- Bots navigate the Semantic Web
- Ranking Semantic Web content
- Use cases and applications
- Conclusions
37 Will it Scale? How?
- Here's a rough estimate of the data in RDF documents on the Semantic Web, based on Swoogle's crawling
We think Swoogle's centralized approach can be made to work for the next few years, if not longer.
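For a sense of scale, the figures given with the Semantic Web Onion (about 10M documents, about 100 triples per SWD on average) multiply out to roughly a billion triples. This is only an order-of-magnitude check built from those two numbers, not the estimate the slide refers to:

```latex
\underbrace{10^{7}}_{\text{SWDs}} \times \underbrace{10^{2}}_{\text{triples per SWD}} = 10^{9}\ \text{triples}
```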
38 How much reasoning?
- SwoogleN (N<3) does limited reasoning
- It's expensive
- It's not clear how much should be done
- More reasoning would benefit many use cases
- e.g., type hierarchy
- Recognizing specialized metadata
- E.g., that ontology A maps some terms from B to C
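Type-hierarchy reasoning is among the cheaper kinds. If foaf:Person is an rdfs:subClassOf foaf:Agent, every foaf:Person is also a foaf:Agent (RDFS entailment rule rdfs9), and rdfs:subClassOf is transitive (rule rdfs11). The sketch below applies just those two rules to string triples. It is a minimal illustration, not a description of what Swoogle does or does not infer.

```python
from collections import defaultdict


def superclasses(subclass_of, cls):
    """All classes reachable from cls via rdfs:subClassOf (transitive, cycle-safe)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_types(triples):
    """New rdf:type triples implied by the class hierarchy (rules rdfs9 and rdfs11)."""
    subclass_of = defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            subclass_of[s].add(o)
    inferred = {
        (s, "rdf:type", sup)
        for s, p, o in triples
        if p == "rdf:type"
        for sup in superclasses(subclass_of, o)
    }
    return inferred - set(triples)


TRIPLES = [
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    ("_:tim", "rdf:type", "foaf:Person"),
]
print(infer_types(TRIPLES))
# → {('_:tim', 'rdf:type', 'foaf:Agent')}
```

Even this small step answers a query like "find all foaf:Agent instances" correctly for data that only mentions foaf:Person.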
39 Conclusion
- The web will contain the world's knowledge in forms accessible to people and computers
- We need better ways to discover, index, search and reason over SW knowledge
- SW search engines address different tasks than HTML search engines
- So they require different techniques and APIs
- Swoogle-like systems can help create consensus ontologies and foster best practices
40 For more information
http://ebiquity.umbc.edu/
Annotated in OWL