Title: Finding knowledge, data and answers on the Semantic Web
1. Finding knowledge, data and answers on the Semantic Web
- Tim Finin
- University of Maryland, Baltimore County
- http://ebiquity.umbc.edu/resource/html/id/202/
- Joint work with Li Ding, Anupam Joshi, Yun Peng,
Cynthia Parr, Pranam Kolari, Pavan Reddivari,
Sandor Dornbush, Rong Pan, Akshay Java, Joel
Sachs, Scott Cost and Vishal Doshi
- http://creativecommons.org/licenses/by-nc-sa/2.0/
- This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433, and grants from IBM, Fujitsu and HP.
2. This talk
- Motivation
- Swoogle: a Semantic Web search engine
- Use cases and applications
- Observations
- Conclusions
3. Google has made us smarter
4. But what about our agents?
- Agents still have a very minimal understanding of
text and images.
5. But what about our agents?
- A Google for knowledge on the Semantic Web is
needed by software agents and programs
6. This talk
- Motivation
- Swoogle: a Semantic Web search engine
- Use cases and applications
- Observations
- Conclusions
7. Swoogle
- http://swoogle.umbc.edu/
- Running since summer 2004
- 1.8M RDF docs, 320M triples, 10K ontologies, 15K namespaces, 1.3M classes, 175K properties, 43M instances, 600 registered users
8. Swoogle Architecture
9. A Hybrid Harvesting Framework
[Diagram: the harvesting framework combines manual submission, an inductive learner, and three seed sets (Seeds R, Seeds M, Seeds H) that drive RDF crawling, bounded HTML crawling, and meta crawling (via Google API calls) of the Web, feeding the Swoogle sample dataset.]
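The harvesting loop sketched in the diagram can be expressed in a few lines. The following is an illustrative Python sketch, not Swoogle's actual crawler: `fetch`, `is_swd`, and `extract_links` are hypothetical hooks standing in for HTTP retrieval, RDF-parse confirmation, and link extraction.

```python
# Toy sketch of a hybrid harvesting loop (illustrative only).

def harvest(seeds, fetch, is_swd, extract_links, max_docs=100):
    """Crawl outward from seed URLs; keep documents confirmed as
    Semantic Web Documents (SWDs) and follow links found in them."""
    confirmed, seen = [], set()
    frontier = list(seeds)           # seeds come from manual submission,
    while frontier and len(confirmed) < max_docs:  # meta crawling, etc.
        url = frontier.pop(0)
        if url in seen:
            continue
        seen.add(url)
        doc = fetch(url)             # an HTTP GET in a real crawler
        if doc is not None and is_swd(doc):
            confirmed.append(url)
            for link in extract_links(doc):
                if link not in seen:
                    frontier.append(link)
    return confirmed
```

In a real deployment the three crawler types would differ mainly in how they produce seeds and extract candidate links; the confirmation step is shared.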
10. Performance: Site Coverage
- SW06MAR - basic statistics (Mar 31, 2006)
- 1.3M SWDs from 157K websites
- 268M triples
- 61K SWOs, including >10K of high quality
- 1.4M SWTs using 12K namespaces
- Significance
- Compare with existing works (DAML crawler, scutter)
- Compare SW06MAR with Google's estimated SWDs
[Chart: distribution of SWDs per website]
11. Performance: crawlers' contribution
- High SWD ratio: 42% of candidate URLs are confirmed as SWDs
- Consistent growth rate: 3,000 SWDs per day
- RDF crawler: best harvesting method
- HTML crawler: best accuracy
- Meta crawler: best at detecting websites hosting documents
12. This talk
- Motivation
- Swoogle: a Semantic Web search engine
- Use cases and applications
- Observations
- Conclusions
13. Applications and use cases
- Supporting Semantic Web developers
- Ontology designers, vocabulary discovery, "who's using my ontologies or data?", use analysis, errors, statistics, etc.
- Searching specialized collections
- Spire: aggregating observations and data from biologists
- InferenceWeb: searching over and enhancing proofs
- SemNews: text meaning of news stories
- Supporting SW tools
- Triple shop: finding data for SPARQL queries
14. (No transcript)
15. 80 ontologies were found that had these three terms.
By default, ontologies are ordered by their popularity, but they can also be ordered by recency or size.
Let's look at this one.
16. Basic Metadata
- hasDateDiscovered: 2005-01-17
- hasDatePing: 2006-03-21
- hasPingState: PingModified
- type: SemanticWebDocument
- isEmbedded: false
- hasGrammar: RDFXML
- hasParseState: ParseSuccess
- hasDateLastmodified: 2005-04-29
- hasDateCache: 2006-03-21
- hasEncoding: ISO-8859-1
- hasLength: 18K
- hasCntTriple: 311.00
- hasOntoRatio: 0.98
- hasCntSwt: 94.00
- hasCntSwtDef: 72.00
- hasCntInstance: 8.00
17. (No transcript)
18. rdfs:range was used 41 times to assert a value. owl:ObjectProperty was instantiated 28 times. time:Cal was defined once and used 24 times (e.g., as a range).
19. These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace.
All of this is available in RDF form for the agents among us.
20. Here's what the agent sees. Note the swoogle and wob (Web of Belief) ontologies.
21. We can also search for terms (classes, properties), like terms for "person".
22. 10K terms associated with "person"! Ordered by use.
Let's look at foaf:Person's metadata.
23. (No transcript)
24. (No transcript)
25. (No transcript)
26. 87K documents used foaf:gender with a foaf:Person instance as the subject.
27. 3K documents used dc:creator with a foaf:Person instance as the object.
28. Swoogle's archive saves every version of a SWD it's seen.
29. (No transcript)
30. Spire
- An NSF ITR collaborative project with
- University of Maryland, Baltimore County
- University of Maryland, College Park
- U. of California, Davis
- Rocky Mountain Biological Laboratory
31. An invasive species scenario
- Nile Tilapia fish have been found in a California lake.
- Can this invasive species thrive in this environment?
- If so, what will be the likely consequences for the ecology?
- So we need to understand the effects of introducing this fish into the food web of a typical California lake.
32. Food Webs
- A food web models the trophic (feeding) relationships between organisms in an ecology.
- Food web simulators are used to explore the consequences of changes in the ecology, such as the introduction or removal of a species.
- A location's food web is usually constructed from studies of the frequencies of the species found there and the known trophic relations among them.
- Goal: automatically construct a food web for a new location using existing data and knowledge.
- ELVIS: Ecosystem Location Visualization and Information System
33. East River Valley Trophic Web
http://www.foodwebs.org/
34. Species List Constructor
- Click a county, get a species list
35. The problem
- We have data on what species are known to be in the location and can further restrict and fill in with other ecological models.
- But we don't know which of these the Nile Tilapia eats, or who might eat it.
- We can reason from taxonomic data (similar species) and known natural history data (size, mass, habitat, etc.) to fill in the gaps.
36. (No transcript)
37. Food Web Constructor
- Predict food web links using database and taxonomic reasoning.
In a new estuary, Nile Tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.
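The core idea of predicting links from taxonomic similarity can be sketched simply: if a species already in the web eats some prey, a taxonomically similar newcomer may eat it too. This is an illustrative Python sketch with a made-up data model (genus/family dictionaries), not the actual ELVIS algorithm.

```python
# Illustrative taxonomy-based trophic-link prediction (hypothetical
# data model; ELVIS combines several database and reasoning services).

def similar(a, b, taxonomy):
    """Two species count as 'similar' here if they share a genus or family."""
    return a != b and (taxonomy[a]["genus"] == taxonomy[b]["genus"]
                       or taxonomy[a]["family"] == taxonomy[b]["family"])

def predict_links(newcomer, known_links, taxonomy):
    """Predict prey of a newly introduced species from the prey of
    taxonomically similar species already in the food web."""
    predicted = set()
    for predator, prey in known_links:
        if similar(newcomer, predator, taxonomy):
            predicted.add((newcomer, prey))
    return predicted
```

A fuller version would also use natural history data (size, mass, habitat) as additional evidence for or against each predicted link, as the slides describe.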
38. Evidence Provider
- Examine evidence for predicted links.
39. Status
- Goal is ELVIS (Ecosystem Location Visualization and Information System) as an integrated set of web services for constructing food webs for a given location.
- Background ontologies
- SpireEcoConcepts: concepts and properties to represent food webs and ELVIS-related tasks, inputs and outputs
- ETHAN (Evolutionary Trees and Natural History): concepts and properties for natural history information on species, derived from data in the Animal Diversity Web and other taxonomic sources
- Under development
- Connect to visualization software
- Connect to triple shop to discover more data
40. UMBC Triple Shop
- http://sparql.cs.umbc.edu/
- Online SPARQL RDF query processing with several interesting features
- Automatically finds SWDs for given queries using Swoogle's backend database
- Datasets, queries and results can be saved, tagged, annotated, shared, searched for, etc.
- RDF datasets as first-class objects
- Can be stored on our server or downloaded
- Can be materialized in a database or (soon) as a Jena model
41. Web-scale Semantic Web data access
[Sequence diagram: an agent talks to a data access service, which indexes RDF data from the Web:]
1. Agent: ask(person) - search vocabulary
2. Service searches URIrefs in the SW vocabulary; inform(foaf:Person)
3. Agent composes a query: ask(?x rdf:type foaf:Person)
4. Service searches URLs in the SWD index; inform(doc URLs)
5. Agent fetches the documents and populates a local RDF database
6. Agent queries the local RDF database
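The exchange above can be mocked end to end in a few lines. This is a hypothetical sketch of the protocol, not Swoogle's actual API: the service class, method names, and triple encoding are all invented for illustration.

```python
# Minimal mock of the agent / data-access-service exchange
# (hypothetical interfaces; a real client would call the service's
# web APIs over HTTP).

class MockDataAccessService:
    def __init__(self, vocab, swd_index, docs):
        self.vocab, self.swd_index, self.docs = vocab, swd_index, docs

    def search_vocabulary(self, keyword):
        # ask(person) -> inform(foaf:Person)
        return [t for t in self.vocab if keyword in t.lower()]

    def search_swd_index(self, term):
        # ask(?x rdf:type foaf:Person) -> inform(doc URLs)
        return self.swd_index.get(term, [])

    def fetch(self, url):
        return self.docs[url]

def agent_query(service, keyword):
    """Search the vocabulary, compose a query, find matching documents,
    build a local triple store, then answer the query locally."""
    term = service.search_vocabulary(keyword)[0]   # pick a matching term
    urls = service.search_swd_index(term)          # docs using that term
    local_store = [t for u in urls for t in service.fetch(u)]
    return [s for (s, p, o) in local_store
            if p == "rdf:type" and o == term]
```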
42. Who knows Anupam Joshi? Show me their names, email addresses and pictures.
43. The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles.
44. PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
FROM ???
WHERE {
  ?p1 foaf:surname "Joshi" .
  ?p1 foaf:firstName "Anupam" .
  ?p1 foaf:mbox ?p1mbox .
  ?p2 foaf:knows ?p3 .
  ?p3 foaf:mbox ?p1mbox .
  ?p2 foaf:name ?p2name .
  ?p2 foaf:mbox ?p2mbox .
  OPTIONAL { ?p2 foaf:depiction ?p2pix . }
}
ORDER BY ?p2name
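The core join in this query — find the target person's mbox, then find everyone with a foaf:knows link to someone carrying that mbox — can be illustrated with a toy evaluation over an in-memory triple list. This is illustrative only: Triple Shop uses a real SPARQL engine, and the data in the test below is made up.

```python
# Toy evaluation of the "who knows Anupam Joshi?" join over a list of
# (subject, predicate, object) triples (illustrative, not a SPARQL engine).

def who_knows(triples, surname, first_name):
    """Return names of people with a foaf:knows link to anyone whose
    foaf:mbox matches the target person's foaf:mbox."""
    def objects(s, p):
        return {o for (s2, p2, o) in triples if s2 == s and p2 == p}

    targets = [s for (s, p, o) in triples
               if p == "foaf:surname" and o == surname
               and first_name in objects(s, "foaf:firstName")]
    mboxes = {m for t in targets for m in objects(t, "foaf:mbox")}
    knowers = set()
    for (s, p, o) in triples:
        if p == "foaf:knows" and objects(o, "foaf:mbox") & mboxes:
            knowers |= objects(s, "foaf:name")
    return sorted(knowers)
```

Joining on foaf:mbox rather than on node identity is the interesting design point: FOAF profiles from different sites use different blank nodes or URIs for the same person, so the mailbox acts as an inverse-functional key.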
45. (No transcript)
46. (No transcript)
47. (No transcript)
48. 302 RDF documents were found that might have useful data.
49. We'll select them all and add them to the current dataset.
50. We'll run the query against this dataset to see if the results are as expected.
51. The results can be produced in any of several formats.
52. (No transcript)
53. Looks like a useful dataset. Let's save it and also materialize it in the TS triple store.
54. (No transcript)
55. We can also annotate, save and share queries.
56. Work in Progress
- There are a host of performance issues.
- We plan on supporting some special datasets, e.g.,
- FOAF data collected from Swoogle
- Definitions of RDF and OWL classes and properties from all ontologies that Swoogle has discovered
- Expanding constraints to select candidate SWDs to include arbitrary metadata and embedded queries
- FROM documents trusted by a member of the SPIRE project
- We will explore two models for making this useful
- As a downloadable application for client machines
- As an (open source?) downloadable service for servers supporting a community of users
57. This talk
- Motivation
- Swoogle: a Semantic Web search engine
- Use cases and applications
- Observations
- Conclusions
58. Will Swoogle scale? How?
- Here's a rough estimate of the data in RDF documents on the Semantic Web, based on Swoogle's crawling.
We think Swoogle's centralized approach can be made to work for the next few years, if not longer.
59. How much reasoning should Swoogle do?
- Swoogle N (N<3) does limited reasoning
- It's expensive
- It's not clear how much should be done
- More reasoning would benefit many use cases
- e.g., type hierarchy
- Recognizing specialized metadata
- e.g., that ontology A maps some terms from B to C
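One cheap-but-useful kind of reasoning mentioned above is the type hierarchy: closing rdfs:subClassOf transitively so a search for instances of a class can also match its subclasses. The sketch below is illustrative, not Swoogle's actual reasoner; the naive fixed-point loop is fine for small hierarchies.

```python
# Sketch of transitive closure over rdfs:subClassOf assertions
# (illustrative; a production reasoner would use a smarter algorithm).

def subclass_closure(subclass_of):
    """subclass_of: dict mapping each class to its direct superclasses.
    Returns the set of (sub, super) pairs in the transitive closure."""
    closure = {(c, s) for c, supers in subclass_of.items() for s in supers}
    changed = True
    while changed:                      # naive fixed-point iteration
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure
```

Even this limited inference is quadratic-or-worse over millions of classes, which is exactly the cost/benefit tension the slide raises.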
60. An RDF Dictionary
- We hope to develop an RDF dictionary.
- Given an RDF term, returns a graph of its definition
- Term → definition from official ontology
- Term, URL → definition from the SWD at URL
- Term → union definition
- An optional argument recursively adds definitions of terms in the definition, excluding RDFS and OWL terms
- Optional arguments identify more namespaces to exclude
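The proposed lookup can be sketched as a worklist traversal. The data model here is hypothetical: definitions are per-term triple lists and terms carry an "ns:" prefix, which is not necessarily how the dictionary would actually be implemented.

```python
# Sketch of the proposed dictionary lookup (hypothetical data model).

DEFAULT_EXCLUDED = {"rdf", "rdfs", "owl"}

def lookup(term, definitions, excluded=DEFAULT_EXCLUDED, recursive=True):
    """Return the graph defining `term`, optionally adding definitions
    of terms mentioned in it, minus excluded namespaces."""
    graph, pending, done = [], [term], set()
    while pending:
        t = pending.pop()
        if t in done or t.split(":")[0] in excluded:
            continue                      # skip RDFS/OWL etc. terms
        done.add(t)
        for triple in definitions.get(t, []):
            graph.append(triple)
            if recursive:                 # follow terms in the definition
                pending.extend(x for x in triple if ":" in x)
    return graph
```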
61. This talk
- Motivation
- Swoogle: a Semantic Web search engine
- Use cases and applications
- Observations
- Conclusions
62. Conclusion
- The web will contain the world's knowledge in forms accessible to people and computers.
- We need better ways to discover, index, search and reason over SW knowledge.
- SW search engines address different tasks than HTML search engines
- So they require different techniques and APIs
- Swoogle-like systems can help create consensus ontologies and foster best practices.
- Swoogle is for Semantic Web 1.0
- Semantic Web 2.0 will make different demands
63. For more information
http://ebiquity.umbc.edu/
Annotated in OWL