Title: Finding knowledge, data and answers on the Semantic Web
1. Finding knowledge, data and answers on the Semantic Web
- Tim Finin
- University of Maryland, Baltimore County
- http://ebiquity.umbc.edu/resource/html/id/223/
- Joint work with Li Ding, Anupam Joshi, Cynthia Parr, Joel Sachs, Andriy Parafiynyk and Lushan Han
- http://creativecommons.org/licenses/by-nc-sa/2.0/
- This work was partially supported by DARPA contract F30602-97-1-0215 and NSF grants CCR007080 and IIS9875433
2. This talk
- Motivation
- Semantic Web background
- Swoogle Semantic Web search engine
- Use cases and applications
- Social Semantic Web
- Conclusions
3. Google has made us smarter
4. But what about our agents?
- Agents still have a very minimal understanding of text and images.
5. But what about our agents?
- A Google for knowledge on the Semantic Web is needed by software agents and programs
7. Brief history of the Semantic Web
- Tim Berners-Lee's original 1989 WWW proposal described a web of relationships among named objects, unifying many information management tasks.
- Guha's MCF ('94)
- XML + MCF => RDF ('96)
- "Semantic Web" coined ('97)
- RDF + OO => RDFS ('99)
- RDFS + KR => DAML+OIL ('00)
- W3C's SW activity ('01)
- W3C's OWL ('03)
- SPARQL ('06)
- Rules, RDFa, ...
- http://www.w3.org/History/1989/proposal.html
8. Interest is high
- Interest in industry, government and VCs is high
- RDF is in Adobe's products, Oracle 10g and 11g, Microsoft Vista, and Yahoo's food portal
- Several high-visibility startups use RDF
- Joost (internet TV), Teranode (bioinformatics), Garlik (personal info monitoring)
- And, if you want more evidence that interest is high
9. 1795 / 695 (CD only)
10. What do we mean by Semantic Web?
- Semantic Web
- explicit semantics
- KR based
- RDF + OWL
11. RDF is the first SW language
[Figure: one RDF data model shown in three forms]
- Graph: good for human viewing
- XML encoding: good for machine processing
    <rdf:RDF ...> <...> <...> </rdf:RDF>
- Triples: good for reasoning
    stmt(docInst, rdf_type, Document)
    stmt(personInst, rdf_type, Person)
    stmt(inroomInst, rdf_type, InRoom)
    stmt(personInst, holding, docInst)
    stmt(inroomInst, person, personInst)
- RDF is a simple language for building graph-based representations
- Grounded in web standards
- With terms to support ontologies, description logic, rules and much of first-order logic
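A minimal sketch of why the triple form suits reasoning: the stmt(...) facts from the slide can be held as plain tuples and matched against patterns. The `match` helper and the use of `None` as a wildcard are illustrative assumptions, not part of any RDF library.

```python
# The five statements from the slide as (subject, predicate, object) tuples.
TRIPLES = [
    ("docInst", "rdf_type", "Document"),
    ("personInst", "rdf_type", "Person"),
    ("inroomInst", "rdf_type", "InRoom"),
    ("personInst", "holding", "docInst"),
    ("inroomInst", "person", "personInst"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None position (None is a wildcard)."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# What is personInst holding, and what type is that thing?
held = [o for (_, _, o) in match(TRIPLES, s="personInst", p="holding")]
held_types = [o for h in held for (_, _, o) in match(TRIPLES, s=h, p="rdf_type")]
print(held, held_types)  # ['docInst'] ['Document']
```

Chaining two pattern lookups like this is the simplest form of the graph traversal that RDF query languages generalize.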
12. IMHO
- Better NLP will help search engines, but it's a long-term, incremental project
- We need a well-defined and extensible representation system for explicit knowledge
- It should be backed by open, non-proprietary standards supported by industry, government and other interested parties
- The W3C approach is not perfect
- But "the perfect is the enemy of the good."
- Semantic Web vs. semantic web
14. Swoogle
- http://swoogle.umbc.edu/
- Running since summer 2004
- 2.1M RDF docs, 420M triples, 10K ontologies, 15K namespaces, 1.5M classes, 185K properties, 49M instances, 800 registered users
15. Swoogle Architecture
16. A Hybrid Harvesting Framework
[Figure: hybrid harvesting framework. Labels: Swoogle sample dataset; submissions and pings; inductive learner; seeds R, M and H; RDF crawling; bounded HTML crawling; meta crawling (Google API call); the Web]
17. Performance: site coverage
- SW06MAR: basic statistics (Mar 31, 2006)
- 1.3M Semantic Web documents (SWDs) from 157K websites
- 268M triples
- 61K Semantic Web ontologies (SWOs), including >10K of high quality
- 1.4M Semantic Web terms (SWTs) using 12K namespaces
- Significance
- Compare with existing work (DAML crawler, scutter)
- Compare SW06MAR with Google's estimated SWDs
[Figure: SWDs per website, plotted by website]
18. Performance: crawlers' contribution
- High SWD ratio: 42% of URLs are confirmed as SWDs
- Consistent growth rate: 3000 SWDs per day
- RDF crawler: best harvesting method
- HTML crawler: best accuracy
- Meta crawler: best at detecting websites
[Figure axis: # of documents]
20. Applications and use cases
- (1) Supporting Semantic Web developers
- Ontology designers, vocabulary discovery, "who's using my ontologies or data?", use analysis, errors, statistics, etc.
- (2) Searching specialized collections
- Spire: aggregating observations and data from biologists
- InferenceWeb: searching over and enhancing proofs
- SemNews: text meaning of news stories
- (3) Supporting SW tools
- Triple shop: finding data for SPARQL queries
21. (1) Supporting Semantic Web developers
22. 80 ontologies were found that had these three terms
By default, ontologies are ordered by their popularity, but they can also be ordered by recency or size.
Let's look at this one
23. Basic Metadata
- hasDateDiscovered: 2005-01-17
- hasDatePing: 2006-03-21
- hasPingState: PingModified
- type: SemanticWebDocument
- isEmbedded: false
- hasGrammar: RDFXML
- hasParseState: ParseSuccess
- hasDateLastmodified: 2005-04-29
- hasDateCache: 2006-03-21
- hasEncoding: ISO-8859-1
- hasLength: 18K
- hasCntTriple: 311.00
- hasOntoRatio: 0.98
- hasCntSwt: 94.00
- hasCntSwtDef: 72.00
- hasCntInstance: 8.00
24. Who uses this ontology and how do they access it?
25. rdfs:range was used 41 times to assert a value.
owl:ObjectProperty was instantiated 28 times.
time:Cal was defined once and used 24 times (e.g., as range).
26. These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace.
All of this is available in RDF form for the agents among us.
27. Here's what the agent sees. Note the swoogle and wob (web of belief) ontologies.
28. We can also search for terms (classes, properties), like terms for "person".
29. 10K terms associated with "person"! Ordered by use.
Let's look at foaf:Person's metadata
30. Metadata stored for a term is information about its definition: both what and by whom
32. How do other terms use foaf:Person? 100 documents assert that foaf:publication is a property of a foaf:Person
33. 87K documents used foaf:gender with a foaf:Person instance as the subject
34. 3K documents used dc:creator with a foaf:Person instance as the object
35. Swoogle's archive saves every version of a SWD it has seen.
37. (2) Searching specialized collections: Spire
- An NSF ITR collaborative project with
- University of Maryland, Baltimore County
- University of Maryland, College Park
- U. of California, Davis
- Rocky Mountain Biological Laboratory
38. An invasive species scenario
- Nile Tilapia fish have been found in a California lake.
- Can this invasive species thrive in this environment?
- If so, what will be the likely consequences for the ecology?
- So we need to understand the effects of introducing this fish into the food web of a typical California lake
39. Food Webs
- A food web models the trophic (feeding) relationships between organisms in an ecology
- Food web simulators are used to explore the consequences of changes in the ecology, such as the introduction or removal of a species
- A location's food web is usually constructed from studies of the frequencies of the species found there and the known trophic relations among them.
- Goal: automatically construct a food web for a new location using existing data and knowledge
- ELVIS: Ecosystem Location Visualization and Information System
40. East River Valley Trophic Web
http://www.foodwebs.org/
41. Species List Constructor
- Click a county, get a species list
42. The problem
- We have data on what species are known to be in the location and can further restrict and fill in with other ecological models
- But we don't know which of these the Nile Tilapia eats or what might eat it.
- We can reason from taxonomic data (similar species) and known natural history data (size, mass, habitat, etc.) to fill in the gaps.
44. Food Web Constructor
- Predict food web links using database and taxonomic reasoning.
In a new estuary, Nile Tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.
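One way to picture the taxonomic reasoning is a nearest-relative guess: give a species with no recorded diet the prey of the known species that shares the most taxonomic ranks with it. This is only a sketch under that assumption, not the ELVIS implementation; the diet entries are hypothetical toy data and the helper names are made up for illustration.

```python
# Taxonomic lineage per species, most general rank first.
LINEAGE = {
    "Nile tilapia": ["Animalia", "Chordata", "Cichlidae", "Oreochromis"],
    "Mozambique tilapia": ["Animalia", "Chordata", "Cichlidae", "Oreochromis"],
    "Largemouth bass": ["Animalia", "Chordata", "Centrarchidae", "Micropterus"],
}
# Hypothetical known trophic links: predator -> set of prey.
KNOWN_DIET = {
    "Mozambique tilapia": {"algae", "detritus"},
    "Largemouth bass": {"small fish", "crayfish"},
}


def shared_depth(a, b):
    """Count how many leading taxonomic ranks two lineages have in common."""
    depth = 0
    for rank_a, rank_b in zip(a, b):
        if rank_a != rank_b:
            break
        depth += 1
    return depth


def predict_diet(species):
    """Union the diets of the known species most closely related to `species`."""
    scored = [
        (shared_depth(LINEAGE[species], LINEAGE[other]), other)
        for other in KNOWN_DIET
        if other != species
    ]
    if not scored:
        return set()
    best = max(depth for depth, _ in scored)
    prey = set()
    for depth, other in scored:
        if depth == best:
            prey |= KNOWN_DIET[other]
    return prey


print(sorted(predict_diet("Nile tilapia")))  # ['algae', 'detritus']
```

A fuller version would also weigh the natural history data mentioned earlier (size, mass, habitat) before accepting a predicted link.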
45. Evidence Provider
46. Status
- ELVIS (Ecosystem Location Visualization and Information System) as an integrated set of web services for constructing food webs for a given location.
- Background ontologies
- SpireEcoConcepts: concepts and properties to represent food webs, and ELVIS-related tasks, inputs and outputs
- ETHAN (Evolutionary Trees and Natural History): concepts and properties for natural history information on species, derived from data in the Animal Diversity Web and other taxonomic sources. 250K classes on plants and animals
- Under development
- Connect to visualization software
- Connect to triple shop to discover more data
47. (3) Supporting SW Tools
- Semantic Web applications can access Swoogle through a REST-based Web interface or via SQL.
- Two examples
- A system to help scientists construct datasets from RDF documents on the Web
- Tools to manage Semantic Web data in blogs and other forms of social media
48. UMBC Triple Shop
- http://sparql.cs.umbc.edu/
- Online SPARQL RDF query processing with several interesting features
- Automatically finds SWDs for given queries using the Swoogle backend database
- Datasets, queries and results can be saved, tagged, annotated, shared, searched for, etc.
- RDF datasets as first-class objects
- Can be stored on our server or downloaded
- Can be materialized in a database or (soon) as a Jena model
49. What's SPARQL?
- SPARQL is the standard language (and protocol) for querying RDF graphs
- Think "SQL for RDF"
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person ?name ?email
    FROM <http://rdf.example.org/people.rdf>
    WHERE {
      ?person a foaf:Person .
      ?person foaf:name ?name .
      OPTIONAL { ?person foaf:mbox ?email . }
    }
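To make the semantics of the query above concrete, here is a small pure-Python sketch that evaluates the same two required patterns and the OPTIONAL pattern over an in-memory set of triples. It is an illustration of the matching idea, not a SPARQL engine; the sample graph, the `ex:` identifiers and the helper names are assumptions made for the example.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical sample graph: Alice has a mailbox, Bob does not.
GRAPH = {
    ("ex:alice", RDF_TYPE, FOAF + "Person"),
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:alice", FOAF + "mbox", "mailto:alice@example.org"),
    ("ex:bob", RDF_TYPE, FOAF + "Person"),
    ("ex:bob", FOAF + "name", "Bob"),
}


def match_pattern(graph, pattern, binding):
    """Yield extensions of `binding` under which `pattern` equals some triple."""
    for triple in graph:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its existing value.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def solve(graph, patterns, bindings=None):
    """Join triple patterns one by one, like a basic graph pattern."""
    bindings = [{}] if bindings is None else list(bindings)
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match_pattern(graph, pattern, b)]
    return bindings


def optional(graph, patterns, bindings):
    """Left join: keep a binding unchanged when the optional part finds no match."""
    out = []
    for b in bindings:
        out.extend(solve(graph, patterns, [b]) or [b])
    return out


required = [("?person", RDF_TYPE, FOAF + "Person"), ("?person", FOAF + "name", "?name")]
rows = optional(GRAPH, [("?person", FOAF + "mbox", "?email")], solve(GRAPH, required))
result = sorted((r["?name"], r.get("?email")) for r in rows)
print(result)  # [('Alice', 'mailto:alice@example.org'), ('Bob', None)]
```

Bob still appears in the result with no email, which is the behavior OPTIONAL is there to provide.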
50. The fractal nature of SW systems
- A SPARQL endpoint can make any Web data source look like an RDF graph that can be queried
- Give a graph as a query, get a graph as a result
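51. Web-scale semantic web data access
[Figure: an agent talking to a data access service that sits in front of the Web]
- Service: index RDF data
- Agent, search vocabulary: ask ("person"); the service searches URIrefs in the SW vocabulary and replies inform (foaf:Person)
- Agent, compose query: ask (?x rdf:type foaf:Person); the service searches URLs in the SWD index and replies inform (doc URLs)
- Agent, populate RDF database: fetch docs
- Agent: query local RDF database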
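The exchange on this slide can be sketched as two lookups against the service followed by local work in the agent. Everything below is a hypothetical in-memory stand-in: the index contents, the example.org URLs and the function names are assumptions for illustration and do not reflect Swoogle's actual interface.

```python
# Hypothetical in-memory stand-ins for the service's two indexes.
VOCABULARY_INDEX = {"person": ["foaf:Person"]}  # keyword -> term URIrefs
SWD_INDEX = {  # term -> URLs of documents that use it
    "foaf:Person": ["http://example.org/a.rdf", "http://example.org/b.rdf"],
}
# Hypothetical document contents, standing in for an HTTP fetch and RDF parse.
DOCUMENTS = {
    "http://example.org/a.rdf": [("ex:alice", "rdf:type", "foaf:Person")],
    "http://example.org/b.rdf": [
        ("ex:bob", "rdf:type", "foaf:Person"),
        ("ex:bob", "foaf:name", "Bob"),
    ],
}


def search_vocabulary(keyword):
    """Service side: ask(keyword) -> inform(matching term URIrefs)."""
    return VOCABULARY_INDEX.get(keyword, [])


def search_documents(term):
    """Service side: ask(?x rdf:type term) -> inform(doc URLs)."""
    return SWD_INDEX.get(term, [])


def find_instances(keyword):
    """Agent side: search vocabulary, fetch docs, populate and query a local store."""
    local_db = set()
    instances = set()
    for term in search_vocabulary(keyword):
        for url in search_documents(term):
            local_db.update(DOCUMENTS.get(url, []))  # fetch docs, populate database
        # Query the local database for instances of the term.
        instances.update(s for (s, p, o) in local_db if p == "rdf:type" and o == term)
    return sorted(instances)


print(find_instances("person"))  # ['ex:alice', 'ex:bob']
```

The point of the split is that the service only answers index lookups, while fetching and querying happen on the agent's side.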
52. Who knows Anupam Joshi? Show me their names, email addresses and pictures
53. The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles
54. SPARQL query
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT DISTINCT ?p2name ?p2mbox ?p2pix
    FROM ???
    WHERE {
      ?p1 foaf:surname "Joshi" .
      ?p1 foaf:firstName "Anupam" .
      ?p1 foaf:mbox ?p1mbox .
      ?p2 foaf:knows ?p3 .
      ?p3 foaf:mbox ?p1mbox .
      ?p2 foaf:name ?p2name .
      ?p2 foaf:mbox ?p2mbox .
      OPTIONAL { ?p2 foaf:depiction ?p2pix . }
    }
    ORDER BY ?p2name
58. 302 RDF documents were found that might have useful data.
59. We'll select them all and add them to the current dataset.
60. We'll run the query against this dataset to see if the results are as expected.
61. The results can be produced in any of several formats
63. Looks like a useful dataset. Let's save it and also materialize it in the TS triple store.
An extension will let us ask that it be automatically updated when its constituents change
65. We can also annotate, save and share queries.
67.
- Social media sites have become the biggest source of new content on the Web
- Blogs, wikis, photo sites, forums, etc.
- Accounting for 1/3 of new Web content
68.
- It's a global phenomenon
- Japanese is now the most common language
69.
- Social media sites have embraced new ways of letting users add semantic information
- Showing users the potential of semantics
70. Social Media and the Semantic Web
- Many are exploring how Semantic Web technology can work with social media
- Social media like blogs are typically temporally organized
- Valued for their timely and dynamic information!
- If static pages form the Web's long-term memory, then the Blogosphere is its stream of consciousness
- Maybe we can (1) help people publish data in RDF on their blogs and (2) mine social media sites for useful information
73. A good Semantic Web opportunity
- We want to make it easy for scientists to enter and collect information from social media
- Professionals, students and amateurs!
- Two early examples
- SPOTter: a tool to add Semantic Web data to blogs
- Splickr: a system to mine Flickr for images of organisms
74. SPOTter: SPire Observation Tool
- We've developed some simple components to help people add RDF data to blogs and ping Swoogle to get it indexed.
- SPOTter is an initial prototype that uses the ETHAN ontology and is being used in some BioBlitz activities with students.
- We're working toward a version that uses Twitter so that people can make the blog entries from their cell phones via SMS
- The SPOTter agent will get the entries (via RSS) and index the data
75. SPOTter button
Once entered, the data is embedded into the blog post and Swoogle is pinged to index it
76. Prototype SPOTter search engine
- We can draw a bounding box on the map and find observations
- An RSS feed is provided for each query
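A bounding-box search of this kind reduces to a range test on latitude and longitude. The sketch below assumes observations carry plain decimal coordinates; the `Observation` class, the sample records and their coordinates are hypothetical and not taken from SPOTter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    species: str
    lat: float  # decimal degrees, north positive
    lon: float  # decimal degrees, east positive


def in_bounding_box(obs, south, west, north, east):
    """True if the observation lies inside the box (boxes crossing the antimeridian are not handled)."""
    return south <= obs.lat <= north and west <= obs.lon <= east


# Hypothetical sample observations.
OBSERVATIONS = [
    Observation("Nile tilapia", 38.5, -121.7),
    Observation("Blue crab", 39.25, -76.7),
]

# A rough box around California, chosen only for the example.
found = [o.species for o in OBSERVATIONS if in_bounding_box(o, 32.0, -125.0, 42.0, -114.0)]
print(found)  # ['Nile tilapia']
```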
77. Flickr
- The Flickr photo sharing site has millions of photographs
- Many of plants and animals
- Most of them have descriptions, timestamps, tags and even geo-tags
- Flickr has even introduced "machine tags" that can be mapped into RDF
- Any Flickr user (human or bot) can add comments and annotations
- There's a good API
- It could be a good source of ecological information
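As a sketch of how a machine tag might be mapped into RDF, the snippet below assumes the namespace:predicate=value tag form and turns it into a (subject, predicate, object) triple about the photo. The prefix-to-URI table, the example.org URIs and the function name are assumptions for illustration; a real mapping would need agreed namespace URIs.

```python
import re

# Assumed machine tag shape: namespace:predicate=value
MACHINE_TAG = re.compile(r"^([a-z][a-z0-9_]*):([a-z][a-z0-9_]*)=(.*)$", re.IGNORECASE)

# Hypothetical prefix-to-URI table.
NAMESPACES = {
    "taxonomy": "http://example.org/taxonomy#",
    "geo": "http://example.org/geo#",
}


def machine_tag_to_triple(photo_uri, tag):
    """Return (photo, predicate URI, value), or None for ordinary tags and unknown prefixes."""
    m = MACHINE_TAG.match(tag)
    if m is None:
        return None
    namespace, predicate, value = m.groups()
    base = NAMESPACES.get(namespace)
    if base is None:
        return None
    return (photo_uri, base + predicate, value)


triple = machine_tag_to_triple(
    "http://example.org/photo/1", "taxonomy:binomial=Oreochromis niloticus"
)
print(triple)
```

Ordinary free-text tags such as "fish" do not match the pattern and are skipped, so only tags with explicit structure become triples.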
80. Results for people and machines
82. Conclusion
- The web will contain the world's knowledge in forms accessible to people and computers
- We need better ways to discover, index, search and reason over SW knowledge
- SW search engines address different tasks than HTML search engines
- So they require different techniques and APIs
- Swoogle-like systems can help create consensus ontologies and foster best practices
- Social media provide new challenges and opportunities for the Semantic Web
83. For more information
http://ebiquity.umbc.edu/
Annotated in OWL