Title: TAP
1 TAP
- R.V.Guha, IBM Research
- Rob McCool, Stanford KSL
2 TAP Context
- Islands of XML from disparate web services
- Example: Tori Amos
- Up to the consumer to put these chunks together
- Situation analogous to pre-web hypertext systems and RDBMSs today
3 TAP Goal
- Create a coherent semantic web from disparate chunks
- Effectively make the web a giant distributed DB
- Why? Bringing the Internet to programs
4 TAP What We Do
- Inspired by DNS and the early web: simple contracts, everything decentralized
- Protocols to publish and navigate
- A small, simple set of publishing and access guidelines that knit together to create a schematically unified whole
- Bootstrapping: create comprehensive chunks of the semantic web in a few areas
- Applications: Semantic Search, Internet Wet Lab
5 TAP Protocol: GetData
- Simple API to navigate this web
- DNS: GetHostByName(<host>) => IP address
- TAP: GetData(<resource>, <property>) => value
- GetData(<Tori Amos>, birthplace) => <Newton, NC>
- GetData(<Newton, NC>, temperature) => 57 F
- GetData(<Newton, NC>, locatedIn) => <North Carolina>
- Publisher exposes data as a graph via GetData
- Consumer uses GetData to navigate the graph
- Key technical issues: Caching, Directories, Names
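The GetData calls above can be sketched in a few lines of Python. This is only an illustration, assuming an in-memory dictionary as the publisher's graph; the names GRAPH, get_data, and navigate are hypothetical and not part of TAP. The three arcs are the ones listed on this slide.

```python
# Illustrative only: a tiny in-memory graph holding the arcs from the slide.
# GRAPH, get_data and navigate are hypothetical names, not TAP's API.
GRAPH = {
    ("Tori Amos", "birthplace"): "Newton, NC",
    ("Newton, NC", "temperature"): "57 F",
    ("Newton, NC", "locatedIn"): "North Carolina",
}


def get_data(resource, prop):
    """Return the value at the end of the arc `prop` leaving `resource`, or None."""
    return GRAPH.get((resource, prop))


def navigate(resource, *props):
    """Follow a chain of arcs by calling get_data once per arc."""
    node = resource
    for prop in props:
        node = get_data(node, prop)
        if node is None:
            return None  # the chain breaks at a missing arc
    return node


birthplace = get_data("Tori Amos", "birthplace")
state = navigate("Tori Amos", "birthplace", "locatedIn")
```

The consumer never sees how the publisher stores its data; it only asks for the value of one arc at a time and uses each answer as the starting point for the next call.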
6 The Name Problem
- We don't get nice sub-graphs like these, with easy-to-use assembly instructions
[Figure: sub-graphs from Geo Almanac, Weather channel, CDNow, and People Magazine in which every source uses the same names for shared nodes (Newton, NC; Tori Amos; North Carolina; USA). Other labels: City, Musician, Author, Music Album, Under The Pink, Crucify, Atlantic, EMI, 62 F, 8/22/63, instanceof, Located in, temperature, birthplace, publisher, Date Of Birth.]
7 We get a mess like this
[Figure: the same sub-graphs from Geo Almanac, Weather channel, CDNow, and People Magazine, but with source-specific identifiers such as USNC0491, NTNC, Newton,_NorthCar, 328723677, and 0,9855,109071,00 in place of the shared names Newton, NC and Tori Amos.]
8 The Name Problem
- Names are crucial in information exchange
- Two parties cannot exchange information about an object without agreeing on how they are going to refer to it
- The problem: too many names to keep track of!
- No URN for <Newton, NC> or <Tori Amos>
- Different sites have different names for the same thing!
- URN efforts to date are largely failures
- Traditional approach: name-mapping tables
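A minimal sketch of what a name-mapping table looks like from the calling program's side. The identifiers are the ones that appear in the slides' diagrams; the site labels ("site_a" and so on) and the function name translate are hypothetical, since the slides do not say which site issues which identifier.

```python
# Each row lists the names that different sites use for one real-world thing.
# The site labels are hypothetical; only the identifiers come from the slides.
MAPPING_TABLE = [
    {"site_a": "USNC0491", "site_b": "NTNC", "site_c": "Newton,_NorthCar"},
]


def translate(name, from_site, to_site):
    """Translate one site's name for a thing into another site's name for it."""
    for row in MAPPING_TABLE:
        if row.get(from_site) == name:
            return row.get(to_site)  # None if to_site has no entry in this row
    return None  # the name is not in the table at all


weather_name = translate("USNC0491", "site_a", "site_b")
```

Every object needs its own row and every site its own column, and the calling program has to keep all of it current, which is the bookkeeping burden this slide describes.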
9 [Figure: the traditional approach. A calling program holds a name-mapping table (328723677 <-> 0,9855,1 . . . ; USNC0491 <-> NTNC <-> . . .) and uses it to join the sub-graphs from Geo Almanac, Weather channel, CDNow, and People Magazine, each of which uses its own identifiers (USNC0491, NTNC, Newton,_NorthCar, 328723677, 0,9855,109071,00).]
10 TAP Naming
- Reference by descriptions
- E.g., "A Musician whose firstName is Tori and whose lastName is Amos and whose ..."
- Names are degenerate descriptions
- AmznB000002UB2, CDNOW 328723677
- Description-based name negotiation
- Core insight
- Don't require globally unique names for everything if we can describe things using a starting vocabulary
- Need a description language, a starting vocabulary, and a negotiation mechanism
- Bootstrapping some shared meaning into more shared meaning
11 The vision: descriptions choreograph the integration
[Figure: the calling program passes descriptions instead of names to Geo Almanac, Weather channel, CDNow, and People Magazine. D1 = description of Newton, NC; D2 = description of Tori Amos. Each source maps D1 or D2 to its own identifier (USNC0491, NTNC, Newton,_NorthCar, 328723677, 0,9855,109071,00).]
12 Description-based References
- The core protocol: GetData
- GetData(Resource Description, arc-label)
- GetData(<Tori Amos>, birthplace)
- GetData(RDF Description of Tori Amos, birthplace)
- A form of loose coupling
- Handling ambiguity, failure to denote, ...
- The core contract
- Expose your data as a graph
- Map incoming descriptions to nodes in your graph
- In return, your data is now integrated into the global semantic web
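One way to picture the publisher's side of this contract is sketched below. It is an assumption-laden illustration, not TAP's implementation: a plain Python dict stands in for an RDF description, the class and exception names are hypothetical, the local id 328723677 is assumed to be this publisher's id for the musician, and the second node is invented purely to demonstrate ambiguity.

```python
class NoSuchResource(Exception):
    """Raised when a description matches no node (failure to denote)."""


class AmbiguousDescription(Exception):
    """Raised when a description matches more than one node."""


class Publisher:
    """Hypothetical publisher that maps incoming descriptions to local nodes."""

    def __init__(self, nodes, arcs):
        self.nodes = nodes  # local id -> {property: value}
        self.arcs = arcs    # (local id, arc-label) -> value

    def resolve(self, description):
        """Return the single local id whose properties satisfy the description."""
        matches = [
            node_id
            for node_id, props in self.nodes.items()
            if all(props.get(key) == value for key, value in description.items())
        ]
        if not matches:
            raise NoSuchResource(description)
        if len(matches) > 1:
            raise AmbiguousDescription(description)
        return matches[0]

    def get_data(self, description, arc_label):
        """GetData(Resource Description, arc-label), answered from the local graph."""
        return self.arcs.get((self.resolve(description), arc_label))


publisher = Publisher(
    nodes={
        # Assumed local id for the musician; the caller never needs to know it.
        "328723677": {"type": "Musician", "firstName": "Tori", "lastName": "Amos"},
        # Invented second node, present only to show an ambiguous description.
        "local-2": {"type": "Musician", "firstName": "Tori", "lastName": "Example"},
    },
    arcs={("328723677", "birthplace"): "Newton, NC"},
)

tori = {"type": "Musician", "firstName": "Tori", "lastName": "Amos"}
birthplace = publisher.get_data(tori, "birthplace")
```

The caller and the publisher share only the vocabulary used in the description (type, firstName, lastName), which is what makes the coupling loose: each side keeps its own identifiers.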
13 Infrastructure: Kernel Vocabulary
- Provides vocabulary for descriptions
- Purpose is to provide the infrastructure for constructing descriptions with which programs can refer to things
- "A Musician whose firstName is Tori and whose lastName is Amos and whose ..."
- It doesn't reside anywhere; it's a specification
14 Applications
- Good infrastructures have waves of applications
- WWW: home pages, portals, ecommerce, ...
- DNS: email, telnet, ftp, gopher, WWW
- Semantic Search
- Adding semantics to search
- Crawl, grab, index model of search doesn't work for dynamic web sites or web applications
- Semantics-based search augmentation enables search to cover time-sensitive data
- Internet Wet Lab
15 Semantic Web Application: Semantic Search
16 Search Augmentation Example
17 How the Semantic Infrastructure gets used in Semantic Search
[Figure: components are the Search Front End (example query: Yo Yo Ma), KB, UDDI, Caching/Buffering, and the sources AllMusic, TicketMaster, EBay, and CDNow. Labels on the arrows: "Musician whose genre is ClassicalMusic, First name is ..."; "Who has concert dates? discography? auctions? bio? For musician whose ..."; "Concert Dates for Musician whose ..."; "Auctions for ..."; "Bio for ..."; "Discography for ...".]
18 TAP KBs for Semantic Search
- Large Knowledge Base of specific musicians, cities, athletes, ...
- Currently covers about 20% of search terms
- Built in a largely automated fashion
- Scrapers for free data sources
- Simple noun phrase analysis of news articles
- AP, Reuters, ...
- Scrapers for important sites to bootstrap
- KB also helps bootstrap the semantic web
19 KB Coverage Today
- Music
- Musicians, instruments, styles
- Movies
- Movies, actors, TV shows
- Authors
- Top authors, classic books, ...
- Sports
- Athletes, sports, sports teams, equipment
- Autos
- Auto models, motorcycles, ...
- Companies
- Fortune 500
- Home Appliances
- Types, brands
- Toys
- Types, brands
- Baby products
- Types, brands
- Places
- Countries, cities, tourist attractions, ...
- Consumer electronics
- Audio/Video, Communication
- Game consoles, titles, ...
- Health
- Diseases, Drugs, ...
20 Semantic Site Search
- Semantic Search is useful not just for internet-wide search, but also for site search
- Same principles as internet-wide search
- KBs created for searching related individual sites can be shared between sites
- These KBs feed into the global semantic web
- Example: Semantic Search for www.w3.org
21 TAP Application: Internet Wet Lab
- In many sciences, more data will be produced in the next 2 years than exists today
- Increasingly, research consists of writing programs that mine this data
- Data is isolated as islands in different labs
- Data from one lab is not easily available to programs in another lab
- We want to use TAP to create a single virtual net-wide database containing all this experimental data
- Example: Clinical Trial Data
22 TAP Organization
- TAP is a multi-organization research effort
- IBM, Stanford KSL, Stanford Logic Group, CMU West, ...
- KBs, source code, etc. freely available (via BSD license)
- A number of new projects starting up: places, entertainment, ...
- We invite you to join
- URL: http://tap.stanford.edu/
23 TAP Summary
- Small set of guidelines that create a coherent semantic web out of disparate web services
- Potential solution to the naming problem
- Relevant to all web services
- Semantic Search and Internet Wet Lab as driving applications
- TAP is a research project
- A lot of fundamental work remains to be done
- Everything freely available. We want you to join!
24 Questions
25 [Figure: the sub-graphs from Geo Almanac, Weather channel, CDNow, and People Magazine joined around shared nodes (Newton, NC; Tori Amos; North Carolina; USA), together with a box labelled "Bg KB". Other labels: US State, City, Country, Musician, Author, Music Album, Under The Pink, Crucify, Atlantic, EMI, 62 F, 8/22/63, instanceof, Located in, temperature, birthplace, publisher, Date Of Birth.]
26 [Figure: sub-graphs from Geo Almanac, Weather channel, CDNow, and People Magazine in which every source uses the same names for shared nodes (Newton, NC; Tori Amos; North Carolina; USA). Other labels: City, Musician, Author, Music Album, Under The Pink, Crucify, Atlantic, EMI, 62 F, 8/22/63, instanceof, Located in, temperature, birthplace, publisher, Date Of Birth.]
27 TAP Summary
- Focus is shifting from just storing and retrieving data to exchanging data. XML provides syntax. We need semantics
- We need an infrastructure layer for semantics
- Applications drive infrastructures. The driving application for this layer is semantics-based Search and News Augmentation.
28 What is an Internet Infrastructure Layer?
- There is a data structure, pieces of which are in different places on the net
- DNS: hash table of host names to IP addresses, accessed via GetHostByName
- WWW: directed graph of documents, accessed via HTTP GET/POST
- Infrastructure layer provides a set of standards and APIs to unify the different pieces so that a client can pretend it is all local
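The idea of "pieces in different places, one local-looking API" can be sketched as follows. This is a toy model under stated assumptions: the classes Piece and UnifiedClient are hypothetical, the pieces live in one process rather than on separate hosts, and the host names and addresses are documentation placeholders (example.org, example.net, and the 192.0.2.0/24 range), not real records.

```python
class Piece:
    """One fragment of the shared hash table, held at one location."""

    def __init__(self, table):
        self.table = table

    def lookup(self, key):
        return self.table.get(key)


class UnifiedClient:
    """Hypothetical facade: one lookup call that hides where each entry lives."""

    def __init__(self, pieces):
        self.pieces = pieces

    def lookup(self, key):
        # Ask each piece in turn; a real layer would route the request instead.
        for piece in self.pieces:
            value = piece.lookup(key)
            if value is not None:
                return value
        return None


# Placeholder host names and documentation-range addresses, not real records.
piece_one = Piece({"www.example.org": "192.0.2.10"})
piece_two = Piece({"www.example.net": "192.0.2.20"})

client = UnifiedClient([piece_one, piece_two])
address = client.lookup("www.example.net")
```

The calling code sees a single table. Which piece answered, and where that piece lives, is the infrastructure layer's concern rather than the client's.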
29 Application 2: RTA for news articles
30 RTA for News Articles
[Figure: components are the Search/Syndication Front End (input: a news article), Text analysis, Knowledge Base, Directory, and the sources AllPosters, MLB.com, EBay, and AOL Shopping. Labels on the arrows: "SportsTeam_TexasRangers, AthleteRodriguez_Alex"; "Whose team schedule? posters? auctions? bio?"; "Team Schedule for team whose title ..."; "Auctions for ..."; "Poster for ..."; "Videos for ...".]