RDF as a Lingua Franca: Key Architectural Strategies

1
RDF as a Lingua Franca: Key Architectural Strategies

David Booth, Ph.D.
Cleveland Clinic (contractor)
Semantic Technology Conference
15-June-2009
Latest version of these slides: http://dbooth.org/2009/stc/

2
About the speaker

Senior Software Architect, Cleveland Clinic's
SemanticDB project
Senior research architect, HP Software
W3C GRDDL standard
W3C Fellow 2002-2005
W3C Web Services Architecture document
W3C WSDL 2.0 standard
AT&T Bell Labs
Ph.D. Computer Science, UCLA

3
Outline

Part 1: The Problem
Babelization
SOA and RDF
Part 2: Architectural Strategies
RDF message semantics
GRDDL transformations from XML to RDF
REST-based SPARQL endpoints
Semantic Data Federation
Named graphs
Monotonicity
Part 3: Example: Cleveland Clinic SemanticDB

4
PART 1: The Problem
5
Problem 1: Babelization

Proliferation of data models (XML schemas, etc.)
Parsing issues influence data models
No consistent semantics
Data chaos

Tower of Babel, Abel Grimmer (1570-1619)
6
Problem 2: Integration complexity

Many data producers, many data consumers
Producers and consumers interact in complex ways
Tight coupling hampers independent versioning . . .

7
Problem 3: Client/service versioning

Need to version clients and services independently
Data models evolve
No such thing as "the" data model
There are several slightly different but related models

8
RDF and SOA

RDF can help
Bridge vocabularies / data formats
Looser data coupling
Consistent semantics across applications
SOA can help
Looser process coupling
How?

9
PART 2: Architectural Strategies
10
1. RDF message semantics

Interface contract can specify RDF, regardless of serialization
RDF pins the semantics

11
But Web services use XML!

XML is well known and used
Existing apps may require specific XML or other formats that cannot be changed
How can we gain the benefits of RDF message semantics while still accommodating XML?

12
Custom XML serializations of RDF

Recall: RDF is syntax independent
Specifies info model -- not syntax!
Can be serialized in any agreed-upon way
Therefore:
Can view existing XML formats as custom serializations of RDF!
How? GRDDL . . .

13
What is GRDDL?

"Gleaning Resource Descriptions from Dialects of
Languages"
W3C standard
Permits RDF to be "gleaned" from XML
XML document or schema specifies GRDDL transformation
GRDDL transformation produces RDF from XML document
Transformation is typically written in XSLT
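
As a rough illustration of "gleaning", the sketch below lifts triples out of a small XML document. It uses Python's standard library as a stand-in for an XSLT transformation, so it is not GRDDL itself. The XML format, the `http://example.org/` namespace, and the function `glean_triples` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy XML format; not taken from the slides.
PATIENT_XML = """
<patients>
  <patient id="123">
    <name>Pat Doe</name>
    <systolic>150</systolic>
  </patient>
</patients>
"""

EX = "http://example.org/"  # assumed namespace for illustration


def glean_triples(xml_text):
    """Return a set of (subject, predicate, object) triples gleaned from the XML."""
    root = ET.fromstring(xml_text)
    triples = set()
    for patient in root.findall("patient"):
        subject = EX + "patient/" + patient.get("id")
        for child in patient:
            # Each child element becomes one property of the patient.
            triples.add((subject, EX + child.tag, child.text.strip()))
    return triples


triples = glean_triples(PATIENT_XML)
for s, p, o in sorted(triples):
    print(s, p, o)
```

A legacy XML app can keep reading the same document unchanged, while an RDF app works from the gleaned triples.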

14
2. GRDDL transformations from XML to RDF

Therefore:
Same XML document can be consumed by:
Legacy XML app
RDF app
App interface contract can specify RDF
Serializations can vary
Semantics are pinned by RDF
Helps bridge XML and RDF worlds

15
Bridging XML and RDF
[Diagram: Client exchanges XML/other with Service; inside the Service: Normalize to RDF, Core App Processing, Serialize as XML/other/RDF]

Input: Accept whatever formats are required
Use GRDDL to transform XML to RDF
Output: Serialize to whatever formats are required
Generate XML/other directly (or even RDF!), or
SPARQL query can generate specific view first
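
The input/output pattern above can be sketched as a pair of functions: one normalizes any accepted input format to triples, the other serializes triples to whatever the client requests. This is only a minimal sketch: the message shapes, the `http://example.org/` namespace, and the function names are assumptions, and the N-Triples output does no literal escaping.

```python
import json
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # assumed namespace for illustration


def normalize(payload, content_type):
    """Normalize an incoming message to a set of triples, whatever its format."""
    if content_type == "application/xml":
        root = ET.fromstring(payload)
        subject = EX + root.get("id")
        return {(subject, EX + child.tag, child.text) for child in root}
    if content_type == "application/json":
        record = json.loads(payload)
        subject = EX + record["id"]
        return {(subject, EX + k, str(v)) for k, v in record.items() if k != "id"}
    raise ValueError("unsupported input format: " + content_type)


def serialize(triples, content_type):
    """Serialize triples to the format the client asked for."""
    if content_type == "application/n-triples":
        # Simplified: every object is written as a plain literal, unescaped.
        return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in sorted(triples))
    if content_type == "application/json":
        return json.dumps(sorted(triples))
    raise ValueError("unsupported output format: " + content_type)


from_xml = normalize('<patient id="p123"><status>admitted</status></patient>', "application/xml")
from_json = normalize('{"id": "p123", "status": "admitted"}', "application/json")
print(from_xml == from_json)
print(serialize(from_xml, "application/n-triples"))
```

Because both inputs normalize to the same triples, the core processing in the middle never needs to know which serialization arrived.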

16
3. REST-based SPARQL endpoints
[Diagram: Consumer and Producer connected over HTTP, exchanging SPARQL and RDF]

Why REST and why SPARQL?

17
What is REST?

REST: Representational State Transfer
Architectural style
Identified by Roy Fielding in PhD thesis
Based on uniform interface
HTTP GET, PUT, POST, DELETE

18
Why REST?

HTTP is ubiquitous
Simpler than SOAP-based Web services (WS)
Looser process coupling
Easier to change/version the process flow

19
What is SPARQL?

W3C standard
Query language for RDF
Modeled after SQL
SELECT ...
WHERE ...

20
Why SPARQL?

RDF gives looser data coupling
Insulates consumers from internal model changes
Inferencing can transform data to consumer's desired model
One endpoint supports multiple consumer needs
Each consumer gets what it wants
Simpler interface for consumers
Uniform SPARQL interface instead of a different set of parameters for each REST endpoint
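
The "one endpoint, many consumer needs" idea can be sketched with a tiny triple-pattern matcher. This is not a SPARQL engine, only a stand-in showing that one uniform query interface can serve different consumers. The data and the function `match` are hypothetical.

```python
# A tiny in-memory triple store; the data is made up for illustration.
TRIPLES = {
    ("Patient123", "highBloodPressure", "true"),
    ("Patient123", "age", "64"),
    ("Patient456", "highBloodPressure", "false"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None part of the pattern."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


# Consumer A wants everything about one patient.
about_123 = match(TRIPLES, s="Patient123")
# Consumer B wants only the patients flagged with high blood pressure.
flagged = {s for s, _, _ in match(TRIPLES, p="highBloodPressure", o="true")}
print(sorted(about_123))
print(sorted(flagged))
```

Both consumers call the same interface with different patterns, rather than each needing its own endpoint with its own parameters.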

21
4. Semantic Data Federation
[Diagram: Semantic Data Federation with adapters and a SPARQL interface, connecting nodes A1, A2, A3, B1, B2, C1, C2, X, and Z]

Get data from multiple sources
Provide data to consumers
Model transformation, caching, etc.
Conceptual component -- not necessarily a separate service

22
Key features of semantic data federation

REST-based SPARQL endpoint
Client gets just the data it wants
Support for a variety of data sources
E.g., SQL, SPARQL(!), etc.
Easy to add a new data source adapter, e.g., HTTP
Caching
Not multiple masters
Inferencing
Provides loose coupling at both data and process levels

23
Why inferencing?

Allows new data sources to be more readily connected to existing data
Allows new output vocabularies to be more readily supported in response to client needs
Easier versioning with both clients and data sources
Inferencing can help bridge across versions

24
Data source adapters
[Diagram: Semantic Data Federation with SPARQL interface and adapters]

Responsible for:
Mechanics of getting the data
Transforming from native format to RDF
May involve custom code or reusable tools
E.g., Gloze performs XML <--> RDF lift/drop
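
A minimal sketch of a custom-code adapter for a SQL source, using an in-memory SQLite database. The table, its columns, the `http://example.org/` namespace, and the function `sql_adapter` are hypothetical. The adapter handles both jobs listed above: fetching the data and turning each row into triples.

```python
import sqlite3

EX = "http://example.org/"  # assumed namespace for illustration


def sql_adapter(conn):
    """Fetch rows from a SQL source and return them as RDF-style triples."""
    cursor = conn.execute("SELECT id, name, ward FROM patient")
    columns = [d[0] for d in cursor.description]
    triples = set()
    for row in cursor:
        record = dict(zip(columns, row))
        subject = EX + "patient/" + str(record.pop("id"))
        for column, value in record.items():
            # Keep the source's own column names as predicates.
            triples.add((subject, EX + "patient#" + column, str(value)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER, name TEXT, ward TEXT)")
conn.execute("INSERT INTO patient VALUES (123, 'Pat Doe', 'Cardiology')")
adapter_triples = sql_adapter(conn)
print(sorted(adapter_triples))
```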

25
Add a new data source
[Diagram: a new Data Source connected through its own Adapter to the Semantic Data Federation, alongside an existing Adapter and the SPARQL interface]

Strategy:
Adapter transforms native format to corresponding RDF
Not directly to hub ontology!
Bridging rules transform to hub ontology
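
The two-step strategy can be sketched as follows: the adapter emits RDF in the source's own vocabulary, and separate bridging rules add the hub-ontology view. The `src:` and `hub:` names and the simple predicate-to-predicate rule format are assumptions. Real bridging rules could be considerably richer.

```python
# Hypothetical vocabularies: "src:" is the adapter's native RDF, "hub:" the hub ontology.
BRIDGING_RULES = {
    "src:sys_bp": "hub:systolicPressure",
    "src:dia_bp": "hub:diastolicPressure",
}


def apply_bridging_rules(triples, rules):
    """Add hub-ontology triples alongside the native ones (nothing is removed)."""
    inferred = {(s, rules[p], o) for s, p, o in triples if p in rules}
    return triples | inferred


native = {
    ("Patient123", "src:sys_bp", "150"),
    ("Patient123", "src:dia_bp", "96"),
    ("Patient123", "src:bed", "4B"),
}
bridged = apply_bridging_rules(native, BRIDGING_RULES)
print(sorted(bridged))
```

Keeping the rules separate from the adapter means a change to the hub ontology only touches the rules, not the adapter code. The same kind of mapping, pointed the other way, could bridge from the hub to a client's output vocabulary.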

26
Adding a new output vocabulary
[Diagram: Data Source connected through an Adapter to the Semantic Data Federation; Client queries through the SPARQL interface]

Strategy:
Bridging rules transform from hub ontologies to new output vocabulary
Client can query using desired vocabulary

27
5. Named graphs

Different queries require different subsets of data
Entire data may be too big to process all at once
So . . .
Sets of RDF data can be bundled as named graphs
Query strategy can pull in only the named graphs that are needed, i.e., a working set
Graphs can be freely merged
Contents can overlap
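
A minimal sketch of the working-set idea, modelling each named graph as a set of triples keyed by its name. The graph names and data are hypothetical. Merging graphs is set union, so overlapping contents are harmless.

```python
# Map each graph name to its set of triples; names and data are hypothetical.
NAMED_GRAPHS = {
    "graph:demographics": {("Patient123", "age", "64")},
    "graph:surgery-2007-08-23": {("Patient123", "procedure", "CABG")},
    "graph:bp-2007-08-23": {
        ("Patient123", "highBloodPressure", "true"),
        ("Patient123", "age", "64"),  # contents may overlap with other graphs
    },
}


def working_set(graphs, names):
    """Merge only the requested named graphs into one set of triples."""
    merged = set()
    for name in names:
        merged |= graphs[name]
    return merged


ws = working_set(NAMED_GRAPHS, ["graph:demographics", "graph:bp-2007-08-23"])
print(sorted(ws))
```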

28
Using named graphs for data subsets

Examples:
Specific longitudinal data across patients
Detailed data for each surgical event
Data on a particular group of patients

29
6. Monotonicity

Monotonicity: Old conclusions remain true when new facts are added
System design choice -- not automatic
Without monotonicity:
Data change invalidates everything downstream
System is more tightly coupled
Different components must be versioned in lock step
With monotonicity:
New data can be added freely
Easier versioning
More robust

30
Monotonicity is valuable, but not free!

Data models can be simpler without monotonicity
Engineering trade-off
Non-monotonic design:
Patient123 highBloodPressure true
Monotonic design:
Patient123 highBloodPressure true at 12:22PM 23-Aug-2007
Patient123 highBloodPressure false at 04:05PM 24-Aug-2007
How to get the best of both worlds?
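
The trade-off between the two designs can be sketched in a few lines. The data structures and the function names `ever_high` and `latest` are assumptions for illustration.

```python
from datetime import datetime

# Non-monotonic design: one slot per patient, overwritten on change.
current = {}
current["Patient123"] = True
current["Patient123"] = False  # the earlier fact is gone

# Monotonic design: facts are time-qualified and only ever appended.
facts = []
facts.append(("Patient123", "highBloodPressure", True, datetime(2007, 8, 23, 12, 22)))
facts.append(("Patient123", "highBloodPressure", False, datetime(2007, 8, 24, 16, 5)))


def ever_high(facts, patient):
    """A conclusion that stays true no matter what facts are added later."""
    return any(s == patient and p == "highBloodPressure" and v for s, p, v, _ in facts)


def latest(facts, patient):
    """Recover the simple 'current value' view from the time-qualified facts."""
    relevant = [f for f in facts if f[0] == patient and f[1] == "highBloodPressure"]
    return max(relevant, key=lambda f: f[3])[2]


print(ever_high(facts, "Patient123"), latest(facts, "Patient123"), current["Patient123"])
```

The monotonic store costs an extra field and a slightly more involved query, but nothing already asserted ever has to be retracted. Note that `latest` is itself a non-monotonic view, since a newer fact can change its answer.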

31
Distilling data to simplify queries

Detailed raw data can be distilled into simpler assertion sets
Easier for specific queries
Example raw data:
Patient123 BP 150/96 at 12:22PM 23-Aug-2007
Patient123 BP 155/97 at 06:32PM 23-Aug-2007
Patient123 BP 155/97 at 06:32PM 23-Aug-2007
Distilled for 23-Aug-2007:
Patient123 highBloodPressure true
Meaning: Patient123 had high blood pressure at some time
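
A minimal sketch of the distillation step, using the readings from the slide. The cut-off values and the function `distill` are assumptions for illustration only. The slides do not say how "high" is defined.

```python
# Raw readings from the slide: (patient, systolic, diastolic, date, time).
RAW = [
    ("Patient123", 150, 96, "23-Aug-2007", "12:22PM"),
    ("Patient123", 155, 97, "23-Aug-2007", "06:32PM"),
]

# Assumed cut-offs for illustration only; the slides do not define "high".
SYSTOLIC_LIMIT = 140
DIASTOLIC_LIMIT = 90


def distill(raw, date):
    """Reduce one day's raw readings to simple highBloodPressure assertions."""
    patients = {r[0] for r in raw if r[3] == date}
    distilled = set()
    for patient in patients:
        high = any(
            r[1] >= SYSTOLIC_LIMIT or r[2] >= DIASTOLIC_LIMIT
            for r in raw
            if r[0] == patient and r[3] == date
        )
        if high:
            # Only positive findings are asserted, so later readings cannot retract them.
            distilled.add((patient, "highBloodPressure", "true"))
    return distilled


distilled = distill(RAW, "23-Aug-2007")
print(distilled)
```

The distilled assertion is easier to query but loses the individual readings, which is why the raw data is worth keeping as well.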

32
Using named graphs for distilled data

Distilled data
Easier for specific queries
Less general than raw data
May involve information loss
Named graph can act as context
Semantics are qualified (or loosened)
E.g., named graph for 23-Aug-2007 indicates Patient123 had high blood pressure at some time
SPARQL update language (SPARUL) will make named graphs easy to create from queries
Raw data should also be kept (in separate named graphs)

33
Adding named graphs for distilled data
[Diagram: named graphs of distilled data alongside raw data]

Is obese
Had high blood pressure prior to admission
Has condition X

34
Abandoning unneeded named graphs
[Diagram: named graphs of distilled data alongside raw data]

Unneeded named graphs can be ignored
And eventually discarded

35
Summary of monotonicity strategy

Don't change data!
Create new named graphs instead
Use named graphs to compartmentalize data
But if you must change data:
Use named graphs to limit downstream impact
Only regenerate those that are affected
Retain both raw data and distilled data (in separate named graphs)
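
One way to "only regenerate those that are affected" is to record which raw graphs each distilled graph was derived from. The sketch below assumes such a record exists. The graph names and the function `affected_graphs` are hypothetical.

```python
# Which raw graphs each distilled graph was derived from (hypothetical names).
DERIVED_FROM = {
    "distilled:bp-2007-08-23": {"raw:bp-2007-08-23"},
    "distilled:bp-2007-08-24": {"raw:bp-2007-08-24"},
    "distilled:obesity": {"raw:weights", "raw:heights"},
}


def affected_graphs(derived_from, changed_raw):
    """Return the distilled graphs that must be regenerated after a raw-data change."""
    return {
        name for name, sources in derived_from.items()
        if sources & changed_raw
    }


to_regenerate = affected_graphs(DERIVED_FROM, {"raw:bp-2007-08-24"})
print(sorted(to_regenerate))
```

Everything outside the returned set stays valid, which keeps the downstream impact of a change contained.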

36
Summary of architectural strategies

RDF message semantics
GRDDL transformations from XML to RDF
REST-based SPARQL endpoints
Semantic Data Federation
Named graphs
Monotonicity

37
PART 3: Example: Cleveland Clinic SemanticDB
38
SemanticDB Project

Applies semantic web technology to:
Clinical research
Outcomes reporting
Quality reporting
Sponsored by Cleveland Clinic's Heart and Vascular Institute

39
Cleveland Clinic SemanticDB Project
[Diagram labels: User interfaces; Ontologies; Cyc natural language processing; Patient data entry; Natural language query; SPARQL interface; Structured query; Semantic wiki; Data-source adaptors; Instance data; Patient-centric systems; Patient registry; Genetic patient registry; Tagged literature, e.g., PUBMED; . . .]
40
More information

Cleveland Clinic SemanticDB project: http://www.w3.org/2001/sw/sweo/public/UseCases/ClevelandClinic/
RDF and SOA: http://dbooth.org/2007/rdf-and-soa/
SPARQL: http://jena.sourceforge.net/ARQ/Tutorial/
GRDDL: http://www.w3.org/TR/grddl-primer/

41
Questions?
