RDF as a Lingua Franca: Key Architectural Strategies - PowerPoint PPT Presentation

1
RDF as a Lingua Franca: Key Architectural Strategies
  • David Booth, Ph.D.
  • Cleveland Clinic (contractor)
  • Semantic Technology Conference
  • 15-June-2009
  • Latest version of these slides: http://dbooth.org/2009/stc/

2
About the speaker
  • Senior Software Architect, Cleveland Clinic's
    SemanticDB project
  • Senior research architect, HP Software
  • W3C GRDDL standard
  • W3C Fellow 2002-2005
  • W3C Web Services Architecture document
  • W3C WSDL 2.0 standard
  • AT&T Bell Labs
  • Ph.D. Computer Science, UCLA

3
Outline
  • Part 1: The Problem
    • Babelization
    • SOA and RDF
  • Part 2: Architectural Strategies
    • RDF message semantics
    • GRDDL transformations from XML to RDF
    • REST-based SPARQL endpoints
    • Semantic Data Federation
    • Named graphs
    • Monotonicity
  • Part 3: Example: Cleveland Clinic SemanticDB

4
PART 1: The Problem

5
Problem 1: Babelization
  • Proliferation of data models (XML schemas, etc.)
  • Parsing issues influence data models
  • No consistent semantics
  • Data chaos

Image: Tower of Babel, Abel Grimmer (1570-1619)

6
Problem 2: Integration complexity
  • Many data producers, many data consumers
  • Producers and consumers interact in complex ways
  • Tight coupling hampers independent versioning . . .

7
Problem 3: Client/service versioning
  • Need to version clients and services
    independently
  • Data models evolve
    • No such thing as the data model
    • There are several, slightly different but related
      models

8
RDF and SOA
  • RDF can help
    • Bridge vocabularies / data formats
    • Looser data coupling
    • Consistent semantics across applications
  • SOA can help
    • Looser process coupling
  • How?

9
PART 2: Architectural Strategies

10
1. RDF message semantics
  • Interface contract can specify RDF, regardless of
    serialization
  • RDF pins the semantics

11
But Web services use XML!
  • XML is well known and used
  • Existing apps may require specific XML or other
    formats that cannot be changed
  • How can we gain the benefits of RDF message
    semantics while still accommodating XML?

12
Custom XML serializations of RDF
  • Recall: RDF is syntax independent
    • Specifies info model -- not syntax!
    • Can be serialized in any agreed-upon way
  • Therefore
    • Can view existing XML formats as custom
      serializations of RDF!
  • How? GRDDL . . .

13
What is GRDDL?
  • "Gleaning Resource Descriptions from Dialects of
    Languages"
  • W3C standard
  • Permits RDF to be "gleaned" from XML
  • XML document or schema specifies GRDDL
    transformation
  • GRDDL transformation produces RDF from XML
    document
  • Transformation is typically written in XSLT
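As the slide notes, GRDDL transformations are normally written in XSLT. The Python below is only a stand-in that performs the same kind of gleaning, so the idea can be run without an XSLT processor. The `<patients>` dialect, the `EX` namespace and the `glean` function are hypothetical.

```python
# Stand-in for a GRDDL transformation: real GRDDL transforms are usually XSLT.
# The <patients> dialect and the EX namespace are hypothetical.
import xml.etree.ElementTree as ET

EX = "http://example.org/patient#"  # hypothetical vocabulary namespace


def glean(xml_text):
    """Glean a set of (subject, predicate, object) triples from the XML."""
    root = ET.fromstring(xml_text)
    triples = set()
    for patient in root.iter("patient"):
        subject = EX + patient.get("id")
        # Each child element becomes one property of the patient resource.
        for child in patient:
            triples.add((subject, EX + child.tag, (child.text or "").strip()))
    return triples


# The same document a legacy XML application already consumes, unchanged.
legacy_xml = """<patients>
  <patient id="Patient123">
    <name>Jane Doe</name>
    <bloodType>A+</bloodType>
  </patient>
</patients>"""

for triple in sorted(glean(legacy_xml)):
    print(triple)
```

The legacy XML application keeps reading `legacy_xml` as before; only the RDF consumer runs the transformation.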

14
2. GRDDL transformations from XML to RDF
  • Therefore
    • Same XML document can be consumed by
      • Legacy XML app
      • RDF app
    • App interface contract can specify RDF
      • Serializations can vary
      • Semantics are pinned by RDF
  • Helps bridge XML and RDF worlds

15
Bridging XML and RDF
[Diagram: Client and Service exchange XML/other; inside the Service: Normalize to RDF → Core App Processing → Serialize as XML/other/RDF]
  • Input: Accept whatever formats are required
    • Use GRDDL to transform XML to RDF
  • Output: Serialize to whatever formats are
    required
    • Generate XML/other directly (or even RDF!), or
    • SPARQL query can generate specific view first
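A minimal sketch of the output side of this pattern, assuming the core processing already holds its data as a set of RDF triples. The `IRI` marker class, `to_ntriples`, `to_legacy_xml` and the `<records>`/`<fact>` element names are hypothetical; N-Triples is used for RDF clients because it is a simple line-based W3C serialization.

```python
# Output side of "normalize to RDF, process, serialize": the same triples are
# written as N-Triples for RDF clients or as a simple XML view for legacy
# clients. IRI, to_ntriples, to_legacy_xml and the XML names are hypothetical.
import xml.etree.ElementTree as ET


class IRI(str):
    """Marks a string as an IRI rather than a plain literal."""


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    def term(t):
        if isinstance(t, IRI):
            return f"<{t}>"
        escaped = (t.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in sorted(triples))


def to_legacy_xml(triples):
    """Serialize the same triples into a flat, hypothetical XML format."""
    root = ET.Element("records")
    for s, p, o in sorted(triples):
        ET.SubElement(root, "fact", subject=s, property=p).text = o
    return ET.tostring(root, encoding="unicode")


triples = {(IRI("http://example.org/Patient123"), IRI("http://example.org/name"), "Jane Doe")}
print(to_ntriples(triples), end="")
print(to_legacy_xml(triples))
```

Keeping one internal RDF model and choosing the serializer at the edge is what lets the interface contract stay in RDF while individual clients still receive the format they require.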

16
3. REST-based SPARQL endpoints
[Diagram: Consumer and Producer communicating over HTTP using SPARQL and RDF]
  • Why REST and why SPARQL?

17
What is REST?
  • REST: Representational State Transfer
  • Architectural style
    • Identified by Roy Fielding in his PhD thesis
  • Based on a uniform interface
    • HTTP GET, PUT, POST, DELETE

18
Why REST?
  • HTTP is ubiquitous
  • Simpler than SOAP-based Web services (WS)
  • Looser process coupling
    • Easier to change/version the process flow

19
What is SPARQL?
  • W3C standard
  • Query language for RDF
  • Modeled after SQL
    • SELECT ...
    • WHERE ...
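The SPARQL Protocol lets a client send a query over plain HTTP GET by URL-encoding it in a `query` parameter, which is what makes a SPARQL endpoint fit naturally behind a REST-style interface. The sketch below only builds the request; the endpoint URL and the `ex:` vocabulary are hypothetical.

```python
# Build an HTTP GET request for a SPARQL query, following the SPARQL Protocol
# convention of passing the URL-encoded query in a "query" parameter.
# The endpoint URL and the ex: vocabulary are hypothetical.
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint

QUERY = """PREFIX ex: <http://example.org/patient#>
SELECT ?patient ?name
WHERE {
  ?patient ex:name ?name .
}"""


def sparql_get_request(endpoint, query):
    """Return an unsent GET request asking for SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(ENDPOINT, QUERY)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a live endpoint.
```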

20
Why SPARQL?
  • RDF gives looser data coupling
    • Insulates consumers from internal model changes
    • Inferencing can transform data to consumer's
      desired model
  • One endpoint supports multiple consumer needs
    • Each consumer gets what it wants
  • Simpler interface for consumers
    • Uniform SPARQL interface instead of a different
      set of parameters for each REST endpoint

21
4. Semantic Data Federation
[Diagram: components A1, A2, A3, B1, B2, C1, C2, X and Z connected through Adapters and a SPARQL interface to the Semantic Data Federation]
  • Get data from multiple sources
  • Provide data to consumers
  • Model transformation, caching, etc.
  • Conceptual component -- not necessarily a
    separate service

22
Key features of semantic data federation
  • REST-based SPARQL endpoint
    • Client gets just the data it wants
  • Support for a variety of data sources
    • E.g., SQL, SPARQL(!), etc.
    • Easy to add a new data source adapter, e.g., HTTP
  • Caching
    • Not multiple masters
  • Inferencing
  • Provides loose coupling at both data and process
    levels
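To make the federation idea concrete, here is a toy in-memory sketch, assuming each adapter is just a Python function that returns triples. The `Federation` class and its methods are hypothetical, triple-pattern matching stands in for a real SPARQL engine, and the cache is a read-only copy of each source, so the sources remain the only masters.

```python
# Toy semantic data federation: adapters supply triples, the federation caches
# them and answers simple triple-pattern queries. Federation and its methods
# are hypothetical; pattern matching stands in for a real SPARQL engine.
from typing import Callable, Dict, FrozenSet, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class Federation:
    def __init__(self) -> None:
        self.adapters: Dict[str, Callable[[], Set[Triple]]] = {}
        self._cache: Dict[str, FrozenSet[Triple]] = {}
        self.fetch_count = 0  # number of times an adapter was actually called

    def add_adapter(self, name: str, fetch: Callable[[], Set[Triple]]) -> None:
        self.adapters[name] = fetch

    def _load(self, name: str) -> FrozenSet[Triple]:
        # Read-only cached copy; the data source itself stays the only master.
        if name not in self._cache:
            self.fetch_count += 1
            self._cache[name] = frozenset(self.adapters[name]())
        return self._cache[name]

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Set[Triple]:
        """Return matching triples from every source; None acts as a wildcard."""
        pattern = (s, p, o)
        return {t for name in self.adapters for t in self._load(name)
                if all(want is None or want == got for want, got in zip(pattern, t))}


fed = Federation()
fed.add_adapter("registry", lambda: {("ex:p1", "ex:name", "Ann")})
fed.add_adapter("lab", lambda: {("ex:p1", "ex:bp", "150/96"), ("ex:p2", "ex:name", "Bob")})
names = fed.match(p="ex:name")
fed.match(p="ex:name")  # served from the cache; adapters are not called again
print(sorted(names), fed.fetch_count)
```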

23
Why inferencing?
  • Allows new data sources to be more readily
    connected to existing data
  • Allows new output vocabularies to be more readily
    supported in response to client needs
  • Easier versioning with both clients and data
    sources
    • Inferencing can help bridge across versions
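Inferencing can be sketched as forward chaining: apply rules until no new triples appear. The rules below are two standard RDFS entailment rules (rdfs11, subClassOf transitivity, and rdfs9, type inheritance). The `v1:`, `v2:` and `hub:` class names are hypothetical, chosen to show data described with a newer class still answering queries phrased against an older one.

```python
# Forward-chaining inference to a fixpoint, using two standard RDFS rules.
# The v1:/v2:/hub: class names are hypothetical.
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def subclass_rules(graph):
    """rdfs11: subClassOf is transitive; rdfs9: instances inherit superclasses."""
    sub = {(a, b) for a, p, b in graph if p == SUBCLASS}
    new = {(a, SUBCLASS, d) for a, b in sub for c, d in sub if b == c}
    new |= {(x, TYPE, d) for x, p, c in graph if p == TYPE
            for c2, d in sub if c == c2}
    return new


def infer(graph, rules):
    """Apply every rule repeatedly until no new triples appear."""
    graph = set(graph)
    while True:
        new = set()
        for rule in rules:
            new |= rule(graph)
        new -= graph
        if not new:
            return graph
        graph |= new


data = {
    ("ex:Patient123", TYPE, "v2:CardiacPatient"),   # data from a newer source
    ("v2:CardiacPatient", SUBCLASS, "v1:Patient"),  # bridge between versions
    ("v1:Patient", SUBCLASS, "hub:Person"),
}
closed = infer(data, [subclass_rules])
print(("ex:Patient123", TYPE, "v1:Patient") in closed)
```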

24
Data source adapters
[Diagram: Adapters connecting data sources to the Semantic Data Federation, which exposes a SPARQL interface]
  • Responsible for
    • Mechanics of getting the data
    • Transforming from native format to RDF
  • May involve custom code or reusable tools
    • E.g., Gloze performs XML<-->RDF lift/drop
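A sketch of an SQL data source adapter using Python's built-in `sqlite3`, assuming a hypothetical `patient` table. It handles the mechanics of getting the data and lifts each row into triples in a vocabulary that mirrors the source (`SRC`, also hypothetical), not the hub ontology.

```python
# SQL data source adapter: fetch rows, then lift them into RDF triples that
# mirror the source's own structure (not the hub ontology). The patient table
# and the SRC namespace are hypothetical.
import sqlite3

SRC = "http://example.org/registry#"  # hypothetical source vocabulary


def sql_adapter(conn):
    """Return one triple per column value of every row in the patient table."""
    triples = set()
    for pid, bp_sys, bp_dia in conn.execute("SELECT id, bp_sys, bp_dia FROM patient"):
        subject = f"{SRC}patient/{pid}"
        # Property names simply follow the column names.
        triples.add((subject, SRC + "bp_sys", str(bp_sys)))
        triples.add((subject, SRC + "bp_dia", str(bp_dia)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id TEXT, bp_sys INTEGER, bp_dia INTEGER)")
conn.execute("INSERT INTO patient VALUES ('123', 150, 96)")
for triple in sorted(sql_adapter(conn)):
    print(triple)
```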

25
Add a new data source
[Diagram: a new DataSource attached through its own Adapter to the Semantic Data Federation (SPARQL interface), alongside an existing Adapter]
  • Strategy
    • Adapter transforms native format to corresponding
      RDF
      • Not directly to hub ontology!
    • Bridging rules transform to hub ontology
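The "adapter to native RDF, then bridging rules" strategy can be sketched with two hypothetical sources that describe blood pressure differently. Each bridging rule maps one source vocabulary onto a hypothetical hub vocabulary (`HUB`). The adapters' own triples are kept, so a mapping problem can be fixed by changing a rule rather than an adapter.

```python
# Bridging rules from two source vocabularies to a hub ontology. All three
# namespaces are hypothetical.
HUB = "http://example.org/hub#"
SRC_A = "http://example.org/registry#"  # stores systolic/diastolic separately
SRC_B = "http://example.org/icu#"       # stores one "systolic/diastolic" string


def bridge_registry(graph):
    renames = {SRC_A + "bp_sys": HUB + "systolic", SRC_A + "bp_dia": HUB + "diastolic"}
    return {(s, renames[p], o) for s, p, o in graph if p in renames}


def bridge_icu(graph):
    out = set()
    for s, p, o in graph:
        if p == SRC_B + "bloodPressure":  # value such as "155/97"
            systolic, diastolic = o.split("/")
            out.add((s, HUB + "systolic", systolic))
            out.add((s, HUB + "diastolic", diastolic))
    return out


def to_hub(graph, rules):
    """Keep the source triples and add whatever the bridging rules derive."""
    return set(graph).union(*(rule(graph) for rule in rules))


registry = {("ex:p1", SRC_A + "bp_sys", "150"), ("ex:p1", SRC_A + "bp_dia", "96")}
icu = {("ex:p2", SRC_B + "bloodPressure", "155/97")}
merged = to_hub(registry | icu, [bridge_registry, bridge_icu])
print(sorted(t for t in merged if t[1].startswith(HUB)))
```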

26
Adding a new output vocabulary
[Diagram: Client querying the Semantic Data Federation through SPARQL; DataSource connected through an Adapter]
  • Strategy
    • Bridging rules transform from hub ontologies to
      new output vocabulary
    • Client can query using desired vocabulary
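The same mechanism works in the other direction. In this sketch (all names hypothetical), a bridging rule maps the hub's blood pressure properties into a client's own vocabulary, converting the values to the integers that client expects, and the client then queries using only its own terms.

```python
# Bridging rule from the hub ontology to a client's own output vocabulary.
# HUB, OUT and the property names are hypothetical.
HUB = "http://example.org/hub#"
OUT = "http://example.org/quality-report#"

OUTPUT_RULES = {
    HUB + "systolic": OUT + "systolicMmHg",
    HUB + "diastolic": OUT + "diastolicMmHg",
}


def to_output_vocabulary(hub_graph):
    """Rename hub properties and convert values to integers for the client."""
    return {(s, OUTPUT_RULES[p], int(o)) for s, p, o in hub_graph if p in OUTPUT_RULES}


def query(graph, predicate):
    """Client-side lookup phrased entirely in the client's vocabulary."""
    return sorted((s, o) for s, p, o in graph if p == predicate)


hub = {("ex:p1", HUB + "systolic", "150"), ("ex:p1", HUB + "diastolic", "96")}
print(query(to_output_vocabulary(hub), OUT + "systolicMmHg"))
```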

27
5. Named graphs
  • Different queries require different subsets of
    data
  • Entire data may be too big to process all at once
  • So . . .
  • Sets of RDF data can be bundled as named graphs
  • Query strategy can pull in only the named graphs
    that are needed, i.e., a working set
  • Graphs can be freely merged
  • Contents can overlap
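One simple way to picture a dataset of named graphs is a mapping from graph name to a set of triples, where a working set is the union of just the graphs a query needs. The graph names and data below are hypothetical. Set union is a faithful RDF merge only for triples without blank nodes; with blank nodes, a real merge first renames them apart.

```python
# A dataset of named graphs as a dict from graph name to a set of triples.
# Graph names and data are hypothetical.
dataset = {
    "urn:graph:demographics": {("ex:p1", "ex:name", "Ann"), ("ex:p2", "ex:name", "Bob")},
    "urn:graph:surgery/p1": {("ex:p1", "ex:procedure", "valve repair"),
                             ("ex:p1", "ex:name", "Ann")},  # overlaps demographics
    "urn:graph:labs/2007-08": {("ex:p2", "ex:ldl", "160")},
}


def working_set(dataset, names):
    """Merge only the named graphs a query needs. Set union is a correct RDF
    merge here because these triples contain no blank nodes."""
    merged = set()
    for name in names:
        merged |= dataset[name]
    return merged


ws = working_set(dataset, ["urn:graph:demographics", "urn:graph:surgery/p1"])
print(len(ws))  # the overlapping ("ex:p1", "ex:name", "Ann") triple appears once
```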

28
Using named graphs for data subsets
  • Examples
    • Specific longitudinal data across patients
    • Detailed data for each surgical event
    • Data on a particular group of patients

29
6. Monotonicity
  • Monotonicity: old conclusions remain true when
    new facts are added
  • A system design choice, not automatic
  • Without monotonicity
    • Data change invalidates everything downstream
    • System is more tightly coupled
    • Different components must be versioned in lock
      step
  • With monotonicity
    • New data can be added freely
    • Easier versioning
    • More robust

30
Monotonicity is valuable, but not free!
  • Data models can be simpler without monotonicity
    • Engineering trade-off
  • Non-monotonic design
    • Patient123 highBloodPressure true
  • Monotonic design
    • Patient123 highBloodPressure true at 12:22 PM
      23-Aug-2007
    • Patient123 highBloodPressure false at 04:05 PM
      24-Aug-2007
  • How to get the best of both worlds?
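The difference between the two designs shows up in what a query can safely conclude. In this sketch the timestamped statements are plain Python tuples (in RDF they would typically be modeled as observation resources with their own properties), and the function names are hypothetical.

```python
# Monotonic vs non-monotonic conclusions over timestamped statements.
# Plain tuples stand in for RDF observation resources; names are hypothetical.
from datetime import datetime

observations = [
    ("Patient123", "highBloodPressure", True, datetime(2007, 8, 23, 12, 22)),
]


def ever_high(obs, patient):
    """Monotonic: once True, adding observations can never make it False."""
    return any(p == patient and prop == "highBloodPressure" and value
               for p, prop, value, _ in obs)


def latest_value(obs, patient):
    """Non-monotonic: a newer observation can overturn the answer."""
    relevant = [(when, value) for p, prop, value, when in obs
                if p == patient and prop == "highBloodPressure"]
    return max(relevant)[1] if relevant else None


before = (ever_high(observations, "Patient123"), latest_value(observations, "Patient123"))
# New fact added; nothing already stored is edited or deleted.
observations.append(("Patient123", "highBloodPressure", False, datetime(2007, 8, 24, 16, 5)))
after = (ever_high(observations, "Patient123"), latest_value(observations, "Patient123"))
print(before, after)
```

Adding the 24-Aug statement leaves the monotonic conclusion intact, while the "latest value" answer flips, which is exactly the downstream invalidation the previous slide warns about.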

31
Distilling data to simplify queries
  • Detailed raw data can be distilled into simpler
    assertion sets
    • Easier for specific queries
  • Example raw data
    • Patient123 BP 150/96 at 12:22 PM 23-Aug-2007
    • Patient123 BP 155/97 at 06:32 PM 23-Aug-2007
  • Distilled for 23-Aug-2007
    • Patient123 highBloodPressure true
    • Meaning: Patient123 had high blood pressure at
      some time
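Distillation can be sketched as a function from raw readings to a small set of triples stored in a named graph for that day. The slides do not define "high"; the limits below (systolic at least 140 or diastolic at least 90 mmHg) are an assumption for illustration, as are the graph name and function names.

```python
# Distill raw readings for one day into a small named graph.
# ASSUMPTION: the threshold below is illustrative; the slides do not define one.
from datetime import date, datetime

SYSTOLIC_LIMIT, DIASTOLIC_LIMIT = 140, 90  # mmHg, assumed

raw = [
    ("Patient123", 150, 96, datetime(2007, 8, 23, 12, 22)),
    ("Patient123", 155, 97, datetime(2007, 8, 23, 18, 32)),
]


def distill(readings, day):
    """Return distilled triples for `day`; times and exact values are dropped."""
    graph = set()
    for patient, systolic, diastolic, when in readings:
        if when.date() == day and (systolic >= SYSTOLIC_LIMIT or diastolic >= DIASTOLIC_LIMIT):
            graph.add((patient, "highBloodPressure", "true"))
    return graph


# The graph name carries the context "at some time on 23-Aug-2007";
# the raw readings are kept unchanged alongside it.
distilled = {"urn:graph:distilled/2007-08-23": distill(raw, date(2007, 8, 23))}
print(distilled["urn:graph:distilled/2007-08-23"])
```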

32
Using named graphs for distilled data
  • Distilled data
    • Easier for specific queries
    • Less general than raw data
    • May involve information loss
  • Named graph can act as context
    • Semantics are qualified (or loosened)
    • E.g., named graph for 23-Aug-2007 indicates
      Patient123 had high blood pressure at some time
  • SPARQL update language (SPARUL) will make named
    graphs easy to create from queries
  • Raw data should also be kept (in separate named
    graphs)

33
Adding named graphs for distilled data
[Diagram: named graphs of distilled data derived from raw data, e.g., "Is obese", "Had high blood pressure prior to admission", "Has condition X"]

34
Abandoning unneeded named graphs
[Diagram: named graphs of distilled data and raw data]
  • Unneeded named graphs can be ignored
    • And eventually discarded

35
Summary of monotonicity strategy
  • Don't change data!
    • Create new named graphs instead
    • Use named graphs to compartmentalize data
  • But if you must change data
    • Use named graphs to limit downstream impact
    • Only regenerate those that are affected
  • Retain both raw data and distilled data (in
    separate named graphs)
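If data must change, the "only regenerate those that are affected" step needs a record of which named graphs were derived from which. A minimal sketch, assuming each derived graph is produced by a Python function of its source graphs and that derivations are acyclic; `GraphStore`, its methods, the graph names and the thresholds are hypothetical.

```python
# Regenerate only the derived named graphs affected by a change.
# GraphStore, graph names and thresholds are hypothetical; derivations must be acyclic.
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


class GraphStore:
    def __init__(self) -> None:
        self.graphs: Dict[str, Set[Triple]] = {}
        # derived graph name -> (names of its source graphs, function that builds it)
        self.derivations: Dict[str, Tuple[List[str], Callable[..., Set[Triple]]]] = {}
        self.rebuilt: List[str] = []  # log of regenerated graphs

    def derive(self, name: str, sources: List[str], build: Callable[..., Set[Triple]]) -> None:
        self.derivations[name] = (sources, build)
        self._rebuild(name)

    def _rebuild(self, name: str) -> None:
        sources, build = self.derivations[name]
        self.graphs[name] = build(*(self.graphs[s] for s in sources))
        self.rebuilt.append(name)

    def _propagate(self, changed: str) -> None:
        # Rebuild every graph derived from `changed`, then whatever depends on those.
        for name, (sources, _) in list(self.derivations.items()):
            if changed in sources:
                self._rebuild(name)
                self._propagate(name)

    def replace(self, name: str, triples: Set[Triple]) -> None:
        """Last resort when data must change: swap a graph, then propagate."""
        self.graphs[name] = set(triples)
        self._propagate(name)


store = GraphStore()
store.graphs["raw/bp"] = {("Patient123", "bp", "150/96")}
store.graphs["raw/bmi"] = {("Patient123", "bmi", "31")}
# Assumed thresholds: systolic >= 140 mmHg, BMI >= 30.
store.derive("distilled/high-bp", ["raw/bp"],
             lambda g: {(s, "highBloodPressure", "true") for s, p, o in g
                        if p == "bp" and int(o.split("/")[0]) >= 140})
store.derive("distilled/obese", ["raw/bmi"],
             lambda g: {(s, "isObese", "true") for s, p, o in g
                        if p == "bmi" and float(o) >= 30})
store.rebuilt.clear()
store.replace("raw/bp", {("Patient123", "bp", "120/80")})  # corrected reading
print(store.rebuilt)
```

Only `distilled/high-bp` is regenerated; the obesity graph, which does not depend on the changed raw graph, is left untouched.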

36
Summary of architectural strategies
  • RDF message semantics
  • GRDDL transformations from XML to RDF
  • REST-based SPARQL endpoints
  • Semantic Data Federation
  • Named graphs
  • Monotonicity

37
PART 3: Example: Cleveland Clinic SemanticDB

38
SemanticDB Project
  • Applies semantic web technology to
    • Clinical research
    • Outcomes reporting
    • Quality reporting
  • Sponsored by Cleveland Clinic's Heart and
    Vascular Institute

39
Cleveland Clinic SemanticDB Project
[Architecture diagram. Components: User interfaces; Patient data entry; Natural language query; Cyc natural language processing; Structured query; Semantic wiki; SPARQL interface; Ontologies; Instance data; Data-source adaptors; Patient-centric systems; Patient registry; Genetic patient registry; Tagged literature, e.g., PUBMED; . . .]

40
More information
  • Cleveland Clinic SemanticDB project:
    http://www.w3.org/2001/sw/sweo/public/UseCases/ClevelandClinic/
  • RDF and SOA: http://dbooth.org/2007/rdf-and-soa/
  • SPARQL: http://jena.sourceforge.net/ARQ/Tutorial/
  • GRDDL: http://www.w3.org/TR/grddl-primer/

41
Questions?