Title: The ACGT Data Access Infrastructure
1. The ACGT Data Access Infrastructure
- Luis Martín (lmartin_at_infomed.dia.fi.upm.es)
- HealthInf 08
- 30/01/2008
2. The Data Access Infrastructure Aims At
- Providing homogeneous, seamless access to heterogeneous (in terms of syntax and semantics) sources of information.
- Providing querying services to both end users and data analysis tools.
3. Main Resources
- Within the framework of the Data Access Infrastructure in ACGT, several tools are being developed, namely:
- The ACGT Semantic Mediator
- The ACGT Master Ontology on Cancer
- The ACGT Data Access Services
4. Data Access Infrastructure within ACGT
5. Data Access Architecture
6. The ACGT Master Ontology
- The ACGT Master Ontology on Cancer aims at
- Enhancing cancer management in Europe by enabling semantic interoperability.
- Meeting all necessary preconditions of the project infrastructure.
- Creating an ontology that is both philosophically and technically valid and sound.
7. Development Procedure
- A continuous, iterative development process that includes domain experts via face-to-face meetings, online teleconferences and e-mail discussions.
- At all times, feedback is highly encouraged and integrated into the development.
(Diagram: ontology developers, clinicians, researchers)
8. Introduction
- The following examples are taken from the clinical trial forms of the TOP trial on breast cancer.
- Another source is the forms from the clinical trials on nephroblastoma done by … and …
10. Ontology as Black Box
- The ontology has a highly complex internal structure that should not be exposed to the actual end user.
- End users access the ontology only via specialized tools:
- Ontology Viewer
- Mapping Tool
- Querying Tool
11. The ACGT Semantic Mediator
- The ACGT Semantic Mediator aims at
- Providing access to integrated repositories of semantically heterogeneous databases.
- Offering users a friendly interface to query these data.
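12. Scientific Foundations of the Semantic Data Integration Approach (I)
- Query Translation vs. Data Warehouses
- Given the nature of the data in ACGT, a query-translation-based approach was selected: data stay in their source repositories and are queried on demand, instead of being copied into a central warehouse.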
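13. Scientific Foundations of the Semantic Data Integration Approach (II)
- Global as View (GaV) vs. Local as View (LaV)
- A LaV-based approach has been selected, with the Master Ontology acting as the global schema: each source is described as a view over the global schema, rather than the global schema being defined as views over the sources (as in GaV).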
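Under LaV, each source is described by what it can provide in terms of the global schema, and the mediator first has to find which sources are relevant to each part of a query. The following is a minimal sketch of that first step only (a simplified "bucket" step of LaV query rewriting, not the ACGT mediator's actual algorithm); the source names and global-schema terms are hypothetical stand-ins for Master Ontology concepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceView:
    """LaV description: a source defined as a view over global-schema terms.

    Names and terms are hypothetical placeholders for Master Ontology concepts.
    """
    name: str
    covers: frozenset  # global-schema properties this source's view can provide


def build_buckets(query_terms, views):
    """For each term of the global query, list the sources whose view covers it.

    Only the first step of bucket-style LaV rewriting; a full algorithm would
    also check join variables and query containment.
    """
    return {term: [v.name for v in views if term in v.covers] for term in query_terms}


def unanswerable(buckets):
    """Terms that no source can provide, so the query cannot be fully rewritten."""
    return [term for term, sources in buckets.items() if not sources]


views = [
    SourceView("siop_db", frozenset({"Patient.trialNumber", "Patient.birthDate"})),
    SourceView("dicom_pacs", frozenset({"Patient.trialNumber", "Study.uid", "Series.uid"})),
]

buckets = build_buckets(["Patient.trialNumber", "Study.uid", "Tumour.stage"], views)
# buckets == {'Patient.trialNumber': ['siop_db', 'dicom_pacs'],
#             'Study.uid': ['dicom_pacs'], 'Tumour.stage': []}
missing = unanswerable(buckets)  # ['Tumour.stage']
```

A practical consequence of this design is that adding a new source means adding one more view description; neither the global schema nor the existing source descriptions change.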
14. The ACGT Semantic Mediation Process
- Data integration using the mediator:
- A query is formulated using the interface (the query is based on the ACGT Master Ontology).
- The query is split, and separate queries for the underlying databases are generated (via the mapping filter).
- The queries are executed on the databases (through the corresponding Data Access Services).
- The results are returned and integrated (using the selected format).
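The split / execute / integrate steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ACGT implementation: the mapping table, the two in-memory "services" standing in for Data Access Services, and the shared `key` field used to join rows are all hypothetical.

```python
from typing import Callable

# Hypothetical mapping: global-schema property -> (source name, local field).
MAPPING = {
    "Patient.trialNumber": ("siop_db", "siopnr"),
    "Patient.name": ("dicom_pacs", "PatientName"),
}


# Stand-ins for Data Access Services. Each returns rows carrying a shared
# patient "key" (an assumption that makes result integration possible).
def siop_service(fields):
    rows = [{"key": "p1", "siopnr": "T-001"}, {"key": "p2", "siopnr": "T-002"}]
    return [{f: r[f] for f in ["key", *fields]} for r in rows]


def dicom_service(fields):
    rows = [{"key": "p1", "PatientName": "Doe, Jane"}]
    return [{f: r[f] for f in ["key", *fields]} for r in rows]


SERVICES: dict[str, Callable] = {"siop_db": siop_service, "dicom_pacs": dicom_service}


def split_query(properties):
    """Split step: group requested global properties by the source that maps them."""
    per_source = {}
    for prop in properties:
        source, field = MAPPING[prop]
        per_source.setdefault(source, []).append((prop, field))
    return per_source


def mediate(properties):
    """Run one sub-query per source, then merge rows that share a key."""
    merged = {}
    for source, pairs in split_query(properties).items():
        rows = SERVICES[source]([field for _, field in pairs])
        for row in rows:
            record = merged.setdefault(row["key"], {})
            for prop, field in pairs:
                record[prop] = row[field]  # rename local field back to the global property
    return merged


result = mediate(["Patient.trialNumber", "Patient.name"])
```

Here `result` maps `p1` to both properties and `p2` only to its trial number, because the DICOM stand-in has no row for `p2`; the merge behaves like an outer join on the shared key.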
15. The ACGT Semantic Mediator
- Different components address different aspects of the same problem:
- Query Formulation Interface: helps end users formulate queries
- Master Ontology: acts as the global schema
- Mediation Layer: resolves the query translation problem
- OntoQueryClean: deals with query identifier heterogeneities
- OntoDataClean: addresses instance-level heterogeneities
- Mapping API and GUI: aid in the process of creating virtual views
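Instance-level heterogeneities occur when sources encode the same value differently (codes, abbreviations, capitalization). The following minimal sketch normalizes values to a canonical form before integration. The property name and mapping table are hypothetical and do not describe OntoDataClean's actual rules.

```python
# Hypothetical canonical vocabulary for one global-schema property.
CANONICAL = {
    "Patient.sex": {
        "m": "male", "male": "male", "1": "male",
        "f": "female", "female": "female", "2": "female",
    },
}


def normalize_value(prop, value):
    """Map a source-specific value to its canonical form.

    Properties without a table, and values not found in the table,
    are returned unchanged.
    """
    table = CANONICAL.get(prop)
    if table is None:
        return value
    return table.get(str(value).strip().lower(), value)


def normalize_record(record):
    """Normalize every property of one result record."""
    return {prop: normalize_value(prop, v) for prop, v in record.items()}


# Two sources encode the same fact differently; both normalize to "female".
from_source_a = normalize_record({"Patient.sex": "F"})
from_source_b = normalize_record({"Patient.sex": 2})
```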
16. Mediator SIOP DICOM query
Query on the mediated schema:
    SELECT ?PatientIdentifier.ClinicalTrialPatientNumber, ?PatientIdentifier.pnr, ...
    WHERE (?a, rdf:type, h:PatientIdentifier), ...
          (?a, h:PatientIdentifier.hasStudy.Study, ?b)
    USING ...
Generated query for the DICOM source (SPARQL):
    PREFIX h: <http://gridnode.ehv.campus.philips.com/dicom/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?PatientID ?PatientsName
    WHERE {
      OPTIONAL { ?a h:PatientID ?PatientID . }
      OPTIONAL { ?a h:PatientsName ?PatientsName . }
    }
Generated query for the SIOP relational database (SQL):
    SELECT DISTINCT patient.siopnr, patient.pnr, ...
    FROM patient
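The mediator query selects `PatientIdentifier.ClinicalTrialPatientNumber` and `PatientIdentifier.pnr`, while the generated SQL selects `patient.siopnr` and `patient.pnr`. A property-to-column mapping could drive that translation roughly as sketched below. The mapping dictionary is an assumption inferred from the two queries, not the ACGT mapping format, and joins are deliberately out of scope.

```python
# Hypothetical mapping inferred from the slide: ontology property -> (table, column).
PROPERTY_TO_COLUMN = {
    "PatientIdentifier.ClinicalTrialPatientNumber": ("patient", "siopnr"),
    "PatientIdentifier.pnr": ("patient", "pnr"),
}


def to_sql(properties):
    """Translate selected ontology properties into a single-table SQL SELECT.

    Raises KeyError for unmapped properties and ValueError when the
    properties span several tables (joins are not handled in this sketch).
    """
    columns = [PROPERTY_TO_COLUMN[p] for p in properties]
    tables = {table for table, _ in columns}
    if len(tables) != 1:
        raise ValueError("sketch only supports single-table queries")
    select_list = ", ".join(f"{table}.{column}" for table, column in columns)
    return f"SELECT DISTINCT {select_list} FROM {tables.pop()}"


sql = to_sql(["PatientIdentifier.ClinicalTrialPatientNumber", "PatientIdentifier.pnr"])
# sql == "SELECT DISTINCT patient.siopnr, patient.pnr FROM patient"
```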
17. Results
    <rdf:RDF xmlns:j.0="http://infomed.dia.fi.upm.es/SIOPDicom#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:j.1="http://www.w3.org/2001/XMLSchema#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:owl="http://www.w3.org/2002/07/owl# ...
      <owl:Class rdf:about="http://infomed.dia.fi.upm.es/SIOPDicom#PatientIdentifier"/>
      <owl:DatatypeProperty rdf:about="http://infomed.dia.fi.upm.es/SIOPDicom#PatientIdentifier.ClinicalTrialPatientNumber"> ...
      <j.0:PatientIdentifier rdf:about="http://infomed.dia.fi.upm.es/SIOPDicom#PatientIdentifier13">
        <j.0:PatientIdentifier.HospitalIdentifier>
          <j.1:string>
            <rdf:value>Without Information</rdf:value>
          </j.1:string>
        </j.0:PatientIdentifier.HospitalIdentifier>
        <j.0:PatientIdentifier.FirstName>
        ...
18. The ACGT Data Access Services
- The ACGT Data Access Services aim at
- Providing a uniform interface:
- uniform transport protocol
- uniform message syntax
- uniform query syntax
- uniform data format
- Hiding the query peculiarities of the data source
- Hiding the query limitations of the data source
- Exporting the data model of the data source
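The goals above amount to one interface that every wrapper implements, whatever the backend. The following is a minimal sketch of such an interface. The class and method names are hypothetical, and the list of `(property, value)` pairs is a simplified stand-in for the SPARQL queries the real services accept.

```python
from abc import ABC, abstractmethod


class DataAccessService(ABC):
    """Hypothetical uniform interface that every source wrapper would implement."""

    @abstractmethod
    def export_schema(self) -> dict[str, list[str]]:
        """Export the source data model: class name -> list of property names."""

    @abstractmethod
    def query(self, patterns: list[tuple[str, str]]) -> list[dict]:
        """Answer a conjunctive query given as (property, value) pairs,
        where value "?" means "return this property, no constraint".
        Rows come back in one uniform format regardless of the backend."""


class InMemoryPatientService(DataAccessService):
    """Toy backend standing in for a relational database or DICOM server."""

    def __init__(self, rows):
        self._rows = rows

    def export_schema(self):
        return {"Patient": sorted({key for row in self._rows for key in row})}

    def query(self, patterns):
        out = []
        for row in self._rows:
            if all(value == "?" or row.get(prop) == value for prop, value in patterns):
                out.append({prop: row.get(prop) for prop, _ in patterns})
        return out


svc = InMemoryPatientService([
    {"PatientID": "p1", "SeriesNumber": "3"},
    {"PatientID": "p2", "SeriesNumber": "4"},
])
rows = svc.query([("PatientID", "?"), ("SeriesNumber", "3")])
schema = svc.export_schema()
```

Because clients only see `query` and `export_schema`, backend-specific quirks stay inside each wrapper, which is the "hide peculiarities and limitations" goal in miniature.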
19. Main types of data sources
- Relational databases
- CRF data, microarray data
- DICOM servers
- Medical image data
- Public web databases
- Gene and protein sequence databases
- Files in various formats
- Excel, XML, comma-separated values
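For the simplest source type, files, a uniform data format can mean exposing each row as (subject, property, value) triples, the shape that SPARQL-queryable data takes. The following is a minimal sketch using only the standard `csv` module. The `urn:example:row:` prefix and the column names are hypothetical, and the slides do not describe how ACGT's file wrappers actually work.

```python
import csv
import io


def csv_to_triples(text, subject_column, base="urn:example:row:"):
    """Turn each CSV row into (subject, property, value) triples.

    `base` is a hypothetical URI prefix for row subjects; empty cells are skipped.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = base + row[subject_column]
        for column, value in row.items():
            if column != subject_column and value != "":
                triples.append((subject, column, value))
    return triples


sample = "id,diagnosis,stage\n1,nephroblastoma,II\n2,breast cancer,\n"
triples = csv_to_triples(sample, "id")
```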
20. Technology choices
- OGSA-DAI
- The standard web services framework for data access interfaces
- Supports an activity framework for efficient and flexible service invocation
- SPARQL
- Modern RDF query language
- Fits the needs of the mediator
- Intermediate level of expressiveness
- E.g. more expressive than DICOM query capabilities, less expressive than SQL
- Suitable as an initial query language for wrappers
21. SPARQL for querying DICOM
- Uniform query syntax
- Any DICOM query can be expressed in SPARQL
- SPARQL imposes no limitations compared with native DICOM queries
- Hiding the query limitations of the data source
- SPARQL filters can be used to create queries that cannot be expressed as DICOM queries
- However, not all SPARQL queries can be efficiently converted to DICOM queries
- Therefore, the data access service does not accept all queries
- This is unavoidable, for performance reasons
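The trade-off on this slide can be made concrete. Constraints that a DICOM C-FIND request can evaluate itself (single-value and `*`/`?` wildcard matching) are pushed to the server, and the remaining SPARQL-style filters are applied by the service to the returned records. The following is a minimal sketch under these assumptions. The constraint tuples are a hypothetical encoding of a parsed query, the server is emulated in memory, and the acceptance rule (reject a query when nothing can be pushed down) is illustrative only; the slides do not state the actual rule.

```python
import re

# Constraint kinds a C-FIND request can evaluate itself
# (single-value and '*'/'?' wildcard matching on attribute values).
PUSHABLE = {"eq", "wildcard"}


def wildcard_to_regex(pattern):
    """Translate a DICOM-style wildcard pattern into an equivalent regex."""
    return "".join(".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern)


def plan(constraints):
    """Split (kind, attribute, argument) constraints into C-FIND matching keys
    and filters the service must apply itself after retrieval."""
    matching_keys, local_filters = {}, []
    for kind, attr, arg in constraints:
        if kind in PUSHABLE:
            matching_keys[attr] = arg
        else:
            local_filters.append((kind, attr, arg))
    if not matching_keys:
        # Assumed policy: with nothing to push down, every record would have
        # to be fetched from the server, so the query is refused.
        raise ValueError("query rejected: no constraint can be pushed to DICOM")
    return matching_keys, local_filters


def dicom_side(records, matching_keys):
    """Emulates the records a DICOM server would return for the matching keys."""
    return [
        r for r in records
        if all(re.fullmatch(wildcard_to_regex(p), r.get(a, ""), re.DOTALL)
               for a, p in matching_keys.items())
    ]


def passes_filters(record, local_filters):
    """Applies the non-pushable filters (regex, numeric comparison) locally."""
    for kind, attr, arg in local_filters:
        value = record.get(attr, "")
        if kind == "regex":
            ok = re.search(arg, value) is not None
        elif kind == "gt":
            ok = value != "" and float(value) > arg
        else:
            raise ValueError(f"unsupported filter: {kind}")
        if not ok:
            return False
    return True


def run(records, constraints):
    matching_keys, local_filters = plan(constraints)
    candidates = dicom_side(records, matching_keys)       # server side
    return [r for r in candidates if passes_filters(r, local_filters)]  # service side


records = [
    {"PatientName": "Huge^Lurch", "SeriesNumber": "3", "SeriesDescription": "T1 axial"},
    {"PatientName": "Huge^Lurch", "SeriesNumber": "4", "SeriesDescription": "T2 coronal"},
    {"PatientName": "Other^Pat", "SeriesNumber": "3", "SeriesDescription": "T1 axial"},
]
t1_series = run(records, [("wildcard", "PatientName", "Huge*"), ("regex", "SeriesDescription", "^T1")])
```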
22. Image retrieval
- Hiding the query peculiarities of the data source
- Using DICOM Q/R, images can only be retrieved by hosting a DICOM Application Entity
- With the data access service, images can be delivered to a URL
- No need for the client to host a DICOM server
- The use of the various DICOM query/retrieve information models (e.g. Patient Root, Study Root) is hidden from the user
23. Using the DICOM levels
SPARQL query statement:
    SELECT ?patientId ?studyId ?seriesId
    WHERE {
      ?patient dicom:PatientID ?patientId ;
               dicom:PatientsName "Huge, Lurch" .
      ?study dicom:Patient ?patient ;
             dicom:StudyInstanceUID ?studyId .
      ?series dicom:Study ?study ;
              dicom:SeriesNumber "3" ;
              dicom:SeriesInstanceUID ?seriesId .
    }
SPARQL results to XML:
    <result>
      <binding name="patientId">200650</binding>
      <binding name="studyId">1.3.46.670589.5.2.12.2158432007.1002671691.401594</binding>
      <binding name="seriesId">1.3.46.670589.5.2.12.2158432007.1002671552.91561</binding>
    </result>
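The query above walks the DICOM information model from patient to study to series. The following minimal sketch shows how such a level-linked pattern could be flattened into one C-FIND identifier. Constrained properties become matching keys, selected variables become return keys (empty values), and the Query/Retrieve Level is set to the deepest level referenced. The ontology-to-keyword map is hypothetical; note that the slide's `dicom:PatientsName` corresponds to the current DICOM keyword `PatientName`.

```python
# DICOM query/retrieve levels, from the top of the information model down.
LEVELS = ["PATIENT", "STUDY", "SERIES", "IMAGE"]

# Hypothetical map from ontology properties to (level, DICOM keyword).
PROPERTY_MAP = {
    "dicom:PatientID": ("PATIENT", "PatientID"),
    "dicom:PatientsName": ("PATIENT", "PatientName"),
    "dicom:StudyInstanceUID": ("STUDY", "StudyInstanceUID"),
    "dicom:SeriesNumber": ("SERIES", "SeriesNumber"),
    "dicom:SeriesInstanceUID": ("SERIES", "SeriesInstanceUID"),
}


def build_identifier(constraints, returns):
    """Flatten level-linked patterns into one C-FIND identifier.

    constraints: {ontology property: required value} -> matching keys
    returns:     [ontology property]                 -> return keys (empty value)
    """
    identifier = {}
    deepest = 0
    for prop, value in constraints.items():
        level, keyword = PROPERTY_MAP[prop]
        identifier[keyword] = value
        deepest = max(deepest, LEVELS.index(level))
    for prop in returns:
        level, keyword = PROPERTY_MAP[prop]
        identifier.setdefault(keyword, "")  # empty value asks the server to return it
        deepest = max(deepest, LEVELS.index(level))
    identifier["QueryRetrieveLevel"] = LEVELS[deepest]
    return identifier


ident = build_identifier(
    {"dicom:PatientsName": "Huge, Lurch", "dicom:SeriesNumber": "3"},
    ["dicom:PatientID", "dicom:StudyInstanceUID", "dicom:SeriesInstanceUID"],
)
```

A single SERIES-level identifier that constrains `PatientName` without giving a `StudyInstanceUID` is a relational query in DICOM terms. A server that supports only hierarchical queries would instead need one C-FIND per level (first resolving `PatientID`, then `StudyInstanceUID`). Choosing between these strategies is the kind of detail the data access service hides from the user.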
24. Conclusions
- We have successfully resolved the issues related to
- DICOM and relational database integration
- A general mapping format
- The implementation of a range of supporting tools
- Open issues
- Integration of public databases
- Development of a friendly query interface (exploring natural language)
- Ontology maintenance and extension (submission system)
25. Thank you