Title: Brain Data
1 Brain Data Knowledge Grid
- (or Towards Services for Knowledge-Based
Mediation of Neuroscience Information Sources)
National Center for Microscopy and Imaging Research (NCMIR): Mark Ellisman, Maryann Martone, Steve Peltier, Steve Lamont, ...
Data-Intensive Computing Environments, San Diego Supercomputer Center (SDSC): Reagan Moore, Chaitan Baru, Amarnath Gupta, Bertram Ludäscher, Richard Marciano, Arcot Rajasekar, Ilya Zaslavsky, ...
University of California, San Diego
2 Infrastructure for Sharing Neuroscience Data
- SOURCES
- NCMIR, U.C. San Diego
- Caltech Neuroimaging
- Center for Imaging Science, Johns Hopkins
- Center for Computational Biology, Montana State
- Laboratory of Neuro Imaging (LONI), UCLA
- Computational Neurobiology Laboratory, Salk Inst.
- Van Essen Laboratory, Washington University
- Data Management Infrastructure (DICE/NPACI)
- MIX Mediation in XML
- MCAT information discovery
- SRB data handling
- HPSS storage
- ...
Knowledge-based GRID infrastructure ???
Data Management Infrastructure (Data Grid): GTOMO, Telemicroscopy, Globus, SRB/MCAT, HPSS
3 Sharing Resources on the Brain Data Grid
- Scientific groups ...
- create data products (e.g., text data, images, simulation data, ...)
- put them in collections
- add metadata (who created it, what is the data about, ...)
- make it available for sharing (on the web, in data caches, in HPSS, ...)
- Technical challenges ...
- size: packaging of data
- heterogeneity: data types, storage technologies, transport mechanisms, authentication, ...
- access levels: collection, object, fragment
- data-specific functions (data blades)
- Data Grid technologies can help ...
- distributed data management, e.g., Storage Resource Broker/Metadata Catalog (SRB/MCAT), computing (Globus), ...
- focus is on resource sharing (data, networks, cycles)
4 Integration Issue: Semantic Integration/Mediation
??? SEMANTIC INTEGRATION ???
- SYNTACTIC/STRUCTURAL Integration
- Integrated Views (Src-XML -> Intgr-XML)
- Schema Integration (DTD -> DTD)
- Wrapping, Data Extraction (Text -> XML)
MIX: Mediation of Information using XML
Distributed Query Processing
SRB/MCAT
storage, query capabilities protocols services
Globus JDBC DOM CORBA
SYSTEM INTEGRATION
TCP/IP grid-ftp HTTP
5 Standard Mediator/Wrapper Architecture
Client/User-Query
XML Q/A
INTEGRATED VIEW
domain semantics ???
GRID federation services ???
Integration logic
protocol translation
SRB/MCAT, DOM, X(ML)Query
structure
syntax
Wrapper
Wrapper
Wrapper
transport
storage
Files
Lab1
Lab2
Lab3
(Neuro)Science (Re)Sources
6 The Need for Semantic Integration
Cross-source queries
"What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?"
Cross-source relationships are modeled
Semantic (knowledge-based) mediation services
Data, relationships, constraints are modeled (CMs)
Wrapper
Wrapper
Wrapper
Wrapper
Web
protein localization
morphometry
neurotransmission
CaBP, Expasy
7 Hidden Semantics: Protein Localization
<protein_localization>
  <neuron type="purkinje cell" />
  <protein channel="red">
    <name>RyR</name>
    ...
  </protein>
  <region h_grid_pos="1" v_grid_pos="A">
    <density>
      <structure fraction="0.8">
        <name>spine</name>
        <amount name="RyR">0</amount>
      </structure>
      <structure fraction="0.2">
        <name>branchlet</name>
        <amount name="RyR">30</amount>
      </structure>
8 Hidden Semantics: Morphometry
<neuron name="purkinje cell">
  <branch level="10">
    <shaft>
      ...
    </shaft>
    <spine number="1">
      <attachment x="5.3" y="-3.2" z="8.7" />
      <length>12.348</length>
      <min_section>1.93</min_section>
      <max_section>4.47</max_section>
      <surface_area>9.884</surface_area>
      <volume>7.930</volume>
      <head>
        <width>4.47</width>
        <length>1.79</length>
      </head>
    </spine>
    ...
9 Knowledge-Based (Semantic) Mediation
- Multiple Worlds Integration Problem
- compatible terms not directly joinable
- complex, indirect associations among attributes
- unstated integrity constraints
- Approach
- a theory under which terms can be semantically joined
- => lift mediation to the level of conceptual models (CMs)
- => formalize domain knowledge, ICs become rules over CMs
- => Knowledge-Based/Model-Based (Semantic) Mediation
10 XML-Based vs. Model-Based Mediation
CM: Descr. Logic, ER, UML, RDF/XML(-Schema), ...
CM-QL: F-Logic, OIL, DAML, ...
XML Models
11 Knowledge-Based Mediator Prototype
USER/Client
CM (Integrated View)
Domain Map (DM)
Integrated View Definition (IVD)
CM Plug-In
CM Queries & Results (exchanged in XML)
Logic API (capabilities)
12 Mediation Services: Source Registration (System Issues)
Source
Data Type
Query Capability
Result Delivery
Access Protocol
ARC
SQL
XML QL
DOOD
table
tree
file
SRB
HTTP
JDBC
Tuple-at-a-time
Stream
Set-at-a-time
SPJ
Selections
Binary for Viewer
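The registration dimensions on this slide (data type, query capability, result delivery, access protocol) can be pictured as one record per source. The following is only a minimal sketch under stated assumptions: the class names `SourceRegistration` and `Registry`, the source names `lab1` to `lab3`, and the particular combinations of values are hypothetical and are not taken from the mediator prototype.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class SourceRegistration:
    """One registered source, described along the four dimensions of the slide."""
    name: str                            # hypothetical source name
    data_type: str                       # "table", "tree" or "file"
    query_capabilities: FrozenSet[str]   # e.g. {"selections", "SPJ", "SQL"}
    result_delivery: str                 # e.g. "tuple-at-a-time", "stream"
    access_protocol: str                 # e.g. "SRB", "HTTP", "JDBC"


class Registry:
    """Hypothetical in-memory registry kept by the mediator."""

    def __init__(self) -> None:
        self._sources: Dict[str, SourceRegistration] = {}

    def register(self, source: SourceRegistration) -> None:
        self._sources[source.name] = source

    def sources_supporting(self, needed: FrozenSet[str]) -> List[str]:
        # A source qualifies if it offers every capability the plan needs.
        return sorted(
            s.name for s in self._sources.values()
            if needed <= s.query_capabilities
        )


registry = Registry()
registry.register(SourceRegistration(
    "lab1", "table", frozenset({"selections", "SPJ", "SQL"}),
    "set-at-a-time", "JDBC"))
registry.register(SourceRegistration(
    "lab2", "tree", frozenset({"selections", "XML QL"}),
    "stream", "HTTP"))
registry.register(SourceRegistration(
    "lab3", "file", frozenset(), "binary for viewer", "SRB"))

print(registry.sources_supporting(frozenset({"selections"})))
```

A mediator could consult such records when deciding which subqueries can be pushed to a source and which must be evaluated at the mediator.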
13 Mediation Services: Source Registration (Semantics Issues)
- Domain Map Registration
- provide concept space/ontology
- as a private object (myANATOM)
- merge with others (give semantic bridges)
- and check for conflicts
- Conceptual Model Registration
- schema classes, associations, attributes
- domain constraints
- put data into context (linking data to the
domain map)
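The "merge with others and check for conflicts" step can be illustrated with a small sketch. It assumes, purely for illustration, that a domain map is a set of (child, parent) isa edges, that semantic bridges are a term-renaming dictionary, and that a conflict means a term ending up as its own ancestor after the merge. The function names and the example terms are hypothetical.

```python
from collections import defaultdict


def merge_domain_maps(dm_a, dm_b, bridges):
    """Union two sets of (child, parent) isa edges.

    bridges maps terms of dm_b to the equivalent terms of dm_a.
    """
    def rename(term):
        return bridges.get(term, term)

    return set(dm_a) | {(rename(c), rename(p)) for c, p in dm_b}


def find_isa_conflicts(edges):
    """Return the terms that lie on an isa cycle (assumed notion of conflict)."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    def ancestors(term):
        seen, stack = set(), list(parents.get(term, ()))
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(parents.get(t, ()))
        return seen

    return sorted(t for t in list(parents) if t in ancestors(t))


# Hypothetical private map and a second map that disagrees with it.
my_anatom = {("purkinje_cell", "neuron"), ("neuron", "cell")}
other_map = {("Neuron", "PurkinjeCell")}
bridges = {"Neuron": "neuron", "PurkinjeCell": "purkinje_cell"}

merged = merge_domain_maps(my_anatom, other_map, bridges)
print(find_isa_conflicts(merged))
```

Other notions of conflict (for example, contradictory domain constraints) would need a richer representation than plain edge sets.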
14 ANATOM Domain Map
ANATOM
15 Senselab (Yale) and NCMIR (UCSD): Semantic Bridge
anatom_dom(X) :- (ucsd_has_a(X,_) ; ucsd_has_a(_,X) ; ucsd_isa(X,_) ; ucsd_isa(_,X)).
senselab_dom(X) :- (sl_has_a(X,_) ; sl_has_a(_,X) ; sl_isa(X,_) ; sl_isa(_,X)).

% map Senselab anatom terms to equivalent UCSD ANATOM
sl2ucsd(X,X) :- senselab_dom(X), anatom_dom(X).
sl2ucsd('A',axon).
sl2ucsd('AH',axon).
sl2ucsd('Dad',spiny_branchlet).   % should map to a PATH not just the end of the path
sl2ucsd('Dam',main_branches).     % some of the main_branches based on the branch level
sl2ucsd('Dap',main_branches).
sl2ucsd('Dbd',spiny_branchlet).
sl2ucsd('Dbm',main_branches).
sl2ucsd('Dbp',main_branches).
sl2ucsd('Ded',spiny_branchlet).
sl2ucsd('Dem',main_branches).
sl2ucsd('Dep',main_branches).
sl2ucsd('T',axon).

% keep has_a edge if at least one node is known from UCSD
has_a(X,Y) :- sl2ucsd(_,X), ucsd_has_a(X,Y).
has_a(X,Y) :- sl2ucsd(_,Y), ucsd_has_a(X,Y).

% keep all and only UCSD is_a rels
isa(X,Y) :- ucsd_isa(X,Y).
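For readers less familiar with logic rules, the bridge can be paraphrased with plain sets. This is a sketch only: the edge sets below are small hypothetical excerpts (including the made-up `soma`/`nucleus` edge), the explicit mapping lists only a few of the pairs from the slide, and the variable names are not part of the original system.

```python
# Hypothetical excerpts of the two sources' relations, as (parent, part) pairs.
ucsd_has_a = {
    ("purkinje_cell", "axon"),
    ("purkinje_cell", "main_branches"),
    ("main_branches", "spiny_branchlet"),
    ("spiny_branchlet", "spine"),
    ("soma", "nucleus"),          # made-up edge with no Senselab counterpart
}
ucsd_isa = {("purkinje_cell", "neuron")}
sl_has_a = {("purkinje_cell", "A"), ("purkinje_cell", "Dad")}
sl_isa = set()


def domain(has_a, isa):
    """All terms that occur anywhere in the has_a or isa relation."""
    return {term for edge in has_a | isa for term in edge}


# Explicit Senselab -> UCSD pairs (subset of the facts on the slide).
explicit = {"A": "axon", "AH": "axon", "Dad": "spiny_branchlet",
            "Dam": "main_branches", "T": "axon"}

# sl2ucsd(X,X) holds for terms known to both sources; the rest is explicit.
shared = domain(sl_has_a, sl_isa) & domain(ucsd_has_a, ucsd_isa)
sl2ucsd = set(explicit.items()) | {(x, x) for x in shared}

# Keep a UCSD has_a edge if at least one endpoint is a mapped UCSD term.
mapped = {ucsd_term for _, ucsd_term in sl2ucsd}
has_a = {(x, y) for x, y in ucsd_has_a if x in mapped or y in mapped}

# Keep all and only the UCSD isa relationships.
isa = set(ucsd_isa)

print(sorted(has_a))
```

Edges whose endpoints are both unknown to Senselab, such as the made-up `soma`/`nucleus` pair, drop out of the bridged map.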
16 Refinement of a Domain Map (Ontology): Putting Data in Context via Registration of new Classes & Relationships
Neuron
MyNeuron
Neostriatum
Compartment
Spiny Neuron
ALLhas
Axon
Soma
Dendrite
Medium Spiny Neuron
Neurotransmitter
MyDendrite
exp
AND
GABA
Substance P
OR
exp
Dopamine R
Substantia Nigra Pc
Substantia Nigra Pr
Globus Pallidus Int.
Globus Pallidus Ext.
17 Mediation Services: Integrated View Definition
- DERIVE
- protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
- FROM
- I:protein_label_image[proteins ->> Protein; organism -> Organism; anatomical_structures ->> AS:anatomical_structure[name->Anatom]],   % from PROLAB
- NAE:neuro_anatomic_entity[name->Anatom; located_in->>Brain_region],   % from ANATOM
- AS..segments..features[name->Feature_name; value->Value].
- provided by the domain expert and mediation engineer
- declarative language (here Frame-logic)
18 Example: Query Evaluation (I)
- Example: protein_distribution
- given: organism, protein, brain_region
- Use DOMAIN-KNOWLEDGE-BASE
- recursively traverse the has_a_star paths under brain_region; collect all anatomical_entities
- Source PROLAB
- join with anatomical structures and collect the value of attribute image.segments.features.feature.protein_amount where image.segments.features.feature.protein_name = protein and study_db.study.animal.name = organism
- Mediator
- aggregate over all parents up to brain_region
- report distribution
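The three evaluation steps above (closure over has_a, join with the source, aggregation up to the brain region) can be sketched as follows. This is an illustration under assumptions, not the mediator's actual code: the has_a excerpt, the flat `PROLAB` rows, and the organism names are hypothetical, the has_a graph is assumed to be a tree, and only the RyR amounts 0 and 30 echo the protein localization example earlier in the deck.

```python
from collections import defaultdict

# DOMAIN-KNOWLEDGE-BASE: hypothetical has_a edges as (parent, child) pairs.
HAS_A = {
    ("cerebellum", "purkinje_cell"),
    ("purkinje_cell", "dendrite"),
    ("dendrite", "spine"),
    ("dendrite", "branchlet"),
}

# Source PROLAB, flattened to (organism, protein, anatomical structure, amount).
PROLAB = [
    ("rat", "RyR", "spine", 0.0),
    ("rat", "RyR", "branchlet", 30.0),
    ("rat", "CaBP", "spine", 5.0),
    ("mouse", "RyR", "branchlet", 12.0),
]


def has_a_star(root):
    """All entities reachable from root via has_a edges, including root."""
    children = defaultdict(set)
    for parent, child in HAS_A:
        children[parent].add(child)
    seen, stack = {root}, [root]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def protein_distribution(organism, protein, brain_region):
    """Sum protein amounts per entity, aggregated over parents up to brain_region."""
    region = has_a_star(brain_region)
    parent_of = {child: parent for parent, child in HAS_A}  # tree assumption
    totals = defaultdict(float)
    for org, prot, anatom, amount in PROLAB:
        if org == organism and prot == protein and anatom in region:
            node = anatom
            while True:
                totals[node] += amount
                if node == brain_region:
                    break
                node = parent_of[node]
    return dict(totals)


print(protein_distribution("rat", "RyR", "cerebellum"))
```

In a real deployment the join would be pushed to the source through its wrapper rather than run over an in-memory list.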
19 Example: Query Evaluation (II)
"How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"
- @SENSELAB: X1 := select output from parallel fiber
- @MEDIATOR: X2 := hang off X1 from Domain Map
- @MEDIATOR: X3 := subregion-closure(X2)
- @NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors)
- @MEDIATOR: X5 := compute aggregate(X4)
20 Mediation Services: Client Registration
Client
Update Client
Query Client
Thin Result Viewer
Fat Result Viewer
Navigate/ Ad-hoc
Query Capability
Query on Schema
Derive Before Insert
Check Data
Merge Before Insert
Client-side Processing
Client-side Buffer
Send Full Data
Context Sensitive
Server-side Buffer
Server-Push/ Client-Pull
21 Example Client: Query Formulation and Result Display
- combination of ad hoc and navigational queries
- client side visualization (left)
- results are shown in semantic context (right)
22 Mediation Services: Semantic Annotation Tools
line drawing annotation -> (spatial) database for mediation
23 Mediator Architecture Blueprint
Mediation Services
Mediator Layer
- Source model lifting
- domain knowledge reconciliation
- model transformation
- Query formulation
- user query
- integrated view definition
Deductive Engine
Model Reasoner
- Source registration
- domain knowledge
- model schema
- query computation capabilities
- Query processing
- view unfolding
- semantic optimization
- capability-based rewriting
Optimizer
Wrapper Layer
- Query interface (down API)
- SDLIP, SOAP, ...
- (subsets of) SQL, X(ML)-Query, CPL,...
- DOM
- SRB-based access
- Result delivery interface (up API)
- SDLIP, SOAP, ...
- pull (tuple/set-at-a-time, DOM) vs. push (stream)
- synchronous/asynchronous
- direct data/data reference
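The pull versus push distinction in the result delivery interface can be made concrete with a small sketch. The `Wrapper` class, its method names, and the in-memory row list are all hypothetical and stand in for a real source. Pull delivery is modelled with generators, push delivery with a callback.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


class Wrapper:
    """Hypothetical wrapper over an in-memory source."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = list(rows)

    def pull_tuples(self) -> Iterator[Row]:
        """Pull, tuple-at-a-time: the mediator requests one row per step."""
        yield from self._rows

    def pull_sets(self, batch_size: int) -> Iterator[List[Row]]:
        """Pull, set-at-a-time: the mediator requests one batch per step."""
        for start in range(0, len(self._rows), batch_size):
            yield self._rows[start:start + batch_size]

    def push(self, on_row: Callable[[Row], None]) -> None:
        """Push (stream): the wrapper drives delivery through a callback."""
        for row in self._rows:
            on_row(row)


wrapper = Wrapper({"id": i} for i in range(5))

first = next(wrapper.pull_tuples())                      # one tuple on demand
batch_sizes = [len(b) for b in wrapper.pull_sets(2)]     # batches of up to 2
received: List[Row] = []
wrapper.push(received.append)                            # wrapper-driven stream

print(first, batch_sizes, len(received))
```

With pull delivery the mediator controls the pace and can stop early. With push delivery the source controls the pace, which suits streams but requires the mediator to buffer or process rows as they arrive.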
XML Sources
RDB Sources
File Sources
HTML Sources
Digital Libraries (Collections)
Spatial Sources
Boston Univ.
NCMIR UCSD
Yale Univ.
Montana Univ.
SDLIP
ARC IMS
24 Coming up: Knowledge-Based/Semantic Mediation of Brain Data
PROTLOC
Result (XML/XSLT)
Result (VML/SVG)
ANATOM
25 Some Open Issues
- Data/Knowledge Modeling
- Extensibility: how to handle a source with new data types and operations?
- Temporal Data: instrument readings, video microscopy
- Spatial Data: Integrating with spatial database systems
- Image database systems
- Conflict Management
- Grades of certainty
- Alternate Hypothesis
- Integrating Services
- Registration and warping of my image slice to a reference
- Integrating into Larger Applications
- M-Cell simulation
- Telemicroscopy
- Visualization
26 References
- Model-Based Mediation with Domain Maps, Bertram Ludäscher, Amarnath Gupta, Maryann Martone, Intl. Conference on Data Engineering (ICDE), Heidelberg, 2001.
- Knowledge-Based Mediation of Heterogeneous Neuroscience Information Sources, Amarnath Gupta, Bertram Ludäscher, Maryann Martone, Intl. Conference on Scientific and Statistical Databases (SSDBM), Berlin, 2000.
- Model-Based Information Integration in a Neuroscience Mediator System, Bertram Ludäscher, Amarnath Gupta, Maryann Martone, Intl. Conference on Very Large Data Bases (VLDB), Cairo, 2000.