Knowledge-Based Integration of Neuroscience Data Sources

1
Knowledge-Based Integration of Neuroscience Data
Sources
  • Amarnath Gupta
  • Bertram Ludäscher
  • Maryann Martone
  • University of California San Diego

2
A Standard Information Mediation Framework
[Figure: layered diagram, top to bottom: Client Query; Integrated XML View; Mediator; three XML Views; two Wrappers; two Data Sources and one XML Data Source.]
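The layering on this slide can be sketched in a few lines of Python. This is only an illustration of the general mediator/wrapper pattern, not the authors' system: the `Wrapper` and `Mediator` classes, the element names, and the sample records are all hypothetical.

```python
import xml.etree.ElementTree as ET


class Wrapper:
    """Exports a source's records as an XML view (hypothetical interface)."""

    def __init__(self, source_name, records):
        self.source_name = source_name
        self.records = records  # list of dicts standing in for a real source

    def xml_view(self):
        root = ET.Element("view", {"source": self.source_name})
        for rec in self.records:
            item = ET.SubElement(root, "item")
            for key, value in rec.items():
                ET.SubElement(item, key).text = str(value)
        return root


class Mediator:
    """Combines the wrappers' XML views into one integrated view."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def integrated_view(self):
        root = ET.Element("integrated_view")
        for wrapper in self.wrappers:
            root.append(wrapper.xml_view())
        return root

    def query(self, tag, value):
        """Client query: (source, item) pairs whose child <tag> equals value."""
        results = []
        for view in self.integrated_view():
            for item in view.iter("item"):
                if item.findtext(tag) == value:
                    results.append((view.get("source"), item))
        return results


# Made-up sample data, for illustration only.
localization = Wrapper("protein_localization",
                       [{"protein": "RyR", "structure": "spine"}])
morphometry = Wrapper("morphometry",
                      [{"structure": "spine", "length": 12.348}])
mediator = Mediator([localization, morphometry])
hits = mediator.query("structure", "spine")
```

Here the client sees a single view and never talks to a source directly; each wrapper hides how its source stores data.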
3
A Neuroscience Question
Cerebellar distribution of rat proteins with more
than 70% homology with human NCS-1? Any structure
specificity? How about other rodents?
[Figure: Integrated View over a Mediator, with four Wrappers below it; source labels: WWW (CaBP, Expasy), protein localization, morphometry, neurotransmission.]
4
Integration Issues
  • Structural Heterogeneity
      • Resolved by converting to common semistructured
        data model
  • Heterogeneity in Query Capabilities
      • Resolved by writing wrappers with binding
        patterns and other capability-definition
        languages
  • Semantic Heterogeneity
      • Schema conflicts
          • Partially resolved by mapping rules in the
            mediator
      • Hidden Semantics?
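
A binding pattern states which attributes a source needs as input before it can answer, for example a Web form that returns homologues only when a sequence is supplied. The following is a minimal sketch of that idea; the `BoundWrapper` class, the "b"/"f" (bound/free) encoding, and the sample rows are assumptions made for illustration.

```python
class BoundWrapper:
    """A source that can be queried only when certain attributes are bound."""

    def __init__(self, name, pattern, rows):
        self.name = name
        self.pattern = pattern  # attribute -> "b" (must be bound) or "f" (free)
        self.rows = rows

    def can_answer(self, bound_attrs):
        """True if every attribute marked "b" is among the bound ones."""
        required = {a for a, mode in self.pattern.items() if mode == "b"}
        return required <= set(bound_attrs)

    def query(self, **bindings):
        if not self.can_answer(bindings):
            raise ValueError(f"{self.name}: binding pattern not satisfied")
        return [row for row in self.rows
                if all(row.get(k) == v for k, v in bindings.items())]


# Made-up rows; a real wrapper would call the remote source instead.
rows = [
    {"sequence": "SEQ1", "homologue": "protA", "score": 0.95},
    {"sequence": "SEQ1", "homologue": "protB", "score": 0.65},
    {"sequence": "SEQ2", "homologue": "protC", "score": 0.80},
]
homology = BoundWrapper(
    "homology_search",
    {"sequence": "b", "homologue": "f", "score": "f"},
    rows,
)
answers = homology.query(sequence="SEQ1")
```

A mediator can consult `can_answer` while planning, so that it only sends a source the queries that source is able to evaluate.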

5
Hidden Semantics: Protein Localization
    <protein_localization>
      <neuron type="purkinje cell" />
      <protein channel="red">
        <name>RyR</>
        ...
      </protein>
      <region h_grid_pos="1" v_grid_pos="A">
        <density>
          <structure fraction="0.8">
            <name>spine</>
            <amount name="RyR">0</>
          </>
          <structure fraction="0.2">
            <name>branchlet</>
            <amount name="RyR">30</>
          </>

6
Hidden Semantics: Morphometry
    <neuron name="purkinje cell">
      <branch level="10">
        <shaft>
        </shaft>
        <spine number="1">
          <attachment x="5.3" y="-3.2" z="8.7" />
          <length>12.348</>
          <min_section>1.93</>
          <max_section>4.47</>
          <surface_area>9.884</>
          <volume>7.930</>
          <head>
            <width>4.47</>
            <length>1.79</>
          </head>
        </spine>

7
The Problem
  • Multiple Worlds Integration
      • compatible terms not directly joinable
      • complex, indirect associations among schema
        elements
      • unstated integrity constraints
  • Why not use ontologies?
      • typical ontologies associate terms along a limited
        number of dimensions
  • What's needed
      • a theory under which non-identical terms can be
        semantically joined

8
Our Approach
  • Modify the standard Mediation Architecture
      • Wrapper
          • Extend to encode an object-version of the
            structure schema
      • Mediator
          • Redesign to incorporate auxiliary knowledge
            sources to
              • Correlate object schema of sources
              • Define additional objects not specified but
                derivable from sources
  • At the Mediator
      • Use a logic engine to
          • Encode the mapping rules between sources
          • Define integrated views using a combination of
            exported objects from sources and the auxiliary
            knowledge sources
          • Perform query decomposition
  • We still use the Global-as-View form of mediation

9
The KIND Architecture
[Figure: KIND architecture. Labels: Integrated User View; View Definition Rules; Logic Engine with Integration Logic; Auxiliary Knowledge Source 1 and Auxiliary Knowledge Source 2; Schema of Registered Sources; Materialized Views; Src 1 and Src 2.]
10
The Knowledge-Base
  • Situate every data object in its anatomical
    context
  • An illustration
  • New data is registered with the knowledge-base
  • Insertion of new data reconciles the current
    knowledge-base with the new information by
      • Indexing the data with the source as part of
        registration
      • Extending the knowledge-base
      • Creating new views with complex rules to encode
        additional domain knowledge

11
F-Logic for the Mediation Engine
  • Why F-Logic?
      • Provides the power of Datalog (with negation) and
        object creation through Skolem IDs
      • Correct amount of notational sugar and rules to
        provide object-oriented abstraction
      • Schema-level reasoning
      • Expressing variable arity
  • F-Logic in KIND
      • Source schema wrapped into F-Logic schema
      • Knowledge-sources programmed in F-Logic
      • Definition of Integrated Views
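
"Object creation through Skolem IDs" means that a rule can name a new object as a function of the values it was derived from, so the same inputs always yield the same object. The Python sketch below mimics that behaviour outside any logic engine; the `skolem` helper, the `loc` functor, and the rows are hypothetical.

```python
def skolem(functor, *args):
    """Build a deterministic object ID from a functor name and its arguments."""
    return f"{functor}({','.join(map(str, args))})"


# Made-up (protein, structure, amount) rows exported by some source.
rows = [("RyR", "spine", 0), ("RyR", "branchlet", 30), ("RyR", "spine", 0)]

# Derive one mediated object per (protein, structure) pair.
objects = {}
for protein, structure, amount in rows:
    oid = skolem("loc", protein, structure)
    obj = objects.setdefault(
        oid, {"protein": protein, "structure": structure, "amounts": []}
    )
    obj["amounts"].append(amount)
```

Because the ID depends only on its arguments, the first and third rows land in the same object rather than creating duplicates.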

12
Wrapping into Logic Objects
  • Automated Part

    <!ELEMENT Studies (Study)>
    <!ELEMENT Study (study_id, animal, experiments, experimenters)>
    <!ELEMENT experiments (experiment)>
    <!ELEMENT experiment (description, instrument, parameters)>

    studyDB[studies =>> study].
    study[study_id => string; animal => animal;
          experiments =>> experiment; experimenters =>> string].

  • Non-automated Part
      • Subclasses
      • Rules
      • Integrity Constraints

    mushroom_spine :: spine.
    S : mushroom_spine IF S : spine[head -> _; neck -> _].
    ic1(S) : alert[type -> "invalid spine"; object -> S] IF
        S : spine[undef ->> {head, neck}].
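
The non-automated part can be pictured in Python. The sketch below is one reading of the slide's mushroom_spine rule and ic1 constraint, not a translation the authors provide: it assumes that a spine with both a head and a neck counts as a mushroom spine, and that a spine with neither defined triggers an alert. The dict layout and function names are hypothetical.

```python
def classify_spine(spine):
    """Return the class names derived for a spine record (a plain dict)."""
    classes = ["spine"]
    if spine.get("head") is not None and spine.get("neck") is not None:
        classes.append("mushroom_spine")  # derived subclass membership
    return classes


def check_spine(spine):
    """Integrity check: return an alert dict if head and neck are both undefined."""
    undefined = [part for part in ("head", "neck") if spine.get(part) is None]
    if len(undefined) == 2:
        return {"type": "invalid spine", "object": spine.get("id")}
    return None


# Made-up records for illustration.
complete = {"id": "s1", "head": {"width": 4.47}, "neck": {"length": 1.2}}
empty = {"id": "s2"}
```

The subclass rule adds information to valid data, while the integrity constraint only reports problems; the mediator keeps the two roles separate.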
13
Computing with Auxiliary Sources
  • Creating Mediated Classes
  • Reasoning with Schema

    animal[M => R] IF S : source, S.animal[M => R].
    animal[taxon => TAXON.taxon].
    X[taxon -> T] IF X : PROLAB.animal[name -> N],
                     words(N, [W1, W2 | _]),
                     T : TAXON.taxon[genus -> W1; species -> W2].
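
The last rule on this slide links a PROLAB animal to a TAXON entry by splitting the animal's name into words and matching the first two against genus and species. A minimal Python sketch of that lookup follows; the `taxon_of` function and the two-row `TAXON` table are stand-ins, not the actual auxiliary source.

```python
# Hypothetical fragment of a taxonomy source.
TAXON = [
    {"genus": "Rattus", "species": "norvegicus"},
    {"genus": "Mus", "species": "musculus"},
]


def taxon_of(animal_name, taxa=TAXON):
    """Match the first two words of an animal name against genus and species."""
    words = animal_name.split()
    if len(words) < 2:
        return None  # the name does not carry a genus/species pair
    genus, species = words[0], words[1]
    for taxon in taxa:
        if taxon["genus"] == genus and taxon["species"] == species:
            return taxon
    return None
```

Any words after the second (a strain name, for instance) are ignored, which mirrors the "rest of the list" position in the `words` predicate.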
14
Integrated View Definition
  • Views are defined between sources and knowledge
    base
  • Example: protein_distribution
      • given organism, protein, brain_region
      • KB: Anatom
          • recursively traverse the has_a paths under
            brain_region and collect all anatomical_entities
      • Source: PROLAB
          • join with anatomical structures and collect the
            value of attribute
            image.segments.features.feature.protein_amount
            where
            image.segments.features.feature.protein_name = protein
            and study_db.study.animal.name = organism
      • Mediator
          • aggregate over all parents up to brain_region
          • report distribution
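
The three steps of this view (traverse has_a, join with the source, aggregate upward) can be sketched as follows. The `HAS_A` fragment, the `MEASUREMENTS` rows, and the choice of summation as the aggregate are all assumptions; the slide does not say which aggregate KIND uses.

```python
# Hypothetical fragment of an anatomical knowledge base (parent -> parts).
HAS_A = {
    "cerebellum": ["purkinje cell"],
    "purkinje cell": ["dendrite"],
    "dendrite": ["branchlet", "spine"],
}

# Made-up (organism, protein, structure, amount) rows standing in for PROLAB.
MEASUREMENTS = [
    ("rat", "RyR", "spine", 0.0),
    ("rat", "RyR", "branchlet", 30.0),
    ("rat", "RyR", "dendrite", 5.0),
    ("mouse", "RyR", "spine", 7.0),
]


def descendants(region, has_a=HAS_A):
    """All entities reachable from region along has_a, including region itself."""
    found = [region]
    for child in has_a.get(region, []):
        found.extend(descendants(child, has_a))
    return found


def protein_distribution(organism, protein, brain_region):
    """Map each entity under brain_region to the amount summed over its subtree."""
    direct = {}
    for org, prot, structure, amount in MEASUREMENTS:
        if org == organism and prot == protein:
            direct[structure] = direct.get(structure, 0.0) + amount
    return {
        entity: sum(direct.get(d, 0.0) for d in descendants(entity))
        for entity in descendants(brain_region)
    }


distribution = protein_distribution("rat", "RyR", "cerebellum")
```

The source never mentions "cerebellum"; the amounts reach it only because the knowledge base says which structures it contains.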

15
Query Evaluation Example
  • protein distribution of Human NCS-1 homologue
      • from wrapped CaBP website
          • get the amino acid sequence for human NCS-1
      • from wrapped Expasy website
          • submit amino acid sequence, get ranked homologues
      • at Mediator
          • select homologues H found in rat, and homology >
            0.70
      • at Mediator
          • for each h in H
          • from previous view
          • protein_distribution(rat, h, cerebellum,
            distribution)
          • Construct result
a second integrated view
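The evaluation plan on this slide can be written as a short pipeline. In the sketch below every source call is a stub with placeholder data: `human_ncs1_sequence` stands in for the wrapped CaBP site, `ranked_homologues` for the wrapped Expasy site, and `protein_distribution` for the previous view. None of the names or values come from the real services.

```python
def human_ncs1_sequence():
    """Stub for the wrapped CaBP site (placeholder, not a real sequence)."""
    return "PLACEHOLDER_SEQUENCE"


def ranked_homologues(sequence):
    """Stub for the wrapped Expasy site: (protein, organism, homology) rows."""
    return [("protA", "rat", 0.95), ("protB", "rat", 0.65), ("protC", "mouse", 0.90)]


def protein_distribution(organism, protein, brain_region):
    """Stub for the previous integrated view."""
    return {brain_region: 1.0}


def ncs1_homologue_distribution(organism="rat", region="cerebellum", threshold=0.70):
    """Run the plan: fetch sequence, rank homologues, filter, call the view."""
    sequence = human_ncs1_sequence()
    selected = [
        protein
        for protein, org, score in ranked_homologues(sequence)
        if org == organism and score > threshold
    ]
    return {p: protein_distribution(organism, p, region) for p in selected}


result = ncs1_homologue_distribution()
```

Filtering at the mediator before calling the view keeps the number of per-protein subqueries small.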
16
Implementation
  • System
      • Flora as F-Logic Engine
      • Communicate with ODBC databases through
        underlying XSB Prolog
      • XML wrapping and Web querying through XMAS, our
        XML query language and custom-built wrappers
  • Data
      • Human Brain Project sites
      • NPACI Neuroscience Thrust sites

17
Work in Progress
  • Architecture
      • plug-in architecture for
          • domain knowledge sources
          • conceptual models from data sources
  • Functionality
      • better handling of large data
      • operations
          • expressive query language
          • operators for domain knowledge manipulation
      • query evaluation
          • query optimization using domain knowledge
  • Demonstration
      • at VLDB 2000