Biological Databases - PowerPoint PPT Presentation

Title: Biological Databases


1
Biological Databases
  • Asif Jan
  • Brain Mind Institute
  • EPFL

2
Introduction
  • The purpose of a biological experiment is to
    understand the workings of biological systems
    such as the brain, cells, etc.
  • The aim is to study the interplay of the
    different structural, chemical and electrical
    signals that give rise to natural and disease
    processes.
  • The data is acquired from different angles to
    serve different research purposes, e.g. using
    different animal models, physiological
    approaches, anatomical approaches, levels of
    protein activity, etc.

3
Sequence
  • Biological Data - Types and Constraints
  • Neocortical Microcircuit Data
  • Databasing Neocortical Microcircuit
  • Conclusion and Discussion

4
Part I
  • Biological Databases

5
Biological Databases - Challenges
  • A great deal of diversity in the data types
  • Unconventional and ad hoc query requirements
  • Ubiquitous uncertainty in the data
  • Requirements for data curation
  • Need for detailed Data annotations
  • A need for large scale data integration
  • Non-availability of universal taxonomy
  • Support for rapid Schema evolution
  • Temporal Data Management
  • (Ref: Data Management for Molecular and Cell
    Biology - Workshop report, www.lbl.gov/olken/wdmbio)

6
Data Types (1/2)
  • Sequences: DNA, RNA and amino-acid (protein)
    sequences. The data has grown enormously due to
    the availability of automated sequencing
    machines and large-scale sequencing projects
    such as the human and mouse genomes.
  • Graphs: biological pathways such as metabolic
    pathways, gene regulatory networks, 3D protein
    structures
  • High-Dimensional Data: micro-array experiments
    (thousands of genes), hundreds of experimental
    conditions, clustering studies on genes, etc.
  • Shapes: 3D molecular structural data augmented
    by chemical distribution, 3D cell morphology
    data

7
Data Types (2/2)
  • Temporal Data: useful for studying the dynamics
    of biological systems, e.g. electrophysiology
    recordings, developmental biology, protein
    structure dynamics, cellular structure dynamics,
    etc.
  • Model Data: representation of biological
    phenomena as computational, mathematical and
    statistical models used for parameter
    estimation, testing, etc. Models should also be
    represented and stored in a queryable format.
  • Scalar and Vector Fields: charge distribution
    across the cell surface, calcium and protein
    fluxes across the cell surface, etc.
  • Extracted Features Data: numerical data
    extracted from one or a combination of the above
    data types

8
Ad hoc Query Requirements
  • Biologists understand the relations across
    different data types, and these relations are
    not necessarily obvious from the database point
    of view. For example:
  • Two labs: one studies dendritic spines of
    pyramidal cells (PC) in the hippocampus, its
    primary schema elements being the anatomical
    entities (dendrites etc.) reconstructed from 3D
    serial sections. The other studies Purkinje
    cells in the cerebellum: branching patterns of
    the dendrites of neurons and protein
    localization in various compartments.
  • Thus a researcher modeling the effects of
    neurotransmission in hippocampal spines would get
    structural information from lab 1 and information
    on calcium-binding proteins found in spines from
    lab 2. Assumption: like PC, Purkinje cells also
    possess dendritic spines, and release of calcium
    in spiny dendrites occurs as a result of
    neurotransmission and causes a change in spine
    morphology. Propagation of calcium signals
    throughout the neuron depends on the morphology
    of the dendrites.
  • Ref: Ludascher B, Gupta A and Martone M, Model
    Based Mediator System for Scientific Data
    Management

9
Uncertainty in the data
  • Biological data has a great deal of uncertainty
    as it represents a biological phenomenon that is
    observed and assumed (based on some evidence) to
    be true.
  • Examples: the spiking behavior of a cell under
    specific stimuli, or a protein sequence in a
    protein database that is based on a partial
    protein report.
  • The uncertainty must also be modeled and recorded
    as part of the data as it has consequences for
    subsequent usage of the data.
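
A minimal Python sketch of recording uncertainty as part of the data, as the last point suggests. The field names (`evidence`, `confidence`) and the 0-to-1 confidence scale are illustrative assumptions and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A recorded value stored together with its uncertainty."""
    name: str          # what was observed, e.g. a parameter name
    value: float       # the recorded value
    evidence: str      # hypothetical free-text note on how the value was obtained
    confidence: float  # hypothetical score in [0, 1]; 1.0 means fully trusted

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


def trusted(observations, min_confidence):
    """Keep only observations whose confidence meets the threshold."""
    return [o for o in observations if o.confidence >= min_confidence]


# Made-up example values, for illustration only.
observations = [
    Observation("spike_count", 12.0, "direct recording", 0.9),
    Observation("spike_count", 7.0, "inferred from partial trace", 0.4),
]
reliable = trusted(observations, min_confidence=0.5)
```

Keeping the confidence next to the value lets later users of the data decide for themselves how much uncertainty they can tolerate.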

10
Requirements for data curation
  • The data is collected across different structural
    and functional boundaries, so there might be many
    missing links and inconsistencies (some
    inconsistencies are due to lack of core domain
    knowledge etc.).
  • Expert intervention is often required for
    cross-correlating the data, filling in missing
    links and/or improving the data consistency.
  • However, large-scale biological databases
    require explicit representation of uncertainties
    and of links across structural/functional
    boundaries in order to support automatic
    curation.

11
Data Annotations
  • Biological data is specific to the purpose of the
    individual performing the data collection.
  • For example, while studying calcium regulation,
    researcher A might adopt a physiological approach
    using patch electrodes, and researcher B may take
    an anatomical approach, mapping different
    isoforms of calcium current to the structure of
    the organelles that express them, etc.
  • Data must be integrated across different animal
    models, different brain regions, different
    species, etc. Furthermore, the experimental
    conditions have a great influence on the
    experimental results.
  • This requires that the data be properly
    annotated during the different stages of data
    collection, with all conditions/parameters
    properly recorded. Furthermore, the assumptions
    made in the experiment should also be recorded.
  • For data derived from primary data, the need for
    proper annotations is even greater.

12
Need for large scale data integration
  • It is very difficult, if not impossible, to
    collect information about various biological
    entities at a single institute or laboratory.
  • Data collected from years of research, across
    different functional and anatomical scales, and
    for normal as well as disease cases is available
    for use.
  • Often this data is poorly annotated and
    inadequately structured, yet contains precious
    information that cannot be ignored.
  • However, the lack of a universal taxonomy or a
    uniform structure presents many challenges for
    effective utilization of this data.
  • While improving the readability of the existing
    databases, it is imperative for new databases to
    adopt proper descriptions, query interfaces and
    annotation frameworks.

13
Other Issues
  • Lack of Taxonomies
  • Schema Evolution
  • Biological Constraints
  • Data Cleaning

14
Part II
  • Neocortical Microcircuit Data

15
The Neocortex and the Cortical Column
[Figure: cortical sheet, neocortex and cortical column, with layers I-VI]

16
  • Key properties of neurons and synapses
    numerically represented as profiles

17
Neuron Profiles: Morphology Data (m-Profile)
  • Neuron 3D reconstructed and converted to
    Neurolucida format
  • Analyzed by a MATLAB based tool to extract a
    vector of 200 values
  • Values can be used to artificially rebuild
    neurons with specific statistical properties
  • Example Parameters
  • TreeLengthMean (mean of lengths of segments with
    same order in each tree)
  • IndivTreeLengthMean (mean of segment lengths in a
    tree)
  • XY_Angle (angle between projection of a segment
    on XY plane and X axis)
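
As an illustration of how parameters of this kind might be computed from a reconstructed segment, here is a small Python sketch. It is not the MATLAB tool mentioned above; the function names, the representation of a segment as a pair of (x, y, z) points, and the signed-degrees angle convention are all assumptions.

```python
import math


def xy_angle(start, end):
    """Angle in degrees between a segment's XY-plane projection and the X axis.

    Assumed convention: signed angle in (-180, 180]. A segment parallel to
    the Z axis has no XY projection; atan2(0, 0) then yields 0.0.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    return math.degrees(math.atan2(dy, dx))


def segment_length(start, end):
    """Euclidean length of a segment given by two (x, y, z) points."""
    return math.dist(start, end)


def mean_segment_length(segments):
    """Mean length over a list of (start, end) segments."""
    lengths = [segment_length(s, e) for s, e in segments]
    return sum(lengths) / len(lengths)


# Two made-up segments of one tree, for illustration only.
segments = [((0, 0, 0), (3, 4, 0)), ((3, 4, 0), (3, 4, 5))]
tree_mean = mean_segment_length(segments)
angle = xy_angle((0, 0, 0), (1, 1, 7))
```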

18
(No Transcript)
19
Neuron Profiles: Electrophysiology Data (e-Profile)
  • Obtained by applying a series of current
    injections to the soma
  • Response measured to obtain a spectrum (140) of
    electrophysiological parameters (EPs)
  • Most parameters sensitive to the ion channel
    composition
  • Raw electrophysiological data also stored
  • Example parameters
  • ADP (after depolarization immediately following
    APs)
  • APThreshhold (threshold of AP generation during a
    ramp polarization)
  • SineSpectrum (various measures of frequency
    filtering by neuron)
  • sAHP (amplitude of hyperpolarization after a
    burst of APs)

20
(No Transcript)
21
Neuron Profiles: Gene Expression Data (g-Profile)
  • Obtained from single-cell multiplex RT-PCR
    studies and single-cell DNA microarray analyses
  • Enables non-quantitative detection of expression
    vs. non-expression of 50 genes
  • The system has been extended to conduct
    single-cell DNA microarray studies to screen for
    over 20,000 genes

22
(No Transcript)
23
Synaptic Profiles: Morphology Data (sm-Profile)
  • Describes the anatomy of a synaptic connection
  • Contains information about the number of
    synapses, their location on the axonal and
    dendritic arbors of the pre- and postsynaptic
    neurons, and axonal and dendritic geometric and
    electrotonic distances
  • Examples
  • Axonal Branch Order (number of branch points
    between the bouton forming the synapse and the
    soma of the source neuron)
  • Dendritic Branch Order (the location of the
    synapse along the dendritic arbor according to
    the branching frequency of the dendritic tree)
  • Geometrical Distance (the distance along the
    dendrite from the synaptic location to the
    postsynaptic soma)
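
The axonal branch order defined above can be sketched as a walk from the bouton back to the soma. The tree representation (a child-to-parent dictionary), the node names, and the choice to exclude both endpoints from the count are assumptions made for this illustration.

```python
from collections import Counter


def branch_order(parent, node, root="soma"):
    """Count branch points strictly between `node` and `root`.

    `parent` maps each node id to its parent id. A branch point is any
    node with two or more children.
    """
    child_count = Counter(parent.values())
    order = 0
    current = parent[node]
    while current != root:
        if child_count[current] >= 2:
            order += 1
        current = parent[current]
    return order


# Hypothetical axon: soma -> a -> b; b splits into c and d; c splits into e and f.
parent = {"a": "soma", "b": "a", "c": "b", "d": "b", "e": "c", "f": "c"}
order_e = branch_order(parent, "e")  # path passes branch points c and b
order_d = branch_order(parent, "d")  # path passes branch point b only
```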

24
(No Transcript)
25
Synaptic Profiles: Electrophysiological Data
(se-Profile)
  • Characterized in terms of biophysical and
    dynamic properties
  • The biophysical properties focus on the
    amplitudes, latencies, rise and decay times,
    synaptic conductances, synaptic charge transfer,
    etc.
  • The dynamic properties include the time-constants
    governing the rates of recovery from synaptic
    depression (D) and facilitation (F) as well as
    the absolute and effective utilization of
    synaptic efficacy parameters
  • Other parameters include estimates of probability
    of release and number of functional release sites

26
(No Transcript)
27
Synaptic Profiles: Pharmacological Data
(sp-Profile)
  • Contains information describing the sensitivity
    of the synaptic connection to different chemicals
  • Described in terms of synapse response to various
    blockers, agonists and antagonists
  • Commonly used chemicals are
  • bicuculline (GABA-a antagonist)
  • APV (NMDA receptor antagonist)
  • CNQX (AMPA receptor antagonist)
  • CGP 35348 (GABA-b antagonist)
  • NMDA (NMDA receptor agonist)
  • diazepam (GABA-a facilitator)

28
Additional Data
  • General Data (g-Profile): animal information,
    brain region information, experiment information
  • Model Data (mod-Profile): a complete NEURON model
    that will include active properties (by
    inclusion of ion channel constellations and
    parameters), the electrical properties of the
    neurons, and possible ion channel constellations
  • Canonical Data (x-Profile): statistical analysis
    of stored neurons and synapses; these simplified
    models are to be used for visualization,
    simulation, etc.

29
Part III
  • Databasing Neocortical Microcircuit

30
Blue Brain Project - Goals
  • Gather and share raw data about different facets
    of the neurons
  • Develop biologically accurate models of neurons
    and their interactions
  • Obtain a biologically accurate simulation of the
    cortical column
  • Develop visualization tools on the simulation to
    perform in silico biology studies

31
Blue Brain Data Usage
[Diagram: data flow among Experiments, Database, Models, Simulation and Visualization. Arrow labels: "Produces data", "Is stored", "Produces statistics on experiments", "Are used for", "Test against experimental data".]

32
Overview of current system
33
Storage Resource Broker
  • Distributed data storage
  • Uniform access to a variety of data sources
  • Intuitive file system-like interface with the
    data
  • Ability to annotate data with metadata
  • Data access management
  • Security management

34
SRB Federation architecture
35
SRB Server Architecture
36
The Metadata Catalog (MCAT)
  • Stores and manages the information about an
    SRB system:
  • Physical and connection information about the
    data sources
  • User information and privileges
  • Logical and physical mapping to the files
  • Files metadata
  • System metadata
  • User-defined metadata
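
To make the catalog's role concrete, here is a toy in-memory stand-in written in Python. It is not the SRB or MCAT API; the class, method names, paths and metadata keys are all hypothetical and only illustrate the logical-to-physical mapping and the two kinds of metadata listed above.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    physical_location: str  # where the file actually lives (hypothetical format)
    system_metadata: dict = field(default_factory=dict)
    user_metadata: dict = field(default_factory=dict)


class ToyCatalog:
    """In-memory illustration of a metadata catalog; not the SRB/MCAT API."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_path, physical_location, **system_metadata):
        """Map a logical path to a physical location with system metadata."""
        self._entries[logical_path] = CatalogEntry(
            physical_location, dict(system_metadata)
        )

    def annotate(self, logical_path, key, value):
        """Attach a user-defined metadata item to a registered file."""
        self._entries[logical_path].user_metadata[key] = value

    def resolve(self, logical_path):
        """Return the physical location behind a logical path."""
        return self._entries[logical_path].physical_location

    def find(self, key, value):
        """Logical paths whose user metadata contains key == value."""
        return sorted(
            path for path, entry in self._entries.items()
            if entry.user_metadata.get(key) == value
        )


catalog = ToyCatalog()
catalog.register("/project/morph/cell01.asc", "disk-a:/vol1/0001", size=2048)
catalog.register("/project/ephys/cell01.dat", "disk-b:/vol7/0042", size=90112)
catalog.annotate("/project/morph/cell01.asc", "layer", "V")
layer_v_files = catalog.find("layer", "V")
```

Users work only with logical paths and annotations; the physical location can change without affecting them, which is the point of keeping the mapping in a catalog.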

37
Current Database Infrastructure
  • Uses SRB for storing the primary data
  • Customized tools for data upload, annotation and
    download
  • Metadata about primary data is stored in the
    metadata catalog
  • Secondary databases storing morphology,
    electrophysiology and gene expression data
  • Specialized database for storing extracted
    features

38
Current Set of Databases
  • Morphology Data - structure of neurons
  • Electrophysiology Data - recordings from patch
    clamp system, response of neurons to stimuli
  • Gene Expression Data - expression of specific
    genes. Started to collect DNA microarrays
  • Feature DB - primary features extracted from
    electrophysiology data
  • Synapse DB - properties of synaptic connections,
    release probabilities, conductances, etc.
  • Index DB - a central bookkeeping and
    synchronization system
  • Microcircuit DB - a description of the
    microcircuit with 10,000 neurons and 2 million
    connections
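
One possible shape for the bookkeeping role of an index database, sketched with Python's built-in `sqlite3` module. The schema, table names, identifiers and keys are hypothetical; the presentation does not describe the actual Index DB design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE neuron (
    neuron_id TEXT PRIMARY KEY,
    layer     TEXT
);
CREATE TABLE record (
    neuron_id  TEXT REFERENCES neuron(neuron_id),
    source_db  TEXT,  -- which specialized database holds the data
    record_key TEXT   -- key of the entry inside that database
);
""")

# Made-up entries: one neuron with data in two specialized databases.
conn.execute("INSERT INTO neuron VALUES (?, ?)", ("n001", "V"))
conn.executemany(
    "INSERT INTO record VALUES (?, ?, ?)",
    [
        ("n001", "morphology", "morph-17"),
        ("n001", "electrophysiology", "ep-203"),
    ],
)
conn.commit()


def databases_for(connection, neuron_id):
    """List the databases that hold data for a given neuron."""
    rows = connection.execute(
        "SELECT source_db FROM record WHERE neuron_id = ? ORDER BY source_db",
        (neuron_id,),
    )
    return [row[0] for row in rows]


sources = databases_for(conn, "n001")
```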

39
[Diagram: Morph DB, Synapse DB, EP DB, Gene Expr DB, Index DB and Feature DB, together with the SRB System; storage labels: Files, Relational DB]

40
Part IV
  • Conclusion and Discussions

41
Reapplying challenges
  • Diversity - morph, electro, gene, statistics,
    models
  • Unconventional Queries - need for expert
    knowledge to draw relations
  • Uncertainty in the data
  • Need for annotations, taxonomy, metadata (a
    person doing experiments on PC does not record
    the type of cell in their lab book)
  • Schema Evolution
  • Need for integration

42
Conclusion
  • A basic framework for storing neocortical
    microcircuit data (primary) as well as metadata
  • Tools for data upload, consistency checking, data
    download and browsing
  • Capability to store different types of data,
    e.g. images, recordings, ASCII files, etc.
  • Hierarchical database infrastructure supporting
    secondary and specialized databases
  • Flexible structure catering for new data types

43
Conclusion (in progress)
  • Using a standard taxonomy and ontologies to
    facilitate integration among the various
    databases and with external databases
  • Data Integration across multiple databases for
    supporting experimentalists
  • Development of efficient and user friendly
    interaction environments
  • Knowledge based Query Environment
  • Scalability Issues

44
Thank you