A Design to Integrate Heterogeneous Microarray Databases

Provided by: jon9154

Title: A Design to Integrate Heterogeneous Microarray Databases


1
A Design to Integrate Heterogeneous Microarray Databases
  • Abhishek Dabral
  • Jonathan Gavenda

2
Background
  • What are Microarrays?
  • Generally speaking, DNA expression chips
  • So what is the problem?
  • 1000s of genes/chip × 1000s of chips = vast
    amount of information
  • Heterogeneous databases

3
Our goal
  • is the creation of an idealized system that
    actively identifies data sources of interest,
    automatically overcomes syntactic and semantic
    heterogeneities wherever it discovers them, and
    provides transparent declarative, optimized query
    access over all sources.
  • Specifically targeted for microarray domains
  • Benefits: cost- and time-efficient

4
First Step: Standardization
  • Microarray Gene Expression Data (MGED) Society
  • Minimum Information About a Microarray Experiment
    (MIAME)
  • Microarray and Gene Expression (MAGE) group

5
Methodology
  • Step 1: The schemas are extracted from the data
    sources.
  • Step 2: Tokenization. The table name along with
    the attribute forms the token for the schema.
    This is carried out for every table in each of
    the schemas.
  • Step 3: An ontology is constructed from these
    schemas.
  • Step 4: The object classes from Step 2 are
    clustered into class clusters.
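
Steps 1 and 2 above can be sketched as follows. This is only a minimal illustration, assuming the sources are relational databases reachable through SQLite; the slides do not say which database systems are involved, and the table names, column names, and the `table.attribute` token format are hypothetical.

```python
import sqlite3

def extract_schema(conn):
    """Step 1: return {table_name: [column_name, ...]} for one SQLite database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        schema[table] = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    return schema

def tokenize(schema):
    """Step 2: one token per (table, attribute) pair, written 'table.attribute'.

    The token format is an assumption made for this sketch.
    """
    return [f"{table}.{col}".lower() for table, cols in schema.items() for col in cols]

# Hypothetical source database standing in for one microarray repository.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Experiment (exp_id INTEGER, organism TEXT)")
conn.execute("CREATE TABLE ArrayDesign (array_id INTEGER, platform TEXT)")

tokens = tokenize(extract_schema(conn))
print(tokens)
```

Running the same two functions against every source would give one token list per schema, which is the input that the later ontology and clustering steps work on.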

6
Methodology
  • Step 5: The clusters obtained from Step 4 are
    treated as base sets and their ontologies are
    traced and the common features are examined to
    arrive at an ontology base set per cluster.
  • Step 6: The ontologies in the ontology base set
    from Step 5 are integrated to obtain ontology
    clusters. The result of this step is a cluster
    of related metadata terms grouped together.
  • Step 7: The ontology clusters are then named
    accordingly and recorded in the metadata updater
    to be used in the next iteration.
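
The clustering and integration steps (Steps 4 to 7) might look roughly like the sketch below. The slides do not specify the clustering method, so a greedy grouping by Jaccard term overlap is swapped in here purely for illustration. The tokens, the threshold, the naming rule, and the `metadata_updater` dictionary are all hypothetical stand-ins.

```python
def terms(token):
    """Split a 'table.attribute' token into its lowercase terms."""
    return set(token.lower().replace(".", "_").split("_"))

def jaccard(a, b):
    """Share of terms that two term sets have in common."""
    return len(a & b) / len(a | b)

def cluster(tokens, threshold=0.3):
    """Step 4 (assumed method): greedy single-pass grouping by term overlap."""
    clusters = []
    for tok in tokens:
        for members in clusters:
            if any(jaccard(terms(tok), terms(m)) >= threshold for m in members):
                members.append(tok)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([tok])
    return clusters

def integrate(members):
    """Steps 5-6: shared terms form the base set; all terms form the ontology cluster."""
    term_sets = [terms(m) for m in members]
    return set.intersection(*term_sets), set.union(*term_sets)

metadata_updater = {}  # Step 7: hypothetical stand-in for the metadata updater

# Hypothetical tokens from two source schemas.
tokens = ["experiment.exp_id", "hybridization.exp_id",
          "array.platform_name", "chip.platform_name"]
for members in cluster(tokens):
    common, related = integrate(members)
    name = "_".join(sorted(common)) or members[0]  # name the cluster by its shared terms
    metadata_updater[name] = sorted(related)

print(metadata_updater)
```

In a later iteration, the recorded cluster names could be used to match incoming tokens before any new clustering is attempted, which is one reading of "to be used in the next iteration".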

7
Architecture
  • Semantically Enhanced Enterprise Directory
    Services (SEEDS)

Specific to the microarray domain, the
preliminary goal of the microSEEDS architecture is
to minimize semantic heterogeneity through proactive
promotion of semantic homogeneity.
8
Selected References
  • Microarray Gene Expression Data Society (MGED),
    http://www.mged.org
  • [Jagadish and Olken, 2003] H. V. Jagadish and
    F. Olken, Database Management for Life Science
    Research: Summary Report of the Workshop on Data
    Management for Molecular and Cell Biology at the
    National Library of Medicine, Bethesda, Maryland,
    February 2-3, 2003, OMICS: A Journal of
    Integrative Biology, 7(1), 2003.
  • [Stekel, 2003] D. Stekel, Microarray
    Bioinformatics, Cambridge University Press, 2003.
  • [Kohonen et al., 2000] T. Kohonen, S. Kaski,
    K. Lagus, J. Salojärvi, J. Honkela, V. Paatero
    and A. Saarela, Self-Organizing Maps of Massive
    Document Collections, Proceedings International
    Joint Conference Neural Networks, IEEE,
    Piscataway, NJ, USA, 23-9, 2000.
  • [Li et al., 2004] Lei Li, Vijay Vaishnavi, and
    Art Vandenberg. "An Architecture for Semantic
    Facilitation and Reuse of Directory Metadata",
    Proc. 2004 International Conference on
    Information and Knowledge Engineering (IKE'04,
    June 21-24, 2004, Las Vegas, Nevada, USA), to
    appear.
  • Liang, J., Vaishnavi, V., and Vandenberg, A.
    Clustering of LDAP Directory Schemas to
    Facilitate Information Resources
    Interoperability Across Organizations. IEEE
    Transactions on Systems, Man, and Cybernetics,
    Part A (to appear).
  • Liu, Y., Ciliax, B. J., Borges, K., Dasigi, V.,
    Ram, A., Navathe, S., and Dingledine, R.
    Comparison of Two Schemes for Automatic Keyword
    Extraction from MEDLINE for Functional Gene
    Clustering. IEEE Conf. on Computational Systems
    Bioinformatics (CSB'2004), August 2004.
  • Vandenberg, A., Liang, J., Bolet, V., Kou, H.,
    Vaishnavi, V., and Kuechler, D. Research
    Prototype: Semantic Facilitator TM SM for LDAP
    Directory Services. Proceedings of the 12th
    Annual Workshop on Information Technologies and
    Systems, 2002.