G2G - PowerPoint PPT Presentation

1
G2G
  • A Peer-to-Peer Architecture for Gene Expression
    Data

Jason E. Stewart, Open Informatics
www.OpenInformatics.com
2
How I got Here
  • Frustrated biologist
  • DNA seq (Mac), Staden (Unix/Sun), EMBL (VAX/VMS)
  • camel.tar.gz
  • Unix Power Tools -> Perl
  • Lincoln Stein (DAS, etc.)
  • www.OpenInformatics.com

3
Prophecy
  • Distributed data
  • It's already here
  • GenBank/TrEMBL/SwissProt ...
  • WormBase/TAIR/FlyBase ...
  • Microarray data has no central DB
  • ArrayExpress/GEO
  • How can we make it most effective?
  • Efforts of both biologists and technologists

4
Punchline
  • Ontologies to unify queries
  • Controlled Vocabularies not AI
  • Object Models (UML)
  • Open Source reference implementations
  • Wrong-Is-Right
  • Queries
  • Language: XML format, parser
  • Use cases: In rats treated with CFB at 250 mg per
    day and pooled, which genes have changed more
    than 2-fold compared to control rats?
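
The use case above can be sketched as a small filter. This is only an illustration, assuming pooled intensity values are already available per gene; the gene names, the numbers, and the `changed_genes` helper are hypothetical and not part of G2G or GeneX.

```python
from statistics import mean

# Hypothetical pooled intensities per gene: (treated replicates, control replicates).
measurements = {
    "geneA": ([820.0, 790.0], [400.0, 380.0]),
    "geneB": ([150.0, 160.0], [155.0, 150.0]),
    "geneC": ([90.0, 110.0], [420.0, 380.0]),
}

def changed_genes(data, threshold=2.0):
    """Return {gene: ratio} for genes whose mean treated/control ratio
    differs from 1 by more than `threshold`-fold, up or down."""
    hits = {}
    for gene, (treated, control) in data.items():
        ratio = mean(treated) / mean(control)
        if ratio > threshold or ratio < 1.0 / threshold:
            hits[gene] = ratio
    return hits

result = changed_genes(measurements)
```

A real query would also have to select the right samples first (species, compound, dose, pooling), which is where the controlled vocabularies come in.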

5
Payoff
  • XML-based markup language
  • Programming APIs: Java, Perl, C
  • Relational Schema

6
G2G Architecture
  • Enables Gene Expression data queries and
    downloads from remote data sources
  • Primary focus: data discovery
  • Technologies
  • HTTP/XML
  • SOAP/WSDL/UDDI
  • MOBY
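
A minimal sketch of the HTTP/XML option, assuming a peer accepts an XML query document by POST and answers with a list of matching experiments. The element names (`g2g_query`, `term`, `g2g_response`, `experiment`) and the peer URL are invented for illustration; they are not MAGE-ML and not the actual G2G wire format. Nothing is sent over the network here.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_query(species, treatment):
    """Serialize a data-discovery query as XML bytes (hypothetical format)."""
    root = ET.Element("g2g_query")
    ET.SubElement(root, "term", name="species").text = species
    ET.SubElement(root, "term", name="treatment").text = treatment
    return ET.tostring(root, encoding="utf-8")

def make_request(peer_url, payload):
    """Prepare (but do not send) an HTTP POST carrying the query."""
    return urllib.request.Request(
        peer_url,
        data=payload,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )

def parse_response(xml_bytes):
    """Extract (id, url) pairs for the experiments a peer reports."""
    root = ET.fromstring(xml_bytes)
    return [(e.get("id"), e.get("url")) for e in root.findall("experiment")]

payload = build_query("Rattus norvegicus", "CFB")
request = make_request("http://peer.example.org/g2g", payload)

# A canned response standing in for what a peer might return.
sample_response = (
    b'<g2g_response>'
    b'<experiment id="exp1" url="http://peer.example.org/data/exp1.xml"/>'
    b'</g2g_response>'
)
experiments = parse_response(sample_response)
```

Passing `request` to `urllib.request.urlopen` would perform the actual exchange against a live peer.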

7
Data in the Big
  • 88 hybridization Mouse experiment
  • measured data (70Mb), XML ?? Gb
  • Number of data sets (few vs. many)
  • Contextual data (little vs. large)
  • Services (BLAST vs. Clustering)
  • Conclusion: GE resources will differ from
    DNA-sequence-oriented ones

8
P2P Metaphor
  • G2G not P2P (Napster, SETI@home)
  • Uses DNS and central service directory
  • Advantages
  • Distribution of resources (CPU, bandwidth)
  • Distribution of services
  • Distribution of annotation (richer)
  • Disadvantages
  • Authoritative source ??
  • Peer review ??
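
The central service directory mentioned above can be pictured as a registry that maps a service name to the peers offering it. The class and the peer URLs below are hypothetical, a sketch of the idea rather than how G2G actually implements its directory.

```python
from collections import defaultdict

class ServiceDirectory:
    """Toy central directory: service name -> set of peer URLs."""

    def __init__(self):
        self._peers = defaultdict(set)

    def register(self, service, peer_url):
        # A peer announces that it offers the named service.
        self._peers[service].add(peer_url)

    def lookup(self, service):
        # Return peers in a stable order; unknown services yield an empty list.
        return sorted(self._peers.get(service, ()))

directory = ServiceDirectory()
directory.register("query", "http://lab-a.example.org/g2g")
directory.register("query", "http://lab-b.example.org/g2g")
directory.register("clustering", "http://lab-b.example.org/g2g")

query_peers = directory.lookup("query")
```

A client would look up the peers once and then contact each one directly, so data and CPU load stay distributed while discovery stays central.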

9
Implementing G2G
  • Infrastructure
  • GeneX
  • MAGE/MAGEstk
  • Perl/LWP/Apache/mod_perl
  • Issues
  • Query and response formats (MAGE-ML)
  • Creating clients and servers (GeneX)
  • Only supports simple queries

10
GeneX (genex.sf.net)
  • Open Source gene expression DB
  • Laboratory data
  • Analysis framework
  • Components
  • Relational schema
  • Data Server Query tools
  • Analysis tools
  • Perl API
  • Active development, two NSF grants

11
MGED (www.mged.org)
  • Industry and Academia
  • Standards for microarray data
  • MIAME (published in Nature Genetics)
  • MAGE (mged.sf.net)
  • Ontologies (mged.sf.net)
  • Queries
  • Normalization

12
Review of MIAME
  • In MIAME
  • Samples: Samples used, the extract preparation
    and labeling
  • Hybridizations: Procedures and parameters
  • Array design: Each array used and each element
    (spot) on the array
  • Measurements: Images, quantitation,
    specifications
  • Controls: Types, values, specifications
  • Experimental design: The set of the hybridization
    experiments as a whole
  • In MAGE-OM
  • BioMaterial
  • BioAssay
  • ArrayDesign
  • BioAssayData
  • ExperimentDesign

13
MAGE-OM
  • Collaborative effort to develop a common
    representation of gene expression experiments and
    associated annotations.
  • Industry: Rosetta, Agilent, Affymetrix
  • Academic: MGED
  • UML-based model gives rise to:
  • MAGE-ML: XML DTD
  • MAGEstk: software toolkit
  • Java, Perl, C programming language APIs
  • XML Reader/Writer
  • DB Serializer (future)

14
MAGE-OM Packages
  • AuditAndSecurity
  • Description
  • BioEvent
  • BioMaterial
  • BioSequence
  • HigherLevelAnalysis
  • Protocol
  • ProtocolApplication

  • Experiment
  • BioAssay
  • BioAssayData
  • ArrayDesign
  • DesignElement
  • Maps
  • Array
  • QuantitationType

15
Demo
16
Issues
  • User control of local resources
  • What queries should be supported
  • What services should be supported
  • Simple lists of data (current)
  • Analyses
  • Complex queries with subsetting on genes and
    context
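
To make "subsetting on genes and context" concrete, here is a small sketch. It assumes each data set carries a dictionary of context annotations (controlled-vocabulary terms) plus per-gene values; the data sets, field names, and the `subset` helper are all hypothetical.

```python
# Hypothetical data sets: context annotations plus per-gene expression values.
datasets = {
    "exp1": {
        "context": {"species": "rat", "tissue": "liver"},
        "values": {"geneA": 2.1, "geneB": 0.9, "geneC": 0.3},
    },
    "exp2": {
        "context": {"species": "mouse", "tissue": "liver"},
        "values": {"geneA": 1.0, "geneB": 1.2},
    },
}

def subset(data, context, genes):
    """Keep data sets whose annotations match every requested context term,
    then keep only the requested genes within each match."""
    wanted = set(genes)
    selected = {}
    for name, ds in data.items():
        if all(ds["context"].get(k) == v for k, v in context.items()):
            selected[name] = {g: x for g, x in ds["values"].items() if g in wanted}
    return selected

hits = subset(datasets, {"species": "rat"}, ["geneA", "geneC"])
```

Exact matching on context terms only works if peers agree on the vocabulary, which is the role of the shared ontologies.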

17
How to Help
  • MGED (mged.org)
  • Ontologies (mged.sf.net, geneontology.org)
  • MAGE (mged.sf.net)
  • GeneX, G2G (genex.sf.net)

18
Acknowledgments
  • MGED
  • Alvis Brazma
  • Terry Gaasterland
  • MAGE
  • Paul Spellman
  • Michael Miller
  • Charles Troup
  • Steve Chervitz, Derek Bernhardt
  • Eric Deutsch, Robert Hubley
  • Angel Pizarro
  • GeneX
  • Jennifer Weller
  • Harry Mangalam
  • Karen Schlauch
  • Michael Pear

19
Radical Subversion
  • Publicly funded software should be Public
  • www.OpenInformatics.org