Content, Format, and Standards in Genomics Scale Data - PowerPoint PPT Presentation

About This Presentation
Title:

Content, Format, and Standards in Genomics Scale Data

Description:

Why do we need a database for toxicogenomics? How is it envisioned that this ... sequence is required to resolve discrepancies through automated bioinformatics ...

Slides: 37
Provided by: william323
Learn more at: https://www.fda.gov

Transcript and Presenter's Notes


1
Content, Format, and Standards in Genomics Scale Data
  • The ILSI EBI Collaboration
  • Wm. B. Mattes, PhD, DABT

2
Outline
  • Why do we need a database for toxicogenomics
  • How is it envisioned that this will be developed
  • What are the issues for such a database
  • Who is involved in such development
  • The ILSI EBI Collaboration

3
Traditional Biology
One tree at a time
4
Omic Biology
Forests and Mountains
5
Challenge of Genomics
  • It's the informatics, period!
  • And it's awfully tempting to take shortcuts!

6
Why do we need a database?
  • Volume of data
  • Traditional endpoints per animal
  • <20 histopathology observations
  • <10 gross measurements (e.g. weights, food)
  • <25 serum measurements
  • <10 urinalysis measurements
  • Genomic endpoints per animal
  • 5,000-10,000 transcripts!!!

7
Why do we need a database? (cont)
  • Influence of technology details
  • Influence of probe sequence
  • Many genes are alternatively spliced; such
    events may not be detected unambiguously by a
    microarray

8
Influence of Probe Sequence
Most arrays target this region of the mRNA!
9
Why do we need a database? (cont)
  • Influence of technology details
  • Influence of probe sequence
  • Many genes are alternatively spliced; such
    events may not be detected unambiguously by a
    microarray
  • For cDNA arrays, probes may hybridize to more
    than one sequence
  • A database that captures probe sequence is
    required to resolve discrepancies through
    automated bioinformatics
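
As a rough illustration of the automated check this slide calls for, the Python sketch below looks up which transcripts contain a given probe sequence. The transcript names and sequences are invented for the example, and exact substring matching is an assumption standing in for a real sequence-alignment tool.

```python
from typing import Dict, List

# Hypothetical transcript sequences, invented for illustration only.
TRANSCRIPTS: Dict[str, str] = {
    "geneA_variant1": "ATGGCCTTAGGCTAACCGGTTAAGC",
    "geneA_variant2": "ATGGCCTTAGGCTAACC",
    "geneB": "TTGACCGGTTAAGCATCG",
}


def transcripts_hit_by(probe: str, transcripts: Dict[str, str]) -> List[str]:
    """Return the sorted names of all transcripts that contain the probe."""
    return sorted(name for name, seq in transcripts.items() if probe in seq)


def is_ambiguous(probe: str, transcripts: Dict[str, str]) -> bool:
    """A probe is ambiguous if it matches more than one transcript."""
    return len(transcripts_hit_by(probe, transcripts)) > 1
```

With these made-up sequences, a probe shared by both splice variants of geneA, or by geneA and geneB, is flagged as ambiguous, while a probe unique to geneB is not. Without the stored probe sequence, none of these cases could be told apart.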

10
How are databases being developed?
  • Microarray Gene Expression Data Society - MGED
    Society
  • MIAME - Minimum Information About a Microarray
    Experiment
  • the minimum information that should be reported
    about a microarray experiment to enable its
    unambiguous interpretation and reproduction
  • Essentially, what should go into the database

11
How are databases being developed?
  • MIAME Basic Areas
  • Experiment Design
  • Samples used, extract preparation and labeling
  • Hybridization procedures and parameters
  • Measurement data and specifications
  • Array Design
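
One way to picture the five areas listed above is as a checklist applied to each submission. The Python sketch below is only an illustration: the key names are hypothetical labels chosen for this example and are not taken from the MIAME document itself.

```python
from typing import Dict, List

# The five areas named on this slide; the key names are hypothetical.
MIAME_AREAS = (
    "experiment_design",
    "samples_extract_labeling",
    "hybridization",
    "measurements",
    "array_design",
)


def missing_miame_areas(submission: Dict[str, object]) -> List[str]:
    """List the areas that are absent or empty in a submission."""
    return [area for area in MIAME_AREAS if not submission.get(area)]
```

A submission that returns an empty list covers every area; anything else names the gaps that would need to be filled before the experiment could be interpreted unambiguously.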

12
How are databases being developed? (cont)
  • MGED Society
  • MAGE
  • Programming conventions and data structures to
    communicate Microarray Gene Expression data
  • MAGE-OM Object Model
  • MAGE-ML Markup Language
  • Essentially, how the data are exchanged and how
    the database is constructed

13
How are databases being developed? (cont)
  • MGED Society
  • Ontology working group
  • Ontologies provide a vocabulary for representing
    and communicating knowledge about a
    topic, allowing interpretation and use by
    computers
  • MGED Ontology will provide standard terms for the
    annotation of microarray experiments, allowing
  • structured queries
  • unambiguous descriptions of experiments
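
To show why standard terms make structured queries possible, the Python sketch below maps free-text annotations onto a small controlled vocabulary before querying. The synonym table and experiment records are invented for this example; they are not actual MGED Ontology terms.

```python
from typing import Dict, List

# Hypothetical synonym table; these are NOT actual MGED Ontology terms.
CONTROLLED_TERMS: Dict[str, str] = {
    "liver": "liver",
    "hepatic tissue": "liver",
    "kidney": "kidney",
    "renal tissue": "kidney",
}


def annotate(free_text: str) -> str:
    """Map a free-text annotation to its controlled term, or fail loudly."""
    key = " ".join(free_text.lower().split())  # normalize case and spacing
    if key not in CONTROLLED_TERMS:
        raise ValueError(f"no controlled term for {free_text!r}")
    return CONTROLLED_TERMS[key]


def query_by_organ(experiments: List[Dict[str, str]], organ: str) -> List[str]:
    """Return the ids of experiments whose organ maps to the given term."""
    return [e["id"] for e in experiments if annotate(e["organ"]) == organ]
```

Rejecting unknown terms, rather than storing them as typed, is a design choice made here so that every stored annotation stays queryable.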

14
How are databases being developed? (cont)
  • MGED Society
  • Data Transformation and Normalization Working
    Group
  • Standards for recording how microarray data are
    transformed and normalized.
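
A minimal Python sketch of what recording a transformation could look like: the function below log2-transforms expression ratios, median-centers them, and returns a record of the steps alongside the values. The choice of method and the record field names are assumptions made for this example, not a format defined by the working group.

```python
import math
from statistics import median
from typing import Dict, List, Tuple


def log2_median_center(ratios: List[float]) -> Tuple[List[float], List[Dict[str, object]]]:
    """Log2-transform ratios, subtract their median, and record both steps."""
    logged = [math.log2(r) for r in ratios]
    center = median(logged)
    values = [x - center for x in logged]
    # Hypothetical provenance record kept with the data.
    record = [
        {"step": "log2_transform"},
        {"step": "median_center", "subtracted": center},
    ]
    return values, record
```

Keeping the record next to the values lets a later user see exactly how the stored numbers were derived from the raw measurements.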

15
What are the issues for a toxicogenomics database?
  • Scope of the ILSI effort
  • Genotoxicity Group
  • 10 array platforms
  • 11 compounds
  • >2 time points, up to 10 doses / compound
  • Nephrotoxicity Group
  • 6 array platforms
  • 3 compounds, 260 animals

16
What are the issues for a toxicogenomics database?
  • Scope of the ILSI effort
  • Hepatotoxicity Group
  • 8 array platforms
  • 2 compounds, 144 animals
  • 2 in-life studies / compound
  • ALL Groups
  • Analysis of each sample at multiple sites

17
What are the issues for toxicogenomics databases? (cont)
  • Traditional toxicology endpoints are not
    currently covered by MAGE, MIAME, or the MGED
    Ontologies!
  • Organ weights
  • Clinical pathology
  • Histopathology
  • Etc

18
What are the issues for toxicogenomics databases?
  • Traditional toxicology endpoints are not
    standardized in nomenclature
  • Clinical pathology/chemistry
  • AACC
  • IUPAC
  • Histopathology
  • STP
  • WHO/IARC/RITA
  • NACAD
  • SNOMED
  • NTP, TDMS Database Pathology Code Table

19
Who is involved in database development?
  • Private Companies
  • Gene Logic, Iconix, CuraGen
  • MSU - dbZach
  • NIEHS - CEBS
  • NCTR - ArrayTrack
  • ILSI - EBI

20
ILSI-HESI and EBI collaboration
  • Establishment of database for toxicogenomics data
  • Capture, store and analyse gene expression data
    produced from many different toxicogenomic
    experiments, conducted in several different
    laboratories worldwide by the ILSI-HESI members
  • Interrogate the gene array data integrating
    information from genomic, experimental and
    toxicological domains
  • Gain knowledge of possible links between gene
    expression changes and toxicological endpoints

21
ILSI-HESI and EBI collaboration
  • Aims of the database and tools
  • Provide a way to integrate the different domains
  • Control the annotation to achieve data
    harmonization
  • Centralize the information to ease data access
    and data sharing
  • Improve array annotations as the genome
    assemblies are released
  • ALLOW data comparison

22
ILSI-HESI and EBI collaboration
  • Main challenge
  • Get internally consistent data to allow
    comparability among the experiments and run
    complex queries across and within domains
  • Note: Experiments were conducted at 40 different
    sites, using different array platforms and
    terminologies, measuring parameters with
    different units and storing information in
    different formats!
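
Unit differences are one concrete piece of this challenge. The Python sketch below converts weights reported in different units to grams before they are stored. The set of units and the choice of grams as the common unit are assumptions made for this example.

```python
from typing import Dict

# Conversion factors to grams; the supported units are an assumption.
TO_GRAMS: Dict[str, float] = {"g": 1.0, "mg": 0.001, "kg": 1000.0}


def to_grams(value: float, unit: str) -> float:
    """Convert a weight to grams, rejecting units that are not recognized."""
    key = unit.strip().lower()
    if key not in TO_GRAMS:
        raise ValueError(f"unrecognized unit: {unit!r}")
    return value * TO_GRAMS[key]
```

Once every site's values are stored in one unit, a liver weight from one laboratory can be compared directly with one from another.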

23
ILSI-HESI and EBI collaboration
  • Simple question
  • Does gene X expression go up after treatment
    with compound Y with biological endpoint Z in
    experiments from ILSI-HESI members A and B?
  • Not-so-simple question
  • Which are the most reproducible gene expression
    changes (and the quantitative measure of this
    reproducibility) for all experiments on the rat
    arrays, with biological endpoint X; which
    functional categories do these genes belong to;
    and which are their human homologues?
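
Once the data are harmonized, the simple question above reduces to a filter over stored records. The Python sketch below assumes a flat record layout with hypothetical field names and invented values; it is not the actual database schema.

```python
from typing import Dict, List

# Invented records; field names and values are hypothetical.
RECORDS: List[Dict[str, object]] = [
    {"gene": "X", "compound": "Y", "endpoint": "Z", "member": "A", "log2_ratio": 1.2},
    {"gene": "X", "compound": "Y", "endpoint": "Z", "member": "B", "log2_ratio": 0.8},
    {"gene": "X", "compound": "Y", "endpoint": "Z", "member": "C", "log2_ratio": -0.3},
    {"gene": "W", "compound": "Y", "endpoint": "Z", "member": "A", "log2_ratio": 2.0},
]


def goes_up(records, gene, compound, endpoint, members) -> bool:
    """True if every listed member has matching records and all show an increase."""
    by_member = {m: [] for m in members}
    for r in records:
        matches = (r["gene"], r["compound"], r["endpoint"]) == (gene, compound, endpoint)
        if matches and r["member"] in by_member:
            by_member[r["member"]].append(r["log2_ratio"])
    # A member with no matching record counts as "not confirmed".
    return all(bool(vals) and all(v > 0 for v in vals) for vals in by_member.values())
```

The filter only works because gene, compound, endpoint and member are recorded the same way at every site, which is the point of the harmonization effort. The not-so-simple question would additionally need a reproducibility statistic, functional annotation and homology data.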

24
MIAME/Tox
  • An international effort aiming to
  • Share expertise
  • Encourage harmonization
  • Promote standardization initiatives
  • A call for community participation!

25
MIAME/Tox objectives
  • Standard contextual information
  • Establish worldwide scientific consensus on the
    minimal information descriptors for array-based
    toxicogenomics experiments
  • Data harmonization
  • Encourage use of controlled vocabularies for the
    toxicological assessments
  • Data integration and data sharing
  • Link data within a study
  • Link several studies from one institution
  • Exchange datasets among institutions
  • Data storage
  • Facilitate development of MIAME/Tox compliant
    data management software and databases
  • - ArrayExpress @ EBI and CEBS @ NIEHS-NCT

26
MIAME/Tox document
  • Promote standard contextual information
  • Defining the core common to most experiments
  • - Minimum/sufficient information
  • Structured information
  • Promote data harmonization, data capture and
    communication
  • MIAME/Tox is based on MIAME
  • Focus on toxicological domain
  • Sample treatment and conventional toxicology
    information
  • - Clinical pathology, pathology, histopathology

27
MIAME/Tox document
  • Available at the MGED Society and ILSI-HESI web
    sites
  • Circulate for consensus
  • Toxicogenomics, pharmacogenomics and
    ecotoxicogenomics communities
  • - Regulatory bodies
  • MGED Meetings (AAAS, Denver, Feb 2003; MGED6,
    France, Sept 2003)
  • - Toxicology societies (SOT Meeting, Salt Lake
    City, March 2003)
  • Review and publish
  • Work closely with the MGED working groups
  • Ontology working group
  • Identify controlled vocabularies for
    toxicological metadata

28
Data Input As a Key Step
  • Capture data in a standard manner
  • Tox-MIAMExpress
  • Store information domains in database
  • ArrayExpress
  • Compare/query across and within domains

29
Tox-MIAMExpress
30
Tox-MIAMExpress
  • Array designs
  • A set of procedures for formatting the array
    design information into a standard referencing
    format (ADF)
  • A set of procedures to re-annotate or update the
    array designs via a link to another database at
    EBI (EnsMart)

31
Tox-MIAMExpress
  • Experiment
  • Experiment design, quality controls,
    publications
  • Sample source and treatment
  • Conventional toxicology test data
  • Microarray hybridization data

32
Tox-MIAMExpress
33
Tox-MIAMExpress
34
Tox-MIAMExpress
35
ILSI-HESI and EBI collaboration
  • Status
  • Interface and database infrastructure developed
  • Data input ongoing

36
Acknowledgments
  • Microarray Informatics Team at EBI, in particular
  • Alvis Brazma (Team Leader and MGED Society
    President)
  • Susanna-Assunta Sansone
  • Philippe Rocca-Serra (Data Management)
  • NIEHS-NCT and NTP
  • ILSI-HESI EBI Steering Committee
  • ILSI-HESI Genomics Committee