Community Standards and Comparative Genomics

Transcript and Presenter's Notes

Title: Community Standards and Comparative Genomics


1
Community Standards and Comparative Genomics
  • Doreen Ware, USDA-ARS
  • Cold Spring Harbor Laboratory
  • Plant Ontology Workshop May 30, 2006

2
Outline
  • Introduction to Gramene
  • Data integration: why is it important?
  • Example using Genome Sequencing and Mapping
  • Why are standards important for data management
    and integration?
  • Repositories, protocols, nomenclature,
    vocabulary, formats, user interfaces
  • Comparative Resources
  • PlexDB (www.plexdb.org), Pathway Tools in
    Gramene, Reactome (www.reactome.org)

3
Gramene www.gramene.org
4
Gramene
  • As an information resource, Gramene's purpose is
    to provide added value to data sets available
    within the public sector, which will facilitate
    researchers' ability to leverage the rice genomic
    sequence to identify and understand corresponding
    genes, pathways and phenotypes in other crop
    grasses.
  • This is achieved by building automated and
    curated relationships between rice and other
    cereals for both sequence and biology. The
    automated and curated relationships are queried
    and displayed using controlled vocabularies and
    web-based displays.
  • The controlled vocabularies (ontologies)
    currently in use include the Gene Ontology,
    Plant Ontology, Trait Ontology, Environment
    Ontology and Gramene Taxonomy Ontology.
  • The web-based displays for phenotypes include the
    Genes and Quantitative Trait Loci (QTL) modules.
    Sequence based relationships are displayed in the
    Genomes module using the genome browser adapted
    from Ensembl, in the Maps module using the
    comparative map viewer (CMap) from GMOD, and in
    the Proteins module displays. BLAST is used to
    search for similar sequences. Literature
    supporting all the above data is organized in the
    Literature database.

5
Why data integration?
  • It is often the case that two data sets, when
    integrated, are far more useful than the two data
    sets taken individually.

6
Genome Sequence and ESTs
7
Genome Sequence and SNP data
8
SNPs in the context of a protein sequence
9
Automated Annotation
10
Synteny view of rice chromosome 1 to the maize
genome
http://www.gramene.org/Oryza_sativa/syntenyview?otherspeciesZea_mayschr1x15y13
11
Bioinformatic Food Chain
Plant Biology Databases: A Needs Assessment,
November 2005
12
What are the appropriate repositories for the
data sets?
  • Standards for release

13
Static repositories for long term storage
  • GenBank (Benson et al. 2004), a repository of
    sequence submissions
  • GEO (Barrett et al. 2005), a repository of
    microarray expression data
  • PDB (Westbrook et al. 2003), a repository of X-ray
    crystallographic structures

14
If no static repository exists, what should be kept,
and why?
  • Example: genotype data from recombinant inbred
    (RI) lines
  • Multiple labs use the same germplasm for trait
    evaluation
  • Analysis tools change over time
  • Integrate data from multiple experiments

15
What is the method for producing the alignments?
16
Provide information on the data set, the method
used, and when the analysis was completed
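
One way to carry these three pieces of information alongside a derived data set is a small provenance record. The sketch below is only an illustration: the class name, field names and example values are assumptions, not something taken from Gramene.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AnalysisRecord:
    """Minimal provenance for one derived data set (field names are illustrative)."""
    dataset: str          # what was analyzed
    method: str           # program or protocol used
    method_version: str   # version, since analysis tools change over time
    completed: date       # when the analysis was completed


def describe(record: AnalysisRecord) -> str:
    """Render the record as a one-line caption for a display or a download."""
    return (f"{record.dataset}: {record.method} {record.method_version}, "
            f"completed {record.completed.isoformat()}")


# Hypothetical values, for illustration only.
example = AnalysisRecord(
    dataset="example EST-to-genome alignments",
    method="example-aligner",
    method_version="1.0",
    completed=date(2006, 5, 1),
)
print(describe(example))
```

Attaching the caption to every display and download lets a user judge whether an alignment predates a newer release of the method or of the underlying data.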
17
Standard Operating Protocols (SOPs)
  • Data integration often requires additional
    analysis, and these require decisions.
  • Document these decisions to allow end-users and the
    individuals working on the project to understand
    the process and what decisions were made along
    the way.
  • Establish Quality Assurance and Control at each
    step

18
Separate the technical infrastructure from the
human infrastructure
  • Many automated computational tasks do not
    require specialized species- or biology-specific
    knowledge.
  • In order to avoid redundant and inconsistent
    efforts, encourage partnerships between groups
    that can provide technical infrastructure for
    automated annotation tasks and groups that are
    knowledgeable about the underlying biology
    associated with the data set.

19
Standardize Nomenclature
20
Standardize Nomenclature
  • Example: names in use for a single genomic clone
    used as a template for the physical map and
    genomic sequence
  • 59P7
  • c59P7
  • c0059P07
  • ZM0059P07
  • ZMC0059P07
  • ZMCUNK_0059P07
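
The six spellings above can be reconciled mechanically once a canonical form is agreed on. The sketch below assumes that each name ends in a plate number, a row letter and a column number, and that the leading letters are an optional library prefix. That reading of the names, the zero-padded key format and the function name are assumptions made for illustration.

```python
import re

# Trailing plate / row / column pattern shared by every variant on the slide.
_CLONE_RE = re.compile(r"(\d+)([A-Za-z])(\d+)$")


def normalize_clone_name(name: str) -> str:
    """Return a zero-padded plate+row+column key, ignoring any library prefix."""
    match = _CLONE_RE.search(name.strip())
    if match is None:
        raise ValueError(f"unrecognized clone name: {name!r}")
    plate, row, column = match.groups()
    return f"{int(plate):04d}{row.upper()}{int(column):02d}"


variants = ["59P7", "c59P7", "c0059P07", "ZM0059P07", "ZMC0059P07", "ZMCUNK_0059P07"]
for variant in variants:
    print(variant, "->", normalize_clone_name(variant))
```

Dropping the prefix is only safe within a single clone library; a repository that holds several libraries would need to keep the prefix as part of the key.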

21
Standardize Vocabulary
22
Ontologies to describe attributes of the data set
  • Ontologies are sets of vocabulary terms whose
    meanings and relations with other terms are
    explicitly stated in such a way as to be
    comprehensible to humans and computer programs.
  • Ontology-building has emerged as a major activity
    of curated repositories because by annotating
    data sets using a shared set of ontologies,
    repositories can establish connections both
    within the data sets they curate and across data
    sets contained within different repositories.
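
Because the relations between terms are stated explicitly, a program can follow them to group data sets that were annotated at different levels of detail. The sketch below uses a deliberately simplified toy hierarchy; the relations, the data set names and the function names are illustrative assumptions and are not taken from the Plant Ontology itself.

```python
from collections import deque

# Toy ontology: term -> list of (relation, parent term). Simplified for illustration.
TOY_ONTOLOGY = {
    "petal": [("part_of", "flower")],
    "stamen": [("part_of", "flower")],
    "flower": [("is_a", "plant structure")],
    "guard cell": [("is_a", "plant cell")],
    "plant cell": [("is_a", "plant structure")],
    "plant structure": [],
}


def ancestors(term: str) -> set[str]:
    """Collect every term reachable from `term` by following stated relations upward."""
    seen: set[str] = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in TOY_ONTOLOGY[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def annotated_under(annotations: dict[str, str], term: str) -> list[str]:
    """Return data set names annotated to `term` or to any term beneath it."""
    return sorted(name for name, annotated_term in annotations.items()
                  if annotated_term == term or term in ancestors(annotated_term))


# Hypothetical data sets from two repositories, annotated with shared terms.
annotations = {"dataset_A": "petal", "dataset_B": "guard cell", "dataset_C": "stamen"}
print(annotated_under(annotations, "flower"))
```

A query for "flower" finds the petal and stamen data sets even though neither was annotated with that exact term, which is the kind of connection free-text labels cannot provide.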

23
GO: Gene Ontology http://www.geneontology.org
  • Molecular function: describes activities, such as
    catalytic or binding activities, at the molecular
    level.
  • Biological process: a series of events
    accomplished by one or more ordered assemblies of
    molecular functions.
  • Cellular component: a component of a cell, with
    the proviso that it is part of some larger object,
    which may be an anatomical structure (e.g. rough
    endoplasmic reticulum or nucleus) or a gene
    product group (e.g. ribosome, proteasome or a
    protein dimer).

24
PO: Plant Ontology www.plantontology.org
  • Plant Structure: a controlled vocabulary of
    botanical terms describing morphological and
    anatomical structures representing organ, tissue
    and cell types and their relationships. Examples
    are stamen, gynoecium, petal, parenchyma, guard
    cell, etc.
  • Growth and Developmental Stages: a controlled
    vocabulary of terms describing growth and
    developmental stages in model plant species and
    their relationships. Examples are embryo
    development stage, seedling stage, flowering
    stage, etc.

25
OBO: Open Biomedical Ontologies
http://obo.sourceforge.net
26
Example ontologies currently available from OBO
27
Evidence and Attribution tracking
  • Evidence tracking links an assertion contained
    within a repository to the underlying evidence
    that supports that assertion.
  • Attribution tracking links a data set and
    annotations on the data set to the individual or
    group that produced it
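
As a rough sketch of how the two kinds of tracking might sit on one annotation record: each annotation carries its supporting evidence and the name of its producer, so both can be queried. All class names, field names and example values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    kind: str        # e.g. "sequence similarity" or "published experiment"
    reference: str   # pointer to the supporting record or publication


@dataclass(frozen=True)
class Annotation:
    subject: str                     # the annotated object
    assertion: str                   # the claim made about it
    evidence: tuple[Evidence, ...]   # evidence tracking
    attributed_to: str               # attribution tracking


def lacking_evidence(annotations: list[Annotation]) -> list[Annotation]:
    """Return annotations that cannot be traced back to any evidence."""
    return [a for a in annotations if not a.evidence]


def by_contributor(annotations: list[Annotation]) -> dict[str, list[str]]:
    """Group annotated subjects by the individual or group that produced them."""
    grouped: dict[str, list[str]] = {}
    for a in annotations:
        grouped.setdefault(a.attributed_to, []).append(a.subject)
    return grouped


# Hypothetical records, for illustration only.
records = [
    Annotation("gene_1", "expressed in petal",
               (Evidence("published experiment", "example-reference-1"),), "lab_X"),
    Annotation("gene_2", "expressed in stamen", (), "lab_Y"),
]
print([a.subject for a in lacking_evidence(records)])
print(by_contributor(records))
```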

28
Standardize data formats and user interfaces
  • The lack of standard file formats provides
    friction that increases the cost and decreases
    the pace of active curation and data integration.
  • The lack of standardization of user interfaces
    leads to frustration on the part of researchers
    who cannot easily move from one repository to
    another.

29
Description of the fields in the database and
intended use
  • Example: the database field was "location", and
    the values entered included:
  • Mexico
  • Jalisco
  • third field off of the first left on the main
    road out of town
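
The three values above answer the same question at three different levels of detail. One possible remedy, sketched here under assumed field names, is to split the field and attach a description of intended use to each part, so that a data dictionary can be generated from the schema itself.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class CollectionLocation:
    """A 'location' split into documented parts (field names are illustrative)."""
    country: str = field(
        metadata={"description": "Country name, e.g. Mexico"})
    state_or_province: Optional[str] = field(
        default=None,
        metadata={"description": "First-level administrative unit, e.g. Jalisco"})
    locality_notes: Optional[str] = field(
        default=None,
        metadata={"description": "Free-text directions to the collection site"})


def data_dictionary(cls) -> dict[str, str]:
    """Return each field name with its documented intended use."""
    return {f.name: f.metadata["description"] for f in fields(cls)}


site = CollectionLocation(
    country="Mexico",
    state_or_province="Jalisco",
    locality_notes="third field off of the first left on the main road out of town",
)
for name, description in data_dictionary(CollectionLocation).items():
    print(f"{name}: {description}")
```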

30
End-Users: who are they and what are their needs?
  • Naïve end-users require easy-to-use and intuitive
    interfaces that nevertheless provide them with
    access to the full data set. These users are
    often satisfied with one-object-at-a-time
    interfaces, such as those provided by almost all
    biological databases.
  • More sophisticated users require query interfaces
    that allow them to integrate multiple data sets
    within the current repository.
  • The most sophisticated users wish to integrate
    multiple data sets across multiple repositories.

31
Generic Model Organism Database
  • The Generic Model Organism Database (GMOD)
    Project is a largely open source project to
    develop a complete set of software for creating
    and administering a model organism database.
    Components of this project include genome
    visualization and editing tools, literature
    curation tools, a robust database schema,
    biological ontology tools, and a set of standard
    operating procedures.

32
GMOD www.gmod.org
33
Comparative Resources
34
Comparative Resources
  • Microarray resource
  • PlexDB: MIAME standard compliant
  • Pathway databases
  • MetaCyc, Pathway Tools: biochemical pathways
  • Reactome: biological processes

36
MIAME Standards
41
Pathway Database
  • Standardized formats are maturing for data
    exchange
  • User meetings are necessary to facilitate their
    development
  • Existing resources provide useful cross species
    comparisons, to leverage experimental validation
    between organisms, identify gaps in pathways,
    and overlap between species
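
The overlap and gap comparisons mentioned above reduce to set operations once each species has a list of reaction identifiers. The following is a toy sketch: the reaction and pathway names are made up, and it does not use the Pathway Tools or Reactome APIs.

```python
def compare_reaction_sets(reference: set[str], target: set[str]) -> dict[str, set[str]]:
    """Split reactions into those shared, missing from the target, or target-only."""
    return {
        "shared": reference & target,
        "missing_in_target": reference - target,
        "target_only": target - reference,
    }


def incomplete_pathways(pathways: dict[str, set[str]],
                        target: set[str]) -> dict[str, set[str]]:
    """Map each partially covered reference pathway to the reactions the target lacks."""
    gaps: dict[str, set[str]] = {}
    for name, reactions in pathways.items():
        missing = reactions - target
        # Skip pathways that are fully present or entirely absent in the target.
        if missing and missing != reactions:
            gaps[name] = missing
    return gaps


# Hypothetical identifiers, for illustration only.
reference_pathways = {"pathway_1": {"R1", "R2", "R3"}, "pathway_2": {"R4", "R5"}}
target_reactions = {"R1", "R3", "R6"}

all_reference = set().union(*reference_pathways.values())
print(compare_reaction_sets(all_reference, target_reactions))
print(incomplete_pathways(reference_pathways, target_reactions))
```

A pathway with only a few reactions missing in the target species is a candidate gap worth checking by hand, whereas a pathway that is entirely absent may simply not exist in that species.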

42
Inference of a rice reaction set from Arabidopsis
with Pathway Tools in Gramene. Focus is on
biochemical pathways.
43
SkyPainter view from Reactome
Inference of a rice reaction set from human with
Reactome / OrthoMCL. Focus is on biological process.
44
http://dev.gramene.org/about/personnel.html
  • PlexDB: Roger Wise, Julie Dickerson
  • Pathway Tools: SRI, Peter Karp
  • Reactome: Peter D'Eustachio, Guanming Wu,
    Lincoln Stein
  • Funding: NSF and USDA-ARS