Plant Ontologies: Industrial Science meets Renaissance Concepts

1
Plant Ontologies: Industrial Science meets
Renaissance Concepts
  • Dave Selinger
  • Computational Biologist
  • Pioneer Hi-Bred,
  • DuPont Agriculture and Nutrition

2
Outline
  • What is the nature of the problem that a Plant
    Anatomy Ontology can solve?
  • What is an Ontology?
  • How do you make a Plant Anatomy Ontology?
  • Does it really solve the problem?

3
Industrial Science
  • Not science in industry, but the
    industrialization of data creation, i.e. the
    omics revolutions.
  • High-throughput data
  • Sequencing
  • Expression
  • Medium-throughput data
  • Proteomics
  • Metabolomics
  • Low-throughput data
  • Gene/protein function
  • Phenotype

4
The double-edged sword of Industrial Science
  • Industrial science means lots of cheap data
  • Sequencing << $0.01/base
  • 10,000 prokaryotic genomes are a reality
  • 10,000 eukaryotic genomes will be a reality in the
    next five years
  • Expression < $0.50/gene
  • And much of this data is available for free after
    it is produced!
  • Lots of data means that you can't sit down with
    your lab notebook and analyze the data by hand.
  • Databases, software for searching and comparing
  • Whole new areas of research devoted to finding
    meaningful patterns in lots of data.

5
Organizing information
  • Information is not knowledge.
  • But knowledge can be acquired from information.
  • But only with a lot of effort (see the second law
    of thermodynamics)
  • Central challenge with Industrial science is
    organizing the information.
  • The organization of the information determines
    what you can discover.
  • Experimental design
  • Good design will produce a contrast that will
    support or refute a hypothesis.
  • Statistical rigor
  • Is the signal higher than the noise?
  • How conclusive will the discoveries be?

6
Context
  • How do we compare across experiments?
  • Not too hard if one person did all the
    experiments and kept careful notes.
  • If multiple people, then we need to define what
    was done, what the analysis was, and what the
    sample was.
  • What was done: e.g. the MIAME standard for
    describing the technical details of an expression
    experiment.
  • Analysis: e.g. ANOVA, SAM, etc.
  • Sample: ?

7
Renaissance concepts (historically Enlightenment)
  • Things can be systematically described and
    classified
  • Organisms - Linnaeus, Species Plantarum, 1753
  • Linnaeus's problem is much the same as the sample
    description problem
  • Variable specificity
  • California Laurel or Oregon Myrtlewood?
  • Kernel or seed?
  • In addition, a term like "kernel" assumes that all
    of its parts are present, but this assumption
    could be wrong

8
Ontologies to the rescue?
  • Ontology: the study of being (Philosophy)
  • The specification of a conceptualization of a
    domain of interest (Computer Science)
  • Original and continuing computer science interest
    was Artificial Intelligence.
  • How can a computer make inferences?
  • Need to define meanings, the word "can" for example.
  • Structure and relationships in an ontology allow
    a computer to make inferences.
  • Mary is the mother of Bill. Is Mary a parent of
    Bill?
  • IsA(Mother, Parent): every mother is a parent
  • Parts of an ontology
  • Concepts -> objects, real and abstract,
    processes, functions
  • Partitions -> rules that can classify concepts
  • Attributes -> properties of a concept, can have
    individual and class attributes
  • Relationships -> is a, part of
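
The Mary and Bill inference can be sketched in a few lines. This is only an illustration: the names IS_A, FACTS, generalizations and holds are hypothetical, and a real reasoner would handle far more than a single chain of "is a" links.

```python
# Hypothetical mini-ontology: class-level "is a" links plus one stated fact.
IS_A = {"Mother": "Parent", "Father": "Parent"}  # child class -> parent class
FACTS = {("Mary", "Mother", "Bill")}             # (subject, relation, object)

def generalizations(relation):
    """Yield the relation itself, then every class it 'is a' kind of."""
    while relation is not None:
        yield relation
        relation = IS_A.get(relation)

def holds(subject, relation, obj):
    """True if a stated fact implies (subject, relation, obj) via 'is a' links."""
    return any(
        s == subject and o == obj and relation in generalizations(r)
        for s, r, o in FACTS
    )

print(holds("Mary", "Parent", "Bill"))  # True: Mother is a Parent
print(holds("Mary", "Father", "Bill"))  # False: nothing implies it
```

The computer never stores "Mary is a parent of Bill"; it derives that statement from the one fact and the one "is a" link.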

9
Does an ontology make sense?
  • The value of ontologies is a current debate among
    information scientists.
  • One group advocates that ontologies are necessary
    for computers to understand content.
  • Semantic web -> an extension of the current
    HTML/XML based web to something with ontological
    inference
  • Others argue that ontologies are not needed and
    are not practical
  • Complexity is OK; just use a Google-like
    search to connect concepts.
  • However, some problems, like organismal
    classification and the periodic table are very
    amenable to an ontological approach.
  • Formal categories and stable entities
  • Expert users and catalogers

10
Forms of ontologies
  • Ontologies can take several forms (data
    structures)
  • Controlled vocabulary (List)
  • Terms but no relationships
  • Enforces systematic naming
  • Hierarchy (tree structure) -> Taxonomy
  • Terms and an "is a" relationship
  • Children are unique and have a single parent
  • Directed acyclic graph -> Gene Ontology
  • Multiple relationship types
  • Children with multiple parents

11
Features of Trees
  • Because each child node has only one parent
  • There is an unambiguous path to the root from
    each leaf
  • Child nodes can be easily grouped at any level of
    the structure
  • Trees can express only one organizing principle
  • Work well for taxonomy (at least eukaryotic
    taxonomy)
  • Organizing principle is classification by
    similarity
  • All terms have an "is a" relationship to the
    next-level term
  • Organisms were classified before evolution was
    hypothesized, but the classification matches the
    evolutionary relationships
  • Similar example would be the periodic table of
    the elements
  • Classification can facilitate discovery of
    underlying principles
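
A minimal sketch of the two tree properties above, the unambiguous path to the root and grouping at any level. The PARENT table is a hypothetical fragment of a taxonomy, and the function names are illustrative only.

```python
# Hypothetical single-parent taxonomy: each term maps to its one parent.
PARENT = {
    "Laurus nobilis": "Laurus",
    "Umbellularia californica": "Umbellularia",
    "Laurus": "Lauraceae",
    "Umbellularia": "Lauraceae",
    "Lauraceae": None,  # root of this fragment
}

def path_to_root(term):
    """Follow the single parent link upward; there is only one possible path."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path

def group_at(level_term):
    """Every term whose path to the root passes through level_term."""
    return sorted(t for t in PARENT if level_term in path_to_root(t))

print(path_to_root("Laurus nobilis"))
print(group_at("Laurus"))
```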

12
A tree-based Anatomy Ontology
  • Developed by Winston Hide's group at SANBI and
    Electric Genetics
  • Single concept, orthogonal trees
  • Cells
  • Tissues
  • Organs
  • Disease state
  • Each tree is independent, but has related
    dimensions describing a sample
  • Set operations, intersection or union, between
    trees allow specific queries.
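
A query across orthogonal trees might look like the sketch below. It assumes each tree keeps its own index from term to labeled samples; the sample IDs and term names are invented for illustration.

```python
# Hypothetical sample annotations: one independent index per tree (dimension).
SAMPLES_BY_ORGAN = {"liver": {"s1", "s2", "s3"}, "kidney": {"s4"}}
SAMPLES_BY_DISEASE = {"normal": {"s1", "s4"}, "tumor": {"s2", "s3"}}

# Intersection: samples that are liver AND tumor.
liver_tumor = SAMPLES_BY_ORGAN["liver"] & SAMPLES_BY_DISEASE["tumor"]

# Union: samples that are kidney OR normal.
kidney_or_normal = SAMPLES_BY_ORGAN["kidney"] | SAMPLES_BY_DISEASE["normal"]

print(sorted(liver_tumor))
print(sorted(kidney_or_normal))
```

Because the trees are independent, adding a new dimension means adding another index rather than restructuring the existing ones.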

13
Features of DAGs
  • A tree is a special case of the DAG class
  • Children can have multiple parents.
  • Allows multiple classifications of the same child
  • E.g. a guard cell is both part of a leaf and is
    an epidermal cell.
  • Allows for more than a binary classification of a
    concept
  • If this results from poor definition of the
    concept, then it is not good.
  • Multiple parentage fits a normalized data model
  • Like a normalized relational database, a DAG can
    minimize duplication of objects (concepts).

14
Sample DAG
  • Root
    • Cooking
      • Spices
        • Bay leaf
          • Laurus nobilis
          • Umbellularia californica (California laurel)
    • Trees
      • Lauraceae
        • Laurus
          • Laurus nobilis
        • Umbellularia
          • Umbellularia californica
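
The sample DAG can be written as a table of parent lists. This sketch assumes the slide's list nests as two branches under Root (Cooking and Trees) and uses the botanical spelling Laurus; the function name ancestors is illustrative.

```python
# Parent lists for the sample DAG; a term may have more than one parent.
PARENTS = {
    "Cooking": ["Root"],
    "Spices": ["Cooking"],
    "Bay leaf": ["Spices"],
    "Trees": ["Root"],
    "Lauraceae": ["Trees"],
    "Laurus": ["Lauraceae"],
    "Umbellularia": ["Lauraceae"],
    "Laurus nobilis": ["Bay leaf", "Laurus"],
    "Umbellularia californica": ["Bay leaf", "Umbellularia"],
}

def ancestors(term):
    """All terms reachable by following parent links upward."""
    found = set()
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

print(sorted(ancestors("Laurus nobilis")))
```

Each species is stored once but is found from both the culinary branch and the botanical branch, which is the duplication a tree would force and a DAG avoids.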

15
Constructing the Pioneer Plant Ontology
  • Decided to produce a DAG
  • Used DAGeditor (editor developed for GO)
  • Developed our own web-based viewing tool
  • AmiGO was too complicated to re-use. Other
    public browsers did not have the functionality we
    wanted.
  • Decided to focus on Corn and Soybeans
  • Used Kiesselbach's 1949 monograph on corn
    structure and reproduction as the primary source.
  • Used Iowa State University Ag Extension
    publications for the development stages of corn
    and soybeans
  • Added information from a botany textbook to cover
    missing terms from soybean.

16
To collaborate or not to collaborate?
  • Advantage of just using the Pioneer Ontology was
    that it served our needs and was focused on corn
    and soybeans, our major crops.
  • Disadvantage was that it was not synchronized with
    the public ontology
  • We would not be able to easily integrate public
    tissue classifications to ours
  • We would not be able to easily take advantage of
    improvements to the public ontology
  • Presumably the public ontology would be more
    botanically correct than ours.

17
Plant Ontology Consortium
  • Focused on model organisms
  • Arabidopsis
  • Rice, with other grasses (e.g. corn) covered by
    the rice terms.
  • Used a DAG approach
  • Multiple concepts
  • Structure (cells, tissues, sporophyte and
    gametophyte)
  • Development
  • Used DAGeditor and other GO approaches
  • Most terms have multiple parents
  • Same software and data structures as GO

18
Plant Ontology
  • Domain: Plant anatomy and development
  • Concepts
  • Plant parts (leaf, root, flower, meristem, etc.)
  • Life cycle stages (sporophyte, gametophyte)
  • Developmental stages (V1, flowering, R1, etc.)
  • Relationships between concepts
  • A kind of ("is a")
  • A prop root is a root
  • A part of ("part of")
  • A root cap is part of a root
  • In addition, for plant anatomy a "develops from"
    relation is needed
  • For example, the relationship between stomatal
    guard cells and the guard mother cell
  • Guard cells develop from guard mother cells
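
One way to hold the three relation types is as typed edges, so that a query can choose which relations to follow. The sketch below uses only the examples from this slide; the edge layout and function names are assumptions, not the POC's actual data format.

```python
# Typed edges (child, relation, parent) taken from the slide's examples.
EDGES = [
    ("prop root", "is_a", "root"),
    ("root cap", "part_of", "root"),
    ("guard cell", "develops_from", "guard mother cell"),
]

def parents(term, relations):
    """Direct parents of term, restricted to the given relation types."""
    return {p for c, r, p in EDGES if c == term and r in relations}

def reachable(term, relations):
    """Terms reachable from term using only the given relation types."""
    seen, stack = set(), [term]
    while stack:
        for p in parents(stack.pop(), relations):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

print(reachable("root cap", {"is_a"}))             # no "is a" parent
print(reachable("root cap", {"is_a", "part_of"}))  # finds root
```

Keeping the relation type on each edge lets "a root cap is a root" stay false while "a root cap belongs with root samples" stays answerable.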

19
Adapting the POC ontology for Pioneer's needs
  • Problem is that it has many more terms than
    required for our experiments
  • Some terms describe tissues or cells that are not
    practical to collect (e.g. antipodal cells)
  • Some terms describe parts not found in corn (e.g.
    nectary)
  • Another problem is that we collect samples that
    are convenient subdivisions of structures
  • Tip and base of an immature ear. Each differs
    from a whole immature ear in terms of what it
    contains.
  • Basal endosperm is morphologically distinct from
    starchy endosperm, but is not found in the ontology

20
Our current solution
  • Add additional terms to the POC ontology
  • Use a different ID system
  • Easily distinguished from POC terms
  • Will not be overwritten by ongoing public
    curation efforts.
  • Label experiments with the terms from the
    ontology.
  • Create a custom ontology
  • Query the whole ontology with the terms used in
    the labeling and keep only:
  • Terms that are used to label an experimental
    sample
  • Parent terms of used terms.
  • Can be readily rebuilt if new experiments or
    terms are added.
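
The custom-ontology step, keeping only the terms used for labeling plus their parent terms, can be sketched as an upward walk from the used terms. The FULL table is a hypothetical fragment, and "parent terms" is read here as all ancestors, which is an assumption.

```python
# Hypothetical fragment of the full ontology: term -> list of parent terms.
FULL = {
    "plant structure": [],
    "ear": ["plant structure"],
    "ear tip": ["ear"],    # custom subdivision term
    "ear base": ["ear"],   # custom subdivision term
    "flower": ["plant structure"],
    "nectary": ["flower"], # never collected
}

def prune(full, used_terms):
    """Keep each used term and all of its ancestors; drop everything else."""
    keep, stack = set(), list(used_terms)
    while stack:
        term = stack.pop()
        if term not in keep:
            keep.add(term)
            stack.extend(full[term])
    return {t: ps for t, ps in full.items() if t in keep}

custom = prune(FULL, {"ear tip"})
print(sorted(custom))
```

Because the pruned ontology is computed from the full one, it can be rebuilt by calling prune again whenever new experiments or terms are added.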

21
What can you do with the ontology?
  • Provides a grouping mechanism
  • Summarize expression for a tissue
  • Compare expression between tissues
  • Make complex queries that involve multiple
    tissues
  • Provides a systematic label for annotating genes
  • Where is the gene expressed?
  • Query annotation of genes based on terms
  • Provides a description of the complexity of
    tissue samples
  • Leaf sample is composed of multiple cell types
    with different roles
  • Cell types can be shared between tissues or
    structures

22
Comparing by tissue
  • The ontology provides the groupings, but how
    should each group be summarized?
  • Mean?
  • Median?
  • Maximum value?
  • Significance of differences?
  • Each group will be much more variable than a set
    of samples from a controlled experiment.
  • But you may be able to eliminate the inevitable
    false discoveries that appear when looking at
    large numbers of genes.
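
The candidate summaries can be computed side by side once the ontology has supplied the groups. The expression values and group membership below are invented for illustration; only Python's standard statistics module is used.

```python
from statistics import mean, median

# Hypothetical expression values for one gene, keyed by sample.
EXPRESSION = {"s1": 10.0, "s2": 14.0, "s3": 30.0, "s4": 2.0}

# Hypothetical ontology-derived groups: tissue term -> samples labeled
# with that term or with one of its child terms.
GROUPS = {"leaf": ["s1", "s2", "s3"], "root": ["s4"]}

def summarize(samples):
    """Mean, median and maximum expression over one group of samples."""
    values = [EXPRESSION[s] for s in samples]
    return {"mean": mean(values), "median": median(values), "max": max(values)}

summary = {tissue: summarize(samples) for tissue, samples in GROUPS.items()}
print(summary["leaf"])
```

In this toy leaf group the mean (18.0) sits well above the median (14.0) because of one high sample, which is the kind of within-group variability the slide warns about.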

23
Annotating genes
  • This is the primary use for TAIR and Gramene
  • Potentially label most genes with tissues of
    expression
  • However, need to differentiate presence from
    preferential expression.
  • A gene may be present in many tissues, but highly
    expressed in a few
  • Another gene may be present in the same tissues,
    but similarly expressed in all of them.
  • Might need to precompute and indicate which
    tissues the gene is significantly preferentially
    expressed in.
  • Might be able to use the RMS differences between
    expression in each tissue as a measure of
    consistency.
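
One possible reading of the RMS idea is the root-mean-square of all pairwise differences between a gene's expression levels across tissues. That interpretation is an assumption, as are the example values; the slide does not define the measure.

```python
from itertools import combinations
from math import sqrt

def rms_difference(levels):
    """Root-mean-square of all pairwise differences (needs >= 2 levels)."""
    diffs = [a - b for a, b in combinations(levels, 2)]
    return sqrt(sum(d * d for d in diffs) / len(diffs))

uniform = rms_difference([5.0, 5.0, 5.0])       # similar level in every tissue
preferential = rms_difference([1.0, 1.0, 9.0])  # high in one tissue only

print(uniform, preferential)
```

A value near zero suggests the gene is present at a consistent level everywhere, while a large value flags it as a candidate for preferential expression.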

24
Complexity
  • Genes may appear to differ between tissues for
    trivial reasons
  • Example: a gene appears to be preferentially
    expressed in stem versus leaf tissue.
  • The gene may really be specific to vascular
    tissue, and stem has more of it.
  • The gene may be expressed late in development, and
    adjacent leaves and stems may differ in development.
  • Ontology can guide further experiments
  • Compare vascular and non-vascular tissue from
    both leaf and stem.
  • Compare multiple leaf and stem samples from
    different positions (developmental stages).

25
Conclusions
  • The Plant Ontology classifies experiments and
    genes based on anatomical and developmental
    concepts.
  • Now that we have significant data, can we, like
    Darwin, discern the underlying mechanisms for how
    anatomical and developmental differences occur?
  • The Plant Ontology will be successful and used
    long term if it facilitates these kinds of
    investigations.

26
Acknowledgements
  • Pioneer
  • Henry Mirsky
  • Lane Arthur
  • Bob Merrill
  • POC
  • Doreen Ware (Gramene)
  • Katica Ilic (TAIR)