T. M. Murali - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: T. M. Murali


1
The State of Gene Function Prediction in
Arabidopsis thaliana
  • T. M. Murali
  • Department of Computer Science
  • Virginia Tech
  • Slides prepared by Arjun Krishnan
  • Introduction to Computational Biology and
    Bioinformatics (CS 3824)
  • October 11 and 13, 2011

2
How a cell is wired
Small molecules
DNA
mRNA
Protein
The dynamics of such interactions emerge as
cellular processes and functions
3
Molecular interaction networks
How do the genes and their products interact to
collectively perform a function?
4
Molecular interaction networks
  • A network containing genes connected to each
    other whenever they physically or functionally
    interact
  • Proteins that interact/co-complex (ribosomal,
    polymerase, etc.)
  • Transcription factors and their targets
  • Enzymes catalyzing different steps in the same
    metabolic pathway
  • Genes with correlation in expression
  • Genes with similar phylogenetic profiles

5
Arabidopsis is the primary model organism for
plants
  • Complex organization from molecular to whole
    organism level.
  • A key challenge
  • Understanding the cellular machinery that
    sustains this complexity.
  • In the post-genomic era, a main aspect of this
    challenge is gene function prediction
  • Identifying the functions of all 30,000 genes
    in the genome.

6
Extent of gene annotations in Arabidopsis
Total of 30,000 genes in the genome
Ashburner et al. (2000) Nat. Genet.; Swarbreck et al.
(2008) Nucleic Acids Res.
7
Exploit high-throughput data
  • Integrating functional genomic data could lead to
  • Network models of gene interactions that resemble
    the underlying cellular map.
  • Typically these networks contain gene functional
    interactions
  • Connecting pairs of genes that participate in the
    same biological processes.
  • In such a network, the position of a gene
    establishes the functional context of that gene.
  • Guilt by association: genes of unknown function
    can be assigned the functions of their annotated
    neighbors.
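
The guilt-by-association idea can be illustrated with a simple weighted neighbor vote. This is only a minimal sketch, not one of the algorithms evaluated in the study; the function name `neighbor_vote`, the toy edge list, and the gene and function labels are all hypothetical.

```python
from collections import defaultdict

def neighbor_vote(edges, annotated, gene):
    """Score each function for `gene` by the share of the gene's
    edge weight that goes to neighbors annotated with that function."""
    scores = defaultdict(float)
    total = 0.0
    for u, v, w in edges:
        if gene not in (u, v) or u == v:
            continue  # skip edges that do not touch the gene, and self-loops
        neighbor = v if u == gene else u
        total += w
        for function in annotated.get(neighbor, ()):
            scores[function] += w
    return {f: s / total for f, s in scores.items()} if total else {}

# Hypothetical toy network: (gene, gene, interaction weight)
edges = [("g1", "g2", 1.0), ("g1", "g3", 2.0),
         ("g1", "g4", 1.0), ("g3", "g4", 1.0)]
# Hypothetical known annotations; g1 has no known function
annotated = {"g2": {"A"}, "g3": {"B"}, "g4": {"B"}}

scores = neighbor_vote(edges, annotated, "g1")
print(scores)
```

Here the unannotated gene g1 receives a higher score for function B because most of its edge weight leads to genes annotated with B.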

8
Functional interaction networks
  • Functional interaction network models have been
    developed for Arabidopsis.
  • Lee et al. (2010) Rational association of genes
    with traits using a genome-scale gene network for
    Arabidopsis thaliana.
  • Very comprehensive in terms of using and
    integrating datasets in other organisms for
    application in plants.
  • Integrated 24 datasets: 5 from Arabidopsis and
    the rest from other model organisms.
  • AraNet: 19,647 genes, 1,062,222 interactions.

9
Goal of this study
  • We examine the state of network-based gene
    function prediction in Arabidopsis.
  • Evaluate the performance of multiple prediction
    algorithms on AraNet.
  • Assess the influence of the number of genes
    annotated to a function and the source of
    annotation evidence.
  • Compute the correlation of prediction performance
    with network properties.
  • Evaluate prediction performance for
    plant-specific functions.

10
Network-based gene function prediction algorithms
11
Network-based gene function prediction
12
Network-based gene function prediction
  • Function A
  • Function B

13
In this study
14
Performance of different algorithms
  • Computational gene function prediction precedes
    and guides experimental validation
  • What we get is a ranked list of novel predictions
  • An experimenter would choose a manageable number
    of top-scoring predictions to pursue
  • Precision at the top of the prediction list
  • We choose precision at 20% recall (P20R) as the
    measure of performance
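
As a rough illustration of the measure, the sketch below walks down a ranked prediction list and reports the precision at the first point where 20% of the true genes have been recovered. The function name `precision_at_recall` and the example labels are assumptions made for illustration; the slides do not specify how ties or interpolation were handled.

```python
def precision_at_recall(ranked_labels, recall_level=0.2):
    """ranked_labels: 1/0 (or True/False) flags, best-scored gene first.
    Returns the precision of the shortest prefix of the list whose
    recall is at least `recall_level`."""
    total_positives = sum(ranked_labels)
    if total_positives == 0:
        raise ValueError("need at least one positive example")
    hits = 0
    for rank, is_positive in enumerate(ranked_labels, start=1):
        hits += is_positive
        if hits / total_positives >= recall_level:
            return hits / rank
    return hits / len(ranked_labels)  # only reached if recall_level > 1

# Hypothetical ranked list with 10 true positives among 16 genes
ranked = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]
p20r = precision_at_recall(ranked)
print(p20r)
```

In this example, 20% recall means recovering 2 of the 10 true genes, which happens at rank 3, so the precision is 2/3.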

15
Performance of different algorithms
SinkSource (SS) seems to be better than the other algorithms
What about the influence of the number of genes
in a function?
16
Performance of different algorithms
Each group containing 125 functions
Number of functions
Number of genes annotated with a function
17
Performance of different algorithms
For small functions, the choice of algorithm does not
matter!
And using only experimental annotations is
better when little is known about a function.
  • For large functions
  • SS is clearly the best
  • Using all annotations is better

For medium functions, SS is a little better and
the benefit of electronic evidence is mixed.
18
Performance of different algorithms
Wilcoxon test: SS vs. other algorithms
All evidence codes (ECs)
Without IEA/ISS
Overall, SinkSource appears to be the best algorithm.
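
The slide does not say which Wilcoxon variant was used. Assuming a paired comparison (one P20R value per function for each algorithm), the signed-rank statistic could be computed along these lines. The function name and the P20R values below are hypothetical, and the sketch stops at the rank sums rather than a p-value.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.
    Pairs with zero difference are dropped; tied absolute
    differences receive the average of the ranks they span."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied absolute differences
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical P20R values (in percent) for five functions
ss_p20r = [60, 45, 80, 30, 55]
other_p20r = [50, 45, 70, 35, 40]

w_plus, w_minus = wilcoxon_signed_rank(ss_p20r, other_p20r)
print(w_plus, w_minus)
```

A large W_plus relative to W_minus indicates that the first algorithm tends to score higher across functions; a significance level would still have to be read from the signed-rank distribution.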
19
Correlation of performance with network
properties
  • Performance on a particular function might depend
    on how its genes are organized / connected among
    themselves in the network.
  • Number of nodes
  • Number of components
  • Fraction of nodes in the largest connected
    component
  • Total edge weight
  • Weighted density
  • Average weighted degree
  • Average segregation

20
Correlation of performance with network
properties
21
Correlation of performance with network
properties
22
Correlation of performance with network
properties
  • Number of nodes: 9
  • Number of components: 3
  • Fraction of nodes in the largest connected
    component: 4/9
  • Total edge weight: 8
  • Weighted density: 8/36
  • Average weighted degree: 16/9
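
The values above can be reproduced on a small example. The graph below is a hypothetical stand-in for the one pictured on the slide (which is not in the transcript): 9 nodes in 3 components with 8 unit-weight edges. Weighted density is assumed to be total edge weight divided by the number of possible node pairs, which matches the 8/36 on the slide.

```python
from fractions import Fraction

def topology(nodes, edges):
    """Topological properties of an undirected weighted graph.
    Assumes at least two nodes."""
    adjacency = {node: set() for node in nodes}
    for u, v, _ in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Sizes of connected components, found by depth-first search
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        sizes.append(size)
    n = len(nodes)
    total_weight = sum(Fraction(w) for _, _, w in edges)
    return {
        "nodes": n,
        "components": len(sizes),
        "largest_fraction": Fraction(max(sizes), n),
        "total_edge_weight": total_weight,
        "weighted_density": total_weight / Fraction(n * (n - 1), 2),
        "avg_weighted_degree": 2 * total_weight / n,
    }

# Hypothetical graph: components of 4, 3, and 2 nodes, all weights 1
nodes = list("abcdefghi")
edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1), ("a", "c", 1),
         ("e", "f", 1), ("f", "g", 1), ("e", "g", 1),
         ("h", "i", 1)]

props = topology(nodes, edges)
print(props)
```

Fractions are used so the results stay exact; 8/36 is printed in lowest terms as 2/9.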

23
Correlation of performance with network
properties
Functional modularity: average segregation
24
Correlation of performance with network
properties
Functional modularity: average segregation
  • Avg. seg. = 8/22
  • Avg. seg. = 12/15
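
The transcript gives only the example values, not the formula for segregation. One reading that yields ratios of this kind, offered purely as an assumption, is the share of the edge weight incident on a function's genes that stays within the function. The function name, graph, and gene set below are hypothetical.

```python
from fractions import Fraction

def segregation(edges, members):
    """ASSUMED definition: edge weight that function genes share with
    each other, divided by all edge weight incident on those genes.
    Each edge is counted once per endpoint that is a function gene."""
    within = Fraction(0)
    total = Fraction(0)
    for u, v, w in edges:
        for node, other in ((u, v), (v, u)):
            if node in members:
                total += Fraction(w)
                if other in members:
                    within += Fraction(w)
    return within / total

# Hypothetical network: a, b, c share a function; x, y do not
edges = [("a", "b", 1), ("b", "c", 1), ("a", "x", 1),
         ("c", "y", 1), ("x", "y", 1)]
seg = segregation(edges, {"a", "b", "c"})
print(seg)
```

Under this reading, a value near 1 means the function's genes interact mostly with each other, and a value near 0 means they are scattered among genes of other functions.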

25
Correlation of performance with network
properties
  • We have
  • Vector of SS P20R values for each function
  • Vector of values of a particular topological
    property for each function
  • Spearman rank correlation
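
The Spearman rank correlation of the two vectors is the Pearson correlation of their ranks. A dependency-free sketch follows; the P20R and topological-property values are made up for illustration and are not results from the study.

```python
def average_ranks(values):
    """1-based ranks, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)

# Hypothetical per-function values
p20r_values = [0.9, 0.4, 0.7, 0.2, 0.6]
property_values = [0.8, 0.3, 0.5, 0.1, 0.6]

rho = spearman(p20r_values, property_values)
print(round(rho, 3))
```

Because only ranks are compared, the measure captures any monotonic relationship between performance and the property, not just a linear one.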

26
Correlation of performance with network
properties
27
Performance on plant-specific functions
  • The underlying network is built based on data
    from multiple non-plant species
  • For plant-specific functions
  • Performance is much worse compared to conserved
    functions
  • Using only experimental annotations is better
  • For conserved functions
  • Performance is better than that for all functions
  • Using all annotations is better

28
Most predictable conserved functions
  • protein folding
  • nucleotide transport
  • innate immunity
  • cytoskeleton organization, and
  • cell cycle

29
Least predictable conserved functions
Specialized functions
  • regulation of

30
Most predictable plant-specific functions
Contribution from Arabidopsis datasets
  • cell wall modification
  • auxin/cytokinin signaling, and
  • photosynthesis

31
Least predictable plant-specific functions
  • development, morphogenesis
  • pattern formation
  • phase transitions of various tissues, organs /
    growth stages

32
Conclusions
  • Evaluated the performance of various prediction
    algorithms on AraNet.
  • SinkSource is the overall best prediction
    algorithm.
  • Measured the influence of the number of genes
    annotated to a function and the source of
    annotation evidence.
  • All algorithms perform poorly when only a small
    number of genes are known or when annotating
    very specific functions.
  • When only a small number of genes are known,
    use only experimentally verified annotations to
    make new predictions.
  • When a considerable number of genes are known,
    use all annotations to make new predictions.

33
Conclusions
  • Measured the correlation of performance with
    network properties
  • Several topological properties correlate well
    with performance.
  • Average segregation has the strongest
    correlation.

34
Conclusions
  • Assessed performance on conserved/plant-specific
    functions
  • Performance on basic conserved functions is
    better than that for all the functions.
  • Specialized conserved functions are hard to
    predict.
  • Performance on plant-specific functions is very
    poor.
  • This is also a consequence of the fact that
    plant-specific functions generally have a small
    number of annotations.

35
Conclusions
  • Avenues for improvement in functional interaction
    networks
  • Build functional interaction networks that are
    based on a larger collection of plant datasets.
  • Rely as little as possible on data
    from other species.
  • Avenues for future experimental work
  • Plant-specific functions and
  • Specialized conserved functions.

36
Acknowledgements
  • Arjun Krishnan
  • Brett Tyler
  • Andy Pereira