Stochastic Models and Statistic Tools Used in Phylogenetic Reconstruction - PowerPoint PPT Presentation

Provided by: csPu

1
Stochastic Models and Statistic Tools Used in
Phylogenetic Reconstruction
  • Zhen Zhu

2
What is Phylogenetics?
  • The study of evolutionary relatedness among
    various groups of organisms.
  • Phylogenetics treats a species as a group of
    lineage-connected individuals over time.
  • In short, phylogenetic reconstruction means
    constructing a tree of life.

3
[Figure: the Eukaryotes, branching into Animals, Plants, Fungi and Protists]
4
Some Terminology
  • Character (a variable, e.g. the color of the
    egg)
  • States (the domain of the variable, e.g. A, T,
    G, C for DNA nucleotides)
  • Taxa (the species being studied)
  • Marker (the character used in a particular
    study; mostly DNA sequences, but it could be a
    combination)

5
Why use stochastic models in phylogenetic
reconstruction?
  • Why?
  • We do not know many things about the tree, so we
    have to guess.
  • Even for leaf species that exist today, it is
    hard to get all the samples we want.
  • We know certain things about DNA mutation, and a
    stochastic model makes it easy to use that
    knowledge.
  • Stochastic models also allow us to predict the
    unknowns.

6
How to use stochastic models in phylogenetic
reconstruction
  • Knowns, unknowns and assumptions
  • Known: the sample data set D and, perhaps, some
    probabilities gathered from other sources.
  • Unknown: almost everything!
  • Assumptions: many methods of building the model
    have been proposed, each based on different
    assumptions.
  • The most commonly used model: a tree!
  • Normally, a study focuses only on a relatively
    small subtree of the whole tree of life. T
    denotes the tree under study and V the set of
    all its vertices. Each edge e is written as an
    ordered pair of vertices (v, w), where v is
    closer to the root.
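To make this notation concrete, here is a minimal Python sketch that stores T as a list of ordered edges (v, w). The vertex names are hypothetical. Under this convention the root is the only vertex that never appears as the second element of an edge, and the leaves are the vertices that never appear as the first.

```python
# Hypothetical rooted tree T (labels are illustrative only).
# Each edge is an ordered pair (v, w) in which v is closer to the root.
EDGES = [
    ("root", "x"), ("root", "y"),
    ("x", "t1"), ("x", "t2"),
    ("y", "t3"), ("y", "t4"),
]


def vertex_set(edges):
    """Return V, the set of all vertices that appear in some edge."""
    return {vertex for edge in edges for vertex in edge}


def find_root(edges):
    """The root is the only vertex that never appears as the child end w."""
    candidates = vertex_set(edges) - {w for _, w in edges}
    if len(candidates) != 1:
        raise ValueError("edges do not describe a single rooted tree")
    return candidates.pop()


def leaves(edges):
    """Leaves never appear as the parent end v of an edge."""
    return sorted(vertex_set(edges) - {v for v, _ in edges})


print(find_root(EDGES))        # root
print(leaves(EDGES))           # ['t1', 't2', 't3', 't4']
print(len(vertex_set(EDGES)))  # 7
```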

7
Detailing the Model
  • Once we have the rough idea that the model will
    look like a tree, we have to think about how to
    construct it.
  • There are many methods to construct the model,
    and in each method the vertices and edges store
    different information. There is always a
    probability matrix associated with each vertex;
    how the matrices change depends on the
    assumptions.
  • If we know the state of the root and all the
    transition matrices, the tree is uniquely
    defined (the models are usually identifiable).
    There may be many candidate estimated trees, but
    we want the one that best fits the sample data
    we have.
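Why the root state plus the transition matrices pin everything down is easiest to see on a single root-to-leaf path: each matrix turns the parent's state distribution into the child's. The Python sketch below uses two states and made-up matrices purely for illustration (a DNA marker would use 4 states and 4x4 matrices). Row r, column s holds the probability of moving from state r to state s.

```python
# Minimal sketch: push a known root state down one path of the tree.
# Two states (0 and 1) and the matrices below are hypothetical; a DNA
# marker would use 4 states (A, C, G, T) and 4x4 matrices.

def propagate(dist, matrix):
    """Return the child's state distribution, given the parent's distribution
    `dist` and a row-stochastic matrix: child[s] = sum_r dist[r] * matrix[r][s]."""
    n = len(dist)
    return [sum(dist[r] * matrix[r][s] for r in range(n)) for s in range(n)]


root = [1.0, 0.0]              # the root is known to be in state 0
M1 = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical matrix for the first step
M2 = [[0.7, 0.3], [0.3, 0.7]]  # hypothetical matrix for the second step

child = propagate(root, M1)        # [0.9, 0.1]
grandchild = propagate(child, M2)  # approximately [0.66, 0.34]
print(grandchild)
```

Repeating the same step along every edge gives a distribution at every vertex, which is why different candidate trees can then be compared against the sample data.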

8
Example tree model
[Figure: example tree with a nucleotide state (A, T, G or C) at each vertex]
  • In this model, the marker is a single
    nucleotide with 4 states: A, T, G and C. Each
    level represents one time unit, and the unit is
    small enough that no mutation happens within it.
    At the end of each unit, the character either
    stays the same or changes to another state with
    a certain probability. Different evolution rates
    can be assigned to different edges, and a
    transition matrix is associated with each
    vertex.
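The discrete-time model on this slide can be simulated directly. Below is a minimal Python sketch. The slide does not say which state a mutation leads to, so the sketch assumes a mutation picks one of the other three nucleotides uniformly at random, as in the Jukes-Cantor model. The tree, the edge lengths (in time units) and the per-edge rates `mu` are all hypothetical.

```python
import random

STATES = "ATGC"


def step(state, mu, rng):
    """One time unit: keep `state` with probability 1 - mu, otherwise switch to
    one of the other three nucleotides chosen uniformly (an assumption)."""
    if rng.random() < mu:
        return rng.choice([s for s in STATES if s != state])
    return state


def simulate(tree, root, root_state, rng):
    """`tree` maps each vertex to a list of (child, n_units, mu) triples, where
    n_units is the edge length in time units and mu is that edge's rate.
    Returns the simulated nucleotide at every vertex."""
    states = {root: root_state}
    pending = [root]
    while pending:
        v = pending.pop()
        for w, n_units, mu in tree.get(v, []):
            s = states[v]
            for _ in range(n_units):
                s = step(s, mu, rng)
            states[w] = s
            pending.append(w)
    return states


# Hypothetical tree with a different mutation rate on each edge.
TREE = {
    "root": [("x", 3, 0.05), ("t3", 5, 0.20)],
    "x": [("t1", 2, 0.05), ("t2", 2, 0.10)],
}
result = simulate(TREE, "root", "A", random.Random(1))
print({leaf: result[leaf] for leaf in ("t1", "t2", "t3")})
```

Only the leaf states would be observable in practice; the states at "root" and "x" are what the reconstruction methods on the following slides treat as unknown.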

9
Ways to reconstruct the tree
  • Once we have candidate estimated trees, we need
    to pick the true evolutionary tree out of them
    using the sample data we have.
  • There are many different ways to do this, and
    the choice depends entirely on what data is
    available to you!
  • Because it is hard to get samples from millions
    of years ago, the samples are usually associated
    with the leaf nodes, leaving the internal
    vertices unlabelled.

10
Top-down or Bottom-up
  • Top-down
  • You know the probabilities from some other
    sources, so you can construct the tree using
    simulation such as MCMC (Markov chain Monte
    Carlo).
  • You have several hypotheses to test: you feed
    them to the simulator, gather data from its
    output, and use some scheme to decide, according
    to your sample data, whether each hypothesis is
    valid.
  • The result is not one single best tree but a
    distribution over the possible results (the
    probability of each hypothesis being the true
    tree).
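Real phylogenetic MCMC proposes changes to tree shapes and branch lengths, which is well beyond one slide. The core idea is still that the chain spends time on each hypothesis in proportion to its posterior score, and that is what produces a distribution rather than a single best tree. Below is a toy Metropolis sampler in Python over a fixed set of three hypotheses. The scores are made up and stand in for likelihood times prior.

```python
import random
from collections import Counter


def metropolis(weights, n_steps, rng):
    """Metropolis sampler over hypotheses 0..k-1.

    `weights` are positive, unnormalised posterior scores (for example
    likelihood * prior). The proposal picks a hypothesis uniformly at random,
    which is symmetric, so a move is accepted with probability
    min(1, w_new / w_old). Returns the fraction of steps spent on each
    hypothesis, an estimate of its posterior probability."""
    k = len(weights)
    current = rng.randrange(k)
    visits = Counter()
    for _ in range(n_steps):
        proposal = rng.randrange(k)
        if rng.random() < min(1.0, weights[proposal] / weights[current]):
            current = proposal
        visits[current] += 1
    return [visits[i] / n_steps for i in range(k)]


# Hypothetical scores for three competing tree hypotheses.
scores = [1.0, 3.0, 6.0]
freqs = metropolis(scores, 100_000, random.Random(7))
print(freqs)  # roughly [0.1, 0.3, 0.6], i.e. the scores normalised
```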

11
  • Bottom-up
  • Involves a lot of math, e.g. the GTR (Generalized
    Time Reversible) model. The tree is a little
    different: its shape is fixed in advance so that
    the number of leaves equals the number of
    samples.
  • U means unlabelled: we do not really care
    exactly what is in those vertices. Here the
    transition matrices are associated with the
    edges rather than the vertices.

[Figure: tree with a root, six unlabelled internal vertices (U) and leaves numbered 1 to 8]
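The slide names the GTR model but does not give its form. The usual continuous-time formulation is stated here as background and is not taken from the slides. An edge of length t gets the transition matrix M(e) = exp(Q t), where the rate matrix Q has off-diagonal entries Q_ij = s_ij π_j. Here s_ij is a symmetric table of exchangeabilities and π_j is the base frequency of state j. The pure-Python sketch below uses hypothetical parameters, and its hand-rolled matrix exponential is for illustration only; a numerical library would normally be used instead.

```python
def gtr_rate_matrix(pi, s):
    """GTR rate matrix: Q[i][j] = s[i][j] * pi[j] for i != j, where s is a
    symmetric table of exchangeabilities and pi the base frequencies; each
    diagonal entry makes its row sum to zero."""
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = s[i][j] * pi[j]
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(a, squarings=10, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    scaled = [[x / 2.0 ** squarings for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes scaled^k / k!
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


# Hypothetical base frequencies (order A, C, G, T) and exchangeabilities.
PI = [0.1, 0.2, 0.3, 0.4]
S = [
    [0.0, 1.0, 2.0, 0.5],
    [1.0, 0.0, 0.8, 3.0],
    [2.0, 0.8, 0.0, 1.2],
    [0.5, 3.0, 1.2, 0.0],
]
Q = gtr_rate_matrix(PI, S)
P = expm([[x * 0.5 for x in row] for row in Q])  # M(e) for a branch of length t = 0.5

# Each row of P is a probability distribution over the child's state.
print([round(sum(row), 6) for row in P])  # [1.0, 1.0, 1.0, 1.0]
```

Time reversibility shows up as π_i P_ij = π_j P_ji for every pair of states, which is easy to check numerically on the matrix P above.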
12
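  • Suppose the root, written $\rho$ here, is in
    state i with a fixed probability $\pi_i$.
  • Each edge e = (v, w) has a transition matrix
    $M(e)$, whose entry $M(e)_{rs}$ is the
    conditional probability that w is in state s
    given that v is in state r. (That is why we do
    not really care what is in the unlabelled
    vertices: we can sum over all their states.)
  • Assume the Markov property: the state of a
    vertex depends only on the state of its
    immediate ancestor.
  • We can then get the probability of any
    assignment $\chi$ of states to the leaves from
    $$P(\chi) = \sum_{\phi} \pi_{\phi(\rho)} \prod_{e=(v,w)} M(e)_{\phi(v)\,\phi(w)},$$
    where the product runs over all edges of T and
    the sum runs over all assignments $\phi$ of
    states to the vertices of T that agree with
    $\chi$ on the leaves.
  • Suppose also that no transition matrix has
    determinant 0, 1 or -1, and that $\pi_i > 0$
    for every state i.
  • It has been proved that, under these
    assumptions, knowing the probability of every
    assignment of states to the leaves is enough to
    reconstruct the tree (except for the position of
    the root).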
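The probability of a leaf pattern described on this slide can be computed by brute force. For every way of assigning states to the unlabelled vertices, including the root, take the root probability times the product of the edge transition probabilities, and add these terms up. The Python sketch below does this for a hypothetical two-state tree with four edges. Its cost grows exponentially with the number of internal vertices; Felsenstein's pruning algorithm, not covered in these slides, computes the same sum efficiently.

```python
from itertools import product

# Hypothetical rooted tree: root -> u, root -> leaf3, u -> leaf1, u -> leaf2.
EDGES = [("root", "u"), ("root", "leaf3"), ("u", "leaf1"), ("u", "leaf2")]
PI = [0.6, 0.4]  # root distribution over two states, both > 0
# M[e][r][s] = P(w in state s | v in state r) for e = (v, w).
# Determinants are 0.7, 0.3, 0.5 and 0.5: none is 0, 1 or -1.
M = {
    ("root", "u"): [[0.9, 0.1], [0.2, 0.8]],
    ("root", "leaf3"): [[0.7, 0.3], [0.4, 0.6]],
    ("u", "leaf1"): [[0.8, 0.2], [0.3, 0.7]],
    ("u", "leaf2"): [[0.6, 0.4], [0.1, 0.9]],
}


def leaf_pattern_probability(leaf_states, edges, pi, m, root="root"):
    """Sum, over every state assignment to the unlabelled vertices, of
    pi[root state] times the product of M(e)[state(v)][state(w)] over edges."""
    n_states = len(pi)
    vertices = {x for edge in edges for x in edge}
    hidden = sorted(vertices - set(leaf_states))
    total = 0.0
    for assignment in product(range(n_states), repeat=len(hidden)):
        state = dict(zip(hidden, assignment))
        state.update(leaf_states)
        p = pi[state[root]]
        for v, w in edges:
            p *= m[(v, w)][state[v]][state[w]]
        total += p
    return total


print(leaf_pattern_probability({"leaf1": 0, "leaf2": 0, "leaf3": 1}, EDGES, PI, M))
```

Summing this over all 2^3 leaf patterns gives 1, a quick sanity check that the edge matrices and root distribution are consistent.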

13
Other statistical methods used in phylogenetic
reconstruction
  • Sampling (extremely important but problematic)
  • Fitting the model to the sample data (maximum
    likelihood)
  • Measuring the confidence of the result
  • In short, we are just guessing and hoping for
    good luck.
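The slide only names these tools. To illustrate the second and third bullets, assume the simplest substitution model, Jukes-Cantor (the slides do not specify one), and two hypothetical aligned sequences. The maximum-likelihood distance between the two sequences then has a closed form, which a grid search over the log-likelihood confirms. Resampling alignment columns with replacement (a nonparametric bootstrap) gives one rough way to attach a confidence interval to that estimate. In real analyses the likelihood is maximised over whole trees, and bootstrap values are usually reported as support for tree branches.

```python
import math
import random

# Hypothetical aligned sequences: 20 sites, 3 of which differ.
S1 = "ACGTACGTACGTACGTACGT"
S2 = "ACGTTCGTACGAACGTACCT"


def p_distance(seq1, seq2):
    """Fraction of aligned sites at which the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jc_distance(seq1, seq2):
    """Closed-form maximum-likelihood distance under the Jukes-Cantor model."""
    p = p_distance(seq1, seq2)
    if p == 0.0:
        return 0.0       # identical sequences
    if p >= 0.75:
        return math.inf  # saturated: the likelihood has no finite maximum
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d > 0, up to an additive constant."""
    e = math.exp(-4.0 * d / 3.0)
    return (n_sites - n_diff) * math.log(0.25 + 0.75 * e) + n_diff * math.log(0.75 - 0.75 * e)


def bootstrap_interval(seq1, seq2, n_reps=1000, level=0.95, seed=0):
    """Percentile interval for jc_distance from resampled alignment columns."""
    rng = random.Random(seed)
    n = len(seq1)
    estimates = []
    for _ in range(n_reps):
        cols = [rng.randrange(n) for _ in range(n)]
        estimates.append(jc_distance("".join(seq1[i] for i in cols),
                                     "".join(seq2[i] for i in cols)))
    estimates.sort()
    lo = estimates[int((1.0 - level) / 2.0 * n_reps)]
    hi = estimates[int((1.0 + level) / 2.0 * n_reps) - 1]
    return lo, hi


print(round(jc_distance(S1, S2), 4))  # 0.1674
print(bootstrap_interval(S1, S2))
```

With only 20 sites the interval is wide, which is one concrete reason the slide calls sampling "extremely important but problematic".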

14
  • Reference
  • An overview of phylogeny reconstruction
    (C. R. Linder and T. Warnow)
  • Recovering a tree from the leaf colourations it
    generates under a Markov model (M. Steel)
  • A probability model for inferring evolutionary
    trees (J. S. Farris)
  • Wikipedia and other internet sources

15
Thank You For Not Sleeping