Title: Bayesian inference
1. Bayesian inference
- Based on "Bayesian inference using Markov chain Monte Carlo in phylogenetic studies" by Torbjörn Karfunkel
- Presented by Amir Hadadi, Bioinformatics seminar, spring 2005
2. What is Bayesian inference?
- Definition: "an approach to statistics in which all forms of uncertainty are expressed in terms of probability" (Radford M. Neal)
3. Probability reminder
- Conditional probability:
  - P(D ∩ T) = P(D|T) · P(T)
  - P(D ∩ T) = P(T|D) · P(D)
- Bayes' theorem: P(T|D) = P(D|T) · P(T) / P(D)
- P(T|D) is called the posterior probability of T
- P(T) is the prior probability, that is, the probability assigned to T before seeing the data
- P(D|T) is the likelihood of T, which is what we try to maximize in ML
- P(D) is the probability of observing the data D, disregarding which tree is correct (a small numeric sketch of the theorem follows)
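To make the theorem concrete, here is a minimal Python sketch; the two hypotheses and their numbers are hypothetical, not taken from the talk.

```python
# Bayes' theorem: P(T|D) = P(D|T) * P(T) / P(D), where
# P(D) = sum over all hypotheses T_i of P(D|T_i) * P(T_i).
def posterior(priors, likelihoods):
    """priors, likelihoods: dicts mapping hypothesis -> probability."""
    p_data = sum(priors[t] * likelihoods[t] for t in priors)  # P(D)
    return {t: priors[t] * likelihoods[t] / p_data for t in priors}

# Hypothetical two-hypothesis example:
print(posterior({"T1": 0.5, "T2": 0.5}, {"T1": 0.8, "T2": 0.2}))
# -> {'T1': 0.8, 'T2': 0.2}
```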
4. Posterior vs. likelihood probabilities: Bayesian inference vs. maximum likelihood
- A box contains 100 dice: some fair, some biased
- Probability of each observed face for the two die types:

  Observation   Fair   Biased
  1             1/6    1/21
  2             1/6    2/21
  3             1/6    3/21
  4             1/6    4/21
  5             1/6    5/21
  6             1/6    6/21
5. Example continued
- A die is drawn at random from the box
- Rolling the die twice gives us a 4 and a 6
- Using the ML approach we get:
  - P(4, 6 | Fair) = 1/6 × 1/6 ≈ 0.028
  - P(4, 6 | Biased) = 4/21 × 6/21 ≈ 0.054
- ML conclusion: the die is biased
6. Example continued further
- Assume we have prior knowledge about the distribution of dice inside the box
- We know that the box contains 90 fair dice and 10 biased dice
7. Example conclusion
- Prior probabilities: P(Fair) = 0.9, P(Biased) = 0.1
- Rolling the die twice gives us a 4 and a 6
- Using the Bayesian approach we get:
  - P(Biased | 4, 6) = P(4, 6 | Biased) · P(Biased) / P(4, 6) ≈ 0.179
- B.I. conclusion: the die is fair
- Conclusion: ML and BI do not necessarily agree
- How closely BI results resemble ML results depends on the strength of the prior assumptions we introduce (the full calculation is reproduced in code below)
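The whole dice example can be checked in a few lines of Python; the numbers reproduce the 0.028, 0.054 and 0.179 quoted on the slides.

```python
# Likelihoods of rolling a 4 and then a 6 under the two die models.
lik_fair = (1 / 6) * (1 / 6)      # ~0.028
lik_biased = (4 / 21) * (6 / 21)  # ~0.054 -> ML picks "biased"

# Bayesian update with the prior from the box: 90 fair dice, 10 biased dice.
prior_fair, prior_biased = 0.9, 0.1
p_data = lik_fair * prior_fair + lik_biased * prior_biased  # P(4, 6)
post_biased = lik_biased * prior_biased / p_data
print(round(lik_fair, 3), round(lik_biased, 3), round(post_biased, 3))
# -> 0.028 0.054 0.179 -> P(Biased | data) < 0.5, so BI picks "fair"
```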
8. Steps in B.I.
- Formulate a model of the problem
- Formulate a prior distribution which captures your beliefs before seeing the data
- Obtain the posterior distribution of the model parameters
9. B.I. in phylogenetic reconstruction
- Phylogenetic reconstruction: finding an evolutionary tree which explains the data (observed species)
- Methods of phylogenetic reconstruction:
  - Using a model of sequence evolution, e.g. maximum likelihood
  - Not using a model of sequence evolution, e.g. maximum parsimony, neighbor joining, etc.
- Bayesian inference belongs to the first category
10. Bayesian inference vs. maximum likelihood
- The basic question in Bayesian inference:
  - What is the probability that this model (T) is correct, given the data (D) that we have observed?
- Maximum likelihood asks a different question:
  - What is the probability of seeing the observed data (D) given that a certain model (T) is true?
- B.I. seeks P(T|D), while ML maximizes P(D|T)
11. Which priors should we assume?
- Knowledge about a parameter can be used to approximate its prior distribution
- Usually we don't have prior knowledge about a parameter's distribution. In this case a flat or vague prior is assumed.
12. Flat and vague priors
- [Figure: example plots of a flat prior and a vague prior]
13. How to find the posterior probability P(T|D)?
- P(T) is the assumed prior
- P(D|T) is the likelihood
- Finding P(D) directly is infeasible: we would need to sum P(D|T) · P(T) over the entire tree space
- Markov chain Monte Carlo (MCMC) gives us an indirect way of finding P(T|D) without having to calculate P(D)
14. MCMC example
- [Figure: a small two-state Markov chain with transition probabilities P = 1/2, run for seven steps; the sampled states give the empirical frequencies P(Palestine) = 3/7, P(Tree) = 4/7]
- A rough simulation of the same idea is sketched below
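A sketch simulating this kind of two-state chain; the 1/2 switching probability comes from the slide, while the chain structure, state labels and run lengths are assumptions.

```python
import random

def visit_frequencies(steps, p_switch=0.5, seed=0):
    """Run a two-state Markov chain and return the fraction of time in each state."""
    rng = random.Random(seed)
    state, visits = 0, [0, 0]
    for _ in range(steps):
        if rng.random() < p_switch:  # jump to the other state with prob. 1/2
            state = 1 - state
        visits[state] += 1
    return [v / steps for v in visits]

print(visit_frequencies(7))        # a short run is noisy, like the 3/7 vs 4/7 above
print(visit_frequencies(100_000))  # a long run approaches the stationary (1/2, 1/2)
```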
15. Symmetric simple random walk
- Definition: a sequence of steps in ℤ, starting at 0 and moving one step left or right, each with probability ½
- Properties:
  - After n steps the average distance from 0 is of magnitude √n (checked empirically below)
  - A random walk in one or two dimensions is recurrent
  - A random walk in three dimensions or more is transient
  - Brownian motion is a limit of random walks
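A quick empirical check of the √n growth; the walk lengths and number of trials below are arbitrary choices.

```python
import random

def mean_distance(n_steps, n_walks=2000, seed=0):
    """Average |position| of a simple symmetric random walk after n_steps."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_walks):
        pos = 0
        for _ in range(n_steps):
            pos += 1 if rng.random() < 0.5 else -1
        total += abs(pos)
    return total / n_walks

for n in (100, 400, 1600):
    print(n, round(mean_distance(n), 1), round(n ** 0.5, 1))
# Quadrupling n roughly doubles the mean distance, i.e. growth of order sqrt(n)
# (the exact constant is sqrt(2/pi) ~ 0.8).
```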
16. Definition of a Markov chain
- A special type of stochastic process
- A sequence of random variables X_0, X_1, X_2, … such that:
  - Each X_i takes values in a state space S = {s_1, s_2, …}
  - If x_0, x_1, …, x_{n+1} are elements of S, then
    P(X_{n+1} = x_{n+1} | X_n = x_n, X_{n-1} = x_{n-1}, …, X_0 = x_0) = P(X_{n+1} = x_{n+1} | X_n = x_n)
- (a concrete transition-matrix example follows)
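To make the definition concrete, here is a sketch with a hypothetical two-state transition matrix, showing how a distribution pushed repeatedly through the chain settles into a stationary distribution.

```python
# Hypothetical transition matrix: M[i][j] = P(X_{n+1} = j | X_n = i).
M = [[0.9, 0.1],
     [0.3, 0.7]]

# Push an initial distribution through the chain until it stabilizes.
dist = [1.0, 0.0]
for _ in range(100):
    dist = [sum(dist[i] * M[i][j] for i in range(2)) for j in range(2)]
print([round(p, 3) for p in dist])  # -> [0.75, 0.25], the stationary distribution
```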
17. Using MCMC to calculate posterior probabilities
- Set S = the set of parameters (e.g. tree topology, mutation probability, branch lengths, etc.)
- Construct an MCMC with a stationary distribution equal to the posterior probability of the parameters
- Run the chain for a long time and sample from it regularly
- Use the samples to find the stationary distribution
18. Constructing our MCMC
- The state space S is defined as the parameter space
- Start with a random tree and random parameters
- In each new generation, randomly propose either:
  - A new tree topology, or
  - A new value for a model parameter
- If the proposed tree has higher posterior probability, π(proposed), than the current tree, π(current), the transition is accepted
- Otherwise the transition is accepted with probability π(proposed) / π(current) (see the sketch below)
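A minimal sketch of this accept/reject rule (the Metropolis rule) on a toy one-dimensional "posterior"; in real phylogenetic MCMC the state is a tree plus model parameters and the proposals are tree rearrangements, both of which are only stand-ins here.

```python
import math, random

def metropolis_step(current, propose, log_posterior, rng):
    """One generation: propose a new state, accept it or keep the current one."""
    candidate = propose(current, rng)
    # Accept if better; otherwise accept with probability pi(proposed)/pi(current).
    log_ratio = log_posterior(candidate) - log_posterior(current)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return candidate
    return current

# Toy usage: sample a standard normal "posterior" with random-walk proposals.
rng = random.Random(0)
log_post = lambda x: -0.5 * x * x
propose = lambda x, r: x + r.uniform(-1, 1)
x, samples = 0.0, []
for _ in range(10_000):
    x = metropolis_step(x, propose, log_post, rng)
    samples.append(x)
print(sum(samples) / len(samples))  # close to 0, the mean of the target
```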
19. Algorithm visualization
- [Figure: step-by-step visualization of the MCMC algorithm]
20. Convergence issues
- An MCMC might run for a long time until its sampled distribution is close to the stationary distribution
- The initial convergence phase is called the burn-in phase
- We wish to minimize the burn-in time (handling of burn-in is sketched below)
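A small sketch of how burn-in and the regular sampling from slide 17 might be applied to a recorded chain; the cutoff and sampling interval are illustrative, not values from the talk.

```python
def usable_samples(chain, burn_in=1000, sample_every=100):
    """Drop the burn-in prefix, then keep every sample_every-th state."""
    return chain[burn_in::sample_every]

chain = list(range(10_000))        # stand-in for the recorded MCMC states
print(len(usable_samples(chain)))  # -> 90 retained samples
```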
21. Avoiding getting stuck on local maxima
- Assume our landscape looks like this:
- [Figure: a posterior landscape with several local maxima]
22. Avoiding local maxima (cont'd)
- Descending from a maximum can take a long time
- MCMCMC (Metropolis-coupled MCMC) speeds up the chain's mixing rate
- Instead of running a single chain, multiple chains are run simultaneously
- The chains are heated to different degrees
23. Chain heating
- The cold chain has stationary distribution P(T|D)
- Heated chain number i has stationary distribution P(T|D)^(1/i)
24. The MC3 algorithm
- Run multiple heated chains
- At each generation, attempt a swap between two chains
- If the swap is accepted, the hotter and cooler chains swap states
- Sample only from the cold chain (a sketch of the swap step follows)
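A sketch of one swap attempt under the heating scheme from slide 23 (chain i targets P(T|D)^(1/i)); the Metropolis-style swap acceptance ratio used here is a standard choice but an assumption, since the talk does not spell it out.

```python
import math, random

def try_swap(states, betas, log_post, rng):
    """Attempt to swap the states of two randomly chosen tempered chains."""
    i, j = rng.sample(range(len(states)), 2)
    # Chain k targets pi(x)^betas[k]; the swap acceptance ratio in log form:
    log_ratio = (betas[i] - betas[j]) * (log_post(states[j]) - log_post(states[i]))
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        states[i], states[j] = states[j], states[i]
    return states

rng = random.Random(0)
log_post = lambda x: -0.5 * x * x   # toy posterior
betas = [1.0, 1 / 2, 1 / 3]         # slide's scheme: chain i is heated by 1/i
states = [0.0, 2.0, -3.0]
states = try_swap(states, betas, log_post, rng)
print(states)  # only the cold chain (states[0]) is used for inference
```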
25. Drawing conclusions
- To decide the value of a parameter:
  - Draw a histogram showing the number of trees in each interval and calculate the mean, mode, credibility intervals, etc.
- To find the most likely tree topologies:
  - Sort all sampled trees according to their posterior probabilities
  - Pick the most probable trees until the cumulative probability is 0.95
- To check whether a certain group of organisms is monophyletic:
  - Find the number of sampled trees in which it is monophyletic
  - If it is monophyletic in 74% of the trees, it has a 74% probability of being monophyletic (these summaries are sketched in code below)
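These three summaries are easy to express in code; a sketch on hypothetical sampled topologies, with Newick-like strings standing in for trees.

```python
from collections import Counter

# Hypothetical sampled topologies from the cold chain.
samples = ["((A,B),C)"] * 60 + ["(A,(B,C))"] * 30 + ["((A,C),B)"] * 10

# Posterior probability of each topology = its sampling frequency.
post = {t: n / len(samples) for t, n in Counter(samples).items()}

# Credible set: most probable trees until cumulative probability reaches 0.95.
credible, cum = [], 0.0
for tree, p in sorted(post.items(), key=lambda kv: -kv[1]):
    credible.append(tree)
    cum += p
    if cum >= 0.95:
        break

# Monophyly: fraction of sampled trees in which A and B form a clade.
p_mono = sum("(A,B)" in t for t in samples) / len(samples)
print(post, credible, p_mono)  # p_mono = 0.6 -> 60% probability of monophyly
```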
26. Summary
- Bayesian inference is very popular in many fields requiring statistical observations
- The advent of fast computers gave rise to the use of MCMC in B.I., enabling multi-parameter analysis
- Fields of genomics using Bayesian methods:
  - Identification of SNPs
  - Inferring levels of gene expression and regulation
  - Association mapping
  - Etc.
27. THE END
28. A sample histogram