Title: Bayesian Statistics for Bioinformatics
1 Bayesian Statistics for Bioinformatics
- Craig A. Struble, Ph.D.
- Department of Mathematics, Statistics, and
Computer Science
- Marquette University
2 Overview
- Introduction to Probability
- Bayes Rule
- Bayesian Evolutionary Distance
- Bayesian Sequence Alignment
3 Probability
- Let P(A) represent the probability that
proposition A is true.
- Example: Let Risky represent that a customer is a
high credit risk. P(Risky) = 0.519 means that
there is a 51.9% chance a given customer is a
high credit risk.
- Without any other information, this probability
is called the prior or unconditional probability.
4 Random Variables
- Could also consider a random variable X, which
can take on one of many values in its domain
⟨x_1, x_2, ..., x_n⟩
- Example: Let Weather be a random variable with
domain ⟨sunny, rain, cloudy, snow⟩. The
probabilities of Weather taking on each of these
values are
- P(Weather = sunny) = 0.7, P(Weather = rain) = 0.2
- P(Weather = cloudy) = 0.08, P(Weather = snow) = 0.02
5 Probability Distributions
- The notation P(X) is used to represent the
probabilities of all possible values of a random
variable
- Example: P(Weather) = ⟨0.7, 0.2, 0.08, 0.02⟩
- The statement above defines a probability
distribution for the random variable Weather
- The notation P(Weather, Risky) is used to denote
the probabilities of all combinations of the two
variables.
- Represented by a 4x2 table of probabilities
6 Conditional Probability
- Probabilities of events change when we know
something about the world
- The notation P(A | B) is used to represent the
conditional or posterior probability of A
- Read: the probability of A given that all we know
is B.
- P(Weather = snow | Temperature = below freezing)
= 0.10
7 Logical Connectives
- We can use logical connectives for probabilities
- P(Weather = snow ∧ Temperature = below freezing)
- Can use disjunctions (or) or negation (not) as
well
- The product rule
- P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)
- Using probability distributions
- P(X, Y) = P(X | Y) P(Y)
- which is equivalent to saying
- P(X = x_i ∧ Y = y_j) = P(X = x_i | Y = y_j) P(Y = y_j) for all i and j
8 Axioms of Probability
- All probabilities are between 0 and 1
- 0 ≤ P(A) ≤ 1
- Necessarily true propositions have probability 1,
necessarily false propositions have probability 0
- P(true) = 1, P(false) = 0
- The probability of a disjunction is given by
- P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
9 Joint Probability Distributions
- Recall P(A, B) represents the probabilities of all
possible combinations of assignments to random
variables A and B.
- More generally, P(X_1, ..., X_n) for random variables
X_1, ..., X_n is called the joint probability
distribution, or simply the joint
10 Joint Probability Distributions
- Example
- P(Weather = sunny) = 0.7 (add up row)
- P(Risky) = 0.519 (add up column)
- What about P(Weather = sunny ?? Risky)?
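The joint table for this slide is not reproduced in the text. As a minimal sketch of "add up row" and "add up column", the snippet below builds a hypothetical 4x2 table by assuming Weather and Risky are independent. That assumption is made only so the cells can be filled in from the marginals quoted earlier; the table on the original slide may well differ.

```python
# Marginals quoted on the earlier slides.
p_weather = {"sunny": 0.7, "rain": 0.2, "cloudy": 0.08, "snow": 0.02}
p_risky = {True: 0.519, False: 0.481}

# Hypothetical joint table: independence is assumed purely for illustration.
joint = {
    (w, r): pw * pr
    for w, pw in p_weather.items()
    for r, pr in p_risky.items()
}

def marginal_weather(w):
    """Add up one row of the table: sum over both values of Risky."""
    return sum(p for (wi, _), p in joint.items() if wi == w)

def marginal_risky(r):
    """Add up one column of the table: sum over all values of Weather."""
    return sum(p for (_, ri), p in joint.items() if ri == r)

print(round(marginal_weather("sunny"), 3))
print(round(marginal_risky(True), 3))
```

Any joint table, independent or not, yields its marginals the same way; only the cell values would change.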
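11 Bayes Rule
- Bayes rule relates conditional probabilities
- P(A ∧ B) = P(A | B) P(B)
- P(A ∧ B) = P(B | A) P(A)
- Bayes Rule: setting the two right-hand sides equal
and dividing by P(A) gives
P(B | A) = P(A | B) P(B) / P(A)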
12 Normalization
- Direct assessment of P(A) may not be possible,
but we can use the fact
P(A) = P(A | B) P(B) + P(A | ¬B) P(¬B)
to estimate the value.
- Example
- P(Weather = sunny) = P(Weather = sunny | Risky) P(Risky)
+ P(Weather = sunny | ¬Risky) P(¬Risky)
13 Bayes Rule Example
- Dishonest casino: Loaded die where 6 comes up 50%
of the time. 1 out of every 100 dice is loaded
- P(D = fair) = 0.99, P(D = loaded) = 0.01
- Let's say someone rolls three 6s in a row.
What's the probability that the die is loaded?
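The question on this slide can be worked through numerically. The sketch below assumes the three rolls are independent and that a fair die shows a 6 with probability 1/6; the function name `posterior_loaded` is introduced here for illustration only. Exact fractions are used so no rounding creeps in.

```python
from fractions import Fraction

# Prior over the die type and the chance of a 6 for each type.
prior = {"fair": Fraction(99, 100), "loaded": Fraction(1, 100)}
p_six = {"fair": Fraction(1, 6), "loaded": Fraction(1, 2)}

def posterior_loaded(n_sixes):
    """P(D = loaded | n_sixes sixes in a row), assuming independent rolls."""
    likelihood = {d: p_six[d] ** n_sixes for d in prior}
    # Total probability of the observation, summed over both die types.
    evidence = sum(likelihood[d] * prior[d] for d in prior)
    return likelihood["loaded"] * prior["loaded"] / evidence

result = posterior_loaded(3)
print(result, float(result))
```

Under these assumptions the posterior is 3/14, or about 0.21: even after three 6s in a row the die is still more likely to be fair, because loaded dice are rare to begin with.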
14 Generalizing Bayes Rule
- For probability distributions
- P(Y | X) = P(X | Y) P(Y) / P(X)
- Conditionalized on background evidence E
- P(Y | X, E) = P(X | Y, E) P(Y | E) / P(X | E)
15 Generalizing with Normalization
- Using normalization, Bayes rule can be written
- P(B | A) = P(A | B) P(B) / [P(A | B) P(B) + P(A | ¬B) P(¬B)]
- More generally
- P(B | A) = α P(A | B) P(B)
- where α is a constant that makes the probability
distribution table for P(B | A) total to 1.
16 Likelihood
- Let A represent that a given model/hypothesis is
true and B represent that a given data sample
is observed.
- P(A | B) is the probability of the model being true
given that the data is observed
- P(B | A) is the likelihood of the model.
17 Bayesian Approaches
- Choose the most likely hypothesis, model, or
classification using Bayesian techniques
- MAP (maximum a posteriori): choose the hypothesis h_i
with the highest posterior probability given the
observed data
18 Naïve Bayesian Classifier
- ML (maximum likelihood)
- Assume P(h_i) = P(h_j) (classifications are equally
likely)
- Choose h_i such that the likelihood of the observed
data given h_i is maximized
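The difference between the two selection rules shows up when the prior is far from uniform. The sketch below reuses the dishonest casino numbers from the earlier slide (three 6s in a row, independent rolls assumed); `choose_ml` and `choose_map` are hypothetical helper names, not part of any library.

```python
def choose_ml(likelihood):
    """Maximum likelihood: pick h maximizing P(data | h)."""
    return max(likelihood, key=likelihood.get)

def choose_map(likelihood, prior):
    """Maximum a posteriori: pick h maximizing P(data | h) * P(h).

    The normalizing constant is the same for every h, so it can be skipped.
    """
    return max(likelihood, key=lambda h: likelihood[h] * prior[h])

prior = {"fair": 0.99, "loaded": 0.01}
# Likelihood of three 6s in a row under each hypothesis.
likelihood = {"fair": (1 / 6) ** 3, "loaded": 0.5 ** 3}

print("ML choice: ", choose_ml(likelihood))
print("MAP choice:", choose_map(likelihood, prior))
```

With these numbers ML picks the loaded die, since it explains the data best, while MAP picks the fair die, since the strong prior outweighs the likelihood ratio. With a uniform prior the two rules agree.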
19 Bayesian Evolutionary Distance
- Agarwal and States (1996)
- Chapter 3, pp. 122-124
- Problem: Estimate the evolutionary distance of two
sequences
- Recall PAM1 represents 1 change in 100
- Evolutionary distance of 10^7
- PAM_N = (PAM1)^N
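The relation PAM_N = (PAM1)^N is a matrix power applied to the mutation probability form of PAM1 (not the log-odds scoring form). As a rough sketch, the snippet below uses a made-up 3-letter alphabet in place of the 20 amino acids; the matrix `toy_pam1` and its entries are hypothetical, chosen only so that each letter has a 1% chance of changing per step.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    # Start from the identity matrix and multiply `power` times.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result

# Hypothetical toy "PAM1": each letter stays the same with probability 0.99.
toy_pam1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

toy_pam50 = mat_pow(toy_pam1, 50)
print([round(p, 3) for p in toy_pam50[0]])
```

Each row of the result still sums to 1, and the diagonal shrinks as N grows, reflecting that more positions are expected to have changed at larger distances.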
20 Bayesian Evolutionary Distance
- One approach is to align the sequences with
different PAM matrices until the highest score is
obtained.
- E.g., a sequence of length 100 with 60 mismatches
- PAM50: 40 × 1.34 - 60 × 1.04 = -8.8
- PAM125: 40 × 0.65 - 60 × 0.30 = 8
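The arithmetic in this example can be checked with a few lines of code. The sketch assumes the two numbers quoted per matrix are an average score per match and an average penalty per mismatch, which is an interpretation of the slide rather than something it states; `ungapped_score` is a name introduced here.

```python
def ungapped_score(matches, mismatches, match_score, mismatch_penalty):
    """Total score: reward for matches minus penalty for mismatches."""
    return matches * match_score - mismatches * mismatch_penalty

length, mismatches = 100, 60
matches = length - mismatches

# (score per match, penalty per mismatch) as quoted in the example.
matrices = {"PAM50": (1.34, 1.04), "PAM125": (0.65, 0.30)}

scores = {
    name: ungapped_score(matches, mismatches, m, p)
    for name, (m, p) in matrices.items()
}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(name, round(score, 1))
print("best matrix:", best)
```

Of the two matrices tried, PAM125 gives the higher score, so under this approach it would be taken as the better estimate of the distance.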
21 Bayesian Evolutionary Distance
- Let x be the evolutionary distance represented by
the PAM_N matrix
- Let k be the number of mismatches
- P(x | k) is the probability of x being the
evolutionary distance given k mismatches
- Goal: Select x such that P(x | k) is maximized.
22 Bayesian Evolutionary Distance
Odds Score
- From Bayes rule
- P(x | k) = P(k | x) P(x) / P(k)
- If we use ML, then choose x_n maximizing
P(k | x_n)
- What about MAP?
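To make the ML and MAP choices concrete, here is a rough sketch for k = 60 mismatches in 100 sites. It is not the model of Agarwal and States: the binomial likelihood, the per-site mismatch probabilities in `p_mismatch`, and the prior over distances are all hypothetical values chosen for illustration.

```python
from math import comb

def binomial_likelihood(k, n, p):
    """P(k mismatches in n sites) if each site mismatches with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n_sites, k = 100, 60

# Hypothetical mismatch probability per site for each candidate PAM distance.
p_mismatch = {50: 0.40, 125: 0.60, 250: 0.80}
# Hypothetical prior P(x) that favours shorter distances.
prior = {50: 0.6, 125: 0.3, 250: 0.1}

# ML: maximize P(k | x).
likelihood = {x: binomial_likelihood(k, n_sites, p) for x, p in p_mismatch.items()}
ml_distance = max(likelihood, key=likelihood.get)

# MAP: maximize P(x | k), which is proportional to P(k | x) P(x).
unnormalized = {x: likelihood[x] * prior[x] for x in likelihood}
total = sum(unnormalized.values())
posterior = {x: value / total for x, value in unnormalized.items()}
map_distance = max(posterior, key=posterior.get)

print("ML distance: ", ml_distance)
print("MAP distance:", map_distance)
```

With these made-up numbers both rules select the same distance, because 100 sites carry enough evidence to outweigh the prior; with fewer sites or a sharper prior the two could disagree.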
23 Bayesian Evolutionary Distance
24 Bayesian Sequence Alignment
- Zhu et al. (1998): Bayes block aligner
- Finds highest scoring ungapped regions or blocks
- Provide a range of substitution matrices and the
expected number of blocks as prior information
- Scores every possible combination of blocks to
find the best scoring alignment
25 Bayesian Sequence Alignment