Bayesian Statistics for Bioinformatics

Provided by: CraigAS7

Learn more at: http://www.mscs.mu.edu

1
Bayesian Statistics for Bioinformatics

Craig A. Struble, Ph.D.
Department of Mathematics, Statistics, and
Computer Science
Marquette University

2
Overview

Introduction to Probability
Bayes Rule
Bayesian Evolutionary Distance
Bayesian Sequence Alignment

3
Probability

Let P(A) represent the probability that
proposition A is true.
Example: Let Risky represent that a customer is a
high credit risk. P(Risky) = 0.519 means that
there is a 51.9% chance a given customer is a
high credit risk.
Without any other information, this probability
is called the prior or unconditional probability.

4
Random Variables

We could also consider a random variable X, which
can take on one of many values in its domain
<x1, x2, ..., xn>.
Example: Let Weather be a random variable with
domain <sunny, rain, cloudy, snow>. The
probabilities of Weather taking on each of these
values are
P(Weather = sunny) = 0.7, P(Weather = rain) = 0.2,
P(Weather = cloudy) = 0.08, P(Weather = snow) = 0.02

5
Probability Distributions

The notation P(X) is used to represent the
probabilities of all possible values of a random
variable
Example: P(Weather) = <0.7, 0.2, 0.08, 0.02>
The statement above defines a probability
distribution for the random variable Weather
The notation P(Weather, Risky) is used to denote
the probabilities of all combinations of the two
variables.
Represented by a 4x2 table of probabilities
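
A distribution such as P(Weather) can be held as a plain mapping from values to probabilities. The sketch below is in Python (a choice made here, not by the slides), and the helper name is_distribution is hypothetical. It uses the Weather probabilities from the slides and checks that they form a valid distribution.

```python
import math

# P(Weather), using the values quoted on the slides
p_weather = {"sunny": 0.7, "rain": 0.2, "cloudy": 0.08, "snow": 0.02}

def is_distribution(p, tol=1e-9):
    """True if every entry lies in [0, 1] and the entries sum to 1."""
    in_range = all(0.0 <= v <= 1.0 for v in p.values())
    return in_range and math.isclose(sum(p.values()), 1.0, abs_tol=tol)

print(is_distribution(p_weather))
```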

6
Conditional Probability

Probabilities of events change when we know
something about the world
The notation P(A|B) is used to represent the
conditional or posterior probability of A.
Read: the probability of A given that all we know
is B.
P(Weather = snow | Temperature = below freezing)
= 0.10

7
Logical Connectives

We can use logical connectives for probabilities:
P(Weather = snow ∧ Temperature = below freezing)
Can use disjunctions (or) or negation (not) as
well.
The product rule:
P(A ∧ B) = P(A|B)P(B) = P(B|A)P(A)
Using probability distributions:
P(X,Y) = P(X|Y)P(Y)
which is equivalent to saying
P(X = xi ∧ Y = yj) = P(X = xi | Y = yj)P(Y = yj) for all i and j

8
Axioms of Probability

All probabilities are between 0 and 1:
0 ≤ P(A) ≤ 1
Necessarily true propositions have probability 1,
necessarily false propositions probability 0:
P(true) = 1, P(false) = 0
The probability of a disjunction is given by
P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
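
The product rule and the disjunction axiom can both be checked by counting outcomes. The sketch below assumes a single roll of a fair six-sided die, with A = "the roll is even" and B = "the roll is greater than 3". These events are illustrative choices, not taken from the slides.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (an illustrative assumption).
outcomes = range(1, 7)

def prob(event):
    """P(event) as the fraction of equally likely outcomes satisfying it."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def is_even(o):
    return o % 2 == 0

def is_high(o):
    return o > 3

p_a = prob(is_even)                                     # P(A)
p_b = prob(is_high)                                     # P(B)
p_a_and_b = prob(lambda o: is_even(o) and is_high(o))   # P(A ∧ B)
p_a_or_b = prob(lambda o: is_even(o) or is_high(o))     # P(A ∨ B)
p_a_given_b = p_a_and_b / p_b                           # P(A | B)

# Both identities hold exactly because Fractions avoid rounding error.
assert p_a_and_b == p_a_given_b * p_b
assert p_a_or_b == p_a + p_b - p_a_and_b
```

Exact fractions are used so that the identities can be tested with equality rather than a tolerance.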

9
Joint Probability Distributions

Recall P(A,B) represents the probabilities of all
possible combinations of assignments to random
variables A and B.
More generally, P(X1, ..., Xn) for random variables
X1, ..., Xn is called the joint probability
distribution, or simply the joint.

10
Joint Probability Distributions

Example
P(Weather = sunny) = 0.7 (add up row)
P(Risky) = 0.519 (add up column)
What about P(Weather = sunny ?? Risky)?
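
"Adding up" a row or column of the joint table can be sketched as follows. The table itself did not survive in the transcript, so every cell value below is invented for illustration. The cells were chosen only so that the marginals agree with the values the slides do quote (the four Weather probabilities and P(Risky) = 0.519).

```python
# Hypothetical joint table P(Weather, Risky). The cell values are invented;
# only the marginals they add up to come from the slides.
joint = {
    "sunny":  {"risky": 0.35,  "not_risky": 0.35},
    "rain":   {"risky": 0.12,  "not_risky": 0.08},
    "cloudy": {"risky": 0.04,  "not_risky": 0.04},
    "snow":   {"risky": 0.009, "not_risky": 0.011},
}

# Add up a row to get a Weather marginal.
p_sunny = sum(joint["sunny"].values())

# Add up a column to get the Risky marginal.
p_risky = sum(row["risky"] for row in joint.values())

# Any other event is the sum of the matching cells, for example "sunny or risky".
p_sunny_or_risky = sum(
    p
    for weather, row in joint.items()
    for risk, p in row.items()
    if weather == "sunny" or risk == "risky"
)

print(round(p_sunny, 3), round(p_risky, 3), round(p_sunny_or_risky, 3))
```

The same pattern, summing the cells where a proposition holds, gives the probability of any combination of the two variables.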

11
Bayes Rule

Bayes rule relates conditional probabilities:
P(A ∧ B) = P(A|B)P(B)
P(A ∧ B) = P(B|A)P(A)
Bayes Rule:
P(B|A) = P(A|B)P(B) / P(A)

12
Normalization

Direct assessment of P(A) may not be possible,
but we can use the fact
P(A) = P(A|B)P(B) + P(A|¬B)P(¬B)
to estimate the value.
Example:
P(Weather = sunny) = P(Weather = sunny | Risky)P(Risky)
+ P(Weather = sunny | ¬Risky)P(¬Risky)

13
Bayes Rule Example

Dishonest casino: a loaded die where 6 comes up 50%
of the time. 1 out of every 100 dice is loaded:
P(D = fair) = 0.99, P(D = loaded) = 0.01
Let's say someone rolls three 6s in a row.
What's the probability that the die is loaded?
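
One way to work the casino question is to apply Bayes rule directly. The sketch below assumes that the three rolls are independent and that a fair die shows 6 with probability 1/6. Neither assumption is stated on the slide, but both are the usual reading of the problem.

```python
# Priors from the slide
p_fair, p_loaded = 0.99, 0.01

# Likelihood of three 6s in a row under each hypothesis
# (assumes independent rolls; a fair die shows 6 with probability 1/6)
like_fair = (1 / 6) ** 3
like_loaded = 0.5 ** 3

# Bayes rule, with P(three 6s) expanded over both hypotheses
evidence = like_fair * p_fair + like_loaded * p_loaded
p_loaded_given_sixes = like_loaded * p_loaded / evidence

print(f"P(loaded | three 6s) = {p_loaded_given_sixes:.3f}")
```

Under these assumptions the posterior is about 0.21, so even after three 6s in a row the die is still more likely to be fair, because loaded dice are rare to begin with.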

14
Generalizing Bayes Rule

For probability distributions:
P(Y|X) = P(X|Y)P(Y) / P(X)
Conditionalized on background evidence E:
P(Y|X,E) = P(X|Y,E)P(Y|E) / P(X|E)

15
Generalizing with Normalization

Using normalization, Bayes rule can be written
P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|¬B)P(¬B)]
More generally,
P(B|A) = α P(A|B)P(B)
where α is a constant that makes the probability
distribution table for P(B|A) total to 1.
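
The normalizing constant never has to be computed separately: multiply likelihood by prior for every value of B, then rescale so the results total 1. A minimal sketch follows, with hypothetical helper names normalize and posterior, reusing the dishonest casino numbers from the earlier slide (A = three 6s in a row, assuming independent rolls).

```python
def normalize(weights):
    """Scale non-negative weights so they total 1 (the role of alpha)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive total")
    return {key: w / total for key, w in weights.items()}

def posterior(prior, likelihood):
    """P(B | A) = alpha * P(A | B) * P(B) for every value of B."""
    unnormalized = {b: likelihood[b] * prior[b] for b in prior}
    return normalize(unnormalized)

# Usage with the casino numbers: B is the kind of die, A is "three 6s in a row".
prior = {"fair": 0.99, "loaded": 0.01}
likelihood = {"fair": (1 / 6) ** 3, "loaded": 0.5 ** 3}
post = posterior(prior, likelihood)
print(post)
```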

16
Likelihood

Let A represent that a given model/hypothesis is
true and B represent that a given data sample
is observed.
P(A|B) is the probability of the model being true
given that the data is observed.
P(B|A) is the likelihood of the model.

17
Bayesian Approaches

Choose the most likely hypothesis, model, or
classification using Bayesian techniques.
MAP (maximum a posteriori): choose the hi that
maximizes P(hi | data) = α P(data | hi)P(hi)

18
Naïve Bayesian Classifier

ML (maximum likelihood)
Assume P(hi) = P(hj) (classifications are equally
likely)
Choose hi such that P(data | hi) is maximized
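
MAP and ML can disagree when the prior is far from uniform. The sketch below contrasts the two rules on the dishonest casino numbers from the earlier slide, again assuming independent rolls. The function names ml_choice and map_choice are hypothetical.

```python
def ml_choice(likelihood):
    """Maximum likelihood: the hypothesis h maximizing P(data | h)."""
    return max(likelihood, key=likelihood.get)

def map_choice(prior, likelihood):
    """Maximum a posteriori: the hypothesis h maximizing P(data | h) * P(h)."""
    return max(prior, key=lambda h: likelihood[h] * prior[h])

prior = {"fair": 0.99, "loaded": 0.01}
likelihood = {"fair": (1 / 6) ** 3, "loaded": 0.5 ** 3}  # data: three 6s in a row

print(ml_choice(likelihood))          # ML ignores the prior
print(map_choice(prior, likelihood))  # MAP weighs the likelihood by the prior
```

With these numbers ML selects the loaded die, while MAP still selects the fair die because loaded dice are so rare.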

19
Bayesian Evolutionary Distance

Agarwal and States (1996)
Chapter 3, pp. 122-124
Problem: Estimate the evolutionary distance of two
sequences
Recall PAM1 represents 1 change in 100
Evolutionary distance of 10^7 years
PAMN = (PAM1)^N

20
Bayesian Evolutionary Distance

One approach is to align the sequences with
different PAM matrices until the highest score is
obtained.
E.g., a sequence of length 100 with 60 mismatches:
PAM50: 40(1.34) - 60(1.04) = -8.8
PAM125: 40(0.65) - 60(0.30) = 8
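
The arithmetic in this example can be reproduced with a few lines of Python. The sketch reads the slide's numbers as a per-position match score and mismatch score for each matrix (1.34 and -1.04 for PAM50, 0.65 and -0.30 for PAM125). The function name ungapped_score is hypothetical.

```python
def ungapped_score(length, mismatches, match_score, mismatch_score):
    """Total score of an ungapped alignment with uniform match/mismatch scores."""
    matches = length - mismatches
    return matches * match_score + mismatches * mismatch_score

# Per-position (match, mismatch) scores as quoted on the slide
pam_scores = {
    "PAM50": (1.34, -1.04),
    "PAM125": (0.65, -0.30),
}

totals = {
    name: ungapped_score(100, 60, match, mismatch)
    for name, (match, mismatch) in pam_scores.items()
}
best = max(totals, key=totals.get)
print(totals, best)
```

Of the two matrices shown, PAM125 gives the higher score for this alignment, which is the sense in which it is the better distance estimate under this approach.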

21
Bayesian Evolutionary Distance

Let x be the evolutionary distance represented by
the PAMN matrix.
Let k be the number of mismatches.
P(x|k) is the probability of x being the
evolutionary distance given k mismatches.
Goal: Select x such that P(x|k) is maximized.

22
Bayesian Evolutionary Distance
Odds Score

From Bayes rule:
P(x|k) = P(k|x)P(x) / P(k)
If we use ML, then choose xn maximizing
P(k|xn)
What about MAP?
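
The transcript does not preserve the model used for P(k|x), so the following is only a sketch under loudly stated assumptions and is not the method of Agarwal and States. It assumes a four-letter alphabet, a uniform PAM1 matrix that keeps a base with probability 0.99 and moves to each other base with probability 0.01/3, PAMx = (PAM1)^x, and independent sites. Under those assumptions the per-site mismatch probability has a closed form, and the ML distance is the x that maximizes the binomial likelihood of k mismatches in n sites.

```python
import math

def mismatch_probability(x, rate=0.01):
    """P(a site differs) after x PAM steps under a uniform 4-letter model.

    Assumption: PAM1 keeps a base with probability 1 - rate and moves to each
    of the 3 other bases with probability rate / 3, and PAMx = (PAM1)^x.
    """
    return 0.75 * (1.0 - (1.0 - 4.0 * rate / 3.0) ** x)

def log_likelihood(x, k, n):
    """log P(k | x) up to an additive constant, treating sites as independent."""
    p = mismatch_probability(x)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def ml_distance(k, n, candidates=range(1, 501)):
    """Candidate x (in PAM units) maximizing P(k | x)."""
    return max(candidates, key=lambda x: log_likelihood(x, k, n))

# The slide's example: 60 mismatches in a sequence of length 100
print(ml_distance(k=60, n=100))
```

For MAP, the same search would maximize log_likelihood(x, k, n) plus the log of a prior P(x). With a uniform prior over the candidates, the MAP and ML choices coincide.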

23
Bayesian Evolutionary Distance

24
Bayesian Sequence Alignment

Zhu et al. (1998): Bayes block aligner
Finds the highest-scoring ungapped regions, or blocks
Provide a range of substitution matrices and the
expected number of blocks as prior information
Scores every possible combination of blocks to
find the best-scoring alignment

25
Bayesian Sequence Alignment
