Bayesian Statistics for Bioinformatics

Provided by: CraigAS7
Learn more at: http://www.mscs.mu.edu

1
Bayesian Statistics for Bioinformatics
  • Craig A. Struble, Ph.D.
  • Department of Mathematics, Statistics, and
    Computer Science
  • Marquette University

2
Overview
  • Introduction to Probability
  • Bayes Rule
  • Bayesian Evolutionary Distance
  • Bayesian Sequence Alignment

3
Probability
  • Let P(A) represent the probability that
    proposition A is true.
  • Example: Let Risky represent that a customer is a
    high credit risk. P(Risky) = 0.519 means that
    there is a 51.9% chance a given customer is a
    high credit risk.
  • Without any other information, this probability
    is called the prior or unconditional probability.

4
Random Variables
  • Could also consider a random variable X, which
    can take on one of many values in its domain
    <x1, x2, ..., xn>
  • Example: Let Weather be a random variable with
    domain <sunny, rain, cloudy, snow>. The
    probabilities of Weather taking on each of these
    values are
  • P(Weather = sunny) = 0.7, P(Weather = rain) = 0.2
  • P(Weather = cloudy) = 0.08, P(Weather = snow) = 0.02

5
Probability Distributions
  • The notation P(X) is used to represent the
    probabilities of all possible values of a random
    variable
  • Example: P(Weather) = <0.7, 0.2, 0.08, 0.02>
  • The statement above defines a probability
    distribution for the random variable Weather
  • The notation P(Weather, Risky) is used to denote
    the probabilities of all combinations of the two
    variables.
  • Represented by a 4x2 table of probabilities

6
Conditional Probability
  • Probabilities of events change when we know
    something about the world
  • The notation P(A|B) is used to represent the
    conditional or posterior probability of A
  • Read "the probability of A given that all we know
    is B."
  • P(Weather = snow | Temperature = below freezing)
    = 0.10

7
Logical Connectives
  • We can use logical connectives for probabilities
  • P(Weather = snow ∧ Temperature = below freezing)
  • Can use disjunctions (or) or negation (not) as
    well
  • The product rule
  • P(A ∧ B) = P(A|B)P(B) = P(B|A)P(A)
  • Using probability distributions
  • P(X,Y) = P(X|Y)P(Y)
  • which is equivalent to saying
  • P(X = xi ∧ Y = yj) = P(X = xi | Y = yj)P(Y = yj) for all i and j

8
Axioms of Probability
  • All probabilities are between 0 and 1
  • 0 ≤ P(A) ≤ 1
  • Necessarily true propositions have prob. of 1,
    necessarily false prob. of 0
  • P(true) = 1, P(false) = 0
  • The probability of a disjunction is given by
  • P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
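
The disjunction axiom can be checked by brute-force counting. The sketch below assumes a hypothetical sample space that is not part of the slides (one roll of a fair six-sided die) with A = "the roll is even" and B = "the roll is greater than 3".

```python
from fractions import Fraction

# HYPOTHETICAL sample space (not from the slides): one roll of a fair die.
outcomes = range(1, 7)

def prob(event):
    """Probability of an event (a predicate on outcomes) under a uniform distribution."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def is_even(o):
    return o % 2 == 0

def is_high(o):
    return o > 3

p_a = prob(is_even)                                     # P(A)
p_b = prob(is_high)                                     # P(B)
p_a_and_b = prob(lambda o: is_even(o) and is_high(o))   # P(A and B)
p_a_or_b = prob(lambda o: is_even(o) or is_high(o))     # P(A or B)

# Axiom 1: probabilities lie between 0 and 1.
assert 0 <= p_a <= 1 and 0 <= p_b <= 1
# Axiom 3: P(A or B) = P(A) + P(B) - P(A and B).
assert p_a_or_b == p_a + p_b - p_a_and_b
print(p_a_or_b)  # 2/3
```

Exact fractions are used so that the equality holds without floating-point tolerance.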

9
Joint Probability Distributions
  • Recall P(A,B) represents the probabilities of all
    possible combinations of assignments to random
    variables A and B.
  • More generally, P(X1, ..., Xn) for random variables
    X1, ..., Xn is called the joint probability
    distribution or joint

10
Joint Probability Distributions
  • Example
  • P(Weather = sunny) = 0.7 (add up row)
  • P(Risky) = 0.519 (add up column)
  • What about P(Weathersunny??Risky)?
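
As a rough sketch of how adding up a row or a column recovers a marginal, the code below builds a 4x2 joint table from the marginals quoted earlier. The individual cell values are an assumption: they are generated by treating Weather and Risky as independent, purely for illustration, and P(¬Risky) = 0.481 is derived as 1 - 0.519.

```python
# Marginals quoted on the slides.
p_weather = {"sunny": 0.7, "rain": 0.2, "cloudy": 0.08, "snow": 0.02}
p_risky = {True: 0.519, False: 0.481}

# ASSUMPTION (illustration only): Weather and Risky are independent, so each
# cell is the product of its marginals. A real table need not factor this way.
joint = {(w, r): pw * pr
         for w, pw in p_weather.items()
         for r, pr in p_risky.items()}

def marginal_weather(w):
    """Add up one row of the table: P(Weather = w)."""
    return sum(p for (wi, _), p in joint.items() if wi == w)

def marginal_risky(r):
    """Add up one column of the table: P(Risky = r)."""
    return sum(p for (_, ri), p in joint.items() if ri == r)

print(round(marginal_weather("sunny"), 3))  # 0.7
print(round(marginal_risky(True), 3))       # 0.519
print(round(sum(joint.values()), 3))        # 1.0
```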

11
Bayes Rule
  • Bayes rule relates conditional probabilities
  • P(A ∧ B) = P(A|B)P(B)
  • P(A ∧ B) = P(B|A)P(A)
  • Bayes rule: P(A|B) = P(B|A)P(A) / P(B)

12
Normalization
  • Direct assessment of P(A) may not be possible,
    but we can use the fact
  • P(A) = P(A|B)P(B) + P(A|¬B)P(¬B)
  • to estimate the value.
  • Example
  • P(Weather = sunny) = P(Weather = sunny | Risky)P(Risky)
    + P(Weather = sunny | ¬Risky)P(¬Risky)

13
Bayes Rule Example
  • Dishonest casino: Loaded die where 6 comes up 50%
    of the time. 1 out of every 100 dice is loaded
  • P(D = fair) = 0.99, P(D = loaded) = 0.01
  • Let's say someone rolls three 6s in a row.
    What's the probability that the die is loaded?
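
The question can be worked through numerically with Bayes rule. This is a minimal sketch that assumes the fair die shows each face with probability 1/6 and that the three rolls are independent; neither assumption is stated on the slide.

```python
from fractions import Fraction

# Priors from the slide: 1 die in 100 is loaded.
prior = {"fair": Fraction(99, 100), "loaded": Fraction(1, 100)}

# P(roll = 6 | die type). ASSUMPTION: the fair die is uniform over six faces.
p_six = {"fair": Fraction(1, 6), "loaded": Fraction(1, 2)}

n_sixes = 3
# ASSUMPTION: rolls are independent, so the likelihood is a simple power.
likelihood = {d: p_six[d] ** n_sixes for d in prior}

# P(three 6s), summed over both kinds of die.
evidence = sum(likelihood[d] * prior[d] for d in prior)

# Bayes rule: P(loaded | three 6s).
posterior_loaded = likelihood["loaded"] * prior["loaded"] / evidence
print(posterior_loaded)  # 3/14, roughly 0.214
```

Under these assumptions the die is still more likely fair than loaded after three 6s, because loaded dice are rare to begin with.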

14
Generalizing Bayes Rule
  • For probability distributions:
    P(Y|X) = P(X|Y)P(Y) / P(X)
  • Conditionalized on background evidence E:
    P(Y|X,E) = P(X|Y,E)P(Y|E) / P(X|E)

15
Generalizing with Normalization
  • Using normalization, Bayes rule can be written
    P(B|A) = P(A|B)P(B) / (P(A|B)P(B) + P(A|¬B)P(¬B))
  • More generally
    P(B|A) = α P(A|B)P(B)
  • where α is a constant that makes the probability
    distribution table for P(B|A) total to 1.
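
The normalizing constant never has to be assessed directly: compute the unnormalized products for every value of B, then rescale so they sum to 1. The sketch below uses made-up numbers, and the helper name `normalize` is hypothetical.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1.

    The scale factor (1 / total) plays the role of the normalizing constant.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    alpha = 1.0 / total
    return {key: alpha * value for key, value in weights.items()}

# Unnormalized terms P(A|B)P(B) for each value of B.
# HYPOTHETICAL numbers: P(A|b) = 0.3, P(b) = 0.2, P(A|not b) = 0.1, P(not b) = 0.8.
unnormalized = {"b": 0.3 * 0.2, "not_b": 0.1 * 0.8}

posterior = normalize(unnormalized)
print(posterior)
```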

16
Likelihood
  • Let A represent that a given model/hypothesis is
    true and B represent that a given data sample
    is observed.
  • P(A|B) is the probability of the model being true
    given that the data is observed
  • P(B|A) is the likelihood of the model.

17
Bayesian Approaches
  • Choose the most likely hypothesis, model,
    classification, using Bayesian techniques
  • MAP (maximum a posteriori): choose hi such that
    P(hi|D) = P(D|hi)P(hi) / P(D) is maximized, where D
    is the observed data

18
Naïve Bayesian Classifier
  • ML (maximum likelihood)
  • Assume P(hi) = P(hj) (classifications are equally
    likely)
  • Choose hi such that P(D|hi) is maximized, where D
    is the observed data
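
The difference between the MAP and ML choices shows up when the priors are far from equal. As an illustration only, the sketch below reuses the loaded-die numbers (the data being three 6s in a row, with the fair die assumed uniform); the function names are hypothetical.

```python
# Priors P(h) and likelihoods P(D|h) for each hypothesis h.
# The numbers reuse the loaded-die example purely as an illustration.
priors = {"fair": 0.99, "loaded": 0.01}
likelihoods = {"fair": (1 / 6) ** 3, "loaded": 0.5 ** 3}

def ml_choice(likelihoods):
    """Maximum likelihood: the hypothesis with the largest P(D|h)."""
    return max(likelihoods, key=likelihoods.get)

def map_choice(priors, likelihoods):
    """Maximum a posteriori: the hypothesis with the largest P(D|h) P(h).

    P(D) is the same for every hypothesis, so it can be dropped.
    """
    return max(priors, key=lambda h: likelihoods[h] * priors[h])

print(ml_choice(likelihoods))           # loaded
print(map_choice(priors, likelihoods))  # fair
```

With equal priors the two functions always agree, which is the assumption ML makes.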

19
Bayesian Evolutionary Distance
  • Agarwal and States (1996)
  • Chapter 3, pp. 122-124
  • Problem: Estimate the evolutionary distance between two
    sequences
  • Recall PAM1 represents 1 change in 100
  • Evolutionary distance of 10^7 years
  • PAMN = (PAM1)^N
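
Raising PAM1 to the Nth power is ordinary matrix exponentiation. The sketch below uses a hypothetical two-letter alphabet with a 1% chance of change per step so the example stays small; the real PAM1 is a 20x20 amino acid matrix whose entries are not reproduced here.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def mat_pow(m, n):
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = [row[:] for row in m]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result

# HYPOTHETICAL 2-letter "PAM1": each letter changes with probability 0.01.
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]

toy_pam2 = mat_pow(toy_pam1, 2)
toy_pam250 = mat_pow(toy_pam1, 250)
print(toy_pam2[0])    # probabilities after 2 steps, starting from letter 0
print(toy_pam250[0])  # after 250 steps the row approaches [0.5, 0.5]
```

Each row of a power still sums to 1, so every PAMN row remains a probability distribution over the replacement letter.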

20
Bayesian Evolutionary Distance
  • One approach is to align the sequences with
    different PAM matrices until highest score is
    obtained.
  • E.g. Sequence of length 100 with 60 mismatches
  • PAM50: 40 × 1.34 - 60 × 1.04 = -8.8
  • PAM125: 40 × 0.65 - 60 × 0.30 = 8
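
The arithmetic in this example can be reproduced directly. The sketch assumes, as the slide's numbers imply, a single average match score and a single average mismatch penalty per matrix; the function name is hypothetical.

```python
def ungapped_score(matches, mismatches, match_score, mismatch_penalty):
    """Score an ungapped alignment from average per-position values."""
    return matches * match_score - mismatches * mismatch_penalty

# (match score, mismatch penalty) per matrix, as quoted on the slide.
matrices = {"PAM50": (1.34, 1.04), "PAM125": (0.65, 0.30)}

# Sequence of length 100 with 60 mismatches, hence 40 matches.
scores = {name: ungapped_score(40, 60, match, mismatch)
          for name, (match, mismatch) in matrices.items()}

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(name, round(score, 2))
print("highest score:", best)  # PAM125
```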

21
Bayesian Evolutionary Distance
  • Let x be the evolutionary distance represented by
    PAMN matrix
  • Let k be the number of mismatches
  • P(x|k) is the probability of x being the
    evolutionary distance given k mismatches
  • Goal: Select x such that P(x|k) is maximized.

22
Bayesian Evolutionary Distance
Odds Score
  • From Bayes rule: P(x|k) = P(k|x)P(x) / P(k)
  • If we use ML, then choose xn maximizing P(k|xn)
  • What about MAP?
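
The ML choice can be sketched as a search over candidate distances. The likelihood model here is an assumption, not the one from Agarwal and States: each of n aligned positions is taken to mismatch independently with a hypothetical probability p(x) = 1 - 0.99^x, which ignores back-mutations and the actual PAM matrices.

```python
import math

def log_likelihood(k, n, x):
    """log P(k | x) under a binomial model with a HYPOTHETICAL mismatch rate."""
    # ASSUMPTION: each PAM unit leaves a position unchanged with probability 0.99.
    p = 1.0 - 0.99 ** x
    return (math.log(math.comb(n, k))
            + k * math.log(p)
            + (n - k) * math.log(1.0 - p))

n, k = 100, 60            # sequence length and mismatches from the earlier example
candidates = range(1, 301)

# ML: pick the distance that makes the observed mismatch count most probable.
x_ml = max(candidates, key=lambda x: log_likelihood(k, n, x))
print(x_ml)
```

Under this toy model the maximum falls where p(x) is closest to k/n = 0.6, that is, near ln(0.4)/ln(0.99), or about 91. A MAP version would add log P(x) to each candidate's score before taking the maximum.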

23
Bayesian Evolutionary Distance

24
Bayesian Sequence Alignment
  • Zhu et al. (1998): Bayes block aligner
  • Finds highest scoring ungapped regions or blocks
  • Provide range of substitution matrices and
    expected number of blocks as prior information
  • Scores every possible combination of blocks to
    find best scoring alignment

25
Bayesian Sequence Alignment