Title: A Practical Course in Graphical Bayesian Modeling; Class 1
Slide 1: A Practical Course in Graphical Bayesian Modeling, Class 1
Eric-Jan Wagenmakers
Slide 2: Outline
- A bit of probability theory
- Bayesian foundations
- Parameter estimation: a simple example
- WinBUGS and R2WinBUGS
Slide 3: Probability Theory (Wasserman, 2004)
- The sample space Ω is the set of possible outcomes of an experiment.
- If we toss a coin twice, then Ω = {HH, HT, TH, TT}.
- The event that the first toss is heads is A = {HH, HT}.
Slide 4: Probability Theory (Wasserman, 2004)
- A ∩ B denotes intersection: "A and B".
- A ∪ B denotes union: "A or B".
Slide 5: Probability Theory (Wasserman, 2004)
P is a probability measure when the following axioms are satisfied:
1. Probabilities are never negative: P(A) ≥ 0.
2. Probabilities add to 1: P(Ω) = 1.
3. The probability of the union of non-overlapping (disjoint) events is the sum of their probabilities: P(A ∪ B) = P(A) + P(B) when A ∩ B = ∅.
Slide 6: Probability Theory (Wasserman, 2004)
For any events A and B:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
[Figure: Venn diagram of overlapping events A and B within the sample space Ω]
Slide 7: Conditional Probability
The conditional probability of A given B is
P(A | B) = P(A ∩ B) / P(B)
[Figure: Venn diagram; conditioning on B restricts the sample space Ω to B]
Slide 8: Conditional Probability
You will often encounter this as
P(A ∩ B) = P(A | B) P(B)
[Figure: Venn diagram of overlapping events A and B within Ω]
Slide 9: Conditional Probability
From
P(A | B) = P(A ∩ B) / P(B)
and
P(B | A) = P(A ∩ B) / P(A),
Bayes' rule follows.
Slide 10: Bayes' Rule
P(A | B) = P(B | A) P(A) / P(B)
Slide 11: The Law of Total Probability
Let A1, ..., Ak be a partition of Ω. Then, for any event B:
P(B) = Σi P(B | Ai) P(Ai)
Slide 12: The Law of Total Probability
This is just a weighted average of P(B | Ai) over the disjoint sets A1, ..., Ak. For instance, when all P(Ai) are equal (i.e., P(Ai) = 1/k), the equation becomes
P(B) = (1/k) Σi P(B | Ai)
Slide 13: Bayes' Rule Revisited
P(Ai | B) = P(B | Ai) P(Ai) / Σj P(B | Aj) P(Aj)
Slide 14: Example (Wasserman, 2004)
- I divide my email into three categories: spam, low priority, and high priority.
- Previous experience suggests that the a priori probabilities of a random email belonging to these categories are .7, .2, and .1, respectively.
Slide 15: Example (Wasserman, 2004)
- The probabilities of the word "free" occurring in the three categories are .9, .01, and .01, respectively.
- I receive an email with the word "free". What is the probability that it is spam?
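As a worked answer, here is a minimal R sketch (R is the analysis language used later in this course) that combines the law of total probability with Bayes' rule:

    # Prior probabilities of the three email categories
    prior <- c(spam = 0.7, low = 0.2, high = 0.1)

    # Probability of the word "free" in each category: P(free | category)
    lik <- c(spam = 0.9, low = 0.01, high = 0.01)

    # Law of total probability: P(free) = sum over categories
    p_free <- sum(lik * prior)            # 0.633

    # Bayes' rule: P(category | free)
    posterior <- lik * prior / p_free
    round(posterior, 3)
    #  spam   low  high
    # 0.995 0.003 0.002

So an email containing "free" is spam with probability of about .995.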
Slide 16: Outline
- A bit of probability theory
- Bayesian foundations
- Parameter estimation: a simple example
- WinBUGS and R2WinBUGS
Slide 17: The Bayesian Agenda
- Bayesians use probability to quantify uncertainty or degree of belief about parameters and hypotheses.
- Prior knowledge for a parameter θ is updated through the data to yield the posterior knowledge.
Slide 18: The Bayesian Agenda
P(θ | D) = P(D | θ) P(θ) / P(D)
Also note that this equation allows one to learn, from the probability of what is observed, something about what is not observed.
Slide 19: The Bayesian Agenda
- But why would one measure degree of belief by means of probability? Couldn't we choose something else that makes sense?
- Yes, perhaps we could, but the choice of probability is anything but ad hoc.
Slide 20: The Bayesian Agenda
- Assume degree of belief can be measured by a single number.
- Assume you are rational, that is, not self-contradictory or obviously silly.
- Then degree of belief can be shown to follow the same rules as the probability calculus.
Slide 21: The Bayesian Agenda
- For instance, a rational agent would not hold intransitive beliefs, such as judging A more likely than B, and B more likely than C, yet C more likely than A.
Slide 22: The Bayesian Agenda
- When you use a single number to measure uncertainty or quantify evidence, and these numbers do not follow the rules of probability calculus, you can (almost certainly?) be shown to be silly or incoherent.
- One of the theoretical attractions of the Bayesian paradigm is that it ensures coherence right from the start.
Slide 23: Coherence Example à la De Finetti
- There exists a ticket that says "If the French national soccer team wins the 2010 World Cup, this ticket pays 1."
- You must determine the fair price for this ticket.
- After you set the price, I can choose to either sell the ticket to you, or to buy the ticket from you. This is similar to how you would divide a pie according to the rule "you cut, I choose".
- Please write this number down; you are not allowed to change it later!
Slide 24: Coherence Example à la De Finetti
- There exists another ticket that says "If the Spanish national soccer team wins the 2010 World Cup, this ticket pays 1."
- You must again determine the fair price for this ticket.
Slide 25: Coherence Example à la De Finetti
- There exists a third ticket that says "If either the French or the Spanish national soccer team wins the 2010 World Cup, this ticket pays 1."
- What is the fair price for this ticket? (See the sketch below.)
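The point of the exercise: because the two events are disjoint, coherent prices must add up. A minimal R sketch (with hypothetical, deliberately incoherent prices) of how I could otherwise guarantee myself a profit, a so-called Dutch book:

    # Hypothetical incoherent prices: the "either" ticket is priced
    # below the sum of the two single-event tickets
    price_fr     <- 0.20
    price_es     <- 0.30
    price_either <- 0.40   # incoherent: less than 0.20 + 0.30

    # I sell you the two single tickets and buy the "either" ticket
    cash_now <- price_fr + price_es - price_either   # 0.10 up front

    # Ticket payoffs (1 = pays out) in each possible outcome
    pay_fr     <- c(france = 1, spain = 0, neither = 0)
    pay_es     <- c(france = 0, spain = 1, neither = 0)
    pay_either <- c(france = 1, spain = 1, neither = 0)

    # My profit per outcome: cash now, plus what the "either" ticket
    # pays me, minus what the single tickets pay you
    profit <- cash_now + pay_either - pay_fr - pay_es
    profit
    # france  spain neither
    #    0.1    0.1     0.1   <- you lose 0.10 no matter what happens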
Slide 26: Bayesian Foundations
- Bayesians use probability to quantify uncertainty or degree of belief about parameters and hypotheses.
- Prior knowledge for a parameter θ is updated through the data to yield posterior knowledge.
- This happens through the use of probability calculus.
Slide 27: Bayes' Rule
P(θ | D) = P(D | θ) P(θ) / P(D)
- P(θ): prior distribution
- P(D | θ): likelihood
- P(θ | D): posterior distribution
- P(D): marginal probability of the data
Slide 28: Bayesian Foundations
This equation allows one to learn, from the probability of what is observed, something about what is not observed. Bayesian statistics was long known as "inverse probability".
Slide 29: Nuisance Variables
- Suppose θ is the mean of a normal distribution, and σ is the standard deviation.
- You are interested in θ, but not in σ.
- Using the Bayesian paradigm, how can you go from P(θ, σ | x) to P(θ | x)? That is, how can you get rid of the nuisance parameter σ? Show how this involves P(σ | x).
Slide 30: Nuisance Variables
P(θ | x) = ∫ P(θ, σ | x) dσ = ∫ P(θ | σ, x) P(σ | x) dσ
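In sampling-based inference this integration is automatic: given joint posterior draws of (θ, σ), the θ column alone is a sample from the marginal P(θ | x). A minimal R illustration, with simulated draws standing in for real MCMC output:

    # Stand-in for MCMC output: joint posterior draws of (theta, sigma).
    # In practice these would come from the sampler, not from rnorm/rgamma.
    set.seed(1)
    joint <- data.frame(theta = rnorm(10000, mean = 0.5, sd = 0.1),
                        sigma = rgamma(10000, shape = 2, rate = 4))

    # Marginalizing over sigma = simply ignoring the sigma column
    theta_marginal <- joint$theta
    hist(theta_marginal, main = "Marginal posterior of theta")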
Slide 31: Predictions
- Suppose you observe data x, and you use a model with parameter θ.
- What is your prediction for new data y, given that you've observed x? In other words, show how you can obtain P(y | x).
Slide 32: Predictions
P(y | x) = ∫ P(y | θ, x) P(θ | x) dθ
(When y and x are conditionally independent given θ, P(y | θ, x) = P(y | θ).)
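As a concrete R sketch, using the binomial example that appears later in this class (uniform prior, 9 successes in 10 trials, so a Beta(10, 2) posterior): draw θ from its posterior, then draw new data given each θ.

    set.seed(1)
    theta <- rbeta(10000, shape1 = 10, shape2 = 2)   # draws from P(theta | x)

    # For each theta draw, simulate successes in 10 new trials
    y_new <- rbinom(10000, size = 10, prob = theta)  # draws from P(y | x)

    table(y_new) / length(y_new)   # approximate posterior predictive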
Slide 33: Want to Know More?
Slide 34: Outline
- A bit of probability theory
- Bayesian foundations
- Parameter estimation: a simple example
- WinBUGS and R2WinBUGS
Slide 35: Bayesian Parameter Estimation Example
- We prepare for you a series of 10 factual true/false questions of equal difficulty.
- You answer 9 out of 10 questions correctly.
- What is your latent probability θ of answering any one question correctly?
Slide 36: Bayesian Parameter Estimation Example
- We start with a prior distribution for θ. This reflects all we know about θ prior to the experiment. Here we make a standard choice and assume that all values of θ are equally likely a priori.
Slide 37: Bayesian Parameter Estimation Example
- We then update the prior distribution by means of the data (technically, the likelihood) to arrive at a posterior distribution.
Slide 38: The Likelihood
- We use the binomial model, in which P(D | θ) is given by
P(D | θ) = (n choose s) θ^s (1 - θ)^(n - s),
where n = 10 is the number of trials, and s = 9 is the number of successes.
Slide 39: Bayesian Parameter Estimation Example
- The posterior distribution is a compromise between what we knew before the experiment (i.e., the prior) and what we have learned from the experiment (i.e., the likelihood). The posterior distribution reflects all that we know about θ.
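For this example the posterior is available in closed form: a uniform Beta(1, 1) prior combined with the binomial likelihood yields a Beta(s + 1, n - s + 1) = Beta(10, 2) posterior. A minimal R check that reproduces the numbers on the next slide:

    n <- 10; s <- 9

    # Uniform prior Beta(1, 1) + binomial likelihood -> Beta(s+1, n-s+1)
    a <- s + 1       # 10
    b <- n - s + 1   # 2

    # Posterior mode: (a - 1) / (a + b - 2)
    (a - 1) / (a + b - 2)               # 0.9

    # Central 95% credible interval
    qbeta(c(0.025, 0.975), a, b)        # approximately (0.59, 0.98)

    # Plot the posterior density
    curve(dbeta(x, a, b), from = 0, to = 1,
          xlab = "theta", ylab = "posterior density")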
Slide 40: [Figure: posterior distribution of θ] Mode = 0.9, 95% credible interval: (0.59, 0.98)
Slide 41: Bayesian Parameter Estimation Example
- Sometimes it is difficult or impossible to obtain the posterior distribution analytically.
- In this case, we can use Markov chain Monte Carlo (MCMC) algorithms to sample from the posterior. As the number of samples increases, the approximation error relative to the analytical posterior becomes arbitrarily small.
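To make this concrete (as an illustration only; this is not the sampler WinBUGS uses internally), here is a minimal random-walk Metropolis sampler in R for the binomial example:

    set.seed(1)
    n <- 10; s <- 9

    # Unnormalized log posterior under a uniform prior on (0, 1)
    log_post <- function(theta) {
      if (theta <= 0 || theta >= 1) return(-Inf)
      dbinom(s, n, theta, log = TRUE)
    }

    n_iter <- 9000
    theta  <- numeric(n_iter)
    theta[1] <- 0.5
    for (i in 2:n_iter) {
      prop <- theta[i - 1] + rnorm(1, 0, 0.1)    # random-walk proposal
      accept_prob <- exp(log_post(prop) - log_post(theta[i - 1]))
      theta[i] <- if (runif(1) < accept_prob) prop else theta[i - 1]
    }

    quantile(theta, c(0.025, 0.975))  # close to the analytical (0.59, 0.98)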
Slide 42: (no transcript; figure-only slide)
Slide 43: [Figure: MCMC-based posterior for θ] Mode = 0.89, 95% credible interval: (0.59, 0.98). With 9,000 samples, almost identical to the analytical result.
Slide 44: (no transcript; figure-only slide)
Slide 45: Outline
- A bit of probability theory
- Bayesian foundations
- Parameter estimation: a simple example
- WinBUGS and R2WinBUGS
Slide 46: WinBUGS
Bayesian inference Using Gibbs Sampling.
You want to have this installed (plus the registration key).
Slide 47: WinBUGS
- Knows many probability distributions (likelihoods).
- Allows you to specify a model.
- Allows you to specify priors.
- Will then automatically run the MCMC sampling routines and produce output.
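As a sketch, the model for this class's binomial example could be specified in the BUGS language as follows (written to a file from R, which is how R2WinBUGS is typically used; the file name is illustrative):

    # BUGS model for the rate example: uniform prior on theta,
    # binomial likelihood for s successes out of n trials
    model_string <- "
    model {
      theta ~ dbeta(1, 1)     # uniform prior on the rate
      s ~ dbin(theta, n)      # binomial likelihood
    }
    "
    writeLines(model_string, con = "rate_model.txt")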
Slide 48: Want to Know More About MCMC?
Slide 49: Models in WinBUGS
- The models you can specify in WinBUGS are directed acyclic graphs (DAGs).
Slide 50: Models in WinBUGS (Spiegelhalter, 1998)
[Figure: DAG in which A and B are parents of C and D, and C is the parent of E] Below, E depends only on C.
Slide 51: Models in WinBUGS (Spiegelhalter, 1998)
If the nodes are stochastic, the joint distribution factorizes.
Slide 52: Models in WinBUGS (Spiegelhalter, 1998)
P(A, B, C, D, E) = P(A) P(B) P(C | A, B) P(D | A, B) P(E | C)
Slide 53: Models in WinBUGS (Spiegelhalter, 1998)
This means we can sometimes perform local computations to get what we want.
Slide 54: Models in WinBUGS (Spiegelhalter, 1998)
What is P(C | A, B, D, E)?
Slide 55: Models in WinBUGS (Spiegelhalter, 1998)
P(C | A, B, D, E) is proportional to P(C | A, B) P(E | C), so D is irrelevant.
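A quick numeric sanity check of this local computation in R, using arbitrary (made-up) conditional probability tables over binary nodes that respect the factorization above:

    # A, B root nodes; C and D depend on (A, B); E depends on C
    pA <- 0.3; pB <- 0.6
    pC <- function(a, b) 0.1 + 0.4 * a + 0.3 * b   # P(C = 1 | A, B)
    pD <- function(a, b) 0.2 + 0.5 * a * b         # P(D = 1 | A, B)
    pE <- function(c) 0.15 + 0.7 * c               # P(E = 1 | C)

    # Joint probability from the factorization
    joint <- function(a, b, c, d, e) {
      dbinom(a, 1, pA) * dbinom(b, 1, pB) * dbinom(c, 1, pC(a, b)) *
        dbinom(d, 1, pD(a, b)) * dbinom(e, 1, pE(c))
    }

    # Full conditional P(C = 1 | A = 1, B = 0, D = 1, E = 1)
    num  <- joint(1, 0, 1, 1, 1)
    full <- num / (joint(1, 0, 0, 1, 1) + num)

    # Local computation: proportional to P(C | A, B) * P(E | C)
    w1    <- pC(1, 0) * pE(1)
    w0    <- (1 - pC(1, 0)) * pE(0)
    local <- w1 / (w0 + w1)

    all.equal(full, local)   # TRUE: D indeed drops out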
Slide 56: WinBUGS and R
- WinBUGS produces MCMC samples.
- We want to analyze the output in a nice program, such as R.
- This can be accomplished using the R package R2WinBUGS; a usage sketch follows.
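A minimal R2WinBUGS sketch for the rate example, assuming the model file written earlier; the installation path is just the common default and must match your own machine:

    library(R2WinBUGS)

    data  <- list(s = 9, n = 10)
    inits <- function() list(theta = 0.5)

    # Run WinBUGS from R; bugs.directory must point to your local
    # WinBUGS installation (plus registration key, as noted above)
    samples <- bugs(data, inits,
                    parameters.to.save = "theta",
                    model.file = "rate_model.txt",
                    n.chains = 3, n.iter = 3000,
                    bugs.directory = "c:/Program Files/WinBUGS14/")

    print(samples)                   # summary statistics for theta
    hist(samples$sims.list$theta)    # posterior histogram in R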
Slide 57: End of Class 1