Title: An Introduction to Bayesian Inference
1 An Introduction to Bayesian Inference
- Robert J. Mislevy
- University of Maryland
- March 19, 2002
2 A quote from Glenn Shafer
- "Probability is not really about numbers; it is about the structure of reasoning."
- Glenn Shafer, quoted in Pearl, 1988, p. 77
3 Views of Probability
- Two conceptions of probability
- Aleatory (chance)
- Long-run frequencies, mechanisms
- Probability is a property of the world
- Degree of belief (subjective)
- Probability is a property of Your state of knowledge (de Finetti)
- Same formal definitions and machinery
- Aleatory paradigms as an analogical basis for degree of belief (Glenn Shafer)
4 Frames of discernment
- A frame of discernment is all the possible combinations of values of the variables you are working with (Shafer, 1976).
- Discern = detect, recognize, distinguish
- A property of you as much as a property of the world
- Depends on what you know and what your purpose is (e.g., expert vs. novice)
- A frame of discernment can evolve over time
5 Frames of Discernment in Assessment
- In the Student Model: determining which aspects of skill and knowledge to use as explicit SM variables -- psychological perspective, grain size, reporting requirements
- In the Evidence Model: evidence identification (task scoring); evaluation rules map from a unique work product to common observed variables.
- In the Task Model: which aspects of situations are important to keep track of and manipulate in task design, to achieve the assessment's purpose?
- Task features versus values of Task Model variables
6 Random Variables
- We will concentrate on variables with a finite number of possible values.
- Denote a random variable by upper case, say X.
- Denote particular values and generic values by lower case, x.
- Y is the outcome of a coin flip: y ∈ {h, t}.
- Xi is the answer to Item i: xi ∈ {0, 1}.
- Zjk is the rating of Judge k on Essay j: zjk ∈ {0, 1, ..., 5}.
7 Finite Probability Distributions
- Finite set of possible values {x1, ..., xn}
- P(X = xj), or more simply p(xj), is the probability that X takes the value xj.
- 0 ≤ p(xj) ≤ 1.
- Σj p(xj) = 1.
- P(X = xj or X = xm) = p(xj) + p(xm).
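These three properties can be checked mechanically. A minimal Python sketch, using an assumed four-value distribution for X:

```python
# An assumed finite distribution for a variable X with four possible values.
p = {1: 0.2, 2: 0.1, 3: 0.3, 4: 0.4}

assert all(0 <= pj <= 1 for pj in p.values())   # 0 ≤ p(xj) ≤ 1
assert abs(sum(p.values()) - 1.0) < 1e-9        # Σj p(xj) = 1

# Additivity for mutually exclusive values: P(X=1 or X=3) = p(1) + p(3).
p_1_or_3 = p[1] + p[3]
print(round(p_1_or_3, 6))  # 0.5
```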
8 MSBNx representation
MSBNx (Microsoft Bayesian Network editor):
http://research.microsoft.com/adapt/MSBNx/default.asp
9 Hypergraph representation
(Diagram: a variable node X attached to its probability distribution p(x).)
10 Conditional probability distributions
- Two random variables, X and Y
- P(X = xj | Y = yk), or p(xj | yk), is the probability that X takes the value xj given that Y takes the value yk.
- This is how we express relationships among real-world phenomena:
- P(heart attack | age, family history, blood pressure)
- P(tomorrow's high temperature | geographical location, date, today's high)
- IRT: P(Xj = 1) vs. P(Xj = 1 | θ)
- Coin flip: p(heads) vs. p(heads | RJM_strategy)
11 Conditional probability distributions
- Two random variables, X and Y
- P(X = xj | Y = yk), or p(xj | yk), is the probability that X takes the value xj given that Y takes the value yk.
- 0 ≤ p(xj | yk) ≤ 1 for a given yk.
- Σj p(xj | yk) = 1 for a given yk.
- P(X = xj or X = xm | Y = yk) = p(xj | yk) + p(xm | yk).
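A sketch of a conditional probability table, with hypothetical labels and values: for each fixed yk, the row of conditional probabilities must itself be a probability distribution.

```python
# A hypothetical conditional probability table p(x | y), one row per value of Y.
p_x_given_y = {
    "y1": {"x1": 0.7, "x2": 0.3},
    "y2": {"x1": 0.2, "x2": 0.8},
}

# For a given yk, the conditional probabilities form a distribution:
for yk, row in p_x_given_y.items():
    assert all(0 <= p <= 1 for p in row.values())   # 0 ≤ p(xj | yk) ≤ 1
    assert abs(sum(row.values()) - 1.0) < 1e-9      # Σj p(xj | yk) = 1

# Additivity within a row: P(X=x1 or X=x2 | Y=y1) = p(x1|y1) + p(x2|y1).
print(round(p_x_given_y["y1"]["x1"] + p_x_given_y["y1"]["x2"], 6))  # 1.0
```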
12 Hygiene Example: Variables
One observable variable: Quality of Patient History Taken (Adequate, Inadequate). One unobservable ability: Level of Proficiency (Expert, Novice). We want to model the probability of taking an adequate history as depending on level of proficiency.
13 Hygiene Example: Conditional Probabilities
One observable: Adequate Patient History Taken, Yes or No. One unobservable ability: Expert or Novice.
p(inadequate | expert)
What kind of reasoning is this direction?
p(adequate | expert)
14 Hygiene Example: Conditional Probabilities
One observable: Adequate Patient History Taken, Yes or No. One unobservable ability: Expert or Novice.
p(inadequate | novice)
p(adequate | novice)
15 Tie-in with Toulmin diagram
Harry is an expert hygienist.
Novices sometimes take adequate patient histories
unless
Experts tend to take adequate patient histories
since
so
Harry took an adequate patient history.
16 Tie-in with Toulmin diagram
Harry is an expert hygienist.
unless
since
so
Harry took an adequate patient history.
17 Tie-in with Toulmin diagram
Pr(Harry is an expert hygienist | adequate history) = .82
unless
since
so
Harry took an adequate patient history.
18 Bayes nets: MSBNx Setup
History (i.e., quality of patient history procedure) is modeled as being conditionally dependent on Proficiency. That is, Proficiency is a parent of History; equivalently, History is a child of Proficiency. Note that Proficiency has no parents.
19 Bayes nets: MSBNx Setup
This is the probability table expressing initial
belief about Proficiency.
20 Bayes nets: MSBNx Setup
This is the probability table expressing belief about History. Note that there are two different conditional distributions for the values of History: one is relevant if Proficiency = Expert, and a different one if Proficiency = Novice.
21 Bayes nets: MSBNx Setup
Belief if you know Proficiency = Expert. You know that with certainty (probability 1), and what you expect about History is P(History | Proficiency = Expert): .8 for adequate and .2 for inadequate.
22 Bayes nets: MSBNx Setup
Click here to bring up the evaluation window.
23 Bayes nets: MSBNx Setup
Belief if you know Proficiency = Expert. You know that with certainty (probability 1), and what you expect about History is P(History | Proficiency = Expert): .8 for adequate and .2 for inadequate.
24 Bayes nets: MSBNx Setup
Belief if you know Proficiency = Novice. You know that with certainty (probability 1), and what you expect about History is P(History | Proficiency = Novice): .4 for adequate and .6 for inadequate.
25 Bayes nets: MSBNx Setup
Depiction if you don't know the value of either Proficiency or History. Starting with 70-30 belief about Expert/Novice. The conditional distributions for History are averaged together to get the marginal (overall) distribution for History -- appropriate if you don't know the Proficiency level.
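The averaging described above can be reproduced with the numbers from these slides (70-30 prior; .8 and .4 probabilities of an adequate history):

```python
# Numbers from the slides: 70-30 prior; P(Adequate|Expert)=.8, P(Adequate|Novice)=.4.
prior = {"Expert": 0.7, "Novice": 0.3}
p_adequate_given = {"Expert": 0.8, "Novice": 0.4}

# Marginal P(History = Adequate): the conditional probabilities averaged
# together, weighted by prior belief about Proficiency.
p_adequate = sum(prior[s] * p_adequate_given[s] for s in prior)
print(round(p_adequate, 2))  # 0.68
```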
26 Bayes Theorem
- The setup, with two random variables, X and Y:
- You know conditional probabilities, p(xj | yk), which tell you what to believe about X if you knew the value of Y.
- You learn X = x; what should you believe about Y?
- You combine two things:
- Relative conditional probabilities (the likelihood)
- Previous probabilities about Y values (the prior)
- posterior ∝ likelihood × prior
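The slogan "posterior ∝ likelihood × prior" translates directly into a few lines of Python; the two-value distribution in the usage line is an assumption for illustration only.

```python
# posterior ∝ likelihood × prior, rescaled so the result sums to one.
def posterior(prior, likelihood):
    unnorm = {y: prior[y] * likelihood[y] for y in prior}
    total = sum(unnorm.values())
    return {y: v / total for y, v in unnorm.items()}

# Two-value illustration (numbers are assumptions):
post = posterior({"y1": 0.5, "y2": 0.5}, {"y1": 0.9, "y2": 0.3})
print({y: round(v, 2) for y, v in post.items()})  # {'y1': 0.75, 'y2': 0.25}
```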
27 Hygiene Example: Likelihoods
One observable: Adequate Patient History Taken, Yes or No. One unobservable proficiency: Expert or Novice.
What kind of reasoning is this direction?
p(adequate | expert)
p(adequate | novice)
Note the 2:1 ratio.
28 Hygiene Example: Likelihoods
One observable: Adequate Patient History Taken, Yes or No. One unobservable proficiency: Expert or Novice.
p(inadequate | expert)
p(inadequate | novice)
Note the 1:3 ratio.
29 Bayes nets: Posterior Probabilities
Posterior distribution for Proficiency if History = Adequate is observed. The 2:1 ratio favoring Expert multiplies the 70:30 ratio we started with. The products are rescaled so they sum to one.
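Using the numbers from the slides (70:30 prior; likelihoods .8 and .4 for an adequate history), the rescaling works out as follows, and recovers the .82 quoted earlier for Harry:

```python
prior = {"Expert": 0.7, "Novice": 0.3}        # the 70:30 starting belief
likelihood = {"Expert": 0.8, "Novice": 0.4}   # P(Adequate | proficiency): a 2:1 ratio

unnorm = {s: prior[s] * likelihood[s] for s in prior}  # 0.56 and 0.12
total = sum(unnorm.values())
post = {s: v / total for s, v in unnorm.items()}       # rescaled to sum to one
print({s: round(v, 2) for s, v in post.items()})  # {'Expert': 0.82, 'Novice': 0.18}
```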
30 Bayes nets: Posterior Probabilities
Posterior distribution for Proficiency if History = Inadequate is observed. The 1:3 Expert/Novice ratio from the likelihood has multiplied the 70:30 prior.
31 Conditional independence
- "Conditional independence is not a grace of nature for which we must wait passively, but rather a psychological necessity which we satisfy actively by organizing our knowledge in a specific way."
- "An important tool in such organization is the identification of intermediate variables that induce conditional independence among observables; if such variables are not in our vocabulary, we create them."
- "In medical diagnosis, for instance, when some symptoms directly influence one another, the medical profession invents a name for that interaction (e.g., 'syndrome,' 'complication,' 'pathological state') and treats it as a new auxiliary variable that induces conditional independence; dependency between any two interacting systems is fully attributed to the dependencies of each on the auxiliary variable." (Pearl, 1988, p. 44)
32 Conditional independence
- Independence
- The probability of the joint occurrence of values of two variables is always equal to the product of the probabilities individually:
- P(X = x, Y = y) = P(X = x) P(Y = y).
- Conditional independence
- The conditional probability of the joint occurrence, given the value of another variable, is always equal to the product of the conditional probabilities:
- P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z).
33 Conditional independence
- MSBNx diagram with Independence: entering a value for one variable, X or Y, doesn't change the probabilities for the other.
- MSBNx diagram with Conditional independence: entering a value for one variable, X or Y, doesn't change the probabilities for the other IF the value for Z has already been fixed at some value.
34 Example: Classical Test Theory
- One true score; 5 identically distributed, conditionally independent observable test scores.
- TrueScore is probably about such and such (a probability distribution).
- The distribution of an observable test score is true score + noise.
(Diagram: TrueScore as parent of Test1-Test5 scores.)
35 Example: Classical Test Theory
- One true score; 5 identically distributed, conditionally independent observable test scores.
- Distribution of an observable is true score + noise.
36 Example: Classical Test Theory
- Conditional distributions of observables are identical, and don't depend on other observables.
37 Conditional distributions of observables given TrueScore = 2
(What kind of reasoning is this?)
38 - Prior (sort of normal) distribution for TrueScore
39 Example: Classical Test Theory
- Posterior distribution for TrueScore given Test1 = 4
(What kind of reasoning is this?)
40 Example: Classical Test Theory
- Posterior for TrueScore given Test1 = 4, Test2 = 4, Test3 = 5
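A toy discrete version of this update can be sketched in Python. The prior, the 5-point score scale, and the noise table are all illustrative assumptions; the point is the structure: because the tests are conditionally independent given TrueScore, the likelihood is just a product.

```python
import math

# p(observed | true): each row is an assumed noise distribution over
# observed scores 1..5 for a given TrueScore.
noise = {
    1: [0.7, 0.2, 0.1, 0.0, 0.0],
    2: [0.2, 0.5, 0.2, 0.1, 0.0],
    3: [0.1, 0.2, 0.4, 0.2, 0.1],
    4: [0.0, 0.1, 0.2, 0.5, 0.2],
    5: [0.0, 0.0, 0.1, 0.2, 0.7],
}
prior = {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.2, 5: 0.1}  # assumed prior on TrueScore

def update(prior, observations):
    # Conditional independence of the tests given TrueScore means the
    # likelihood is the product of p(obs | true) over the observations.
    unnorm = {t: p * math.prod(noise[t][o - 1] for o in observations)
              for t, p in prior.items()}
    total = sum(unnorm.values())
    return {t: v / total for t, v in unnorm.items()}

post = update(prior, [4, 4, 5])  # Test1 = 4, Test2 = 4, Test3 = 5
print(max(post, key=post.get))   # 4, the posterior mode
```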
41 Example: Classical Test Theory
- Note the shift in what we expect for Test 4.
42 Building up complex networks
- Extending Wigmore's ideas to probability-based reasoning
- Interrelationships among many variables are modeled in terms of important relationships among smaller subsets of variables, sometimes unobservable ones.
43 Building up complex networks
- Recursive representation of probability distributions
- Terms drop out when there is conditional independence.
- The relationship between recursive representations and acyclic directed graphs
- Edges (arrows) represent explicit dependence relationships
44 Example: The Asia Network
45 Mixed-Number Subtraction
Based on cognitive analyses, task construction, and data collection by Dr. Kikumi Tatsuoka. See the Mislevy (1994, 1995) readings for details of the Bayes net.
46 Computation in Bayes nets
- Very brief conceptual overview
- For a bit more detail, see:
- Mislevy, R.J. (1995). Probability-based inference in cognitive diagnosis. In P. Nichols, S. Chipman, & R. Brennan (Eds.), Cognitively diagnostic assessment (pp. 43-71). Hillsdale, NJ: Erlbaum.
- For a lot more detail, see:
- Jensen, F.V. (1996). An introduction to Bayesian networks. New York: Springer-Verlag.
47 Inference in a chain
Recursive representation:
p(u,v,x,y,z) = p(z|y,x,v,u) p(y|x,v,u) p(x|v,u) p(v|u) p(u)
             = p(z|y) p(y|x) p(x|v) p(v|u) p(u).
(Diagram: chain U → V → X → Y → Z, with edges labeled p(v|u), p(x|v), p(y|x), p(z|y).)
48 Inference in a chain
Suppose we learn the value of X. Start here, by revising belief about X.
49 Inference in a chain
Propagate information down the chain using conditional probabilities: from updated belief about X, use conditional probability to revise belief about Y.
50 Inference in a chain
Propagate information down the chain using conditional probabilities: from updated belief about Y, use conditional probability to revise belief about Z.
51 Inference in a chain
Propagate information up the chain using Bayes Theorem: from updated belief about X, use Bayes Theorem to revise belief about V.
52 Inference in a chain
Propagate information up the chain using Bayes Theorem: from updated belief about V, use Bayes Theorem to revise belief about U.
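The propagation steps above can be sketched for binary variables in a chain U → V → X → Y; all probability tables here are assumed for illustration.

```python
# Binary chain U -> V -> X -> Y; all tables are assumed for illustration.
p_u = [0.5, 0.5]
p_v_u = [[0.9, 0.1], [0.3, 0.7]]   # p_v_u[u][v] = p(v | u)
p_x_v = [[0.8, 0.2], [0.4, 0.6]]   # p_x_v[v][x] = p(x | v)
p_y_x = [[0.7, 0.3], [0.2, 0.8]]   # p_y_x[x][y] = p(y | x)

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

x = 1  # suppose we learn X = 1

# Downward: revise belief about Y straight from the conditional table.
belief_y = p_y_x[x]

# Upward: Bayes Theorem for V, p(v | X=1) ∝ p(v) p(X=1 | v).
p_v = [sum(p_u[u] * p_v_u[u][v] for u in (0, 1)) for v in (0, 1)]  # prior on V
belief_v = normalize([p_v[v] * p_x_v[v][x] for v in (0, 1)])

# Upward again: Bayes Theorem for U, p(u | X=1) ∝ p(u) Σv p(v|u) p(X=1|v).
belief_u = normalize([p_u[u] * sum(p_v_u[u][v] * p_x_v[v][x] for v in (0, 1))
                      for u in (0, 1)])

print([round(b, 3) for b in belief_v])  # [0.333, 0.667]
```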
53 Inference in a tree
In a tree each variable has no more than one parent. Suppose we learn the value of X. We can update every variable in the tree using either the conditional probability relationship or Bayes theorem.
(Diagram: tree over variables U, V, X, Y, Z.)
54 Inference in multiply-connected nets
In a multiply-connected graph, in at least one instance there is more than one path from one variable to another variable. Repeated application of Bayes theorem and conditional probability at the level of individual variables doesn't work.
(Diagram: multiply-connected net over U, V, W, X, Y, Z.)
55 Inference in multiply-connected nets
- Key idea: Group variables into subsets (cliques) such that the subsets form a tree.
(Diagram: variables U, V, W, X, Y, Z; first clique {U,V,W}.)
56 Inference in multiply-connected nets
- Key idea: Group variables into subsets (cliques) such that the subsets form a tree.
(Diagram: cliques {U,V,W} and {U,V,X}.)
57 Inference in multiply-connected nets
- Key idea: Group variables into subsets (cliques) such that the subsets form a tree.
(Diagram: cliques {U,V,W}, {U,V,X}, and {U,X,Y}.)
58 Inference in multiply-connected nets
- Key idea: Group variables into subsets (cliques) such that the subsets form a tree.
- Cliques can then be updated with a generalized version of the procedure for updating individual variables.
(Diagram: clique tree {U,V,W} - {U,V,X} - {U,X,Y} - {X,Z}.)