Title: An Introduction to Bayesian Inference
1 An Introduction to Bayesian Inference
- Robert J. Mislevy
- University of Maryland
- March 19, 2002
2 A quote from Glenn Shafer
- "Probability is not really about numbers; it is about the structure of reasoning."
- Glenn Shafer, quoted in Pearl, 1988, p. 77
3 Views of Probability
- Two conceptions of probability
- Aleatory (chance)
- Long-run frequencies, mechanisms
- Probability is a property of the world
- Degree of belief (subjective)
- Probability is a property of Your state of knowledge (de Finetti)
- Same formal definitions and machinery
- Aleatory paradigms as an analogical basis for degree of belief (Glenn Shafer)
4 Frames of discernment
- A frame of discernment is all the possible combinations of values of the variables you are working with (Shafer, 1976).
- Discern = detect, recognize, distinguish
- A property of you as much as a property of the world
- Depends on what you know and what your purpose is (e.g., expert vs. novice)
- A frame of discernment can evolve over time
5 Frames of Discernment in Assessment
- In the Student Model: determining which aspects of skill and knowledge to use as explicit SM variables -- psychological perspective, grain size, reporting requirements
- In the Evidence Model: evidence identification (task scoring); evaluation rules map from a unique work product to common observed variables.
- In the Task Model: which aspects of situations are important to keep track of and manipulate in task design, to achieve the assessment's purpose?
- Task features versus values of Task Model variables
6 Random Variables
- We will concentrate on variables with a finite number of possible values.
- Denote a random variable by upper case, say X.
- Denote particular values and generic values by lower case, x.
- Y is the outcome of a coin flip: y ∈ {h, t}.
- Xi is the answer to Item i: xi ∈ {0, 1}.
- Zjk is the rating of Judge k on Essay j: zjk ∈ {0, 1, ..., 5}.
7 Finite Probability Distributions
- Finite set of possible values {x1, ..., xn}
- P(X = xj), or more simply p(xj), is the probability that X takes the value xj.
- 0 ≤ p(xj) ≤ 1.
- Σj p(xj) = 1.
- P(X = xj or X = xm) = p(xj) + p(xm).
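These three properties can be checked mechanically. A minimal Python sketch, using an assumed four-value distribution for X:

```python
# An assumed finite distribution for a variable X with four possible values.
p = {1: 0.2, 2: 0.1, 3: 0.3, 4: 0.4}

assert all(0 <= pj <= 1 for pj in p.values())   # 0 ≤ p(xj) ≤ 1
assert abs(sum(p.values()) - 1.0) < 1e-9        # Σj p(xj) = 1

# Additivity for mutually exclusive values: P(X=1 or X=3) = p(1) + p(3).
p_1_or_3 = p[1] + p[3]
print(round(p_1_or_3, 6))  # 0.5
```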
8 MSBNx representation
MSBNx (Microsoft Bayesian Network editor):
http://research.microsoft.com/adapt/MSBNx/default.asp
9 Hypergraph representation
(Diagram: a variable node X attached to its probability distribution p(x).)
10 Conditional probability distributions
- Two random variables, X and Y
- P(X = xj | Y = yk), or p(xj | yk), is the probability that X takes the value xj given that Y takes the value yk.
- This is how we express relationships among real-world phenomena:
- P(heart attack | age, family history, blood pressure)
- P(tomorrow's high temperature | geographical location, date, today's high)
- IRT: P(Xj = 1) vs. P(Xj = 1 | θ)
- Coin flip: p(heads) vs. p(heads | RJM_strategy)
11 Conditional probability distributions
- Two random variables, X and Y
- P(X = xj | Y = yk), or p(xj | yk), is the probability that X takes the value xj given that Y takes the value yk.
- 0 ≤ p(xj | yk) ≤ 1 for a given yk.
- Σj p(xj | yk) = 1 for a given yk.
- P(X = xj or X = xm | Y = yk) = p(xj | yk) + p(xm | yk).
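A sketch of a conditional probability table, with hypothetical labels and values: for each fixed yk, the row of conditional probabilities must itself be a probability distribution.

```python
# A hypothetical conditional probability table p(x | y), one row per value of Y.
p_x_given_y = {
    "y1": {"x1": 0.7, "x2": 0.3},
    "y2": {"x1": 0.2, "x2": 0.8},
}

# For a given yk, the conditional probabilities form a distribution:
for yk, row in p_x_given_y.items():
    assert all(0 <= p <= 1 for p in row.values())   # 0 ≤ p(xj | yk) ≤ 1
    assert abs(sum(row.values()) - 1.0) < 1e-9      # Σj p(xj | yk) = 1

# Additivity within a row: P(X=x1 or X=x2 | Y=y1) = p(x1|y1) + p(x2|y1).
print(round(p_x_given_y["y1"]["x1"] + p_x_given_y["y1"]["x2"], 6))  # 1.0
```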
12 Hygiene Example: Variables
One observable variable: Quality of Patient History Taken (Adequate, Inadequate). One unobservable ability: Level of Proficiency (Expert, Novice). We want to model the probability of taking an adequate history as depending on level of proficiency.
13 Hygiene Example: Conditional Probabilities
One observable: Adequate Patient History Taken, Yes or No. One unobservable ability: Expert or Novice.
p(inadequate | expert)
What kind of reasoning is this direction?
p(adequate | expert)
14 Hygiene Example: Conditional Probabilities
One observable: Adequate Patient History Taken, Yes or No. One unobservable ability: Expert or Novice.
p(inadequate | novice)
p(adequate | novice)
15 Tie-in with Toulmin diagram
Harry is an expert hygienist.
Novices sometimes take adequate patient histories
unless
Experts tend to take adequate patient histories
since
so
Harry took an adequate patient history.
16 Tie-in with Toulmin diagram
Harry is an expert hygienist.
unless
since
so
Harry took an adequate patient history.
17 Tie-in with Toulmin diagram
Pr(Harry is an expert hygienist | adequate history) = .82
unless
since
so
Harry took an adequate patient history.
18 Bayes nets: MSBNx Setup
History (i.e., quality of patient history procedure) is modeled as being conditionally dependent on Proficiency. That is, Proficiency is a parent of History; equivalently, History is a child of Proficiency. Note that Proficiency has no parents.
19 Bayes nets: MSBNx Setup
This is the probability table expressing initial
belief about Proficiency.
20 Bayes nets: MSBNx Setup
This is the probability table expressing belief about History. Note that there are two different conditional distributions for the values of History: one is relevant if Proficiency = Expert, and a different one if Proficiency = Novice.
21 Bayes nets: MSBNx Setup
Belief if you know Proficiency = Expert. You know that with certainty (probability 1), and what you expect about History is P(History | Proficiency = Expert): .8 for adequate and .2 for inadequate.
22 Bayes nets: MSBNx Setup
Click here to bring up the evaluation window.
23 Bayes nets: MSBNx Setup
Belief if you know Proficiency = Expert. You know that with certainty (probability 1), and what you expect about History is P(History | Proficiency = Expert): .8 for adequate and .2 for inadequate.
24 Bayes nets: MSBNx Setup
Belief if you know Proficiency = Novice. You know that with certainty (probability 1), and what you expect about History is P(History | Proficiency = Novice): .4 for adequate and .6 for inadequate.
25 Bayes nets: MSBNx Setup
Depiction if you don't know the value of either Proficiency or History. Starting with 70-30 belief about Expert/Novice. The conditional distributions for History are averaged together to get the marginal (overall) distribution for History -- appropriate if you don't know the Proficiency level.
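The averaging described above can be reproduced with the numbers from these slides (70-30 prior; .8 and .4 probabilities of an adequate history):

```python
# Numbers from the slides: 70-30 prior; P(Adequate|Expert)=.8, P(Adequate|Novice)=.4.
prior = {"Expert": 0.7, "Novice": 0.3}
p_adequate_given = {"Expert": 0.8, "Novice": 0.4}

# Marginal P(History = Adequate): the conditional probabilities averaged
# together, weighted by prior belief about Proficiency.
p_adequate = sum(prior[s] * p_adequate_given[s] for s in prior)
print(round(p_adequate, 2))  # 0.68
```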
26 Bayes Theorem
- The setup, with two random variables, X and Y:
- You know conditional probabilities, p(xj | yk), which tell you what to believe about X if you knew the value of Y.
- You learn X = x; what should you believe about Y?
- You combine two things:
- Relative conditional probabilities (the likelihood)
- Previous probabilities about Y values (the prior)
- posterior ∝ likelihood × prior
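The slogan "posterior ∝ likelihood × prior" translates directly into a few lines of Python; the two-value distribution in the usage line is an assumption for illustration only.

```python
# posterior ∝ likelihood × prior, rescaled so the result sums to one.
def posterior(prior, likelihood):
    unnorm = {y: prior[y] * likelihood[y] for y in prior}
    total = sum(unnorm.values())
    return {y: v / total for y, v in unnorm.items()}

# Two-value illustration (numbers are assumptions):
post = posterior({"y1": 0.5, "y2": 0.5}, {"y1": 0.9, "y2": 0.3})
print({y: round(v, 2) for y, v in post.items()})  # {'y1': 0.75, 'y2': 0.25}
```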
27 Hygiene Example: Likelihoods
One observable: Adequate Patient History Taken, Yes or No. One unobservable proficiency: Expert or Novice.
What kind of reasoning is this direction?
p(adequate | expert)
p(adequate | novice)
Note the 2:1 ratio.
28 Hygiene Example: Likelihoods
One observable: Adequate Patient History Taken, Yes or No. One unobservable proficiency: Expert or Novice.
p(inadequate | expert)
p(inadequate | novice)
Note the 1:3 ratio.
29 Bayes nets: Posterior Probabilities
Posterior distribution for Proficiency if History = Adequate is observed. The 2:1 ratio favoring Expert multiplies the 70:30 ratio we started with. The products are rescaled so they sum to one.
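Using the numbers from the slides (70:30 prior; likelihoods .8 and .4 for an adequate history), the rescaling works out as follows, and recovers the .82 quoted earlier for Harry:

```python
prior = {"Expert": 0.7, "Novice": 0.3}        # the 70:30 starting belief
likelihood = {"Expert": 0.8, "Novice": 0.4}   # P(Adequate | proficiency): a 2:1 ratio

unnorm = {s: prior[s] * likelihood[s] for s in prior}  # 0.56 and 0.12
total = sum(unnorm.values())
post = {s: v / total for s, v in unnorm.items()}       # rescaled to sum to one
print({s: round(v, 2) for s, v in post.items()})  # {'Expert': 0.82, 'Novice': 0.18}
```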
30 Bayes nets: Posterior Probabilities
Posterior distribution for Proficiency if History = Inadequate is observed. The 1:3 Expert/Novice ratio from the likelihood has multiplied the 70:30 prior.
31 Conditional independence
- "Conditional independence is not a grace of nature for which we must wait passively, but rather a psychological necessity which we satisfy actively by organizing our knowledge in a specific way."
- "An important tool in such organization is the identification of intermediate variables that induce conditional independence among observables; if such variables are not in our vocabulary, we create them."
- "In medical diagnosis, for instance, when some symptoms directly influence one another, the medical profession invents a name for that interaction (e.g., 'syndrome,' 'complication,' 'pathological state') and treats it as a new auxiliary variable that induces conditional independence; dependency between any two interacting systems is fully attributed to the dependencies of each on the auxiliary variable." (Pearl, 1988, p. 44)
32 Conditional independence
- Independence
- The probability of the joint occurrence of values of two variables is always equal to the product of the probabilities individually:
- P(X = x, Y = y) = P(X = x) P(Y = y).
- Conditional independence
- The conditional probability of the joint occurrence, given the value of another variable, is always equal to the product of the conditional probabilities:
- P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z).
33 Conditional independence
- MSBNx diagram with Independence: entering a value for one variable, X or Y, doesn't change the probabilities for the other.
- MSBNx diagram with Conditional independence: entering a value for one variable, X or Y, doesn't change the probabilities for the other IF the value for Z has already been fixed at some value.
34 Example: Classical Test Theory
- One true score; 5 identically distributed, conditionally independent observable test scores.
- TrueScore is probably about such and such (a probability distribution).
- The distribution of an observable test score is true score + noise.
(Diagram: TrueScore as parent of Test1-Test5 scores.)
35 Example: Classical Test Theory
- One true score; 5 identically distributed, conditionally independent observable test scores.
- Distribution of an observable is true score + noise.
36 Example: Classical Test Theory
- Conditional distributions of observables are identical, and don't depend on other observables.
37 Conditional distributions of observables given TrueScore = 2
(What kind of reasoning is this?)
38 - Prior (sort of normal) distribution for TrueScore
39 Example: Classical Test Theory
- Posterior distribution for TrueScore given Test1 = 4
(What kind of reasoning is this?)
40 Example: Classical Test Theory
- Posterior for TrueScore given Test1 = 4, Test2 = 4, Test3 = 5
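A toy discrete version of this update can be sketched in Python. The prior, the 5-point score scale, and the noise table are all illustrative assumptions; the point is the structure: because the tests are conditionally independent given TrueScore, the likelihood is just a product.

```python
import math

# p(observed | true): each row is an assumed noise distribution over
# observed scores 1..5 for a given TrueScore.
noise = {
    1: [0.7, 0.2, 0.1, 0.0, 0.0],
    2: [0.2, 0.5, 0.2, 0.1, 0.0],
    3: [0.1, 0.2, 0.4, 0.2, 0.1],
    4: [0.0, 0.1, 0.2, 0.5, 0.2],
    5: [0.0, 0.0, 0.1, 0.2, 0.7],
}
prior = {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.2, 5: 0.1}  # assumed prior on TrueScore

def update(prior, observations):
    # Conditional independence of the tests given TrueScore means the
    # likelihood is the product of p(obs | true) over the observations.
    unnorm = {t: p * math.prod(noise[t][o - 1] for o in observations)
              for t, p in prior.items()}
    total = sum(unnorm.values())
    return {t: v / total for t, v in unnorm.items()}

post = update(prior, [4, 4, 5])  # Test1 = 4, Test2 = 4, Test3 = 5
print(max(post, key=post.get))   # 4, the posterior mode
```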
41 Example: Classical Test Theory
- Note the shift in what we expect for Test 4.
42 Building up complex networks
- Extending Wigmore's ideas to probability-based reasoning
- Interrelationships among many variables are modeled in terms of important relationships among smaller subsets of variables, sometimes unobservable ones.
43 Building up complex networks
- Recursive representation of probability distributions
- Terms drop out when there is conditional independence.
- The relationship between recursive representations and acyclic directed graphs
- Edges (arrows) represent explicit dependence relationships
44 Example: The Asia Network
45 Mixed-Number Subtraction
Based on cognitive analyses, task construction, and data collection by Dr. Kikumi Tatsuoka. See the Mislevy (1994, 1995) readings for details of the Bayes net.
46 Computation in Bayes nets
- Very brief conceptual overview
- For a bit more detail, see:
- Mislevy, R.J. (1995). Probability-based inference in cognitive diagnosis. In P. Nichols, S. Chipman, & R. Brennan (Eds.), Cognitively diagnostic assessment (pp. 43-71). Hillsdale, NJ: Erlbaum.
- For a lot more detail, see:
- Jensen, F.V. (1996). An introduction to Bayesian networks. New York: Springer-Verlag.
47 Inference in a chain
Recursive representation:
p(u,v,x,y,z) = p(z|y,x,v,u) p(y|x,v,u) p(x|v,u) p(v|u) p(u)
             = p(z|y) p(y|x) p(x|v) p(v|u) p(u).
(Diagram: chain U → V → X → Y → Z, with edges labeled p(v|u), p(x|v), p(y|x), p(z|y).)
48 Inference in a chain
Suppose we learn the value of X. Start here, by revising belief about X.
49 Inference in a chain
Propagate information down the chain using conditional probabilities: from updated belief about X, use conditional probability to revise belief about Y.
50 Inference in a chain
Propagate information down the chain using conditional probabilities: from updated belief about Y, use conditional probability to revise belief about Z.
51 Inference in a chain
Propagate information up the chain using Bayes Theorem: from updated belief about X, use Bayes Theorem to revise belief about V.
52 Inference in a chain
Propagate information up the chain using Bayes Theorem: from updated belief about V, use Bayes Theorem to revise belief about U.
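The propagation steps above can be sketched for binary variables in a chain U → V → X → Y; all probability tables here are assumed for illustration.

```python
# Binary chain U -> V -> X -> Y; all tables are assumed for illustration.
p_u = [0.5, 0.5]
p_v_u = [[0.9, 0.1], [0.3, 0.7]]   # p_v_u[u][v] = p(v | u)
p_x_v = [[0.8, 0.2], [0.4, 0.6]]   # p_x_v[v][x] = p(x | v)
p_y_x = [[0.7, 0.3], [0.2, 0.8]]   # p_y_x[x][y] = p(y | x)

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

x = 1  # suppose we learn X = 1

# Downward: revise belief about Y straight from the conditional table.
belief_y = p_y_x[x]

# Upward: Bayes Theorem for V, p(v | X=1) ∝ p(v) p(X=1 | v).
p_v = [sum(p_u[u] * p_v_u[u][v] for u in (0, 1)) for v in (0, 1)]  # prior on V
belief_v = normalize([p_v[v] * p_x_v[v][x] for v in (0, 1)])

# Upward again: Bayes Theorem for U, p(u | X=1) ∝ p(u) Σv p(v|u) p(X=1|v).
belief_u = normalize([p_u[u] * sum(p_v_u[u][v] * p_x_v[v][x] for v in (0, 1))
                      for u in (0, 1)])

print([round(b, 3) for b in belief_v])  # [0.333, 0.667]
```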
53 Inference in a tree
In a tree each variable has no more than one parent. Suppose we learn the value of X. We can update every variable in the tree using either the conditional probability relationship or Bayes theorem.
(Diagram: tree over variables U, V, X, Y, Z.)
54 Inference in multiply-connected nets
In a multiply-connected graph, in at least one instance there is more than one path from one variable to another variable. Repeated application of Bayes theorem and conditional probability at the level of individual variables doesn't work.
(Diagram: multiply-connected net over U, V, W, X, Y, Z.)
55 Inference in multiply-connected nets
- Key idea: Group variables into subsets (cliques) such that the subsets form a tree.
(Diagram: variables U, V, W, X, Y, Z; first clique {U,V,W}.)
56 Inference in multiply-connected nets
- Key idea: Group variables into subsets (cliques) such that the subsets form a tree.
(Diagram: cliques {U,V,W} and {U,V,X}.)
57 Inference in multiply-connected nets
- Key idea: Group variables into subsets (cliques) such that the subsets form a tree.
(Diagram: cliques {U,V,W}, {U,V,X}, and {U,X,Y}.)
58 Inference in multiply-connected nets
- Key idea: Group variables into subsets (cliques) such that the subsets form a tree.
- Cliques can then be updated with a generalized version of the procedure for updating individual variables.
(Diagram: clique tree {U,V,W} - {U,V,X} - {U,X,Y} - {X,Z}.)