Title: Basic Concepts and Definitions in Probability-Based Reasoning
1. Basic Concepts and Definitions in Probability-Based Reasoning
- Robert J. Mislevy
- University of Maryland
- September 14, 2006
2. A quote from Glenn Shafer
- "Probability is not really about numbers; it is about the structure of reasoning."
- Glenn Shafer, quoted in Pearl, 1988, p. 77
3. Views of Probability
- Two conceptions of probability
- Aleatory (chance)
- Long-run frequencies, mechanisms
- Probability is a property of the world
- Degree of belief (subjective)
- Probability is a property of your state of knowledge (de Finetti) and your model of the situation
- Same formal definitions and machinery
- Aleatory paradigms as an analogical basis for degree of belief (Glenn Shafer)
4. Frames of discernment
- A frame of discernment is all the possible combinations of values of the variables you are working with (Shafer, 1976).
- Discern: detect, recognize, distinguish
- A property of you as much as a property of the world
- Depends on what you know and what your purpose is
- A frame of discernment can evolve over time
- Medical diagnosis
- Document literacy example (more information)
5. Frames of Discernment in Assessment
- In the Student Model: determining what aspects of skill and knowledge to use as explicit SM variables (psychological perspective, grain size, reporting requirements)
- In the Evidence Model: evidence identification (task scoring); evaluation rules map from a unique work product to common observed variables
- In the Task Model: which aspects of situations are important in task design to keep track of and manipulate, to achieve the assessment's purpose?
- Features vs. values of variables
6. (Random) Variables
- We will start with variables that have a finite number of possible values.
- Denote a random variable by upper case, say X.
- Denote particular values and generic values by lower case, x.
- Y is the outcome of a coin flip: y ∈ {h, t}.
- Xi is the answer to Item i: xi ∈ {0, 1}.
7. Finite Probability Distributions
- Finite set of possible values {x1, ..., xn}
- Prob(X = xj), P(X = xj), or more simply p(xj), is the probability that X takes the value xj.
- 0 ≤ p(xj) ≤ 1.
- p(x1) + ... + p(xn) = 1.
- P(X = xj or X = xm) = p(xj) + p(xm) for j ≠ m.
- These are Kolmogorov's axioms (the special case of probability for finite distributions); see the sketch below.
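A minimal sketch of these axioms in code (plain Python; the distribution values are illustrative, not from the slides):

# A finite probability distribution as a dict mapping each value xj to p(xj).
p = {"x1": 0.2, "x2": 0.5, "x3": 0.3}

# Axiom: 0 <= p(xj) <= 1 for every possible value.
assert all(0.0 <= pj <= 1.0 for pj in p.values())

# Axiom: the probabilities over all possible values sum to 1.
assert abs(sum(p.values()) - 1.0) < 1e-9

# Additivity for distinct values: P(X = x1 or X = x2) = p(x1) + p(x2).
print(p["x1"] + p["x2"])   # 0.7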
8. Continuous Probability Distributions
- Infinitely many possible values, e.g., {x : x ∈ [0,1]}, {x : x ∈ (-∞, ∞)}
- Events A1, ..., Am are sets of possible values
- A1 = {x : x < 0}, A2 = {x : x ∈ (0,1)}, A3 = {x : x > 0}, ...
- P(Aj) is the probability that X takes a value in Aj
- 0 ≤ P(Aj) ≤ 1.
- If A1, ..., Am are disjoint events that exhaust all possible values of x, then P(A1) + ... + P(Am) = 1.
- If Aj and Ak are disjoint events, P(Aj ∪ Ak) = P(Aj) + P(Ak); see the sketch below.
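A sketch of the same ideas for a continuous variable, taking X to be standard normal (an assumption made only for illustration; scipy is assumed available). The events mirror the slide: A1 = {x : x < 0}, A2 = {x : x ∈ (0,1)}, A3 = {x : x > 0}.

from scipy.stats import norm

X = norm(loc=0.0, scale=1.0)          # standard normal distribution for X

p_A1 = X.cdf(0.0)                     # P(X < 0) = .5
p_A2 = X.cdf(1.0) - X.cdf(0.0)        # P(0 < X < 1) ≈ .3413
p_A3 = 1.0 - X.cdf(0.0)               # P(X > 0) = .5

# A1 and A3 are disjoint and exhaust the line (the point {0} has probability 0):
print(p_A1 + p_A3)                    # 1.0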
9. Jensen's Icy Road Example
Police Inspector Smith is impatiently awaiting the arrival of Mr. Holmes and Dr. Watson. They are late, and Inspector Smith has another important appointment (lunch). Looking out the window he wonders whether the roads are icy. Both are notoriously bad drivers, so if the roads are icy they are likely to crash. His secretary enters and tells him that Dr. Watson has had a car accident. "Watson? OK. It could be worse: icy roads! Then Holmes has most probably crashed too. I'll go for lunch now." "Icy roads?" the secretary replies. "It is far from being that cold, and furthermore all of the roads are salted." Inspector Smith is relieved. "Bad luck for Watson. Let us give Holmes ten minutes more." (Jensen, 1996, p. 7)
Jensen, F.V. (1996). An introduction to Bayesian networks. New York: Springer-Verlag.
10. From the Icy Road Example
- Ice: Is there an icy road?
- Values: Yes, No
- Initial probabilities: (.7, .3)
- (Note the choice of values for the variable "icy road.")
11. Icy Road Probabilities

Ice    p(Ice)
Yes    .7     = P(Ice = yes)
No     .3     = P(Ice = no)
12. Graph representation
[Figure: a single node X, the variable]
13. Hypergraph representation
[Figure: the variable X with an attached function node p(x), the probability distribution]
14. Joint probability distributions
- Two random variables, X and Y
- P(X = xj, Y = yk), or p(xj, yk), is the probability that X takes the value xj and Y takes the value yk.
- 0 ≤ p(xj, yk) ≤ 1.
- Σj Σk p(xj, yk) = 1.
15. Marginal probability distributions (1)
- Two discrete random variables, X and Y
- Recall that P(X = xj, Y = yk), or p(xj, yk), is the probability that X takes the value xj and Y takes the value yk
- The marginal probability of a value xj of X is the sum over all the possible joint probabilities p(xj, yk) with that value of X:
- p(xj) = Σk p(xj, yk); see the sketch below.
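A minimal sketch of marginalization, with an illustrative joint table (the numbers are made up, chosen only to sum to 1):

# Joint distribution p(x, y) keyed by (x, y) pairs.
joint = {
    ("x1", "y1"): 0.1, ("x1", "y2"): 0.3,
    ("x2", "y1"): 0.5, ("x2", "y2"): 0.1,
}

def marginal_x(joint):
    """p(xj) = sum over k of p(xj, yk)."""
    px = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return px

print(marginal_x(joint))   # {'x1': 0.4, 'x2': 0.6}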
16. Conditional probability distributions
- Two random variables, X and Y
- P(X = xj | Y = yk), or p(xj | yk), is the probability that X takes the value xj given that Y takes the value yk.
- This is how we express relationships among real-world phenomena
- Coin flip: p(heads) vs. p(heads | BobReport)
- P(heart attack | age, family history, blood pressure)
- P(February 10 high temperature | geographical location, February 9 high temperature)
- IRT: P(Xj = 1) vs. P(Xj = 1 | θ); see the sketch below.
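As an aside on the IRT example: one concrete form (an assumption here, since the slide does not name a model) is the Rasch model, where the conditional probability of a correct response to item j given proficiency θ is P(Xj = 1 | θ) = exp(θ - bj) / (1 + exp(θ - bj)), with bj the item difficulty. A short sketch with illustrative numbers:

import math

def p_correct(theta, b):
    # Rasch model: P(Xj = 1 | theta) with item difficulty b.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

print(p_correct(theta=1.0, b=0.0))   # ≈ .73; conditioning on theta changes belief about Xj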
17. Conditional probability distributions
- Two discrete random variables, X and Y
- P(X = xj | Y = yk), or p(xj | yk), is the probability that X takes the value xj given that Y takes the value yk. "As if" is an important idea here.
- A picture will make clear what this means.
18. Conditional probability distributions
[Figure: joint probability table for X and Y]
19. Conditional probability distributions
Probabilities for X conditional on Y = y2:
p(x1 | y2) = p(x1, y2) / p(y2) = .3 / .4 = .75
p(x2 | y2) = p(x2, y2) / p(y2) = .1 / .4 = .25
20. Conditional probability distributions
Probabilities for Y conditional on X = x1:
p(y1 | x1) = p(x1, y1) / p(x1) = .1 / .6 ≈ .167
p(y2 | x1) = p(x1, y2) / p(x1) = .3 / .6 = .50
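A sketch of the slide-19 computation, conditioning on Y = y2 via p(x | y) = p(x, y) / p(y); only the joint entries those slides state are used here:

# Joint entries given on slide 19.
joint = {("x1", "y2"): 0.3, ("x2", "y2"): 0.1}

# Marginal p(y2) is the sum of the joint entries with Y = y2.
p_y2 = sum(p for (x, y), p in joint.items() if y == "y2")   # 0.4

print(joint[("x1", "y2")] / p_y2)   # .3 / .4 = .75 = p(x1 | y2)
print(joint[("x2", "y2")] / p_y2)   # .1 / .4 = .25 = p(x2 | y2)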
21. Conditional probability distributions
- Kolmogorov's axioms hold for conditional probabilities too, fixing what is on the right side of the conditioning bar:
- 0 ≤ p(xj | yk) ≤ 1 for each given yk.
- Σj p(xj | yk) = 1 for each given yk.
- P(X = xj or X = xm | Y = yk) = p(xj | yk) + p(xm | yk).
22. Marginal probability distributions (2)
- Two discrete random variables, X and Y
- Recall that p(xj | yk) is the probability that X = xj given Y = yk.
- The marginal probability of a value of X is the sum of its conditional probabilities given all possible values of Y, with each weighted by its probability:
- p(xj) = Σk p(xj | yk) p(yk); see the sketch below.
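A minimal sketch of this weighted sum (the conditional table and the probabilities for Y are illustrative, not from the slides):

p_y = {"y1": 0.6, "y2": 0.4}             # p(yk)
p_x_given_y = {                          # p(xj | yk), one row per yk
    "y1": {"x1": 0.9, "x2": 0.1},
    "y2": {"x1": 0.5, "x2": 0.5},
}

# p(x1) = sum over k of p(x1 | yk) p(yk)
p_x1 = sum(p_x_given_y[y]["x1"] * p_y[y] for y in p_y)
print(p_x1)   # 0.9*0.6 + 0.5*0.4 = 0.74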
23. Graph representation
[Figure: directed graph X → Y; X is the parent variable, Y is the child variable]
24. Hypergraph representation
[Figure: parent variable X with marginal probability distribution p(x); child variable Y with conditional probability distribution p(y | x) for the child given the parent]
25. Hypergraph representation
[Figure: parent variables X and Z with marginal probability distributions p(x) and p(z); child variable Y with conditional probability distribution p(y | x, z) for the child given its parents]
26. A relationship between joint and conditional probability distributions
- p(xj, yk) = p(xj | yk) p(yk)
-           = p(yk | xj) p(xj).
- Basis of Bayes' theorem:
- p(yk | xj) p(xj) = p(xj | yk) p(yk)
- ⇒ p(yk | xj) = p(xj | yk) p(yk) / p(xj)
- ⇒ p(yk | xj) ∝ p(xj | yk) p(yk).
27. Bayes' Theorem
- The setup, with two random variables, X and Y:
- You know conditional probabilities p(xj | yk), which tell you what to believe about X if you knew the value of Y.
- You learn X = x; what should you believe about Y?
- You combine two things:
- Relative conditional probabilities (the likelihood)
- Previous probabilities about Y values (the prior)
- posterior ∝ likelihood × prior
28. From the Icy Road Example
- Ice: Is there an icy road?
- Values: Yes, No
- Initial probabilities: (.7, .3)
- Watson: Does Watson have a car crash?
- Values: Yes, No
- Probabilities conditional on Ice:
- (.8, .2) if Ice = yes; (.1, .9) if Ice = no.
29. Icy Road Conditional Probabilities

             Watson
Ice          Yes    No
Yes          .8     .2
No           .1     .9

p(Watson = yes | Ice = yes) = .8; p(Watson = no | Ice = yes) = .2
30. Icy Road Conditional Probabilities

             Watson
Ice          Yes    No
Yes          .8     .2
No           .1     .9

p(Watson = yes | Ice = no) = .1; p(Watson = no | Ice = no) = .9
31. Icy Road Likelihoods

             Watson
Ice          Yes    No
Yes          .8     .2
No           .1     .9

If Watson = no, the likelihoods are p(Watson = no | Ice = yes) = .2 and p(Watson = no | Ice = no) = .9. Note the 2/9 ratio.
32. Icy Road Likelihoods

             Watson
Ice          Yes    No
Yes          .8     .2
No           .1     .9

If Watson = yes, the likelihoods are p(Watson = yes | Ice = yes) = .8 and p(Watson = yes | Ice = no) = .1. Note the 8/1 ratio.
33. Icy Road Bayes' Theorem: If Watson = yes
Prior × Likelihood ∝ Posterior

Ice = yes: .7 × .8 = .56
Ice = no:  .3 × .1 = .03
34. Icy Road Bayes' Theorem: If Watson = yes
Prior × Likelihood ∝ Posterior

Ice = yes: .7 × .8 = .56
Ice = no:  .3 × .1 = .03

Note: these sum to .59, not 1.00. They aren't probabilities.
35. Icy Road Bayes' Theorem: If Watson = yes
Prior × Likelihood ∝ Posterior

Ice = yes: .56 / .59 ≈ .95
Ice = no:  .03 / .59 ≈ .05

Divide through by the normalizing constant .59 to get posterior probabilities.
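The whole update from slides 33-35, written out in plain Python with the numbers the slides give:

prior = {"yes": 0.7, "no": 0.3}          # p(Ice), from slide 28
likelihood = {"yes": 0.8, "no": 0.1}     # p(Watson = yes | Ice), from slide 28

# Elementwise product: prior × likelihood, proportional to the posterior.
unnormalized = {ice: prior[ice] * likelihood[ice] for ice in prior}
# {'yes': 0.56, 'no': 0.03}; sums to .59, so these are not yet probabilities.

norm_const = sum(unnormalized.values())  # 0.59
posterior = {ice: p / norm_const for ice, p in unnormalized.items()}
print(posterior)                         # {'yes': ≈.95, 'no': ≈.05}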
36. Independence
- Independence:
- The probability of the joint occurrence of values of two variables is always equal to the product of the probabilities individually:
- P(X = x, Y = y) = P(X = x) P(Y = y).
- Equivalent to saying that learning the value of one of the variables does not change your belief about the other; see the sketch below.
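A minimal sketch of checking this definition cell by cell (the joint table is illustrative and was constructed to be independent, with marginals (.6, .4) for X and (.5, .5) for Y):

joint = {
    ("x1", "y1"): 0.30, ("x1", "y2"): 0.30,
    ("x2", "y1"): 0.20, ("x2", "y2"): 0.20,
}
p_x = {"x1": 0.6, "x2": 0.4}
p_y = {"y1": 0.5, "y2": 0.5}

# Independent iff P(X = x, Y = y) = P(X = x) P(Y = y) in every cell.
independent = all(abs(p - p_x[x] * p_y[y]) < 1e-9
                  for (x, y), p in joint.items())
print(independent)   # True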
37. Independence
[Figure: two joint probability tables, one in which X and Y are independent and one in which X and Y are not independent (X and Y are dependent)]
38. Conditional independence
- Conditional independence:
- The conditional probability of the joint occurrence, given the value of another variable, is always equal to the product of the conditional probabilities:
- P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z); see the sketch below.
39. Conditional independence
- "Conditional independence is not a grace of nature for which we must wait passively, but rather a psychological necessity which we satisfy actively by organizing our knowledge in a specific way.
- An important tool in such organization is the identification of intermediate variables that induce conditional independence among observables; if such variables are not in our vocabulary, we create them.
- In medical diagnosis, for instance, when some symptoms directly influence one another, the medical profession invents a name for that interaction (e.g., syndrome, complication, pathological state) and treats it as a new auxiliary variable that induces conditional independence; dependency between any two interacting systems is fully attributed to the dependencies of each on the auxiliary variable." (Pearl, 1988, p. 44)
40. Example: Icy Road