Title: Truth-conduciveness Without Reliability: A Non-Theological Explanation of Ockham's Razor
1Truth-conduciveness Without Reliability: A
Non-Theological Explanation of Ockham's Razor
- Kevin T. Kelly
- Department of Philosophy
- Carnegie Mellon University
- www.cmu.edu
2I. The Puzzle
3Which Theory is True?
???
4Ockham Says
Choose the Simplest!
5But Why?
Gotcha!
6Puzzle
- An indicator must be sensitive to what it
indicates.
simple
7Puzzle
- An indicator must be sensitive to what it
indicates.
complex
8Puzzle
- But Ockham's razor always points at simplicity.
simple
9Puzzle
- But Ockham's razor always points at simplicity.
complex
10Puzzle
- If a broken compass is known to point North, then
we already know where North is.
complex
11Puzzle
- But then who needs the compass?
complex
12Proposed Answers
13A. Evasions
Truth
14A. Evasions
Brevity
Testability
Unity
Explanation
Truth
16Virtues
- Simple theories have virtues
- Testable
- Unified
- Explanatory
- Symmetrical
- Bold
- Compress data
- But to assume that the truth has these virtues is
wishful thinking.
van Fraassen
17Convergence
- At least a simplicity bias doesn't prevent
convergence to the truth.
truth
Complexity
18Convergence
- At least a simplicity bias doesn't prevent
convergence to the truth.
truth
Plink!
Blam!
Complexity
21Convergence
- Convergence allows for any theory choice whatever
in the short run, so this is not an argument for
Ockham's razor now.
truth
Alternative ranking
22 Overfitting
- Empirical estimates based on complex models have
greater expected distance from the truth.
Truth
23 Overfitting
- Empirical estimates based on complex models have
greater expected distance from the truth.
Pop! Pop! Pop! Pop!
24 Overfitting
- Empirical estimates based on complex models have
greater expected distance from the truth.
Truth
clamp
25 Overfitting
- Empirical estimates based on complex models have
greater expected distance from the truth.
Pop! Pop! Pop! Pop!
Truth
clamp
26 Overfitting
- ...even if the simple theory is known to be false
Four eyes!
clamp
27C. Circles
28Prior Probability
- Assign high prior probability to simple theories.
Simplicity is plausible now because it was
yesterday.
29Miracle Argument
- e would not be a miracle given C
- e would be a miracle given P.
q
P
C
30Miracle Argument
- e would not be a miracle given C
- e would be a miracle given P.
q
C
S
31However
- e would not be a miracle given P(q)
Why not this?
q
C
S
32The Real Miracle
Ignorance about model: p(C) ≈ p(P).
Ignorance about parameter setting: p(P(q) | P) ≈ p(P(q′) | P).
Knowledge about C vs. P(q): p(P(q)) << p(C).
Is it knognorance or ignoredge?
33The Ellsberg Paradox
1/3
?
?
34The Ellsberg Paradox
1/3
?
?
> 1/3
Human betting preferences
>
35The Ellsberg Paradox
1/3
?
?
> 1/3
< 1/3
Human betting preferences
>
>
36Human View
knowledge
ignorance
1/3
?
?
Human betting preferences
>
>
37Bayesian View
ignoredge
ignoredge
1/3
1/3
1/3
Human betting preferences
>
>
38Moral
1/3
?
?
Even in the most mundane contexts, when Bayesians
offer to replace our ignorance with ignoredge, we
vote with our feet.
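The betting pattern above can be reproduced mechanically. A minimal sketch in Python, assuming the standard Ellsberg urn (30 red balls, 60 black-or-yellow in an unknown mix) and a worst-case (maximin) evaluation of each bet; the maximin rule is one common rationalization of the human preferences, not a claim made on the slides:

```python
def win_prob(bet, black):
    """Urn: 30 red, `black` black, 60 - black yellow, 90 balls total."""
    counts = {'red': 30, 'black': black, 'yellow': 60 - black}
    return sum(counts[color] for color in bet) / 90

def maximin(bet):
    """Worst-case winning chance over the unknown black/yellow split."""
    return min(win_prob(bet, black) for black in range(61))

# Humans bet on red over black, and on black-or-yellow over red-or-yellow:
# maximin(('red',)) = 1/3            > maximin(('black',)) = 0
# maximin(('black', 'yellow')) = 2/3 > maximin(('red', 'yellow')) = 1/3
```

A flat "ignoredge" prior over the split would make both comparisons ties, which is exactly what the human preferences reject.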
39Probable Tracking
- If the simple theory S were true, then the data
would probably be simple, so Ockham's razor would
probably believe S.
- If the simple theory S were false, then the
complex alternative theory C would be true, so
the data would probably be complex, so you would
probably believe C rather than S.
40Probable Tracking
Given that you use Ockham's razor:
p(B(S) | S) = p(e_S | S) ≈ 1.
p(not-B(S) | not-S) = 1 - p(e_S | C) ≈ 1.
41Probable Tracking
Given that you use Ockham's razor:
p(B(C) | C) ≈ 1 - probability that the data look simple given C.
p(B(C) | not-C) ≈ 0 + probability that the data look simple given alternative theory P.
42B. Magic
Truth
Simplicity
43 Magic
- Simplicity informs via hidden causes.
G
44 Magic
- Simpler to explain Ockham's razor without hidden
causes.
?
45Reductio of Naturalism (Koons 2000)
- Suppose that the crucial probabilities p(T_q | T)
in the Bayesian miracle argument are natural
chances, so that Ockham's razor really is
reliable.
- Suppose that T is the fundamental theory of
natural chance, so that T_q determines the true p_q
for some choice of q.
- But if p_t(T_q) is defined at all, it should be 1
if t = q and 0 otherwise.
- So natural science can only produce fundamental
knowledge of natural chance if there are
non-natural chances.
46Diagnosis
- Indication or tracking
- Too strong
- Circles, evasions, or magic required.
- Convergence
- Too weak
- Doesn't single out simplicity
Complex
Simple
Simple
Complex
47Diagnosis
- Indication or tracking
- Too strong
- Circles or magic required.
- Convergence
- Too weak
- Doesn't single out simplicity
- Straightest convergence
- Just right?
Complex
Simple
Simple
Complex
Complex
Simple
48II. Straightest Convergence
Complex
Simple
49Empirical Problems
- Set K of infinite input sequences.
- Partition of K into alternative theories.
K
T1
T2
T3
50Empirical Methods
- Map finite input sequences to theories or to '?' (suspend judgment).
T3
K
T1
T2
T3
e
51Method Choice
Output history
At each stage, the scientist can choose a new method
(agreeing with past theory choices).
T1
T2
T3
e1
e2
e3
e4
Input history
52Aim Converge to the Truth
T3
?
T2
?
T1
T1
T1
T1
. . .
T1
T1
T1
K
T1
T2
T3
53Retraction
- Choosing T and then not choosing T next
T
T
?
54Aim Eliminate Needless Retractions
Truth
55Aim Eliminate Needless Retractions
Truth
56Aim Eliminate Needless Delays to Retractions
theory
57Aim Eliminate Needless Delays to Retractions
application
theory
application
application
application
corollary
application
application
application
corollary
application
corollary
58Easy Retraction Time Comparisons
Method 1
T1
T1
T2
T2
T2
T2
T4
T4
T4
. . .
T1
T1
T2
T2
T3
T3
T2
T4
T4
. . .
at least as many, at least as late
Method 2
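The comparison above can be made concrete by counting retraction times in an output sequence. A minimal sketch, assuming outputs are theory names with '?' for suspension, and that dropping a previously chosen theory (including retreating to '?') counts as a retraction:

```python
def retraction_times(outputs):
    """Stages at which the method drops its previously chosen theory."""
    return [i for i in range(1, len(outputs))
            if outputs[i - 1] != '?' and outputs[i] != outputs[i - 1]]

method_1 = ['T1', 'T1', 'T2', 'T2', 'T2', 'T2', 'T4', 'T4', 'T4']
method_2 = ['T1', 'T1', 'T2', 'T2', 'T3', 'T3', 'T2', 'T4', 'T4']

# Method 2 retracts at least as many times, at least as late:
# retraction_times(method_1) -> [2, 6]
# retraction_times(method_2) -> [2, 4, 6, 7]
```

Pairing the sorted retraction times shows the easy dominance relation: every retraction of Method 1 is matched by one of Method 2 that is no earlier, and Method 2 has extras besides.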
59Worst-case Retraction Time Bounds
(1, 2, 8)
. . .
. . .
. . .
T1
T2
T3
T3
T3
T4
T3
. . .
T1
T2
T3
T3
T3
T4
T4
. . .
T1
T2
T3
T3
T4
T4
T4
. . .
T1
T2
T4
T3
T4
T4
T4
. . .
Output sequences
60III. Ockham Without Circles, Evasions, or Magic
61Curve Fitting
- Data: open intervals around Y at rational values
of X.
67Empirical Effects
May take arbitrarily long to discover
74Empirical Theories
- True theory determined by which effects appear.
75Empirical Complexity
More complex
76Background Constraints
More complex
77Background Constraints
?
More complex
78Background Constraints
?
More complex
79Ockham's Razor
- Don't select a theory unless it is uniquely
simplest in light of experience.
80Weak Ockham's Razor
- Don't select a theory unless it is among the
simplest in light of experience.
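In effect-counting problems like the curve-fitting example, Ockham's razor has a direct implementation: select the theory positing no effects beyond those observed, and suspend judgment unless that theory is unique. A hypothetical sketch; the `theories` encoding (theory name mapped to its posited effect set) is illustrative, not from the slides:

```python
def ockham_choice(observed, theories):
    """observed: set of effects seen so far.
    theories: dict mapping theory name -> set of effects it posits.
    Return the uniquely simplest consistent theory, else '?' (suspend)."""
    consistent = {t: fx for t, fx in theories.items() if observed <= fx}
    if not consistent:
        return '?'
    fewest = min(len(fx) for fx in consistent.values())
    simplest = [t for t, fx in consistent.items() if len(fx) == fewest]
    return simplest[0] if len(simplest) == 1 else '?'

theories = {'flat': set(), 'linear': {'tilt'}, 'quadratic': {'tilt', 'bend'}}
# ockham_choice(set(), theories)            -> 'flat'
# ockham_choice({'tilt'}, theories)         -> 'linear'
# ockham_choice({'tilt', 'bend'}, theories) -> 'quadratic'
```

Note the method is stalwart by construction: it only abandons an answer when a new effect makes that answer inconsistent or no longer uniquely simplest.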
81Stalwartness
- Don't retract your answer while it is uniquely
simplest.
83Uniform Problems
- All paths of accumulating effects starting at a
level have the same length.
84Timed Retraction Bounds
- r(M, e, n) = the least timed retraction bound
covering the total timed retractions of M along
input streams of complexity n that extend e.
[Figure: timed retractions of M over empirical complexity classes 0-3]
85Efficiency of Method M at e
- M converges to the truth no matter what.
- For each convergent M′ that agrees with M up to
the end of e, and for each n:
- r(M, e, n) ≤ r(M′, e, n).
[Figure: retraction bounds of M and M′ over empirical complexity classes 0-3]
86M is Strongly Beaten at e
- There exists convergent M′ that agrees with M up
to the end of e, such that
- for each n, r(M, e, n) > r(M′, e, n).
[Figure: retraction bounds of M and M′ over empirical complexity classes 0-3]
87M is Weakly Beaten at e
- There exists convergent M′ that agrees with M up
to the end of e, such that
- for each n, r(M, e, n) ≥ r(M′, e, n), and
- for some n, r(M, e, n) > r(M′, e, n).
[Figure: retraction bounds of M and M′ over empirical complexity classes 0-3]
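The beating relations reduce to pointwise comparisons of bound vectors. A minimal sketch, assuming each method is summarized by its tuple of worst-case timed retraction bounds r(M, e, n) indexed by complexity class n (a simplification: the full definition compares timed bounds, not just totals):

```python
def strongly_beats(r_alt, r_m):
    """Alternative M' beats M outright: strictly better in every class."""
    return all(a < m for a, m in zip(r_alt, r_m))

def weakly_beats(r_alt, r_m):
    """M' is never worse than M, and strictly better in at least one class."""
    pairs = list(zip(r_alt, r_m))
    return all(a <= m for a, m in pairs) and any(a < m for a, m in pairs)

# bounds per complexity class 0..3:
# strongly_beats((1, 2, 3, 4), (2, 3, 4, 5)) -> True
# weakly_beats((1, 2, 3, 4), (1, 2, 3, 5))   -> True
# weakly_beats((1, 2, 3, 4), (1, 2, 3, 4))   -> False
```

Efficiency at e is then the absence of any convergent, history-respecting alternative whose bound vector beats M's.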
88Idea
- No matter what convergent M has done in the past,
nature can force M to produce each answer down an
arbitrary effect path, arbitrarily often.
- Nature can also force violators of Ockham's razor
or stalwartness either into an extra retraction
or a late retraction in each complexity class.
89Ockham Violation with Retraction
Extra retraction in each complexity class
Ockham violation
90Ockham Violation without Retraction
Late retraction in each complexity class
Ockham violation
91Uniform Ockham Efficiency Theorem
- Let M be a solution to a uniform problem. The
following are equivalent:
- M is strongly Ockham and stalwart at e;
- M is efficient at e;
- M is not strongly beaten at e.
92Idea
- Similar, but if convergent M already violates
strong Ockham's razor by favoring an answer T at
the root of a longer path, sticking with T may
reduce retractions in complexity classes reached
only along the longer path.
93Violation Favoring Shorter Path
Non-uniform problem
?
Late or extra retraction in each complexity class
Ockham violation
94Violation Favoring Longer Path without Retraction
Non-uniform problem
?
Ouch! Extra retraction in each complexity class!
Ockham violation
95But at First Violation
Non-uniform problem
?
?
Breaks even each class.
?
First Ockham violation
96But at First Violation
Non-uniform problem
?
?
Breaks even each class.
?
Loses in class 0 when truth is red.
First Ockham violation
97Ockham Efficiency Theorem
- Let M be a solution. The following are
equivalent:
- M is always strongly Ockham and stalwart;
- M is always efficient;
- M is never weakly beaten.
98Application Causal Inference
- Causal graph theory: more correlations → more
causes.
- Idealized data: list of conditional dependencies
discovered so far.
- Anomaly: the addition of a conditional
dependency to the list.
partial correlations
S
G(S)
99Causal Path Rule
- X, Y are dependent conditional on set S of
variables not containing X, Y iff X, Y are
connected by at least one path in which
- no non-collider is in S, and
- each collider has a descendant in S.
X
Y
S
Pearl, SGS
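The causal path rule can be checked mechanically by enumerating simple paths in the DAG. A self-contained sketch; the graph encoding (node mapped to its children) and function names are illustrative:

```python
def descendants(dag, v):
    """All nodes reachable from v via directed edges (dag: node -> children)."""
    seen, stack = set(), [v]
    while stack:
        for child in dag.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def dependent_given(dag, x, y, s):
    """X, Y dependent given S iff some path activates per the path rule:
    no non-collider on the path is in S, and every collider on the path
    has itself or a descendant in S."""
    nodes = set(dag) | {c for cs in dag.values() for c in cs}
    adj = {n: set() for n in nodes}
    for u, cs in dag.items():
        for c in cs:
            adj[u].add(c)
            adj[c].add(u)

    def simple_paths(path):
        if path[-1] == y:
            yield path
            return
        for nb in adj[path[-1]]:
            if nb not in path:
                yield from simple_paths(path + [nb])

    def active(path):
        for a, b, c in zip(path, path[1:], path[2:]):
            is_collider = b in dag.get(a, []) and b in dag.get(c, [])
            if is_collider:
                if not (({b} | descendants(dag, b)) & s):
                    return False
            elif b in s:
                return False
        return True

    return any(active(p) for p in simple_paths([x]))

chain = {'X': ['Y'], 'Y': ['Z']}       # X -> Y -> Z
collider = {'X': ['Y'], 'Z': ['Y']}    # X -> Y <- Z
# dependent_given(chain, 'X', 'Z', set())       -> True  (unblocked chain)
# dependent_given(chain, 'X', 'Z', {'Y'})       -> False (non-collider in S)
# dependent_given(collider, 'X', 'Z', set())    -> False (collider not in S)
# dependent_given(collider, 'X', 'Z', {'Y'})    -> True  (conditioning activates it)
```

Path enumeration is exponential in the worst case; it suffices for the four-variable examples that follow, but production implementations use the linear-time reachability formulation of d-separation instead.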
100Forcible Sequence of Models
X
Y
Z
W
101Forcible Sequence of Models
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
102Forcible Sequence of Models
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {Y,W}
103Forcible Sequence of Models
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
104Forcible Sequence of Models
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
Z dep W | {X}, {Y}, {X,Y}
Y dep W | {Z}, {X,Z}
105Forcible Sequence of Models
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
Z dep W | {X}, {Y}, {X,Y}
Y dep W | {X}, {Z}, {X,Z}
106Policy Prediction
- A consistent policy estimator can be forced into
retractions.
- Failure of uniform consistency.
- No non-trivial confidence interval.
Y
Z
Y
Z
Y
Z
Robins, Wasserman, Zhang
Y
Z
107Moral
- The issue is not true model vs. prediction.
- The issue is actual vs. counterfactual model
selection and prediction.
- In counterfactual prediction, the form of the
model matters and retractions are unavoidable.
Y
Z
Y
Z
Y
Z
Y
Z
108IV. Simplicity
109Aim
- General definition of simplicity.
- Prove Ockham efficiency theorem for general
definition.
110Approach
- Empirical complexity reflects the nested problems
of induction posed by the problem.
- Hence, simplicity is problem-relative.
111Empirical Problems
- Set K of infinite input sequences.
- Partition of K into alternative theories.
K
T1
T2
T3
112Grove Systems
- A sphere system for K is just a downward-nested
sequence of subsets of K starting with K.
K
2
1
0
113Grove Systems
- Think of successive differences as levels of
increasing empirical complexity in K.
2
1
0
114Answer-preserving Grove Systems
- No answer is split across levels.
2
1
0
115Answer-preserving Grove Systems
- Refine offending answer if necessary.
2
1
0
116Data-driven Grove Systems
- Each answer is decidable given a complexity
level.
- Each upward union of levels is verifiable.
Verifiable
Decidable
Decidable
117Grove System Update
118Grove System Update
1
0
119Forcible Grove Systems
- At each stage, the data presented by a world at a
level are compatible with the next level up (if
there is a next level).
. . .
120Forcible Path
- A forcible restriction of a Grove system.
121Forcible Path to Top
- A forcible restriction of a Grove system that
intersects with every level.
122Simplicity Concept
- A data-driven, answer-preserving Grove system for
which each restriction to a possible data event
has a forcible path to the top.
123Uniform Simplicity Concepts
- If a data event intersects a level, it intersects
each higher level.
124Uniform Ockham Efficiency Theorem
- Let M be a solution to a uniform problem. The
following are equivalent - M is strongly Ockham and stalwart at e
- M is efficient at e
- M is strongly beaten at e.
125Ockham Efficiency Theorem
- Let M be a solution. The following are
equivalent:
- M is always strongly Ockham and stalwart;
- M is always efficient;
- M is never weakly beaten.
126V. Stochastic Ockham
127Mixed Strategies
- Require that the strategy converge in chance to
the true model.
Chance of producing true model at parameter q
. . .
Sample size
128Retractions in Chance
- Total drop in chance of producing an arbitrary
answer as sample size increases.
- Retraction in signal, not actual retractions due
to noise.
Chance of producing true model at parameter q
. . .
Sample size
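Total retractions in chance are just the summed decreases in the method's chance of producing a given answer as sample size grows. A minimal sketch:

```python
def retractions_in_chance(chances):
    """Total drop in the chance of producing an answer across sample sizes."""
    return sum(max(0.0, a - b) for a, b in zip(chances, chances[1:]))

# chance of outputting an answer rises, dips, recovers, then falls:
# retractions_in_chance([0.2, 0.9, 0.4, 0.9, 0.1]) -> 1.3 (up to rounding)
```

This measures retraction in the signal only; sampling noise that jitters the output without lowering the underlying chance contributes nothing.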
129Ockham Efficiency
- Bound retractions in chance by easy comparisons
of time and magnitude.
- Ockham efficiency still follows.
(0, 0, .5, 0, 0, 0, .5, 0, 0, ...)
Chance of producing true model at parameter q
. . .
Sample size
130Classification Problems
- Points from the plane sampled IID, labeled with
half-plane membership. The edge of the half-plane
is some polynomial. What is its degree?
- Uniform Ockham efficiency theorem applies.
Cosma Shalizi
131Model Selection Problems
- Random variables.
- IID sampling.
- Joint distribution continuously parametrized.
- Partition over parameter space.
- Each partition cell is a model.
- Method maps sample sequences to models.
132Two Dimensional Example
- Assume independent bivariate normal distribution
of unit variance.
- Question: how many components of the joint mean
are zero?
- Intuition: more nonzeros = more complex.
- Puzzle: how does it help to favor simplicity in
less-than-simplest worlds?
133A Standard Model Selection Method
- Bayes Information Criterion (BIC)
- Bayes Information Criterion (BIC)
- BIC(M, sample) =
- - log(max prob that M can assign to sample)
- + log(sample size) × model complexity × ½.
- BIC method: choose M with least BIC score.
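The scoring rule above (lower is better) might be sketched as follows; `bic_choose` and the example likelihoods are hypothetical, with complexity k read as the number of free parameters:

```python
import math

def bic_score(max_log_likelihood, sample_size, k):
    """BIC(M, sample) = -log(max likelihood) + log(sample size) * k * 0.5."""
    return -max_log_likelihood + math.log(sample_size) * k * 0.5

def bic_choose(models, sample_size):
    """models: dict name -> (max log-likelihood, complexity k).
    Choose the model with the least BIC score."""
    return min(models,
               key=lambda m: bic_score(models[m][0], sample_size, models[m][1]))

# A complex model must fit enough better to pay its log(n) complexity penalty:
models = {'simple': (-105.0, 1), 'complex': (-103.0, 3)}
# bic_choose(models, 1000) -> 'simple'
```

Because the penalty grows like log(n) per parameter while fit improvements from true extra parameters grow linearly in n, the simple model wins only until the data demand otherwise, which is the mind-change-efficient behavior described next.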
134Official BIC Property
- In the limit, minimizing BIC finds a model with
maximal conditional probability when the prior
probability is flat over models and fairly flat
over parameters within a model.
- But it is also mind-change-efficient.
135Toy Problem
- Truth is bivariate normal of known covariance.
- Count non-zero components of mean vector.
136Pure Method
- Acceptance zones for different answers in sample
mean space.
Simple
Complex
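For the toy problem, a pure method's acceptance zones can be sketched as thresholding each component of the sample mean at roughly the standard-error scale; the cutoff z/√n below is an illustrative choice, not the exact zones drawn on the slides:

```python
import math

def count_nonzero_means(sample_mean, n, z=1.96):
    """Answer 'k components of the mean are nonzero' when k components of
    the sample mean exceed the threshold z / sqrt(n) in absolute value."""
    cutoff = z / math.sqrt(n)
    return sum(1 for m in sample_mean if abs(m) > cutoff)

# with n = 100, cutoff = 0.196:
# count_nonzero_means((0.05, 0.50), 100) -> 1   (simple in one coordinate)
# count_nonzero_means((0.30, 0.50), 100) -> 2   (complex)
```

Since the cutoff shrinks like 1/√n, any fixed nonzero component is eventually detected, but a component near zero can keep the method in the simple zone for a very long time, which is what the performance frames below illustrate.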
137Performance in Simplest World
Simple
Complex
95%
142Performance in Complex World
Simple
Complex
95%
145Performance in Complex World
- n = 4,000,000 (!)
- m = (.05, .005).
Simple
Complex
146Causal Inference from Stochastic Data
- Suppose that the true linear causal model is
Variables are standard normal
.998
X
Y
Z
W
.1
.99
-.99
147Causal Inference from Stochastic Data
Scheines, Mayo-Wilson, and Fancsali
Sample size = 40. In 9 out of 10 samples, the PC
algorithm outputs:
X
Y
Z
W
Sample size = 100,000. In 9 out of 10 samples, PC
outputs the truth:
Variables standard normal
X
Y
Z
W
148Deterministic Sub-problems
Membership degree 1 at w; membership degree 0 outside.
- Worst-case cost at w =
- sup_{w′} mem(w, w′) × cost(w′).
- Worst-case cost = sup_w worst-case cost at w.
149Statistical Sub-problems
Membership(p, p′) = 1 - r(p, p′)
- Worst-case cost at p =
- sup_{p′} mem(p, p′) × cost(p′).
- Worst-case cost = sup_p worst-case cost at p.
150Future Direction
- α-consistency: converge to production of the true
answer with chance > 1 - α.
- Compare worst-case timed bounds on retractions in
chance of α-consistent methods over each
complexity class.
- Generalized power: minimizing retraction time
forces simple acceptance zones to be powerful.
- Generalized significance: minimizing retractions
forces the simple zone to be size α.
- Balance: the balance depends on α.
151VI. Conclusion
152Ockham's Razor
- Necessary for staying on the straightest path to
the truth.
- Does not point at or indicate the truth.
- Works without circles, evasions, or magic.
- Such a theory is motivated in counterfactual
inference and estimation.
153Further Reading
- (with C. Glymour) "Why Probability Does Not
Capture the Logic of Scientific Justification," in
C. Hitchcock, ed., Contemporary Debates in the
Philosophy of Science, Oxford: Blackwell, 2004.
- "Justification as Truth-finding Efficiency: How
Ockham's Razor Works," Minds and Machines 14:
2004, pp. 485-505.
- "Ockham's Razor, Efficiency, and the Unending
Game of Science," forthcoming in proceedings,
Foundations of the Formal Sciences 2004: Infinite
Game Theory, Springer, under review.
- "How Simplicity Helps You Find the Truth Without
Pointing at It," forthcoming in V. Harizanov, M.
Friend, and N. Goethe, eds., Philosophy of
Mathematics and Induction, Dordrecht: Springer.
- "Ockham's Razor, Empirical Complexity, and
Truth-finding Efficiency," forthcoming, Theoretical
Computer Science.
- "Learning, Simplicity, Truth, and
Misinformation," forthcoming in Van Benthem, J.
and Adriaans, P., eds., Philosophy of Information.
154II. Navigation Without a Compass
155Asking for Directions
Where's
156Asking for Directions
Turn around. The freeway ramp is on the left.
157Asking for Directions
158Helpful Advice
159Best Route
160Best Route to Any Goal
161Disregarding Advice is Bad
Extra U-turn
162Best Route to Any Goal
...so fixed advice can help you reach a hidden
goal without circles, evasions, or magic.
163- "There is no difference whatsoever in It. He goes
from death to death, who sees difference, as it
were, in It." Brihadaranyaka 4.4.19-20
- "Living in the midst of ignorance and considering
themselves intelligent and enlightened, the
senseless people go round and round, following
crooked courses, just like the blind led by the
blind." Katha Upanishad I. ii. 5.
164Academic
165Academic
Poof!
If there weren't an apple on the table I wouldn't
be a brain in a vat, so I wouldn't see one.