Title: Ockham
1Ockham's Razor: What it is, What it isn't, How
it works, and How it doesn't
- Kevin T. Kelly
- Department of Philosophy
- Carnegie Mellon University
- www.cmu.edu
2Further Reading
- "Efficient Convergence Implies Ockham's Razor," Proceedings of the 2002 International Workshop on Computational Models of Scientific Reasoning and Applications, Las Vegas, USA, June 24-27, 2002.
- (with C. Glymour) "Why Probability Does Not Capture the Logic of Scientific Justification," in C. Hitchcock, ed., Contemporary Debates in the Philosophy of Science, Oxford: Blackwell, 2004.
- "Justification as Truth-finding Efficiency: How Ockham's Razor Works," Minds and Machines 14, 2004, pp. 485-505.
- "Learning, Simplicity, Truth, and Misinformation," The Philosophy of Information, under review.
- "Ockham's Razor, Efficiency, and the Infinite Game of Science," proceedings, Foundations of the Formal Sciences 2004: Infinite Game Theory, Springer, under review.
3Which Theory to Choose?
Compatible with data
???
4Use Ockham's Razor
Complex
T1
T2
T3
Simple
5Dilemma
- If you know the truth is simple,
- then you don't need Ockham.
Complex
T1
T2
T3
Simple
6Dilemma
- If you don't know the truth is simple,
- then how could a fixed simplicity bias help you
if the truth is complex?
Complex
T1
T2
T3
Simple
T4
T5
7Puzzle
- A fixed bias is like a broken thermometer.
- How could it possibly help you find unknown truth?
Cold!
8I. Ockham Apologists
9Wishful Thinking
- Simple theories are nice if true
- Testability
- Unity
- Best explanation
- Aesthetic appeal
- Compress data
- So is believing that you are the emperor.
10 Overfitting
- Maximum likelihood estimates based on overly
complex theories can have greater predictive
error (AIC, cross-validation, etc.).
- The same is true even if you know the true model is
complex.
- Doesn't converge to the true model.
- Depends on random data.
Thanks, but a simpler model still has lower
predictive error.
The truth is complex. -God-
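The claim that a simpler model can predict better even when the truth is complex can be checked in a small worked case. The sketch below assumes a setting the slide does not spell out: a single normal observable with known standard deviation `sigma`, a simple model that fixes the mean at 0, and a complex model that estimates the mean by the sample mean of `n` draws. The function names are illustrative only.

```python
def risk_simple(mu: float, sigma: float) -> float:
    """Expected squared error when a new draw is predicted by the fixed guess 0."""
    return sigma**2 + mu**2


def risk_complex(sigma: float, n: int) -> float:
    """Expected squared error when a new draw is predicted by the sample mean of n draws."""
    return sigma**2 + sigma**2 / n


def simple_predicts_better(mu: float, sigma: float, n: int) -> bool:
    # The (false) simple model wins exactly when mu^2 < sigma^2 / n.
    return risk_simple(mu, sigma) < risk_complex(sigma, n)
```

With a true mean of 0.05 and unit variance, the false simple model predicts better at n = 100 but not at n = 1000, so the advantage depends on sample size rather than on the truth of the simple model.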
11Ignorance = Knowledge
- Messy worlds are legion
- Tidy worlds are few.
- That is why the tidy worlds
- Are those most likely true. (Carnap)
unity
12Ignorance = Knowledge
- Messy worlds are legion
- Tidy worlds are few.
- That is why the tidy worlds
- Are those most likely true. (Carnap)
1/3
1/3
1/3
13Ignorance = Knowledge
- Messy worlds are legion
- Tidy worlds are few.
- That is why the tidy worlds
- Are those most likely true. (Carnap)
2/6
2/6
1/6
1/6
14Depends on Notation
- But mess depends on coding,
- which Goodman noticed, too.
- The picture is inverted if
- we translate green to grue.
2/6
2/6
Notation Indicates truth?
1/6
1/6
15Same for Algorithmic Complexity
- Goodman's problem works against every fixed
simplicity ranking (independent of the processes
by which data are generated and coded prior to
learning).
- Extra problem: any pair-wise ranking of theories
can be reversed by choosing an alternative
computer language.
- So how could simplicity help us find the true
theory?
Notation Indicates truth?
16Just Beg the Question
- Assign high prior probability to simple theories.
- Why should you?
- Preference for complexity has the same
explanation.
You presume simplicity. Therefore you should
presume simplicity!
17Miracle Argument
- Simple data would be a miracle if a complex
theory were true (Bayes, BIC, Putnam).
18Begs the Question
- Fairness between theories ⇒
- bias against complex worlds.
S
C
19Two Can Play That Game
- Fairness between worlds ⇒
- bias against simple theory.
S
C
20Convergence
- At least a simplicity bias doesn't prevent
convergence to the truth (MDL, BIC, Bayes, SGS,
etc.).
- Neither do other biases.
- May as well recommend flat tires since they can
be fixed.
21Does Ockham Have No Frock?
Ash Heap of History
Philosopher's stone, Perpetual motion, Free
lunch, Ockham's Razor???
. . .
22II. How Ockham Helps You Find the Truth
23What is Guidance?
- Indication or tracking
- Too strong
- Fixed bias can't indicate anything
- Convergence
- Too weak
- True of other biases
- Straightest convergence
- Just right?
C
S
S
C
C
S
24A True Story
Niagara Falls
Clarion
Pittsburgh
25A True Story
Niagara Falls
Clarion
Pittsburgh
26A True Story
Niagara Falls
Clarion
Pittsburgh
27A True Story
Niagara Falls
!
Clarion
Pittsburgh
28A True Story
Niagara Falls
Clarion
Pittsburgh
29A True Story
?
30A True Story
31A True Story
Ask directions!
32A True Story
Where's
33What Does She Say?
Turn around. The freeway ramp is on the left.
34 You Have Better Ideas
Phooey! The Sun was on the right!
35You Have Better Ideas
!!
36You Have Better Ideas
37You Have Better Ideas
38You Have Better Ideas
39Stay the Course!
Ahem
40Stay the Course!
41Stay the Course!
42Don't Flip-flop!
43Don't Flip-flop!
44Don't Flip-flop!
45Then Again
46Then Again
47Then Again
@@!
48One Good Flip Can Save a Lot of Flop
49The U-Turn
50The U-Turn
51The U-Turn
52The U-Turn
53The U-Turn
54The U-Turn
55The U-Turn
Told ya!
56The U-Turn
57The U-Turn
58The U-Turn
59The U-Turn
60The U-Turn
61Your Route
Needless U-turn
62The Best Route
Told ya!
63The Best Route Anywhere from There
Told ya!
64The Freeway to the Truth
Told ya!
- Fixed advice for all destinations
- Disregarding it entails an extra course reversal
65The Freeway to the Truth
Told ya!
- even if the advice points away from the goal!
66Counting Marbles
67Counting Marbles
68Counting Marbles
May come at any time
69Counting Marbles
May come at any time
70Counting Marbles
May come at any time
71Counting Marbles
May come at any time
72Counting Marbles
May come at any time
73Counting Marbles
May come at any time
74Counting Marbles
May come at any time
75Ockham's Razor
- If you answer, answer with the current count.
3
?
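The counting rule can be sketched as a tiny method that, at every stage, reports exactly the marbles seen so far. This is a minimal illustration, assuming the evidence arrives as a list giving the number of new marbles at each stage; it always answers, whereas the slide also allows a method to withhold an answer. The name `ockham_outputs` is hypothetical.

```python
from typing import Iterable, List


def ockham_outputs(stream: Iterable[int]) -> List[int]:
    """Answer at each stage with the number of marbles seen so far.

    `stream` gives, for each stage, how many new marbles appeared (0 or more).
    """
    count = 0
    outputs = []
    for new_marbles in stream:
        count += new_marbles
        outputs.append(count)  # never posit marbles that have not appeared
    return outputs
```

For instance, `ockham_outputs([1, 1, 0, 0, 1, 0])` yields `[1, 2, 2, 2, 3, 3]`: the answer changes only when a new marble forces it to.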
76Analogy
- Marbles = detectable effects.
- Late appearance = difficulty of detection.
- Count = model (e.g., causal graph).
- Appearance times = free parameters.
77Analogy
- U-turn = model revision (with content loss)
- Highway = revision-efficient truth-finding
method.
T
T?
78The U-turn Argument
- Suppose you converge to the truth but
- violate Ockham's razor along the way.
3
79The U-turn Argument
- Where is that extra marble, anyway?
3
80The U-turn Argument
3
81The U-turn Argument
- If you never say 2, you'll never converge to the
truth.
3
82The U-turn Argument
- That's it. You should have listened to Ockham.
3
2
2
2
83The U-turn Argument
- Oops! Well, no method is infallible!
3
2
2
2
84The U-turn Argument
- If you never say 3, you'll never converge to the
truth.
3
2
2
2
85The U-turn Argument
- Embarrassing to be back at that old theory, eh?
3
2
2
2
3
86The U-turn Argument
3
2
2
2
3
4
87The U-turn Argument
3
2
2
2
3
4
5
88The U-turn Argument
3
2
2
2
3
4
5
6
89The U-turn Argument
3
2
2
2
3
4
5
6
7
90The Score
Subproblem
3
2
2
2
3
4
5
6
7
91The Score
Subproblem
2
2
2
3
4
5
6
7
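The score can be tallied by counting answer changes along the two output sequences shown on the last two slides. This is a sketch under one assumption: a retraction is any stage where a previously stated answer is replaced by a different output, and moving from "?" to an answer does not count. The names are illustrative.

```python
from typing import Sequence, Union

Answer = Union[int, str]  # an integer count, or "?" for "no answer yet"


def count_retractions(outputs: Sequence[Answer]) -> int:
    """Count the stages at which a previously stated answer is taken back."""
    retractions = 0
    for prev, cur in zip(outputs, outputs[1:]):
        if prev != "?" and cur != prev:
            retractions += 1
    return retractions


# Output sequences as listed on the two "Score" slides.
violator = [3, 2, 2, 2, 3, 4, 5, 6, 7]
ockham = [2, 2, 2, 3, 4, 5, 6, 7]
```

Here `count_retractions(violator)` is 6 and `count_retractions(ockham)` is 5: the violator pays one extra reversal for the early leap to 3.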
92Ockham is Necessary
- If you converge to the truth,
- and
- you violate Ockham's razor
- then
- some convergent method beats your worst-case
revision bound in each answer in the subproblem
entered at the time of the violation.
93Ockham is Sufficient
- If you converge to the truth,
- and
- you never violate Ockham's razor
- then
- You achieve the worst-case revision bound of each
convergent solution in each answer in each
subproblem.
94Efficiency
- Efficiency = achievement of the best worst-case
revision bound in each answer in each subproblem.
95Ockham Efficiency Theorem
- Among the convergent methods
- Ockham = Efficient!
Efficient
Inefficient
96Mixed Strategies
- Mixed strategy = chance of output depends only on
actual experience.
- Convergence in probability = chance of producing
the true answer approaches 1 in the limit.
- Efficiency = achievement of the best worst-case
expected revision bound in each answer in each
subproblem.
97Ockham Efficiency Theorem
- Among the mixed methods that converge in
probability
- Ockham = Efficient!
Efficient
Inefficient
98Dominance and Support
- Every convergent method is weakly dominated in
revisions by a clone who says ? until stage n.
- Convergence: Must leap eventually.
- Efficiency: Only leap to simplest.
- Dominance: Could always wait longer.
Can't wait forever!
99III. Ockham on Steroids
100Ockham Wish List
- General definition of Ockham's razor.
- Compare revisions even when not bounded within
answers.
- Prove theorem for arbitrary empirical problems.
101Empirical Problems
- Problem = partition of a topological space.
- Potential answers = partition cells.
- Evidence = open (verifiable) propositions.
Example: Symmetry
102Example: Parameter Freeing
- Euclidean topology.
- Say which parameters are zero.
- Evidence = open neighborhood.
a1
a1 = 0, a2 = 0
a1 > 0, a2 = 0
a1 > 0, a2 > 0
a1 = 0, a2 > 0
a2
0
103The Players
- Scientist
- Produces an answer in response to current
evidence.
- Demon
- Chooses evidence in response to scientist's
choices.
104Winning
- Scientist wins
- by default if demon doesn't present an infinite
nested sequence of basic open sets whose
intersection is a singleton.
- else by merit if scientist eventually always
produces the true answer for the world selected by
demon's choices.
105Comparing Revisions
- One answer sequence maps into another iff
- there is an order- and answer-preserving map from
the first to the second (? is wild).
- Then the revisions of the first are as good as those
of the second.
. . .
?
?
?
?
?
. . .
106Comparing Revisions
- The revisions of the first are strictly better
if, in addition, the latter doesn't map back into
the former.
. . .
?
?
?
?
?
. . .
?
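The "maps into" comparison can be sketched for finite answer sequences. The code below adopts one reading of the slides, which is an assumption: the map is an order-preserving, one-to-one embedding, and a "?" in the first sequence may land on any entry of the second. Infinite sequences, which the slides also allow, are out of scope here, and the function names are hypothetical.

```python
from typing import Sequence

WILD = "?"


def maps_into(first: Sequence, second: Sequence) -> bool:
    """True if `first` embeds in `second` in order, with "?" in `first` matching anything."""
    pos = 0
    for item in first:
        # Greedily take the earliest position in `second` that can receive `item`.
        while pos < len(second) and not (item == WILD or item == second[pos]):
            pos += 1
        if pos == len(second):
            return False
        pos += 1
    return True


def strictly_better(first: Sequence, second: Sequence) -> bool:
    """`first` maps into `second`, but `second` does not map back into `first`."""
    return maps_into(first, second) and not maps_into(second, first)
```

Under this reading, `(3,)` maps into `(2, 3)` but not conversely, so the revisions of `(3,)` are strictly better than those of `(2, 3)`.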
107Comparing Methods
- F is as good as G iff
- each output sequence of F is as good as some
output sequence of G.
F
as good as
G
108Comparing Methods
- F is better than G iff
- F is as good as G and
- G is not as good as F
F
not as good as
G
109Comparing Methods
- F is strongly better than G iff each output
sequence of F is strictly better than an output
sequence of G but
strictly better than
110Comparing Methods
- no output sequence of G is as good as any of F.
not as good as
111Terminology
- Efficient solution = as good as any solution in
any subproblem.
112What Simplicity Isn't
Only by accident!!
- Syntactic length.
- Data-compression (MDL).
- Computational ease.
- Social entrenchment (Goodman).
- Number of free parameters (BIC, AIC).
- Euclidean dimensionality
113What Simplicity Is
- Simpler theories are compatible with deeper
problems of induction.
Worst demon
Smaller demon
114Problem of Induction
- No true information entails the true answer.
- Happens in answer boundaries.
115Demonic Paths
A demonic path from w is a sequence of
alternating answers that a demon can force an
arbitrary convergent method through starting from
w.
01234
116Simplicity Defined
The A-sequences are the demonic sequences
beginning with answer A. A is as simple as B iff
each B-sequence is as good as some A-sequence.
(3) < (2, 3)
(3, 4) < (2, 3, 4)
(3, 4, 5) < (2, 3, 4, 5)
. . .
So 2 is simpler than 3!
117Ockham Answer
- An answer as simple as any other answer.
- number of observed particles.
(n) < (2, ..., n)
(n, n+1) < (2, ..., n, n+1)
(n, n+1, n+2) < (2, ..., n, n+1, n+2)
. . .
So 2 is Ockham!
118Ockham Lemma
A is Ockham iff for all demonic p, A followed by p
maps into some demonic sequence.
I can force you through 2 but not through 3,2.
So 3 isn't Ockham
3
119Ockham Answer
E.g. Only simplest curve compatible with data is
Ockham.
a1
Demonic sequence
Non-demonic sequences
a2
0
120General Efficiency Theorem
- If the topology is metrizable and separable and
the question is countable then
- Ockham = Efficient.
- Proof uses Martin's Borel Determinacy theorem.
121Stacked Problems
- There is an Ockham answer at every stage.
1
122Non-Ockham ⇒ Strongly Worse
- If the problem is a stacked countable partition
over a restricted Polish space
- Each Ockham solution is strongly better than each
non-Ockham solution in the subproblem entered at
the time of the violation.
123Simplicity ≠ Low Dimension
- Suppose God says the true parameter value is
rational.
124Simplicity ≠ Low Dimension
- Topological dimension and integration theory
dissolve.
- Does Ockham?
125Simplicity ≠ Low Dimension
- The proposed account survives in the preserved
limit point structure.
126IV. Ockham and Symmetry
127Respect for Symmetry
- If several simplest alternatives are available,
don't break the symmetry.
- Count the marbles of each color.
- You hear the first marble but don't see it.
- Why red rather than green?
128Respect for Symmetry
- Before the noise, (0, 0) is Ockham.
- After the noise, no answer is Ockham
Demonic
Non-demonic
(0, 0)
(1, 0)
(1, 0) (0, 1)
(0, 1)
(0, 1) (1, 0)
Right!
129Goodman's Riddle
- Count oneicles: a oneicle is a particle at any
stage but one, when it is a non-particle.
- Oneicle translation is an auto-homeomorphism that
does not preserve the problem.
- Unique Ockham answer is current oneicle count.
- Contradicts unique Ockham answer in particle
counting.
130Supersymmetry
- Say when each particle appears.
- Refines counting problem.
- Every auto-homeomorphism preserves problem.
- No answer is Ockham.
- No solution is Ockham.
- No method is efficient.
131Dual Supersymmetry
- Say only whether particle count is even or odd.
- Coarsens counting problem.
- Particle/Oneicle auto-homeomorphism preserves
problem.
- Every answer is Ockham.
- Every solution is Ockham.
- Every solution is efficient.
132Broken Symmetry
- Count the even or just report odd.
- Coarsens counting problem.
- Refines the even/odd problem.
- Unique Ockham answer at each stage.
- Exactly Ockham solutions are efficient.
133Simplicity Under Refinement
Supersymmetry: No answer is Ockham
Time of particle appearance
Particle counting
Oneicle counting
Twoicle counting
Broken symmetry: Unique Ockham answer
Particle counting or odd particles
Oneicle counting or odd oneicles
Twoicle counting or odd twoicles
Dual supersymmetry: Both answers are Ockham
Even/odd
134Proposed Theory is Right
- Objective efficiency is grounded in problems.
- Symmetries in the preceding problems would wash
out stronger simplicity distinctions.
- Hence, such distinctions would amount to mere
conventions (like coordinate axes) that couldn't
have anything to do with objective efficiency.
135Furthermore
- If Ockham's razor is forced to choose in the
supersymmetrical problems then either
- following Ockham's razor increases revisions in
some counting problems
- or
- Ockham's razor leads to contradictions as a
problem is coarsened or refined.
136V. Conclusion
137What Ockham's Razor Is
- Only output Ockham answers
- Ockham answer = a topological invariant of the
empirical problem addressed.
138What it Isnt
- preference for
- brevity,
- computational ease,
- entrenchment,
- past success,
- Kolmogorov complexity,
- dimensionality, etc.
139How it Works
- Ockham's razor is necessary for minimizing
revisions prior to convergence to the truth.
140How it Doesnt
- No possible method could
- Point at the truth
- Indicate the truth
- Bound the probability of error
- Bound the number of future revisions.
141Spooky Ockham
- Science without support or safety nets.
145VI. Stochastic Ockham
146Mixed Strategies
- Mixed strategy = chance of output depends only on
actual experience.
e
P_e(M = H at n) = P_{e|n}(M = H at n).
147Stochastic Case
- Ockham
- at each stage, you produce a non-Ockham answer
with prob 0.
- Efficiency
- achievement of the best worst-case expected
revision bound in each answer in each subproblem
over all methods that converge to the truth in
probability.
148Stochastic Efficiency Theorem
- Among the stochastic methods that converge in
probability, Ockham = Efficient!
Efficient
Inefficient
149Stochastic Methods
- Your chance of producing an answer is a function
of observations made so far.
2
p
Urn selected in light of observations.
150Stochastic U-turn Argument
- Suppose you converge in probability to the truth
but produce a non-Ockham answer with prob > 0.
3
r > 0
151Stochastic U-turn Argument
- Choose small ε > 0. Consider answer 4.
3
r > 0
152Stochastic U-turn Argument
- By convergence in probability to the truth
3
r > 0
2
p > 1 - ε/3
153Stochastic U-turn Argument
3
r > 0
2
3
4
p > 1 - ε/3
p > 1 - ε/3
p > 1 - ε/3
154Stochastic U-turn Argument
- Since ε can be chosen arbitrarily small,
- sup prob of 3 revisions r.
- sup prob of 2 revisions 1
3
r > 0
2
3
4
p > 1 - ε/3
p > 1 - ε/3
p > 1 - ε/3
155Stochastic U-turn Argument
- So sup Exp revisions is 2 3r.
- But for Ockham 2.
3
r > 0
2
3
4
p > 1 - ε/3
p > 1 - ε/3
p > 1 - ε/3
Subproblem
156VII. Statistical Inference
(Beta Version)
157The Statistical Puzzle of Simplicity
- Assume Normal distribution, σ = 1, μ ≥ 0.
- Question: μ = 0 or μ > 0?
- Intuition: μ = 0 is simpler than μ > 0.
μ = 0
mean
158Analogy
- Marbles = potentially small effects
- Time = sample size
- Simplicity = fewer free parameters tied to
potential effects
- Counting = freeing parameters in a model
159U-turn in Probability
- Convergence in probability = chance of producing
true model goes to unity
- Retraction in probability = chance of producing a
model drops from above r > .5 to below 1 - r.
1
r
Chance of producing true model
Chance of producing alternative model
1 - r
0
Sample size
160Suppose You (Probably) Choose a Model More
Complex than the Truth
μ = 0
mean
μ > 0
Revision Counter: 0
> r
sample mean
zone for choosing μ > 0
161Eventually You Retract to the Truth (In
Probability)
μ = 0
mean
Revision Counter: 1
> r
sample mean
zone for choosing μ = 0
162So You (Probably) Output an Overly Simple Model
Nearby
μ > 0
mean
Revision Counter: 1
> r
sample mean
zone for choosing μ = 0
163Eventually You Retract to the Truth (In
Probability)
μ > 0
mean
Revision Counter: 2
> r
sample mean
zone for choosing μ > 0
164But Standard (Ockham) Testing Practice Requires
Just One Retraction!
μ = 0
mean
Revision Counter: 0
> r
sample mean
zone for choosing μ = 0
165In The Simplest World, No Retractions
μ = 0
mean
Revision Counter: 0
> r
sample mean
zone for choosing μ = 0
166In The Simplest World, No Retractions
μ = 0
mean
Revision Counter: 0
> r
sample mean
zone for choosing μ = 0
167In Remaining Worlds, at Most One Retraction
μ > 0
mean
Revision Counter: 0
> r
zone for choosing μ > 0
168In Remaining Worlds, at Most One Retraction
μ > 0
mean
Revision Counter: 0
zone for choosing μ > 0
169In Remaining Worlds, at Most One Retraction
μ > 0
mean
Revision Counter: 1
> r
zone for choosing μ > 0
170So Ockham Beats All Violators
- Ockham: at most one revision.
- Violator: at least two revisions in worst case.
171Summary
- Standard practice is to test the point
hypothesis rather than the composite alternative.
- This amounts to favoring the simple hypothesis
a priori.
- It also minimizes revisions in probability!
172Two Dimensional Example
- Assume independent bivariate normal distribution
of unit variance.
- Question: how many components of the joint mean
are zero?
- Intuition: more nonzeros = more complex
- Puzzle: How does it help to favor simplicity in
less-than-simplest worlds?
173A Real Model Selection Method
- Bayesian Information Criterion (BIC)
- BIC(M, sample) =
  -log(max prob that M can assign to sample)
  + log(sample size) × model complexity × ½.
- BIC method: choose M with least BIC score.
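The BIC rule can be sketched for the two-dimensional example: independent normal components of unit variance, with each model saying which mean components are free. The code is an illustration under those assumptions, not the procedure behind the plots; terms shared by all models are dropped, "model complexity" is taken to be the number of free components, and the function names are hypothetical.

```python
import math
from itertools import product
from typing import Dict, Sequence, Tuple


def bic_scores(xbar: Sequence[float], n: int) -> Dict[Tuple[bool, ...], float]:
    """BIC score (up to a shared constant) for every choice of free mean components.

    A model is a tuple of booleans; True means "this component is a free parameter".
    """
    scores = {}
    for model in product([False, True], repeat=len(xbar)):
        # Unit-variance normal: fixing component j at 0 costs n * xbar_j^2 / 2
        # in negative maximized log-likelihood; freeing it costs nothing.
        misfit = sum(n * x * x / 2 for x, free in zip(xbar, model) if not free)
        penalty = math.log(n) * sum(model) / 2
        scores[model] = misfit + penalty
    return scores


def bic_choice(xbar: Sequence[float], n: int) -> Tuple[bool, ...]:
    """Return the model with the least BIC score."""
    scores = bic_scores(xbar, n)
    return min(scores, key=scores.get)
```

Plugging in a sample mean of (.05, .005) purely for illustration, the rule keeps both components at zero for n = 100, frees only the first at n = 10,000, and frees both at n = 4,000,000.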
174Official BIC Property
- In the limit, minimizing BIC finds a model with
maximal conditional probability when the prior
probability is flat over models and fairly flat
over parameters within a model.
- But it is also revision-efficient.
175AIC in Simplest World
Simple
Complex
176AIC in Simplest World
Simple
Complex
177AIC in Simplest World
Simple
Complex
178BIC in Simplest World
Simple
Complex
179BIC in Simplest World
Simple
Complex
180BIC in Simplest World
Simple
Complex
181BIC in Simplest World
Simple
Complex
182Performance in Complex World
Simple
Complex
95
183Performance in Complex World
Simple
Complex
184Performance in Complex World
Simple
Complex
185Performance in Complex World
- n = 4,000,000 (!)
- μ = (.05, .005).
Simple
Complex
186Question
- Does the statistical retraction minimization
story extend to violations in less-than-simplest
worlds?
- Recall that the deterministic argument for higher
retractions required the concept of minimizing
retractions in each subproblem.
- A subproblem is a proposition verified at a
given time in a given world.
- Some analogue in probability is required.
187Subproblem.
- H is an α-subproblem in w at n:
- There is a likelihood ratio test of w at
significance < α such that this test has power <
1 - α at each world in H.
worlds
H
w
sample size n
> 1 - α
> α
> α
reject
reject
accept
188Significance Schedules
- A significance schedule α(·) is a monotone
decreasing sequence of significance levels
converging to zero that drop so slowly that power
can be increased monotonically with sample size.
n+1
n
α(n+1)
α(n)
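A significance schedule can be paired with the one-dimensional test of μ = 0 against μ > 0 as follows. This is a sketch only: the particular schedule `alpha(n) = 1 / (n + 19)` is a hypothetical choice that decreases monotonically to zero, and whether it drops slowly enough for power to increase monotonically is not verified here. The function names are illustrative.

```python
from statistics import NormalDist


def alpha(n: int) -> float:
    """Hypothetical significance schedule: monotone decreasing, converging to zero."""
    return 1.0 / (n + 19)  # alpha(1) = 0.05


def choose_model(sample_mean: float, n: int, sigma: float = 1.0) -> str:
    """One-sided test of mu = 0 against mu > 0 at significance alpha(n)."""
    z = NormalDist().inv_cdf(1.0 - alpha(n))
    cutoff = z * sigma / n**0.5  # reject mu = 0 only above this sample mean
    return "mu > 0" if sample_mean > cutoff else "mu = 0"
```

The method stays with the simple answer until the sample mean clears a cutoff that shrinks with sample size, so a fixed positive effect is eventually detected while a zero effect is rejected with vanishing chance.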
189Ockham Violation ⇒ Inefficient
Subproblem at sample size n
(μX, μY)
190Ockham Violation ⇒ Inefficient
Subproblem at sample size n
(μX, μY)
Ockham violation: Probably say blue hypothesis at
white world (p > r)
191Ockham Violation ⇒ Inefficient
Subproblem at time of violation
(μX, μY)
Probably say blue
Probably say white
192Ockham Violation ⇒ Inefficient
Subproblem at time of violation
(μX, μY)
Probably say blue
Probably say white
193Ockham Violation ⇒ Inefficient
Subproblem at time of violation
(μX, μY)
Probably say blue
Probably say white
Probably say blue
194Oops! Ockham ⇒ Inefficient
(μX, μY)
Subproblem
195Oops! Ockham ⇒ Inefficient
(μX, μY)
Subproblem
196Oops! Ockham ⇒ Inefficient
(μX, μY)
Subproblem
197Oops! Ockham ⇒ Inefficient
(μX, μY)
Subproblem
198Oops! Ockham ⇒ Inefficient
(μX, μY)
Subproblem
Two retractions
199Local Retraction Efficiency
- Ockham does as well as best subproblem
performance in some neighborhood of w.
(μX, μY)
Subproblem
At most one retraction
Two retractions
200Ockham Violation ⇒ Inefficient
- Note: no neighborhood around w avoids extra
retractions.
Subproblem at time of violation
(μX, μY)
w
201Gonzo Ockham ⇒ Inefficient
- Gonzo = probably saying simplest answer in entire
subproblem entered in simplest world.
(μX, μY)
Subproblem
202Balance
- Be Ockham (avoid complexity).
- Don't be Gonzo Ockham (avoid bad fit).
- Truth-directed: sole aim is to find true model
with minimal revisions!
- No circles: totally worst-case; no prior bias
toward simple worlds.
203THE END