Ockham


Slides: 204

Provided by: kevint84

Learn more at: https://www.andrew.cmu.edu


Transcript and Presenter's Notes


1
Ockham's Razor: What It Is, What It Isn't, How It Works, and How It Doesn't

Kevin T. Kelly
Department of Philosophy
Carnegie Mellon University
www.cmu.edu

2
Further Reading
"Efficient Convergence Implies Ockham's Razor," Proceedings of the 2002 International Workshop on Computational Models of Scientific Reasoning and Applications, Las Vegas, USA, June 24-27, 2002.
(with C. Glymour) "Why Probability Does Not Capture the Logic of Scientific Justification," in C. Hitchcock, ed., Contemporary Debates in the Philosophy of Science, Oxford: Blackwell, 2004.
"Justification as Truth-finding Efficiency: How Ockham's Razor Works," Minds and Machines 14, 2004, pp. 485-505.
"Learning, Simplicity, Truth, and Misinformation," The Philosophy of Information, under review.
"Ockham's Razor, Efficiency, and the Infinite Game of Science," proceedings, Foundations of the Formal Sciences 2004: Infinite Game Theory, Springer, under review.
3
Which Theory to Choose?
Compatible with data
???
4
Use Ockham's Razor
Complex
T1
T2
T3
Simple
5
Dilemma

If you know the truth is simple,
then you don't need Ockham.

Complex
T1
T2
T3
Simple
6
Dilemma

If you don't know the truth is simple,
then how could a fixed simplicity bias help you
if the truth is complex?

Complex
T1
T2
T3
Simple
T4
T5
7
Puzzle

A fixed bias is like a broken thermometer.
How could it possibly help you find unknown truth?

Cold!
8
I. Ockham Apologists
9
Wishful Thinking

Simple theories are nice if true
Testability
Unity
Best explanation
Aesthetic appeal
Compress data
So is believing that you are the emperor.

10
Overfitting

Maximum likelihood estimates based on overly
complex theories can have greater predictive
error (AIC, Cross-validation, etc.).
Same is true even if you know the true model is
complex.
Doesn't converge to true model.
Depends on random data.

Thanks, but a simpler model still has lower
predictive error.
The truth is complex. -God-
.
.
.
.
11
Ignorance = Knowledge

Messy worlds are legion
Tidy worlds are few.
That is why the tidy worlds
Are those most likely true. (Carnap)

unity
12
Ignorance = Knowledge

Messy worlds are legion
Tidy worlds are few.
That is why the tidy worlds
Are those most likely true. (Carnap)

1/3
1/3
1/3
13
Ignorance = Knowledge

Messy worlds are legion
Tidy worlds are few.
That is why the tidy worlds
Are those most likely true. (Carnap)

2/6
2/6
1/6
1/6
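
The fractions on these slides (1/3, 1/3, 1/3, then 2/6, 2/6, 1/6, 1/6) match what Carnap's m* measure gives for two individuals and one property: equal weight to each structure description, shared equally among the state descriptions that realize it. That reading is an assumption, since the figure itself did not survive. A minimal Python sketch under that assumption (the function name `m_star` and the labels `F`, `G` are illustrative only):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def m_star(num_individuals, values=("F", "G")):
    """Weight each state description by first splitting probability
    equally over structure descriptions, then equally within each one."""
    states = list(product(values, repeat=num_individuals))
    # A structure description ignores which individual has which value.
    structures = defaultdict(list)
    for state in states:
        structures[tuple(sorted(state))].append(state)
    weights = {}
    for members in structures.values():
        share = Fraction(1, len(structures)) / len(members)
        for state in members:
            weights[state] = share
    return weights


weights = m_star(2)
for state, weight in sorted(weights.items()):
    print("".join(state), weight)
```

The two uniform ("tidy") worlds each receive 1/3, while the two mixed ("messy") worlds share the remaining 1/3, which is the bias toward uniformity the verse describes.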
14
Depends on Notation

But mess depends on coding,
which Goodman noticed, too.
The picture is inverted if
we translate green to grue.
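
Goodman's point can be sketched in a few lines of Python. The code below assumes the usual textbook definition (an observation is "grue" if it is green before a switch time t or blue from t onward, and "bleen" otherwise); the helper names `to_grue` and `changes` are illustrative and not from the talk.

```python
def to_grue(colors, t):
    """Recode a green/blue sequence in the grue/bleen vocabulary."""
    recoded = []
    for stage, color in enumerate(colors):
        if stage < t:
            recoded.append("grue" if color == "green" else "bleen")
        else:
            recoded.append("grue" if color == "blue" else "bleen")
    return recoded


def changes(sequence):
    """Count how often the description switches from one stage to the next."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


uniform = ["green"] * 6
switching = ["green"] * 3 + ["blue"] * 3

print(changes(uniform), changes(to_grue(uniform, 3)))
print(changes(switching), changes(to_grue(switching, 3)))
```

The world that looks uniform in one notation has a break in the other, and vice versa, so counting "mess" in the notation cannot by itself single out the tidy world.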

2/6
2/6
Notation Indicates truth?
1/6
1/6
15
Same for Algorithmic Complexity

Goodman's problem works against every fixed simplicity ranking (independent of the processes by which data are generated and coded prior to learning).
Extra problem: any pairwise ranking of theories can be reversed by choosing an alternative computer language.
So how could simplicity help us find the true theory?

Notation Indicates truth?
16
Just Beg the Question

Assign high prior probability to simple theories.
Why should you?
Preference for complexity has the same
explanation.

You presume simplicity. Therefore you should
presume simplicity!
17
Miracle Argument

Simple data would be a miracle if a complex
theory were true (Bayes, BIC, Putnam).

18
Begs the Question

Fairness between theories ⇒
bias against complex worlds.

S
C
19
Two Can Play That Game

Fairness between worlds ⇒
bias against simple theory.

S
C
20
Convergence

At least a simplicity bias doesn't prevent
convergence to the truth (MDL, BIC, Bayes, SGS,
etc.).
Neither do other biases.
May as well recommend flat tires since they can
be fixed.

[Figure: vertically set labels, apparently "Complex" and "Simple"]
21
Does Ockham Have No Frock?
Ash Heap of History
Philosopher's stone, Perpetual motion, Free
lunch, Ockham's Razor???
. . .
22
II. How Ockham Helps You Find the Truth
23
What is Guidance?

Indication or tracking
Too strong
Fixed bias can't indicate anything
Convergence
Too weak
True of other biases
Straightest convergence
Just right?

C
S
S
C
C
S
24
A True Story
Niagara Falls
Clarion
Pittsburgh
25
A True Story
Niagara Falls
Clarion
Pittsburgh
26
A True Story
Niagara Falls
Clarion
Pittsburgh
27
A True Story
Niagara Falls
!
Clarion
Pittsburgh
28
A True Story
Niagara Falls
Clarion
Pittsburgh
29
A True Story
?
30
A True Story
31
A True Story
Ask directions!
32
A True Story
Where's
33
What Does She Say?
Turn around. The freeway ramp is on the left.
34
You Have Better Ideas
Phooey! The Sun was on the right!
35
You Have Better Ideas
!!
36
You Have Better Ideas
37
You Have Better Ideas
38
You Have Better Ideas
39
Stay the Course!
Ahem
40
Stay the Course!
41
Stay the Course!
42
Don't Flip-flop!
45
Then Again
46
Then Again
47
Then Again
@@!
48
One Good Flip Can Save a Lot of Flop
49
The U-Turn
55
The U-Turn
Told ya!
61
Your Route
Needless U-turn
62
The Best Route
Told ya!
63
The Best Route Anywhere from There
Told ya!
64
The Freeway to the Truth
Told ya!

Fixed advice for all destinations
Disregarding it entails an extra course reversal

65
The Freeway to the Truth
Told ya!

even if the advice points away from the goal!

66
Counting Marbles
67
Counting Marbles
68
Counting Marbles
May come at any time
75
Ockham's Razor

If you answer, answer with the current count.

3
?
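
As a minimal sketch of the rule on this slide, the following Python treats the evidence as a list giving the number of new marbles seen at each stage and has the Ockham method answer with the running count. The names `ockham_answers` and `revisions` and the example stream are hypothetical, not from the talk.

```python
from typing import List


def ockham_answers(stream: List[int]) -> List[int]:
    """Answer at every stage with the number of marbles seen so far."""
    answers = []
    count = 0
    for new_marbles in stream:
        count += new_marbles
        answers.append(count)
    return answers


def revisions(answers: List[int]) -> int:
    """Count how many times the answer changes from one stage to the next."""
    return sum(1 for a, b in zip(answers, answers[1:]) if a != b)


# Marbles may come at any time: here at stages 0, 2 and 5.
stream = [1, 0, 1, 0, 0, 1, 0, 0]
answers = ockham_answers(stream)
print(answers, revisions(answers))
```

The method only changes its answer when a new marble actually appears, so its revisions never outrun the effects it has detected.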
76
Analogy

Marbles = detectable effects.
Late appearance = difficulty of detection.
Count = model (e.g., causal graph).
Appearance times = free parameters.

77
Analogy

U-turn = model revision (with content loss)
Highway = revision-efficient truth-finding method.

T
T?
78
The U-turn Argument

Suppose you converge to the truth but
violate Ockham's razor along the way.

3
79
The U-turn Argument

Where is that extra marble, anyway?

3
80
The U-turn Argument

It's not coming, is it?

3
81
The U-turn Argument

If you never say 2, you'll never converge to the
truth.

3
82
The U-turn Argument

That's it. You should have listened to Ockham.

3
2
2
2
83
The U-turn Argument

Oops! Well, no method is infallible!

3
2
2
2
84
The U-turn Argument

If you never say 3, you'll never converge to the
truth.

3
2
2
2
85
The U-turn Argument

Embarrassing to be back at that old theory, eh?

3
2
2
2
3
86
The U-turn Argument

And so forth

3
2
2
2
3
4
87
The U-turn Argument

And so forth

3
2
2
2
3
4
5
88
The U-turn Argument

And so forth

3
2
2
2
3
4
5
6
89
The U-turn Argument

And so forth

3
2
2
2
3
4
5
6
7
90
The Score

Subproblem
3
2
2
2
3
4
5
6
7
91
The Score

Ockham

Subproblem
2
2
2
3
4
5
6
7
92
Ockham is Necessary

If you converge to the truth,
and
you violate Ockham's razor
then
some convergent method beats your worst-case revision bound in each answer in the subproblem entered at the time of the violation.
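
The U-turn argument behind this claim can be replayed as a small game. The Python sketch below is an illustration under simplifying assumptions, not the talk's formal construction: the demon withholds the next marble until the method answers with the current count, and the violator overshoots once (saying 3 when 2 marbles have been seen) for a fixed number of stages. All names (`demon_run`, `make_violator`, `patience`) are hypothetical.

```python
def ockham(count, waited):
    """Always answer with the number of marbles seen so far."""
    return count


def make_violator(violate_at=2, patience=3):
    """Method that guesses one extra marble for a while, then gives up."""
    def method(count, waited):
        if count == violate_at and waited < patience:
            return count + 1
        return count
    return method


def demon_run(method, start=2, final=4, max_stages=100):
    """Demon shows a new marble only after the method says the current count."""
    count, waited, outputs = start, 0, []
    for _ in range(max_stages):
        answer = method(count, waited)
        outputs.append(answer)
        if answer == count and count < final:
            count, waited = count + 1, 0   # release the next marble
        else:
            waited += 1                    # keep the method waiting
        if count == final and answer == final:
            break
    return outputs


def revisions(outputs):
    return sum(1 for a, b in zip(outputs, outputs[1:]) if a != b)


ockham_outputs = demon_run(ockham)
violator_outputs = demon_run(make_violator())
print(ockham_outputs, revisions(ockham_outputs))
print(violator_outputs, revisions(violator_outputs))
```

In the subproblem entered at the violation, the violator is forced through 3, 2, 3, 4 while the Ockham method passes through 2, 3, 4, so the violator pays one extra revision when the truth is 4.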

93
Ockham is Sufficient

If you converge to the truth,
and
you never violate Ockham's razor
then
you achieve the worst-case revision bound of each convergent solution in each answer in each subproblem.

94
Efficiency

Efficiency = achievement of the best worst-case
revision bound in each answer in each subproblem.

95
Ockham Efficiency Theorem

Among the convergent methods:
Ockham = Efficient!

Efficient
Inefficient
96
Mixed Strategies

mixed strategy = chance of output depends only on actual experience.
convergence in probability = chance of producing true answer approaches 1 in the limit.
efficiency = achievement of best worst-case expected revision bound in each answer in each subproblem.

97
Ockham Efficiency Theorem

Among the mixed methods that converge in
probability:
Ockham = Efficient!

Efficient
Inefficient
98
Dominance and Support

Every convergent method is weakly dominated in revisions by a clone who says ? until stage n.
Convergence: Must leap eventually.
Efficiency: Only leap to simplest.
Dominance: Could always wait longer.

Can't wait forever!
99
III. Ockham on Steroids
100
Ockham Wish List

General definition of Ockham's razor.
Compare revisions even when not bounded within
answers.
Prove theorem for arbitrary empirical problems.

101
Empirical Problems

Problem = partition of a topological space.
Potential answers = partition cells.
Evidence = open (verifiable) propositions.

Example: Symmetry
102
Example: Parameter Freeing

Euclidean topology.
Say which parameters are zero.
Evidence = open neighborhood.

Curve fitting

[Figure: the (a1, a2) plane divided into the answers a1 = 0, a2 = 0; a1 > 0, a2 = 0; a1 = 0, a2 > 0; and a1 > 0, a2 > 0.]
103
The Players

Scientist
Produces an answer in response to current
evidence.
Demon
Chooses evidence in response to the scientist's
choices.

104
Winning

Scientist wins
by default if the demon doesn't present an infinite nested sequence of basic open sets whose intersection is a singleton.
else by merit if the scientist eventually always produces the true answer for the world selected by the demon's choices.

105
Comparing Revisions

One answer sequence maps into another iff
there is an order and answer-preserving map from
the first to the second (? is wild).
Then the revisions of first are as good as those
of the second.

. . .
?
?
?
?
?
. . .
106
Comparing Revisions

The revisions of the first are strictly better
if, in addition, the latter doesn't map back into
the former.

. . .
?
?
?
?
?
. . .
?
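
One way to read the comparison on the last two slides is as an order-preserving embedding of one answer sequence into another, where a "?" in the first sequence may land on anything. The Python sketch below implements that reading only; the exact role of "?" in the talk's definition is not recoverable from the transcript, so treat the wildcard rule and the names `maps_into` and `strictly_better` as assumptions.

```python
WILD = "?"


def maps_into(first, second):
    """True if `first` embeds into `second`, preserving order and answers.

    Greedy leftmost matching is enough for this kind of subsequence test.
    """
    position = 0
    for answer in first:
        while position < len(second) and not (
            answer == WILD or answer == second[position]
        ):
            position += 1
        if position == len(second):
            return False
        position += 1
    return True


def strictly_better(first, second):
    """First maps into second, but second does not map back into first."""
    return maps_into(first, second) and not maps_into(second, first)


print(maps_into([3, 4], [2, 3, 4]))        # shorter run embeds in longer one
print(maps_into([3, 2], [2, 3]))           # order is not preserved
print(strictly_better([3, 4], [2, 3, 4]))
```

On this reading a sequence with fewer revisions through the same answers counts as at least as good, which is how the U-turn sequences are scored against the Ockham ones.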
107
Comparing Methods

F is as good as G iff
each output sequence of F is as good as some
output sequence of G.

F
as good as
G
108
Comparing Methods

F is better than G iff
F is as good as G and
G is not as good as F

F
not as good as
G
109
Comparing Methods

F is strongly better than G iff each output
sequence of F is strictly better than an output
sequence of G but

strictly better than
110
Comparing Methods

no output sequence of G is as good as any of F.

not as good as
111
Terminology

Efficient solution = as good as any solution in
any subproblem.

112
What Simplicity Isn't
Only by accident!!

Syntactic length.
Data-compression (MDL).
Computational ease.
Social entrenchment (Goodman).
Number of free parameters (BIC, AIC).
Euclidean dimensionality

113
What Simplicity Is

Simpler theories are compatible with deeper
problems of induction.

Worst demon
Smaller demon
114
Problem of Induction

No true information entails the true answer.
Happens in answer boundaries.

115
Demonic Paths
A demonic path from w is a sequence of
alternating answers that a demon can force an
arbitrary convergent method through starting from
w.
0, 1, 2, 3, 4
116
Simplicity Defined
The A-sequences are the demonic sequences beginning with answer A.
A is as simple as B iff each B-sequence is as good as some A-sequence.

2, 3      2, 3, 4      2, 3, 4, 5
 <           <              <
3         3, 4         3, 4, 5       . . .

So 2 is simpler than 3!
117
Ockham Answer

An answer as simple as any other answer.
Example: number of observed particles.

2, ..., n      2, ..., n, n+1      2, ..., n, n+1, n+2
    <                 <                     <
n              n, n+1              n, n+1, n+2          . . .

So 2 is Ockham!
118
Ockham Lemma
A is Ockham iff for all demonic p, (A*p) ≤ some
demonic sequence.
I can force you through 2 but not through 3,2.
So 3 isn't Ockham
3
119
Ockham Answer
E.g. Only simplest curve compatible with data is
Ockham.
a1
Demonic sequence
Non-demonic sequences
a2
0
120
General Efficiency Theorem

If the topology is metrizable and separable and
the question is countable then
Ockham = Efficient.
Proof uses Martin's Borel Determinacy theorem.

121
Stacked Problems

There is an Ockham answer at every stage.

1
122
Non-Ockham ⇒ Strongly Worse

If the problem is a stacked countable partition
over a restricted Polish space
Each Ockham solution is strongly better than each
non-Ockham solution in the subproblem entered at
the time of the violation.

123
Simplicity ≠ Low Dimension

Suppose God says the true parameter value is
rational.

124
Simplicity ≠ Low Dimension

Topological dimension and integration theory
dissolve.
Does Ockham?

125
Simplicity ≠ Low Dimension

The proposed account survives in the preserved
limit point structure.

126
IV. Ockham and Symmetry
127
Respect for Symmetry

If several simplest alternatives are available,
don't break the symmetry.

Count the marbles of each color.
You hear the first marble but don't see it.
Why red rather than green?

128
Respect for Symmetry

Before the noise, (0, 0) is Ockham.
After the noise, no answer is Ockham

Demonic
Non-demonic
(0, 0)
(1, 0)
(1, 0) (0, 1)
(0, 1)
(0, 1) (1, 0)
Right!
129
Goodman's Riddle

Count oneicles: a oneicle is a particle at any stage but one, when it is a non-particle.
Oneicle translation is an auto-homeomorphism that does not preserve the problem.
Unique Ockham answer is current oneicle count.
Contradicts unique Ockham answer in particle
counting.

130
Supersymmetry

Say when each particle appears.
Refines counting problem.
Every auto-homeomorphism preserves problem.
No answer is Ockham.
No solution is Ockham.
No method is efficient.

131
Dual Supersymmetry

Say only whether particle count is even or odd.
Coarsens counting problem.
Particle/Oneicle auto-homeomorphism preserves
problem.
Every answer is Ockham.
Every solution is Ockham.
Every solution is efficient.

132
Broken Symmetry

Count the even or just report odd.
Coarsens counting problem.
Refines the even/odd problem.
Unique Ockham answer at each stage.
Exactly Ockham solutions are efficient.

133
Simplicity Under Refinement
Supersymmetry: No answer is Ockham
Time of particle appearance
Particle counting
Oneicle counting
Twoicle counting
Broken symmetry: Unique Ockham answer
Particle counting or odd particles
Oneicle counting or odd oneicles
Twoicle counting or odd twoicles
Dual supersymmetry: Both answers are Ockham
Even/odd
134
Proposed Theory is Right

Objective efficiency is grounded in problems.
Symmetries in the preceding problems would wash
out stronger simplicity distinctions.
Hence, such distinctions would amount to mere
conventions (like coordinate axes) that couldn't
have anything to do with objective efficiency.

135
Furthermore

If Ockham's razor is forced to choose in the
supersymmetrical problems then either
following Ockham's razor increases revisions in
some counting problems
or
Ockham's razor leads to contradictions as a
problem is coarsened or refined.

136
V. Conclusion
137
What Ockham's Razor Is

Only output Ockham answers.
Ockham answer = a topological invariant of the
empirical problem addressed.

138
What It Isn't

preference for
brevity,
computational ease,
entrenchment,
past success,
Kolmogorov complexity,
dimensionality, etc.

139
How it Works

Ockham's razor is necessary for minimizing
revisions prior to convergence to the truth.

140
How It Doesn't

No possible method could
Point at the truth
Indicate the truth
Bound the probability of error
Bound the number of future revisions.

141
Spooky Ockham

Science without support or safety nets.

145
VI. Stochastic Ockham

146
Mixed Strategies

mixed strategy = chance of output depends only on
actual experience.

$P_e(M = H \text{ at } n) = P_{e|n}(M = H \text{ at } n)$.
147
Stochastic Case

Ockham
at each stage, you produce a non-Ockham answer
with prob 0.
Efficiency
achievement of the best worst-case expected
revision bound in each answer in each subproblem
over all methods that converge to the truth in
probability.

148
Stochastic Efficiency Theorem

Among the stochastic methods that converge in
probability, Ockham Efficient!

Efficient
Inefficient
149
Stochastic Methods

Your chance of producing an answer is a function
of observations made so far.

2
p
Urn selected in light of observations.
150
Stochastic U-turn Argument

Suppose you converge in probability to the truth
but produce a non-Ockham answer with prob > 0.

3
r > 0
151
Stochastic U-turn Argument

Choose small ε > 0. Consider answer 4.

3
r > 0
152
Stochastic U-turn Argument

By convergence in probability to the truth

3
r > 0
2
p > 1 - ε/3
153
Stochastic U-turn Argument

Etc.

3
r > 0
2
3
4
p > 1 - ε/3
p > 1 - ε/3
p > 1 - ε/3
154
Stochastic U-turn Argument

Since ε can be chosen arbitrarily small,
sup prob of 3 revisions ≥ r.
sup prob of 2 revisions = 1.

3
r > 0
2
3
4
p > 1 - ε/3
p > 1 - ε/3
p > 1 - ε/3
155
Stochastic U-turn Argument

So sup Exp revisions is 2 3r.
But for Ockham 2.

3
r > 0
2
3
4
p > 1 - ε/3
p > 1 - ε/3
p > 1 - ε/3
Subproblem
156
VII. Statistical Inference
(Beta Version)

157
The Statistical Puzzle of Simplicity

Assume Normal distribution, σ = 1, μ ≥ 0.
Question: μ = 0 or μ > 0?
Intuition: μ = 0 is simpler than μ > 0.

μ = 0
mean
158
Analogy

Marbles = potentially small effects
Time = sample size
Simplicity = fewer free parameters tied to potential effects
Counting = freeing parameters in a model

159
U-turn in Probability

Convergence in probability = chance of producing the true model goes to unity.

Retraction in probability = chance of producing a model drops from above r > .5 to below 1 - r.

[Figure: chance of producing the true model and chance of producing the alternative model, plotted against sample size, with levels 0, 1 - r, r, and 1 marked.]
160
Suppose You (Probably) Choose a Model More
Complex than the Truth
μ = 0
mean
μ > 0
Revision Counter: 0
> r
sample mean
zone for choosing μ > 0
161
Eventually You Retract to the Truth (In
Probability)
μ = 0
mean
Revision Counter: 1
> r
sample mean
zone for choosing μ = 0
162
So You (Probably) Output an Overly Simple Model
Nearby
μ > 0
mean
Revision Counter: 1
> r
sample mean
zone for choosing μ = 0
163
Eventually You Retract to the Truth (In
Probability)
μ > 0
mean
Revision Counter: 2
> r
sample mean
zone for choosing μ > 0
164
But Standard (Ockham) Testing Practice Requires
Just One Retraction!
μ = 0
mean
Revision Counter: 0
> r
sample mean
zone for choosing μ = 0
165
In The Simplest World, No Retractions
μ = 0
mean
Revision Counter: 0
> r
sample mean
zone for choosing μ = 0
167
In Remaining Worlds, at Most One Retraction
μ > 0
mean
Revision Counter: 0
> r
zone for choosing μ > 0
168
In Remaining Worlds, at Most One Retraction
μ > 0
mean
Revision Counter: 0
zone for choosing μ > 0
169
In Remaining Worlds, at Most One Retraction
μ > 0
mean
Revision Counter: 1
> r
zone for choosing μ > 0
170
So Ockham Beats All Violators

Ockham: at most one revision.
Violator: at least two revisions in the worst case.

171
Summary

Standard practice is to test the point
hypothesis rather than the composite alternative.
This amounts to favoring the simple hypothesis
a priori.
It also minimizes revisions in probability!
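
To make the "at most one retraction" claim concrete, here is a minimal Python sketch of the standard one-sided z-test for the setup above (normal data, σ = 1, point hypothesis μ = 0). It uses a fixed significance level `alpha` as a simplifying assumption; convergence in probability at μ = 0 would need the level to shrink with sample size. The function name is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def chance_of_choosing_complex(mu, n, alpha=0.05):
    """Chance that the test rejects mu = 0 at sample size n when the
    true mean is mu (reject when sqrt(n) * sample mean exceeds the cutoff)."""
    cutoff = STANDARD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STANDARD_NORMAL.cdf(cutoff - mu * n ** 0.5)


for n in (10, 100, 10_000):
    print(n,
          round(chance_of_choosing_complex(0.0, n), 3),
          round(chance_of_choosing_complex(0.1, n), 3))
```

In the simplest world the chance of choosing the complex model stays at `alpha`, so there is nothing to retract. In a world with μ > 0 the chance of choosing the complex model rises monotonically with n, so the method can drop the simple answer once and never needs to return to it.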

172
Two Dimensional Example

Assume independent bivariate normal distribution
of unit variance.
Question: how many components of the joint mean are zero?
Intuition: more nonzeros = more complex.
Puzzle: How does it help to favor simplicity in less-than-simplest worlds?

173
A Real Model Selection Method

Bayes Information Criterion (BIC)
BIC(M, sample) = - log(max prob that M can assign to sample) + log(sample size) × model complexity × ½.
BIC method: choose M with least BIC score.
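
For the two-dimensional example (independent bivariate normal, unit variance), the BIC score described above can be sketched as follows. This is an illustrative Python sketch, assuming a model is identified by which mean components are left free and that model complexity is the number of free components; `bic` and `choose_model` are hypothetical names.

```python
import math
from itertools import product


def bic(sample, free):
    """BIC score of the model whose mean component i is free iff free[i]."""
    n = len(sample)
    # Negative maximized log-likelihood for a unit-variance bivariate normal.
    neg_log_lik = n * math.log(2 * math.pi)
    for axis in range(2):
        values = [point[axis] for point in sample]
        center = sum(values) / n if free[axis] else 0.0
        neg_log_lik += 0.5 * sum((v - center) ** 2 for v in values)
    # Penalty: log(sample size) x number of free parameters x 1/2.
    return neg_log_lik + 0.5 * sum(free) * math.log(n)


def choose_model(sample):
    """Return the (free_x, free_y) pair with the least BIC score."""
    return min(product([False, True], repeat=2),
               key=lambda free: bic(sample, free))


centered = [(1.0, 1.0), (-1.0, -1.0)]
shifted = [(5.0, 1.0), (5.0, -1.0), (5.0, 1.0), (5.0, -1.0)]
print(choose_model(centered), choose_model(shifted))
```

When the sample mean is near zero in a component, freeing that component buys no likelihood and only costs the penalty, so the simpler model wins; a clear offset in a component outweighs the penalty and frees it.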

174
Official BIC Property

In the limit, minimizing BIC finds a model with
maximal conditional probability when the prior
probability is flat over models and fairly flat
over parameters within a model.
But it is also revision-efficient.

175
AIC in Simplest World

n = 2
μ = (0, 0).

Retractions: 0

Simple
Complex
176
AIC in Simplest World

n = 100
μ = (0, 0).

Retractions: 0

Simple
Complex
177
AIC in Simplest World

n = 4,000,000
μ = (0, 0).

Retractions: 0

Simple
Complex
178
BIC in Simplest World

n = 2
μ = (0, 0).

Retractions: 0

Simple
Complex
179
BIC in Simplest World

n = 100
μ = (0, 0).

Retractions: 0

Simple
Complex
180
BIC in Simplest World

n = 4,000,000
μ = (0, 0).

Retractions: 0

Simple
Complex
181
BIC in Simplest World

n = 20,000,000
μ = (0, 0).

Retractions: 0

Simple
Complex
182
Performance in Complex World

n = 2
μ = (.05, .005).

Retractions: 0

Simple
Complex
95
183
Performance in Complex World

n = 100
μ = (.05, .005).

Retractions: 0

Simple
Complex
184
Performance in Complex World

n = 30,000
μ = (.05, .005).

Retractions: 1

Simple
Complex
185
Performance in Complex World

n = 4,000,000 (!)
μ = (.05, .005).

Retractions: 2

Simple
Complex
186
Question

Does the statistical retraction minimization
story extend to violations in less-than-simplest
worlds?
Recall that the deterministic argument for higher
retractions required the concept of minimizing
retractions in each subproblem.
A subproblem is a proposition verified at a
given time in a given world.
Some analogue in probability is required.

187
Subproblem.

H is an α-subproblem in w at n:
There is a likelihood ratio test of w at significance < α such that this test has power < 1 - α at each world in H.

[Figure: worlds w and H at sample size n, with accept and reject zones and probabilities marked > 1 - α and > α.]
188
Significance Schedules

A significance schedule α(·) is a monotone decreasing sequence of significance levels converging to zero that drop so slowly that power can be increased monotonically with sample size.

n + 1
n
α(n + 1)
α(n)
189
Ockham Violation ⇒ Inefficient
Subproblem: At sample size n
(μX, μY)
190
Ockham Violation ⇒ Inefficient
Subproblem: At sample size n
(μX, μY)
Ockham violation: Probably say blue hypothesis at
white world (p > r)
191
Ockham Violation ⇒ Inefficient
Subproblem at time of violation
(μX, μY)
Probably say blue
Probably say white
193
Ockham Violation ⇒ Inefficient
Subproblem at time of violation
(μX, μY)
Probably say blue
Probably say white
Probably say blue
194
Oops! Ockham ⇒ Inefficient
(μX, μY)
Subproblem
198
Oops! Ockham ⇒ Inefficient
(μX, μY)
Subproblem
Two retractions
199
Local Retraction Efficiency