Title: No Free Lunch (NFL) Theorem
1 No Free Lunch (NFL) Theorem
Presentation by Kristian Nolde
Many slides are based on a presentation by Y.C. Ho
2 General notes
- Goal
- Give an intuitive feeling for the NFL
- Present some mathematical background
- To keep in mind
- NFL is an impossibility theorem, such as
- Gödel's proof in mathematics (roughly: some facts cannot be proved or disproved in any mathematical system)
- Arrow's theorem in economics (in principle, perfect democracy is not realizable)
- Thus, practical use is limited?!?
3 The No Free Lunch Theorem
- Without specific structural assumptions, no optimization scheme can perform better than blind search on the average
- But blind search is very inefficient!
- Prob(at least one out of N samples is in the top-n for a search space of size Q) ≈ nN/Q
- e.g. Prob ≈ 0.001 for Q = 10^9, n = 1000, N = 1000 (checked numerically in the sketch below)
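A quick numerical check of this estimate (a minimal sketch; the function name is illustrative, and the exact formula assumes independent uniform sampling with replacement):

```python
# Probability that at least one of N independent uniform samples
# lands in the top-n of a search space of size Q.
def prob_hit_top_n(Q, n, N):
    exact = 1 - (1 - n / Q) ** N   # complement of "all N samples miss the top-n"
    approx = n * N / Q             # first-order approximation for small nN/Q
    return exact, approx

print(prob_hit_top_n(Q=10**9, n=1000, N=1000))
# roughly (0.001, 0.001): 1000 blind samples in a billion-point
# space almost never find a top-1000 point
```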
4 Assume a finite World
A finite number of input symbols (x's) and a finite number of output symbols (y's) ⇒ a finite number of possible mappings from input to output (f's), namely |Y|^|X| of them
5 The Fundamental Matrix F
In each row, each value of Y appears |Y|^(|X|-1) times!
FACT: for binary outputs, there are an equal number of 0s and 1s in each row!
Averaged over all f, the value f(x) is independent of x! (verified in the sketch below)
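A minimal sketch of F for a toy world (variable names are mine, not from the slides): rows are indexed by the inputs x, columns by all |Y|^|X| possible mappings f, and each row contains every y-value exactly |Y|^(|X|-1) times:

```python
from itertools import product

X = [0, 1, 2]          # a tiny input space
Y = [0, 1]             # binary outputs
# Every possible mapping f: X -> Y, one per column of F.
all_f = list(product(Y, repeat=len(X)))   # |Y|**|X| = 8 mappings

# F[x][j] = value of the j-th mapping at input x.
F = [[f[x] for f in all_f] for x in range(len(X))]

for x, row in enumerate(F):
    # Each y in Y appears |Y|**(|X|-1) = 4 times in every row,
    # so the row average is the same for every x.
    print(x, {y: row.count(y) for y in Y})
```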
6 Compare Algorithms
- Think of two algorithms a1 and a2, e.g. a1 always selects from x_1 to x_{0.5|X|}
- a2 always selects from x_{0.5|X|} to x_{|X|}
- For a specific f, a1 or a2 may be better. However, if f is not known, the average performance of both is equal (see the simulation below):
- \sum_f P(d_y \mid f, a_1) = \sum_f P(d_y \mid f, a_2)
- where d is a sample and d_y is the cost value associated with d.
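A small simulation of this claim under stated assumptions (toy world, 0/1 cost values, two hypothetical algorithms that each only ever sample a fixed half of X): averaged over every possible f, both see exactly the same best cost.

```python
from itertools import product

X = range(4)
Y = [0, 1]                                # cost values: lower is better
all_f = list(product(Y, repeat=len(X)))   # all 16 possible cost functions

def best_cost(f, points):
    # Best (lowest) cost an algorithm sees after sampling these points.
    return min(f[x] for x in points)

a1_points = [0, 1]   # a1 only ever looks at the first half of X
a2_points = [2, 3]   # a2 only ever looks at the second half

avg_a1 = sum(best_cost(f, a1_points) for f in all_f) / len(all_f)
avg_a2 = sum(best_cost(f, a2_points) for f in all_f) / len(all_f)
print(avg_a1, avg_a2)   # identical: 0.25 and 0.25
```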
7 Comparing Algorithms Continued
- Case 1: Algorithms can be more specific, e.g. assume a certain realization f_k (algorithm a1)
- Case 2: Or they can be more general and assume a more uniform distribution over the possible f (algorithm a2)
- Then the performance of a1 will be excellent for f_k but catastrophic for all other cases (great performance, no robustness)
- On the contrary, a2 performs mediocrely for all cases, but doesn't fail (poor performance, high robustness)
- Common sense says
- Robustness × Efficiency ≈ Constant
- or Generality × Depth ≈ Constant
8 Implication 1
- Let x be the optimization variable, f the performance function, and y the performance, i.e., y = f(x)
- Then, averaged over all possible optimization problems, the result is choice independent
- If you don't know the structure of f (which column of F you are dealing with), blind choice is as good as any!
9 Implications 2
- Let X be the strategy (control law, decision rule) space, where decisions are functions of information, f the performance function, and y the performance, i.e., y = f(x)
- The same conclusion holds for stochastic optimal control, adaptive control, decision theory, game theory, learning control, etc.
- A "good" algorithm must be qualified!
10 Implications 3
- Let X be the space of all possible representations (as in genetic algorithms), or the space of all possible algorithms to apply to a class of problems
- Without understanding of the problem, blind choice is as good as any.
- "Understanding" means you know which column of the F matrix you are dealing with
11 Implications 4
- Even if you only know which column or group of columns you are dealing with ⇒ you can specialize the choice of rows
- You must accept that you will suffer LOSSES should other choices of column occur due to uncertainties or disturbances
12 The Fundamental Matrix F
Assume a distribution over the columns, then pick the row that results in minimal expected losses or maximal expected performance. This is stochastic optimization (sketched below).
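Continuing the toy F matrix from slide 5, a sketch of this procedure (the prior itself is invented purely for illustration):

```python
from itertools import product

X = [0, 1, 2]
Y = [0, 1]                                   # cost values: lower is better
all_f = list(product(Y, repeat=len(X)))      # the columns of F

# An assumed (made-up) prior over the columns: mappings with f(0) = 0
# are believed to be twice as likely as the rest.
weights = [2 if f[0] == 0 else 1 for f in all_f]
prior = [w / sum(weights) for w in weights]

def expected_cost(x):
    # Expected cost of committing to row x under the prior.
    return sum(p * f[x] for p, f in zip(prior, all_f))

best_x = min(range(len(X)), key=expected_cost)
print(best_x, expected_cost(best_x))         # row 0, expected cost 1/3
```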
13 Implications 5
- Worse, if you estimate the probabilities incorrectly, then your stochastically optimized solution may suffer catastrophically bad outcomes more frequently than you would like.
- Reason: you have already used up more of the good outcomes in your optimal choice. What is left are bad ones that are not supposed to occur! (the power law of Highly Optimized Tolerance (HOT) design, Doyle)
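A sketch of this failure mode, reusing the toy setup from above (both priors are invented): optimize against an assumed prior, then evaluate the chosen row under a different "true" distribution of columns.

```python
from itertools import product

X = [0, 1, 2]
Y = [0, 1]                                   # cost values: lower is better
all_f = list(product(Y, repeat=len(X)))      # the columns of F

def normalize(w):
    total = sum(w)
    return [v / total for v in w]

def expected_cost(x, prior):
    return sum(p * f[x] for p, f in zip(prior, all_f))

# Assumed prior: believes columns with f(0) = 0 are twice as likely.
assumed = normalize([2 if f[0] == 0 else 1 for f in all_f])
# True prior: reality favors exactly the opposite columns.
actual = normalize([2 if f[0] == 1 else 1 for f in all_f])

best_x = min(range(len(X)), key=lambda x: expected_cost(x, assumed))
print(expected_cost(best_x, assumed))        # planned cost: about 0.33
print(expected_cost(best_x, actual))         # realized cost: about 0.67
```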
14 Implications 6
- Generality for generality's sake is not very fruitful
- Working on a specific problem can be rewarding
- Because
- the insight can be generalized
- the problem is practically important
- of the 80-20 effect