Title: Integrating Probabilistic Modeling and Representation-Building
Integrating Probabilistic Modeling and Representation-Building
- Doctoral Thesis Proposal
- Moshe Looks
- March 2nd, 2006
Outline
- Background
- Thesis
- Proposed Approach
- Proposed Goals
Problems and Problem-Solving
- Levels of Analysis
- Pre-representational - how to describe the problem as formalized input?
- Post-representational - how to solve the formal problem?
Problems and Problem-Solving
- Hofstadter, 1985
- Knob creation - discovering novel values to parameterize
- Knob twiddling - adjusting the values of existing parameters
General Optimization
- Formal Representation (sketched below)
- Solution space S (e.g., {0,1}^n)
- Scoring function maps solutions to reals
- Solving the problem means maximizing the score
- To outperform enumeration and random sampling, some knowledge of the space must be assumed
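A minimal sketch of this formal setup, using OneMax (count the ones in a bit string) as a hypothetical stand-in scoring function and pure random sampling as the baseline any informed method must beat:

import random

n = 20

def score(solution):
    # scoring function: maps {0,1}^n to the reals (here, OneMax)
    return sum(solution)

# Random-sampling baseline: draw 1000 solutions, keep the best.
samples = [[random.randint(0, 1) for _ in range(n)] for _ in range(1000)]
best = max(samples, key=score)
print(score(best))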
6What Knowledge?
-
- Complete separability would be nice
- Near-decomposability (Simon, 1969) is more
realistic
Weaker Interactions
Stronger Interactions
How to Exploit This?
- Separability → Independence Assumptions
- Given a prior over the solution space
- Represented as a probability vector
- Sample solutions from the model (distribution)
- Update the model toward higher-scoring points
- Iterate... (sketched below)
- Baluja, 1994
- Works surprisingly well, even when the assumptions don't hold completely
- when the interactions are weak
- or there is little deception
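A minimal sketch of this loop in the style of population-based incremental learning (Baluja, 1994); the learning rate and sample counts here are illustrative choices, not values from the source:

import random

def pbil(score, n, generations=100, samples=50, lr=0.1):
    p = [0.5] * n  # prior: an independent probability for each bit
    for _ in range(generations):
        # sample solutions from the model (a probability vector)
        pop = [[1 if random.random() < p[i] else 0 for i in range(n)]
               for _ in range(samples)]
        best = max(pop, key=score)  # highest-scoring sampled point
        # update the model toward the higher-scoring point
        p = [(1 - lr) * p[i] + lr * best[i] for i in range(n)]
    return p

model = pbil(lambda s: sum(s), n=20)  # OneMax again as the stand-in score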
How to Exploit This?
- A known correct problem decomposition may be incorporated into the model
- Mühlenbein & Mahnig, 1998
- An unknown decomposition may be learned
- algorithms that adaptively learn such linkages are termed competent
- Optimization via probabilistic modeling is surveyed in
- Pelikan, Goldberg, & Lobo, 1999
The Bayesian Optimization Algorithm
- Represents the problem decomposition as a Bayes net
- learned greedily, via a network scoring metric
- Augmented in the hierarchical BOA (hBOA)
- uses Bayes nets with local structure
- allows smaller model-building steps
- leads to more accurate models
- restricted tournament replacement
- promotes diversity
- Robust and scalable results on problems with both known and unknown decompositions (see the sketch below)
- Pelikan & Goldberg, 2003
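A compact, heavily simplified sketch of BOA-style model building, assuming a fixed variable ordering (K2-style) so the learned network stays acyclic, and using BIC-penalized log-likelihood as a stand-in for the BDe-type metrics the BOA actually uses; local structure and restricted tournament replacement are omitted:

import math, random

def col_loglik(data, child, parents):
    # log-likelihood of column `child` given its parent columns (Laplace-smoothed)
    counts = {}
    for row in data:
        counts.setdefault(tuple(row[p] for p in parents), [1, 1])[row[child]] += 1
    return sum(math.log(counts[tuple(row[p] for p in parents)][row[child]]
                        / sum(counts[tuple(row[p] for p in parents)]))
               for row in data)

def learn_network(data, n, max_parents=2):
    penalty = 0.5 * math.log(len(data))  # BIC complexity penalty per added edge
    parents = {i: [] for i in range(n)}
    for child in range(1, n):
        while len(parents[child]) < max_parents:
            base = col_loglik(data, child, parents[child])
            candidates = [c for c in range(child) if c not in parents[child]]
            scored = [(col_loglik(data, child, parents[child] + [c]) - penalty, c)
                      for c in candidates]
            if not scored or max(scored)[0] <= base:
                break  # no edge improves the penalized network score
            parents[child].append(max(scored)[1])
    return parents

def sample_one(parents, data, n):
    # draw a new solution; index order respects the parent restriction
    x = [0] * n
    for i in range(n):
        key = tuple(x[p] for p in parents[i])
        c = [1, 1]  # Laplace-smoothed conditional counts
        for row in data:
            if tuple(row[p] for p in parents[i]) == key:
                c[row[i]] += 1
        x[i] = 1 if random.random() < c[1] / (c[0] + c[1]) else 0
    return x

data = [[random.randint(0, 1) for _ in range(8)] for _ in range(60)]
net = learn_network(data, 8)
print(sample_one(net, data, 8))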
Decompositions and Representations
- Competent adaptive optimization algorithms
- can overcome a poor choice of representation
- via problem decomposition
- Requires the existence of a problem decomposition that is
- compact
- satisficing
- in the model space searched by the algorithm
Decompositions and Representations
- I propose extending methods such as hBOA to domains where a compact decomposition does not exist directly in the user-specified problem
Representation-Building: An Example
- Optimizing over strings (x1, x2, ..., xn)
- A separate distribution is maintained for each xi
- What if there is positional ambiguity?
- Some features refer to absolute position, some do not
- E.g., DNA - a gene's position is sometimes critical, and sometimes irrelevant
- Consider abstracted features, defined in terms of "base-level variables" (the xi)
- E.g., contains a prime number of ones
- E.g., does not contain the substring AATGC
- Model-based instance generation (sampling) must be generalized to accommodate features (see the sketch below)
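One minimal way to realize this generalization (an illustrative sketch, not the method from the source) is rejection sampling: draw strings from the per-position model, then keep only those consistent with the active features:

import random

ALPHABET = "ACGT"

def sample_string(probs):
    # probs[i] maps each symbol to its probability at position i
    return "".join(random.choices(ALPHABET, weights=[p[a] for a in ALPHABET])[0]
                   for p in probs)

def avoids_motif(s, motif="AATGC"):
    # an example feature: does not contain the substring AATGC
    return motif not in s

def sample_with_features(probs, features, max_tries=10000):
    for _ in range(max_tries):
        s = sample_string(probs)
        if all(f(s) for f in features):  # accept only feature-consistent strings
            return s
    raise RuntimeError("features too restrictive for rejection sampling")

uniform = [{a: 0.25 for a in ALPHABET} for _ in range(30)]
print(sample_with_features(uniform, [avoids_motif]))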
Representation-Building: An Example
- Exploit background knowledge to choose effective feature classes
- E.g., motifs (variable-position substrings)
- motifs may be prespecified
- or learned via information-theoretic criteria
- Demonstrated performance gains with learned motifs (with respect to the BOA)
- Looks, 2006 (in submission)
Representation-Building - Observations
- A superior decomposition may exist that cannot be compactly represented
- Generalize the representational language?
- Computationally intractable!
- Representation-building mechanisms
- tractable if they incorporate inductive bias
- the goal is to provide salient parameters to the optimization algorithm
Learning Open-Ended Hierarchical Structures
- User selects (pre-representationally)
- a set of functions
- E.g., +, -, ×, log, sin
- a set of terminals
- E.g., x, y, z, 0, 1
- a scoring function over trees
- Decreases pre-representational effort
- Solution structure and content must both be learned
- Claim
- Representation-building is thus correspondingly more instrumental in finding a compact problem decomposition
Current Evolutionary Approaches
- Genetic Programming (GP)
- Koza, 1992
- Many variants
- Population-based search with new instances generated via (sketched below)
- swapping of subtrees (crossover)
- random insertions/deletions/modifications (mutation)
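A tiny illustrative sketch of subtree crossover on expressions represented as nested lists; the representation and helper names here are hypothetical, chosen only to make the operation concrete:

import copy, random

def subtree_paths(tree, path=()):
    # yield the path to every subtree; element 0 of a list is the operator
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_at(tree, path, sub):
    if not path:
        return sub
    get(tree, path[:-1])[path[-1]] = sub
    return tree

def crossover(a, b):
    # swap a random subtree of a with a random subtree of b
    a, b = copy.deepcopy(a), copy.deepcopy(b)
    pa, pb = (random.choice(list(subtree_paths(t))) for t in (a, b))
    sa, sb = copy.deepcopy(get(a, pa)), copy.deepcopy(get(b, pb))
    return set_at(a, pa, sb), set_at(b, pb, sa)

print(crossover(["+", "x", ["*", "y", "z"]], ["sin", ["-", "x", "1"]]))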
Current Evolutionary Approaches
- Probabilistic model-building approaches without decomposition-learning
- Probabilistic Incremental Program Evolution
- Salustowicz & Schmidhuber, 1997
- hierarchical generalization, 1998
- based on absolute tree position (address from the root)
- assumes complete independence
- Estimation-of-Distribution Programming
- Yanai & Iba, 2003
- assumes a fixed network of dependency relationships
Current Evolutionary Approaches
- Probabilistic model-building approaches with decomposition-learning
- Grammar-learning methods
- Shan et al., 2004
- Bosman & de Jong, 2004
- based on relative tree position
- Methods from competent optimization algorithms
- Extended Compact Genetic Programming
- Sastry & Goldberg, 2003
- Bayesian-Optimization-Algorithm Programming
- Looks, Goertzel, & Pennachin, 2005
Claim
- Compact problem decompositions rarely exist for non-trivial problems with a generic representation of general expressions
- generic representations
- E.g., trees
- E.g., grammars
- general expressions
- E.g., Boolean formulae
- E.g., symbolic equations
- E.g., finite automata
Justification
- Solution scores are assumed to vary based only on semantics
- Determining (semantic) equivalence of general expressions is NP-hard!
- this says nothing about approximate decompositions
- However, a compact decomposition derived from a generic representation is still implausible
- assuming no knowledge of semantics
- and no explicit computational effort toward specialized representational reduction
Thesis
- General expressions may be organized so that compact decompositions may often be found for non-trivial problems, via representation-building
- Representation-building will require
- knowledge of semantics (i.e., domain knowledge)
- explicit computational effort toward representational reduction
- Comparable to the notion of a heuristic solver for an NP-hard problem
Meta-Adaptive Programming (MAP)
1. Generate a random population of trees.
2. Select promising trees from the population for modeling.
3. Build a parameterized representation of these trees, and transform them into parameter assignments.
4. Model these assignments using a Bayesian network with local structure to discover the problem decomposition.
5. Sample the model to generate new parameter assignments, apply the inverse transformation to convert them into trees, and integrate them into the population.
6. Go to step 2. (A control-flow sketch follows below.)
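A control-flow sketch of the MAP loop above; the helpers are hypothetical placeholders for the components described on the slides that follow (simplification/normalization, alignment, parameterization, hBOA-style modeling), reduced here to trivial stand-ins so the skeleton executes:

import random

simplify_and_normalize = lambda t: t      # stand-in: rewrite rules + normal form
align = lambda trees: trees               # stand-in: incremental tree alignment
parameterize = lambda aligned: aligned    # stand-in: trees -> parameter assignments
learn_model = lambda params: params       # stand-in: Bayes net with local structure
sample_model = lambda model, k: random.choices(model, k=k)  # stand-in: sampling
deparameterize = lambda params: params    # stand-in: inverse transformation

def map_loop(score, random_tree, generations=50, pop_size=100, top_frac=0.3):
    population = [random_tree() for _ in range(pop_size)]            # step 1
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        promising = population[:int(top_frac * pop_size)]            # step 2
        params = parameterize(align([simplify_and_normalize(t)
                                     for t in promising]))           # step 3
        model = learn_model(params)                                  # step 4
        children = deparameterize(sample_model(model, pop_size))     # step 5
        population = promising + children[:pop_size - len(promising)]  # integrate
    return max(population, key=score)                                # (loop = step 6)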
Constructing a Parameterized Representation
Simplification & Normalization
Alignment
Parameterization
Constructing a Parameterized Representation
- Simplify trees via rewrite rules and convert them into a normal form
- Incrementally align all trees
- based on an alignment scoring function
- may be solved optimally via dynamic programming (see the sketch below)
- unfortunately, this is NP-hard for
- unordered operators (e.g., +)
- multiple trees
- Pairwise greedy alignment (agglomerative clustering)
- quadratic in the number of trees
- Feng & Doolittle, 1987
- For unordered operators, do greedy alignment of children
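A minimal sketch of the dynamic-programming base case: pairwise alignment of two leaf sequences in the style of Needleman-Wunsch. The scoring values (+1 match, -1 mismatch/gap) are illustrative, not the thesis's alignment scoring function:

def align_pair(a, b, match=1, mismatch=-1, gap=-1):
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]  # dp[i][j]: best score, a[:i] vs b[:j]
    for i in range(1, m + 1):
        dp[i][0] = i * gap
    for j in range(1, n + 1):
        dp[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,      # gap in b
                           dp[i][j - 1] + gap)      # gap in a
    return dp[m][n]

# Greedy multiple alignment would repeatedly merge the closest pair of trees
# (agglomerative clustering), which is quadratic in the number of trees.
print(align_pair("AATGC", "ATGC"))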
Proposed Goals
- Theoretical
- Modeling tree growth
- GP schema theory
- Experimental (and Implementational)
- Adversarial problems
- Normal forms
- Challenge problems
- Conceptual
- The role of representation-building in AI
Theoretical Goals
- Modeling Tree Growth
- How does the average/maximal tree size change over time?
- GP is prone to bloat
- Cf. Langdon & Poli, 2002
- Probabilistic modeling approaches may avoid this
- pressure toward solutions that are easy to model
Theoretical Goals
- Tree growth in meta-adaptive programming
- is constrained by the size of the representations
- which is in turn constrained by the alignment scoring function
- The alignment scoring function
- may lead to a completely bounded space
- or may lead to unbounded growth
- subject to the fitness function
- Goal is to analyze this theoretically
- leading to speed-limit results for scoring functions
Theoretical Goals
- Exact GP schema theory
- recently developed
- Cf. Poli & Langdon, 2002
- equivalent to Markov chain models
- provides exact distributional data for the next generation, based on fitness
- intractable for real problems!
- Goal is to analyze the differences in schema processing between GP and MAP
- crossover (subcomponent mixing) is not random
- controlled by alignment and probabilistic modeling
- GP has no notion of problem semantics
- in GP, the schemata (2,a) and (a,a) are completely separate
Theoretical Goals - Checklist

Goal                                     Status
Modeling Tree Growth
  Content-Free Binary Trees              ?
  Binary Trees With Content              ?
  Effects of Rewrite Rules               ?
Schema-Processing Comparative Analysis   ?
Experimental Goals
- Design Benchmarking on Adversarial Problems
- Decomposition should be known to the user, not the algorithm
- Dimensions of Deceptiveness for Trees
- Relative-position (subtree) deceptiveness
- Absolute-position deceptiveness
- Operator deceptiveness
Experimental Goals
- Normal Forms
- heuristically remove redundancy
- preserve hierarchical structure
- Domains
- Simple Agent Control (Artificial Ants)
- E.g., progn(turn-left, turn-right, move) → move (see the sketch below)
- Boolean Formulae
- CNF doesn't preserve hierarchical structure
- Holman's normal form does
- Advanced Agent Control
- including general programmatic constructs
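A toy sketch of redundancy removal via rewrite rules in the artificial-ant domain; the rule set is hypothetical, mirroring only the progn example above, where adjacent opposite turns cancel:

CANCELING = {("turn-left", "turn-right"), ("turn-right", "turn-left")}

def simplify_progn(actions):
    # delete adjacent pairs of opposite turns (the stack re-checks new adjacencies)
    out = []
    for a in actions:
        if out and (out[-1], a) in CANCELING:
            out.pop()  # the two turns cancel; drop both
        else:
            out.append(a)
    return out

print(simplify_progn(["turn-left", "turn-right", "move"]))  # -> ['move']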
Experimental Goals - Checklist

Goal                                         Status
Modeling and Sampling with Features          ?
Adversarial Problems                         ?
Domains
  Simple Agent Control                       ?
  Boolean Formulae (CNF)                     ?
  Boolean Formulae (Hierarchical)            ?
  Advanced Agent Control                     ?
Tree Alignment and Representation-Building   75%
Conceptual Goals
- A central challenge of AI: create systems with representations that
- are dynamic
- are informed by background knowledge
- are built by the system, not by humans
- facilitate effective problem decomposition for learning