Title: CIS 732 Lecture 13 (2001-10-04)
1 Ben Perry M.S. Thesis Defense
A Genetic Algorithm for Learning Bayesian Network Adjacency Matrices from Data
Benjamin B. Perry, Laboratory for Knowledge Discovery in Databases, Kansas State University
http://www.kddresearch.org
http://www.cis.ksu.edu/bbp9857
2 Overview
- Bayesian Networks
  - Definitions and examples
  - Inference and learning
- Genetic Algorithms
- Structure Learning Background
  - Problem
  - K2 algorithm
  - Sparse Candidate
- Improving K2: Permutation Genetic Algorithm (GASLEAK)
  - Shortcomings of K2: greedy, sensitive to ordering
  - Permutation GA
- Master's Thesis: Adjacency Matrix GA (SLAM GA)
  - Rationale
  - Evaluation with known Bayesian networks
- Summary
3 Bayesian Belief Networks (BBNs): Definition
- Bayesian Network
  - Directed acyclic graph
  - Vertices (nodes) denote events or states of affairs (each a random variable)
  - Edges (arcs, links) denote conditional dependencies, causalities
- Model of conditional dependence assertions (or CI assumptions)
- Example: Ben's Presentation BBN (in the style of the classic sprinkler network)
  [Figure: five-node BBN X1-X5 over Sleep {Narcoleptic, Well, Bad, All-nighter}, Memory {Elephant, Good, Bad, None}, Appearance {Good, Bad}, Ben is nervous {Extremely, Yes, No}, Ben's presentation {Good, Not so good, Failed miserably}]
- General Product (Chain) Rule for BBNs
  P(X1, ..., Xn) = ∏_i P(Xi | Parents(Xi))
  P(Well, Good, Good, No, Good) = P(Well) · P(Good | Well) · P(Good | Well) · P(No | Good, Good) · P(Good | No)
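A worked sketch of the product rule in Python, assuming the structure implied by the factorization above (Sleep → Memory, Sleep → Appearance, {Memory, Appearance} → Nervous, Nervous → Presentation); the CPT numbers are invented for illustration and are not from the slides:

```python
# Worked chain-rule example. Structure is assumed from the factorization
# above; CPT numbers are invented, and only the entries needed for this
# one assignment are listed.
parents = {
    "Sleep": (),
    "Memory": ("Sleep",),
    "Appearance": ("Sleep",),
    "Nervous": ("Memory", "Appearance"),
    "Presentation": ("Nervous",),
}
cpt = {
    ("Sleep", (), "Well"): 0.7,
    ("Memory", ("Well",), "Good"): 0.8,
    ("Appearance", ("Well",), "Good"): 0.9,
    ("Nervous", ("Good", "Good"), "No"): 0.75,
    ("Presentation", ("No",), "Good"): 0.85,
}

def joint(x):
    # product rule: multiply P(v | parents(v)) over every vertex
    p = 1.0
    for v, ps in parents.items():
        p *= cpt[(v, tuple(x[q] for q in ps), x[v])]
    return p

print(joint({"Sleep": "Well", "Memory": "Good", "Appearance": "Good",
             "Nervous": "No", "Presentation": "Good"}))  # ~0.321
```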
4 Graphical Models of Probability Distributions
- Idea
  - Want a model that can be used to perform inference
- Desired properties
  - Correlations among variables
  - Ability to represent functional, logical, stochastic relationships
  - Probability of certain events
- Inference: Decision Support Problems
  - Diagnosis (medical, equipment)
  - Pattern recognition (image, speech)
  - Prediction
- Want to Learn the Most Likely Model that Generates the Observed Data
  - Under certain assumptions (causal Markovity), it has been shown that we can do it
  - Given: data D (tuples or vectors containing observed values of variables)
  - Return: directed graph (V, E) expressing target CPTs
- NEXT: Genetic algorithms
5 Genetic Algorithms
- Idea
  - Emulate the natural process of survival of the fittest (example: roaches adapt)
  - Each generation has many diverse individuals
  - Each individual competes for the chance to survive
  - Most common approach: the best individuals live on to the next generation and mate
  - Produce children with traits from both parents
  - If the parents are strong, the children might be stronger
- Major components (operators)
  - Fitness function
  - Chromosome manipulation
  - Crossover (not the John Edward type!), mutation
- From (Educated?) Guess to Gold
  - Initial population is typically random or not much better than random (bad scores)
  - Performs well with a non-deceptive search space and good genetic operators
  - Ability to escape local optima with mutations
  - Not guaranteed to find the best answer, but usually gets close (see the loop sketch below)
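A minimal sketch of the elitist GA loop described above; the fitness function and operators are stand-in parameters here, and the thesis-specific versions appear on later slides:

```python
# Generic elitist GA loop: the best `elite` individuals survive unchanged,
# the rest of the next generation comes from crossover + mutation of
# parents drawn from the fitter half of the population.
import random

def evolve(init_pop, fitness, crossover, mutate, generations=100, elite=2):
    pop = list(init_pop)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_gen = pop[:elite]                            # elitism
        while len(next_gen) < len(pop):
            mom, dad = random.sample(pop[:len(pop) // 2], 2)  # fitter half mates
            next_gen.append(mutate(crossover(mom, dad)))
        pop = next_gen
    return max(pop, key=fitness)
```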
6 Learning Structure: K2 Algorithm
- Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)
  - FOR i ← 1 to n DO  // arbitrary (given) ordering of variables x1, x2, ..., xn
    - WHILE Parents[xi].Size < Max-Parents DO  // find best candidate parent
      - Best ← argmax_{j<i} P(D | Parents[xi] ∪ {xj})  // max Dirichlet score
      - IF (Parents[xi] ∪ {Best}).Score > Parents[xi].Score THEN Parents[xi] ← Parents[xi] ∪ {Best}
  - RETURN ({Parents[xi] : i = 1, 2, ..., n})
- ALARM (A Logical Alarm Reduction Mechanism) [Beinlich et al., 1989]
  - BBN model for patient monitoring in surgical anesthesia
  - Vertices (37): findings (e.g., esophageal intubation), intermediates, observables
  - K2 found a BBN differing in only 1 edge from the gold standard (elicited from an expert)
7 Learning Structure: K2 Downfalls
- Greedy (may fall into local maxima)
- Highly dependent upon node ordering
  - Optimal node ordering must be given
  - If the optimal order were already known, an expert could probably create the network
- Number of candidate node orderings grows factorially (n!)
8 Learning Structure: Sparse Candidate
- General Idea
  - Inspect the k best parent candidates at a time (K2 only inspects one)
  - k is typically very small: 5 ≤ k ≤ 15
  - Exponential in k
- Algorithm
  - Loop until no improvement or the iteration limit is exceeded:
    - Restrict phase: for each node, select the top k parent candidates (mutual information or M_disc); see the sketch below
    - Maximize phase: build a network by manipulating parents (add, remove, reverse edges from each node's candidate set); only accept changes that improve the network score (Minimum Description Length)
  - Must handle cycles... expensive
    - K2 gives this to us for free
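A sketch of the Restrict phase only, assuming plain pairwise mutual information as the relevance measure; the Maximize phase and cycle handling are omitted:

```python
# Restrict phase: rank each node's possible parents by empirical mutual
# information and keep the top k. `data` is a list of complete tuples.
from collections import Counter
from math import log

def mutual_info(data, a, b):
    n = len(data)
    pa = Counter(r[a] for r in data)
    pb = Counter(r[b] for r in data)
    pab = Counter((r[a], r[b]) for r in data)
    return sum((c / n) * log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

def restrict(data, n_vars, k):
    return {i: sorted((j for j in range(n_vars) if j != i),
                      key=lambda j: mutual_info(data, i, j),
                      reverse=True)[:k]
            for i in range(n_vars)}
```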
- Next: Improving K2
9 GASLEAK: A Permutation GA for Variable Ordering
10 Properties of the Genetic Algorithm
- Elitist
- Chromosome representation
  - Integer permutation ordering
  - A sample chromosome in a BBN of 5 nodes might look like: 3 1 2 0 4
- Seeding
  - Random shuffle
- Operators
  - Order crossover
  - Swap mutation (both sketched below)
- Fitness
  - RMSE
- Job farm
  - Java-based; utilizes many machines regardless of OS
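Sketches of the two permutation operators named above (order crossover and swap mutation) for integer-ordering chromosomes like 3 1 2 0 4; these are standard textbook versions, not necessarily the thesis's exact implementations:

```python
import random

def order_crossover(mom, dad):
    # keep a random slice of mom, fill the remaining positions with
    # dad's genes in the order they appear in dad
    n = len(mom)
    lo, hi = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[lo:hi] = mom[lo:hi]
    rest = [g for g in dad if g not in child]
    for i in range(n):
        if child[i] is None:
            child[i] = rest.pop(0)
    return child

def swap_mutation(perm):
    # exchange two randomly chosen positions
    i, j = random.sample(range(len(perm)), 2)
    perm = list(perm)
    perm[i], perm[j] = perm[j], perm[i]
    return perm
```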
11 GASLEAK Results
- Not encouraging
- Bad fitness function, or bad evidence b.v.
- Many graph errors
12 Master's Thesis: SLAM GA
- SLAM GA: Structure Learning Adjacency Matrix Genetic Algorithm
- Initial population: tried several approaches
  - Completely random Bayesian networks (Box-Muller, max parents)
    - Many illegal structures; wrote the fixCycles algorithm (sketched below)
  - Random networks generated from parents pre-selected by the Restrict phase of Sparse Candidate
    - Performed better than random
  - Aggregate of k networks learned by K2 from random orderings (cycles eliminated): best approach
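The slides do not spell out fixCycles; the following sketch assumes the simplest repair policy, deleting one edge of each DFS-detected cycle until the adjacency matrix describes a DAG:

```python
# Hypothetical cycle repair for an n x n 0/1 adjacency matrix, where
# adj[u][v] == 1 means there is an edge u -> v. Repeatedly find a DFS
# back edge (which closes a cycle) and delete it.
def fix_cycles(adj):
    n = len(adj)

    def find_back_edge():
        color = [0] * n                     # 0 unvisited, 1 on stack, 2 done
        def dfs(u):
            color[u] = 1
            for v in range(n):
                if adj[u][v]:
                    if color[v] == 1:
                        return (u, v)       # back edge: v is on the stack
                    if color[v] == 0:
                        e = dfs(v)
                        if e:
                            return e
            color[u] = 2
            return None
        for s in range(n):
            if color[s] == 0:
                e = dfs(s)
                if e:
                    return e
        return None

    e = find_back_edge()
    while e:
        adj[e[0]][e[1]] = 0                 # break the cycle
        e = find_back_edge()
    return adj
```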
13 Aggregator / Instantiator
[Diagram: training data D is fed to a K2 manager, which runs K2 on k random orderings to produce BBN 1, BBN 2, ..., BBN k; the Aggregator combines these into one aggregate BBN.]
- For small networks, k = 1 is best. For larger networks, k = 2 is best.
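The slides do not give the exact combination rule; this sketch assumes a simple majority vote over the k learned adjacency matrices, reusing the fix_cycles sketch from slide 12:

```python
# Hypothetical Aggregator stage: an edge survives if it appears in more
# than half of the k K2-learned adjacency matrices; cycles are then
# repaired with the fix_cycles sketch above.
def aggregate(matrices):
    k, n = len(matrices), len(matrices[0])
    agg = [[1 if 2 * sum(m[i][j] for m in matrices) > k else 0
            for j in range(n)] for i in range(n)]
    return fix_cycles(agg)
```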
14 SLAM GA
- Chromosome representation
  - Edge matrix: n² bits
  - Each bit represents a parent edge to a node
  - 1 = parent, 0 = not parent
- Operators
  - Crossover: swap parents, fix cycles
15 SLAM GA Crossover
16 SLAM GA
- Chromosome representation
  - Edge matrix: n² bits
  - Each bit represents a parent edge to a node
  - 1 = parent, 0 = not parent
- Operators
  - Crossover: swap parents, fix cycles
  - Mutation: reverse, delete, or add a random number of edges; fix cycles (both sketched below)
- Fitness
  - Total Bayesian Dirichlet equivalence score over all nodes
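A sketch of both operators, assuming crossover takes each node's whole parent set (a column of the adjacency matrix, with adj[parent][child] indexing) from one of the two parent chromosomes, and reusing the fix_cycles sketch from slide 12; the details are illustrative, not the thesis code:

```python
import random

def slam_crossover(mom, dad):
    # each node inherits its entire parent set from one parent chromosome,
    # chosen by coin flip; cycles introduced by mixing are then repaired
    n = len(mom)
    child = [[0] * n for _ in range(n)]
    for node in range(n):
        src = mom if random.random() < 0.5 else dad
        for p in range(n):
            child[p][node] = src[p][node]
    return fix_cycles(child)

def slam_mutation(adj, n_edits=1):
    # reverse, delete, or add random edges, then repair cycles
    n = len(adj)
    for _ in range(n_edits):
        i, j = random.sample(range(n), 2)
        op = random.choice(("add", "delete", "reverse"))
        if op == "add":
            adj[i][j] = 1
        elif op == "delete":
            adj[i][j] = 0
        else:
            adj[i][j], adj[j][i] = adj[j][i], adj[i][j]
    return fix_cycles(adj)
```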
17 Results: Asia
[Figures: best network of the first generation vs. the actual network; 15 graph errors]
18 Results: Asia
19 Results: Poker
[Figures: best network of the first generation vs. the actual network; 11 graph errors]
20 Results: Poker
21 Results: Golf
[Figures: best network of the first generation vs. the actual network; 11 graph errors]
22 Results: Golf
23 Results: Boerlage92
[Figures: initial network vs. the actual network]
24 Results: Boerlage92
25 Results: Alarm
26 Final Fitness Values
27 K2 vs. SLAM GA
- K2
  - Very good if the ordering is known
  - The ordering is often not known
  - Greedy; very dependent on ordering
- SLAM GA
  - Stochastic: escapes the local-optima trap
  - Can improve on bad structures learned by K2
  - Takes much longer than K2
28 GASLEAK vs. SLAM GA
- GASLEAK
  - Gold network never recovered
  - Much more computationally expensive
    - K2 is run on each new individual each generation
    - Each chromosome must be scored
  - Final network has many graph errors
- SLAM GA
  - For small networks, the gold-standard network is often recovered
  - Relatively few graph errors in the final network
  - Less computationally intensive
    - Initial population is the most expensive step
    - Each chromosome must be scored
29 SLAM GA Ramifications
- Effective structure learning algorithm
  - Ideal for small networks
- Improvement over GASLEAK
  - SLAM GA is faster in spite of the same GA parameters
  - SLAM GA is more accurate
- Improvement over K2
  - Aggregate algorithm produces a better initial population
- Parent-swapping crossover technique is effective
  - Diversifies the search space while retaining past information
30 SLAM GA Future Work
- Parameter tweaking
- Better fitness function
  - Several bad structures score better than the gold standard
  - The GA itself works fine
- Intelligent mutation operator
  - Add edges from a pre-qualified set of candidate parents
- New instantiation methods
  - Use GASLEAK
  - Other structure-learning algorithms
- Scalability
  - Job farm
31 Summary
- Bayesian Networks
- Genetic Algorithms
- Learning Structure: K2, Sparse Candidate
- GASLEAK
- SLAM GA