CIS732Lecture1320011004 - Transcript
1
Ben Perry - M.S. Thesis Defense
A Genetic Algorithm for Learning Bayesian
Network Adjacency Matrices from Data
Benjamin B. Perry
Laboratory for Knowledge Discovery in Databases, Kansas State University
http://www.kddresearch.org
http://www.cis.ksu.edu/bbp9857
2
Overview
  • Bayesian Network
  • Definitions and examples
  • Inference and learning
  • Genetic Algorithms
  • Structure Learning Background
  • Problem
  • K2 algorithm
  • Sparse Candidate
  • Improving K2: Permutation Genetic Algorithm
    (GASLEAK)
  • Shortcoming: greedy, sensitive to ordering
  • Permutation GA
  • Master's thesis: Adjacency Matrix GA (SLAM GA)
  • Rationale
  • Evaluation with Known Bayesian Networks
  • Summary

3
Bayesian Belief Networks (BBNs): Definition
  • Bayesian Network
  • Directed acyclic graph
  • Vertices (nodes) denote events, or states of
    affairs (each a random variable)
  • Edges (arcs, links) denote conditional
    dependencies, causalities
  • Model of conditional dependence assertions (or CI
    assumptions)
  • Example (Ben's Presentation BBN)
  • General Product (Chain) Rule for BBNs

[Figure: Ben's Presentation BBN with five nodes (X1-X5): Sleep (Narcoleptic, Well, Bad, All-nighter), Appearance (Good, Bad), Memory (Elephant, Good, Bad, None), Ben is nervous (Extremely, Yes, No), Ben's presentation (Good, Not so good, Failed miserably)]
General product rule: P(X1, ..., Xn) = ∏i P(Xi | Parents(Xi))
Example: P(Well, Good, Good, No, Good) = P(W) · P(G | W) · P(G | W) · P(N | G, G) · P(G | N)
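To make the product rule concrete, here is a minimal Python sketch; the node wiring is one plausible reading of the example factorization above, and the CPT numbers are invented for illustration:

    # Chain rule for BBNs: P(x1, ..., xn) = product over i of P(xi | parents(xi)).
    # Structure is a plausible reading of the example; CPT entries are made up.
    parents = {"Sleep": (), "Appearance": ("Sleep",), "Memory": ("Sleep",),
               "Nervous": ("Appearance", "Memory"), "Presentation": ("Nervous",)}

    cpt = {  # node -> {(own value, *parent values): probability}
        "Sleep":        {("Well",): 0.4},
        "Appearance":   {("Good", "Well"): 0.8},
        "Memory":       {("Good", "Well"): 0.6},
        "Nervous":      {("No", "Good", "Good"): 0.7},
        "Presentation": {("Good", "No"): 0.9},
    }

    def joint(assignment):
        """Multiply P(node = value | parent values) over every node."""
        p = 1.0
        for node, pa in parents.items():
            key = (assignment[node],) + tuple(assignment[q] for q in pa)
            p *= cpt[node][key]
        return p

    print(joint({"Sleep": "Well", "Appearance": "Good", "Memory": "Good",
                 "Nervous": "No", "Presentation": "Good"}))  # 0.12096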
4
Graphical Models of Probability Distributions
  • Idea
  • Want model that can be used to perform inference
  • Desired properties
  • Correlations among variables
  • Ability to represent functional, logical,
    stochastic relationships
  • Probability of certain events
  • Inference: Decision Support Problems
  • Diagnosis (medical, equipment)
  • Pattern recognition (image, speech)
  • Prediction
  • Want to Learn Most Likely Model that Generates
    Observed Data
  • Under certain assumptions (Causal Markovity), it
    has been shown that we can do it
  • Given data D (tuples or vectors containing
    observed values of variables)
  • Return directed graph (V, E) expressing target
    CPTs
  • Next: Genetic algorithms

5
Genetic Algorithms
  • Idea
  • Emulate the natural process of survival of the
    fittest (example: roaches adapt)
  • Each generation has many diverse individuals
  • Each individual competes for the chance to
    survive
  • Most common approach: best individuals live to
    the next generation and mate
  • Produce children with traits from both parents
  • If parents are strong, children might be stronger
  • Major components (operators)
  • Fitness function
  • Chromosome manipulation
  • Cross-over (Not the John Edward type!),
    mutation
  • From (Educated?) Guess to Gold
  • Initial population is typically random, or not
    much better than random (bad scores)
  • Performs well with a non-deceptive search space
    and good genetic operators
  • Ability to escape local optima with mutations.
  • Not guaranteed to find the best answer, but
    usually gets close (see the sketch below)
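A minimal, generic GA loop showing these components together - an illustrative sketch on a toy bit-counting problem, not the thesis implementation; all parameters are invented:

    # Generic elitist GA loop: fitness, crossover, mutation (illustrative sketch).
    import random

    GENOME_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 50, 0.02

    def fitness(genome):            # toy objective: count of 1-bits ("one-max")
        return sum(genome)

    def crossover(mom, dad):        # single-point crossover
        point = random.randrange(1, GENOME_LEN)
        return mom[:point] + dad[point:]

    def mutate(genome):             # flip each bit with small probability
        return [b ^ 1 if random.random() < MUTATION_RATE else b for b in genome]

    population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
                  for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        population.sort(key=fitness, reverse=True)
        elite = population[: POP_SIZE // 2]          # elitism: best half survives
        children = [mutate(crossover(*random.sample(elite, 2)))
                    for _ in range(POP_SIZE - len(elite))]
        population = elite + children

    print(max(map(fitness, population)))  # usually near GENOME_LEN, not guaranteed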

6
Learning Structure: K2 Algorithm
  • Algorithm Learn-BBN-Structure-K2 (D, Max-Parents) - see the Python sketch below
  • FOR i ← 1 to n DO // arbitrary ordering of variables x1, x2, ..., xn
  • WHILE (Parents[xi].Size < Max-Parents) DO // find best candidate parent
  • Best ← argmax j > i (P(D | xj ∪ Parents[xi])) // max Dirichlet score
  • IF ((Parents[xi] ∪ Best).Score > Parents[xi].Score) THEN Parents[xi] += Best
  • RETURN ({Parents[xi] : i = 1, 2, ..., n})
  • ALARM - A Logical Alarm Reduction Mechanism
    (Beinlich et al., 1989)
  • BBN model for patient monitoring in surgical
    anesthesia
  • Vertices (37): findings (e.g., esophageal
    intubation), intermediates, observables
  • K2 found a BBN differing in only 1 edge from the
    gold standard (elicited from an expert)
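A compact Python sketch of the greedy K2 loop above; the helper score(D, node, parents) is a hypothetical stand-in for the Bayesian Dirichlet metric, and, following the standard K2 convention, candidate parents are drawn from nodes earlier in the ordering:

    # Sketch of K2 greedy structure search (score() is a stand-in for the
    # Bayesian Dirichlet metric; any decomposable per-node score works here).
    def k2(D, order, max_parents, score):
        parents = {x: set() for x in order}
        for i, xi in enumerate(order):
            while len(parents[xi]) < max_parents:
                # candidates must precede xi in the ordering (guarantees a DAG)
                candidates = [xj for xj in order[:i] if xj not in parents[xi]]
                if not candidates:
                    break
                best = max(candidates,
                           key=lambda xj: score(D, xi, parents[xi] | {xj}))
                if score(D, xi, parents[xi] | {best}) > score(D, xi, parents[xi]):
                    parents[xi].add(best)
                else:
                    break   # greedy: stop when no candidate improves the score
        return parents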

7
Learning Structure: K2 Downfalls
  • Greedy (may fall into local maxima)
  • Highly dependent upon node ordering
  • Optimal node ordering must be given
  • If optimal order is already known, an expert
    could probably create the network
  • Number of orderings consistent with DAGs is
    factorial (n!)

8
Learning Structure: Sparse Candidate
  • General Idea
  • Inspect the k best parent candidates at a time
    (K2 only inspects one)
  • k is typically very small: 5 ≤ k ≤ 15
  • Complexity is exponential in k
  • Algorithm
  • Loop until no improvement or the iteration limit
    is exceeded
  • Restrict phase: for each node, select the top k
    parent candidates (mutual information or m_disc);
    see the sketch after this list
  • Maximize phase: build a network by manipulating
    parents (add, remove, reverse edges from each
    node's candidate set); accept only changes that
    improve the network score (Minimum Description
    Length)
  • Must handle cycles, which is expensive
  • K2 gives this to us for free
  • Next: Improving K2
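A sketch of the Restrict phase using pairwise mutual information to pick each node's top-k candidate parents (illustrative only; the m_disc alternative is not shown):

    # Restrict phase sketch: rank candidate parents by empirical mutual information.
    from collections import Counter
    from math import log

    def mutual_information(D, x, y):
        """I(X;Y) estimated from data rows D (dicts of node -> value)."""
        n = len(D)
        px, py, pxy = Counter(), Counter(), Counter()
        for row in D:
            px[row[x]] += 1; py[row[y]] += 1; pxy[(row[x], row[y])] += 1
        return sum((c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
                   for (a, b), c in pxy.items())

    def restrict(D, nodes, k):
        """For each node, keep the k most informative other nodes as candidates."""
        return {x: sorted((y for y in nodes if y != x),
                          key=lambda y: mutual_information(D, x, y),
                          reverse=True)[:k]
                for x in nodes}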

9
GASLEAK: A Permutation GA for Variable Ordering
10
Properties of the Genetic Algorithm
  • Elitist
  • Chromosome representation
  • Integer permutation ordering
  • A sample chromosome for a BBN of 5 nodes might
    look like: 3 1 2 0 4
  • Seeding
  • Random shuffle
  • Operators (sketched below)
  • Order crossover
  • Swap mutation
  • Fitness
  • RMSE
  • Job farm
  • Java-based: utilizes many machines regardless of
    OS
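A sketch of the two permutation operators named above, assuming chromosomes are Python lists holding a permutation of node indices (order crossover is implemented as one common OX variant):

    # Permutation-GA operators for variable orderings (illustrative sketch).
    import random

    def order_crossover(mom, dad):
        """OX: copy a slice from mom, fill the rest in dad's relative order."""
        n = len(mom)
        i, j = sorted(random.sample(range(n), 2))
        child = [None] * n
        child[i:j] = mom[i:j]
        fill = [g for g in dad if g not in child]  # genes missing from the slice
        for pos in list(range(0, i)) + list(range(j, n)):
            child[pos] = fill.pop(0)
        return child

    def swap_mutation(chrom):
        """Exchange two randomly chosen positions."""
        i, j = random.sample(range(len(chrom)), 2)
        chrom = chrom[:]
        chrom[i], chrom[j] = chrom[j], chrom[i]
        return chrom

    print(order_crossover([3, 1, 2, 0, 4], [0, 1, 2, 3, 4]))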

11
GASLEAK Results
  • Not encouraging
  • Bad fitness function, or bad evidence b.v.
  • Many graph errors

12
Master's Thesis: SLAM GA
  • SLAM GA = Structure Learning Adjacency Matrix
    Genetic Algorithm
  • Initial population: tried several approaches
  • Completely Random Bayesian Networks (Box-Muller,
    Max parents)
  • Many illegal structures; wrote a fixCycles
    algorithm (sketched after this list)
  • Random networks generated from parents
    pre-selected by the Restrict phase of Sparse
    Candidate
  • Performed better than random
  • Aggregate of k networks learned by K2 from random
    orderings (cycles eliminated): best approach
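The slides do not spell out fixCycles; below is one plausible sketch, assuming the repair rule is simply to delete an edge from each directed cycle until the structure is a DAG:

    # Plausible fixCycles sketch: repeatedly break directed cycles by removing
    # one edge per cycle found (the actual thesis repair rule may differ).
    def find_cycle(adj):
        """Return a list of edges forming a directed cycle, or None (DFS)."""
        WHITE, GRAY, BLACK = 0, 1, 2
        color, stack = {v: WHITE for v in adj}, []

        def dfs(v):
            color[v] = GRAY
            stack.append(v)
            for w in adj[v]:
                if color[w] == GRAY:                 # back edge: cycle found
                    cyc = stack[stack.index(w):] + [w]
                    return list(zip(cyc, cyc[1:]))
                if color[w] == WHITE:
                    found = dfs(w)
                    if found:
                        return found
            color[v] = BLACK
            stack.pop()
            return None

        for v in adj:
            if color[v] == WHITE:
                found = dfs(v)
                if found:
                    return found
        return None

    def fix_cycles(adj):
        """Delete one edge from each directed cycle until none remain."""
        while (cycle := find_cycle(adj)):
            u, w = cycle[0]                          # arbitrary victim edge
            adj[u].remove(w)
        return adj

    print(fix_cycles({"a": {"b"}, "b": {"c"}, "c": {"a"}}))  # one edge removed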

13
Aggregator / Instantiater
[Diagram: the K2 Manager feeds training data D and k random orderings into k runs of K2, producing BBN 1 ... BBN k; the Aggregator combines these into an aggregate BBN.]
For small networks, k = 1 is best. For larger
networks, k = 2 is best.
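The slides do not detail how the k networks are combined; one plausible reading, sketched below, is an edge-frequency majority vote over the K2 outputs, reusing the hypothetical k2 and fix_cycles helpers sketched earlier:

    # Plausible aggregation sketch: run K2 with k random orderings and keep
    # edges that appear in a majority of the learned networks (the thesis's
    # actual combination rule is not detailed on these slides).
    import random

    def aggregate_initial_bbn(D, nodes, k, k2, fix_cycles, max_parents, score):
        edge_votes = {}
        for _ in range(k):
            order = random.sample(nodes, len(nodes))     # random ordering
            learned = k2(D, order, max_parents, score)   # node -> parent set
            for child, pars in learned.items():
                for p in pars:
                    edge_votes[(p, child)] = edge_votes.get((p, child), 0) + 1
        adj = {v: set() for v in nodes}
        for (p, child), votes in edge_votes.items():
            if votes * 2 > k:                            # majority vote
                adj[p].add(child)
        return fix_cycles(adj)                           # ensure a DAG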
14
SLAM GA
  • Chromosome representation
  • Edge matrix: n² bits
  • Each bit represents a parent edge to a node:
  • 1 = parent, 0 = not parent
  • Operators
  • Crossover: swap parents, fix cycles

15
SLAM GA Crossover
16
SLAM GA
  • Chromosome representation
  • Edge matrix: n² bits
  • Each bit represents a parent edge to a node:
  • 1 = parent, 0 = not parent
  • Operators (sketched below)
  • Crossover: swap parents, fix cycles
  • Mutation: reverse, delete, or add a random number
    of edges; fix cycles
  • Fitness
  • Total Bayesian Dirichlet equivalence (BDe) score
    over all nodes
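A sketch of the adjacency-matrix chromosome and the parent-swapping crossover; here "swap parents" is read as the child inheriting each node's entire parent set from one of the two parent networks, and anything beyond the slides is an assumption:

    # SLAM GA chromosome sketch: an n x n bit matrix, bit[i][j] = 1 meaning
    # node i is a parent of node j. "Swap parents" crossover is read here as:
    # for each node, inherit its full parent column from one parent network.
    import random

    def random_chromosome(n, max_parents):
        m = [[0] * n for _ in range(n)]
        for j in range(n):                         # pick parents for node j
            k = random.randint(0, min(max_parents, n - 1))
            for i in random.sample([i for i in range(n) if i != j], k):
                m[i][j] = 1
        return m

    def parent_swap_crossover(a, b):
        """Child takes each node's parent column from chromosome a or b."""
        n = len(a)
        child = [[0] * n for _ in range(n)]
        for j in range(n):
            src = a if random.random() < 0.5 else b
            for i in range(n):
                child[i][j] = src[i][j]
        return child                               # caller then runs fixCycles

After crossover the matrix may contain directed cycles, which is why the fixCycles repair from slide 12 is applied to each child.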

17
Results - Asia
[Figure: best network of the first generation vs. the actual Asia network; 15 graph errors]
18
Results - Asia
19
Results - Poker
[Figure: best network of the first generation vs. the actual Poker network; 11 graph errors]
20
Results - Poker
21
Results - Golf
[Figure: best network of the first generation vs. the actual Golf network; 11 graph errors]
22
Results - Golf
23
Results - Boerlage92
[Figure: initial network vs. the actual Boerlage92 network]
24
Results - Boerlage92
25
Results - Alarm
26
Final Fitness Values
27
K2 vs. SLAM GA
  • K2
  • Very good if ordering is known
  • Ordering is often not known
  • Greedy, very dependent on ordering.
  • SLAM GA
  • Stochastic: escapes the local-optima trap
  • Can improve on bad structures learned by K2
  • Takes much longer than K2

28
GASLEAK vs. SLAM GA
  • GASLEAK
  • Gold network never recovered
  • Much more computationally expensive
  • K2 is run on each new individual each
    generation
  • Each chromosome must be scored
  • Final network has many graph errors
  • SLAM GA
  • For small networks, gold standard network often
    recovered.
  • Relatively few graph errors for final network.
  • Less computationally intensive
  • Initial population most expensive
  • Each chromosome must be scored

29
SLAM GA Ramifications
  • Effective structure learning algorithm
  • Ideal for small networks
  • Improvement over GASLEAK
  • SLAM GA is faster despite the same GA parameters
  • SLAM GA more accurate
  • Improvement over K2
  • Aggregate algorithm produces better initial
    population
  • Parent-swapping crossover technique effective
  • Diversifies search space while retaining past
    information

30
SLAM GA Future Work
  • Parameter tweaking
  • Better fitness function
  • Several bad structures score better than the gold
    standard
  • GA works fine
  • Intelligent mutation operator
  • Add edges from pre-qualified set of candidate
    parents
  • New instantiation methods
  • Use GASLEAK
  • Other structure-learning algorithms
  • Scalability
  • Job farm

31
Summary
  • Bayesian Network
  • Genetic Algorithms
  • Learning Structure: K2, Sparse Candidate
  • GASLEAK
  • SLAM GA