1
Rule Extraction 2
  • Dr. Béla Pataki

2
Rule Extraction 2: TREPAN
  • References
  • Craven, M. W. (1996). Extracting Comprehensible
    Models from Trained Neural Networks. PhD thesis,
    University of Wisconsin–Madison.
  • Craven, M. and Shavlik, J. (1996). Extracting
    tree-structured representations of trained
    networks. In Touretzky, D., Mozer, M., and
    Hasselmo, M., editors, Advances in Neural
    Information Processing Systems (volume 8). MIT
    Press, Cambridge, MA.

3
Rule Extraction 2: TREPAN
  • Basic idea
  • First, a black-box model (e.g. a neural network,
    but any type of black-box model will do) is built
    from the available data.
  • The black-box model is used as an oracle: if any
    analysis is needed, the black-box model is
    analyzed, not the original system.
  • Next, a humanly comprehensible descriptive model
    is built based on the black-box model.
  • The comprehensible model is a decision tree.

4
Rule Extraction 2: TREPAN
  • Attributes xj can be either discrete (binary) or
    continuous.
  • Basically, networks trained for classification
    purposes are modeled by decision trees.

5
Rule Extraction 2: TREPAN
  • Problems to be solved
  • How to expand the tree step by step? How to
    choose the next node to be expanded?
  • How to deal with the decreasing number of samples
    from the root to the leaves?
  • How to choose the best test for the node to be
    expanded?
  • What stopping criteria should be used in the tree
    expansion process?
  • What pruning should be done before the process is
    finished?

6
Rule Extraction 2: TREPAN/How to expand?
  • 1. How to choose the next node to be expanded?
  • Suggested heuristic: look for the node where the
    ratio of incorrectly classified patterns is the
    highest.
  • Many wrong decisions in a node ⇒ promise of high
    gain if properly refined (expanded).

7
Rule Extraction 2: TREPAN/How to expand?
  • Choose the node n for which the promise of gain,
    G(n), is the highest (see the formula below,
    reconstructed from the thesis).
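In Craven's thesis, the promise of expanding a node n
is the product of the fraction of instances that reach
n and the tree's estimated error on them:

    G(n) = reach(n) · (1 − fidelity(n))

where reach(n) is the estimated fraction of all
instances that reach node n, and fidelity(n) is the
estimated agreement between the tree and the oracle on
those instances; many wrong decisions at a frequently
reached node give the highest promise.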

8
Rule Extraction 2: TREPAN/How to expand?
  • Example
  • 10000 patterns are classified into 3 classes C1,
    C2, C3; the original numbers of patterns in the
    classes are 4500, 1800, and 3700, respectively.
  • When the number of patterns reaching a node drops
    below 3000, new artificial patterns are generated
    (a minimal sketch of this rule follows below).
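A minimal sketch of the padding rule above, in Python
(the function name is hypothetical; 3000 is the
example's limit):

    S_MIN = 3000  # minimum number of patterns required at a node

    def n_queries_needed(n_reaching: int) -> int:
        """How many artificial patterns to generate at a node."""
        return max(0, S_MIN - n_reaching)

    assert n_queries_needed(4100) == 0    # enough real patterns reach the node
    assert n_queries_needed(2500) == 500  # pad the node back up to 3000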

9
Rule Extraction 2: TREPAN/How to expand?
[Figure: an example tree; its internal nodes carry the
tests Test1 ... Test4]
10
Rule Extraction 2: TREPAN/Membership queries
2. How to deal with the problem of the decreasing
number of samples from the root to the leaves?
  • Each test at each node divides the set of samples
    into two subsets, so the number of samples keeps
    decreasing.
  • Analyzing a node that has fewer samples gives
    poorer results.

[Figure: example]
11
Rule Extraction 2: TREPAN/Membership queries
  • Idea: if the number of samples drops below the
    appropriate level, new instances are generated.
    The black-box model (NN) is used as an ORACLE:
    new input patterns are generated and presented to
    the oracle (membership queries), and its answer
    is taken as the ground truth (see the sketch
    below).
  • Problem: how to generate the new input patterns?
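A minimal sketch of the oracle step, assuming the
trained black box exposes a scikit-learn style
predict() method (an assumption; any model with a
prediction call would do):

    import numpy as np

    def membership_query(net, x_new):
        """Label an artificial pattern with the black box's answer.
        The oracle's answer, not the original data set, serves as
        the ground truth when growing the tree."""
        x_new = np.asarray(x_new).reshape(1, -1)  # a single row vector
        return net.predict(x_new)[0]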

12
Rule Extraction 2: TREPAN/Membership queries
  • Membership query: artificial input patterns
    (attribute vectors) are generated, and the
    black-box model is used to generate the
    classification of each pattern (the output).
  • The constraints introduced by the previous nodes
    (parent, grandparent, etc.) must be kept (see the
    sketch below).
  • The distribution of the instances associated with
    the current node is to be preserved.
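A minimal sketch of keeping the path constraints, done
here with plain rejection sampling; TREPAN conditions
the per-attribute distributions directly, so this loop
is a simplified stand-in, and both helper names are
hypothetical:

    def draw_constrained_instance(sample_instance, path_constraints):
        """Draw candidate instances until one satisfies every test
        on the path from the root (parent, grandparent, ...) down
        to the current node."""
        while True:
            x = sample_instance()  # drawn from the node's distribution
            if all(test(x) for test in path_constraints):
                return x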

13
Rule Extraction 2: TREPAN/Membership queries
  • The distribution of the instances associated with
    the current node
  • One possible approach: always use uniform
    distributions. This is correct if the fidelity of
    the black-box (NN) model should be uniform over
    the entire instance space.
  • Another approach: take the actual distribution
    into account, so that the extraction algorithm
    focuses on the parts of the space where most
    examples are found. The fidelity of the tree will
    be higher in these parts than in the less
    frequently used ones.

14
Rule Extraction 2: TREPAN/Membership queries
Example
  • The effects of the constraints introduced by the
    parent, grandparent, etc. nodes.

If membership queries are to be generated at node4,
x1 must be true in all instances, and most of the
examples should have a small negative x2 value.
15
Rule Extraction 2: TREPAN/Membership queries
  • Estimating attribute distributions
  • Discrete-valued attributes: the frequencies of
    the values are used as empirical distributions.
  • Continuous-valued attributes: kernel density
    estimates,
  • a consistent estimate (as N → ∞).
  • In all cases marginal distributions are used
    (dependencies among attributes are neglected),
    but locally different ones (see the sketch below).
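A minimal sketch of the two estimators, sampling each
attribute independently (marginals only). The Gaussian
kernel width 1/sqrt(N) follows Craven's thesis; treat
the exact width as an assumption here:

    import math
    import random

    def sample_discrete(observed):
        """Empirical distribution: frequencies enter via repeats."""
        return random.choice(observed)

    def sample_continuous(observed):
        """Kernel density estimate: pick an observation at random
        and add Gaussian kernel noise. The width shrinks as the
        sample grows, which makes the estimate consistent."""
        width = 1.0 / math.sqrt(len(observed))
        return random.gauss(random.choice(observed), width)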

16
Rule Extraction 2: TREPAN/Membership queries
  • Example

Give the first 3 membership queries at the node, if
the initial distributions (at node0) are the
following:
[Table: initial attribute distributions at node0]
17
Rule Extraction 2: TREPAN/Membership queries
  • Distributions of the attributes at node1 (after
    performing Test0)
  • Probabilities of the conditions tested at node1

18
Rule Extraction 2: TREPAN/Membership queries
  • Distribution of the possible outcomes of the
    1-of-{x1 ≥ 0.5, x2} test (node1)
  • Distribution of the conditions tested at node2
    (using a uniform distribution over [0,1])

19
Rule Extraction 2: TREPAN/Membership queries
  • If a random number generator of uniform
    distribution over [0,1] is used and provides the
    following random numbers (ξi):
  • 0.950 0.231 0.607 0.486 0.018
    0.762 0.456 0.892 0.821 0.444
  • 1st step: ξ1 = 0.950
  • ⇒ choose
  • x1 < 0.5, x2 = F
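The mechanism behind these steps is inverse-transform
sampling: the uniform ξi selects one outcome of the
tested conditions according to the outcome
probabilities. A minimal sketch; the probabilities and
the outcome order in the usage lines are made up for
illustration, since the slide's tables are not
reproduced here:

    def pick_outcome(xi, outcomes):
        """outcomes: (label, probability) pairs summing to 1.
        Returns the label of the slot the uniform xi falls into."""
        cumulative = 0.0
        for label, p in outcomes:
            cumulative += p
            if xi < cumulative:
                return label
        return outcomes[-1][0]  # guards against rounding error

    # Hypothetical probabilities: a large xi = 0.950 lands in the
    # last slot, matching the choice made in the 1st step above.
    print(pick_outcome(0.950, [("x1 >= 0.5, x2 = T", 0.5),
                               ("x1 >= 0.5, x2 = F", 0.3),
                               ("x1 < 0.5,  x2 = F", 0.2)]))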

20
Rule Extraction 2: TREPAN/Membership queries
  • 2nd step
  • second random number ξ2 = 0.231
  • 0.950 0.231 0.607 0.486 0.018
    0.762 0.456 0.892 0.821 0.444

21
Rule Extraction 2: TREPAN/Membership queries
  • 3rd step: next random number ξ3 = 0.607
  • 0.950 0.231 0.607 0.486 0.018
    0.762 0.456 0.892 0.821 0.444
  • therefore the second membership query is of the
    same type as the previous one

22
Rule Extraction 2: TREPAN/Membership queries
  • 4th step: next random number ξ4 = 0.486
  • 0.950 0.231 0.607 0.486 0.018
    0.762 0.456 0.892 0.821 0.444

23
Rule Extraction 2: TREPAN/Membership queries
  • 5th step: next random number
  • 0.950 0.231 0.607 0.486 0.018
    0.762 0.456 0.892 0.821 0.444
  • ξ5 = 0.018
  • ⇒ choose
  • x1 ≥ 0.5, x2 = T

24
Rule Extraction 2: TREPAN/Membership queries
  • 6th step
  • next random number ξ6 = 0.762
  • 0.950 0.231 0.607 0.486 0.018
    0.762 0.456 0.892 0.821 0.444

25
Rule Extraction 2: TREPAN/Search for best test
  • 3. How to choose the best test for the node to be
    expanded?
  • Tests: m-of-n type tests are used.
  • (An integer threshold m and a set of n Boolean
    literals; the test is satisfied if at least m of
    the literals are satisfied. See the sketch
    below.)
  • Example: 2-of-{A, B, C} is the same as
  • (A ∧ B) ∨ (A ∧ C) ∨ (B ∧ C)
  • (the last term, A ∧ B ∧ C, is not needed)
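A minimal sketch of evaluating an m-of-n test, checked
against the expansion in the example:

    def m_of_n(m, literals):
        """True iff at least m of the Boolean literals are true."""
        return sum(bool(v) for v in literals) >= m

    # 2-of-{A, B, C} agrees with (A^B) v (A^C) v (B^C) everywhere:
    for A in (False, True):
        for B in (False, True):
            for C in (False, True):
                assert m_of_n(2, [A, B, C]) == \
                    ((A and B) or (A and C) or (B and C))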

26
Rule Extraction 2: TREPAN/Search for best test
  • Literals used in the tests
  • Binary attributes: a binary test separates the
    data according to the attribute's value.
  • Discrete features with more than 2 values:
    binary tests of the form attribute = value1?,
    attribute = value2?, etc.
  • Real-valued features: binary tests of the form
    xj > threshold_j1? or xj < threshold_j2?

27
Rule Extraction 2: TREPAN/Search for best test
  • Real-valued features
  • Threshold: midpoints between adjacent values of
    the attribute among the patterns reaching the
    current node are threshold candidates.
  • Only those midpoints are considered where the two
    values belong to patterns of different classes
    (see the sketch below).
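A minimal sketch of this candidate-generation rule:

    def threshold_candidates(values, labels):
        """Midpoints of adjacent attribute values, kept only where
        the two neighbouring patterns differ in class."""
        pairs = sorted(zip(values, labels))
        return [(a + b) / 2.0
                for (a, ca), (b, cb) in zip(pairs, pairs[1:])
                if ca != cb]

    # 1.0 and 2.0 share a class, so only 2.5 is a candidate:
    assert threshold_candidates([1.0, 3.0, 2.0], [0, 1, 0]) == [2.5]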

28
Rule Extraction 2: TREPAN/Search for best test
  • Search for the best splitting test
  • The best binary test at the current node is
    selected based on the information gain
    criterion. It is the seed of the search process.
  • Two operators are used in the search (sketched
    below):
  • m-of-n → m-of-(n+1)
  • m-of-n → (m+1)-of-(n+1)
  • A limited form of backtracking: conflicting
    conditions can be omitted.
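A minimal sketch of the two operators, with a test
represented as the pair (m, literals); lit stands for
whichever candidate literal the search tries next:

    def add_literal(m, literals, lit):
        """m-of-n -> m-of-(n+1): widen, threshold unchanged."""
        return m, literals + [lit]

    def add_literal_and_raise(m, literals, lit):
        """m-of-n -> (m+1)-of-(n+1): widen and raise the threshold."""
        return m + 1, literals + [lit]

    # The first two examples of the next slide, literals as strings:
    assert add_literal(2, ["x1", "x3"], "x4 > 0.7") == \
        (2, ["x1", "x3", "x4 > 0.7"])
    assert add_literal_and_raise(2, ["x1", "x3"], "x4 > 0.7") == \
        (3, ["x1", "x3", "x4 > 0.7"])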

29
Rule Extraction 2: TREPAN/Search for best test
  • Examples
  • 2-of-{x1, x3} → 2-of-{x1, x3, x4 > 0.7}
  • 2-of-{x1, x3} → 3-of-{x1, x3, x4 > 0.7}
  • 2-of-{x1, x3, x4 > 0.7, ¬x3} → 2-of-{x1, x4 > 0.7}
    (backtracking: the conflicting pair x3, ¬x3 is
    omitted)

30
Rule Extraction 2: TREPAN/Search for best test
  • Using these operators, a beam search is performed
    with a beam width of two. Both the seed test and
    its negated version are used as starting points.
  • Example
  • Assume the best binary test is 1-of-{x1}.
    Naturally, 1-of-{¬x1} is of the same value.
  • If the target is 2-of-{x1, x3, x7}, the first
    seed is to be used; if the target is
    2-of-{¬x1, x3, x7}, the negated one.

31
Rule Extraction 2: TREPAN/Search for best test
  • Information gain criterion for tests
  • The test T is selected that maximizes the
    information gained about the class labels of the
    examples S:
  • gain(T) = info(S) − info_T(S)
  • where info(S) is the information needed to
    classify the examples of S, which belong to
    classes C1, C2, C3, ..., Ck:
  • info(S) = −Σj (|Cj ∩ S|/|S|) · log2(|Cj ∩ S|/|S|)

32
Rule Extraction 2: TREPAN/Search for best test
  • info_T(S) is the information needed after
    performing test T that has outcomes i = 1, ..., n.
    The subset of examples with the ith outcome is Si:
  • info_T(S) = Σi (|Si|/|S|) · info(Si)
  • (both formulas are sketched in code below)
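A minimal sketch of both formulas, taking class counts
as input:

    import math

    def info(class_counts):
        """info(S): entropy in bits of a class distribution."""
        total = sum(class_counts)
        return -sum((c / total) * math.log2(c / total)
                    for c in class_counts if c > 0)

    def gain(class_counts, outcome_counts):
        """gain(T) = info(S) - info_T(S); outcome_counts holds one
        class-count list per outcome of the test T."""
        total = sum(class_counts)
        info_t = sum(sum(sub) / total * info(sub)
                     for sub in outcome_counts)
        return info(class_counts) - info_t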

33
Rule Extraction 2: TREPAN/Search for best test
  • Example: a neural network performs two-class
    classification, and a decision tree is generated
    using TREPAN. When expanding the kth node, the
    1000 examples reaching that node have the
    following attribute distribution. Which of the 4
    possible tests gives the highest gain?
[Table: attribute distribution of the 1000 examples at
node k]

34
Rule Extraction 2: TREPAN/Search for best test
  • Testing x1

35
Rule Extraction 2: TREPAN/Search for best test
  • Testing x2

36
Rule Extraction 2: TREPAN/Search for best test
  • Testing 1-of-{x1, x2}

37
Rule Extraction 2: TREPAN/Search for best test
  • Testing 2-of-{x1, x2}

⇒ BEST!
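The counts and the four gain calculations of the last
slides were given in tables that are not reproduced
here. With made-up counts (an assumption, not the
slide's data), the comparison would run like this,
reusing gain() from the sketch above; with these
particular numbers the 2-of test indeed comes out on
top:

    # Hypothetical class counts for the 1000 examples (NOT the
    # slide's data): [C1, C2] per outcome of each candidate test.
    S = [520, 480]                      # class counts at node k
    splits = {
        "x1":            [[400, 200], [120, 280]],
        "x2":            [[380, 230], [140, 250]],
        "1-of-{x1, x2}": [[500, 350], [20, 130]],
        "2-of-{x1, x2}": [[280, 60], [240, 420]],
    }
    for name, split in splits.items():
        print(name, round(gain(S, split), 3))  # highest gain wins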
38
Rule Extraction 2: TREPAN/Stopping criteria
  • 4. Stopping criteria in the tree expansion
    process
  • Local criterion: purity of the example set
    reaching the node. The node becomes a leaf if,
    with high probability, it covers only instances
    of a single class.
  • Global criterion: complexity of the decision
    tree, i.e. of the generated explanation: a limit
    on the number of nodes, on the depth of the tree,
    etc.

39
Rule Extraction 2: TREPAN/Pruning the tree
  • 5. Pruning the final decision tree
  • Purpose: to detect subtrees that predict the same
    class at all of their leaves, and to collapse
    each such subtree into a single leaf.
  • Other pruning methods can be used as well, but
    they are not built into TREPAN.