Title: Rule Extraction 2
Slide 1: Rule Extraction 2
Slide 2: TREPAN
- References
  - Craven, M. W. (1996), Extracting Comprehensible Models from Trained Neural Networks. PhD thesis, University of Wisconsin-Madison.
  - Craven, M. and Shavlik, J. (1996), Extracting tree-structured representations of trained networks. In Touretzky, D., Mozer, M., and Hasselmo, M., editors, Advances in Neural Information Processing Systems (volume 8). MIT Press, Cambridge, MA.
Slide 3: TREPAN
- Basic idea
  - First, a black-box model (e.g. a neural network, but any type of black-box model) is built from the available data.
  - The black-box model is used as an oracle: whenever any analysis is needed, the black-box model is analyzed, not the original system.
  - Next, a humanly comprehensible descriptive model is built from the black-box model.
  - The comprehensible model is a decision tree.
Slide 4: TREPAN
- Attributes xj can be either discrete (binary) or continuous.
- Essentially, networks built for classification purposes are modeled by decision trees.
Slide 5: TREPAN
- Problems to be solved
  - How to expand the tree step by step? How to choose the next node to be expanded?
  - How to deal with the decreasing number of samples from the root to the leaves?
  - How to choose the best test for the node to be expanded?
  - What stopping criteria should be used in the tree-expansion process?
  - What pruning should be done before the process is finished?
Slide 6: TREPAN / How to expand?
- 1. How to choose the next node to be expanded?
  - Suggested heuristic: look for the node where the ratio of incorrectly classified patterns is the highest.
  - Many wrong decisions in a node?
    - Promise of high gain if properly refined (expanded).
Slide 7: TREPAN / How to expand?
- Choose the node n for which the promise of gain G(n) is the highest.
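The formula for G(n) did not survive the extraction. In Craven's thesis the priority of a node is reach(n) multiplied by (1 - fidelity(n)), i.e. the fraction of instances reaching the node times the fraction of them on which the tree disagrees with the oracle. A minimal sketch assuming that form (all names are illustrative):

```python
def promise(n_reached, n_total, fidelity):
    """Priority of expanding a node, assuming the reach * (1 - fidelity)
    form from Craven's thesis.  fidelity is the fraction of the node's
    instances on which the tree agrees with the black-box oracle."""
    reach = n_reached / n_total          # fraction of instances reaching the node
    return reach * (1.0 - fidelity)      # high reach + low fidelity = high promise
```

A node reached by many instances but classified poorly scores highest, which matches the slide's heuristic of looking for many wrong decisions.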
Slide 8: TREPAN / How to expand?
- Example
  - 10000 patterns are classified into 3 classes C1, C2, C3; the original numbers of patterns in the classes are 4500, 1800 and 3700, respectively.
  - When the number of patterns reaching a node drops under 3000, new artificial patterns are generated.
Slide 9: TREPAN / How to expand?
[Figure: example tree expansion; internal nodes apply tests Test1 ... Test4.]
Slide 10: TREPAN / Membership queries
- 2. How to deal with the decreasing number of samples from the root to the leaves?
  - Each test in each node divides the set of samples into two subsets, so the number of samples is decreasing.
  - Analyzing a node that has fewer samples gives poorer results.
- Example
Slide 11: TREPAN / Membership queries
- Idea: if the number of samples drops below the appropriate level, new instances are generated. The black-box model (NN) is used as an ORACLE: new input patterns are generated and presented to the oracle (membership queries), and its answer is taken as the ground truth.
- Problem: how to generate the new input patterns?
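The idea above can be sketched in a few lines. This is a hedged illustration, not TREPAN's exact control flow; `generate_instance` and `oracle` are assumed callables supplied by the caller.

```python
def top_up_with_queries(min_samples, real_examples, generate_instance, oracle):
    """Membership-query sketch: while too few examples reach the node,
    generate artificial instances (which must respect the node's
    constraints) and label them with the black-box oracle."""
    examples = list(real_examples)
    while len(examples) < min_samples:
        x = generate_instance()           # drawn from the node's attribute distributions
        examples.append((x, oracle(x)))   # the oracle's answer is taken as ground truth
    return examples
```

With the threshold of 3000 from the earlier example, a node reached by 2100 real patterns would receive 900 oracle-labelled artificial ones.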
Slide 12: TREPAN / Membership queries
- Membership query: artificial input patterns (attribute vectors) are generated, and the black-box model is used to generate the classification (output) of each pattern.
- The constraints introduced by the previous nodes (parent, grandparent, etc.) should be kept.
- The distribution of the instances associated with the actual node is to be conserved.
Slide 13: TREPAN / Membership queries
- The distribution of the instances associated with the actual node:
  - One possible approach is to always use uniform distributions; this is correct if the fidelity of the black-box (NN) model should be uniform over the entire instance space.
  - Another approach takes the actual distribution into account, so that the extraction algorithm focuses on the parts of the space where most examples are found. The fidelity of the tree will be higher in these parts than in the less frequently used ones.
Slide 14: TREPAN / Membership queries
- Example
  - The effects of the constraints introduced by the parent, grandparent, etc. nodes:
  - If membership queries are to be generated in node4, x1 should be true in all instances, and most of the examples should have a small negative x2 attribute.
Slide 15: TREPAN / Membership queries
- Estimating attribute distributions
  - Discrete-valued attributes: the frequencies of the values are used as empirical distributions.
  - Continuous-valued attributes: kernel density estimates, a consistent estimate (as N goes to infinity).
  - In all cases marginal distributions are used (dependencies among attributes are neglected), but locally different ones.
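Both estimators can be sketched compactly. The kernel width 1/sqrt(m) for m examples is the choice reported in Craven's thesis; the function names are illustrative.

```python
import math
from collections import Counter

def discrete_distribution(values):
    """Empirical frequencies of a discrete attribute's values."""
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}

def kernel_density(values, width=None):
    """Gaussian kernel density estimate for a continuous attribute.
    Width 1/sqrt(m), as in Craven's thesis, gives a consistent
    estimate as the number of examples m grows."""
    m = len(values)
    h = width if width is not None else 1.0 / math.sqrt(m)
    norm = m * h * math.sqrt(2.0 * math.pi)
    return lambda x: sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values) / norm
```

Note that each attribute is modelled on its own (marginals only), as the slide states; dependencies among attributes are neglected.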
Slide 16: TREPAN / Membership queries
- Give the first 3 membership queries in the node if the initial distributions (at node0) are:
[Table of initial attribute distributions; not preserved in the extraction.]
Slide 17: TREPAN / Membership queries
- Distributions of the attributes at node1 (after performing Test0)
- Probability of the conditions tested at node1
Slide 18: TREPAN / Membership queries
- Distribution of the possible outcomes of the "1 of {x1 >= 0.5, x2}" test at node1
- Distribution of the conditions tested at node2 (using a uniform distribution over [0, 1])
Slide 19: TREPAN / Membership queries
- A random generator of uniform distribution over [0, 1] is used, providing the following random numbers (ξi):
  0.950  0.231  0.607  0.486  0.018  0.762  0.456  0.892  0.821  0.444
- 1st step: ξ1 = 0.950
  - choose x1 < 0.5, x2 = F
Slide 20: TREPAN / Membership queries
- 2nd step: second random number, ξ2 = 0.231
Slide 21: TREPAN / Membership queries
- 3rd step: next random number, ξ3 = 0.607
- Therefore the second membership query is of the same type as the previous one.
Slide 22: TREPAN / Membership queries
- 4th step: next random number, ξ4 = 0.486
Slide 23: TREPAN / Membership queries
- 5th step: next random number, ξ5 = 0.018
  - choose x1 >= 0.5, x2 = T
Slide 24: TREPAN / Membership queries
- 6th step: next random number, ξ6 = 0.762
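The stepping above is inverse-CDF sampling: each uniform number picks the condition whose cumulative-probability interval contains it. The probability table from the slides was lost, so the interval boundaries below are hypothetical; only the random numbers and the two recorded choices come from the lecture.

```python
def sample_category(categories, probs, u):
    """Inverse-CDF sampling: return the category whose cumulative
    probability interval contains the uniform number u in [0, 1)."""
    cum = 0.0
    for cat, p in zip(categories, probs):
        cum += p
        if u < cum:
            return cat
    return categories[-1]  # guard against floating-point rounding

conditions = ["x1>=0.5, x2=T", "x1>=0.5, x2=F", "x1<0.5, x2=T", "x1<0.5, x2=F"]
probs = [0.10, 0.25, 0.30, 0.35]   # hypothetical; the slide's table was lost

# Consistent with the recorded 1st and 5th steps of the lecture:
assert sample_category(conditions, probs, 0.950) == "x1<0.5, x2=F"
assert sample_category(conditions, probs, 0.018) == "x1>=0.5, x2=T"
```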
Slide 25: TREPAN / Search for best test
- 3. How to choose the best test for the node to be expanded?
  - Tests: m-of-n type tests are used.
  - (An integer threshold m and a set of n Boolean literals. The test is satisfied if at least m of the literals are satisfied.)
  - Example: "2 of {A, B, C}" is the same as (A and B) or (A and C) or (B and C) or (A and B and C) (the last term is not needed).
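An m-of-n test is a one-liner; the sketch below (names illustrative) checks the slide's "2 of {A, B, C}" example on a case where exactly A and C hold.

```python
def m_of_n(m, literals, example):
    """An m-of-n test: satisfied iff at least m of the n Boolean
    literal functions hold on the example."""
    return sum(1 for lit in literals if lit(example)) >= m

# "2 of {A, B, C}" on an example where A and C hold but B does not:
lits = [lambda e: e["A"], lambda e: e["B"], lambda e: e["C"]]
example = {"A": True, "B": False, "C": True}
```

With m = 1 the test reduces to a plain disjunction, with m = n to a conjunction, so m-of-n tests strictly generalize both.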
Slide 26: TREPAN / Search for best test
- Literals used in tests
  - Binary attributes: a binary test separates the data according to the attribute's value.
  - Discrete features with more than 2 values: binary tests of the form "attribute = value1?", "attribute = value2?", etc.
  - Real-valued features: binary tests of the form "xj > threshold_j1?" or "xj < threshold_j2?".
Slide 27: TREPAN / Search for best test
- Real-valued features
  - Midpoints between the attribute values of the patterns reaching the actual node are the threshold candidates.
  - Only those midpoints are considered where the two values belong to different classes.
Slide 28: TREPAN / Search for best test
- Search for the best splitting test
  - The best binary test at the current node is selected based on the information-gain criterion. It is the seed of the search process.
  - Two operators are used in the search:
    - m of n -> m of (n+1)
    - m of n -> (m+1) of (n+1)
  - A limited form of backtracking: conflicting conditions can be omitted.
Slide 29: TREPAN / Search for best test
- Examples
  - 2 of {x1, x3} -> 2 of {x1, x3, x4 > 0.7}
  - 2 of {x1, x3} -> 3 of {x1, x3, x4 > 0.7}
  - 2 of {x1, x3, x4 > 0.7, not x3} -> 2 of {x1, x4 > 0.7} (backtracking removes the conflicting condition)
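The two operators can be sketched as a successor function over tests represented as (m, list-of-literals); this representation and the function name are illustrative, not TREPAN's internals.

```python
def successors(test, candidate_literals):
    """TREPAN's two search operators applied to an m-of-n test:
      m of n  ->  m     of (n+1)   (add a literal)
      m of n  ->  (m+1) of (n+1)   (add a literal and raise the threshold)"""
    m, lits = test
    out = []
    for lit in candidate_literals:
        if lit not in lits:                 # only literals not already in the test
            out.append((m, lits + [lit]))
            out.append((m + 1, lits + [lit]))
    return out
```

Applied to the seed "2 of {x1, x3}" with candidate literal "x4 > 0.7", this yields exactly the first two transformations on the slide above.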
Slide 30: TREPAN / Search for best test
- Using these operators, a beam-search method is applied with a beam width of two. Both the seed test and its negated version are used as starting points.
- Example
  - Assume the best binary test is "1 of {x1}". Naturally, "1 of {not x1}" is of the same value.
  - If the target is "2 of {x1, x3, x7}", the first seed is to be used; if the target is "2 of {not x1, x3, x7}", the negated seed is needed.
Slide 31: TREPAN / Search for best test
- Information-gain criterion for tests
  - The test T is selected that maximizes the information gained about the class labels of the examples S:
    gain(T) = info(S) - info_T(S)
  - where info(S) is the information needed to classify the examples of S, which belong to classes C1, C2, ..., Ck:
    info(S) = - Σ_{i=1..k} (|S ∩ Ci| / |S|) · log2(|S ∩ Ci| / |S|)
Slide 32: TREPAN / Search for best test
- info_T(S) is the information needed after performing test T, which has outcomes i = 1, ..., n. The subset of examples that have the i-th outcome is Si:
    info_T(S) = Σ_{i=1..n} (|Si| / |S|) · info(Si)
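The two formulas translate directly into code; class distributions are given as count vectors (function names are illustrative).

```python
import math

def info(counts):
    """info(S): entropy in bits of a class distribution given as counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def gain(parent_counts, outcome_counts):
    """gain(T) = info(S) - info_T(S), where outcome_counts[i] is the
    class distribution of the subset Si reaching the i-th outcome of T."""
    n = sum(parent_counts)
    info_t = sum(sum(sub) / n * info(sub) for sub in outcome_counts)
    return info(parent_counts) - info_t
```

For a balanced two-class node, a test that splits the classes perfectly gains the full 1 bit, while a split that preserves the class ratio in both branches gains nothing.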
Slide 33: TREPAN / Search for best test
- Example: a neural network performs two-class classification. A decision tree is generated using TREPAN; when expanding the k-th node, the 1000 examples reaching that node have the following attribute distribution. Which of the possible 4 tests gives the highest gain?
Slides 34-37: TREPAN / Search for best test
[Worked gain computations for the four candidate tests; the test marked "-> BEST!" achieves the highest gain.]
Slide 38: TREPAN / Stopping criteria
- 4. Stopping criteria in the tree-expansion process
  - Local criterion: purity of the example set reaching the node. The node becomes a leaf if, with high probability, it covers only instances of a single class.
  - Global criterion: complexity of the decision tree, i.e. the complexity of the generated explanations. A limit on the number of nodes, the depth of the tree, etc.
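The local criterion can be sketched as a simple purity check; the 0.99 threshold below is illustrative, not TREPAN's exact probabilistic rule.

```python
def is_pure(class_counts, threshold=0.99):
    """Local stopping sketch: declare the node a leaf when one class
    accounts for at least `threshold` of the examples reaching it.
    The threshold value is an assumption for illustration."""
    return max(class_counts) / sum(class_counts) >= threshold
```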
Slide 39: TREPAN / Pruning the tree
- 5. Pruning the final decision tree
  - Purpose: to detect subtrees that predict the same class at all of their leaves, and to collapse each such subtree into a single leaf.
  - Other pruning methods could be used as well, but they are not built into TREPAN.
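The collapse step above can be written as a short bottom-up pass. The tuple representation of a tree used here is illustrative, not TREPAN's internal one.

```python
def prune(tree):
    """Collapse any subtree whose leaves all predict the same class
    into a single leaf.  A tree is either a class label (a leaf) or
    a tuple (test, left_subtree, right_subtree)."""
    if not isinstance(tree, tuple):
        return tree                          # already a leaf
    test, left, right = tree
    left, right = prune(left), prune(right)  # prune children first (bottom-up)
    if not isinstance(left, tuple) and left == right:
        return left                          # both branches predict the same class
    return (test, left, right)
```

Pruning children first lets agreement propagate upward, so a whole uniform subtree collapses to one leaf in a single pass.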