Title: Introduction to Decomposition Methods in Classification Models

1. Introduction to Decomposition Methods in Classification Models
- Lior Rokach and Oded Maimon
- Department of Industrial Engineering
- Tel-Aviv University
CsStat 2001 Haifa Winter Workshop on Computer Science and Statistics
2. Agenda
- Decomposition Concepts.
- Attribute Decomposition.
- The DOT Algorithm.
- Generalization Error of DOT.
- Benchmark Testing.
- Conclusions and Future Work.
3. Decomposition
- The purpose of decomposition methodology is to break a complex problem down into several smaller, less complex, and more manageable sub-problems that are solvable by existing tools, and then to join their solutions to obtain a solution to the original problem.
4. Decomposition Advantages
- Increased performance (classification accuracy).
- Reduced execution time.
- Clearer, more understandable results.
- Suitability for parallel or distributed computation.
- Ability to use different solution techniques for individual sub-problems.
- Modularity: easier maintenance and support of the evolutionary computation concept.
- Feasibility of solving large problems.
5. Issues in Decomposition
- What basic types of decomposition methods exist in supervised learning?
- Given a certain problem and a certain inducer, which decomposition method performs best?
- How should the sub-problems be recomposed to represent the original concept learning task?
- How can we utilize prior knowledge for decomposing the learning task?
6. Various Elementary Decomposition Approaches
7. How Is the Decomposition Structure Obtained?
- Manually, based on an expert's knowledge of a specific domain (Michie, 1995).
- Arbitrarily (Domingos, 1995).
- Due to some restriction (e.g., distributed learning).
- Induced by a suitable algorithm (Zupan, 1997).
8. Mutually Exclusive or Partially Overlapping?
- Mutually exclusive (pure) decomposition forms a restriction on the problem space.
- However, mutually exclusive decomposition:
- has a greater tendency to reduce execution time;
- yields smaller models, with better comprehensibility and easier maintenance of the solution;
- helps avoid some of the error-correlation problems that characterize non-mutually-exclusive decompositions.
9. Our Long-Term Goal
Develop a meta-algorithm that recursively
decomposes a classification problem using
elementary decomposition methods.
10. Illustrative Example
11. Attribute Decomposition
Model using the attributes Ownership and Volume:
if Ownership = House then Gold
if Ownership = Tenement and Volume > 1000 and Volume <= 1300 then None
if Ownership = None then Silver
if Ownership = Tenement and (Volume <= 1000 or Volume > 1300) then Silver
Model using the attributes Employment and Education:
if Employment = Employee and Education > 12 then Silver
if Employment = Employee and Education <= 12 then None
if Employment = Self then Gold
if Employment = None then Silver
12. Sample Decomposition
Model induced from the first half of the training set:
if Employment = Self then Gold
if Volume <= 1100 then None
if Volume > 1100 and Employment = Employee then Silver
if Employment = None then Silver
Model induced from the second half of the training set:
if Employment = Self then Gold
if Education <= 12 then None
if Education > 12 and Employment = Employee then Silver
if Employment = None then Silver
13. Space Decomposition
Model induced for Education > 15:
if Volume > 1000 then Gold
if Volume <= 1000 then Silver
Model induced for Education <= 15:
if Employment = Employee and Ownership = House then None
if Employment = Employee and Ownership = None then Silver
if Employment = Employee and Ownership = Tenement and Volume <= 1300 then None
if Employment = Employee and Ownership = Tenement and Volume > 1300 then Silver
if Employment = None then Silver
if Employment = Self then Gold
14. Concept Aggregation Decomposition
Initially we check whether the customer is willing to purchase:
if Ownership = Tenement and Volume ... and Employment = Employee then No, else Yes
Then, what type of insurance:
if Employment = Self then Gold
if Ownership = House and Aggregation = Yes then Gold
if Employment = Employee then Silver
if Ownership = Tenement and Employment = None then Silver
if Ownership = None and Employment = None then Silver
15. Function Decomposition
First, a new intermediate concept named "wealth" is defined as follows.
Then the original concept is considered:
if Wealth = Rich then Gold
if Wealth = Poor and Employment = Employee then Silver
if Wealth = Poor and Employment = None then None
else Silver
16. Classification Using Attribute Decomposition
17. Simple Example (Illustrating the Concepts)
- A training set containing 20 examples was created with the following properties:
- Uniformly distributed.
- No noise.
- No irrelevant/redundant attributes.
19. Optimal Decision Tree
The minimal optimal tree (the solution is not unique).
Classification accuracy: 100%
20. Actual Decision Tree Generated by C4.5
Classification accuracy: 93.75%
21. Two Decision Trees Generated Using Attribute Decomposition
Classification accuracy: 100%
22. Naive Bayes in Attribute Decomposition
Terminology:
Classification accuracy: 68.75%
23. Notation
24. The Bayesian Approach for Classification Problems
25. The Bayesian Approach (cont.)
- Duda and Hart (1973) showed that the Bayesian classifier has the highest possible accuracy.
- The problem: it is hard to estimate the actual probability distribution.
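The slide's equations did not survive extraction; as a standard reference point (not necessarily the slide's own notation), the Bayes-optimal decision rule can be written as:

\[
c^{*}(x) \;=\; \arg\max_{c \in \mathrm{dom}(y)} P(y = c \mid x)
\;=\; \arg\max_{c \in \mathrm{dom}(y)} P(x \mid y = c)\, P(y = c)
\]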
26. Naïve/Simple Bayes
The well-known representation of Naïve Bayes.
A representation suitable for Attribute Decomposition (both shown below).
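The formulas themselves were lost in extraction. A minimal sketch of both representations, assuming the usual conditional-independence form for naïve Bayes and the per-subset combination described on slide 31 (the subset notation G_k is ours):

\[
v_{NB}(x) \;=\; \arg\max_{c}\; \hat{P}(y=c) \prod_{i=1}^{n} \hat{P}(a_i \mid y=c)
\]

\[
v(x) \;=\; \arg\max_{c}\; \hat{P}(y=c) \prod_{k=1}^{\omega} \frac{\hat{P}(y=c \mid x_{G_k})}{\hat{P}(y=c)}
\]

Here G_1, ..., G_omega are the mutually exclusive attribute subsets. With omega = n singleton subsets, the second form reduces to the first up to a class-independent factor.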
27. Justification for Using Naïve/Simple Bayes
- Suitable for Attribute Decomposition.
- Understandable.
- Despite its simplicity, it tends (in many cases) to outperform more complicated methods such as Decision Trees or Neural Networks.
28. Why? The Bias-Variance Tradeoff
- The bias is the persistent or systematic error that the learning algorithm is expected to make.
- The variance captures random variation in the algorithm from one training set to another, due to noise in the training data or to random behavior in the learning algorithm itself.
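The slide gives these definitions in words only; for squared loss, the standard decomposition behind them reads (classification analogues exist but differ in details):

\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big] \;=\;
\underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
\;+\; \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
\;+\; \sigma^2
\]

where sigma^2 is the irreducible noise.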
29. Bias-Variance
- Simple methods (like Naïve Bayes) tend to have high bias error and low variance error.
- Complex methods (like Decision Trees) tend to have low bias error and high variance error.
30. Bias-Variance Tradeoff
[Chart comparing "Attribute Decomposition" against the "Optimal" tradeoff point.]
Attribute Decomposition can be better, but one needs to find the right decomposition.
31. Attribute Decomposition Approach with Simple Bayesian Combination
- Decompose the original input attribute set into mutually exclusive subsets.
- Run an inducer on the training data for each subset independently.
- Combine the generated models with naïve Bayes (a sketch follows below).
32. Generalization Error
- Let h represent the model generated by inducer I on training set S.
- The generalization error of the model h is the probability that h misclassifies an instance drawn from the labeled instance space according to the distribution D.
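In symbols (standard notation; the slide's own formula was lost):

\[
\varepsilon(h) \;=\; \Pr_{(x,y)\sim D}\big[\, h(x) \neq y \,\big]
\]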
33. Problem Definition
Note: this is an extension of the feature selection problem when ...
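The formal statement did not survive extraction; a hedged reconstruction from slides 31-32 (the symbols Z, G_k, and omega are ours): find the mutually exclusive decomposition whose combined model has minimal generalization error,

\[
Z^{*} \;=\; \arg\min_{Z=\{G_1,\dots,G_\omega\}} \;\varepsilon\big( NB(h_{G_1},\dots,h_{G_\omega}) \big)
\]

Under this reading, feature selection would be the special case in which a single subset is kept and the remaining attributes are discarded.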
34. Attribute Decomposition
Problem: it is very hard to find the optimal decomposition (NP-hard).
Conclusion: we need a heuristic algorithm.
35. Definition: Complete Equivalence
36. Lemma 1: A Sufficient Condition
37. Lemma 2: The k-CNF Problem
38. Lemma 3: The XOR Problem
39. Oblivious Decision Tree
(In an oblivious decision tree, all nodes at a given level test the same attribute.)
41. Generalization Error Using the VC Dimension
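The bound shown on this slide was lost in extraction; a standard VC-style bound of the kind used here (Vapnik, 1995) states that, with probability 1 - delta over a sample of size m,

\[
\varepsilon(h) \;\le\; \hat{\varepsilon}(h) \;+\;
\sqrt{\frac{d\left(\ln\frac{2m}{d}+1\right) + \ln\frac{4}{\delta}}{m}}
\]

where d is the VC dimension of the hypothesis class and \hat{\varepsilon}(h) is the training error.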
42. Generalization Error of DOT
43. Generalization Error of DOT (cont.)
- We use the estimate of the generalization error to decide whether adding a certain attribute to a certain subset improves the entire decomposition (see the sketch after this list).
- Our experiments have shown that using the lower bound of the VC dimension is more reliable.
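A hedged sketch of this greedy use of the error estimate; estimate_error stands in for the paper's VC-based bound, and the exact search order is our assumption, not necessarily DOT's:

    def grow_decomposition(attributes, estimate_error):
        """Greedily place each attribute where the estimated generalization
        error of the whole decomposition (e.g., a VC lower-bound estimate)
        improves the most; attributes that never help are dropped."""
        subsets, best = [], estimate_error([])
        for a in attributes:
            best_trial, best_err = None, best
            # candidate moves: add `a` to an existing subset, or open a new one
            trials = [subsets[:i] + [subsets[i] + [a]] + subsets[i+1:]
                      for i in range(len(subsets))] + [subsets + [[a]]]
            for trial in trials:
                err = estimate_error(trial)
                if err < best_err:
                    best_trial, best_err = trial, err
            if best_trial is not None:      # keep `a` only if the bound improves
                subsets, best = best_trial, best_err
        return subsets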
44. Artificial Dataset I
- A fabricated dataset with four input attributes that form two independent groups given the target attribute.
- The aim of this problem is to check whether the algorithm can identify the groups and reduce the error rate.
45. Artificial Dataset I: Results (Lemma 1)
46. Artificial Dataset II: k-CNF (Lemma 2)
47. UCI Repository Results
48. The Link between Error Reduction and Complexity
49. Main Result
- Attribute Decomposition contributes to classification accuracy.
50. Conclusion: Advantages
- Increases classification accuracy.
- Decreases model complexity.
- Enables effective treatment of databases with high dimensionality.
51. Conclusion: Disadvantages
- Deriving if-then rules is difficult.
- Potential loss of complex models.