Title: Introduction of Decomposition Methods in Classification Models


1
Introduction of Decomposition Methods in
Classification Models
  • Lior Rokach and Oded Maimon
  • Department of Industrial Engineering
  • Tel-Aviv University

CsStat 2001 Haifa Winter Workshop on Computer
Science and Statistics
2
Agenda
  • Decomposition Concepts.
  • Attribute Decomposition.
  • DOT Algorithm.
  • Generalization Error of DOT.
  • Benchmark Testing.
  • Conclusions and Future Work.

3
Decomposition
  • The purpose of decomposition methodology is to
    break a complex problem down into several
    smaller, less complex, and more manageable
    sub-problems that are solvable by existing
    tools, and then to join their solutions to
    obtain a solution to the original problem.

4
Decomposition Advantages
  • Increased performance (classification accuracy)
  • Reduced execution time
  • Clearer, more understandable results
  • Suitability for parallel or distributed
    computation
  • Ability to use different solution techniques for
    individual sub-problems
  • Modularity: easier maintenance and support of
    the evolutionary computation concept
  • Feasibility of large problems

5
Issues in Decomposition
  • What basic types of decomposition methods exist
    in supervised learning?
  • Given a certain problem and a certain inducer,
    which decomposition method performs best?
  • How should the sub-problems be recomposed to
    represent the original concept?
  • How can we utilize prior knowledge for
    decomposing the learning task?

6
Various Elementary Decomposition Approaches
7
How is the decomposition structure obtained?
  • Manually, based on an expert's knowledge of a
    specific domain (Michie, 1995).
  • Arbitrarily (Domingos, 1995).
  • Imposed by external restrictions (e.g.,
    distributed learning).
  • Induced by a suitable algorithm (Zupan, 1997).

8
Mutually Exclusive or Partially Overlapping?
  • Mutually exclusive (pure) decomposition forms a
    restriction on the problem space.
  • However:
  • Mutually exclusive decomposition has a greater
    tendency to reduce execution time.
  • Smaller models mean better comprehensibility and
    easier maintenance of the solution.
  • It helps avoid some of the error-correlation
    problems that characterize non-mutually-exclusive
    decompositions.

9
Our Long-Term Goal
Develop a meta-algorithm that recursively
decomposes a classification problem using
elementary decomposition methods.
10
Illustrative Example
11
Attribute Decomposition
Using the attributes Ownership and Volume:
  if Ownership == House then -> Gold
  if Ownership == Tenement and Volume >= 1000 and Volume < 1300 then -> None
  if Ownership == None then -> Silver
  if Ownership == Tenement and (Volume < 1000 or Volume >= 1300) then -> Silver
Using the attributes Employment and Education:
  if Employment == Employee and Education >= 12 then -> Silver
  if Employment == Employee and Education < 12 then -> None
  if Employment == Self then -> Gold
  if Employment == None then -> Silver
12
Sample Decomposition
Model induced from the first half:
  if Employment == Self then -> Gold
  if Volume < 1100 then -> None
  if Volume >= 1100 and Employment == Employee then -> Silver
  if Employment == None then -> Silver
Model induced from the second half:
  if Employment == Self then -> Gold
  if Education < 12 then -> None
  if Education >= 12 and Employment == Employee then -> Silver
  if Employment == None then -> Silver
13
Space Decomposition
Model induced for Education > 15:
  if Volume >= 1000 then -> Gold
  if Volume < 1000 then -> Silver
Model induced for Education <= 15:
  if Employment == Employee and Ownership == House then -> None
  if Employment == Employee and Ownership == None then -> Silver
  if Employment == Employee and Ownership == Tenement and Volume < 1300 then -> None
  if Employment == Employee and Ownership == Tenement and Volume >= 1300 then -> Silver
  if Employment == None then -> Silver
  if Employment == Self then -> Gold
14
Concept Aggregation Decomposition
Initially we check whether the customer is willing
to purchase:
  if Ownership == Tenement and Volume < 1300 and Employment == Employee then -> No
  else -> Yes
Then, what type of insurance:
  if Employment == Self then -> Gold
  if Ownership == House and Aggregation == Yes then -> Gold
  if Employment == Employee then -> Silver
  if Ownership == Tenement and Employment == None then -> Silver
  if Ownership == None and Employment == None then -> Silver
15
Function Decomposition
First, a new concept named "wealth" is defined as
follows.
Then the original concept is considered:
  if Wealth == Rich then -> Gold
  if Wealth == Poor and Employment == Employee then -> Silver
  if Wealth == Poor and Employment == None then -> None
  else -> Silver
16
Classification Using Attribute Decomposition
17
Simple Example (Illustrating the Concepts)
  • A training set containing 20 examples, created
    according to:
  • Uniformly distributed.
  • No noise.
  • No irrelevant/redundant attributes.

18
(No Transcript)
19
Optimal Decision Tree
Minimal optimal tree: non-unique solution.
Classification Accuracy: 100%
20

Actual Decision Tree Generated by C4.5
Classification Accuracy: 93.75%
21
Two Decision Trees Generated Using Attribute
Decomposition
Classification Accuracy: 100%
22
Naive Bayes in Attribute Decomposition
Terminology:
Classification Accuracy: 68.75%
23
Notation
24
The Bayesian Approach for Classification Problems
25
The Bayesian Approach (cont.)
  • Duda and Hart (1973) showed that the Bayesian
    classifier has the highest possible accuracy
    (formalized below).
  • The problem: it is hard to estimate the actual
    probability distribution.

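Formally, the Bayes-optimal rule and its error are
(standard statements; the slide's own formulas are
images):

  h^{*}(x) = \arg\max_{c} P(c \mid x),
  \qquad
  \varepsilon^{*} = \mathbb{E}_{x}\big[\, 1 - \max_{c} P(c \mid x) \,\big]

No classifier can achieve error lower than
\varepsilon^{*}.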
26
Naïve/Simple Bayes
The well-known representation of Naïve Bayes:
A representation suitable for Attribute
Decomposition:

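For reference, the standard Naïve Bayes rule and
its equivalent posterior-ratio form are

  \hat{c}(x) = \arg\max_{c} P(c) \prod_{i=1}^{n} P(a_i \mid c)
             = \arg\max_{c} P(c) \prod_{i=1}^{n} \frac{P(c \mid a_i)}{P(c)}

and the grouped form we read into the
attribute-decomposition setting (an assumption
consistent with slide 31, where per-subset models
are combined naïve-Bayes style) is

  \hat{c}(x) = \arg\max_{c} P(c) \prod_{k=1}^{g} \frac{P(c \mid \mathbf{x}_{G_k})}{P(c)}

where G_1, ..., G_g are the mutually exclusive
attribute subsets and P(c | x_{G_k}) comes from the
model trained on subset k.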
27
Justification for Using Naïve/Simple Bayes
  • Suitable for Attribute Decomposition.
  • Understandable.
  • Despite its simplicity, it tends (in many cases)
    to outperform more complicated methods such as
    decision trees or neural networks.

28
Why? Bias-Variance Tradeoff
  • The bias is the persistent or systematic error
    that the learning algorithm is expected to make.
  • The variance captures random variation of the
    algorithm's output from one training set to
    another, due to noise in the training data or to
    random behavior in the learning algorithm itself
    (made precise for squared loss below).

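For squared loss the bias-variance decomposition is
an exact identity (a standard result; the slides'
classification setting uses an analogous split):

  \mathbb{E}\big[(y - \hat{f}(x))^2\big]
      = \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2
      + \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]
      + \sigma^2

i.e., squared bias plus variance plus irreducible
noise \sigma^2, with expectations taken over
training sets and noise.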
29
Bias-Variance
  • Simple methods (like Naïve Bayes) tend to have
    high bias error and low variance error.
  • Complex methods (like Decision Trees) tend to
    have low bias error and high variance error.

30
Bias-Variance Tradeoff
(Chart: bias-variance tradeoff, with points labeled
"Attribute Decomposition" and "Optimal")
Attribute Decomposition can be better, but we need
to find the right one.
31
Attribute Decomposition Approach with Simple
Bayesian Combination
  • Decomposing the original input attribute set
    into mutually exclusive subsets.
  • Running an inducer on the training data of each
    subset independently.
  • Combining the generated models with naïve Bayes
    (a minimal sketch follows).

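A minimal sketch of these three steps in Python,
assuming scikit-learn-style estimators; the fixed
partition passed as `subsets` is illustrative (the
algorithm searches for a partition rather than
receiving one), and CategoricalNB is merely a
convenient per-subset inducer:

    import numpy as np
    from sklearn.naive_bayes import CategoricalNB

    def fit_decomposed(X, y, subsets):
        """Fit one classifier per mutually exclusive attribute subset."""
        models = [(cols, CategoricalNB().fit(X[:, cols], y)) for cols in subsets]
        classes, counts = np.unique(y, return_counts=True)
        priors = counts / counts.sum()            # class priors P(c)
        return models, classes, priors

    def predict_decomposed(X, models, classes, priors):
        """Naive-Bayes combination of the per-subset posteriors:
        score(c) = P(c)^(1 - g) * prod_k P(c | x_Gk)."""
        g = len(models)
        log_score = np.tile((1 - g) * np.log(priors), (X.shape[0], 1))
        for cols, model in models:
            log_score += model.predict_log_proba(X[:, cols])
        return classes[np.argmax(log_score, axis=1)]

    # Usage (integer-coded attributes; test categories must appear in training):
    # models, classes, priors = fit_decomposed(X_train, y_train, [[0, 1], [2, 3]])
    # y_hat = predict_decomposed(X_test, models, classes, priors)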
32
Generalization Error
  • Let h represent the model generated by inducer
    I on S.
  • The generalization error of the model h is the
    probability that h misclassifies an instance
    drawn according to the distribution D over the
    labeled instance space (in symbols, below).

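In symbols (a standard formalization of the slide's
wording):

  \varepsilon(h) = \Pr_{(x, y) \sim D}\big[\, h(x) \neq y \,\big]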
33
Problem Definition
Note: This is an extension of the feature selection
problem when:
34
Attribute Decomposition
Problem: It is very hard to find the optimal
decomposition (NP-hard).
Conclusion: We need a heuristic algorithm.
35
Definition: Complete Equivalence
36
Lemma 1: Sufficient Condition
37
Lemma 2: The k-CNF Problem
38
Lemma 3: The XOR Problem
39
Oblivious Decision Tree (a decision tree in which
all nodes at the same level test the same attribute)
40
(No Transcript)
41
Generalization Error using VC Dimension
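The bound on the slide is an image; for
orientation, the classical VC bound states that,
with probability at least 1 - \delta over a sample
of size m,

  \varepsilon(h) \le \hat{\varepsilon}(h)
      + \sqrt{ \frac{ d \big( \ln(2m/d) + 1 \big) + \ln(4/\delta) }{ m } }

where d is the VC dimension of the hypothesis class
and \hat{\varepsilon}(h) is the training error. The
specific bound derived for oblivious decision trees
in DOT may differ in form.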
42
Generalization Error of DOT
43
Generalization Error of DOT
  • We use the estimate of the generalization error
    to decide whether adding a certain attribute to
    a certain subset improves the entire
    decomposition.
  • Our experiments have shown that using the lower
    bound of the VC-Dimension is more reliable (a
    schematic sketch of this decision loop follows).

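A schematic sketch of the greedy search this
implies, in the spirit of the slide rather than a
reconstruction of DOT itself; `error_bound` is a
stand-in callable (e.g., the VC-based lower-bound
estimate, or validation error of the decomposed
classifier sketched earlier):

    def greedy_decompose(X, y, n_attrs, error_bound):
        """Greedy attribute assignment: an attribute joins an existing subset
        (or opens a new one) only if that lowers the estimated generalization
        error of the whole decomposition; attributes that never help are left
        out, which yields a feature-selection effect."""
        subsets = []
        best = error_bound(X, y, subsets)
        for a in range(n_attrs):
            # Candidate placements: extend each existing subset, or open a new one.
            candidates = [subsets[:i] + [subsets[i] + [a]] + subsets[i + 1:]
                          for i in range(len(subsets))]
            candidates.append(subsets + [[a]])
            bound, cand = min(((error_bound(X, y, c), c) for c in candidates),
                              key=lambda t: t[0])
            if bound < best:
                best, subsets = bound, cand
        return subsets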
44
Artificial Dataset I
  • A fabricated dataset with four input attributes
    that constitute two independent groups given the
    target attribute.
  • The aim of this problem is to check whether the
    algorithm can identify the groups and reduce the
    error rate.

45
Artificial Dataset I: Results (Lemma 1)
46
Artificial Dataset II: k-CNF (Lemma 2)
47
UCI Repository Results
48
The Link Between Error Reduction and Complexity
49
Main Result
  • Attribute Decomposition contributes to
    classification accuracy.

50
Conclusion: Advantages
  • Increased classification accuracy.
  • Decreased model complexity.
  • Enables effective treatment of databases with
    high dimensionality.

51
Conclusion: Disadvantages
  • Developing if-then rules is difficult.
  • Potential loss of complex models.