
1
BOF Trees Diagram as a Visual Way to Improve
Interpretability of Tree Ensembles
  • Vesna Luzar-Stiffler, Ph.D.
  • University Computing Centre, and CAIR Research
    Centre, Zagreb, Croatia
  • Charles Stiffler, Ph.D.
  • CAIR Research Centre, Zagreb, Croatia
  • vluzar_at_srce.hr, charles.stiffler_at_cair-center.hr

2
Outline
  • Introduction/Background
  • Trees
  • Ensemble Trees
  • Visualization Tools
  • Simulation Results
  • Web Survey Results
  • Conclusions/Recommendations

3
Introduction / Background
  • Classification / Decision Trees
  • Data mining (statistical learning) method for
    classification
  • Invented twice:
  • Statistical community: Breiman, Friedman, et al.
    (1984)
  • Machine Learning community: Quinlan (1986)
  • Many positive features:
  • Interpretability, ability to handle data of mixed
    type and missing values, robustness to outliers,
    etc.
  • Disadvantage:
  • Unstable vis-à-vis seemingly minor data
    perturbations → low predictive power

4
Introduction / Background
  • Possible improvements: Ensembles
  • Bagging, i.e., bootstrapping trees (Breiman, 1996)
  • Boosting, e.g., AdaBoost (Freund & Schapire,
    1997)
  • Random Forests (Breiman, 2001)
  • Stacking, randomized trees, etc.
  • Advantage:
  • Improved prediction
  • Disadvantage:
  • Loss of interpretability (black box)

5
Classification Tree
  • Let $\hat{f}(x)$
  • be the classification tree prediction at input x
    obtained from the full training data
  • $Z = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$

6
Bagging Classification Tree
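  • Let $\hat{f}_b(x)$
  • be the classification tree prediction at input x
    obtained from the bootstrap sample $Z_b$, $b = 1, 2, \dots, B$.
  • Bagging estimate: the predictions of the B trees
    combined, e.g., by majority vote
  • [Figure: bootstrap trees 1, 2, ..., B]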
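The bagging step can be illustrated with a minimal sketch. It assumes a depth-1 tree (a "stump") as a stand-in for a full classification tree and a majority vote as the combining rule; the function names are hypothetical and the code is not taken from the presentation.

```python
import random
from collections import Counter


def majority(labels, default=0):
    """Most frequent label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y):
    """Depth-1 tree: pick the (variable, threshold) with the fewest training errors."""
    overall = majority(y)
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            cl, cr = majority(left, overall), majority(right, overall)
            errors = sum(yi != cl for yi in left) + sum(yi != cr for yi in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, cl, cr)
    _, j, t, cl, cr = best
    return j, t, cl, cr


def predict_stump(stump, x):
    j, t, cl, cr = stump
    return cl if x[j] <= t else cr


def bagging_fit(X, y, B=25, seed=0):
    """Fit one stump per bootstrap sample Z_b, b = 1, ..., B."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n cases with replacement
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def bagging_predict(stumps, x):
    """Bagging estimate at x: the class with the most votes among the B stumps."""
    votes = Counter(predict_stump(s, x) for s in stumps)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]
stumps = bagging_fit(X, y, B=25)
```

Calling `bagging_predict(stumps, [0.0])` then returns the voted class for a new case. A real analysis would replace the stump with a full tree learner.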
7
Visualization tools
  • Graphs based on predictor importances
  • (B × p) matrix F (p = number of predictors)
  • For bagged trees, we take the average
  • Diagram 1: importance mean bar chart
  • Diagram 2 (BOF Clusters): the cluster means
    chart (NEW)
  • Diagram 3 (BOF MDPREF): the multidimensional
    preference bi-plot (NEW)
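As a rough sketch of the quantities behind Diagrams 1 and 2, the code below takes a (B × p) importance matrix F, computes the per-predictor means, and clusters the trees (rows of F) to obtain cluster means. The clustering method is not stated on the slide, so plain k-means with a naive initialization is assumed here. The values in `F` are invented for illustration, and the MDPREF bi-plot is not covered.

```python
def column_means(rows):
    """Mean of each column; for F this gives the mean importance per predictor."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def kmeans(rows, k, iters=20):
    """Plain k-means on the rows (assumed clustering method, naive init: first k rows)."""
    centers = [list(r) for r in rows[:k]]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            d = [sum((a - b) ** 2 for a, b in zip(r, c)) for c in centers]
            groups[d.index(min(d))].append(r)
        # Each center becomes the mean of its group; empty groups keep their center.
        centers = [column_means(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical importance matrix: B = 4 bootstrap trees, p = 3 predictors.
F = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.2, 0.8, 0.0],
]

mean_importance = column_means(F)        # values for a mean importance bar chart
cluster_means, clusters = kmeans(F, k=2)  # values for a cluster means chart
```

In this toy case the mean importance makes the first two predictors look equally useful, while the cluster means show two groups of trees that each rely mainly on one of them.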

8
Visualization tools
  • Graphs based on the proximity (n × n) matrix P (n =
    number of cases)
  • Diagram 4 (Proximity Clusters): the cluster
    means chart (Breiman, 2002)
  • Diagram 5 (Proximity MDS): the
    multidimensional scaling plot of similar cases
    (Breiman, 2002)
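One common way to build such a proximity matrix, assumed here, is to count how often two cases fall into the same terminal node across the B trees and divide by B. The sketch takes the terminal-node labels as given; the `leaves` values are invented, and the function name is hypothetical.

```python
def proximity_matrix(leaves):
    """leaves[b][i] is the terminal-node id of case i in tree b.

    Returns the n x n matrix P where P[i][j] is the fraction of trees
    in which cases i and j land in the same terminal node.
    """
    B = len(leaves)
    n = len(leaves[0])
    counts = [[0] * n for _ in range(n)]
    for tree_leaves in leaves:
        for i in range(n):
            for j in range(n):
                if tree_leaves[i] == tree_leaves[j]:
                    counts[i][j] += 1
    return [[c / B for c in row] for row in counts]


# Hypothetical terminal-node labels: B = 2 trees, n = 3 cases.
leaves = [
    [0, 0, 1],  # tree 1: cases 0 and 1 share a node
    [0, 1, 1],  # tree 2: cases 1 and 2 share a node
]
P = proximity_matrix(leaves)
```

For clustering or multidimensional scaling, a dissimilarity such as 1 − P[i][j] would typically be passed on to the chosen routine.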

9
Simulation experiments
  • S1
  • Generate a sample of size n = 30,
  • two classes, and
  • p = 5 variables (x1-x5), with a standard normal
    distribution and pair-wise correlation 0.95.
  • The responses are generated according to
  • $\Pr(Y = 1 \mid x_1 \le 0.5) = 0.2$, $\Pr(Y = 1 \mid x_1 > 0.5) = 0.8$.
  • S2
  • Generate a sample of size n = 30,
  • two classes, and
  • p = 5 variables (x1-x5), with a standard normal
    distribution and pair-wise correlation 0.95
    between x1 and x2, and 0 among other predictors.
  • The responses are generated according to
  • $\Pr(Y = 1 \mid x_1 \le 0.5) = 0.2$, $\Pr(Y = 1 \mid x_1 > 0.5) = 0.8$.
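The two designs can be reproduced with a short sketch. It assumes a shared-factor construction for the correlated predictors (each correlated variable is sqrt(rho) times a common normal draw plus sqrt(1 − rho) times its own draw), which gives unit variance and pairwise correlation rho. The function name, arguments, and seed are hypothetical choices.

```python
import math
import random


def simulate(n=30, p=5, rho=0.95, correlated=None, seed=0):
    """Draw n cases with p standard-normal predictors and a binary response.

    correlated: indices of the predictors that share pairwise correlation rho
    (None means all p, as in S1; [0, 1] means only x1 and x2, as in S2).
    The remaining predictors are independent of everything else.
    """
    rng = random.Random(seed)
    shared = set(range(p)) if correlated is None else set(correlated)
    X, y = [], []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)  # common factor that induces the correlation
        row = []
        for j in range(p):
            e = rng.gauss(0.0, 1.0)
            if j in shared:
                # variance rho + (1 - rho) = 1, covariance rho with other shared columns
                row.append(math.sqrt(rho) * z0 + math.sqrt(1.0 - rho) * e)
            else:
                row.append(e)
        prob = 0.2 if row[0] <= 0.5 else 0.8  # Pr(Y = 1 | x1)
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y


X_s1, y_s1 = simulate()                    # S1: all five predictors correlated
X_s2, y_s2 = simulate(correlated=[0, 1])   # S2: only x1 and x2 correlated
```

In both designs only x1 drives the response; S1 makes every predictor a near-substitute for x1, while S2 leaves x2 as the only substitute.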

10
Diagram 1, Mean importance (panels: S1, S2)

11
Diagram 2, BOF Clusters (panels: S1, S2)

12
Diagram 3, BOF MDPREF (panels: S1, S2)

13
Diagram 4, Proximity Clusters (panels: S1, S2)

14
Web Survey data
  • ICT infrastructure/usage in Croatian primary and
    secondary schools
  • 25,000 teachers (cases)
  • 200 variables
  • Response: classroom use of a computer by
    educators (yes/no)
  • Partition:
  • 50% training
  • 25% validation
  • 25% test
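A partition of this kind could be produced as in the sketch below, which assumes a simple random split of case indices. The slide does not say how the split was drawn, so the shuffling, the seed, and the function name are assumptions.

```python
import random


def partition(n_cases, fractions=(0.50, 0.25, 0.25), seed=0):
    """Shuffle case indices and cut them into training, validation, and test lists."""
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    n_train = int(fractions[0] * n_cases)
    n_valid = int(fractions[1] * n_cases)
    # Whatever remains after the first two cuts goes to the test set.
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


train, valid, test = partition(25000)
```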

15
Initial tree (before bagging)

16
Diagram 1, Mean importance

17
Diagram 2, BOF Clusters

18
Diagram 3, BOF MDPREF

19
Bootstrap tree 11

20
Bootstrap tree 22

21
Bootstrap tree 12

22
Clustering trees

23
Diagram 5, Proximity MDS

24
Conclusions/Recommendations
  • Software exists for trees
  • Some software exists for tree ensembles
  • Some visualization tools exist (old and new)
  • The problem is
  • they are not interfaced (integrated)