Title: Prediction Cubes
1. Prediction Cubes
- Bee-Chung Chen, Lei Chen, Yi Lin and Raghu Ramakrishnan
- University of Wisconsin - Madison
2. Big Picture
- Subset analysis: Use models to identify interesting subsets
- Cube space: Dimension hierarchies
  - A combination of dimension-attribute values defines a candidate subset (like regular OLAP)
- Measure of interest for a subset: Decision/prediction behavior within the subset
  - Big difference from regular OLAP!
3. Example (1/5): Traditional OLAP
Goal: Look for patterns of unusually high numbers of applications.
4. Example (2/5): Decision Analysis
Goal: Analyze a bank's loan approval process w.r.t. two dimensions: Location and Time.
5. Example (3/5): Questions of Interest
- Goal: Analyze a bank's loan decision process with respect to two dimensions: Location and Time
- Target: Find discriminatory loan decisions
- Questions:
  - Are there locations and times when the decision making was similar to a set of discriminatory decision examples (or similar to a given discriminatory decision model)?
  - Are there locations and times during which approvals depended highly on Race or Sex?
6. Example (4/5): Prediction Cube
- Build a model using data from WI in Dec. 1985
- Evaluate that model
- Measure in a cell:
  - Accuracy of the model
  - Predictiveness of Race, measured based on that model
  - Similarity between that model and a given model
7. Example (5/5): Prediction Cube
Cell value: Predictiveness of Race
8. Model-Based Subset Analysis
- Given: A data table D with schema (Z, X, Y)
  - Z: Dimension attributes, e.g., Location, Time
  - X: Predictor attributes, e.g., Race, Sex
  - Y: Class-label attribute, e.g., Approval
9. Model-Based Subset Analysis
- Z: Dimension, X: Predictor, Y: Class
- Goal: To understand the relationship between X and Y on different subsets σ_Z(D) of the data D
  - Relationship: p(Y | X, σ_Z(D))
- Approach (a sketch in Python follows this list):
  - Build model h(X; σ_Z(D)) ≈ p(Y | X, σ_Z(D))
  - Evaluate h(X; σ_Z(D)): accuracy, model similarity, predictiveness
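A minimal Python sketch of this approach, assuming a pandas DataFrame with illustrative column names (Location, Time, Race, Sex, Approval) and numerically encoded predictors; the decision-tree learner is just one possible choice of h, not the deck's prescription:

```python
# Build one model per dimension-value combination (one per subset
# sigma_Z(D)) and keep them keyed by the dimension values.
# Column names and the learner are illustrative assumptions.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def models_per_subset(D: pd.DataFrame, dims, predictors, label):
    """Return {dimension values -> h(X; sigma_Z(D))}."""
    models = {}
    for dim_values, subset in D.groupby(dims):    # subset = sigma_Z(D)
        h = DecisionTreeClassifier()
        h.fit(subset[predictors], subset[label])  # h approximates p(Y | X, sigma_Z(D))
        models[dim_values] = h
    return models

# e.g., models = models_per_subset(D, ["Location", "Time"],
#                                  ["Race", "Sex"], "Approval")
```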
10. Outline
- Motivating example
- Definition of prediction cubes
- Efficient prediction cube materialization
- Experimental results
- Conclusion
11. Prediction Cubes
- User interface: OLAP data cubes
- Dimensions, hierarchies, roll up and drill down
- Values in the cells
- Accuracy
- Similarity
- Predictiveness
12. Prediction Cubes
- Three types of prediction cubes:
- Test-set accuracy cube
- Model-similarity cube
- Predictiveness cube
13. Test-Set Accuracy Cube
Given: Data table D and test set Δ
Example finding: The decision model of the USA during Dec 04 has high accuracy when applied to Δ.
14. Model-Similarity Cube
Given: Data table D, model h0(X), and test set Δ without labels
Example finding: The loan decision process in the USA during Dec 04 is similar to a discriminatory decision model.
15. Predictiveness Cube
Given: Data table D, attributes V, and test set Δ without labels
For each cell (at level Country, Month), build models h(X − V) and h(X) on the cell's data; the cell value is the predictiveness of V.
Example finding: Race is an important factor of the loan approval decision in the USA during Dec 04.
16. Outline
- Motivating example
- Definition of prediction cubes
- Efficient prediction cube materialization
- Experimental results
- Conclusion
17. Roll Up and Drill Down
18. Full Materialization
[Figure: the full-materialization table over granularity levels such as (All, All), (All, Year), (Country, All) and (Country, Year)]
19. Bottom-Up Data Cube Computation
Cell values: Numbers of loan applications
20. Functions on Sets
- Bottom-up computable functions: Functions that can be computed using only summary information
- Distributive function: α(X) = F(α(X1), ..., α(Xn))
  - where X = X1 ∪ ... ∪ Xn and Xi ∩ Xj = ∅ for i ≠ j
  - E.g., Count(X) = Sum(Count(X1), ..., Count(Xn))
- Algebraic function: α(X) = F(G(X1), ..., G(Xn))
  - G(Xi) returns a fixed-length vector of values
  - E.g., Avg(X) = F(G(X1), ..., G(Xn)) with
    - G(Xi) = (Sum(Xi), Count(Xi))
    - F((s1, c1), ..., (sn, cn)) = Sum(si) / Sum(ci)
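A small Python illustration of both definitions (the function names are ours, not the paper's):

```python
# Count is distributive: combine per-partition counts directly.
# Avg is algebraic: each partition Xi contributes a fixed-length
# summary G(Xi) = (Sum(Xi), Count(Xi)), and F combines the summaries.

def count_distributive(partitions):
    # Count(X) = Sum(Count(X1), ..., Count(Xn))
    return sum(len(p) for p in partitions)

def avg_algebraic(partitions):
    summaries = [(sum(p), len(p)) for p in partitions]                  # G(Xi)
    return sum(s for s, _ in summaries) / sum(c for _, c in summaries)  # F

parts = [[1, 2, 3], [4, 5]]        # X = X1 ∪ X2, disjoint
assert count_distributive(parts) == 5
assert avg_algebraic(parts) == 3.0
```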
21. Scoring Function
- Conceptually, a machine-learning model h(X; S) is a scoring function Score(y, x; S) that gives each class y a score on test example x
  - h(x; S) = argmax_y Score(y, x; S)
  - Score(y, x; S) ≈ p(y | x, S)
  - S: A set of training examples
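In code, with `score` standing in for any trained scoring function (a hypothetical stand-in, not an API from the paper):

```python
# h(x; S) = argmax_y Score(y, x; S): pick the class with the top score.
def predict(score, x, classes):
    return max(classes, key=lambda y: score(y, x))
```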
22. Bottom-Up Score Computation
- Key observations:
  - Observation 1: Having the scores for each test example is sufficient to compute the value of a cell
    - The details depend on what each cell means (i.e., the type of prediction cube), but are straightforward
  - Observation 2: Fixing the class label y and the test example x, Score(y, x; S) is a function of the set S of training examples; if that function is distributive or algebraic, the bottom-up data cube technique can be applied directly
23. Algorithm
- Input: The dataset D and the test set Δ
- For each finest-grained cell, which contains data bi(D):
  - Build a model on bi(D)
  - For each x ∈ Δ and each class y, compute:
    - Score(y, x; bi(D)), if distributive
    - G(y, x; bi(D)), if algebraic
- Use the standard data cube computation technique to compute the scores in a bottom-up manner (by Observation 2)
- Compute the cell values using the scores (by Observation 1); a sketch follows.
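A hedged Python sketch of this procedure for the distributive case; `build_model`, `score`, and the aggregate `F` are placeholders for a concrete learner and are not named in the deck:

```python
# Step 1: train a model per base cell and score every (class, test example).
# Step 2: scores for any coarser cell sigma_S(D) are aggregated bottom-up
# from the base-cell scores (Observation 2); cell values then follow from
# the scores (Observation 1).

def materialize_scores(base_cells, test_set, classes, build_model, score, F):
    base_scores = {}
    for i, b_i in base_cells.items():               # b_i = b_i(D)
        h = build_model(b_i)
        base_scores[i] = {(y, j): score(h, y, x)
                          for j, x in enumerate(test_set) for y in classes}

    def cell_scores(S):                             # S = ids of base cells
        return {(y, j): F([base_scores[i][(y, j)] for i in S])
                for j in range(len(test_set)) for y in classes}
    return cell_scores
```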
24. Machine-Learning Models
- Naïve Bayes
  - Scoring function: algebraic
- Kernel-density-based classifier
  - Scoring function: distributive
- Decision tree, random forest
  - Neither distributive nor algebraic
- PBE: Probability-based ensemble (new)
  - Makes any machine-learning model distributive
  - An approximation
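To see why the Naïve Bayes scoring function is algebraic: its scores depend on the training set only through class counts and (attribute, value, class) counts, fixed-length summaries that simply add across base subsets. A minimal sketch (function names are ours):

```python
from collections import Counter

def nb_summary(records):            # G(b_i): sufficient statistics of one base subset
    class_counts, feature_counts = Counter(), Counter()
    for x, y in records:            # x is a dict: attribute -> value
        class_counts[y] += 1
        for a, v in x.items():
            feature_counts[(a, v, y)] += 1
    return class_counts, feature_counts

def nb_combine(summaries):          # F: component-wise sum of the summaries
    class_counts, feature_counts = Counter(), Counter()
    for cc, fc in summaries:
        class_counts.update(cc)
        feature_counts.update(fc)
    return class_counts, feature_counts
```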
25. Probability-Based Ensemble
[Figure: the PBE version of a decision tree on (WA, 85) vs. a decision tree trained directly on (WA, 85); each base model is a decision tree trained on a finest-grained cell]
26. Outline
- Motivating example
- Definition of prediction cubes
- Efficient prediction cube materialization
- Experimental results
- Conclusion
27. Experiments
- Quality of PBE on 8 UCI datasets
  - The quality of the PBE version of a model is slightly worse (0–6%) than the quality of the model trained directly on the whole training data
- Efficiency of the bottom-up score computation technique
- Case study on demographic data
28. Efficiency of the Bottom-Up Score Computation
- Machine-learning models:
  - J48: J48 decision tree
  - RF: Random forest
  - NB: Naïve Bayes
  - KDC: Kernel-density-based classifier
- Bottom-up method vs. exhaustive method
29. Synthetic Dataset
- Dimensions: Z1, Z2 and Z3
- Decision rule: [Figure: the decision rule, defined in terms of Z1 and Z2, and Z3]
30. Efficiency Comparison
[Figure: execution time (sec) vs. number of records, for the exhaustive method and for bottom-up score computation]
31. Conclusion
- Exploratory data analysis paradigm
- Models built on subsets
- Subsets defined by dimension hierarchies
- Meaningful subsets
- Precomputation
- Interactive analysis
32. Questions
33. Test-Set-Based Model Evaluation
- Given a set-aside test set Δ with schema (X, Y):
- Accuracy of h(X)
  - The percentage of examples in Δ that are correctly classified
- Similarity between h1(X) and h2(X)
  - The percentage of examples in Δ that are given the same class labels by h1(X) and h2(X)
- Predictiveness of V ⊆ X (based on h(X))
  - The difference between h(X) and h(X − V) measured on Δ, i.e., the percentage of examples in Δ that are predicted differently by h(X) and h(X − V)
A sketch of all three measures follows.
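A compact sketch of the three measures, representing each classifier as a plain function from x to a label (an illustration, not the paper's notation):

```python
def accuracy(h, test_set):
    # fraction of labeled test examples (x, y) with h(x) == y
    return sum(h(x) == y for x, y in test_set) / len(test_set)

def similarity(h1, h2, test_xs):
    # fraction of test examples given the same label by both models
    return sum(h1(x) == h2(x) for x in test_xs) / len(test_xs)

def predictiveness(h, h_minus_v, test_xs):
    # fraction predicted differently by h(X) and h(X - V)
    return 1 - similarity(h, h_minus_v, test_xs)
```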
34. Model Accuracy
- Test-set accuracy (TS-accuracy)
  - Given a set-aside test set Δ with schema (X, Y):
    accuracy(h(X; D); Δ) = (1/|Δ|) Σ_{(x,y)∈Δ} I(h(x; D) = y)
  - |Δ|: The number of examples in Δ
  - I(φ) = 1 if φ is true; otherwise, I(φ) = 0
- Alternative: Cross-validation accuracy
  - This will not be discussed further!
35. Model Similarity
- Prediction similarity (or distance)
  - Given a set-aside test set Δ with schema (X):
    similarity(h1(X), h2(X)) = (1/|Δ|) Σ_{x∈Δ} I(h1(x) = h2(x))
    distance(h1(X), h2(X)) = 1 − similarity(h1(X), h2(X))
- KL-distance between the estimated class distributions:
    KL-distance(h1(X), h2(X)) = (1/|Δ|) Σ_{x∈Δ} Σ_y p_h1(y | x) log( p_h1(y | x) / p_h2(y | x) )
  - p_hi(Y | X): Class probability estimated by hi(X)
36. Attribute Predictiveness
- Predictiveness of V ⊆ X (based on h(X))
  - PD-predictiveness: distance(h(X), h(X − V))
  - KL-predictiveness: KL-distance(h(X), h(X − V))
- Alternative: accuracy(h(X)) − accuracy(h(X − V))
  - This will not be discussed further!
37. Target Patterns
- Find subsets σ(D) such that h(X; σ(D)) has high prediction accuracy on a test set Δ
  - E.g., the loan decision process in WI in 2003 is similar to a set Δ of discriminatory decision examples
- Find subsets σ(D) such that h(X; σ(D)) is similar to a given model h0(X)
  - E.g., the loan decision process in WI in 2003 is similar to a discriminatory decision model h0(X)
- Find subsets σ(D) such that V is predictive on σ(D)
  - E.g., Race is an important factor of the loan approval decision in WI in 2003
38. Test-Set Accuracy
- We would like to discover:
  - The loan decision process in WI in 2003 is similar to a set of problematic decision examples
- Given:
  - Data table D: The loan decision dataset
  - Test set Δ: The set of problematic decision examples
- Goal:
  - Find subsets σ_{Loc,Time}(D) such that h(X; σ_{Loc,Time}(D)) has high prediction accuracy on Δ
39. Model Similarity
- We would like to discover:
  - The loan decision process in WI in 2003 is similar to a problematic decision model
- Given:
  - Data table D: The loan decision dataset
  - Model h0(X): The problematic decision model
- Goal:
  - Find subsets σ_{Loc,Time}(D) such that h(X; σ_{Loc,Time}(D)) is similar to h0(X)
40. Attribute Predictiveness
- We would like to discover:
  - Race is an important factor of the loan approval decision in WI in 2003
- Given:
  - Data table D: The loan decision dataset
  - Attribute V of interest: Race
- Goal:
  - Find subsets σ_{Loc,Time}(D) such that h(X; σ_{Loc,Time}(D)) is very different from h(X − V; σ_{Loc,Time}(D))
41. Dimension and Level
42. Example: Full Materialization
[Figure: full materialization from level (All, All) down to (City, Month)]
43. Bottom-Up Score Computation
- Base cells: The finest-grained cells in a cube
- Base subsets bi(D): The finest-grained data subsets
  - The subset of data records in a base cell is a base subset
- Properties:
  - D = ∪_i bi(D) and bi(D) ∩ bj(D) = ∅ for i ≠ j
  - Any subset σ_S(D) of D that corresponds to a cube cell is the union of some base subsets
- Notation: σ_S(D) = bi(D) ∪ bj(D) ∪ bk(D), where S = {i, j, k}
A sketch of forming base subsets follows.
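A tiny Python sketch of forming base subsets by grouping records on their finest-grained cell; the `key` function mapping a record to its cell id is hypothetical:

```python
from itertools import groupby

def base_subsets(D, key):
    # key(record) -> finest-grained cell id, e.g., (city, month);
    # the resulting groups partition D, as required above.
    records = sorted(D, key=key)
    return {cell: list(group) for cell, group in groupby(records, key=key)}
```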
44. Bottom-Up Score Computation
[Figure: the domain lattice]
Scores: Score(y, x; σ_S(D)) = F({Score(y, x; bi(D)) : i ∈ S})
Data subset: σ_S(D) = ∪_{i∈S} bi(D)
45. Decomposable Scoring Function
- Let σ_S(D) = ∪_{i∈S} bi(D)
  - bi(D) is a base (finest-grained) subset
- Distributively decomposable scoring function:
  - Score(y, x; σ_S(D)) = F({Score(y, x; bi(D)) : i ∈ S})
  - F is a distributive aggregate function
- Algebraically decomposable scoring function:
  - Score(y, x; σ_S(D)) = F({G(y, x; bi(D)) : i ∈ S})
  - F is an algebraic aggregate function
  - G(y, x; bi(D)) returns a fixed-length vector of values
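For intuition, the kernel-density-based classifier is distributively decomposable because its unnormalized score is a sum of kernel values over training examples, and a sum over the disjoint union σ_S(D) is just the sum of the per-subset sums. A toy sketch with an assumed exponential kernel:

```python
import math

def K(x, xi, bandwidth=1.0):          # toy kernel, an assumption
    return math.exp(-abs(x - xi) / bandwidth)

def kdc_score(y, x, subset):          # Score(y, x; b_i(D))
    return sum(K(x, xi) for xi, yi in subset if yi == y)

def kdc_score_union(y, x, base_subsets):
    # Score(y, x; sigma_S(D)) = sum over i of Score(y, x; b_i(D))
    return sum(kdc_score(y, x, b) for b in base_subsets)
```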
46. Probability-Based Ensemble
- Scoring function:
  - h(y | x; bi(D)): Model h's estimate of p(y | x, bi(D))
  - g(bi | x): A model that predicts the probability that x belongs to base subset bi(D)
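A minimal sketch of the combination rule as we read this slide: each base model's class-probability estimate is weighted by the probability that x came from that base subset. `h_scores` and `g_weights` stand in for the trained base models and the subset-membership model, and the renormalization over S is our assumption:

```python
def pbe_score(y, x, S, h_scores, g_weights):
    # Score_PBE(y, x; sigma_S(D)) =
    #   Sum_{i in S} h(y | x; b_i(D)) * g(b_i | x), renormalized over S
    z = sum(g_weights(i, x) for i in S)
    return sum(h_scores(i, y, x) * g_weights(i, x) for i in S) / z
```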
47. Optimality of PBE
- Score_PBE(y, x; σ_S(D)) = c · p(y | x, x ∈ σ_S(D)), given that the bi(D)'s partition σ_S(D)
48. Efficiency Comparison
49. Where the Time Is Spent
50. Accuracy of PBE
- Goals:
  - Compare PBE with the gold standard
    - PBE: A set of J48s/RFs, each of which is trained on a small partition of the whole dataset
    - Gold standard: A J48/RF trained on the whole data
  - Understand how the number of base classifiers in a PBE affects its accuracy
- Datasets: Eight UCI datasets
51. Accuracy of PBE
52. Accuracy of PBE
53. Accuracy of PBE
Error: The average absolute difference between a ground-truth cell value and a cell value computed by PBE