Prediction Cubes - PowerPoint PPT Presentation

Provided by: CISE9

Learn more at: https://www.cise.ufl.edu

Transcript and Presenter's Notes

Title: Prediction Cubes

1
Prediction Cubes

Bee-Chung Chen, Lei Chen,
Yi Lin and Raghu Ramakrishnan
University of Wisconsin - Madison

2
Subset Mining

We want to find interesting subsets of the
dataset
Interestingness: Defined by the model built on
a subset
Cube space: A combination of dimension attribute
values defines a candidate subset (just like
regular OLAP)
We want the measures to represent
decision/prediction behavior
Summarize a subset using the model built on it
Big change from regular OLAP!

3
The Idea

Build OLAP data cubes in which cell values
represent decision/prediction behavior
In effect, build a tree for each cell/region in
the cube; observe that this is not the same as a
collection of trees used in an ensemble method!
The idea is simple, but it leads to promising
data mining tools
Ultimate objective: Exploratory analysis of the
entire space of data mining choices
Choice of algorithms, data conditioning
parameters

4
Example (1/7): Regular OLAP
Z: Dimensions
Y: Measure
Goal: Look for patterns of unusually
high numbers of applications
5
Example (2/7): Regular OLAP
Goal: Look for patterns of unusually
high numbers of applications
Z: Dimensions
Y: Measure
Finer regions
6
Example (3/7): Decision Analysis
Goal: Analyze a bank's loan decision process
w.r.t. two dimensions: Location and Time
Fact table D
Z: Dimensions
X: Predictors
Y: Class
7
Example (3/7): Decision Analysis

Are there branches (and time windows) where
approvals were closely tied to sensitive
attributes (e.g., race)?
Suppose you partitioned the training data by
location and time, chose the partition for a
given branch and time window, and built a
classifier. You could then ask, "Are the
predictions of this classifier closely correlated
with race?"
Are there branches and times with decision making
reminiscent of 1950s Alabama?
Requires comparison of classifiers trained using
different subsets of data.

8
Example (4/7): Prediction Cubes

Build a model using data from USA in Dec., 1985
Evaluate that model

Measure in a cell:
Accuracy of the model
Predictiveness of Race, measured based on that
model
Similarity between that model and a given model

9
Example (5/7): Model-Similarity
Given:
- Data table D
- Target model h0(X)
- Test set Δ w/o labels
The loan decision process in USA during Dec 04
was similar to a discriminatory decision model
10
Example (6/7): Predictiveness
Given:
- Data table D
- Attributes V
- Test set Δ w/o labels
[Figure: models h(X) and h(X − V) are built from data table D at
level [Country, Month]; their Yes/No predictions on test set Δ
give the predictiveness of V]
Race was an important predictor of loan approval
decision in USA during Dec 04
11
Example (7/7): Prediction Cube
Cell value: Predictiveness of Race
12
Efficient Computation

Reduce prediction cube computation to data cube
computation
Represent a data-mining model as a distributive
or algebraic (bottom-up computable) aggregate
function, so that data-cube techniques can be
directly applied

13
Bottom-Up Data Cube Computation
Cell values: Numbers of loan applications
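
A minimal sketch of bottom-up computation for a count measure. It assumes a hypothetical two-level hierarchy (state rolls up to country, month rolls up to year); the cell keys, mappings and counts are made up for illustration and do not come from the slides.

```python
from collections import Counter

# Hypothetical base cells: (state, month) -> number of loan applications.
base_counts = {
    ("WA", "1985-11"): 10,
    ("WA", "1985-12"): 12,
    ("WI", "1985-12"): 7,
}

# Hypothetical dimension hierarchy: state -> country, month -> year.
state_to_country = {"WA": "USA", "WI": "USA"}


def month_to_year(month):
    return month[:4]


def roll_up(cells, map_location, map_time):
    """Aggregate finer cells into coarser ones by summing their counts."""
    coarser = Counter()
    for (location, time), count in cells.items():
        coarser[(map_location(location), map_time(time))] += count
    return dict(coarser)


# Roll up one dimension at a time, reusing the previous level's result
# instead of rescanning the raw records.
by_country_month = roll_up(base_counts, state_to_country.get, lambda m: m)
by_country_year = roll_up(by_country_month, lambda c: c, month_to_year)
```

Each coarser level is derived from the level below it, which is the property the prediction cube computation tries to reuse.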
14
Functions on Sets

Bottom-up computable functions: Functions that
can be computed using only summary information
Distributive function: f(X) = F(f(X1), …, f(Xn))
X = X1 ∪ … ∪ Xn and Xi ∩ Xj = ∅
E.g., Count(X) = Sum(Count(X1), …, Count(Xn))
Algebraic function: f(X) = F(G(X1), …, G(Xn))
G(Xi) returns a fixed-length vector of values
E.g., Avg(X) = F(G(X1), …, G(Xn))
G(Xi) = [Sum(Xi), Count(Xi)]
F([s1, c1], …, [sn, cn]) = Sum(si) / Sum(ci)
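
The two definitions can be checked on a small example. This is an illustrative sketch: the partitions and the helper names (`count`, `g`, `avg_from_summaries`) are assumptions chosen to mirror Count, G and F on the slide.

```python
def count(xs):
    """Distributive: the count of a union is the sum of the parts' counts."""
    return len(xs)


def g(xs):
    """G: fixed-length summary of one partition, here (sum, count)."""
    return (sum(xs), len(xs))


def avg_from_summaries(summaries):
    """F: combine per-partition (sum, count) pairs into the overall average."""
    total = sum(s for s, _ in summaries)
    n = sum(c for _, c in summaries)
    return total / n


# Disjoint partitions X1, X2, X3 of a made-up set X.
partitions = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
whole = [x for part in partitions for x in part]

count_bottom_up = sum(count(p) for p in partitions)
avg_bottom_up = avg_from_summaries([g(p) for p in partitions])
```

Avg is not distributive (an average of averages is wrong when partitions differ in size), but it becomes bottom-up computable once each partition keeps the pair (sum, count).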

15
Scoring Function

Represent a model as a function of sets
Conceptually, a machine-learning model h(X; σ_Z(D))
is a scoring function Score(y, x; σ_Z(D))
that gives each class y a score on test example x
h(x; σ_Z(D)) = argmax_y Score(y, x; σ_Z(D))
Score(y, x; σ_Z(D)) ≈ p(y | x, σ_Z(D))
σ_Z(D): The set of training examples (a cube
subset of D)

16
Bottom-up Score Computation

Key observations
Observation 1: Score(y, x; σ_Z(D)) is a function
of cube subset σ_Z(D); if it is distributive or
algebraic, the data cube bottom-up technique can
be directly applied
Observation 2: Having the scores for all the test
examples and all the cells is sufficient to
compute a prediction cube
Scores → predictions → cell values
Details depend on what each cell means (i.e.,
type of prediction cubes), but are straightforward
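
A small sketch of the last two steps, from scores to predictions to a cell value. The scores and reference predictions are made up, and measuring similarity as an agreement rate and predictiveness as a disagreement rate is an assumption for illustration, not necessarily the exact definition used by the authors.

```python
def predict(scores):
    """Pick the class with the highest score for one test example."""
    return max(scores, key=scores.get)


def agreement(predictions_a, predictions_b):
    """Fraction of test examples on which two prediction lists agree."""
    matches = sum(a == b for a, b in zip(predictions_a, predictions_b))
    return matches / len(predictions_a)


# Hypothetical scores from one cell's model on three unlabeled test examples.
cell_scores = [
    {"Yes": 0.8, "No": 0.2},
    {"Yes": 0.3, "No": 0.7},
    {"Yes": 0.6, "No": 0.4},
]
cell_predictions = [predict(s) for s in cell_scores]

# Model-similarity cube: compare with the predictions of a given model h0.
h0_predictions = ["Yes", "No", "No"]
similarity = agreement(cell_predictions, h0_predictions)

# Predictiveness cube: compare with a model built without the attributes V.
without_v_predictions = ["Yes", "Yes", "Yes"]
predictiveness = 1 - agreement(cell_predictions, without_v_predictions)
```

Neither measure needs test labels, which matches the "w/o labels" note on the example slides; an accuracy cube would additionally need the true labels.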

17
Machine-Learning Models

Naïve Bayes
Scoring function: algebraic
Kernel-density-based classifier
Scoring function: distributive
Decision tree, random forest
Neither distributive nor algebraic
PBE: Probability-based ensemble (new)
To make any machine-learning model distributive
Approximation
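
To see why a Naïve Bayes scoring function is algebraic, note that it depends on the training data only through class counts and per-feature value counts, which add up across disjoint cells. A minimal sketch, assuming categorical features and add-one smoothing (the smoothing, the function names and the toy data are all assumptions):

```python
from collections import Counter


def summarize(examples):
    """G: count summary of one base cell.

    `examples` is a list of (features, label) pairs, where `features`
    is a tuple of categorical values.
    """
    class_counts = Counter()
    feature_counts = Counter()  # (feature index, value, label) -> count
    for features, label in examples:
        class_counts[label] += 1
        for j, value in enumerate(features):
            feature_counts[(j, value, label)] += 1
    return class_counts, feature_counts


def merge(summaries):
    """Combine the summaries of disjoint cells by adding their counts."""
    class_counts, feature_counts = Counter(), Counter()
    for cc, fc in summaries:
        class_counts.update(cc)
        feature_counts.update(fc)
    return class_counts, feature_counts


def score(label, x, summary, n_values=2):
    """F: Naive Bayes score of `label` for test example `x`."""
    class_counts, feature_counts = summary
    s = class_counts[label] / sum(class_counts.values())
    for j, value in enumerate(x):
        # Add-one smoothing; n_values is the assumed number of feature values.
        s *= (feature_counts[(j, value, label)] + 1) / (class_counts[label] + n_values)
    return s


# Two made-up base cells with a single feature each.
cell_a = [(("low",), "No"), (("high",), "Yes")]
cell_b = [(("high",), "Yes")]
merged = merge([summarize(cell_a), summarize(cell_b)])
score_yes = score("Yes", ("high",), merged)
```

The merged summary is identical to the one computed from the pooled data, so the model for a coarser cell never needs to revisit the raw records.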

18
Probability-Based Ensemble
PBE version of decision tree on WA, 85
Decision tree on WA, 85
Decision trees built on the lowest-level cells
19
Probability-Based Ensemble

Scoring function
h(y | x; b_i(D)): Model h's estimation of p(y | x,
b_i(D))
g(b_i | x): A model that predicts the probability
that x belongs to base subset b_i(D)
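
The slide names the two ingredients of the PBE scoring function, but the formula itself is not in the transcript. A minimal sketch, assuming the score is the membership-weighted sum of the base models' estimates over the base subsets in a cell; this combination rule, and every name and number below, is an assumption for illustration.

```python
def pbe_score(y, x, cell_base_ids, base_models, membership):
    """Sum h(y | x; b_i(D)) * g(b_i | x) over the base subsets in a cell."""
    return sum(base_models[i](y, x) * membership(i, x) for i in cell_base_ids)


# Hypothetical base models h(y | x; b_i(D)), one per lowest-level cell.
base_models = {
    ("WA", "1985-11"): lambda y, x: 0.9 if y == "Yes" else 0.1,
    ("WA", "1985-12"): lambda y, x: 0.4 if y == "Yes" else 0.6,
}


def membership(base_id, x):
    """Hypothetical g(b_i | x): here every base subset is equally likely."""
    return 0.5


x = {"income": "high"}  # made-up test example
cell_wa_1985 = list(base_models)  # base subsets contained in the cell
scores = {
    y: pbe_score(y, x, cell_wa_1985, base_models, membership)
    for y in ("Yes", "No")
}
prediction = max(scores, key=scores.get)
```

Under this assumption the score of a coarser cell is a plain sum of per-base-cell terms, so it is distributive and can be rolled up like a count.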

20
Outline

Motivating example
Definition of prediction cubes
Efficient prediction cube materialization
Experimental results
Conclusion

21
Experiments

Quality of PBE on 8 UCI datasets
The quality of the PBE version of a model is
slightly worse (0 to 6%) than the quality of the
model trained directly on the whole training
data.
Efficiency of the bottom-up score computation
technique
Case study on demographic data

22
Efficiency of Bottom-up Score Computation

Machine-learning models
J48: J48 decision tree
RF: Random forest
NB: Naïve Bayes
KDC: Kernel-density-based classifier
Bottom-up method vs. Exhaustive method

Bottom-up: PBE-J48, PBE-RF, NB, KDC
Exhaustive: J48ex, RFex, NBex, KDCex

23
Synthetic Dataset

Dimensions Z1, Z2 and Z3.
Decision rule
[Figure: decision rule defined over Z1 and Z2, and over Z3]
24
Efficiency Comparison
[Plot: execution time (sec) vs. # of records, using the exhaustive
method and using bottom-up score computation]
25
Related Work: Building models on OLAP Results

Multi-dimensional regression [Chen, VLDB 02]
Goal: Detect changes of trends
Build linear regression models for cube cells
Step-by-step regression in stream cubes [Liu,
PAKDD 03]
Loglinear-based quasi cubes [Barbara, J. IIS 01]
Use loglinear model to approximately compress
dense regions of a data cube
NetCube [Margaritis, VLDB 01]
Build Bayes Net on the entire dataset to
approximately answer count queries

26
Related Work (Contd.)

Cubegrades [Imielinski, J. DMKD 02]
Extend cubes with ideas from association rules
How does the measure change when we roll up or
drill down?
Constrained gradients [Dong, VLDB 01]
Find pairs of similar cell characteristics
associated with big changes in measure
User-cognizant multidimensional analysis
[Sarawagi, VLDBJ 01]
Help users find the most informative unvisited
regions in a data cube using max entropy
principle
Multi-Structural DBs [Fagin et al., PODS 05, VLDB
05]

27
Take-Home Messages

Promising exploratory data analysis paradigm
Can use models to identify interesting subsets
Concentrate only on subsets in cube space
Those are meaningful subsets, tractable
Precompute results and provide the users with an
interactive tool
A simple way to plug something into cube-style
analysis
Try to describe/approximate something by a
distributive or algebraic function

28
Big Picture

Why stop with decision behavior? Can apply to
other kinds of analyses too
Why stop at browsing? Can mine prediction cubes
in their own right
Exploratory analysis of mining space
Dimension attributes can be parameters related to
algorithm, data conditioning, etc.
Tractable evaluation is a challenge
Large number of dimensions, real-valued
dimension attributes, difficulties in
compositional evaluation
Active learning for experiment design, extending
compositional methods

29
Community Information Management (CIM)
Anhai Doan, University of Illinois at
Urbana-Champaign
Raghu Ramakrishnan, University of Wisconsin-Madison
30
Structured Web-Queries

Example Queries
How many alumni are top-10 faculty members?
Wisconsin does very well, by the way
Find trends in publications
By topic, by conference, by alumni of schools
Change tracking
Alert me if my co-authors publish new papers or
move to new jobs
Information is extracted from text sources on the
web, then queried

31
Key Ideas

Communities are ideally scoped chunks of the web
for which to build enhanced portals
Relative uniformity in content, interests
Can exploit people power via mass
collaboration, to augment extraction
CIM platform: Facilitate collaborative creation
and maintenance of community portals
Extraction management
Uncertainty, provenance, maintenance,
compositional inference for refining extracted
information
Mass collaboration for extraction and integration

Watch for new DBWorld!
32
Challenges

User Interaction
Declarative specification of background knowledge
and user feedback
Intelligent prompting for user input
Explanation of results

33
Challenges

Extraction and Query Plans
Starting from user input (ER schema, hints) and
background knowledge (e.g., standard types,
look-up tables), compile a query into an
execution plan
Must cover extraction, storage and indexing, and
relational processing
And maintenance!
Algebra to represent such plans? Query optimizer?
Handling uncertainty, constraints, conflicts,
multiple related sources, ranking, modular
architecture

34
Challenges