1 Efficient and Effective Itemset Pattern Summarization: Regression-based Approaches
- Ruoming Jin
- Kent State University
- Joint work with Muad Abu-Ata, Yang Xiang, and
Ning Ruan (KSU)
2 Problem Definition
- Given a large collection of frequent itemsets and their supports, how can we concisely represent them?
- Coverage criterion
  - The Spanning Set Approach: F. Afrati, A. Gionis, and H. Mannila, Approximating a collection of frequent sets, KDD'04.
- Frequency criterion
  - The Profile-based Approach: X. Yan, H. Cheng, J. Han, and D. Xin, Summarizing itemset patterns: a profile-based approach, KDD'05.
  - The Markov Random Field Approach: C. Wang and S. Parthasarathy, Summarizing itemset patterns using probabilistic models, KDD'06.
3 Frequency Criterion
- The restoration function of a set of itemsets S is a function f_hat that maps each itemset T in S to an estimate of its true frequency f(T).
- The restoration error measures how far the estimates are from the true frequencies.
- We use the 2-norm in this study: E(S) = sqrt( sum over T in S of (f(T) - f_hat(T))^2 ).
4 Probabilistic Restoration Function
- Applying the independence probabilistic model to a set of itemsets S gives the restoration function f_hat(T) = p(S) * (product over a in T of p(a)).
- An example: for the itemset {a, c, d}, f_hat({a, c, d}) = p(S) * p(a) * p(c) * p(d) (see the sketch below).
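To make the model concrete, here is a minimal Python sketch of the restoration function and the 2-norm restoration error from the previous slide. The names restore and restoration_error and all probability values are illustrative assumptions, not taken from the paper.

    import math

    def restore(itemset, p_S, p_item):
        # Independence model: f_hat(T) = p(S) * prod over a in T of p(a).
        est = p_S
        for a in itemset:
            est *= p_item[a]
        return est

    def restoration_error(itemsets, supports, p_S, p_item):
        # 2-norm restoration error over the collection S (slide 3).
        return math.sqrt(sum((f - restore(T, p_S, p_item)) ** 2
                             for T, f in zip(itemsets, supports)))

    # Illustrative (made-up) parameter values:
    p_item = {"a": 0.8, "c": 0.6, "d": 0.5}
    print(restore(("a", "c", "d"), 0.9, p_item))  # 0.9 * 0.8 * 0.6 * 0.5 = 0.216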
5 Problem 1: Optimal Parameters
- What are the optimal parameters p(S), p(a), p(c), p(d) that minimize the restoration error?
6 Non-Linear Regression
- For each item a we introduce an indicator variable x_a, with x_a = 1 if a is in T and x_a = 0 otherwise, so the model becomes f_hat(T) = p(S) * (product over a of p(a)^x_a), which is non-linear in its parameters.
- We have |S| data points, one per itemset in S: the indicator vector of T together with its observed frequency f(T) (see the sketch below).
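A sketch of the non-linear fit under the assumptions above, using scipy's general least-squares solver as a stand-in for whatever optimizer the authors used; the itemsets and supports below are made up for illustration.

    import numpy as np
    from scipy.optimize import least_squares

    items = ["a", "b", "c", "d"]
    itemsets = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"),
                ("c", "d"), ("a", "b", "c"), ("a", "c", "d")]
    supports = np.array([0.30, 0.28, 0.22, 0.25, 0.20, 0.15, 0.12])  # made up

    # One indicator row per itemset: x_a = 1 iff item a is in T.
    X = np.array([[1.0 if a in T else 0.0 for a in items] for T in itemsets])

    def residuals(theta):
        # theta = [p(S), p(a), p(b), p(c), p(d)];
        # model: f_hat(T) = p(S) * prod over a of p(a)^x_a.
        p_S, p = theta[0], theta[1:]
        return supports - p_S * np.prod(p ** X, axis=1)

    # Keep all parameters positive; bounds are a practical choice here,
    # not something the slides specify.
    fit = least_squares(residuals, x0=np.full(len(items) + 1, 0.5),
                        bounds=(1e-6, np.inf))
    print(fit.x)  # fitted [p(S), p(a), ..., p(d)]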
7 Linear Regression Approximation
Taking logarithms turns the product model into a linear one: log f_hat(T) = log p(S) + sum over a in T of log p(a). Using a Taylor expansion, we show that the restoration error from this linear regression is very close to the error obtained with the non-linear regression!
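For comparison, a sketch of the linear approximation on the same illustrative data, assuming the log-space formulation above; numpy's lstsq is a stand-in solver.

    import numpy as np

    items = ["a", "b", "c", "d"]
    itemsets = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"),
                ("c", "d"), ("a", "b", "c"), ("a", "c", "d")]
    supports = np.array([0.30, 0.28, 0.22, 0.25, 0.20, 0.15, 0.12])

    # Design matrix: an intercept column for log p(S), then the 0/1
    # item indicators; the response is log f(T).
    A = np.array([[1.0] + [1.0 if a in T else 0.0 for a in items]
                  for T in itemsets])
    coef, *_ = np.linalg.lstsq(A, np.log(supports), rcond=None)

    p_S = np.exp(coef[0])
    p_item = dict(zip(items, np.exp(coef[1:])))
    print(p_S, p_item)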
8 Problem 2: Optimal Partition
- To reduce the restoration error, we adopt a partition strategy: partition the entire collection of frequent itemsets into K disjoint subsets, and build a restoration function for each subset.
- How can we optimally partition a set of itemsets into K disjoint subsets so that the total restoration error is minimized?
9 Our Approaches
- NP-hard problem
- Two heuristic algorithms
- K-Regression
- Tree Regression
10 K-Regression
- A k-means-type clustering procedure (a sketch follows below):
- Step 1: Randomly partition the set of itemsets S into K partitions.
- Step 2 (Regression): Apply regression to find the optimal parameters on each partition.
- Step 3 (Re-assignment): Assign each itemset to the partition that minimizes its restoration error under the optimal parameters found in Step 2.
- Repeat Steps 2 and 3 until the total restoration error stops decreasing or the improvement is small.
- Just like k-means, K-Regression is guaranteed to converge!
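A compact sketch of the K-Regression loop, reusing the log-space regression from the earlier slides; fit_params and sq_error are hypothetical helper names, and a fixed iteration cap stands in for the convergence test described above.

    import random
    import numpy as np

    def fit_params(itemsets, supports, items):
        # Regression step: log-space least squares on one partition.
        A = np.array([[1.0] + [1.0 if a in T else 0.0 for a in items]
                      for T in itemsets])
        coef, *_ = np.linalg.lstsq(A, np.log(np.asarray(supports)), rcond=None)
        return coef

    def sq_error(T, f, coef, items):
        # Squared restoration error of one itemset under parameters coef.
        x = np.array([1.0] + [1.0 if a in T else 0.0 for a in items])
        return (f - np.exp(x @ coef)) ** 2

    def k_regression(itemsets, supports, items, K, iters=20):
        # Step 1: random initial partition.
        labels = [random.randrange(K) for _ in itemsets]
        for _ in range(iters):  # fixed cap for brevity; see slide for the stopping rule
            # Step 2: fit a restoration function on each partition.
            coefs = []
            for k in range(K):
                part = [(T, f) for T, f, l in zip(itemsets, supports, labels)
                        if l == k]
                if not part:  # empty partition: fall back to a trivial fit
                    coefs.append(np.zeros(len(items) + 1))
                    continue
                Ts, fs = zip(*part)
                coefs.append(fit_params(Ts, fs, items))
            # Step 3: re-assign each itemset to its best partition.
            labels = [min(range(K), key=lambda k: sq_error(T, f, coefs[k], items))
                      for T, f in zip(itemsets, supports)]
        return labels, coefs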
11 Tree Regression
- Figure: a tree that recursively partitions an example collection S of itemsets over the items a, b, c, d ({a,b}, {a,c}, {b,c}, {a,d}, {c,d}, {a,b,c}, {a,b,d}, {a,c,d}, ...).
- Use regression to find the optimal parameters for each subset of itemsets.
12 Tree Regression Construction
- A decision-tree-style construction algorithm (a sketch of the split search follows after this list).
- Question 1: How do we find the K subsets of itemsets?
- Question 2: How do we find the optimal split?
- Answer to Q1: Maintain a queue of the current leaf nodes, and always pick the leaf node with the maximal average restoration error to split.
- Answer to Q2: Choose the split that maximally reduces the total restoration error, i.e., that minimizes E(S_1) + E(S_2) over the resulting subsets S_1 and S_2 (equivalently, maximizes E(S) - E(S_1) - E(S_2)).
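A sketch of the split search, assuming (as a simplification, not necessarily the paper's choice) that each candidate split partitions a node's itemsets by the presence or absence of a single item; it reuses the fit_params and sq_error helpers from the K-Regression sketch.

    def subset_error(itemsets, supports, items):
        # E(S'): fit one restoration function on a subset and return
        # its total squared restoration error.
        coef = fit_params(itemsets, supports, items)
        return sum(sq_error(T, f, coef, items)
                   for T, f in zip(itemsets, supports))

    def best_split(itemsets, supports, items):
        # Try each item as a presence/absence split and keep the one
        # minimizing E(S_1) + E(S_2), i.e. the maximal error reduction.
        best = None
        for a in items:
            S1 = [(T, f) for T, f in zip(itemsets, supports) if a in T]
            S2 = [(T, f) for T, f in zip(itemsets, supports) if a not in T]
            if not S1 or not S2:
                continue
            err = sum(subset_error(*zip(*part), items) for part in (S1, S2))
            if best is None or err < best[0]:
                best = (err, a, S1, S2)
        return best  # (error, split item, S_1, S_2)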
13 An Interesting Connection
- Jerome H. Friedman's 1977 paper, "A tree-structured approach to nonparametric multiple regression."
- Unfortunately, this work never seems to have received enough attention. However, it appears to have been part of the inspiration for CART (regression trees) and MARS (Multivariate Adaptive Regression Splines).
14 Experimental Results
15 Chess: Restoration Error
16 BMS-POS: Restoration Error
17 BMS-POS: Running Time
18 Conclusion
- Using linear regression to identify the optimal parameters of the probabilistic restoration function (based on the independence assumption) for a set of itemsets.
- Two heuristic algorithms to partition the set of itemsets into K parts:
  - K-Regression
  - Tree Regression
19 Thanks!!