Title: Presentation: Pedro Gabriel Ferreira
A Hybrid Method for Discovering Distance-Enhanced
Inter-Transactional Rules
JISBD'2005, X Jornadas sobre Ingeniería del Software y Bases de Datos, 16 September 2005
Presentation: Pedro Gabriel Ferreira (pedrogabriel_at_di.uminho.pt)
Team: P. Ferreira, R. Alves, P. Azevedo, O. Belo
Dep. Informatics, University of Minho
Outline
- Introduction
- Motivation
- Method
- Results
- References
Introduction: Association Rules
Rules are of the form X → Y, where X and Y are sets of items. The implication means co-occurrence, not causality!
Typical example: market-basket data.
Example association rules:
- Diaper → Beer (3)
- Milk, Bread → Diaper (2)
- Beer, Bread → Milk (1)
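Rules like these are normally derived from frequent itemsets. As a rough illustration, the following is a minimal, unoptimised Apriori-style (level-wise) frequent itemset search; the function name `apriori`, the `min_count` parameter and the five toy baskets are hypothetical and are not taken from the slides.

```python
from itertools import combinations


def apriori(db, min_count):
    """Level-wise search: return {itemset: count} for every itemset
    contained in at least `min_count` transactions."""
    db = [frozenset(t) for t in db]
    current = [frozenset([i]) for i in sorted({i for t in db for i in t})]
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in db if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent itemsets with exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


# Hypothetical toy baskets, for illustration only.
baskets = [
    {"Milk", "Bread"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
frequent = apriori(baskets, min_count=3)
print(frequent[frozenset({"Diaper", "Beer"})])  # 3
```

Once {Diaper, Beer} is known to be frequent, a rule such as Diaper → Beer can be evaluated with interest measures such as confidence.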
Introduction: Interest Measures
- Rules are of the form X → Y
- Rule evaluation metrics:
- Support (s): fraction of transactions that contain both X and Y
- Confidence (c): measures how often items in Y appear in transactions that contain X, $c(X \rightarrow Y) = s(X \cup Y) / s(X)$
- Other measures: conviction, lift, leverage, chi-square, statistical tests, ...
- Algorithms: Apriori, Eclat, FP-Growth, Closet, MaxMiner, DCI, DIC, Mafia, ... See the FIMI Workshop page!
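Support, confidence and lift can be computed directly from their definitions. The sketch below works on hypothetical toy baskets; the helper names are our own, and lift is taken as the usual $c(X \rightarrow Y) / s(Y)$.

```python
def count(db, itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in db if items <= set(t))


def support(db, itemset):
    """Fraction of transactions containing the itemset."""
    return count(db, itemset) / len(db)


def confidence(db, x, y):
    """c(X -> Y) = s(X u Y) / s(X), computed from raw counts."""
    return count(db, set(x) | set(y)) / count(db, x)


def lift(db, x, y):
    """lift(X -> Y) = c(X -> Y) / s(Y); values above 1 suggest positive correlation."""
    return confidence(db, x, y) / support(db, y)


# Hypothetical toy baskets, for illustration only.
baskets = [
    {"Milk", "Bread"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
print(support(baskets, {"Diaper", "Beer"}))       # 0.6
print(confidence(baskets, {"Diaper"}, {"Beer"}))  # 0.75
```

Computing confidence from counts rather than from two rounded fractions keeps the ratio exact for small examples like this one.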
Motivation: Inter-transactional Patterns
Classical association rules are intra-transactional by nature! The contexts associated with the transactions (time, location, distance, ...) are typically ignored. This prevents inter-transactional patterns from being discovered!
Example rule: (1, 2) → (4, 5) → (8, 9)
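One way to expose such patterns, roughly the "extended item" idea behind offset-based inter-transaction miners such as EH-Apriori [1] and FITI [2], is to slide a window over the context dimension and tag each item with its offset from the window start. This sketch assumes the context is an integer day number; the names `extended_transactions` and `maxspan` and the data are hypothetical.

```python
def extended_transactions(db, maxspan):
    """Build one window per reference day d.

    `db` maps an integer context value (e.g. a day) to the set of items
    observed there. Each window holds (item, offset) pairs for items seen
    on days d, d + 1, ..., d + maxspan - 1, with offset measured from d.
    """
    windows = []
    for d in sorted(db):
        window = set()
        for offset in range(maxspan):
            for item in db.get(d + offset, ()):
                window.add((item, offset))
        windows.append(window)
    return windows


# Hypothetical daily transactions: the context (day) is kept with each one.
days = {0: {"a"}, 1: {"b"}, 3: {"c"}}
for w in extended_transactions(days, maxspan=4):
    print(sorted(w))
# [('a', 0), ('b', 1), ('c', 3)]
# [('b', 0), ('c', 2)]
# [('c', 0)]
```

A pattern spanning several days, such as a on day 0 followed by c three days later, now appears inside a single window as {(a, 0), (c, 3)} and can be searched for with ordinary itemset techniques.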
Motivation: Inter-transactional Patterns
Typical example: stock market databases.
A0 > B1 → C4 (X): if company A goes up (day 0) and company B goes down (day 1), then with probability X company C will go down (day 4).
Algorithms proposed for inter-transaction mining: EH-Apriori [1] and FITI [2].
Problem: the rules they discover are too rigid!
Example: if company B sometimes goes down at day 1 (A0 > B1 → C4, support X/2) and other times at day 2 (A0 > B2 → C4, support X/2), then for a minimum support of X neither rule is reported!
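The support-splitting effect can be reproduced with a toy sketch under our own assumptions: windows are sets of (item, offset) pairs, `offset_support` counts an exact-offset pattern in the spirit of the rigid rules above, and `ordered_support` only requires the items to appear in order, loosely like the flexible syntax proposed here. Both functions and the data are hypothetical.

```python
def offset_support(windows, pattern):
    """Fraction of windows containing every (item, offset) pair in `pattern`."""
    return sum(1 for w in windows if pattern <= w) / len(windows)


def ordered_support(windows, items):
    """Fraction of windows in which `items` occur at strictly increasing
    offsets, with any gap between them (no fixed offsets)."""
    hits = 0
    for w in windows:
        prev = -1
        for item in items:
            later = [o for (i, o) in w if i == item and o > prev]
            if not later:
                break
            prev = min(later)  # earliest later occurrence leaves most room for the rest
        else:
            hits += 1  # every item was found, in order
    return hits / len(windows)


# Hypothetical windows: B falls on day 1 in half of them and on day 2 in the rest.
windows = [
    {("A", 0), ("B", 1), ("C", 4)},
    {("A", 0), ("B", 1), ("C", 4)},
    {("A", 0), ("B", 2), ("C", 4)},
    {("A", 0), ("B", 2), ("C", 4)},
]
print(offset_support(windows, {("A", 0), ("B", 1), ("C", 4)}))  # 0.5
print(offset_support(windows, {("A", 0), ("B", 2), ("C", 4)}))  # 0.5
print(ordered_support(windows, ["A", "B", "C"]))                # 1.0
```

With a minimum support of 0.75 standing in for X, neither offset-based pattern qualifies, while the order-only pattern reaches full support.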
Motivation: Inter-transactional Patterns
Our proposal makes the rule syntax more flexible!
Example: if company A goes up on one day and company B goes down on a subsequent day, then with probability X company C goes down after B, and the distance between A and C has mean µ and standard deviation σ.
Applications: pricing strategies in the retail market, effect of promotions in travel agencies, stock market databases, weather forecasting, ...
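The distance statistics µ and σ in this rule syntax can be computed from the occurrences that support a pattern. This is an illustration under stated assumptions: it uses the population standard deviation (`statistics.pstdev`), since the slides do not say whether the population or sample form is meant, and the function name and data are hypothetical.

```python
import statistics


def distance_stats(occurrences, first, last):
    """Mean and population standard deviation of the distance between two
    items of a pattern, over all occurrences that support it.

    Each occurrence maps an item to the day (or other context value) at
    which it was matched.
    """
    distances = [occ[last] - occ[first] for occ in occurrences]
    return statistics.mean(distances), statistics.pstdev(distances)


# Hypothetical occurrences of "A goes up, then B goes down, then C goes down".
occurrences = [
    {"A": 0, "B": 1, "C": 4},
    {"A": 0, "B": 2, "C": 5},
    {"A": 3, "B": 4, "C": 7},
    {"A": 1, "B": 3, "C": 6},
]
mu, sigma = distance_stats(occurrences, "A", "C")
print(mu, sigma)  # 4.5 0.5
```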
Method
- To achieve this goal, we combine association and sequence mining algorithms to obtain frequent sequences of items that occur within a specified time window, W.
- The method consists of three steps:
- Database Transformation
- Sequence Conversion and Mining
- Sequence Rule Extraction
Method: Database Transformation
Each database transaction is decomposed into all of its subsets. Other criteria can be used, depending on the application domain! The original database T is thus transformed into a database T′.
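Taken literally, decomposing every transaction into all of its subsets can be sketched as below. We assume the empty set is excluded and that T is an ordered list of transactions; the function names and data are hypothetical.

```python
from itertools import combinations


def all_subsets(transaction):
    """All non-empty subsets of a transaction, as frozensets."""
    items = sorted(transaction)
    return [frozenset(c)
            for size in range(1, len(items) + 1)
            for c in combinations(items, size)]


def transform(db):
    """T -> T': replace each transaction (order preserved) by its subsets."""
    return [all_subsets(t) for t in db]


T = [{"a"}, {"a", "b"}]  # hypothetical two-transaction database
print([len(subsets) for subsets in transform(T)])  # [1, 3]
```

A transaction with n items yields 2^n - 1 non-empty subsets, so this decomposition grows quickly with transaction size, which is one reason domain-specific criteria may be preferable.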
Method: Sequence Conversion and Mining
- In this phase, two steps are performed:
- Step 1: T′ is converted into a database of sequences P.
- Step 2: given a minimum sequence support and a window size W, a sequence mining algorithm is applied to P to obtain all frequent sequence patterns of the form $S = s_1 * s_2 * \cdots * s_n$, where each $s_i$ is an ordered list of items and $*$ is a wildcard symbol that matches zero or more items.
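To make the pattern form concrete, here is a small sketch that tests whether a sequence contains a match of $s_1 * s_2 * \cdots * s_n$ inside a window of W consecutive positions, and computes a pattern's support over P. Several details are our assumptions: sequences are lists of item symbols, each $s_i$ must appear contiguously, and the span from the first to the last matched position (inclusive) must be at most W. All names are hypothetical; a real miner enumerates patterns rather than checking them one at a time.

```python
def occurs_at(seq, block, start):
    """True if `block` appears contiguously in `seq` starting at `start`."""
    return list(seq[start:start + len(block)]) == list(block)


def match_from(seq, pattern, k, pos, first, window):
    """Place blocks pattern[k:] at or after `pos` (the wildcard lets a block
    start anywhere later), keeping the whole match within `window` positions
    of `first`. Returns the index of the last matched item, or None."""
    if k == len(pattern):
        return pos - 1 if pos - first <= window else None
    block = pattern[k]
    for start in range(pos, len(seq) - len(block) + 1):
        if start + len(block) - first > window:
            break  # later starts would only widen the match
        if occurs_at(seq, block, start):
            end = match_from(seq, pattern, k + 1, start + len(block), first, window)
            if end is not None:
                return end
    return None


def contains(seq, pattern, window):
    """True if some match of s1 * s2 * ... * sn fits inside the window."""
    for first in range(len(seq)):
        if occurs_at(seq, pattern[0], first):
            pos = first + len(pattern[0])
            if match_from(seq, pattern, 1, pos, first, window) is not None:
                return True
    return False


def sequence_support(P, pattern, window):
    """Fraction of sequences in P that contain the pattern."""
    return sum(1 for seq in P if contains(seq, pattern, window)) / len(P)


# Hypothetical sequence database.
P = [["a", "b", "x", "c"], ["a", "c"], ["a", "x", "x", "x", "x", "c"], ["b", "c"]]
print(sequence_support(P, [["a"], ["c"]], window=4))  # 0.5
```

Enlarging the window admits matches with longer gaps: with `window=6` the third sequence also supports `a * c`.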
Method: Sequence Rule Extraction
- Two steps:
- Filter sequence patterns
- Generate rules
- Step 1: filter out sequence patterns that do not fulfil the user constraints, using a measure of variance, $cvd = \sigma / \mu$ (a coefficient of variation).
- Only patterns below a user-defined cvd threshold are accepted. This eliminates highly deviating patterns.
- Step 2: from the filtered patterns, generate rules of the form X → Y that satisfy the confidence and lift constraints.
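The two steps can be prototyped as follows. This sketch rests on assumptions the slides leave open: σ is the population standard deviation, rules are formed by splitting a pattern into a prefix X and a suffix Y, and lift is confidence divided by the support of Y. The names `cvd`, `filter_patterns` and `generate_rules` and the toy numbers are hypothetical.

```python
import statistics


def cvd(distances):
    """Coefficient of variation sigma / mu of a pattern's distances
    (population standard deviation; assumes a positive mean)."""
    mu = statistics.mean(distances)
    return statistics.pstdev(distances) / mu


def filter_patterns(distances_by_pattern, max_cvd):
    """Step 1: keep only patterns whose cvd is below the user threshold."""
    return [p for p, d in distances_by_pattern.items() if cvd(d) < max_cvd]


def generate_rules(supports, patterns, min_conf):
    """Step 2: split each kept pattern into prefix X and suffix Y and keep
    rules X -> Y whose confidence reaches `min_conf`.

    `supports` maps a pattern (tuple of elements) to its relative support.
    Each rule is (X, Y, confidence, lift) with confidence = s(XY) / s(X)
    and lift = confidence / s(Y).
    """
    rules = []
    for p in patterns:
        for k in range(1, len(p)):
            x, y = p[:k], p[k:]
            if x not in supports or y not in supports:
                continue  # sub-pattern support unknown: skip this split
            conf = supports[p] / supports[x]
            if conf >= min_conf:
                rules.append((x, y, conf, conf / supports[y]))
    return rules


# Hypothetical distances and supports for two sequence patterns.
distances = {("A", "C"): [3, 3, 3], ("A", "D"): [1, 5]}
supports = {("A",): 0.5, ("C",): 0.4, ("D",): 0.3, ("A", "C"): 0.3, ("A", "D"): 0.2}
kept = filter_patterns(distances, max_cvd=0.5)
rules = generate_rules(supports, kept, min_conf=0.5)
print(kept)           # [('A', 'C')]
print(rules[0][:3])   # (('A',), ('C',), 0.6)
```

Here ("A", "D") is dropped in Step 1 because its distances (1 and 5) give a cvd of about 0.67, above the 0.5 threshold.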
Results
Table: meaning of the parameters used in database generation, and characteristics of the tested databases.
Figure: distribution of the cvd and confidence measures for sequence patterns and rules (DS50K).
Conclusions
We propose a method to extract inter-transactional patterns from dimensional databases. The method combines association and sequence mining. Compared with the offset-based pattern description of [1, 2], the proposed patterns give a more flexible, yet accurate, description of the dimensional behaviour.
References
[1] H. J. Lu, L. Feng, J. Han, "Beyond Intra-Transaction Association Analysis: Mining Multi-Dimensional Inter-Transaction Association Rules", ACM Transactions on Information Systems, vol. 18, no. 4, 2000, pp. 423-454.
[2] A. K. H. Tung, H. Lu, J. Han, L. Feng, "Efficient Mining of Intertransaction Association Rules", IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 1, January 2003, pp. 43-56.
Others:
R. Agrawal, R. Srikant, "Fast Algorithms for Mining Association Rules", Proc. 20th VLDB, Morgan Kaufmann, 12-15 September 1994, pp. 487-499.
R. Agrawal, R. Srikant, "Mining Sequential Patterns", Eleventh International Conference on Data Engineering, IEEE Computer Society Press, 1995, pp. 3-14.