Information Retrieval from Data Bases for Decisions - PowerPoint PPT Presentation

Slides: 26

Provided by: sim126

Transcript and Presenter's Notes

1
Information Retrieval from Data Bases for
Decisions

Dr. Gábor SZUCS, Ph.D.
Assistant professor
BUTE, Department of Information and Knowledge
Management

2
Contents

Aims
General steps in the procedure
Market basket analysis
Frequent itemsets
Conclusion

3
Aims

Search for hidden relationships in existing
databases (DB)
Help to make well-grounded decisions
Data mining techniques are able to find such
relationships:
they provide the ability to optimize
decision-making
they are among the most powerful tools for
retrieving important information

4
Steps of data mining

Definition of the key and predictor variables
to be analysed
(Sampling from a large amount of data)
Modification of variables: examine whether some
variables should be merged (large DBs always
contain some errors)
(Some transformations may need to be executed)

5
Additional steps of data mining

Modelling with data mining techniques: neural
networks, decision trees, regression procedures,
cluster analysis, factor analysis, discriminant
analysis, etc.
Comparison of the data mining models built on the
same DB (the best model can be selected).
The procedure can be repeated cyclically. After the
whole procedure, the hidden relationships between
different aspects can be shown.

6
Market Basket Analysis

Market basket analysis is used for finding groups
of items that tend to occur together.
The models give the likelihood of different
products being purchased together.
Market basket analysis is useful for finding
items that occur together
items that occur in a particular sequence

7
Table of Co-Occurrence of Products

             Product 1   Product 2   Product 3   Product 4   Product 5
Product 1          234          12           0         125          54
Product 2           12         175          65          23          75
Product 3            0          65         229          67          62
Product 4          125          23          67         315          55
Product 5           54          75          62          55         292
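
A co-occurrence table of this kind can be built by counting, for every pair of products, the baskets that contain both. The following is a minimal sketch in Python. The baskets and the product names P1 to P5 are made-up illustration data, not the data behind the table above, and the function name co_occurrence is hypothetical. It assumes the diagonal holds the number of baskets that contain the product at all.

```python
from collections import Counter
from itertools import combinations

def co_occurrence(baskets):
    """Count, for every product pair, the baskets containing both.

    The pair (p, p) counts the baskets that contain p at all, which
    is assumed here to be the meaning of the table's diagonal.
    """
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))      # ignore duplicates within a basket
        for p in items:
            counts[(p, p)] += 1
        for p, q in combinations(items, 2):
            counts[(p, q)] += 1
            counts[(q, p)] += 1          # keep the table symmetric
    return counts

# Hypothetical baskets, for illustration only.
baskets = [
    ["P1", "P4"],
    ["P1", "P4", "P5"],
    ["P2", "P3"],
    ["P2", "P5"],
    ["P3"],
]
table = co_occurrence(baskets)
products = sorted({p for basket in baskets for p in basket})
for p in products:
    print(p, [table[(p, q)] for q in products])
```

A Counter returns 0 for pairs that never occur together, so empty cells need no special handling.
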
8
Procedure of the market basket analysis

Choose the right level of the product hierarchy
for the items.
Probabilities and joint probabilities of the
items are calculated.
Determine the association rules.

9
Example
Bicycle (A)                                    140
Hand tools for bicycle (B)                     100
Tool rack (C)                                   61
Bicycle and hand tool (A B)                     50
Bicycle and tool rack (A C)                      7
Hand tool and tool rack (B C)                   45
Bicycle and hand tool and tool rack (A B C)      5
10
Table of probabilities and joint probabilities of
items
A        14%
B        10%
C        6.1%
A B      5%
A C      0.7%
B C      4.5%
A B C    0.5%
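
The probabilities in this table are the purchase counts of the example divided by the total number of transactions. The sketch below reproduces them in Python. The total of 1000 transactions is an assumption: the slides do not state it, but it is the value that turns 140 purchases into 14%. The names counts, TOTAL_TRANSACTIONS and probabilities are hypothetical.

```python
# Purchase counts from the example slide.
counts = {
    "A": 140, "B": 100, "C": 61,
    "AB": 50, "AC": 7, "BC": 45,
    "ABC": 5,
}

# ASSUMPTION: 1000 transactions in total. The slides do not state this
# number; it is the value that makes 140 purchases equal 14%.
TOTAL_TRANSACTIONS = 1000

def probabilities(counts, total):
    """Relative frequency of each item set."""
    return {itemset: n / total for itemset, n in counts.items()}

probs = probabilities(counts, TOTAL_TRANSACTIONS)
for itemset, p in probs.items():
    print(f"{itemset}: {p:.1%}")
```
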
11
Association rules

A rule (A → B) consists of two parts:
condition (A) and
consequence (B)
A confidence can be defined for each rule:
confidence(A → B) = P(A and B) / P(A)

12
Example

P(A → B) = 5% / 14% = 0.357
P((A B) → C) = 0.5% / 5% = 0.1
P((A C) → B) = 0.5% / 0.7% = 0.714
P((B C) → A) = 0.5% / 4.5% = 0.111
Can this association rule help us?
If we offer product A to everybody,
then 14% of the people will purchase it.
If we offer A only to those who bought B and C,
then 11% of them will purchase it.
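
The confidence values of this example can be checked with a few lines of Python. This is a minimal sketch: it takes the confidence of a rule to be the joint probability of condition and consequence divided by the probability of the condition, which is the ratio the example figures follow. The probabilities come from the table of the example, written as fractions; the names P and confidence are hypothetical.

```python
# Probabilities from the example table, written as fractions.
P = {
    frozenset("A"): 0.14, frozenset("B"): 0.10, frozenset("C"): 0.061,
    frozenset("AB"): 0.05, frozenset("AC"): 0.007, frozenset("BC"): 0.045,
    frozenset("ABC"): 0.005,
}

def confidence(condition, consequence):
    """P(condition and consequence) / P(condition)."""
    cond = frozenset(condition)
    both = cond | frozenset(consequence)
    return P[both] / P[cond]

for condition, consequence in [("A", "B"), ("AB", "C"), ("AC", "B"), ("BC", "A")]:
    value = confidence(condition, consequence)
    print(f"({condition}) -> {consequence}: {value:.3f}")
```

Item sets are stored as frozensets so that "AB" and "BA" refer to the same entry.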

13
Improvement

Improvement helps us to decide whether an
association rule is useful or not.
improvement(condition → consequence) =
confidence(condition → consequence) / P(consequence)

14
In our example

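Improvement((B C) → A) = 0.111 / 0.14 = 0.794
Improvement((A B) → C) = 0.1 / 0.061 = 1.639
The value of improvement shows the usefulness of
the rule:
improvement > 1: the rule predicts the consequence
better than chance
improvement < 1: the rule predicts the consequence
worse than chance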
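
The two improvement values can be reproduced as follows. This is a minimal sketch in Python, assuming that improvement is the confidence of the rule divided by the probability of its consequence, which is the ratio used in the figures of this example. The function name improvement and its parameter names are hypothetical.

```python
def improvement(p_condition, p_consequence, p_both):
    """Confidence of the rule divided by the probability of the consequence.

    p_condition:   probability of the condition, e.g. P(B C)
    p_consequence: probability of the consequence, e.g. P(A)
    p_both:        joint probability of condition and consequence
    """
    confidence = p_both / p_condition
    return confidence / p_consequence

# (B C) -> A: P(B C) = 4.5%, P(A) = 14%, P(A B C) = 0.5%
imp_bc_a = improvement(0.045, 0.14, 0.005)
# (A B) -> C: P(A B) = 5%, P(C) = 6.1%, P(A B C) = 0.5%
imp_ab_c = improvement(0.05, 0.061, 0.005)

print(f"(B C) -> A: {imp_bc_a:.3f}")
print(f"(A B) -> C: {imp_ab_c:.3f}")
```
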

15
Dissociation rules

Similar to association rules
Introduce the inverse of each original item
(e.g. "not A" for item A)
Modify each transaction:
a transaction includes an inverse item if, and
only if, it does not contain the original item.
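
The modification of the transactions can be sketched in Python as below. The label "not X" for the inverse of item X, the function name add_inverse_items and the sample transactions are assumptions made for illustration.

```python
def add_inverse_items(transactions, items):
    """Extend each transaction with "not X" for every item X it lacks."""
    extended = []
    for transaction in transactions:
        present = set(transaction)
        inverse = {f"not {x}" for x in items if x not in present}
        extended.append(present | inverse)
    return extended

# Hypothetical transactions over the items A, B and C.
transactions = [{"A", "B"}, {"B", "C"}, {"A"}]
extended = add_inverse_items(transactions, items=["A", "B", "C"])
for original, modified in zip(transactions, extended):
    print(sorted(original), "->", sorted(modified))
```

The extended transactions can then be analysed like ordinary ones, at the cost of doubling the number of distinct items.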

16
Time series

For time series analysis the transactions must
have two additional features:
time information (e.g. time sequence or time
stamp)
identifying information (e.g. customer ID,
account number in a bank)

17
Frequent itemsets

Frequent itemsets appear in at least a fixed ratio
of the transactions
problem
A-priori trick:
if a set of items S is frequent, then every
subset of S is also frequent.
The procedure is built from the lower level to the
upper level (frequent items, frequent pairs, etc.)

18
A-Priori Algorithm

Define a threshold for relative frequency. All
items are examined.
The set of the frequent items is L1.
Pairs of items in L1 become the candidate pairs (C2).
Their frequencies are compared with the threshold.
L2 contains the frequent pairs.

19
A-Priori Algorithm (cont.)

The candidate triples (C3) are those sets {A, B, C}
for which all two-element subsets are in L2. L3 will
contain the frequent triples.
In general, Li is the set of frequent sets of size i,
and Ci+1 is the candidate set of size i+1.
The procedure continues until the sets become empty.
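
The level-wise procedure can be sketched in Python as follows. This is a simple illustration of the idea, not an optimized implementation: it rescans all transactions for every candidate. The function name apriori, the parameter min_support (the threshold for relative frequency) and the sample transactions are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: relative frequency} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        """Keep the candidates whose relative frequency reaches the threshold."""
        result = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                result[c] = support
        return result

    # L1: the frequent single items.
    items = {item for t in transactions for item in t}
    level = frequent({frozenset([item]) for item in items})
    all_frequent = dict(level)

    size = 1
    while level:                          # stop when a level is empty
        size += 1
        # Candidates: unions of two frequent sets of the previous level
        # that have exactly `size` items ...
        candidates = {
            a | b for a, b in combinations(level, 2) if len(a | b) == size
        }
        # ... and whose subsets of size `size - 1` are all frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
        level = frequent(candidates)
        all_frequent.update(level)
    return all_frequent

# Hypothetical transactions, for illustration only.
transactions = [
    {"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"},
]
result = apriori(transactions, min_support=0.5)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In this sample every single item and every pair is frequent, while the triple {A, B, C} occurs in only two of five transactions and is dropped.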

20
Criticism of A-Priori Algorithm

Good if we would like to know only the frequent
pairs
When searching for maximal frequent itemsets, too
many steps may be needed
Limited by the physical capacity of computers

21
Market Basket Mining with High Correlation
Analysis

The data are organised in a matrix.
The cells contain Boolean values:
1 = yes
0 = no
This matrix is very sparse.
We want to find the highly correlated pairs.
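
A search for highly correlated pairs of columns can be sketched in Python as below. The slides do not say which correlation measure is used; this sketch uses the Jaccard similarity of the columns (rows where both columns hold 1, divided by rows where at least one holds 1) as one possible choice for sparse Boolean data. The function name similar_column_pairs, the threshold value and the sample matrix are assumptions.

```python
from itertools import combinations

def similar_column_pairs(matrix, threshold):
    """Return (i, j, similarity) for column pairs at or above the threshold.

    matrix: list of rows, each a list of 0/1 values.
    Similarity is the Jaccard similarity of the two columns (an assumption;
    the slides do not name the measure).
    """
    n_cols = len(matrix[0])
    # Sparse representation: for each column, the set of rows holding a 1.
    rows_of = [
        {r for r, row in enumerate(matrix) if row[c] == 1}
        for c in range(n_cols)
    ]
    pairs = []
    for i, j in combinations(range(n_cols), 2):
        union = rows_of[i] | rows_of[j]
        if not union:
            continue                      # both columns empty: skip the pair
        similarity = len(rows_of[i] & rows_of[j]) / len(union)
        if similarity >= threshold:
            pairs.append((i, j, similarity))
    return pairs

# Hypothetical Boolean matrix, for illustration only.
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
pairs = similar_column_pairs(matrix, threshold=0.7)
print(pairs)
```

Comparing every pair of columns is only feasible for small matrices; the sketch shows the measure, not a scalable search.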

22
Applications of High Correlation Mining

Rows are documents, columns are words. The
highly correlated pairs of columns give the
words that almost always appear together.
Rows and columns are Web pages. The cell contains
1 if the page of the row links to the page of the
column. Result: pages about the same topic.
The page of the column links to the page of the row.
Result: the mirror pages.

23
Conclusion

Planning store layout
Bundling products
Offering coupons

24
Future

Further development
hierarchical association rules
association rules maintenance
sequential pattern mining
functional dependency mining

25
Thank you!

The floor is open for discussion.
E-mail: szucs_at_itm.bme.hu
