Information Retrieval from Data Bases for Decisions

1
Information Retrieval from Data Bases for
Decisions
  • Dr. Gábor SZUCS, Ph.D.
  • Assistant professor
  • BUTE, Department of Information and Knowledge
    Management

2
Contents
  • Aims
  • General steps in the procedure
  • Market basket analysis
  • Frequent itemsets
  • Conclusion

3
Aims
  • Search for hidden relationships in existing
    databases (DBs)
  • Help make well-grounded decisions
  • Data mining techniques are able to find such
    relationships.
  • They provide the ability to optimize
    decision-making
  • They are among the most powerful tools for
    retrieving important information

4
Steps of data mining
  1. Declare the key and predictor variables to be
    analysed
    (sampling from a large amount of data)
  2. Modify the variables: examine whether some
    variables should be merged (large DBs always
    contain some errors)
    (some transformations should be executed)

5
Additional steps of data mining
  1. Modelling with data mining techniques: neural
    networks, decision trees, regression procedures,
    cluster analysis, factor analysis, discriminant
    analysis, etc.
  2. Comparison of the data mining models built on
    the same DB (the best model can be selected).
    The procedure can be repeated cyclically. After
    the whole procedure, the hidden relationships
    between different aspects can be shown.

6
Market Basket Analysis
  • Market basket analysis is used for finding groups
    of items that tend to occur together.
  • The models give the likelihood of different
    products being purchased together.
  • Market basket analysis is useful for finding
  • items that occur together
  • items that occur in a particular sequence

7
Table of Co-Occurrence of Products
            Product 1  Product 2  Product 3  Product 4  Product 5
Product 1         234         12          0        125         54
Product 2          12        175         65         23         75
Product 3           0         65        229         67         62
Product 4         125         23         67        315         55
Product 5          54         75         62         55        292

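A co-occurrence table like the one above could be built from raw transactions roughly as follows. This is a minimal sketch: the function name co_occurrence and the sample baskets are assumptions made for illustration, not the data behind the slide.

```python
from itertools import combinations_with_replacement


def co_occurrence(transactions, products):
    """Count, for every pair of products, the transactions containing both.

    The diagonal holds the number of transactions containing each product.
    """
    index = {p: i for i, p in enumerate(products)}
    table = [[0] * len(products) for _ in products]
    for basket in transactions:
        present = sorted({index[p] for p in basket if p in index})
        for i, j in combinations_with_replacement(present, 2):
            table[i][j] += 1
            if i != j:
                table[j][i] += 1  # keep the table symmetric
    return table


# Hypothetical baskets, chosen only to illustrate the counting.
baskets = [{"P1", "P4"}, {"P1", "P4", "P5"}, {"P2", "P3"}, {"P2"}]
table = co_occurrence(baskets, ["P1", "P2", "P3", "P4", "P5"])
for row in table:
    print(row)
```

The table is symmetric because "bought together" has no direction, which matches the layout of the slide.
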
8
Procedure of the market basket analysis
  • Choose the right level of the product hierarchy
    for the items.
  • Calculate the probabilities and joint
    probabilities of the items.
  • Determine the association rules.

9
Example
Items purchased                                   Count
Bicycle (A)                                         140
Hand tools for bicycle (B)                          100
Tool rack (C)                                        61
Bicycle and hand tool (A B)                          50
Bicycle and tool rack (A C)                           7
Hand tool and tool rack (B C)                        45
Bicycle and hand tool and tool rack (A B C)           5

10
Table of probabilities and joint probabilities of
items
Items    Probability
A        14%
B        10%
C        6.1%
A B      5%
A C      0.7%
B C      4.5%
A B C    0.5%

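The step from counts to probabilities can be sketched as below. The total of 1000 transactions is an assumption: the slides do not state it, but it is the value implied by 140 bicycle purchases corresponding to 14%.

```python
# Purchase counts from the example slide.
counts = {
    ("A",): 140,
    ("B",): 100,
    ("C",): 61,
    ("A", "B"): 50,
    ("A", "C"): 7,
    ("B", "C"): 45,
    ("A", "B", "C"): 5,
}

# Assumption: not given on the slides, inferred from 140 -> 14%.
TOTAL_TRANSACTIONS = 1000


def probabilities(item_counts, total):
    """Relative frequency of each item combination."""
    return {items: n / total for items, n in item_counts.items()}


probs = probabilities(counts, TOTAL_TRANSACTIONS)
for items, p in probs.items():
    print(" ".join(items), f"{p:.1%}")
```
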
11
Association rules
  • The rules (A → B) consist of two parts
  • condition and
  • consequence
  • A confidence can be defined for the rules:
    P(A → B) = P(A B) / P(A)

12
Example
  • P(A → B) = 5% / 14% = 0.357
  • P((AB) → C) = 0.5% / 5% = 0.1
  • P((AC) → B) = 0.5% / 0.7% = 0.714
  • P((BC) → A) = 0.5% / 4.5% = 0.111
  • Can this association rule help us?
  • If we offer product A to everybody,
    then 14% of them will purchase it.
  • If we offer A only to those who bought B and C,
    then about 11% of them will purchase it.
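The confidence values in this example can be reproduced with a short sketch. The function name confidence and the dictionary layout are illustrative assumptions; the probabilities are the ones from the table of probabilities and joint probabilities.

```python
# Probabilities from the example, keyed by the set of items involved.
P = {
    frozenset("A"): 0.14,
    frozenset("B"): 0.10,
    frozenset("C"): 0.061,
    frozenset("AB"): 0.05,
    frozenset("AC"): 0.007,
    frozenset("BC"): 0.045,
    frozenset("ABC"): 0.005,
}


def confidence(condition, consequence, p=P):
    """Confidence of the rule condition -> consequence.

    Computed as P(condition and consequence) / P(condition).
    """
    cond = frozenset(condition)
    both = cond | frozenset(consequence)
    return p[both] / p[cond]


print(round(confidence("A", "B"), 3))
print(round(confidence("AB", "C"), 3))
print(round(confidence("AC", "B"), 3))
print(round(confidence("BC", "A"), 3))
```
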

13
Improvement
  • Improvement helps us decide whether an
    association rule is useful or not:
    Improvement(X → Y) = P(X → Y) / P(Y)

14
In our example
  • Improvement((BC) → A) = 0.111 / 0.14 = 0.794
  • Improvement((AB) → C) = 0.1 / 0.061 = 1.639
  • The value of improvement shows the usefulness of
    the analysis
  • improvement > 1: the rule is useful
  • improvement < 1: the rule is not useful
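As the figures above show, improvement is the confidence of a rule divided by the probability of its consequence. A minimal sketch using the probabilities of the bicycle example (the function name improvement is an assumption made for illustration):

```python
# Probabilities from the bicycle example, keyed by the set of items involved.
P = {
    frozenset("A"): 0.14,
    frozenset("B"): 0.10,
    frozenset("C"): 0.061,
    frozenset("AB"): 0.05,
    frozenset("AC"): 0.007,
    frozenset("BC"): 0.045,
    frozenset("ABC"): 0.005,
}


def improvement(condition, consequence, p=P):
    """Confidence of condition -> consequence divided by P(consequence)."""
    cond = frozenset(condition)
    cons = frozenset(consequence)
    rule_confidence = p[cond | cons] / p[cond]
    return rule_confidence / p[cons]


# Below 1: offering A only to buyers of B and C does worse than offering
# it to everybody.
print(round(improvement("BC", "A"), 3))
# Above 1: buyers of A and B are more likely than average to buy C.
print(round(improvement("AB", "C"), 3))
```
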

15
Dissociation rules
  • Similar to association rules
  • Count the inverse of the original item as a
    separate item (e.g. "not A")
  • Modify each transaction
  • A transaction includes an inverse item if, and
    only if, it does not contain the original item.
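The transaction modification described above can be sketched as follows. The "not X" naming of inverse items and the function name add_inverse_items are assumptions made for illustration.

```python
def add_inverse_items(transactions, items):
    """Extend each transaction with an inverse item for every absent item.

    A transaction gets "not X" if, and only if, it does not contain X.
    """
    return [
        set(basket) | {f"not {item}" for item in items if item not in basket}
        for basket in transactions
    ]


# Hypothetical transactions over the items A, B and C.
extended = add_inverse_items([{"A", "B"}, {"C"}], ["A", "B", "C"])
for basket in extended:
    print(sorted(basket))
```

After this step, ordinary association rule mining on the extended transactions can yield rules that involve the absence of an item.
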

16
Time series
  • For time series analysis, the transactions must
    have two additional features:
  • time information (e.g. time sequence or time
    stamp)
  • identifying information (e.g. customer ID,
    account number in a bank)
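With these two features, transactions can be arranged into one ordered purchase sequence per customer, which is the input that sequence analysis needs. A minimal sketch, assuming records of the form (customer id, time stamp, item); the record layout and the function name are hypothetical.

```python
from collections import defaultdict


def purchase_sequences(records):
    """Group records by customer and order each group by time stamp."""
    by_customer = defaultdict(list)
    for customer_id, timestamp, item in records:
        by_customer[customer_id].append((timestamp, item))
    return {
        customer: [item for _, item in sorted(events)]
        for customer, events in by_customer.items()
    }


# Hypothetical records; time stamps are plain integers here.
records = [
    ("c1", 3, "tool rack"),
    ("c2", 1, "bicycle"),
    ("c1", 1, "bicycle"),
    ("c1", 2, "hand tools"),
]
sequences = purchase_sequences(records)
print(sequences)
```
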

17
Frequent itemsets
  • Itemsets that appear in at least a fixed ratio
    of the transactions
  • problem
  • a-priori trick
  • If a set of items S is frequent, then every
    subset of S is also frequent.
  • The procedure is built from the lower level to
    the upper level (frequent items, frequent pairs,
    etc.)

18
A-Priori Algorithm
  • Define a threshold for relative frequency. All
    items are examined.
    The set of the frequent items is L1.
  • Pairs of items in L1 become the candidate pairs
    (C2).
  • The frequency of each candidate pair is compared
    with the threshold. L2 contains the frequent
    pairs.

19
A-Priori Algorithm (cont.)
  • The candidate triples (C3) are those sets
    {A, B, C} such that all of their two-element
    subsets are in L2. L3 will contain the frequent
    triples.
  • L(i) is the set of frequent itemsets of size i,
    C(i+1) is the set of candidates of size i+1
  • The procedure continues until the sets become
    empty
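The level-by-level procedure described on the two slides above can be sketched as follows. This is a simple in-memory illustration under stated assumptions, not an optimized implementation: the function name apriori, the variable names and the sample baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return every frequent itemset with its relative frequency."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def frequent(candidates):
        # Keep the candidates whose relative frequency reaches the threshold.
        result = {}
        for candidate in candidates:
            support = sum(1 for b in baskets if candidate <= b) / n
            if support >= min_support:
                result[candidate] = support
        return result

    # L1: frequent single items.
    level = frequent({frozenset([item]) for b in baskets for item in b})
    all_frequent = dict(level)
    size = 1
    while level:
        size += 1
        # C(size): unions of two frequent sets that have exactly `size` items...
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # ...and whose subsets of size `size - 1` are all frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
        level = frequent(candidates)
        all_frequent.update(level)
    return all_frequent


# Hypothetical baskets: every pair is frequent at 0.5, the triple is not.
baskets = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
frequent_sets = apriori(baskets, 0.5)
for itemset, support in sorted(frequent_sets.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

The pruning step is the a-priori trick in action: a candidate is counted only if all of its smaller subsets were already found frequent.
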

20
Criticism of A-Priori Algorithm
  • Good if we would like to know only the frequent
    pairs
  • When searching for maximal frequent itemsets,
    too many steps may be needed
  • Limited by the physical capacity of computers

21
Market Basket Mining with High Correlation
Analysis
  • The data are organised in a matrix.
  • The cells contain Boolean values.
  • 1 = yes
  • 0 = no
  • This matrix is very sparse.
  • We want to find the highly correlated pairs.
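A small sketch of finding highly correlated column pairs in such a sparse Boolean matrix follows. The slides do not say which correlation measure is used; this example assumes Jaccard similarity (rows where both columns are 1, divided by rows where at least one is 1), and stores each column as the set of row indices that hold a 1. All names and data are hypothetical.

```python
from itertools import combinations


def similar_column_pairs(columns, threshold):
    """Return (column, column, similarity) for pairs at or above threshold.

    `columns` maps a column name to the set of row indices containing 1,
    so the zeros of the sparse matrix are never stored.
    """
    pairs = []
    for (a, rows_a), (b, rows_b) in combinations(sorted(columns.items()), 2):
        union = rows_a | rows_b
        if not union:
            continue  # two all-zero columns carry no information
        similarity = len(rows_a & rows_b) / len(union)
        if similarity >= threshold:
            pairs.append((a, b, similarity))
    return pairs


# Hypothetical matrix with five rows (baskets) and three columns (items).
columns = {"bread": {0, 1, 2, 3}, "butter": {0, 1, 2}, "nails": {4}}
pairs = similar_column_pairs(columns, 0.7)
print(pairs)
```

Comparing all pairs directly is only feasible for a small number of columns; larger matrices would need a more economical search.
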

22
Applications of High Correlation Mining
  1. Rows are documents, columns are words. The
    highly correlated pairs of columns give the
    words that almost always appear together.
  2. Rows and columns are Web pages. The cell contains
    1 if the page of the row links to the page of
    the column. Result: pages about the same topic.
  3. The page of the column links to the page of the
    row. Result: the mirror pages.

23
Conclusion
  • Planning store layout
  • Bundling products
  • Offering coupons

24
Future
  • Further development
  • hierarchical association rules
  • association rules maintenance
  • sequential pattern mining
  • functional dependency mining

25
Thank you!
  • The floor is open for discussion.
  • E-mail: szucs_at_itm.bme.hu

26
References
  • Fajszi Bulcsú, Cser László: Üzleti tudás az
    adatok mélyén: Adatbányászat alkalmazói szemmel,
    Budapest, 2004, Budapesti Műszaki és
    Gazdaságtudományi Egyetem, Információ- és
    Tudásmenedzsment Tanszék.
  • Michael J. A. Berry, Gordon Linoff: Data Mining
    Techniques: For Marketing, Sales, and Customer
    Support, Canada, 1997, John Wiley & Sons, Inc.
  • Sam Kash Kachigan: Multivariate Statistical
    Analysis, New York, 1991, Radius Press.
  • Ferenc Bodon: A fast APRIORI implementation.
  • Agrawal, R., Srikant, R.: Fast algorithms for
    mining association rules, The International
    Conference on Very Large Databases, 1994, pages
    487-499.