Algorithms for Webpage Traversal Pattern Mining

1
Algorithms for Webpage Traversal Pattern Mining
  • Spring 2002
  • CSE791 Final Project
  • by Dalei Xing

2
Highlights
  • What webpage traversal pattern mining is, and
    its motivations.
  • Problem description.
  • Two algorithms for the two key steps in webpage
    traversal pattern mining.
  • Finding Maximal Forward References (the MF
    algorithm).
  • Finding Frequent Reference Sequences (the FS
    algorithm).
  • Demo of a simple MF algorithm implementation.

3
What is webpage traversal pattern mining?
  • Association rule mining applied to people's
    Internet browsing activities.
  • In the WWW environment, users access information
    of interest and travel from one object to another
    via the links and other facilities provided. We
    want to capture the regularities in user access
    patterns.

4
Motivations
  • The WWW's ubiquity is increasing.
  • The complexity of websites is increasing.
  • Service providers and online businesses want to
    track user browsing habits to improve their
    services and increase profits. For example:
  • More efficient access between highly correlated
    webpages.
  • Better customer classification and behavior
    analysis.

5
Problem Description
  • Find maximal forward references.
  • Data preprocessing: the raw data contains
    things we do not need.
  • Generate a transactional database.
  • Determine frequent reference sequences.

6
Some Terms
  • Traversal path:
    A, B, C, D, E, D, C, B, F, A, G, H, G, I, J, I, K
  • Maximal forward references:
    ABCDE, ABF, AGH, AGIJ, AGIK

7
Some Terms (Continued)
  • Frequent reference sequence
  • A frequent reference sequence is a reference
    sequence whose number of occurrences is greater
    than or equal to the minimum support.
    (Compare frequent itemsets in the textbook.)
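
A minimal Python sketch of this definition follows. It assumes, following Chen et al. [1], that a reference sequence must appear as consecutive pages inside a maximal forward reference; the function names and the sample database are illustrative only.

```python
def contains_reference(mfr, seq):
    """True if seq occurs as consecutive pages inside mfr (assumed rule)."""
    k = len(seq)
    return any(tuple(mfr[i:i + k]) == tuple(seq)
               for i in range(len(mfr) - k + 1))


def support(database, seq):
    """Number of maximal forward references that contain seq."""
    return sum(1 for mfr in database if contains_reference(mfr, seq))


def is_frequent(database, seq, min_support):
    """A sequence is frequent when its support reaches the minimum support."""
    return support(database, seq) >= min_support


# Illustrative database: the maximal forward references from the earlier slide.
database = ["ABCDE", "ABF", "AGH", "AGIJ", "AGIK"]
print(support(database, "AG"))          # 3
print(is_frequent(database, "AB", 2))   # True
print(support(database, "AC"))          # 0 (A and C are never consecutive)
```

Because a page cannot repeat inside one maximal forward reference, counting references that contain the sequence is the same as counting occurrences.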

8
Algorithm MF
  • Raw input data comes from log files, which can be
    collected at the client, proxy, or server level.
  • MF is applied to each user's records to
    find that user's maximal forward references.
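
As a rough illustration of the per-user step, the sketch below groups log records into one traversal path per user. The record layout, `(user_id, page)` pairs already sorted by time, is an assumption made for this example and not a real log format.

```python
from collections import defaultdict


def paths_by_user(records):
    """Group (user_id, page) records, assumed time-ordered, into per-user paths."""
    paths = defaultdict(list)
    for user_id, page in records:
        paths[user_id].append(page)
    return dict(paths)


# Hypothetical records from two interleaved users.
records = [("u1", "A"), ("u2", "A"), ("u1", "B"),
           ("u2", "G"), ("u1", "C"), ("u1", "B")]
print(paths_by_user(records))
# {'u1': ['A', 'B', 'C', 'B'], 'u2': ['A', 'G']}
```

Each resulting path can then be passed to MF separately.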

9
Algorithm MF (Continued)
  • Input: A, B, C, D, E, D, C, B, F, A, G, H, G, I, J, I, K
  • Output: ABCDE, ABF, AGH, AGIJ, AGIK

10
Algorithm MF (Continued)
  • Step 1: Initialization
  • Set S to null (empty) and the forward flag to
    true. Start scanning the database.
  • Step 2: Forward reference
  • Set the flag to true and keep appending the
    forward visits to S.
  • Step 3: Backward reference
  • If the flag is true, write the maximal forward
    reference in S to the result database. Set the
    flag to false and discard the nodes in S after
    the revisited page. When a new node is
    encountered, go to Step 2.
  • Steps 2 and 3 are repeated until the end of the
    log is reached.
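
The steps above can be sketched in Python roughly as follows. This is a minimal illustration and not the demo implementation from the presentation: the function name is made up here, and a backward reference is assumed to mean a visit to a page already in S.

```python
def maximal_forward_references(path):
    """Split one user's traversal path into maximal forward references."""
    current = []      # S: the forward path built so far
    forward = True    # flag: was the previous move a forward reference?
    results = []
    for page in path:
        if page in current:
            # Backward reference: the user returned to an earlier page.
            if forward:
                results.append(tuple(current))
            # Discard the nodes that follow the revisited page.
            current = current[:current.index(page) + 1]
            forward = False
        else:
            # Forward reference: extend S with the new page.
            current.append(page)
            forward = True
    # The log ended while moving forward, so S is also maximal.
    if forward and current:
        results.append(tuple(current))
    return results


path = "ABCDEDCBFAGHGIJIK"   # the example input from the previous slide
print(["".join(r) for r in maximal_forward_references(path)])
# ['ABCDE', 'ABF', 'AGH', 'AGIJ', 'AGIK']
```

The flag prevents a run of consecutive backward moves (D, C, B in the example) from emitting shortened copies of a path that has already been written.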

11
Finding Frequent k-Reference Sequences
  • The idea of the Full-Scan (FS) algorithm
  • The full-scan algorithm essentially uses the
    concepts of hashing and pruning while resolving
    the discrepancy between traversal patterns and
    association rules. Although the FS algorithm
    trims the transaction database as it proceeds
    to later passes, it still requires a scan of
    the database in each pass.
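
The level-wise structure of such a search can be sketched as below. This is a simplified illustration and not the FS algorithm itself: the hashing step is omitted, reference sequences are assumed to be runs of consecutive pages, and all names are hypothetical. It does keep the two traits named above, candidate pruning and trimming the database between passes, with one scan per pass.

```python
from collections import Counter


def k_references(mfr, k):
    """All runs of k consecutive pages in one maximal forward reference."""
    return [tuple(mfr[i:i + k]) for i in range(len(mfr) - k + 1)]


def frequent_reference_sequences(database, min_support):
    """Return {sequence: support} for every frequent reference sequence."""
    database = [tuple(mfr) for mfr in database]
    frequent = {}
    candidates = None   # None: in pass 1 every single page is a candidate
    k = 1
    while database:
        # One scan of the (trimmed) database per pass.
        counts = Counter()
        for mfr in database:
            for seq in set(k_references(mfr, k)):
                if candidates is None or seq in candidates:
                    counts[seq] += 1
        level = {s: n for s, n in counts.items() if n >= min_support}
        if not level:
            break
        frequent.update(level)
        # Join: s1 + last page of s2 when they overlap on k-1 pages.
        candidates = {s1 + s2[-1:] for s1 in level for s2 in level
                      if s1[1:] == s2[:-1]}
        # Trim: drop transactions that cannot hold a frequent (k+1)-reference.
        database = [mfr for mfr in database
                    if len(mfr) > k
                    and any(s in level for s in k_references(mfr, k))]
        k += 1
    return frequent


result = frequent_reference_sequences(
    ["ABCDE", "ABF", "AGH", "AGIJ", "AGIK"], min_support=2)
print({"".join(s): n for s, n in result.items() if len(s) > 1})
```

With a minimum support of 2, the sequences of length two or more that this sketch reports for the sample data are AB (2), AG (3), GI (2), and AGI (2).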

12
MF Implementation Demo

13
Conclusion
  • Hopefully, the following key points were conveyed:
  • Webpage access pattern mining is a special kind
    of association rule mining.
  • The key steps are finding maximal forward
    references and frequent k-reference sequences.

14
References
[1] Ming-Syan Chen, Jong Soo Park, Philip S. Yu.
    Data Mining for Path Traversal Patterns in a Web
    Environment (1995)
[2] Ming-Syan Chen, Jong Soo Park, Philip S. Yu.
    An Effective Hash-Based Algorithm for Mining
    Association Rules (1996)
[3] Behzad Mortazavi-Asl. Discovering and Mining
    User Web-page Traversal Patterns (1999)

Questions?