Web Log Mining EE380L Data Mining - PowerPoint PPT Presentation

Slides: 19
Provided by: brh1


1
Web Log Mining
EE380L - Data Mining
  • David James
  • Roberto Zuniga
  • Alextair Mascarenhas

2
Uses of Web Log Mining
  • Market research
      • Infer interests and preferences of site visitors
      • Lifetime value of clients
      • Marketing strategies for products and services
      • Evaluate effectiveness of promotions and campaigns
  • Site design
      • Find the most logical structure of a web site
      • Dynamically create index pages

3
Successful Web Log Mining
  • Establishing a need for analysis
  • Specifying information and objective requirements
  • Evaluating data sources and designing sampling
    procedures
  • Analyzing data and applying results

4
Data available
  • Server logs
  • Error logs
  • Cookie logs
  • Query data
  • Web metadata

5
Domain Knowledge
  • Navigation templates
  • Topology networks

[Figure: navigation template in graphical and textual representation, over the pages index.html, offers/gifts.html, offers/reduced.html, offers/junk.html, offers/secondhand.html and purchase.html]
6
Relative Access
  • Absolute access counts can be misleading
  • Additional factors:
      • d: depth of the page
      • n: number of pages at the same depth
      • r: number of references to the page
  • Relative access: $a = c_1 d + c_2 \frac{n}{r}$
  • Example of a link-editing algorithm

[Figure: initial HTML structure and revised HTML structure]
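The relative access weighting on this slide can be sketched in Python. This is a minimal sketch under stated assumptions. The names PageStats, access_factor and relative_access are hypothetical, and so are the default weights c1 = c2 = 1. The step that multiplies a by a page's absolute hit count is also an assumption: the slide only gives the factor a. The sample pages and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    hits: int        # absolute accesses counted in the server log
    depth: int       # d: depth of the page in the site hierarchy
    same_depth: int  # n: number of pages at the same depth
    refs: int        # r: number of references (incoming links) to the page


def access_factor(p: PageStats, c1: float = 1.0, c2: float = 1.0) -> float:
    """a = c1*d + c2*n/r; c1 and c2 are tunable weights (defaults assumed)."""
    if p.refs <= 0:
        raise ValueError(f"{p.url}: r must be positive")
    return c1 * p.depth + c2 * p.same_depth / p.refs


def relative_access(p: PageStats, c1: float = 1.0, c2: float = 1.0) -> float:
    # Assumption: scale the raw hit count by a, so pages that are deep or
    # poorly linked are not penalised merely for being hard to reach.
    return access_factor(p, c1, c2) * p.hits


pages = [
    PageStats("index.html", hits=500, depth=0, same_depth=1, refs=10),
    PageStats("offers/gifts.html", hits=100, depth=2, same_depth=6, refs=3),
    PageStats("purchase.html", hits=300, depth=1, same_depth=2, refs=2),
]
for p in sorted(pages, key=relative_access, reverse=True):
    print(f"{p.url:20} a={access_factor(p):.2f} relative={relative_access(p):.1f}")
```

With these made-up numbers, index.html has the most raw hits but the lowest relative access. It sits at the root and has many incoming links, which is the kind of distortion the slide warns about.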
7
Page Interest Estimator
  • Non-invasive approach to monitoring users' behavior
    and evaluating their interest
  • Considers history, bookmarks, links to other
    pages, and access logs
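An estimator of this kind could combine the passively observed signals listed above into one score. The slide does not say how the signals are combined. The linear weighted sum, the field names in PageSignals and the values in WEIGHTS below are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    visits_in_history: int  # times the page appears in the user's history
    bookmarked: bool        # whether the user bookmarked the page
    links_followed: int     # links on this page the user went on to follow
    log_hits: int           # requests for the page in the access log


# Hypothetical weights; a real estimator would fit these to observed behaviour.
WEIGHTS = {"history": 1.0, "bookmark": 5.0, "links": 0.5, "log": 0.2}


def interest_score(s: PageSignals, w: dict = WEIGHTS) -> float:
    """Weighted sum of signals gathered without asking the user anything."""
    return (w["history"] * s.visits_in_history
            + w["bookmark"] * (1 if s.bookmarked else 0)
            + w["links"] * s.links_followed
            + w["log"] * s.log_hits)


pages = {
    "offers/gifts.html": PageSignals(3, True, 4, 10),
    "offers/junk.html": PageSignals(1, False, 0, 2),
}
for url, signals in sorted(pages.items(), key=lambda kv: interest_score(kv[1]), reverse=True):
    print(url, round(interest_score(signals), 2))
```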

8
Techniques in Web Log Mining
  • Association rules (e.g., Apriori, is-a hierarchies)
  • Discovery of sequential patterns (modified Apriori,
    in which order matters)
  • Classification and clustering (e.g., k-means, BIRCH)
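The first technique can be illustrated with a minimal textbook Apriori over page-visit sessions, followed by rule generation. The sketch treats each session as an unordered set of pages and does not implement the is-a hierarchical variant. The function names, sample sessions and thresholds are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(pages): support_count} for every frequent page set."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:  # level 1: count single pages
        for page in t:
            key = frozenset([page])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join frequent (k-1)-sets, then prune candidates that have an
        # infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        frequent = {}
        for c in candidates:  # one pass over the data per level
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                frequent[c] = support
        result.update(frequent)
        k += 1
    return result


def rules(freq, min_conf):
    """Yield (lhs, rhs, confidence) for rules lhs -> rhs above min_conf."""
    for itemset, support in freq.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs_set = frozenset(lhs)
                conf = support / freq[lhs_set]  # subsets of frequent sets are frequent
                if conf >= min_conf:
                    yield lhs_set, itemset - lhs_set, conf


sessions = [
    {"index.html", "offers/gifts.html", "purchase.html"},
    {"index.html", "offers/gifts.html"},
    {"index.html", "offers/reduced.html", "purchase.html"},
    {"index.html", "offers/gifts.html", "purchase.html"},
]
freq = apriori(sessions, min_support=2)
for lhs, rhs, conf in rules(freq, min_conf=0.9):
    print(sorted(lhs), "->", sorted(rhs), f"{conf:.2f}")
```

On these sample sessions, every rule that reaches 0.9 confidence predicts index.html. An entry page shows up in almost every session, so rule filtering usually has to account for pages like this.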

9
Processing Server Logs
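A common first step in processing server logs is to parse each line and drop requests that are not page views. The sketch below makes three assumptions: the log uses the NCSA Common Log Format, a fixed list of file suffixes counts as embedded resources, and only GET requests answered with 200 or 304 are kept. The server logs used in the presentation may differ.

```python
import re
from datetime import datetime

# NCSA Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
# Embedded resources fetched automatically rather than chosen by the visitor.
SKIP_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def parse_line(line):
    """Return a dict for one page view, or None if the line is dropped."""
    m = CLF.match(line)
    if m is None:
        return None  # malformed or non-CLF line
    status = int(m["status"])
    path = m["path"].split("?", 1)[0]  # ignore the query string
    # Keep GET requests answered with 200 OK or 304 Not Modified.
    if m["method"] != "GET" or status not in (200, 304):
        return None
    if path.lower().endswith(SKIP_SUFFIXES):
        return None
    return {
        "host": m["host"],
        "time": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "page": path,
        "status": status,
    }


sample = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2000:13:57:02 -0700] "GET /offers/gifts.html?id=3 HTTP/1.0" 200 1800',
    '10.0.0.2 - - [10/Oct/2000:14:01:10 -0700] "GET /missing.html HTTP/1.0" 404 0',
]
records = [r for r in map(parse_line, sample) if r is not None]
for r in records:
    print(r["host"], r["time"].isoformat(), r["page"])
```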
10
Steps in Enterprise Miner
  • Import data
  • Association node
  • Sequence Discovery mode
      • Requires user_id, time_stamp, and page
      • Features:
          • Transaction window length
          • Consolidate time differences
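Sequence discovery works on exactly the three fields listed (user_id, time_stamp, page) plus a transaction window. The sketch below is not Enterprise Miner's algorithm. It is a hypothetical, simplified stand-in that counts, for each ordered page pair (a, b), how many users requested b after a within the window. Time stamps are assumed to be in seconds.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def sequence_support(rows, window, min_support):
    """rows: iterable of (user_id, time_stamp, page), time_stamp in seconds.
    Return {(a, b): n_users} for ordered pairs where b was requested after a
    within `window` seconds by at least `min_support` distinct users."""
    users_per_pair = defaultdict(set)
    ordered = sorted(rows, key=itemgetter(0, 1))  # by user, then by time
    for user, events in groupby(ordered, key=itemgetter(0)):
        events = list(events)
        for i, (_, t_a, a) in enumerate(events):
            for _, t_b, b in events[i + 1:]:
                if t_b - t_a > window:
                    break  # later events are even further away in time
                if t_b > t_a and b != a:  # same-second hits have no clear order
                    users_per_pair[(a, b)].add(user)
    return {pair: len(users) for pair, users in users_per_pair.items()
            if len(users) >= min_support}


rows = [
    ("u1", 0, "index.html"), ("u1", 40, "offers/gifts.html"), ("u1", 100, "purchase.html"),
    ("u2", 0, "index.html"), ("u2", 30, "offers/gifts.html"), ("u2", 900, "purchase.html"),
    ("u3", 10, "index.html"), ("u3", 70, "purchase.html"),
]
print(sequence_support(rows, window=300, min_support=2))
```

Shrinking `window` drops pairs whose requests are far apart in time. For example, user u2 reaches purchase.html only after 900 seconds, so that pair does not count for u2.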

11
Site Map: http://www.ece.utexas.edu
LinkBot Pro
12
Site Statistics
13
Site Statistics
14
Access Logs
15
Data Processing
16
Page Frequencies
  • Top 10 most accessed pages
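A top 10 list is a frequency count over cleaned page requests. A minimal sketch using Python's collections.Counter follows. The request list is made up, and in practice it would come from the already-filtered server log.

```python
from collections import Counter


def top_pages(requests, n=10):
    """Return the n most requested pages as (page, count) pairs, busiest first."""
    return Counter(requests).most_common(n)


requests = [
    "/index.html", "/offers/gifts.html", "/index.html",
    "/purchase.html", "/index.html", "/offers/gifts.html",
]
total = len(requests)
for page, count in top_pages(requests):
    print(f"{page:20} {count:3d} {count / total:6.1%}")
```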

17
Frequency of Nodes at Nth level
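One way to compute frequencies per level is to take a page's depth as the number of non-empty segments in its URL path. That convention is an assumption, since depth in the site's link structure can differ from URL depth. The sketch reports total hits and the number of distinct pages seen in the log at each level. The request list is made up.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def depth(url):
    """Assumed convention: number of non-empty path segments.
    /index.html -> 1, /offers/gifts.html -> 2, / -> 0."""
    return sum(1 for seg in urlsplit(url).path.split("/") if seg)


def level_frequencies(requests):
    """Map each level to (total hits, distinct pages) at that level."""
    hits = Counter()
    pages = defaultdict(set)
    for url in requests:
        level = depth(url)
        hits[level] += 1
        pages[level].add(url)
    return {level: (hits[level], len(pages[level])) for level in sorted(hits)}


requests = [
    "/index.html", "/offers/gifts.html", "/offers/junk.html", "/index.html",
    "/offers/gifts.html", "/purchase.html", "/offers/reduced.html",
]
for level, (hit_count, distinct) in level_frequencies(requests).items():
    print(f"level {level}: {hit_count} hits over {distinct} distinct pages")
```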
18
Session File
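A session file groups each visitor's requests into sessions. The sketch below closes a session after 30 minutes of inactivity per visitor key (for example, the client host). This is a widely used heuristic that the presentation does not specify. The sketch writes one CSV row per session; the column layout and sample records are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def sessionize(records, timeout=TIMEOUT):
    """records: iterable of (visitor, timestamp, page).
    Return a list of (visitor, [pages...]) sessions in request order."""
    sessions = []
    ordered = sorted(records, key=itemgetter(0, 1))  # by visitor, then time
    for visitor, events in groupby(ordered, key=itemgetter(0)):
        current, last_time = [], None
        for _, ts, page in events:
            # A gap longer than the timeout closes the current session.
            if last_time is not None and ts - last_time > timeout:
                sessions.append((visitor, current))
                current = []
            current.append(page)
            last_time = ts
        if current:
            sessions.append((visitor, current))
    return sessions


def write_session_file(sessions, fh):
    """Write one CSV row per session: id, visitor, space-separated pages."""
    writer = csv.writer(fh)
    writer.writerow(["session_id", "visitor", "pages"])
    for sid, (visitor, pages) in enumerate(sessions, start=1):
        writer.writerow([sid, visitor, " ".join(pages)])


t0 = datetime(2000, 10, 10, 13, 0)
records = [
    ("10.0.0.1", t0, "/index.html"),
    ("10.0.0.1", t0 + timedelta(minutes=2), "/offers/gifts.html"),
    ("10.0.0.1", t0 + timedelta(minutes=50), "/index.html"),
    ("10.0.0.2", t0 + timedelta(minutes=1), "/purchase.html"),
]
buf = io.StringIO()
write_session_file(sessionize(records), buf)
print(buf.getvalue())
```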