Web Usage Mining (Clickstream Analysis)

1
Web Usage Mining (Clickstream Analysis)
  • Mark Levene
  • (Follow the links to learn more!)

2
Reminder - W3C Extended Log File Format
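
As a reminder of what reading this format involves, here is a minimal Python sketch. The sample log lines, the addresses in them and the helper name parse_extended_log are invented for illustration. What is taken from the format itself is that lines starting with # are directives, that the #Fields directive names the columns, and that prefixes such as c-, cs- and sc- stand for client, client-to-server and server-to-client.

```python
def parse_extended_log(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line, e.g. "#Fields: date time c-ip ..."
            name, _, value = line[1:].partition(":")
            if name.strip() == "Fields":
                fields = value.split()
            continue
        # Simplification: fields are split on whitespace only.
        yield dict(zip(fields, line.split()))


# Hypothetical sample data, invented for this example.
SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time c-ip cs-method cs-uri-stem sc-status
2004-03-01 10:00:05 192.0.2.1 GET /index.html 200
2004-03-01 10:00:40 192.0.2.1 GET /papers.html 200
2004-03-01 10:02:11 192.0.2.7 GET /index.html 404
""".splitlines()

entries = list(parse_extended_log(SAMPLE_LOG))
print(entries[0]["cs-uri-stem"], entries[0]["sc-status"])
```

The sketch does not handle quoted string fields that contain spaces; a real analyser would need to.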
3
Analog Web Log File Analyser
  • Gives basic statistics such as
    • number of hits
    • average hits per time period
    • what are the popular pages in your site
    • who is visiting your site
    • what keywords are users searching for to get to
      you
    • what is being downloaded
  • Log data does not disclose the visitor's identity
  • What do Analog's reports mean?
  • Report for www.dcs.bbk.ac.uk/mark

4
Applications of Usage Mining
  • Pre-fetching and caching web pages
  • eCommerce and clickstream analysis
  • Web site reorganisation
  • Personalisation
  • Recommendation of links and products

5
Identification of User
  • By IP address
    • Not so reliable as IP can be dynamic
    • Different users may use the same IP
  • Through cookies
    • Reliable but user may remove cookies
    • Security and privacy issues
  • Through login
    • Users have to register

6
Sessionising
  • Time oriented (robust)
    • By total duration of session
      • not more than 30 minutes
    • By page stay times (good for short sessions)
      • not more than 10 minutes per page
  • Navigation oriented (good for short sessions and
    when timestamps are unreliable)
    • Referrer is previous page in session, or
    • Referrer is undefined but request within 10 secs,
      or
    • Link from previous to current page in web site
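
The time-oriented rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name sessionise and the sample requests are made up, the input is assumed to be the (timestamp, page) requests of one already identified user, and applying both limits together is a choice made here (either rule can be used alone by setting the other limit very high).

```python
from datetime import datetime, timedelta

MAX_SESSION = timedelta(minutes=30)    # total session duration limit
MAX_PAGE_STAY = timedelta(minutes=10)  # per-page stay limit


def sessionise(requests, max_session=MAX_SESSION, max_page_stay=MAX_PAGE_STAY):
    """Split one user's (timestamp, page) requests into time-oriented sessions.

    A new session starts when the gap since the previous request exceeds
    max_page_stay, or when the session would last longer than max_session.
    """
    sessions = []
    current = []
    for ts, page in sorted(requests):
        if current:
            too_long = ts - current[0][0] > max_session
            stayed_too_long = ts - current[-1][0] > max_page_stay
            if too_long or stayed_too_long:
                sessions.append(current)
                current = []
        current.append((ts, page))
    if current:
        sessions.append(current)
    return sessions


# Hypothetical requests from a single user.
t0 = datetime(2004, 3, 1, 10, 0)
requests = [
    (t0, "A1"),
    (t0 + timedelta(minutes=2), "A2"),
    (t0 + timedelta(minutes=5), "A3"),
    (t0 + timedelta(minutes=40), "A5"),  # 35 minute gap: new session
    (t0 + timedelta(minutes=43), "A2"),
]
trails = [[page for _, page in s] for s in sessionise(requests)]
print(trails)  # [['A1', 'A2', 'A3'], ['A5', 'A2']]
```

The navigation-oriented rules would need the referrer field and the site's link structure, which this sketch leaves out.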

7
Mining Navigation Patterns
  • Each session induces a user trail through the
    site
  • A trail is a sequence of web pages followed by a
    user during a session, ordered by time of access.
  • A pattern in this context is a frequent trail.
  • Co-occurrence of web pages is important, e.g.
    shopping-basket and checkout.
  • Use a Markov chain model.

8
Trails inferred from Log data (Each session
results in a trail)
ID  Trail
1   A1 > A2 > A3
2   A1 > A2 > A3
3   A1 > A2 > A3 > A4
4   A5 > A2 > A4
5   A5 > A2 > A4 > A6
6   A5 > A2 > A3 > A6
9
Construct Markov Chain from Data
  • Add a unique start state.
    • The start state has a transition to all visited
      web pages in the site.
  • Add a unique final state.
    • The last page in each trail has a transition to
      the final state.
  • The transition probabilities are obtained from
    counting click-throughs.
  • The Markov chain built is called absorbing since
    we always end up in the final state.
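
A minimal Python sketch of this construction, using the six example trails from the earlier slide. The names build_chain and FINAL are made up here. One assumption goes beyond the slide: the probability of moving from the start state to a page is taken to be proportional to the number of times that page was visited.

```python
from collections import Counter, defaultdict

# The six trails from the "Trails inferred from Log data" slide.
TRAILS = [
    ["A1", "A2", "A3"],
    ["A1", "A2", "A3"],
    ["A1", "A2", "A3", "A4"],
    ["A5", "A2", "A4"],
    ["A5", "A2", "A4", "A6"],
    ["A5", "A2", "A3", "A6"],
]
FINAL = "F"  # name chosen here for the unique final (absorbing) state


def build_chain(trails):
    """Return (initial, transitions) estimated by counting click-throughs."""
    visits = Counter()
    clicks = defaultdict(Counter)
    for trail in trails:
        visits.update(trail)
        # The last page of each trail gets a transition to the final state.
        for src, dst in zip(trail, trail[1:] + [FINAL]):
            clicks[src][dst] += 1
    total = sum(visits.values())
    # Assumption: start -> page is weighted by how often the page was visited.
    initial = {page: n / total for page, n in visits.items()}
    transitions = {
        src: {dst: n / sum(out.values()) for dst, n in out.items()}
        for src, out in clicks.items()
    }
    return initial, transitions


initial, transitions = build_chain(TRAILS)
print(round(transitions["A2"]["A3"], 2))  # 0.67
```

Every page has a path to F with non-zero probability, which is what makes the chain absorbing.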

10
The Markov Chain from the Data
11
Support and Confidence
  • Support s in [0,1): accept only trails whose
    initial probability is above s.
  • Setting support to be above the average
    click-through is reasonable.
  • Confidence c in [0,1): accept only trails whose
    probability is above c.
  • The probability of a trail is obtained by
    multiplying the transition probabilities of the
    links in the trail.
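
As a worked example, take the six example trails from the earlier slide: A1 is followed by A2 on all 3 of its visits, while A2 is followed by A3 on 4 of its 6 visits and by A4 on the other 2. Writing p(X, Y) for the transition probability from X to Y (notation introduced here), the trail probabilities are:

```latex
\[
\begin{aligned}
P(A_1 \to A_2 \to A_3) &= p(A_1, A_2)\, p(A_2, A_3) = \tfrac{3}{3} \cdot \tfrac{4}{6} \approx 0.67 \\
P(A_1 \to A_2 \to A_4) &= p(A_1, A_2)\, p(A_2, A_4) = \tfrac{3}{3} \cdot \tfrac{2}{6} \approx 0.33
\end{aligned}
\]
```

With confidence c = 0.5 the first trail is accepted and the second is rejected; with c = 0.3 both are accepted.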

12
Mining Frequent Trails
  • Find all trails whose initial probability is
    higher than s, and whose trail probability is
    above c.
  • Use depth-first search on the Markov chain to
    compute the trails.
  • The average time needed to find the frequent
    trails is proportional to the number of web pages
    in the site.
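
The search can be sketched in Python as below. This is an illustration, not the author's implementation. All function names are made up. The start probabilities are assumed to be proportional to page visit counts. Only trails that cannot be extended further are reported, and the final state is left out of reported trails. The max_len bound is an extra guard added here for sites whose links form cycles. With these choices the sketch reproduces the two Frequent Trails tables for the six example trails.

```python
from collections import Counter, defaultdict

TRAILS = [
    "A1 A2 A3", "A1 A2 A3", "A1 A2 A3 A4",
    "A5 A2 A4", "A5 A2 A4 A6", "A5 A2 A3 A6",
]
FINAL = "F"  # name chosen here for the final state


def build_chain(trails):
    """Estimate start and transition probabilities by counting."""
    visits, clicks = Counter(), defaultdict(Counter)
    for trail in trails:
        pages = trail.split()
        visits.update(pages)
        for src, dst in zip(pages, pages[1:] + [FINAL]):
            clicks[src][dst] += 1
    total = sum(visits.values())
    # Assumption: start -> page is proportional to the page's visit count.
    initial = {p: n / total for p, n in visits.items()}
    trans = {
        s: {d: n / sum(out.values()) for d, n in out.items()}
        for s, out in clicks.items()
    }
    return initial, trans


def frequent_trails(initial, trans, support, confidence, max_len=10):
    """Depth-first search for maximal trails above both thresholds."""
    found = {}

    def dfs(trail, prob):
        extended = False
        if len(trail) < max_len:
            for nxt, p in trans.get(trail[-1], {}).items():
                if nxt != FINAL and prob * p > confidence:
                    extended = True
                    dfs(trail + [nxt], prob * p)
        # Keep a trail only if it has a link and cannot be extended.
        if not extended and len(trail) > 1:
            found[" > ".join(trail)] = round(prob, 2)

    for page, p0 in initial.items():
        if p0 > support:  # support filters the starting page
            dfs([page], 1.0)
    return found


initial, trans = build_chain(TRAILS)
for trail, prob in sorted(frequent_trails(initial, trans, 0.1, 0.5).items()):
    print(trail, prob)
```

Raising the confidence prunes the search earlier, because a branch is abandoned as soon as its running product drops to the threshold or below.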

13
Frequent Trails: Support 0.1 and Confidence 0.3
Trail           Probability
A1 > A2 > A3    0.67
A5 > A2 > A3    0.67
A2 > A3         0.67
A1 > A2 > A4    0.33
A5 > A2 > A4    0.33
A2 > A4         0.33
A4 > A6         0.33
14
Frequent Trails: Support 0.1 and Confidence 0.5
Trail           Probability
A1 > A2 > A3    0.67
A5 > A2 > A3    0.67
A2 > A3         0.67
15
Content Mining
  • Incorporate the categories that users are
    navigating through so we may better understand
    their activities.
  • E.g. what type of book is the user interested in?
    This may be used for recommendation.
  • Classify users according to behaviour.
  • Is the user's intent to browse, search or buy?
  • Cluster users with common interests.

16
Pre-fetching and Caching Pages
  • Learn access patterns to predict future accesses.
  • Pre-fetch predicted pages to reduce latency.
  • Can use Markov model and base the prediction on
    history of access.
  • Also cache results of popular search engine
    queries.
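
One way to base the prediction on access history is a first-order Markov predictor, sketched below in Python with the six example trails as training data. The function names, the EXIT label and the min_prob threshold are assumptions made for this illustration. A real pre-fetcher might use longer histories.

```python
from collections import Counter, defaultdict

# Training data: the six example trails from the earlier slide.
TRAILS = [
    ["A1", "A2", "A3"],
    ["A1", "A2", "A3"],
    ["A1", "A2", "A3", "A4"],
    ["A5", "A2", "A4"],
    ["A5", "A2", "A4", "A6"],
    ["A5", "A2", "A3", "A6"],
]
EXIT = "EXIT"  # label chosen here for "user leaves the site"


def learn_next_page_counts(trails):
    """Count, for every page, which page (or exit) followed it."""
    counts = defaultdict(Counter)
    for trail in trails:
        for src, dst in zip(trail, trail[1:] + [EXIT]):
            counts[src][dst] += 1
    return counts


def predict_next(counts, page, min_prob=0.5):
    """Return the page worth pre-fetching after `page`, or None."""
    out = counts.get(page)
    if not out:
        return None  # page never seen in the training data
    nxt, n = out.most_common(1)[0]
    if nxt == EXIT or n / sum(out.values()) < min_prob:
        return None  # leaving is most likely, or no link is likely enough
    return nxt


counts = learn_next_page_counts(TRAILS)
print(predict_next(counts, "A2"))  # A3
```

The threshold trades wasted bandwidth against latency: a lower min_prob pre-fetches more pages that are never requested.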

17
eCommerce Clickstream Analysis
  • What is the user's intention: browse, search or
    buy?
  • Measure time spent on site - site stickiness
  • Repeat visits: it has been shown that repeat
    visitors spend less time on the site, which can be
    explained by learning.
  • Measure visit-to-purchase conversion ratio, and
    predict purchase likelihood.
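
Two of these measures are simple ratios, shown here as a small Python sketch. The session records are invented for the example, and the function names conversion_ratio and stickiness are hypothetical. Predicting purchase likelihood would need a proper model and is not attempted.

```python
from statistics import mean

# Hypothetical sessions: (visitor id, seconds spent on site, made a purchase?)
SESSIONS = [
    ("u1", 420, False),
    ("u2", 95, False),
    ("u1", 180, True),
    ("u3", 610, False),
    ("u2", 60, True),
]


def conversion_ratio(sessions):
    """Fraction of visits that ended in a purchase."""
    return sum(1 for _, _, bought in sessions if bought) / len(sessions)


def stickiness(sessions):
    """Average time in seconds spent on the site per visit."""
    return mean(seconds for _, seconds, _ in sessions)


print(conversion_ratio(SESSIONS))  # 0.4
print(stickiness(SESSIONS))
```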

18
Supplementary Analyses to Improve eCommerce Web
Sites
  • Detecting visits from crawlers as opposed to
    human visitors.
  • Form error analysis, e.g. login errors, mandatory
    fields not filled, incorrect format.
  • When and why do people exit the site, e.g.
    visitor puts item in cart but exits before
    reaching the checkout.
  • Analysis of local search engine logs: correlate
    with site behaviour.
  • Product recommendations based on association
    rules (people who bought x also bought y).
  • Geographic analysis: where are the customers?
  • Demographic analysis: who are the customers?
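
The "people who bought x also bought y" rule can be illustrated with a small Python sketch that scores pairs of products. The baskets, thresholds and the function name pair_rules are made up. Support and confidence here are the usual association-rule measures (share of baskets containing both items, and share of x-baskets that also contain y), not the trail thresholds used earlier.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase baskets, one set of products per order.
BASKETS = [
    {"book", "bookmark"},
    {"book", "bookmark", "lamp"},
    {"book", "lamp"},
    {"bookmark"},
    {"book"},
]


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return {(x, y): (support, confidence)} for rules 'bought x => bought y'."""
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        item_count.update(basket)
        # Ordered pairs, so both x => y and y => x are counted.
        pair_count.update(permutations(sorted(basket), 2))
    rules = {}
    for (x, y), both in pair_count.items():
        support = both / n
        confidence = both / item_count[x]
        if support >= min_support and confidence >= min_confidence:
            rules[(x, y)] = (support, confidence)
    return rules


rules = pair_rules(BASKETS)
for (x, y), (s, c) in sorted(rules.items()):
    print(f"bought {x} => also bought {y}: support {s:.2f}, confidence {c:.2f}")
```

Note the asymmetry: everyone who bought a lamp also bought a book, but only half of the book buyers bought a lamp, so only the first direction passes the confidence threshold.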

19
Adaptive web sites
  • Modify the web site according to user access.
  • Automatic synthesis of index pages (hubs that
    contain links on a specific topic)
  • Based on a clustering algorithm that uses the
    co-occurrence frequencies of pages from the log
    data.
  • Finds a concept that best describes each cluster.
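
To make the co-occurrence idea concrete, here is a deliberately simple Python stand-in, not the clustering algorithm the slide refers to. It counts how often two pages appear in the same session, links pairs that co-occur at least min_count times, and treats connected groups of linked pages as clusters. The sessions, page names, threshold and function name are all invented. Finding a concept that describes each cluster is not attempted.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions, each a list of visited pages.
SESSIONS = [
    ["jazz", "blues", "home"],
    ["jazz", "blues"],
    ["python", "java", "home"],
    ["python", "java"],
    ["jazz", "python"],
]


def cooccurrence_clusters(sessions, min_count=2):
    """Group pages that frequently appear together in a session."""
    pair_count = Counter()
    for session in sessions:
        pair_count.update(combinations(sorted(set(session)), 2))

    # Link pages that co-occur often enough.
    neighbours = {}
    for (a, b), n in pair_count.items():
        if n >= min_count:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    # Clusters are the connected components of the link graph.
    clusters, seen = [], set()
    for page in sorted(neighbours):
        if page in seen:
            continue
        component, stack = set(), [page]
        while stack:
            p = stack.pop()
            if p not in component:
                component.add(p)
                stack.extend(neighbours[p] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


print(cooccurrence_clusters(SESSIONS))  # [['blues', 'jazz'], ['java', 'python']]
```

Each cluster is a candidate index page: a hub linking to the pages in the cluster.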