Title: Web Usage Mining for Website Design Improvement
1Web Usage Mining for Website Design Improvement
- I-Hsien (Derrick) Ting
- Department of Computer Science
- The University of York
- 3/November/2006
2Outline
- 1. Introduction
- The KDD Process for Website Design Improvement
- 2. The Step 1 of the KDD Process
- 3. The Step 2 of the KDD Process
- 4. The Step 3 and Step 4 of the KDD Process
- 5. Closing the KDD Loop
- 6. An Empirical Study
- 7. Conclusion
31. Introduction (1)
- The concept behind the research project
41. Introduction Cont.
- Website design is an important success factor for a website
- Understanding the browsing behaviour of users is a good way to improve a website's design
- Web Usage Mining is a technique that can help to understand users' browsing behaviour (Kohavi 2001)
Figure: An example of website design improvement (Kohavi 2003)
51. Introduction Cont.
- The KDD (Knowledge Discovery in Databases) Process for Website Design Improvement
Figure: The KDD Process for Website Design Improvement (Lee et al. 2001)
61. Introduction Cont.
- The aims of the research
- Developing and using web usage mining techniques
- To understand the browsing behaviour of users
- To improve website design
- Closing the KDD loop
- Few researchers focus on how to close the KDD loop, and the loop has never been closed (Kohavi 2001, Ansari 2001)
72. The Step 1 of the KDD Process-Data Collection
- The Clickstream data format
- 83.151.206.241 - - [04/Dec/2004:18:10:35 +0000] "GET /storedetail-2-product_id-305054 HTTP/1.1" 200 31898 "http://www.google.com/search" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
82. The Step 1 of the KDD Process Cont.-Data
Pre-processing
- Raw clickstream data is usually noisy and contains incomplete and unnecessary records
- Common data pre-processing process
Figure: A Common Data Pre-Processing Process (Cooley et al., 1999; Eirinaki et al., 2003)
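Before any cleaning can happen, each raw clickstream record has to be split into fields. The following is a minimal sketch, assuming the records follow the Apache Combined Log Format with its usual punctuation (square brackets around the timestamp, quoted request, referrer and user agent). The names `LOG_PATTERN` and `parse_log_line` are illustrative and not taken from the presentation.

```python
import re
from datetime import datetime

# Combined Log Format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamp such as 04/Dec/2004:18:10:35 +0000
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    # Request such as: GET /some/path HTTP/1.1
    method, _, rest = record["request"].partition(" ")
    record["method"] = method
    record["path"] = rest.rsplit(" ", 1)[0]
    record["status"] = int(record["status"])
    return record


# Sample record, with the punctuation assumed as described above.
SAMPLE = (
    '83.151.206.241 - - [04/Dec/2004:18:10:35 +0000] '
    '"GET /storedetail-2-product_id-305054 HTTP/1.1" 200 31898 '
    '"http://www.google.com/search" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"'
)
record = parse_log_line(SAMPLE)
```

Lines that do not match the pattern return `None`, so they can be counted and inspected rather than silently dropped.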
92. The Step 1 of the KDD Process Cont.-Data
Pre-processing (Novel Contribution)
- Problems of the common data pre-processing process
- Many accesses are made by bots
- Data are lost due to caching
- Backward browsing
- The modified data pre-processing process
Figure: The modified data pre-processing process
102. The Step 1 of the KDD Process Cont.-Data
Pre-processing
- Bot detection and cleaning
- A bot list
- The behaviour of bots
- The client requests the robots.txt file
- It requests different web pages at the same time
- A low percentage of its clickstream records carry a referrer
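The four signals above can be combined into a simple per-host filter. This is only a sketch under stated assumptions: the bot list entries, the thresholds (`max_same_second`, `min_referrer_ratio`, `min_hits`) and the record layout are all hypothetical and would need tuning on real data.

```python
from collections import defaultdict

# Hypothetical bot list: lower-case substrings of known crawler user agents.
KNOWN_BOT_AGENTS = ("googlebot", "slurp", "msnbot")


def detect_bots(records, max_same_second=3, min_referrer_ratio=0.1, min_hits=5):
    """Return the set of hosts that look like bots.

    Each record is a dict with keys: host, time (in seconds), path,
    referrer and agent. All thresholds are illustrative assumptions.
    """
    by_host = defaultdict(list)
    for record in records:
        by_host[record["host"]].append(record)

    bots = set()
    for host, hits in by_host.items():
        # 1. The user agent appears in the bot list.
        listed = any(
            bot in hit["agent"].lower() for hit in hits for bot in KNOWN_BOT_AGENTS
        )
        # 2. The client requested robots.txt.
        asked_robots = any(hit["path"].endswith("/robots.txt") for hit in hits)
        # 3. Several different pages were requested within the same second.
        pages_per_second = defaultdict(set)
        for hit in hits:
            pages_per_second[hit["time"]].add(hit["path"])
        burst = any(len(pages) > max_same_second for pages in pages_per_second.values())
        # 4. Few requests carry a referrer (only judged on hosts with enough hits).
        with_referrer = sum(1 for hit in hits if hit["referrer"] not in ("", "-"))
        low_referrer = (
            len(hits) >= min_hits and with_referrer / len(hits) < min_referrer_ratio
        )
        if listed or asked_robots or burst or low_referrer:
            bots.add(host)
    return bots


# Toy data: one human visitor, one crawler asking for robots.txt,
# and one client fetching four pages in the same second.
records = [
    {"host": "10.0.0.1", "time": 0, "path": "/index.htm", "referrer": "-", "agent": "Mozilla/4.0"},
    {"host": "10.0.0.1", "time": 5, "path": "/product.htm", "referrer": "http://example.org/index.htm", "agent": "Mozilla/4.0"},
    {"host": "10.0.0.2", "time": 1, "path": "/robots.txt", "referrer": "-", "agent": "ExampleCrawler/1.0"},
] + [
    {"host": "10.0.0.3", "time": 7, "path": "/page%d.htm" % i, "referrer": "-", "agent": "Mozilla/4.0"}
    for i in range(4)
]
flagged = detect_bots(records)
```

Records from the flagged hosts would then be removed before sessions are built.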
112. The Step 1 of the KDD Process Cont.-Data
Pre-processing
- The PRM Algorithm
- It can help to reconstruct lost clickstream data and incomplete user browsing paths
- The Algorithm
- Checking the referrer information
- Checking the website's structure
Server-side clickstream data: Page1.htm → Page2.htm → Page3.htm
Client-side clickstream data: Page1.htm → Page2.htm → Page1.htm → Page3.htm
(Cooley et al., 1999)
122. The Step 1 of the KDD Process Cont.-Data
Pre-processing
- An example of the PRM algorithm
() 61.59.121.221, 16:02:49, http://www-users.cs.york.ac.uk/kimble/, -, Restored
(1) 61.59.121.221, 16:02:54, http://www-users.cs.york.ac.uk/kimble/research/research.html, http://www-users.cs.york.ac.uk/kimble/
(2) 61.59.121.221, 16:03:00, http://www-users.cs.york.ac.uk/kimble/teaching/teach.html, http://www-users.cs.york.ac.uk/kimble/
() 61.59.121.221, 16:02:49, http://www-users.cs.york.ac.uk/kimble/, -, Restored
(1) 61.59.121.221, 16:02:54, http://www-users.cs.york.ac.uk/kimble/research/research.html, http://www-users.cs.york.ac.uk/kimble/, Original
() 61.59.121.221, 16:02:57, http://www-users.cs.york.ac.uk/kimble, -, Restored
(2) 61.59.121.221, 16:03:00, http://www-users.cs.york.ac.uk/kimble/teaching/teach.html, http://www-users.cs.york.ac.uk/kimble/, Original
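The referrer check in the example can be illustrated with a short sketch. It is not the PRM algorithm itself: it covers only the referrer step, ignores the website-structure step and timestamps, and the function name `restore_path` and the `site_prefix` parameter are assumptions made for illustration.

```python
def restore_path(records, site_prefix):
    """Rebuild one session's browsing path from server-side records.

    records: list of (url, referrer) tuples in time order.
    site_prefix: only referrers starting with this prefix are treated as
    pages of the analysed site (external referrers are not restored).
    Returns a list of (url, origin) tuples, where origin is "Original"
    for logged requests and "Restored" for inferred ones.
    """
    path = []
    previous_url = None
    for url, referrer in records:
        internal = referrer.startswith(site_prefix)
        # If the referrer is an internal page that differs from the page
        # logged just before, the user most likely returned to it from
        # the browser cache, so the visit was never logged.
        if internal and referrer != previous_url:
            path.append((referrer, "Restored"))
        path.append((url, "Original"))
        previous_url = url
    return path


BASE = "http://www-users.cs.york.ac.uk/kimble/"
logged = [
    (BASE + "research/research.html", BASE),
    (BASE + "teaching/teach.html", BASE),
]
restored = restore_path(logged, site_prefix=BASE)
```

For the two logged requests this yields the home page, the research page, the home page again and the teaching page, with both home-page visits marked as restored.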
133. The Step 2 of the KDD Process-Pattern
Discovery and Analysis
- Web usage mining techniques
- Basic statistical methods (Srivastava et al., 2000)
- The web page index.htm has been viewed on average 20 times per week
- Clustering and classification
- Grouping users who have similar browsing behaviour
- Association rule mining
- Users who view index.htm also view product.htm, with support = 0.5 and confidence = 0.6
- Sequential mining
- The browsing behaviour of 30 users follows the sequential pattern web page A, web page B, then web page C
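As an illustration of the association rule measures, the sketch below computes support and confidence for a rule such as index.htm → product.htm. The six toy sessions are invented so that the result matches the figures quoted above; they are not data from the study.

```python
def support_confidence(sessions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    sessions: list of sets of pages, one set per user session.
    """
    total = len(sessions)
    with_antecedent = sum(1 for pages in sessions if antecedent in pages)
    with_both = sum(
        1 for pages in sessions if antecedent in pages and consequent in pages
    )
    # Support: share of all sessions that contain both pages.
    support = with_both / total if total else 0.0
    # Confidence: share of sessions with the antecedent that also
    # contain the consequent.
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Toy sessions (invented for illustration).
sessions = [
    {"index.htm", "product.htm"},
    {"index.htm", "product.htm", "cart.htm"},
    {"index.htm", "product.htm"},
    {"index.htm", "about.htm"},
    {"index.htm"},
    {"contact.htm"},
]
support, confidence = support_confidence(sessions, "index.htm", "product.htm")
```

Here three of six sessions contain both pages (support 0.5), and three of the five sessions containing index.htm also contain product.htm (confidence 0.6).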
143. The Step 2 of the KDD Process Cont.- Pattern
Discovery and Analysis
- Novel Web Usage Mining Techniques
- Footstep Graph
- A clickstream data visualisation tool
- APD (Automatic Pattern Discovery) Method
- Discovering some pre-identified patterns automatically
- Distance-based Association Rule Mining
- Distance: the third measure of association rule mining
153. The Step 2 of the KDD Process Cont.-Pattern Discovery and Analysis
- A Visualisation Tool: Footstep Graph
Figure: Footstep Graph (axes: Route and Time)
A.htm → B.htm → C.htm → D.htm → C.htm → E.htm
163. The Step 2 of the KDD Process Cont.-APD Method: Some Interesting Patterns
- Upstairs Pattern: Index → product1 → product2 → shopping cart → checkout
- Mountain Pattern: Index → product_index → product1 → product1_price → product1_price → product1 → product_index
- Fingers Pattern: Index → product1 → index → product2 → index → product3
- Valley Pattern: Index → product1 → product2 → product3 → index → product4
173. The Step 2 of the KDD Process Cont. The APD
Method
- An automatic way to discover pre-identified patterns
- Users' browsing route transformation
- Transforming a user's browsing route into a number-based sequence
Users' Browsing Route: 0,10,0,20,0,30,0,40,0
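One way such a transformation could work is sketched below. The numbering rule is an assumption made for illustration: each page receives the next multiple of `step` the first time it is visited and keeps that number on later visits. The slide does not state the exact rule, and the page names are invented.

```python
def route_to_sequence(route, step=10):
    """Turn a list of page names into a number-based sequence.

    Assumed rule: the first page visited gets 0, each newly seen page gets
    the next multiple of `step`, and revisits reuse the page's number.
    """
    numbers = {}
    sequence = []
    for page in route:
        if page not in numbers:
            numbers[page] = len(numbers) * step
        sequence.append(numbers[page])
    return sequence


# A route that keeps returning to the index page (invented page names).
route = [
    "index", "product1", "index", "product2", "index",
    "product3", "index", "product4", "index",
]
sequence = route_to_sequence(route)
```

With this rule the route above becomes 0, 10, 0, 20, 0, 30, 0, 40, 0, the same shape as the sequence shown on the slide.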
183. The Step 2 of the KDD Process Cont.-The APD
Method
- Level-1 and Level-2 users' browsing elements
- Level-1 elements
- Browsing Trend
- Same: (0, 0), (1, 1)
- Up: (1, 2)
- Down: (2, 1), (7, 0)
- Level-2 elements
- Turning Point
- Peak: Up, Down
- Trough: Down, Up
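The two levels of elements can be derived mechanically from a number-based sequence. The sketch below is an illustration rather than the APD implementation; in particular, skipping "Same" trends when looking for turning points is an assumption.

```python
def browsing_trends(sequence):
    """Level-1 elements: the trend between each pair of consecutive numbers."""
    trends = []
    for current, following in zip(sequence, sequence[1:]):
        if following > current:
            trends.append("Up")
        elif following < current:
            trends.append("Down")
        else:
            trends.append("Same")
    return trends


def turning_points(trends):
    """Level-2 elements: Peak (Up then Down) and Trough (Down then Up).

    Assumption: "Same" trends are ignored when pairing trends.
    """
    moves = [trend for trend in trends if trend != "Same"]
    points = []
    for first, second in zip(moves, moves[1:]):
        if first == "Up" and second == "Down":
            points.append("Peak")
        elif first == "Down" and second == "Up":
            points.append("Trough")
    return points


trends = browsing_trends([0, 10, 0, 20, 0])
points = turning_points(trends)
```

A sequence that alternates between the index page and new pages therefore produces alternating peaks and troughs, which is the kind of signature a pattern matcher could look for.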
193. The Step 2 of the KDD Process Cont.-The APD
Method
0 1 2 0 1 3 4 0 5 0 6 0 7
0 1 0 7
203. The Step 2 of the KDD Process Cont.-The APD
Method
213. The Step 2 of the KDD Process
Cont.-Distance-based Association Rule Mining
Short Stairs Pattern
Long Stairs Pattern
Short Fingers Pattern
Long Fingers Pattern
223. The Step 2 of the KDD Process
Cont.-Distance-based Association Rule Mining
- To discover the association between web pages
- E.g. people who view the University home page then view the Computer Science Department page (Rule A)
- Support = Rule A / All sessions
- Confidence = Rule A / All rules from the University home page
- Distance = from the University home page to the CS Department page
Distance = 10
The concept of distance measurement in Association Rule
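The three measures can be computed together from ordered sessions. The sketch below rests on assumptions the slide does not spell out: distance is taken as the number of page transitions from the first visit of the source page to the next visit of the target page, averaged over matching sessions, and confidence is taken relative to the sessions that contain the source page. Function and page names are illustrative.

```python
def rule_measures(sessions, source, target):
    """Support, confidence and mean distance for the rule source -> target.

    sessions: list of page lists, each in visit order.
    """
    source_sessions = 0
    distances = []
    for path in sessions:
        if source not in path:
            continue
        source_sessions += 1
        start = path.index(source)
        # Look for the target only after the first visit to the source.
        if target in path[start + 1:]:
            distances.append(path.index(target, start + 1) - start)
    frequency = len(distances)
    support = frequency / len(sessions) if sessions else 0.0
    confidence = frequency / source_sessions if source_sessions else 0.0
    mean_distance = sum(distances) / frequency if frequency else None
    return support, confidence, mean_distance


# Toy sessions (invented for illustration).
sessions = [
    ["home", "faculties", "science", "cs"],
    ["home", "cs"],
    ["home", "library"],
    ["news"],
]
support, confidence, mean_distance = rule_measures(sessions, "home", "cs")
```

A rule with high frequency and a large distance suggests that many users reach the target page only after a long detour, which makes the pair a candidate for a more direct link.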
233. The Step 2 of the KDD Process-Distance-based
Association Rule Mining
Top: The people who view the University's home page also view (Frequency > 10 and Distance > 5)
244. The Step 3 and Step 4 of the KDD
Process-Recommendation and Action
- Recommendation
- The analysis results must be reviewed from different aspects
- Three ways to generate recommendations
- Automatically
- Semi-automatically
- Manually
Figure: The process of generating actionable recommendations (adapted from Perkowitz and Etzioni 2000)
254. The Step 3 and Step 4 of the KDD
Process-Action
- Action
- Actionable recommendation
- Cost
- Appropriate techniques
- Valuable or interesting enough for the website
- Improving the design of the website
- Modifying the content of a web page
- Adding or removing links
- Changing the layout of the web page
- Changing the structure of the website
- Completely redesigning the website
264. The Step 3 and Step 4 of the KDD
Process-Recommendation and Action
- A heuristic for website design improvement
- Based on APD and Distance-based Association Rule
Mining
Figure: A sample heuristic for website design improvement
275. Closing the KDD Loop
- Closing the KDD Loop
- Sequentially
- The four steps of the KDD process must be done step by step
- Completely
- All four steps must be carried out in full
- Smoothly
- No gap between any two steps of the KDD process
Figure: The KDD Process for Website Design Improvement
286. Empirical Study- Channel 6 Website
- We collaborated with an e-commerce website design company, Channel 6 Multimedia
- Analysis target: the Channel 6 website (http://www.ch-6.co.uk)
- The company provided us with clickstream data for a period of 6 months
- A website designer was available for us to discuss with
296. Empirical Study-Potential Problem of the
website
31Service page-Solutions
Service index page
Service page-Microstyle
326. Empirical Study-Recommendation and Taking
Action
- Recommendation
- Is it possible to provide a cross-linking table at the top or bottom of each service-related page?
- Action
Channel 6 Website: http://www.ch-6.co.uk/services.asp
336. Empirical Study- Performance Evaluation
- Evaluation Criteria
- Distance
- The number of Fingers or Downstairs patterns
- Results
346. Empirical Study-Performance Evaluation
35Distance
T-test: Average change = 1.82, S² = 2.773247059, T = 4.636746577, critical value = 1.740 (significance level = 0.05)
The distance after the website was changed is shorter than before, and the difference is significant at the 0.05 level.
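The reported T value can be reproduced from the summary figures with a paired t statistic, t = mean change / sqrt(variance / n). The number of paired observations is not given on the slide; n = 18 is an assumption, chosen because it reproduces the reported T and because 1.740 matches the one-tailed critical t value for 17 degrees of freedom at the 0.05 level.

```python
import math


def paired_t_statistic(mean_change, variance, n):
    """t statistic for a paired test from summary statistics."""
    return mean_change / math.sqrt(variance / n)


MEAN_CHANGE = 1.82       # average change, from the slide
VARIANCE = 2.773247059   # S2, from the slide
N_PAIRS = 18             # assumption: not stated on the slide
CRITICAL_T = 1.740       # from the slide

t_value = paired_t_statistic(MEAN_CHANGE, VARIANCE, N_PAIRS)
significant = t_value > CRITICAL_T
```

Under this assumption `t_value` comes out at roughly 4.64, well above 1.740, which agrees with the conclusion that the reduction in distance is significant.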
36Fingers and Mountain Pattern
T-test: Average change = 32.67778, T = 4.597186, critical value = 1.740 (significance level = 0.05)
The percentage of Fingers and Mountain patterns after the website was changed is lower than before, and the difference is significant at the 0.05 level.
377. Conclusion
- Web usage mining is helpful for understanding the browsing behaviour of users
- A website's design can be improved through the KDD process for website design improvement
- The techniques developed in this research can be treated as a toolkit that helps a website improve its design
- This research provides one way to close the KDD loop
38Thanks for Your Attention. Any Questions?