Title: Web Usage Mining: Processes and Applications
1Web Usage Mining Processes and Applications
Qiaoyuan Jiang CSE 8331 November 24, 2003
2Outline
- Brief overview of Web mining
- Web usage mining
- Application areas of Web usage mining
- Future research directions
- Conclusions
3Web Mining
- Web Mining is the application of data mining techniques to discover and retrieve useful information and patterns from World Wide Web documents and services (Etzioni, 1996).
4Web Mining Categories
- Web Content Mining: extracting knowledge from the content of the Web
- Web Structure Mining: discovering the model underlying the link structures of the Web
- Web Usage Mining: discovering users' navigation patterns and predicting users' behavior
5Web Usage Mining Processes
- Preprocessing: conversion of the raw data into the data abstractions (users, sessions, episodes, clickstreams, and pageviews) needed to apply the data mining algorithms
- Pattern Discovery: the key component of WUM, which brings together algorithms and techniques from data mining, machine learning, statistics, pattern recognition, and related research areas
- Pattern Analysis: validation and interpretation of the mined patterns
6Web Usage Mining Processes (Cont.)
7Web Usage Mining- Preprocessing
- Data Cleaning: remove outliers and/or irrelevant data
- User Identification: associate page references with different users
- Session Identification: divide all pages accessed by a user into sessions
- Path Completion: add important page access records that are missing from the access log due to browser and proxy server caching
- Formatting: format the sessions according to the type of data mining to be accomplished
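The cleaning and session identification steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the log records, the list of ignored file suffixes, and the 30-minute inactivity timeout are hypothetical choices, not values given in the slides, and user identification is reduced to a ready-made user ID.

```python
from datetime import datetime, timedelta

# Hypothetical, already user-identified log records: (user_id, timestamp, url).
LOG = [
    ("u1", "2003-11-24 10:00:00", "/index.html"),
    ("u1", "2003-11-24 10:05:00", "/products.html"),
    ("u1", "2003-11-24 10:05:30", "/logo.gif"),
    ("u1", "2003-11-24 11:30:00", "/index.html"),
    ("u2", "2003-11-24 10:01:00", "/index.html"),
]

SESSION_TIMEOUT = timedelta(minutes=30)                     # assumed threshold
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed irrelevant requests


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Return {user_id: [session, ...]}, each session being a list of URLs."""
    # Data cleaning: drop requests for embedded resources.
    cleaned = [
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), url)
        for user, ts, url in records
        if not url.lower().endswith(IGNORED_SUFFIXES)
    ]
    # Order each user's requests by time.
    cleaned.sort(key=lambda r: (r[0], r[1]))

    sessions = {}
    last_seen = {}
    for user, ts, url in cleaned:
        user_sessions = sessions.setdefault(user, [])
        # Session identification: a long gap starts a new session.
        if user not in last_seen or ts - last_seen[user] > timeout:
            user_sessions.append([])
        user_sessions[-1].append(url)
        last_seen[user] = ts
    return sessions


sessions = sessionize(LOG)
```

With this sample log, user u1 ends up with two sessions because of the 85-minute gap before the last request. Path completion is not attempted here, since it needs the site's link structure.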
8Web Usage Mining Preprocessing (Cont.)
9Web Usage Mining - Pattern Discovery Tasks
- Statistical Analysis
- Clustering
- Classification
- Association Rules
- Sequential Patterns
- Dependency Modeling
10Web Usage Mining - Pattern Discovery Tasks
(Cont.)
- Statistical Analysis: frequency analysis, mean, median, etc.
  - Improve system performance
  - Provide support for marketing decisions
  - Simplify the site modification task
- Clustering
  - Clustering of users helps to discover groups of users with similar navigation patterns -> provide personalized Web content
  - Clustering of pages helps to discover groups of pages having related content -> search engines
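As a rough illustration of the statistical analysis task, the sketch below computes page frequencies and the mean and median viewing time per page. The session data and the function names are hypothetical and exist only for this example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sessions: each is a list of (url, seconds_spent_on_page).
SESSIONS = [
    [("/index.html", 10), ("/products.html", 40), ("/cart.html", 25)],
    [("/index.html", 5), ("/products.html", 60)],
    [("/index.html", 15)],
]


def page_frequencies(sessions):
    """Count how many times each page was viewed across all sessions."""
    return Counter(url for session in sessions for url, _ in session)


def viewing_time_stats(sessions):
    """Return {url: (mean_seconds, median_seconds)}."""
    times = {}
    for session in sessions:
        for url, seconds in session:
            times.setdefault(url, []).append(seconds)
    return {url: (mean(v), median(v)) for url, v in times.items()}


freq = page_frequencies(SESSIONS)
stats = viewing_time_stats(SESSIONS)
median_session_length = median(len(s) for s in SESSIONS)
```

Reports of this kind (most requested pages, typical viewing time, typical session length) are the sort of summary that feeds the performance, marketing, and site modification decisions listed above.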
11Web Usage Mining - Pattern Discovery Tasks
(Cont.)
- Classification: the technique of mapping a data item into one of several predefined classes
  - Develop profiles of users belonging to a particular class or category
- Association Rules: discover correlations among pages accessed together by a client
  - Help restructure the Web site
  - Page prefetching
  - Develop e-commerce marketing strategies
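To make support and confidence concrete, here is a minimal sketch that derives single-page rules (X -> Y) from sessions treated as sets of pages. The sessions and thresholds are invented for illustration, and the exhaustive pair enumeration stands in for a real algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical sessions, each reduced to the set of pages accessed.
SESSIONS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]


def support(itemset, sessions):
    """Fraction of sessions that contain every page in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= s for s in sessions) / len(sessions)


def pair_rules(sessions, min_support, min_confidence):
    """Return rules (lhs, rhs, support, confidence) between single pages."""
    pages = sorted(set().union(*sessions))
    rules = []
    for x, y in combinations(pages, 2):
        pair_support = support({x, y}, sessions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((x, y), (y, x)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, sessions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


rules = pair_rules(SESSIONS, min_support=0.5, min_confidence=0.7)
```

In this sample every pair of pages co-occurs in 3 of 5 sessions and every page appears in 4 of 5, so all six directed rules pass with a confidence of about 0.75.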
12Web Usage Mining - Pattern Discovery Tasks
(Cont.)
- Sequential Patterns: extract frequently occurring inter-session patterns in which the presence of a set of items is followed by another item in time order
  - Predict future user visit patterns -> placing ads or recommendations
  - Page prefetching
- Dependency Modeling: determine whether there are any significant dependencies among the variables in the Web domain
  - Predict future Web resource consumption
  - Develop business strategies to increase sales
  - Improve navigational convenience for users
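A much simplified view of sequential pattern discovery is to count ordered page pairs: how often page X is followed, at any later point in the same session, by page Y. The sketch below does only that; the sessions and the support threshold are hypothetical, and real sequential pattern miners handle longer patterns and itemsets.

```python
from collections import Counter

# Hypothetical sessions as time-ordered lists of pages.
SESSIONS = [
    ["home", "catalog", "item", "cart"],
    ["home", "item", "cart"],
    ["home", "catalog", "item"],
    ["catalog", "home"],
]


def frequent_ordered_pairs(sessions, min_support):
    """Return {(first, later): support} for pairs seen in time order."""
    counts = Counter()
    for session in sessions:
        seen_pairs = set()  # count each pair at most once per session
        for i, first in enumerate(session):
            for later in session[i + 1:]:
                if first != later:
                    seen_pairs.add((first, later))
        counts.update(seen_pairs)
    n = len(sessions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


patterns = frequent_ordered_pairs(SESSIONS, min_support=0.5)
```

Unlike an association rule, the pair ("home", "item") is distinct from ("item", "home"); the order of access is part of the pattern.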
13Web Usage Mining - Pattern Analysis
- Pattern Analysis is the final stage of WUM; it involves the validation and interpretation of the mined patterns
- Validation: eliminate the irrelevant rules or patterns and extract the interesting ones from the output of the pattern discovery process
- Interpretation: the output of mining algorithms is mainly in mathematical form and is not suitable for direct human interpretation
14Web Usage Mining - Pattern Analysis
Methodologies and Tools
- Visualization: helps people understand both real and abstract concepts
  - WebViz: the Web is visualized as a directed graph
- Query mechanism: allows analysts to extract only relevant and useful patterns by specifying constraints
  - WEBMINER
- On-Line Analytical Processing (OLAP): enables analysts to perform ad hoc analysis of data in multiple dimensions for decision-making
  - WebLogMiner
15WEBMINER Query Example
- Find all ARs with a minimum support of 1% and a minimum confidence of 90%. The analyst is only interested in clients from the .edu domain and in data later than Nov. 1st, 2003, with page accesses that start with URL A and contain B and C in that order:
    SELECT association-rules(A*B*C*)
    FROM log.data
    WHERE date > 031101 AND domain = "edu"
    AND support = 1.0 AND confidence = 90.0
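The constraint part of the query above can be mimicked in ordinary code. The sketch below is not WEBMINER's implementation; it only filters hypothetical transaction records by domain, date, and the "starts with A, then B, then C" access pattern, which is the subset of data the rule miner would then work on.

```python
from datetime import date

# Hypothetical preprocessed transactions.
TRANSACTIONS = [
    {"domain": "edu", "date": date(2003, 11, 5), "pages": ["A", "X", "B", "C"]},
    {"domain": "com", "date": date(2003, 11, 6), "pages": ["A", "B", "C"]},
    {"domain": "edu", "date": date(2003, 10, 30), "pages": ["A", "B", "C"]},
    {"domain": "edu", "date": date(2003, 11, 10), "pages": ["B", "A", "C"]},
    {"domain": "edu", "date": date(2003, 11, 12), "pages": ["A", "B", "Y", "C", "Z"]},
]


def matches_abc(pages):
    """True if the accesses start with A and later contain B, then C."""
    if not pages or pages[0] != "A":
        return False
    try:
        b_index = pages.index("B", 1)
        pages.index("C", b_index + 1)
    except ValueError:
        return False
    return True


def select_transactions(transactions, after, domain):
    """Keep transactions that satisfy the date, domain, and path constraints."""
    return [
        t for t in transactions
        if t["date"] > after and t["domain"] == domain and matches_abc(t["pages"])
    ]


selected = select_transactions(TRANSACTIONS, after=date(2003, 11, 1), domain="edu")
```

Only the first and last sample transactions survive: the others fail the domain, date, or ordering constraint. Support and confidence thresholds would be applied afterwards, when rules are mined from the selected data.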
16Application Areas for Web Usage Mining
- Personalized: discover the preferences and needs of individual Web users in order to provide a personalized Web site for certain types of users
- Impersonalized: examine general user navigation patterns in order to understand how general users use the site
- System Improvement
- Site Modification
- Business Intelligence
- Web Characterization
17System Improvement
- High performance is expected of a Web application, since it directly affects user satisfaction
- WUM provides a key to understanding Web traffic behavior
- Applications
  - Develop policies for Web caching, network transmission, load balancing, or data distribution
  - Detect intrusion, fraud, and attempted break-ins to the system
18Site Modification
- Besides its content, the structure of a Web site is another crucial attribute for attracting users
- WUM can provide detailed feedback on users' navigation behavior, which can be used to redesign the Web site structure for users' navigational convenience
- Adaptive Web site project (Perkowitz & Etzioni, 1998-1999)
19Business Intelligence
- Information on how customers use a Web site is critical for marketers of e-commerce businesses
- WUM can support business process optimization and marketing decisions
- Business intelligence includes personalization for C2B systems
20Usage Characterization
- Mining general usage patterns (not focused on any specific users or Web sites) helps in the study of how browsers are used and how users interact with a browser interface
- Makes it possible to look at the dynamics of the Web and how it is growing
21Personalization
- Choosing among thousands of options is a challenge for Web users
- Goal: provide users with dynamic content tailored to their individual interests
- Form: recommending one or more items or pages to a user, based on the user's profile and usage behavior, or on the patterns of past visitors who have similar profiles
- Performance Measurement
  - Effectiveness: accuracy, coverage
  - Scalability
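The slides do not define accuracy and coverage. One common reading, assumed here, treats accuracy as precision (the share of recommended pages the user actually went on to visit) and coverage as the share of the pages actually visited that were recommended. The function names and sample data are hypothetical.

```python
def precision(recommended, visited):
    """Share of recommended pages that the user actually visited."""
    recommended, visited = set(recommended), set(visited)
    if not recommended:
        return 0.0
    return len(recommended & visited) / len(recommended)


def coverage(recommended, visited):
    """Share of the pages actually visited that were recommended."""
    recommended, visited = set(recommended), set(visited)
    if not visited:
        return 0.0
    return len(recommended & visited) / len(visited)


recommended = ["p1", "p2", "p3", "p4"]
visited = ["p2", "p3", "p5"]

p = precision(recommended, visited)  # 2 of 4 recommendations were useful
c = coverage(recommended, visited)   # 2 of 3 visited pages were recommended
```

The two measures pull against each other: recommending more pages tends to raise coverage and lower precision, which is the trade-off behind the accuracy and coverage remarks on the later technique slides.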
22Applications of Personalization
- Customizing access to information sources
- Filtering news or e-mails
- Recommendation services for the browsing process
- Tutoring systems
- Search
- More ...
23Three Phases of Personalization
- Data preparation and transformation: data cleaning, filtering, transaction identification
- Pattern discovery: discover usage patterns
- Recommendation: generate personalized content for a user by matching the user's current session against the discovered patterns (online process)
25Personalization Techniques Collaborative
Filtering (CF)
- Pattern discovery: an online kNN algorithm is applied to user profiles in a given domain, matching people who have the same tastes
- Recommendation: pages or items that interest the k nearest neighbors are assumed to interest the active user as well
- Drawbacks
  - Online process -> lack of scalability
  - Static user profiles -> low quality of recommendations
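A minimal sketch of kNN-based collaborative filtering, assuming binary page-visit profiles and cosine similarity (neither is specified in the slides). The profiles, the value of k, and the scoring rule are illustrative choices only.

```python
from math import sqrt

# Hypothetical stored user profiles: page -> weight (1 = visited).
PROFILES = {
    "u1": {"p1": 1, "p2": 1, "p3": 1},
    "u2": {"p1": 1, "p2": 1, "p4": 1},
    "u3": {"p5": 1, "p6": 1},
}


def cosine(a, b):
    """Cosine similarity between two sparse profile vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(active, profiles, k=2):
    """Recommend pages liked by the k most similar users and unseen by the active user."""
    neighbors = sorted(profiles.values(), key=lambda p: cosine(active, p), reverse=True)[:k]
    scores = {}
    for profile in neighbors:
        sim = cosine(active, profile)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for page, weight in profile.items():
            if page not in active:
                scores[page] = scores.get(page, 0.0) + sim * weight
    # Highest score first; ties broken by page name for a stable order.
    return sorted(scores, key=lambda page: (-scores[page], page))


active_user = {"p1": 1, "p2": 1}
suggestions = recommend(active_user, PROFILES, k=2)
```

Every recommendation requires comparing the active user against all stored profiles at request time, which is the scalability drawback noted above.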
26Personalization Techniques Clustering
- Technique: clustering user transactions and pageviews
- Advantages
  - User preferences are learned automatically from usage data and are therefore up to date
  - Better scalability through clustering
- Drawbacks
  - Low accuracy
27Personalization Techniques Association Rules
(ARs)
- Technique
  - For each user, create a transaction that contains all the items the user has ever accessed
  - Find all rules that satisfy the given support and confidence
  - For each active user, find all the rules supported by the user; the items predicted by these rules are the candidate recommendations
- Drawbacks
  - All association rules must be discovered before generating recommendations. This can be improved by generating ARs in real time from a subset of transactions within the active user's neighborhood
  - High support -> better scalability and accuracy, low coverage
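The online half of the technique above, matching an active user against rules mined beforehand, might look like the following sketch. The rule list and its confidence values are made up, and ranking candidates by their best rule confidence is an assumption rather than something the slides prescribe.

```python
# Hypothetical rules mined offline: (antecedent pages, predicted page, confidence).
RULES = [
    (frozenset({"A"}), "B", 0.80),
    (frozenset({"A", "B"}), "C", 0.90),
    (frozenset({"D"}), "E", 0.95),
    (frozenset({"B"}), "A", 0.70),
]


def recommend(active_session, rules, top_n=3):
    """Return (page, confidence) candidates from rules the session supports."""
    visited = set(active_session)
    best = {}
    for antecedent, consequent, confidence in rules:
        # A rule is supported when all of its antecedent pages were visited.
        if antecedent <= visited and consequent not in visited:
            best[consequent] = max(confidence, best.get(consequent, 0.0))
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


after_a = recommend(["A"], RULES)
after_ab = recommend(["A", "B"], RULES)
```

A session containing only page A yields B as the candidate; once B has also been visited, the second rule fires and C is suggested instead. A session that matches no antecedent gets no recommendation, which is how a high support threshold translates into low coverage.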
28Personalization Techniques Sequential Patterns
(SPs)
- Technique: Markov model
- Advantages
  - Better accuracy: SPs contain more precise information about user navigation behavior
- Drawbacks
  - Low recommendation coverage
  - More suitable for predictive tasks, e.g., Web prefetching
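As an illustration of the Markov model technique, the sketch below fits a first-order model (the next page depends only on the current page) and predicts the most likely next page. The model order and the sample sessions are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical sessions as time-ordered lists of pages.
SESSIONS = [
    ["home", "catalog", "item"],
    ["home", "catalog", "cart"],
    ["home", "catalog", "item"],
    ["home", "search"],
]


def fit_transitions(sessions):
    """Estimate P(next page | current page) from consecutive page pairs."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        page: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for page, followers in counts.items()
    }


def predict_next(transitions, page):
    """Return the most probable next page, or None if the page was never left."""
    followers = transitions.get(page)
    if not followers:
        return None
    return max(followers, key=followers.get)


transitions = fit_transitions(SESSIONS)
next_after_home = predict_next(transitions, "home")
next_after_item = predict_next(transitions, "item")
```

The prediction for "home" is "catalog" (3 of 4 observed transitions), while "item" has no observed successor and yields no prediction at all, a small-scale view of the low coverage drawback. A single most-likely next page is also exactly what a prefetcher needs.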
29Personalization Techniques Hybrid Models
- Hybrid models automatically switch among different personalization models based on the localized degree of hyperlink connectivity
  - High connectivity degree -> non-SP models
  - Low connectivity degree and deeper navigation path -> SP models
- Performance: better than any individual model
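The switching rule above can be sketched as a small selector function. Everything concrete here is an assumption: connectivity is approximated by the out-degree of the current page, and the degree threshold and minimum navigation depth are arbitrary illustrative values.

```python
# Hypothetical site graph: page -> pages it links to.
LINKS = {
    "home": ["catalog", "search", "about", "help", "news"],
    "catalog": ["item"],
    "item": ["cart"],
}

CONNECTIVITY_THRESHOLD = 3  # assumed cut-off between "low" and "high" connectivity
MIN_DEPTH = 2               # assumed depth at which a path counts as "deeper"


def choose_model(current_page, session_depth, links,
                 threshold=CONNECTIVITY_THRESHOLD, min_depth=MIN_DEPTH):
    """Pick a personalization model from local connectivity and path depth."""
    out_degree = len(links.get(current_page, []))
    if out_degree < threshold and session_depth >= min_depth:
        return "sequential-pattern"
    return "non-sequential"


model_at_home = choose_model("home", session_depth=1, links=LINKS)
model_at_item = choose_model("item", session_depth=3, links=LINKS)
```

A hub page such as "home" falls to the non-sequential models (association rules or clustering), while a page deep in a narrow path is handed to the sequential-pattern model.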
30Future Research Directions
- Usage Mining on the Semantic Web
  - Helps to build the Semantic Web
  - With the Semantic Web, WUM can be improved
- Multimedia Web Data Mining
  - Representation, problem solving, and learning from multimedia data remain a challenge
31Future Research Directions (Cont.)
- Soft Computing Technology for Web Mining
  - Fuzzy logic: deals with imprecision and conceptual data; used in clustering Web log data and mining ARs
  - Neural networks
    - Adaptive to new data and information
    - Suitable for parallel processing
    - Robust to missing, confusing, or ill-defined data
    - Capable of modeling non-linear decision boundaries
    - Effective for learning user profiles
  - Genetic algorithms: randomized search and optimization guided by evaluation criteria
    - Efficient, adaptive, robust, suited to parallel processing
    - Used in search and query optimization and in predicting user preferences
32Future Research Directions (Cont.)
- Analysis of Discovered Patterns
  - Research on efficient, flexible, and powerful analysis tools
- More Applications
  - Temporal evolution of usage behavior
  - Improving Web services
  - Detecting credit card fraud
- Privacy issues
33Conclusions