Web Usage Mining: Processes and Applications

Slides: 34
Provided by: bwei7
Learn more at: https://s2.smu.edu
Transcript and Presenter's Notes

1
Web Usage Mining: Processes and Applications
Qiaoyuan Jiang, CSE 8331, November 24, 2003
2
Outline
  • Brief overview of Web mining
  • Web usage mining
  • Application areas of Web usage mining
  • Future research directions
  • Conclusions

3
Web Mining
  • Web mining is the application of data mining
    techniques to discover and retrieve useful
    information and patterns from World Wide Web
    documents and services (Etzioni, 1996).

4
Web Mining Categories
  • Web Content Mining: extracting knowledge from the
    content of the Web
  • Web Structure Mining: discovering the model
    underlying the link structures of the Web
  • Web Usage Mining: discovering users' navigation
    patterns and predicting users' behavior

5
Web Usage Mining Processes
  • Preprocessing: conversion of the raw data into
    the data abstractions (users, sessions, episodes,
    clickstreams, and pageviews) needed to apply
    the data mining algorithms.
  • Pattern Discovery: the key component of WUM,
    which brings together algorithms and techniques
    from data mining, machine learning, statistics,
    and pattern recognition.
  • Pattern Analysis: validation and interpretation
    of the mined patterns.

6
Web Usage Mining Processes (Cont.)
7
Web Usage Mining - Preprocessing
  • Data Cleaning: remove outliers and/or irrelevant
    data
  • User Identification: associate page references
    with different users
  • Session Identification: divide all pages accessed
    by a user into sessions
  • Path Completion: add important page access
    records that are missing from the access log due to
    browser and proxy server caching
  • Formatting: format the sessions according to the
    type of data mining to be accomplished
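
The session identification step can be sketched with a timeout heuristic. This is a minimal illustration only: the 30-minute inactivity threshold is a common convention that the slides do not state, and the (user, timestamp, url) record layout and the sessionize helper are hypothetical.

```python
from collections import defaultdict

# Assumption: a new session starts after 30 minutes of inactivity.
SESSION_TIMEOUT = 30 * 60  # seconds


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp_seconds, url) records into per-user sessions."""
    by_user = defaultdict(list)
    for user, ts, url in records:
        by_user[user].append((ts, url))

    sessions = defaultdict(list)
    for user, hits in by_user.items():
        hits.sort()  # order each user's page views by time
        last_ts = None
        for ts, url in hits:
            # Open a new session on the first hit or after a long gap.
            if last_ts is None or ts - last_ts > timeout:
                sessions[user].append([])
            sessions[user][-1].append(url)
            last_ts = ts
    return dict(sessions)


# Hypothetical log records: (user id, seconds since start, page)
log = [
    ("u1", 0, "/home"),
    ("u1", 120, "/products"),
    ("u1", 4000, "/home"),
    ("u2", 50, "/home"),
]
result = sessionize(log)
```

Here user u1 ends up with two sessions because the third request arrives more than 30 minutes after the second. Real logs would first need user identification, which this sketch takes as given.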

8
Web Usage Mining - Preprocessing (Cont.)
9
Web Usage Mining - Pattern Discovery Tasks
  • Statistical Analysis
  • Clustering
  • Classification
  • Association Rules
  • Sequential Patterns
  • Dependency Modeling

10
Web Usage Mining - Pattern Discovery Tasks (Cont.)
  • Statistical Analysis: frequency analysis, mean,
    median, etc.
    • Improve system performance
    • Provide support for marketing decisions
    • Simplify the site modification task
  • Clustering
    • Clustering of users: helps discover groups of
      users with similar navigation patterns => provide
      personalized Web content
    • Clustering of pages: helps discover groups of
      pages with related content => search engines
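
The statistical analysis task on this slide can be illustrated with Python's standard library. The session data is invented for the example, and session length (pages per session) is only one of many quantities an analyst might summarize.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sessions: each is the list of pages one visit touched.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/about"],
    ["/home", "/products", "/products/42", "/cart", "/checkout"],
]

# Frequency analysis: how often each page was viewed.
page_counts = Counter(page for s in sessions for page in s)

# Descriptive statistics on session length (pages per session).
lengths = [len(s) for s in sessions]
avg_length = mean(lengths)
median_length = median(lengths)

# The two most frequently viewed pages.
top_pages = page_counts.most_common(2)
```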

11
Web Usage Mining - Pattern Discovery Tasks (Cont.)
  • Classification: the technique of mapping a data item
    into one of several predefined classes
    • Develop profiles of users belonging to a
      particular class or category
  • Association Rules: discover correlations among
    pages accessed together by a client
    • Help restructure the Web site
    • Page prefetching
    • Develop e-commerce marketing strategies
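
To make the association rule idea concrete, the sketch below computes support and confidence for rules between single pages. It is a brute-force illustration over made-up transactions, not an efficient miner such as Apriori, and the pair_rules name and thresholds are assumptions for the example.

```python
from itertools import permutations


def pair_rules(transactions, min_support, min_confidence):
    """Return rules X -> Y between single pages as (X, Y, support, confidence)."""
    n = len(transactions)
    sets = [set(t) for t in transactions]
    pages = sorted(set().union(*sets))
    rules = []
    for x, y in permutations(pages, 2):
        both = sum(1 for s in sets if x in s and y in s)
        has_x = sum(1 for s in sets if x in s)
        support = both / n                      # share of transactions with X and Y
        confidence = both / has_x if has_x else 0.0  # share of X-transactions with Y
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules


# Hypothetical transactions: pages accessed together by a client.
transactions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.9)
```

With these inputs the only surviving rule is /cart -> /products: every transaction containing /cart also contains /products, and the pair appears in half of all transactions.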

12
Web Usage Mining - Pattern Discovery Tasks (Cont.)
  • Sequential Patterns: extract frequently occurring
    inter-session patterns in which the presence of
    a set of items is followed by another item in time
    order
    • Predict future user visit patterns => placing ads
      or recommendations
    • Page prefetching
  • Dependency Modeling: determine whether there are any
    significant dependencies among the variables in
    the Web domain
    • Predict future Web resource consumption
    • Develop business strategies to increase sales
    • Improve navigational convenience for users
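
A very reduced form of sequential pattern discovery is to count ordered page pairs. The sketch below assumes each input sequence is one user's time-ordered page list and reports pairs (a, b) where a precedes b in at least a given fraction of sequences; real sequential pattern miners handle longer patterns and item sets, which this example does not attempt.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(sequences, min_support):
    """Find pairs (a, b) where a is visited before b in enough sequences."""
    counts = Counter()
    for seq in sequences:
        # combinations() preserves order, so a always precedes b in seq.
        # The set ensures each pair counts once per sequence.
        seen = {(a, b) for a, b in combinations(seq, 2) if a != b}
        counts.update(seen)
    n = len(sequences)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical time-ordered access sequences.
sequences = [
    ["/home", "/products", "/cart"],
    ["/home", "/cart"],
    ["/products", "/home", "/cart"],
]
patterns = frequent_ordered_pairs(sequences, min_support=1.0)
```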

13
Web Usage Mining - Pattern Analysis
  • Pattern Analysis is the final stage of WUM; it
    involves the validation and interpretation of the
    mined patterns
  • Validation: eliminate the irrelevant rules or
    patterns and extract the interesting rules or
    patterns from the output of the pattern discovery
    process
  • Interpretation: the output of mining algorithms
    is mainly in mathematical form and not suitable for
    direct human interpretation

14
Web Usage Mining - Pattern Analysis Methodologies and Tools
  • Visualization: helps people understand both
    real and abstract concepts
    • WebViz: the Web is visualized as a directed graph
  • Query mechanism: allows analysts to extract only
    relevant and useful patterns by specifying
    constraints
    • WEBMINER
  • On-Line Analytical Processing (OLAP): enables
    analysts to perform ad hoc analysis of data in
    multiple dimensions for decision-making
    • WebLogMiner

15
WEBMINER Query Example
  • Finds all ARs with a minimum support of 1% and a minimum
    confidence of 90%. The analyst is only interested in
    clients from the .edu domain and in data later than
    Nov. 1st, 2003, with page accesses that start with URL
    A and contain B and C in that order
  • SELECT association-rules(A*B*C*)
    FROM log.data
    WHERE date > 031101 AND domain = "edu"
    AND support = 1.0 AND confidence = 90.0
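
The constraints in this query can be mimicked in plain Python to show what gets filtered before rule mining. This is not WEBMINER's implementation: the record layout, the YYMMDD date strings, the strict "later than" comparison, and both helper functions are assumptions made for the sketch.

```python
def matches_abc(pages, a="A", b="B", c="C"):
    """True if the access path starts with a and later contains b, then c."""
    if not pages or pages[0] != a:
        return False
    try:
        b_pos = pages.index(b, 1)   # first b after the starting a
        pages.index(c, b_pos + 1)   # some c after that b
    except ValueError:
        return False
    return True


def select_transactions(log, min_date="031101", domain="edu"):
    """Apply the WHERE-style constraints before any rule mining."""
    return [
        rec for rec in log
        # YYMMDD strings compare correctly as text within one century.
        if rec["date"] > min_date
        and rec["domain"] == domain
        and matches_abc(rec["pages"])
    ]


# Hypothetical log records.
log = [
    {"date": "031105", "domain": "edu", "pages": ["A", "X", "B", "C"]},
    {"date": "031020", "domain": "edu", "pages": ["A", "B", "C"]},
    {"date": "031110", "domain": "com", "pages": ["A", "B", "C"]},
    {"date": "031112", "domain": "edu", "pages": ["A", "C", "B"]},
]
selected = select_transactions(log)
```

Only the first record passes: the second is too early, the third comes from a .com client, and the fourth visits C before B. The support and confidence thresholds would then be applied by the mining step itself.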

16
Application Areas for Web Usage Mining
  • Personalized: discover the preferences and needs
    of individual Web users in order to provide a
    personalized Web site for certain types of users
  • Impersonalized: examine general user navigation
    patterns in order to understand how general users
    use the site
    • System Improvement
    • Site Modification
    • Business Intelligence
    • Web Characterization

17
System Improvement
  • High performance is expected of a Web application
    because it directly affects user satisfaction
  • WUM provides a key to understanding Web traffic
    behavior
  • Applications
    • Develop policies for Web caching, network
      transmission, load balancing, or data
      distribution
    • Detect intrusion, fraud, and attempted
      break-ins to the system

18
Site Modification
  • Besides content, the structure of a Web site is
    another crucial attribute for attracting users
  • WUM can provide detailed feedback on users'
    navigation behavior, which can be used to
    redesign the Web site structure for users'
    navigational convenience
  • Adaptive Web site project (Perkowitz & Etzioni,
    1998-1999)

19
Business Intelligence
  • Information on how customers use a Web site
    is critical for marketers of e-commerce
    businesses
  • WUM can support business process optimization and
    marketing decisions
  • Business intelligence includes personalization
    for C2B systems

20
Usage Characterization
  • Mining general usage patterns (not focused on
    any specific users or Web sites) helps in the
    study of how browsers are used and how users
    interact with a browser interface
  • Makes it possible to examine the dynamics of
    the Web and how it is growing

21
Personalization
  • Choosing among thousands of options is a challenge
    for Web users
  • Goal: provide users with dynamic content
    tailored to their individual interests
  • Form: recommending one or more items or pages to
    a user, based on the user's profile and usage
    behavior, or on the patterns of past visitors who
    have similar profiles
  • Performance Measurement
    • Effectiveness: accuracy and coverage
    • Scalability
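
The two effectiveness measures can be made concrete as follows. The slides do not define them, so this sketch assumes one common formulation: accuracy as precision (the share of recommended pages the user actually went on to visit) and coverage as the share of visited pages that were recommended. The function name and data are hypothetical.

```python
def precision_and_coverage(recommended, visited):
    """Compare a recommendation set with the pages the user later visited."""
    recommended, visited = set(recommended), set(visited)
    hits = recommended & visited
    # Precision: fraction of recommendations that were correct.
    precision = len(hits) / len(recommended) if recommended else 0.0
    # Coverage: fraction of visited pages that were recommended.
    coverage = len(hits) / len(visited) if visited else 0.0
    return precision, coverage


precision, coverage = precision_and_coverage(
    recommended=["/cart", "/deals", "/faq", "/about"],
    visited=["/cart", "/checkout"],
)
```

Recommending more pages tends to raise coverage and lower precision, which is the trade-off behind several of the drawbacks listed for the techniques in this deck.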

22
Applications of Personalization
  • Customizing access to information sources
  • Filtering news or e-mails
  • Recommendation services for the browsing process
  • Tutoring systems
  • Search
  • More ...

23
Three Phases of Personalization
  • Data preparation and transformation: data
    cleaning, filtering, transaction identification
  • Pattern discovery: discover usage patterns
  • Recommendation: generate personalized content for
    a user by matching the user's session against the
    discovered patterns (online process)

24
(No Transcript)
25
Personalization Techniques: Collaborative Filtering (CF)
  • Pattern discovery: an online kNN algorithm is applied
    to user profiles in a given domain to match
    people who have the same taste
  • Recommendation: pages or items that interest
    the k nearest neighbors are assumed to interest
    the active user as well
  • Drawbacks
    • Online process => lack of scalability
    • Static user profiles => low quality of
      recommendations
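
A minimal sketch of kNN-based collaborative filtering follows. It assumes user profiles are plain sets of visited pages and uses Jaccard similarity to find neighbors; the slides name neither the profile representation nor the similarity measure, so both are illustrative choices, as are the function names.

```python
from collections import Counter


def jaccard(a, b):
    """Similarity of two page sets: shared pages over all pages."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_recommend(active, profiles, k=2, top_n=3):
    """Recommend pages seen by the k users most similar to the active user."""
    active = set(active)
    # Rank other users by similarity to the active user's page set.
    ranked = sorted(
        profiles.items(),
        key=lambda item: jaccard(active, set(item[1])),
        reverse=True,
    )
    votes = Counter()
    for _, pages in ranked[:k]:
        votes.update(set(pages) - active)  # only pages not yet seen
    return [page for page, _ in votes.most_common(top_n)]


# Hypothetical stored user profiles.
profiles = {
    "u1": ["/home", "/products", "/cart"],
    "u2": ["/home", "/products", "/deals", "/cart"],
    "u3": ["/about", "/jobs"],
}
recs = knn_recommend(["/home", "/products"], profiles, k=2)
```

Every recommendation scans all stored profiles, which illustrates the scalability drawback noted above.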

26
Personalization Techniques: Clustering
  • Technique: clustering user transactions and
    pageviews
  • Advantages
    • User preferences are learned automatically from
      usage data and are therefore up to date
    • Better scalability through clustering
  • Drawbacks
    • Low accuracy

27
Personalization Techniques: Association Rules (ARs)
  • Technique
    • For each user, create a transaction containing all
      the items the user has ever accessed
    • Find all rules that satisfy the given support and
      confidence thresholds
    • For each active user, find all the rules
      supported by the user. Items predicted by these
      rules are the candidate recommendations
  • Drawbacks
    • All association rules must be discovered before
      generating recommendations. This can be improved
      by generating ARs in real time from a subset of
      transactions within the active user's neighborhood
    • High support => better scalability and accuracy,
      but low coverage
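
The recommendation step of the AR technique might look like the sketch below. It assumes the rules have already been mined and are given as (antecedent, consequent, confidence) triples, and it interprets "supported by the user" as the user having accessed every item in the antecedent; that interpretation, the ranking by confidence, and the function name are assumptions.

```python
def recommend_from_rules(user_items, rules, top_n=3):
    """rules: iterable of (antecedent_set, consequent, confidence)."""
    user_items = set(user_items)
    best = {}
    for antecedent, consequent, confidence in rules:
        # A rule fires if the user has accessed its whole antecedent
        # and has not yet accessed the predicted item.
        if set(antecedent) <= user_items and consequent not in user_items:
            best[consequent] = max(confidence, best.get(consequent, 0.0))
    # Rank candidate items by the strongest rule predicting them.
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:top_n]]


# Hypothetical pre-mined rules.
rules = [
    ({"/home", "/products"}, "/cart", 0.8),
    ({"/home"}, "/deals", 0.6),
    ({"/about"}, "/jobs", 0.9),
    ({"/products"}, "/cart", 0.7),
]
candidates = recommend_from_rules(["/home", "/products"], rules)
```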

28
Personalization Techniques: Sequential Patterns (SPs)
  • Technique: Markov model
  • Advantages
    • Better accuracy: SPs contain more precise
      information about user navigation behavior
  • Drawbacks
    • Low recommendation coverage
    • More suitable for predictive tasks, e.g., Web
      prefetching
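
As an illustration of the Markov model technique, the sketch below fits a first-order model (the next page depends only on the current page) from hypothetical sessions. The slides do not say which order of model is meant, so first order is an assumption chosen for brevity.

```python
from collections import Counter, defaultdict


def fit_markov(sessions):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            counts[current][nxt] += 1
    model = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        model[current] = {page: c / total for page, c in nexts.items()}
    return model


def predict_next(model, current):
    """Return the most likely next page, or None for an unseen page."""
    nexts = model.get(current)
    if not nexts:
        return None
    return max(nexts, key=nexts.get)


# Hypothetical training sessions.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/deals"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
model = fit_markov(sessions)
```

The model predicts /cart after /products, but it returns None for any page it never saw as a starting point, a small-scale view of the low coverage drawback.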

29
Personalization Techniques: Hybrid Models
  • Hybrid models automatically switch among
    different personalization models based on the
    localized degree of hyperlink connectivity
    • High connectivity degree => non-SP models
    • Low connectivity degree and deeper navigation
      path => SP models
  • Performance: better than any individual model

30
Future Research Directions
  • Usage Mining on the Semantic Web
    • Helps to build the Semantic Web
    • With the Semantic Web, WUM can be improved
  • Multimedia Web Data Mining
    • Representation, problem solving, and learning from
      multimedia data remain a challenge

31
Future Research Directions (Cont.)
  • Soft Computing Technology for Web Mining
    • Fuzzy logic: deals with imprecise and
      conceptual data. Used in clustering Web log data
      and mining ARs
    • Neural networks
      • Adaptive to new data and information
      • Suitable for parallel processing
      • Robust to missing, confusing, ill-defined data
      • Capable of modeling non-linear decision
        boundaries
      • Effective for learning user profiles
    • Genetic algorithms: randomized search and
      optimization guided by evaluation criteria
      • Efficient, adaptive, robust, parallel processing
      • Used in search and query optimization and in
        predicting user preferences

32
Future Research Directions (Cont.)
  • Analysis of Discovered Patterns
    • Research on efficient, flexible, and powerful
      analysis tools
  • More Applications
    • Temporal evolution of usage behavior
    • Improving Web services
    • Detecting credit card fraud
  • Privacy issues

33
Conclusions