Data Mining Techniques - PowerPoint PPT Presentation

Slides: 16
Provided by: csG7
Learn more at: http://www.cs.gsu.edu

1
Data Mining Techniques
  • Cluster Analysis
  • Induction
  • Neural Networks
  • OLAP
  • Data Visualization

2
Association Rule
  • An association rule is a rule that implies
    certain association relationships among a set of
    objects (such as "occur together" or "one implies
    the other") in a database.
  • Given a set of transactions, where each
    transaction is a set of literals (called items),
    an association rule is an expression of the form
    X ⇒ Y, where X and Y are sets of items.
  • The intuitive meaning of such a rule is that
    transactions of the database that contain X
    tend to contain Y.

3
Support
  • The support of an item set S is the percentage of
    those transactions in T which contain S, where T
    is the set of all transactions.
  • If U is the set of all transactions that contain
    all items in S, then support(S) = (|U| / |T|) ×
    100, where |U| and |T| are the number of
    elements in U and T, respectively.
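
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `support` and the five sample transactions are assumptions chosen only to make the arithmetic visible.

```python
def support(itemset, transactions):
    """Return the percentage of transactions that contain every item in itemset."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    # U: the transactions that contain all items in S
    matching = [t for t in transactions if itemset <= set(t)]
    # support(S) = (|U| / |T|) * 100
    return 100 * len(matching) / len(transactions)


# Hypothetical transaction database T (illustrative data only)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "cheese"},
    {"bread", "coke"},
    {"cheese", "coke"},
    {"bread", "butter", "salt"},
]

print(support({"bread"}, transactions))            # bread is in 4 of 5 transactions
print(support({"bread", "butter"}, transactions))  # both items are in 3 of 5 transactions
```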

4
Confidence
  • Confidence of a candidate rule X ⇒ Y is calculated
    as support(X ∪ Y) / support(X).
  • The confidence of rule X ⇒ Y represents the
    percentage of transactions containing items in X
    that also contain items in Y.
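
As a small sketch of this ratio, the Python below computes confidence from two support values. The helper names and the sample transactions are hypothetical, and the result is scaled to a percentage to match the wording of the slide.

```python
def support(itemset, transactions):
    """Percentage of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return 100 * sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x, y, transactions):
    """confidence(X => Y) = support(X u Y) / support(X), expressed as a percentage."""
    support_x = support(x, transactions)
    if support_x == 0:
        return 0.0  # X never occurs, so the rule cannot be evaluated
    return 100 * support(set(x) | set(y), transactions) / support_x


# Hypothetical transaction database (illustrative data only)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "cheese"},
    {"bread", "coke"},
    {"cheese", "coke"},
    {"bread", "butter", "salt"},
]

# 4 transactions contain bread; 3 of those also contain butter
print(confidence({"bread"}, {"butter"}, transactions))
```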

5
Example Association Rule
  • In a store we might have I = {cheese, ham, bread,
    butter, salt, coke}.
  • A transaction could look like t = {bread, butter}
    for a customer who bought bread and butter.
  • An association rule would be like the following:
    bread ⇒ butter with support 60% and confidence 80%,
    meaning that 60% of all transactions contain both
    bread and butter, and 80% of the customers who
    bought bread also bought butter.

6
Apriori Algorithm
  • Find all combinations of items that have
    transaction support above minimum support. Call
    those combinations frequent itemsets.
  • Use the frequent itemsets to generate the desired
    rules.
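
The second step, turning frequent itemsets into rules, might look like the sketch below. It assumes the frequent itemsets and their supports (in percent) are already available in a dictionary, and that every subset of a frequent itemset is also present in that dictionary. The function name `generate_rules`, the `min_confidence` parameter, and the sample values are illustrative and not taken from the slides.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Build rules X => Y from frequent itemsets.

    frequent maps frozenset itemsets to their support in percent.
    Returns a list of (X, Y, support, confidence) tuples.
    """
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty X and a non-empty Y
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                y = itemset - x
                # confidence = support(X u Y) / support(X), as a percentage
                conf = 100 * sup / frequent[x]
                if conf >= min_confidence:
                    rules.append((x, y, sup, conf))
    return rules


# Hypothetical output of the first step (supports in percent)
frequent = {
    frozenset({"bread"}): 80.0,
    frozenset({"butter"}): 60.0,
    frozenset({"bread", "butter"}): 60.0,
}

for x, y, sup, conf in generate_rules(frequent, min_confidence=70.0):
    print(sorted(x), "=>", sorted(y), "support", sup, "confidence", conf)
```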

7
Apriori Algorithm (contd.)
  • Pass 1
  • Generate the candidate itemsets in C1
  • Save the frequent itemsets in L1
  • Pass k
  • Generate the candidate itemsets in Ck from the
    frequent itemsets in Lk-1
  • Join Lk-1 with Lk-1, as follows: insert into Ck
    select p.item1, p.item2, . . . , p.itemk-1,
    q.itemk-1 from Lk-1 p, Lk-1 q where p.item1 =
    q.item1, . . . , p.itemk-2 = q.itemk-2,
    p.itemk-1 < q.itemk-1
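
The join can also be written outside SQL. The sketch below is one possible Python rendering, assuming each itemset in Lk-1 is stored as a sorted tuple. The function name `join_step` and the sample L2 are hypothetical.

```python
def join_step(prev_frequent):
    """Join Lk-1 with itself to build the candidate set Ck.

    prev_frequent is a set of sorted tuples, each of length k-1.
    """
    candidates = set()
    for p in prev_frequent:
        for q in prev_frequent:
            # p and q agree on their first k-2 items,
            # and p's last item sorts before q's last item
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                candidates.add(p + (q[-1],))
    return candidates


# Hypothetical L2 (frequent 2-itemsets as sorted tuples)
l2 = {
    ("bread", "butter"),
    ("bread", "cheese"),
    ("bread", "coke"),
    ("butter", "cheese"),
}

for candidate in sorted(join_step(l2)):
    print(candidate)
```

The ordering condition on the last item keeps each candidate from being generated twice. Candidates produced here may still contain infrequent subsets, which is what the prune step removes.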

8
Apriori Algorithm (contd.)
  • 3. Generate all (k-1)-subsets from the candidate
    itemsets in Ck
  • 4. Prune all candidate itemsets from Ck where
    some (k-1)-subset of the candidate itemset is not
    in the frequent itemset Lk-1
  • 2. Scan the transaction database to determine the
    support for each candidate itemset in Ck
  • 3. Save the frequent itemsets in Lk
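
Putting the passes together, a compact sketch of the whole procedure could look like this. It is an illustration under stated assumptions rather than a reference implementation: itemsets are stored as sorted tuples, support is a percentage, and the names `apriori` and `min_support` as well as the sample data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (sorted tuple): support in percent} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_of(candidates):
        # Scan the transaction database and keep candidates meeting min_support
        result = {}
        for c in candidates:
            count = sum(1 for t in transactions if t.issuperset(c))
            sup = 100 * count / n
            if sup >= min_support:
                result[c] = sup
        return result

    # Pass 1: every single item is a candidate in C1
    c1 = {(item,) for t in transactions for item in t}
    current = frequent_of(c1)
    all_frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join Lk-1 with itself to build Ck
        ck = {p + (q[-1],) for p in prev for q in prev
              if p[:-1] == q[:-1] and p[-1] < q[-1]}
        # Prune candidates that have a (k-1)-subset missing from Lk-1
        ck = {c for c in ck if all(s in prev for s in combinations(c, k - 1))}
        # Scan the database and save the frequent itemsets in Lk
        current = frequent_of(ck)
        all_frequent.update(current)
        k += 1
    return all_frequent


# Hypothetical transaction database (illustrative data only)
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "cheese"},
    {"bread", "coke"},
    {"cheese", "coke"},
    {"bread", "butter", "salt"},
]

for itemset, sup in sorted(apriori(transactions, min_support=60.0).items()):
    print(itemset, sup)
```

Pruning before the database scan is the point of the design: a candidate with an infrequent subset cannot be frequent, so it never needs to be counted.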

9
Smart Web Search Agents
  • Data Search Engines >> Information Search Agents
  • - Traditional searching on the Web is done using
    one of the following three:
  • - Directories (Yahoo, Lycos, etc.)
  • - Search Engines (AltaVista, NorthernLight,
    etc.)
  • - Metasearch Engines (MetaCrawler, SavvySearch,
    AskJeeves, etc.)
  • All of these involve keyword searches.
    Drawbacks: not easily personalized,
    too many results (although many give
    relevancy factors)

10
  • - local cache databases (containing frequently
    asked queries/results, possibly updated
    periodically, e.g. nightly)
  • - local cache information base (containing mined
    information and discovered knowledge for
    efficient personal use)
  • - domain-based agents (e.g. Job Search,
    Sports - NBA Stats, Bibliography - Digital Libraries)

11
Intelligent Tools for E-Business
  • Computational Intelligence, Neural Networks,
    Fuzzy Logic, Genetic Algorithms, Hybrid Systems
  • Learning Algorithms, Heuristic Searching
  • Data Analysis and Modeling, Data Fusion and
    Mining, Knowledge Discovery
  • Prediction: Time Series Analysis
  • Information Retrieval, Intelligent User Interface
  • Intelligent Agents, Distributed IA and
    Multi-Agents, Cooperative Knowledge-based Systems

12
Enhancing E-Business Process Through Data Mining
  • Traditional Data Mining Tools
  • Simple query and reporting
  • Visualization driven data exploration tools, OLAP
  • Discovery process is user driven
  • Quality of discovered knowledge
  • Having right data
  • Having appropriate data mining tools!!!

13
Intelligent Data Mining Tools
  • Automate the process of discovering
    patterns/knowledge in data
  • Require hypothesis, exploration
  • Derive business knowledge (patterns) from data
  • Combine business knowledge of users with results
    of discovery algorithms

14
Intelligent Information Agents
  • The Data Mining Problem
  • Clustering/ Classification
  • Association
  • Sequencing
  • Viewed as an Optimization Problem
  • Tools: Genetic Algorithms

15
Fuzzy Rules Discovering
  • Rule discovery: the discovery of associations
    between business events, i.e. which items are
    purchased together
  • To support flexible querying and intelligent
    searching, fuzzy querying is developed to uncover
    potentially valuable knowledge
  • A fuzzy query uses fuzzy terms like tall, small,
    and near to define linguistic concepts and
    formulate a query
  • The automated search for fuzzy rules is carried
    out by discovering fuzzy clusters or
    segmentation in data
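
To make the idea of a fuzzy term concrete, here is a small sketch of a query on the term "tall". The linear membership function, its 170 cm and 190 cm breakpoints, the 0.5 threshold, and the sample records are all assumptions made for illustration. The slides do not specify how the terms are defined.

```python
def tall(height_cm, low=170.0, high=190.0):
    """Degree (0.0 to 1.0) to which a height counts as 'tall'.

    Assumed shape: 0 below low, 1 above high, linear in between.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def fuzzy_query(records, membership, key, threshold=0.5):
    """Return (degree, record) pairs whose membership degree meets the threshold,
    ordered from best match to weakest."""
    scored = [(membership(r[key]), r) for r in records]
    matches = [(degree, r) for degree, r in scored if degree >= threshold]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


# Hypothetical records (illustrative data only)
people = [
    {"name": "A", "height": 165},
    {"name": "B", "height": 182},
    {"name": "C", "height": 195},
]

for degree, person in fuzzy_query(people, tall, "height"):
    print(person["name"], degree)
```

Unlike a crisp filter such as height > 180, each record matches to a degree, so results can be ranked rather than only included or excluded.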