Data Mining: Potentials and Challenges

Slides: 16
Provided by: Ra88

1
Data Mining: Potentials and Challenges
  • Rakesh Agrawal
  • IBM Almaden Research Center

2
Thesis
  • Data mining has started to live up to its promise
    in the commercial world, particularly in
    applications involving structured data
  • Promising data mining applications in
    non-conventional domains are beginning to emerge,
    involving a combination of structured and
    unstructured data
  • Investment in data mining research can have a large
    payoff

3
Outline
  • Examples of some promising non-conventional data
    mining applications and technologies
  • Some hurdles we need to cross

4
Identifying Social Links Using Association Rules
Input: Crawl of about 1 million pages
5
Website Profiling using Classification
Input: Example pages for each category during training
6
Discovering Trends Using Sequential Patterns
Shape Queries
Input: i) patent database; ii) shape of interest
7
Discovering Micro-communities
Frequently co-cited pages are related. Pages
with large bibliographic overlap are related.
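The two relatedness signals on this slide can be sketched in a few lines. This is a minimal illustration, not the method used in the talk: the `links` dictionary, the page names, and both helper functions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cocitation_counts(links):
    """Count, for each pair of pages, how many pages link to both of them.

    links: dict mapping a page to the set of pages it links to.
    """
    counts = defaultdict(int)
    for cited in links.values():
        # Every pair cited together by one page is co-cited once more.
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return dict(counts)

def bibliographic_overlap(links, p, q):
    """Number of outgoing links that pages p and q have in common."""
    return len(links.get(p, set()) & links.get(q, set()))

# Hypothetical toy link graph (assumption, not data from the talk).
links = {
    "hub1": {"a", "b", "c"},
    "hub2": {"a", "b"},
    "hub3": {"b", "c"},
}
cc = cocitation_counts(links)
overlap = bibliographic_overlap(links, "hub1", "hub2")
```

Pairs with a high co-citation count, or page pairs with a large overlap, would be the candidates for belonging to the same micro-community.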
8
Technical Chasms
  • Privacy Concerns?
  • Privacy-preserving data mining
  • Data for data mining?
  • Data mining over compartmentalized databases

9
Inducing Classifiers over Privacy-Preserved Numeric Data
Alice's age
Alice's salary
John's age
30 becomes 65 (30+35)
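The slide's example, where a true age of 30 is disclosed as 65, amounts to adding a random value to each record before it leaves the user. A minimal sketch, assuming uniform noise; the `randomize` helper, the spread of 50, and the sample ages are illustrative assumptions, not values from the talk.

```python
import random

def randomize(value, spread, rng):
    """Return value plus noise drawn uniformly from [-spread, +spread]."""
    return value + rng.uniform(-spread, spread)

# The slide's example: a noise draw of 35 turns 30 into 65.
slide_example = 30 + 35

rng = random.Random(0)          # fixed seed so the sketch is repeatable
ages = [30, 42, 25, 57, 38]     # hypothetical true values
disclosed = [randomize(a, 50, rng) for a in ages]
```

Only the `disclosed` values would be shared; the miner never sees `ages`.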
10
Reconstruction Algorithm
  • $f_X^{0}$ := Uniform distribution
  • $j$ := 0
  • repeat
  • $f_X^{j+1}(a)$ := (Bayes' Rule update of $f_X^{j}$)
  • $j$ := $j + 1$
  • until (stopping criterion met)
  • Converges to maximum likelihood estimate.
  • D. Agrawal and C.C. Aggarwal, PODS 2001.
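The loop above can be sketched on a discretized grid. The exact update formula did not survive in this transcript, so the Bayes-rule step below is an assumption: each randomized value is attributed to grid points in proportion to the noise density times the current estimate. The fixed iteration count stands in for the unspecified stopping criterion, and `ALPHA`, the grid, and the sample data are hypothetical.

```python
import random

ALPHA = 20.0  # assumed half-width of the uniform noise

def uniform_noise_pdf(y):
    """Density of noise drawn uniformly from [-ALPHA, ALPHA]."""
    return 1.0 / (2 * ALPHA) if abs(y) <= ALPHA else 0.0

def reconstruct(w, grid, noise_pdf, iterations=30):
    """Estimate the distribution of the original values over grid points.

    w: randomized values (original value plus noise).
    """
    f = [1.0 / len(grid)] * len(grid)        # f_X^0: uniform distribution
    for _ in range(iterations):              # stand-in stopping criterion
        new_f = [0.0] * len(grid)
        for wi in w:
            # Posterior over grid points for this observation (Bayes' rule).
            post = [noise_pdf(wi - a) * fa for a, fa in zip(grid, f)]
            total = sum(post)
            if total == 0.0:
                continue                     # observation out of grid reach
            for k, p in enumerate(post):
                new_f[k] += p / total
        s = sum(new_f)
        f = [v / s for v in new_f]           # renormalize to sum to 1
    return f

rng = random.Random(1)
# Hypothetical original data: two clusters of values.
x = [rng.uniform(20, 30) for _ in range(250)] + \
    [rng.uniform(60, 70) for _ in range(250)]
w = [xi + rng.uniform(-ALPHA, ALPHA) for xi in x]
grid = [2.5 + 5 * k for k in range(20)]      # bin midpoints over [0, 100)
f_hat = reconstruct(w, grid, uniform_noise_pdf)
```

The result `f_hat` estimates the distribution of the original values, not any individual value, which is what a classifier would then be trained on.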

11
Works Well
12
Accuracy vs. Randomization
13
Discovering frequent itemsets
Breach level = 50.
Soccer: s_min = 0.2
Mailorder: s_min = 0.2
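For readers unfamiliar with the minimum support threshold s_min, here is a minimal sketch of frequent-itemset counting on plain, unrandomized data. It is a brute-force illustration rather than the privacy-preserving method evaluated on this slide; the `frequent_itemsets` helper, the transactions, and the threshold of 0.5 are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, s_min, max_size=2):
    """Return itemsets (up to max_size items) whose support is >= s_min.

    Support is the fraction of transactions containing the itemset.
    """
    n = len(transactions)
    result = {}
    for size in range(1, max_size + 1):
        counts = {}
        for t in transactions:
            for itemset in combinations(sorted(t), size):
                counts[itemset] = counts.get(itemset, 0) + 1
        for itemset, c in counts.items():
            if c / n >= s_min:
                result[itemset] = c / n
    return result

# Hypothetical toy transactions (assumption, not the slide's datasets).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
freq = frequent_itemsets(transactions, s_min=0.5)
```

In the privacy-preserving setting, the counts would instead be estimated from randomized transactions, which is where the breach level comes in.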
14
Computation over Compartmentalized Databases
15
Some Hard Problems
  • Past may be a poor predictor of future
  • Abrupt changes
  • Wrong training examples
  • Reliability and quality of data
  • Actionable patterns (principled use of domain
    knowledge?)
  • Over-fitting vs. not missing the rare nuggets
  • Richer patterns
  • Simultaneous mining over multiple data types
  • When to use which algorithm?
  • Automatic, data-dependent selection of algorithm
    parameters

16
Summary
  • Data mining has shown promise but we need further
    research to realize its full potential

We stand on the brink of great new answers, but
even more, of great new questions -- Matt Ridley