Mining Unexpected Rules by Pushing User Dynamics - PowerPoint PPT Presentation

About This Presentation
Slides: 22
Provided by: Jia994

Transcript and Presenter's Notes

1
Mining Unexpected Rules by Pushing User Dynamics
  • Ke Wang
  • Yuelong Jiang
  • Laks V.S. Lakshmanan

2
Unexpected Rules
  • Unexpectedness: the user finds the rules surprising
  • Existing approaches
      • Syntax distance (B. Liu, W. Hsu, AAAI96)
      • Logical contradiction (B. Padmanabhan, A.
        Tuzhilin, KDD98)
      • Both rely on direct comparison between rules

3
Our Approach: Data Violation
  • Knowledge rules Ui
  • The data rule r
      • r is unexpected to a user who associates owning
        a house at BeverlyHill with being a movie star
        and being well paid
  • Each tuple that satisfies r but violates Ui is
    evidence of the unexpectedness of r
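To make the evidence idea concrete, here is a minimal Python sketch that treats rules as crisp yes/no predicates, whereas the presentation uses fuzzy match degrees. The attribute names `house`, `occupation`, `salary`, the knowledge rule "house=BeverlyHill → salary=High", and the data rule r are hypothetical stand-ins, because the actual Ui and r on this slide did not survive extraction.

```python
from typing import Callable

Row = dict  # one tuple: attribute name -> value
Predicate = Callable[[Row], bool]


def evidence_tuples(
    data: list[Row],
    r_body: Predicate,
    r_head: Predicate,
    u_body: Predicate,
    u_head: Predicate,
) -> list[Row]:
    """Return the tuples that satisfy data rule r but violate knowledge rule U.

    A tuple satisfies r if it matches both r's body and r's head; it violates U
    if it matches U's body but not U's head. Each returned tuple is evidence
    that r is unexpected.
    """
    return [
        t for t in data
        if r_body(t) and r_head(t) and u_body(t) and not u_head(t)
    ]


# Hypothetical example data and rules (not taken from the slides).
data = [
    {"house": "BeverlyHill", "occupation": "Teacher", "salary": "Low"},
    {"house": "BeverlyHill", "occupation": "MovieStar", "salary": "High"},
    {"house": "Elsewhere", "occupation": "Teacher", "salary": "Low"},
]
# Knowledge rule U: house=BeverlyHill -> salary=High
u_body = lambda t: t["house"] == "BeverlyHill"
u_head = lambda t: t["salary"] == "High"
# Data rule r: house=BeverlyHill and occupation=Teacher -> salary=Low
r_body = lambda t: t["house"] == "BeverlyHill" and t["occupation"] == "Teacher"
r_head = lambda t: t["salary"] == "Low"

print(evidence_tuples(data, r_body, r_head, u_body, u_head))  # only the first tuple
```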

4
Three Issues
  • Knowledge Dynamics (modeling)
      • The user decides the best knowledge to apply in
        a given scenario (i.e., a tuple)
  • Knowledge Push (rule mining)
      • Push user knowledge into the search right from
        the start
  • Unexpectedness Dynamics (rule selection)
      • Adjust the unexpectedness of the remaining rules
        according to what has been presented so far

5
Rule Representation
  • Knowledge rules and data rules
      • Data rules use domain values; knowledge rules use
        fuzzy terms (such as High, Low)
      • The match degree measures how well a domain value
        (e.g., Primary) matches a fuzzy term (e.g., Low)
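One simple way to realize a match degree is a lookup table from domain values to fuzzy-term memberships, in the spirit of fuzzy sets. The sketch below assumes an ordinal education attribute. The table entries and the name `match_degree` are illustrative assumptions, not values from the presentation.

```python
# Hypothetical membership table for an ordinal education attribute:
# EDUCATION_MEMBERSHIP[value][term] is the degree in [0, 1] to which the
# domain value matches the fuzzy term. All numbers are illustrative only.
EDUCATION_MEMBERSHIP: dict[str, dict[str, float]] = {
    "Primary": {"Low": 1.0, "Medium": 0.25, "High": 0.0},
    "Secondary": {"Low": 0.5, "Medium": 1.0, "High": 0.25},
    "Degree": {"Low": 0.0, "Medium": 0.5, "High": 1.0},
}


def match_degree(value: str, term: str,
                 table: dict[str, dict[str, float]] = EDUCATION_MEMBERSHIP) -> float:
    """Match degree between a domain value and a fuzzy term.

    Unknown values or terms are treated as a complete mismatch (0.0).
    """
    return table.get(value, {}).get(term, 0.0)


print(match_degree("Primary", "Low"))    # 1.0
print(match_degree("Secondary", "Low"))  # 0.5
```

A body match degree for a multi-attribute rule body could then combine the per-attribute degrees, for example with `min` as a common fuzzy AND. The slides do not say which combination the authors use.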

6
Main Ideas
  • Preference model: the user specifies the best
    knowledge rules for each tuple
      • e.g., U1 and U2 for those owning a house at
        BeverlyHill
  • Violation model: we measure the unexpectedness of
    r by how strongly the tuples satisfying r violate
    their best knowledge rules

7
The Preference Model
  • The user specifies the covering knowledge for each
    tuple
      • the d best knowledge rules that match the tuple,
        where d is the covering depth
  • Ways to specify "best"
      • Explicit enumeration (not scalable)
      • Ranking by a preference: max strength, best
        match, min violation, etc.
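Here is a minimal sketch of the ranking approach. It keeps the knowledge rules whose body matches the tuple at all, ranks them by a chosen preference, and takes the top d. It covers two of the listed preferences, max strength and best match. The `KnowledgeRule` class, its `strength` field, and the `body_match` callback are hypothetical names for this illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class KnowledgeRule:
    name: str
    strength: float  # hypothetical user-assigned strength of the rule


def covering_knowledge(
    t: dict,
    rules: list[KnowledgeRule],
    body_match: Callable[[dict, KnowledgeRule], float],
    d: int,
    preference: str = "max_strength",
) -> list[KnowledgeRule]:
    """Return the d best knowledge rules that match tuple t (its covering knowledge).

    A rule matches if its body match degree with t is greater than 0.
    "max_strength" ranks by rule strength, "best_match" by body match degree;
    ties are broken by rule name so the result is deterministic.
    """
    matching = [u for u in rules if body_match(t, u) > 0]
    keys = {
        "max_strength": lambda u: (-u.strength, u.name),
        "best_match": lambda u: (-body_match(t, u), u.name),
    }
    if preference not in keys:
        raise ValueError(f"unknown preference: {preference!r}")
    return sorted(matching, key=keys[preference])[:d]


rules = [KnowledgeRule("U1", 0.9), KnowledgeRule("U2", 0.7), KnowledgeRule("U3", 0.5)]
bm = {"U1": 0.4, "U2": 0.9, "U3": 0.0}  # hypothetical body match degrees for one tuple
print([u.name for u in covering_knowledge({}, rules, lambda t, u: bm[u.name], d=2)])
# ['U1', 'U2']
```

With `preference="best_match"` and `d=1`, the same tuple would instead be covered by U2 alone. The choice of preference changes which knowledge is applied to the tuple.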

8
The Violation Model
  • For a tuple t and a knowledge rule U
      • Body match degree, bm(t,U), in [0,1]
      • Head match degree, hm(t,U), in [0,1]
      • Violation of U by t, v(t,U)
  • The violation of t, v(t), aggregates v(t,U) over
    the covering knowledge U of t.

if bm(t, U) ? ? otherwise
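The aggregation step can be sketched as follows. The slide's formula for v(t,U) did not survive extraction, so the per-rule violation is passed in as a function. `example_pair_violation` is a hypothetical placeholder, not the authors' definition: it counts a violation only when the body match reaches a threshold, and the violation grows as the head match falls. Using `max` as the default aggregate is likewise an assumption.

```python
from typing import Callable, Iterable, Sequence


def example_pair_violation(bm: float, hm: float, threshold: float = 0.5) -> float:
    """HYPOTHETICAL v(t,U), not the formula from the slides.

    Counts a violation only if the body matches well enough (bm >= threshold),
    and makes it larger the worse the head matches.
    """
    return 1.0 - hm if bm >= threshold else 0.0


def tuple_violation(
    covering: Sequence[str],
    bm: Callable[[str], float],
    hm: Callable[[str], float],
    pair_violation: Callable[[float, float], float] = example_pair_violation,
    aggregate: Callable[[Iterable[float]], float] = max,
) -> float:
    """v(t): aggregate v(t,U) over the covering knowledge U of tuple t.

    covering: names of the knowledge rules covering t.
    bm, hm: body and head match degrees of t against a rule, both in [0, 1].
    A tuple with no covering knowledge has violation 0.
    """
    if not covering:
        return 0.0
    return aggregate(pair_violation(bm(u), hm(u)) for u in covering)


# Hypothetical match degrees of one tuple t against its covering rules U1, U2.
body = {"U1": 0.75, "U2": 1.0}
head = {"U1": 0.25, "U2": 0.5}
print(tuple_violation(["U1", "U2"], body.get, head.get))                 # 0.75
print(tuple_violation(["U1", "U2"], body.get, head.get, aggregate=sum))  # 1.25
```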

9
The Mining Problem
  • Unexpectedness Support of r
  • Unexpectedness Confidence of r
  • Unexpectedness of r
  • Problem: find all data rules r above specified
    thresholds for Usup and Ustr.

10
The Mining Algorithm
  • Three phases
      • Violation Phase
      • Rule Phase
      • Final Phase

11
Violation Phase
  • Compute and store v(t) for all tuples t in the
    database T, and prune every t with v(t) = 0 to get
    a new database T'
  • This prunes the data consistent with the user
    knowledge and is very effective.
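Here is a direct Python sketch of this phase. The `violation` argument stands in for whatever v(t) the preference and violation models define. The example database, with a precomputed `v` field, is hypothetical.

```python
from typing import Callable, Iterable


def violation_phase(
    database: Iterable[dict],
    violation: Callable[[dict], float],
) -> list[tuple[dict, float]]:
    """Compute v(t) once per tuple and keep only tuples with v(t) > 0.

    Returns T' as (tuple, v(t)) pairs, so later phases can reuse the stored
    violation without recomputing it. Tuples with v(t) = 0 are consistent
    with the user knowledge, so they are dropped before mining.
    """
    pruned = []
    for t in database:
        v = violation(t)
        if v > 0:
            pruned.append((t, v))
    return pruned


# Hypothetical database in which each tuple's violation is already attached.
T = [{"id": 1, "v": 0.0}, {"id": 2, "v": 0.75}, {"id": 3, "v": 0.0}, {"id": 4, "v": 0.5}]
T_prime = violation_phase(T, lambda t: t["v"])
print([t["id"] for t, _ in T_prime])  # [2, 4]
```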

12
Rule Phase
  • Generate all rules r with Usup(r) above the
    threshold, using T'
  • Usup(r) is anti-monotone
      • Usup(r) never increases as the body b(r) grows
      • This holds independently of the preference
        model and the violation function v(t)
  • Any frequent itemset algorithm can be applied in
    this phase
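Because Usup(r) is anti-monotone, an Apriori-style level-wise search can prune any body that has a sub-body below the threshold. The sketch below fixes one head, a target-attribute value. It assumes a hypothetical Usup(r) equal to the sum of v(t) over the tuples of T' that satisfy r, divided by |T|. Since v(t) ≥ 0, this is anti-monotone no matter how v(t) is computed, which is consistent with the slide. Items are plain `attribute=value` strings for illustration, and the attributes `A` and `B` are made up.

```python
from itertools import combinations


def rule_phase(
    pruned_db: list[tuple[frozenset, float]],
    head: str,
    min_usup: float,
    total_tuples: int,
) -> dict[frozenset, float]:
    """Level-wise search for bodies B such that Usup(B -> head) >= min_usup.

    pruned_db: T' as (items, v(t)) pairs from the violation phase.
    total_tuples: |T|, the size of the original database.
    HYPOTHETICAL Usup: sum of v(t) over tuples of T' containing B and head,
    divided by |T|.
    """
    # Only tuples that satisfy the head can support a rule with this head.
    relevant = [(items, v) for items, v in pruned_db if head in items]

    def usup(body: frozenset) -> float:
        return sum(v for items, v in relevant if body <= items) / total_tuples

    result: dict[frozenset, float] = {}
    # Level 1: every single item (other than the head) seen in a relevant tuple.
    candidates = {frozenset([i]) for items, _ in relevant for i in items if i != head}
    size = 1
    while candidates:
        scored = {b: usup(b) for b in candidates}
        frequent = {b: s for b, s in scored.items() if s >= min_usup}
        result.update(frequent)
        # Join frequent bodies into bodies one item larger, then drop any
        # candidate with an infrequent sub-body. This pruning is safe only
        # because Usup is anti-monotone.
        size += 1
        joined = {a | b for a in frequent for b in frequent if len(a | b) == size}
        candidates = {
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
        }
    return result


T_prime = [
    (frozenset({"A=1", "B=1", "NK97=c4"}), 1.0),
    (frozenset({"A=1", "NK97=c4"}), 0.5),
    (frozenset({"B=1", "NK97=c4"}), 0.5),
    (frozenset({"A=1", "B=1", "NK97=c0"}), 1.0),
]
rules = rule_phase(T_prime, head="NK97=c4", min_usup=0.3, total_tuples=4)
print(sorted(sorted(b) for b in rules))  # [['A=1'], ['B=1']]
```

Here the body {A=1, B=1} is generated as a candidate but falls below the threshold (Usup 0.25). Any larger body containing it would therefore never be generated.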

13
Final Phase
  • Compute sup(r) and sup(b(r)) for the rules
    produced in the rule phase
  • Output the rules r with Ustr(r) above the threshold.
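Here is a sketch of the final phase. It assumes supports are counted on the original database T, because the pruned T' lacks the tuples with v(t) = 0 that still satisfy rules. The slides do not show how Ustr(r) is computed from Usup(r), sup(r), and sup(b(r)), so `ustr` is a caller-supplied function. The one in the example is purely hypothetical.

```python
from typing import Callable

Rule = tuple[frozenset, str]  # (body items, head item)


def final_phase(
    database: list[frozenset],
    candidates: dict[Rule, float],
    ustr: Callable[[float, float, float], float],
    min_ustr: float,
) -> list[tuple[frozenset, str, float]]:
    """Compute sup(r) and sup(b(r)) and keep rules whose Ustr reaches min_ustr.

    database: the original database T, one item set per tuple.
    candidates: rule -> Usup(r) from the rule phase.
    ustr: HYPOTHETICAL scoring function (usup, sup_r, sup_body) -> Ustr(r).
    """
    n = len(database)
    output = []
    for (body, head), usup in candidates.items():
        sup_body = sum(1 for items in database if body <= items) / n
        sup_r = sum(1 for items in database if body <= items and head in items) / n
        score = ustr(usup, sup_r, sup_body)
        if score >= min_ustr:
            output.append((body, head, score))
    return output


T = [
    frozenset({"A=1", "NK97=c4"}),
    frozenset({"A=1", "NK97=c0"}),
    frozenset({"A=1", "NK97=c4"}),
    frozenset({"B=1", "NK97=c4"}),
]
candidates = {(frozenset({"A=1"}), "NK97=c4"): 0.25}
# Hypothetical Ustr: Usup(r) relative to sup(r).
print(final_phase(T, candidates, lambda usup, sup_r, sup_body: usup / sup_r, 0.4))
# [(frozenset({'A=1'}), 'NK97=c4', 0.5)]
```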

14
The Selection Problem
  • Display a specified number k of rules to the
    user, in order of unexpectedness
  • See-and-Know Assumption
      • After seeing rules R, the user is interested only
        in rules that remain unexpected with respect to
        the user knowledge extended with R

15
The Selection Algorithm
  • At each step,
      • greedily select the most unexpected rule (until k
        rules are selected or no rule remains to select)
      • add the selected rule to the user knowledge
      • for each matching tuple, update the violation
        values to reflect the new covering knowledge.
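The greedy loop can be sketched as follows. The slides give only the outline, so two pieces are assumptions for illustration. First, a rule's unexpectedness is scored as the sum of the current v(t) over its satisfying tuples. Second, adding a selected rule to the user knowledge is modeled by setting v(t) to 0 for the tuples it matches. The rule and tuple identifiers are hypothetical.

```python
def select_rules(
    rule_tuples: dict[str, set[int]],
    violation: dict[int, float],
    k: int,
) -> list[str]:
    """Greedily pick up to k rules, most unexpected first.

    rule_tuples: rule name -> ids of the tuples satisfying the rule.
    violation: tuple id -> current v(t); copied, never mutated.
    HYPOTHETICAL scoring: unexpectedness = sum of current v(t) over the
    rule's satisfying tuples.
    HYPOTHETICAL update: once a rule joins the user knowledge, the tuples it
    matches are treated as explained, so their v(t) drops to 0.
    """
    v = dict(violation)
    remaining = dict(rule_tuples)
    selected: list[str] = []

    def score(name: str) -> float:
        return sum(v.get(t, 0.0) for t in remaining[name])

    while remaining and len(selected) < k:
        # Iterate names in sorted order so ties resolve deterministically.
        best = max(sorted(remaining), key=score)
        if score(best) <= 0:
            break  # assumption: a rule with no remaining violation is not worth showing
        selected.append(best)
        for t in remaining.pop(best):
            v[t] = 0.0
    return selected


rules = {"R1": {1, 2, 3}, "R2": {3, 4}, "R3": {1, 2}}
v = {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.5}
print(select_rules(rules, v, k=3))  # ['R1', 'R2']
```

A static ranking would show R3 (score 2.0) second. Once R1 is shown, however, R3's tuples count as explained, so this dynamic version shows R2 instead and then stops.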

16
Experiment Dataset
  • KDD-CUP-98 dataset
  • Target attribute
      • NK97: donation amount in the 1997 campaign
      • five scales, c0, c1, c2, c3, c4, in increasing
        order
  • 23 non-target attributes
      • Their meanings are easier to understand than
        those of the other attributes

17
User Knowledge
  • Observation: people's donation behavior tends to
    remain unchanged
  • Four knowledge rules

18
Efficiency of Mining
  • Three algorithms
      • UMINE(NULL): without user knowledge
      • UMINE-Unpruned: without tuple pruning
      • UMINE-Pruned: pruning the tuples with v(t) = 0

19
Interestingness of Rules
Ui(x,y): Ui covers x tuples with total violation y

20
Effectiveness of Selection

21
Conclusion
  • A new approach for finding interesting rules by
    modeling user knowledge
      • Violation of covering knowledge by satisfying
        tuples
  • Model the human user as a dynamic entity in
    applying knowledge and interpreting presented rules
  • Push user knowledge into data preparation, mining,
    and rule selection. This benefits both search
    efficiency and rule quality.