1
Mining Episode Rules in STULONG dataset
  • N. Méger¹, C. Leschi¹, N. Lucas², C. Rigotti¹
  • ¹ INSA Lyon - LIRIS FRE CNRS 2672
  • ² Université d'Orsay LRI CNRS UMR 8623

This work has been partially funded by the
European Project AEGIS (IST-2000-26450).
2
Content
  • Motivation
  • About WinMiner
  • Data Mining Effort
  • Conclusion

3
Motivation: data
  • STULONG data: a 20-year longitudinal study of
    risk factors related to atherosclerosis in a
    population of middle-aged men
  • Tables ENTRY and CONTROL
  • 1216 patients described by:
    • identification and social characteristics
    • behavior
    • health events
    • physical and biochemical examinations
  • From 1 up to 21 controls per patient
  • → a sequence of controls for each patient

4
Motivation: medical issues
  • identified risk factors
  • no treatment available
  • necessity to consider a global risk instead of
    concentrating prevention efforts on individual
    risk factors
  • risk behaviors dramatically increase the onset of
    cardiovascular disease, but no one knows when
  • → relations between risk factors and clinical
    manifestations of atherosclerosis?
  • → time intervals over which these relations are
    valid?

5
Motivation: WinMiner
  • WinMiner: a single, optimised way to find
    sequential patterns in data along with their
    optimal time intervals, under user constraints
  • WinMiner suggests to experts possible temporal
    dependencies among occurrences of event types
  • WinMiner outputs "small" collections of
    sequential patterns

6
About WinMiner
  • Mining context:
    • large event sequences
    • episodes and episode rules

[Figure: diagrams built from event types A, B and C]
7
About WinMiner
  • Selecting patterns
  • support: how many times does an episode/episode rule
    occur within an event sequence?
  • A → B, A → B → C
  • confidence: what is the probability that the RHS of
    an episode rule occurs, given that its LHS has
    already occurred?
  • A → B → C
  • patterns are selected using
  • a minimum support threshold
  • a minimum confidence threshold
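The two selection measures can be illustrated with a minimal Python sketch. It is not WinMiner's implementation: it assumes an event sequence stored as (timestamp, event_type) pairs, treats support as the number of start positions from which a serial episode occurs within a window of w time units (WinMiner's own occurrence semantics may differ), and estimates the confidence of a rule as support(LHS followed by RHS) / support(LHS). The names `occurs_from`, `support` and `confidence` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed representation: an event sequence is a list of
# (timestamp, event_type) pairs sorted by timestamp.
Event = Tuple[int, str]


def occurs_from(seq: Sequence[Event], start: int,
                episode: Sequence[str], w: int) -> bool:
    """True if the serial episode occurs in order, starting at seq[start],
    with all matched events at most w time units after the first one.

    For a fixed first event, greedily taking the earliest match for each
    later element gives the earliest possible end time, so if any match
    fits in the window, this one does.
    """
    t0, first = seq[start]
    if first != episode[0]:
        return False
    k = 1  # number of episode elements matched so far
    for t, e in seq[start + 1:]:
        if k == len(episode) or t - t0 > w:
            break
        if e == episode[k]:
            k += 1
    return k == len(episode)


def support(seq: Sequence[Event], episode: Sequence[str], w: int) -> int:
    """Simplified support: number of start positions with an in-window occurrence."""
    return sum(occurs_from(seq, i, episode, w) for i in range(len(seq)))


def confidence(seq: Sequence[Event], lhs: Sequence[str],
               rhs: Sequence[str], w: int) -> float:
    """Estimate P(RHS follows | LHS occurred) as support(LHS+RHS) / support(LHS)."""
    s_lhs = support(seq, lhs, w)
    if s_lhs == 0:
        return 0.0
    return support(seq, tuple(lhs) + tuple(rhs), w) / s_lhs


# Toy sequence over event types A, B, C (timestamps are made up).
seq: List[Event] = [(0, "A"), (1, "B"), (3, "A"), (4, "B"), (5, "C"),
                    (8, "A"), (9, "B"), (10, "C")]
print(support(seq, ("A", "B"), 2))             # 3
print(support(seq, ("A", "B", "C"), 2))        # 2
print(confidence(seq, ("A", "B"), ("C",), 2))  # 0.666...
print(confidence(seq, ("A", "B"), ("C",), 5))  # 1.0
```

Widening the window from 2 to 5 lets the first A → B also reach a C, which is why confidence depends on the window size.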

8
About WinMiner
  • Selecting the optimal window span

[Figure: confidence plotted against window size w, with the First Local Maximum (FLM) marked]
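Picking the window at the first local maximum of confidence can be sketched as below. This assumes confidence has already been computed for a list of increasing window sizes and uses a plain discrete test (at least the previous value, strictly above the next one); WinMiner's actual FLM criterion may involve additional parameters. `first_local_maximum` and the sample curve are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def first_local_maximum(curve: Sequence[Tuple[int, float]]) -> Optional[int]:
    """Return the window size at the first local maximum of confidence.

    `curve` lists (window_size, confidence) pairs sorted by increasing
    window size. A point counts as a local maximum when its confidence is
    at least that of the previous point and strictly greater than that of
    the next one. Returns None if confidence never drops.
    """
    for i in range(len(curve) - 1):
        w, c = curve[i]
        rising = i == 0 or c >= curve[i - 1][1]
        if rising and c > curve[i + 1][1]:
            return w
    return None


# Made-up confidence values for one rule, window sizes in months.
curve = [(6, 0.55), (12, 0.70), (18, 0.80), (24, 0.78), (30, 0.85)]
print(first_local_maximum(curve))  # 18
```

The higher confidence at 30 months is ignored on purpose: only the first peak is reported, so the rule is associated with the shortest window at which its confidence stops improving.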
9
About WinMiner
  • WinMiner
  • checks all possible episode rules satisfying the
    frequency and confidence thresholds
  • outputs only the FLM-rules, along with their
    respective optimal window sizes
  • uses a maximal gap constraint
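A maximal gap constraint bounds the time between consecutive events of an occurrence, in addition to the overall window. The sketch below, assuming the same (timestamp, event_type) representation and a hypothetical `occurs_with_gap` function, checks whether a serial episode occurs under both constraints. It backtracks instead of matching greedily, because with a gap limit the earliest candidate for the next event is not always one that leads to a valid occurrence.

```python
from typing import Sequence, Tuple

# Assumed representation: (timestamp, event_type) pairs sorted by timestamp.
Event = Tuple[int, str]


def occurs_with_gap(seq: Sequence[Event], episode: Sequence[str],
                    w: int, max_gap: int) -> bool:
    """True if the serial episode occurs in order in `seq` with total span
    <= w and every gap between consecutive matched events <= max_gap."""

    def extend(k: int, idx: int, t_start: int) -> bool:
        # k: next episode element to match; idx: position of the last match.
        if k == len(episode):
            return True
        t_prev = seq[idx][0]
        for j in range(idx + 1, len(seq)):
            t, e = seq[j]
            if t - t_prev > max_gap or t - t_start > w:
                break  # seq is sorted by time: later events are farther away
            if e == episode[k] and extend(k + 1, j, t_start):
                return True
        return False

    return any(e == episode[0] and extend(1, i, t)
               for i, (t, e) in enumerate(seq))


seq = [(0, "A"), (1, "B"), (2, "B"), (4, "C")]
# Greedy matching would take B@1 and then miss C@4 (gap 3 > 2).
print(occurs_with_gap(seq, ("A", "B", "C"), w=10, max_gap=2))  # True: A@0, B@2, C@4
print(occurs_with_gap(seq, ("A", "B", "C"), w=10, max_gap=1))  # False
```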

10
DM effort: aims
  • Give the medical expert a means to follow the
    evolution of risk factors together with
    • (1) the impact of medical intervention
    • (2) changes in patient behavior
  • In addition, provide
    • significant time periods of observation
    • frequency
    • probability

11
DM effort: data preprocessing
  • Mainly focused on table CONTROL (1226
    patients/10572 examinations)
  • Join operations to bring in information from table
    ENTRY
  • Categorization of some factors
  • Choice of relevant factors according to:
    • medical expertise
    • the mining approach
  • → table Contr_Mod_2

12
DM effort: data preprocessing
  • Important factors (according to medical experts):
  • cholesterol
  • hypertension
  • smoking
  • physical activity
  • age
  • diabetes
  • alcohol consumption
  • BMI
  • family anamnesis
  • level of education

13
DM effort: data preprocessing
  • Contr_Mod_2 → large event sequence
  • For each patient, a subsequence containing all
    his control examinations
  • Coding guarantees that events corresponding to 2
    different patients cannot be associated in the
    same episode rule
  • Large event sequence: concatenation of all the
    subsequences constructed for the patients
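The slides do not say how the coding keeps two patients apart. One possible approach, shown here purely as an assumption, is to shift each patient's timestamps so that consecutive patients are separated by more time units than the largest window and maximal gap used for mining; no occurrence can then straddle two patients. `concatenate_patients`, the event names and the month-based timestamps are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: (timestamp_in_months, event_type) pairs.
Event = Tuple[int, str]


def concatenate_patients(per_patient: Dict[str, List[Event]],
                         separation: int) -> List[Event]:
    """Concatenate per-patient subsequences into one large event sequence.

    Each patient's first event is placed `separation` time units after the
    previous patient's last event. If `separation` exceeds the largest
    window and maximal gap used for mining, no episode occurrence can
    contain events from two different patients.
    """
    result: List[Event] = []
    offset = 0
    for pid in sorted(per_patient):
        events = sorted(per_patient[pid])
        if not events:
            continue
        base = events[0][0]  # re-anchor this patient's timeline at `offset`
        for t, e in events:
            result.append((offset + t - base, e))
        offset = result[-1][0] + separation
    return result


per_patient = {
    "p1": [(0, "diet_sometimes"), (12, "no_hyperchol")],
    "p2": [(0, "smoker"), (6, "claudication")],
}
seq = concatenate_patients(per_patient, separation=100)
print(seq)
# [(0, 'diet_sometimes'), (12, 'no_hyperchol'), (112, 'smoker'), (118, 'claudication')]
```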

14
DM effort: results
  • Examples
  • "If the patient has no hypercholesterolemia, and
    if he sometimes follows his diet, then the
    patient has no hypercholesterolemia with a
    probability of 0.8 within 40 months. This rule is
    supported by 201 examples in the event sequence."
  • "If a patient eats less fat and fewer
    carbohydrates, and claudication is observed some
    time later, then this claudication does not
    disappear, with a probability of 0.8 over 30
    months. This rule is supported by 21 examples."

15
DM effort: results
  • Well-known phenomena
    • an indication that both preprocessing and
      mining were carried out correctly
    • added value: suggestions concerning their
      temporal aspects
  • To be expected
    • with new data and the new risk factors
      identified in the last decade, discovery of new
      phenomena along with their optimal window sizes

16
Conclusion
  • With STULONG data: search for temporal
    dependencies between atherosclerosis risk factors
    and clinical manifestations of atherosclerosis
    that have an optimal interval/window size
  • Offers the medical expert a way to make explicit
    the impact of a risk factor and to refine its
    role in comparison with other factors within a
    time interval
  • Only a few episode rules are obtained, which
    allows experts to analyse the outputs manually
  • Could be applied to other medical data sets to
    help in finding unknown phenomena
  • → new perspectives for both data
    miners and physicians