Finding Fuzzy Approximate Dependencies within STULONG Data

Provided by: josmaraser
Transcript and Presenter's Notes

1
Finding Fuzzy Approximate Dependencies within
STULONG Data
  • Discovery Challenge, ECML/PKDD 2003
  • September 22-27, 2003
  • Berzal F., Cubero J.C., Sanchez D., Serrano J.M.,
    Vila M.A.
  • University of Granada (Spain)

2
Introduction
  • KDD allows us to obtain relations within data.
  • Non-trivial.
  • Previously unknown.
  • Potentially useful.
  • Fuzzy data ⇒ extensions of KDD tools and techniques.

3
Problem representation
  • Fuzzy relational database.
  • a_ij values: numeric, scalar (nominal), linguistic
    labels.
  • Membership degrees.
  • Fuzzy similarity relations, SA1, ..., SAm.

t    A1             A2             ...  Am
t1   a11, μt1(A1)   a12, μt1(A2)   ...  a1m, μt1(Am)
t2   a21, μt2(A1)   a22, μt2(A2)   ...  a2m, μt2(Am)
t3   a31, μt3(A1)   a32, μt3(A2)   ...  a3m, μt3(Am)
...
4
Fuzzy Approximate Dependencies
  • We define Fuzzy Approximate Dependencies by relaxing
    some properties of Functional Dependencies:
  • V → W ⇔
  • ∀ t, s: t[V] = s[V] ⇒ t[W] = s[W]
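
As an illustration of the crisp definition that FADs start from, here is a minimal sketch that checks whether a functional dependency V → W holds in a small table. The function name `fd_holds`, the attribute names `A` and `B`, and the sample rows are hypothetical. The slides do not say which properties are relaxed, so the sketch covers only the exact, non-fuzzy case.

```python
from itertools import combinations

def fd_holds(rows, V, W):
    """Return True if every pair of rows that agrees on V also agrees on W."""
    for t, s in combinations(rows, 2):
        agree_on_v = all(t[a] == s[a] for a in V)
        agree_on_w = all(t[a] == s[a] for a in W)
        if agree_on_v and not agree_on_w:
            return False  # t and s are a counterexample to V -> W
    return True

# Hypothetical toy rows, not taken from STULONG.
rows = [
    {"A": 1, "B": "x"},
    {"A": 1, "B": "x"},
    {"A": 2, "B": "y"},
]
print(fd_holds(rows, ["A"], ["B"]))
```

A single extra row such as `{"A": 2, "B": "z"}` breaks the dependency. That brittleness is what motivates approximate, and here fuzzy, relaxations of the definition.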

5
FAD Measures
  • Relevance degree
  • Support, supp(V → W)
  • Fulfilment degrees
  • Confidence, conf(V → W)
  • Certainty factor, CF(V → W) (Shortliffe and
    Buchanan, 1975)
  • Measures belief degree variations.
  • CF(V → W) = 1 ⇒ Maximum increment (perfect
    positive dependence).
  • CF(V → W) = -1 ⇒ Maximum decrement.
  • CF(V → W) = 0 ⇒ Statistical independence.
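
The slides name the certainty factor but do not give its formula. The sketch below assumes the definition commonly used for rules, in which CF compares conf(V → W) with the support of the consequent W. The function name `certainty_factor` and the parameter `supp_w` are illustrative choices, not the authors' notation.

```python
def certainty_factor(conf, supp_w):
    """Certainty factor of V -> W, assuming the usual rule-level definition.

    conf   -- conf(V -> W), a value in [0, 1]
    supp_w -- support of the consequent W alone, a value in [0, 1]
    """
    if conf > supp_w:
        # Belief in W increases when V holds; scaled to reach at most 1.
        return (conf - supp_w) / (1.0 - supp_w)
    if conf < supp_w:
        # Belief in W decreases when V holds; scaled to reach at least -1.
        return (conf - supp_w) / supp_w
    # conf equals supp(W): knowing V does not change the belief in W.
    return 0.0

print(certainty_factor(0.8, 0.5))
```

Under this assumption the three landmark values on the slide fall out directly: conf = 1 gives CF = 1, conf = 0 gives CF = -1, and conf = supp(W) gives CF = 0.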

6
Applications
  • Fuzzy Databases.
  • Approximate Dependencies Discovery.
  • Functional Dependencies Discovery.
  • Other applications
  • Low granularity data.
  • Overlapping semantics.

7
STULONG Database
  • Entry Table.
  • Normal Group (attribute KONSKUP having values 1
    or 2).
  • Risk Group (attribute KONSKUP having values 3 or
    4).
  • Pathologic Group (value 5 for attribute KONSKUP).

8
Data Preprocessing (I)
  • Problem: semantic overlap in symbolic or
    scalar attributes.
  • Fuzzy similarity relations (subjective).
  • E.g. DOPRAVA (means of transport for getting to
    work)

              by bike  public means  car  not stated
on foot       0.4      0.3           0.3  0.0
by bike                0.3           0.3  0.0
public means                         0.4  0.0
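
Read as the upper triangle of a symmetric relation, the DOPRAVA degrees can be stored and queried as sketched below. Symmetry and reflexivity (a value is fully similar to itself) are assumptions, as is that reading of the table layout. The names `DOPRAVA_DEGREES` and `similarity` are hypothetical. The transcript does not list the pair car / not stated, so the sketch leaves it out rather than guessing.

```python
# Degrees transcribed from the DOPRAVA table; each pair is stored once.
DOPRAVA_DEGREES = {
    ("on foot", "by bike"): 0.4,
    ("on foot", "public means"): 0.3,
    ("on foot", "car"): 0.3,
    ("on foot", "not stated"): 0.0,
    ("by bike", "public means"): 0.3,
    ("by bike", "car"): 0.3,
    ("by bike", "not stated"): 0.0,
    ("public means", "car"): 0.4,
    ("public means", "not stated"): 0.0,
}

def similarity(x, y, degrees=DOPRAVA_DEGREES):
    """Similarity degree between two values of the attribute."""
    if x == y:
        return 1.0  # assumption: the relation is reflexive
    if (x, y) in degrees:
        return degrees[(x, y)]
    # assumption: the relation is symmetric; KeyError if the pair is unlisted
    return degrees[(y, x)]

print(similarity("by bike", "on foot"))
```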
9
Data Preprocessing (II)
  • Problem: high granularity in numeric attributes.
  • Definition of sets of linguistic labels starting from
    intervals.
  • Numeric value → <label, degree>
  • E.g. BMI (Body mass index)
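
A minimal sketch of the numeric value → <label, degree> mapping, assuming trapezoidal membership functions built from overlapping intervals. The label names and every breakpoint below are hypothetical placeholders. They are neither the authors' BMI partition nor clinical thresholds.

```python
import math

def trapezoid(x, a, b, c, d):
    """Membership degree of x for a trapezoid with support (a, d) and core [b, c]."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge between a and b
    return (d - x) / (d - c)      # falling edge between c and d

# Hypothetical labels and breakpoints, chosen only for illustration.
BMI_LABELS = {
    "low": (-math.inf, -math.inf, 18.0, 20.0),
    "normal": (18.0, 20.0, 24.0, 26.0),
    "high": (24.0, 26.0, math.inf, math.inf),
}

def fuzzify(x, labels=BMI_LABELS):
    """Return every (label, degree) pair with a non-zero degree for x."""
    pairs = [(name, trapezoid(x, *params)) for name, params in labels.items()]
    return [(name, degree) for name, degree in pairs if degree > 0.0]

print(fuzzify(25.0))  # [('normal', 0.5), ('high', 0.5)]
```

A value in an overlap region belongs to two labels with partial degrees. This is how the preprocessing step lowers granularity without forcing a hard cut between neighbouring intervals.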

10
Analytical Questions (I)
  • Dependencies between social factors and physical
    activity.

ROKVSTUP STAV VZDELANI ZODPOV
TELAKTZA 0.67/0.14 0.24/0.37 0.25/0.28
AKTPOZAM 0.14/0.47 0.58/0.28 0.14/0.49 0.18/0.47
DOPRAVA 0.20/0.32 0.64/0.14 0.19/0.32 0.26/0.32
DOPRATRV 0.17/0.47 0.57/0.22 0.16/0.46 0.21/0.44
11
Analytical Questions (II)
  • Dependencies between social factors and smoking.

ROKVSTUP STAV VZDELANI ZODPOV
KOURENI 0.68/0.07
DOBAKOUR 0.64/0.11 0.26/0.25
BYVKURAK 0.10/0.64 0.42/0.39 0.09/0.65 0.13/0.64
12
Analytical Questions (III)
  • Dependencies between social factors and alcohol
    consumption.

         ROKVSTUP   STAV       VZDELANI   ZODPOV
ALKOHOL  0.21/0.35  0.63/0.15  0.19/0.34  0.24/0.31
PIVO10   0.16/0.43  0.58/0.21  0.16/0.43  0.21/0.41
PIVO12   0.10/0.62  0.47/0.39  0.10/0.62  0.13/0.61
VINO     0.16/0.43  0.58/0.21  0.16/0.44  0.21/0.41
LIHOV    0.16/0.43  0.58/0.21  0.16/0.43  0.20/0.41
PIVOMN   0.21/0.33  0.65/0.14  0.20/0.32  0.24/0.29
VINOMN   0.20/0.33  0.64/0.15  0.19/0.33  0.24/0.31
LIHMN    0.20/0.31  0.64/0.14  0.19/0.30  0.25/0.29
13
Analytical Questions (IV)
  • Dependencies between social factors and physical
    features.

ROKVSTUP STAV VZDELANI ZODPOV
BMI 0.16/0.44 0.58/0.23 0.15/0.45 0.20/0.42
SYST1 0.65/0.12 0.25/0.26
DIAST1 0.19/0.32 0.63/0.14 0.19/0.32 0.24/0.30
SYST2 0.65/0.12 0.25/0.25
DIAST2 0.19/0.33 0.63/0.15 0.18/0.33 0.23/0.30
14
Analytical Questions (V)
  • Dependencies between physical activity and
    smoking.

TELAKTZA AKTPOZAM DOPRAVA DOPRATRV
KOURENI 0.50/0.11 0.45/0.13
DOBAKOUR 0.27/0.24 0.47/0.18 0.30/0.24 0.42/0.19
BYVKURAK 0.13/0.62 0.26/0.51 0.15/0.51 0.23/0.55
15
Analytical Questions (VI)
  • Dependencies between physical activity and
    alcohol consumption.

         TELAKTZA   AKTPOZAM   DOPRAVA    DOPRATRV
ALKOHOL  0.27/0.31  0.46/0.23  0.29/0.30  0.41/0.25
PIVO10   0.22/0.39  0.40/0.30  0.24/0.39  0.35/0.33
PIVO12   0.14/0.59  0.29/0.50  0.16/0.59  0.23/0.50
VINO     0.22/0.40  0.40/0.31  0.24/0.39  0.35/0.33
LIHOV    0.22/0.39  0.39/0.30  0.24/0.38  0.35/0.33
PIVOMN   0.27/0.29  0.46/0.21  0.30/0.29  0.42/0.24
VINOMN   0.27/0.31  0.46/0.23  0.28/0.30  0.41/0.24
LIHMN    0.27/0.28  0.46/0.21  0.29/0.27  0.41/0.23
16
Analytical Questions (VII)
  • Dependencies between physical activity and
    physical features.

        TELAKTZA   AKTPOZAM   DOPRAVA    DOPRATRV
BMI     0.21/0.41  0.39/0.32  0.23/0.40  0.34/0.34
SYST1   0.27/0.26  0.46/0.19  0.29/0.25  0.42/0.21
DIAST1  0.25/0.29  0.44/0.22  0.28/0.29  0.39/0.23
SYST2   0.27/0.25  0.47/0.18  0.29/0.24  0.42/0.20
DIAST2  0.25/0.29  0.45/0.22  0.27/0.29  0.39/0.24
17
Analytical Questions (VIII)
  • Dependencies between physical activity and
    cholesterol degrees.

TELAKTZA AKTPOZAM DOPRAVA DOPRATRV
CHLST 0.28/0.24 0.47/0.17 0.30/0.23 0.42/0.19
TRIGL 0.49/0.13 0.45/0.14
18
Analytical Questions (IX)
  • Dependencies between alcohol consumption and
    physical features.

         BMI        SYST1      DIAST1     SYST2      DIAST2
ALKOHOL  0.40/0.24  0.25/0.30  0.28/0.29  0.24/0.31  0.28/0.29
PIVO10   0.35/0.33  0.21/0.39  0.38/0.24  0.20/0.40  0.24/0.38
PIVO12   0.25/0.52  0.14/0.60  0.16/0.59  0.13/0.60  0.17/0.58
VINO     0.35/0.32  0.21/0.40  0.24/0.38  0.20/0.40  0.24/0.38
LIHOV    0.35/0.33  0.21/0.40  0.24/0.38  0.20/0.40  0.24/0.38
PIVOMN   0.41/0.23  0.25/0.28  0.29/0.27  0.25/0.29  0.29/0.27
VINOMN   0.40/0.24  0.25/0.30  0.28/0.28  0.24/0.30  0.28/0.28
LIHMN    0.41/0.22  0.25/0.28  0.29/0.27  0.24/0.28  0.29/0.27
19
Analytical Questions (X)
  • Dependencies between alcohol consumption and
    smoking.

KOURENI DOBAKOUR BYVKURAK
ALKOHOL 0.23/0.30 0.61/0.15
PIVO10 0.13/0.44 0.20/0.40 0.56/0.22
PIVO12 0.08/0.65 0.13/0.60 0.44/0.40
VINO 0.13/0.44 0.20/0.40 0.56/0.22
LIHOV 0.13/0.44 0.20/0.40 0.56/0.22
PIVOMN 0.23/0.28 0.61/0.14
VINOMN 0.23/0.30 0.61/0.15
LIHMN 0.24/0.28 0.62/0.14
20
Analytical Questions (XI)
  • Dependencies between skin folds and BMI:
  • TRIC → BMI, supp = 15.85, CF = 0.54
  • SUBSC → BMI, supp = 17.28, CF = 0.58

21
Concluding Remarks
  • FADs allow us to discover relations within
    imprecise or uncertain data.
  • Expert aid is desirable for:
  • Data preprocessing.
  • Interpretation of results.