Title: Finding Fuzzy Approximate Dependencies within STULONG Data
1Finding Fuzzy Approximate Dependencies within
STULONG Data
- Discovery Challenge, ECML/PKDD 2003
- September 22-27, 2003
- Berzal F., Cubero J.C., Sanchez D., Serrano J.M.,
Vila M.A. - University of Granada (Spain)
2Introduction
- KDD allows us to discover relations within data that are:
- Non-trivial.
- Previously unknown.
- Potentially useful.
- Fuzzy data require extensions of KDD tools and techniques.
3Problem representation
- Fuzzy relational database.
- Values aij: numeric, scalar (nominal), or linguistic labels.
- Membership degrees, μt(Aj).
- Fuzzy similarity relations, SA1, ..., SAm.
t    A1              A2              ...   Am
t1   a11, μt1(A1)    a12, μt1(A2)    ...   a1m, μt1(Am)
t2   a21, μt2(A1)    a22, μt2(A2)    ...   a2m, μt2(Am)
t3   a31, μt3(A1)    a32, μt3(A2)    ...   a3m, μt3(Am)
...
4Fuzzy Approximate Dependencies
- We define Fuzzy Approximate Dependencies (FADs) by relaxing some properties of Functional Dependencies:
- V → W ⇔ ∀ t, s: t[V] = s[V] ⇒ t[W] = s[W]
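The functional dependency condition above compares every pair of tuples. A minimal sketch of that crisp check follows, together with the fraction of violating pairs, which hints at why a relaxed (approximate) version is useful. The row layout, the function names `holds_fd` and `violation_rate`, and the sample data are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations

def holds_fd(rows, V, W):
    """Return True if the functional dependency V -> W holds in rows.

    rows: list of dicts mapping attribute name -> value (assumed layout).
    V, W: lists of attribute names.
    """
    for t, s in combinations(rows, 2):
        same_v = all(t[a] == s[a] for a in V)
        same_w = all(t[a] == s[a] for a in W)
        if same_v and not same_w:
            return False  # t and s agree on V but differ on W
    return True

def violation_rate(rows, V, W):
    """Fraction of tuple pairs agreeing on V that disagree on W."""
    agree_v = 0
    violations = 0
    for t, s in combinations(rows, 2):
        if all(t[a] == s[a] for a in V):
            agree_v += 1
            if not all(t[a] == s[a] for a in W):
                violations += 1
    return violations / agree_v if agree_v else 0.0

# Hypothetical sample data: A -> B fails only because of the last two rows.
rows = [
    {"A": 1, "B": "x"},
    {"A": 1, "B": "x"},
    {"A": 2, "B": "y"},
    {"A": 2, "B": "z"},
]
print(holds_fd(rows, ["A"], ["B"]))
print(violation_rate(rows, ["A"], ["B"]))
```

A single exception is enough to make the crisp dependency fail, whereas the violation rate shows how close the data are to satisfying it.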
5FAD Measures
- Relevance degree
- Support, supp(V→W)
- Fulfilment degrees
- Confidence, conf(V→W)
- Certainty factor, CF(V→W) [Shortliffe and Buchanan, 1975]
- Measures belief degree variations.
- CF(V→W) = 1 ⇒ Maximum increment (Perfect positive).
- CF(V→W) = -1 ⇒ Maximum decrement.
- CF(V→W) = 0 ⇒ Statistical independence.
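The slides do not give the formula for the certainty factor. The sketch below assumes the formulation commonly used for rules, which derives CF from the confidence of the dependency and the support of its consequent; the function name `certainty_factor` and its parameters are illustrative.

```python
def certainty_factor(conf, supp_w):
    """Certainty factor of V -> W (assumed formulation).

    conf:   confidence of V -> W, in [0, 1].
    supp_w: support of the consequent W alone, in [0, 1].
    """
    if conf > supp_w:
        # Belief in W increases when V holds; scale by the room left to grow.
        return (conf - supp_w) / (1.0 - supp_w)
    if conf < supp_w:
        # Belief in W decreases when V holds; scale by the room left to shrink.
        return (conf - supp_w) / supp_w
    return 0.0  # conf == supp_w: knowing V does not change belief in W

print(certainty_factor(1.0, 0.4))   # maximum increment
print(certainty_factor(0.0, 0.4))   # maximum decrement
print(certainty_factor(0.4, 0.4))   # statistical independence
```

Under this formulation the three boundary cases listed above fall out directly: the value is 1 when the confidence is 1, -1 when the confidence is 0, and 0 when the confidence equals the support of the consequent.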
6Applications
- Fuzzy Databases.
- Approximate Dependencies Discovery.
- Functional Dependencies Discovery.
- Other applications
- Low granularity data.
- Overlapping semantics.
7STULONG Database
- Entry Table.
- Normal Group (attribute KONSKUP having values 1 or 2).
- Risk Group (attribute KONSKUP having values 3 or 4).
- Pathologic Group (value 5 for attribute KONSKUP).
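The grouping above can be written as a small lookup. This is only a sketch: the function name `konskup_group` and the group labels are illustrative, and any code not listed on the slide is mapped to "other" as an assumption.

```python
def konskup_group(konskup):
    """Map a KONSKUP code from the Entry table to its study group."""
    if konskup in (1, 2):
        return "normal"
    if konskup in (3, 4):
        return "risk"
    if konskup == 5:
        return "pathologic"
    return "other"  # assumption: codes not mentioned on the slide

print([konskup_group(k) for k in (1, 3, 5)])
```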
8Data Preprocessing (I)
- Problem: semantic overlapping in symbolic or scalar attributes.
- Fuzzy similarity relations (subjective).
- E.g. DOPRAVA (means of transport for getting to work):
               by bike   public means   car   not stated
on foot        0.4       0.3            0.3   0.0
by bike                  0.3            0.3   0.0
public means                            0.4   0.0
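A similarity relation like the one above can be stored as a lookup over unordered pairs of values. The sketch below uses the degrees from the DOPRAVA table; it assumes the relation is symmetric, that every value is fully similar to itself (degree 1.0), and that pairs missing from the table have degree 0.0. The names `SIMILARITY` and `similarity` are illustrative.

```python
# Degrees taken from the DOPRAVA table; keys are unordered value pairs.
SIMILARITY = {
    frozenset(("on foot", "by bike")): 0.4,
    frozenset(("on foot", "public means")): 0.3,
    frozenset(("on foot", "car")): 0.3,
    frozenset(("on foot", "not stated")): 0.0,
    frozenset(("by bike", "public means")): 0.3,
    frozenset(("by bike", "car")): 0.3,
    frozenset(("by bike", "not stated")): 0.0,
    frozenset(("public means", "car")): 0.4,
    frozenset(("public means", "not stated")): 0.0,
}

def similarity(a, b):
    """Degree to which two DOPRAVA values are considered similar."""
    if a == b:
        return 1.0  # assumption: reflexive relation
    # assumption: symmetric relation, unlisted pairs are dissimilar
    return SIMILARITY.get(frozenset((a, b)), 0.0)

print(similarity("car", "public means"))
```

With such a function, the equality test between two attribute values can be replaced by a degree in [0, 1], so two tuples can agree on an attribute only partially.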
9Data Preprocessing (II)
- Problem: high granularity in numeric attributes.
- Definition of linguistic label sets starting from intervals.
- Numeric value → <label, degree>
- E.g. BMI (body mass index)
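One common way to turn a numeric value into label and degree pairs is to attach a trapezoidal membership function to each interval. The sketch below does this for BMI; the label names and breakpoints are hypothetical, chosen only for illustration, since the label definitions used for STULONG are not reproduced in the slides.

```python
INF = float("inf")

def trapezoid(x, a, b, c, d):
    """Membership of x in a trapezoid with support (a, d) and core [b, c]."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge

# Hypothetical BMI labels: (a, b, c, d) breakpoints are assumptions.
BMI_LABELS = {
    "underweight": (-INF, -INF, 17.0, 19.0),
    "normal": (17.0, 19.0, 24.0, 26.0),
    "overweight": (24.0, 26.0, 29.0, 31.0),
    "obese": (29.0, 31.0, INF, INF),
}

def fuzzify(x, labels=BMI_LABELS):
    """Return the (label, degree) pairs with non-zero degree for x."""
    pairs = []
    for label, params in labels.items():
        degree = trapezoid(x, *params)
        if degree > 0.0:
            pairs.append((label, degree))
    return pairs

print(fuzzify(22.0))
print(fuzzify(25.0))
```

A value inside the core of an interval gets a single label with degree 1, while a value on the border between two intervals belongs to both labels with partial degrees.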
10Analytical Questions (I)
- Dependencies between social factors and physical
activity.
ROKVSTUP STAV VZDELANI ZODPOV
TELAKTZA 0.67/0.14 0.24/0.37 0.25/0.28
AKTPOZAM 0.14/0.47 0.58/0.28 0.14/0.49 0.18/0.47
DOPRAVA 0.20/0.32 0.64/0.14 0.19/0.32 0.26/0.32
DOPRATRV 0.17/0.47 0.57/0.22 0.16/0.46 0.21/0.44
11Analytical Questions (II)
- Dependencies between social factors and smoking.
ROKVSTUP STAV VZDELANI ZODPOV
KOURENI 0.68/0.07
DOBAKOUR 0.64/0.11 0.26/0.25
BYVKURAK 0.10/0.64 0.42/0.39 0.09/0.65 0.13/0.64
12Analytical Questions (III)
- Dependencies between social factors and alcohol
consumption.
ROKVSTUP STAV VZDELANI ZODPOV
ALKOHOL 0.21/0.35 0.63/0.15 0.19/0.34 0.24/0.31
PIVO10 0.16/0.43 0.58/0.21 0.16/0.43 0.21/0.41
PIVO12 0.10/0.62 0.47/0.39 0.10/0.62 0.13/0.61
VINO 0.16/0.43 0.58/0.21 0.16/0.44 0.21/0.41
LIHOV 0.16/0.43 0.58/0.21 0.16/0.43 0.20/0.41
PIVOMN 0.21/0.33 0.65/0.14 0.20/0.32 0.24/0.29
VINOMN 0.20/0.33 0.64/0.15 0.19/0.33 0.24/0.31
LIHMN 0.20/0.31 0.64/0.14 0.19/0.30 0.25/0.29
13Analytical Questions (IV)
- Dependencies between social factors and physical
features.
ROKVSTUP STAV VZDELANI ZODPOV
BMI 0.16/0.44 0.58/0.23 0.15/0.45 0.20/0.42
SYST1 0.65/0.12 0.25/0.26
DIAST1 0.19/0.32 0.63/0.14 0.19/0.32 0.24/0.30
SYST2 0.65/0.12 0.25/0.25
DIAST2 0.19/0.33 0.63/0.15 0.18/0.33 0.23/0.30
14Analytical Questions (V)
- Dependencies between physical activity and
smoking.
TELAKTZA AKTPOZAM DOPRAVA DOPRATRV
KOURENI 0.50/0.11 0.45/0.13
DOBAKOUR 0.27/0.24 0.47/0.18 0.30/0.24 0.42/0.19
BYVKURAK 0.13/0.62 0.26/0.51 0.15/0.51 0.23/0.55
15Analytical Questions (VI)
- Dependencies between physical activity and
alcohol consumption.
TELAKTZA AKTPOZAM DOPRAVA DOPRATRV
ALKOHOL 0.27/0.31 0.46/0.23 0.29/0.30 0.41/0.25
PIVO10 0.22/0.39 0.40/0.30 0.24/0.39 0.35/0.33
PIVO12 0.14/0.59 0.29/0.50 0.16/0.59 0.23/0.50
VINO 0.22/0.40 0.40/0.31 0.24/0.39 0.35/0.33
LIHOV 0.22/0.39 0.39/0.30 0.24/0.38 0.35/0.33
PIVOMN 0.27/0.29 0.46/0.21 0.30/0.29 0.42/0.24
VINOMN 0.27/0.31 0.46/0.23 0.28/0.30 0.41/0.24
LIHMN 0.27/0.28 0.46/0.21 0.29/0.27 0.41/0.23
16Analytical Questions (VII)
- Dependencies between physical activity and
physical features.
TELAKTZA AKTPOZAM DOPRAVA DOPRATRV
BMI 0.21/0.41 0.39/0.32 0.23/0.40 0.34/0.34
SYST1 0.27/0.26 0.46/0.19 0.29/0.25 0.42/0.21
DIAST1 0.25/0.29 0.44/0.22 0.28/0.29 0.39/0.23
SYST2 0.27/0.25 0.47/0.18 0.29/0.24 0.42/0.20
DIAST2 0.25/0.29 0.45/0.22 0.27/0.29 0.39/0.24
17Analytical Questions (VIII)
- Dependencies between physical activity and
cholesterol degrees.
TELAKTZA AKTPOZAM DOPRAVA DOPRATRV
CHLST 0.28/0.24 0.47/0.17 0.30/0.23 0.42/0.19
TRIGL 0.49/0.13 0.45/0.14
18Analytical Questions (IX)
- Dependencies between alcohol consumption and
physical features.
BMI SYST1 DIAST1 SYST2 DIAST2
ALKOHOL 0.40/0.24 0.25/0.30 0.28/0.29 0.24/0.31 0.28/0.29
PIVO10 0.35/0.33 0.21/0.39 0.38/0.24 0.20/0.40 0.24/0.38
PIVO12 0.25/0.52 0.14/0.60 0.16/0.59 0.13/0.60 0.17/0.58
VINO 0.35/0.32 0.21/0.40 0.24/0.38 0.20/0.40 0.24/0.38
LIHOV 0.35/0.33 0.21/0.40 0.24/0.38 0.20/0.40 0.24/0.38
PIVOMN 0.41/0.23 0.25/0.28 0.29/0.27 0.25/0.29 0.29/0.27
VINOMN 0.40/0.24 0.25/0.30 0.28/0.28 0.24/0.30 0.28/0.28
LIHMN 0.41/0.22 0.25/0.28 0.29/0.27 0.24/0.28 0.29/0.27
19Analytical Questions (X)
- Dependencies between alcohol consumption and
smoking.
KOURENI DOBAKOUR BYVKURAK
ALKOHOL 0.23/0.30 0.61/0.15
PIVO10 0.13/0.44 0.20/0.40 0.56/0.22
PIVO12 0.08/0.65 0.13/0.60 0.44/0.40
VINO 0.13/0.44 0.20/0.40 0.56/0.22
LIHOV 0.13/0.44 0.20/0.40 0.56/0.22
PIVOMN 0.23/0.28 0.61/0.14
VINOMN 0.23/0.30 0.61/0.15
LIHMN 0.24/0.28 0.62/0.14
20Analytical Questions (XI)
- Dependencies between skin folds and BMI:
- TRIC → BMI, supp 15.85, CF 0.54
- SUBSC → BMI, supp 17.28, CF 0.58
21Concluding Remarks
- FADs allow us to discover relations within imprecise or uncertain data.
- Expert aid is desirable for:
- Data preprocessing.
- Results interpretation.