Title: Discovery Challenge
1. Discovery Challenge ECML/PKDD 2004
September 20, 2004, Pisa, Italy
Atherosclerosis
- Marie Tomecková
- EuroMISE Centre Cardio
- Institute of Computer Science, Academy of Sciences of the CR, Prague, The Czech Republic
- Supported by the project LN00B107 of the Ministry of Education of the Czech Republic
2. Atherosclerosis
- a complex disease affecting the blood vessels of the whole organism
- a dynamic process that begins in childhood and adolescence and continues throughout life
- opinions on the origin and progression of the disease are still developing
- interaction and influence of genetic predisposition and the external environment
- the influence of so-called risk factors is still taken into account
- on the other hand, there are some so-called protective factors
3. Risk factors of atherosclerosis
- non-modifiable: sex, age, family history
- modifiable:
  - lifestyle factors:
    - physical activity
    - smoking
    - reaction to stress
  - blood pressure; metabolic factors: levels of lipids and glucose, homocysteine
  - many other factors: coagulopathies, infections, inflammation, factors changing the function of the endothelium, social and psychological factors
  - combinations, clustering and interaction of factors: Reaven's syndrome
4. STULONG (acronym)
- LONGitudinal, twenty-year STUdy of risk factors of atherosclerosis
- The study was carried out in the years 1975-2000 at the 2nd Dept. of Internal Medicine, 1st Faculty of Medicine of Charles University, Prague, the Czech Republic
- The data were transferred to electronic form by the European Centre of Medical Informatics, Statistics, and Epidemiology of Charles University and Academy of Sciences of the Czech Republic
5. STULONG
- Main aims of the study:
  - To determine the prevalence of the risk factors of atherosclerosis in middle-aged men
  - To follow up the development of the risk factors
  - To assess the possibilities and the influence of complex intervention on the incidence and values of the risk factors and on cardiovascular mortality
6. Population
- urban population of middle-aged men (centre of Prague)
- 2370 men were invited
- 1417 men were examined; the response rate was 59%
- middle-aged men are the population most threatened by atherosclerosis and its consequences
7. Definition of risk factors
- blood pressure ≥ 160/95 mm Hg
- cholesterol ≥ 260 mg% (6.7 mmol/l)
- smoking ≥ 15 cigarettes/day
- obesity ≥ 15% above optimal weight
- positive family history: premature death from atherosclerotic diseases (parents, siblings)
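The definitions above can be read as a simple rule set. The sketch below is only an illustration: it assumes that every threshold is inclusive (a value at or above the limit counts), that blood pressure counts as raised when either the systolic or the diastolic value reaches its limit, and it uses hypothetical field names that are not the STULONG attribute names.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the slide; treating them as inclusive is an assumption.
SYSTOLIC_LIMIT = 160       # mm Hg
DIASTOLIC_LIMIT = 95       # mm Hg
CHOLESTEROL_LIMIT = 6.7    # mmol/l
CIGARETTES_LIMIT = 15      # cigarettes per day
OVERWEIGHT_LIMIT = 15.0    # percent above optimal weight


@dataclass
class EntryExam:
    """Hypothetical record of one entry examination (illustrative field names)."""
    systolic_mm_hg: float
    diastolic_mm_hg: float
    cholesterol_mmol_l: float
    cigarettes_per_day: int
    percent_above_optimal_weight: float
    premature_death_in_family: bool


def risk_factors(exam: EntryExam) -> List[str]:
    """Return the names of the risk factors present in one examination."""
    found = []
    # Assumption: "160/95" means systolic >= 160 OR diastolic >= 95.
    if (exam.systolic_mm_hg >= SYSTOLIC_LIMIT
            or exam.diastolic_mm_hg >= DIASTOLIC_LIMIT):
        found.append("hypertension")
    if exam.cholesterol_mmol_l >= CHOLESTEROL_LIMIT:
        found.append("hypercholesterolemia")
    if exam.cigarettes_per_day >= CIGARETTES_LIMIT:
        found.append("smoking")
    if exam.percent_above_optimal_weight >= OVERWEIGHT_LIMIT:
        found.append("obesity")
    if exam.premature_death_in_family:
        found.append("positive family history")
    return found


# Made-up example values, not a STULONG record.
example = EntryExam(165, 90, 6.0, 20, 10.0, False)
print(risk_factors(example))  # ['hypertension', 'smoking']
```

The length of the returned list gives the number of risk factors for one man.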
8. STULONG - analysis
- Statistical
  - descriptive statistics
  - logistic regression
  - survival analysis
- Data mining
  - different methods
  - resulting in different conclusions
9. Basic characteristics of men in STULONG (risk group: at least 1 RF, without the disease)
- Prevalence of risk factors at entry

RF                        n     %
hypercholesterolemia      290   34.2
hypertension              287   34.0
smoking                   543   63.3 !!!
obesity                   196   23.0
positive family history   216   25.3
10. Prevalence of risk factors in risk group
11. Basic characteristics of men in STULONG (risk group, age 46.1 ± 3.6)

                                   mean    s ()
Nr of RF                           1.7     2.0
cholesterol (mmol/l)               6.25    5.4
systolic blood pressure (mm Hg)    134.4   67.3
diastolic blood pressure (mm Hg)   85.3    47.5
Nr of cig/day                      9.4     25.0
Broca index (%)                    106.8   47.4
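The Broca index in the last row is commonly defined as body weight expressed as a percentage of a reference weight of (height in cm minus 100) kg. The slides do not state which formula the study used, so the sketch below only illustrates that common definition; the function name and the example values are hypothetical.

```python
def broca_index(weight_kg: float, height_cm: float) -> float:
    """Body weight as a percentage of the Broca reference weight.

    Assumption: reference weight in kg = height in cm - 100.
    """
    reference_weight_kg = height_cm - 100.0
    if reference_weight_kg <= 0:
        raise ValueError("height must exceed 100 cm for the Broca formula")
    return 100.0 * weight_kg / reference_weight_kg


# Made-up example: 85 kg at 180 cm.
print(broca_index(85.0, 180.0))  # 106.25
```

Under this definition a value of 115 corresponds to a weight 15% above the reference weight.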
12. Mortality depending on the number of RF (atherosclerotic cardiovascular diseases)
13. Survival analysis
14. The relative risk of death caused by atherosclerotic CVD
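The slides do not say how the relative risk was estimated. As a reminder of the basic definition only, relative risk is the proportion of deaths in a group with a risk factor divided by the proportion of deaths in a group without it. A minimal sketch with hypothetical counts, which are not STULONG results:

```python
def relative_risk(deaths_exposed: int, n_exposed: int,
                  deaths_unexposed: int, n_unexposed: int) -> float:
    """Ratio of the death proportion in the exposed group to the unexposed group."""
    if n_exposed <= 0 or n_unexposed <= 0:
        raise ValueError("group sizes must be positive")
    if deaths_unexposed == 0:
        raise ValueError("relative risk is undefined with no deaths in the unexposed group")
    # (a / n1) / (c / n0), rearranged to a single division.
    return (deaths_exposed * n_unexposed) / (deaths_unexposed * n_exposed)


# Hypothetical counts: 20 deaths among 100 exposed men, 10 among 100 unexposed men.
print(relative_risk(20, 100, 10, 100))  # 2.0
```

A value above 1 indicates higher mortality in the exposed group; an estimate from a survival model would be computed differently.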
15. Discovery Challenges
- Atherosclerosis: growing number of papers
- 2002 Helsinki: 5 papers
- 2003 Cavtat: 9 papers
- 2004 Pisa: 11 papers
16. Four data files for analysis (data mining)
- Entry: attributes obtained from the entry examination; 1417 men, 244 attributes for each man
- Control: attributes recorded during the follow-up (changes in social and health status, values of the followed risk factors, therapy); 10 600 examinations, each with 66 attributes
- Letter: additional information collected at the end of the study by postal questionnaire (men who were discharged from the follow-up); 403 men, 62 attributes for each man
- Death: date and cause of death; 389 men
17. Four groups of analytic questions
- Related to:
  - the entry examination
  - the long-term observation (follow-up)
  - the postal questionnaire at the end of the study
  - the relations concerning the entry examination, control examinations, and death
18. Approaches to solve the analytic questions 1 (given in the past Discovery Challenges)
- Univariate and bivariate data analysis
- Association rules
- SDS rules (Set Differs from Set)
- Trend analysis
- Time windows analysis
- ROC analysis
- Discriminant function
19. Approaches to solve the analytic questions 2
- Fuzzy approximate dependencies, fuzzy logic
- Functional dependencies
- Inductive logic programming technique
- Explicit relations
- Selection of the strongest emerging patterns
- Genetic approach
- Approach generating a mathematical algebraic model
20. Analytic questions - some results
- Protective influence of the number of visits
- Protective influence of beer drinking, but not of wine drinking
- Correlation of Body Mass Index with skinfolds: very good discrimination of the three basic groups of men (normal, risk, pathological)
21. Further use and publications of the STULONG data
- are possible only under the condition of the following explicit quotation:
- The study (STULONG) was realized at the 2nd Department of Internal Medicine, 1st Faculty of Medicine of Charles University and University Hospital, Prague 2, Czech Republic (head Prof. M. Aschermann, MD, SDr, FECS), under the supervision of Prof. F. Boudík, MD, SDr, with the collaboration of M. Tomecková, MD, PhD, and Ass. Prof. J. Bultas, MD, PhD. The data were transferred to the electronic form by the European Centre of Medical Informatics, Statistics, and Epidemiology of Charles University and Academy of Sciences of Czech Republic (head Prof. RNDr J. Zvárová, SDr). At present time, the data analysis is supported by the project Nr. LN 00B 107 of the Ministry of Education of the CR.
22. Thank you
- for your effort in the STULONG data set analysis and for your attention

Marie Tomecková
EuroMISE Centre Cardio
Pod Vodárenskou veží 2
182 07 Prague, The Czech Republic
tomeckova_at_euromise.cz
http://www.euromise.cz