Title: No slide title
1. Multinomial Logistic Regression: Model Fit and Predictive Properties. Breast Tumour Type Predicted from FNA.
Stian Lydersen, Ingvild Bore, Anna Bofin. stian.lydersen_at_ntnu.no
Presented at ISCB, Alexandroupolis, 29 July - 2 August 2007
2. Assessment of model fit and predictive properties
- Measures of performance
  - Sensitivities, specificities
  - Area under the ROC curves
  - Estimated probabilities versus outcomes
  - A k-1 dimensional plot
  - A multinomial goodness-of-fit test (Fagerland, Hosmer, Bofin, submitted 2007)
- Performance in which data set(s)
  - In the original data set
  - Delete-one-at-a-time cross-validation
  - Training set and test set
  - Bootstrapping (Steyerberg et al., 2001)
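The first performance measure listed above can be sketched in a few lines. This is a minimal illustration, assuming a one-versus-rest definition of sensitivity and specificity for each group; the slides do not state which definition was used, and the function name and the labels "A", "B", "C" are hypothetical.

```python
def sensitivity_specificity(true_labels, predicted_labels, group):
    """One-versus-rest sensitivity and specificity for a single group.

    Assumption: `group` is treated as "positive" and all other groups
    together as "negative".
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == group:
            if pred == group:
                tp += 1  # correctly assigned to the group
            else:
                fn += 1  # group member assigned elsewhere
        else:
            if pred == group:
                fp += 1  # non-member assigned to the group
            else:
                tn += 1  # non-member correctly kept out
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels, for illustration only.
true_labels = ["A", "A", "B", "C", "C", "C"]
predicted_labels = ["A", "B", "B", "C", "C", "A"]
for g in ("A", "B", "C"):
    print(g, sensitivity_specificity(true_labels, predicted_labels, g))
```

With three groups this yields three sensitivity/specificity pairs, one per group.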
3. Dependent variable: diagnostic groups (breast tumour types)
- NPBD/PBD: non-proliferative/proliferative breast disease
- AIDH/DCIS: atypical intraductal hyperplasia/ductal carcinoma in situ
- IDC: invasive ductal carcinoma
4. FNA: Fine Needle Aspiration
5. The smear...
3-15 slides/case
11. Predictors: cytological criteria
- Nuclear pleomorphism
- Nuclear size
- Chromatin pattern
- Nucleolus
- Signs of invasion
- Degree of dissociation
- Bipolar naked nuclei
- Myoepithelial cells
- Complex 3-D fragments
- Fragments with cribriform pattern
- Calcium deposits
- Necrosis
- Cytoplasmic vacuoles
- Tubular groups with angular edges
- Cellularity
13. Multinomial logistic regression
- Dependent variable: 3 diagnostic groups
- Predictors: 16 candidates (ordinal or dichotomous)
- n = 133 patients
- Forward LR stepwise variable selection
Bofin, Lydersen, Hagmar. Diagnostic Cytopathology, 2004.
Bore. Statistisk analyse av celleprøver innen kreftdiagnose [Statistical analysis of cell samples in cancer diagnosis]. Master thesis, NTNU, 2007.
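14. Resulting model with P-enter = 0.05 and P-remove = 0.10
- g1 = ln[P(NPBD/PBD) / P(AIDH/DCIS)], with terms 1.40, 1.38 pleo, 2.66 calcium, 2.10 naked, 1.78 comp, 0.0048 myo
- g2 = ln[P(IDC) / P(AIDH/DCIS)], with terms 1.12, 1.40 pleo, 4.02 calcium, 0.91 naked, 2.18 comp, 1.56 myo
(The signs of the terms are not preserved in this transcript.)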
15. Prediction
- Probability for each of the diagnostic groups:
  - P(NPBD/PBD) = exp(g1) / [1 + exp(g1) + exp(g2)]
  - P(AIDH/DCIS) = 1 / [1 + exp(g1) + exp(g2)]
  - P(IDC) = exp(g2) / [1 + exp(g1) + exp(g2)]
- Predicted group: the one with the highest estimated probability
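The prediction rule above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and g1 and g2 are taken as already-computed values of the two logits, with AIDH/DCIS as the reference group (logit 0).

```python
import math

GROUPS = ("NPBD/PBD", "AIDH/DCIS", "IDC")


def predict_probabilities(g1, g2):
    """Return the three group probabilities from the two logits.

    AIDH/DCIS is the reference group, so its logit is fixed at 0.
    """
    logits = (g1, 0.0, g2)
    # Subtracting the largest logit leaves the ratios unchanged
    # and avoids overflow in exp() for large logits.
    largest = max(logits)
    weights = [math.exp(v - largest) for v in logits]
    total = sum(weights)
    return dict(zip(GROUPS, (w / total for w in weights)))


def predicted_group(g1, g2):
    """Return the group with the highest estimated probability."""
    probs = predict_probabilities(g1, g2)
    return max(probs, key=probs.get)


# Hypothetical logit values, for illustration only.
print(predict_probabilities(0.5, -1.0))
print(predicted_group(0.5, -1.0))
```

Because the three probabilities share the same denominator, the predicted group is simply the one with the largest of g1, 0 and g2.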
16. Diagnostic groups
- NPBD or PBD
- AIDH or DCIS
- IDC
20. Area under the receiver operating characteristic (ROC) curve
NPBD/PBD -ROC1- AIDH/DCIS -ROC2- IDC
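An area under the ROC curve for one pair of groups can be computed directly as the proportion of case pairs that the score orders correctly. The sketch below assumes each ROC curve compares two sets of cases on a single numeric score; the slide does not say which score was used for ROC1 and ROC2, and the function name and example values are hypothetical.

```python
def auc(scores_higher_group, scores_lower_group):
    """Area under the ROC curve from two lists of scores.

    Equals the probability that a randomly chosen case from the
    higher group scores above a randomly chosen case from the
    lower group, with ties counted as one half.
    """
    if not scores_higher_group or not scores_lower_group:
        raise ValueError("both groups need at least one case")
    wins = 0.0
    for high in scores_higher_group:
        for low in scores_lower_group:
            if high > low:
                wins += 1.0
            elif high == low:
                wins += 0.5
    return wins / (len(scores_higher_group) * len(scores_lower_group))


# Hypothetical scores, for illustration only.
print(auc([0.8, 0.4], [0.6, 0.2]))
```

The double loop is quadratic in the number of cases, which is unproblematic for a data set of about a hundred patients.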
21. Steyerberg et al. (Medical Decision Making, 2001): Prognostic Modeling with Logistic Regression Analysis: In Search of a Sensible Strategy in Small Data Sets
- Take B (e.g. 200) bootstrap samples from the original sample, identical in size and drawn with replacement
- In each sample, fit the model and calculate the performance measures (e.g. area under the ROC curve and calibration slope)
- Evaluate each of the B models in the original sample. The difference between the performance measure in the bootstrap sample and in the original sample is an estimate of overoptimism.
Problem: The model fails to converge in about 50 of our samples
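The three bootstrap steps above can be sketched generically. This is a minimal illustration under stated assumptions, not the procedure as implemented by the authors: `bootstrap_optimism`, `fit_threshold` and `accuracy` are hypothetical names, the toy threshold model stands in for the multinomial logistic regression, and a `ValueError` from `fit` stands in for a model that fails to converge.

```python
import random


def bootstrap_optimism(data, fit, performance, n_boot=200, seed=0):
    """Estimate overoptimism as the mean of
    performance(model, bootstrap sample) - performance(model, original sample).

    Returns (optimism, number of bootstrap samples where fitting failed).
    """
    rng = random.Random(seed)
    n = len(data)
    differences = []
    failures = 0
    for _ in range(n_boot):
        # Same size as the original sample, drawn with replacement.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        try:
            model = fit(sample)
        except ValueError:
            failures += 1  # skip samples where the model cannot be fitted
            continue
        differences.append(performance(model, sample) - performance(model, data))
    optimism = sum(differences) / len(differences) if differences else float("nan")
    return optimism, failures


def fit_threshold(sample):
    """Toy model: cut point halfway between the two class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:
        raise ValueError("sample lacks one of the two classes")
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def accuracy(threshold, sample):
    """Proportion of cases classified correctly by the cut point."""
    return sum((x > threshold) == (y == 1) for x, y in sample) / len(sample)


# Hypothetical data: (predictor value, class) pairs.
data = [(0.1, 0), (0.4, 0), (0.35, 1), (0.8, 1), (0.5, 0), (0.9, 1)]
print(bootstrap_optimism(data, fit_threshold, accuracy, n_boot=200, seed=1))
```

Counting the failed fits separately makes the convergence problem visible, since the optimism estimate then rests only on the samples where the model could be fitted.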
22. Assessment of model fit and predictive properties
- Measures of performance
  - Sensitivities, specificities
  - Area under the ROC curves
  - Estimated probabilities versus outcomes
  - A k-1 dimensional plot
  - A multinomial goodness-of-fit test (Fagerland, Hosmer, Bofin, submitted 2007)
- Performance in which data set(s)
  - In the original data set
  - Delete-one-at-a-time cross-validation
  - Training set and test set
  - Bootstrapping (Steyerberg et al., 2001)