Title: View Learning: An Extension to SRL, with an Application in Mammography

1. View Learning: An Extension to SRL, with an Application in Mammography
- Jesse Davis, Beth Burnside, Inês Dutra, Vítor Santos Costa, David Page, Jude Shavlik, Raghu Ramakrishnan
2. Background
- Breast cancer is the most common cancer
- Mammography is the only proven screening test
- At this time, approximately 61% of women have had a mammogram in the last 2 years
- Translates into 20 million mammograms per year
3. The Problem
- Radiologists interpret mammograms
- Variability among radiologists
- Differences in training and experience
- Experts achieve higher cancer detection and fewer benign biopsies
- Shortage of experts
4. Common Mammography Findings
- Microcalcifications
- Masses
- Architectural distortion
5. Calcifications
6. Mass
7. Architectural Distortion
8. Other Important Features
- Microcalcifications
- Shape, distribution, stability
- Masses
- Shape, margin, density, size, stability
- Associated findings
- Breast Density
9. Other Variables Influence Risk
- Demographic risk factors
- Family History
- Hormone therapy
- Age
10. Standardization of Practice
- Passage of the Mammography Quality Standards Act (MQSA) in 1992
- Requires tracking of patient outcomes through regular audits of mammography interpretations and cases of breast cancer
- The standardized lexicon BI-RADS was developed, incorporating 5 categories that include 43 unique descriptors
11. BI-RADS
Mass
- Shape: round, oval, lobular, irregular
- Margins: circumscribed, microlobulated, obscured, indistinct, spiculated
- Density: high, equal, low, fat containing
Calcifications
- Typically benign: skin, vascular, coarse/popcorn, rod-like, round, lucent-centered, eggshell/rim, milk of calcium, suture, dystrophic, punctate
- Intermediate: amorphous
- Higher probability of malignancy: pleomorphic, fine/linear/branching
- Distribution: clustered, linear, segmental, regional, diffuse/scattered
Architectural Distortion
Special Cases
- Tubular density, lymph node, asymmetric breast tissue, focal asymmetric density
Associated Findings
- Skin thickening, skin lesion, skin retraction, nipple retraction, trabecular thickening, axillary adenopathy
12. Mammography Database
- Radiologist interpretation of mammogram
- Patient may have multiple mammograms
- A mammogram may have multiple abnormalities
- Expert-defined Bayes net for determining whether an abnormality is malignant
13. Original Expert Structure
14. Mammography Database
15. Types of Learning
- Hierarchy of types of learning that we can perform on the Mammography database
16. Level 1: Parameters
Given: features (node labels, or fields in the database), data, and the Bayes net structure.
Learn: probabilities. Note the probabilities needed are Pr(Be/Mal), Pr(Shape | Be/Mal), and Pr(Size | Be/Mal).
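Once the structure is fixed, Level 1 reduces to counting. A minimal Python sketch, using a toy flat table with hypothetical values (the talk does not specify the estimator, so plain maximum likelihood without smoothing is assumed):

```python
from collections import Counter

# Toy rows (class, shape, size); values are hypothetical, not from the real database.
records = [
    ("malignant", "irregular", "large"),
    ("malignant", "irregular", "small"),
    ("benign", "round", "small"),
    ("benign", "oval", "small"),
]

# Pr(Be/Mal): maximum-likelihood estimate from class counts.
class_counts = Counter(r[0] for r in records)
total = sum(class_counts.values())
pr_class = {c: n / total for c, n in class_counts.items()}

# Pr(Shape | Be/Mal): joint (class, shape) counts divided by class counts.
pair_counts = Counter((r[0], r[1]) for r in records)
pr_shape_given_class = {(c, s): n / class_counts[c]
                        for (c, s), n in pair_counts.items()}
```

Pr(Size | Be/Mal) follows the same pattern with column 2.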
17. Level 2: Structure
Given: features, data.
Learn: Bayes net structure and probabilities. Note that with this structure we now need Pr(Size | Shape, Be/Mal) instead of Pr(Size | Be/Mal).
[Figure: Bayes net over Be/Mal, Shape, and Size, with Size now depending on Shape as well]
18. Mammography Database
19. Mammography Database
20. Mammography Database
21. Level 3: Aggregates
Given: features, data, and background knowledge (aggregation functions such as average, mode, max, etc.).
Learn: useful aggregate features, a Bayes net structure that uses these features, and probabilities. New features may use other rows/tables.
[Figure: network adds an aggregate node "Avg size this date" alongside Be/Mal, Shape, Size]
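An aggregate such as "average size of abnormalities on this date" crosses rows of the table. A small Python sketch of how such a feature could be materialized per abnormality (field names and values are hypothetical):

```python
from collections import defaultdict

# Hypothetical abnormality rows: (patient_id, mammogram_date, size).
rows = [
    ("p1", "2004-01", 5.0),
    ("p1", "2004-01", 7.0),
    ("p1", "2003-06", 4.0),
    ("p2", "2004-02", 3.0),
]

# Group sizes by (patient, date), then attach the aggregate to each row.
by_visit = defaultdict(list)
for pid, date, size in rows:
    by_visit[(pid, date)].append(size)

avg_size_this_date = {k: sum(v) / len(v) for k, v in by_visit.items()}

# Each abnormality row gains the new aggregate field.
features = [(pid, date, size, avg_size_this_date[(pid, date)])
            for pid, date, size in rows]
```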
22. Mammography Database
23. Mammography Database
24. Mammography Database
25. Level 4: View Learning
Given: features, data, and background knowledge (aggregation functions and intensionally-defined relations such as "increase" or "same location").
Learn: useful new features defined by views (equivalent to rules or SQL queries), Bayes net structure, and probabilities.
[Figure: network adds view-defined nodes "Shape change in abnormality at this location" and "Increase in average size of abnormalities" alongside "Avg size this date", Be/Mal, Shape, Size]
26. Structure Learning Algorithms
- Three different algorithms:
- Naïve Bayes
- Tree-Augmented Naïve Bayes (TAN)
- Sparse Candidate algorithm
27. Naïve Bayes Net
- Simple, computationally efficient
28. Example TAN Net
- Also computationally efficient
- Friedman, Geiger & Goldszmidt '97
[Figure: class-value node with an arc to each attribute]
29. TAN
- Arc from class variable to each attribute
- Less restrictive than Naïve Bayes
- Each attribute permitted at most one extra parent
- Polynomial time bound on constructing the network
- O((# attributes)^2 × |training set|)
- Guaranteed to maximize LL(B_T | D)
30. TAN Algorithm
- Constructs a complete graph between all the attributes (excluding the class variable)
- Edge weight is the conditional mutual information between the vertices
- Find the maximum-weight spanning tree over the graph
- Pick a root in the tree and make the edges directed
- Add the edges from the directed tree to the network
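The steps above can be sketched in Python. This is a toy illustration, not an optimized implementation: column 0 holds the class, the remaining columns are attributes, and tie-breaking among equal-weight edges is arbitrary.

```python
import math
from collections import Counter

def cond_mutual_info(data, i, j, c=0):
    """I(X_i; X_j | C) estimated from counts; column c holds the class."""
    n = len(data)
    nc = Counter(r[c] for r in data)
    nic = Counter((r[i], r[c]) for r in data)
    njc = Counter((r[j], r[c]) for r in data)
    nijc = Counter((r[i], r[j], r[c]) for r in data)
    return sum((nxy / n) * math.log(nxy * nc[cl] / (nic[(x, cl)] * njc[(y, cl)]))
               for (x, y, cl), nxy in nijc.items())

def tan_parents(data, n_attrs):
    """Complete graph, CMI edge weights, max spanning tree, direct from root."""
    attrs = list(range(1, n_attrs + 1))          # column 0 is the class
    w = {(i, j): cond_mutual_info(data, i, j)
         for i in attrs for j in attrs if i < j}
    in_tree, parent = {attrs[0]}, {}
    while len(in_tree) < len(attrs):             # Prim-style tree growth
        i, j = max(((a, b) for a in in_tree for b in attrs if b not in in_tree),
                   key=lambda e: w[(min(e), max(e))])
        parent[j] = i                            # edge directed away from the root
        in_tree.add(j)
    return parent      # the class is additionally a parent of every attribute

# Toy rows: attribute 1 always equals attribute 2; attribute 3 is unrelated.
data = [("m", 0, 0, 0), ("m", 1, 1, 0), ("m", 0, 0, 1), ("m", 1, 1, 1),
        ("b", 0, 0, 0), ("b", 1, 1, 0), ("b", 0, 0, 1), ("b", 1, 1, 1)]
parents = tan_parents(data, 3)
```

On this data the tree links attribute 2 to attribute 1, since their conditional mutual information is the only nonzero weight.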
31. General Bayes Net
32. Sparse Candidate
- Friedman et al. '97
- No restrictions on the directionality of arcs for the class attribute
- Limits possible parents for each node to a small candidate set
33. Sparse Candidate Algorithm
- Greedy hill-climbing search with restarts
- Initial structure is the empty graph
- Scores the graph using the BDe metric (Cooper & Herskovits '92; Heckerman '96)
- Selects the candidate set using an information metric
- Re-estimates the candidate set after each restart
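The restricted search can be sketched as follows. The real algorithm scores with the BDe metric and enforces acyclicity; both are abstracted here behind a caller-supplied `score` function, so this only illustrates how candidate sets shrink the search space.

```python
def hill_climb(nodes, candidates, score, max_parents=2):
    """Greedy parent-set search restricted to each node's candidate set.

    `score(node, parent_set)` stands in for the BDe metric; the acyclicity
    check is omitted for brevity.
    """
    parents = {v: frozenset() for v in nodes}
    improved = True
    while improved:
        improved = False
        for v in nodes:
            best = score(v, parents[v])
            for cand in candidates[v]:           # only candidate parents are tried
                if cand in parents[v]:
                    continue
                trial = parents[v] | {cand}
                if len(trial) <= max_parents and score(v, trial) > best:
                    parents[v], best, improved = trial, score(v, trial), True
    return parents

# Toy decomposable score that rewards one "true" parent per node.
desired = {"A": set(), "B": {"A"}, "C": {"B"}}
toy_score = lambda v, ps: len(ps & desired[v]) - 0.1 * len(ps)
result = hill_climb(["A", "B", "C"],
                    {"A": [], "B": ["A", "C"], "C": ["A", "B"]},
                    toy_score)
```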
34. Sparse Candidate Algorithm
- We looked at several initial structures:
- Expert structure
- Naïve Bayes
- TAN
- Scored the network on tuning-set accuracy
35. Our Initial Approach for Level 4
- Use ILP to learn rules predictive of malignancy
- Treat the rules as intensional definitions of new fields
- The new view consists of the original table extended with the new fields
36. Using Views
malignant(A) :-
    massesStability(A, increasing),
    prior_mammogram(A, B, _),
    H0_BreastCA(B, hxDCorLC).
37. Sample Rule
malignant(A) :-
    BIRADS_category(A, b5),
    MassPAO(A, present),
    MassesDensity(A, high),
    HO_BreastCA(A, hxDCorLC),
    in_same_mammogram(A, B),
    Calc_Pleomorphic(B, notPresent),
    Calc_Punctate(B, notPresent).
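Treated as a view, the sample rule simply becomes a boolean field computed from existing fields. A Python sketch over a hypothetical in-memory version of the table (the field names are invented stand-ins for the predicates above):

```python
# Hypothetical abnormality table keyed by id; fields mirror the rule's predicates.
facts = {
    "a1": dict(birads="b5", mass_pao="present", mass_density="high",
               hx_breast_ca="hxDCorLC", mammogram="m1",
               calc_pleomorphic="notPresent", calc_punctate="notPresent"),
    "a2": dict(birads="b3", mass_pao="notPresent", mass_density="low",
               hx_breast_ca="none", mammogram="m1",
               calc_pleomorphic="notPresent", calc_punctate="notPresent"),
}

def rule_fires(a):
    """Evaluate the sample rule's body for abnormality `a`."""
    f = facts[a]
    if not (f["birads"] == "b5" and f["mass_pao"] == "present"
            and f["mass_density"] == "high"
            and f["hx_breast_ca"] == "hxDCorLC"):
        return False
    # in_same_mammogram(A, B): some other abnormality B on the same film
    # with neither pleomorphic nor punctate calcifications.
    return any(b != a and g["mammogram"] == f["mammogram"]
               and g["calc_pleomorphic"] == "notPresent"
               and g["calc_punctate"] == "notPresent"
               for b, g in facts.items())

# The new view: original rows extended with the rule's truth value as a field.
view = {a: {**f, "rule_1": rule_fires(a)} for a, f in facts.items()}
```

The equivalent SQL formulation would be a view with this boolean as a computed column, which is what makes the learned rules usable as Bayes-net nodes.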
38. Methodology
- 10-fold cross-validation
- Split at the patient level
- Roughly 40 malignant cases and 6,000 benign cases in each fold
39. Methodology
- Without the ILP rules:
- 6 folds for the training set
- 3 folds for the tuning set
- With ILP:
- 4 folds to learn ILP rules
- 3 folds for the training set
- 2 folds for the tuning set
- TAN/Naïve Bayes don't require a tuning set
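Splitting at the patient level means whole patients, not individual abnormalities, are assigned to folds, so no patient's abnormalities appear in both train and test. A minimal sketch (the round-robin assignment is an assumption; the talk does not say how patients were allocated):

```python
def patient_folds(abnormalities, n_folds=10):
    """Assign whole patients to folds so no patient spans two folds."""
    patients = sorted({pid for pid, _ in abnormalities})
    fold_of = {pid: i % n_folds for i, pid in enumerate(patients)}
    folds = [[] for _ in range(n_folds)]
    for pid, abn in abnormalities:
        folds[fold_of[pid]].append((pid, abn))
    return folds

folds = patient_folds([("p1", "x"), ("p1", "y"), ("p2", "z"), ("p3", "w")],
                      n_folds=2)
```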
40. Evaluation
- Precision and recall curves
- Why not ROC curves?
- With many negatives, ROC curves look overly optimistic
- A large change in the number of false positives yields a small change in the ROC curve
- Pooled results over all 10 folds
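The imbalance argument can be made concrete with the fold sizes from the methodology (~40 malignant vs. ~6,000 benign). In this sketch, multiplying the false positives by ten barely moves the ROC x-axis but collapses precision (the two operating points are invented for illustration):

```python
def fpr_and_precision(tp, fp, n_neg):
    """One operating point: ROC x-coordinate (FPR) and precision."""
    return fp / n_neg, tp / (tp + fp)

# 30 of 40 malignancies found, against ~6,000 benign cases per fold.
fpr_a, prec_a = fpr_and_precision(tp=30, fp=30, n_neg=6000)    # few false positives
fpr_b, prec_b = fpr_and_precision(tp=30, fp=300, n_neg=6000)   # 10x false positives
```

FPR moves only from 0.005 to 0.05, while precision drops from 0.5 to under 0.1, which is why precision-recall curves separate the models more honestly here.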
41. ROC: Level 2 (TAN) vs. Level 1
42. Precision-Recall Curves
43. (figure slide; no transcript)
44. (figure slide; no transcript)
45. Related Work: ILP for Feature Construction
- Pompe & Kononenko, ILP '95
- Srinivasan & King, ILP '97
- Perlich & Provost, KDD '03
- Neville, Jensen, Friedland & Hay, KDD '03
46. Ways to Improve Performance
- Learn rules to predict benign as well as malignant
- Use Gleaner (Goadrich, Oliphant & Shavlik, ILP '04) to get a better spread of precision vs. recall in the learned rules
- Incorporate aggregation into the ILP runs themselves
47. Richer View Learning Approaches
- Learn rules predictive of other fields
- Use WARMR or other first-order clustering approaches
- Integrate structure learning and view learning: score a rule by how much it helps the current model when added
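The last bullet, scoring a rule by how much it helps the current model, can be sketched as a wrapper loop. `evaluate` here is a hypothetical stand-in for retraining the Bayes net on the extended view and scoring it (e.g. on tune-set accuracy); the feature names are invented:

```python
def best_rule(base_features, candidate_rules, evaluate):
    """Pick the candidate view whose addition most improves the current model."""
    base = evaluate(base_features)
    scored = [(evaluate(base_features + [r]) - base, r) for r in candidate_rules]
    gain, rule = max(scored)
    return (rule, gain) if gain > 0 else (None, 0.0)

# Toy stand-in: the "model" improves only when the shape-change view is present.
toy_eval = lambda feats: len(set(feats) & {"existing", "shape_change"})
chosen = best_rule(["existing"], ["shape_change", "irrelevant"], toy_eval)
```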
48. Level 4: View Learning (repeated from Slide 25)
Given: features, data, and background knowledge (aggregation functions and intensionally-defined relations such as "increase" or "same location").
Learn: useful new features defined by views (equivalent to rules or SQL queries), Bayes net structure, and probabilities.
[Figure: as on Slide 25]
49. Integrated View/Structure Learning
sc(X) :- id(X, P), id(Y, P), loc(X, L), loc(Y, L),
    date(Y, D1), date(X, D2), before(D1, D2),
    shape(X, Sh1), shape(Y, Sh2), Sh1 \= Sh2.
[Figure: candidate rule node considered alongside "Increase in average size of abnormalities", "Avg size this date", Be/Mal, Shape, Size]
50. Integrated View/Structure Learning
sc(X) :- id(X, P), id(Y, P), loc(X, L), loc(Y, L),
    date(Y, D1), date(X, D2), before(D1, D2),
    shape(X, Sh1), shape(Y, Sh2), Sh1 \= Sh2,
    size(X, S1), size(Y, S2), S1 > S2.
[Figure: refined rule node alongside "Increase in average size of abnormalities", "Avg size this date", Be/Mal, Shape, Size]
51. Integrated View/Structure Learning
sc(X) :- id(X, P), id(Y, P), loc(X, L), loc(Y, L),
    date(Y, D1), date(X, D2), before(D1, D2),
    shape(X, Sh1), shape(Y, Sh2), Sh1 \= Sh2,
    size(X, S1), size(Y, S2), S1 > S2.
[Figure: as on the previous slide]
52. Integrated View/Structure Learning
sc(X) :- id(X, P), id(Y, P), loc(X, L), loc(Y, L),
    date(Y, D1), date(X, D2), before(D1, D2),
    shape(X, Sh1), shape(Y, Sh2), Sh1 \= Sh2,
    size(X, S1), size(Y, S2), S1 > S2.
[Figure: the rule node incorporated into the network with "Avg size this date", Be/Mal, Shape, Size]
53. Richer View Learning (cont.)
- Learning new tables
- Just rules for non-unary predicates
- Train on pairs of malignancies for the same mammogram or patient
- Train on pairs (triples, etc.) of fields, where pairs of values that appear in rows for malignant abnormalities are positive examples, while those that appear only in rows for benign abnormalities are negative examples
54. Conclusions
- Graphical models over databases were originally limited to the schema provided
- Humans find it useful to define new views of a database (new fields or tables intensionally defined from existing data)
- View learning appears to have promise for increasing the capabilities of graphical models over relational databases, and perhaps other SRL approaches
55. WILD Group
- Jesse Davis
- Beth Burnside
- Inês Dutra
- Vítor Santos Costa
- Raghu Ramakrishnan
- Jude Shavlik
- David Page
- Others:
- Hector Corrada-Bravo
- Irene Ong
- Mark Goadrich
- Louis Oliphant
- Bee-Chung Chen