Title: Data Mining in Pharmacovigilance
1Data Mining in Pharmacovigilance
David Madigan, Aimin Feng, Ivan Zorych
Rutgers University
dmadigan_at_rutgers.edu
http://stat.rutgers.edu/madigan
2What is Data Mining?
- Finding interesting structure in data
- Structure refers to statistical patterns, predictive models, hidden relationships
- Examples of tasks addressed by Data Mining
- Predictive Modeling (classification, regression)
- Segmentation (Data Clustering )
- Anomaly Detection
- Visualization
3Ronny Kohavi, ICML 1998
6Safety in Lifecycle of a Drug/Biologic product
[Figure: development timeline running Pre-clinical, Phase 1, Phase 2, Phase 3, APPROVAL, Post-Marketing Safety Monitoring, with stage annotations: Safety; Safety, Dose-Ranging; Safety, Efficacy; Safety Concern]
7Why Post-marketing Surveillance
- Limitations on pre-licensure trials
- Size
- Duration
- Patient population: age, comorbidity, severity
- Fact
- Several hundred drugs have been removed from
market in the last 30 years due to safety
problems which became known after approval
9Databases of Spontaneous ADRs
- FDA Adverse Event Reporting System (AERS)
- Online since 1997, replacing the Spontaneous Reporting System (SRS)
- Over 250,000 ADR reports annually
- 15,000 drugs
- 16,000 ADRs
- CDC/FDA Vaccine Adverse Event Reporting System (VAERS)
- Initiated in 1990
- 12,000 reports per year
- 50 vaccines and 700 adverse events
- Other SRS
- WHO - international pharmacovigilance program
11Weakness of SRS Data
- Passive surveillance
- Underreporting
- Lack of accurate denominator, only numerator
- Numerator: number of reports of suspected reaction
- Denominator: number of doses of administered drug
- Lack of known background rates of disease
- No certainty that a reported reaction was causal
- Missing, inaccurate or duplicated data
14Existing Methods
- Multi-item Gamma Poisson Shrinker (MGPS)
- US Food and Drug Administration (FDA)
- Bayesian Confidence Propagation Neural Network
- WHO Uppsala Monitoring Centre (UMC)
- Proportional Reporting Ratio (PRR and aPRR)
- UK Medicines Control Agency (MCA)
- Reporting Odds Ratios and Incidence Rate Ratios
- Other national spontaneous reporting centers and
drug safety research units
15Existing Methods (Contd)
- Focus on 2X2 contingency table projections
- 15,000 drugs × 16,000 AEs = 240 million tables
- Most $N_{ij} = 0$, even though $N_{..}$ is very large
16The Different Measures
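The table of measures for this slide did not survive in the transcript. As a rough sketch only, three measures named elsewhere in this deck (relative reporting ratio, PRR, and reporting odds ratio) can be computed from the usual 2x2 collapse for one drug-AE pair. The cell names a, b, c, d, the function name, and the counts are illustrative assumptions, not taken from the slides.

```python
def disproportionality(n_ij, n_i, n_j, n_total):
    """Return (RR, PRR, ROR) for one drug-AE pair.

    n_ij:    reports mentioning both drug i and AE j
    n_i:     reports mentioning drug i
    n_j:     reports mentioning AE j
    n_total: all reports in the database
    Zero cells are not handled here; real data would need a guard.
    """
    a = n_ij                          # drug and AE
    b = n_i - n_ij                    # drug, no AE
    c = n_j - n_ij                    # AE, no drug
    d = n_total - n_i - n_j + n_ij    # neither
    rr = a * n_total / (n_i * n_j)        # N_ij / E_ij
    prr = (a / (a + b)) / (c / (c + d))   # proportional reporting ratio
    ror = (a * d) / (b * c)               # reporting odds ratio
    return rr, prr, ror

# Made-up counts for illustration.
rr, prr, ror = disproportionality(n_ij=20, n_i=100, n_j=200, n_total=10_000)
print(round(rr, 2), round(prr, 2), round(ror, 2))
```

All three are ratios of observed to "expected" reporting, which is why they share the instability that the following slides discuss.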
17These Measures not Robust
18These Measures not Robust
$N_{ij}/E_{ij}$ is the same in both cases
19Think about this
Denote by $\theta$ the probability that the next operation in Hospital A results in a death.
Use the data to estimate (i.e., guess the value of) $\theta$.
20Think about this
Denote by $\theta_i$ the probability that the next operation in Hospital i results in a death.
Assume $\theta_i \sim \text{Beta}(a, b)$.
Compute the joint posterior distribution for all the $\theta_i$ simultaneously.
21Borrowing strength
- Shrinks each estimate towards the common mean (7.4)
- Empirical Bayes: use the data to estimate a and b
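A minimal sketch of this shrinkage idea, under stated assumptions: the hospital counts below are invented, and a and b are fitted by a crude method of moments rather than by whatever fit the slides used. Function names are hypothetical.

```python
from statistics import mean, pvariance

def fit_beta_by_moments(rates):
    """Crude method-of-moments fit of Beta(a, b) to observed rates.

    Ignores binomial sampling noise, so it is only a stand-in for a
    proper marginal-likelihood (empirical Bayes) fit.
    """
    m = mean(rates)
    v = pvariance(rates)
    k = m * (1 - m) / v - 1       # implied a + b
    return m * k, (1 - m) * k

def shrunk_estimates(deaths, operations):
    rates = [d / n for d, n in zip(deaths, operations)]
    a, b = fit_beta_by_moments(rates)
    # Posterior for hospital i is Beta(a + deaths_i, b + operations_i - deaths_i);
    # its mean is a weighted average of the raw rate and the common mean.
    posterior_means = [(a + d) / (a + b + n) for d, n in zip(deaths, operations)]
    return rates, posterior_means, a / (a + b)

# Hypothetical counts, not the data from the slides.
deaths = [0, 3, 2, 10, 4]
operations = [27, 40, 35, 120, 60]
raw, shrunk, common_mean = shrunk_estimates(deaths, operations)
for r, s in zip(raw, shrunk):
    print(f"raw {r:.3f} -> shrunk {s:.3f} (common mean {common_mean:.3f})")
```

The hospital with zero observed deaths receives a small positive estimate, which is the "borrowing strength" effect in miniature.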
22Relative Reporting Ratio ($RR_{ij} = N_{ij}/E_{ij}$)
- Advantages
- Simple
- Easy to interpret
- Disadvantages
- Extreme sampling variability when baseline and observed frequencies are small
- ($N = 1, E = 0.01$ vs. $N = 100, E = 1$)
- GPS provides a shrinkage estimate of RR that addresses this concern.
$E_{ij} = N_{i.} N_{.j} / N_{..}$, so $RR_{ij} = N_{ij} N_{..} / (N_{i.} N_{.j})$
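To illustrate why equal ratios can carry very different evidence, the sketch below compares the two cases from the slide under an assumed Poisson model with mean E (the Poisson null is an assumption made for this illustration, as is the function name).

```python
import math

def poisson_upper_tail(n, mean, terms=200):
    """P(N >= n) for N ~ Poisson(mean), summed directly from the tail.

    Summing the tail avoids the cancellation in 1 - CDF when the
    probability is extremely small.
    """
    total = 0.0
    for k in range(n, n + terms):
        total += math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
    return total

cases = {"N=1, E=0.01": (1, 0.01), "N=100, E=1": (100, 1.0)}
ratios = {name: n / e for name, (n, e) in cases.items()}
tails = {name: poisson_upper_tail(n, e) for name, (n, e) in cases.items()}
for name in cases:
    print(f"{name}: N/E = {ratios[name]:.0f}, P(N >= observed) = {tails[name]:.3g}")
```

Both cases give N/E = 100, yet a single report against an expectation of 0.01 happens by chance about 1% of the time, while 100 reports against an expectation of 1 is essentially impossible by chance.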
23GPS/MGPS
- GPS/MGPS follows the same recipe as for the hospitals
- Denote by $\lambda_{ij}$ the true RR for Drug i and AE j
- Assumes the $\lambda_{ij}$'s arise from a particular 5-parameter distribution
- Use empirical Bayes: use the data to estimate these five parameters.
24GPS-EBGM
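- Define $\lambda_{ij} = \mu_{ij} / E_{ij}$, where
- $N_{ij} \sim \text{Poisson}(\mu_{ij})$
- $\lambda_{ij} \sim \pi(\lambda) = p\, g(\lambda; \alpha_1, \beta_1) + (1-p)\, g(\lambda; \alpha_2, \beta_2)$
- a mixture of two Gamma distributions
- EBGM: geometric mean of the posterior distribution of $\lambda_{ij}$
- Estimates $\mu_{ij} / E_{ij}$
- Shrinks $N_{ij} / E_{ij}$
- EB05: 5th percentile of the posterior distribution of $\lambda_{ij}$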
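The EBGM calculation can be sketched as follows for a two-component gamma mixture prior with a Poisson likelihood. The hyperparameter values in HYPER are placeholders chosen for illustration, not estimates from AERS, and the helper names are hypothetical; in practice the five parameters would be fitted by empirical Bayes as the slide describes.

```python
import math

def digamma(x):
    """Digamma for x > 0: upward recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252)))

def log_marginal(n, alpha, beta, e):
    """Log P(N = n) when lambda ~ Gamma(alpha, rate=beta) and
    N | lambda ~ Poisson(lambda * e) (a negative binomial)."""
    return (math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
            + alpha * math.log(beta / (beta + e)) + n * math.log(e / (beta + e)))

def ebgm(n, e, p, a1, b1, a2, b2):
    """Geometric mean of the posterior of lambda given N = n and E = e."""
    l1 = math.log(p) + log_marginal(n, a1, b1, e)
    l2 = math.log(1 - p) + log_marginal(n, a2, b2, e)
    q = 1.0 / (1.0 + math.exp(l2 - l1))   # posterior weight of component 1
    # Each posterior component is Gamma(alpha + n, rate = beta + e),
    # and E[log lambda] for a gamma is digamma(shape) - log(rate).
    expected_log = (q * (digamma(a1 + n) - math.log(b1 + e))
                    + (1 - q) * (digamma(a2 + n) - math.log(b2 + e)))
    return math.exp(expected_log)

# Illustrative placeholder hyperparameters (assumed, not fitted).
HYPER = dict(p=1 / 3, a1=0.2, b1=0.1, a2=2.0, b2=4.0)
ebgm_small = ebgm(1, 0.01, **HYPER)
ebgm_large = ebgm(100, 1.0, **HYPER)
print(f"N=1, E=0.01: raw RR 100, EBGM {ebgm_small:.2f}")
print(f"N=100, E=1:  raw RR 100, EBGM {ebgm_large:.2f}")
```

With these placeholder values the pair backed by a single report is shrunk far more strongly than the pair backed by 100 reports, even though both have a raw ratio of 100.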
25GPS Shrinkage: AERS Data
[Figure: GPS shrinkage on AERS data; axis label: number of reports]
26Simpson's Paradox
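- Contingency table analysis ignores effects of drug-drug association on drug-AE association
[Diagram: nodes Rosinex, Ganclex, and Nausea; the link between Ganclex and Nausea is marked with an X]
27Rosinex Ganclex Nausea
$P(\text{Rosinex}=1) = 0.1$
$P(\text{Ganclex}=1 \mid \text{Rosinex}=1) = 0.9$
$P(\text{Ganclex}=1 \mid \text{Rosinex}=0) = 0.01$
$P(\text{Nausea}=1 \mid \text{Rosinex}=1) = 0.9$
$P(\text{Nausea}=1 \mid \text{Rosinex}=0) = 0.1$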
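A short calculation shows how these probabilities produce a spurious Ganclex-Nausea association. It assumes, as the crossed-out link in the diagram suggests, that Nausea depends only on Rosinex; the variable and function names are illustrative.

```python
p_rosinex = 0.1
p_ganclex_given_r = {1: 0.9, 0: 0.01}
p_nausea_given_r = {1: 0.9, 0: 0.1}   # assumed not to depend on Ganclex

def p_nausea_given_ganclex(g):
    """Marginalise over Rosinex to get P(Nausea = 1 | Ganclex = g)."""
    numerator = 0.0
    denominator = 0.0
    for r in (0, 1):
        p_r = p_rosinex if r == 1 else 1 - p_rosinex
        p_g = p_ganclex_given_r[r] if g == 1 else 1 - p_ganclex_given_r[r]
        joint = p_r * p_g                      # P(Rosinex = r, Ganclex = g)
        numerator += joint * p_nausea_given_r[r]
        denominator += joint
    return numerator / denominator

with_ganclex = p_nausea_given_ganclex(1)
without_ganclex = p_nausea_given_ganclex(0)
print(f"P(Nausea | Ganclex)    = {with_ganclex:.3f}")
print(f"P(Nausea | no Ganclex) = {without_ganclex:.3f}")
print(f"apparent relative risk = {with_ganclex / without_ganclex:.1f}")
```

Ganclex looks strongly associated with nausea in the collapsed table, although within each Rosinex stratum it makes no difference at all.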
28Logistic Regression
- $\log[P/(1-P)] = \text{intercept} + \sum (\text{each drug effect})$
- $P = \Pr(\text{report with these drugs will have the AE})$
- Classic logistic regression is hard to scale up
- Huge number of predictors (drugs, drug x drug, etc.)
- Alternative approach
- Bayesian Logistic Regression (Shrinkage Method)
- Put a prior on the coefficients $(\beta_1, \ldots, \beta_p)$, and shrink their estimates towards zero
- Stabilizes the estimation when there are many predictors
- Bayesian solution to the multiple comparison problem
29Bayesian Logistic Regression
- Two shrinkage methods
- Ridge regression: Gaussian prior
- $\beta_j \sim N(0, \tau)$
- Lasso regression: Laplace prior
- $f(\beta_j) \propto \exp(-\lambda |\beta_j|)$
- Choosing the hyperparameter ($\tau$ or $\lambda$)
- Decides how much to shrink
- Cross-validation: choose the prior that best fits left-out data
- Aggregation method by Bunea and Nobel (2005)
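The two priors can be sketched with a tiny MAP fit in plain Python. This is not the BBR algorithm; it is ordinary (proximal) gradient descent chosen for brevity, and the toy reports, the `strength` parameterisation, and all names are assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink towards zero, clip at zero."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_map(X, y, prior, strength, steps=2000, lr=0.5):
    """MAP logistic regression; the intercept is not penalised.

    prior="gaussian": penalty strength * beta_j**2 / 2  (strength = 1 / tau)
    prior="laplace":  penalty strength * |beta_j|       (strength = lambda)
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(steps):
        errs = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                for row, yi in zip(X, y)]
        grad = [sum(e * row[j] for e, row in zip(errs, X)) / n for j in range(p)]
        b0 -= lr * sum(errs) / n
        for j in range(p):
            if prior == "gaussian":
                beta[j] -= lr * (grad[j] + strength * beta[j] / n)
            else:  # laplace: gradient step on the likelihood, then shrink
                beta[j] = soft_threshold(beta[j] - lr * grad[j], lr * strength / n)
    return b0, beta

# Toy reports: columns are (drug A present, drug B present); y = AE reported.
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [0, 1],
     [0, 1], [0, 0], [0, 0], [0, 0], [0, 0], [1, 0]]
y = [1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1]

_, weak = fit_map(X, y, "gaussian", strength=0.01)    # vague prior
_, strong = fit_map(X, y, "gaussian", strength=10.0)  # tight prior
_, sparse = fit_map(X, y, "laplace", strength=100.0)  # very tight Laplace prior
print("weak Gaussian prior:  ", [round(b, 2) for b in weak])
print("strong Gaussian prior:", [round(b, 2) for b in strong])
print("strong Laplace prior: ", sparse)
```

A tighter Gaussian prior pulls the coefficients towards zero without reaching it, whereas a sufficiently tight Laplace prior sets them to exactly zero, which is the practical difference between the ridge and lasso variants.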
31Bayesian Logistic Regression
- Software: Bayesian Binary Regression (BBR)
- http://stat.rutgers.edu/madigan/BBR
- Two priors: Gaussian and Laplace
- Hyperparameter: fixed, default, and CV
- Handles millions of predictors efficiently
- Safety Signal: an apparent excess of an adverse effect associated with use of a drug
- Coefficients $\beta$'s: logs of odds ratios
- $\Pr(AE_j \mid \text{drug}_i)$ - $\Pr(AE_j \mid \text{not drug}_i)$
32Evaluation Strategies
- Top-Rank Plot for Safety Signal
- To compare the timeliness of outbreak detection
- Similar to the AMOC (Activity Monitor Operating Characteristic) curve in fraud detection
- Y: window (month in 1999)
- X: top rank of association from window 1 to corresponding window
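Reading "top rank from window 1 to the corresponding window" as the best (smallest) rank achieved so far, the X values of the plot reduce to a running minimum. The rank sequence below is invented, and the interpretation itself is an assumption.

```python
from itertools import accumulate

def top_rank_so_far(ranks_by_window):
    """Best (smallest) rank of the drug-AE pair in windows 1..k, for each k."""
    return list(accumulate(ranks_by_window, min))

# Hypothetical monthly ranks of one vaccine-AE pair among all pairs.
monthly_ranks = [40, 12, 15, 3, 7]
top_ranks = top_rank_so_far(monthly_ranks)
print(top_ranks)
```

A method that signals earlier drives this curve down to a low rank in fewer windows.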
33RV vs. INTUSS
- Rotavirus
- Severe diarrhea (with fever and vomiting)
- Hospitalizes 55,000 children each year in the US
- Intussusception (INTUSS)
- Uncommon type of bowel obstruction
- RotaShield (RV)
- Licensed on 8/31/1998 in US
- Recommended for routine use in infants
- Increased the risk for intussusception
- 1 or 2 cases among each 10,000 infants
- On 10/14/1999, the manufacturer withdrew RV
36Simulation
- Step-by-step procedure
- Choose either a rare (5, 1), intermediate (50, 3), or common (95, 100) vaccine-adverse event (V-A) combination
- Use year 1998 data as baseline
- Add extra report(s) per month of 1999 containing the chosen V-A combination
- Generate the AMOC curve
41Conclusions of Simulation
- The Bayesian Logistic Regressions (Normal-CV and Laplace-CV) signal consistently, and are at least as good as the MGPS method
- Simple RR cannot signal for intermediate and common cases
- GPS is relatively good on rare and intermediate cases, but not stable on common cases
- Pattern of dependencies in AERS is likely to be more complex than in VAERS
42Discussion of Logistic Method
- Advantages over low-dimensional tables
- Corrects for some confounding and masking effects
- Analyze multiple drugs/vaccines simultaneously
- Limitations
- Build separate model for each AE
- Ignore dependencies between AEs
- Fail to adjust for unmeasured/unrecorded factors
- health status, unreported drugs, etc.
- Model-based approach
- Require model assumptions
43Causal Inference View
- Rubin causal model
- Potential outcomes
- Factual outcome
- I'm a smoker and I get lung cancer
- Counterfactual outcome
- If I hadn't been a smoker, I wouldn't have gotten lung cancer
- Define
- $Z_i$: treatment applied to unit i (0 = control, 1 = treat)
- $Y_i(0)$: response for unit i if $Z_i = 0$
- $Y_i(1)$: response for unit i if $Z_i = 1$
- Unit-level causal effect: $Y_i(1) - Y_i(0)$
- Fundamental problem: only see one of these!
45Bias Due To Confounding
- Individuals are observed already under their respective conditions
- The two groups may differ in ways other than just the observed condition
- Average effects may be biased due to confounding between covariates and group condition
- We can simulate randomization or the counterfactual world using information from an observational study, sort of
46Propensity Score Method
- Definition
- $e(x_i) = P(Z_i = 1 \mid X_i = x_i)$
- Conditional probability of assignment to test treatment ($Z_i = 1$) given observed covariates
- Assuming no unmeasured confounders, stratifying on $e(x_i)$ leads to causal inferences just as valid as in randomized trials
- Methods with propensity scores
- Inverse weighting
- Regression adjustment
- Matching
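As a hedged illustration of inverse weighting, the sketch below uses one binary covariate, estimates the propensity score as the treated fraction within each covariate level, and compares the naive difference with the inverse-probability-weighted one. The counts are invented so that the true effect is 0.1 in both strata; all names are hypothetical.

```python
from collections import defaultdict

def make_records(x, z, n, events):
    """n units with covariate x and treatment z, of which `events` have y = 1."""
    return [(x, z, 1 if k < events else 0) for k in range(n)]

# Invented data: x = 1 units are both more often treated and sicker.
records = (make_records(0, 1, 20, 4) + make_records(0, 0, 60, 6)
           + make_records(1, 1, 60, 36) + make_records(1, 0, 20, 10))

def naive_difference(records):
    treated = [y for _, z, y in records if z == 1]
    control = [y for _, z, y in records if z == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def propensity_by_stratum(records):
    """e(x) = fraction treated among units with covariate value x."""
    counts = defaultdict(lambda: [0, 0])   # x -> [treated, total]
    for x, z, _ in records:
        counts[x][0] += z
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}

def ipw_effect(records):
    e = propensity_by_stratum(records)
    n = len(records)
    treated = sum(y / e[x] for x, z, y in records if z == 1) / n
    control = sum(y / (1 - e[x]) for x, z, y in records if z == 0) / n
    return treated - control

naive = naive_difference(records)
weighted = ipw_effect(records)
print(f"naive difference: {naive:.3f}")
print(f"IPW estimate:     {weighted:.3f}")
```

The naive comparison overstates the effect because treated units are concentrated in the higher-risk stratum; weighting by the inverse propensity removes that imbalance, but only for confounders that were actually measured.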
50Conclusion
- First generation Method
- Contingency table methods
- Deal with each drug and each adverse event in isolation
- Second generation Method
- Bayesian logistic regression
- Propensity score
- Deal with large numbers of drugs jointly and with multi-drug interactions
- Ultimate Method
- Not only interactions and relationships among drugs, but also among adverse events
- Question: which sets of drugs cause which sets of adverse events?
52Overview
- Brief Introduction to Data Mining
- Data Mining Algorithms
- Currently fashionable DMAs for drug safety
- Future Directions, etc.
53Of Laws, Monsters, and Giants
- Moore's law: processing capacity doubles every 18 months (CPU, cache, memory)
- Its more aggressive cousin:
- Disk storage capacity doubles every 9 months
What do the two laws combined produce?
A rapidly growing gap between our ability to store data and our ability to make use of it.
54Data Mining Algorithms
"A data mining algorithm is a well-defined procedure that takes data as input and produces output in the form of models or patterns"
Hand, Mannila, and Smyth
- "well-defined": can be encoded in software
- "algorithm": must terminate after some finite number of steps
55Algorithm Components
1. The task the algorithm is used to address (e.g. classification, clustering, etc.)
2. The structure of the model or pattern we are fitting to the data (e.g. a linear regression model)
3. The score function used to judge the quality of the fitted models or patterns (e.g. accuracy, BIC, etc.)
4. The search or optimization method used to search over parameters and/or structures (e.g. steepest descent, MCMC, etc.)
5. The data management technique used for storing, indexing, and retrieving data (critical when data too large to reside in memory)
56Association Rules: Support and Confidence
[Figure: Venn diagram with regions "Customer buys diapers", "Customer buys beer", and "Customer buys both"]
- Find all the rules Y ⇒ Z with minimum confidence and support
- support, s: probability that a transaction contains both Y and Z
- confidence, c: conditional probability that a transaction having Y also contains Z
- Let minimum support be 50% and minimum confidence be 50%; we have
- A ⇒ C (50%, 66.6%)
- C ⇒ A (50%, 100%)
57Mining Association Rules: An Example
Min. support 50%, min. confidence 50%
- For rule A ⇒ C
- support = support({A, C}) = 50%
- confidence = support({A, C}) / support({A}) = 66.6%
- The Apriori principle
- Any subset of a frequent itemset must be frequent
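The support and confidence figures quoted for A ⇒ C and C ⇒ A can be reproduced with a few lines. The transaction table is missing from the transcript, so the four transactions below are an assumption chosen to match the quoted percentages.

```python
# Assumed transactions; the slide's own table did not survive.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs):
    """Estimated P(rhs in transaction | lhs in transaction)."""
    return support(lhs | rhs) / support(lhs)

print("support(A, C)    =", support({"A", "C"}))
print("confidence(A->C) =", round(confidence({"A"}, {"C"}), 3))
print("confidence(C->A) =", confidence({"C"}, {"A"}))
```

Note that the two rules share one support value but differ in confidence, because confidence conditions on the left-hand side.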
58Mining Frequent Itemsets the Key Step
- Find the frequent itemsets: the sets of items that have minimum support
- A subset of a frequent itemset must also be a frequent itemset
- i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets
- Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
- Use the frequent itemsets to generate association rules.
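The level-wise search described above can be sketched as follows. This is a compact, unoptimised rendering of the Apriori idea (join, prune by the subset property, then count), and the four-transaction database is an assumed toy example rather than the one from the slides.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    level = [s for s in level if support(s) >= min_support]   # frequent 1-itemsets
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        previous = set(level)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(sub) in previous
                             for sub in combinations(c, k - 1))]
        # Count step: one pass over the data per level.
        level = [c for c in candidates if support(c) >= min_support]
    return frequent

# Assumed toy database.
database = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori(database, min_support=0.5)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])
```

The prune step is where the Apriori principle pays off: candidates such as {1, 2, 3} are discarded without a data scan because {1, 2} is already known to be infrequent.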
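59The Apriori Algorithm: Example
[Figure: Apriori trace on Database D: scan D to obtain candidates C1 and frequent set L1; form C2 and scan D to obtain L2; form C3 and scan D to obtain L3]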
60Association Rule Mining A Road Map
- Boolean vs. quantitative associations (based on the types of values handled)
- buys(x, SQLServer) ∧ buys(x, DMBook) ⇒ buys(x, DBMiner) [0.2%, 60%]
- age(x, 30..39) ∧ income(x, 42..48K) ⇒ buys(x, PC) [1%, 75%]
- Single-dimensional vs. multi-dimensional associations (see examples above)
- Single-level vs. multiple-level analysis
- What brands of beers are associated with what brands of diapers?
- Various extensions (thousands!)
62Statistics
The subject of statistics concerns itself with using data to make inferences and predictions about the world.
Researchers assembled the vast bulk of the statistical knowledge base prior to the availability of significant computing.
Lots of assumptions and brilliant mathematics took the place of computing and led to useful and widely-used tools.
Serious limits on the applicability of many of these methods: small data sets, unrealistically simple models.
Produce hard-to-interpret outputs like p-values and confidence intervals.
63Bayesian Statistics
The Bayesian approach has deep historical roots but required the algorithmic developments of the late 1980s before it was of any use.
The old sterile Bayesian-Frequentist debates are a thing of the past.
Most data analysts take a pragmatic point of view and use whatever is most useful.
64Bayes' Theorem
65Bayes' Theorem Example
66Bayes' Theorem for Densities
67Hospital Example (0/27)
prior distribution
likelihood
posterior distribution
69Unreasonable prior distribution implies
unreasonable posterior distribution
70What to report?
- Mode? Mean? Median?
- Posterior probability that $\theta$ exceeds 0.2?
- $\theta^*$ such that $\Pr(\theta > \theta^*) = 0.05$
- $\theta^*$ such that $\Pr(\theta > \theta^*) = 0.95$
- Posterior probability that $\theta$ is in (0.002, 0.095) is 90%
[Figure: posterior density annotated with the values 0.013, 0.023, 0.032, 0.002, and 0.095]
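For the 0/27 hospital example, these summaries have closed forms if one assumes a uniform prior. That prior is an assumption made here for simplicity, so the numbers will not match those annotated on the slide, whose prior is not given in the transcript.

```python
def posterior_summaries(n_ops, probs=(0.05, 0.5, 0.95)):
    """Summaries of the posterior after 0 deaths in n_ops operations.

    With a uniform Beta(1, 1) prior the posterior is Beta(1, n_ops + 1),
    whose CDF is 1 - (1 - t) ** (n_ops + 1).
    """
    b = n_ops + 1

    def cdf(t):
        return 1 - (1 - t) ** b

    def quantile(p):
        return 1 - (1 - p) ** (1 / b)

    return {
        "mode": 0.0,                      # density is decreasing on [0, 1]
        "mean": 1 / (b + 1),
        "quantiles": {p: quantile(p) for p in probs},
        "pr_theta_gt_0.2": 1 - cdf(0.2),
    }

summary = posterior_summaries(27)
print("mode:", summary["mode"])
print("mean:", round(summary["mean"], 4))
for p, q in summary["quantiles"].items():
    print(f"{int(p * 100)}% quantile: {q:.4f}")
print("Pr(theta > 0.2):", round(summary["pr_theta_gt_0.2"], 4))
```

The 5% and 95% quantiles together bound a central 90% posterior interval, which is the kind of statement the last bullet on the slide makes.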