Data Mining in Pharmacovigilance

1
Data Mining in Pharmacovigilance
David Madigan, Aimin Feng, Ivan Zorych
Rutgers University
dmadigan_at_rutgers.edu
http://stat.rutgers.edu/madigan
2
What is Data Mining?

Finding interesting structure in data
Structure refers to statistical patterns,
predictive models, hidden relationships
Examples of tasks addressed by Data Mining
Predictive Modeling (classification, regression)
Segmentation (Data Clustering )
Anomaly Detection
Visualization

3
Ronny Kohavi, ICML 1998
4
(No Transcript)
5
(No Transcript)
6
Safety in Lifecycle of a Drug/Biologic product
[Timeline figure: Pre-clinical, Phase 1, Phase 2, Phase 3, APPROVAL, Post-Marketing Safety Monitoring. Stage labels: Safety, Dose-Ranging; Safety; Safety, Efficacy; Safety; Safety Concern.]
7
Why Post-marketing Surveillance?

Limitations of pre-licensure trials:
Size
Duration
Patient population: age, comorbidity, severity
Fact:
Several hundred drugs have been removed from the
market in the last 30 years due to safety
problems which became known after approval

8
(No Transcript)
9
Databases of Spontaneous ADRs

FDA Adverse Event Reporting System (AERS)
Online since 1997; replaced the Spontaneous Reporting System (SRS)
Over 250,000 ADR reports annually
15,000 drugs, 16,000 ADRs
CDC/FDA Vaccine Adverse Event Reporting System (VAERS)
Initiated in 1990
12,000 reports per year
50 vaccines and 700 adverse events
Other SRS
WHO international pharmacovigilance program

10
(No Transcript)
11
Weakness of SRS Data

Passive surveillance
Underreporting
Lack of accurate denominator, only numerator
Numerator: No. of reports of suspected reaction
Denominator: No. of doses of administered drug
Lack of known background rates of disease
No certainty that a reported reaction was causal
Missing, inaccurate or duplicated data

14
Existing Methods

Multi-item Gamma Poisson Shrinker (MGPS)
US Food and Drug Administration (FDA)
Bayesian Confidence Propagation Neural Network
WHO Uppsala Monitoring Centre (UMC)
Proportional Reporting Ratio (PRR and aPRR)
UK Medicines Control Agency (MCA)
Reporting Odds Ratios and Incidence Rate Ratios
Other national spontaneous reporting centers and
drug safety research units

15
Existing Methods (Cont'd)

Focus on 2×2 contingency table projections
15,000 drugs × 16,000 AEs = 240 million tables
Most $N_{ij} = 0$, even though $N_{\cdot\cdot}$ is very large

16
The Different Measures
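
As a rough illustration of how measures of this kind are computed from a single 2×2 table, here is a minimal Python sketch. The cell names `a`, `b`, `c`, `d`, the function name, and the counts are assumptions made for illustration; none of them come from the slides.

```python
def disproportionality(a, b, c, d):
    """Common disproportionality measures for one drug-AE pair.

    a: reports with the drug and the AE
    b: reports with the drug, without the AE
    c: reports with the AE, without the drug
    d: reports with neither
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n           # E_ij = N_i. * N_.j / N..
    return {
        "RR": a / expected,                    # relative reporting ratio N_ij / E_ij
        "PRR": (a / (a + b)) / (c / (c + d)),  # proportional reporting ratio
        "ROR": (a * d) / (b * c),              # reporting odds ratio
    }


# Hypothetical counts, for illustration only.
measures = disproportionality(a=20, b=80, c=100, d=9800)
print(measures)
```

All three measures divide by small quantities when counts are sparse, which is one way to see why they can be unstable for rare drug-AE pairs.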
17
These Measures not Robust
18
These Measures not Robust
$N_{ij}/E_{ij}$ is the same in both cases
19
Think about this
Denote by $\theta$ the probability that the next
operation in Hospital A results in a death. Use
the data to estimate (i.e., guess the value of) $\theta$.
20
Think about this
Denote by $\theta_i$ the probability that the next
operation in Hospital $i$ results in a death. Assume
$\theta_i \sim \mathrm{Beta}(\alpha, \beta)$. Compute the joint posterior
distribution for all the $\theta_i$ simultaneously.
21
Borrowing strength: shrinks each estimate towards the
common mean (7.4). Empirical Bayes: use the data
to estimate $\alpha$ and $\beta$.
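
The shrinkage idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hospital counts are hypothetical (apart from the 0/27 record that the deck uses in its hospital example), and the beta prior is fitted by a crude method of moments on the raw rates rather than by the marginal-likelihood fit a real empirical Bayes analysis would normally use.

```python
from statistics import mean, variance

# (deaths, operations) per hospital. Hypothetical counts, except that the
# first hospital uses the 0/27 record from the deck's hospital example.
hospitals = [(0, 27), (18, 148), (8, 119), (46, 810), (8, 211), (13, 196)]


def empirical_bayes_beta(data):
    """Crude method-of-moments fit of a Beta(alpha, beta) prior to raw rates."""
    rates = [d / n for d, n in data]
    m, v = mean(rates), variance(rates)
    total = m * (1 - m) / v - 1                # alpha + beta
    if total <= 0:
        raise ValueError("rates too dispersed for a moment-matched beta prior")
    return m * total, (1 - m) * total


def shrunken_rates(data):
    alpha, beta = empirical_bayes_beta(data)
    # Posterior for each hospital is Beta(alpha + deaths, beta + survivors);
    # its mean is a weighted average of the raw rate and the prior mean.
    return [(alpha + d) / (alpha + beta + n) for d, n in data]


alpha, beta = empirical_bayes_beta(hospitals)
prior_mean = alpha / (alpha + beta)
for (d, n), s in zip(hospitals, shrunken_rates(hospitals)):
    print(f"raw {d / n:.3f} -> shrunken {s:.3f} (prior mean {prior_mean:.3f})")
```

Hospitals with few operations are pulled furthest towards the common mean, which is the "borrowing strength" effect.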
22
Relative Reporting Ratio ($RR_{ij} = N_{ij}/E_{ij}$)

Advantages
Simple
Easy to interpret
Disadvantages
Extreme sampling variability when baseline and
observed frequencies are small
($N = 1$, $E = 0.01$ vs. $N = 100$, $E = 1$)
GPS provides a shrinkage estimate of RR that
addresses this concern.

$E_{ij} = N_{i\cdot} N_{\cdot j} / N_{\cdot\cdot}$, so $RR_{ij} = N_{ij} N_{\cdot\cdot} / (N_{i\cdot} N_{\cdot j})$
23
GPS/MGPS

GPS/MGPS follows the same recipe as for the hospitals
Denote by $\lambda_{ij}$ the true RR for Drug $i$ and AE $j$
Assume the $\lambda_{ij}$ arise from a particular 5-parameter distribution
Use empirical Bayes: use the data to estimate these five parameters.

24
GPS-EBGM

Define $\lambda_{ij} = \mu_{ij} / E_{ij}$, where
$N_{ij} \sim \mathrm{Poisson}(\mu_{ij})$
$\lambda_{ij} \sim p\, g(\lambda; \alpha_1, \beta_1) + (1-p)\, g(\lambda; \alpha_2, \beta_2)$,
a mixture of two Gamma distributions
EBGM: geometric mean of the posterior distribution of $\lambda_{ij}$
Estimates $\mu_{ij} / E_{ij}$
Shrinks $N_{ij} / E_{ij}$
EB05
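
To make the EBGM computation concrete, the following Python sketch evaluates it for given hyperparameters. It assumes the gamma components are parameterized by shape and rate, and the five hyperparameter values are hypothetical placeholders; in GPS they would be estimated from the data by empirical Bayes, a step this sketch does not attempt.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def log_marginal(n, e, shape, rate):
    """Log negative-binomial probability of n reports when N ~ Poisson(lambda * e)
    and lambda ~ Gamma(shape, rate)."""
    return (math.lgamma(shape + n) - math.lgamma(shape) - math.lgamma(n + 1)
            + shape * math.log(rate / (rate + e)) + n * math.log(e / (rate + e)))


def ebgm(n, e, p, a1, b1, a2, b2):
    """Geometric mean of the posterior of lambda under the two-gamma mixture prior."""
    l1 = math.log(p) + log_marginal(n, e, a1, b1)
    l2 = math.log(1 - p) + log_marginal(n, e, a2, b2)
    diff = l2 - l1
    q = 0.0 if diff > 700 else 1.0 / (1.0 + math.exp(diff))  # weight of component 1
    # Each posterior component is Gamma(shape + n, rate + e), and for a gamma
    # variable E[log lambda] = digamma(shape) - log(rate).
    e_log = (q * (digamma(a1 + n) - math.log(b1 + e))
             + (1 - q) * (digamma(a2 + n) - math.log(b2 + e)))
    return math.exp(e_log)


# Hypothetical hyperparameters, not estimated from any database.
HYPER = dict(p=1 / 3, a1=0.2, b1=0.1, a2=2.0, b2=4.0)
print(ebgm(1, 0.01, **HYPER), ebgm(100, 1.0, **HYPER))
```

Both inputs have the same raw ratio N/E = 100, but the pair with a single report is shrunk far more strongly than the pair with 100 reports.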

25
GPS Shrinkage: AERS Data
[Figure; axis label: number of reports]
26
Simpson's Paradox

Contingency table analysis ignores effects of
drug-drug association on drug-AE association

[Diagram: nodes Ganclex, Rosinex, Nausea; one link marked X]
27
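[Diagram: Rosinex, Ganclex, Nausea]
$P(\text{Rosinex}=1) = 0.1$
$P(\text{Ganclex}=1 \mid \text{Rosinex}=1) = 0.9$, $P(\text{Ganclex}=1 \mid \text{Rosinex}=0) = 0.01$
$P(\text{Nausea}=1 \mid \text{Rosinex}=1) = 0.9$, $P(\text{Nausea}=1 \mid \text{Rosinex}=0) = 0.1$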
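
The effect of these probabilities can be checked directly. The sketch below assumes, as the diagram suggests, that nausea depends only on Rosinex (so it is independent of Ganclex given Rosinex); the variable and function names are illustrative.

```python
p_r = 0.1                          # P(Rosinex = 1)
p_g_given_r = {1: 0.9, 0: 0.01}    # P(Ganclex = 1 | Rosinex = r)
p_n_given_r = {1: 0.9, 0: 0.1}     # P(Nausea = 1 | Rosinex = r)


def p_nausea_given_ganclex(g):
    """P(Nausea = 1 | Ganclex = g), marginalizing over Rosinex."""
    joint = {}
    for r in (0, 1):
        pr = p_r if r == 1 else 1 - p_r
        pg = p_g_given_r[r] if g == 1 else 1 - p_g_given_r[r]
        joint[r] = pr * pg                     # P(Rosinex = r, Ganclex = g)
    total = joint[0] + joint[1]
    return sum(joint[r] / total * p_n_given_r[r] for r in (0, 1))


with_ganclex = p_nausea_given_ganclex(1)
without_ganclex = p_nausea_given_ganclex(0)
print(f"P(nausea | Ganclex)    = {with_ganclex:.3f}")
print(f"P(nausea | no Ganclex) = {without_ganclex:.3f}")
```

In the marginal 2×2 table Ganclex appears strongly associated with nausea, even though within each Rosinex stratum it has no effect at all under this model.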
28
Logistic Regression

$\log[P/(1-P)] = \text{intercept} + \sum (\text{each drug effect})$
$P = \Pr(\text{report with these drugs will have the AE})$
Classic logistic regression is hard to scale up
Huge number of predictors (drugs, drug × drug, etc.)
Alternative approach
Bayesian Logistic Regression (Shrinkage Method)
Put a prior on the coefficients $(\beta_1, \ldots, \beta_p)$, and shrink their estimates towards zero
Stabilizes the estimation when there are many predictors
Bayesian solution to the multiple comparison problem

29
Bayesian Logistic Regression

Two shrinkage methods
Ridge regression: Gaussian prior
$\beta_j \sim N(0, \lambda)$
Lasso regression: Laplace prior
$f(\beta_j) \propto \exp(-\lambda |\beta_j|)$
Choosing the hyperparameter $\lambda$
Decides how much to shrink
Cross-validation: choose the prior that best fits left-out data
Aggregation method by Bunea and Nobel (2005)
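
The two priors can be compared on a toy problem. The sketch below is not the BBR software: it is a small gradient-based MAP fit written for illustration, the grouped report counts are hypothetical, and `strength` is an assumed parameterization (the weight of the penalty on the average negative log-likelihood, so a larger value means a tighter prior).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_map_logistic(groups, prior, strength, lr=0.5, n_iter=3000):
    """MAP logistic regression on grouped data; the intercept is not penalized.

    groups: list of (x, events, total), with x a tuple of 0/1 drug indicators.
    prior: "gaussian" (ridge) or "laplace" (lasso, via soft-thresholding).
    """
    p = len(groups[0][0])
    n = sum(total for _, _, total in groups)
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, grad = 0.0, [0.0] * p
        for x, events, total in groups:
            prob = sigmoid(b0 + sum(b * xj for b, xj in zip(beta, x)))
            err = (total * prob - events) / n  # gradient of the average NLL
            g0 += err
            for j in range(p):
                grad[j] += err * x[j]
        b0 -= lr * g0
        for j in range(p):
            if prior == "gaussian":
                beta[j] -= lr * (grad[j] + strength * beta[j])
            else:
                v = beta[j] - lr * grad[j]
                beta[j] = math.copysign(max(abs(v) - lr * strength, 0.0), v)
    return b0, beta


# Hypothetical reports: indicators for drugs (A, B, C), AE count, report count.
groups = [((1, 1, 0), 18, 20), ((1, 0, 0), 9, 10), ((0, 1, 0), 1, 10),
          ((0, 0, 0), 10, 100), ((0, 0, 1), 2, 20)]

_, weak = fit_map_logistic(groups, "gaussian", 0.01)
_, strong = fit_map_logistic(groups, "gaussian", 1.0)
_, lasso = fit_map_logistic(groups, "laplace", 0.05)
print("weak ridge  ", [round(b, 2) for b in weak])
print("strong ridge", [round(b, 2) for b in strong])
print("lasso       ", [round(b, 2) for b in lasso])
```

In this toy data only drug A raises the AE rate. A tighter Gaussian prior pulls all coefficients towards zero, while the Laplace prior can set the coefficient of an unrelated drug to exactly zero.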

30
(No Transcript)
31
Bayesian Logistic Regression

Software: Bayesian Binary Regression (BBR)
http://stat.rutgers.edu/madigan/BBR
Two priors: Gaussian and Laplace
Hyperparameter: fixed, default, or CV
Handles millions of predictors efficiently
Safety signal: an apparent excess of an adverse effect associated with use of a drug
Coefficients $\beta$: logs of odds ratios
$\Pr(\mathrm{AE}_j \mid \mathrm{drug}_i) - \Pr(\mathrm{AE}_j \mid \text{not } \mathrm{drug}_i)$

32
Evaluation Strategies

Top-Rank Plot for Safety Signal
To compare the timeliness of outbreak detection
Similar to AMOC (Activity Monitor Operating
Characteristic) curve in fraud detection
Y: window (month in 1999)
X: top rank of association from window 1 to the corresponding window

33
RV vs. INTUSS

Rotavirus
Severe diarrhea (with fever and vomiting)
Hospitalizes 55,000 children each year in the US
Intussusception (INTUSS)
Uncommon type of bowel obstruction
RotaShield (RV)
Licensed on 8/31/1998 in the US
Recommended for routine use in infants
Increased the risk of intussusception
1 or 2 cases per 10,000 infants
On 10/14/1999, the manufacturer withdrew RV

34
(No Transcript)
35
(No Transcript)
36
Simulation

Step-by-step procedure
Choose either a rare (5, 1), intermediate (50,
3), or common (95, 100) vaccine - adverse event
(V-A) combination
Use year 1998 data as baseline
Add extra report(s) per month of 1999 containing
the chosen V-A combination
Generate the AMOC curve

37
(No Transcript)
38
(No Transcript)
39
(No Transcript)
40
(No Transcript)
41
Conclusions of Simulation

The Bayesian Logistic Regressions (Normal-CV and
Laplace-CV) signal consistently, and are at least
as good as the MGPS method
Simple RR cannot signal for intermediate and
common cases
GPS is relatively good on rare and intermediate
cases, but not stable on common cases
Pattern of dependencies in AERS is likely to be more
complex than in VAERS

42
Discussion of Logistic Method

Advantages over low-dimensional tables
Corrects for some confounding and masking effects
Analyze multiple drugs/vaccines simultaneously
Limitations
Build separate model for each AE
Ignore dependencies between AEs
Fail to adjust for unmeasured/unrecorded factors:
health status, unreported drugs, etc.
Model-based approach
Require model assumptions

43
Causal Inference View

Rubin causal model
Potential outcomes
Factual outcome
I'm a smoker and I get lung cancer
Counterfactual outcome
If I hadn't been a smoker, I wouldn't have gotten lung cancer
Define
$Z_i$: treatment applied to unit $i$ (0 = control, 1 = treat)
$Y_i(0)$: response for unit $i$ if $Z_i = 0$
$Y_i(1)$: response for unit $i$ if $Z_i = 1$
Unit-level causal effect: $Y_i(1) - Y_i(0)$
Fundamental problem: we only see one of these!
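
A tiny made-up table shows the notation at work. In the Python sketch below both potential outcomes are listed for every unit, which is possible only because the data are invented; the four units and their values are assumptions for illustration.

```python
# Each unit: (Z, Y0, Y1). Both potential outcomes are listed only because
# this is a made-up table; in real data one of them is always missing.
units = [(1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 0, 1)]

# Unit-level causal effects Y_i(1) - Y_i(0) and their average.
unit_effects = [y1 - y0 for _, y0, y1 in units]
true_ate = sum(unit_effects) / len(units)

# What an analyst actually sees: the outcome under the assigned treatment.
observed = [(z, y1 if z == 1 else y0) for z, y0, y1 in units]
treated = [y for z, y in observed if z == 1]
control = [y for z, y in observed if z == 0]
naive_difference = sum(treated) / len(treated) - sum(control) / len(control)

print("true average effect:", true_ate)
print("naive treated-minus-control difference:", naive_difference)
```

Here the naive comparison overstates the average effect because the treated units differ from the controls in their potential outcomes, not only in their treatment.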

44
(No Transcript)
45
Bias Due To Confounding

Individuals are observed already under their
respective conditions
The two groups may differ in ways other than just
the observed condition
Average effects may be biased due to confounding
between covariates and group condition
We can simulate randomization, or the counterfactual
world, using information from an observational
study (sort of)

46
Propensity Score Method

Definition
$e(x_i) = P(Z_i = 1 \mid X_i = x_i)$
Conditional probability of assignment to test
treatment $Z_i = 1$ given observed covariates
Assuming no unmeasured confounders, stratifying
on e(xi) leads to causal inferences just as valid
as in randomized trials
Methods with propensity scores
Inverse weighting
Regression adjustment
Matching
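
Inverse weighting, the first method listed, can be sketched as follows. The records are hypothetical, the single binary covariate `x` stands in for something like health status, and the propensity score is estimated simply as the treated fraction within each level of `x`; a real analysis would typically model $e(x)$ from many covariates.

```python
from collections import defaultdict


def make_records():
    """Hypothetical units: (x, z, y) = (covariate, treatment, outcome)."""
    spec = [  # (x, z, number of units, number with y = 1)
        (1, 1, 80, 40), (1, 0, 20, 10),
        (0, 1, 20, 2), (0, 0, 80, 8),
    ]
    records = []
    for x, z, total, events in spec:
        records += [(x, z, 1)] * events + [(x, z, 0)] * (total - events)
    return records


def propensity_by_stratum(records):
    """Estimate e(x) = P(Z = 1 | X = x) as the treated share at each x."""
    counts = defaultdict(lambda: [0, 0])       # x -> [treated, total]
    for x, z, _ in records:
        counts[x][0] += z
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}


def ipw_effect(records):
    """Inverse-probability-weighted estimate of E[Y(1)] - E[Y(0)]."""
    e = propensity_by_stratum(records)
    n = len(records)
    mean_y1 = sum(z * y / e[x] for x, z, y in records) / n
    mean_y0 = sum((1 - z) * y / (1 - e[x]) for x, z, y in records) / n
    return mean_y1 - mean_y0


def naive_effect(records):
    treated = [y for _, z, y in records if z == 1]
    control = [y for _, z, y in records if z == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


records = make_records()
print("naive:", round(naive_effect(records), 3))
print("IPW:  ", round(ipw_effect(records), 3))
```

In these made-up data the treatment has no effect within either level of `x`, but units with `x = 1` are both more likely to be treated and more likely to have the outcome. The naive difference picks up that confounding; the weighted estimate removes it.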

47
(No Transcript)
48
(No Transcript)
49
(No Transcript)
50
Conclusion

First generation Method
Contingency table methods
Deal with each drug and each adverse event in
isolation
Second generation Method
Bayesian logistic regression
Propensity score
Deal with large numbers of drugs jointly and with
multi-drug interactions
Ultimate Method
Not only interactions and relationships among
drugs, but also among adverse events
Question: which sets of drugs cause which sets of
adverse events?

51
(No Transcript)
52
Overview

Brief Introduction to Data Mining
Data Mining Algorithms
Currently fashionable DMAs for drug safety
Future Directions, etc.

53
Of Laws, Monsters, and Giants

Moore's law: processing capacity doubles every
18 months (CPU, cache, memory)
Its more aggressive cousin:
disk storage capacity doubles every 9 months

What do the two laws combined produce? A
rapidly growing gap between our ability to store
data and our ability to make use of it.
54
Data Mining Algorithms
"A data mining algorithm is a well-defined
procedure that takes data as input and produces
output in the form of models or patterns"
(Hand, Mannila, and Smyth)
"well-defined": can be encoded in software
"algorithm": must terminate after some
finite number of steps
55
Algorithm Components
1. The task the algorithm is used to address (e.g. classification, clustering, etc.)
2. The structure of the model or pattern we are fitting to the data (e.g. a linear regression model)
3. The score function used to judge the quality of the fitted models or patterns (e.g. accuracy, BIC, etc.)
4. The search or optimization method used to search over parameters and/or structures (e.g. steepest descent, MCMC, etc.)
5. The data management technique used for storing, indexing, and retrieving data (critical when data are too large to reside in memory)
56
Association Rules: Support and Confidence
[Venn diagram: customer buys diapers; customer buys beer; customer buys both]

Find all the rules Y ⇒ Z with minimum confidence
and support
support, s: probability that a transaction
contains both Y and Z
confidence, c: conditional probability that a
transaction containing Y also contains Z

With minimum support 50% and minimum confidence
50%, we have
A ⇒ C (50%, 66.6%)
C ⇒ A (50%, 100%)

57
Mining Association Rules: An Example
Min. support 50%, min. confidence 50%

For rule A ⇒ C
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%
The Apriori principle
Any subset of a frequent itemset must be frequent
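
Support and confidence are easy to compute directly. The transaction table for this example did not survive in the transcript, so the four transactions below are an assumed toy database chosen to reproduce the percentages quoted for A ⇒ C and C ⇒ A; the function names are illustrative.

```python
# Assumed toy database (the slide's own table is not in the transcript).
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]


def support(itemset, db):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)


def confidence(lhs, rhs, db):
    """Confidence of the rule lhs => rhs: support(lhs and rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), db) / support(lhs, db)


print("support({A, C}) =", support({"A", "C"}, transactions))
print("confidence(A => C) =", round(confidence({"A"}, {"C"}, transactions), 3))
print("confidence(C => A) =", confidence({"C"}, {"A"}, transactions))
```

The two rules share the same support but have different confidence, because confidence conditions on the left-hand side.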

58
Mining Frequent Itemsets: the Key Step

Find the frequent itemsets: the sets of items
that have minimum support
A subset of a frequent itemset must also be a
frequent itemset
i.e., if {A, B} is a frequent itemset, both {A} and
{B} must be frequent itemsets
Iteratively find frequent itemsets with
cardinality from 1 to k (k-itemset)
Use the frequent itemsets to generate association
rules.
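
The level-wise search described above can be sketched compactly in Python. This is a minimal, unoptimized version for illustration; the four-transaction database and the 50% threshold are assumptions, not data recovered from the slides.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset: support}, found level by level (k = 1, 2, ...)."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]   # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Scan the database and keep the candidates with enough support.
        current = {c: support(c) for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the Apriori principle).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)
        level = list(candidates)
        k += 1
    return frequent


# Hypothetical database, for illustration only.
database = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent_itemsets = apriori(database, min_support=0.5)
for itemset, s in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The pruning step is what keeps the candidate sets small: a 3-itemset is only counted if all three of its 2-item subsets were already found frequent.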

59
The Apriori Algorithm: Example
[Figure: Database D, scanned three times; tables C1, L1, C2, C2, L2, C3, L3]
60
Association Rule Mining: A Road Map

Boolean vs. quantitative associations (based on
the types of values handled)
buys(x, "SQLServer") ^ buys(x, "DMBook") ⇒
buys(x, "DBMiner") [0.2%, 60%]
age(x, "30..39") ^ income(x, "42..48K") ⇒
buys(x, "PC") [1%, 75%]
Single-dimensional vs. multi-dimensional
associations (see examples above)
Single-level vs. multiple-level analysis
What brands of beers are associated with what
brands of diapers?
Various extensions (thousands!)

61
(No Transcript)
62
Statistics
The subject of statistics concerns itself with
using data to make inferences and predictions
about the world.
Researchers assembled the vast bulk of the
statistical knowledge base prior to the
availability of significant computing.
Lots of assumptions and brilliant mathematics
took the place of computing and led to useful and
widely-used tools.
Serious limits on the applicability of many of
these methods: small data sets, unrealistically
simple models.
Produce hard-to-interpret outputs like p-values
and confidence intervals.
63
Bayesian Statistics
The Bayesian approach has deep historical roots
but required the algorithmic developments of the
late 1980s before it was of any use.
The old sterile Bayesian-Frequentist debates are
a thing of the past.
Most data analysts take a pragmatic point of view
and use whatever is most useful.
64
Bayes' Theorem
65
Bayes' Theorem: Example
66
Bayes' Theorem for Densities
67
Hospital Example (0/27)
[Figure: prior distribution, likelihood, posterior distribution]
68
(No Transcript)
69
Unreasonable prior distribution implies
unreasonable posterior distribution
70
[Figure annotations: 0.032, 0.023, 0.013, 0.095, 0.002]
What to report? Mode? Mean? Median? Posterior
probability that $\theta$ exceeds 0.2? $\theta^*$ such
that $\Pr(\theta > \theta^*) = 0.05$? $\theta^*$ such that
$\Pr(\theta > \theta^*) = 0.95$?
Posterior probability that $\theta$ is in
(0.002, 0.095) is 90%
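
The candidate summaries can be computed numerically for the 0/27 hospital record. The sketch below assumes a uniform Beta(1, 1) prior, which is not necessarily the prior used on the slides, so its output is not expected to match the figure's annotations; the function name and grid size are likewise illustrative.

```python
def beta_posterior_summaries(deaths, operations, a=1.0, b=1.0, grid_size=100_000):
    """Grid approximation to the Beta(a + deaths, b + survivors) posterior."""
    post_a, post_b = a + deaths, b + operations - deaths
    # Midpoints avoid theta = 0 and theta = 1, where the density may be infinite.
    thetas = [(k + 0.5) / grid_size for k in range(grid_size)]
    dens = [t ** (post_a - 1) * (1 - t) ** (post_b - 1) for t in thetas]
    total = sum(dens)
    weights = [d / total for d in dens]

    def quantile(q):
        acc = 0.0
        for t, w in zip(thetas, weights):
            acc += w
            if acc >= q:
                return t
        return thetas[-1]

    return {
        "mean": sum(t * w for t, w in zip(thetas, weights)),
        "median": quantile(0.5),
        "q05": quantile(0.05),
        "q95": quantile(0.95),
        "pr_above_0.2": sum(w for t, w in zip(thetas, weights) if t > 0.2),
    }


summaries = beta_posterior_summaries(deaths=0, operations=27)
for name, value in summaries.items():
    print(f"{name}: {value:.4f}")
```

The interval between `q05` and `q95` holds 90% of the posterior probability, which is the kind of statement made on the slide.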
