Title: Systematic Reviews:
1Systematic Reviews Methods and Procedures
George A. Wells Editor, Cochrane Musculoskeletal
Review Group Department of Epidemiology and
Community Medicine University of Ottawa Ottawa,
Ontario, Canada
2Meta-analysis
- Meta-analysis is a statistical analysis of a collection of studies
- Meta-analysis methods focus on contrasting and comparing results from different studies in anticipation of identifying consistent patterns and sources of disagreement among these results
- Primary objective
  - Synthetic goal (estimation of summary effect) vs
  - Analytic goal (estimation of differences)
3- Systematic Review
  - the application of scientific strategies that limit bias to the systematic assembly, critical appraisal and synthesis of all relevant studies on a specific topic
- Meta-Analysis
  - a systematic review that employs statistical methods to combine and summarize the results of several studies
4Features of narrative reviews and systematic
reviews
                 NARRATIVE                              SYSTEMATIC
QUESTION         Broad                                  Focused
SOURCES/SEARCH   Usually unspecified, possibly biased   Comprehensive, explicit
SELECTION        Unspecified, biased?                   Criterion-based, uniformly applied
APPRAISAL        Variable                               Rigorous
SYNTHESIS        Usually qualitative                    Quantitative
INFERENCE        Sometimes evidence-based               Usually evidence-based
5Steps of a Cochrane Systematic Review
- Clearly formulated question
- Comprehensive data search
- Unbiased selection and extraction process
- Critical appraisal of data
- Synthesis of data
- Perform sensitivity and subgroup analyses if appropriate and possible
- Prepare a structured report
6- What is the study objective?
  - to validate results in a large population
  - to guide new studies
- Pose question in both biologic and health care terms, specifying with operational definitions
  - population
  - intervention
  - outcomes (both beneficial and harmful)
7Inclusion Criteria
- Study design
- Population
- Interventions
- Outcomes
9- Need a well formulated and co-ordinated effort
- Seek guidance from a librarian
- Specify language constraints
- Requirements for comprehensiveness of search depend on the field and question to be addressed
- Possible sources include
- computerized bibliographic database
- review articles
- abstracts
- conference proceedings
- dissertations
- books
- experts
- granting agencies
- trial registries
- industry
- journal handsearching
10- Procedure
  - usually begin with searches of bibliographic reports (citation indexes, abstract databases)
  - publications retrieved and references therein searched for more references
  - as a step towards eliminating publication bias, need information from unpublished research
    - databases of unpublished reports
    - clinical research registries
    - clinical trial registries
    - unpublished theses
    - conference indexes
- Published reports (publication bias, i.e. tendency to publish statistically significant results)
12Study Selection
- 2 independent reviewers select studies
- Selection of studies addressing the question posed based on a priori specification of the population, intervention, outcomes and study design
- Level of agreement: kappa
- Differences resolved by consensus
- Specify reasons for rejecting studies
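The level of agreement between the two reviewers can be computed directly from their include/exclude decisions. The following is a minimal sketch, assuming Cohen's kappa for two raters (the slides name kappa but do not say which variant is meant). The function name `cohen_kappa` and the ten decisions are hypothetical illustrations, not data from any review.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same list of studies."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed proportion of studies on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal proportions
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical selection decisions for ten candidate studies
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "include", "exclude"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

Here the reviewers agree on 8 of 10 studies, but part of that agreement would be expected by chance alone, which is why kappa is lower than the raw agreement of 0.8.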
13Data Extraction
- 2 independent reviewers extract data using predetermined forms
  - Patient characteristics
  - Study design and methods
  - Study results
  - Methodologic quality
- Level of agreement: kappa
- Differences resolved by consensus
14Data Extraction ...
- Be explicit, unbiased and reproducible
- Include all relevant measures of benefit and harm of the intervention
- Contact investigators of the studies for clarification of published methods etc.
- Extract individual patient data when published data do not answer questions about intention-to-treat analyses, time-to-event analyses, subgroups, dose-response relationships
16Description of Studies
- Size of study
- Characteristics of study patients
- Details of specific interventions used
- Details of outcomes assessed
17Methodologic Quality Assessment
- Can use as
  - threshold for inclusion
  - possible explanation for heterogeneity
- Base quality assessments on extent to which bias is minimized
- Make quality assessment scoring systems transparent and parsimonious
- Evaluate reproducibility of quality assessment
- Report quality scoring system used
18Quality Assessment Example
indicates that randomization was appropriate (e.g. random numbers were computer generated)
20Outcome
- Discrete (event)
  - Odds Ratio (OR)
  - Relative Risk (RR)
  - Risk Difference (RD)
- Continuous (measured)
  - Mean Difference (MD)
  - Standardized Mean Difference (SMD)
- Basic Data
- Overall Estimate: Fixed Effects / Random Effects
21Effect measures discrete data
- P1 = event rate in experimental group
- P2 = event rate in control group
- RD (risk difference) = P2 - P1
- RR (relative risk) = P1 / P2
- RRR (relative risk reduction) = (P2 - P1) / P2
- OR (odds ratio) = [P1 / (1 - P1)] / [P2 / (1 - P2)]
- NNT (number needed to treat) = 1 / (P2 - P1)
22Example
- Experimental event rate = 0.3
- Control event rate = 0.4
- RD = 0.4 - 0.3 = 0.1
- RR = 0.3 / 0.4 = 0.75
- RRR = (0.4 - 0.3) / 0.4 = 0.25
- OR = (0.3 / 0.7) / (0.4 / 0.6) = 0.64
- NNT = 1 / (0.4 - 0.3) = 10
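The arithmetic in this example can be checked with a few lines of code. This is a minimal sketch that follows the definitions on the previous slide (P1 = experimental event rate, P2 = control event rate, RD taken as P2 - P1); the function name `effect_measures` is a hypothetical label chosen for illustration.

```python
import math


def effect_measures(p_exp, p_ctl):
    """Effect measures for a discrete outcome from two event rates."""
    risk_diff = p_ctl - p_exp  # RD as defined on the slide: P2 - P1
    return {
        "RD": risk_diff,
        "RR": p_exp / p_ctl,
        "RRR": risk_diff / p_ctl,
        "OR": (p_exp / (1 - p_exp)) / (p_ctl / (1 - p_ctl)),
        # NNT is undefined (infinite) when the two event rates are equal
        "NNT": 1 / risk_diff if risk_diff != 0 else math.inf,
    }


measures = effect_measures(0.3, 0.4)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Note that the odds ratio (0.64) lies further from 1 than the relative risk (0.75), as expected when the event is not rare.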
23Discrete - Odds Ratio (OR)
               Event   No event   Total
Experimental   a       b          ne
Control        c       d          nc
Odds = (number of patients experiencing event) / (number of patients not experiencing event)
Odds ratio = (odds in experimental group) / (odds in control group)
Basic Data: a/ne, c/nc
24Discrete - Odds Ratio Example
               Event   No event   Total
Experimental   13      33         46
Control        7       31         38
Basic Data: 13/46, 7/38
25Discrete - Relative Risk (RR)
               Event   No event   Total
Experimental   a       b          ne
Control        c       d          nc
Risk = (number of patients experiencing event) / (number of patients)
Risk ratio = (risk in experimental group) / (risk in control group)
Basic Data: a/ne, c/nc
26Discrete - Relative Risk - Example
               Event   No event   Total
Experimental   13      33         46
Control        7       31         38
Basic Data: 13/46, 7/38
27Discrete - Risk Difference (RD)
               Event   No event   Total
Experimental   a       b          ne
Control        c       d          nc
Risk = (number of patients experiencing event) / (number of patients)
Risk difference = (risk in experimental group) - (risk in control group)
RD = Pe - Pc
Basic Data: a/ne, c/nc
28Discrete - Risk Difference - Example
               Event   No event   Total
Experimental   13      33         46
Control        7       31         38
RD = Pe - Pc = 13/46 - 7/38 = 0.098
Basic Data: 13/46, 7/38
29Discrete - Odds Ratio
(O)
               Event   No event   Total
Experimental   a       b          ne
Control        c       d          nc
Estimator
Standard Error
100(1 - α)% CI
30Discrete - Relative Risk
(R)
               Event   No event   Total
Experimental   a       b          ne
Control        c       d          nc
Estimator
Standard Error
100(1 - α)% CI
31Discrete - Risk Difference
(D)
               Event   No event   Total
Experimental   a       b          ne
Control        c       d          nc
Estimator
Standard Error
100(1 - α)% CI
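One common way to obtain the estimator, standard error and 100(1 - α)% CI for the three discrete measures is sketched below. It assumes the usual large-sample formulas, with OR and RR handled on the log scale; the slides may define these quantities differently. The function name `two_by_two_summary` is hypothetical, and the data are the 2x2 table from the earlier examples (13/46 vs 7/38).

```python
from math import exp, log, sqrt
from statistics import NormalDist


def two_by_two_summary(a, b, c, d, alpha=0.05):
    """Return (estimate, lower, upper) for OR, RR and RD from a 2x2 table.

    a, b = events / non-events in the experimental group
    c, d = events / non-events in the control group
    Assumes no zero cells (the log-scale formulas break down otherwise).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for a 95% CI
    ne, nc = a + b, c + d
    pe, pc = a / ne, c / nc

    # Odds ratio: inference on log(OR), then back-transformed
    log_or = log((a * d) / (b * c))
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    # Relative risk: inference on log(RR), then back-transformed
    log_rr = log(pe / pc)
    se_log_rr = sqrt(1 / a - 1 / ne + 1 / c - 1 / nc)

    # Risk difference (Pe - Pc): inference on the original scale
    rd = pe - pc
    se_rd = sqrt(pe * (1 - pe) / ne + pc * (1 - pc) / nc)

    return {
        "OR": (exp(log_or), exp(log_or - z * se_log_or), exp(log_or + z * se_log_or)),
        "RR": (exp(log_rr), exp(log_rr - z * se_log_rr), exp(log_rr + z * se_log_rr)),
        "RD": (rd, rd - z * se_rd, rd + z * se_rd),
    }


summary = two_by_two_summary(13, 33, 7, 31)
for name, (estimate, lower, upper) in summary.items():
    print(f"{name}: {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Working on the log scale for the ratio measures keeps the confidence limits positive and makes the normal approximation more reasonable.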
32When to use OR / RR / RD
OR vs RR
- Odds Ratio ≈ Relative Risk if event occurs infrequently (i.e. a and c small relative to b and d):
  RR = a(c+d) / [(a+b)c] ≈ ad / (bc) = OR
- Odds Ratio lies further from 1 than Relative Risk if event occurs frequently (so OR > RR when RR > 1)
RD vs RR
- When interpretation in terms of absolute difference is better than in relative terms (e.g. interest in absolute reduction in adverse events)
34Continuous Data - Mean Difference (MD)
               number   mean   standard deviation
Experimental   ne       x̄e     se
Control        nc       x̄c     sc
35Continuous Data - Standardized Mean Difference
(SMD)
               number   mean   standard deviation
Experimental   ne       x̄e     se
Control        nc       x̄c     sc
36When to use MD / SMD
- Mean Difference
  - When studies have comparable outcome measures (i.e. same scale, probably same length of follow-up)
  - A meta-analysis using MDs is known as a weighted mean difference (WMD)
- Standardized Mean Difference
  - When studies use different outcome measurements which address the same clinical outcome (e.g. different scales)
  - Converts scale to a common scale: number of standard deviations
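The distinction between MD and SMD can be illustrated numerically. This is a minimal sketch, assuming the SMD is the mean difference divided by the pooled within-group standard deviation (software may additionally apply a small-sample correction, which is omitted here). The function names and the two sets of summary data are hypothetical: the second "scale" is simply the first multiplied by 10.

```python
from math import sqrt


def mean_difference(n_e, mean_e, sd_e, n_c, mean_c, sd_c):
    """Mean difference (experimental minus control) and its standard error."""
    md = mean_e - mean_c
    se_md = sqrt(sd_e ** 2 / n_e + sd_c ** 2 / n_c)
    return md, se_md


def standardized_mean_difference(n_e, mean_e, sd_e, n_c, mean_c, sd_c):
    """Mean difference expressed in units of the pooled standard deviation."""
    pooled_sd = sqrt(
        ((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2)
    )
    return (mean_e - mean_c) / pooled_sd


# Hypothetical study A: outcome measured on a 0-28 scale
study_a = dict(n_e=30, mean_e=10.0, sd_e=4.0, n_c=30, mean_c=12.0, sd_c=4.0)
# Hypothetical study B: same effect, but the scale is ten times wider
study_b = dict(n_e=30, mean_e=100.0, sd_e=40.0, n_c=30, mean_c=120.0, sd_c=40.0)

md_a, se_a = mean_difference(**study_a)
md_b, se_b = mean_difference(**study_b)
smd_a = standardized_mean_difference(**study_a)
smd_b = standardized_mean_difference(**study_b)

print(f"MD:  A = {md_a:.1f}, B = {md_b:.1f}")    # not comparable across scales
print(f"SMD: A = {smd_a:.2f}, B = {smd_b:.2f}")  # comparable across scales
```

The raw mean differences differ by a factor of ten, while the standardized mean differences coincide, which is what makes the SMD usable when studies report different scales.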
37Example Combining different scales for Swollen
Joint Count
38Sources of Variation over Studies
- True inter-study variation may exist (fixed/random-effects model)
- Sampling error may vary among studies (sample size)
- Characteristics may differ among studies (population, intervention)
39Modelling Variation
- Parameter of interest $\theta$ (quantifies average treatment effect)
- Number of independent studies $k$
- Summary statistic $Y_i$ ($i = 1, 2, \dots, k$)
- Large sample size: asymptotic normal distribution
Fixed-effects model vs Random-effects model
40Fixed-Effects Model
- Outcome $Y_i$ from study $i$ is a sample from a distribution with mean $\theta$ (i.e. common mean across studies)
- $Y_i$ are independently distributed as $N(\theta, \sigma_i^2)$ ($i = 1, 2, \dots, k$), where $\sigma_i^2 = \mathrm{Var}(Y_i)$ and assume $E(Y_i) = \theta$
41Fixed-Effects Model
42Random-Effects Model
- Outcome $Y_i$ from study $i$ is a sample from a distribution with mean $\theta_i$ (i.e. study-specific means)
- $Y_i$ are independently distributed as $N(\theta_i, \sigma_i^2)$ ($i = 1, 2, \dots, k$), where $\sigma_i^2 = \mathrm{Var}(Y_i)$ and assume $E(Y_i) = \theta_i$
- $\theta_i$ is a realization from a distribution of effects with mean $\theta$
- $\theta_i$ are independently distributed as $N(\theta, \tau^2)$ ($i = 1, 2, \dots, k$), where
  - $\tau^2 = \mathrm{Var}(\theta_i)$ is the inter-study variation
  - $\theta$ is the average treatment effect
43Random-Effects Model
44Random-Effects Model ...
Estimating Average Study Effect
- after averaging study-specific effects, distribution of $Y_i$ is $N(\theta, \sigma_i^2 + \tau^2)$
- although $\theta$ is the parameter of interest, $\tau^2$ must be considered and estimated
Estimating Study-Specific Effects
- distribution of $\theta_i$ conditional on observed data, $\theta$ and $\tau^2$ is $N(F_i \theta + (1 - F_i) Y_i,\ \sigma_i^2 (1 - F_i))$
- where $F_i = \sigma_i^2 / (\sigma_i^2 + \tau^2)$ is the shrinkage factor for the $i$th study
45Modelling Variation
- Studies are stratified and then combined to account for differences in sample size and study characteristics
- A weighted average of estimates from each study is calculated
- Question of whether a common or study-specific parameter is to be estimated remains
Procedure
- perform test of homogeneity
- if no significant difference, use fixed-effects model
- otherwise identify study characteristics that stratify studies into subsets with homogeneous effects, or use random-effects model
46Fixed Effects Model
- Require from each study
  - effect estimate and
  - standard error of effect estimate
- Combine these using a weighted average
  - pooled estimate = sum of (estimate × weight) / sum of weights
  - where weight = 1 / variance of estimate
- Assumes a common underlying effect behind every trial
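The weighted average described above can be written in a few lines. This is a minimal sketch of inverse-variance pooling; the function name `fixed_effect_pool` and the three study estimates are hypothetical, and the standard error of the pooled estimate is taken as the square root of 1 / (sum of weights).

```python
from math import sqrt


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance weighted average under a common-effect assumption."""
    weights = [1 / se ** 2 for se in std_errors]  # weight = 1 / variance
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    se_pooled = sqrt(1 / total_weight)
    return pooled, se_pooled, weights


# Hypothetical effect estimates and standard errors from three studies
estimates = [0.10, 0.30, 0.20]
std_errors = [0.10, 0.20, 0.10]

pooled, se_pooled, weights = fixed_effect_pool(estimates, std_errors)
lower, upper = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled
print(f"pooled = {pooled:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The second study has twice the standard error of the others and therefore receives a quarter of their weight, so it pulls the pooled estimate only slightly towards 0.30.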
47Fixed-Effects Model General Scheme
Study   Measure   Std Error   Weight
1       Y1        s1          W1
2       Y2        s2          W2
.       .         .           .
k       Yk        sk          Wk
(no association: Yi = 0)
Overall Measure: $\bar{Y} = \sum W_i Y_i / \sum W_i$, with $W_i = 1 / s_i^2$
48Chi-Square Tests
- $\chi^2$ test of association: if large, association
- $\chi^2$ test of homogeneity: if large, heterogeneity
49Features in Graphic Display
- For each trial
  - estimate (square)
  - 95% confidence interval (CI) (line)
  - size (square) indicates weight allocated
- Solid vertical line of no effect
  - if CI crosses line then effect not significant (p > 0.05)
- Horizontal axis
  - arithmetic: RD, MD, SMD
  - logarithmic: OR, RR
- Diamond represents combined estimate and 95% CI
- Dashed line plotted vertically through combined estimate
50Odds Ratio
Three methods for combining
(1) Mantel-Haenszel method
(2) Peto's method
(3) Maximum likelihood method
Relative Risk
Risk Difference
51Peto Odds Ratio
Mantel-Haenszel Odds Ratio
52Relative Risk
53Risk Difference
54Weighted Mean Difference
- Standardized Mean Difference
55Weighted Mean Difference
Standardized Mean Difference
56Heterogeneity
- Define meaning of heterogeneity for each review
- Define a priori the important degree of heterogeneity (in large data sets trivial heterogeneity may be statistically significant)
- If heterogeneity exists, examine potential sources (differences in study quality, participants, intervention specifics or outcome measurement/definition)
- If heterogeneity exists across studies, consider using random effects model
- If heterogeneity can be explained using a priori hypotheses, consider presenting results by these subgroups
- If heterogeneity cannot be explained, proceed with caution with further statistical aggregation and subgroup analysis
57Heterogeneity How to Identify it
- Common sense
  - are the patients, interventions and outcomes in each of the included studies sufficiently similar?
- Exploratory analysis of study-specific estimates
- Statistical tests
58Heterogeneity How to deal with it
Lau et al. 1997
59Heterogeneity Exploring it
- Subgroup analyses
  - subsets of trials
  - subsets of patients
  - SUBGROUPS SHOULD BE PRE-SPECIFIED TO AVOID BIAS
- Meta-regression
  - relate size of effect to characteristics of the trials
60Exploring Heterogeneity subgroup analysis
61Exploring Heterogeneity subgroup analysis
62Random Effects Model
- Assume true effect estimates really vary across studies
- Two sources of variation
  - within studies (between patients)
  - between studies (heterogeneity)
- What the software does
  - Revise weights to take into account both components of variation
  - weight = 1 / (variance + heterogeneity)
- When heterogeneity exists we get
  - a different pooled estimate (but not necessarily) with a different interpretation
  - a wider confidence interval
  - a larger p-value
63Random Effects Model
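If $\tau^2$ is known then the MLE of $\theta$ is $\hat{\theta} = \sum W_i^* Y_i / \sum W_i^*$, where $W_i^* = 1 / (\sigma_i^2 + \tau^2)$
If $\tau^2$ is unknown, three common methods of inference can be used
- Restricted Maximum Likelihood (REML)
- Bayesian
- Method of Moments (MOM)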
64Method of Moments (Random effects model)
Study   Measure   Weight (FE)   Weight (RE)
1       Y1        W1            $W_1^* = (W_1^{-1} + \hat{\tau}^2)^{-1}$
2       Y2        W2            $W_2^* = (W_2^{-1} + \hat{\tau}^2)^{-1}$
.       .         .             .
k       Yk        Wk            $W_k^* = (W_k^{-1} + \hat{\tau}^2)^{-1}$
Overall Measure
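The revised weights in this scheme need an estimate of the between-study variance. The sketch below assumes the DerSimonian-Laird method-of-moments estimator, which is the method the later slides name as most commonly used; it is an illustration under that assumption, not necessarily the exact computation behind these slides. The function name `dersimonian_laird` and the three studies are hypothetical.

```python
from math import sqrt


def dersimonian_laird(estimates, variances):
    """Random-effects pooling with a method-of-moments estimate of tau^2."""
    k = len(estimates)
    w_fe = [1 / v for v in variances]  # fixed-effects weights
    sum_w = sum(w_fe)
    fixed = sum(w * y for w, y in zip(w_fe, estimates)) / sum_w

    # Q: weighted sum of squared deviations from the fixed-effects estimate
    q = sum(w * (y - fixed) ** 2 for w, y in zip(w_fe, estimates))
    c = sum_w - sum(w ** 2 for w in w_fe) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero

    # Random-effects weights: 1 / (within-study variance + tau^2)
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(w_re, estimates)) / sum(w_re)
    return {
        "fixed": fixed, "pooled": pooled, "tau2": tau2, "q": q,
        "se_fixed": sqrt(1 / sum_w), "se_pooled": sqrt(1 / sum(w_re)),
        "w_fe": w_fe, "w_re": w_re,
    }


# Hypothetical data: study 1 is the largest (smallest variance)
result = dersimonian_laird([0.0, 0.5, 1.0], [0.01, 0.04, 0.04])

share_fe = result["w_fe"][0] / sum(result["w_fe"])
share_re = result["w_re"][0] / sum(result["w_re"])
print(f"tau^2 = {result['tau2']:.3f}")
print(f"largest study's share of weight: FE {share_fe:.2f}, RE {share_re:.2f}")
print(f"pooled: FE {result['fixed']:.3f}, RE {result['pooled']:.3f}")
```

Adding the same variance component to every study flattens the weights, so the largest study's share of the total weight drops and the confidence interval widens.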
65Effect of model choice on study weights
Larger studies receive proportionally less
weight in RE model than in FE model
66Fixed vs Random Effects Discrete Data
Fixed Effects
Random Effects
67Fixed vs Random Effects Continuous Data
Fixed Effects
Random Effects
68Omission of Outlier - Chestnut Study
69Analysis
- Include all relevant and clinically useful measures of treatment effect
- Perform a narrative, qualitative summary when data are too sparse, of too low quality or too heterogeneous to proceed with a meta-analysis
- Specify if fixed or random effects model is used
- Describe proportion of patients used in final analysis
- Use confidence intervals
- Include a power analysis
- Consider cumulative meta-analysis (by order of publication date, baseline risk, study quality) to assess the contribution of successive studies
71Subgroup Analyses
- Pre-specify hypothesis-testing subgroup analyses and keep few in number
- Label all a posteriori subgroup analyses
- When subgroup differences are detected, interpret in light of whether they are
  - established a priori
  - few in number
  - supported by plausible causal mechanisms
  - important (qualitative vs quantitative)
  - consistent across studies
  - statistically significant (adjusted for multiple testing)
72Sensitivity Analyses
- Test robustness of results relative to key features of the studies and key assumptions and decisions
- Include tests of bias due to retrospective nature of systematic reviews (e.g. with/without studies of lower methodologic quality)
- Consider fragility of results by determining effect of small shifts in number of events between groups
- Consider cumulative meta-analysis to explore relationship between effect size and study quality, control event rates and other relevant features
- Test a reasonable range of values for missing data from studies with uncertain results
73Funnel Plot
- Scatterplot of effect estimates against sample size
- Used to detect publication bias
- If no bias, expect symmetric, inverted funnel
- If bias, expect asymmetric or skewed shape
[Figure: example funnel plot with a suggestion of missing small studies]
74Funnel Plot Example 1 Prophylaxis of NSAID
induced Gastric Ulcers
[Figure: funnel plot of sample size (0 to 700) against effect size (RR, 0.0 to 1.2); intervention: H2-blockers]
75Funnel Plot Example 2 Alendronate for
Postmenopausal Osteoporosis
[Figure: funnel plot of sample size (0 to 2500) against weighted mean difference (0 to 10); WMD of change in lumbar bone mineral density]
77Presentation of Results
- Include a structured abstract
- Include a table of the key elements of each study
- Include summary data from which the measures are computed
- Employ informative graphic displays representing confidence intervals, group event rates, sample sizes etc.
78Interpretation of Results
- Interpret results in context of current health care
- State methodologic limitations of studies and review
- Consider size of effect in studies and review, their consistency and presence of dose-response relationship
- Consider interpreting results in context of temporal cumulative meta-analysis
- Interpret results in light of other available evidence
- Make recommendations clear and practical
- Propose future research agenda (clinical and methodological requirements)
79Generic Inferential Framework
80Generic inferential framework
- (1) Conceptually, think of a generic effect size statistic $T$
- (2) corresponding effect size parameter $\theta$
- (3) associated standard error SE($T$), square root of variance
- (4) for some effect sizes, some suitable transformation may be needed to make inference based on normal distribution theory
81Generic inferential framework ...
- (A) Fixed-Effects Model (FEM)
  - Assume a common effect size
  - Obtain average effect size as a weighted mean (unbiased)
  - Optimal weight is reciprocal of variance (inverse variance weighted method)
82Generic inferential framework ...
- Variances inversely proportional to within-study sample sizes
  - what is the effect of larger studies in calculating weights?
- may also weight by quality index, q, scaled from 0 to 1
83Generic inferential framework ...
- Average effect size has conditional variance (a function of conditional variances of each effect size, quality index, ...)
  - e.g. V = 1 / total weight
- Multiply the resulting standard error by appropriate critical value (1.96, 2.58, 1.645)
- Construct confidence interval and/or test statistic
84Generic inferential framework ...
- Test the homogeneity assumption using a weighted effect size sum of squares of deviations, Q
- If Q exceeds the critical value of chi-square at k-1 d.f. (k = number of studies), then the observed between-study variance is significantly greater than what would be expected under the null hypothesis
85Generic inferential framework ...
- When within-study sample sizes are very large, homogeneity may be rejected by the Q test even when individual effect size estimates do not differ much
- One can take different courses of action when homogeneity is rejected (see next page)
86Generic inferential framework ...
- Methodologic choices in dealing with
heterogeneous data
87Generic inferential framework ...
- (B) Random-Effects Model (REM)
  - Total variability of an observed study effect size reflects within- and between-study variance (extra variance component)
  - If between-studies variance is zero, equations of REM reduce to those of FEM
  - Presence of a variance component which is significantly different from zero may be indicative of REM
88Generic inferential framework ...
- Once significance of variance component is established (e.g. Q test for homogeneity of effect size),
  - its magnitude should be estimated
  - variance components can be estimated in many ways
  - the most commonly used method is the DerSimonian-Laird method, which is based on a method-of-moments approach
- Compute random effects weighted mean as an estimate of the average of the random effects in the population
- construct confidence interval and conduct hypothesis tests as before (new variance and thus new weights)
89Correlation Coefficient
90Example Correlation coefficient
- A measure of association more popular in cross-sectional observational studies than in RCTs is Pearson's correlation coefficient, r, given by
  $r = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}$
- X and Y must be continuous (e.g. blood pressure and weight)
- r lies between -1 and 1
- not available in RevMan / MetaView at this time
91Correlation coefficient (contd)
- Following the generic framework discussed earlier
  - the effect size statistic is r
  - the corresponding effect size parameter is the underlying population correlation coefficient, $\rho$
  - in this case, a suitable transformation is needed to achieve approximate normality of effect size
  - inference is conducted on the scale of the transformed variable and final results are back-transformed to the original scale
92Correlation coefficient (contd)
- Assuming X and Y have a bivariate normal distribution, the Fisher's Z transformed variable
  $Z = \frac{1}{2} \ln\frac{1 + r}{1 - r}$
- has, for large samples, an approximate normal distribution with mean
  $\zeta = \frac{1}{2} \ln\frac{1 + \rho}{1 - \rho}$
- and a variance of $1 / (n - 3)$
- Hence, weighting factor associated with Z is W = 1/Var = n - 3.
93Correlation coefficient (contd)
- meta-analysis is carried out on Z-transformed measures and final results are transformed back to the scale of correlation using
  $r = (e^{2Z} - 1) / (e^{2Z} + 1)$
94Numerical Example
- Source: Fleiss J. Statistical Methods in Medical Research 1993; 2: 121-145.
- correlation coefficients reported by 7 independent studies in education are included in the meta-analysis
- Comparison: association between a characteristic of the teacher and the mean measure of his or her students' achievement
95Example Fleiss (1993)
Study   n    r        Z        W    WZ       WZ²
1       15   -0.073   -0.073   12   -0.876   0.064
2       16    0.308    0.318   13    4.134   1.315
3       15    0.481    0.524   12    6.288   3.295
4       16    0.428    0.457   13    5.941   2.715
5       15    0.180    0.182   12    2.184   0.397
6       17    0.290    0.299   14    4.186   1.252
7       15    0.400    0.424   12    5.088   2.157
Sum                            88   26.945   11.195
Z = Fisher's Z-transformation of r; W = n - 3
Q = 2.94 on 6 df is not statistically significant.
96Results and discussions
- No evidence for heterogeneous association across studies
- Fixed effect analysis may be undertaken
- Questions
  - Would a random effect analysis as shown earlier produce a different numerical value for the combined correlation coefficient?
  - How would the weights be modified to carry out a REM?
97Results and discussions (contd)
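- the weighted mean of Z is $\bar{Z} = \sum W Z / \sum W = 26.945 / 88 = 0.306$
- the approximate standard error of the combined mean is $SE(\bar{Z}) = 1 / \sqrt{\sum W} = 1 / \sqrt{88} = 0.107$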
98Results and discussions (contd)
- Test of significance is carried out using $z = \bar{Z} / SE(\bar{Z}) = 26.945 / \sqrt{88} = 2.87$
- this value exceeds the critical value 1.96 (corresponding to the 5% level of significance), so we conclude that the average value of Z (hence the average correlation) is statistically significant
99Results and discussions (contd)
- 95% confidence interval for $\zeta$ is $\bar{Z} \pm 1.96 \, SE(\bar{Z}) = (0.097, 0.515)$
- Transforming back to the original scale, a 95% CI for the parameter of interest, $\rho$, is (0.097, 0.474)
- again confirming a significant association
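The whole correlation example can be reproduced from the seven (n, r) pairs in the table. This is a minimal sketch following the steps on these slides (Fisher's Z, weights n - 3, Q test, fixed-effect mean, back-transformation). The variable names are illustrative, and the 5% chi-square critical value for 6 df (12.592) is hard-coded as an assumption rather than computed.

```python
from math import atanh, sqrt, tanh

# (n, r) for the seven studies in the table above
sample_sizes = [15, 16, 15, 16, 15, 17, 15]
correlations = [-0.073, 0.308, 0.481, 0.428, 0.180, 0.290, 0.400]

z_values = [atanh(r) for r in correlations]  # Fisher's Z = 0.5*ln((1+r)/(1-r))
weights = [n - 3 for n in sample_sizes]      # W = 1/Var(Z) = n - 3

sum_w = sum(weights)
sum_wz = sum(w * z for w, z in zip(weights, z_values))
sum_wz2 = sum(w * z * z for w, z in zip(weights, z_values))

# Homogeneity: Q = sum(W Z^2) - (sum(W Z))^2 / sum(W), on k - 1 df
q_stat = sum_wz2 - sum_wz ** 2 / sum_w
CHI2_CRIT_6DF_5PCT = 12.592  # assumed tabulated value for 6 df
homogeneous = q_stat < CHI2_CRIT_6DF_5PCT

# Fixed-effect combination on the Z scale
z_bar = sum_wz / sum_w
se_z_bar = 1 / sqrt(sum_w)
z_test = z_bar / se_z_bar
zeta_low, zeta_high = z_bar - 1.96 * se_z_bar, z_bar + 1.96 * se_z_bar

# Back-transform to the correlation scale: r = tanh(Z)
rho_hat = tanh(z_bar)
rho_low, rho_high = tanh(zeta_low), tanh(zeta_high)

print(f"Q = {q_stat:.2f} on {len(weights) - 1} df, homogeneous: {homogeneous}")
print(f"mean Z = {z_bar:.3f}, SE = {se_z_bar:.3f}, z = {z_test:.2f}")
print(f"combined r = {rho_hat:.3f} (95% CI {rho_low:.3f} to {rho_high:.3f})")
```

Because the code uses unrounded Z values rather than the three-decimal values in the table, intermediate sums can differ from the tabulated ones in the third decimal place.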
100Critical Appraisal of a Systematic Review
101(A) The Message
- Does the review set out to answer a precise question about patient care?
- Should be different from an uncritical encyclopedic presentation
102(B) The Validity
- Have studies been sought thoroughly?
  - Medline and other relevant bibliographic databases
  - Cochrane controlled clinical trials register
  - Foreign language literature
  - "Grey literature" (unpublished or un-indexed reports: theses, conference proceedings, internal reports, non-indexed journals, pharmaceutical industry files)
  - Reference chaining from any articles found
  - Personal approaches to experts in the field to find unpublished reports
  - Hand searches of the relevant specialized journals
103Validity (contd)
- Have inclusion and exclusion criteria for studies
been stated explicitly, taking account of the
patients in the studies, the interventions used,
the outcomes recorded and the methodology?
104Validity (contd)
- Have the authors considered the homogeneity of the studies: the idea that the studies are sufficiently similar in their design, interventions and subjects to merit combination?
  - this is done either by eyeballing graphs like the forest plot or by application of chi-square tests (Q test)
105(C) The Utility
- The various studies may have used patients of
different ages or social classes, but if the
treatment effects are consistent across the
studies, then generalisation to other groups or
populations is more justified.
106Utility (contd)
- Be wary of sub-group analyses where the authors attempt to draw new conclusions by comparing the outcomes for patients in one study with the patients in another study
- Be wary of "data-dredging" exercises, testing multiple hypotheses against the data, especially if the hypotheses were constructed after the study had begun data collection.
107Utility (contd)
- One may also want to ask
  - Were all clinically important outcomes considered?
  - Are the benefits worth the harms and costs?