Title: Effect Size and Meta-Analysis
1Effect Size and Meta-Analysis
- Effect size helps evaluate the size of a difference, such as the difference between two means.
- Meta-analysis is used to combine results across diverse studies on a given topic.
2Topic 58 Introduction to Effect Size (d)
- Suppose that Experimenter A administered a new treatment for depression (Treatment X) to an experimental group, while the control group received a standard treatment. Furthermore, suppose that Experimenter A used a 20-item true-false depression scale (with possible raw scores from 0 to 20) and obtained the posttest results shown here. Note that the difference between the two means is 5 raw-score points.
3Topic 58 Introduction to Effect Size (d)
- Suppose that Experimenter B administered Treatment Y to an experimental group while treating the control group with the standard treatment.
- Furthermore, suppose Experimenter B used a 30-item scale with choices from "Strongly agree" to "Strongly disagree" (with possible scores from 0 to 120) and obtained the results shown here, which show a difference of 10 raw-score points in favor of the experimental group.
4Topic 58 Introduction to Effect Size (d)
- Which treatment is superior?
- Treatment X, which resulted in a 5-point raw-score difference between the two means, or
- Treatment Y, which resulted in a 10-point raw-score difference between the two means?
- Of course, the answer is not clear because the two experimenters used different measurement scales (0 to 20 versus 0 to 120).
5Topic 58 Introduction to Effect Size (d)
- In Experiment A, one standard-deviation unit equals 4.00 raw-score points. Dividing the difference between the means (5.00) by the size of the standard-deviation unit for Experiment A (4.00 points) yields 1.25. This value is known as d and is obtained by applying the formula d = (me - mc) / sd, in which me stands for the mean of the experimental group, mc stands for the mean of the control group, and sd is the standard deviation.
6Topic 58 Introduction to Effect Size (d)
- Using the same formula for Experiment B, the difference between the means is divided by the standard deviation (10.00/14.00), yielding d = 0.71, which is almost three-quarters of a standard-deviation unit above 0.00.
- When the results of both experiments are expressed on a common (i.e., standardized) scale called d, Experiment A has d = 1.25 and Experiment B has d = 0.71.
7Topic 58 Introduction to Effect Size (d)
- Remember that the two raw-score differences are not directly comparable because different measurement scales were used (0 to 20 points versus 0 to 120 points). By examining the standardized values of d, which for most practical purposes range from -3.00 to 3.00, a meaningful comparison of the results of the two experiments can be made.
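A minimal Python sketch of the calculation above. The function name `cohens_d` is illustrative, and it assumes the mean difference and the standard deviation are already known (as in the two experiments) rather than computing them from raw scores:

```python
def cohens_d(mean_difference: float, sd: float) -> float:
    """Effect size d: the difference between two means (me - mc)
    expressed in standard-deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return mean_difference / sd


# Figures from the two experiments described above.
d_a = cohens_d(5.00, 4.00)    # Experiment A: scale of 0 to 20
d_b = cohens_d(10.00, 14.00)  # Experiment B: scale of 0 to 120

print(f"Experiment A: d = {d_a:.2f}")  # Experiment A: d = 1.25
print(f"Experiment B: d = {d_b:.2f}")  # Experiment B: d = 0.71
```

Because both results are in standard-deviation units, they can be compared directly even though the raw-score scales differ.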
8Topic 58 Introduction to Effect Size (d)
- Important definition: Effect size refers to the magnitude (i.e., size) of a difference when it is expressed on a standardized scale. The statistic d is one of the most popular statistics for describing the effect size of the difference between two means. In the next topic, the interpretation of d is discussed in more detail. In Topic 60, an alternative statistic for expressing effect size is described.
9Topic 59 Interpretation of Effect Size (d)
- In the previous topic, effect size expressed as d was introduced. The two examples in that topic had values of d of 0.71 and 1.25. Obviously, the experiment with a value of 1.25 had a larger effect than the one with a value of 0.71.
- While there are no universally accepted standards for describing values of d in words, many researchers use Cohen's suggestions: (1) a value of d of about 0.20 (one-fifth of a standard deviation) is small, (2) a value of 0.50 (one-half of a standard deviation) is medium, and (3) a value of 0.80 (eight-tenths of a standard deviation) is large.
- Keep in mind that in terms of values of d, an experimental group can rarely exceed a control group by more than 3.00 because the effective range of standard-deviation units is only three on each side of the mean. Thus, for most practical purposes, 3.00 or -3.00 is the maximum value of d.
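Cohen's benchmarks can be applied mechanically with a small helper, sketched here in Python. The function name, the wording of the labels it returns, and the choice to judge "closer to" by simple distance are assumptions for illustration; the labels are only a rough guide:

```python
# Cohen's benchmarks for d, as listed above.
BENCHMARKS = [(0.20, "small"), (0.50, "medium"), (0.80, "large")]


def describe_d(d: float) -> str:
    """Place |d| relative to Cohen's benchmarks (a rough guide only)."""
    size = abs(d)  # the sign shows only which group scored higher
    if size <= BENCHMARKS[0][0]:
        return f"at or below small ({BENCHMARKS[0][0]:.2f})"
    if size >= BENCHMARKS[-1][0]:
        return f"at or above large ({BENCHMARKS[-1][0]:.2f})"
    # Find the two benchmarks that bracket |d| and report the nearer one.
    for (lo, lo_name), (hi, hi_name) in zip(BENCHMARKS, BENCHMARKS[1:]):
        if lo <= size <= hi:
            nearer = lo_name if size - lo < hi - size else hi_name
            return f"between {lo_name} and {hi_name}, closer to {nearer}"
    raise ValueError("d must be a real number")


print(describe_d(0.71))  # between medium and large, closer to large
print(describe_d(1.25))  # at or above large (0.80)
```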
10Topic 59 Interpretation of Effect Size (d)
- Using the labels in Table 1, the value of d of
0.71 in the previous topic would be described as
being closer to large than medium, while the
value of 1.25 would be described as being between
very large and extremely large.
11Topic 59 Interpretation of Effect Size (d)
- The labels being discussed should not be used
arbitrarily without consideration of the full
context in which the values of d were obtained
and the possible implications of the results.
This leads to two principles: (1) a small effect
size might represent an important result, and (2)
a large effect size might represent an
unimportant result.
12Topic 60 Effect Size and Correlation (r)
- Cohen's d is so widely used as a measure of effect size that some researchers use the terms "effect size" and "d" interchangeably, as though they were synonyms. However, effect size refers to any statistic that describes the size of a difference on a standardized metric.
13Topic 60 Effect Size and Correlation (r)
- In addition to d, a number of other measures of effect size have been proposed. One that is very widely reported is effect-size r, which is simply the Pearson correlation coefficient (r) described in Topic 53. As outlined in that topic, r indicates the direction and strength of a relationship between two variables, expressed on a scale that ranges from -1.00 to 1.00, where 0.00 indicates no relationship. Values of r are interpreted by first squaring them (r²).
- For example, when r = 0.50, r² = 0.25 (0.50 × 0.50 = 0.25). Then, the value of r² should be multiplied by 100. Thus, 0.25 × 100 = 25%. This indicates that the value of r of 0.50 is 25% greater than 0.00 on a scale that extends up to a maximum possible value of 1.00.
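The squaring-and-percentage step can be sketched as a small Python helper; the name `percent_from_r` is illustrative. Because r is squared, a negative correlation of the same strength gives the same percentage:

```python
def percent_from_r(r: float) -> float:
    """Square r and multiply by 100, as described above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and 1.00")
    return r * r * 100


print(f"{percent_from_r(0.50):.0f}%")   # 25%
print(f"{percent_from_r(-0.50):.0f}%")  # 25%
```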
14Topic 60 Effect Size and Correlation (r)
- In basic studies, the choice between reporting values of d (which can range from -3.00 to 3.00) and reporting correlation coefficients with the associated values of r² (which can range from 0.00 to 1.00) is usually quite straightforward. If a researcher wants to determine which of two groups is superior on average, a comparison of means using d is usually the preferred method of analysis.
- On the other hand, if there is one group of participants with two scores per participant and the goal is to determine the degree of relationship between the two sets of scores, then r and r² should be used. For instance, if a vocabulary knowledge test and a reading comprehension test were administered to a group of students, it would not be surprising to obtain a correlation coefficient as high as 0.70, which indicates a substantial degree of relationship between the two variables (i.e., there is a strong tendency for students who score high on vocabulary knowledge to score high on reading comprehension).
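For one group with two scores per participant, r can be computed with the standard Pearson formula. The sketch below implements that formula directly in Python (Python 3.10 and later also provide `statistics.correlation`). The five pairs of scores are hypothetical and serve only as an illustration; they are not data from the text:

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient for paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one set of scores does not vary")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for five students (illustrative only).
vocabulary = [12, 15, 18, 20, 25]
reading = [30, 34, 33, 41, 44]
r = pearson_r(vocabulary, reading)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```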
- As described in Topic 53, for interpretive purposes, 0.70 squared equals 0.49, which is equivalent to 49%. Knowing this allows a researcher to say that the relationship between the two variables is 49% higher than a relationship of 0.00.
15Topic 60 Effect Size and Correlation (r)
- When reviewing the body of literature on a given topic, one may find that some studies present means and values of d while other studies on the same topic present values of r, depending on their specific research purposes and designs. When interpreting such a set of studies, it can be useful to think in terms of the equivalent values of d and r. Table 1 shows the equivalents for selected values.
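One widely used conversion between the two statistics, which assumes two groups of equal size, is r = d / sqrt(d² + 4) and, in the other direction, d = 2r / sqrt(1 - r²). A Python sketch follows; the function names are illustrative, and the values it prints may differ slightly from those in Table 1 if the table was built with a different formula:

```python
from math import sqrt


def d_to_r(d: float) -> float:
    """Convert d to r, assuming two groups of equal size."""
    return d / sqrt(d * d + 4)


def r_to_d(r: float) -> float:
    """Convert r to d (the inverse of d_to_r)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1.00 and 1.00")
    return 2 * r / sqrt(1 - r * r)


for d in (0.20, 0.50, 0.80, 1.25):
    print(f"d = {d:.2f}  ->  r = {d_to_r(d):.2f}")
```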
16Topic 61 Intro to Meta-Analysis
- Meta-analysis is a set of statistical methods for combining the results of previous studies.
- Meta-analysis provides a statistical method that can synthesize multiple studies on a given topic.
- The differences in the results of the studies contained in a meta-analysis are subject to many types of error, such as:
- Random sampling errors
- Random errors of measurement
- Systematic errors known to one or more of the researchers
- Systematic errors of which the researchers are unaware
- The results of any one experiment should be interpreted with caution.
- The main focus of a meta-analysis is a mathematical synthesis of the statistical results of the studies included in the analysis.
- For example, the synthesis can be obtained by averaging the four mean differences shown next.
17Example Results of Meta-Analysis of Four Experiments
Researcher   Experimental Group   Control Group   Mean Difference
W            m = 22.00            m = 19.00       mdiff = 3.00
X            m = 20.00            m = 18.00       mdiff = 2.00
Y            m = 23.00            m = 17.00       mdiff = 6.00
Z            m = 15.00            m = 16.00       mdiff = -1.00
- The best estimate of the effectiveness of the program is 2.50 points (the average of the four mean differences), based on a combined sample of 400 students.
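The averaging step can be checked with a short Python sketch. It uses a simple unweighted average of the four mean differences, as in the example:

```python
from statistics import mean

# Mean differences (experimental minus control) for researchers W, X, Y, Z.
mean_differences = {"W": 3.00, "X": 2.00, "Y": 6.00, "Z": -1.00}

best_estimate = mean(list(mean_differences.values()))
print(f"Average mean difference: {best_estimate:.2f} points")  # 2.50 points
```

This simple average is meaningful here only because all four studies report results in the same raw-score units.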
18Two Important Characteristics of Meta-Analysis
- Statistics based on larger samples yield more reliable results.
- It is important to remember that more reliable results do not necessarily mean more valid results.
- A systematic bias that skews the results will yield invalid outcomes no matter how big the sample size is.
- Meta-analysis typically synthesizes the results of studies conducted by independent researchers.
- Since the researchers are not working together, if one researcher makes an error, the effects of his or her erroneous results will be moderated when they are averaged with the other results.
19Topic 62 Meta-Analysis and Effect Size
- In a meta-analysis, it is difficult to find even one perfectly strict replication of a study, because different researchers frequently use different measures of the same variable.
- For example, Experimenter A used a test with possible scores from 200 to 800, while Experimenter B used a test with possible scores from 0 to 50.
Experiment   N    Experimental Group         Control Group              Mean Difference
Exp. A       50   m = 500.00, sd = 200.00    m = 400.00, sd = 200.00    100.00
Exp. B       50   m = 24.00, sd = 3.00       m = 22.00, sd = 3.00       2.00
- d: divide the mean difference by the standard deviation (sd).
- Exp. A: d = 100.00/200.00 = 0.50
- Exp. B: d = 2.00/3.00 = 0.67 (a larger effect than Exp. A)
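The same calculation can be sketched in Python directly from the group means. In both experiments the two groups share the same standard deviation, so the sketch simply takes that one value; with unequal standard deviations a pooled value is a common choice, which is an assumption not covered by the example. The function name is illustrative:

```python
def d_from_means(mean_e: float, mean_c: float, sd: float) -> float:
    """d = (me - mc) / sd, with sd the standard deviation shared by both groups."""
    return (mean_e - mean_c) / sd


# (experimental mean, control mean, sd) from the table above.
studies = {
    "Exp. A": (500.00, 400.00, 200.00),
    "Exp. B": (24.00, 22.00, 3.00),
}
for name, (m_e, m_c, sd) in studies.items():
    d = d_from_means(m_e, m_c, sd)
    print(f"{name}: mean difference = {m_e - m_c:.2f}, d = {d:.2f}")
# Exp. A: mean difference = 100.00, d = 0.50
# Exp. B: mean difference = 2.00, d = 0.67
```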
20Topic 62 Meta-Analysis and Effect Size
- In the previous example, an average of the two mean differences would lack meaning because the results are expressed on different scales.
- The answer to this problem is to use a measure of effect size, such as Cohen's d, which is expressed on a standardized scale that ranges from -3.00 to 3.00.
- Calculating d for all studies and then averaging the values of d yields a meaningful result.
- Once you have this result, you can gauge the size of the combined effect by comparing it to Table 1 of Topic 59.
- r is also expressed on a standardized scale, ranging from -1.00 to 1.00.
- Values of r can also be averaged, with the average weighted to take into account varying sample sizes.
- Consumers of research should check whether a meta-analysis is based on weighted averages, which is always desirable.
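The source does not say how the weights are computed. A minimal Python sketch, assuming each study's effect size is simply weighted by its sample size N, looks like this; formal meta-analyses often use other schemes (for example, inverse-variance weights, or averaging r after Fisher's z transformation), which are not shown here. The function name is illustrative:

```python
def weighted_mean(values: list[float], sample_sizes: list[int]) -> float:
    """Average effect sizes, weighting each study by its sample size (an assumption)."""
    if len(values) != len(sample_sizes) or not values:
        raise ValueError("need one sample size per value")
    total_n = sum(sample_sizes)
    return sum(v * n for v, n in zip(values, sample_sizes)) / total_n


# Values of d from Exp. A and Exp. B in the previous example (N = 50 each).
d_values = [100.00 / 200.00, 2.00 / 3.00]
print(f"{weighted_mean(d_values, [50, 50]):.2f}")  # 0.58
```

With equal sample sizes the weighted and unweighted averages coincide; the weights matter only when the studies differ in size.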
21Topic 63 Meta-Analysis Strengths and Weaknesses
- Strengths
- Meta-analysis produces results based on large combined samples, and such large samples yield very reliable results (although the results may lack validity if the meta-analysis contains serious methodological flaws).
- It can be used to synthesize the results of studies conducted by independent researchers.
- Meta-analysis produces objective conclusions (the results are obtained mathematically).
- It demonstrates what can be concluded objectively, which can be compared and contrasted with more subjective qualitative literature reviews on the same research topic.
22Topic 63 Meta-Analysis Strengths and Weaknesses
- Weaknesses
- A researcher may not be careful in selecting studies to include in a meta-analysis, which can lead to results that are difficult to interpret or even meaningless.
- Moderator variable: a variable on which the studies are divided into subgroups so that separate analyses can be conducted for the various subgroups.
- A moderator variable moderates the results, so that the results for the subgroups differ from the grand combined result.
- Publication bias
- The body of published research available on a topic for a meta-analysis might be biased toward studies that have statistically significant results.
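To make the idea of a moderator variable concrete, here is a Python sketch with entirely hypothetical values of d and a hypothetical moderator (school level). The point is only that subgroup averages can differ from the grand combined result:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical studies: (value of d, moderator category). Illustrative only.
studies = [
    (0.60, "elementary"),
    (0.70, "elementary"),
    (0.10, "secondary"),
    (0.20, "secondary"),
]

# Grand combined result across all studies.
grand_mean = mean(d for d, _ in studies)

# Separate analysis for each subgroup defined by the moderator.
subgroups: dict[str, list[float]] = defaultdict(list)
for d, group in studies:
    subgroups[group].append(d)

print(f"All studies: d = {grand_mean:.2f}")
for group, ds in subgroups.items():
    print(f"{group}: d = {mean(ds):.2f}")
```

In this made-up case the grand average of 0.40 hides two quite different subgroup averages, which is why a moderator variable can make a single combined result misleading.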