Title: Simpson's Paradox
1Simpson's Paradox
- Michael Kuykindall
- Faculty Advisor: Engin Sungur
2Outline
- Introduction
- Simpson's Paradox
- Data
- Results
- Conclusion
3What is Simpson's Paradox?
- Simpson's Paradox occurs when an association
between two variables is reversed once a third
variable is taken into account.
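4According to Colin R. Blyth, Simpson's Paradox can
be defined mathematically as follows
- Writing $B'$ and $C'$ for the complements of $B$ and $C$, it is possible to have
- $P(A \mid B) < P(A \mid B')$
- while at the same time
- $P(A \mid BC) \ge P(A \mid B'C)$
- $P(A \mid BC') \ge P(A \mid B'C')$.
- This is important because
- $P(A \mid B)$ = an average of $P(A \mid BC)$ and $P(A \mid BC')$,
- $P(A \mid B')$ = an average of $P(A \mid B'C)$ and $P(A \mid B'C')$,
- which is easily seen to be true when all the
conditional events have positive probabilities:
- $P(A \mid B) = P(C \mid B)\,P(A \mid BC) + P(C' \mid B)\,P(A \mid BC')$
- $P(A \mid B') = P(C \mid B')\,P(A \mid B'C) + P(C' \mid B')\,P(A \mid B'C')$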
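The weighted-average identity follows from the law of total probability. A short derivation, writing a prime for the complement of an event and assuming that $P(BC)$ and $P(BC')$ are both positive:

```latex
\begin{align*}
P(A \mid B) &= \frac{P(AB)}{P(B)} = \frac{P(ABC) + P(ABC')}{P(B)} \\
            &= \frac{P(BC)}{P(B)} \cdot \frac{P(ABC)}{P(BC)}
             + \frac{P(BC')}{P(B)} \cdot \frac{P(ABC')}{P(BC')} \\
            &= P(C \mid B)\, P(A \mid BC) + P(C' \mid B)\, P(A \mid BC').
\end{align*}
```

The same steps with $B'$ in place of $B$ give the second identity. The weights $P(C \mid B)$ and $P(C \mid B')$ need not be equal, so the two averages can weight the same strata very differently, and that is what makes the reversal possible.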
5Colin R. Blyth's example of Simpson's Paradox
- A doctor was planning to try a new treatment on
patients, mostly local (C) and a few in Chicago
(C'). A statistician advised him to use a table
of random numbers and, as each C patient became
available, assign him to the new treatment with
probability .91 and leave him to the standard
treatment with probability .09, and to do the same
for each C' patient with probabilities .01 and .99
respectively. When the doctor returned with the
data, the statistician told him that the new
treatment was obviously a very bad one, and
criticized him for having continued trying it on
so many patients.
6- The doctor replied that he continued because the
new treatment was obviously a very good one,
having nearly doubled the recovery rate in both
cities.
7- In this example, A = Alive, B = New Treatment, and
C = Local Patient (a prime denotes the complement).
P( ) refers to the probability
for a patient chosen at random from among those
recorded in the table, and coincides with
proportions in that table, now being taken as a
total available population. In that example we
have
- $P(A \mid B) = .11 < P(A \mid B') = .46$
- $P(A \mid BC) = .10 > P(A \mid B'C) = .05$,
- $P(A \mid BC') = .95 > P(A \mid B'C') = .50$.
- The initially surprising fact that an average of
.10 and .95 is so much smaller than an average
of .05 and .50 is easily explained by showing the
numerical values of the weights in
- $.11 = .99(.10) + .01(.95)$
- $.46 = .10(.05) + .90(.50)$
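The two weighted averages can be checked numerically. The sketch below uses illustrative counts that do not appear in the slides; they are an assumption, chosen only to be consistent with the stated assignment probabilities (.91/.09 and .01/.99) and recovery rates. The function names are likewise hypothetical.

```python
# Illustrative counts (an assumption, not taken from the slides).
# Keys: (treatment, location) -> (number alive, number treated)
counts = {
    ("new", "local"): (1000, 10000),
    ("standard", "local"): (50, 1000),
    ("new", "chicago"): (95, 100),
    ("standard", "chicago"): (5000, 10000),
}


def survival_rate(treatment, location=None):
    """P(alive | treatment), or P(alive | treatment, location) if location is given."""
    cells = [
        cell
        for (t, loc), cell in counts.items()
        if t == treatment and (location is None or loc == location)
    ]
    alive = sum(a for a, _ in cells)
    total = sum(n for _, n in cells)
    return alive / total


def location_weight(treatment, location):
    """P(location | treatment): the weight used in the weighted average."""
    treated = sum(n for (t, _), (_, n) in counts.items() if t == treatment)
    return counts[(treatment, location)][1] / treated


for treatment in ("new", "standard"):
    overall = survival_rate(treatment)
    weighted = sum(
        location_weight(treatment, loc) * survival_rate(treatment, loc)
        for loc in ("local", "chicago")
    )
    print(f"{treatment:8s} overall={overall:.2f} weighted average={weighted:.2f}")
```

With these counts the new treatment has the higher survival rate within each city, yet the lower rate overall, because almost all of its patients come from the low-survival local group.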
8[Diagram: Survival rate, Treatment, Patient Location]
9Smokers Example
- In England a study was conducted to examine the
survival rates of smokers and non-smokers. The
result implied a significant positive association
between smoking and survival, because only 24%
of smokers died as compared to 31% of
non-smokers. When the data were broken down by
age group in a contingency table, it was found
that there were more older people in the
non-smoker group. Thus age played a very
significant role in the outcome, but since it was
overlooked the researchers were left with
misleading results (Appleton & French, 1996).
10[Diagram: Survival rate, Smoker or Non Smoker, Age Group]
11Death Penalty Example
- Effects of racial characteristics on whether
individuals convicted of homicide receive the
death penalty.
- Variables: death penalty verdict, having
categories yes and no; the race of the defendant
and the race of the victim, each having
categories European American or African American.
- Data: 326 defendants were recorded as being
indicted for homicide in 20 Florida counties
during 1976-1977.
12Frequencies for Death Penalty Verdict and
Defendant's Race
- About 12% of European American defendants and
about 10% of African American defendants received the
death penalty. Ignoring victim's race, the
percentage of yes death penalty verdicts was
lower for African Americans than for European
Americans.
13Death Penalty Verdict by Defendant's Race and
Victim's Race
- When the victim is European American, the death
penalty was imposed about 5 percentage points
more often for African American defendants than
for European American defendants. When the
victim is African American, the death penalty was
imposed over 5 percentage points more often for
African American defendants than
for European American defendants.
14Death Penalty Verdict by Defendant's Race and
Victim's Race
- Controlling for victim's race, the percentage of
yes death penalty verdicts was higher for African
Americans than for European Americans. The
direction of the association is reversed.
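The reversal can be reproduced from a three-way table of counts. The cell counts in the sketch below are an assumption: they do not survive in the slide text and are recalled from the table in Agresti (1990). They do sum to the 326 defendants and reproduce the percentages quoted in these slides. EA and AA abbreviate European American and African American; `pct_yes` is a hypothetical helper name.

```python
# Assumed cell counts (recalled from Agresti 1990; not present in the slide text).
# Keys: (defendant's race, victim's race) -> (death penalty yes, death penalty no)
counts = {
    ("EA", "EA"): (19, 132),
    ("EA", "AA"): (0, 9),
    ("AA", "EA"): (11, 52),
    ("AA", "AA"): (6, 97),
}


def pct_yes(defendant, victim=None):
    """Percent receiving the death penalty, optionally within one victim's race."""
    cells = [
        cell
        for (d, v), cell in counts.items()
        if d == defendant and (victim is None or v == victim)
    ]
    yes = sum(y for y, _ in cells)
    total = sum(y + n for y, n in cells)
    return 100 * yes / total


print("Ignoring victim's race:")
for defendant in ("EA", "AA"):
    print(f"  defendant {defendant}: {pct_yes(defendant):.1f}%")

for victim in ("EA", "AA"):
    print(f"Victim {victim}:")
    for defendant in ("EA", "AA"):
        print(f"  defendant {defendant}: {pct_yes(defendant, victim):.1f}%")
```

Within each victim group the percentage is higher for African American defendants, while the percentages that ignore victim's race point the other way.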
15Odds Ratios for Death Penalty (P), Victim's Race
(V), and Defendant's Race (D)
- The estimated odds of the death penalty were 1.18
times as high for European American defendants as
for African American defendants. But when the
victim was European American, the estimated odds
of the death penalty were .67 times as high for
European American defendants as for African
American defendants; when the victim was African
American, the estimated odds were .79 times as
high for European American defendants as for
African American defendants.
16- The odds of having killed a European American are
estimated to be 25.99 times higher for European
American defendants than for African American
defendants.
- The odds ratios relating death penalty verdict
and victim's race indicate the death penalty was
more likely when the victim was European American
than when the victim was African American.
- So European Americans tend to kill other European
Americans, and killing a European American is
more likely to result in the death penalty.
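The odds ratios quoted above can be recomputed from 2x2 tables. Two assumptions are made in this sketch. First, the cell counts are recalled from Agresti (1990) and are not in the slide text. Second, 0.5 is added to every cell before the ratio is taken; the slides do not state this, but the reported values (1.18, .67, .79, 25.99) appear consistent with it. The helper `odds_ratio` is a hypothetical name.

```python
def odds_ratio(a, b, c, d, correction=0.5):
    """Odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]].

    `correction` is added to every cell first, which keeps the ratio
    defined when a cell count is zero.
    """
    a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


# Rows: EA defendant, AA defendant. Columns: death penalty yes, no.
marginal = odds_ratio(19, 141, 17, 149)     # ignoring victim's race
ea_victim = odds_ratio(19, 132, 11, 52)     # victim European American
aa_victim = odds_ratio(0, 9, 6, 97)         # victim African American

# Rows: EA defendant, AA defendant. Columns: EA victim, AA victim.
defendant_victim = odds_ratio(151, 9, 63, 103)

print(f"marginal          {marginal:.2f}")
print(f"EA victim         {ea_victim:.2f}")
print(f"AA victim         {aa_victim:.2f}")
print(f"defendant-victim  {defendant_victim:.2f}")
```

The marginal odds ratio is above 1 while both conditional odds ratios are below 1, which is the reversal; the large defendant-victim odds ratio shows how strongly the two race variables are associated.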
17Percent Receiving Death Penalty
- Percent receiving death penalty by defendant's
race, controlling and ignoring victim's race.
- Each observation is represented by a letter
giving the level of the victim's race.
- Surrounding each observation is a circle having
area proportional to the number of observations
at that combination of defendant's race and
victim's race.
- The largest circles occur when European Americans
kill other European Americans or African Americans
kill other African Americans.
- These cause the marginal results whereby European
Americans are more likely to receive the death
penalty.
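[Figure: percent receiving the death penalty (vertical axis, 0 to 20) by
defendant's race (European American, African American). Points labeled EA
or AA give the victim's race; x = marginal effect of defendant's race,
ignoring victim's race.]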
18[Diagram: Death Penalty, Race of Defendant, Race of Victim]
19NLSY79 Example
- Data: The National Longitudinal Survey Handbook
2001
- The NLSY79 is a nationally representative sample
of 12,686 young men and women who were 14 to 22
years of age when first surveyed in 1979. During
the years since that first interview, these young
people typically have finished their schooling,
moved out of their parents' homes, made decisions
on continuing education and training, entered
the labor market, served in the military, married,
and started their own families. Data collected
from the NLSY79 respondents chronicle these
changes, providing researchers with a unique
opportunity to study in detail the life course
experiences of a large group of adults
representative of all men and women born in the
late 1950s and early 1960s and living in the
United States when the survey began.
20Variables
- Dependent: Highest grade completed
- Independent: Race (Hispanic, Black, Non-Black
Non-Hispanic), Highest grade completed by mother
21Analysis Tools
- SAS ANOVA Linear Model (calculate parameter
estimates; hypothesis test type for F test: Type
III)
- SAS Graphs: Box Plots
22Highest grade completed by subjects
- Dependent Variable: HIGHEST GRADE COMPLETED (REV) 1998

Parameter     Estimate          Standard Error   t Value   Pr > |t|
Intercept     13.52547170 B     0.03722366       363.36
Hispanic 1    -1.13028058 B     0.07076467       -15.97
Black 2       -0.69496322 B     0.06083837       -11.42
Non B-H 3      0.00000000 B     .                .         .

Source   DF   Type III SS    Mean Square   F Value   Pr > F
Race     2    1757.477371    878.738686    149.57
23Highest grade completed by subjects and their
mothers
- Dependent Variable: HIGHEST GRADE COMPLETED (REV) 1998

Parameter     Estimate          Standard Error   t Value   Pr > |t|
Intercept     9.804807070 B     0.10941378       89.61
Hispanic      0.168059825 B     0.07579780       2.22      0.0266
Black         -0.297132340 B    0.05879723       -5.05
Non B-H       0.000000000 B     .                .         .
HGC By Mom    0.316727781       0.00870516       36.38

Source       DF   Type III SS    Mean Square    F Value   Pr > F
Race         2    216.301255     108.150627     21.84
HGC By MOM   1    6556.118704    6556.118704    1323.79
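One way to read the two sets of parameter estimates is to turn them into fitted values. The sketch below copies the estimates from the two SAS outputs above and treats Non B-H as the reference level (effect 0). The mother's grade of 12 is a hypothetical value chosen only for illustration, and the function names are not from the original analysis.

```python
# Parameter estimates copied from the two SAS outputs above.
# "Non B-H" is the reference level, so its effect is 0.
RACE_ONLY = {
    "Intercept": 13.52547170,
    "Hispanic": -1.13028058,
    "Black": -0.69496322,
    "Non B-H": 0.0,
}
WITH_MOM = {
    "Intercept": 9.804807070,
    "Hispanic": 0.168059825,
    "Black": -0.297132340,
    "Non B-H": 0.0,
    "HGC By Mom": 0.316727781,
}


def predict_race_only(race):
    """Fitted highest grade completed from the race-only model."""
    return RACE_ONLY["Intercept"] + RACE_ONLY[race]


def predict_with_mom(race, mom_grade):
    """Fitted highest grade completed, holding mother's grade fixed."""
    return (
        WITH_MOM["Intercept"]
        + WITH_MOM[race]
        + WITH_MOM["HGC By Mom"] * mom_grade
    )


MOM_GRADE = 12  # hypothetical value, for illustration only

for race in ("Hispanic", "Black", "Non B-H"):
    print(
        f"{race:8s} race only: {predict_race_only(race):.2f}   "
        f"mother at grade {MOM_GRADE}: {predict_with_mom(race, MOM_GRADE):.2f}"
    )
```

In the race-only model the Hispanic fitted value is below the Non B-H value; once mother's highest grade is held fixed, the Hispanic estimate changes sign and the fitted value is above it, while the Black estimate stays negative but shrinks.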
26[Diagram: Race, Child's Education, Mother's Education]
27Conclusion
- Simpson's paradox is a rare phenomenon. Statisticians
must therefore be trained well enough, both
academically and ethically, to detect and correct
it when it does occur. This is where practice,
repetition, and critical thinking skills come
into play.
28Sources
- Agresti, Alan. Categorical Data Analysis. John
Wiley & Sons, Inc., Canada, 1990, pp. 135-138.
- Blyth, Colin R. Journal of the American
Statistical Association, Vol. 67, No. 338 (Jun.,
1972), pp. 364-366.
29Acknowledgements
- Engin Sungur
- Jon Anderson
- Laura Argys
- Josephine Myers-Kuykindall