Title: LSP 121
1LSP 121
2Simpson's Paradox
- It is widely accepted that the larger the data set, the better the results
- Simpson's Paradox demonstrates that a great deal of care has to be taken when combining smaller data sets into a larger one
- Sometimes the conclusion from the larger data set is the opposite of the conclusion from the smaller data sets
3Example: Simpson's Paradox
Baseball batting statistics for two players
           First Half   Second Half   Total Season
Player A   .400         .250          .264
Player B   .350         .200          .336
How could Player A beat Player B for both halves
individually, but then have a lower total season
batting average?
4Example Continued
We weren't told how many at bats each player had
           First Half       Second Half      Total Season
Player A   4/10 (.400)      25/100 (.250)    29/110 (.264)
Player B   35/100 (.350)    2/10 (.200)      37/110 (.336)
Player A's dismal second half and Player B's
great first half had higher weights than the
other two values, because they account for 100
of each player's 110 at bats.
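The season totals can be checked with a short Python sketch. The hits and at bats come straight from the table; the names `records` and `season_average` are made up for this illustration. It shows that the season average is total hits over total at bats, not the average of the two half-season averages.

```python
# (hits, at_bats) for the first and second half, taken from the table
records = {
    "Player A": [(4, 10), (25, 100)],
    "Player B": [(35, 100), (2, 10)],
}

def season_average(halves):
    """Total hits divided by total at bats across all halves."""
    total_hits = sum(hits for hits, _ in halves)
    total_at_bats = sum(at_bats for _, at_bats in halves)
    return total_hits / total_at_bats

for player, halves in records.items():
    per_half = [round(hits / at_bats, 3) for hits, at_bats in halves]
    print(player, per_half, round(season_average(halves), 3))
```

Player A wins each half on its own, yet the half with 100 at bats dominates each player's season figure, so Player B comes out ahead overall.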
5Another Example
Average college physics grades for students in an
engineering program
                     Taken HS physics   No HS physics
Number of Students   50                 5
Average Grade        80                 70
Average college physics grades for students in a
liberal arts program
                     Taken HS physics   No HS physics
Number of Students   5                  50
Average Grade        95                 85
It appears that in both classes, taking high school
physics improves your college physics grade by 10
points.
6Example continued
In order to get better results, let's combine our
datasets. In particular, let's combine all the
students that took high school physics. More
precisely, combine the students in the
engineering program that took high school physics
with those students in the liberal arts program
that took high school physics. Likewise,
combine the students in the engineering program
that did not take high school physics with those
students in the liberal arts program that did
not take high school physics. But be careful!
You can't just take the average of the two
averages, because each dataset has a different
number of values! Each average has to be weighted
by the number of students behind it.
7Example continued
Average college physics grades for students who
took high school physics
              Students   Avg Grade   Weighted Grade
Engineering   50         80          50/55 × 80 = 72.7
Lib Arts      5          95          5/55 × 95 = 8.6
Total         55                     Average (72.7 + 8.6) = 81.3
Average college physics grades for students who
did not take high school physics
              Students   Avg Grade   Weighted Grade
Engineering   5          70          5/55 × 70 = 6.4
Lib Arts      50         85          50/55 × 85 = 77.3
Total         55                     Average (6.4 + 77.3) = 83.7
Did the students that did not have high school
physics actually do better?
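The combined averages can be reproduced with a small Python sketch. The student counts and average grades are the ones from the tables; the names `groups`, `weighted_average`, and `naive_average` are hypothetical helpers for this illustration. Computed without rounding the intermediate terms, the results are about 81.4 and 83.6, which agrees with the slide values to within rounding.

```python
# (number of students, average grade) per program, taken from the tables
groups = {
    "took HS physics": [(50, 80), (5, 95)],   # Engineering, Lib Arts
    "no HS physics":   [(5, 70), (50, 85)],   # Engineering, Lib Arts
}

def weighted_average(parts):
    """Average of group averages, weighted by group size."""
    total_students = sum(n for n, _ in parts)
    return sum(n * avg for n, avg in parts) / total_students

def naive_average(parts):
    """Plain average of the averages, ignoring group size (misleading here)."""
    return sum(avg for _, avg in parts) / len(parts)

for label, parts in groups.items():
    print(label,
          "weighted:", round(weighted_average(parts), 1),
          "naive:", round(naive_average(parts), 1))
```

The naive average of averages gives 87.5 versus 77.5, while the correctly weighted figures reverse the order, because each combined group is dominated by a different program.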
8The Problem
- Two problems with combining the data
  - There was a larger percentage of one type of student in each table
  - The engineering students had a more rigorous physics class than the liberal arts students, thus there is a hidden variable
- So be very careful when you combine data into a larger set