Title: Statistical Confirmations of Steroid Use?
1. Statistical Confirmations of Steroid Use?
- Andy Dolphin
- Raytheon Company
- August 6, 2008
2. Outline
- Background
- Adjusting Statistics
- Quantifying Players' Abilities
- The Mitchell Report Sample
- Culling the Statistical Records
3. Background
- Astronomy
- Stellar populations in nearby galaxies
- Data analysis techniques
- Sports
- Analysis and prediction of team performances
- Baseball player projections and analysis
- Coauthor of The Book: Playing the Percentages in Baseball
- Consultant for the Cleveland Indians
4. Adjusting Player Stats
- We need a way to determine whether a player's performance has improved or degraded.
- Critical aspect:
- We don't care if a player's performance is better than in other years.
- We do care if his performance is better than would be expected.
5. Adjusting Player Stats
- Factors affecting a player's performance:
- Age
- Home ballpark
- Strength of league
- Usage (relief vs. starting pitchers)
- Teammates (players do not face them)
- For consistency, player statistics are adjusted to age 25 and to the strength of the NL in each player's rookie season.
6. Raw vs. Adjusted Stats
7. Year-to-Year Correlations
- By comparing adjusted metrics over many seasons, one can determine how much players deviate from the average career trajectory.
- For both hitters and pitchers, multiplying the number of PAs by 0.9^Δyear (where Δyear is the number of years between seasons) gives fairly constant prediction accuracy.
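The weighting described above can be sketched as follows. This is a minimal illustration, not the presenter's code: it assumes the factor is 0.9 raised to the number of years separating a season from the one being predicted, and the function names, the sample PA totals, and the per-PA rates are all hypothetical.

```python
def season_weights(pa_by_year, target_year, decay=0.9):
    """Weight each other season's PA by decay ** |year - target_year| (assumed form)."""
    return {
        year: pa * decay ** abs(year - target_year)
        for year, pa in pa_by_year.items()
        if year != target_year
    }


def weighted_baseline(rate_by_year, pa_by_year, target_year, decay=0.9):
    """Expected rate for target_year: weighted mean of the other seasons' rates."""
    weights = season_weights(pa_by_year, target_year, decay)
    total = sum(weights.values())
    return sum(rate_by_year[year] * w for year, w in weights.items()) / total


# Hypothetical player: plate appearances and an adjusted per-PA metric by season.
pa = {2001: 600, 2002: 550, 2003: 620, 2004: 580}
rate = {2001: 0.030, 2002: 0.034, 2003: 0.028, 2004: 0.040}

weights_2004 = season_weights(pa, 2004)
baseline_2004 = weighted_baseline(rate, pa, 2004)
```

Seasons closer to the target year count for more, so a 620-PA season one year away carries more weight than a 600-PA season three years away.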
8. Raw vs. Adjusted Stats
9. Characterizing Player Ability
- Need a metric that includes the entire effect on a game's outcome.
- For example, OBP considers a walk and a home run as equals.
- Solution: each outcome is scored based on its average effect on the team's winning probability, relative to an out.
- A single is worth about 0.07 wins.
- This metric tends to be about 1/10 of batting average.
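A small sketch of how such a win-value metric could be computed. Only the value of a single (about 0.07 wins) comes from the slide. The other weights and the sample batting line are placeholder assumptions chosen for illustration.

```python
# Win value of each outcome relative to an out.
# Only "single" is from the slide; every other value is a hypothetical placeholder.
WIN_VALUE = {
    "out": 0.0,
    "walk": 0.06,
    "single": 0.07,
    "double": 0.10,
    "triple": 0.13,
    "home_run": 0.17,
}


def wins_per_pa(outcomes):
    """Average win value per plate appearance for a dict of outcome counts."""
    total_pa = sum(outcomes.values())
    total_value = sum(WIN_VALUE[name] * count for name, count in outcomes.items())
    return total_value / total_pa


def on_base_percentage(outcomes):
    """Simplified OBP: every non-out counts the same, walk or home run alike."""
    total_pa = sum(outcomes.values())
    return (total_pa - outcomes["out"]) / total_pa


# Hypothetical 600-PA season.
season = {"out": 400, "walk": 60, "single": 95, "double": 25, "triple": 3, "home_run": 17}

metric = wins_per_pa(season)
obp = on_base_percentage(season)
```

Unlike OBP, swapping a walk for a home run changes this metric, because each outcome carries its own weight.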
10. Characterizing Player Ability
- Need a metric that is indicative of a player's ability.
- For example, a pitcher's win-loss record depends heavily on run support and fielding.
- Solution: adjust outcome rates to reflect the degree to which they are indicative of a player's abilities (regression toward the mean).
- For example, a hitter retains about 40% of his single-hitting rate from year to year, compared with under 20% for a pitcher.
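Regression toward the mean can be illustrated with a simple shrinkage formula. This is a sketch under stated assumptions: it treats the year-to-year retention figures as shrinkage factors (about 0.40 for hitters, and 0.20 as an upper bound for pitchers), and the league-average and observed singles rates are hypothetical.

```python
def regress_to_mean(observed, league_mean, retention):
    """Keep only `retention` of the deviation from the league mean."""
    return league_mean + retention * (observed - league_mean)


# Hypothetical singles per plate appearance.
LEAGUE_SINGLE_RATE = 0.155
observed_rate = 0.185

# Retention values taken from the slide; 0.20 is an upper bound for pitchers.
hitter_estimate = regress_to_mean(observed_rate, LEAGUE_SINGLE_RATE, 0.40)
pitcher_estimate = regress_to_mean(observed_rate, LEAGUE_SINGLE_RATE, 0.20)
```

The same observed rate is pulled further toward the league mean for a pitcher than for a hitter, reflecting that singles say less about a pitcher's ability.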
11. Player Career Trajectories
- Selected players listed in the Mitchell Report
12. Career Trajectory: Roger Clemens
13. Career Trajectory: Andy Pettitte
14. Career Trajectory: Barry Bonds
15. Career Trajectory: Rafael Palmeiro
16. Mitchell Report Sample
- The Mitchell Report identified players suspected of steroid use, as well as specific years in which purchases could be tracked.
- Do players show better performance in these seasons, compared with their career baseline?
- Do players listed show more deviation than average over their careers?
17. Mitchell Report: Single Seasons
- 32 hitters played 63 seasons with at least 300 PA
- Average improvement: 3.4% ± 1.2%
- The only statistically significant sample came from the BALCO-tied players, who averaged about a 10% increase in production.
- 16 pitchers played 35 seasons with at least 300 PA
- Average improvement: 3.3% ± 1.5%
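The significance of these improvements can be checked with a short calculation. The sketch below assumes each pair of numbers is an estimate followed by its one-sigma uncertainty, and that the errors are roughly Gaussian. The function names are illustrative.

```python
from math import erfc, sqrt


def sigma_level(estimate, uncertainty):
    """Number of standard deviations separating the estimate from zero."""
    return estimate / uncertainty


def one_sided_p_value(z):
    """Probability of a Gaussian fluctuation at least z sigma above zero."""
    return 0.5 * erfc(z / sqrt(2.0))


# Average improvement and uncertainty, read from the slide as estimate and 1-sigma error.
hitters_z = sigma_level(3.4, 1.2)
pitchers_z = sigma_level(3.3, 1.5)

hitters_p = one_sided_p_value(hitters_z)
pitchers_p = one_sided_p_value(pitchers_z)
```

Under these assumptions both groups land between 2 and 3 sigma above zero.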
18. Mitchell Report: Single Seasons
- Problems with this analysis:
- The Mitchell Report specifically identifies years in which players purchased drugs from particular sources, not the entire time of use.
- Significant performance swings can be masked by the statistical uncertainties, even with a full season of data.
- There is likely a correlation between injuries and steroid usage that needs to be accounted for.
19. Mitchell Report: Careers
- Do players listed in the Mitchell Report have larger-than-average variation from the typical career trajectory?
- Hitters: variation/avg = 1.10 ± 0.08
- Again, the only statistically significant sample comes from the BALCO players.
- Pitchers: variation/avg = 1.09 ± 0.12
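As a worked check, and assuming the two numbers in each ratio are an estimate and its one-sigma uncertainty, the distance from the no-effect value of 1 (listed players varying exactly as much as the average player) is:

```latex
z_{\text{hitters}} = \frac{1.10 - 1}{0.08} = 1.25,
\qquad
z_{\text{pitchers}} = \frac{1.09 - 1}{0.12} = 0.75
```

Both are below 2 sigma, which fits the statement that the full samples are not statistically significant.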
20. Spotting Unusual Players from the Statistical Record
- Instead of looking at specific players for signs of improvement, what if we look for players based on unusually large deviations from the average career trajectory?
- This helps avoid selection biases.
- Three-year period with performance significantly better than the career average and the previous three years
- Significant dispersion compared with the average career trajectory.
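The first criterion above could be screened for roughly as follows. This is a hedged sketch, not the presenter's method: the significance threshold, the constant per-season uncertainty, the inclusion of the window itself in the career average, and the sample data are all assumptions, and the dispersion criterion is not implemented.

```python
from statistics import mean


def flag_windows(perf, sigma, threshold=3.0, width=3):
    """Return (first_year, last_year) windows that beat both baselines.

    perf: dict of year -> adjusted metric, for consecutive seasons.
    sigma: single-season uncertainty, assumed constant across seasons.
    """
    years = sorted(perf)
    career = mean(perf.values())
    window_err = sigma / width ** 0.5  # uncertainty of a width-season mean
    flagged = []
    for i in range(width, len(years) - width + 1):
        previous = mean(perf[y] for y in years[i - width:i])
        current = mean(perf[y] for y in years[i:i + width])
        vs_career = (current - career) / window_err
        # Difference of two window means: uncertainties add in quadrature.
        vs_previous = (current - previous) / (window_err * 2 ** 0.5)
        if vs_career >= threshold and vs_previous >= threshold:
            flagged.append((years[i], years[i + width - 1]))
    return flagged


# Hypothetical player: flat career with a three-year spike in 2001-2003.
perf = {year: 0.025 for year in range(1995, 2001)}
perf.update({2001: 0.040, 2002: 0.040, 2003: 0.040, 2004: 0.026})

flagged = flag_windows(perf, sigma=0.004)
```

Because the screen is applied to every player with the same rule, it does not depend on who was named in any report, which is the point about selection bias.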
21. Players with Unusual Profiles, 1975-2007
- Batters
- 1975-1977 Rod Carew
- 1996-2000 Ken Caminiti
- 2000-2002 Jason Giambi
- 2000-2002 Sammy Sosa
- 2001-2003 Bret Boone
- 2001-2004 Barry Bonds
- Pitchers
- 1986-1989 Mike Scott
- 1993-1998 Greg Maddux
- 1999-2003 Pedro Martinez
- 2002-2004 Jason Schmidt
- Baseline of 1 positive per decade prior to 2000
- The 2000s appear to be a very different era
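One rough way to gauge how different the 2000s are is a Poisson tail probability. The list above contains five periods that begin in 2000 or later (Giambi, Sosa, Boone, Bonds, Schmidt). Treating positives as Poisson events at a rate of 1 per decade, and applying that rate to batters and pitchers combined, are modeling assumptions made here for illustration and are not from the talk.

```python
from math import exp, factorial


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(exp(-lam) * lam ** i / factorial(i) for i in range(k))


# Assumed baseline rate: 1 positive per decade (from the slide).
# Observed: 5 listed periods beginning in 2000 or later.
p_five_or_more = poisson_tail(5, 1.0)
```

Under this simple model, five or more positives in a decade would be expected well under 1% of the time.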
22. MLB Average Dispersions from Baseline
(Figure: average dispersions from baseline, with panels for batters and pitchers.)
- Overall, players show average tendencies.
23. Summary
- The player seasons implicated for steroid use in the Mitchell Report were better than career baseline at about the 2-sigma level.
- A large number of significant deviations from career baseline occurred over the last 10 years, especially among hitters.
- League-wide, players are generally within historical norms of baseline performance; thus it is unlikely that a large number of players are achieving a significant benefit from steroid use.