Title: Analysis of Real-World Data
1Analysis of Real-World Data
- Static Stability Factor and the Risk of Rollover
- April 11, 2001
2References
- Federal Register, June 1, 2000
- Description of the original linear regression
analysis
- Federal Register, January 12, 2001
- Description of the updated linear regression
analysis
- Comparison with logistic regression analysis
3Need to Specify
- Vehicles
- Calendar years
- States
- Crash types
- Variables
- Statistical model
4Criteria for Selecting Vehicles
- Reliable estimate of the Static Stability Factor
(SSF)
- Model years 1988 and later
- Sources include
- Vehicles tested by the agency
- Passenger cars tested by General Motors
5Vehicles Selected
- 100 vehicle model groups, including
- 36 cars
- 30 SUVs
- 13 vans
- 21 pickup trucks
6Criteria for Selecting Calendar Years
- Vehicle Identification Numbers (VINs) for that
year had been decoded and included in the State
Data System (SDS)
- Wanted multiple years to maximize data available
for analysis
7Calendar Years Selected
- 1994-1997 for the original linear regression
analysis
- 1994-1998 for the updated linear regression
analysis and the logistic regression analysis
8Criteria for Selecting States
- Part of the SDS
- Provided 1994-1998 calendar year data
- Include VIN on the crash file
- Identify rollover occurrence even if it is not
the first harmful event in the crash
9States Selected
- Florida
- Maryland
- Missouri
- North Carolina
- Pennsylvania
- Utah
10Other SDS VIN States
- VIN available for fatalities only
- Kansas
- VIN added in 1998
- Georgia
- Incomplete rollover information
- New Mexico
- Ohio
11Criteria for Selecting Crashes
- Single-vehicle crashes of study vehicles
- Excluded crashes with other participants
- Pedestrian, pedalcyclist, animal, or train
- Excluded certain unusual situations
- No driver, parked vehicle, pulling a trailer, or
emergency use (ambulance, fire, police, or
military)
12Crashes Selected
- 241,036 single-vehicle crashes, including
- 48,996 rollovers
- This is 0.20 rollovers per single-vehicle crash,
consistent with the national estimate from the
General Estimates System for these calendar years
and vehicle groups
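The rate quoted above follows directly from the two counts on this slide. A quick check, using only those counts:

```python
# Counts taken from the slide above.
single_vehicle_crashes = 241_036
rollovers = 48_996

# Rollovers per single-vehicle crash.
rollover_rate = rollovers / single_vehicle_crashes
print(f"{rollover_rate:.2f}")  # prints 0.20
```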
13Criteria for Selecting Variables
- Variables describing purpose of study
- Rollover (yes or no)
- SSF (study values range from 1.00 to 1.53)
- Confounding factors
- Environmental and driver factors that describe
how the vehicle was used
- Want variables correlated with rollover risk,
including travel speed
14Variables Selected
- Rollover
- SSF
- Dichotomous variables based on
- Environmental factors (light condition, weather,
urbanization, speed limit, road grade, road
curve, road condition, surface condition)
- Driver factors (sex, age, insurance coverage,
alcohol/drug use)
- Number of occupants in the vehicle
15Summary of Available Data
- Six states
- Five calendar years (1994-1998)
- 100 vehicle groups with a reliable estimate of
SSF
- 14 confounding variables, including
- 10 available in all six states
- 241,036 single-vehicle crashes, including
- 48,996 rollovers
16Limitations
- Pennsylvania dropped key road use variables
(grade and curve) from its electronic file in
1998, so 1998 Pennsylvania data were not used
here
- Some variables were not available for all six
states (urbanization, road condition, insurance
coverage, and number of occupants in vehicle)
- Could not be used in analysis of combined data
- Were used in logistic analysis of individual
states
- Reporting practices vary by state
17Statistical Models
- Linear model of summarized data
- Logistic models of individual crashes
18Preparing Data for the Linear Model
- Limited to state-vehicle groups with at least 25
observations
- 518 state-vehicle groups used in analysis
- Percentage involvement calculated for each
variable, for each state-vehicle group
- Values ranged from 0 to 1
- For example
- Rollover risk described by rollovers per
single-vehicle crash
- Urbanization described by percent of crashes on
rural roads
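The summarization step described on this slide can be sketched roughly as follows. This is a minimal illustration, not the agency's procedure: the record layout (keys `state`, `vehicle_group`, `rollover`, `rural`) and the sample records are hypothetical, and only the 25-observation threshold comes from the slide.

```python
from collections import defaultdict

MIN_OBSERVATIONS = 25  # threshold stated on the slide


def summarize_groups(crashes, min_obs=MIN_OBSERVATIONS):
    """Aggregate crash records into state-vehicle group proportions.

    `crashes` is an iterable of dicts with hypothetical keys 'state',
    'vehicle_group', 'rollover' and 'rural' (the last two are 0 or 1).
    """
    counts = defaultdict(lambda: {"n": 0, "rollover": 0, "rural": 0})
    for crash in crashes:
        cell = counts[(crash["state"], crash["vehicle_group"])]
        cell["n"] += 1
        cell["rollover"] += crash["rollover"]
        cell["rural"] += crash["rural"]

    summary = {}
    for key, cell in counts.items():
        if cell["n"] < min_obs:
            continue  # too few observations for a reliable proportion
        summary[key] = {
            "n": cell["n"],
            "rollover_risk": cell["rollover"] / cell["n"],
            "rural_share": cell["rural"] / cell["n"],
        }
    return summary


# Made-up records for illustration only.
example = (
    [{"state": "MO", "vehicle_group": "A", "rollover": 1, "rural": 1}] * 10
    + [{"state": "MO", "vehicle_group": "A", "rollover": 0, "rural": 0}] * 30
    + [{"state": "UT", "vehicle_group": "B", "rollover": 1, "rural": 1}] * 5
)
result = summarize_groups(example)
```

In this made-up input the second group has only 5 crashes, so it is dropped and only one summary data point remains.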
19Specifying Linear Model Form
- Dependent variable: LOG(rollover risk)
- Rollover risk set at 0.0001 for state-vehicle
groups with no rollovers so they can be included
in model
- Five dummy variables used to capture
state-to-state differences in reporting practices
- Missouri used as baseline case
- Linear regression of the rollover variable as a
function of the summarized explanatory variables
and the state dummy variables
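One way to picture the dependent variable and the state dummies is the sketch below. The function names are hypothetical, and the use of the natural logarithm is an assumption, since the slides do not state the base. The 0.0001 floor, the six states, and the Missouri baseline are taken from the slides.

```python
import math

RISK_FLOOR = 0.0001          # value the slide assigns to groups with no rollovers
BASELINE_STATE = "Missouri"  # baseline case named on the slide
OTHER_STATES = ["Florida", "Maryland", "North Carolina", "Pennsylvania", "Utah"]


def log_rollover_risk(risk):
    """Return log(risk), substituting the floor when no rollovers were observed."""
    # Natural log is an assumption; the slides do not state the base.
    return math.log(risk if risk > 0 else RISK_FLOOR)


def state_dummies(state):
    """Return five 0/1 indicators; the baseline state is all zeros."""
    if state != BASELINE_STATE and state not in OTHER_STATES:
        raise ValueError(f"unknown state: {state}")
    return [1 if state == s else 0 for s in OTHER_STATES]
```

With this coding, each dummy coefficient would measure a state's offset from Missouri in log rollover risk, which is how state-to-state reporting differences can be absorbed.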
20Fitting the Linear Model
- Each summary data point was weighted by the
sample size, capped at 250 as a trade-off between
two considerations
- Sample size affects reliability of estimates
- Model should fit over entire range of SSF
- Stepwise procedure used forward variable
selection and a significance level of 0.15 for
entry and removal from the model
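The capped weighting can be illustrated with a much simpler fit than the one described here. The sketch below assumes a single predictor (SSF) and omits the stepwise selection and the other explanatory variables entirely; the data points are made up, and only the cap of 250 comes from the slide.

```python
WEIGHT_CAP = 250  # cap stated on the slide


def capped_weight(sample_size, cap=WEIGHT_CAP):
    """Weight a summary point by its sample size, up to the cap."""
    return min(sample_size, cap)


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up summary points: SSF, log(rollover risk), crash count.
ssf = [1.05, 1.15, 1.30, 1.45]
log_risk = [-1.0, -1.3, -1.9, -2.3]
sizes = [40, 900, 250, 120]
weights = [capped_weight(n) for n in sizes]  # the 900-crash group counts as 250
intercept, slope = weighted_line_fit(ssf, log_risk, weights)
```

Without the cap, the 900-crash group would dominate the fit. Capping keeps large groups influential while still letting groups elsewhere in the SSF range shape the line.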
21Results of the Linear Model
- Model selected six confounding factors (DARK,
FAST, CURVE, MALE, YOUNG, and DRINK) and all five
state dummies
- R² = 0.88 for the model of rollover risk as a
function of state, road use variables, and SSF
- SSF variable coefficient was
- Important in terms of the size of the estimated
effect
- Highly significant in the model (P < 0.0001)
22Predictions from the Linear Model
- Model describes rollover risk as a function of
the explanatory variables and can be used to
- Estimate rollover risk as a function of the SSF
for any mix of road-use conditions
- Adjust the observed rollover rate for each
summary data point to account for differences in
vehicle use
- Next graph shows results for average conditions
observed in the study data as a whole
- Rollover risk is estimated as 0.20 in both the
adjusted and the unadjusted data
23Fit of Linear Model
24Interpreting the Linear Model
- Estimated rollover risk given a single-vehicle
crash is halved when the SSF increases by 0.21
- For example, a vehicle with an SSF of 1.00 has
twice the estimated rollover risk of a vehicle
with an SSF of 1.21
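Because the model is linear in the log of rollover risk, the halving statement fixes the ratio of risks between any two SSF values. The sketch below works that out. The implied slope is an inference from the 0.21 figure under a natural-log assumption, not a coefficient quoted in the slides.

```python
import math

HALVING_STEP = 0.21  # SSF increase that halves estimated risk, per the slide


def relative_risk(ssf_from, ssf_to, halving_step=HALVING_STEP):
    """Risk at ssf_to divided by risk at ssf_from under a log-linear model."""
    return 0.5 ** ((ssf_to - ssf_from) / halving_step)


# Implied slope of natural-log risk per unit SSF (an inference from the
# halving statement, not a coefficient quoted in the slides).
implied_slope = math.log(0.5) / HALVING_STEP

print(round(relative_risk(1.21, 1.00), 2))  # prints 2.0
print(round(implied_slope, 2))              # prints -3.3
```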
25Specifying Logistic Model Forms
- Variables used
- Individual explanatory variables or
- Scenario risk variable
- Approach used with states
- Model each state, and average the results or
- Model pooled data with dummy variables to capture
state-to-state reporting differences
26Concept of Scenario Risk
- Data divided into cells defined by explanatory
variables
- For each cell, scenario risk is rollovers per
single-vehicle crash
- For each crash, scenario risk is adjusted to
reflect rollovers per single-vehicle crash for
all other crashes in the cell
- Idea is to use scenario risk in the logistic
model in place of all the explanatory variables
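The per-crash adjustment reads as a leave-one-out rate: each crash is assigned the rollover rate of the other crashes in its cell. A minimal sketch under that reading follows. The cell keys and records are made up, and returning `None` for a crash that is alone in its cell is an assumption, since the slides do not say how that case was handled.

```python
from collections import defaultdict


def scenario_risks(crashes):
    """Leave-one-out rollover rate for each crash's cell.

    `crashes` is a list of (cell_key, rollover) pairs, where cell_key is a
    hypothetical tuple of explanatory-variable values and rollover is 0 or 1.
    Returns one value per crash, or None when the crash is alone in its cell.
    """
    totals = defaultdict(lambda: [0, 0])  # cell_key -> [crashes, rollovers]
    for cell_key, rollover in crashes:
        totals[cell_key][0] += 1
        totals[cell_key][1] += rollover

    risks = []
    for cell_key, rollover in crashes:
        n, r = totals[cell_key]
        # Exclude the crash itself so its own outcome does not leak in.
        risks.append((r - rollover) / (n - 1) if n > 1 else None)
    return risks


# Made-up cell: dark, curved road; 1 rollover among 3 crashes.
example = [(("dark", "curve"), 1), (("dark", "curve"), 0), (("dark", "curve"), 0)]
print(scenario_risks(example))  # prints [0.0, 0.5, 0.5]
```

Excluding the crash's own outcome keeps the scenario risk from trivially predicting the rollover variable it is meant to explain.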
27Fitting the Logistic Models
- Models from individual states were based on the
explanatory variables available in that state
- Models from pooled data were limited to the
explanatory variables available in all six states
28Results of the Logistic Models
- The models from the six individual states and the
two models based on pooled data all fit the data
well
- These models were consistent in showing a large
and significant effect for SSF
29Predictions from the Logistic Models
- Logistic models describe the change in the
log(odds) of rollover as a function of the change
in the SSF
- Results can be used to predict the absolute
rollover risk as a function of the SSF for a
given set of conditions
- Here, estimates of average SSF and odds of
rollover are based on the data as a whole
- The four summary models produce similar results
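Turning a change in log(odds) into an absolute risk requires anchoring the curve at a known point, which the slide describes as the average SSF and rollover odds in the data as a whole. The sketch below shows one way this could work. The SSF coefficient and average SSF are hypothetical placeholders; only the overall risk of 0.20 comes from the slides.

```python
import math


def rollover_probability(ssf, intercept, ssf_coef):
    """Convert a logistic model's log-odds into a probability."""
    log_odds = intercept + ssf_coef * ssf
    return 1.0 / (1.0 + math.exp(-log_odds))


def intercept_through(mean_ssf, mean_risk, ssf_coef):
    """Pick the intercept so the curve passes through an average point."""
    mean_log_odds = math.log(mean_risk / (1.0 - mean_risk))
    return mean_log_odds - ssf_coef * mean_ssf


# Hypothetical values for illustration; none are taken from the slides
# except the overall rollover risk of 0.20.
SSF_COEF = -3.0
MEAN_SSF = 1.25
b0 = intercept_through(MEAN_SSF, 0.20, SSF_COEF)
p_at_mean = rollover_probability(MEAN_SSF, b0, SSF_COEF)
```

By construction, `p_at_mean` recovers the anchoring risk of 0.20, and lower SSF values give higher predicted risk whenever the coefficient is negative.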
30Comparison of Linear and Logistic Models
- Linear and logistic models both suggest SSF has a
large effect on rollover risk
- Next graph compares results of linear model with
results of logistic model from pooled data with
individual explanatory variables
31Predictions from the Models
32Conclusions
- Advantages of linear model of summary data
- All summary data can be shown
- Simpler to explain
- Advantages of logistic analysis
- Includes full range of values and interactions
because not restricted to averages for each
vehicle group
- Better for measuring effects of explanatory
variables because most were significant in the
models
- In this analysis, logistic analysis appeared to
confirm the general pattern of the linear results