Analysis of Real-World Data

Provided by: suepa
Learn more at: https://www.nhtsa.gov

1
Analysis of Real-World Data
  • Static Stability Factor
  • and the Risk of Rollover
  • April 11, 2001

2
References
  • Federal Register, June 1, 2000
      • Description of the original linear regression
        analysis
  • Federal Register, January 12, 2001
      • Description of the updated linear regression
        analysis
      • Comparison with logistic regression analysis

3
Need to Specify
  • Vehicles
  • Calendar years
  • States
  • Crash types
  • Variables
  • Statistical model

4
Criteria for Selecting Vehicles
  • Reliable estimate of the Static Stability Factor
    (SSF)
  • Model years 1988 and later
  • Sources include
      • Vehicles tested by the agency
      • Passenger cars tested by General Motors

5
Vehicles Selected
  • 100 vehicle model groups, including
      • 36 cars
      • 30 SUVs
      • 13 vans
      • 21 pickup trucks

6
Criteria for Selecting Calendar Years
  • Vehicle Identification Numbers (VINs) for that
    year had been decoded and included in the State
    Data System (SDS)
  • Wanted multiple years to maximize data available
    for analysis

7
Calendar Years Selected
  • 1994-1997 for the original linear regression
    analysis
  • 1994-1998 for the updated linear regression
    analysis and the logistic regression analysis

8
Criteria for Selecting States
  • Part of the SDS
  • Provided 1994-1998 calendar year data
  • Include VIN on the crash file
  • Identify rollover occurrence even if it is not
    the first harmful event in the crash

9
States Selected
  • Florida
  • Maryland
  • Missouri
  • North Carolina
  • Pennsylvania
  • Utah

10
Other SDS VIN States
  • VIN available for fatalities only
      • Kansas
  • VIN added in 1998
      • Georgia
  • Incomplete rollover information
      • New Mexico
      • Ohio

11
Criteria for Selecting Crashes
  • Single-vehicle crashes of study vehicles
  • Excluded crashes with other participants
      • Pedestrian, pedalcyclist, animal, or train
  • Excluded certain unusual situations
      • No driver, parked vehicle, pulling a trailer, or
        emergency use (ambulance, fire, police, or
        military)
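
The selection rules above can be read as a filter on crash records. The sketch below is illustrative only: the field names (`vehicles_involved`, `other_participants`, `flags`) and the category labels are hypothetical, since the transcript does not describe the State Data System file layout.

```python
# Hypothetical category labels standing in for the exclusions on the slide.
OTHER_PARTICIPANTS = {"pedestrian", "pedalcyclist", "animal", "train"}
UNUSUAL_SITUATIONS = {"no_driver", "parked", "pulling_trailer", "emergency_use"}

def is_study_crash(crash):
    """True for a single-vehicle crash with no excluded participant or situation."""
    return (
        crash["vehicles_involved"] == 1
        and not (OTHER_PARTICIPANTS & set(crash["other_participants"]))
        and not (UNUSUAL_SITUATIONS & set(crash["flags"]))
    )

# Hypothetical records: one kept, two excluded.
kept = {"vehicles_involved": 1, "other_participants": [], "flags": []}
hit_animal = {"vehicles_involved": 1, "other_participants": ["animal"], "flags": []}
towing = {"vehicles_involved": 1, "other_participants": [], "flags": ["pulling_trailer"]}
print([is_study_crash(c) for c in (kept, hit_animal, towing)])
```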

12
Crashes Selected
  • 241,036 single-vehicle crashes, including
      • 48,996 rollovers
  • This is 0.20 rollovers per single-vehicle crash,
    consistent with the national estimate from the
    General Estimates System for these calendar years
    and vehicle groups
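
As a quick check of the arithmetic, the rate quoted on this slide follows directly from the two counts. Only the variable names are introduced here.

```python
# Counts taken from the slide.
single_vehicle_crashes = 241_036
rollovers = 48_996

# Rollovers per single-vehicle crash, the study's measure of rollover risk.
rollover_rate = rollovers / single_vehicle_crashes
print(f"{rollover_rate:.2f}")
```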

13
Criteria for Selecting Variables
  • Variables describing purpose of study
      • Rollover (yes or no)
      • SSF (study values range from 1.00 to 1.53)
  • Confounding factors
      • Environmental and driver factors that describe
        how the vehicle was used
      • Want variables correlated with rollover risk,
        including travel speed

14
Variables Selected
  • Rollover
  • SSF
  • Dichotomous variables based on
      • Environmental factors (light condition, weather,
        urbanization, speed limit, road grade, road
        curve, road condition, surface condition)
      • Driver factors (sex, age, insurance coverage,
        alcohol/drug use)
      • Number of occupants in the vehicle

15
Summary of Available Data
  • Six states
  • Five calendar years (1994-1998)
  • 100 vehicle groups with a reliable estimate of
    SSF
  • 14 confounding variables, including
      • 10 available in all six states
  • 241,036 single-vehicle crashes, including
      • 48,996 rollovers

16
Limitations
  • Pennsylvania dropped key road use variables
    (grade and curve) from its electronic file in
    1998, so 1998 Pennsylvania data were not used
    here
  • Some variables were not available for all six
    states (urbanization, road condition, insurance
    coverage, and number of occupants in vehicle)
      • Could not be used in analysis of combined data
      • Were used in logistic analysis of individual
        states
  • Reporting practices vary by state

17
Statistical Models
  • Linear model of summarized data
  • Logistic models of individual crashes

18
Preparing Data for the Linear Model
  • Limited to state-vehicle groups with at least 25
    observations
      • 518 state-vehicle groups used in analysis
  • Percentage involvement calculated for each
    variable, for each state-vehicle group
      • Values ranged from 0 to 1
  • For example
      • Rollover risk described by rollovers per
        single-vehicle crash
      • Urbanization described by percent of crashes on
        rural roads
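
The summarization step described above might look like the following sketch. The record layout and the names `summarize`, `state`, `vehicle_group`, `rollover`, and `rural` are assumptions made for illustration; only the 25-observation threshold comes from the slide.

```python
from collections import defaultdict

MIN_OBSERVATIONS = 25  # threshold stated on the slide

def summarize(crashes, variables):
    """Return {(state, vehicle_group): {"n": count, variable: proportion}}."""
    groups = defaultdict(list)
    for crash in crashes:
        groups[(crash["state"], crash["vehicle_group"])].append(crash)

    summary = {}
    for key, rows in groups.items():
        n = len(rows)
        if n < MIN_OBSERVATIONS:
            continue  # too few crashes in this state-vehicle group
        cell = {"n": n}
        for var in variables:
            # Share of crashes in the group where the 0/1 variable is set.
            cell[var] = sum(row[var] for row in rows) / n
        summary[key] = cell
    return summary

# Hypothetical toy data: 30 crashes for one group, 10 for another.
crashes = [
    {"state": "MO", "vehicle_group": "A", "rollover": i < 6, "rural": i < 15}
    for i in range(30)
] + [
    {"state": "UT", "vehicle_group": "A", "rollover": i < 5, "rural": True}
    for i in range(10)
]
summary = summarize(crashes, ["rollover", "rural"])
print(summary)
```

In this toy input the second group has only 10 crashes, so it is dropped before any proportions are computed.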

19
Specifying Linear Model Form
  • Dependent variable: LOG(rollover risk)
      • Rollover risk set at 0.0001 for state-vehicle
        groups with no rollovers so they can be included
        in model
  • Five dummy variables used to capture
    state-to-state differences in reporting practices
      • Missouri used as baseline case
  • Linear regression of the rollover variable as a
    function of the summarized explanatory variables
    and the state dummy variables
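
A minimal sketch of how the dependent variable and the state dummies could be coded. It assumes LOG means the natural logarithm, which the transcript does not state, and the function names are hypothetical.

```python
import math

STATES = ["Florida", "Maryland", "Missouri", "North Carolina", "Pennsylvania", "Utah"]
BASELINE = "Missouri"  # baseline case named on the slide
FLOOR = 0.0001         # value substituted when a group has no rollovers

def log_rollover_risk(risk):
    """Log of rollover risk (natural log assumed), with zero replaced by the floor."""
    return math.log(risk if risk > 0 else FLOOR)

def state_dummies(state):
    """Five 0/1 indicators; the baseline state is coded as all zeros."""
    return [1 if state == s else 0 for s in STATES if s != BASELINE]

print(state_dummies("Missouri"))
print(state_dummies("Utah"))
print(log_rollover_risk(0.0) == math.log(FLOOR))
```

Without the floor, a group with no rollovers would have an undefined logarithm and would have to be dropped from the regression.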

20
Fitting the Linear Model
  • Each summary data point was weighted by the
    sample size, capped at 250 as a trade-off between
    two considerations
      • Sample size affects reliability of estimates
      • Model should fit over entire range of SSF
  • Stepwise procedure used forward variable
    selection and a significance level of 0.15 for
    entry and removal from the model
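
The capped weighting can be illustrated with a one-predictor weighted least-squares fit. This is a simplified sketch, not the study's procedure: it uses a single explanatory variable, omits the stepwise selection, and runs on made-up data points.

```python
WEIGHT_CAP = 250  # cap stated on the slide

def capped_weight(sample_size):
    """Weight a summary point by its sample size, but never by more than the cap."""
    return min(sample_size, WEIGHT_CAP)

def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical summary points lying exactly on log(risk) = 2 - 3 * SSF.
ssf = [1.0, 1.2, 1.4]
log_risk = [-1.0, -1.6, -2.2]
sizes = [40, 900, 250]

weights = [capped_weight(n) for n in sizes]
intercept, slope = weighted_line_fit(ssf, log_risk, weights)
print(weights, round(intercept, 6), round(slope, 6))
```

The cap keeps a very common vehicle group (900 crashes in this toy input) from outweighing groups elsewhere in the SSF range.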

21
Results of the Linear Model
  • Model selected six confounding factors (DARK,
    FAST, CURVE, MALE, YOUNG, and DRINK) and all five
    state dummies
  • R² = 0.88 for the model of rollover risk as a
    function of state, road use variables, and SSF
  • SSF variable coefficient was
      • Important in terms of the size of the estimated
        effect
      • Highly significant in the model (P < 0.0001)

22
Predictions from the Linear Model
  • Model describes rollover risk as a function of
    the explanatory variables and can be used to
      • Estimate rollover risk as a function of the SSF
        for any mix of road-use conditions
      • Adjust the observed rollover rate for each
        summary data point to account for differences in
        vehicle use
  • Next graph shows results for average conditions
    observed in the study data as a whole
      • Rollover risk is estimated as 0.20 in both the
        adjusted and the unadjusted data

23
Fit of Linear Model

24
Interpreting the Linear Model
  • Estimated rollover risk given a single-vehicle
    crash is halved when the SSF increases by 0.21
  • For example, a vehicle with an SSF of 1.00 has
    twice the estimated rollover risk of a vehicle
    with an SSF of 1.21
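
Because the model is linear in the log of rollover risk, the halving statement pins down the slope on SSF. The sketch below works that out, assuming a natural-log scale. The fitted coefficient itself does not appear in this transcript, so the slope here is only what the halving statement implies.

```python
import math

HALVING_STEP = 0.21  # SSF increase that halves the estimated risk (from the slide)

# If log(risk) = a + b * SSF, halving over a 0.21 increase means b * 0.21 = log(1/2).
implied_slope = math.log(0.5) / HALVING_STEP

def relative_risk(ssf_from, ssf_to):
    """Estimated risk at ssf_to divided by estimated risk at ssf_from."""
    return 0.5 ** ((ssf_to - ssf_from) / HALVING_STEP)

print(round(implied_slope, 2))
print(round(relative_risk(1.21, 1.00), 2))
print(round(relative_risk(1.53, 1.00), 2))
```

Under the same assumption, the two ends of the study's SSF range (1.00 and 1.53) differ in estimated rollover risk by a factor of roughly 5.75.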

25
Specifying Logistic Model Forms
  • Variables used
      • Individual explanatory variables or
      • Scenario risk variable
  • Approach used with states
      • Model each state, and average the results or
      • Model pooled data with dummy variables to capture
        state-to-state reporting differences

26
Concept of Scenario Risk
  • Data divided into cells defined by explanatory
    variables
  • For each cell, scenario risk is rollovers per
    single-vehicle crash
  • For each crash, scenario risk is adjusted to
    reflect rollovers per single-vehicle crash for
    all other crashes in the cell
  • Idea is to use scenario risk in the logistic
    model in place of all the explanatory variables

27
Fitting the Logistic Models
  • Models from individual states were based on the
    explanatory variables available in that state
  • Models from pooled data were limited to the
    explanatory variables available in all six states

28
Results of the Logistic Models
  • The models from the six individual states and the
    two models based on pooled data all fit the data
    well
  • These models were consistent in showing a large
    and significant effect for SSF

29
Predictions from the Logistic Models
  • Logistic models describe the change in the
    log(odds) of rollover as a function of the change
    in the SSF
  • Results can be used to predict the absolute
    rollover risk as a function of the SSF for a
    given set of conditions
  • Here, estimates of average SSF and odds of
    rollover are based on the data as a whole
  • The four summary models produce similar results

30
Comparison of Linear and Logistic Models
  • Linear and logistic models both suggest SSF has a
    large effect on rollover risk
  • Next graph compares results of linear model with
    results of logistic model from pooled data with
    individual explanatory variables

31
Predictions from the Models

32
Conclusions
  • Advantages of linear model of summary data
      • All summary data can be shown
      • Simpler to explain
  • Advantages of logistic analysis
      • Includes full range of values and interactions
        because not restricted to averages for each
        vehicle group
      • Better for measuring effects of explanatory
        variables because most were significant in the
        models
  • In this analysis, logistic analysis appeared to
    confirm the general pattern of the linear results