Title: Exploratory Analysis of Crash Data
1Exploratory Analysis of Crash Data
2Sampling Frame
Sampling frame: the sampling frame is the list of population units (population is used here as a general term) from which the sample is drawn. It is important to understand how the sampling frame defines the population that the sample represents.
Example: If the study seeks to identify the safety effects of traffic signals, the sampling frame should include signalized intersections in a given geographical area. If a control group is included, the sampling frame will also include the sites categorized under this group.
[Figure: map of the sites in the sampling frame. Signalized: Sig Int 1, Sig Int 2, Sig Int 9. Unsignalized: Unsig Int 1, Unsig Int 2, Unsig Int 7.]
3Sampling Frame
Map crashes for Year 1
Map crashes for Year 2
4Sampling Frame
[Figure: site map labeled with the number of crashes at each site]
Number of Crashes for Year 1: 0, 3, 10, 5, 2, 0, 7, 1, 1, 4, 2, 0, 11, 2, 6, 3, 1, 0, 8, 10, 5, 1, 2, 0, 4, 6, 1, 3
Number of Crashes for Year 2: 6, 0, 3, 7
5Sampling Frame
Signalized Intersections Database
e.g., number of lanes, actuated signals, exclusive left-turn lane, etc.
6Sampling Frame
Signalized Intersections Database
Intersection    Year    Crash Count
1               1       0
1               2       1
9               1       6
9               2       3
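One way to hold a site-by-year database like this is as one record per intersection and year. The sketch below is a minimal illustration only: the `SiteYearRecord` type and the `total_crashes_by_site` helper are hypothetical names, and the four records assume that the crash counts on the slide pair with Year 1 and Year 2 as read here (Intersection 1: 0 then 1; Intersection 9: 6 then 3).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteYearRecord:
    """One row of the (hypothetical) site-year crash table."""
    intersection_id: int
    year: int
    crash_count: int


# Rows as read from the slide; the year/count pairing is an assumption.
records = [
    SiteYearRecord(intersection_id=1, year=1, crash_count=0),
    SiteYearRecord(intersection_id=1, year=2, crash_count=1),
    SiteYearRecord(intersection_id=9, year=1, crash_count=6),
    SiteYearRecord(intersection_id=9, year=2, crash_count=3),
]


def total_crashes_by_site(rows):
    """Sum crash counts over all years for each intersection."""
    totals = defaultdict(int)
    for row in rows:
        totals[row.intersection_id] += row.crash_count
    return dict(totals)


totals = total_crashes_by_site(records)
print(totals)
```

Site characteristics such as the number of lanes could be added as further fields on the same record.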
7Sampling Frame
Unsignalized Intersections Database
e.g., number of lanes, actuated signals, exclusive left-turn lane, etc.
8Histograms
9Ogives
Source: Washington et al. (2003)
10Box Plots
11Scatter Diagrams
12Scatter Diagrams
13Bar and Line Charts
Source: Washington et al. (2003)
14Two by Two Tables
15Confidence Intervals
Statistics calculated from samples, such as the sample mean $\bar{X}$, the variance $s^2$, and the standard deviation $s$, are used to estimate the population parameters. For instance, $\bar{X}$ is used as an estimate of the population mean $\mu_x$, and $s^2$ is used as an estimate of the population variance $\sigma^2$.
Interval estimates, known as Confidence Intervals, allow inferences to be drawn about the population by providing an interval, with a lower and an upper value, within which the unknown parameter is expected to lie with a prescribed level of confidence. In other words, the true value of the population parameter is assumed to be located within the estimated interval at that level of confidence.
16Confidence Intervals
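Confidence Interval for $\mu$ and known $\sigma^2$
95% CI: $\bar{X} \pm 1.96 \, \sigma / \sqrt{n}$
Any CI: $\bar{X} \pm z_{\alpha/2} \, \sigma / \sqrt{n}$
90% CI: $\bar{X} \pm 1.645 \, \sigma / \sqrt{n}$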
17Confidence Intervals
Compute the 95% confidence interval for the mean vehicular speed. Assume the data are normally distributed. The sample size is 1,296 and the sample mean $\bar{X}$ is 58.86. Suppose the population standard deviation ($\sigma$) has previously been computed to be 5.5.
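Answer: $\bar{X} \pm 1.96 \, \sigma / \sqrt{n} = 58.86 \pm 1.96 \, (5.5 / \sqrt{1296}) = 58.86 \pm 0.30$, i.e. $58.56 \le \mu \le 59.16$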
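The arithmetic for this example can be checked with a few lines of Python. This is a minimal sketch, assuming the usual normal-based interval $\bar{X} \pm z_{\alpha/2} \, \sigma / \sqrt{n}$ for a known population standard deviation; the function name `z_confidence_interval` is hypothetical, and the inputs are the values stated in the example.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, level=0.95):
    """Two-sided normal-based CI for a mean when the standard deviation is known."""
    alpha = 1.0 - level
    # Standard normal quantile z_{alpha/2} (about 1.96 for a 95% level).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


# Values from the example: n = 1,296, sample mean 58.86, sigma = 5.5.
lower, upper = z_confidence_interval(mean=58.86, sd=5.5, n=1296, level=0.95)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")  # prints "95% CI: [58.56, 59.16]"
```

Changing `level` to 0.90 gives the narrower 90% interval from the same inputs.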
19Confidence Intervals
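Confidence Interval for $\mu$ and unknown $\sigma^2$
95% CI: $\bar{X} \pm 1.96 \, s / \sqrt{n}$
Any CI: $\bar{X} \pm z_{\alpha/2} \, s / \sqrt{n}$
90% CI: $\bar{X} \pm 1.645 \, s / \sqrt{n}$
Only valid if $n > 30$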
20Confidence Intervals
Same example: Compute the 95% confidence interval for the mean vehicular speed. Assume the data are normally distributed. The sample size is 1,296 and the sample mean $\bar{X}$ is 58.86. Now, suppose a sample standard deviation ($s$) has previously been computed to be 4.41.
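Answer: $\bar{X} \pm 1.96 \, s / \sqrt{n} = 58.86 \pm 1.96 \, (4.41 / \sqrt{1296}) = 58.86 \pm 0.24$, i.e. $58.62 \le \mu \le 59.10$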
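When the sample standard deviation $s$ stands in for the population value, the same normal-based calculation applies only for large samples. The sketch below is a hedged illustration of that reading: it assumes the interval $\bar{X} \pm z_{\alpha/2} \, s / \sqrt{n}$ and enforces the "n > 30" condition from the slide as a guard. The function name `large_sample_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(mean, s, n, level=0.95):
    """Normal-approximation CI for a mean using the sample standard deviation s.

    Raises ValueError when n <= 30, where this approximation is not
    considered valid according to the slide.
    """
    if n <= 30:
        raise ValueError("normal approximation with s requires n > 30")
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * s / sqrt(n)
    return mean - half_width, mean + half_width


# Values from the example: n = 1,296, sample mean 58.86, s = 4.41.
lower, upper = large_sample_ci(mean=58.86, s=4.41, n=1296, level=0.95)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")  # prints "95% CI: [58.62, 59.10]"
```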
21Confidence Intervals
Confidence Interval for a Population Proportion
The relative frequency in a population may sometimes be of interest. The confidence interval can be computed using the following equation:
$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p} \hat{q} / n}$
where $\hat{p}$ is an estimator of the proportion in a population and $\hat{q} = 1 - \hat{p}$. The normal approximation is only good when $n\hat{p} > 5$ and $n\hat{q} > 5$.
22Confidence Intervals
A transportation agency located in a small city is interested in knowing the percentage of people who were involved in a collision during the last calendar year. A random sample of 1,000 drivers is surveyed. From the sample, it was found that 110 drivers were involved in at least one collision. Compute the 90% CI.
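Answer: $\hat{p} = 110/1000 = 0.11$ and $\hat{q} = 0.89$, so $0.11 \pm 1.645 \sqrt{(0.11)(0.89)/1000} = 0.11 \pm 0.016$, i.e. $0.094 \le p \le 0.126$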
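The collision example can be reproduced numerically. The following is a minimal sketch, assuming the normal-approximation interval $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}\hat{q}/n}$ and checking the np > 5 and nq > 5 condition before using it; the function name `proportion_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, level=0.90):
    """Normal-approximation CI for a population proportion."""
    p_hat = successes / n
    q_hat = 1.0 - p_hat
    # The approximation is only considered good when np > 5 and nq > 5.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("normal approximation requires np > 5 and nq > 5")
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width


# Values from the example: 110 of 1,000 sampled drivers had a collision.
lower, upper = proportion_confidence_interval(successes=110, n=1000, level=0.90)
print(f"90% CI: [{lower:.3f}, {upper:.3f}]")  # prints "90% CI: [0.094, 0.126]"
```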
24Population Proportion
25Confidence Intervals
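Confidence Interval for the Population Variance
When the population variance is of interest, the confidence interval can be computed using the following equation:
$\frac{(n-1) s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1) s^2}{\chi^2_{1-\alpha/2}}$
where $\chi^2$ is Chi-Square with $n-1$ degrees of freedom and the subscripts denote upper-tail probabilities. Assumption: the population is normally distributed.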
26Confidence Intervals
Using the earlier example on vehicular speed, compute the 95% confidence interval for the variance of the speed distribution. A sample of 100 vehicles has shown a variance equal to 19.51 mph².
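Taken from Chi-Square Table (99 degrees of freedom, upper-tail probabilities): $\chi^2_{0.025} = 128.42$ and $\chi^2_{0.975} = 73.36$
Answer: $\frac{(99)(19.51)}{128.42} \le \sigma^2 \le \frac{(99)(19.51)}{73.36}$, i.e. $15.04 \le \sigma^2 \le 26.33$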
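The variance interval can also be approximated without a printed table. The sketch below is an illustration under stated assumptions, not the slide's own procedure: it replaces the table lookup with the Wilson-Hilferty approximation to the Chi-Square quantile (accurate to a few hundredths at 99 degrees of freedom), and both function names are hypothetical.

```python
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Approximate Chi-Square quantile for lower-tail probability p.

    Uses the Wilson-Hilferty cube-root normal approximation, which is
    a stand-in for a table lookup and is not exact.
    """
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


def variance_confidence_interval(s2, n, level=0.95):
    """CI for a population variance, assuming a normally distributed population."""
    alpha = 1.0 - level
    df = n - 1
    chi2_high = chi2_quantile_wh(1.0 - alpha / 2.0, df)  # larger critical value
    chi2_low = chi2_quantile_wh(alpha / 2.0, df)         # smaller critical value
    # Dividing by the larger critical value gives the lower bound.
    return df * s2 / chi2_high, df * s2 / chi2_low


# Values from the example: n = 100 vehicles, sample variance 19.51.
lower, upper = variance_confidence_interval(s2=19.51, n=100, level=0.95)
print(f"95% CI for the variance: [{lower:.2f}, {upper:.2f}]")
```

Note that the interval is not symmetric around the sample variance, because the Chi-Square distribution is skewed.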
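28The Chi-Square Goodness-of-fit
Non-parametric test useful for checking whether observations follow an assumed distribution (for example, the normal distribution). Need to have more than 5 observations per cell. The test statistic is
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
where $O_i$ and $E_i$ are the observed and estimated frequencies in cell $i$, and $n$ is the number of cells.
If the value on the right-hand side is less than the critical Chi-Square value with $n-1$ degrees of freedom, the hypothesis that the observed and estimated values are the same is not rejected. If not, the observed and estimated values are considered not to be the same. You can also perform this test for two-way contingency tables.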
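The decision rule can be sketched in code. This is a minimal illustration, assuming the usual statistic, the sum over cells of (observed - estimated)² / estimated. The cell counts are hypothetical and chosen only for demonstration, the function name `chi_square_statistic` is hypothetical, and 7.815 is the tabulated 5% critical value for 3 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Mirror the slide's rule of more than 5 observations per cell.
    if any(o <= 5 for o in observed):
        raise ValueError("each cell needs more than 5 observations")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for 4 cells (illustration only, not from the slides).
observed = [18, 22, 30, 30]
expected = [25.0, 25.0, 25.0, 25.0]

statistic = chi_square_statistic(observed, expected)
critical_value = 7.815  # Chi-Square table, 5% level, n - 1 = 3 degrees of freedom

same_distribution_not_rejected = statistic < critical_value
print(f"statistic = {statistic:.2f}, not rejected: {same_distribution_not_rejected}")
```

With these counts the statistic is 4.32, which is below the critical value, so the observed and estimated frequencies are treated as consistent.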