NA3873 Lecture 3: Descriptive Statistics: Measures of Location and Variability



Provided by: tas88


1
NA387(3) Lecture 3: Descriptive Statistics: Measures of Location and Variability

(Devore, Ch. 1.3-1.4)

2
Topics

Branches of Statistics (review from last lecture)
Conducting a Statistical Analysis
Measures of Location
Measures of Dispersion
Outliers and Box Plots

3
I. Branches of Statistics-Recap

1. Descriptive Statistics
Summarize or describe important features in a
data set without attempting to infer conclusions.
Describe data samples using statistics such as
x̄ ("x-bar", the sample mean) and s (the sample standard deviation).
These statistics are used to estimate the
population mean (μ) and the population standard deviation (σ).

4
I. Branches of Statistics-Recap

2. Inferential Statistics
Use a sample of data to draw conclusions (make
inferences) about a population.
Example: Suppose you sample ten bottles from each
of two different filling machines.
Machine A averages 12.10 oz and Machine B averages 12.12
oz.
Based on inferential statistics, you might
conclude that the two machines are not significantly different.

5
Location and Dispersion

Most common descriptive statistics are related to
either measuring location or dispersion
(variation).
Location: central tendency
Dispersion: spread of the distribution

6
Example

Classic example to demonstrate these concepts
Outcomes of Throwing Darts
On or Off Location
Low or High Dispersion

7
Lecture Exercise: Identify On/Off Target and
High/Low Dispersion for Each
[Figure: four dart targets, A to D, each showing six dart hits marked x]
A. __________
B. __________
C. __________
D. __________
8
II. Measures of Location

Mean
Median
Trimmed Mean

9
The Mean
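The sample mean x̄ is the average of the n numbers x1, x2, ..., xn:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Population mean: μ, the average of all N values in the population,
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$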
10
Mean (Average, Expected Value)

The mean (also known as the average or the
expected value) is a measure of the center (of
mass, or of gravity) of a distribution.
Typical notation: x̄ ("x-bar") for the mean of a
sample of data; the Greek letter μ (mu), or E(X),
read "expected value of X", for the mean of a
population or distribution.
Example: suppose five students take a test and
their scores are 70, 68, 71, 69, and 98.
Mean = (70 + 68 + 71 + 69 + 98)/5 = 376/5 = 75.2
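The arithmetic above can be checked with a short Python sketch. It computes x̄ directly from the definition (sum divided by n) and compares the result with the standard-library `statistics.mean`. The variable names are illustrative only.

```python
from statistics import mean

# Test scores from the example above
scores = [70, 68, 71, 69, 98]

# Sample mean by definition: sum of the observations divided by n
x_bar = sum(scores) / len(scores)

print(x_bar)         # 75.2
print(mean(scores))  # 75.2 (standard-library equivalent)
```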

11
Median
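The sample median, x̃, is the middle value in a
set of data that is arranged in ascending order.
For an even number of data points the median is
the average of the middle two:
$$\tilde{x} = \begin{cases} x_{((n+1)/2)} & n \text{ odd} \\ \tfrac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & n \text{ even} \end{cases}$$
where x_(i) denotes the i-th smallest observation.
Population median: μ̃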
12
Median

Median (also known as the 50th percentile) is the
middle observation in a data set.
Rank the data set and select the middle value.
If there is an odd number of observations, the median
is observation (N + 1)/2.
If there is an even number of observations, the median
is interpolated as midway between observations
N/2 and N/2 + 1.
Prior data values: 68, 69, 70, 71, and 98.
Median = 70.
If another student with a score of 60 were
included, the new median would be
(69 + 70)/2 = 69.5.
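The ranking rule can be written as a small Python function. This is a sketch, and `sample_median` is a hypothetical helper name. It picks the single middle value for odd n and averages the two middle values for even n, then cross-checks against `statistics.median`.

```python
from statistics import median

def sample_median(data):
    """Median by the ranking rule: middle value for odd n,
    average of the two middle values for even n."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: observation (n + 1)/2 in 1-based ranks is index n // 2
        return ordered[mid]
    # Even n: midway between observations n/2 and n/2 + 1 (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [70, 68, 71, 69, 98]
print(sample_median(scores))         # 70
print(sample_median(scores + [60]))  # 69.5
print(median(scores + [60]))         # 69.5 (standard library)
```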

13
Mean Vs. Median

Which is a better measure of location for the
following set of test scores?
68, 70, 69, 71, and 98
Mean = 75.2, Median = 70.0

14
Trimmed Mean

The trimmed mean is a compromise between the mean and the
median.
10% Trimmed Mean:
First, eliminate the smallest 10% of values and the
largest 10% of values.
Then, re-compute the mean of the remaining values.
Trimmed means are gaining popularity:
less sensitive to outliers than the mean, but not
as robust as the median.
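The two-step recipe above is sketched below in Python. `trimmed_mean` is a hypothetical helper. When proportion × n is not a whole number, the sketch rounds the number of values trimmed from each end down. That is an assumption: Devore describes interpolating between trimming percentages in that case. With only five test scores, a 10% trim removes nothing, so the example uses a 20% trim.

```python
import math
from statistics import mean

def trimmed_mean(data, proportion):
    """Drop the smallest and largest `proportion` of the values,
    then average what remains.

    Assumption: if proportion * n is not a whole number, the count
    trimmed from each end is rounded down.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(data)
    k = math.floor(proportion * len(ordered))  # values removed from each end
    kept = ordered[k:len(ordered) - k]
    return mean(kept)

scores = [70, 68, 71, 69, 98]
# 20% of 5 values = 1 value trimmed from each end: drops 68 and 98
print(trimmed_mean(scores, 0.20))  # 70
```

SciPy users can compare the result with `scipy.stats.trim_mean`.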

15
Trimmed Mean (Example from Devore Textbook)

Variable: life (hours) of incandescent lamps.
Sample size = 20
How many values will be trimmed in a 10% trimmed mean?
Mean = 965.0, Median = 1009.5, Trimmed
Mean = 971.4
How are these values affected by sample size? By the shape of the
distribution?
What might be some useful applications?

16
III. Measures of Dispersion

Range
Standard Deviation
Variance

17
Range

The range is the maximum value in a data set
minus the minimum value.
Example: test scores 70, 68, 71, 69, and 98.
Range = 98 - 68 = 30.
Note: the range is often preferred over the
standard deviation for small data sets (e.g., fewer
than 10 observations).
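As a quick Python check of the example, the range is the maximum minus the minimum:

```python
scores = [70, 68, 71, 69, 98]

# Range = maximum value minus minimum value
data_range = max(scores) - min(scores)
print(data_range)  # 30
```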

18
Standard Deviation

The sample standard deviation (StDev), s, measures the
dispersion of the individual observations about
the mean:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
For a sample data set, the standard deviation is also
referred to as the sample standard deviation or
the root-mean-square deviation, S_rms (strictly, an
RMS deviation divides by n rather than n - 1).
Units for s are the same as for the variable
being analyzed.
E.g., if we measure mpg, then s will be in mpg.
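Here is a minimal Python sketch of the sample standard deviation. It sums the squared deviations from x̄, divides by n - 1, and takes the square root. `sample_std` is a hypothetical name. The standard-library `statistics.stdev` uses the same n - 1 divisor and serves as a cross-check.

```python
import math
import statistics

def sample_std(data):
    """Square root of the sum of squared deviations from the
    sample mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(data) / n
    ss = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations
    return math.sqrt(ss / (n - 1))

scores = [70, 68, 71, 69, 98]
print(round(sample_std(scores), 2))        # 12.79
print(round(statistics.stdev(scores), 2))  # 12.79 (standard library, also n - 1)
```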

19
Why divide by n-1?

n - 1 is often referred to as the degrees of
freedom.
There are a variety of reasons:
It corrects an underestimating bias: the Xi's are closer
to the sample mean (x̄) than to the population mean
(μ).
Since we use a statistic (x̄) in our standard
deviation calculation, we have placed a
restriction on one of the Xi's.
Suppose you have 4 values. If you are told the
mean = 4, X1 = 3, X2 = 5, and X3 = 2, then X4 is
restricted: it can be calculated from the
mean, X1, X2, and X3 as X4 = 4(4) - (3 + 5 + 2) = 6.
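The underestimating bias can be illustrated with a small simulation. This is a sketch under assumed conditions: samples of size n = 4 drawn from a normal population with σ = 1, so the true variance is 1. Dividing the sum of squared deviations by n averages near (n - 1)/n = 0.75. Dividing by n - 1 averages near 1. The final lines repeat the degrees-of-freedom example from the slide.

```python
import random
import statistics

# Assumption for illustration: normal population, sigma = 1 (true variance 1)
random.seed(0)
n, reps = 4, 10_000

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]
    divide_by_n.append(statistics.pvariance(sample))        # sum of squares / n
    divide_by_n_minus_1.append(statistics.variance(sample))  # sum of squares / (n - 1)

print(statistics.mean(divide_by_n))          # near 0.75: biased low
print(statistics.mean(divide_by_n_minus_1))  # near 1.0

# Degrees of freedom: with the mean fixed at 4, X4 is determined by X1..X3
x1, x2, x3 = 3, 5, 2
x4 = 4 * 4 - (x1 + x2 + x3)
print(x4)  # 6
```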

20
Effects of Extreme Values

Test scores 70, 68, 71, 69, and 98:
the sample standard deviation is 12.79.
If you exclude the score of 98, the
sample standard deviation drops to 1.3!
The standard deviation may be severely influenced by
extreme values in the sample data set (note that such
values are not necessarily mistakes).
We may reduce the effect of any individual
observation by increasing the sample size.
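A short Python check shows the effect of the single extreme score. The median is included for contrast with the earlier Mean vs. Median slide. It barely moves, from 70 to 69.5.

```python
import statistics

scores = [70, 68, 71, 69, 98]
without_98 = [x for x in scores if x != 98]

print(round(statistics.stdev(scores), 2))      # 12.79
print(round(statistics.stdev(without_98), 2))  # 1.29
print(statistics.median(scores), statistics.median(without_98))  # 70 69.5
```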

21
Variance

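Variance is the square of the standard deviation:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
It represents (roughly, since we divide by n - 1 rather than n)
the average squared deviation of each observation from the sample mean.
Prior example, where the standard deviation = 12.79:
Variance = 654.8/4 = 163.7 (squaring the rounded value gives 12.79² = 163.58)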

22
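Properties of s²
Let x1, x2, ..., xn be any sample and c be any
nonzero constant.
1. If y1 = x1 + c, y2 = x2 + c, ..., yn = xn + c, then $s_y^2 = s_x^2$.
2. If y1 = cx1, y2 = cx2, ..., yn = cxn, then $s_y^2 = c^2 s_x^2$ and $s_y = |c|\, s_x$,
where $s_x^2$ is the sample variance of the x's and
$s_y^2$ is the sample variance of the y's.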
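The two properties can be verified numerically. In this sketch the constant c = -3 is chosen arbitrarily. A negative value shows why the standard deviation scales by |c|. `math.isclose` absorbs floating-point rounding.

```python
import math
import statistics

x = [70, 68, 71, 69, 98]
c = -3  # any nonzero constant (arbitrary choice for illustration)

shifted = [xi + c for xi in x]  # y_i = x_i + c
scaled = [c * xi for xi in x]   # y_i = c * x_i

s2_x = statistics.variance(x)
print(math.isclose(statistics.variance(shifted), s2_x))        # True: shift leaves s^2 unchanged
print(math.isclose(statistics.variance(scaled), c**2 * s2_x))  # True: scaling multiplies s^2 by c^2
print(math.isclose(statistics.stdev(scaled), abs(c) * statistics.stdev(x)))  # True
```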
23
Why Use Variance

Variance is often used because of its additive
properties.
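Suppose you are assembling two independent wood
blocks, each with a standard deviation of 2 mm. For independent pieces, variances add:
$$\sigma^2_{A+B} = \sigma^2_A + \sigma^2_B = 2^2 + 2^2 = 8 \text{ mm}^2, \qquad \sigma_{A+B} = \sqrt{8} \approx 2.83 \text{ mm}$$
Standard deviations do not add: $\sigma_{A+B} = \sigma_A + \sigma_B = 4$ mm is not true!
Basic algebra: $a^2 + b^2 \neq (a + b)^2$. Example with a = b = 2:
$2^2 + 2^2 = 4 + 4 = 8$, but $(2 + 2)^2 = 4^2 = 16$.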
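A quick simulation illustrates variance additivity. It assumes the block lengths vary independently and normally with σ = 2 mm around hypothetical nominal lengths of 50 mm and 30 mm; the nominal values do not affect the spread. The standard deviation of the assembled length comes out near √8 ≈ 2.83 mm, not 4 mm.

```python
import math
import random
import statistics

# Assumptions: independent, normally distributed lengths, sigma = 2 mm each;
# nominal lengths of 50 mm and 30 mm are hypothetical
random.seed(1)
N = 50_000
block_a = [random.gauss(50, 2) for _ in range(N)]
block_b = [random.gauss(30, 2) for _ in range(N)]
assembly = [a + b for a, b in zip(block_a, block_b)]

sd = statistics.stdev(assembly)
print(round(sd, 2))              # close to sqrt(8), not 2 + 2 = 4
print(round(math.sqrt(8), 2))    # 2.83
```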
24
Three Different Shapes for a Population Distribution
[Figure: three density curves labeled symmetric, positive skew, and negative skew]
25
Skewness

Some software packages report skewness.
Skewness is a measure of relative (a)symmetry.
Zero skewness: symmetric
Positive skewness: longer right tail
Negative skewness: longer left tail
Actual calculation is outside the scope of this class.

26
Kurtosis

Some software packages report kurtosis.
Kurtosis (K) is a measure of the peakedness of
a distribution (relative to the normal).
K = 3: normal, bell-shaped distribution
(mesokurtic). Note: some software subtracts 3, so that a normal distribution has kurtosis 0.
K < 3 (or negative, relative to 0): flatter peak,
fatter shoulders, shorter tails
K > 3 (or positive, relative to 0): more peaked
than normal, with longer tails

Actual calculation is outside the scope of this class.
27
Using Software to Calculate Descriptive Statistics

In practice, we rarely calculate statistics by
hand.
In MS Excel, you can use these functions:
Mean: average(array)
Median: median(array)
Std Dev: stdev(array) (sample, n - 1)
Variance: var(array) (sample, n - 1)
Range: max(array) - min(array)

28
Minitab Results

Statistical software packages such as Minitab compute
descriptive statistics automatically.

Descriptive Statistics: Score

Variable   N    Mean    Median   TrMean   StDev   SE Mean
Score      16   82.78   83.50    83.32    9.17    2.29

Variable   Minimum   Maximum
Score      63.00     95.00
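Below is a rough Python analogue of this kind of summary, applied to the five test scores since the 16 scores behind the Minitab table are not given. `describe` is a hypothetical helper. SE Mean is the standard error of the mean, StDev/√N. The table's 9.17/√16 ≈ 2.29 is consistent with that. TrMean is omitted because its trimming convention is software-specific.

```python
import math
import statistics

def describe(data):
    """Rough analogue of a software descriptive-statistics summary
    (TrMean omitted; its trimming convention varies by package)."""
    n = len(data)
    sd = statistics.stdev(data)
    return {
        "N": n,
        "Mean": statistics.mean(data),
        "Median": statistics.median(data),
        "StDev": sd,
        "SE Mean": sd / math.sqrt(n),  # standard error of the mean
        "Minimum": min(data),
        "Maximum": max(data),
    }

scores = [70, 68, 71, 69, 98]
summary = describe(scores)
print(summary)

# Consistency check on the Minitab table: StDev / sqrt(N)
print(round(9.17 / math.sqrt(16), 2))  # 2.29
```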
29
V. Box Plots
Q3 = 75th percentile; Median = 50th percentile; Q1 = 25th percentile
fs = Q3 - Q1
Upper limit = Q3 + 1.5 fs
Lower limit = Q1 - 1.5 fs
[Figure: annotated box plot, labeled from top to bottom:]
Extreme outlier(s)
Mild outlier(s)
Upper whisker: highest value within the upper limit
Third quartile (Q3) or upper fourth
Median
First quartile (Q1) or lower fourth
Lower whisker: lowest value within the lower limit
30
Upper and Lower Fourths
After the n observations in a data set are
ordered from smallest to largest, the lower
(upper) fourth is the median of the smallest
(largest) half of the data, where the median
is included in both halves if n is odd. A
measure of the spread that is resistant to
outliers is the fourth spread, fs = upper fourth - lower fourth.
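The definition of the fourths translates directly into Python. This sketch uses a hypothetical `fourths` helper. It takes the median of the smallest and largest halves and includes the overall median in both halves when n is odd. The fourth spread fs is their difference.

```python
from statistics import median

def fourths(data):
    """Lower and upper fourths: medians of the smallest and largest
    halves, with the median included in both halves if n is odd."""
    ordered = sorted(data)
    n = len(ordered)
    half = (n + 1) // 2  # n odd: (n + 1)/2 values; n even: n/2 values
    lower = median(ordered[:half])
    upper = median(ordered[n - half:])
    return lower, upper

scores = [70, 68, 71, 69, 98]
lower, upper = fourths(scores)
fs = upper - lower       # fourth spread
print(lower, upper, fs)  # 69 71 2
```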
31
Box Plot differences in notation/calculation

Minitab calculates quartiles (Q1, Q3).
Some textbooks (including Devore) refer to lower
and upper fourths.
Roughly the same, but with some differences:
Lower fourth: median of the smallest n/2 observations (n
even) OR median of the smallest (n + 1)/2 observations (n odd)
Q1: observation at position (n + 1)/4 (if not an
integer, interpolate)
Upper fourth: median of the largest n/2 observations (n
even) OR median of the largest (n + 1)/2 observations (n odd)
Q3: observation at position 3(n + 1)/4 (if not an
integer, interpolate)
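The position rule for Q1 and Q3 is sketched below. It takes the value at position p(n + 1) in the ordered data and interpolates linearly between neighbours. `quartile_by_position` is a hypothetical helper. Clamping positions outside 1..n is an assumption, not something this lecture specifies. For the five test scores this gives Q1 = 68.5 and Q3 = 84.5, while the fourths are 69 and 71, which shows the "roughly the same, but different" point. Python's `statistics.quantiles` uses the same (n + 1)-based rule by default.

```python
import statistics

def quartile_by_position(data, p):
    """Value at 1-based position p * (n + 1) in the ordered data,
    interpolating linearly when the position is not an integer.
    p = 0.25 gives Q1, p = 0.75 gives Q3."""
    ordered = sorted(data)
    n = len(ordered)
    pos = p * (n + 1)
    # Assumption: clamp positions that fall outside 1..n to the extremes
    if pos <= 1:
        return ordered[0]
    if pos >= n:
        return ordered[-1]
    lo = int(pos)  # integer part of the 1-based position
    frac = pos - lo
    return ordered[lo - 1] + frac * (ordered[lo] - ordered[lo - 1])

scores = [70, 68, 71, 69, 98]
print(quartile_by_position(scores, 0.25))  # position 1.5 -> 68.5
print(quartile_by_position(scores, 0.75))  # position 4.5 -> 84.5
print(statistics.quantiles(scores, n=4))   # [68.5, 70.0, 84.5]
```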

32
Box Plot Information

Box plots show:
Location: a line for the median (note: some software
also includes a dot for the mean).
Dispersion: the box spans the 25th to the 75th percentile.
Departures from symmetry: one side of the box or one whisker can
be longer than the other, suggesting a lack
of symmetry.
Identification of mild and extreme outliers.

33
Box Plot - MPG Example
34
Box Plots Vs. Histogram

Note: the wider part of the box to the left of the median
suggests more spread to the left than to the right.
A similar pattern is shown in the histogram.
[Figure: box plot and histogram of the MPG data, each marking the median at 20.1]
35
Multiple Box Plot Example

For the MPG data, suppose you also collected data on
tire pressure (grouped as normal or low).
Does this stratification variable help explain the
bimodal distribution?

36
Outliers
Any observation farther than 1.5fs from the
closest fourth is an outlier. An outlier is
extreme if it is more than 3fs from the nearest
fourth, and it is mild otherwise.
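The outlier rule above can be applied with a short Python sketch. `fourths` and `classify` are hypothetical helpers, and `fourths` is repeated so the block runs on its own. `classify` measures each observation's distance beyond the closest fourth, labels it extreme past 3fs and mild past 1.5fs, and returns the whisker ends, meaning the most extreme values within the 1.5fs limits. For the test scores, fs = 2, so 98 lies 27 beyond the upper fourth and is an extreme outlier.

```python
from statistics import median

def fourths(data):
    """Lower and upper fourths (median included in both halves when n is odd)."""
    ordered = sorted(data)
    n = len(ordered)
    half = (n + 1) // 2
    return median(ordered[:half]), median(ordered[n - half:])

def classify(data):
    """Label outliers with the 1.5 fs / 3 fs rule and return the whisker ends."""
    lower, upper = fourths(data)
    fs = upper - lower
    labels = []
    for x in data:
        # Distance from x to the closest fourth (0 if x lies between the fourths)
        distance = max(lower - x, x - upper, 0)
        if distance > 3 * fs:
            labels.append((x, "extreme"))
        elif distance > 1.5 * fs:
            labels.append((x, "mild"))
    # Whiskers end at the most extreme observations inside the 1.5 fs limits
    inside = [x for x in data if lower - 1.5 * fs <= x <= upper + 1.5 * fs]
    return labels, (min(inside), max(inside))

scores = [70, 68, 71, 69, 98]
labels, whiskers = classify(scores)
print(labels)    # [(98, 'extreme')]
print(whiskers)  # (68, 71)
```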
37
Boxplots
[Figure: box plot annotated with the lower fourth, median, upper fourth, mild outliers, and extreme outliers]
