Title: Numerical Descriptive Measures
Chapter 3
Summary Measures
- Central Tendency: Mean, Median, Mode, Geometric Mean
- Quartile
- Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency
- Various ways to describe the central, most common, or middle value in a distribution or set of data
  - The Mean (Arithmetic Mean)
  - The Median
  - The Mode
  - The Geometric Mean
The Mean
- Equals the sum of all observations or values divided by the number of values
- The Most Common Measure of Central Tendency
- Generally called the Average in common usage
- Population mean vs. Sample mean
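The Mean
- Population mean: $\mu = \frac{\sum x}{N}$ (µ is read "mu")
  - Recall: N = population size
- Sample mean: $\bar{X} = \frac{\sum x}{n}$ ($\bar{X}$ is read "X-bar")
  - Recall: n = sample size
- $\sum$ = Sum
- $\sum x$ = Sum of all values of x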
The Mean
- Affected by Extreme Values (Outliers)
- Note how changing a single value affects the mean in the example below
[Figure: dot plots of two data sets that differ in one value; the mean is 5 for the first and 6 for the second.]
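Below is a minimal Python sketch of the mean formula. The two data sets are hypothetical. They were chosen only to reproduce the slide's results (a mean of 5, then 6 after one value changes), because the values actually plotted on the slide cannot be recovered.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Hypothetical data sets that differ in a single value
original = [3, 4, 5, 6, 7]
with_outlier = [3, 4, 5, 6, 12]   # 7 replaced by the more extreme value 12

print(mean(original))      # 5.0
print(mean(with_outlier))  # 6.0
```

One extreme value shifts the mean by a whole unit, even though the other four values are unchanged.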
The Median
- The Middle number
- The most Robust measure of Central Tendency
- Not affected by extreme values
- In an ordered array, the Median is the observation at position (n + 1)/2
- If n or N is odd, the median is the middle number
- If n or N is even, the median is the average of the 2 middle numbers
[Figure: dot plots of the two data sets from the mean example; the median is 5 for both.]
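The (n + 1)/2 position rule, including the even-n case, can be sketched as follows. `median` is a hypothetical helper written for illustration. The data sets [3, 4, 5, 6, 7] and [3, 4, 5, 6, 12] are invented so that they match the slide's medians.

```python
def median(values):
    """Median via the (n + 1)/2 position rule on the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the 1-based position (n + 1)/2 is 0-based index n // 2
        return ordered[mid]
    # Even n: average the two middle values (1-based positions n/2 and n/2 + 1)
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 4, 5, 6, 7]))    # 5
print(median([3, 4, 5, 6, 12]))   # 5  (the extreme value does not move it)
print(median([1, 2, 3, 4]))       # 2.5
```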
Mode
- The Value that Occurs Most Often
- Not Affected by Extreme Values
- There may be more than 1 Mode
- There may be no Mode
- Can be used for Numerical or Categorical Data
[Figure: two dot plots; the first (axis 0 to 6) has no mode, the second (axis 0 to 14) has mode = 9.]
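Here is a sketch of the mode rules above. `modes` is a hypothetical helper. It returns every most-frequent value, or an empty list when no value repeats. The "no mode" case is handled explicitly because Python's `statistics.multimode` returns every value when nothing repeats. The input lists are made-up illustrations.

```python
from collections import Counter

def modes(values):
    """All values that occur most often; [] when no value occurs more than once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    # Works for numbers or categories; keeps first-seen order
    return [v for v, c in counts.items() if c == top]

print(modes([0, 1, 2, 3, 4, 5, 6]))                       # []
print(modes([1, 3, 9, 9, 12]))                            # [9]
print(modes(["red", "blue", "red", "blue", "green"]))     # ['red', 'blue']
```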
Example
Jim has 20 problems to do for homework. Some are
harder than others and take more time to solve.
We take a random sample of 9 problems. Find the
mean, median and mode for the number of minutes
Jim spends on his homework.
Solution: Mean
Sample size n = 9. Label the times for problems 1 through 9 as $x_1, x_2, x_3, \ldots, x_9$, respectively.
$\sum x = 12 + 4 + 3 + 8 + 7 + 5 + 4 + 9 + 11 = 63$ minutes
$\bar{X} = \sum x / n = 63/9 = 7$ minutes
Solution: Median
Place the data in ascending order: 3, 4, 4, 5, 7, 8, 9, 11, 12. The median position is (n + 1)/2 = (9 + 1)/2 = 5. The 5th ordered observation is 7, so the median is 7 minutes.
Solution: Mode
Since the data are already arranged in order from smallest to largest, we keep them that way. Only the value 4 occurs more than once, so the mode is 4.
Solution: Excel/PHStat
- Functions
  - Mean: AVERAGE
  - Median: MEDIAN
  - Mode: MODE
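For readers working outside Excel, Python's standard `statistics` module provides counterparts to AVERAGE, MEDIAN and MODE. The check below runs them on Jim's nine sampled times. It is offered as a cross-check, not as part of the PHStat workflow.

```python
import statistics

# Minutes Jim spent on the 9 sampled problems (from the example)
minutes = [12, 4, 3, 8, 7, 5, 4, 9, 11]

mean_minutes = statistics.mean(minutes)      # 63 / 9 = 7 minutes
median_minutes = statistics.median(minutes)  # 5th ordered value = 7 minutes
mode_minutes = statistics.mode(minutes)      # 4 is the only repeated value

print(mean_minutes, median_minutes, mode_minutes)
```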
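Approximating the Mean from a Frequency Distribution
- Used when the only source of data is a frequency distribution
- $\bar{X} \approx \frac{\sum_{j=1}^{c} m_j f_j}{n}$, where c = number of classes, $m_j$ = midpoint of class j, $f_j$ = frequency of class j, and $n = \sum f_j$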
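Example
- $\bar{X} \approx \frac{(15 \times 3) + (25 \times 6) + (35 \times 5) + (45 \times 4) + (55 \times 2)}{20}$
- $= \frac{45 + 150 + 175 + 180 + 110}{20}$
- $= \frac{660}{20}$
- $= 33$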
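The same approximation, sketched in Python. Each class midpoint stands in for every observation in its class. The midpoints (15, 25, 35, 45, 55) and frequencies (3, 6, 5, 4, 2) come from the example. `grouped_mean` is a hypothetical helper name.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate mean: sum of midpoint * frequency, divided by n = sum of frequencies."""
    n = sum(frequencies)
    return sum(m * f for m, f in zip(midpoints, frequencies)) / n

midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]
print(grouped_mean(midpoints, frequencies))  # 660 / 20 = 33.0
```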
Measures of Variation
- Range
- Interquartile Range
- Variance
  - Population Variance
  - Sample Variance
- Standard Deviation
  - Population Standard Deviation
  - Sample Standard Deviation
- Coefficient of Variation
Range
- Difference between the Largest and the Smallest Observations
- Ignores the distribution of data
[Figure: two data sets spread differently over the values 7 to 12; both have Range = 12 - 7 = 5.]
Quartiles
- Quartiles Split Ordered Data into 4 equal portions
- Q1 and Q3 are Measures of Non-central Location
- Q2 = the Median
[Figure: Q1, Q2 and Q3 divide the ordered data into four parts, each holding 25% of the observations.]
Quartiles
- Each quartile has a position and a value
- With the data in an ordered array, the position of $Q_i$ is $\frac{i(n+1)}{4}$
- The value of $Q_i$ is the value associated with that position in the ordered array
- Example
  - Data in ordered array: 11 12 13 16 16 17 18 21 22
Quartiles Example
Find the 1st and 3rd quartiles of the ordered homework times: 3 4 4 5 7 8 9 11 12.
- Position of Q1 = 1(9 + 1)/4 = 2.5; the 2.5th observation = (4 + 4)/2 = 4
- Position of Q3 = 3(9 + 1)/4 = 3 × 2.5 = 7.5; the 7.5th observation = (9 + 11)/2 = 10
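The i(n + 1)/4 position rule can be sketched as below, using the homework times. `quartile` is a hypothetical helper. The slides only show positions that fall halfway between two observations, where the value is their average. The sketch assumes linear interpolation for every fractional position, which reduces to that average in the halfway case. Python's `statistics.quantiles`, whose default "exclusive" method uses the same (n + 1) positioning, serves as a cross-check for data like this.

```python
import math
import statistics

def quartile(data, i):
    """Q_i (i = 1, 2, 3) at 1-based position i(n + 1)/4 of the ordered array.

    Assumption: fractional positions are linearly interpolated between the
    two neighbouring observations (at a .5 position this is their average).
    """
    ordered = sorted(data)
    n = len(ordered)
    pos = i * (n + 1) / 4        # 1-based position in the ordered array
    lower = math.floor(pos)
    frac = pos - lower
    if lower < 1:                # position falls before the first observation
        return ordered[0]
    if lower >= n:               # position at or beyond the last observation
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + frac * (above - below)

homework = [12, 4, 3, 8, 7, 5, 4, 9, 11]   # ordered: 3 4 4 5 7 8 9 11 12
q1 = quartile(homework, 1)   # position 2.5 -> (4 + 4)/2 = 4.0
q3 = quartile(homework, 3)   # position 7.5 -> (9 + 11)/2 = 10.0

print(q1, q3, statistics.quantiles(homework, n=4))  # 4.0 10.0 [4.0, 7.0, 10.0]
```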
Interquartile Range (IQR)
- IQR = Q3 - Q1, the difference between the third and first quartiles
- Covers the middle 50% of the values
- Also Known as Midspread
- Resistant to extreme values
- Example (data: 11 12 13 16 16 17 17 18 21)
  - Q1 = 12.5, Q3 = 17.5
  - IQR = 17.5 - 12.5 = 5
Range and IQR Example
Find the range and the interquartile range of the ordered homework times: 3 4 4 5 7 8 9 11 12.
- Range = largest - smallest = 12 - 3 = 9
- Position of Q1 = 1(9 + 1)/4 = 2.5; the 2.5th observation = (4 + 4)/2 = 4
- Position of Q3 = 3(9 + 1)/4 = 7.5; the 7.5th observation = (9 + 11)/2 = 10
- IQR = 10 - 4 = 6
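Range and IQR can be computed in a few lines. The sketch below uses `statistics.quantiles` with `method="exclusive"`, which, for data like these, places the quartiles at the i(n + 1)/4 positions and averages at halfway positions, as the slides do. `data_range` and `iqr` are hypothetical helper names. The first data set is the homework times and the second is the IQR slide's example.

```python
import statistics

def data_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)

def iqr(values):
    """IQR = Q3 - Q1, with quartiles at positions i(n + 1)/4 (interpolated)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1

homework = [3, 4, 4, 5, 7, 8, 9, 11, 12]
print(data_range(homework), iqr(homework))   # 9 6.0

iqr_slide = [11, 12, 13, 16, 16, 17, 17, 18, 21]
print(iqr(iqr_slide))                        # 17.5 - 12.5 = 5.0
```

The helper is named `data_range` rather than `range` so it does not shadow Python's built-in.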
Variance
- Shows Variation about the Mean
- Population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
- Sample variance: $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Standard Deviation (SD)
- Most Important Measure of Variation
- The square root of the variance
- Shows Variation about the Mean
- Population standard deviation: $\sigma = \sqrt{\sigma^2}$
- Sample standard deviation: $S = \sqrt{S^2}$
Variance and Standard Deviation
- Both measure the average scatter about the mean
- Variance computations produce squared units, which makes interpretation more difficult
  - For example, squared dollars ($²) are meaningless
- Since it is the square root of the variance, the standard deviation is expressed in the same units as the original data
  - Therefore, the standard deviation is the most commonly used measure of variation
A Trivial Standard Deviation Example
Sample values: 3, 4, 5; sample mean = 4; n = 3
- $S = \sqrt{\frac{(3-4)^2 + (4-4)^2 + (5-4)^2}{3-1}}$
- $S = \sqrt{\frac{(-1)^2 + 0^2 + 1^2}{2}}$
- $S = \sqrt{2/2} = 1$
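The two variance formulas translate directly into code. This is a minimal sketch in which `sample_variance` and `population_variance` are hypothetical helper names. The only difference between them is the divisor, n - 1 versus N. It reproduces the trivial example: S² = 1 and S = 1 for the values 3, 4, 5.

```python
import math

def sample_variance(xs):
    """S^2 = sum of squared deviations from the sample mean / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def population_variance(xs):
    """sigma^2 = sum of squared deviations from mu / N."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N

values = [3, 4, 5]
S2 = sample_variance(values)   # (1 + 0 + 1) / (3 - 1) = 1.0
S = math.sqrt(S2)              # square root of the variance = 1.0
print(S2, S, population_variance(values))
```

Treating the same three values as a whole population divides by 3 instead of 2, which gives a smaller variance (2/3).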
Comparing Standard Deviations
- Greater S (or σ) means more dispersion of data
[Figure: dot plots of three data sets on an axis from 11 to 21, all with mean 15.5: Data A (S = 3.338), Data B (S = 0.9258), Data C (S = 4.57).]
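Approximating SD from a Frequency Distribution
- Used when the raw data are not available and the only source of data is a frequency distribution
- $S \approx \sqrt{\frac{\sum_{j=1}^{c} f_j (m_j - \bar{X})^2}{n - 1}}$, where c = number of classes, $m_j$ and $f_j$ are the midpoint and frequency of class j, and $n = \sum f_j$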
Approximating SD from a Frequency Distribution
$\bar{X} = 33$
$S^2 = \frac{3(15-33)^2 + 6(25-33)^2 + 5(35-33)^2 + 4(45-33)^2 + 2(55-33)^2}{20 - 1}$
$S^2 = \frac{972 + 384 + 20 + 576 + 968}{19} = \frac{2920}{19} \approx 153.7$
$S \approx 12.4$
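This sketch reproduces the grouped-data calculation. Each midpoint stands in for every observation in its class, and the sum of squared deviations is divided by n - 1. `grouped_sd` is a hypothetical helper name, and the midpoints and frequencies are those of the example.

```python
import math

def grouped_sd(midpoints, frequencies):
    """Approximate sample SD of a frequency distribution (midpoints stand in for raw data)."""
    n = sum(frequencies)
    x_bar = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    ss = sum(f * (m - x_bar) ** 2 for m, f in zip(midpoints, frequencies))
    return math.sqrt(ss / (n - 1))

midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]
S = grouped_sd(midpoints, frequencies)
print(round(S ** 2, 1), round(S, 1))   # 153.7 12.4
```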
PHStat
- Dispersion (Excel functions)
  - $Q_i$: QUARTILE
  - $S^2$: VAR
  - S: STDEV
  - $\sigma^2$: VARP
  - σ: STDEVP
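As an aside for Python users, the standard `statistics` module has functions that, as we understand them, correspond to these Excel functions. `variance` and `stdev` divide by n - 1 (like VAR and STDEV), while `pvariance` and `pstdev` divide by N (like VARP and STDEVP). Quartiles need care. To our knowledge, Excel's QUARTILE places $Q_i$ at position i(n - 1)/4 + 1 rather than i(n + 1)/4, so it can disagree with the hand calculation. `statistics.quantiles` with `method="inclusive"` uses that Excel-style positioning, whereas the default `"exclusive"` method follows the i(n + 1)/4 rule. The data are the homework times.

```python
import statistics

minutes = [12, 4, 3, 8, 7, 5, 4, 9, 11]   # homework example data

s2 = statistics.variance(minutes)        # sample variance S^2 (n - 1), like VAR
s = statistics.stdev(minutes)            # sample SD S, like STDEV
sigma2 = statistics.pvariance(minutes)   # population variance (N), like VARP
sigma = statistics.pstdev(minutes)       # population SD, like STDEVP
print(s2, round(s, 4), round(sigma2, 4), round(sigma, 4))

# Two quartile conventions: Excel-QUARTILE-style vs the i(n + 1)/4 rule
print(statistics.quantiles(minutes, n=4, method="inclusive"))
print(statistics.quantiles(minutes, n=4, method="exclusive"))
```

For these data the two conventions agree on Q1 (4) but not on Q3: 9 with the inclusive method versus 10 by the i(n + 1)/4 rule.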
Coefficient of Variation
- Measure of Relative Variation
- Shows Variation Relative to the Mean
- Used to Compare Two or More Sets of Data Measured in Different Units
- $CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$
  - S = Sample Standard Deviation
  - $\bar{X}$ = Sample Mean
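Comparing Coefficients of Variation
- Stock A
  - Average price last year = 50
  - Standard deviation = 2
- Stock B
  - Average price last year = 100
  - Standard deviation = 5
- Coefficient of Variation
  - Stock A: CV = (2/50) × 100% = 4%
  - Stock B: CV = (5/100) × 100% = 5%
- Relative to its average price, Stock B is slightly more variable.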
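The CV formula as a one-line function, applied to the two stocks. `coefficient_of_variation` is a hypothetical helper name. Multiplying before dividing keeps the results exact for these inputs.

```python
def coefficient_of_variation(sd, mean):
    """CV = (S / X-bar) * 100%, a unit-free measure of relative variation."""
    return sd * 100 / mean

cv_a = coefficient_of_variation(2, 50)    # Stock A: 4.0 (%)
cv_b = coefficient_of_variation(5, 100)   # Stock B: 5.0 (%)
print(cv_a, cv_b)
```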
The Z-Score
- Z-score = the difference between a value and the mean, divided by the standard deviation
- Useful in identifying outliers (extreme values)
- Outliers are values in a data set that are located far from the mean
- The larger the absolute value of the Z-score, the greater the distance from the mean
- A value is considered an outlier if its Z-score is less than -3.0 or greater than +3.0
- Formulas: $Z = \frac{X - \bar{X}}{S}$ (sample) or $Z = \frac{X - \mu}{\sigma}$ (population)
Z-Score Example
- If our sample mean is 25 and our sample standard deviation is 10, is a value of 65 an outlier?
- $Z = \frac{X - \bar{X}}{S} = \frac{65 - 25}{10} = \frac{40}{10} = 4.0$
- 4.0 > 3.0, so 65 is an outlier.
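Here is a small sketch of the Z-score test with the ±3.0 cutoff from the slide. `z_score` and `is_outlier` are hypothetical helper names, and `cutoff` is an assumed parameter that defaults to the slide's 3.0.

```python
def z_score(x, mean, sd):
    """Z = (X - mean) / SD: distance from the mean in standard deviations."""
    return (x - mean) / sd

def is_outlier(x, mean, sd, cutoff=3.0):
    """Flag values whose Z-score is below -cutoff or above +cutoff."""
    return abs(z_score(x, mean, sd)) > cutoff

print(z_score(65, 25, 10))      # 4.0
print(is_outlier(65, 25, 10))   # True
```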
Shape of a Distribution
The Empirical Rule
- For most bell-shaped (mound-shaped) data sets:
  - About 68% of the observations fall within ±1 standard deviation of the mean
  - About 95% of the observations fall within ±2 SD of the mean
  - About 99.7% of the observations fall within ±3 SD of the mean
- Most accurate for fairly symmetric data sets
The Empirical Rule
About 68% of the observations fall within ±1 standard deviation of the mean.
[Figure: bell curve with the central 68% shaded between µ - 1sd and µ + 1sd.]
The Empirical Rule
About 95% of the observations fall within ±2 SD of the mean.
[Figure: bell curve with the central 95% shaded between µ - 2sd and µ + 2sd.]
The Empirical Rule
About 99.7% of the observations fall within ±3 SD of the mean.
[Figure: bell curve with the central 99.7% shaded between µ - 3sd and µ + 3sd.]
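The rule's 68/95/99.7 figures come from the normal distribution, the usual model for bell-shaped data. Python's `statistics.NormalDist` (Python 3.8+) can recover them. This sketch assumes exact normality, which real data only approximate.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within k standard deviations of the mean
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within +/-{k} SD: {share:.1%}")
```

This prints 68.3%, 95.4% and 99.7%. The rule rounds these values.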
The Bienayme-Chebyshev Rule
- The percentage of observations contained within distances of k standard deviations around the mean must be at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$, for k > 1
- Applies regardless of the shape of the data set
The Bienayme-Chebyshev Rule
- At least 75% of the observations must be contained within distances of 2 SD around the mean
- At least 88.89% of the observations must be contained within distances of 3 SD around the mean
- At least 93.75% of the observations must be contained within distances of 4 SD around the mean
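The Bienayme-Chebyshev Rule
[Figure: at least 75% of the observations lie between µ - 2sd and µ + 2sd.]
The Bienayme-Chebyshev Rule
[Figure: nested bands around µ: at least 75% within ±2sd and at least 88.89% within ±3sd.]
The Bienayme-Chebyshev Rule
[Figure: nested bands around µ: at least 75% within ±2sd, at least 88.89% within ±3sd and at least 93.75% within ±4sd.]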
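A sketch comparing the Bienayme-Chebyshev lower bound with the observed share of values inside ±k SD. The strongly right-skewed data set is invented for illustration. The sketch uses the population SD (`statistics.pstdev`) of the listed values, for which the bound is guaranteed to hold whatever the shape. `chebyshev_bound` and `share_within` are hypothetical helper names.

```python
import statistics

def chebyshev_bound(k):
    """Minimum share of observations within k SDs of the mean (k > 1)."""
    return 1 - 1 / k ** 2

def share_within(data, k):
    """Observed share of values within k population SDs of the mean."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)

# Hypothetical, strongly right-skewed data set
skewed = [1, 1, 1, 2, 2, 3, 4, 5, 8, 20]
for k in (2, 3, 4):
    print(k, round(chebyshev_bound(k), 4), share_within(skewed, k))
```

The bounds printed are 0.75, 0.8889 and 0.9375, matching the 75%, 88.89% and 93.75% above. The observed shares can exceed them, but never fall below.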
Shape of a Distribution
- Describe How Data are Distributed
- Measures of Shape
  - Symmetric or Skewed (asymmetric)
- Left-Skewed: Mean < Median < Mode
- Symmetric: Mean = Median = Mode
- Right-Skewed: Mode < Median < Mean
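A quick way to see these orderings is to compute all three measures. The two data sets below are invented for illustration: one has a long right tail and the other is its mirror image. Note that the ordering is a typical pattern rather than a guarantee, since some skewed data sets break it. `shape_summary` is a hypothetical helper name.

```python
import statistics

def shape_summary(data):
    """Return (mean, median, mode) so their ordering can hint at skewness."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 7, 11]   # long tail to the right
left_skewed = [12 - x for x in right_skewed]     # mirror image: long tail to the left

print(shape_summary(right_skewed))   # mode 2 < median 3 < mean 4
print(shape_summary(left_skewed))    # mean 8 < median 9 < mode 10
```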
The Box Plot (Box-and-Whisker)
- 5-number summary
  - X_smallest, Q1, Median, Q3, X_largest
- Box Plot
  - Graphical display of data using the 5-number summary
[Figure: box plot on an axis marked 4 to 12, labelling X_smallest, the median and X_largest.]
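The five-number summary behind a box plot can be sketched as follows. It uses the homework times, with quartiles from `statistics.quantiles` (`"exclusive"` method, i.e. positions i(n + 1)/4 with interpolation). `five_number_summary` is a hypothetical helper name, and drawing the plot itself is left to a charting tool.

```python
import statistics

def five_number_summary(data):
    """(X_smallest, Q1, median, Q3, X_largest), quartiles at positions i(n + 1)/4."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, q2, q3, max(data)

minutes = [12, 4, 3, 8, 7, 5, 4, 9, 11]
print(five_number_summary(minutes))   # (3, 4.0, 7.0, 10.0, 12)
```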
Distribution Shape and Box Plot
[Figure: box plots of left-skewed, symmetric and right-skewed distributions.]
Correlation Coefficient
- Correlation Coefficient: r
- Unit Free
- Measures the strength of the linear relationship between 2 quantitative variables
- Ranges between -1 and +1
- The closer to -1, the stronger the negative linear relationship becomes
- The closer to +1, the stronger the positive linear relationship becomes
- The closer to 0, the weaker any linear relationship becomes
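A minimal sketch of the Pearson correlation coefficient, the usual definition of r, for paired data: $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$. `pearson_r` is a hypothetical helper name, and the paired lists are made up. Python 3.10+ also ships `statistics.correlation`. The function fails with a division by zero if either variable is constant.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired quantitative data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0 (perfect negative)
print(pearson_r(x, [2, 1, 4, 3, 5]))    # positive, weaker than 1
```

Rescaling either variable (for example, from metres to centimetres) leaves r unchanged, which is what "unit free" means.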
Scatter Plots of Data with Various Correlation Coefficients
[Figure: scatter plots of Y against X for r = -1, r = -.6, r = 0, r = +.6 and r = +1.]
Pitfalls in Numerical Descriptive Measures and Ethical Issues
- Data Analysis is Objective
  - Should report the summary measures that best meet the assumptions about the data set
- Data Interpretation is Subjective
  - Should be done in a fair, neutral and clear manner
- Ethical Issues
  - Should document both good and bad results
  - Presentation should be fair, objective and neutral
  - Should not use inappropriate summary measures to distort the facts