Numerical Descriptive Measures - PowerPoint PPT Presentation

Provided by: cobJmuEd6

1
  • Chapter 3
  • Numerical Descriptive Measures

2
Summary Measures
  • Central Tendency
    • Mean
    • Median
    • Mode
    • Geometric Mean
  • Quartile
  • Variation
    • Range
    • Variance
    • Standard Deviation
    • Coefficient of Variation
3
Measures of Central Tendency
  • Various ways to describe the central, most common
    or middle value in a distribution or set of data
  • The Mean (Arithmetic Mean)
  • The Median
  • The Mode
  • The Geometric Mean

4
The Mean
  • Equals the sum of all observations or values
    divided by the number of values
  • The Most Common Measure of Central Tendency
  • Generally called the Average in common usage
  • Population mean vs. Sample mean

5
The Mean
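  • Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$ (µ is read "mu")
  • Recall: N = population size
  • Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ ($\bar{X}$ is read "X-bar")
  • Recall: n = sample size
  • Σ = sum
  • ΣX = sum of all values of X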

6
The Mean
  • Affected by Extreme Values (Outliers)
  • Note how a difference of one value affects the
    mean in the example below

[Figure: two data sets that differ in one value, plotted on 0 to 10 and 0 to 14 scales; Mean = 5 and Mean = 6.]
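A minimal Python sketch of the same effect, using the standard-library `statistics` module. The two data sets are hypothetical (the plotted values are not recoverable from the slide); they differ in a single value, which pulls the mean up while leaving the median where it was.

```python
from statistics import mean, median

# Hypothetical data sets that differ in one value only
original = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # the 5 is replaced by an extreme value

for label, data in (("original", original), ("with outlier", with_outlier)):
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")
```

Here the mean moves from 3 to 4 while the median stays at 3.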
7
The Median
  • The Middle number
  • The most Robust measure of Central Tendency
  • Not affected by extreme values
  • In an ordered array, the median is the observation at position (n + 1)/2
  • If n or N is odd, the median is the middle number
  • If n or N is even, the median is the average of
    the 2 middle numbers

[Figure: two data sets plotted on 0 to 10 and 0 to 14 scales; Median = 5 for both.]
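The (n + 1)/2 position rule can be sketched in Python as follows. The function name `median_by_position` and the sample values are illustrative only; `statistics.median` from the standard library gives the same results and is the simpler choice in practice.

```python
from statistics import median

def median_by_position(values):
    """Median as the (n + 1)/2-th observation of the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        # Odd n: (n + 1)/2 is a whole number, so take that observation (1-based)
        return ordered[(n + 1) // 2 - 1]
    # Even n: (n + 1)/2 falls halfway between the two middle observations
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Illustrative data: odd and even sample sizes, cross-checked against statistics.median
for data in ([7, 1, 3], [7, 1, 3, 10]):
    print(data, median_by_position(data), median(data))
```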
8
Mode
  • The Value that Occurs Most Often
  • Not Affected by Extreme Values
  • There may be more than one mode
  • There may be no Mode
  • Can be used for Numerical or Categorical Data

[Figure: two dot plots; the data set on the 0 to 14 scale has Mode = 9, and the data set on the 0 to 6 scale has no mode.]
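A small Python sketch of the mode rules above, assuming (as the slide's "No Mode" example suggests) that a data set in which every value occurs exactly once has no mode. The helper `modes` is hypothetical; the standard-library `statistics.multimode` would instead return every value in that case, hence the explicit check. Conventions differ for data in which all values tie at a higher count; this sketch reports them all as modes.

```python
from collections import Counter

def modes(values):
    """Return every mode as a sorted list; an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Assumption: when every value occurs only once there is no mode
        return []
    return sorted(value for value, count in counts.items() if count == top)

print(modes([1, 3, 9, 9, 4]))            # a single mode
print(modes([1, 1, 4, 4, 6]))            # more than one mode
print(modes([0, 1, 2, 3, 4, 5, 6]))      # no mode
print(modes(["red", "blue", "red"]))     # categorical data work too
```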
9
Example
Jim has 20 problems to do for homework. Some are
harder than others and take more time to solve.
We take a random sample of 9 problems. Find the
mean, median and mode for the number of minutes
Jim spends on his homework.
10
Solution: Mean
Sample size n = 9. Problems 1 through 9 are $x_1, x_2, x_3, \ldots, x_9$, respectively.
$\sum x = 12 + 4 + 3 + 8 + 7 + 5 + 4 + 9 + 11 = 63$ minutes
$\bar{X} = \sum x / n = 63/9 = 7$ minutes
11
Solution: Median
Place the data in ascending order: 3, 4, 4, 5, 7, 8, 9, 11, 12. The median position is (n + 1)/2 = (9 + 1)/2 = 5. The 5th ordered observation is 7, so the median is 7 minutes.
12
Solution: Mode
Since the data are already arranged in order from smallest to largest, we keep them that way. Only the value 4 occurs more than once, so the mode is 4 minutes.
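The three answers can be checked with Python's standard-library `statistics` module; its `mean`, `median` and `mode` functions do the same job as Excel's AVERAGE, MEDIAN and MODE. This is a quick check of the worked example, not the slides' own method.

```python
from statistics import mean, median, mode

# Minutes spent on each of the 9 sampled homework problems (from the example)
minutes = [12, 4, 3, 8, 7, 5, 4, 9, 11]

print("n      =", len(minutes))
print("sum    =", sum(minutes))
print("mean   =", mean(minutes))     # sum / n
print("median =", median(minutes))   # middle value of the sorted data
print("mode   =", mode(minutes))     # most frequent value
```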
13
Solution: Excel/PHStat
  • Equations
  • Mean (AVERAGE)
  • Median (MEDIAN)
  • Mode (MODE)
  • Excel worksheet

14
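Approximating the Mean from a Frequency Distribution
  • Used when the only source of data is a frequency distribution
  • $\bar{X} \approx \frac{\sum_{j=1}^{c} m_j f_j}{n}$, where $m_j$ = midpoint of class j, $f_j$ = frequency of class j, c = number of classes, and n = total number of observations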

15
Example
  • $\bar{X} \approx \frac{(15)(3) + (25)(6) + (35)(5) + (45)(4) + (55)(2)}{20}$
  • $= \frac{45 + 150 + 175 + 180 + 110}{20}$
  • $= \frac{660}{20}$
  • $= 33$
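A minimal Python sketch of the grouped-data approximation. It assumes, as the example's products suggest, that 15, 25, 35, 45 and 55 are the class midpoints and 3, 6, 5, 4 and 2 the class frequencies; `grouped_mean` is a hypothetical helper name.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate mean from a frequency distribution: sum(m_j * f_j) / n."""
    n = sum(frequencies)                      # total number of observations
    if n == 0:
        raise ValueError("empty frequency distribution")
    return sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Assumed class midpoints and frequencies, read off the example's products
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]

print(grouped_mean(midpoints, frequencies))   # 660 / 20
```

With these inputs it prints 33.0, matching 660/20.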

16
Measures of Variation
  • Variation
    • Range
    • Interquartile Range
    • Variance
      • Population Variance
      • Sample Variance
    • Standard Deviation
      • Population Standard Deviation
      • Sample Standard Deviation
    • Coefficient of Variation
17
Range
  • Difference between the Largest and the Smallest
    Observations
  • Ignores the distribution of data

[Figure: two data sets on a 7 to 12 scale, spread out differently but both with Range = 12 − 7 = 5.]
18
Quartiles
  • Quartiles Split Ordered Data into 4 equal
    portions
  • Q1 and Q3 are Measures of Non-central Location
  • Q2 = the Median

[Figure: ordered data divided by Q1, Q2 and Q3 into four portions of 25% each.]
19
Quartiles
  • Each quartile has a position and a value
  • With the data in an ordered array, the position of $Q_i$ is $\frac{i(n+1)}{4}$
  • The value of Qi is the value associated with that
    position in the ordered array
  • Example

Data in ordered array: 11 12 13 16 16 17 18 21 22
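The position rule for $Q_i$ can be sketched in Python as below. Positions such as 2.5 fall halfway between two observations and are averaged, as in the slides' examples; for other fractional positions this sketch assumes linear interpolation, which is one of several conventions in use. Excel's QUARTILE function, for instance, uses a different position rule, so its answers can differ from the ones on these slides. The function name `quartile` is illustrative.

```python
def quartile(values, i):
    """Q_i at 1-based position i(n + 1)/4 in the ordered array (i = 1, 2, 3).

    Assumption: a fractional position is filled by linear interpolation
    between its two neighbouring observations (for x.5 this is their
    average); positions outside 1..n are clamped to the extremes.
    """
    ordered = sorted(values)
    n = len(ordered)
    position = i * (n + 1) / 4
    if position <= 1:
        return ordered[0]
    if position >= n:
        return ordered[-1]
    whole = int(position)               # observation just below the position
    fraction = position - whole         # how far towards the next observation
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]   # ordered array from the slide
for i in (1, 2, 3):
    position = i * (len(data) + 1) / 4
    print(f"Q{i}: position {position}, value {quartile(data, i)}")
```

For this ordered array it gives Q1 = 12.5, Q2 = 16 and Q3 = 19.5.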
20
Quartiles Example
Find the 1st and 3rd quartiles of the ordered observations (n = 9).
Position of Q1 = 1(9 + 1)/4 = 2.5. The 2.5th observation = (4 + 4)/2 = 4, so Q1 = 4.
Position of Q3 = 3(9 + 1)/4 = 7.5 (three times the position of Q1). The 7.5th observation = (9 + 11)/2 = 10, so Q3 = 10.
21
Interquartile Range (IQR)
  • The difference between Q3 and Q1: IQR = Q3 − Q1
  • Covers the middle 50% of the values
  • Also Known as Midspread
  • Resistant to extreme values
  • Example
  • Q1 = 12.5, Q3 = 17.5
  • IQR = 17.5 − 12.5 = 5

Data in ordered array: 11 12 13 16 16 17 17 18 21
22
Range and IQR Example
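Find the range and the interquartile range of this distribution (n = 9).
Range = largest − smallest = 12 − 3 = 9.
Position of Q1 = 1(9 + 1)/4 = 2.5. The 2.5th observation = (4 + 4)/2 = 4, so Q1 = 4.
Position of Q3 = 3(9 + 1)/4 = 7.5 (three times the position of Q1). The 7.5th observation = (9 + 11)/2 = 10, so Q3 = 10.
IQR = Q3 − Q1 = 10 − 4 = 6.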
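A Python sketch of the range and IQR calculation. The example's data table was lost in extraction, so this assumes the observations are Jim's sorted homework times (3, 4, 4, 5, 7, 8, 9, 11, 12), which agree with every value the example quotes; the `quartile` helper (an illustrative name) uses the i(n + 1)/4 position rule with interpolation between neighbouring observations.

```python
def quartile(values, i):
    """Q_i at position i(n + 1)/4, interpolating between neighbours (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    position = i * (n + 1) / 4
    if position <= 1:
        return ordered[0]
    if position >= n:
        return ordered[-1]
    whole = int(position)
    fraction = position - whole
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)

# Assumed data: Jim's homework times, which match every value quoted in the example
minutes = [12, 4, 3, 8, 7, 5, 4, 9, 11]

data_range = max(minutes) - min(minutes)   # largest - smallest
q1 = quartile(minutes, 1)
q3 = quartile(minutes, 3)
iqr = q3 - q1

print(f"Range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

It prints `Range = 9, Q1 = 4.0, Q3 = 10.0, IQR = 6.0`.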
23
Variance
  • Shows Variation about the Mean
  • Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
  • Sample Variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

24
Standard Deviation (SD)
  • Most Important Measure of Variation
  • The square root of the variance
  • Shows Variation about the Mean
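  • Population Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
  • Sample Standard Deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$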

25
Variance and Standard Deviation
  • Both measure the average scatter about the mean
  • Variance computations produce squared units
    which makes interpretation more difficult
  • For example, squared units (unit²) have no intuitive meaning.
  • Since it is the square root of the Variance, the
    Standard Deviation is expressed in the same units
    as the original data
  • Therefore, the Standard Deviation is the most
    commonly used measure of variation

26
A trivial Standard Deviation Example
Sample values: 3, 4, 5. Sample mean = 4, n = 3
$S = \sqrt{\frac{(3-4)^2 + (4-4)^2 + (5-4)^2}{3-1}} = \sqrt{\frac{(-1)^2 + 0^2 + 1^2}{2}} = \sqrt{\frac{2}{2}} = 1$
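The same calculation in Python, done by hand and then with the standard-library `statistics` module. `variance` and `stdev` divide by n − 1 (the sample formulas, like Excel's VAR and STDEV), while `pstdev` divides by N (the population formula, like STDEVP), so it returns a smaller value for these three points.

```python
import math
from statistics import pstdev, stdev, variance

sample = [3, 4, 5]
n = len(sample)
x_bar = sum(sample) / n

# Sample variance: squared deviations summed, then divided by n - 1
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
s = math.sqrt(s2)
print("by hand:   S^2 =", s2, " S =", s)

# statistics.variance / stdev also use the n - 1 (sample) divisor
print("statistics:", variance(sample), stdev(sample))

# pstdev uses the population divisor N instead, so it is smaller here
print("population sigma:", pstdev(sample))
```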

27
Comparing Standard Deviations
  • Greater S (or σ) means more dispersion of the data

[Figure: dot plots of three data sets on an 11 to 21 scale, each with Mean = 15.5. Data A: standard deviation = 3.338; Data B: standard deviation = 0.9258; Data C: standard deviation = 4.57.]
28
Approximating SD from a Frequency Distribution
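  • Used when the raw data are not available and the only source of data is a frequency distribution
  • $S = \sqrt{\frac{\sum_{j=1}^{c} f_j (m_j - \bar{X})^2}{n - 1}}$, where $m_j$ = midpoint of class j, $f_j$ = frequency of class j, c = number of classes, and n = total number of observations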

29
Approximating SD from a Frequency Distribution
$\bar{X} = 33$
$S^2 = \frac{3(15-33)^2 + 6(25-33)^2 + 5(35-33)^2 + 4(45-33)^2 + 2(55-33)^2}{20-1} = \frac{972 + 384 + 20 + 576 + 968}{19} = \frac{2920}{19} \approx 153.7$
$S = \sqrt{153.7} \approx 12.4$
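A minimal Python sketch of the grouped-data variance and standard deviation, assuming the same midpoints (15 to 55) and frequencies (3, 6, 5, 4, 2) as the frequency-distribution mean example. Each squared deviation of a midpoint from the approximate mean is weighted by its class frequency, and the total is divided by n − 1.

```python
import math

# Class midpoints and frequencies from the frequency-distribution example
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]

n = sum(frequencies)                                               # 20 observations
x_bar = sum(m * f for m, f in zip(midpoints, frequencies)) / n     # approximate mean

# Each midpoint's squared deviation is weighted by its class frequency
s2 = sum(f * (m - x_bar) ** 2 for m, f in zip(midpoints, frequencies)) / (n - 1)
s = math.sqrt(s2)

print(f"mean = {x_bar}, S^2 = {s2:.1f}, S = {s:.1f}")
```

It prints `mean = 33.0, S^2 = 153.7, S = 12.4`.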
30
PHStat
  • Dispersion
  • $Q_i$: QUARTILE
  • $S^2$: VAR
  • $S$: STDEV
  • $\sigma^2$: VARP
  • $\sigma$: STDEVP
  • Excel worksheet

31
Coefficient of Variation
  • Measure of Relative Variation
  • Shows Variation Relative to the Mean
  • Used to Compare Two or More Sets of Data Measured
    in Different Units
  • $CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$
  • S = Sample Standard Deviation
  • $\bar{X}$ = Sample Mean

32
Comparing Coefficients of Variation
  • Stock A
    • Average price last year = 50
    • Standard deviation = 2
  • Stock B
    • Average price last year = 100
    • Standard deviation = 5
  • Coefficient of Variation
    • Stock A: $CV_A = \frac{2}{50} \times 100\% = 4\%$
    • Stock B: $CV_B = \frac{5}{100} \times 100\% = 5\%$
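A Python sketch of the comparison, using the figures from the stock example. `coefficient_of_variation` is a hypothetical helper that returns the CV as a percentage; because the standard deviation is divided by the mean, the result has no units, which is what lets data measured on different scales be compared.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (S / X-bar) * 100%, returned as a percentage."""
    return 100 * std_dev / mean

# Figures from the stock example
cv_a = coefficient_of_variation(2, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B

print(f"Stock A: CV = {cv_a:.1f}%")
print(f"Stock B: CV = {cv_b:.1f}%")
```

It prints 4.0% for Stock A and 5.0% for Stock B, so Stock B's price varied more relative to its average level.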

33
The Z - Score
  • Z score = difference between the value and the mean, divided by the standard deviation
  • Useful in identifying outliers (extreme values)
  • Outliers are values in a data set that are located far from the mean
  • The larger the absolute value of the Z score, the larger the distance from the mean
  • A value is considered an outlier if its Z score is < −3.0 or > 3.0
  • Formulae: $Z = \frac{X - \bar{X}}{S}$ OR $Z = \frac{X - \mu}{\sigma}$
34
Z Score Example
  • If our sample mean is 25 and our sample standard deviation is 10, is a value of 65 an outlier?
  • $Z = \frac{X - \bar{X}}{S} = \frac{65 - 25}{10} = \frac{40}{10} = 4.0$
  • 4.0 > 3.0, so 65 is an outlier.
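A minimal Python sketch of the Z-score check, with the ±3.0 cutoff from the previous slide as a default parameter. The helper names `z_score` and `is_outlier` are illustrative.

```python
def z_score(x, mean, std_dev):
    """Z = (X - mean) / standard deviation."""
    return (x - mean) / std_dev

def is_outlier(x, mean, std_dev, cutoff=3.0):
    """Flag X as an outlier when its Z score is below -cutoff or above +cutoff."""
    return abs(z_score(x, mean, std_dev)) > cutoff

# Values from the example: sample mean 25, sample standard deviation 10
z = z_score(65, 25, 10)
print(z, is_outlier(65, 25, 10))
```

For the example it prints `4.0 True`.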
35
Shape of a Distribution
36
The Empirical Rule
  • For most mound-shaped (bell-shaped) data sets:
  • About 68% of the Observations Fall Within ±1 Standard Deviation Around the Mean
  • About 95% of the Observations Fall Within ±2 SD Around the Mean
  • About 99.7% of the Observations Fall Within ±3 SD Around the Mean
  • More accurate for fairly symmetric data sets
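The 68/95/99.7 figures are rounded probabilities from the normal distribution, which is why the rule suits bell-shaped, fairly symmetric data. This Python sketch computes them with the standard-library `statistics.NormalDist` (available since Python 3.8); `normal_fraction_within` is an illustrative helper.

```python
from statistics import NormalDist

def normal_fraction_within(k):
    """Share of a normal distribution lying within k SDs of its mean."""
    z = NormalDist()  # standard normal: mu = 0, sigma = 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {normal_fraction_within(k):.2%}")
```

It prints about 68.27%, 95.45% and 99.73%.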

37
The Empirical Rule
About 68% of the Observations Fall Within ±1 Standard Deviation Around the Mean
[Figure: bell-shaped curve with 68% of the area between μ − 1 SD and μ + 1 SD.]
38
The Empirical Rule
About 95% of the Observations Fall Within ±2 SD Around the Mean
[Figure: bell-shaped curve with 95% of the area between μ − 2 SD and μ + 2 SD.]
39
The Empirical Rule
About 99.7% of the Observations Fall Within ±3 SD Around the Mean
[Figure: bell-shaped curve with 99.7% of the area between μ − 3 SD and μ + 3 SD.]
40
The Bienayme-Chebyshev Rule
  • The Percentage of Observations Contained Within Distances of k Standard Deviations Around the Mean Must Be at Least $\left(1 - \frac{1}{k^2}\right) \times 100\%$, for k > 1
  • Applies regardless of the shape of the data set

41
The Bienayme-Chebyshev Rule
  • At least 75% of the observations must be contained within distances of 2 SD around the mean
  • At least 88.89% of the observations must be contained within distances of 3 SD around the mean
  • At least 93.75% of the observations must be contained within distances of 4 SD around the mean
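The bound $1 - 1/k^2$ is easy to tabulate. This Python sketch (the helper name `chebyshev_min_fraction` is illustrative) prints it for k = 2, 3 and 4 next to the share a normal distribution actually places within k SDs, showing how conservative the shape-free guarantee is.

```python
from statistics import NormalDist

def chebyshev_min_fraction(k):
    """Minimum share of observations within k SDs of the mean, for any shape."""
    if k <= 1:
        raise ValueError("the bound 1 - 1/k^2 is only informative for k > 1")
    return 1 - 1 / k**2

normal = NormalDist()  # standard normal, used only for comparison
for k in (2, 3, 4):
    bound = chebyshev_min_fraction(k)
    normal_share = normal.cdf(k) - normal.cdf(-k)
    print(f"k = {k}: Chebyshev >= {bound:.2%}, normal curve = {normal_share:.2%}")
```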

42
The Bienayme-Chebyshev Rule
[Figure: at least 75% of the observations lie between μ − 2 SD and μ + 2 SD.]
43
The Bienayme-Chebyshev Rule
[Figure: at least 75% of the observations lie within μ ± 2 SD and at least 88.89% within μ ± 3 SD.]
44
The Bienayme-Chebyshev Rule
[Figure: at least 75%, 88.89% and 93.75% of the observations lie within μ ± 2 SD, μ ± 3 SD and μ ± 4 SD, respectively.]
45
Shape of a Distribution
  • Describe How Data are Distributed
  • Measures of Shape
  • Symmetric or Skewed (asymmetric)

Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
46
The Box Plot (Box-and-Whisker)
  • 5-number summary
  • Median, Q1, Q3, Xsmallest, Xlargest
  • Box Plot
  • Graphical display of data using 5-number summary

[Figure: box plot on a 4 to 12 scale marking Xsmallest, the Median and Xlargest.]
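A Python sketch of the 5-number summary that a box plot draws. It assumes the i(n + 1)/4 quartile position rule with interpolation (one convention among several), so that Q2 coincides with the median, and uses Jim's homework times as sample data. Both function names are illustrative.

```python
def quartile(values, i):
    """Q_i at position i(n + 1)/4, interpolating between neighbours (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    position = i * (n + 1) / 4
    if position <= 1:
        return ordered[0]
    if position >= n:
        return ordered[-1]
    whole = int(position)
    fraction = position - whole
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)

def five_number_summary(values):
    """Xsmallest, Q1, Median (= Q2), Q3, Xlargest: the numbers a box plot draws."""
    ordered = sorted(values)
    return {
        "Xsmallest": ordered[0],
        "Q1": quartile(ordered, 1),
        "Median": quartile(ordered, 2),   # position 2(n + 1)/4 = (n + 1)/2
        "Q3": quartile(ordered, 3),
        "Xlargest": ordered[-1],
    }

# Sample data: Jim's homework times from the earlier example
summary = five_number_summary([12, 4, 3, 8, 7, 5, 4, 9, 11])
print(summary)
```

For these data it returns Xsmallest = 3, Q1 = 4.0, Median = 7.0, Q3 = 10.0 and Xlargest = 12.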
47
Distribution Shape and Box Plot
[Figure: box plots of a right-skewed, a left-skewed and a symmetric distribution.]
48
Correlation Coefficient
  • Correlation Coefficient r
  • Unit Free
  • Measures the strength of the linear relationship
    between 2 quantitative variables
  • Ranges between −1 and +1
  • The Closer to −1, the stronger the negative linear relationship becomes
  • The Closer to +1, the stronger the positive linear relationship becomes
  • The Closer to 0, the weaker any linear relationship becomes
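A minimal Python sketch of the sample correlation coefficient, $r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$. The paired data are hypothetical and chosen to hit the ends and middle of the −1 to +1 range; on Python 3.10 and later, `statistics.correlation` computes the same quantity. `pearson_r` is an illustrative name.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data.

    The (n - 1) factors in the sample covariance and the two sample
    standard deviations cancel, so only sums of deviations are needed.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data illustrating the -1 to +1 range
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfect positive line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfect negative line
print(pearson_r([1, 2, 3], [1, 0, 1]))         # no linear relationship
```

Multiplying every X (or every Y) by a positive constant, such as when converting units, leaves r unchanged, which is what "unit free" means.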

49
Scatter Plots of Data with Various Correlation Coefficients
[Figure: scatter plots of Y against X for r = −1, r = −0.6, r = 0, r = +0.6 and r = +1.]
50
Pitfalls in Numerical Descriptive Measures and Ethical Issues
  • Data Analysis is Objective
  • Should report the summary measures that best meet
    the assumptions about the data set
  • Data Interpretation is Subjective
  • Should be done in a fair, neutral and clear
    manner
  • Ethical Issues
  • Should document both good and bad results
  • Presentation should be fair, objective and
    neutral
  • Should not use inappropriate summary measures to
    distort the facts