4: Summary Statistics - PowerPoint PPT Presentation

Slides: 36
Provided by: sjsuEdufa4
Learn more at: https://www.sjsu.edu
1
Chapter 4: Summary Statistics
2
In Chapter 3
  • we used stemplots to look at shape, central
    location, and spread of a distribution.
  • In this chapter we use numerical summaries to
    look at central location and spread.

3
Summary Statistics
  • Central location statistics
  • Mean
  • Median
  • Mode
  • Spread statistics
  • Range
  • Interquartile range (IQR)
  • Variance and standard deviation
  • Shape statistics exist but are seldom used in
    practice (not covered)

4
Notation
  • n = sample size
  • X = variable (e.g., ages of subjects)
  • xi = value for individual i
  • Σ = sum all values (capital sigma)
  • Example: Let X = AGE (n = 10)
  • 21 42 5 11 30 50 28 27 24 52
  • x1 = 21, x2 = 42, …, x10 = 52
  • Σxi = x1 + x2 + … + x10 = 21 + 42 + … + 52 = 290

5
Central Location: Sample Mean
Most common measure of central location
  • For the data on the previous slide: x̄ = Σxi / n = 290 / 10 = 29.0
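
A minimal Python sketch of the sample mean for the age data from the notation slide. The function name `sample_mean` is an illustrative choice, not something from the slides.

```python
# Ages from the notation example (n = 10).
ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

def sample_mean(values):
    """Return the sum of the values divided by the sample size n."""
    return sum(values) / len(values)

total = sum(ages)          # sum of all x_i
x_bar = sample_mean(ages)  # sample mean (x-bar)
print(total, x_bar)
```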

6
Example: Sample Mean
The mean is the gravitational center of a batch
of numbers
7
Gravitational Center
A skew tips the distribution causing the mean to
shift toward the tail
8
Uses of the Sample Mean
  • Predicts value of an observation drawn at random
    from the sample
  • Predicts value of an observation drawn at random
    from the population
  • Predicts population mean µ

9
Population Mean
  • Same operation as sample mean applied to entire
    population (N = population size)
  • Not readily (never?) available in practice but
    conceptually important

10
Central Location: Median
  • The value with a depth of (n + 1) / 2
  • When n is odd → median is obvious
  • When n is even → average the two middle values
  • Example (below): Depth(M) = (10 + 1) / 2 = 5.5
  • Median = Average(27 and 28) = 27.5

05 11 21 24 27 28 30 42 50 52
Median: average the adjacent values 27 and 28, M = 27.5
11
More Examples
  • Example A: 2 4 6
  • Median = 4
  • Example B: 2 4 6 8
  • Median = 5 (average of 4 and 6)
  • Example C: 6 2 4
  • Median ≠ 2
  • (Values must be ordered first)
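
The depth rule can be sketched in Python as follows. This is one possible reading of the rule, and the function name `median_by_depth` is an illustrative assumption. The function sorts first, so Example C gives 4 rather than 2.

```python
def median_by_depth(values):
    """Median as the value at depth (n + 1) / 2 in the ordered data."""
    ordered = sorted(values)            # values must be ordered first
    n = len(ordered)
    depth = (n + 1) / 2                 # e.g. 5.5 when n = 10
    if depth.is_integer():              # n odd: a single middle value
        return ordered[int(depth) - 1]
    lower = ordered[int(depth) - 1]     # value just below the depth
    upper = ordered[int(depth)]         # value just above the depth
    return (lower + upper) / 2          # n even: average the two middle values

print(median_by_depth([2, 4, 6]))       # Example A
print(median_by_depth([2, 4, 6, 8]))    # Example B
print(median_by_depth([6, 2, 4]))       # Example C (unordered input)
```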

12
The Median is Robust
The median is resistant to skews and outliers.
This data set has x-bar = 1600:
1362 1439 1460 1614 1666 1792 1867
Same data set with a data entry error (the last value):
1362 1439 1460 1614 1666 1792 9867
This data set has x-bar = 2743.
The median is 1614 in both instances.
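
A short check of this robustness claim, using Python's standard `statistics` module on the two data sets as listed. The variable names are illustrative.

```python
from statistics import mean, median

rates = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_error = [1362, 1439, 1460, 1614, 1666, 1792, 9867]  # last value mistyped

# The mean moves a long way when one value is wrong; the median does not.
print("mean:  ", mean(rates), "->", round(mean(with_error)))
print("median:", median(rates), "->", median(with_error))
```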
13
Mode
  • Most frequent value in the dataset
  • This data set has a mode of 7: 4, 7, 7, 7, 8, 8, 9
  • This data set has no mode: 4, 6, 7, 8 (each
    point appears once)
  • The mode is useful only in large data sets with
    repeating values
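
One way to find the mode in Python is sketched below. Returning a list, so that ties can be reported, is an assumption that goes beyond the slide, and the name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(values)
    if not counts:                      # empty data set: nothing to report
        return []
    top = max(counts.values())          # highest frequency observed
    if top == 1:                        # each point appears once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([4, 7, 7, 7, 8, 8, 9]))
print(modes([4, 6, 7, 8]))
```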

14
Comparison of Mean, Median, Mode
Mean gets pulled by tail:
  • mean = median → symmetrical
  • mean > median → positive skew
  • mean < median → negative skew
15
Spread: extent to which data vary around middle point

  Site 1 |   | Site 2
  -------+---+-------
      42 | 2 |
       8 | 2 |
       2 | 3 | 234
      86 | 3 | 6689
       2 | 4 | 0
         | 4 |
         | 5 |
         | 5 |
         | 6 |
       8 | 6 |
  ×10 particulates in air (µg/m3)

Site 1 exhibits much greater spread (visually)
16
Spread: Range
  • Range = maximum − minimum
  • Illustrative example
  • Site 1 range = 68 − 22 = 46
  • Site 2 range = 40 − 32 = 8
  • The sample range is not a good measure of spread: it
    tends to underestimate the population range
  • Always supplement the range with at least one
    additional measure of spread
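
A small sketch of the range calculation. The individual site values are an assumption: they are read off the back-to-back stemplot as best it can be recovered from this transcript, and only the minimum and maximum of each site are confirmed by the slide.

```python
# Assumed values, reconstructed from the stemplot (particulates, µg/m3).
site1 = [22, 24, 28, 32, 36, 38, 42, 68]
site2 = [32, 33, 34, 36, 36, 38, 39, 40]

def sample_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

print(sample_range(site1), sample_range(site2))
```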

17
Spread: Interquartile Range
  • Quartile 1 (Q1) marks the bottom quarter of the data:
    the middle of the lower half of the data set
  • Quartile 3 (Q3) marks the top quarter of the data:
    the middle of the top half of the data set
  • Interquartile Range (IQR) = Q3 − Q1 covers the
    middle 50% of the distribution

05 11 21 24 27 28 30 42 50 52
Q1 = 21, median = 27.5, Q3 = 42, and IQR = 42 − 21 = 21
18
Five-Point Summary
  • Q0 (the minimum)
  • Q1 (25th percentile)
  • Q2 (median)
  • Q3 (75th percentile)
  • Q4 (the maximum)

05 11 21 24 27 28 30 42 50 52
Q1 = 21, median = 27.5, Q3 = 42
5-point summary: 5, 21, 27.5, 42, 52
19
Quartiles: Tukey's Hinges
Data: metabolic rates (cal/day), n = 7
1362 1439 1460 1614 1666 1792 1867
Median = 1614
  • When n is odd, include the median in both
    halves
  • Bottom half: 1362 1439 1460 1614
  • Top half: 1614 1666 1792 1867
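
The hinge rule can be sketched in Python as below: split the ordered data into halves, include the median in both halves when n is odd, and take the median of each half. The function names are illustrative. Note that library quantile functions often use other conventions and may give slightly different quartiles.

```python
def middle(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_point_summary(values):
    """Return (Q0, Q1, Q2, Q3, Q4) using Tukey's hinges for Q1 and Q3."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2            # size of each half; includes the median when n is odd
    bottom = ordered[:half]        # lower half of the data
    top = ordered[n - half:]       # upper half of the data
    return (ordered[0], middle(bottom), middle(ordered), middle(top), ordered[-1])

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
rates = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
print(five_point_summary(ages))
print(five_point_summary(rates))
```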

20
4.6 Boxplots
  • Draw box from Q1 to Q3
  • Draw line for median
  • Calculate fences:
    Fence(Lower) = Q1 − 1.5(IQR)
    Fence(Upper) = Q3 + 1.5(IQR)
  • Do not draw fences
  • Any values outside the fences (outside values)
    are plotted separately
  • Determine most extreme values still inside the
    fences (inside values)
  • Draw whiskers from quartiles to inside values

21
Example 1: Boxplot
Data: 05 11 21 24 27 28 30 42 50 52
  • 5-point summary: 5, 21, 27.5, 42, 52. Box from 21
    to 42 with line at 27.5
  • IQR = 42 − 21 = 21
    FU = Q3 + 1.5(IQR) = 42 + (1.5)(21) = 73.5
    FL = Q1 − 1.5(IQR) = 21 − (1.5)(21) = −10.5
  • No values above upper fence; no values below
    lower fence
  • Upper inside value = 52; lower inside value = 5.
    Draw whiskers

22
Example 2: Boxplot
Data: 3 21 22 24 25 26 28 29 31 51
  • 5-point summary: 3, 22, 25.5, 29, 51; hinges at
    22 and 29
  • IQR = 29 − 22 = 7
    FU = Q3 + 1.5(IQR) = 29 + (1.5)(7) = 39.5
    FL = Q1 − 1.5(IQR) = 22 − (1.5)(7) = 11.5
  • One upper outside value (51); one lower outside
    value (3)
  • Upper inside value is 31; lower inside value is
    21. Draw whiskers

23
Example 3: Boxplot
Seven metabolic rates (cal/day): 1362 1439
1460 1614 1666 1792 1867
  • 5-point summary: 1362, 1449.5, 1614, 1729, 1867
  • IQR = 1729 − 1449.5 = 279.5
  • FU = Q3 + 1.5(IQR) = 1729 + 1.5(279.5) = 2148.25
  • FL = Q1 − 1.5(IQR) = 1449.5 − 1.5(279.5) = 1030.25
  • None outside
  • Inside values: 1867 and 1362
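
The fence, outside-value, and whisker steps used in the three boxplot examples can be sketched in Python as follows. The quartiles are passed in as given on the slides, and the name `boxplot_parts` and the returned dictionary layout are illustrative assumptions.

```python
def boxplot_parts(values, q1, q3):
    """Compute fences, outside values, and whisker ends for a boxplot."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Outside values fall beyond a fence and are plotted separately.
    outside = sorted(v for v in values if v < lower_fence or v > upper_fence)
    # Inside values are the most extreme points still within the fences.
    within = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "fences": (lower_fence, upper_fence),
        "outside": outside,
        "whiskers": (min(within), max(within)),
    }

example1 = boxplot_parts([5, 11, 21, 24, 27, 28, 30, 42, 50, 52], q1=21, q3=42)
example2 = boxplot_parts([3, 21, 22, 24, 25, 26, 28, 29, 31, 51], q1=22, q3=29)
example3 = boxplot_parts([1362, 1439, 1460, 1614, 1666, 1792, 1867], q1=1449.5, q3=1729)
for result in (example1, example2, example3):
    print(result)
```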

24
Boxplots: Interpretation
  • Central location: position of median and box
    (IQR)
  • Spread: hinge-spread (IQR), whisker spread, range
  • Shape: symmetry of median within box and box
    within whiskers, tail length (kurtosis), outside
    values

25
Spread: Standard Deviation
  • The standard deviation is the most popular
    measure of spread
  • σ = population standard deviation
  • s = sample standard deviation
  • Based on deviations around the mean

26
Deviations
Deviation = distance from the mean
This example shows a deviation of −3 for the data
point 33. It shows a deviation of +4 for data point
40.
27
Sum of squares
28
Sum of Squares (SS), Variance (s²), Standard
Deviation (s)
29
Variance & Standard Deviation
Sample variance: s² = SS / (n − 1), where SS = Σ(xi − x̄)²
Standard deviation: s = √(s²)
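
A worked sketch of these quantities in Python, applied to the age data from the notation slide. The slide formulas did not survive in this transcript, so the usual definitions are assumed here: SS is the sum of squared deviations from the mean, and the sample variance divides SS by n − 1.

```python
from math import sqrt

ages = [21, 42, 5, 11, 30, 50, 28, 27, 24, 52]

n = len(ages)
x_bar = sum(ages) / n                      # sample mean
deviations = [x - x_bar for x in ages]     # distance of each value from the mean
ss = sum(d ** 2 for d in deviations)       # sum of squares (SS)
variance = ss / (n - 1)                    # sample variance (assumed n - 1 divisor)
sd = sqrt(variance)                        # sample standard deviation

print(ss, round(variance, 2), round(sd, 2))
```

The deviations themselves always sum to zero, which is why they are squared before being added.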
30
Interpretation of Sample Standard Deviation s
  • Measure of spread
  • Estimator of population standard deviation σ
  • 68-95-99.7 rule (Normal distributions)
  • Chebychev's rule (all distributions)

31
68-95-99.7 Rule
  • Applies to Normal distributions only!
  • 68% of values within µ ± σ
  • 95% within µ ± 2σ
  • 99.7% within µ ± 3σ

Example: Normal distribution with µ = 30 and σ = 10
  • 68% of values in 30 ± 10 = 20 to 40
  • 95% in 30 ± (2)(10) = 10 to 50
  • 99.7% in 30 ± (3)(10) = 0 to 60
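
The rule can be checked numerically with `NormalDist` from Python's standard `statistics` module, using the example's mean of 30 and standard deviation of 10. This is a sketch for illustration, and the variable names are assumptions.

```python
from statistics import NormalDist

mu, sigma = 30, 10                          # values from the example
dist = NormalDist(mu=mu, sigma=sigma)

shares = {}
for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    shares[k] = dist.cdf(high) - dist.cdf(low)   # proportion within k SDs of the mean
    print(f"{low} to {high}: {shares[k]:.1%}")
```

The computed proportions agree with 68, 95, and 99.7 percent once rounded, which is where the rule gets its name.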
32
Chebychev's Rule
  • Applies to all distributions
  • At least 75% of the values within µ ± 2σ
  • Example: Distribution with µ = 30 and σ = 10 has
    at least 75% of the values within 30 ± (2)(10)
    = 30 ± 20 = 10 to 50
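
A sketch of the rule in Python. The slide states only the two-standard-deviation case. The general bound used here, at least 1 − 1/k² of values within k standard deviations for k > 1, is the standard form of Chebyshev's inequality and is not taken from the slides. The function name is illustrative.

```python
def chebyshev_interval(mu, sigma, k):
    """Return (low, high, minimum share of values) for mu ± k·sigma."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    min_share = 1 - 1 / k ** 2          # at least this proportion lies inside
    return (mu - k * sigma, mu + k * sigma, min_share)

print(chebyshev_interval(30, 10, 2))    # the slide's example
```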

33
Rounding
  • There is no single rule for rounding.
  • The number of significant digits should reflect
    the precision of the measurement
  • Use judgment and be kind to your reader
  • Rough guide: carry at least four significant
    digits during calculations; round as final step

34
Choosing Summary Statistics
  • Always report a measure of central location, a
    measure of spread, and the sample size
  • Symmetrical distributions → report the mean and
    standard deviation
  • Asymmetrical distributions → report the 5-point
    summaries (or median and IQR)

35
Software and Calculators
Use 'em