Describing distributions with numbers

Provided by: haiyi
Learn more at: http://www.cs.gsu.edu


1
Chapter 2
Describing distributions with numbers
2
Chapter Outline
  • 1. Measuring center: the mean
  • 2. Measuring center: the median
  • 3. Comparing the mean and the median
  • 4. Measuring spread: the quartiles
  • 5. The five-number summary and boxplots
  • 6. Measuring spread: the standard deviation
  • 7. Choosing measures of center and spread

3
Measuring center: the mean
  • Notation: $\bar{x}$
  • It is simply the ordinary arithmetic average.
  • Suppose that we have n observations (data size,
    number of individuals).
  • Observations are denoted as $x_1, x_2, x_3, \ldots, x_n$.

4
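Measuring center: the mean
  • How to get $\bar{x}$? Add the observations and
    divide by n:
    $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  • Example 2.1 (P.33)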
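
A minimal sketch of the mean as an ordinary arithmetic average. The function name `mean` and the data values are illustrative assumptions; they are not the data from Example 2.1.

```python
def mean(observations):
    """Ordinary arithmetic average: the sum of the observations divided by n."""
    n = len(observations)
    if n == 0:
        raise ValueError("mean requires at least one observation")
    return sum(observations) / n


# Hypothetical observations (not taken from the textbook example)
observations = [10, 30, 5, 25, 40, 20, 10]
x_bar = mean(observations)
print(x_bar)  # 140 / 7 = 20.0
```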

5
Measuring center: the median
  • Notation: M
  • The median M is the midpoint of a distribution: half
    the observations are smaller than M and the
    other half are larger than M.

6
Measuring center: the median
  • How to find M?
  • 1. Sort all observations in increasing order.
    (This step is important!)
  • 2. If n is odd, the observation at position
    (n+1)/2 in the ordered list is M. If n is even,
    the average of the two center values is M.
  • Note that (n+1)/2 is the location of the median
    in the ordered list, not the median value.

7
Measuring center: the median
  • Examples
  • Case 1. 11, 21, 13, 24, 15, 26, 17
  • Case 2. 11, 21, 13, 24, 15, 26
  • Example 2.2, 2.3 (P.35)
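
The two cases above can be worked through with a short sketch of the sort-then-locate rule. The function name `median` is an illustrative choice; the data are Case 1 and Case 2 from this slide.

```python
def median(observations):
    """Median by the sort-then-locate rule described in the slides."""
    ordered = sorted(observations)  # step 1: sort in increasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the observation at 1-based position (n+1)/2
        return ordered[mid]
    # n even: the average of the two center values
    return (ordered[mid - 1] + ordered[mid]) / 2


case_1 = [11, 21, 13, 24, 15, 26, 17]  # n = 7, sorted: 11 13 15 17 21 24 26
case_2 = [11, 21, 13, 24, 15, 26]      # n = 6, sorted: 11 13 15 21 24 26

print(median(case_1))  # 17
print(median(case_2))  # (15 + 21) / 2 = 18.0
```

Skipping the sort in Case 1 would wrongly pick 24, the middle value of the unsorted list.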

8
Mean vs. Median
  • The median is more resistant than the mean: a few
    extreme observations barely change it.
  • The mean and median of a symmetric distribution
    are close together. If the distribution is
    exactly symmetric, the mean and median are
    exactly the same. In a skewed distribution, the
    mean is farther out in the long tail than is the
    median.
  • Example
  • 1, 2, 3, 4, 5, 6, 10000
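
A quick check of the example above, using Python's standard `statistics` module. The variable names are illustrative.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 10000]
without_outlier = data[:-1]  # the same data with 10000 removed

mean_all = statistics.mean(data)                  # 10021 / 7, about 1431.57
median_all = statistics.median(data)              # 4
mean_without = statistics.mean(without_outlier)   # 3.5
median_without = statistics.median(without_outlier)  # 3.5

print(mean_all, median_all)
print(mean_without, median_without)
```

The single extreme value moves the mean from 3.5 to about 1431.57, while the median only moves from 3.5 to 4.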

9
Inference
  • Strongly skewed distributions are usually reported
    with the median rather than the mean.

10
Measuring spread: the quartiles
  • The quartiles mark out the middle half of the
    distribution.

11
Calculating the Quartiles
  • Step 1. Arrange the observations in increasing
    order and locate the median M in the ordered list
    of observations.
  • Step 2. The first quartile Q1 is the median of the
    observations whose position in the ordered list
    is to the left of the location of the overall
    median.
  • Step 3. The third quartile Q3 is the median of the
    observations whose position in the ordered list
    is to the right of the location of the overall
    median.
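
The three steps can be sketched as follows. The function names are illustrative, and the data reuse the earlier median cases. Note that statistical software often uses other quartile conventions, so library results may differ slightly from this rule.

```python
def median(ordered):
    """Median of a list that is already sorted in increasing order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(observations):
    """Return (Q1, M, Q3) using the median-of-halves rule from the slides."""
    ordered = sorted(observations)  # step 1: sort, then locate M
    n = len(ordered)
    if n < 2:
        raise ValueError("quartiles require at least two observations")
    half = n // 2
    lower = ordered[:half]       # positions to the left of the median location
    upper = ordered[n - half:]   # positions to the right of the median location
    # When n is odd, the median itself belongs to neither half.
    return median(lower), median(ordered), median(upper)


print(quartiles([11, 21, 13, 24, 15, 26, 17]))  # (13, 17, 24)
print(quartiles([11, 21, 13, 24, 15, 26]))      # (13, 18.0, 24)
```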

12
Measuring spread: the quartiles
  • Example 2.4 (P. 37)
  • Example 2.5 (P. 38)
  • Note
  • (1) It is important to sort the data before
    trying to find the quartiles!
  • (2) Quartiles are resistant.

13
The five-number summary and boxplots
  • The five-number summary:
    Minimum, Q1, M, Q3, Maximum.
  • A boxplot is a graph of the five-number
    summary.
  • Boxplots are most useful for side-by-side
    comparison of several distributions.

14
Boxplot
  • 1. A boxplot is a graph of the five-number
    summary.
  • 2. A central box spans the quartiles Q1 and Q3.
  • 3. A line in the box marks the median.
  • 4. Lines extend from the box out to the minimum
    and maximum.
  • 5. Range = maximum - minimum
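
A small sketch of the numbers a boxplot displays, assuming the median-of-halves quartile rule from the earlier slides. The function name `five_number_summary` is hypothetical, and the data are Case 1 from the median slide.

```python
import statistics


def five_number_summary(observations):
    """Return (minimum, Q1, M, Q3, maximum) for a list of observations."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]      # left of the median location
    upper = ordered[n - half:]  # right of the median location
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )


summary = five_number_summary([11, 21, 13, 24, 15, 26, 17])
data_range = summary[4] - summary[0]  # range = maximum - minimum

print(summary)     # (11, 13, 17, 24, 26)
print(data_range)  # 15
```

In a boxplot of these data, the box would span 13 to 24, the line inside it would sit at 17, and the lines would extend out to 11 and 26.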

15
The five-number summary and boxplot
  • Figure 2.2 (P.39): side-by-side boxplots comparing
    the distributions of earnings for two levels of
    education.

16
The five-number summary and boxplots
17
Inference
  • A boxplot also gives an indication of the symmetry
    or skewness of a distribution.
  • In a symmetric distribution, Q1 and Q3 are equally
    distant from the median. In a right-skewed
    distribution, the third quartile is farther above
    the median than the first quartile is below it.

18
Measuring spread: the standard deviation
  • It says how far the observations are from their
    mean.
  • The variance $s^2$ of a set of observations is an
    average of the squares of the deviations of the
    observations from their mean:
    $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
  • Notation: $s^2$ for variance and $s$ for standard
    deviation, where $s = \sqrt{s^2}$.

19
Why (n-1) ?
  • The sum of the deviations $x_i - \bar{x}$ always
    equals 0, so knowing (n-1) of them determines the
    last one.
  • Only (n-1) of the squared deviations can vary
    freely, so we average by dividing the total by
    (n-1).
  • The number (n-1) is called the degrees of
    freedom of the variance or standard deviation.
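
The fact that the deviations always sum to 0 follows directly from the definition of the mean. A short derivation, writing $\bar{x}$ for the mean of the observations $x_1, \ldots, x_n$ and using $\sum x_i = n\bar{x}$:

```latex
\[
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
\]
```

So once any (n-1) deviations are known, the remaining one must be the negative of their sum.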

20
Measuring spread: the standard deviation
  • To find the variance and the standard deviation:
  • 1. Find the mean of the data set.
  • 2. Subtract the mean from each observation (we
    call the result a deviation).
  • 3. Square each deviation.
  • 4. Sum all the squares.
  • 5. Divide the sum of squares by n-1, where n is
    the number of observations. This gives the
    variance.
  • 6. The standard deviation is the positive square
    root of the variance.
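
The six steps translate almost line for line into code. This is a sketch with an illustrative function name and hypothetical data, not the data from Example 2.6.

```python
import math


def variance_and_sd(observations):
    """Sample variance and standard deviation, following the six steps."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations to divide by n - 1")
    x_bar = sum(observations) / n                    # 1. find the mean
    deviations = [x - x_bar for x in observations]   # 2. subtract the mean
    squares = [d ** 2 for d in deviations]           # 3. square each deviation
    sum_of_squares = sum(squares)                    # 4. sum the squares
    variance = sum_of_squares / (n - 1)              # 5. divide by n - 1
    sd = math.sqrt(variance)                         # 6. positive square root
    return variance, sd


# Hypothetical data: mean 3, squared deviations 4, 1, 0, 1, 4 (sum 10)
variance, sd = variance_and_sd([1, 2, 3, 4, 5])
print(variance)  # 10 / 4 = 2.5
print(sd)        # square root of 2.5, about 1.58
```

Python's standard `statistics.variance` and `statistics.stdev` also divide by n-1, so they can serve as a cross-check.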

21
Measuring spread: the standard deviation
  • Example 2.6 (P.42)

22
Properties of s2 and s
  • s measures spread about the mean and should be
    used only when the mean is chosen as the measure
    of center.
  • $s \geq 0$, and $s = 0$ only when all observations
    have the same value.
  • s is not resistant: like the mean, it is strongly
    affected by a few extreme observations.
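
Two of these properties can be checked with a brief sketch using the standard `statistics` module. The data values are hypothetical, apart from reusing the earlier 1, 2, 3, 4, 5, 6, 10000 example.

```python
import statistics

# s = 0 only when every observation has the same value
constant = [5, 5, 5, 5]
s_constant = statistics.stdev(constant)

# s is not resistant: one extreme observation inflates it
base = [1, 2, 3, 4, 5, 6]
with_outlier = base + [10000]
s_base = statistics.stdev(base)                  # square root of 3.5
s_with_outlier = statistics.stdev(with_outlier)  # in the thousands

print(s_constant)
print(s_base, s_with_outlier)
```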

23
Choosing measures of center and spread
  • For a skewed distribution or a distribution
    with extreme outliers, the five-number summary is
    the better description.
  • For a reasonably symmetric distribution without
    outliers, the mean and standard deviation are
    better.