Describing Distributions with Numbers

Slides: 49
Provided by: stevegi1
Transcript and Presenter's Notes


1
Chapter 12
  • Describing Distributions with Numbers

2
Today: Math Cookies
  • Pick up one cookie and a handout
  • DO NOT EAT IT (yet). You may eat it later once we
    have collected cookie data.

No clickers today
3
Counting rule
  • Anything brown counts for a chip.
  • Carefully count front and back surfaces.

4
Review
  • Categorical variables: pie and bar graphs
  • Quantitative variables: stemplots and histograms
  • Good and bad graphs
  • Shape: symmetric, skewed

5
Quick math overview
  • Σ = sum
  • These expressions are algebraically equivalent

6
Examples
7
Turning Data Into Information
  • Center of the data
  • mean
  • median
  • mode
  • Spread of the data (variability)
  • variance
  • standard deviation
  • range
  • interquartile range

8
Centers of Data
  • Average - a single data value that represents all
    of the data
  • mean (arithmetic average)
  • median
  • mode

9
Mean (x̄)
  • Traditional measure of center
  • Sum the values and divide by the number of values
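In symbols, the rule above can be sketched as follows. The notation is an assumption, since the slide's own formula did not survive in the transcript: x_1, ..., x_n are the n data values and x̄ ("x-bar") is the sample mean.

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
```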

10
Sample mean
  • Grades: 68, 79, 60, 72, 77, 76, 69, 70, 60, 79.
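The slide's worked calculation is not in the transcript, so here is a minimal sketch of it in Python. The variable names are illustrative choices, not taken from the slides.

```python
# Grades from the slide above.
grades = [68, 79, 60, 72, 77, 76, 69, 70, 60, 79]

# Sum the values and divide by the number of values.
total = sum(grades)
n = len(grades)
sample_mean = total / n

print(f"sum = {total}, n = {n}, mean = {sample_mean}")
```

The sum is 710 over 10 grades, so the sample mean is 71.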

11
Median (M)
  • A resistant measure of the data's center
  • Median - the center value of the ordered (ranked)
    data
  • If n is odd, the median is the middle ordered
    value
  • If n is even, the median is the average of the
    two middle ordered values
  • Median: at the (1/2)(n + 1)th position in the ordered set

12
Median
  • Example 1 data: 2 4 6
  • Median (M) = 4
  • Example 2 data: 2 4 6 8
  • Median = 5 (avg. of 4 and 6)
  • Example 3 data: 6 2 4
  • Median ≠ 2
  • (order the values: 2 4 6, so Median = 4)
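The three examples can be checked with a short function. This is a sketch of the odd/even rule, with a hypothetical helper name `median`; it sorts first, which is exactly the step Example 3 warns about.

```python
def median(values):
    """Return the median: middle ordered value, or average of the two middle ones."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)          # rank the data first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the middle ordered value
    # even n: average of the two middle ordered values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 4, 6]))      # Example 1
print(median([2, 4, 6, 8]))   # Example 2
print(median([6, 2, 4]))      # Example 3: sorted before taking the middle
```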

13
Sample median
  • Grades: 68, 79, 60, 72, 77, 76, 69, 70, 60, 79.
  • Rank the data
  • 60, 60, 68, 69, 70, 72, 76, 77, 79, 79
  • Find the position
  • (1/2)(n + 1) = 11/2 = 5 1/2th position
  • Locate the median
  • M = (70 + 72)/2 = 71

14
  • Example
  • minutes waiting for the PRT (n = 8)
  • x: 5, 11, 9, 15, 33, 3, 7, 12

Median: RANK DATA FIRST! 3, 5, 7, 9, 11, 12, 15, 33
Median is at the 1/2(n + 1)th position: (8 + 1)/2 = 4 1/2.
The 4 1/2th position is half-way between 9 and 11: (9 + 11)/2 = 10.
Median = 10
15
Comparing the Mean & Median
  • The mean and median of data from a symmetric
    distribution should be close together. The
    actual (true) mean and median of a symmetric
    distribution are exactly the same.
  • In a skewed distribution, the mean is farther out
    in the long tail than is the median; the mean is
    pulled in the direction of the possible
    outlier(s).
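The PRT waiting times from the earlier example show this pull. The sketch below uses Python's standard `statistics` module. Dropping the 33-minute wait is done purely to illustrate the effect and is not a recommendation to discard the value.

```python
from statistics import mean, median

# PRT waiting times (minutes) from the earlier example.
waits = [5, 11, 9, 15, 33, 3, 7, 12]
print(mean(waits), median(waits))

# Remove the long 33-minute wait, only to see how far each measure moves.
without_long_wait = [w for w in waits if w != 33]
print(mean(without_long_wait), median(without_long_wait))
```

With all eight values the mean (11.875) sits above the median (10), in the direction of the long wait. Without it, the mean drops by about 3 minutes while the median moves by only 1.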

16
Mean vs. Median
  • Which should we use?
  • Symmetric or approximately symmetric: use the mean
  • Significantly skewed: use the median
  • The mean is affected by outliers (extreme values)

17
(No Transcript)
18
Outliers?
  • If it is a mistake and is documented, we can
    eliminate it
  • If it is not a mistake, do not eliminate it
  • A statistic is robust if it is not led too far
    astray by a few outliers. Means (and standard
    deviations) are not robust.

19
Mode
  • Observed value that occurs with the greatest
    frequency
  • Note: if there is no mode, write "none", not 0
  • If there are two modes, the data are bimodal

20
Sample mode
  • Grades: 60, 60, 68, 69, 70, 72, 76, 77, 79, 79
  • There are two modes (60 and 79), so these data are bimodal
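A small sketch of finding the mode(s) by counting. The helper name `modes` is hypothetical, and returning an empty list when no value repeats is one way to encode the "write none" convention.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                     # no value repeats: report "none"
    return sorted(v for v, c in counts.items() if c == top)

grades = [60, 60, 68, 69, 70, 72, 76, 77, 79, 79]
print(modes(grades))
```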

21
Measures of Dispersion
  • spread - A general term referring to how spread
    out or variable a set of numbers is.
  • Very large spread: 0, 100, 9999, 100000
  • No spread: 12, 12, 12, 12, 12

22
Spread or Variability
  • If all values are the same, then they all equal
    the mean. There is no spread.
  • Variability exists when some values are different
    from (above or below) the mean.
  • We will discuss the following measures of spread:
    range, interquartile range, variance, standard
    deviation.

23
Range
  • One way to measure spread is to give the smallest
    (minimum) and largest (maximum) values in the
    data set
  • Range = max - min
  • (the values range from min to max)
  • The range is strongly affected by outliers, and
    is rarely used

24
Sample range
  • highest data value - lowest data value
  • Grades: 60, 60, 68, 69, 70, 72, 76, 77, 79, 79
  • Range = max - min = 79 - 60 = 19
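The same calculation as a short sketch. The extra grade of 5 in the second call is a hypothetical value, added only to show how strongly a single extreme value moves the range.

```python
def data_range(values):
    """Range = highest data value minus lowest data value."""
    return max(values) - min(values)

grades = [60, 60, 68, 69, 70, 72, 76, 77, 79, 79]
print(data_range(grades))

# Hypothetical extra grade of 5: one extreme value changes the range completely.
print(data_range(grades + [5]))
```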

25
Quartiles
  • Three numbers that divide the ordered data into
    four equal-sized groups.
  • Q1 has 25% of the data below it.
  • Q2 has 50% of the data below it. (Median)
  • Q3 has 75% of the data below it.

26
Quartiles: Uniform Histogram
27
Obtaining the Quartiles
  • Order the data.
  • For Q2, just find the median.
  • For Q1, look at the lower half of the data
    values, those to the left of the median; find
    the median of this lower half.
  • For Q3, look at the upper half of the data
    values, those to the right of the median; find
    the median of this upper half.
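The steps above can be sketched as follows, using the PRT waiting times from the earlier median example. The function names are hypothetical. Statistical software often uses other quartile conventions, so library results may differ slightly from this median-of-halves rule.

```python
def median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule described above."""
    ordered = sorted(values)           # order the data
    n = len(ordered)
    lower = ordered[: n // 2]          # values to the left of the median
    upper = ordered[(n + 1) // 2 :]    # values to the right of the median
    return median(lower), median(ordered), median(upper)

waits = [5, 11, 9, 15, 33, 3, 7, 12]   # PRT waiting times (minutes)
q1, q2, q3 = quartiles(waits)
print(q1, q2, q3)
```

For an odd number of values, the slicing leaves the median itself out of both halves, matching "to the left of" and "to the right of" the median.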

28
Interquartile Range (IQR)
  • Used to measure dispersion (spread) with the
    median
  • Sample IQR = Q3 - Q1

29
  • minutes waiting for the PRT (n = 8)
  • 3, 5, 7, 9, 11, 12, 15, 33
  • Recall: the median is half-way between 9 and 11
  • M = 10
  • Q1 position is half-way between 5 and 7
  • Q1 = 6
  • Q3 is half-way between 12 and 15
  • Q3 = 13 1/2
  • IQR = Q3 - Q1 = 13.5 - 6 = 7.5

30
The five-number summary & boxplots
(Figure: layout of M, Q1, Q3, Min, and Max)
  • Five-number summary
  • Min
  • Q1
  • M
  • Q3
  • Max
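As an illustration, the sketch below computes the five numbers for the ten grades used in the earlier slides. The helper names are hypothetical, and the quartiles are taken as medians of the lower and upper halves of the ordered data.

```python
def _median(ordered):
    """Median of an already-sorted, non-empty list."""
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (Min, Q1, M, Q3, Max), with quartiles as medians of each half."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = _median(ordered[: n // 2])          # median of the lower half
    q3 = _median(ordered[(n + 1) // 2 :])    # median of the upper half
    return ordered[0], q1, _median(ordered), q3, ordered[-1]

grades = [68, 79, 60, 72, 77, 76, 69, 70, 60, 79]
print(five_number_summary(grades))
```

For these grades the summary is Min = 60, Q1 = 68, M = 71, Q3 = 77, Max = 79.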

31
Boxplot (from Five-Number Summary)
  • Central box spans Q1 and Q3.
  • A line in the box marks the median M.
  • Lines extend from the box out to the minimum and
    maximum.

32
(Figure: boxplot with the values 10, 6, 13.5, and 3 labeled)
33
PRT example: five-number summary and boxplot
(Figure: boxplot drawn on a number line running from 0 to 33)
33
OUTLIER BOX PLOTS: What's the fastest you have
ever driven a car? ____ mph.
                          M    Q1   Q3   Min  Max
Males (87 students)       110  95   120  55   150
Females (102 students)    89   80   95   30   130
  • Outliers: values more than 1.5 × IQR below Q1 or
    above Q3
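A minimal sketch of the 1.5 × IQR rule, applied to the PRT waiting times with the quartiles found earlier (Q1 = 6, Q3 = 13.5). The function names and the term "fences" are illustrative, not taken from the slides.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) cutoffs: 1.5 * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(values, q1, q3):
    """Values lying below the lower cutoff or above the upper cutoff."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# PRT waiting times (minutes) with Q1 = 6 and Q3 = 13.5 from the earlier slide.
waits = [3, 5, 7, 9, 11, 12, 15, 33]
print(outlier_fences(6, 13.5))
print(outliers(waits, 6, 13.5))
```

With IQR = 7.5 the cutoffs are -5.25 and 24.75, so the 33-minute wait is the only value flagged.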

34
Standard deviation?
The length of human pregnancies has a mean, or
average, of 266 days for the entire population of
women. It also has a standard deviation of 16
days. What do you think is meant by the term
standard deviation?
35
Variance and Standard Deviation
  • When variability exists, each data value has an
    associated deviation from the mean
  • What is a typical deviation from the mean?
    (standard deviation)
  • Small values of this typical deviation indicate
    small spread in the data
  • Large values of this typical deviation indicate
    large spread in the data

36
Variance
  • Find the mean
  • Find the deviation of each value from the mean
  • Square the deviations
  • Sum the squared deviations
  • Divide the sum by n-1
  • (gives typical squared deviation from mean)
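The five steps can be followed line by line in a short sketch, here using the ten grades from the sample-mean slide. Variable names are illustrative.

```python
grades = [68, 79, 60, 72, 77, 76, 69, 70, 60, 79]
n = len(grades)

mean = sum(grades) / n                       # 1. find the mean
deviations = [x - mean for x in grades]      # 2. deviation of each value from the mean
squared = [d ** 2 for d in deviations]       # 3. square the deviations
sum_of_squares = sum(squared)                # 4. sum the squared deviations
variance = sum_of_squares / (n - 1)          # 5. divide the sum by n - 1

print(mean, sum_of_squares, round(variance, 2))
```

The squared deviations sum to 446, so the sample variance is 446/9, about 49.56.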

37
Variance Formula
38
Standard Deviation Formula: typical deviation from the mean
standard deviation = square root of the variance
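The formula images did not survive in this transcript. The expressions below are a reconstruction consistent with the steps on the Variance slide (divide by n - 1) and with the square-root relation stated here. The symbols s², s, x_i, and x̄ are the conventional ones and are assumed rather than copied from the slides.

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
```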
39
  • Let's say I have two classes, class A and class
    B, and I want to know how my students are doing.
    I take a random sample of 10 grades from each
    class. Suppose the average in both classes turns
    out to be 71. We might infer that class A and
    class B are very similar, right?

40
  • Class A: 68, 79, 60, 72, 77, 76, 69, 70, 60, 79.
  • Class B: 99, 99, 98, 96, 97, 97, 30, 35, 20, 39.
  • Something is going terribly wrong in class B!
    Some students are doing exceptionally well and
    some are failing. Using only the sample mean, I
    would think the classes are performing about the
    same.

41
Calculating s
  • Class A: 68, 79, 60, 72, 77, 76, 69, 70, 60, 79.

42
(No Transcript)
43
(No Transcript)
44
Sample SD for class A
Sample std. dev. is approx. 7 (s ≈ 7)
45
  • Class B: 99, 99, 98, 96, 97, 97, 30, 35, 20, 39
  • s ≈ 35
  • (5 times as much as class A!)
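The comparison can be reproduced with Python's standard `statistics` module, whose `stdev` function computes the sample standard deviation (dividing by n - 1). This is a sketch for checking the slide's figures, not the presenter's own calculation.

```python
from statistics import mean, stdev

class_a = [68, 79, 60, 72, 77, 76, 69, 70, 60, 79]
class_b = [99, 99, 98, 96, 97, 97, 30, 35, 20, 39]

for name, grades in (("A", class_a), ("B", class_b)):
    # stdev() is the sample standard deviation (divides by n - 1)
    print(f"class {name}: mean = {mean(grades)}, s = {stdev(grades):.1f}")

ratio = stdev(class_b) / stdev(class_a)
print(f"ratio of standard deviations = {ratio:.1f}")
```

Both classes have a mean of 71, but the standard deviations differ by roughly a factor of five, which is what the mean alone hides.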

46
Choosing a Summary
  • Outliers affect the values of the mean and
    standard deviation.
  • The five-number summary should be used to
    describe center and spread for skewed
    distributions, or when outliers are present.
  • Use the mean and standard deviation for
    reasonably symmetric distributions that are free
    of outliers.

47
Distribution of calories in popular candy bars
48
Today's concepts
  • Numerical Summaries
  • Center (mean, median)
  • Spread (variance, std. dev., range, IQR)
  • Five-number summary & boxplots
  • Choosing mean versus median
  • Choosing standard deviation versus five-number
    summary