Summarizing Data Numerically - PowerPoint PPT Presentation

Provided by: Michael1748

Transcript and Presenter's Notes



1
Chapter 5
  • Summarizing Data Numerically

2
Assignment 8
  • Read Chapter 5 pages 299-313
  • LDI 5.1-5.7
  • EX 5.1-5.9

3
Chapter 4 Quiz
  • Take practice quizzes (optional)
  • Take chapter 4 reading quiz (BB)
  • Review homework

4
Wendall Zurkowitz, slave to the waffle light.
5
Will every waffle take the same amount of time to
cook? Wendall would like to know two things: what
is the average cooking time, and how much
variability is there in the cooking time? We
cover the average in this section and variability
in the next.
6
How to Describe Data
  • What is the Shape?
  • What is the Center?
  • What is the Spread in the Data?
  • Are there any Outliers?

7
Measurement of Center
  • If we take a sample of n values and calculate
    what we have come to know as the average, we
    have calculated the arithmetic mean of the data.
  • This measure of center is a statistic since it
    comes from a sample.

8
The Sample Mean
  • The sample mean is a statistic. Its purpose is
    to estimate a parameter, the population mean.
  • The sample mean is denoted by $\bar{x}$:
    $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

9
The Population Mean
  • The population mean is a parameter. It is
    denoted by $\mu$: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$,
    where N is the population size.

10
Example
  • Let's find the sample mean of the AGE data.
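The AGE data are not reproduced in this transcript. The sketch below therefore computes the sample mean of a short list of illustrative ages, which are hypothetical and not the class data. It applies $\bar{x} = \frac{1}{n}\sum x_i$ directly and checks the result against Python's `statistics.mean`.

```python
from statistics import mean

def sample_mean(values):
    """Add up the observations and divide by the sample size n."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)

# Hypothetical ages for illustration; substitute the AGE data from class.
ages = [19, 20, 20, 21, 22, 25, 30]
x_bar = sample_mean(ages)
assert abs(x_bar - mean(ages)) < 1e-12   # agrees with the standard library
print(f"sample mean = {x_bar:.2f}")      # 22.43
```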

11
Is the mean always the center?
  • Suppose that a sample of 100 is obtained from a
    population
  • Can the mean be larger than the maximum value or
    smaller than the minimum value?
  • Can the mean be the same as the max or min value?
  • Can the mean be the exact middle point of the
    distribution?
  • Can the mean not be equal to any of the data
    collected?

13
  • Let's Do It! 5.2 Combining Means
  • We have seven students. The mean score for three
    of these students is 54 and the mean score for
    the four other students is 76.
  • What is the mean score for all seven students?
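Each group contributes (group size) × (group mean) to the overall total. The combined mean is therefore a weighted average of the group means, not their plain average. The sketch below is a minimal illustration using the numbers from the exercise. The helper name `combined_mean` is hypothetical.

```python
def combined_mean(groups):
    """Combine (group_size, group_mean) pairs into one overall mean.

    Each group adds size * mean to the grand total, and the grand total
    is then divided by the total number of observations.
    """
    total = sum(size * m for size, m in groups)
    n = sum(size for size, _ in groups)
    return total / n

# Three students averaging 54 and four students averaging 76.
overall = combined_mean([(3, 54), (4, 76)])
print(f"overall mean = {overall:.2f}")  # (162 + 304) / 7 -> 66.57
```

The naive average of the two means, (54 + 76)/2 = 65, is too low because it gives the three-student group the same weight as the four-student group.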

15
The Median!
  • The median of a set of n observations, ordered from
    smallest to largest, is a value such that at
    least half of the observations are less than or
    equal to that value and at least half of the
    observations are greater than or equal to that
    value.

16
Find the Median of the AGE data
  • Use your TI calculator and 1-Var Stats.

17
  • Let's Do It! 5.3 Median Number of Children per
    Household
  • Find the median number of children in a
    household from this sample of 10 households;
    that is, find the median of:
      Observation number:  1  2  3  4  5  6  7  8  9  10
      Number of children:  2  3  0  1  4  0  3  0  1  2
  • (a) Order the observations from smallest to
    largest.
  • (b) Calculate (n + 1)/2. _________________
  • (c) Median ______________
  • (d) What happens to the median if the fifth
    observation in the first list was incorrectly
    recorded as 40 instead of 4?
  • (e) What happens to the median if the third
    observation in the first list was incorrectly
    recorded as -20 instead of 0?
  • Note: The median is resistant; that is, it does
    not change, or changes very little, in response
    to extreme observations.
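The steps in this exercise can be sketched in Python using the children data from the slide. The function orders the data and locates position (n + 1)/2. When n is even, that position falls halfway between two values, so those two are averaged. The sketch then repeats the calculation with the two recording errors from parts (d) and (e). The helper name `median_by_hand` is hypothetical, and `statistics.median` is printed alongside it as a cross-check.

```python
from statistics import median

children = [2, 3, 0, 1, 4, 0, 3, 0, 1, 2]

def median_by_hand(values):
    """Order the data, then take the value at position (n + 1) / 2.

    When n is even, (n + 1) / 2 falls halfway between two positions,
    so the two middle values are averaged.
    """
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2:
        return xs[mid]                 # (n + 1) / 2 is a whole position
    return (xs[mid - 1] + xs[mid]) / 2

print(median_by_hand(children))        # ordered: 0 0 0 1 1 2 2 3 3 4 -> 1.5

typo_high = children.copy()
typo_high[4] = 40                      # (d) fifth value recorded as 40, not 4
typo_low = children.copy()
typo_low[2] = -20                      # (e) third value recorded as -20, not 0

for label, data in [("original", children),
                    ("40 for 4", typo_high),
                    ("-20 for 0", typo_low)]:
    print(label, median_by_hand(data), median(data))
```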

18
The Mode
  • To describe the center of categorical
    (qualitative) data, we must use the mode. The
    mode can also be used with numerical
    (quantitative) data, but there it is usually not
    a good measure of center.
  • The mode of a set of data is the most frequently
    occurring value, the value with the highest
    frequency.

19
Example
  • Find the mode for the following data:
    (a) 1, 2, 3, 2, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6, 7
    (b) 1, 4, 3, 4, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6
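Counting frequencies by hand is easy to check in Python. This sketch uses `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency. A data set with two modes is therefore reported as bimodal instead of being cut down to a single value.

```python
from statistics import multimode

data_a = [1, 2, 3, 2, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6, 7]
data_b = [1, 4, 3, 4, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6]

# multimode lists all values that share the highest frequency.
print(multimode(data_a))  # [1]      (1 occurs five times)
print(multimode(data_b))  # [1, 4]   (1 and 4 each occur five times)
```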

20
  • The mode can be computed for qualitative data.
  • The modal race category is white.

21
Consider the following data: 2, 2, 2, 20, 34, 45,
210. What are the mode, median, and mean?
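Computing all three measures for this small data set shows how they can disagree. The single large value 210 pulls the mean well above the median. The mode sits at the low end of the data because the value 2 repeats. A minimal check with the standard library:

```python
from statistics import mean, median, mode

data = [2, 2, 2, 20, 34, 45, 210]

print("mode  :", mode(data))    # 2   (most frequent value)
print("median:", median(data))  # 20  (4th of 7 ordered values)
print("mean  :", mean(data))    # 45  (315 / 7)
```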
23
  • Let's Do It! 5.5 Attend Graduate School? When do
    undergraduates make the decision to continue
    their education and attend graduate school? An
    undergraduate attending a four-year college with
    a semester system (versus a quarter system) would
    have a total of eight semesters of classes
    (excluding any summer sessions). A sample of 18
    senior undergraduates who would be graduating and
    attending graduate school were asked the
    following question: "In which semester (1, 2, 3,
    4, 5, 6, 7, or 8) did you decide you would
    continue your education and attend graduate
    school?" The responses are given below:

(a) Construct a frequency plot of these data.
(b) Obtain the following sample statistics for
these data:
    Minimum ___________  Maximum ______________
    Median _____________  Mean _____________
(c) How do the two measures of center, the median
and the mean, compare? Select one:
    i. Median > Mean
    ii. Median < Mean
    iii. Median = Mean
25
Homework 9
  • Read pages 314-340
  • LDI 5.8 - 5.12 all, 5.14, 5.15, 5.17
  • EX 5.10 - 5.21

26
Measures of Variation
  • Now that we can measure the center of a
    distribution, we need to know something about the
    spread or variability of the data.
  • As with the center, there are several popular
    ways to measure spread.

27
Why Measure Variation?
  • Consider the following plots.
  • Both have a mean of 60, but do they have the
    same distribution?

28
The Range
  • Our first, crude measure of the variation of a
    data set is the range, which is simply max − min.
  • This measure is very limited in its ability to
    describe the spread in a data set.

29
Example
  • Consider these distributions
  • They have the same range, 30 − 20 = 10, yet
    they have very different variation.
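The point can be illustrated numerically. The two data sets below are hypothetical, made up for this sketch and not taken from the slide's plots. Both run from 20 to 30, so each has range 30 − 20 = 10. In one, most values cluster at the center; in the other, they pile up at the extremes. The sample standard deviation (covered later in this chapter) tells them apart.

```python
from statistics import stdev

# Hypothetical data sets with the same minimum (20) and maximum (30).
clustered = [20, 25, 25, 25, 25, 25, 30]
spread_out = [20, 20, 20, 25, 30, 30, 30]

for name, xs in [("clustered", clustered), ("spread out", spread_out)]:
    data_range = max(xs) - min(xs)          # range = max - min
    print(f"{name}: range = {data_range}, s = {stdev(xs):.2f}")
```

Both lines report a range of 10, but the standard deviations (about 2.89 and 5.00) differ considerably.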

30
Quartiles
  • Recall that the median is the middle number of a
    distribution, so 50% of the data fall below
    this value. We can chop the data into four equal
    pieces by finding the median of the lower 50%
    and of the upper 50%. These values are called
    the quartiles.

31
Find the Quartiles for AGE
  • Q1 is the first quartile: 25% of the data fall
    below this value and 75% above it. It is the
    median of the data that fall below the median.
  • MED is the second quartile: 50% of the data fall
    below this value and 50% above it.
  • Q3 is the third quartile: 75% of the data fall
    below this value and 25% above it. It is the
    median of the data that fall above the median.
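The definition above (Q1 is the median of the data below the median, Q3 the median of the data above it) can be sketched as follows. This sketch assumes that when n is odd, the middle value is left out of both halves, which is the reading these slides suggest. Other conventions, such as the methods offered by `statistics.quantiles` or by spreadsheet software, may give slightly different quartiles. The function name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Q1 is the median of the values below the overall median and Q3 the
    median of the values above it; when n is odd, the middle value is
    left out of both halves. Requires at least two observations.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

children = [2, 3, 0, 1, 4, 0, 3, 0, 1, 2]       # even n = 10
print(quartiles(children))                      # (0, 1.5, 3)
print(quartiles([2, 2, 2, 20, 34, 45, 210]))    # odd n = 7: (2, 20, 45)
```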

32
5-Number Summary and Boxplots
  • The 5-number summary is simply Min, Q1, Med, Q3,
    Max.
  • A boxplot is a plot of these five values. Draw a
    boxplot of the AGE data (page 283).

33
Interquartile Range
  • The interquartile range, or IQR, is simply the
    difference between Q3 and Q1:
  • IQR = Q3 − Q1
  • Find the IQR for the AGE data.

34
Let's Do It
  • LDI 5.8

35
1.5xIQR Rule
  • Any data value that falls more than 1.5 × IQR
    above Q3 or more than 1.5 × IQR below Q1 is
    considered an outlier.
  • Draw a modified boxplot of the AGE data by hand.
  • Draw boxplots on the TI-83.
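The 1.5 × IQR rule can be sketched as a short function. It computes Q1 and Q3 by the median-of-halves method (assuming, as before, that the middle value is excluded when n is odd), builds the two fences, and lists the values that fall strictly outside them. The function name `iqr_outliers` is hypothetical. The example reuses the skewed data set from the mode/median/mean slide.

```python
from statistics import median

def iqr_outliers(data):
    """Apply the 1.5 x IQR rule; return (Q1, Q3, IQR, fences, outliers)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half])                              # median of lower half
    q3 = median(xs[half + 1:] if n % 2 else xs[half:])  # median of upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return q1, q3, iqr, (low_fence, high_fence), outliers

q1, q3, iqr, fences, outliers = iqr_outliers([2, 2, 2, 20, 34, 45, 210])
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences={fences}, outliers={outliers}")
# Q1=2, Q3=45, IQR=43, fences=(-62.5, 109.5), outliers=[210]
```

A modified boxplot draws its whiskers only to the most extreme values inside the fences and plots any outliers, such as 210 here, as separate points.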

36
Let's Do It
  • LDI 5.9
  • LDI 5.10
  • LDI 4.23: Use the data to make side-by-side
    comparative boxplots.

37
I could have sworn you said eleven
steps.
38
Homework 10
  • Finish reading
  • LDI 5.15, 5.17
  • EX 5.22, 5.27, 5.29, 5.35, 5.39, 5.41, 5.43,
    5.50, 5.55

39
Standard Deviation
  • We want a way to measure spread based on the
    mean. To do this, we find (roughly) the average
    distance of our data from the mean. More
    precisely, we find the sum of the squared
    deviations from the mean, divide by n − 1, and
    then take the square root.

40
Sample Standard Deviation Formula
  • $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
  • The TI-83 calculates the sample standard
    deviation of data (shown as Sx in 1-Var Stats).

41
Population Standard Deviation
  • $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$
  • The TI-83 also calculates the population
    standard deviation of data (shown as σx).

42
Find the Stan. Dev.
  • Let's do this small data set by hand: 1, 4, 2, 3,
    9, 7, 2, 4, 5, 1, 8, 8, 7
  • Let's verify our result on the TI.
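The hand calculation follows the steps described on the Standard Deviation slide: find the mean, square each deviation, sum the squares, divide by n − 1, and take the square root. The sketch below follows those steps for this data set and compares the result with `statistics.stdev` (sample) and `statistics.pstdev` (population), in roughly the way Sx and σx would be compared on the TI-83.

```python
import math
from statistics import pstdev, stdev

data = [1, 4, 2, 3, 9, 7, 2, 4, 5, 1, 8, 8, 7]

n = len(data)
x_bar = sum(data) / n                            # step 1: the mean
squared_devs = [(x - x_bar) ** 2 for x in data]  # step 2: squared deviations
ss = sum(squared_devs)                           # step 3: sum of squares
s = math.sqrt(ss / (n - 1))                      # step 4: divide by n - 1, take root
sigma = math.sqrt(ss / n)                        # population version divides by n

print(f"mean = {x_bar:.4f}, s = {s:.4f}, sigma = {sigma:.4f}")
print(math.isclose(s, stdev(data)), math.isclose(sigma, pstdev(data)))
```

Here the mean is 61/13 ≈ 4.69 and s is about 2.84. Because it divides by n rather than n − 1, the population version is slightly smaller.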

43
Interpretation of SD
  • The standard deviation is roughly the average
    distance of the observations from the mean. The
    more spread out the data are from the mean the
    larger the standard deviation will be.
  • Since the standard deviation is a distance, it is
    never negative (it is 0 only when all the
    observations are equal), and it carries the same
    units as the data and the mean.

44
Same Means (x̄ = 4), Different Standard Deviations
[Figure: frequency plots of data sets with s = 0, s = 3.0, s = 0.8, and s = 1.0]
Standard Deviation Increases as Data Get More
Spread Out
45
Which Distribution has a larger standard
deviation?
46
  • Let's Do It! 5.15 Standard Deviation for Age: Use
    the ages of the subjects from your class.
  • (a) Find the standard deviation for these data.
  • (b) Complete the sentence:
  • On average, the ages of these subjects are about
    _______ years from their mean of ____ years.
  • (c) How many of the 20 subjects had ages within
    one standard deviation of the mean?
  • (d) How many of the 20 subjects had ages within
    two standard deviations of the mean?
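Parts (c) and (d) ask for a count of the observations between x̄ − k·s and x̄ + k·s. The class ages are not in this transcript, so the 20 ages below are hypothetical placeholders. The helper name `count_within` is also hypothetical.

```python
from statistics import mean, stdev

def count_within(values, k):
    """Count observations inside [x_bar - k*s, x_bar + k*s]."""
    x_bar = mean(values)
    s = stdev(values)
    return sum(1 for x in values if x_bar - k * s <= x <= x_bar + k * s)

# Hypothetical ages for 20 subjects; replace with your class data.
ages = [18, 19, 19, 20, 20, 20, 21, 21, 22, 23,
        19, 20, 21, 22, 24, 26, 20, 21, 19, 35]
print("within 1 SD:", count_within(ages, 1))
print("within 2 SD:", count_within(ages, 2))
```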

47
Linear Transformations
  • Linear transformations of data can be used to
    change the units of data. For example, suppose
    you collect a set of temperature data in Celsius:
  • 40, 41, 39, 41, 41, 40, 38
  • Find the mean and standard deviation for these
    data.

48
What about Fahrenheit?
  • Recall how to convert from Celsius to
    Fahrenheit: F = (9/5)C + 32. Convert our data
    using this formula, then find the new mean and
    standard deviation.

49
Linear Transformation Rules
  • If X represents the original values, x̄ is the
    average of the original values, and sx is the
    standard deviation of the original values, and if
    the new values are a linear transformation of X,
    Y = aX + b, then the new mean is given by
    $\bar{y} = a\bar{x} + b$
    and the new standard deviation by $s_y = |a|\,s_x$.
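The Celsius example from the previous two slides can be used to check these rules. The sketch converts each reading with F = (9/5)C + 32 (so a = 9/5 and b = 32), computes the mean and standard deviation directly in both units, and confirms that they match a·x̄ + b and |a|·sx.

```python
import math
from statistics import mean, stdev

celsius = [40, 41, 39, 41, 41, 40, 38]

a, b = 9 / 5, 32                        # F = (9/5)C + 32
fahrenheit = [a * c + b for c in celsius]

x_bar, s_x = mean(celsius), stdev(celsius)
y_bar, s_y = mean(fahrenheit), stdev(fahrenheit)

print(f"Celsius:    mean = {x_bar:.2f}, s = {s_x:.3f}")
print(f"Fahrenheit: mean = {y_bar:.2f}, s = {s_y:.3f}")
# New mean = a*x_bar + b; new SD = |a|*s_x (adding b does not change spread)
print(math.isclose(y_bar, a * x_bar + b), math.isclose(s_y, abs(a) * s_x))
```

The shift b moves every value by the same amount, so it changes the mean but not the spread. The scale factor a stretches both the mean and the standard deviation.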

50
Let's Do It
  • LDI 5.17

51
Important Transformation
  • We want to be able to standardize our data to the
    same scale so we can compare data that might be
    in differing units. For example, compare SAT and
    ACT scores, or IQ scores from differing age groups.

52
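The Z score
  • The z-score of an observation x is
    $z = \frac{x - \bar{x}}{s}$, the number of standard
    deviations x lies above or below the mean.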
53
Examples
  • Standardize the AGE data
  • What are the mean and standard deviation for
    these transformed data?
  • Will this always happen? Why?
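Standardizing is the linear transformation Y = aX + b with a = 1/s and b = −x̄/s. By the transformation rules, the new mean is (x̄ − x̄)/s = 0 and the new standard deviation is s/s = 1. The sketch below checks this numerically. It uses illustrative ages, not the class AGE data, and a hypothetical helper named `standardize`.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert each value to its z-score, z = (x - x_bar) / s."""
    x_bar = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("all values are equal; z-scores are undefined")
    return [(x - x_bar) / s for x in values]

# Hypothetical ages for illustration; substitute the AGE data.
ages = [19, 20, 20, 21, 22, 25, 30]
z = standardize(ages)
print([round(v, 2) for v in z])
print(f"mean of z = {mean(z):.6f}, SD of z = {stdev(z):.6f}")
```

Apart from tiny floating-point rounding, the mean of the z-scores prints as 0 and their standard deviation as 1, whatever data are used (provided they are not all equal).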