Title: Summarizing Data Numerically
Chapter 5
- Summarizing Data Numerically
Assignment 8
- Read Chapter 5 pages 299-313
- LDI 5.1-5.7
- EX 5.1-5.9
Chapter 4 Quiz
- Take practice quizzes (optional)
- Take chapter 4 reading quiz (BB)
- Review homework
Wendall Zurkowitz, slave to the waffle light.
Will every waffle take the same amount of time to cook?
Wendall would like to know two things: What is the average amount of time to cook a waffle, and how much variability is there in the cooking time? We cover the average in this section and variability in the next.
How to Describe Data
- What is the Shape?
- What is the Center?
- What is the Spread in the Data?
- Are there any Outliers?
Measuring the Center
- If we take a sample of n values and calculate what we usually call the average, we have calculated the arithmetic mean of the data.
- This measure of center is a statistic, since it comes from a sample.
The Sample Mean
- The sample mean is a statistic. Its purpose is to estimate a parameter, the population mean.
- The sample mean is denoted by $\bar{x}$ and computed as $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$.
The Population Mean
- The population mean is a parameter. It is denoted by $\mu$ and computed as $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where N is the number of values in the population.
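The sample mean and the population mean use the same arithmetic: add the values and divide by how many there are. The two differ only in whether the values come from a sample or from the whole population. The short Python sketch below computes a mean by hand and cross-checks it with the standard library's `statistics.mean`. The data values are made up for illustration and are not the AGE data, and `arithmetic_mean` is a hypothetical helper name.

```python
from statistics import mean

def arithmetic_mean(values):
    """Sum of the values divided by how many there are (x-bar or mu)."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Hypothetical sample, for illustration only (not the AGE data).
sample = [19, 21, 22, 20, 23]
print(arithmetic_mean(sample))  # 21.0
# statistics.mean gives the same value (as an int here, since the data are ints).
print(mean(sample))
```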
Example
- Let's find the sample mean of the AGE data.
Is the mean always the center?
- Suppose that a sample of 100 is obtained from a population.
- Can the mean be larger than the maximum value or smaller than the minimum value?
- Can the mean be the same as the max or min value?
- Can the mean be the exact middle point of the distribution?
- Can the mean be a value that does not appear anywhere in the data collected?
Let's Do It! 5.2 Combining Means
- We have seven students. The mean score for three of these students is 54, and the mean score for the other four students is 76.
- What is the mean score for all seven students?
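To combine group means, weight each group by its size. Each group contributes (size × mean) to the grand total, and that total is then divided by the overall count. The sketch below is a minimal version of this idea. `combined_mean` is a hypothetical helper, and the group sizes and means are invented so that the 5.2 exercise is left for you to work.

```python
def combined_mean(groups):
    """Mean of several groups pooled together.

    groups: list of (group_size, group_mean) pairs.
    Each group contributes size * mean to the grand total.
    """
    total_n = sum(n for n, _ in groups)
    if total_n == 0:
        raise ValueError("no observations in any group")
    total_sum = sum(n * m for n, m in groups)
    return total_sum / total_n

# Hypothetical groups: 2 scores averaging 10 and 3 scores averaging 20.
print(combined_mean([(2, 10), (3, 20)]))  # 16.0
```

Averaging the two group means directly would give 15 here. That is only correct when the groups are the same size.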
The Median!
- The median of a set of n observations, ordered from smallest to largest, is a value such that at least half of the observations are less than or equal to that value and at least half of the observations are greater than or equal to that value.
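A common way to locate the median is the (n + 1)/2 position rule, which LDI 5.3 also uses. Sort the data and go to position (n + 1)/2. If n is even, that position falls halfway between two values, and the median is their average. The sketch below follows this rule. `median_by_position` is a hypothetical name, and the data are made up.

```python
def median_by_position(values):
    """Median via the (n + 1)/2 position rule on the sorted data."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    pos = (n + 1) / 2              # 1-based position of the median
    if pos.is_integer():           # n odd: a single middle value
        return data[int(pos) - 1]
    # n even: average the two values on either side of position (n + 1)/2
    lower = data[int(pos) - 1]
    upper = data[int(pos)]
    return (lower + upper) / 2

# Hypothetical data, for illustration only.
print(median_by_position([7, 1, 5, 3, 9]))  # 5
print(median_by_position([4, 8, 1, 6]))     # 5.0
```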
Find the Median of the AGE data
- Use your TI and 1-Var Stats.
Let's Do It! 5.3 Median Number of Children per Household
- Find the median number of children in a household from this sample of 10 households; that is, find the median of the following data:
  Observation Number:  1  2  3  4  5  6  7  8  9  10
  Number of Children:  2  3  0  1  4  0  3  0  1  2
- (a) Order the observations from smallest to largest.
- (b) Calculate (n + 1)/2 _________________
- (c) Median ______________
- (d) What happens to the median if the fifth observation in the first list was incorrectly recorded as 40 instead of 4?
- (e) What happens to the median if the third observation in the first list was incorrectly recorded as -20 instead of 0?
- Note: The median is resistant; that is, it does not change, or changes very little, in response to extreme observations.
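You can see the median's resistance by changing one value to an extreme typo and comparing how the mean and the median react. The sketch below uses the standard library's `statistics.mean` and `statistics.median` on a small made-up data set, not the household data.

```python
from statistics import mean, median

# Hypothetical data, then the same data with one value mistyped.
clean = [1, 2, 3, 4, 5]
typo = [1, 2, 3, 4, 50]   # the 5 was recorded as 50

for label, data in [("clean", clean), ("typo", typo)]:
    print(label, "mean =", mean(data), "median =", median(data))

# The mean jumps from 3 to 12, while the median stays at 3.
```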
The Mode
- To find a measure of center for categorical (qualitative) data, we must use the mode. The mode can also be used with numerical (quantitative) data, but there it is usually not a good measure of center.
- The mode of a set of data is the most frequently occurring value, that is, the value with the highest frequency.
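Because two or more values can tie for the highest frequency, it helps to report every mode rather than just one. The sketch below uses `statistics.multimode` from the Python standard library (available in Python 3.8 and later), which returns a list of all the most frequent values. It works for categorical labels as well as numbers. The data are made up for illustration.

```python
from statistics import multimode

# Categorical data: one clear mode.
colors = ["red", "blue", "red", "green"]
print(multimode(colors))    # ['red']

# Numerical data with a tie: two modes (bimodal).
counts = [3, 1, 3, 2, 1]
print(multimode(counts))    # [3, 1]
```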
Example
- Find the mode for each of the following data sets:
  (a) 1, 2, 3, 2, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6, 7
  (b) 1, 4, 3, 4, 2, 4, 5, 4, 1, 1, 4, 1, 1, 6
- The mode can be computed for qualitative data.
- The modal race category is white.
Consider the following data: 2, 2, 2, 20, 34, 45, 210. What are the mode, median, and mean?
Let's Do It! 5.5 Attend Graduate School? When do undergraduates make the decision to continue their education and attend graduate school? An undergraduate attending a four-year college with a semester system (versus a quarter system) would have a total of eight semesters of classes (excluding any summer sessions). A sample of 18 senior undergraduates who would be graduating and attending graduate school were asked the following question: "In which semester (1, 2, 3, 4, 5, 6, 7, or 8) did you decide you would continue your education and attend graduate school?" The responses are given below.
- (a) Construct a frequency plot of these data.
- (b) Obtain the following sample statistics for these data.
  Minimum ___________ Maximum ______________ Median _____________ Mean _____________
- (c) How do the two measures of center, the median and the mean, compare? Select one:
  i. Median > Mean
  ii. Median < Mean
  iii. Median = Mean
Homework 9
- Read pages 314-340
- LDI 5.8 - 5.12 all, 5.14, 5.15, 5.17
- EX 5.10 - 5.21
Measures of Variation
- Now that we can measure the center of a distribution, we need to know something about the spread, or variability, of the data.
- As with measures of center, there are several popular ways of measuring spread.
Why Measure Variation?
- Consider the following plots.
- They both have a mean of 60, but do they have the same distribution?
The Range
- Our first, crude measure of the variation of a data set is the range, which is simply max - min.
- This measure is very limited in its ability to describe the spread in a data set.
Example
- Consider these distributions.
- They have the same range, 30 - 20 = 10, yet they have very different variation.
Quartiles
- Recall that the median is the middle number of a distribution, so 50% of the data fall below this value. We can chop the data into four equal pieces by finding the median of the lower 50% and the median of the upper 50%. These values, together with the median, are called the quartiles.
Find the Quartiles for AGE
- Q1 is the first quartile: 25% of the data fall below this value and 75% above it. It is the median of the data that fall below the median.
- MED is the second quartile: 50% of the data fall below this value and 50% above it.
- Q3 is the third quartile: 75% of the data fall below this value and 25% fall above it. It is the median of the data that fall above the median.
5-Number Summary and Boxplots
- The 5-number summary is simply: Min, Q1, Med, Q3, Max.
- A boxplot is a plot of these five points. Draw a boxplot of the AGE data (page 283).
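The quartiles described here can be computed as "the median of each half" of the sorted data. The sketch below builds the 5-number summary that way. One assumption is worth flagging: when n is odd, the overall median is left out of both halves. Other software (and other textbooks) may interpolate instead and report slightly different Q1 and Q3. `quartiles` and `five_number_summary` are hypothetical helper names, and the data are made up.

```python
from statistics import median

def quartiles(values):
    """(Q1, median, Q3) using the 'median of each half' rule.

    Assumed convention: when n is odd, the overall median is
    excluded from both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]               # values below the median
    upper = data[half + (n % 2):]     # values above the median
    return median(lower), median(data), median(upper)

def five_number_summary(values):
    """(Min, Q1, Median, Q3, Max)."""
    q1, med, q3 = quartiles(values)
    return min(values), q1, med, q3, max(values)

# Hypothetical data, for illustration only.
print(five_number_summary([1, 3, 4, 6, 7, 8, 10, 12, 15]))
# (1, 3.5, 7, 11.0, 15)
```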
Interquartile Range
- The interquartile range, or IQR, is simply the difference between Q3 and Q1: IQR = Q3 - Q1.
- Find the IQR for the AGE data.
Let's Do It
1.5 × IQR Rule
- Any data value that falls more than 1.5 × IQR above Q3 or more than 1.5 × IQR below Q1 is considered an outlier.
- Draw a modified boxplot of the AGE data by hand.
- Make boxplots on the TI-83.
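The 1.5 × IQR rule builds two "fences," Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, and flags any value strictly outside them. The sketch below assumes the same "median of each half" quartile convention as the quartile slides, with the overall median excluded from both halves when n is odd. The multiplier parameter `k` and the helper names are hypothetical additions for illustration, and the data are made up.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 via the 'median of each half' rule (assumed convention:
    the overall median is excluded from both halves when n is odd)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + len(data) % 2:]
    return median(lower), median(upper)

def iqr_outliers(values, k=1.5):
    """Return (IQR, lower fence, upper fence, outliers) for the k * IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    outliers = [x for x in values if x < low_fence or x > high_fence]
    return iqr, low_fence, high_fence, outliers

# Hypothetical data with one suspicious value.
print(iqr_outliers([2, 4, 5, 5, 6, 7, 8, 30]))  # (3.0, 0.0, 12.0, [30])
```

In a modified boxplot, the whiskers stop at the most extreme values inside the fences, and any flagged values are plotted as individual points.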
Let's Do It
- LDI 5.9
- LDI 5.10
- LDI 4.23: Use the data to make side-by-side comparative boxplots.
I could have sworn you said eleven steps.
Homework 10
- Finish reading
- LDI 5.15, 5.17
- EX 5.22, 5.27, 5.29, 5.35, 5.39, 5.41, 5.43,
5.50, 5.55
Standard Deviation
- We want a way to measure spread based on the mean. To do this, we will find (roughly) the average distance of our data from the mean. Well, actually, we find the sum of the squared deviations from the mean, divide by n - 1, and then take the square root.
Sample Standard Deviation Formula
- $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
- The TI-83 calculates the sample standard deviation of data (shown as Sx in 1-Var Stats).
Population Standard Deviation
- $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
- The TI-83 calculates the population standard deviation of data (shown as σx in 1-Var Stats).
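Both standard deviations start from the same sum of squared deviations from the mean. The sample version divides by n - 1, and the population version divides by N. The sketch below codes both by hand and checks them against the standard library's `statistics.stdev` (sample) and `statistics.pstdev` (population). The helper names are hypothetical, and the data set is made up for illustration.

```python
from math import sqrt
from statistics import pstdev, stdev

def sample_sd(values):
    """s: square root of (sum of squared deviations) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample SD needs at least two values")
    xbar = sum(values) / n
    ss = sum((x - xbar) ** 2 for x in values)  # sum of squared deviations
    return sqrt(ss / (n - 1))

def population_sd(values):
    """sigma: square root of (sum of squared deviations) / N."""
    N = len(values)
    if N == 0:
        raise ValueError("the population SD needs at least one value")
    mu = sum(values) / N
    return sqrt(sum((x - mu) ** 2 for x in values) / N)

# Hypothetical data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data), pstdev(data))  # both 2.0
print(sample_sd(data), stdev(data))       # both about 2.138
```

Dividing by n - 1 makes s slightly larger than σ computed on the same numbers. The gap shrinks as n grows.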
Find the Standard Deviation
- Let's do this small data set by hand: 1, 4, 2, 3, 9, 7, 2, 4, 5, 1, 8, 8, 7
- Let's verify our result on the TI.
Interpretation of SD
- The standard deviation is roughly the average distance of the observations from the mean. The more spread out the data are from the mean, the larger the standard deviation will be.
- Since the standard deviation is a distance, it is never negative (it equals 0 only when all the observations are the same), and it carries the same units as the data and the mean.
Same Means (x̄ = 4), Different Standard Deviations
[Figure: frequency plots of data sets that all have mean x̄ = 4, with standard deviations s = 0, 0.8, 1.0, and 3.0. The standard deviation increases as the data get more spread out.]
Which Distribution has a larger standard deviation?
Let's Do It! 5.15 Standard Deviation for Age
- Use the ages of the subjects from your class.
- (a) Find the standard deviation for these data.
- (b) Complete the sentence:
  On average, the ages of these subjects are about _______ years from their mean of ____ years.
- (c) How many of the 20 subjects had ages within one standard deviation of the mean?
- (d) How many of the 20 subjects had ages within two standard deviations of the mean?
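Parts (c) and (d) amount to counting the observations whose distance from the mean is at most k standard deviations. The sketch below does that count using the sample standard deviation (`statistics.stdev`). That choice is an assumption, so swap in `statistics.pstdev` if your class treats the ages as a population. `count_within` is a hypothetical helper, and the ages are invented, not your class data.

```python
from statistics import mean, stdev

def count_within(values, k):
    """How many values lie within k sample standard deviations of the mean."""
    xbar = mean(values)
    s = stdev(values)
    return sum(1 for x in values if abs(x - xbar) <= k * s)

# Hypothetical ages, for illustration only (not the class data).
ages = [18, 19, 19, 20, 20, 21, 22, 24, 27, 30]
print(count_within(ages, 1), count_within(ages, 2))  # 7 9
```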
Linear Transformations
- Linear transformations can be used to change the units of data. For example, suppose you collect a set of temperature data in Celsius:
  40, 41, 39, 41, 41, 40, 38
- Find the mean and standard deviation for these data.
What about Fahrenheit?
- Recall how to convert from Celsius to Fahrenheit: $F = \frac{9}{5}C + 32$. Convert our data using this formula, then find the new mean and standard deviation.
Linear Transformation Rules
- If X represents the original values, $\bar{x}$ is the mean of the original values, and $s_x$ is the standard deviation of the original values, and if the new values are a linear transformation of X, $Y = aX + b$, then the new mean is given by $\bar{y} = a\bar{x} + b$ and the new standard deviation by $s_y = |a| s_x$.
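For a linear transformation Y = aX + b, the new mean is a times the old mean plus b, and the new standard deviation is |a| times the old one. The shift b does not change the spread. You can check these rules numerically by transforming a data set and recomputing. The sketch below does this with a made-up data set and a negative a, which shows why the absolute value appears in the rule. `transform` is a hypothetical helper name.

```python
from statistics import mean, stdev

def transform(values, a, b):
    """Apply the linear transformation Y = aX + b to each value."""
    return [a * x + b for x in values]

# Hypothetical data and coefficients, for illustration only.
x = [1, 2, 3, 4, 5]
a, b = -2, 10
y = transform(x, a, b)

# Rule for the mean: y-bar = a * x-bar + b
print(mean(y), a * mean(x) + b)
# Rule for the SD: s_y = |a| * s_x  (b drops out; the sign of a does not matter)
print(stdev(y), abs(a) * stdev(x))
```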
Let's Do It
Important Transformation
- We want to be able to standardize our data to the same scale so we can compare data that might be in different units. For example, we might compare SAT and ACT scores, or IQ scores from different age groups.
The Z score
- $z = \frac{x - \bar{x}}{s}$
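Standardizing replaces each value with its z-score: how many standard deviations it lies above (positive) or below (negative) the mean. The sketch below assumes the sample version, z = (x - x̄)/s, using `statistics.mean` and `statistics.stdev`. `z_scores` is a hypothetical helper, and the scores are made up for illustration.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize: z = (x - xbar) / s for each value (sample SD assumed)."""
    xbar = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - xbar) / s for x in values]

# Hypothetical scores, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print([round(z, 3) for z in z_scores(scores)])
```

Because z-scores have no units, values measured on different scales (for example, SAT and ACT scores) can be compared by their z-scores.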
Examples
- Standardize the AGE data
- What are the mean and standard deviation for
these transformed data? - Will this always happen? Why?