Title: Chapter 2 Describing, Exploring, and Comparing Data Sections 2.5-2.7
1Chapter 2--Describing, Exploring, and Comparing
Data (Sections 2.5-2.7)
2Section 2.5 Measures of Variation
3Range Definition
- The range of a set of data is the difference
between the highest value and the lowest value
- Range = (highest value) - (lowest value)
- Not very useful as a measure of variation, since
it depends on only two values
4Standard Deviation Definition
- A measure of variation of all values from the
mean
- Value is usually positive (it is never negative)
- Zero if all data values are the same
- Can increase dramatically with the inclusion of
outliers
- Units are the same as the units of the original
data values
5Sample Standard Deviation Formula
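The formula shown on this slide did not survive extraction. The standard formula for the sample standard deviation, which is presumably what the deck calls Formula 2-4, is given here as an assumption, with x̄ the sample mean and n the sample size:

```latex
s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}
```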
6Population Standard Deviation Formula
This formula is similar to Formula 2-4, but
instead the population mean and population size
are used
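Written out in standard notation (an assumption, since the slide's own formula is missing), with μ the population mean and N the population size:

```latex
\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}
```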
7Sample Standard Deviation Example
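The worked example from this slide is missing. As a stand-in, here is a minimal sketch that applies the sample formula step by step to the Class 1 scores listed later in this deck. The variable names are illustrative only.

```python
import math

# Class 1 scores, taken from the class data later in this deck.
scores = [82, 78, 70, 58, 42]

n = len(scores)
mean = sum(scores) / n                                  # 66.0
squared_deviations = [(x - mean) ** 2 for x in scores]  # (x - mean)^2 for each value
sample_variance = sum(squared_deviations) / (n - 1)     # divide by n - 1 for a sample
s = math.sqrt(sample_variance)

print(f"mean = {mean}, s = {s:.2f}")  # prints: mean = 66.0, s = 16.25
```

The result agrees with the StDev of 16.25 reported for Class 1 in the summary table near the end of the deck.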
8Definition Variance
- The variance of a set of values is a measure of
variation equal to the square of the standard
deviation.
- Sample variance: square of the sample standard
deviation (s)
- Population variance: square of the population
standard deviation (σ)
- Units are squared
- Will be used in Chapters 8 and 11
9Variance = Standard Deviation Squared
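A minimal sketch of this relationship, using Python's standard `statistics` module on the Class 1 scores from this deck. It only checks that the sample variance equals the square of the sample standard deviation, up to floating-point rounding.

```python
import math
import statistics

scores = [82, 78, 70, 58, 42]  # Class 1 scores from this deck

s = statistics.stdev(scores)             # sample standard deviation
s_squared = statistics.variance(scores)  # sample variance

print(f"s = {s:.2f}, s^2 = {s_squared:.2f}")  # prints: s = 16.25, s^2 = 264.00
print(math.isclose(s ** 2, s_squared))        # prints: True
```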
10Round Off Rule
- Carry one more decimal place than is present in
the original set of data.
- Round only the final answer, not values in the
middle of a calculation.
11Comparing Variation in Different Populations
- A method of comparing standard deviations of
data with different types and units
- Called the coefficient of variation (CV)
12Example of Comparing
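The comparison shown on this slide is missing. The coefficient of variation is conventionally the standard deviation expressed as a percentage of the mean, CV = (s / mean) x 100%. The sketch below assumes that definition and applies it to two of the class score sets from this deck. Those sets share the same units, so this only illustrates the arithmetic, not a cross-unit comparison. The function name is hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample CV as a percentage: (s / mean) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

class_1 = [82, 78, 70, 58, 42]
class_3 = [67, 66, 66, 66, 65]

print(f"Class 1 CV = {coefficient_of_variation(class_1):.1f}%")  # prints: Class 1 CV = 24.6%
print(f"Class 3 CV = {coefficient_of_variation(class_3):.1f}%")  # prints: Class 3 CV = 1.1%
```

Because the CV has no units, the same calculation can be used to compare, for example, heights in inches with weights in pounds.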
13Interpreting and Understanding StDev
- Interpretation is more important than finding the
number
- Small deviation: values closer together
- Large deviation: values farther apart
- Need to develop a sense of what the values mean
  - Usual?
  - Unusual?
14Example of Interpretation
- Nail Machine A
  - Mean length = 2 inches
  - s = 0.25 inch
- Nail Machine B
  - Mean length = 2 inches
  - s = 0.001 inch
15Why Standard Deviation
- Mean is 66 for each class
- Range
  - Class 1: 82 - 42 = 40
  - Class 2: 82 - 42 = 40
  - Class 3: 67 - 65 = 2
- Standard Deviation
  - Class 1: 16.2
  - Class 2: 21.9
  - Class 3: 0.7
- Source: http://mathcentral.uregina.ca/rr/database/RR.09.95/weston2.html
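The figures on this slide can be reproduced from the class scores listed at the end of the deck. A minimal sketch using Python's standard `statistics` module, which shows that Classes 1 and 2 share the same range but have different standard deviations:

```python
import statistics

# Scores for each class, as listed on the last three slides.
classes = {
    "Class 1": [82, 78, 70, 58, 42],
    "Class 2": [82, 82, 82, 42, 42],
    "Class 3": [67, 66, 66, 66, 65],
}

for name, scores in classes.items():
    value_range = max(scores) - min(scores)
    s = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)
    print(f"{name}: range = {value_range}, s = {s:.1f}")
# Class 1: range = 40, s = 16.2
# Class 2: range = 40, s = 21.9
# Class 3: range = 2, s = 0.7
```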
16Use (n-1) in Standard Deviation Formula?
- We are comparing each value to the others
- With 10 sample values, we compare one to the
other nine
- Using n-1 gives a larger standard deviation
- Using n tends to underestimate the population
standard deviation
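To see the size of the difference, here is a minimal sketch that computes both versions for the Class 1 scores from this deck. `statistics.stdev` divides by n - 1 and `statistics.pstdev` divides by n.

```python
import statistics

scores = [82, 78, 70, 58, 42]  # Class 1 scores from this deck

with_n_minus_1 = statistics.stdev(scores)  # sample formula, divides by n - 1
with_n = statistics.pstdev(scores)         # population formula, divides by n

print(f"n - 1: {with_n_minus_1:.2f}, n: {with_n:.2f}")  # prints: n - 1: 16.25, n: 14.53
```

The gap shrinks as the sample size grows, since n and n - 1 become nearly equal.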
17Range Rule of Thumb
For estimating a value of the standard deviation
(s): s ≈ range / 4, where range = (highest value) -
(lowest value)
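A minimal sketch of the rule, assuming the usual form in which the range is divided by 4. It is applied to two of the class score sets from this deck. The function name is hypothetical.

```python
def range_rule_of_thumb(values):
    """Estimate the standard deviation as (highest - lowest) / 4."""
    return (max(values) - min(values)) / 4

class_1 = [82, 78, 70, 58, 42]
class_3 = [67, 66, 66, 66, 65]

print(range_rule_of_thumb(class_1))  # prints: 10.0
print(range_rule_of_thumb(class_3))  # prints: 0.5
```

The standard deviations reported for these classes in the deck are 16.25 and 0.707, so the rule gives only a rough estimate, especially for very small data sets like these.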
18Empirical Rule Defined
For data sets having a distribution that is
approximately bell shaped, the following
properties apply:
- About 68% of all values fall within 1 standard
deviation of the mean
- About 95% of all values fall within 2 standard
deviations of the mean
- About 99.7% of all values fall within 3 standard
deviations of the mean
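A minimal sketch that turns these three properties into intervals. It uses the Nail Machine A figures from the interpretation example (mean 2 inches, s = 0.25 inch) and assumes, purely for illustration, that the nail lengths are approximately bell shaped. The function name is hypothetical.

```python
def empirical_rule_intervals(mean, sd):
    """Return {label: (low, high)} for 1, 2 and 3 standard deviations from the mean."""
    labels = {1: "about 68%", 2: "about 95%", 3: "about 99.7%"}
    return {label: (mean - k * sd, mean + k * sd) for k, label in labels.items()}

# Nail Machine A: mean length 2 inches, s = 0.25 inch.
intervals = empirical_rule_intervals(2.0, 0.25)
for label, (low, high) in intervals.items():
    print(f"{label} of lengths between {low} and {high} inches")
# about 68% of lengths between 1.75 and 2.25 inches
# about 95% of lengths between 1.5 and 2.5 inches
# about 99.7% of lengths between 1.25 and 2.75 inches
```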
19Empirical Rule
20Section 2.6 Measures of Relative Standing
21Z-score Definition
- The number of standard deviations that a given
value x is above or below the mean.
- Also known as the standard score
Sample: z = (x - x̄) / s
Population: z = (x - μ) / σ
Round z to 2 decimal places
22Interpreting Z-scores
Whenever a value is less than the mean, its
corresponding z score is negative.
Ordinary values: z score between -2 and 2
Unusual values: z score < -2 or z score > 2
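A minimal sketch of computing and interpreting a z score, using the Class 1 mean (66.00) and standard deviation (16.25) from the summary table in this deck and the lowest Class 1 score, 42. The function names are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return round((x - mean) / sd, 2)  # rounded to 2 decimal places

def is_unusual(z):
    """A value is unusual when its z score is below -2 or above 2."""
    return z < -2 or z > 2

z = z_score(42, 66.00, 16.25)
print(z, is_unusual(z))  # prints: -1.48 False
```

The score of 42 is below the mean, so its z score is negative, but it is still within 2 standard deviations and therefore ordinary.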
23Quartiles Definition
Q1 (First Quartile): separates the bottom 25% of
sorted values from the top 75%.
Q2 (Second Quartile): same as the median;
separates the bottom 50% of sorted values from
the top 50%.
Q3 (Third Quartile): separates the bottom 75% of
sorted values from the top 25%.
24Quartiles
Q1, Q2, Q3 divide ranked scores into four
equal parts
25Percentiles
Just as there are quartiles separating data into
four parts, there are 99 percentiles denoted P1,
P2, . . . P99, which partition the data into 100
groups.
26Finding the Percentile of a Given Score
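The formula from this slide is missing. A common textbook version, assumed here, is: percentile of x = (number of values less than x) / (total number of values) x 100, rounded to the nearest whole number. A minimal sketch with a hypothetical function name, applied to the Class 1 scores from this deck:

```python
def percentile_of_score(values, x):
    """Percentage of values strictly less than x, rounded to a whole number."""
    below = sum(1 for v in values if v < x)
    return round(below * 100 / len(values))

class_1 = [82, 78, 70, 58, 42]
print(percentile_of_score(class_1, 70))  # prints: 40
```

Two of the five scores (58 and 42) are below 70, so 70 is at the 40th percentile under this definition.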
27Converting from the kth Percentile to the
Corresponding Data Value
n = total number of values in the data set
k = percentile being used
L = locator that gives the position of a value
Pk = kth percentile
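The procedure itself did not survive on this slide. The sketch below assumes a common textbook version: sort the data, compute the locator L = (k / 100) x n, and then either average the Lth and (L+1)th values when L is a whole number, or round L up and take the value at that position. The function name is hypothetical, and k is taken to be a whole number from 1 to 99.

```python
def percentile_value(values, k):
    """Return P_k using the locator L = (k / 100) * n (assumed procedure)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(k * n, 100)  # L = whole + remainder / 100
    if remainder == 0:
        # L is a whole number: average the L-th and (L+1)-th values (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Otherwise round L up and take the value at that 1-based position.
    return ordered[whole]

class_1 = [82, 78, 70, 58, 42]
print(percentile_value(class_1, 50))  # prints: 70   (L = 2.5, rounded up to 3)
print(percentile_value(class_1, 40))  # prints: 64.0 (L = 2, average of 58 and 70)
```

Software packages often use a different locator convention, so quartiles and percentiles reported by software can differ slightly from values found this way.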
28Other Statistics
Interquartile range (IQR) = Q3 - Q1
10-90 percentile range = P90 - P10
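As a quick illustration, the interquartile range for the quartiles given on the boxplot construction slide later in this deck (Q1 = 86.5, Q3 = 251.5) works out as follows:

```python
# Quartiles taken from the 5-number summary on the boxplot construction slide.
q1 = 86.5
q3 = 251.5

iqr = q3 - q1  # spread of the middle 50% of the sorted values
print(iqr)     # prints: 165.0
```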
29Section 2.7 Exploratory Data Analysis (EDA)
30Preliminary Definitions
- Exploratory Data Analysis: The process of using
statistical tools (such as graphs, measures of
center, and measures of variation) to investigate
data sets in order to understand their important
characteristics.
- Outlier: A value that is located very far away
from almost all the other values.
31Outlier Issues
- An outlier can have a dramatic effect on the mean
- An outlier can have a dramatic effect on the
standard deviation
- An outlier can have a dramatic effect on the
scale of the histogram so that the true nature
of the distribution is totally obscured
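A minimal sketch of the first two effects. It takes the Class 3 scores from this deck and appends one hypothetical outlier (a score of 10, invented here purely for illustration), then compares the mean and sample standard deviation before and after.

```python
import statistics

scores = [67, 66, 66, 66, 65]  # Class 3 scores from this deck
with_outlier = scores + [10]   # hypothetical outlier, added for illustration only

for label, data in (("original", scores), ("with outlier", with_outlier)):
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    print(f"{label}: mean = {mean:.2f}, s = {s:.2f}")
# original: mean = 66.00, s = 0.71
# with outlier: mean = 56.67, s = 22.87
```

A single extreme value pulls the mean down by more than 9 points and makes the standard deviation more than 30 times larger.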
32Boxplots
- For a set of data, the 5-number summary consists
of the minimum value; the first quartile, Q1; the
median (or second quartile, Q2); the third
quartile, Q3; and the maximum value
- A boxplot (or box-and-whisker diagram) is a
graph of a data set that consists of a line
extending from the minimum value to the maximum
value, and a box with lines drawn at the first
quartile, Q1; the median; and the third quartile,
Q3
33Constructing Boxplots
- Find the 5-number summary
  - Minimum value = 0
  - Q1 = 86.5
  - Median (Q2) = 170
  - Q3 = 251.5
  - Maximum value = 491
- Construct scale with max and min at ends (round
up and down to whole number)
- Construct box extending from Q1 to Q3
- Draw line at median
- Draw line extending from min to max
34Boxplot Example
35Another Example
Variable  Mean    StDev   Minimum  Q1     Median  Q3     Maximum
Class 1   66.00   16.25   42.00    50.00  70.00   80.00  82.00
Class 2   66.00   21.91   42.00    42.00  82.00   82.00  82.00
Class 3   66.000  0.707   65.00    65.50  66.00   66.50  67.00
36Class 1
82 78 70 58 42
37Class 2
82 82 82 42 42
38Class 3
67 66 66 66 65
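The 5-number summaries in the table above can be reproduced from these three score lists. The sketch below uses `statistics.quantiles` from the Python standard library with `method="exclusive"`. The deck does not say which quartile convention its software used, so that choice is an assumption, though it happens to match the tabulated quartiles. The function name is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, median, q3, max(values)

classes = {
    "Class 1": [82, 78, 70, 58, 42],
    "Class 2": [82, 82, 82, 42, 42],
    "Class 3": [67, 66, 66, 66, 65],
}

for name, scores in classes.items():
    print(name, five_number_summary(scores))
# Class 1 (42, 50.0, 70.0, 80.0, 82)
# Class 2 (42, 42.0, 82.0, 82.0, 82)
# Class 3 (65, 65.5, 66.0, 66.5, 67)
```

These five values are exactly what is needed to draw the boxplot for each class.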