Chapter 2 Describing, Exploring, and Comparing Data Sections 2.5-2.7

Provided by: davenportu

1
Chapter 2: Describing, Exploring, and Comparing
Data (Sections 2.5-2.7)
  • MATH320

2
Section 2.5: Measures of Variation
3
Range Definition
  • The range of a set of data is the difference
    between the highest value and the lowest value
  • Range = (highest value) - (lowest value)
  • Not very useful as a measure of variation,
    because it depends only on the two extreme values

4
Standard Deviation Definition
  • A measure of variation of all values from the
    mean
  • Value is usually positive and never negative
  • Zero only if all data values are the same
  • Can increase dramatically with the inclusion of
    outliers
  • Units are the same as the units of the original
    data values

5
Sample Standard Deviation Formula
Formula 2-4: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
6
Population Standard Deviation Formula
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
This formula is similar to Formula 2-4, but the
population mean ($\mu$) and population size ($N$)
are used instead
7
Sample Standard Deviation Example
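The worked example from this slide is not in the transcript. As a minimal sketch, the sample and population calculations can be applied to the Class 1 scores listed in the final slides (82, 78, 70, 58, 42). The function names are illustrative and do not come from the slides.

```python
import math

def sample_std(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

def population_std(values):
    """Population standard deviation: divide by the population size N."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / n)

class_1 = [82, 78, 70, 58, 42]  # Class 1 scores from the final slides
print(round(sample_std(class_1), 2))      # 16.25
print(round(population_std(class_1), 2))  # 14.53
```

The sample result agrees with the StDev of 16.25 reported for Class 1 in the summary table near the end of the presentation.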
8
Variance Definition
  • The variance of a set of values is a measure of
    variation equal to the square of the standard
    deviation.
  • Sample variance: square of the sample standard
    deviation (s)
  • Population variance: square of the population
    standard deviation (σ)
  • Units are the squares of the original units
  • Used again in Chapters 8 and 11

9
Variance = Standard Deviation Squared
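A short check of this relationship, sketched with Python's standard `statistics` module and the Class 3 scores from the final slides. The choice of data set is only for illustration.

```python
import statistics

class_3 = [67, 66, 66, 66, 65]  # Class 3 scores from the final slides

s = statistics.stdev(class_3)            # sample standard deviation
variance = statistics.variance(class_3)  # sample variance

print(variance)     # 0.5
print(round(s, 3))  # 0.707
# Squaring s recovers the variance, up to floating-point rounding.
print(abs(s ** 2 - variance) < 1e-9)
```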
10
Round Off Rule
  • Carry one more decimal place than is present in
    the original set of data.
  • Round only the final answer, not values in the
    middle of a calculation.

11
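Comparing Variation: Different Populations
  • Method of comparing standard deviations of data
    sets with different data types and units
  • Called the coefficient of variation (CV): the
    standard deviation expressed as a percentage of
    the mean, $CV = \frac{s}{\bar{x}} \cdot 100\%$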

12
Example of Comparing
13
Interpreting and Understanding StDev
  • Interpretation is more important than finding the
    number
  • Small deviation: values closer together
  • Large deviation: values farther apart
  • Need to develop a sense of what the values mean
  • Usual?
  • Unusual?

14
Example of Interpretation
  • Nail Machine A
  • Mean length = 2 inches
  • s = 0.25 inch
  • Nail Machine B
  • Mean length = 2 inches
  • s = 0.001 inch
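One way to put the two machines on a common scale is the coefficient of variation named earlier. This is a sketch that assumes the usual definition, CV = (s / mean) · 100%, which the transcript does not spell out; the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

cv_a = coefficient_of_variation(0.25, 2)   # Nail Machine A
cv_b = coefficient_of_variation(0.001, 2)  # Nail Machine B
print(f"Machine A: CV = {cv_a:.2f}%")  # 12.50%
print(f"Machine B: CV = {cv_b:.2f}%")  # 0.05%
```

Both machines have the same mean length, so the much smaller CV for Machine B simply reflects its much smaller standard deviation: its nails are far more consistent.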

15
Why Standard Deviation
  • Mean is 66 for each class
  • Range
  • Class 1: 82 - 42 = 40
  • Class 2: 82 - 42 = 40
  • Class 3: 67 - 65 = 2
  • Standard Deviation
  • Class 1: 16.2
  • Class 2: 21.9
  • Class 3: 0.7
  • Source: http://mathcentral.uregina.ca/rr/database/RR.09.95/weston2.html
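The figures on this slide can be reproduced from the class scores listed in the final slides. A minimal sketch using the standard `statistics` module; the dictionary layout is an illustrative choice.

```python
import statistics

classes = {
    "Class 1": [82, 78, 70, 58, 42],
    "Class 2": [82, 82, 82, 42, 42],
    "Class 3": [67, 66, 66, 66, 65],
}

summary = {}
for name, scores in classes.items():
    summary[name] = {
        "mean": statistics.mean(scores),
        "range": max(scores) - min(scores),
        "stdev": round(statistics.stdev(scores), 1),  # sample standard deviation
    }
    print(name, summary[name])
```

Classes 1 and 2 share the same mean and range, yet their standard deviations differ, which is why the range alone is a weak measure of variation.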

16
Why Use (n-1) in the Standard Deviation Formula?
  • We are comparing each value with the others
  • With 10 sample values, each one is compared with
    the other nine
  • Using n-1 gives a larger standard deviation
  • Using n tends to underestimate the population
    standard deviation
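The underestimate can be seen in a small simulation. This is a sketch under assumed conditions, not part of the slides: the population is taken to be normal with a hypothetical mean of 50 and standard deviation of 10, and the comparison is made on variances (the squares of the standard deviations).

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

TRUE_STD = 10.0   # hypothetical population standard deviation
SAMPLE_SIZE = 10
TRIALS = 5000

var_n = []          # variance estimates that divide by n
var_n_minus_1 = []  # variance estimates that divide by n - 1
for _ in range(TRIALS):
    sample = [random.gauss(50.0, TRUE_STD) for _ in range(SAMPLE_SIZE)]
    var_n.append(statistics.pvariance(sample))
    var_n_minus_1.append(statistics.variance(sample))

avg_n = statistics.mean(var_n)
avg_n_minus_1 = statistics.mean(var_n_minus_1)
print(f"true variance:             {TRUE_STD ** 2:.1f}")
print(f"average dividing by n:     {avg_n:.1f}")
print(f"average dividing by n - 1: {avg_n_minus_1:.1f}")
```

Averaged over many samples, dividing by n falls short of the true variance, while dividing by n - 1 lands close to it.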

17
Range Rule of Thumb
For estimating a value of the standard deviation
(s): $s \approx \frac{\text{range}}{4}$, where
range = (highest value) - (lowest value)
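A minimal sketch of the rule, assuming its usual form (s is roughly the range divided by 4) and using the Class 1 scores from the final slides. The function name is illustrative.

```python
def range_rule_estimate(values):
    """Estimate the standard deviation as range / 4."""
    return (max(values) - min(values)) / 4

class_1 = [82, 78, 70, 58, 42]
print(range_rule_estimate(class_1))  # 10.0
```

The computed sample standard deviation for these scores is about 16.25, so the rule gives only a rough estimate, especially for a data set this small.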
18
Empirical Rule Defined
For data sets having a distribution that is
approximately bell shaped, the following
properties apply:
  • About 68% of all values fall within 1 standard
    deviation of the mean
  • About 95% of all values fall within 2 standard
    deviations of the mean
  • About 99.7% of all values fall within 3 standard
    deviations of the mean
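The three properties translate directly into intervals around the mean. The sketch below uses a hypothetical bell-shaped data set with mean 100 and standard deviation 15; those numbers are made up for illustration and do not come from the slides.

```python
def empirical_rule_intervals(mean, std_dev):
    """Return the 68%, 95%, and 99.7% intervals for bell-shaped data."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {
        label: (mean - k * std_dev, mean + k * std_dev)
        for k, label in coverage.items()
    }

# Hypothetical bell-shaped data: mean 100, standard deviation 15.
for label, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"about {label} of values fall between {low} and {high}")
```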

19
Empirical Rule
20
Section 2.6: Measures of Relative Standing
21
Z-score Definition
  • The number of standard deviations that a given
    value x is above or below the mean.
  • Also known as standard score

Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$
Round z to 2 decimal places
22
Interpreting Z-scores
Whenever a value is less than the mean, its
corresponding z score is negative.
Ordinary values: z score between -2 and 2
Unusual values: z score < -2 or z score > 2
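As a sketch, z scores can be computed for the Class 1 scores from the final slides and checked against the cutoffs of -2 and 2. The helper names are illustrative.

```python
import statistics

def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above or below the mean."""
    return round((x - mean) / std_dev, 2)

def is_unusual(z):
    """Apply the cutoff from the slide: unusual if z < -2 or z > 2."""
    return z < -2 or z > 2

class_1 = [82, 78, 70, 58, 42]
mean = statistics.mean(class_1)
s = statistics.stdev(class_1)  # sample standard deviation

for score in class_1:
    z = z_score(score, mean, s)
    print(score, z, "unusual" if is_unusual(z) else "ordinary")
```

Every score in this class has a z score between -2 and 2, so none of them would be called unusual.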
23
Quartiles Definition
Q1 (First Quartile): separates the bottom 25% of
sorted values from the top 75%.
Q2 (Second Quartile): same as the median;
separates the bottom 50% of sorted values from
the top 50%.
Q3 (Third Quartile): separates the bottom 75% of
sorted values from the top 25%.
24
Quartiles
Q1, Q2, and Q3 divide ranked scores into four
equal parts
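A minimal sketch of finding the three quartiles with Python's `statistics.quantiles`, using the Class 1 scores from the final slides. Quartile conventions vary between textbooks and software, so other tools may give slightly different values.

```python
import statistics

class_1 = [82, 78, 70, 58, 42]

# statistics.quantiles sorts the data and, by default, uses its
# "exclusive" method; other tools and textbooks may give other values.
q1, q2, q3 = statistics.quantiles(class_1, n=4)
print(q1, q2, q3)  # 50.0 70.0 80.0
```

These values match the Q1, Median, and Q3 reported for Class 1 in the summary table near the end of the presentation.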
25
Percentiles
Just as there are quartiles separating data into
four parts, there are 99 percentiles denoted P1,
P2, . . . P99, which partition the data into 100
groups.
26
Finding the Percentile of a Given Score
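The formula from this slide did not survive in the transcript. A common textbook definition, assumed here, is: percentile of x = (number of values less than x) / (total number of values) · 100, rounded to a whole number. The function name is illustrative.

```python
def percentile_of_score(values, x):
    """Percent of values strictly less than x, rounded to a whole number."""
    below = sum(1 for v in values if v < x)
    return round(100 * below / len(values))

class_1 = [82, 78, 70, 58, 42]
print(percentile_of_score(class_1, 78))  # 60
```

Three of the five Class 1 scores are below 78, so 78 sits at the 60th percentile under this definition.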
27
Converting from the kth Percentile to the
Corresponding Data Value
n = total number of values in the data set
k = percentile being used
L = locator that gives the position of a value, $L = \frac{k}{100} \cdot n$
Pk = kth percentile
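The transcript keeps only the symbol definitions, not the procedure itself. The sketch below assumes a common textbook procedure: sort the data and compute L = (k / 100) · n; if L is a whole number, Pk is midway between the L-th value and the next one; otherwise round L up and take the value in that position. Other procedures exist and can give slightly different results.

```python
def percentile_value(values, k):
    """Return P_k using the locator L = (k / 100) * n, for whole k in 1..99."""
    data = sorted(values)
    n = len(data)
    if (k * n) % 100 == 0:
        # L is a whole number: P_k is midway between the L-th value
        # and the next value (positions are 1-based).
        L = k * n // 100
        return (data[L - 1] + data[L]) / 2
    # L is not whole: round it up and take the value at that position.
    L = k * n // 100 + 1
    return data[L - 1]

class_1 = [82, 78, 70, 58, 42]
print(percentile_value(class_1, 40))  # 64.0
print(percentile_value(class_1, 50))  # 70
```

Integer arithmetic is used to test whether L is whole, which avoids floating-point surprises in the comparison.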
28
Other Statistics
Interquartile Range (or IQR): Q3 - Q1
10 - 90 Percentile Range: P90 - P10
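A short sketch of the interquartile range, computed with `statistics.quantiles` (default method) on two of the class data sets from the final slides. The function name is illustrative, and other quartile conventions may give somewhat different values.

```python
import statistics

def interquartile_range(values):
    """IQR = Q3 - Q1, using the default method of statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

print(interquartile_range([82, 78, 70, 58, 42]))  # Class 1: 30.0
print(interquartile_range([67, 66, 66, 66, 65]))  # Class 3: 1.0
```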
29
Section 2.7: Exploratory Data Analysis (EDA)
30
Preliminary Definitions
  • Exploratory Data Analysis: The process of using
    statistical tools (such as graphs, measures of
    center, and measures of variation) to investigate
    data sets in order to understand their important
    characteristics.
  • Outlier: A value that is located very far away
    from almost all the other values.

31
Outlier Issues
  • An outlier can have a dramatic effect on the mean
  • An outlier can have a dramatic effect on the
    standard deviation
  • An outlier can have a dramatic effect on the
    scale of the histogram so that the true nature
    of the distribution is totally obscured
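The effect on the mean and standard deviation is easy to demonstrate. The sketch below adds one made-up outlier (120) to the Class 3 scores from the final slides; the outlier value is hypothetical and chosen only for illustration.

```python
import statistics

class_3 = [67, 66, 66, 66, 65]
with_outlier = class_3 + [120]  # 120 is a made-up outlier for illustration

for label, data in [("original", class_3), ("with outlier", with_outlier)]:
    print(label, statistics.mean(data), round(statistics.stdev(data), 2))
```

A single extreme value moves the mean from 66 to 75 and raises the sample standard deviation from about 0.71 to about 22.05.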

32
Boxplots
  • For a set of data, the 5-number summary consists
    of the minimum value; the first quartile, Q1; the
    median (or second quartile, Q2); the third
    quartile, Q3; and the maximum value
  • A boxplot (or box-and-whisker diagram) is a
    graph of a data set that consists of a line
    extending from the minimum value to the maximum
    value, and a box with lines drawn at the first
    quartile, Q1; the median; and the third quartile,
    Q3

33
Constructing Boxplots
  • Find the 5-number summary
  • Minimum value = 0
  • Q1 = 86.5
  • Median (Q2) = 170
  • Q3 = 251.5
  • Maximum value = 491
  • Construct a scale with the max and min at the ends
    (round up and down to whole numbers)
  • Construct a box extending from Q1 to Q3
  • Draw a line at the median
  • Draw a line extending from min to max
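The data behind this slide's summary (minimum 0 through maximum 491) is not in the transcript, so the first step is sketched here on the three class data sets from the final slides instead. The function name is illustrative, and the quartiles use the default method of `statistics.quantiles`, which may differ from other conventions.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, median, q3, max(values))

classes = {
    "Class 1": [82, 78, 70, 58, 42],
    "Class 2": [82, 82, 82, 42, 42],
    "Class 3": [67, 66, 66, 66, 65],
}
for name, scores in classes.items():
    print(name, five_number_summary(scores))
```

The box of a boxplot spans the second and fourth numbers (Q1 to Q3), the line inside it sits at the third (the median), and the whiskers reach to the first and last.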

34
Boxplot Example
35
Another Example
Variable   Mean    StDev   Minimum  Q1     Median  Q3     Maximum
Class 1    66.00   16.25   42.00    50.00  70.00   80.00  82.00
Class 2    66.00   21.91   42.00    42.00  82.00   82.00  82.00
Class 3    66.000  0.707   65.00    65.50  66.00   66.50  67.00
36
Class 1
82 78 70 58 42
37
Class 2
82 82 82 42 42
38
Class 3
67 66 66 66 65