Methods for Describing Sets of Data - PowerPoint PPT Presentation

Provided by: anna227
1
Chapter 2
  • Methods for Describing Sets of Data

2
Memorial University of Newfoundland
Describing Data
  • Qualitative data, graphical methods: bar graph, pie chart
  • Qualitative data, numerical methods: summary table
  • Quantitative data, graphical methods: dot plot, stem-and-leaf display, histogram
  • Quantitative data, numerical methods: many
3
Describing Qualitative Data
  • Qualitative data are nonnumeric in nature
  • Best described by using classes
  • 2 descriptive measures
  • class frequency = number of data points in a class
  • class relative frequency = class frequency / n
  • class percentage = class relative frequency x 100
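
The three measures above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's own material: the function name `summary_table` and the class labels are hypothetical.

```python
from collections import Counter

def summary_table(labels):
    """Return frequency, relative frequency and percentage for each class."""
    n = len(labels)
    counts = Counter(labels)
    return {
        cls: {
            "frequency": freq,
            "relative_frequency": freq / n,   # class frequency / n
            "percentage": 100 * freq / n,     # relative frequency x 100
        }
        for cls, freq in counts.items()
    }

# Hypothetical qualitative data with three classes (not from the slides).
labels = ["A", "B", "A", "C", "A", "B", "C", "A"]
table = summary_table(labels)
for cls, row in table.items():
    print(cls, row)
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on any summary table.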

4
Describing Qualitative Data: Displaying Descriptive Measures
  • Summary table (22 adults, three types of aphasia)
  • Table columns: class, frequency, and class percentage (class relative frequency x 100)
5
Describing Qualitative Data: Qualitative Data Displays
  • Bar Graph

6
Describing Qualitative Data: Qualitative Data Displays
  • Pie chart

7
Dot Plot
1. Condenses data by grouping the same values together
2. Each numerical value is located by a dot on the horizontal axis
3. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
[Dot plot: horizontal axis marked 20, 25, 30, 35, 40, 45]
8
Stem-and-Leaf Display
  • Divide each observation into a stem value and a leaf value
  • Stem value defines the class
  • Leaf value defines the frequency (count)
  • Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Stem | Leaf
2    | 144677
3    | 028
4    | 1
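
The stem-and-leaf split for this data set can be reproduced with a short Python sketch. The function name `stem_and_leaf` and its `leaf_unit` parameter are illustrative assumptions, and the sketch uses one row per stem rather than the split stems that MINITAB prints.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Group values by stem (leading digits) and collect leaves (last digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = round(v / leaf_unit)        # express the value in leaf units
        stems[scaled // 10].append(scaled % 10)
    return dict(stems)

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
```

The number of leaves on each stem is the class frequency, so the display doubles as a sideways histogram.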
9
Stem-and-Leaf Display
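Raw Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Using MINITAB, we get:
Stem-and-Leaf Display: C1
Stem-and-leaf of C1  N = 10  Leaf Unit = 1.0
 3   2  144
(3)  2  677
 4   3  02
 2   3  8
 1   4  1
(The first column is a cumulative count from the nearer end of the data; the count in parentheses marks the row that contains the median.)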
10
Stem-and-Leaf Display
Raw Data: 2.1, 2.4, 2.4, 2.6, 2.7, 2.7, 3.0, 3.2, 3.8, 4.1
Using MINITAB, we get:
Stem-and-Leaf Display: C2
Stem-and-leaf of C1  N = 10  Leaf Unit = 0.1
 3   2  144
(3)  2  677
 4   3  02
 2   3  8
 1   4  1
11
Stem-and-Leaf Display
Raw Data: 210, 240, 240, 260, 270, 270, 300, 320, 380, 410
Using MINITAB, we get:
Stem-and-Leaf Display: C3
Stem-and-leaf of C1  N = 10  Leaf Unit = 10
 3   2  144
(3)  2  677
 4   3  02
 2   3  8
 1   4  1
12
Histogram
  • Condenses data by grouping similar values into
    classes in a graph
  • May show frequencies (counts) or relative
    frequencies (proportions)
  • Must first develop a frequency distribution table

13
Frequency Distribution Table Example
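Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Class          Midpoint   Frequency
15 but < 25    20         3
25 but < 35    30         5
35 but < 45    40         2
  • Boundaries: the lower and upper limits of each class
  • Width: the distance between the boundaries of a class (10 here)
  • Midpoint: (Upper + Lower Boundaries) / 2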
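
As a rough check of the table, the classes, midpoints and counts can be computed in Python. This is a sketch under stated assumptions: the helper `frequency_table` and its parameters are hypothetical, and each class is taken as "lower boundary but less than upper boundary", as in the table.

```python
def frequency_table(values, lower, width, n_classes):
    """Count the values in each class [low, high) and report its midpoint."""
    rows = []
    for k in range(n_classes):
        low = lower + k * width
        high = low + width
        freq = sum(1 for v in values if low <= v < high)
        rows.append({
            "class": (low, high),
            "midpoint": (low + high) / 2,        # (upper + lower) / 2
            "frequency": freq,
            "relative": freq / len(values),      # proportion of all values
        })
    return rows

data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
rows = frequency_table(data, lower=15, width=10, n_classes=3)
for row in rows:
    print(row)
```

The `relative` column gives the proportions used for a relative frequency distribution; multiplying by 100 gives the percentage distribution.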
14
Relative Frequency Distribution Tables
Relative Frequency Distribution
Class          Prop.
15 but < 25    .3
25 but < 35    .5
35 but < 45    .2

Percentage Distribution
Class          %
15 but < 25    30.0
25 but < 35    50.0
35 but < 45    20.0
15
Histogram
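Class          Freq.
15 but < 25    3
25 but < 35    5
35 but < 45    2
[Histogram: horizontal axis shows the lower class boundaries (0, 15, 25, 35, 45, 55); vertical axis shows the count from 0 to 5 and may instead show frequency, relative frequency, or percent; the bars touch]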
16
Summation Notation
  • Used to simplify summation instructions
  • Each observation in a data set is identified by a
    subscript
  • $x_1, x_2, x_3, x_4, x_5, \ldots, x_n$
  • Notation used to sum the above numbers together is $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$

17
Summation Notation
  • Data set of 1, 2, 3, 4
  • Are these the same? [the two summation expressions are not preserved in this transcript]
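
A common pair to compare with summation notation is the sum of the squares, $\sum x_i^2$, and the square of the sum, $(\sum x_i)^2$. Assuming that is the comparison intended for this data set (the transcript does not confirm it), a quick Python check shows they differ:

```python
x = [1, 2, 3, 4]

# Sum of squares: square each observation first, then add.
sum_of_squares = sum(v ** 2 for v in x)   # 1 + 4 + 9 + 16

# Square of the sum: add the observations first, then square the total.
square_of_sum = sum(x) ** 2               # (1 + 2 + 3 + 4)^2

print(sum_of_squares, square_of_sum)      # 30 100
```

The order of operations matters, which is why both quantities appear separately in shortcut formulas for the variance.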

18
Numerical Methods for Quantitative Data
Numerical Data Properties
  • Central tendency: mean, median, mode
  • Variation: range, variance, standard deviation
  • Shape: skew
19
Numerical Measures of Central Tendency
  • Central tendency: the tendency of data to center
    about certain numerical values
  • 3 commonly used measures of Central Tendency
  • Mean
  • Median
  • Mode

20
Numerical Measures of Central Tendency
  • The Mean
  • Arithmetic average of the elements of the data
    set
  • Sample mean denoted by $\bar{x}$
  • Population mean denoted by $\mu$
  • Calculated as $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ and $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

21
Numerical Measures of Central Tendency
  • The Median
  • Middle number when observations are arranged in
    order
  • Median denoted by m
  • Identified as the $\frac{n+1}{2}$th observation if n is odd, and
    the mean of the $\frac{n}{2}$th and $(\frac{n}{2}+1)$th observations if n is even

22
Numerical Measures of Central Tendency
  • The Mode
  • The most frequently occurring value in the data
    set
  • A data set can be multi-modal (have more than one
    mode)
  • Data displayed in a histogram will have a modal
    class: the class with the largest frequency

23
Mode Example
  • No Mode. Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
  • One Mode. Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
  • More Than 1 Mode. Raw Data: 21 28 28 41 43 43
24
Numerical Measures of Central Tendency
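  • The data set: 1 5 6 8 3 9 11 8 12
  • Is the median 3? No: 3 is only the middle value of the unordered list, so order the data first
  • The ordered data: 1 3 5 6 8 8 9 11 12
  • Mean: $\bar{x} = 63/9 = 7$
  • Median is the $\frac{n+1}{2}$th, or 5th, observation: 8
  • Mode is 8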
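
The three measures for this data set can be checked with Python's standard `statistics` module. This is a minimal sketch, assuming Python 3.8 or later for `multimode`; the variable names are illustrative.

```python
import statistics

data = [1, 5, 6, 8, 3, 9, 11, 8, 12]

mean = statistics.mean(data)        # sum of the values divided by n
median = statistics.median(data)    # sorts internally, then takes the middle value
mode = statistics.mode(data)        # most frequently occurring value

# multimode returns every value tied for the highest frequency.
two_modes = statistics.multimode([21, 28, 28, 41, 43, 43])

print(mean, median, mode, two_modes)
```

Note that `multimode` on a data set where every value occurs once returns all of the values, whereas the slides describe that case as having no mode.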

25
Shape
1. Describes how data are distributed
2. Measured by skew (symmetry)
[Figure: three relative frequency curves. Left-skewed: mean < median < mode. Symmetric: mean = median = mode. Right-skewed: mode < median < mean.]
26
Numerical Measures of Variability
  • Variability: the spread of the data across
    possible values
  • 3 commonly used measures of Variability
  • Range
  • Variance
  • Standard Deviation

27
Numerical Measures of Variability
  • The Range
  • Largest measurement minus the smallest
    measurement
  • Loses sensitivity when data sets are large
  • These 2 distributions have the same range.
  • How much does the range tell you about the data
    variability?

28
Numerical Measures of Variability
  • The Sample Variance ($s^2$)
  • The sum of the squared deviations from the mean
    divided by $(n-1)$: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$. Expressed as units squared
  • Why square the deviations? The sum of the
    deviations from the mean is zero

29
Equivalent Formula
30
Another Equivalent Formula
31
Variance Example
  • Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
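
The variance of this data set can be worked through in Python. The sketch below computes it from the definition and from one common shortcut form, $s^2 = \frac{\sum x_i^2 - (\sum x_i)^2 / n}{n-1}$; that this shortcut matches the "equivalent formula" slides is an assumption, since their formulas are not preserved here.

```python
import statistics

data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
n = len(data)
mean = sum(data) / n                      # 49.8 / 6 = 8.3

# Deviations from the mean sum to zero (up to floating-point rounding).
deviation_sum = sum(x - mean for x in data)

# Definition: sum of squared deviations divided by n - 1.
s2_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Shortcut form (assumed): (sum of squares - (sum)^2 / n) / (n - 1).
s2_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

# Library value for comparison.
s2_library = statistics.variance(data)

print(round(s2_definition, 3), round(s2_shortcut, 3), round(s2_library, 3))
```

All three values agree at about 6.368 (in squared units), since the squared deviations total 31.84 and n - 1 = 5.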

32
Numerical Measures of Variability
  • The Sample Standard Deviation (s)
  • The positive square root of the sample variance
  • Expressed in the original units of measurement
  • $s \approx$ range/4; we stress that this approximation is no
    substitute for calculating the exact value of s when
    possible
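
To see how rough the range/4 approximation can be, the sketch below compares it with the exact sample standard deviation for the variance example data (10.3, 4.9, 8.9, 11.7, 6.3, 7.7). Reusing that data set here is an illustrative choice, not something the slide specifies.

```python
import statistics

data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

# Exact sample standard deviation: square root of the sample variance.
s_exact = statistics.stdev(data)

# Rough approximation: range divided by 4.
s_rough = (max(data) - min(data)) / 4

print(round(s_exact, 2), round(s_rough, 2))
```

For this small sample the exact value is about 2.52 while range/4 gives 1.7, which illustrates why the exact calculation is preferred when the data are available.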

33
Standard Notation
Measure        Sample       Population
Mean           $\bar{x}$    $\mu$
Stand. Dev.    $s$          $\sigma$
Variance       $s^2$        $\sigma^2$
Size           $n$          $N$
34
Interpreting the Standard Deviation
  • How many observations fit within a given number of standard
    deviations (s) of the mean?
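
Two standard answers to this question are Chebyshev's rule (any data set: at least $1 - 1/k^2$ of the observations lie within k standard deviations of the mean, for k > 1) and the empirical rule (mound-shaped, symmetric distributions: roughly 68%, 95% and 99.7% within 1, 2 and 3 standard deviations). The sketch below compares an observed fraction with the Chebyshev bound; the helper names and the choice of data set are hypothetical.

```python
import statistics

def fraction_within(values, k):
    """Observed fraction of values within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(1 for v in values if abs(v - mean) <= k * s) / len(values)

def chebyshev_lower_bound(k):
    """Chebyshev's rule: at least 1 - 1/k^2 of any data set lies within k s (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is informative only for k > 1")
    return 1 - 1 / k ** 2

# Approximate proportions from the empirical rule (mound-shaped data only).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
for k in (2, 3):
    print(k, fraction_within(data, k), chebyshev_lower_bound(k), EMPIRICAL_RULE[k])
```

Chebyshev's bound is a guaranteed minimum, so the observed fraction should never fall below it; the empirical rule is only an approximation.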

35
Interpreting the Standard Deviation
  • You have purchased compact fluorescent light
    bulbs for your home. Average life length is 500
    hours, standard deviation is 24, and frequency
    distribution for the life length is mound shaped.
    One of your bulbs burns out at 450 hours. Would
    you send the bulb back for a refund?
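
One way to reason about the light-bulb question is to count how many standard deviations the 450-hour bulb lies from the mean. A minimal sketch, with illustrative variable names:

```python
mean_life = 500   # average life length in hours
sd_life = 24      # standard deviation in hours
burnout = 450     # observed life of the failed bulb

# Distance from the mean in standard-deviation units.
z = (burnout - mean_life) / sd_life
print(round(z, 2))   # -2.08
```

The bulb failed a little more than 2 standard deviations below the mean. For a mound-shaped distribution roughly 95% of bulbs fall within 2 standard deviations, so only about 2.5% would be expected to fail this early; whether that justifies a refund request is a judgment call.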

36
Numerical Measures of Relative Standing
  • Descriptive measures of relationship of a
    measurement to the rest of the data
  • Common measures
  • percentile ranking or percentile score
  • z-score

37
Numerical Measures of Relative Standing
  • Percentile rankings make use of the pth
    percentile
  • The median is an example of a percentile.
  • The median is the 50th percentile: 50% of
    observations lie above it, and 50% lie below it
  • For any p, the pth percentile has p% of the
    measures lying below it, and (100-p)% above it
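
The idea of a percentile ranking can be sketched as the percentage of measurements lying below a given value. The helper `percent_below` is hypothetical, and conventions for ties and small samples differ between textbooks and software, so treat this as one simple reading of the definition.

```python
def percent_below(values, x):
    """Percentage of measurements strictly below x."""
    return 100 * sum(1 for v in values if v < x) / len(values)

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
print(percent_below(data, 30))   # 60.0
```

Here 6 of the 10 observations lie below 30, so 30 sits at roughly the 60th percentile of this data set.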

38
Numerical Measures of Relative Standing
  • z-score: the distance between a measurement x
    and the mean, expressed in standard units: $z = \frac{x - \bar{x}}{s}$ for a sample, $z = \frac{x - \mu}{\sigma}$ for a population
  • Use of standard units allows comparison across
    data sets

39
Numerical Measures of Relative Standing
  • More on z-scores
  • Z-scores follow the empirical rule for mounded
    distributions

40
Methods for Detecting Outliers
  • Outlier: an observation that is unusually large
    or small relative to the data values being
    described
  • Causes
  • Invalid measurement
  • Misclassified measurement (different
    population)
  • A rare (chance) event
  • 2 detection methods
  • Box Plots
  • z-scores

41
Methods for Detecting Outliers
  • Box Plots
  • Based on quartiles, values that divide the dataset into 4 groups
  • Lower Quartile QL = 25th percentile
  • Middle Quartile = median
  • Upper Quartile QU = 75th percentile
  • Interquartile Range (IQR) = QU - QL
  • Lower inner fence = QL - 1.5(IQR)
  • Upper inner fence = QU + 1.5(IQR)
  • Lower outer fence = QL - 3(IQR)
  • Upper outer fence = QU + 3(IQR)
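
The fence formulas can be sketched in Python as follows. The function name `fences` is hypothetical, and the quartiles come from `statistics.quantiles` with its default method; quartile conventions differ between textbooks and software, so other tools may give slightly different fences for the same data.

```python
import statistics

def fences(values):
    """Return quartiles, IQR, and inner/outer fences for a data set."""
    q_lower, _median, q_upper = statistics.quantiles(values, n=4)
    iqr = q_upper - q_lower
    return {
        "QL": q_lower,
        "QU": q_upper,
        "IQR": iqr,
        "lower_inner": q_lower - 1.5 * iqr,
        "upper_inner": q_upper + 1.5 * iqr,
        "lower_outer": q_lower - 3 * iqr,
        "upper_outer": q_upper + 3 * iqr,
    }

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
result = fences(data)
for name, value in result.items():
    print(name, value)
```

For this data set every observation lies inside the inner fences, so the box plot rule flags no outliers.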

42
Methods for Detecting Outliers
  • Box Plots
  • Not shown on the plot: the inner and outer fences. Observations
    between the inner and outer fences are potential outliers; observations
    beyond the outer fences (marked 0) are probably outliers.

43
Methods for Detecting Outliers
  • Rules of thumb
Method       Suspect outliers                                       Highly suspect outliers
Box plots    Data points between the inner and outer fences         Data points beyond the outer fences
Z-scores     2 < |z| < 3                                            |z| > 3
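
The z-score rule of thumb can be sketched as a small classifier. The function name and labels are hypothetical, and how to treat a z-score of exactly 2 or 3 is an assumption here, since the rule of thumb does not say.

```python
def classify_by_z(x, mean, sd):
    """Label an observation using the z-score rule of thumb."""
    z = (x - mean) / sd
    if abs(z) > 3:
        return "highly suspect outlier"
    if abs(z) > 2:            # assumption: |z| of exactly 3 lands here
        return "suspect outlier"
    return "not flagged"

# Light-bulb setting: mean 500 hours, standard deviation 24 hours.
for hours in (500, 450, 600):
    print(hours, classify_by_z(hours, mean=500, sd=24))
```

With these parameters the 450-hour bulb is a suspect outlier (z is about -2.08) and a 600-hour bulb would be highly suspect (z is about 4.17).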