Title: Methods for Describing Sets of Data
1Chapter 2
- Methods for Describing Sets of Data
2Memorial University of Newfoundland
Describing Data
- Qualitative data
  - Graphical methods: summary table, bar graph, pie chart
  - Numerical methods
- Quantitative data
  - Graphical methods: dot plot, stem-and-leaf display, histogram
  - Numerical methods: many
3Describing Qualitative Data
- Qualitative data are nonnumeric in nature
- Best described by using classes
- 2 descriptive measures
  - class frequency = number of data points in a class
  - class relative frequency = class frequency / n
- class percentage = class relative frequency x 100
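The three measures above can be computed in a few lines. The sketch below is illustrative only: the class labels are hypothetical (they are not the aphasia data from the next slide), and `summarize_classes` is a name chosen for this example.

```python
from collections import Counter

# Hypothetical class labels, used only to illustrate the definitions.
observations = ["A", "B", "A", "C", "A", "B", "C", "A"]

def summarize_classes(labels):
    """Return {class: (frequency, relative frequency, percentage)}."""
    n = len(labels)
    counts = Counter(labels)
    return {
        cls: (freq, freq / n, 100 * freq / n)
        for cls, freq in counts.items()
    }

summary = summarize_classes(observations)
for cls, (freq, rel, pct) in sorted(summary.items()):
    print(f"{cls}: frequency={freq}, relative frequency={rel:.3f}, percentage={pct:.1f}%")
```

The class frequencies always sum to n, and the relative frequencies sum to 1.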
4Describing Qualitative Data: Displaying Descriptive Measures
- Summary table (22 adults, three types of aphasia)
- Class frequency
- Class percentage = class relative frequency x 100
5Describing Qualitative Data: Qualitative Data Displays
6Describing Qualitative Data: Qualitative Data Displays
7Dot Plot
1. Condenses data by grouping the same values together
2. Numerical value is located by a dot on the horizontal axis
3. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
[Figure: dot plot of the data on a horizontal axis marked 20, 25, 30, 35, 40, 45]
8Stem-and-Leaf Display
1. Divide each observation into a stem value and a leaf value
   - Stem value defines the class
   - Leaf value defines the frequency (count)
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Stem   Leaves
2      144677
3      028
4      1
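The stem/leaf split can be sketched in Python. This is a minimal illustration, not a reproduction of any statistics package: the function name `stem_and_leaf` and its `leaf_unit` parameter are assumptions made for this example, and values are rounded to the leaf unit before splitting.

```python
from collections import defaultdict

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

def stem_and_leaf(values, leaf_unit=1):
    """Map each stem to its sorted list of leaves.

    Each value is expressed in leaf units (rounded to an integer);
    the last digit is the leaf and the remaining digits are the stem.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = int(round(v / leaf_unit))
        stem, leaf = divmod(scaled, 10)
        stems[stem].append(leaf)
    return dict(stems)

display = stem_and_leaf(data)
for stem in sorted(display):
    print(stem, "|", "".join(str(leaf) for leaf in display[stem]))
```

Changing `leaf_unit` rescales the data without changing the shape of the display, so `stem_and_leaf([2.1, 2.4, 2.4], leaf_unit=0.1)` produces the same stems and leaves as `stem_and_leaf([21, 24, 24])`.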
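9Stem-and-Leaf Display
Raw data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Using MINITAB, we get:
Stem-and-Leaf Display: C1
Stem-and-leaf of C1  N = 10
Leaf Unit = 1.0
  3   2  144
 (3)  2  677
  4   3  02
  2   3  8
  1   4  1
The first column gives cumulative counts from each end of the data; the value in parentheses is the count in the row that contains the median.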
10Stem-and-Leaf Display
Raw data: 2.1, 2.4, 2.4, 2.6, 2.7, 2.7, 3.0, 3.2, 3.8, 4.1
Using MINITAB, we get:
Stem-and-Leaf Display: C2
Stem-and-leaf of C1  N = 10
Leaf Unit = 0.1
  3   2  144
 (3)  2  677
  4   3  02
  2   3  8
  1   4  1
11Stem-and-Leaf Display
Raw data: 210, 240, 240, 260, 270, 270, 300, 320, 380, 410
Using MINITAB, we get:
Stem-and-Leaf Display: C3
Stem-and-leaf of C1  N = 10
Leaf Unit = 10
  3   2  144
 (3)  2  677
  4   3  02
  2   3  8
  1   4  1
12Histogram
- Condenses data by grouping similar values into classes in a graph
- May show frequencies (counts) or relative frequencies (proportions)
- Must first develop a frequency distribution table
13Frequency Distribution Table Example
Raw data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38

Class           Midpoint   Frequency
15 but < 25     20         3
25 but < 35     30         5
35 but < 45     40         2

- Boundaries: the lower and upper limits of each class
- Width: upper boundary minus lower boundary (10 for each class here)
- Midpoint: (upper boundary + lower boundary) / 2
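As a minimal sketch, the table for this example can be rebuilt by counting the observations that fall in each class. The function name `frequency_table` and its parameters are assumptions made for illustration; each class includes its lower boundary and excludes its upper boundary, matching "15 but < 25".

```python
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

def frequency_table(values, lower, width, n_classes):
    """Return rows (lower, upper, midpoint, frequency) for classes [lower, upper)."""
    rows = []
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width
        freq = sum(1 for v in values if lo <= v < hi)
        rows.append((lo, hi, (lo + hi) / 2, freq))
    return rows

table = frequency_table(data, lower=15, width=10, n_classes=3)
n = len(data)
for lo, hi, mid, freq in table:
    print(f"{lo} but < {hi}: midpoint={mid}, frequency={freq}, "
          f"relative frequency={freq / n:.1f}, percent={100 * freq / n:.1f}")
```

Dividing each frequency by n gives the relative frequency distribution, and multiplying that by 100 gives the percentage distribution.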
14Relative Frequency Distribution Tables

Relative Frequency Distribution
Class           Prop.
15 but < 25     .3
25 but < 35     .5
35 but < 45     .2

Percentage Distribution
Class           Percent
15 but < 25     30.0
25 but < 35     50.0
35 but < 45     20.0
15Histogram

Class           Freq.
15 but < 25     3
25 but < 35     5
35 but < 45     2

[Figure: histogram of the table above. Vertical axis: count (frequency, relative frequency, or percent), marked 0 to 5. Horizontal axis: lower boundary, marked 0, 15, 25, 35, 45, 55. Bars touch.]
16Summation Notation
- Used to simplify summation instructions
- Each observation in a data set is identified by a subscript
  - $x_1, x_2, x_3, x_4, x_5, \ldots, x_n$
- Notation used to sum the above numbers together is $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
17Summation Notation
- Data set of 1, 2, 3, 4
- Are these the same? $\sum_{i=1}^{4} x_i^2$ and $\left(\sum_{i=1}^{4} x_i\right)^2$
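A common pair of expressions to compare for this data set is the sum of the squares and the square of the sum. Treating that pair as the intended comparison is an assumption; the sketch below simply evaluates both.

```python
x = [1, 2, 3, 4]

sum_of_squares = sum(v ** 2 for v in x)  # square each value, then add
square_of_sum = sum(x) ** 2              # add the values, then square

print(sum_of_squares, square_of_sum)
```

The two results differ (30 versus 100), so the order of squaring and summing matters.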
18Numerical Methods for Quantitative Data
Numerical data properties:
- Central tendency: mean, median, mode
- Variation: range, variance, standard deviation
- Shape: skew
19Numerical Measures of Central Tendency
- Central tendency: the tendency of data to center about certain numerical values
- 3 commonly used measures of central tendency
  - Mean
  - Median
  - Mode
20Numerical Measures of Central Tendency
- The Mean
- Arithmetic average of the elements of the data set
- Sample mean denoted by $\bar{x}$
- Population mean denoted by $\mu$
- Calculated as $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ and $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
21Numerical Measures of Central Tendency
- The Median
- Middle number when observations are arranged in order
- Median denoted by m
- Identified as the $\frac{n+1}{2}$th observation if n is odd, and the mean of the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th observations if n is even
22Numerical Measures of Central Tendency
- The Mode
- The most frequently occurring value in the data set
- A data set can be multi-modal: have more than one mode
- Data displayed in a histogram will have a modal class: the class with the largest frequency
23Mode Example
- No mode. Raw data: 10.3 4.9 8.9 11.7 6.3 7.7
- One mode. Raw data: 6.3 4.9 8.9 6.3 4.9 4.9
- More than 1 mode. Raw data: 21 28 28 41 43 43
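The three cases can be checked with a small counting routine. This is a sketch: `modes` is a name chosen for this example, and returning an empty list when no value repeats is a convention adopted here to match the "no mode" case.

```python
from collections import Counter

def modes(values):
    """Return the most frequent values, or [] when every value occurs once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so the data set has no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))  # no mode
print(modes([6.3, 4.9, 8.9, 6.3, 4.9, 4.9]))    # one mode
print(modes([21, 28, 28, 41, 43, 43]))          # more than one mode
```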
24Numerical Measures of Central Tendency
- The data set: 1 5 6 8 3 9 11 8 12
- Is the median 3, the middle value as listed? No: the data must be ordered first
- The ordered data: 1 3 5 6 8 8 9 11 12
- Mean: $\bar{x} = 63/9 = 7$
- Median is the $\frac{9+1}{2}$ = 5th observation, 8
- Mode is 8
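The three measures for this data set can be verified with Python's standard `statistics` module. This is only a check of the arithmetic, not part of the original example.

```python
import statistics

data = [1, 5, 6, 8, 3, 9, 11, 8, 12]

ordered = sorted(data)            # the median is defined on ordered data
mean = statistics.mean(data)      # sum of the values divided by n
median = statistics.median(data)  # middle value of the ordered data (n is odd)
mode = statistics.mode(data)      # most frequently occurring value

print(ordered)
print(mean, median, mode)
```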
25Shape
1. Describes how data are distributed
2. Measured by skew (symmetry)
[Figure: three relative frequency curves]
- Left-skewed: Mean < Median < Mode
- Symmetric: Mean = Median = Mode
- Right-skewed: Mode < Median < Mean
26Numerical Measures of Variability
- Variability: the spread of the data across possible values
- 3 commonly used measures of variability
  - Range
  - Variance
  - Standard deviation
27Numerical Measures of Variability
- The Range
- Largest measurement minus the smallest measurement
- Loses sensitivity when data sets are large
- [Figure: two distributions with the same range]
- These 2 distributions have the same range.
- How much does the range tell you about the data variability?
28Numerical Measures of Variability
- The Sample Variance ($s^2$)
- The sum of the squared deviations from the mean divided by (n-1). Expressed as units squared
- Why square the deviations? The sum of the deviations from the mean is zero
- $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
29Equivalent Formula
30Another Equivalent Formula
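Two algebraically equivalent ways of writing the sample variance, often used as computing shortcuts, are sketched below. That these are the exact forms intended on the two slides above is an assumption; the identity itself follows from expanding the squared deviations.

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
    = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}
    = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}
```

The middle step uses $\sum (x_i - \bar{x})^2 = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 = \sum x_i^2 - n\bar{x}^2$, together with $n\bar{x}^2 = \left(\sum x_i\right)^2 / n$.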
31Variance Example
- Raw data: 10.3 4.9 8.9 11.7 6.3 7.7
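A minimal sketch of the calculation for these six values, using the definition of the sample variance (sum of squared deviations from the mean, divided by n - 1) and cross-checking against the standard library:

```python
import statistics

data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

n = len(data)
mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]
sample_variance = sum(squared_deviations) / (n - 1)

# Cross-check against the standard library's sample variance (n - 1 divisor).
assert abs(sample_variance - statistics.variance(data)) < 1e-9

print(f"mean = {mean:.2f}, s^2 = {sample_variance:.3f}")
```

For these data the mean is 8.3, the squared deviations sum to 31.84, and the sample variance is 31.84 / 5 = 6.368.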
32Numerical Measures of Variability
- The Sample Standard Deviation (s)
- The positive square root of the sample variance
- Expressed in the original units of measurement
- s ≈ range/4 is only a rough approximation; we stress that this is no substitute for calculating the exact value of s when possible
- $s = \sqrt{s^2}$
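To see how rough the range/4 approximation can be, the sketch below compares it with the exact sample standard deviation for the six values from the variance example (10.3, 4.9, 8.9, 11.7, 6.3, 7.7). The variable names are chosen for this illustration.

```python
import statistics

data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

s = statistics.stdev(data)           # exact sample standard deviation
rough = (max(data) - min(data)) / 4  # range / 4 approximation

print(f"s = {s:.3f}, range/4 = {rough:.3f}")
```

With only six observations the approximation (about 1.7) is noticeably below the exact value (about 2.52), which illustrates why the exact calculation is preferred when it is available.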
33Standard Notation

Measure        Sample      Population
Mean           $\bar{x}$   $\mu$
Stand. Dev.    s           $\sigma$
Variance       $s^2$       $\sigma^2$
Size           n           N
34Interpreting the Standard Deviation
- How many observations fit within n s of the
mean?
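Two standard answers are Chebyshev's rule (for any data set, at least 1 - 1/k^2 of the observations lie within k standard deviations of the mean, for k > 1) and the empirical rule (for mound-shaped data, approximately 68%, 95%, and 99.7% lie within 1, 2, and 3 standard deviations). As a sketch, the proportions can also be counted directly; the data are the ten values used in the dot plot example earlier in the chapter, and `proportion_within` is a name chosen for this illustration.

```python
import statistics

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

def proportion_within(values, k):
    """Proportion of values within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    inside = [v for v in values if abs(v - mean) <= k * s]
    return len(inside) / len(values)

for k in (1, 2, 3):
    print(k, proportion_within(data, k))
```

For this small data set, 7 of the 10 values lie within one standard deviation of the mean and all 10 lie within two, which is consistent with both rules.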
35Interpreting the Standard Deviation
- You have purchased compact fluorescent light
bulbs for your home. Average life length is 500
hours, standard deviation is 24, and frequency
distribution for the life length is mound shaped.
One of your bulbs burns out at 450 hours. Would
you send the bulb back for a refund?
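One way to reason about the question is to express 450 hours as a number of standard deviations from the mean. The sketch below does only that arithmetic; whether to return the bulb remains a judgment call.

```python
mean_life = 500  # hours, from the example
std_dev = 24     # hours, from the example
observed = 450   # hours at which the bulb burned out

# Distance from the mean in units of the standard deviation.
z = (observed - mean_life) / std_dev
print(f"z = {z:.2f}")
```

The bulb failed a little more than 2 standard deviations below the mean. For a mound-shaped distribution, approximately 95% of values fall within 2 standard deviations of the mean, so a life this short is fairly unusual but not impossible.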
36Numerical Measures of Relative Standing
- Descriptive measures of the relationship of a measurement to the rest of the data
- Common measures
  - percentile ranking or percentile score
  - z-score
37Numerical Measures of Relative Standing
- Percentile rankings make use of the pth percentile
- The median is an example of a percentile
- Median is the 50th percentile: 50% of observations lie above it, and 50% lie below it
- For any p, the pth percentile has p% of the measures lying below it, and (100-p)% above it
38Numerical Measures of Relative Standing
- z-score: the distance between a measurement x and the mean, expressed in standard units
- Sample z-score: $z = \frac{x - \bar{x}}{s}$; population z-score: $z = \frac{x - \mu}{\sigma}$
- Use of standard units allows comparison across data sets
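39Numerical Measures of Relative Standing
- More on z-scores
- Z-scores follow the empirical rule for mounded distributions: approximately 68% of z-scores lie between -1 and 1, approximately 95% between -2 and 2, and approximately 99.7% between -3 and 3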
40Methods for Detecting Outliers
- Outlier: an observation that is unusually large or small relative to the data values being described
- Causes
  - Invalid measurement
  - Misclassified measurement (different population)
  - A rare (chance) event
- 2 detection methods
  - Box plots
  - z-scores
41Methods for Detecting Outliers
- Box Plots
- Based on quartiles, values that divide the dataset into 4 groups
- Lower quartile $Q_L$ = 25th percentile
- Middle quartile = median
- Upper quartile $Q_U$ = 75th percentile
- Interquartile range (IQR) = $Q_U - Q_L$
- Lower inner fence = $Q_L - 1.5(\mathrm{IQR})$
- Upper inner fence = $Q_U + 1.5(\mathrm{IQR})$
- Lower outer fence = $Q_L - 3(\mathrm{IQR})$
- Upper outer fence = $Q_U + 3(\mathrm{IQR})$
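The fences can be computed directly from the quartiles. The sketch below uses the ten values from the dot plot example earlier in the chapter. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice in `statistics.quantiles` is an assumption and may not match the rule used to draw the box plots in the slides; `classify` is a name chosen for this illustration.

```python
import statistics

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

# Quartile conventions vary; the "inclusive" method is an assumption here.
q_l, median, q_u = statistics.quantiles(data, n=4, method="inclusive")
iqr = q_u - q_l

inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)  # (lower, upper) inner fences
outer = (q_l - 3 * iqr, q_u + 3 * iqr)      # (lower, upper) outer fences

def classify(x):
    """Label a value by its position relative to the fences."""
    if x < outer[0] or x > outer[1]:
        return "highly suspect"
    if x < inner[0] or x > inner[1]:
        return "suspect"
    return "not flagged"

print(q_l, q_u, iqr, inner, outer)
print([classify(x) for x in data])
```

Under this quartile convention none of the ten observations falls outside the inner fences. A hypothetical value such as 45 would fall between the upper inner and outer fences, and a hypothetical value such as 60 would fall beyond the upper outer fence.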
42Methods for Detecting Outliers
- Box Plots
- Not drawn on the plot: the inner and outer fences, which determine potential outliers. Points between the inner and outer fences are potential outliers; points beyond the outer fences (0) are probably outliers.
43Methods for Detecting Outliers
- Rules of thumb

Method       Suspect outliers                              Highly suspect outliers
Box plots    Data points between inner and outer fences    Data points beyond outer fences
z-scores     2 < |z| < 3                                   |z| > 3
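The z-score rule of thumb can be sketched as a small function. The name `classify_by_z` is chosen for this illustration, the candidate values 45 and 50 are hypothetical, and assigning a |z| of exactly 2 or 3 to the milder category is an assumption, since the rule as stated leaves those boundaries open.

```python
import statistics

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

def classify_by_z(x, values):
    """Return (z, label) for x relative to the sample mean and standard deviation."""
    z = (x - statistics.mean(values)) / statistics.stdev(values)
    if abs(z) > 3:
        label = "highly suspect"
    elif abs(z) > 2:
        label = "suspect"
    else:
        label = "not flagged"  # includes |z| exactly 2 (an assumption)
    return z, label

for candidate in (41, 45, 50):  # 45 and 50 are hypothetical measurements
    z, label = classify_by_z(candidate, data)
    print(f"x = {candidate}: z = {z:.2f}, {label}")
```

The largest observed value, 41, is less than 2 standard deviations above the mean and is not flagged, while the hypothetical values 45 and 50 fall in the suspect and highly suspect ranges respectively.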