Title: More Univariate Data: Quantitative Graphs
1 More Univariate Data: Quantitative Graphs
Describing Distributions with Numbers
2 Quantitative Data
- Quantitative variables take numerical values for
which it makes sense to do arithmetic operations
like adding or averaging.
- Possible graphs: dotplots, stemplots, histograms,
cumulative frequency plots, boxplots
3 Graphs
Be sure to always:
- Title your graphs
- Label your axes, including units of measure
- Number your axes in a consistent and reasonable
manner
4 Quantitative Graphs
Histograms: A histogram's vertical axis shows
counts, while a relative frequency histogram's
vertical axis shows percents.
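The difference between the two vertical axes can be sketched in a few lines of Python. This is a minimal illustration, not a plotting routine: the quiz scores, the bin width of 10, and the function names are all assumptions made up for the example.

```python
from collections import Counter

def histogram_counts(data, bin_width, start):
    """Count how many values fall in each bin [left, left + bin_width)."""
    counts = Counter()
    for x in data:
        # Index of the bin containing x, converted back to the bin's left edge.
        index = int((x - start) // bin_width)
        counts[start + index * bin_width] += 1
    return dict(sorted(counts.items()))

def relative_frequencies(counts):
    """Convert bin counts to percents of the total number of observations."""
    total = sum(counts.values())
    return {left: 100 * c / total for left, c in counts.items()}

# Hypothetical quiz scores, used only for illustration.
scores = [52, 55, 61, 63, 64, 68, 70, 71, 73, 75,
          77, 78, 82, 85, 91, 94, 97, 99, 66, 74]

counts = histogram_counts(scores, bin_width=10, start=50)
percents = relative_frequencies(counts)

for left in counts:
    print(f"{left}-{left + 9}: count={counts[left]}, percent={percents[left]:.0f}%")
```

Both versions have the same shape; only the scale on the vertical axis changes.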
5 Quantitative Graphs
Stem & Leaf: This type of graph uses the leading
place values as the stems and the units digits as
the leaves. (It's very hard to describe; we are
going to make one as an example.) We can also
create what's called a back-to-back stemplot with
two data sets. It is helpful for comparing two
sets of univariate data.
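Since a stemplot is easier to show than to describe, here is a small sketch that builds one as text. It assumes non-negative whole numbers, with tens as stems and units as leaves; the ages and the function name are hypothetical.

```python
from collections import defaultdict

def stemplot(data):
    """Return stemplot rows: tens digits as stems, units digits as leaves."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    rows = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(leaf) for leaf in leaves[stem])
        rows.append(row.rstrip())
    return rows

# Hypothetical ages, used only for illustration.
ages = [23, 25, 31, 34, 34, 38, 42, 47, 61]
for row in stemplot(ages):
    print(row)
```

Every original value can be read back from the plot (stem 3 with leaf 8 is 38), which is exactly what a histogram loses.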
6 Quantitative Graphs
A histogram is sometimes preferred for larger
data sets. Its strongest asset is that it shows
shape well. Its weakness is that the individual
data values are lost. A stem-and-leaf plot is
sometimes preferred because it retains all data
values, but it is very difficult to create for
large data sets.
7 Quantitative Data
- The distribution of a variable tells us what
values the variable typically takes and how often
it takes them. It is a generalization about the
variable values.
8 When describing any quantitative distribution, address:
- C: Center
- U: Unusual Features
- S: Shape
- S: Spread
- B: Be
- S: Specific
9 Common shapes of distributions/graphs
- Symmetric
- Skewed to the right
- Skewed to the left
- Bimodal
- Uniform
10 Once you have identified the shape, choose a
measure of center and spread based on that shape.
11 Center when the distribution is symmetric:
Mean, the average. Formula:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
12 Measure of Spread or Variability when the
distribution is symmetric: Standard deviation
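For a symmetric distribution, the usual pairing is the mean with the standard deviation. The sketch below computes both by hand. It assumes the sample form of the standard deviation (dividing by n - 1), which the slide itself does not state, and the data values are hypothetical.

```python
import math
import statistics

def mean(data):
    """Arithmetic mean: the sum of the observations divided by how many there are."""
    return sum(data) / len(data)

def sample_std_dev(data):
    """Sample standard deviation, dividing by n - 1 (an assumption of this sketch)."""
    x_bar = mean(data)
    squared_deviations = [(x - x_bar) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / (len(data) - 1))

# Hypothetical, roughly symmetric data set.
values = [2, 4, 4, 4, 5, 5, 7, 9]

print("mean:", mean(values))
print("standard deviation:", round(sample_std_dev(values), 3))

# Cross-check against the standard library, which uses the same n - 1 convention.
print("statistics.stdev:", round(statistics.stdev(values), 3))
```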
13 Measure of Center when the distribution is not
symmetric: Median, the middle value in an
ordered list. If there are two values in the
middle, average them.
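The rule for the median translates directly into code. This is a minimal sketch; the income figures are hypothetical and chosen to be skewed to the right.

```python
def median(data):
    """Middle value of the ordered list; average the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical incomes (in thousands), skewed right by one large value.
incomes = [30, 32, 35, 38, 400]

print("median:", median(incomes))
print("mean:", sum(incomes) / len(incomes))
```

Here the median stays near the bulk of the data while the mean is pulled toward the large value, which is why the median is the preferred center for skewed distributions.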
14 Measure of Spread or Variability when the
distribution is not symmetric
- We can also examine spread by looking at the
range of the middle 50% of the data. This is
called the Interquartile Range (IQR).
- IQR = Q3 - Q1
15 We also need to talk about the 5-number summary.
The 5-number summary is made up of the minimum,
the first quartile, Q1 (where 25% of the data
lies below this value), the median, the third
quartile, Q3 (where 75% of the data lies below
this value), and the maximum.
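The five-number summary and the IQR can be computed together. The sketch below assumes one common textbook convention for quartiles: Q1 is the median of the lower half and Q3 is the median of the upper half, leaving the overall median out of both halves when n is odd. Software packages may use other conventions and give slightly different quartiles. The data values are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value when n is odd so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )

# Hypothetical data set.
values = [1, 3, 4, 6, 7, 8, 10, 12, 15]

minimum, q1, med, q3, maximum = five_number_summary(values)
iqr = q3 - q1

print("five-number summary:", (minimum, q1, med, q3, maximum))
print("IQR:", iqr)
```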
16 Another Measure of Spread or Variability
- Range: the difference between the maximum and
the minimum observations. This is the simplest
measure of spread. We typically use this as
preliminary information or if it is the only
measure of spread we can calculate.
17 Another Measure of Spread or Variability
- Variance: the average of the squares of the
deviations of the observations from their mean.
It is the standard deviation squared.
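A short sketch of the range and the variance follows. One assumption to flag: for sample data the "average" of the squared deviations is conventionally taken by dividing by n - 1 rather than n, and that is the form used here. The data values are hypothetical.

```python
import math
import statistics

def data_range(data):
    """Range: maximum observation minus minimum observation."""
    return max(data) - min(data)

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1 (sample convention)."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

# Hypothetical data set.
values = [3, 5, 7, 9, 11]

variance = sample_variance(values)
std_dev = math.sqrt(variance)

print("range:", data_range(values))
print("variance:", variance)
print("standard deviation:", round(std_dev, 3))
```

Squaring the standard deviation recovers the variance, which is the relationship stated on the slide.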
18 An outlier is an individual observation that
falls outside the overall pattern of the data.
19 Using the IQR, we can perform a test for
outliers. Outlier Test: Any value below
Q1 - 1.5(IQR) or above Q3 + 1.5(IQR) is
considered an outlier.
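The 1.5(IQR) test can be sketched as follows. The quartiles use the median-of-halves convention, which is an assumption of this example, and the data set is hypothetical.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the data."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return statistics.median(lower), statistics.median(upper)

def outlier_fences(data):
    """Return the cutoffs (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data):
    """Return every value that falls below the low fence or above the high fence."""
    low, high = outlier_fences(data)
    return [x for x in data if x < low or x > high]

# Hypothetical data set with one suspiciously large value.
values = [1, 3, 4, 6, 7, 8, 10, 12, 40]

print("fences:", outlier_fences(values))
print("outliers:", find_outliers(values))
```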
20 Another Graph: When we graph the five-number
summary along with any outliers, the result is
a modified boxplot.
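The pieces needed to draw a modified boxplot can be computed before any plotting. This sketch assumes the usual convention that each whisker stops at the most extreme value that is not an outlier, and that outliers are marked individually; the quartile rule, names, and data are assumptions for illustration.

```python
import statistics

def modified_boxplot_parts(data):
    """Compute the box, whisker ends, and outliers for a modified boxplot."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Values inside the fences decide where the whiskers end.
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    outliers = [x for x in ordered if x < low_fence or x > high_fence]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": statistics.median(ordered),
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

# Hypothetical data set with one outlier.
values = [1, 3, 4, 6, 7, 8, 10, 12, 40]
parts = modified_boxplot_parts(values)
for name, value in parts.items():
    print(f"{name}: {value}")
```

In this example the upper whisker ends at 12 rather than at the maximum, and 40 is drawn as a separate point.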
21 Measures that are not strongly affected by
extreme values are said to be resistant. The
median and IQR are more resistant than the mean
and standard deviation. The standard deviation
is even less resistant than the mean.
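Resistance is easy to see by changing a single value and recomputing each measure. The sketch below uses hypothetical data and replaces the largest value with an extreme one; the median-of-halves quartile rule is an assumption of the example.

```python
import statistics

def iqr(data):
    """Interquartile range using medians of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[half + 1:] if n % 2 else ordered[half:])
    return q3 - q1

def summarize(data):
    """Collect the four measures of center and spread being compared."""
    return {
        "mean": statistics.mean(data),
        "std_dev": statistics.stdev(data),
        "median": statistics.median(data),
        "iqr": iqr(data),
    }

# Hypothetical data; the second list swaps the largest value for an extreme one.
original = [10, 12, 13, 15, 16, 18, 20]
with_extreme = [10, 12, 13, 15, 16, 18, 200]

before = summarize(original)
after = summarize(with_extreme)
for name in before:
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f}")
```

With these numbers the median and IQR do not move at all, the mean grows noticeably, and the standard deviation grows by a much larger factor than the mean.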
22 Measures of Spread or Variability: Why? We
measure spread because it's an important
description of what is happening with the data.
We need to know about the amount of variation we
can expect in a data set.