Title: SUMMARIZING AND GRAPHING DATA
1. SUMMARIZING AND GRAPHING DATA
2. Overview
3. Planning and Conducting a Study
- Understand the Nature of the Problem
- Decide What to Measure and How to Measure It
- Data Collection
- Data Summarization and Preliminary Analysis
- Formal Data Analysis
- Interpretation of Results
Statistics: The Exploration and Analysis of Data,
4th ed., Devore/Peck
4. Important Characteristics of Data
- Center: A representative or average value that
indicates where the middle of the data set is
located.
- Variation: A measure of the amount that the data
values vary among themselves.
- Distribution: The nature or shape of the
distribution of the data.
- Outliers: Sample values that lie very far away
from the vast majority of the other sample
values.
- Time: Changing characteristics of the data over
time.
5. Frequency Distributions
6. Definition
- A frequency distribution (or frequency table)
lists data values (either individually or by
groups of intervals), along with their
corresponding frequencies (or counts).
7. More Definitions
- Lower class limits are the smallest numbers that
can belong to the different classes.
- Upper class limits are the largest numbers that
can belong to the different classes.
- Class boundaries are the numbers used to separate
classes, but without the gaps created by class
limits.
- Class midpoints are the values in the middle of
the classes. Each class midpoint can be found by
adding the lower class limit to the upper class
limit and dividing the sum by 2.
- Class width is the difference between two
consecutive lower class limits or two consecutive
lower class boundaries.
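The definitions above can be sketched in a few lines of Python. The class limits below are hypothetical integers chosen for illustration, and the function names are not from the slides.

```python
# Hypothetical integer class limits (lower, upper); not taken from the slides.
class_limits = [(0, 9), (10, 19), (20, 29)]

def class_width(limits):
    """Difference between two consecutive lower class limits."""
    return limits[1][0] - limits[0][0]

def class_midpoints(limits):
    """(lower limit + upper limit) / 2 for each class."""
    return [(lo + hi) / 2 for lo, hi in limits]

def class_boundaries(limits):
    """Boundaries sit halfway across the gap between adjacent classes."""
    gap = limits[1][0] - limits[0][1]  # here 10 - 9 = 1
    half = gap / 2
    return [limits[0][0] - half] + [hi + half for _, hi in limits]

width = class_width(class_limits)
midpoints = class_midpoints(class_limits)
boundaries = class_boundaries(class_limits)
print(width, midpoints, boundaries)
```

Note that three classes produce four boundaries: one below the first class, one above the last, and one in each gap.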
8. Procedure for Constructing a Frequency
Distribution
- Decide on the number of classes needed.
- Calculate the class width: approximately
(highest value - lowest value) / (number of
classes). Round this result to get a convenient
number.
- Starting point: Begin by choosing a number for
the lower limit of the first class.
- Using the lower limit of the first class and the
class width, proceed to list the other lower
class limits.
- List the lower class limits in a vertical column
and proceed to enter the upper class limits.
- Go through the data set putting a tally in the
appropriate class for each data value. Use the
tally marks to find the total frequency for each
class.
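A minimal sketch of the procedure above, assuming integer data. The sample values are made up for illustration (they are not Data Set 6), the function name is hypothetical, and the width is rounded up to the next whole number so that the largest value always falls inside the last class.

```python
# Made-up sample values for illustration only (not Data Set 6).
data = [36, 41, 44, 47, 52, 53, 58, 59, 60, 61, 63, 64, 67, 70, 72, 76]

def frequency_distribution(values, num_classes):
    low, high = min(values), max(values)
    # Class width: (highest - lowest) / number of classes, rounded up
    # to the next whole number (assumes integer data).
    width = (high - low) // num_classes + 1
    # Starting point: the smallest value is used as the first lower limit.
    lower_limits = [low + i * width for i in range(num_classes)]
    # Upper limits leave a gap of 1 before the next lower limit.
    classes = [(lo, lo + width - 1) for lo in lower_limits]
    # Tally each value into its class.
    counts = [0] * num_classes
    for v in values:
        counts[(v - low) // width] += 1
    return list(zip(classes, counts))

table = frequency_distribution(data, 4)
for (lo, hi), count in table:
    print(f"{lo}-{hi}: {count}")
```

In practice a rounder starting point and width (for example 35 and 10) would usually be chosen by hand; the sketch only automates the arithmetic.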
9. Example
- Use Data Set 6 Bears, and construct a frequency
distribution for the lengths of bears using 11
classes.
10. Example (continued)
11. Example (continued)
12. Relative Frequency Distribution
- A relative frequency distribution includes the
same class limits as a frequency distribution,
but relative frequencies (a relative frequency is
found by dividing a class frequency by the total
frequency) are used instead of actual frequencies.
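As a quick illustration of the definition above, each class frequency is divided by the total. The frequencies here are hypothetical, not the bear data.

```python
# Hypothetical class frequencies for four classes.
frequencies = [3, 3, 7, 3]

total = sum(frequencies)
# Relative frequency = class frequency / total frequency.
relative = [f / total for f in frequencies]

for f, r in zip(frequencies, relative):
    print(f"{f:>3}  {r:.4f}")
```

The relative frequencies should always add up to 1 (or 100%), apart from rounding.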
13. Example
- Use Data Set 6 Bears, and construct a relative
frequency distribution for the lengths of bears
using 11 classes.
14. Example (continued)
15. Cumulative Frequency Distribution
- A cumulative frequency distribution includes the
same class limits as a frequency distribution,
but cumulative frequencies (a cumulative
frequency for a class is the sum of the
frequencies for that class and all previous
classes) are used instead of actual frequencies.
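The running sum described above can be sketched with the standard library's `itertools.accumulate`. The frequencies are again hypothetical.

```python
from itertools import accumulate

# Hypothetical class frequencies for four classes.
frequencies = [3, 3, 7, 3]

# Each entry is the sum of that class's frequency and all previous ones.
cumulative = list(accumulate(frequencies))
print(cumulative)
```

The last cumulative frequency always equals the total number of data values.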
16. Interpreting Frequency Distributions
- Is the distribution normal?
- Do the frequencies start low, then increase to
some maximum frequency, then decrease to a low
frequency?
- Is the distribution approximately symmetric? That
is, are the frequencies evenly distributed on
both sides of the maximum frequency?
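The two questions above can be turned into rough checks on a list of class frequencies. This is only a heuristic sketch, not a formal test of normality; the function names and the 10% tolerance are assumptions made for illustration.

```python
def rises_then_falls(freqs):
    """True if frequencies never drop before the peak or rise after it."""
    peak = freqs.index(max(freqs))
    rising = all(a <= b for a, b in zip(freqs[:peak], freqs[1:peak + 1]))
    falling = all(a >= b for a, b in zip(freqs[peak:], freqs[peak + 1:]))
    return rising and falling

def roughly_symmetric(freqs, tolerance=0.1):
    """True if classes equally far from the peak have similar frequencies.

    Two frequencies count as similar when they differ by at most
    `tolerance` times the total count (an arbitrary illustrative cutoff).
    """
    peak = freqs.index(max(freqs))
    left = freqs[:peak][::-1]   # moving away from the peak to the left
    right = freqs[peak + 1:]    # moving away from the peak to the right
    n = max(len(left), len(right))
    left += [0] * (n - len(left))    # pad the shorter side with zeros
    right += [0] * (n - len(right))
    total = sum(freqs)
    return all(abs(a - b) <= tolerance * total for a, b in zip(left, right))

bell_like = [1, 4, 9, 4, 2]   # hypothetical frequencies
skewed = [10, 5, 3, 1, 1]     # hypothetical frequencies
print(rises_then_falls(bell_like), roughly_symmetric(bell_like))
print(rises_then_falls(skewed), roughly_symmetric(skewed))
```

A distribution that passes both checks is only a candidate for being approximately normal; a histogram should still be inspected by eye.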
17. Histograms
18. Histogram
- A histogram is a bar graph in which the
horizontal scale represents classes of data
values and the vertical scale represents
frequencies. The heights of the bars correspond
to the frequency values, and the bars are drawn
adjacent to each other (without gaps).
19. Example
- Use Data Set 6 Bears, and construct a histogram
for the lengths of bears using 11 classes.
20. Example (continued)
21. Histogram
- A relative frequency histogram has the same shape
and horizontal scale as a histogram, but the
vertical scale is marked with relative
frequencies instead of actual frequencies.
22. Example
- Use Data Set 6 Bears, and construct a relative
frequency histogram for the lengths of bears
using 11 classes.
23. Example (continued)
24. Interpreting Histograms
- Is the distribution normal?
- Do the frequencies start low, then increase to
some maximum frequency, then decrease to a low
frequency?
- Is the distribution approximately symmetric? That
is, are the frequencies evenly distributed on
both sides of the maximum frequency?
25. Statistical Graphics
26. Frequency Polygons
- A frequency polygon uses line segments connected
to points located directly above class midpoint
values.
- A cumulative frequency polygon (or ogive) uses
line segments connected to points located
directly above the upper class boundaries, at
heights equal to the cumulative frequencies.
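A small sketch of the points a frequency polygon would connect: one point per class, placed above the class midpoint at the height of the class frequency. The class limits and frequencies are hypothetical.

```python
# Hypothetical class limits and frequencies (not the bear data).
class_limits = [(36, 46), (47, 57), (58, 68), (69, 79)]
frequencies = [3, 3, 7, 3]

# x-coordinate: class midpoint; y-coordinate: class frequency.
midpoints = [(lo + hi) / 2 for lo, hi in class_limits]
polygon_points = list(zip(midpoints, frequencies))

for x, y in polygon_points:
    print(f"({x}, {y})")
```

Joining consecutive points with straight line segments gives the polygon.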
27. Dotplot
- A dotplot consists of a graph in which each data
value is plotted as a point (or dot) along a
scale of values. Dots representing equal values
are stacked.
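A text-only sketch of a dotplot, assuming small single-digit integer values so that the columns line up. The data and the function name are made up for illustration.

```python
from collections import Counter

# Made-up single-digit values for illustration.
values = [2, 3, 3, 5, 5, 5, 6]

def dotplot_rows(data):
    counts = Counter(data)                  # how many dots per value
    scale = range(min(data), max(data) + 1)
    tallest = max(counts.values())
    rows = []
    # Build the plot from the top row down; equal values stack vertically.
    for level in range(tallest, 0, -1):
        row = " ".join("*" if counts[v] >= level else " " for v in scale)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in scale))  # the scale of values
    return rows

rows = dotplot_rows(values)
print("\n".join(rows))
```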
28. Stemplot
- A stemplot (or stem-and-leaf plot) represents
data by separating each value into two parts:
- the stem (such as the leftmost digit), and
- the leaf (such as the rightmost digit).
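A minimal stemplot sketch, assuming non-negative integers where the last digit is the leaf and the remaining digits form the stem. The values and the function name are made up for illustration.

```python
from collections import defaultdict

# Made-up two-digit values for illustration.
values = [12, 15, 21, 23, 23, 30, 38]

def stemplot(data):
    leaves = defaultdict(list)
    for v in sorted(data):
        stem, leaf = divmod(v, 10)  # stem: leading digit(s); leaf: last digit
        leaves[stem].append(leaf)
    # Include every stem in the range, even those with no leaves.
    return {stem: leaves[stem] for stem in range(min(leaves), max(leaves) + 1)}

plot = stemplot(values)
lines = [f"{stem} | {''.join(str(leaf) for leaf in row)}" for stem, row in plot.items()]
print("\n".join(lines))
```

Unlike a histogram, the stemplot keeps every original data value visible.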
29. Pareto Charts
- A Pareto chart is a bar graph for qualitative
data, with the bars arranged in order according
to frequency. Vertical scales in Pareto charts
can represent frequencies or relative frequencies.
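The ordering behind a Pareto chart can be sketched with `collections.Counter`, whose `most_common()` method returns categories from most to least frequent. The categories below are invented for illustration.

```python
from collections import Counter

# Invented qualitative data for illustration.
responses = ["late", "damaged", "late", "wrong item", "late", "damaged"]

counts = Counter(responses)
# Bars are arranged in descending order of frequency.
pareto_order = counts.most_common()

total = sum(counts.values())
for category, n in pareto_order:
    # Either scale can be used on the vertical axis.
    print(f"{category:<12} frequency={n}  relative={n / total:.2f}")
```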
30. Pie Charts
- A pie chart is a graph depicting qualitative data
as slices of a pie.
31. Scatterplots
- A scatterplot (or scatter diagram) is a plot of
paired (x, y) data with a horizontal x-axis and a
vertical y-axis. The data are paired in a way
that matches each value from one data set with a
corresponding value from a second data set.
32. Time-Series Graph
- A time-series graph is a graph of time-series data,
which are data that have been collected at
different points in time.
33. Presenting Data Graphically
- Some important principles:
- For small data sets of 20 values or fewer, use a
table instead of a graph.
- A graph of data should make the viewer focus on
the true nature of the data, not on other
elements, such as eye-catching but distracting
design features.
- Do not distort the data; construct a graph to
reveal the true nature of the data.
- Almost all of the ink in a graph should be used
for the data, not for other design elements.
The Visual Display of Quantitative Information,
2nd ed., Tufte
34. Presenting Data Graphically
- Some important principles:
- Don't use screening consisting of features such
as slanted lines, dots, or cross-hatching,
because they create the uncomfortable illusion of
movement.
- Don't use areas or volumes for data that are
actually one-dimensional in nature.
- Never publish pie charts, because they waste ink
on non-data components, and they lack an
appropriate scale.
The Visual Display of Quantitative Information,
2nd ed., Tufte