Title: Organizing
Chapter 2 Organizing and Summarizing Data
Section 2.1 Organizing Qualitative Data
Qualitative data allow for classification of
individuals based on some attribute or
characteristic. When data of this type are
collected, we are generally interested in
determining the number of individuals that occur
within each category.
- Common Displays of Qualitative Data
- Frequency Distribution
- Relative Frequency Distribution
- Cumulative Relative Frequency Distribution
- Bar Chart
- Pareto Chart
- Pie Graph
A frequency distribution lists the number of
occurrences for each category. This is a
count of the number of individuals in each
category.
Example: A survey of a past section of STAT 104
asked students for their current classification,
coded as 1 = Freshman, 2 = Sophomore, 3 = Junior,
4 = Senior, 5 = Other. The following data
were collected. We wish to construct a frequency
distribution.
Classification   Frequency
Freshman         21
Sophomore        20
Junior           13
Senior            8
Other             2
Total            64
Always add the frequencies to be sure that every
data point was tallied.
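The tallying step can be sketched in Python. This is a minimal illustration, not the course's method: the raw survey responses are not reproduced in the text, so the `responses` list is an assumption rebuilt from the tallied counts, and the `LABELS` name is introduced here only for readability.

```python
from collections import Counter

# Classification codes from the example: 1 = Freshman, ..., 5 = Other.
LABELS = {1: "Freshman", 2: "Sophomore", 3: "Junior", 4: "Senior", 5: "Other"}

# Assumption: the raw responses are not given in the text, so this list is
# rebuilt from the tallied counts purely for illustration.
responses = [1] * 21 + [2] * 20 + [3] * 13 + [4] * 8 + [5] * 2

# Counter tallies how many times each code occurs.
frequency = Counter(responses)

for code in sorted(frequency):
    print(f"{LABELS[code]:<10} {frequency[code]:>3}")

# Check: the frequencies must add up to the number of responses.
total = sum(frequency.values())
print(f"{'Total':<10} {total:>3}")
```

Comparing `total` with `len(responses)` is the programmatic version of adding the frequencies to confirm that every data point was tallied.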
The relative frequency is the proportion or
percent of observations within a category and is
found using the formula
relative frequency = frequency / sum of all frequencies
A relative frequency distribution lists the
relative frequencies of each category of data.
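Convert the previous frequency distribution to a
relative frequency distribution.
Classification   Frequency   Relative Frequency (%)
Freshman         21          21/64 = 32.8
Sophomore        20          31.3
Junior           13          20.3
Senior            8          12.5
Other             2           3.1
Total            64          100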
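The conversion to relative frequencies can be sketched as follows. The function name `relative_frequency_percent` is hypothetical, and the choice of half-up rounding is an assumption made so that 20/64 = 31.25% displays as 31.3, as in the example; Python's built-in `round` rounds exact ties to the nearest even digit instead.

```python
from decimal import Decimal, ROUND_HALF_UP

# Frequencies from the STAT 104 classification example.
frequency = {"Freshman": 21, "Sophomore": 20, "Junior": 13, "Senior": 8, "Other": 2}
total = sum(frequency.values())

def relative_frequency_percent(count, total):
    """Return count / total as a percent, rounded half-up to one decimal."""
    pct = Decimal(count) * 100 / Decimal(total)
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

relative = {c: relative_frequency_percent(n, total) for c, n in frequency.items()}

for category, pct in relative.items():
    print(f"{category:<10} {frequency[category]:>3} {pct:>6}")
```

The rounded percents happen to sum to exactly 100 for these data; with other data a rounded column can be off by a tenth or so.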
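The cumulative relative frequency is a running
total of the relative frequencies. Convert the
above relative frequency distribution to a
cumulative relative frequency distribution.
Classification   Frequency   Relative Frequency (%)   Cumulative Relative Frequency (%)
Freshman         21          32.8                     32.8
Sophomore        20          31.3                     32.8 + 31.3 = 64.1
Junior           13          20.3                     64.1 + 20.3 = 84.4
Senior            8          12.5                     84.4 + 12.5 = 96.9
Other             2           3.1                     96.9 + 3.1 = 100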
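A running total is easy to sketch with `itertools.accumulate`. As a design choice (an assumption, not something the slides prescribe), the sketch accumulates the counts and divides once at the end, which avoids adding up already-rounded percents.

```python
from itertools import accumulate

categories = ["Freshman", "Sophomore", "Junior", "Senior", "Other"]
frequency = [21, 20, 13, 8, 2]
total = sum(frequency)

# Running total of the counts: [21, 41, 54, 62, 64].
cumulative_counts = list(accumulate(frequency))

# Divide each running total by the overall total to get proportions.
cumulative_relative = [count / total for count in cumulative_counts]

for category, proportion in zip(categories, cumulative_relative):
    print(f"{category:<10} {proportion:.1%}")
```

The last cumulative relative frequency should always be 1 (100%), which gives a quick check on the arithmetic.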
A bar graph is constructed by labeling each
category of data on a horizontal axis and the
frequency or relative frequency of the category
on the vertical axis. A rectangle of equal
width is drawn for each category. The height of
the rectangle is equal to the category's
frequency or relative frequency.
A Pareto chart is a bar graph whose bars are
drawn in decreasing order of frequency or
relative frequency. The above bar charts are
also Pareto charts.
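The ordering that distinguishes a Pareto chart can be sketched without a plotting library. The text bars below are only a stand-in for a real chart, and the categories are deliberately listed out of order to show the sort.

```python
# Frequencies from the classification example, listed in arbitrary order.
frequency = {"Junior": 13, "Other": 2, "Freshman": 21, "Senior": 8, "Sophomore": 20}

# A Pareto chart draws the bars from highest to lowest frequency.
pareto_order = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

# One '#' per observation gives bars whose lengths equal the frequencies.
for category, count in pareto_order:
    print(f"{category:<10} {'#' * count} {count}")
```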
A pie chart is a circle divided into sectors.
Each sector represents a category of data. The
area of each sector is proportional to the
frequency of the category.
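Because a sector's area is proportional to its central angle, each category's angle is its relative frequency times 360 degrees. A minimal sketch using the classification frequencies from Section 2.1 (the variable names are illustrative):

```python
frequency = {"Freshman": 21, "Sophomore": 20, "Junior": 13, "Senior": 8, "Other": 2}
total = sum(frequency.values())

# Central angle of each sector = relative frequency * 360 degrees.
angles = {category: count / total * 360 for category, count in frequency.items()}

for category, angle in angles.items():
    print(f"{category:<10} {angle:7.2f} degrees")
```

The angles should add up to 360 degrees, just as the relative frequencies add up to 1.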
Section 2.2 Organizing Quantitative Data I
Note: We are considering only one variable.
Recall that we divided quantitative data into two
categories: discrete and continuous. In
summarizing quantitative data, we must first
recognize whether the data are discrete or
continuous. If the data are discrete, the
categories of data will be the observations (as
in qualitative data); however, if the data are
continuous, the categories of data (called
classes) must be created using intervals of
numbers.
Discrete Data: We can create frequency, relative
frequency and cumulative relative frequency
distributions in the same manner for discrete
data as we did for qualitative data. Using
the frequency or relative frequency distribution,
we can create a histogram.
A histogram is constructed by drawing rectangles
for each class of data. The height of each
rectangle is the frequency or relative frequency
of the class. The width of each rectangle
should be the same and the rectangles should
touch each other.
Example: The manager of a Wendy's fast-food
restaurant was interested in studying the typical
number of customers who arrive during the lunch
hour. The manager collected data for 40 randomly
selected 15-minute intervals of time during lunch
and constructed the following histogram.
Experimental unit: 15-minute intervals. Response:
number of customers. Study type: observational
study. Sampling: SRS.
Construct a relative frequency graph of these data.
(Rows follow the order of the classes in the histogram.)
Frequency   Relative Frequency
7           7/40 = 0.175
5           0.125
18          0.45
7           0.175
2           0.05
1           0.025
Total 40    1
Continuous Data: Raw continuous data do not have
any predetermined categories that can be used to
construct a frequency distribution; therefore, the
categories must be created. Categories of data
are created by using intervals of numbers called
classes.
Example: The following table represents the
number of United States residents between the
ages of 25 and 74 who have earned a bachelor's
degree. The data are based on the Census
Populations Survey conducted in 1998.
Notice that the data are categorized, or grouped,
by intervals of numbers. Each interval
represents a class.
The lower class limit of a class is the smallest
value within the class.
Lower class limits: 25, 35, 45, 55, 65
The upper class limit of a class is the largest
value within the class.
Upper class limits: 34, 44, 54, 64, 74
The class width is the difference between two
consecutive lower class limits.
Class width: 35 - 25 = 10
Notice in the above table that the class widths
are equal for all classes. One exception to this
requirement is in open-ended tables. A table is
open-ended if the first class does not have a
lower class limit or the last class does not have
an upper class limit, for example "24 and younger"
or "65 and older".
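The class width calculation can be sketched from the class limits listed above. The point of the sketch is that the width comes from consecutive lower limits, not from subtracting a class's lower limit from its own upper limit.

```python
lower_limits = [25, 35, 45, 55, 65]
upper_limits = [34, 44, 54, 64, 74]

# Class width: difference between consecutive lower class limits.
widths = [nxt - cur for cur, nxt in zip(lower_limits, lower_limits[1:])]
class_width = widths[0]

# For comparison: upper limit minus lower limit within one class is 9,
# which is NOT the class width as defined here.
within_class_gap = upper_limits[0] - lower_limits[0]

print("widths between consecutive lower limits:", widths)
print("class width:", class_width)
print("upper - lower within the first class:", within_class_gap)
```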
Example: Construct a relative frequency
distribution and histogram for the following data.
Three-Year Rate of Return of Mutual Funds
n = 40
We must decide on an appropriate class width. We
need the lower class limit of the first class to
be slightly smaller than the smallest data value
and the upper class limit of the last class to be
slightly greater than the largest data value.
Notice the data range from 10.8 to 47.7. So we
will use a lower class limit of 10.0 for the first
class and a class width of 5. This gives us 8
classes.
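Class       Frequency   Relative Frequency (%)
10-14.9      7          7/40 × 100 = 17.5
15-19.9     11          27.5
20-24.9      8          20.0
25-29.9      6          15.0
30-34.9      3           7.5
35-39.9      3           7.5
40-44.9      0           0
45-49.9      2           5.0
Total       40          100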
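Assigning continuous values to classes can be sketched as below, using the first lower class limit of 10.0 and class width of 5 chosen in the example. The 40 raw returns are not reproduced in the text, so the `sample` list is hypothetical apart from the stated minimum (10.8) and maximum (47.7); the function names are also introduced only for this sketch.

```python
import math
from collections import Counter

LOWER_LIMIT = 10.0   # lower class limit of the first class
CLASS_WIDTH = 5
NUM_CLASSES = 8

def class_index(value):
    """Return the 0-based index of the class containing value."""
    return math.floor((value - LOWER_LIMIT) / CLASS_WIDTH)

def class_label(index):
    """Return a label such as '10-14.9' for the class at this index."""
    lower = LOWER_LIMIT + index * CLASS_WIDTH
    upper = lower + CLASS_WIDTH - 0.1   # data are recorded to one decimal
    return f"{lower:g}-{upper:g}"

# Hypothetical sample: only 10.8 and 47.7 come from the example.
sample = [10.8, 47.7, 12.3, 16.5, 18.2, 22.0, 27.9, 31.4]

counts = Counter(class_index(value) for value in sample)

# Print every class, including empty ones, so gaps stay visible.
for index in range(NUM_CLASSES):
    count = counts.get(index, 0)
    print(f"{class_label(index):<8} {count:>2} {count / len(sample):.3f}")
```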
Histogram: Three-Year Rate of Return of Mutual Funds
(vertical axis: relative frequency, 0 to 0.25)
Stem-and-Leaf Plots: A stem-and-leaf plot is
another way to represent quantitative data
graphically.
- Construction of a Stem-and-Leaf Plot
- The stem of the graph will consist of the digits
to the left of the rightmost digit. The leaf of
the graph will be the rightmost digit.
- Write the stems in a vertical column in
increasing order. Draw a vertical line to the
right of the stems.
- Write each leaf corresponding to the stems to the
right of the vertical line. The leaves must be
written in ascending order.
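The construction steps above can be sketched for whole-number data. The input values are hypothetical (the rounded mutual fund data are not reproduced in the text), and `stem_and_leaf` and `render` are names introduced only for this sketch.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem to its leaves in ascending order (non-negative integers)."""
    plot = defaultdict(list)
    for value in sorted(values):          # sorting first keeps leaves ascending
        stem, leaf = divmod(value, 10)    # leaf = rightmost digit, stem = the rest
        plot[stem].append(leaf)
    return plot

def render(plot):
    """Return the plot as text, stems in increasing order, one row per stem."""
    lines = []
    for stem in range(min(plot), max(plot) + 1):   # include stems with no leaves
        leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return "\n".join(lines)

# Hypothetical rounded values, for illustration only.
data = [23, 11, 27, 15, 48, 31, 22, 18, 23]
plot = stem_and_leaf(data)
print(render(plot))
```

Unlike a histogram, the rendered plot keeps every original value readable: stem 2 with leaves 2, 3, 3, 7 stands for 22, 23, 23, 27.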
Example: Construct a stem-and-leaf plot of the
three-year rate of return data given.
Step One: Round the data. Why would we need to
do this?
A frequency distribution with classes of width 1
is non-informative.
Note: Sorting the data first helps.
Stem-and-Leaf Plot
Notice how the data are bunched. Splitting the
stems is more informative.
Stem-and-Leaf Plot Using Split Stems
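Split stems can be sketched as follows. The slides do not state the splitting rule, so this assumes the common convention that each stem appears twice, first with leaves 0 to 4 and then with leaves 5 to 9; the data and the function name are hypothetical.

```python
def split_stem_rows(values):
    """Group leaves by (stem, half); half 0 holds leaves 0-4, half 1 holds 5-9."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1
        rows.setdefault((stem, half), []).append(leaf)
    return rows

# Hypothetical rounded values bunched on two stems.
returns = [11, 12, 15, 16, 18, 21, 23, 23, 27, 29]
rows = split_stem_rows(returns)

# Sorting the (stem, half) keys lists the low half of each stem first.
for (stem, half), leaves in sorted(rows.items()):
    print(f"{stem:>2} | {''.join(map(str, leaves))}")
```

With only two distinct stems, the unsplit plot would have two long rows; splitting produces four shorter rows and shows the shape more clearly.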
Shapes of Distributions
- Bell-shaped: the highest frequency occurs near
the middle, and the frequencies tail off to the
left and right in roughly the same pattern
- Skewed: one tail is stretched out longer than
the other tail
- Left skewed: most of the data are piled on the
high-numbers end
- Right skewed: most of the data are piled on the
low-numbers end
- Uniform: the frequency of each value of the
variable is equal
- Unimodal: there is only one major peak
- Bimodal: there are two major peaks
- J-shaped: there is no tail on the side of the
class with the highest frequency
Organizing Quantitative Data II
The class midpoint is found by adding the lower
class limit and upper class limit of a class and
dividing the result by 2.
A frequency polygon is drawn by plotting a point
above each class midpoint on a horizontal axis at
a height equal to the frequency of the class.
After the points for each class are plotted,
straight lines are drawn between consecutive
points.
Example: Draw a frequency polygon for the 3-year
rate of return data. (p. 22)
First find all the class midpoints:
(10 + 14.9)/2 = 12.45
(15 + 19.9)/2 = 17.45
The remaining midpoints are 22.45, 27.45, 32.45,
37.45, 42.45, and 47.45.
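The midpoint calculation, and the points a frequency polygon would connect, can be sketched as below. The class limits follow the rate of return example (first lower limit 10, width 5, upper limits ending in .9), and the frequencies are those tallied for that example; the variable names are illustrative.

```python
# Class limits for the eight classes 10-14.9, 15-19.9, ..., 45-49.9.
lower_limits = [10 + 5 * i for i in range(8)]
upper_limits = [lower + 4.9 for lower in lower_limits]

# Class midpoint = (lower class limit + upper class limit) / 2.
midpoints = [round((lo + hi) / 2, 2) for lo, hi in zip(lower_limits, upper_limits)]

# Frequencies of the eight classes from the rate of return example.
frequency = [7, 11, 8, 6, 3, 3, 0, 2]

# A frequency polygon joins these (midpoint, frequency) points with lines.
points = list(zip(midpoints, frequency))
for midpoint, count in points:
    print(f"{midpoint:6.2f} {count:>3}")
```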
A time series plot is obtained by plotting the
time at which a variable is measured on the
horizontal axis and the corresponding value of
the variable on the vertical axis. Lines are
then drawn connecting the points.
Example: The following data represent the
percentage of recent high school graduates who
enroll in college. Construct a time series plot
of the data.
Section 2.3 Graphical Misrepresentation of Data
- Characteristics of Good Graphics
- Label the graphic clearly and provide
explanations if needed.
- Avoid distortion. Don't lie about the data.
- Avoid three dimensions. Three-dimensional
charts may look nice, but they distract the
reader and often result in misinterpretation of
the graphic.
- Do not use more than one design in the same
graphic. Sometimes graphs use a different
design in a portion of the graphic in order to
draw attention to this area. Don't use this
technique. Let the numbers speak for themselves.