Title: Graphs, Good and Bad
1Graphs, Good and Bad
2Organizing Data
- How data are organized and presented is important
- Just as a writer organizes words into a coherent story, a statistician organizes and presents data to tell a clear story
- Too much data is hard to digest
- Too little data is uninformative
- Disorganized or sloppily presented data are hard to read
- Just as a story can be manipulated to emphasize certain aspects, the organization and presentation of data can be manipulated to stress or hide certain elements of the story
3Organizing Data
- Distortion of this sort has nothing to do with statistics per se and everything to do with human nature
- Statistics is just another form of information and mode of communication that can be manipulated by those with a will to deceive
- Figures won't lie, but liars will figure
4Getting rich
- There was a significant bull market at the end of the 1990s. How big was it?
- Pictures are a very quick way of absorbing "how big" and putting it in perspective
- The next two graphs show
  - The percentage change in the Standard & Poor's 500 index from 1971 to 2003
  - The percentage increase in value of $1,000 invested in 1970
5(No Transcript)
6(No Transcript)
7Data tables
- The Statistical Abstract of the United States is an annual volume filled with numerical data in tables, for example
  - The number of schools in the US by type
  - The number of students enrolled in each type of school
  - The number of degrees granted by colleges of different types to students of different types
- The data are summarized in tables
- We don't want to look at the individual data for every school or college; rather, we want an informative summary of those data
8Example 1 What makes a clear table?
- How well educated are 30-something young adults?
- The table on the next slide answers that question and follows some good practices for making tables
  - It is clearly labeled, so we see the subject of the data at once
  - The main heading gives the general subject and the date the data pertain to
  - Labels in the table identify the variables and their units
  - The source of the data is attributed at the foot of the table
9(No Transcript)
10Data tables
- Table 10.1 breaks down 30-somethings into levels of education and displays
  - Counts at each educational level
  - Rates or percentages of the total number of 30-somethings at each educational level
  - The total number of 30-somethings and the total of the percentages at the bottom
- The two columns with counts and percentages describe the distribution of 30-somethings according to education level
- Both columns are useful: the first gives the total size of each group, the second gives each group's share of the total
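As a rough illustration of how the percentage column follows from the count column, here is a minimal Python sketch. The category names and counts are hypothetical stand-ins chosen for easy arithmetic; they are not the values in Table 10.1.

```python
# Hypothetical counts (in thousands) for illustration only;
# these are NOT the entries of Table 10.1.
counts = {
    "Less than high school": 30,
    "High school graduate": 90,
    "Some college": 50,
    "Bachelor's degree": 30,
}


def distribution(counts):
    """Return each category's share of the total, as a percentage."""
    total = sum(counts.values())
    return {level: 100 * n / total for level, n in counts.items()}


percentages = distribution(counts)

# Print a small table with a count column and a percentage column.
for level, pct in percentages.items():
    print(f"{level:<25}{counts[level]:>6}{pct:>8.1f}%")
```

The percentages always sum to 100 (up to rounding), which is what makes them a distribution of the whole.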
11Data tables
- If you check Table 10.1 for consistency, you will see that the total is 20,521, but if you add up all the counts you get 20,522
- What happened?
- The entries are rounded to the nearest thousand
- The exact counts are totaled before rounding, and the total is then rounded separately, so the rounded total may not match the sum of the rounded entries perfectly
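The rounding effect can be reproduced with a few made-up numbers. The sketch below is only an illustration: the exact counts are hypothetical and were picked so that every entry rounds up, which exaggerates the discrepancy compared with the one-unit difference in Table 10.1.

```python
# Hypothetical exact counts (people), chosen so each one rounds up.
exact_counts = [1_600, 2_600, 3_600]


def to_thousands(n):
    """Round a count to the nearest thousand, expressed in thousands."""
    return round(n / 1000)


# Round each entry first, then add the rounded entries.
rounded_entries = [to_thousands(n) for n in exact_counts]
sum_of_rounded = sum(rounded_entries)

# Add the exact counts first, then round the total once.
rounded_total = to_thousands(sum(exact_counts))

print(rounded_entries, sum_of_rounded, rounded_total)
```

Here the rounded entries add up to 9 (thousand), while the separately rounded total is 8 (thousand). Both figures are correct roundings; they simply answer slightly different questions.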
12Pie charts and bar graphs
- Pie charts show how a whole is divided into parts
- The point of a pie chart is to show the composition of a total as percentages
- Pie charts can display only a few pieces of the whole before they become hard to read and confusing
- We can make a pie chart of the distribution in Table 10.1
- Start by drawing a circle; this represents the whole (all the 30-somethings)
- Wedges within the circle represent the parts, with the angle spanned by a wedge in proportion to the size of the part
13Pie charts and bar graphs
- For example, 22.4% of the 30-somethings have a bachelor's degree but not an advanced degree
- So the bachelor's degree wedge spans 22.4% of the 360 degrees of the total
- 0.224 × 360 = 80.64, or about 81 degrees
- Pie charts force us to see that the parts together make a whole, but it's hard to read the angles and get an exact sense of the distribution
- In most cases the actual percentages are included as labels in the wedges, as in Figure 10.3 on the next slide
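The wedge calculation is simple enough to write down as a one-line function. This is a minimal sketch; the function name `wedge_angle` is just an illustrative choice.

```python
def wedge_angle(percent):
    """Angle in degrees spanned by a pie wedge that represents `percent` of the whole."""
    return percent / 100 * 360


# The bachelor's degree wedge from the example: 22.4% of the whole.
bachelors_angle = wedge_angle(22.4)
print(f"{bachelors_angle:.2f} degrees, about {round(bachelors_angle)} degrees")
```

A wedge for 100% spans the full 360 degrees, so the angles for all categories in a distribution add up to a complete circle.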
14(No Transcript)
15Pie charts and bar graphs
- A bar graph addresses the fact that pie charts can be hard to read precisely
- Figure 10.4 is a bar graph of the same data shown in Table 10.1 and in the pie chart in Figure 10.3
- The height of each bar shows the percentage of 30-somethings with each education level
- Now we can easily read that just under 15% don't have a high school degree
- A bar graph allows us to readily compare the percentages in each category by comparing the heights of the bars
16(No Transcript)
17Pie charts and bar graphs
- As we think about graphs, it is useful to distinguish
  - Variables that have a meaningful numerical scale, such as height or SAT scores
  - Variables that place individuals into categories, such as sex, education or occupation
- Pie charts and bar graphs are best for categorical variables, the second type
18Pie charts and bar graphs
- Bar graphs are also useful for other things
- Bar graphs can also compare quantities that are
not part of a whole
19Example 3 High taxes?
- Figure 10.5 displays the level of taxation in eight democratic nations
- The height of the bars shows the percentage of each nation's GDP that is taken in taxes
- It turns out that we Americans aren't taxed as heavily as we think when compared with other nations
20(No Transcript)
21Beware the pictogram
- Bar graphs compare quantities by comparing the heights of the bars
- Our eyes react to both the height and the area of the bars, so if two bars are of similar height but one is wider, our eyes will emphasize the wider one
- For this reason it's important to keep all the bars the same width, so that we compare them on height alone and not on area
- It's sometimes tempting to replace the bars with pictures
- It's hard to make pictures all the same width without distorting them, which leads to the area problem
- Avoid using pictures!
22(No Transcript)
23A misleading pictogram
- Figure 10.6 on the previous slide is misleading
- The numbers on the graph indicate that advertising spending at Time is 1.64 times that at Newsweek
- Why does the graph suggest that Time is so much farther ahead?
- To magnify the image, the artist increases both height and width, making the area of Time's image 1.64 × 1.64 ≈ 2.7 times that of Newsweek's image
- Our eyes respond to both the area and the height and see a much bigger difference between Time and Newsweek than really exists
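The area distortion follows directly from scaling a picture in two dimensions at once. A minimal sketch, assuming the image is enlarged by the same factor in height and width (the function name `area_ratio` is illustrative):

```python
def area_ratio(height_ratio):
    """Area ratio of two pictures when one is scaled by `height_ratio`
    in BOTH height and width (aspect ratio preserved)."""
    return height_ratio ** 2


# Time's spending is 1.64 times Newsweek's, so the picture is 1.64 times as tall...
ratio = 1.64
# ...but it covers roughly 2.7 times the area.
print(f"height ratio {ratio}, area ratio {area_ratio(ratio):.2f}")
```

A plain bar of fixed width would grow only in height, so its area ratio would stay at 1.64, matching the data.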
24Change over time line graphs
- Many quantitative variables are measured at intervals over time
  - The height of a growing child at the end of each month
  - The value of the stock market at the end of each day
- In these cases our main interest is change over time
- Change over time is displayed using a line graph
25Example 5 The price of gasoline
- How has the price of gasoline changed over time?
- Figure 10.7 on the next slide is a line graph showing the average price of regular unleaded gasoline across the US each month from January 1994 to July 2004
- There are 115 data points on this line graph
- It would be very difficult to read down a table with that many rows and see patterns in the average price of gasoline
- Reading the line graph quickly reveals some patterns
- What are they?
26(No Transcript)
27Change over time line graphs
- Look for an overall pattern. A trend is a long-term upward or downward movement. From 1994 to 1999 there was no trend, but from 1999 to 2004 there was an upward trend.
- Look for striking deviations from the overall pattern. There is a striking dip in 2001 that deviates from the upward trend.
- Look for regular variation or seasonal variation. Gasoline prices are usually highest in the summer, when people drive more, and lower in winter.
28Change over time line graphs
- Seasonal variation is often removed from time series because it does not convey important information and may confuse the interpretation of the series
- For example, when using the unemployment rate we want to see changes that correspond to actual (unexpected) changes in employment
- We don't want to see the normal, expected seasonal variation in the unemployment rate, which has nothing to do with the state of the labor market and everything to do with the timing of the holiday season and the weather
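To give a feel for what removing seasonal variation means, the sketch below uses a deliberately simplified approach: estimate each quarter's typical deviation from the overall average and subtract it. The data are hypothetical, and this is an assumption-laden toy, not the procedure statistical agencies actually use for official seasonally adjusted figures.

```python
from statistics import mean

# Hypothetical quarterly values for two years: a repeating seasonal
# pattern (-2, 0, +2, 0) on top of a level that steps from 12 to 14.
values = [10, 12, 14, 12, 12, 14, 16, 14]
PERIODS = 4  # quarters per year

overall = mean(values)

# Seasonal effect of each quarter: how far that quarter's average
# sits above or below the overall average.
seasonal_effect = [mean(values[q::PERIODS]) - overall for q in range(PERIODS)]

# Seasonally adjusted series: subtract the matching quarter's effect.
adjusted = [v - seasonal_effect[i % PERIODS] for i, v in enumerate(values)]

print(seasonal_effect)
print(adjusted)
```

After adjustment the within-year swings disappear and only the underlying step from 12 to 14 remains, which is the kind of change an analyst would actually want to see.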
29Change over time line graphs
30Watch those scales!
- Graphs speak strongly and can easily mislead
- One should always note the scales of the axes in a line graph
- The graphs on the next slide plot the number of unmarried-couple households in the US
  - The one on the left indicates a moderate increase
  - The one on the right suggests a thundering increase
- The difference is in the scales of both axes
  - On the left, the vertical (count) axis is short and the horizontal (time) axis is long
  - On the right, the reverse is true, which exaggerates the rate of increase
31(No Transcript)
32Watch those scales!
- When plotting the change in a value over time, it is often more informative to plot the percentage increase from period to period
- Figure 10.1 of the stock market (next slide), which we have already seen, does this
- While the actual value of an investment, plotted in Figure 10.2, changes dramatically in the later years, the percentage increase from year to year during this period was not much more than it was in the 1970s
- This is because the value of the investment was relatively small in the 1970s, so the actual value added per percentage point of change was small
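The difference between plotting actual values and plotting percentage changes can be seen with two made-up years. The dollar amounts below are hypothetical and chosen only so the arithmetic is easy to follow; they are not taken from Figures 10.1 or 10.2.

```python
def percent_change(old, new):
    """Percentage change from `old` to `new`."""
    return 100 * (new - old) / old


# Hypothetical (start, end) values for an early year and a late year.
early = (1_000, 1_100)    # the investment is still small
late = (10_000, 11_000)   # the investment has grown tenfold

for label, (start, end) in (("early", early), ("late", late)):
    gain = end - start
    print(f"{label}: gain of {gain} dollars, {percent_change(start, end):.1f}% increase")
```

The dollar gain in the late year is ten times larger, yet both years show the same 10% increase. A graph of values would make the late year look dramatic, while a graph of percentage changes would show two identical years.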
33(No Transcript)
34(No Transcript)
35Making good graphs
- Graphs are the most effective way to communicate using data
- A good graph can easily reveal facts about the data that would be hard to detect from a table of numbers
- Good graphing principles
  - Use good labels and legends that tell what variables are plotted, the units used and the source of the data
  - Make the data stand out, not the background, labels, grids or unnecessary artwork
  - Pay attention to what the eye sees: don't use pictograms, choose scales carefully, and don't use 3-D effects that confuse the eye without adding information
36Example 7 The rise in college education
- Figure 10.9 on the next slide shows the rise in the percentage of women 25 and older who have at least a bachelor's degree
- It's a cluttered mess
  - No axis labels
  - Confusing grid lines
  - Unnecessary pictures
  - Strangely formatted and positioned title
- A good graph uses only as much ink as necessary
- Decorations are generally not advised!
37(No Transcript)
38Example 8 High taxes, reconsidered
- In the bar graph we saw in Figure 10.5, countries are arranged alphabetically along the horizontal axis
- It is clearer to order them by tax burden so that we can easily pick out the highest, the lowest and the countries in between
- Ordering the countries by tax burden improves the graph by making it clear where a country stands in the group of eight that are plotted
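Reordering the bars is a one-line sort before plotting. The sketch below prints a crude text bar chart; the country labels and percentages are placeholders invented for illustration, not the data behind Figure 10.5.

```python
# Placeholder data: tax revenue as a percentage of GDP (hypothetical values).
tax_share = {
    "Country A": 30,
    "Country B": 45,
    "Country C": 25,
    "Country D": 38,
}

# Alphabetical order (how the dict happens to be written) hides the ranking.
# Sorting by value, largest first, makes each country's position obvious.
ordered = sorted(tax_share.items(), key=lambda item: item[1], reverse=True)

for name, pct in ordered:
    # All bars use the same character, so only their length differs.
    print(f"{name:<10} {'#' * pct} {pct}%")
```

The same sorted list could be passed to any plotting tool; the point is only that the ordering is decided by the data rather than by the alphabet.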
39(No Transcript)
40Summary
- To see what data say, use graphs
- The choice of graph depends on the type of data
- To display a distribution of categorical data, use a pie chart or a bar graph
  - Pie charts always show the parts of some whole
  - Bar graphs can compare any set of numbers measured in the same units
- To show how a quantitative variable changes over time, use a line graph that plots values on the vertical axis against time on the horizontal axis
41Summary
- Graphs can mislead the eye
- Avoid pictograms
- Pay attention to scales