Title: Presentation of Data, Models, and Results
2Presentation of Data, Models, and Results
Information, that is imperfectly acquired, is
generally as imperfectly retained; and a man who
has carefully investigated a printed table,
finds, when done, that he has only a very faint
idea of what he has read; and that like a figure
imprinted on sand, is soon totally erased and
defaced. The amount of mercantile transactions
in money, and of profit or loss, are capable of
being as easily represented in drawing, as any
part of space, or as the face of a country;
though, till now, it has not been attempted.
Upon that principle these Charts were made; and,
while they give a simple and distinct idea, they
are as near perfect accuracy as is any way useful.
On inspecting any of these Charts attentively, a
sufficiently distinct impression will be made, to
remain unimpaired for a considerable time, and
the idea which does remain will be simple and
complete, at once including the duration and
amount.
Playfair, The Commercial and Political Atlas, 1786 (emphasis added by C. Ritter)
3Outline
- Perception and Mind
- Elementary Statistical Graphs
- Principles
- Techniques
- Cleaning up a graph
- Messing up a graph
- General Advice
4Preview
- Perception and Mind
  - Operations of the brain
  - Rates of Information Flow
  - Accuracy and Speed
  - Mission impossible
  - Efficient use
5Perception and Mind (1) Operation of the Brain
6Perception and Mind (2) Rates of Information Flow
[Figure labels: Main Brain; 2-4°; working memory, input organization, pattern recognition, comparisons, arithmetic; long-term memory, reasoning, reflection.]
7Perception and Mind (3) Accuracy and Speed
8Perception and Mind (4) Mission Impossible
We cannot accurately judge the vertical distance
between curves of varying slope.
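A small numerical sketch may make this concrete. It assumes an explanation the slide itself does not state: that the eye tends to read the shortest gap between two curves rather than the vertical one. The two curves and the function names (`lower`, `upper`, `vertical_gap`, `shortest_gap`) are hypothetical and chosen only for illustration. The vertical distance is constant, yet the shortest gap shrinks where the curves are steep.

```python
import math


def lower(x):
    # Hypothetical lower curve: flat near x = 0, steep near x = 3.
    return 0.5 * x * x


def upper(x):
    # Upper curve: always exactly 1 unit above the lower curve.
    return lower(x) + 1.0


def vertical_gap(x):
    # The quantity the reader is asked to judge.
    return upper(x) - lower(x)


def shortest_gap(x, lo=0.0, hi=4.0, steps=4000):
    # Shortest straight-line distance from the point (x, upper(x))
    # to the lower curve, found by brute-force sampling on [lo, hi].
    px, py = x, upper(x)
    best = float("inf")
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        best = min(best, math.hypot(t - px, lower(t) - py))
    return best


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0, 3.0):
        print(f"x={x:.1f}  vertical={vertical_gap(x):.2f}  "
              f"shortest={shortest_gap(x):.2f}")
```

If the difference between two curves is the message, a safer design under this assumption is to compute the difference and plot it as its own curve.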
9Perception and Mind (5) Efficient Use
- Good use of CPU
  - work with four tokens in short term memory
  - minimize perceptual noise
  - focus attention
  - minimize acquisition time and distance
  - maximize parallel treatment
  - maximize accurate judgement by representing quantities by position and length
- Counterproductive use of CPU
  - require more than four tokens in short term memory
  - add noise
  - divert attention
  - separate information which has to be treated together
  - force sequential treatment
  - maximize inaccurate judgement by representing quantities by angle, volume and lengths which do not share a common axis
10Preview
- Elementary Statistical Graphs
  - Overview
  - Construction of Histogram
  - Construction of Box Plot
- Principles
  - Fidelity
  - Honesty
  - Sobriety
  - Purpose
- Techniques
  - Visual Grouping
  - Matrushka
  - Proximity
  - Choice of display dimensions
  - Small multiples
11Elementary Statistical Graphs (1) Overview
- Chronology
- Morphology, Distribution
- Position, Dispersion
- Rule/Exception
Graph types shown: Cumulative Distribution; Histogram, Bar Chart; Box Plot; Pie Chart; Scatterplot; Dot plot.
Note on the slide: requires stationarity.
12Elementary Statistical Graphs (2) Construction
of Histogram
- Construction rules
  - area corresponds to frequency
  - observations on the boundary between two intervals belong to the upper interval
  - integer data: use .5 values as interval boundaries
  - number of intervals
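The construction rules above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's own procedure: the function names (`half_integer_edges`, `bin_counts`, `bar_heights`) and the sample data are hypothetical, and the interval width is simply passed in because the slide gives no rule for the number of intervals.

```python
from bisect import bisect_right


def half_integer_edges(lo, hi, width=1):
    # Integer data: place the boundaries on .5 values so that no
    # observation can fall exactly on a boundary.
    edges = [lo - 0.5]
    while edges[-1] < hi + 0.5:
        edges.append(edges[-1] + width)
    return edges


def bin_counts(values, edges):
    # An observation lying exactly on a boundary goes to the upper
    # interval, i.e. intervals are closed on the left: [a, b).
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if not 0 <= i < len(counts):
            raise ValueError(f"{v} lies outside the intervals")
        counts[i] += 1
    return counts


def bar_heights(counts, edges):
    # Area corresponds to frequency: height = relative frequency / width,
    # so the bar areas add up to 1 even with unequal widths.
    n = sum(counts)
    return [c / (n * (edges[i + 1] - edges[i]))
            for i, c in enumerate(counts)]


if __name__ == "__main__":
    data = [1, 2, 2, 3, 5]                      # hypothetical integer data
    edges = half_integer_edges(min(data), max(data), width=2)
    counts = bin_counts(data, edges)
    print(edges, counts, bar_heights(counts, edges))
```

Scaling the height by the interval width matters only when the intervals differ in width; with equal widths the heights are proportional to the counts.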
13Elementary Statistical Graphs (3) Construction
of Box Plot
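[Figure: box plot drawn along an axis graduated from 0 to 5. Labels, from top to bottom: largest non-outlying value, p75, median, p25, smallest non-outlying value, suspect observation. Marked distances: IQR, 1 IQR, 1.5 IQR.]
Whiskers extend from each quartile outward by up to 1.5 IQR, or to the
max/min, whichever is reached first.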
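The box plot construction described here can be sketched as follows. Several conventions exist for computing quartiles, and the slide does not say which one is used, so the choice of `statistics.quantiles(..., method="inclusive")` is an assumption. The function name `box_plot_stats` and the sample data are hypothetical.

```python
from statistics import quantiles


def box_plot_stats(data, k=1.5):
    xs = sorted(data)
    # Quartiles p25, median, p75 (interpolation convention is an assumption).
    p25, median, p75 = quantiles(xs, n=4, method="inclusive")
    iqr = p75 - p25
    low_fence = p25 - k * iqr
    high_fence = p75 + k * iqr
    # Whiskers stop at the most extreme observations inside the fences,
    # so they never extend past the actual min/max of the data.
    inside = [x for x in xs if low_fence <= x <= high_fence]
    suspects = [x for x in xs if x < low_fence or x > high_fence]
    return {
        "p25": p25,
        "median": median,
        "p75": p75,
        "iqr": iqr,
        "whisker_low": inside[0],    # smallest non-outlying value
        "whisker_high": inside[-1],  # largest non-outlying value
        "suspects": suspects,        # drawn as individual points
    }


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
    print(box_plot_stats(sample))
```

In this sample the value 30 lies more than 1.5 IQR above p75, so it is reported as a suspect observation and the upper whisker stops at 9.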
14Principles (1) Fidelity
Original data should be presented in a way that
will preserve the evidence in the original data
for all predictions assumed to be useful.
W. Shewhart
15Principles (2) Honesty
Visual increase: 800% = (18 - 2)/2
Actual increase: 53% = (27.5 - 18)/18
Lie factor: 15 = 800/53
Source: Tufte, The Graphical Display of
Quantitative Information
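The arithmetic on this slide can be reproduced with a short sketch. The function names `relative_change` and `lie_factor` are hypothetical. The lie factor is computed as the relative change shown in the graphic divided by the relative change in the data. The slide divides the rounded percentages (800/53), so the unrounded result differs slightly in the first decimal.

```python
def relative_change(first, second):
    # Change from the first value to the second, relative to the first.
    return (second - first) / first


def lie_factor(visual_first, visual_second, data_first, data_second):
    # Size of the effect shown in the graphic divided by
    # the size of the effect in the data.
    return (relative_change(visual_first, visual_second)
            / relative_change(data_first, data_second))


if __name__ == "__main__":
    visual = relative_change(2, 18)      # graphic sizes from the slide
    actual = relative_change(18, 27.5)   # data values from the slide
    print(f"visual increase: {visual:.0%}")
    print(f"actual increase: {actual:.0%}")
    print(f"lie factor: {lie_factor(2, 18, 18, 27.5):.1f}")
```

A lie factor close to 1 means the graphic represents the change in the data faithfully.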
16Principles (3) Sobriety
17Principles (4) Purpose
- Everything you do in a graph should have a purpose.
  - Axes?
  - Colors and styles?
  - Background/Frame?
  - Points or bars?
  - Separate or joined by lines?
  - Order?
- (Radical) suggestion
  - First turn all color and style options off
  - Then add in options as needed
  - At each step, ask: why am I doing this? What is the purpose?
18Techniques (1) Visual Grouping
Objective: at most 4 groups (at any layer).
19Techniques (2) Matrushka
- Data = General pattern + Departures from the
general pattern
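One way to illustrate this decomposition is to fit a simple model and keep what is left over. The sketch below assumes that the general pattern is a least-squares straight line, which the slide does not specify. The function names `fit_line` and `decompose` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    # Ordinary least-squares fit of y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def decompose(xs, ys):
    # Data = general pattern + departures from the general pattern.
    slope, intercept = fit_line(xs, ys)
    pattern = [intercept + slope * x for x in xs]
    departures = [y - p for y, p in zip(ys, pattern)]
    return pattern, departures


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    ys = [1.0, 3.5, 5.0, 6.5, 9.0]   # hypothetical observations
    pattern, departures = decompose(xs, ys)
    for x, y, p, d in zip(xs, ys, pattern, departures):
        print(f"x={x}  data={y:.2f}  pattern={p:.2f}  departure={d:+.2f}")
```

Plotting the pattern and the departures in separate layers or panels lets the reader see the rule first and the exceptions second.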
20Techniques (3) Proximity
Put small differences as close together as
possible. Big differences are also visible from a
distance.
21Techniques (4) Choice of display dimensions
22Techniques (5) Small multiples
23Outline
- Cleaning up a graph
- Messing up a graph
  - original
  - break the groups
  - add noise
  - divert attention and add images
- General Advice
24Cleaning Up a Graph
25Original
[Figure: arrows for factors A, B, C, and D plotted as Defects (vertical axis, ticks from 1.0 to 10000.0 in powers of ten) against Color (horizontal axis, 0.0 to 80.0). Arrow labels: A, a, B, b, high C, low C, high D, low D.]
This graph shows the effect of four experimental
factors A, B, C, and D on two responses, Color
and Defects. A (red) always increases Defects.
When D is high (dashed arrows), A reduces Color. B
(blue) reduces both Defects and Color. C and D
show a strong interaction: at high C (bold
arrows), the effect of D (reducing Defects and
increasing Color) is much stronger than at low C
(thin arrows).
26Break the groups
[Figure: the same plot (vertical ticks from 1.0 to 10000.0, horizontal ticks from 0.0 to 80.0). Arrow labels: A, a, B, b, Cd, cd, cD, CD.]
27Add Noise
[Figure: the same plot, Defects (vertical ticks from 1.0 to 10000.0) against Color (horizontal ticks from 0.0 to 80.0). Arrow labels: A, a, B, b, Cd, cd, cD, CD.]
28Divert Attention and Add Text Images
[Figure: the same plot, Defects (vertical ticks from 1.0 to 10000.0) against Color (horizontal ticks from 0.0 to 80.0). Arrow labels: A, a, B, b, Cd, cd, cD, CD.]
29Another Example of a Messed Up Graph
Source: a sales presentation
30General Advice
- When you analyze data, and in particular before any inferential statistical analysis, make graphs and look at them.
- Make time charts if time plays a role
- Add value to your graphs
  - if you show a scatterplot and you have a third variable, add it using labels, color, or markers
  - use categorized graphs (same graph type, same scale, close together)
  - minimize ink used for decoration (frames, grids, background); focus on the data you want to show
  - read and interpret carefully the graphs and tables you create; note what you see: what is expected and what is not
  - separate what is shared by the data (model) from what is individual (residual)
  - (in particular for any graph you want to show to others) think about how it will be perceived