Title: Statistics of Illumination
1Statistics of Illumination
- Beth Chance
- Roxy Peck
- Cal Poly, San Luis Obispo
2STATISTICS SAY
- Increasingly, daily life involves statistical information
- interpretations of graphical and numerical summaries
- comparisons of groups
- poll results from random samples
- conclusions from randomized experiments
- predictions of future outcomes
3Most people use statistics as a drunkard uses a lamppost:
more for support than for illumination.
4Predicting Variable Behavior
5Predicting Variable Behavior
- (a) Height of students in this class
- (b) Students' preference for Coca-Cola vs. Pepsi-Cola
- (c) Number of siblings of individuals
- (d) Amount paid for last haircut
- (e) Gender breakdown
- (f) Students' guesses of my age
6Matching Variables to Graphs
7Matching Variables to Graphs
- Think about context!
- Anticipate patterns and variations
- variable intuition
- graph-sense
8STATISTICS SAY
- Students' heights would show more variability than guesses of my age
- KDC Pursues High-Return, Low-Risk Strategy
9What is Variability?
10What is Variability?
           Class F  Class G  Class H  Class I  Class J
Range      6        8        8        8        8
IQR        2.75     3        0        8        5
Std. Dev.  1.769    2.041    1.180    4.000    2.657
11Describing Variability
- The bumpiness of a histogram does not determine the variability of the observations
- The number of distinct values the variable takes does not determine the variability of the observations
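A minimal sketch of both points, using two hypothetical data sets (invented for illustration, not the class data summarized above). Both have the same range, but the one with only two distinct values is the more variable. The helper name `spread` is an assumption of this sketch.

```python
import statistics

# Hypothetical scores, invented for illustration (not the slide's class data).
many_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # nine distinct values
two_values = [1, 1, 1, 1, 9, 9, 9, 9]       # only two distinct values

def spread(data):
    """Return (range, IQR, population standard deviation)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return max(data) - min(data), q3 - q1, statistics.pstdev(data)

for name, data in [("many_values", many_values), ("two_values", two_values)]:
    data_range, iqr, sd = spread(data)
    print(f"{name}: range={data_range}, IQR={iqr}, SD={sd:.3f}")
```

Note that quartile conventions differ between textbooks and software, so the IQR printed here may not match a hand calculation exactly.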
12STATISTICS SAY
- 5236 drivers age 65 and over were involved in fatal accidents, compared to only 2900 drivers aged 16 and 17, so young people are safer drivers...
- 65% of motorcycle fatalities occurred in states with mandatory helmet laws...
13Counts Versus Ratios
- Simple counts are often not a good basis for comparing two or more groups.
- Group size isn't always obvious: two groups of 25 U.S. states may have very different sizes even though both include the same number of states.
- Deciding on a sensible basis for comparison requires thought!
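As a rough sketch of why a rate is a better basis for comparison than a count, the snippet below converts the fatal-accident counts quoted earlier into rates per 100,000 drivers. The group sizes are placeholders invented for illustration (the slides do not give them), so the resulting rates show only the mechanics, not a real finding.

```python
def rate_per_100k(count, group_size):
    """Events per 100,000 group members."""
    return 100_000 * count / group_size

# Fatal-accident counts quoted on the earlier slide.
fatal = {"65 and over": 5236, "16 and 17": 2900}

# PLACEHOLDER group sizes, invented for illustration only; substitute real
# counts of licensed drivers before drawing any conclusion.
drivers = {"65 and over": 20_000_000, "16 and 17": 2_000_000}

rates = {group: rate_per_100k(fatal[group], drivers[group]) for group in fatal}
for group, rate in rates.items():
    print(f"{group}: {fatal[group]} fatal accidents, {rate:.1f} per 100,000 drivers")
```

With group sizes this different, the group with the smaller count can have the larger rate, which is the point of the slide.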
14STATISTICS SAY
- 85% of software developers predicted that Microsoft's integration of Internet functions into Windows would help their company
15Some Simple Questions
- Question 1
- Lost ticket
- Yes 6
- No 9
- Lost $20
- Yes 8
- No 6
16Some Simple Questions
- People are more likely to say "yes" when they have lost a $20 bill
- People tend to answer "not surprising" to both expressions
- People are more likely to choose program A with the "save" version and program B with the "die" version
17Some Simple Questions
- Be careful when wording survey questions; ask to see the phrasing!
- Bill Gates: "It would help me IMMENSELY to have a survey showing that 90% of developers believe putting the browser into the operating system is a good idea"
- "Browser" vs. "browser technologies"
18STATISTICS SAY
- Researchers in Philadelphia investigated whether pamphlets containing information for cancer patients are written at a level that the cancer patients can comprehend
- Median reading levels are equal
19Readability of Cancer Pamphlets
20Readability of Cancer Pamphlets
- Graphs can illuminate
- Look at the data!
- Think about the question
21STATISTICS SAY
- American men were randomly selected for the 1970 draft
- Draft numbers (1-366) were assigned to birthdates
22Draft Lottery
- Calculate the median draft number for each month
- 31 days: 16th value
- 30 days: average of the 15th and 16th values
- 29 days: 15th value
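The position rule above can be sketched as follows. The function name `median_by_position` is an assumption of this sketch, and the inputs in the usage lines are stand-in values rather than actual draft numbers.

```python
import statistics

def median_by_position(values):
    """Median by position: the middle sorted value for an odd count,
    the average of the two middle sorted values for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # 31 values -> 16th value; 29 values -> 15th value
        return ordered[mid]
    # 30 values -> average of the 15th and 16th values
    return (ordered[mid - 1] + ordered[mid]) / 2

# Stand-in values (not real draft numbers) for months of 31, 30 and 29 days.
for days in (31, 30, 29):
    stand_in = list(range(1, days + 1))
    print(days, median_by_position(stand_in), statistics.median(stand_in))
```

The standard library's `statistics.median` applies the same rule, so the two columns agree.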
23Draft Lottery
- Month: median draft number
- January: 211.0
- February: 210.0
- March: 256.0
- April: 225.0
- May: 226.0
- June: 207.5
- July: 188.0
- August: 145.0
- September: 168.0
- October: 201.0
- November: 131.5
- December: 100.0
24Draft Lottery
25Draft Lottery
26Draft Lottery
- Statistics matter
- Summaries can illuminate
- Randomization can be difficult
27STATISTICS SAY
- The average time between eruptions of the Old Faithful Geyser is 71 minutes
- August 1985
28Geyser Eruptions
29Geyser Eruptions
- Looks can be deceiving!
- Use the graph that summarizes without losing
important details
30STATISTICS SAY
- The average major league baseball salary in the United States is about $1.5 million
31Rowers' Weights
- 2000 Men's Olympic Rowing Team
32Rowers' Weights
33Rowers' Weights
                                        Mean    Median
Full data set                           197.29  207.5
Without coxswain                        200.11  210.00
Without coxswain or lightweight rowers  210.57  210.00
With heaviest at 320                    215.33  210.00
- Resistance....
34Rowers' Weights
- Know what your numerical summary is measuring
- Investigate causes for unusual observations
- Baseball: median salary $500,000
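A small sketch of resistance, using hypothetical weights (invented for illustration, not the actual team roster): changing the heaviest value to 320 shifts the mean but leaves the median where it was.

```python
import statistics

# Hypothetical weights in pounds, invented for illustration only.
weights = [120, 185, 190, 195, 200, 205, 210, 215]   # 120 plays the coxswain
with_outlier = weights[:-1] + [320]                   # heaviest changed to 320

for label, data in [("original", weights), ("heaviest at 320", with_outlier)]:
    print(f"{label}: mean={statistics.mean(data):.2f}, "
          f"median={statistics.median(data):.2f}")
```

The median depends only on the middle of the sorted data, so an extreme value at either end cannot move it; the mean uses every value and is pulled toward the extreme.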
35STATISTICS SAY
People live longer in countries with more
televisions
36Televisions and Life Expectancy
- Buy another television?
- Association is not causation
37STATISTICS SAY
- Overall survival rates
- A: 80%  B: 90%
- Fair condition
- A: 98.3%  B: 96.7%
- Poor condition
- A: 52.5%  B: 30.0%
38Hospital Recovery Rates
- Simpson's Paradox
- Hospital A gets most of the poor condition cases
- Patients in poor condition are less likely to survive
- Thus hospital A has the lower survival rate despite being the better choice for either condition
- Beware of lurking variables
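The reversal can be checked with a short calculation. The slides give only percentages, so the patient counts below are assumptions, chosen because they reproduce the stated rates; the function names are likewise illustrative.

```python
# ASSUMED counts as (survived, treated); the slides give only percentages.
# These counts were chosen because they reproduce those percentages.
counts = {
    "A": {"fair": (590, 600), "poor": (210, 400)},
    "B": {"fair": (870, 900), "poor": (30, 100)},
}

def rate(survived, treated):
    """Survival rate as a percentage."""
    return 100 * survived / treated

def overall_rate(by_condition):
    """Pool all conditions, then compute one survival rate."""
    survived = sum(s for s, _ in by_condition.values())
    treated = sum(t for _, t in by_condition.values())
    return rate(survived, treated)

for hospital, by_condition in counts.items():
    parts = ", ".join(f"{cond} {rate(*c):.1f}%" for cond, c in by_condition.items())
    print(f"Hospital {hospital}: {parts}, overall {overall_rate(by_condition):.1f}%")
```

Under these assumed counts, Hospital A treats 400 of its 1000 patients in poor condition while Hospital B treats only 100 of 1000, which is what drags A's pooled rate below B's.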
39Hospital Recovery Rates (cont.)
[Figure: survival (0 to 100) for Hospital A and Hospital B, fair condition]
40Hospital Recovery Rates (cont.)
[Figure: survival (0 to 100) for Hospital A and Hospital B, fair and poor condition]
41Hospital Recovery Rates (cont.)
[Figure: survival (0 to 100) for Hospital A and Hospital B, fair and poor condition]
42STATISTICS SAY
- Taking an aspirin each day reduces the risk of
heart attack for men, but less so for women
43How Experiments Take Variability Into Account
- Direct control
- Blocking
- Randomization
44Randomization
1 2
3 4
5 6
7 8
45Blocking Scheme A
1 2
3 4
5 6
7 8
46Blocking Scheme B
1 2
3 4
5 6
7 8
47Results from 100 Trials
[Figure: results for the first blocking scheme, the completely randomized design, and the second blocking scheme]
48Controlling for Variability
- Blocking reduces variability in the estimated mean difference
- Homogeneous blocks are desirable
- Randomization evens out the effects of extraneous variables
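A simulation sketch of these points, under stated assumptions: the yields for the eight plots are hypothetical, there is no real treatment effect, and the two blocking layouts (by row and by column) are illustrative choices that need not match Blocking Scheme A or B from the slides.

```python
import random
import statistics

# Hypothetical yields for 8 plots laid out in 4 rows of 2 (plots 1..8).
# Plots in the same row are similar; rows differ a lot. No treatment effect.
YIELDS = {1: 10, 2: 11, 3: 20, 4: 21, 5: 30, 6: 31, 7: 40, 8: 41}

def mean_difference(treated):
    """Mean yield of treated plots minus mean yield of control plots."""
    control = [p for p in YIELDS if p not in treated]
    return (statistics.mean([YIELDS[p] for p in treated])
            - statistics.mean([YIELDS[p] for p in control]))

def completely_randomized(rng):
    """Pick any 4 of the 8 plots for treatment."""
    return rng.sample(sorted(YIELDS), 4)

def blocked(rng, blocks, per_block):
    """Pick per_block plots for treatment within each block."""
    treated = []
    for block in blocks:
        treated.extend(rng.sample(block, per_block))
    return treated

def simulate(assign, trials=100, seed=1):
    """Standard deviation of the estimated mean difference over many trials."""
    rng = random.Random(seed)
    diffs = [mean_difference(assign(rng)) for _ in range(trials)]
    return statistics.stdev(diffs)

ROW_BLOCKS = [[1, 2], [3, 4], [5, 6], [7, 8]]   # homogeneous blocks
COLUMN_BLOCKS = [[1, 3, 5, 7], [2, 4, 6, 8]]    # heterogeneous blocks

sd_random = simulate(completely_randomized)
sd_rows = simulate(lambda rng: blocked(rng, ROW_BLOCKS, 1))
sd_columns = simulate(lambda rng: blocked(rng, COLUMN_BLOCKS, 2))
print(f"completely randomized: {sd_random:.2f}")
print(f"blocked by row:        {sd_rows:.2f}")
print(f"blocked by column:     {sd_columns:.2f}")
```

Blocking by row pairs similar plots, so the row-to-row differences cancel out of the comparison. Blocking by column mixes very different plots within each block and gives little or no improvement over complete randomization.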
49STATISTICS SAY
- A log was selected at random
50Sampling Logs
- Does choosing times at random result in a random sample of logs?
- _______________________________ ?
51Estimating Mean String Length
- Does the sampling procedure produce a simple random sample?
- How is this related to the log problem?
- Can you suggest a better sampling method?
52Selecting a Sample
- Random sampling eliminates human selection bias, so the sample will be fair and unbiased/representative of the population.
- While increasing the sample size improves precision, it does not decrease bias.
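The following sketch, in the spirit of the log and string questions, compares a simple random sample with a selection method that favors longer items. The population of string lengths is hypothetical, and the assumption that an item's chance of selection is proportional to its length is this sketch's own simplification.

```python
import random
import statistics

# Hypothetical population of string lengths, invented for illustration.
population = [2] * 500 + [20] * 500
true_mean = statistics.mean(population)

def srs_mean(rng, n):
    """Mean of a simple random sample (every string equally likely)."""
    return statistics.mean(rng.sample(population, n))

def size_biased_mean(rng, n):
    """Mean of a sample in which longer strings are more likely to be picked."""
    return statistics.mean(rng.choices(population, weights=population, k=n))

def summarize(sampler, n, reps=200, seed=0):
    """Average and spread of the sample mean over repeated samples."""
    rng = random.Random(seed)
    means = [sampler(rng, n) for _ in range(reps)]
    return statistics.mean(means), statistics.stdev(means)

results = {(name, n): summarize(sampler, n)
           for name, sampler in [("srs", srs_mean), ("size-biased", size_biased_mean)]
           for n in (10, 100)}

print(f"population mean: {true_mean}")
for (name, n), (average, spread) in results.items():
    print(f"{name}, n={n}: average of sample means={average:.2f}, spread={spread:.2f}")
```

A larger sample narrows the spread for both methods, but the size-biased sample means stay centered well above the population mean.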
53STATISTICS SAY
- 45% +/- 1% of people surveyed claim to prefer watching soccer to baseball
54Reese's Pieces
55Reese's Pieces
- Take a sample of 25 candies
- Sort by color
- Calculate the proportion of orange candies in your sample
- Construct a dotplot of the distribution of sample proportions
56Reese's Pieces
- Did everyone obtain the same sample result?
- Is there a pattern to the sample results?
- Is it possible to make predictions about the population based on only one sample?
- Can you be confident of your prediction?
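The classroom activity can be imitated with a short simulation. The population proportion of orange candies (0.45) and the class size (40) are assumptions made for this sketch; the slides state neither.

```python
import random
import statistics
from collections import Counter

ASSUMED_P_ORANGE = 0.45   # hypothetical population proportion; not from the slides
SAMPLE_SIZE = 25          # candies per sample, as in the activity

def sample_proportion(rng, p=ASSUMED_P_ORANGE, n=SAMPLE_SIZE):
    """Proportion of orange candies in one random sample of n candies."""
    orange = sum(rng.random() < p for _ in range(n))
    return orange / n

def dotplot(proportions):
    """Text dotplot: one row per observed proportion, one dot per sample."""
    counts = Counter(proportions)
    return "\n".join(f"{value:.2f} {'.' * counts[value]}" for value in sorted(counts))

rng = random.Random(0)
proportions = [sample_proportion(rng) for _ in range(40)]   # e.g. a class of 40
print(dotplot(proportions))
print(f"mean of sample proportions: {statistics.mean(proportions):.3f}")
```

The samples disagree with one another, yet they pile up around the population value, which is what makes a prediction from a single sample possible when it comes with a margin of error.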
57Thank You
- bchance_at_calpoly.edu
- rpeck_at_calpoly.edu