Title: Fundamentals of Applied Statistics
1Fundamentals of Applied Statistics
- Gerald van Belle
- Department of Biostatistics and Department of Occupational and Environmental Health Sciences
- University of Washington
- Seattle, WA
2Damaged Spitfire
3Vulnerability Analysis of Spitfires (sample
15/400)
4Vulnerability analysis
Abraham Wald's advice: reinforce planes where they have not been hit.
6Theme for this Week
7Basic Biostatistics: Theme for this Week
- What is the question?
- Is it measurable?
- Where will you get the data?
- What do you think the data are telling you?
8Outline of talk
- Variation and causation
- Question
- Measurable
- Get data
- Interpret
9Two definitions
- A.N. Whitehead (1925) on science: "a vehement and passionate interest in the relation of general principles to irreducible and stubborn facts."
- Statistics: a vehement and passionate interest in the relation of general principles to variation: variation observed, variation managed, and variation induced.
10What is the question? Is it measurable?
"Not everything that can be counted counts, and not everything that counts can be counted." Albert Einstein
(The things that really count can't be counted, or are very difficult to count.)
11Questions that count and may be countable
- Are pesticides neurotoxic?
- Does air pollution cause ill health?
- Is obesity increasing in Costa Rica?
- Does Prozac increase suicide rate?
- Why does it take 10 years to get a new drug on the market?
- Your question?
12Issues in measurability
- Not measurable, e.g. ethical imperative
- Latent trait, e.g. cognition
- Takes too long, e.g. survival in clinical trial (use surrogate outcomes, e.g. blood pressure)
- Measurement destroys object, e.g. light bulb
- Measurement alters response, e.g. Hawthorne effect
- Other issues?
- May not be able to summarize with one number (next slide)
13Statistician drowning in river of average depth
25 cm
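The point of the drowning statistician can be made concrete with a minimal sketch. The depth readings below are hypothetical, invented only so that the average comes out at 25 cm; they are not from the talk.

```python
from statistics import mean

# Hypothetical depth readings (cm) across a river; invented for illustration.
depths_cm = [5, 5, 5, 5, 5, 5, 5, 5, 5, 205]

average = mean(depths_cm)   # the single-number summary
deepest = max(depths_cm)    # what the summary hides

print(f"average depth: {average} cm, deepest point: {deepest} cm")
```

The average alone says nothing about the one spot deep enough to drown in, which is why a measure of spread or the full distribution may be needed.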
14 A. Working with variation
- Describing and classifying variation
- Selection
- Controlling variation
- Inducing variation
- Working with variation
- Dealing with missing data
15 1. Describing and classifying variation
- We tell stories of abnormality: air travel horror stories, laptop disasters
- We sort into genres: art, biology, literature. Concept of population; characteristics of population and sample
- Variation in time, space, social structures: waves on beach (non-stationarity), hierarchy, social class
- We make inferences based on limited data, and often get the wrong population: basis for a great deal of humor (switch in expectation)
16 2. Selection in the face of variation
- Need to know selection mechanism; random selection as gold standard
- Representativeness: Kruskal and Mosteller papers; slippery concept; large sample vs small sample
17 2. Selection in the face of variation
- State question
- Define measurement(s)
- Define population of inference
- Specify selection mechanism
- Random selection is gold standard
- Random sample is representative sample
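As a minimal sketch of "specify selection mechanism", the snippet below draws a simple random sample without replacement. The population of 400 numbered units, the sample size of 15, and the seed are all assumptions chosen for illustration.

```python
import random

# Hypothetical sampling frame: 400 unit identifiers (invented for illustration).
population = list(range(1, 401))

rng = random.Random(2024)             # fixed seed so the draw is reproducible
sample = rng.sample(population, 15)   # simple random sample, without replacement

print(sorted(sample))
```

Because every unit in the frame has the same chance of selection, the selection mechanism is known, which is what makes inference back to the population defensible.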
18 3. Controlling variation
- Clearest examples in sports: divisions, junior
- Societal examples: min, max speed limits; occupational (noise limits, flying hours); permits and more permits (Dutch "vergunningen")
- Blocking in statistics
19 4. Inducing variation
- Antitrust laws: increase competition, i.e. variability
- Draft system in sports: teams more equal, P(win) near 1/2
- Societal: admission to medical school in Holland; representativeness (slippery concept)
- Key to clinical trials
20 5. Working with variation
- Statisticians are expert at working with variation
- Example from Dorfman (1943)
- Situation: assay for syphilis in 1000s of army recruits, relatively few of whom had syphilis
- Pool n samples; if negative, stop. If positive, test all n samples.
21Dorfman sampling efficiency
Assume pool of size n
Probability subject positive: p
X: number of assays needed
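With n, p and X as defined on this slide, the distribution and expectation of X can be sketched as follows. Two assumptions are added here that the slide does not state: subjects are positive independently of one another, and the assay is error-free.

```latex
\begin{align*}
P(X = 1) &= (1-p)^n, \qquad P(X = n+1) = 1 - (1-p)^n, \\
E[X] &= 1\cdot(1-p)^n + (n+1)\bigl[1 - (1-p)^n\bigr] = n + 1 - n(1-p)^n .
\end{align*}
```

One assay suffices when the whole pool is negative; otherwise the pooled assay plus n individual assays are needed.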
22Dorfman sampling efficiency, Eff
Efficiency of pooling relative to no pooling by
size of pool (n) and prevalence of occurrence of
event (p)
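A minimal sketch of how such efficiencies could be computed is given below. It assumes independent subjects and an error-free assay, and it assumes efficiency means the n assays needed without pooling divided by the expected number of assays with pooling; the slide's own definition of Eff may differ. The function names and the grid of n and p values are illustrative.

```python
def expected_assays(n: int, p: float) -> float:
    """Expected number of assays for one pool of size n under Dorfman's scheme."""
    prob_pool_negative = (1.0 - p) ** n
    # One assay if the pool is negative, n + 1 assays if it is positive.
    return 1.0 * prob_pool_negative + (n + 1) * (1.0 - prob_pool_negative)


def efficiency(n: int, p: float) -> float:
    """Assumed definition: assays without pooling (n) / expected assays with pooling."""
    return n / expected_assays(n, p)


# Illustrative grid of prevalences and pool sizes.
for p in (0.01, 0.05, 0.10):
    row = ", ".join(f"n={n}: {efficiency(n, p):.2f}" for n in (2, 5, 10, 20))
    print(f"p={p:.2f} -> {row}")
```

Values above 1 mean pooling saves assays. At high prevalence most pools test positive and pooling costs more than testing everyone individually.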
23 6. Missing data
- Serious problem, obviously
- Spitfire example
- Impacts population of inference
- Anatomy of missingness
- Normal (e.g. pediatrician chart)
- Transcription error
- Just not there (Murphy was here)
- Deliberately missing (e.g. extended testing on
subset of patients)
24Another anatomy of missingness
- as we know,
- there are known knowns
- there are things we know we know.
- We also know there are known unknowns
- that is to say,
- we know there are some things
- we do not know.
- But there are also unknown unknowns
- the ones we don't know we don't know.
- Donald Rumsfeld
(set to music, see NPR website)
25Translation into modern statistics
- as we know,
- there are known knowns
- there are things we know we know.
- We also know there are known unknowns
- that is to say,
- we know there are some things
- we do not know.
- But there are also unknown unknowns
- the ones we don't know we don't know.
- Donald Rumsfeld
Known knowns: non-missing. Known unknowns: MCAR/MAR. Unknown unknowns: non-ignorable.
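The practical difference between these categories can be sketched with a small simulation. Everything in it is an assumption made for illustration: the normal distribution of the outcome, the missingness rates, and the rule that larger values go missing more often in the non-ignorable case.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed for reproducibility

# Hypothetical outcome values; distribution and sample size are invented.
values = [rng.gauss(100, 15) for _ in range(20_000)]

# MCAR: every value has the same 30% chance of being missing.
mcar_observed = [v for v in values if rng.random() > 0.30]

# Non-ignorable: missingness depends on the value itself
# (60% missing above 100, 10% missing otherwise).
nonignorable_observed = [
    v for v in values
    if rng.random() > (0.60 if v > 100 else 0.10)
]

print(f"full data mean:              {mean(values):.1f}")
print(f"MCAR observed mean:          {mean(mcar_observed):.1f}")
print(f"non-ignorable observed mean: {mean(nonignorable_observed):.1f}")
```

Under MCAR the observed mean stays close to the full-data mean. Under the non-ignorable mechanism it is pulled downward, and nothing in the observed data alone reveals by how much.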
26Now it's your turn
27 B. Working with causation
- Hard-wired to look for causes
- Aristotle's four causes
- Usual state of nature
- Establishing cause-effect in science
- Observational and experimental data
28 1. Hard-wired to look for causes
- Yesterday, the building is shaking
- Peter Jennings story
- Accident reports
29A few accident reports
- 1. Coming home I drove into the wrong house and collided with a tree I don't have.
- 2. A truck backed through my windshield into my wife's face.
- 3. I saw a slow moving, sad-faced old gentleman as he bounced off the roof of my car.
- 4. I had been driving for forty years when I fell asleep at the wheel and had an accident.
30 2. Aristotle's four causes
- Material cause (table made of wood)
- Formal cause (four legs and flat top make this a table)
- Efficient cause (carpenter makes a table)
- Final cause (surface for eating or writing makes this a table)
(From S.M. Cohen, U Washington)
31 3. Usual state of nature
- Explanations after accident
- Crime in search of criminal
- Sickness in search of cause
- Child's behavior and parent responsibility
32 4. Establishing cause-effect in science
- Causation requires longitudinal data
- Randomized experiments intrinsically so
- Cohort study closest non-experimental analogue
- Usually a higher standard than law (law: more likely than not, p > 0.50; statistics: significance level, p > 0.95)
33 5. Observational vs experimental studies
Characteristic        Observational      Experiment
Ethical issues        Fewer              More
Researcher control    Less               More
Orientation           Retrospective      Prospective
Selection bias        Big problem        Less
Confounding           Present            Absent
Realism               More               Less
Causal plausibility   Weaker             Stronger
Analysis              More complicated   Less
34 5. Causation and non-experimental data
- Selection bias
- Where did you get the data?
- Confounding
- What do you think the data are telling you?
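Confounding in non-experimental data can be illustrated with a minimal sketch. The counts below are hypothetical, invented so that exposure has no effect within either age stratum while the pooled comparison suggests a strong one. The stratum names and the helper functions are illustrative.

```python
# Hypothetical counts (invented for illustration): (ill, total) by age stratum and exposure.
counts = {
    "young": {"exposed": (20, 200), "unexposed": (80, 800)},
    "old":   {"exposed": (240, 800), "unexposed": (60, 200)},
}


def risk(ill: int, total: int) -> float:
    """Proportion ill in one cell."""
    return ill / total


def crude_risk(group: str) -> float:
    """Risk in an exposure group after pooling over strata (ignores the confounder)."""
    ill = sum(counts[s][group][0] for s in counts)
    total = sum(counts[s][group][1] for s in counts)
    return ill / total


crude_rr = crude_risk("exposed") / crude_risk("unexposed")
stratum_rr = {
    s: risk(*counts[s]["exposed"]) / risk(*counts[s]["unexposed"])
    for s in counts
}

print(f"crude risk ratio: {crude_rr:.2f}")
for s, rr in stratum_rr.items():
    print(f"risk ratio within {s}: {rr:.2f}")
```

In these invented data the exposed group is mostly old and the old are more often ill, so the crude comparison reflects age rather than exposure. Stratifying on the confounder removes the apparent effect.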
35Causal assertion. Is it measurable?
36Basic Biostatistics Applied to NYT
- What is the question?
- Is it measurable?
- Where will you get the data?
- What do you think the data are telling you?
37Basics of Statistics, Basics of Life