Title: Lecture 2: Statistical Overview
1. Lecture 2: Statistical Overview
Child Psychiatry Research Methods Lecture Series
- Elizabeth Garrett
- esg_at_jhu.edu
2. Two Types of Statistics
- Descriptive Statistics
- Uses sample statistics (e.g. mean, median,
standard deviation) to describe the sample and
the population from which it was drawn.
- Not decision oriented
- Pilot studies are descriptive
- Statistical Inference
- Inference: The act of passing from statistical
sample data to generalizations ... usually with
calculated degrees of certainty.
- Key elements
- sample
- generalizations
- certainty
- Often used for making decisions
- drug works or it doesn't
- ADHD is genetically inherited or it isn't
3. Example 1: Viral Exposure and Autism (Deykin
and MacMahon, 1979)
- Hypothesis
- Direct exposure to or clinical illness with
measles, mumps, or chicken pox may play a causal
role in autism.
4. Example 2: Neurobiology of Attention in Fetal
Alcohol Syndrome (Lockhart, 2001?)
- Hypotheses
- (1) The neurobiological basis of problems in
response inhibition and motor impersistence in
children with FAS is related to abnormalities in
the anterior frontostriatal network.
- (2) The neurobiological basis of problems in
orienting/shifting attention in children with FAS
is related to abnormalities in the posterior
parietal network.
5. 4 Statistical Plan
4.1 Primary outcome(s)
4.2 Statistical analysis
4.3 Sample size justification
6. 4 Statistical Plan
4.1 Primary outcome(s)
Common problem: primary outcome variable not defined!
7. Defining Primary Outcome Variables
- Continuous
- MRI volumes
- fMRI activation levels
- blood pressure
- response time
- number of voxels activated
- cost of hospital visit
- neurobehavioral test score
8.
- Categorical
- Nominal
- Binary (two categories)
- gene carrier status (as diagnosed by ...)
- measles (as diagnosed by ...)
- ADHD (as diagnosed by ...)
- Polychotomous (more than two unordered
categories)
- region of activation
- Ordinal
- severity score (see BPI)
- symptom rating
- on a scale of 1 to 5.
9.
- Example 1: Primary outcomes
- Disease history of
- measles
- mumps
- chicken pox
- Example 2: Primary outcomes
- MRI volumes of
- corpus callosum
- caudate
- cerebellar vermis
- parietal lobes
- frontal lobe
10. 4 Statistical Plan
4.1 Primary outcomes
- Be clear about each variable and how it is
measured.
- NOT okay to say "our primary outcome variable
is cognition."
- It IS okay to say "our primary outcome variable
is cognition as measured by the WISC-III."
- Multiple outcomes are okay, e.g. MRI volumes
and cognitive tests can both be primary outcomes.
11. 4 Statistical Plan
4.1 Primary outcome(s)
4.2 Statistical analysis
- How are you going to answer the specific aims
using the primary outcome variable?
12.
- Commonly seen statistical methods in analysis
plans
- t-test
- confidence interval
- Chi-square test
- Fisher's exact test
- linear regression
- logistic regression
- Wilcoxon rank sum test
- ANOVA
- GEE
13. Key Idea: Data Reduction
- Statistics is the art/science of summarizing a
large amount of information by just a few numbers
and/or statements.
- Examples
- p-value = 0.01
- OR = 5.0
- prevalence = 0.20 ± 0.05
14. Example 1
- Recall aim: To compare measles history in
autistic versus non-autistic kids.
- Methods
- Odds ratio: Quantifies risk of disease in two
exposure groups
- Confidence interval: Answers "What is a reasonable
range for the true odds ratio?"
- Fisher's exact test: Answers "Is the risk the
same in the two exposure groups?"
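The three methods on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2x2 counts are invented (they are not the Deykin and MacMahon data), the interval uses the Woolf log-odds approximation (the lecture does not say which interval method is intended), and all function names are hypothetical.

```python
from math import comb, exp, log, sqrt
from statistics import NormalDist


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def woolf_ci(a, b, c, d, level=0.95):
    """Approximate CI for the odds ratio, built on the log scale (Woolf method)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = log(odds_ratio(a, b, c, d))
    return exp(log_or - z * se), exp(log_or + z * se)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value: sum the probabilities of all tables
    with the same margins that are no more likely than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts (assumption, not study data):
# rows = autistic / non-autistic, columns = measles history yes / no
a, b, c, d = 6, 94, 4, 196
print("OR =", round(odds_ratio(a, b, c, d), 2))
print("95% CI =", woolf_ci(a, b, c, d))
print("Fisher p =", fisher_exact_two_sided(a, b, c, d))
```

With rare exposures the Woolf interval can be rough, so an exact interval may be preferable in a real analysis plan.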
15. Statistical Analysis
- We will measure the risk of autism associated
with measles using an odds ratio. Significance
will be assessed by Fisher's exact test and a 95%
confidence interval will be calculated.
16. Example 2
- Recall aim: To compare MRI volumes in FAS kids
and controls.
- Methods
- Two-sample t-test: Answers "Are the mean volumes
in the two groups different?"
- 95% confidence interval: Answers "What is the
estimated difference in volumes in the two
groups, approximately?"
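As a rough illustration of the two methods named here, the sketch below computes an equal-variance (pooled) two-sample t statistic and a 95% confidence interval for the difference in means. The volumes are invented for the example, the function name is hypothetical, and the critical value is passed in by hand because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(x, y, t_crit):
    """Equal-variance two-sample t statistic, degrees of freedom, and a CI
    for mean(x) - mean(y). t_crit is the t quantile for the desired level."""
    n1, n2 = len(x), len(y)
    diff = mean(x) - mean(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return diff / se, df, (diff - t_crit * se, diff + t_crit * se)


# Hypothetical volumes (assumption, not study data)
control = [450, 520, 380, 470, 430]
fas = [400, 350, 460, 390, 400]

# 2.306 is the 97.5th percentile of the t distribution with 8 degrees of freedom
t_stat, df, ci = pooled_t_test(control, fas, t_crit=2.306)
print("t =", t_stat, "df =", df, "95% CI =", ci)
```

The t statistic addresses whether the means differ. The interval addresses how large the difference plausibly is.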
17. Statistical Analysis
- To answer the specific aims, we will compare the
caudate volumes in the FAS group to those in the
control group using a two-sample t-test. We will
also estimate a 95% confidence interval to
provide a reasonable range of the difference in
mean volumes in the two groups.
18. 4 Statistical Plan
4.1 Primary outcome(s)
4.2 Statistical analysis
- Data reduction is key: How are you going to
combine information from all patients to answer
the scientific question?
- Specific methods need to be designated.
- Study design often changes after statistical
issues are considered!
19. 4 Statistical Plan
4.1 Primary outcome(s)
4.2 Statistical analysis
4.3 Sample size justification
- Do you have enough subjects to answer the
question, but not so many that the study is
inefficient (in terms of money and risks)?
20. Power and Sample Size Considerations
- All about precision! (Recall Craig last time)
- Intuition
- the more individuals, the better your estimate
- the more individuals, the less variability in
your estimate
- the more individuals, the more precise your
estimate
- but, how precise does your estimate need to be?
- Example 1
- Odds ratio of measles for autism = 3.7
- Interpretation: Babies exposed to measles
prenatally or in early infancy are at 3.7 times
the risk for autism compared to children who are
unexposed.
- Strong result?
21. Three Theoretical Outcomes
22. Actual Result from Study
95% confidence interval: (0.97, 14.2)
Fisher's exact p-value = 0.12
23. Magnitude versus Significance
- Magnitude of finding: How big is the odds ratio?
- Statistical significance of the finding: Is the
odds ratio different from 1?
- Clinical significance of the finding: Is the
size of the estimated odds ratio worth worrying
about?
- Autism and Measles
- exposure to measles is rare
- need a lot of subjects to show a significant
difference!
24. Justifying sample size in a study design
- Hypothesis testing
- Ho: OR = 1
- Ha: OR = 3
- Which is a more reasonable conclusion?
- Issues
- type I error (α)
- type II error (β)
[Figure labels: Ha, Ho]
25. Type I and II Errors
- Type I error (α)
- The probability that we reject Ho given that it
is true
- The probability that we find an association
between measles and autism when, in truth, one
does not exist.
- Type II error (β)
- The probability that we reject Ha given that it
is true
- The probability that we find no association
between measles and autism when, in truth, one
does exist.
- Note: Power = 1 - β
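One way to make these definitions concrete is to simulate many studies and count how often the null hypothesis is rejected. The sketch below is an assumption-laden toy, not the lecture's method: it swaps the odds-ratio setting for a continuous outcome with a known standard deviation and a two-sided z-test, and the sample size, standard deviation, and function name are all hypothetical.

```python
import random
from math import sqrt


def rejection_rate(true_diff, n_per_group, sd, n_sims=2000, seed=1):
    """Fraction of simulated studies in which a two-sided z-test
    (known sd, alpha = 0.05) rejects H0: no difference between groups."""
    rng = random.Random(seed)
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference in means
    rejections = 0
    for _ in range(n_sims):
        group_a = [rng.gauss(0, sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(true_diff, sd) for _ in range(n_per_group)]
        z = (sum(group_b) - sum(group_a)) / n_per_group / se
        if abs(z) > 1.96:
            rejections += 1
    return rejections / n_sims


# When H0 is true, every rejection is a type I error
alpha_hat = rejection_rate(true_diff=0, n_per_group=32, sd=70)
# When Ha is true, the rejection rate is the power, and 1 - power is the type II error
power_hat = rejection_rate(true_diff=50, n_per_group=32, sd=70)
print("estimated type I error:", alpha_hat)
print("estimated power:", power_hat, "estimated type II error:", 1 - power_hat)
```

The first estimate should sit near the chosen alpha of 0.05. The second depends on the sample size and on how far apart the two hypotheses are.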
26. Sample size dictates overlap
[Figure panels: Scenario 1 (small samples); Scenario 2 (large samples)]
27. Decision Rule
- Before study is completed, you know what you need
to observe to find evidence for OR = 1 or OR = 3
- Scenario 1: If observed OR > 3.6, then conclude
that there IS an association
- Scenario 2: If observed OR > 1.6, then conclude
that there IS an association.
28. Type I Error: alpha
Alpha is usually predetermined: 0.05
29. Type II Error: beta
Beta is figured out conditional on alpha.
If sample size is small, beta will be big (e.g. β = 0.60)
If sample size is big, beta will be small (e.g. β = 0.02)
30. Power = 1 - beta
Power is 1 - beta.
If sample size is small, power will be small (e.g. power = 0.40)
If sample size is large, power will be large (e.g. power = 0.98)
31. Power/Sample Size Estimate
- Kids with Autism: N = 608
- Kids without Autism: N = 1216
- Using Fisher's exact test, we have 80% power
with alpha = 0.05 to detect an odds ratio of 3 if
we enroll 608 children with autism and 1216
normal controls. This assumes that 3% of
autistic children have been exposed to measles
and 1% of the controls have been exposed.
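The power statement above can be sanity-checked with a quick calculation. The sketch below is an approximation under stated assumptions: it reads the slide's exposure rates as 3% and 1%, and it uses a normal approximation for comparing two proportions rather than Fisher's exact test, so it should land near the quoted 80% but not exactly on it (the exact test is typically somewhat more conservative). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions
    (normal approximation, pooled variance under H0)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = abs(p1 - p2)
    # Probability of landing in either rejection region when Ha is true
    return (norm.cdf((diff - z_crit * se_null) / se_alt)
            + norm.cdf((-diff - z_crit * se_null) / se_alt))


p_cases, p_controls = 0.03, 0.01  # exposure rates, read as 3% and 1%
implied_or = (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))
power = power_two_proportions(p_cases, p_controls, n1=608, n2=1216)
print("implied odds ratio:", round(implied_or, 2))
print("approximate power:", round(power, 2))
```

The implied odds ratio also confirms that 3% versus 1% exposure corresponds to an odds ratio of about 3.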
32. Sample Size Table (80% power, alpha = 0.05)
33. Example 2: FAS and controls
- How many FAS children and controls do we need to
detect a significant difference in MRI volumes?
- From previous research we can estimate (i.e.
guess)
- Volumes of cerebellar vermis in FAS kids are
approximately 400.
- It would be interesting if FAS kids had volumes
at least 10% smaller than those of normal
controls (i.e. 400 versus 450).
34. Sample size needed depends on overlap between FAS
and control kids.
[Figure: two panels, each labeled control and FAS]
35. Two-sample t-test
- Same general approach as the odds ratio
- Define Δ = difference in mean volumes
= control mean - FAS mean
- H0: Δ = 0
- Ha: Δ = 50
- Same thing: which hypothesis is more reasonable
based on our data?
- Note: Based on previous research, we can
estimate that the standard deviation of volumes is
70.
36. What if N = 100 (50 per group)?
Alpha = 0.05
Beta = 0.06
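The beta on this slide can be roughly reproduced from the numbers on the previous one (a difference of 50 and a standard deviation of 70). The sketch below is a minimal approximation, assuming the normal distribution in place of the t distribution. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test of means,
    using the normal distribution in place of the t distribution."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Expected test statistic when the true difference is delta
    shift = delta / (sd * sqrt(2 / n_per_group))
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


power = power_two_means(delta=50, sd=70, n_per_group=50)
print("power:", round(power, 3), "beta:", round(1 - power, 3))
```

The approximation gives a beta a little above 0.05. A t-based calculation would be slightly larger, in line with the slide's 0.06.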
37. Power/Sample Size Options
- For power = 80%, alpha = 0.05
- 32 FAS and 32 controls
- For power = 90%, alpha = 0.05
- 43 FAS and 43 controls
- To achieve 80% power with a type I error of 5%,
we require 32 FAS kids and 32 controls. This
will allow us to detect a 10% difference in mean
MRI volumes of cerebellar vermis (400 versus 450,
respectively) assuming standard deviations of 70
in each group.
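The sample sizes above can be approximated with the usual closed-form formula for comparing two means, n per group = 2 (z_alpha + z_beta)^2 sd^2 / delta^2. The sketch below assumes that normal-approximation formula, which the lecture does not state, and uses a hypothetical function name.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, power=0.80, alpha=0.05):
    """Subjects needed per group to detect a mean difference delta
    with a two-sided test (normal approximation)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z_beta = norm.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


n80 = n_per_group(delta=50, sd=70, power=0.80)
n90 = n_per_group(delta=50, sd=70, power=0.90)
print(n80, n90)  # 31 42
```

These come out one subject per group below the slide's 32 and 43, which is the size of difference one would expect if the slide's figures were computed with the t distribution rather than the normal.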
38. 4 Statistical Plan
4.1 Primary outcome(s)
4.2 Statistical analysis
4.3 Sample size justification
- Explain justification in terms of statistics.
Saying "we are confident that 10 subjects will
provide ..." is not sufficient.
39. General Biostatistics References
- Practical Statistics for Medical Research.
Altman. Chapman and Hall, 1991.
- Medical Statistics: A Common Sense Approach.
Campbell and Machin. Wiley, 1993.
- Principles of Biostatistics. Pagano and
Gauvreau. Duxbury Press, 1993.
- Fundamentals of Biostatistics. Rosner. Duxbury
Press, 1993.