Title: Experimental Design
1Experimental Design
- Shoo K Lee, MBBS, FRCPC, FAAP, PhD
- Canada Research Chair
- Professor of Pediatrics, University of Alberta
- Scientific Director, iCARE
2Study Design
3Precision and accuracy
- PRECISION is a measure of how closely repeated
measurements or estimates agree with each other,
usually expressed through the standard error.
- ACCURACY is the measure of how close a measured
value is to the true or expected value.
4Standard deviation
- The standard deviation is a statistic that tells
you how tightly the values in a set of data are
clustered around the mean
- SD is a measure of the spread of the values
5Standard error of the mean
- Standard Error of the Mean is the standard
deviation of the sampling distribution of the
mean, i.e. of the differences between the
estimated means and the true mean. It is
estimated as SD divided by the square root of the
sample size.
- SE is used to provide measures of uncertainty,
e.g. to calculate Confidence Intervals
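As a minimal sketch of how SD and SE relate, the snippet below computes both for a small sample. The birth weights are hypothetical values chosen only for illustration, and the variable names are not from the slides.

```python
import math
import statistics

# Hypothetical sample of birth weights in grams (illustrative values only).
weights = [3100, 3400, 2900, 3600, 3300, 3200, 3500, 3000]

n = len(weights)
mean = statistics.mean(weights)
sd = statistics.stdev(weights)   # sample SD (n - 1 in the denominator)
sem = sd / math.sqrt(n)          # standard error of the mean

print(f"n = {n}, mean = {mean:.1f}, SD = {sd:.1f}, SEM = {sem:.1f}")
```

The SD describes the spread of the individual values, while the SEM shrinks as the sample size grows because it describes the uncertainty of the mean itself.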
6Mean, Median, Mode
- The Mean of the data set is its average
- The Median is the number which is in the exact
middle of the data set when the values are sorted
- The Mode is the value that appears the most
often for a single variable
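A short sketch of the three measures using Python's standard `statistics` module. The data set is hypothetical and was picked so that the three values differ.

```python
import statistics

# Hypothetical, right-skewed data set (illustrative values only).
values = [1, 2, 2, 3, 4, 7, 9]

mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequent value

print(f"mean = {mean}, median = {median}, mode = {mode}")
```

In skewed data like this the mean is pulled toward the long tail, which is one reason the median is often reported alongside it.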
7Null Hypothesis
- Null hypothesis is a hypothesis set up to be
nullified or refuted in order to support an
alternative hypothesis.
- Null hypothesis states that the results observed
in a study are no different from what might have
occurred as a result of the play of chance
8Errors
- Alpha (Type 1) error is rejecting the null
hypothesis when it is true
- Beta (Type 2) error is failing to reject the null
hypothesis when it is false
9Power
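- If H1 is true (that is, the distribution of X is
specified by H1), then P(X ∈ R), the probability
that X falls in the rejection region R so that H0
is rejected (and thus a correct decision is
made), is known as the power of the test for
that distribution
- Power = 1 - Beta (the probability of a Type 2
error)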
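To make the definition concrete, here is a minimal sketch for one specific case that is assumed for illustration and not taken from the slides: a one-sided z-test of a mean with known standard deviation. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = mu0 against H1: mu = mu1 > mu0.

    Assumes a normally distributed sample mean with known sigma.
    """
    se = sigma / sqrt(n)
    # Rejection region R: sample mean above this critical value.
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Power = P(sample mean falls in R) when H1 is true.
    return 1 - NormalDist(mu1, se).cdf(critical)

# Hypothetical study: detect a shift from 100 to 105 with SD 15 and n = 50.
power = power_one_sided_z(mu0=100, mu1=105, sigma=15, n=50)
print(f"power = {power:.3f}")
```

When `mu1` equals `mu0` the function returns `alpha`, which matches the definition of a Type 1 error rate.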
10Confidence intervals
- A confidence interval gives an estimated range of
values which is likely to include an unknown
population parameter
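A minimal sketch of a confidence interval for a mean, assuming a sample large enough for the normal approximation (for small samples a t-distribution would normally be used instead). The function name and the summary statistics are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * sd / sqrt(n)                  # z times the standard error
    return mean - half_width, mean + half_width

# Hypothetical summary statistics: mean 3250 g, SD 245 g, 100 infants.
low, high = mean_confidence_interval(mean=3250, sd=245, n=100)
print(f"95% CI: {low:.1f} to {high:.1f}")
```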
11Probability Distribution
- The probability distribution of a discrete random
variable is a list of probabilities associated
with each of its possible values.
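As an illustration, the sketch below lists the probability distribution of one discrete random variable: the number of heads in three tosses of a fair coin. The binomial example and the function name are chosen here for illustration and are not from the slides.

```python
from math import comb

def binomial_pmf(n, p):
    """Map each possible count k (0..n) to its binomial probability."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Number of heads in 3 tosses of a fair coin.
distribution = binomial_pmf(n=3, p=0.5)
for k, prob in distribution.items():
    print(f"P(X = {k}) = {prob}")
```

The probabilities over all possible values sum to 1, which is a quick check on any discrete distribution.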
12Prevalence and incidence
- Prevalence - the proportion of a population that
has a condition at a given point in time
- Incidence - the number of new occurrences of a
condition in a population over a period of time.
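A small worked example of the difference, using hypothetical counts. Dividing new cases by the population at risk (a cumulative incidence) is a common convention assumed here; the slides define incidence only as a count.

```python
# Hypothetical community survey (all numbers are illustrative).
population = 10_000
existing_cases = 500   # people with the condition on the survey date
new_cases = 100        # new diagnoses during the following year

# Point prevalence: existing cases as a share of the whole population.
prevalence = existing_cases / population

# Cumulative incidence: new cases among those initially free of the condition.
at_risk = population - existing_cases
cumulative_incidence = new_cases / at_risk

print(f"prevalence = {prevalence:.1%}")
print(f"one-year cumulative incidence = {cumulative_incidence:.2%}")
```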
13Sensitivity and specificity
- Sensitivity refers to how good a test is at
correctly identifying people who have the disease
- Specificity refers to how good the test is at
correctly identifying people who do not have the
disease
14False positive and negative
- False positive is a result that is erroneously
positive when a situation is normal (Type 1
error).
- False negative is a result that shows no evidence
of the disease although the disease is actually
present (Type 2 error).
15Positive and Negative predictive
- Positive predictive value - how often a patient
with a positive test has the disease
- Negative predictive value - how often a patient
with a negative test does not have the disease
16Disease and Test
[2 x 2 table of disease status against test result]
17Calculation of Sensitivity, Specificity, PPV, NPV
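TP = true positive, FP = false positive,
FN = false negative, TN = true negative
- SENSITIVITY = TP / (TP + FN) × 100
- SPECIFICITY = TN / (FP + TN) × 100
- POSITIVE PREDICTIVE VALUE = TP / (TP + FP) × 100
- NEGATIVE PREDICTIVE VALUE = TN / (FN + TN) × 100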
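The four calculations can be sketched as one small function that takes the counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). The function name and the counts are hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV as percentages."""
    return {
        "sensitivity": 100 * tp / (tp + fn),  # diseased who test positive
        "specificity": 100 * tn / (fp + tn),  # non-diseased who test negative
        "ppv": 100 * tp / (tp + fp),          # positive tests that are correct
        "npv": 100 * tn / (fn + tn),          # negative tests that are correct
    }

# Hypothetical screening study of 1,000 people, 100 of whom have the disease.
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

Sensitivity and specificity are computed within the diseased and non-diseased groups, so they do not depend on how common the disease is; PPV and NPV do.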
18Stratification
- Data collected about a problem may represent
multiple sources that need to be treated
separately
- Stratification is a technique to separate the
data so that patterns can be seen.
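A minimal sketch of stratification, assuming the data are simple (stratum, outcome) pairs. The hospitals and outcomes are hypothetical and serve only to show how a pooled rate can hide differences between sources.

```python
from collections import defaultdict

# Hypothetical records: (hospital, outcome) with 1 = infected, 0 = not infected.
records = [
    ("A", 1), ("A", 0), ("A", 0), ("A", 0),
    ("B", 1), ("B", 1), ("B", 1), ("B", 0),
]

def rate_by_stratum(rows):
    """Return the event rate within each stratum."""
    events = defaultdict(int)
    totals = defaultdict(int)
    for stratum, outcome in rows:
        events[stratum] += outcome
        totals[stratum] += 1
    return {s: events[s] / totals[s] for s in totals}

overall_rate = sum(outcome for _, outcome in records) / len(records)
stratum_rates = rate_by_stratum(records)

print(f"overall: {overall_rate:.2f}")
for stratum, rate in sorted(stratum_rates.items()):
    print(f"hospital {stratum}: {rate:.2f}")
```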
19Types of Variables
- Categorical (or nominal) - one that is given by a
list of categories or classes, e.g. eye color
- Ordinal - one that orders (or ranks) data in
terms of degree, e.g. score
- Continuous - a quantitative variable that can
take any of an infinite number of values within a
range, e.g. length
20Parametric and Non-parametric Tests
- Parametric Test - A statistical test in which
assumptions are made about the underlying
distribution of observed data
- Non-Parametric tests are used in place of their
parametric counterparts when certain assumptions
about the underlying population are questionable.
21- PARAMETRIC
- Chi-square
- Fisher's exact test
- Student's t-test
- ANOVA
- Logistic regression
- NON-PARAMETRIC
- Signed rank test
- Rank sum test
22Parametric Tests
- Chi-square test is used to examine differences
between groups for categorical variables
- Fisher's exact test should be used when the
expected frequency is < 5 in any cell of the
contingency table
- Student's t-test assesses whether the means of
two groups are statistically different from each
other
- ANOVA tests the hypothesis that the means among
two or more groups are equal
- Logistic regression describes the relationship
between a dichotomous response variable and a set
of explanatory variables
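As a sketch of the chi-square calculation for a 2 x 2 contingency table, the function below computes the Pearson statistic without continuity correction, its p-value for 1 degree of freedom, and the smallest expected frequency (to check the "< 5" rule for switching to Fisher's exact test). The function name and the counts are hypothetical; a statistics package would normally be used in practice.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    Returns (chi-square statistic, p-value, smallest expected frequency).
    """
    (a, b), (c, d) = table
    total = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value, min_expected

# Hypothetical trial: rows are treatment groups, columns are outcome yes / no.
chi2, p_value, min_expected = chi_square_2x2([[20, 30], [30, 20]])
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
if min_expected < 5:
    print("Expected frequency below 5: consider Fisher's exact test")
```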
23Non-parametric Tests
- Wilcoxon Matched-Pairs Signed Ranks Test is used
to determine differences between groups of paired
data when the data do not meet the assumptions
of a parametric test (paired t-test)
- Wilcoxon Rank Sum Test can be used to test the
null hypothesis that two populations X and Y have
the same continuous distribution
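To show what the rank sum test works with, the sketch below computes only the test statistic: the sum of the ranks of the first sample after both samples are ranked together, with tied values sharing their average rank. It does not compute a p-value. The function name and the data are hypothetical.

```python
def rank_sum(x, y):
    """Sum of the ranks of sample x within the combined sample x + y."""
    combined = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; tied values get the average.
        ranks[combined[i]] = (i + 1 + j) / 2
        i = j
    return sum(ranks[value] for value in x)

# Hypothetical measurements from two independent groups.
group_x = [1.1, 2.3, 3.0]
group_y = [2.5, 4.2, 5.7, 6.1]

w = rank_sum(group_x, group_y)
# Equivalent Mann-Whitney U statistic for the first group.
u = w - len(group_x) * (len(group_x) + 1) / 2
print(f"W = {w}, U = {u}")
```

Because only the ranks enter the statistic, the result is unaffected by outliers or by the shape of the underlying distribution.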
24Choosing a Test
25Paired or Matched Observation
26Odds ratio
- The odds ratio is a way of comparing whether a
certain event is equally likely in two groups, by
taking the ratio of the odds of the event in each
group
- OR = 1, the event is equally likely in both
groups.
- OR > 1, the event is more likely in the first
group.
- OR < 1, the event is less likely in the first
group
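A minimal sketch of the odds ratio calculation from the counts in a 2 x 2 table. The function name and the counts are hypothetical, and the sketch assumes no cell is zero.

```python
def odds_ratio(events_1, non_events_1, events_2, non_events_2):
    """Odds of the event in group 1 divided by the odds in group 2."""
    odds_1 = events_1 / non_events_1
    odds_2 = events_2 / non_events_2
    return odds_1 / odds_2

# Hypothetical study: 20 of 100 exposed and 10 of 100 unexposed had the event.
ratio = odds_ratio(events_1=20, non_events_1=80, events_2=10, non_events_2=90)
print(f"OR = {ratio:.2f}")
```

Here the ratio is above 1, so the event is more likely in the first group.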