Title: Medical Statistics as a science
1 Medical Statistics as a science
2 Why Do Statistics?
- Extrapolate from the data collected to make general conclusions about the larger population from which the sample was derived
- Allows general conclusions to be made from limited amounts of data
- To do this we must assume that the data are randomly sampled from an infinitely large population, then analyse the sample and use the results to make inferences about the population
3 Statistical Analysis in a Simple Experiment
- Define the population of interest
- Randomly select a sample of subjects to study (clinical trials do not enrol a randomly selected sample of patients because of inclusion/exclusion criteria, but they do define a precise patient population)
- Half the subjects receive one treatment and the other half another treatment (usually placebo)
- Measure baseline variables in each group (e.g. age, APACHE II score, to check that randomisation was successful)
- Measure trial outcome variables in each group (e.g. mortality)
- Use statistical techniques to make inferences about the distribution of the variables in the general population and about the effect of the treatment
4 Data
- Categorical data: values belong to categories
  - Nominal data: there is no natural order to the categories, e.g. blood groups
  - Ordinal data: there is a natural order, e.g. adverse events (Mild/Moderate/Severe/Life Threatening)
  - Binary data: there are only two possible categories, e.g. alive/dead
- Numerical data: the value is a number (either measured or counted)
  - Continuous data: measurement is on a continuum, e.g. height, age, haemoglobin
  - Discrete data: a count of events, e.g. number of pregnancies
5
- Descriptive Statistics
  - concerned with summarising or describing a sample, e.g. mean, median
- Inferential Statistics
  - concerned with generalising from a sample, to make estimates and inferences about a wider population, e.g. t-test, chi-square test
6 Statistical Terms
- Mean: the average of the data; sensitive to outlying data
- Median: the middle value of the data; not sensitive to outlying data
- Mode: the most commonly occurring value
- Range: the spread of the data (largest value minus smallest value)
- IQ (interquartile) range: the spread of the middle 50% of the data; commonly used for skewed data
- Standard deviation: a single number which measures how much the observations vary around the mean
- Symmetrical data: data that follow a normal distribution (mean = median = mode); report mean ± standard deviation and n
- Skewed data: not normally distributed (mean ≠ median ≠ mode); report median and IQ range
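The terms above can be illustrated with a short sketch using Python's standard `statistics` module. The two data sets are made up for illustration only (they are not from any study), and the `describe` helper is a hypothetical name; the second list differs from the first only in its final value, which acts as an outlier.

```python
import statistics

def describe(values):
    """Return the summary statistics named on the slide for a list of numbers."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": statistics.stdev(values),  # sample standard deviation
    }

# Illustrative (invented) data: identical except for the last value.
baseline = [1, 2, 3, 3, 5, 6, 7, 8, 10]
with_outlier = [1, 2, 3, 3, 5, 6, 7, 8, 100]

for name, data in (("baseline", baseline), ("with outlier", with_outlier)):
    print(name, describe(data))
```

In this example the single outlier moves the mean, range and standard deviation a long way, while the median and IQ range stay the same, which is why the median and IQ range are the usual summary for skewed data.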
7 Standard Normal Distribution
8 Standard Normal Distribution
- Mean ± 1 SD: encompasses about 68% of observations
- Mean ± 2 SD: encompasses about 95% of observations
- Mean ± 3 SD: encompasses about 99.7% of observations
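These percentages can be checked directly from the normal distribution. The sketch below uses only `math.erf` from the standard library; `coverage_within` is a hypothetical helper name.

```python
import math

def coverage_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mean ± {k} SD: {100 * coverage_within(k):.1f}% of observations")
```

Strictly, ± 2 SD covers slightly more than 95%; the exact 95% limits are at ± 1.96 SD, which is the multiplier normally used for 95% confidence intervals.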
9 Steps in Statistical Testing
- Null hypothesis (Ho): there is no difference between the groups
- Alternative hypothesis (H1): there is a difference between the groups
- Collect data
- Perform a statistical test, e.g. t-test, chi-square test
- Interpret the P value and confidence intervals
  - P value ≤ 0.05: reject Ho
  - P value > 0.05: accept Ho (strictly, fail to reject Ho)
- Draw conclusions
10 Meaning of P
- P value: the probability of observing a result as extreme or more extreme than the one actually observed from chance alone, i.e. assuming the null hypothesis is true
- Lets us decide whether to reject or accept the null hypothesis
- P > 0.05: Not significant
- P 0.01 to 0.05: Significant
- P 0.001 to 0.01: Very significant
- P < 0.001: Extremely significant
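The bands above amount to a simple lookup, sketched here. The function name `significance_label` is hypothetical, and the treatment of values that fall exactly on a boundary (0.05, 0.01, 0.001) is an assumption, since the slide does not say which band they belong to.

```python
def significance_label(p):
    """Map a P value to the descriptive band used on the slide."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    if p > 0.05:
        return "Not significant"
    if p >= 0.01:          # assumption: exactly 0.01 counts as "Significant"
        return "Significant"
    if p >= 0.001:         # assumption: exactly 0.001 counts as "Very significant"
        return "Very significant"
    return "Extremely significant"

for p in (0.903, 0.41, 0.03, 0.008, 0.00005):
    print(p, significance_label(p))
```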
11 T Test
- T test checks whether two samples are likely to have come from the same or different populations
- Used on continuous variables
- Example: age of patients in the APC study (APC/placebo)

              PLACEBO       APC
  mean age    60.6 years    60.5 years
  SD          16.5          17.2
  n           840           850
  95% CI      59.5-61.7     59.3-61.7

- What is the P value?
  - 0.01
  - 0.05
  - 0.10
  - 0.90
  - 0.99
- P = 0.903: not significant; the patients are from the same population (the groups are designed to be matched by randomisation, so no surprise!)
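The age comparison can be reproduced approximately from the summary figures alone. The sketch below is not the calculation used for the slide: it assumes a large-sample normal (z) approximation to the t-test with unpooled variances, which is reasonable here because each group has more than 800 patients. The helper name `two_sample_z` is hypothetical.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided large-sample test for a difference between two means.

    Uses the normal approximation to the t distribution, so it is only
    appropriate when both groups are large.
    """
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # SE of the difference
    z = (mean1 - mean2) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))                # two-sided P value
    return z, p

# Placebo versus APC, figures taken from the slide.
z, p = two_sample_z(60.6, 16.5, 840, 60.5, 17.2, 850)
print(f"z = {z:.3f}, P = {p:.3f}")
```

The result is close to the P = 0.903 quoted on the slide: a difference of 0.1 years is tiny compared with its standard error of about 0.8 years.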
12 T Test: SAFE Serum Albumin

            PLACEBO       ALBUMIN
  n         3500          3500
  mean      28            30
  SD        10            10
  95% CI    27.7-28.3     29.7-30.3

- Q: Are these albumin levels different?
  - Ho: levels are the same (any difference is there by chance)
  - H1: levels are too different to have occurred purely by chance
- Statistical test: t-test, P < 0.0001 (extremely significant)
- Reject the null hypothesis (Ho) and accept the alternative hypothesis (H1), i.e. if both samples really came from the same overall group, a difference this large would arise by chance less than 1 time in 10 000, therefore we can say they are very likely to be different
13 Effect of Sample Size Reduction

            PLACEBO       ALBUMIN
  n         350           350
  mean      28            30
  SD        10            10
  95% CI    27.0-29.0     29.0-31.0

- smaller sample size (one tenth of the original)
- causes wider CI (less confident where the mean is)
- P = 0.008 (i.e. approx. 0.01: P is still significant, but less so)
- This influence of sample size on the ability to find any particular difference statistically significant is a major consideration in study design
14 Reducing Sample Size (again)

            PLACEBO       ALBUMIN
  n         35            35
  mean      28            30
  SD        10            10
  95% CI    24.6-31.4     26.6-33.4

- using an even smaller sample size (now 1/100 of the original)
- much wider confidence intervals
- P = 0.41 (not significant any more)
- So a SMALLER STUDY has LOWER POWER to find any particular difference to be statistically significant (mean and SD unchanged)
- POWER: the ability of a study to detect an actual effect or difference
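The three albumin slides (n = 3500, 350 and 35 per group) can be put side by side with a short sketch. It assumes a normal approximation throughout, so the figures for n = 35 differ slightly from the t-based values on the slide, and it assumes, purely for the power calculation, that the true difference really is 2 with an SD of 10. The helper names `phi` and `summarise` are hypothetical.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def summarise(n, mean_a=28.0, mean_b=30.0, sd=10.0, z_crit=1.96):
    """CI half-width, two-sided P value and approximate power for n per group."""
    se_mean = sd / math.sqrt(n)            # SE of each group mean
    se_diff = sd * math.sqrt(2.0 / n)      # SE of the difference in means
    z = abs(mean_b - mean_a) / se_diff
    p = math.erfc(z / math.sqrt(2))
    # Power: chance of reaching P <= 0.05 if the true difference equals the
    # observed one (an assumption made only for this illustration).
    power = phi(z - z_crit) + phi(-z - z_crit)
    return {"ci_half_width": z_crit * se_mean, "p": p, "power": power}

for n in (3500, 350, 35):
    r = summarise(n)
    print(f"n = {n:4d}: 95% CI = mean ± {r['ci_half_width']:.2f}, "
          f"P = {r['p']:.2g}, power = {r['power']:.2f}")
```

With the means and SD held fixed, shrinking the groups widens the confidence intervals, raises the P value and lowers the power, which is the pattern the slides describe.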
15 Chi Square Test
- Proportions or frequencies
- Binary data, e.g. alive/dead
- PROWESS study: primary endpoint 28-day all-cause mortality

             ALIVE (%)      DEAD (%)      TOTAL (%)     % DEAD
  PLACEBO    581 (69.2)     259 (30.8)    840 (100)     30.8
  APC        640 (75.3)     210 (24.7)    850 (100)     24.7
  TOTAL      1221 (72.2)    469 (27.8)    1690 (100)

- Perform chi-square test: P = 0.006 (very significant)
- If there were truly no difference between the groups, a difference in mortality this large would happen by chance only about 6 times in 1000
16 Reducing Sample Size
- Same results but using a much smaller sample size (one tenth)

             ALIVE (%)     DEAD (%)     TOTAL (%)    % DEAD
  PLACEBO    58 (69.2)     26 (30.8)    84 (100)     30.8
  APC        64 (75.3)     21 (24.7)    85 (100)     24.7
  TOTAL      122 (72.2)    47 (27.8)    169 (100)

- Perform chi-square test: P = 0.39; a difference in mortality this large could have happened by chance about 39 times in 100, therefore the results are not significant
- Again, the power of a study to find a difference depends a lot on sample size, for binary data as well as continuous data
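The two mortality tables (full size and one tenth) can be run through a hand-written chi-square test for a 2x2 table. This is a sketch under stated assumptions: the slides do not say which variant of the test was used, so the code offers both the plain statistic and Yates' continuity correction, and `chi_square_2x2` is a hypothetical helper name. For one degree of freedom the P value follows from the normal distribution, so only `math.erfc` is needed.

```python
import math

def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test (1 degree of freedom) for the table [[a, b], [c, d]].

    Rows are the two groups, columns are alive/dead.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction (assumption: the slides may or may
        # not have applied it).
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p

tables = {
    "full size": (581, 259, 640, 210),
    "one tenth": (58, 26, 64, 21),
}
for name, counts in tables.items():
    for yates in (True, False):
        chi2, p = chi_square_2x2(*counts, yates=yates)
        print(f"{name}, Yates={yates}: chi2 = {chi2:.2f}, P = {p:.3f}")
```

With the continuity correction the full-size table gives a P value of roughly 0.006, in line with the slide. For the one-tenth table this sketch gives a P value between about 0.37 and 0.46 depending on the correction, not exactly the 0.39 quoted, but the conclusion is the same: the same mortality percentages are no longer statistically significant.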
17 Summary
- Size matters: BIGGER IS BETTER
- Spread matters: SMALLER IS BETTER
- Bigger difference: EASIER TO FIND
- Smaller difference: MORE DIFFICULT TO FIND
- To find a small difference you need a big study
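The summary points can be made concrete with the standard normal-approximation formula for comparing two means, n per group = 2 × ((z for alpha + z for power) × SD / difference)². This formula is not given in the slides; it is offered here as a rough sketch, and the default choices of a two-sided alpha of 0.05 and 80% power, as well as the function name `n_per_group`, are assumptions.

```python
import math
from statistics import NormalDist

def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a difference in means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)

# Illustrative values based on the albumin example (SD 10, difference 2).
for difference, sd in ((4, 10), (2, 10), (1, 10), (2, 5)):
    print(f"difference {difference}, SD {sd}: "
          f"about {n_per_group(difference, sd)} per group")
```

Halving the difference to be detected roughly quadruples the required sample size, and halving the spread cuts it to about a quarter.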