Title: Basic Statistics in Public Health
- Types of data
- Descriptive statistics
- Confidence Intervals
TYPES OF DATA
Age?
Sex?
Social class?
BMI?
Obese or not?
Underweight / normal / overweight / obese?
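As an illustration, the same body-size measurement can be recorded as a number, as a yes/no variable, or as an ordered category. The Python sketch below assumes the standard WHO adult BMI cut-offs (18.5, 25 and 30 kg/m²). The function names and the example person are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Numerical (continuous) version: weight in kg divided by height in m squared."""
    return weight_kg / height_m ** 2


def is_obese(b: float) -> bool:
    """Binary version: obese or not, assuming the WHO adult cut-off of 30 kg/m^2."""
    return b >= 30


def bmi_category(b: float) -> str:
    """Ordered categorical (ordinal) version, assuming the WHO adult cut-offs."""
    if b < 18.5:
        return "underweight"
    if b < 25:
        return "normal"
    if b < 30:
        return "overweight"
    return "obese"


# Hypothetical adult: 95 kg, 1.70 m tall
b = bmi(95, 1.70)
print(round(b, 1), is_obese(b), bmi_category(b))  # 32.9 True obese
```

Moving from the number to the categories discards information, which is one reason the type of data matters when choosing how to summarise it.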
Type of data?
(Figure. Source: Compendium of Clinical and Health Indicators / Health Surveys for England)
SUMMARISING DATA
Summarising Numerical Data
- Measures of central tendency / location:
  - Mean
  - Median
  - Mode
- Measures of spread / variability:
  - Range
  - Interquartile range
  - Variance
  - Standard deviation
Mean? Median?
- 3, 4, 5, 6, 7
- 9, 10, 20, 21
- 1, 2, 3, 4, 990
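The three datasets above can be checked with Python's standard `statistics` module. This is a minimal sketch: the mean is the sum divided by the count, and the median is the middle ranked value (the average of the two middle values when the count is even).

```python
from statistics import mean, median

datasets = [
    [3, 4, 5, 6, 7],
    [9, 10, 20, 21],
    [1, 2, 3, 4, 990],
]

for data in datasets:
    # mean = sum / count; median = middle value of the ranked data
    print(data, "mean =", mean(data), "median =", median(data))
```

The first dataset gives a mean and median of 5, and the second gives 15 for both. In the third, the single value 990 pulls the mean up to 200, while the median stays at 3.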
Mean or Median?
- For symmetric data, mean ≈ median
  - Choose the mean (easily understood, better statistical properties)
- For skewed data, the mean is drawn towards the tail of the distribution
  - The median can be a better reflection of the centre of the data
Positively skewed data ...
(Figure: positively skewed distribution; median = 2)
Negatively skewed data ...
(Figure: negatively skewed distribution; median = 40)
And the Mode?
- Mode = most frequently occurring value
- Hardly ever used
- Depends on how the data are grouped, and is not always unique
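A small sketch with hypothetical ages shows both problems. The mode of the raw values is not unique, and grouping the same values into 10-year bands gives a different answer. It uses `statistics.multimode` from the Python standard library (Python 3.8 or later), which returns every value tied for most frequent.

```python
from statistics import multimode

# Hypothetical ages (years)
ages = [21, 22, 22, 25, 27, 27, 33, 36, 38, 39]

# Raw values: 22 and 27 each occur twice, so there is no single mode
print(multimode(ages))   # [22, 27]

# The same ages grouped into 10-year bands (20 = 20-29, 30 = 30-39)
bands = [10 * (a // 10) for a in ages]
print(multimode(bands))  # [20]
```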
The Range
- Range = maximum - minimum
- In practice, usually presented as (min, max)
- Poor measure of spread:
  - very dependent on sample size
  - affected by outliers
  - but people often want to know it, so present it as an extra
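A minimal sketch with hypothetical data shows how a single extreme value changes the range. The helper is named `value_range` (a hypothetical name) so that it does not shadow Python's built-in `range`.

```python
# Hypothetical data, with and without one extreme value
data = [12, 15, 17, 18, 20, 22, 25]
with_outlier = data + [90]


def value_range(xs):
    """Return (min, max), the usual way of presenting it, and max - min."""
    return (min(xs), max(xs)), max(xs) - min(xs)


print(value_range(data))          # ((12, 25), 13)
print(value_range(with_outlier))  # ((12, 90), 78)
```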
Interquartile Range (IQR)
- Rank data from smallest to largest
- Lower Quartile has 1/4 of values smaller than it
- Upper Quartile has 1/4 of values larger than it
- Interquartile Range = Upper Quartile - Lower Quartile
  - But usually written (Lower Quartile, Upper Quartile)
- Better than the range: much less influenced by outliers, and does not systematically increase with sample size
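The same hypothetical data used to illustrate the range can be run through `statistics.quantiles` with `n=4`, which returns the three quartile cut points. Note that software packages use several slightly different quartile conventions. This sketch relies on Python's default `'exclusive'` method, so other tools may give slightly different values.

```python
from statistics import quantiles

# Hypothetical data, with and without one extreme value
data = [12, 15, 17, 18, 20, 22, 25]
with_outlier = data + [90]


def quartiles_and_iqr(xs):
    """Return the three quartiles and the IQR (upper quartile - lower quartile)."""
    q1, q2, q3 = quantiles(xs, n=4)  # default method='exclusive'
    return (q1, q2, q3), q3 - q1


print(quartiles_and_iqr(data))          # ((15.0, 18.0, 22.0), 7.0)
print(quartiles_and_iqr(with_outlier))  # ((15.5, 19.0, 24.25), 8.75)
```

Adding the value 90 stretches the range from 13 to 78, but the IQR only moves from 7 to 8.75.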
Interquartile Range (IQR) - Note
- Note: strictly, quartiles are values (cut points) in the ranked data, not groups of observations
- There are 3 quartiles:
  - lower quartile
  - ?
  - upper quartile
- But the word "quartiles" is now often used to mean quarters, i.e. the 4 groups of ranked observations
- Similarly, "quintiles" is often used to mean fifths, i.e. the 5 groups of ranked observations
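The two meanings can be contrasted in a short sketch, assuming hypothetical data (the values 1 to 20). `statistics.quantiles(n=5)` gives the 4 cut points (quintiles in the strict sense), and `bisect_right` is then used to assign each observation to one of the 5 groups (the looser, "fifths" sense).

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical ranked data: the values 1 to 20
data = list(range(1, 21))

# Strict sense: the 4 cut points that divide the ranked data into fifths
cuts = quantiles(data, n=5)
print(cuts)  # [4.2, 8.4, 12.6, 16.8]

# Looser sense: the 5 groups (1 = lowest fifth, 5 = highest fifth)
groups = [bisect_right(cuts, x) + 1 for x in data]
print(groups)
```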
Variance
- Step 1: Calculate the deviations, i.e. the difference between each observation and the mean of the data
- Step 2: Square these deviations
- Step 3: Average the squared deviations
  - this is the Variance
  - (Strictly, divide by n-1, not n)
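The three steps can be followed one at a time on a small set of hypothetical observations. For comparison, `statistics.variance` computes the sample variance, dividing by n - 1 (whereas `statistics.pvariance` divides by n).

```python
from statistics import variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
n = len(data)
data_mean = sum(data) / n                       # 5.0

deviations = [x - data_mean for x in data]      # Step 1: deviations from the mean
squared = [d ** 2 for d in deviations]          # Step 2: square them
var_divide_n = sum(squared) / n                 # Step 3: average them (dividing by n)
var_divide_n1 = sum(squared) / (n - 1)          # strictly: divide by n - 1

print(var_divide_n, round(var_divide_n1, 3), round(variance(data), 3))  # 4.0 4.571 4.571
```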
Standard Deviation (SD)
- Step 4: Take the square root of the Variance
  - this is the Standard Deviation
  - This returns the statistic to the same units as the data
- Both Variance and Standard Deviation use all of the data
  - But as a result, they can be over-influenced by outliers
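A minimal sketch of Step 4 and of the outlier problem uses hypothetical observations plus one added extreme value. `statistics.stdev` takes the square root of the n - 1 variance in a single call.

```python
from math import sqrt
from statistics import stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
with_outlier = data + [40]        # the same data plus one hypothetical extreme value

# Step 4: the square root of the (n-1) variance gives the SD, back in the data's units
sd = sqrt(variance(data))
sd_outlier = stdev(with_outlier)  # statistics.stdev does the same in one call

# Because every observation is used, one extreme value inflates the SD a lot
print(round(sd, 2), round(sd_outlier, 2))
```

Here the SD rises from about 2.1 to about 11.8 when the single value 40 is added.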
Symmetric data ...
- Summarise using Mean and Standard Deviation
Skewed data ...
- Summarise using Median and Interquartile Range
Useful Fact
If a dataset is Normally distributed, or at least fairly symmetrical, then approximately the central 95% of the data will be included in the range Mean ± 2 Standard Deviations. This is sometimes called a reference range or normal range. Strictly, it is Mean ± 1.96 Standard Deviations.
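The value 1.96 is the point on a standard Normal distribution that leaves 2.5% in each tail. The sketch below recovers it with `statistics.NormalDist` and then checks the reference range on simulated data. The mean of 120 and SD of 15 are hypothetical, and `reference_range` is an illustrative helper, not a library function.

```python
import random
from statistics import NormalDist, mean, stdev

# z value leaving 2.5% in each tail of a standard Normal distribution
z_975 = NormalDist().inv_cdf(0.975)
print(round(z_975, 2))  # 1.96


def reference_range(xs, z=1.96):
    """Mean +/- z SDs: roughly the central 95% of approximately Normal data."""
    m, s = mean(xs), stdev(xs)
    return m - z * s, m + z * s


# Hypothetical Normal data: mean 120, SD 15, 10,000 values
random.seed(1)
sample = [random.gauss(120, 15) for _ in range(10_000)]
low, high = reference_range(sample)
inside = sum(low <= x <= high for x in sample) / len(sample)
print(round(low), round(high), round(inside, 3))  # proportion inside is close to 0.95
```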
Central 95% of data
(Figure: Normal distribution with the central 95% of the data between mean - 2SD and mean + 2SD, and 2.5% in each tail)
CONFIDENCE INTERVALS
Obesity data, England, 2006
- Age-standardised percentage obese: 24.1%
- 95% Confidence Interval: 23.2% to 25.0%
?
Sample estimates of Population values
- The obesity data were based on a sample
- But has this sample given the right answer?
- First we need to eliminate bias, e.g. by taking a random sample
- But even when samples are unbiased, different samples will still give different answers
  - this is known as sampling error or random variation
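Sampling variation can be seen in a small simulation. The sketch assumes a hypothetical population of 100,000 adults, exactly 24% of whom are obese, and draws five simple random samples of 1,000 people. Each sample is unbiased, yet the five estimates differ.

```python
import random

random.seed(2006)

# Hypothetical population of 100,000 adults, 24% obese (1 = obese, 0 = not obese)
population = [1] * 24_000 + [0] * 76_000

# Five simple random samples of 1,000 people, each giving an estimate of % obese
estimates = []
for _ in range(5):
    sample = random.sample(population, 1_000)
    estimates.append(100 * sum(sample) / len(sample))

print([f"{e:.1f}%" for e in estimates])
```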
We would like to know: how imprecise might the sample estimate be, just as a result of sampling variation? i.e. how far away might the sample estimate be from the true population value?
- Depends on:
  - Sample size
  - Variability of the data (SD)
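One standard way to make this dependence concrete is the standard error of a sample mean, SE = SD / √n. That formula is not stated on the slides and is added here as an illustration. The SDs and sample sizes below are hypothetical.

```python
from math import sqrt


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / sqrt(n)


# More variable data -> larger SE; larger samples -> smaller SE
for sd in (5, 10):
    for n in (25, 100, 400):
        print(f"SD={sd:>2}, n={n:>3}: SE={standard_error(sd, n):.2f}")
```

Doubling the SD doubles the SE, while the sample size has to be quadrupled to halve it.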
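A 95% confidence interval provides a measure of the precision of a sample estimate.
Informally: there is a 95% probability that the true population value lies within the 95% confidence interval. (Strictly: if the sampling were repeated many times, about 95% of the intervals calculated in this way would contain the true population value.)
Narrow 95% CI = precise estimate; wide 95% CI = imprecise estimate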
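The "repeated sampling" interpretation can be checked by simulation. This sketch uses the simplest (Wald) approximate 95% CI for a proportion, p ± 1.96 √(p(1 − p)/n). That method is chosen here for illustration and is not necessarily how the age-standardised obesity interval was calculated. The true proportion of 0.24, the sample size, and the number of repeats are all hypothetical.

```python
import random
from math import sqrt


def wald_ci(successes: int, n: int, z: float = 1.96):
    """Simple (Wald) approximate 95% CI for a proportion: p +/- z * sqrt(p(1 - p) / n)."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


random.seed(42)
true_p, n, repeats = 0.24, 1_000, 1_000  # hypothetical population value and design

covered = 0
for _ in range(repeats):
    # Draw one random sample of size n and count the "successes" (e.g. obese people)
    successes = sum(random.random() < true_p for _ in range(n))
    low, high = wald_ci(successes, n)
    covered += low <= true_p <= high

coverage = covered / repeats
print(f"{coverage:.1%} of the intervals contain the true value")
```

Any single interval either contains the true value or it does not. It is the procedure that captures the true value about 95% of the time.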
- Age-standardised percentage obese: 24.1%
- 95% Confidence Interval: 23.2% to 25.0%
We are 95% confident that the true age-standardised percentage obese for England, 2006, is somewhere between 23.2% and 25.0%.
FOR DISCUSSION
- Why do we calculate Confidence Intervals when our
estimates are based on total population data,
e.g. SMRs for cancer?
Presenting 95% Confidence Intervals on graphs
(Figure: self-reported smoking status in women (%), by ethnic group, with 95% confidence intervals, England, 2004)
Interpreting 95% Confidence Intervals from graphs
- What can you say about the true smoking prevalence for the general population?
- For which ethnic groups is the prevalence of smoking significantly different from 25%?
- Is the prevalence of smoking significantly different between the Black Caribbean and Black African populations?
- Is the prevalence of smoking significantly different between the Pakistani and Bangladeshi populations?
Note: In general, it is better to perform a statistical significance test than to look for overlapping or non-overlapping confidence intervals!
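A hypothetical example shows why overlap is an unreliable guide. Two groups of 1,000 people each have smoking prevalences of 20% and 24%. Their simple (Wald) 95% CIs overlap, yet a standard two-sided z-test for a difference in proportions (pooled version, chosen here for illustration) gives p ≈ 0.03. All numbers and function names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(p, n, z=1.96):
    """Simple approximate 95% CI for a proportion."""
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided p-value for a difference in proportions, using the pooled proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical groups: 20% vs 24% smokers, 1,000 people in each
ci1, ci2 = wald_ci(0.20, 1000), wald_ci(0.24, 1000)
overlap = ci1[1] >= ci2[0]
p_value = two_proportion_z_test(0.20, 1000, 0.24, 1000)
print([round(v, 3) for v in ci1], [round(v, 3) for v in ci2], overlap, round(p_value, 3))
```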
Food for thought
- What is the difference between a 95% confidence interval and a 95% reference range?
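One way to explore this question numerically is to compute both intervals from the same data. The sketch uses a hypothetical sample of 400 simulated blood pressures. The reference range uses mean ± 1.96 SD. The CI for the mean uses mean ± 1.96 SD/√n, a large-sample Normal approximation (a t multiplier would give a slightly wider interval).

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(7)
# Hypothetical sample: 400 systolic blood pressures (mmHg)
sample = [random.gauss(125, 15) for _ in range(400)]
m, s, n = mean(sample), stdev(sample), len(sample)

# 95% reference range: where about 95% of individual values lie
ref_range = (m - 1.96 * s, m + 1.96 * s)

# 95% CI for the population mean: how precisely the mean is estimated
se = s / sqrt(n)
ci_mean = (m - 1.96 * se, m + 1.96 * se)

print([round(v, 1) for v in ref_range], [round(v, 1) for v in ci_mean])
```

With n = 400, the CI for the mean is √400 = 20 times narrower than the reference range, and it keeps shrinking as n grows. The reference range does not shrink in this way.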