Basic Statistics in Public Health - PowerPoint PPT Presentation


About This Presentation

Title:

Basic Statistics in Public Health


Slides: 32

Provided by: swlondonpu

Transcript and Presenter's Notes

Title: Basic Statistics in Public Health

1
Basic Statistics in Public Health
2

Types of data
Descriptive statistics
Confidence Intervals

3
TYPES OF DATA
Age?
Sex?
Social class?
4
TYPES OF DATA
BMI?
Obese or not?
Underweight / normal / overweight / obese?
5
Type of data?
Source: Compendium of Clinical and Health
Indicators / Health Surveys for England
6
SUMMARISING DATA
7
Summarising Numerical Data

Measures of central tendency / location:
Mean
Median
Mode

Measures of spread / variability:
Range
Interquartile range
Variance
Standard deviation

8
Mean? Median?
3, 4, 5, 6, 7
9, 10, 20, 21
1, 2, 3, 4, 990
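
One way to check the three datasets above is a minimal Python sketch using the standard library's statistics module. The variable names are illustrative and not from the slides.

```python
from statistics import mean, median

# The three example datasets from the slide
datasets = [
    [3, 4, 5, 6, 7],
    [9, 10, 20, 21],
    [1, 2, 3, 4, 990],
]

# (mean, median) for each dataset, in the same order
results = [(mean(d), median(d)) for d in datasets]

for d, (m, med) in zip(datasets, results):
    print(d, "mean =", m, "median =", med)
```

The first two datasets are symmetric, so mean and median agree (5 and 15). In the third, the single value 990 pulls the mean up to 200 while the median stays at 3.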

9
Mean or Median?

For symmetric data, mean = median
Choose the mean (easily understood, better
statistical properties)

For skewed data, the mean is drawn towards the tail
of the distribution
The median can be a better reflection of the centre of the data

10
Positively skewed data ...
Median = 2
11
Negatively skewed data ...
Median = 40
12
And the Mode?

Mode = most frequently occurring value
Hardly ever used
Depends on how the data are grouped, and not
always unique
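
A small sketch of both problems, using hypothetical ages (not data from the presentation) and the standard library's multimode function:

```python
from statistics import multimode

# Hypothetical ages, invented for illustration only
ages = [21, 22, 22, 26, 27, 28, 29, 31, 31]

# Ungrouped: two values tie for most frequent, so the mode is not unique
raw_modes = multimode(ages)

# Grouped into 5-year and 10-year bands (each age mapped to the start of its band)
modes_5yr = multimode([a // 5 * 5 for a in ages])
modes_10yr = multimode([a // 10 * 10 for a in ages])

print(raw_modes, modes_5yr, modes_10yr)
```

With these made-up values the ungrouped data have two modes (22 and 31), the 5-year bands give 25-29 as the modal group, and the 10-year bands give 20-29.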

13
The Range

Range = maximum - minimum
In practice, usually presented as (min, max)

Poor measure of spread:
very dependent on sample size
affected by outliers
but people often want to know it, so present it as an
extra

14
Interquartile Range (IQR)

Rank data from smallest to largest
Lower Quartile has 1/4 of values smaller than it
Upper Quartile has 1/4 of values larger than it
Interquartile Range = Upper Quartile - Lower
Quartile
But usually written as (Lower Quartile, Upper
Quartile)

Better than range - not influenced by outliers or
sample size
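
The contrast with the range can be sketched in Python. The data are hypothetical, and statistics.quantiles uses one of several quartile conventions (its default "exclusive" method), so other software may give slightly different quartiles.

```python
from statistics import quantiles

def spread(values):
    """Return the range, the (lower, upper) quartiles and the IQR."""
    lower, _, upper = quantiles(values, n=4)  # three cut points; the middle one is unused here
    return {
        "range": max(values) - min(values),
        "quartiles": (lower, upper),
        "iqr": upper - lower,
    }

# Hypothetical data: identical except that the largest value becomes an outlier
clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 990]

print(spread(clean))
print(spread(with_outlier))
```

The outlier changes the range from 10 to 989, while the quartiles (3, 9) and the IQR of 6 are unchanged.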

15
Interquartile Range (IQR) - Note

Note - quartiles are, strictly, observations
There are 3 quartiles -
lower quartile
?
upper quartile

But the word "Quartiles" is now often used to
mean "Quarters", i.e. the 4 groups of ranked
observations
Similarly, "Quintiles" is often used to mean
"Fifths", i.e. the 5 groups of ranked
observations

16
Variance

Step 1: Calculate the Deviations = the difference
between each observation and the mean of the data
Step 2: Square these Deviations
Step 3: Average the Squared Deviations;
this is the Variance
(Strictly, divide by n-1, not n)

17
Standard Deviation (SD)

Step 4: Take the square root of the Variance;
this is the Standard Deviation

This returns the statistic to the same units as
the data

Both Variance and Standard deviation use all of
the data
But as a result, can be over-influenced by
outliers
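
Steps 1 to 4 can be followed by hand in a short Python sketch. The dataset is the first one from the earlier Mean/Median slide; the variable names are illustrative. The standard library's variance and stdev functions are printed alongside as a cross-check, since they also divide by n-1.

```python
from math import sqrt
from statistics import mean, stdev, variance

data = [3, 4, 5, 6, 7]

centre = mean(data)
deviations = [x - centre for x in data]   # Step 1: observation minus mean
squared = [d ** 2 for d in deviations]    # Step 2: square the deviations
var = sum(squared) / (len(data) - 1)      # Step 3: average, dividing by n-1
sd = sqrt(var)                            # Step 4: square root, back in the data's units

print("variance:", var, "library:", variance(data))
print("SD:", sd, "library:", stdev(data))
```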

18
Symmetric data ...

Summarise using Mean and Standard Deviation

Skewed data ...
Summarise using Median and Interquartile Range
19
Useful Fact
If a dataset is Normally distributed, or at least
fairly symmetrical, then approximately the central 95% of the
data will be included in the range Mean +/- 2
Standard Deviations
Sometimes called a reference range or normal range
Strictly, Mean +/- 1.96 Standard Deviations
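
Where the 1.96 comes from can be checked with the standard library's NormalDist class. This is a sketch only: the mean of 120 and SD of 10 are hypothetical values chosen for illustration, and the function name is not from the slides.

```python
from statistics import NormalDist

# The multiplier that leaves 2.5% in each tail of a Normal distribution
z = NormalDist().inv_cdf(0.975)

def reference_range(mean, sd, multiplier=1.96):
    """Return (low, high) = mean -/+ multiplier * sd."""
    return mean - multiplier * sd, mean + multiplier * sd

# Hypothetical measurement with mean 120 and SD 10
low, high = reference_range(mean=120.0, sd=10.0)

# Share of a Normal(120, 10) distribution that falls inside the range
dist = NormalDist(mu=120.0, sigma=10.0)
share_inside = dist.cdf(high) - dist.cdf(low)

print(round(z, 3), (low, high), round(share_inside, 3))
```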
20
[Figure: distribution with the central 95% of data between mean - 2SD and mean + 2SD, and 2.5% in each tail]
21
CONFIDENCE INTERVALS
22
Obesity data, England, 2006

Age-standardised percentage obese = 24.1%
95% Confidence Interval = 23.2% to 25.0%

?
23
Sample estimates of Population values

The obesity data was based on a sample
But has this sample given the right answer?
First need to eliminate bias, e.g. take a random
sample
But even when samples are unbiased, different
samples will still give different answers - this
is known as sampling error or random variation

24
We would like to know: how imprecise might the
sample estimate be, just as a result of sampling
variation? i.e. how far away might the sample
estimate be from the true population value?

Depends on:
Sample size
Variability of data (SD)
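
How these two factors act can be sketched with the usual large-sample approximation for the mean, where the 95% interval is the estimate +/- 1.96 x SD / sqrt(n). That formula is an assumption brought in for illustration and does not appear on the slides; the numbers are hypothetical.

```python
from math import sqrt

Z_95 = 1.96  # multiplier for a 95% interval under a Normal approximation

def ci_half_width(sd, n):
    """Approximate half-width of a 95% CI for a mean: 1.96 * SD / sqrt(n)."""
    return Z_95 * sd / sqrt(n)

base = ci_half_width(sd=10, n=100)
bigger_sample = ci_half_width(sd=10, n=400)   # four times the sample size
more_variable = ci_half_width(sd=20, n=100)   # twice the variability

print(base, bigger_sample, more_variable)
```

Under this approximation, quadrupling the sample size halves the width of the interval, and doubling the SD doubles it.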

25
A 95% confidence interval provides a measure of
the precision of a sample estimate:
Loosely, there is a 95% probability that the true
population value lies within the 95% confidence
interval
(Strictly, 95% of intervals calculated this way
from repeated samples would contain the true value)
Narrow 95% CI = precise estimate
Wide 95% CI = imprecise estimate
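
A small simulation can illustrate what the 95% refers to. This is a sketch under stated assumptions: the population is Normal with a hypothetical mean of 50 and SD of 10, each sample has 50 observations, and the interval is the large-sample approximation mean +/- 1.96 x SD / sqrt(n). None of these values come from the slides.

```python
import random
from math import sqrt
from statistics import mean, stdev

def coverage(true_mean=50.0, true_sd=10.0, n=50, repeats=2000, seed=0):
    """Fraction of simulated 95% CIs that contain the true population mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = mean(sample)
        half_width = 1.96 * stdev(sample) / sqrt(n)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / repeats

print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

Each simulated sample gives a different interval, and roughly 95% of them should contain the true mean.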
26

Age-standardised percentage obese = 24.1%
95% Confidence Interval = 23.2% to 25.0%

We are 95% confident that the true
age-standardised percentage obese for England,
2006, is somewhere between 23.2% and 25.0%.
27
FOR DISCUSSION

Why do we calculate Confidence Intervals when our
estimates are based on total population data,
e.g. SMRs for cancer?

28
Presenting 95% Confidence Intervals on graphs
Self-reported smoking status in women (%), by
ethnic group, with 95% confidence intervals
(England, 2004)
29
Interpreting 95% Confidence Intervals from graphs

What can you say about the true smoking
prevalence for the general population?
For which ethnic groups is the prevalence of
smoking significantly different from 25%?
Is the prevalence of smoking significantly
different between the Black Caribbean and Black
African populations?
Is the prevalence of smoking significantly
different between the Pakistani and Bangladeshi
populations?

30
Note: In general, it is better to perform a
statistical significance test than to look for
overlapping or non-overlapping confidence
intervals!
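
One reason for this note is that two confidence intervals can overlap even when a test finds a significant difference. The sketch below uses hypothetical prevalences (20% and 24%) and hypothetical sample sizes (1,000 each), not the survey data. It uses Normal-approximation intervals and a pooled two-proportion z-test, which is one possible significance test and not necessarily the one the presenter had in mind.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion p from a sample of size n."""
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def two_proportion_z_test(p1, n1, p2, n2):
    """Pooled two-sided z-test for a difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical groups: 20% vs 24% prevalence, 1,000 people in each sample
ci_a = proportion_ci(0.20, 1000)
ci_b = proportion_ci(0.24, 1000)
intervals_overlap = ci_a[1] > ci_b[0]

z, p_value = two_proportion_z_test(0.20, 1000, 0.24, 1000)

print("CI A:", ci_a)
print("CI B:", ci_b)
print("overlap:", intervals_overlap, "z:", round(z, 2), "p:", round(p_value, 3))
```

With these made-up numbers the intervals overlap, yet the test gives p below 0.05, so judging by overlap alone would miss the difference.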
31
Food for thought