Title: Reducing Bias in Ecological Studies
- S. Lane (1), G.A. Lancaster (1) and M. Green (2)
- (1) University of Liverpool
- (2) University of Lancaster
Introduction
- Many studies in health evaluation focus on ecological models.
- Used to examine the relationship between socio-economic risk factors and ill-health in pre-defined geographical regions.
- Models are based on variables measured at the aggregate level.
- Aggregate data provide information on groups of individuals.
- Used when individual level data are not available.
- Advantages: data readily available from population Censuses.
Ecological Fallacy
- Disadvantages: when ecological models are used to make inferences about individuals living within a district, inferences may be incorrect.
- Bias known as the ecological fallacy (Robinson 1950).
- Aggregated data may misconstrue true underlying relationships between covariates and ill health.
- For example, when a small subgroup of the population is responsible for a large proportion of ill health.
- Outcome: inflated model parameters, or parameters that suggest relationships that are counter-intuitive to known relationships.
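The reversal described above can be sketched with a toy example. The counts below are made up for three hypothetical districts and are not taken from the study. Within every district the unemployed have the higher risk, yet the district level regression of illness rate on proportion unemployed has a negative slope.

```python
# Made-up counts for three hypothetical districts (not from the study).
# Each entry: (unemployed n, unemployed ill, employed n, employed ill).
districts = {
    "A": (100, 30, 900, 180),
    "B": (200, 50, 800, 120),
    "C": (300, 60, 700, 70),
}

def individual_risk_difference(n_u, ill_u, n_e, ill_e):
    """Risk of illness among the unemployed minus risk among the employed."""
    return ill_u / n_u - ill_e / n_e

def least_squares_slope(xs, ys):
    """Slope of the ordinary least squares line through the points (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Individual level view: risk difference inside each district.
within = {name: individual_risk_difference(*c) for name, c in districts.items()}

# Aggregate view: one proportion and one illness rate per district.
prop_unemployed = [n_u / (n_u + n_e) for n_u, _, n_e, _ in districts.values()]
illness_rate = [(ill_u + ill_e) / (n_u + n_e)
                for n_u, ill_u, n_e, ill_e in districts.values()]
ecological_slope = least_squares_slope(prop_unemployed, illness_rate)

print(within)            # positive in every district
print(ecological_slope)  # negative across districts
```

The aggregate slope is driven by differences in baseline risk between districts, which the district level proportions cannot separate from the individual level effect.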
Population Census Data
- Available at both aggregate and individual levels.
- Aggregate data: Small Area Statistics (SAS).
- SAS provides aggregate data for geographically defined groups, e.g. Local Authority District, Electoral Ward.
- Individual level data: Sample of Anonymised Records (SAR).
- SAR: 2% randomised sample of the SAS data, contains detailed information on individuals.
Study Aim
- To evaluate alternative statistical methods of ecological analysis which attempt to reduce the ecological fallacy effect.
- To determine in which circumstances each method might be practicable.
Method
- Information on Limiting Long Term Illness (LLTI) available from 1991 Population Census.
- Investigate relationship between socio-economic risk factors and LLTI in the North of England.
- Geographical area: Local Authority District.
- SAS: covered 92 Local Authority Districts (6 million individuals).
- SAR: some districts combined to maintain confidentiality, giving 64 Local Authority Districts (100,000 individuals).
- Five different statistical modelling methodologies have been investigated in terms of their effectiveness at reducing ecological bias.
Statistical Modelling Techniques
- 1. Individual Level Analysis.
- 2. Standard Ecological Analysis.
- 3. Modified Ecological Analysis.
- 4. Aggregated Individual Level Model.
- 5. Aggregated Compound Multinomial Model.
Socio-Economic Covariates
Table 1: Covariates included in the model
1. Individual Level Analysis
- Two models investigated using individual level SAR data.
- Fixed effects model: individuals within the same age-gender groupings with the same socio-economic profile will have the same risk of developing a LLTI irrespective of the district that they live in.
- Random effects model: allows the illness rate to vary across districts.
Random Effects Binomial Model
- The response is the probability that individual i in district k develops a LLTI.
- The model includes a district level intercept term.
- $\mu_{0k}$ is the district level random variation.
- $x_{jik}$ is the value of the jth covariate for individual i in district k.
- $\beta_j$ is the parameter to be estimated for the jth covariate.
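The definitions on this slide are consistent with a two-level random intercept logistic model. The form below is a reconstruction under that assumption: the logit link, the symbols $\pi_{ik}$, $\beta_{0k}$ and $\beta_0$, and the normal distribution for the district effect are assumed rather than taken from the slide.

```latex
\operatorname{logit}(\pi_{ik})
  = \log\frac{\pi_{ik}}{1 - \pi_{ik}}
  = \beta_{0k} + \sum_{j} \beta_j x_{jik},
\qquad
\beta_{0k} = \beta_0 + \mu_{0k},
\qquad
\mu_{0k} \sim N(0, \sigma^2_{\mu})
```

Here $\pi_{ik}$ is the probability that individual i in district k develops a LLTI and $\beta_{0k}$ is the district level intercept. Setting $\sigma^2_{\mu} = 0$ would give a fixed effects model with a single intercept for all districts.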
Estimated Parameters (Standard Error)
Table 2: Individual Level Models
Relationships Identified by the Models
- Risk of developing a LLTI increases with age.
- Gender: similar risk except in age group 55 to 64.
- Non-whites: slightly higher risk than whites.
Figure 1: Risk of LLTI in age group 55 to 64 by gender and ethnicity (axis: risk of LLTI; groups: White Females, Non-white Females, White Males, Non-white Males)
Relationships Identified by the Models
- Home owners are at less risk of developing LLTI than individuals who rent accommodation.
- Decreased risk with car ownership.
- Increased risk for individuals in social classes III manual, IV and V.
- Unemployed at higher risk than employed, inactive at highest risk.
- Unqualified at higher risk than qualified.
Effects of the Random Intercept
- District level variation is small, 0.016 (0.004), a result of the large geographical areas.
- Intercept terms vary from -3.94 to -3.38 (2% to 3% in terms of the estimated risk of developing a LLTI for the base category).
Figure 2: District level intercept terms
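The link between the intercept range and the quoted baseline risk can be checked with a short calculation. This is a sketch assuming the intercepts are negative values on the logit scale, which is the reading consistent with a baseline risk of 2 to 3 percent.

```python
import math

def inverse_logit(eta):
    """Convert a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Intercepts at the two ends of the reported range,
# taken as negative logit-scale values (an assumption).
risk_low = inverse_logit(-3.94)   # about 0.019
risk_high = inverse_logit(-3.38)  # about 0.033

print(round(risk_low, 3), round(risk_high, 3))
```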
2. Standard Ecological Analysis
- Ideal situation would be an individual level model, but covariate information for health outcomes is not always available.
- Census: health outcome data limited to LLTI.
- In most situations we want to make inferences about other specific illnesses, e.g. the relationship between deprivation and cancer.
- Cancer counts from cancer registry: no information on socio-economic variables except age and gender groupings.
- Need to use aggregate Census data in an ecological study.
Standard Ecological Analysis - Data
- Model uses aggregate SAS data.
- 92 Local Authority Districts.
- Dependent variable: number of individuals in each district with LLTI, 1 measurement per district.
- Age and gender effects not included directly: part of offset term.
- Covariates included as proportions, e.g. proportion of unemployed in each district.
- Covariates categorical, not binary (non-standard).
- Offset: log of expected frequency of LLTI for each district.
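The offset described above can be sketched as follows. This is an illustration of indirect standardisation in general, not the authors' code: the district names, the two age-gender groups and all counts are made up, and the pooled rates across districts are used as the reference rates (an assumption).

```python
import math

# Made-up counts for two hypothetical districts and two age-gender groups.
# population[district][group] = number of residents
# cases[district][group]      = number with LLTI
population = {
    "District 1": {"M 30-44": 400, "M 55-64": 100},
    "District 2": {"M 30-44": 200, "M 55-64": 300},
}
cases = {
    "District 1": {"M 30-44": 10, "M 55-64": 25},
    "District 2": {"M 30-44": 2, "M 55-64": 95},
}

def reference_rates(population, cases):
    """Pooled illness rate per age-gender group across all districts."""
    groups = next(iter(population.values())).keys()
    rates = {}
    for g in groups:
        total_cases = sum(cases[d][g] for d in population)
        total_pop = sum(population[d][g] for d in population)
        rates[g] = total_cases / total_pop
    return rates

def expected_counts(population, rates):
    """Expected LLTI count per district under the pooled group rates."""
    return {d: sum(n * rates[g] for g, n in groups.items())
            for d, groups in population.items()}

rates = reference_rates(population, cases)
expected = expected_counts(population, rates)
# The model offset is the log of each district's expected count.
offset = {d: math.log(e) for d, e in expected.items()}

print(expected)
print(offset)
```

Because the expected counts already carry the age and gender structure of each district, the covariates only have to explain departures from those expected counts.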
Poisson Model
- The model used for the standard ecological analysis is a Poisson model.
- The response is the expected frequency of developing LLTI in district k.
- $x_{jk}$ is the value of the jth covariate in district k.
- $\beta_j$ is the parameter to be estimated for the jth covariate.
- $e_k$ is the offset term calculated from indirect standardisation.
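A Poisson regression with a log link and an offset is usually written as below. This is a reconstruction from the definitions on the slide, not the slide's own equation: the symbols $y_k$, $\mu_k$ and the intercept $\beta_0$ are assumed.

```latex
y_k \sim \operatorname{Poisson}(\mu_k),
\qquad
\log(\mu_k) = \log(e_k) + \beta_0 + \sum_{j} \beta_j x_{jk}
```

Here $y_k$ is the observed number of individuals with LLTI in district k, $\mu_k$ is its expected frequency, and $\log(e_k)$ enters with its coefficient fixed at 1, so that $\exp(\beta_0 + \sum_j \beta_j x_{jk})$ acts as a multiplier on the age and gender standardised expected count.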
Estimated Parameters (Standard Error)
Table 3: Standard Ecological Model
Relationships Identified by the Model
- LLTI increases in districts with a higher proportion of non-white.
- LLTI decreases in districts with a higher proportion of rented accommodation.
- LLTI increases in districts with a higher proportion of non car owners, and in districts with a higher proportion of two car owners.
- LLTI increases in districts with a higher proportion of individuals in social class III manual, districts with a high proportion of individuals in social class III non-manual, and districts with higher proportions of IV and V.
- LLTI decreases in districts with a higher proportion of unemployed and increases in districts with a higher proportion of inactive people.
The Ecological Fallacy Effect
Figure 3: Identified relationships between unemployment and LLTI
Model Parameters
- The parameter estimates highlight the ecological fallacy at its most extreme.
- Relationships are counter-intuitive to what would be expected from the individual level analysis.
3. Modified Ecological Analysis (i)
- Proposed by Lancaster and Green (2002).
- Dependent variable expanded by age and sex.
- Six observations per district (552 in total).
- Age and gender terms included as binary indicator variables, which allows between district age/sex interaction terms to be fitted.
- Covariates: only 1 unique value per district.
Expanded data set
Table 4: Example of data set
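The expansion can be sketched as below. This is a minimal illustration, not the authors' data: the age bands, field names and counts are hypothetical (the source states only that there are six age-gender observations per district). Each district contributes six rows, and the district level covariate proportion is repeated on every row.

```python
from itertools import product

# Assumed labels: six age-gender groups per district; these bands are hypothetical.
GENDERS = ("M", "F")
AGE_BANDS = ("30-44", "45-54", "55-64")

def expand_district(district, covariates, counts):
    """Return one row per age-gender group, repeating district covariates.

    covariates: dict of district level proportions (one value per district)
    counts: dict mapping (gender, age_band) -> (n_ill, n_total)
    """
    rows = []
    for gender, age in product(GENDERS, AGE_BANDS):
        n_ill, n_total = counts[(gender, age)]
        row = {"district": district, "gender": gender, "age": age,
               "n_ill": n_ill, "n_total": n_total}
        row.update(covariates)  # same value on all six rows
        rows.append(row)
    return rows

# Made-up example district.
counts = {(g, a): (10, 100) for g, a in product(GENDERS, AGE_BANDS)}
rows = expand_district("District 1", {"prop_unemployed": 0.08}, counts)

for row in rows:
    print(row)
```

With 92 districts this layout gives 92 x 6 = 552 rows, matching the total quoted for the modified ecological analysis.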
Binomial Model
- Poisson model appropriate when illness rates are small.
- When SAS data are expanded, age and gender groups 55 to 64 have high rates of LLTI (Males 32%, Females 25%).
- Binomial model more appropriate.
- In the model, $e_i$ is the expected frequency for observation i and $n_i$ is the number of individuals included in observation i.
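One way to see why the Poisson model is less suitable at these rates is to compare the variances the two models imply for a count out of n individuals. The sketch below reads the quoted rates as 32 and 25 percent and uses a hypothetical group size of 1000.

```python
def poisson_variance(n, p):
    """Variance of a Poisson count with mean n * p."""
    return n * p

def binomial_variance(n, p):
    """Variance of a binomial count with n trials and probability p."""
    return n * p * (1.0 - p)

n = 1000  # hypothetical group size, for illustration only
for label, p in (("males 55-64", 0.32), ("females 55-64", 0.25), ("rare outcome", 0.02)):
    ratio = binomial_variance(n, p) / poisson_variance(n, p)
    print(label, round(ratio, 2))

ratio_males = binomial_variance(n, 0.32) / poisson_variance(n, 0.32)
ratio_rare = binomial_variance(n, 0.02) / poisson_variance(n, 0.02)
```

The ratio equals 1 - p, so the two models nearly agree for a rare outcome but the Poisson variance is noticeably too large when about a third of the group is ill.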
Estimated Parameters (Standard Error)
Table 5: Modified Ecological Model (i)
Relationships Identified by the Models
- Relationships identical to those identified by the standard ecological model.
- Demonstrates that offset is effective in correcting for age/gender differences in illness rates between districts.
- No improvement on standard ecological model in reducing ecological bias in this example, but could try more complex interactions between age/sex and socio-economic variables.
Electoral Ward Level
- The modified ecological analysis model (i) does not improve upon the standard ecological model when studying main effects.
- May be due to large geographical area at Local Authority District level: the alternative is an Electoral Ward level analysis.
- Aggregated data expanded by age and gender: 2144 wards x 6 observations per ward, each covariate value (proportion) repeated 6 times per ward, total 12864 observations in data set.
Parameter Estimates (Standard Error)
Table 6: Electoral Ward Level Model
Relationships Identified by the Models
- Significant improvement on Local Authority District level models.
- Risk of developing LLTI decreases in wards with high proportion of car owners and increases in wards with high proportion of individuals who do not own a car.
- Risk increases in wards with high proportion of unemployed.
- Ecological bias still remains: LLTI decreases in wards with high proportion of individuals in social classes IV and V.
3. Modified Ecological Analysis (ii)
- Incorporates individual SAR level data into analysis.
- SAR data aggregated over district and by age and gender (384 observations).
- Unique covariate proportion for each age-gender category (6 per district).
- Binomial model: no offset term included in model.
Expanded data set
Table 7: Example of data set
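Building this data set amounts to grouping individual records by district, gender and age band, then taking the covariate proportion within each cell. The sketch below assumes hypothetical field names and uses made-up records standing in for SAR data.

```python
from collections import defaultdict

# Made-up individual records; field names are hypothetical.
records = [
    {"district": "D1", "gender": "M", "age": "55-64", "unemployed": 1, "llti": 1},
    {"district": "D1", "gender": "M", "age": "55-64", "unemployed": 0, "llti": 0},
    {"district": "D1", "gender": "M", "age": "55-64", "unemployed": 0, "llti": 1},
    {"district": "D1", "gender": "M", "age": "55-64", "unemployed": 1, "llti": 0},
    {"district": "D1", "gender": "F", "age": "55-64", "unemployed": 0, "llti": 0},
    {"district": "D1", "gender": "F", "age": "55-64", "unemployed": 0, "llti": 1},
]

def aggregate(records, covariate):
    """Group records by (district, gender, age) and return, per cell,
    the number of individuals, the LLTI count and the covariate proportion."""
    cells = defaultdict(lambda: {"n": 0, "llti": 0, "cov": 0})
    for r in records:
        key = (r["district"], r["gender"], r["age"])
        cells[key]["n"] += 1
        cells[key]["llti"] += r["llti"]
        cells[key]["cov"] += r[covariate]
    return {key: {"n": c["n"], "llti": c["llti"], "prop": c["cov"] / c["n"]}
            for key, c in cells.items()}

cells = aggregate(records, "unemployed")
for key, cell in cells.items():
    print(key, cell)
```

Unlike the expanded SAS layout, the covariate proportion here differs between the age-gender cells of the same district, which is the extra information the individual level sample contributes.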
Estimated Parameters (Standard Error)
Table 8: Modified Ecological Model (ii)
Relationships Identified by the Models
- Three significant improvements have been identified.
- Increased risk of developing LLTI in districts with high proportion of individuals in lower social classes, i.e. III manual, IV and V.
- Risk of developing LLTI increases in districts with high proportion of unemployed.
- Risk of LLTI decreases with car ownership.
- Some ecological bias still remains: housing tenure, qualifications.
Summary of Results
- Modified Ecological Model (i): Electoral Ward level analysis improves upon Local Authority District analysis.
  - Implies that method may not work well for large geographical areas.
  - May improve with age/sex interaction terms, but then more complex to interpret results.
- Modified Ecological Model (ii): incorporating individual level SAR data into aggregate model improves upon Modified Ecological Model (i) in reducing ecological bias.
4. Aggregated Individual Level Model
- Proposed by Prentice and Sheppard (1995).
- Appealing model as it combines aggregate level illness rates with individual level covariate information.
- Model constructed by aggregating individual level relative rate models over each Local Authority District.
- The observed mean illness rate for each district is then regressed on the mean relative rate model for each district.
Relative Rate Model
- The response is the probability that the ith individual in district k develops a LLTI.
- $n_k$ is the number of individuals in district k.
- $x_{jik}$ is the value of covariate j for individual i in district k.
- $\beta_j$ is the parameter to be estimated for the jth covariate.
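A relative rate model of this kind, aggregated over a district, can be written as below. This is a sketch assuming the exponential relative rate form; the symbols $p_{ik}$, $\bar{p}_k$ and the intercept $\beta_0$ are assumed, since the slide's own equation is not available.

```latex
p_{ik} = \exp\Big(\beta_0 + \sum_{j} \beta_j x_{jik}\Big),
\qquad
\bar{p}_k = \frac{1}{n_k} \sum_{i=1}^{n_k} p_{ik}
```

Here $p_{ik}$ is the probability that individual i in district k develops a LLTI and $\bar{p}_k$ is the mean expected rate for district k. The average is taken over individual covariate values, so the district mean depends on the joint distribution of the covariates within the district and not only on their district level proportions.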
Estimation of the Relative Rate Parameters
- The model parameters $\beta$ are the solution to the score equations.
- The score equations involve the mean observed illness rate for district k and the mean expected rate.
- $V_k$ is a working variance for the district mean rate.
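Score equations of this type usually take the standard estimating-equation form below. This is an assumed reconstruction: the symbols $\bar{y}_k$, $\bar{p}_k(\beta)$ and $D_k$ are introduced here for illustration, with K the number of districts.

```latex
\sum_{k=1}^{K} D_k^{\top} V_k^{-1} \big(\bar{y}_k - \bar{p}_k(\beta)\big) = 0,
\qquad
D_k = \frac{\partial \bar{p}_k(\beta)}{\partial \beta}
```

Here $\bar{y}_k$ is the mean observed illness rate for district k, $\bar{p}_k(\beta)$ is the mean expected rate under the relative rate model, and $D_k$ holds the derivatives of the mean expected rate with respect to each parameter.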
Convergence Problems
- Score equations are solved iteratively using the Newton-Raphson procedure.
- Did not converge for some combinations of covariates.
- One parameter had a large negative value tending to minus infinity.
- This caused a row of zeros in matrix D; consequently the matrix could not be inverted and the iteration procedure crashed.
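One plausible mechanism for the row of zeros can be sketched numerically. Assuming an exponential rate form with a single binary covariate (an assumption, as are the made-up covariate values), the derivative of the district mean rate with respect to a coefficient shrinks towards zero as that coefficient becomes large and negative.

```python
import math

def mean_rate_gradient(b0, b1, xs):
    """Derivative with respect to b1 of the district mean rate
    (1/n) * sum_i exp(b0 + b1 * x_i), an assumed exponential rate form."""
    return sum(x * math.exp(b0 + b1 * x) for x in xs) / len(xs)

xs = [0, 1, 1, 0, 1]  # made-up binary covariate values for one district

# Evaluate the derivative as the coefficient drifts towards minus infinity.
gradients = {b1: mean_rate_gradient(-3.0, b1, xs) for b1 in (-1.0, -10.0, -50.0)}
for b1, grad in gradients.items():
    print(b1, grad)

grad_far = gradients[-50.0]
```

Once these derivatives are numerically indistinguishable from zero in every district, the matrix built from them is singular and a Newton-Raphson update cannot be computed.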
Results Obtained
- Data set restricted to a single age and gender category: males aged 30 to 44.
- Compared with equivalent Poisson/Binomial model for Modified Ecological Regression (ii).
Table 9: Aggregated individual level model
Relationships Identified by the Models
- All models identify a negative relationship between no car ownership and LLTI, i.e. reduced risk of developing LLTI.
- Results inconclusive: unstable algorithm.
- Model is still constructed at aggregate level: may still be sensitive to ecological bias.
5. Aggregated Compound Multinomial Model
- Proposed by Brown and Payne (1986), Forcina and Marchetti (1989).
- Method to find internal cell probabilities for an r x s contingency table.
Table 10: Contingency table for district k
Aggregated Compound Multinomial Model
- In Table 10 the X and Y marginal totals are known and the transitional probabilities ($p_{ij}$) are to be estimated.
- District level covariates (z) can be incorporated to model log odds ratios.
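The relationship between the known margins and the unknown internal probabilities can be sketched as below. The margins and probabilities are made up, and the function name is hypothetical. The example also shows why estimation is hard: two different sets of transitional probabilities reproduce the same Y margins.

```python
def expected_column_totals(row_totals, p):
    """Expected Y margins given X margins and transition probabilities.

    row_totals[i] is the X margin for row i; p[i][j] is the probability of
    falling in column j given row i, so each row of p sums to 1.
    """
    n_cols = len(p[0])
    return [sum(row_totals[i] * p[i][j] for i in range(len(row_totals)))
            for j in range(n_cols)]

row_totals = [600, 400]  # made-up X margins, e.g. two age groups

# Two different made-up probability tables (columns: not ill, ill).
p_varying = [[0.90, 0.10], [0.70, 0.30]]  # illness risk differs by row
p_uniform = [[0.82, 0.18], [0.82, 0.18]]  # same illness risk in both rows

cols_varying = expected_column_totals(row_totals, p_varying)
cols_uniform = expected_column_totals(row_totals, p_uniform)

print(cols_varying)
print(cols_uniform)
```

A single table's margins therefore cannot identify the internal cells; the model has to borrow strength across districts, optionally through district level covariates.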
Results Obtained
- Example: age and illness for males, no covariates, i.e. assumes no variation across districts.
- Internal probabilities severely underestimate the observed illness rates.
Table 11: Aggregated Compound Multinomial Model
Conclusions
- Individual Level Model gives the best results in terms of identifying the perceived correct relationships between the covariates and LLTI.
- Standard Ecological Model performs poorly at the Local Authority District level: ecological bias.
- Modified Ecological Model (i) at electoral ward level is an improvement on Local Authority District level model when using aggregate SAS data.
- Modified Ecological Model (ii) incorporating individual level SAR data into the aggregate model is most effective in reducing ecological bias.
- Aggregated Individual Level Model may assist in reducing the ecological fallacy if the convergence problems can be overcome.
- Aggregated Compound Multinomial Model is a complex algorithm to use and underestimates illness rates in our example.