1
Reducing Bias in Ecological Studies
  • S. Lane (1), G.A. Lancaster (1) and M. Green (2)
  • (1) University of Liverpool
  • (2) University of Lancaster

2
Introduction
  • Many studies in health evaluation focus on
    ecological models.
  • Used to examine the relationship between
    socio-economic risk factors and ill health in
    pre-defined geographical regions.
  • Models are based on variables measured at the
    aggregate level.
  • Aggregate data provide information on groups of
    individuals.
  • Used when individual level data are not
    available.
  • Advantage: data are readily available from
    population Censuses.

3
Ecological Fallacy
  • Disadvantage: when ecological models are used
    to make inferences about individuals living
    within a district, the inferences may be
    incorrect.
  • This bias is known as the ecological fallacy
    (Robinson 1950).
  • Aggregated data may misrepresent the true
    underlying relationships between covariates and
    ill health, for example when a small subgroup of
    the population is responsible for a large
    proportion of ill health.
  • Outcome: inflated model parameters, or
    parameters that suggest relationships that run
    counter to known relationships.
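
The reversal described above can be sketched with a few invented numbers. In this minimal example (all values are hypothetical, not taken from the Census data), unemployed individuals have a higher risk than employed individuals inside every district, yet the district-level rates fall as the unemployed share rises, because the baseline risk happens to be lower in high-unemployment districts.

```python
# Hypothetical districts: share of unemployed and LLTI risk for employed people.
unemployed_share = [0.05, 0.10, 0.15, 0.20]
risk_employed = [0.20, 0.18, 0.16, 0.14]
EXTRA_RISK_UNEMPLOYED = 0.10  # assumed individual-level effect, same in every district

risk_unemployed = [r + EXTRA_RISK_UNEMPLOYED for r in risk_employed]

# Aggregate (district-level) illness rate: weighted mix of the two groups.
district_rate = [
    (1 - u) * r_emp + u * r_unemp
    for u, r_emp, r_unemp in zip(unemployed_share, risk_employed, risk_unemployed)
]

def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var

slope = least_squares_slope(unemployed_share, district_rate)
print("individual effect of unemployment:", EXTRA_RISK_UNEMPLOYED)
print("district-level slope:", round(slope, 3))
```

The individual-level effect is positive in every district while the aggregate slope is negative, which is the kind of sign reversal an ecological model can produce.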

4
Population Census Data
  • Available at both aggregate and individual
    levels.
  • Aggregate data: Small Area Statistics (SAS).
  • SAS provides aggregate data for geographically
    defined groups, e.g. Local Authority District,
    Electoral Ward.
  • Individual level data: Sample of Anonymised
    Records (SAR).
  • SAR: a 2% randomised sample of the SAS data,
    containing detailed information on individuals.

5
Study Aim
  • To evaluate alternative statistical methods of
    ecological analysis which attempt to reduce the
    ecological fallacy effect.
  • To determine in which circumstances each
    method might be practicable.

6
Method
  • Information on Limiting Long Term Illness
    (LLTI) is available from the 1991 Population
    Census.
  • Investigate the relationship between
    socio-economic risk factors and LLTI in the
    North of England.
  • Geographical area: Local Authority District.
  • SAS: covers 92 Local Authority Districts (6
    million individuals).
  • SAR: some districts combined to maintain
    confidentiality, giving 64 Local Authority
    Districts (100,000 individuals).
  • Five different statistical modelling
    methodologies have been investigated in terms of
    their effectiveness at reducing ecological bias.

7
Statistical Modelling Techniques
  • 1. Individual Level Analysis.
  • 2. Standard Ecological Analysis.
  • 3. Modified Ecological Analysis.
  • 4. Aggregated Individual Level Model.
  • 5. Aggregated Compound Multinomial Model.

8
Socio-Economic Covariates
Table 1 Covariates included in the model
9
1. Individual Level Analysis
  • Two models investigated using individual level
    SAR data.
  • Fixed effects model: individuals within the
    same age-gender grouping and with the same
    socio-economic profile will have the same risk
    of developing a LLTI irrespective of the
    district that they live in.
  • Random effects model: allows the illness rate to
    vary across districts.

10
Random Effects Binomial Model
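$$\operatorname{logit}(\pi_{ik}) = \beta_{0k} + \sum_j \beta_j x_{jik}, \qquad \beta_{0k} = \beta_0 + \mu_{0k}$$
where
$\pi_{ik}$ is the probability that individual i in district k develops a LLTI,
$\beta_{0k}$ is the district level intercept term,
$\mu_{0k}$ is the district level random variation,
$x_{jik}$ is the value of the jth covariate for individual i in district k, and
$\beta_j$ is the parameter to be estimated for the jth covariate.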
11
Estimated Parameters (Standard Error)
Table 2 Individual Level Models
12
Relationships Identified by the Models
  • Risk of developing a LLTI increases with age.
  • Gender: similar risk except in age group 55 to
    64.
  • Non-whites: slightly higher risk than whites.

Figure 1 Risk of LLTI in age group 55 to 64 by
gender and ethnicity (groups shown: White Females,
Non-white Females, White Males, Non-white Males;
axis: Risk of LLTI)

13
Relationships Identified by the Models
  • Home owners are at less risk of developing LLTI
    than individuals who rent accommodation.
  • Decreased risk with car ownership.
  • Increased risk for individuals in social
    classes III manual, IV and V.
  • Unemployed at higher risk than employed,
    inactive at highest risk.
  • Unqualified at higher risk than qualified.

14
Effects of the Random Intercept
  • District level variation is small, 0.016
    (0.004), a result of the large geographical
    areas.
  • Intercept terms vary from -3.94 to -3.38 (2% to
    3% in terms of the estimated risk of developing
    a LLTI for the base category).
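
The step from intercept to risk can be checked with a short calculation. This is a minimal sketch that assumes the intercepts are negative values on the logit scale (-3.94 and -3.38); under that assumption the inverse logit gives the estimated risk for the base category.

```python
import math

def inverse_logit(eta: float) -> float:
    """Convert a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Assumed range of district level intercepts on the logit scale.
lowest_risk = inverse_logit(-3.94)
highest_risk = inverse_logit(-3.38)

print(f"lowest district risk:  {lowest_risk:.3f}")
print(f"highest district risk: {highest_risk:.3f}")
```

The two values come out at roughly 2% and 3%, which matches the range quoted for the base category.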

Figure 2 District level intercept terms
15
2. Standard Ecological Analysis
  • The ideal situation would be an individual level
    model, but covariate information for health
    outcomes is not always available.
  • Census: health outcome data limited to LLTI.
  • In most situations we want to make inferences
    about other specific illnesses, e.g. deprivation
    and cancer.
  • Cancer counts from a cancer registry: no
    information on socio-economic variables except
    age and gender groupings.
  • Need to use aggregate Census data in an
    ecological study.

16
Standard Ecological Analysis - Data
  • Model uses aggregate SAS data.
  • 92 Local Authority Districts.
  • Dependent variable: number of individuals in
    each district with LLTI, 1 measurement per
    district.
  • Age and gender effects not included directly:
    they are part of the offset term.
  • Covariates included as proportions, e.g.
    proportion of unemployed in each district.
  • Covariates categorical, not binary
    (non-standard).
  • Offset: log of the expected frequency of LLTI
    for each district.

17
Poisson Model
  • The model used for the standard ecological
    analysis is a Poisson model of the form

$$\log(\mu_k) = \log(e_k) + \sum_j \beta_j x_{jk}$$
where
$\mu_k$ is the expected frequency of developing LLTI in district k,
$x_{jk}$ is the value of the jth covariate in district k,
$\beta_j$ is the parameter to be estimated for the jth covariate, and
$\log(e_k)$ is the offset term, with $e_k$ calculated from indirect standardisation.
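
Indirect standardisation for the offset can be sketched as follows. The strata, reference rates and district populations here are hypothetical placeholders (the slides do not list them); the idea is only that region-wide stratum rates are applied to each district's population to give an expected count, whose log serves as the offset.

```python
import math

# Hypothetical region-wide LLTI rates per age-gender stratum (assumed values).
reference_rates = {"stratum_1": 0.05, "stratum_2": 0.12, "stratum_3": 0.28}

# Hypothetical district populations in the same strata.
district_population = {
    "district_A": {"stratum_1": 1000, "stratum_2": 800, "stratum_3": 400},
    "district_B": {"stratum_1": 500, "stratum_2": 900, "stratum_3": 900},
}

def expected_count(population, rates):
    """Expected number of cases if the district had the reference rates."""
    return sum(population[stratum] * rates[stratum] for stratum in rates)

expected = {
    name: expected_count(population, reference_rates)
    for name, population in district_population.items()
}
# The Poisson offset is the log of the expected count.
offset = {name: math.log(e) for name, e in expected.items()}

for name in expected:
    print(name, round(expected[name], 1), round(offset[name], 3))
```

District B has an older population profile in this invented example, so its expected count and offset are larger even though both districts are of similar size.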
18
Estimated Parameters (Standard Error)
Table 3 Standard Ecological Model
19
Relationships Identified by the Model
  • LLTI increases in districts with a higher
    proportion of non-white.
  • LLTI decreases in districts with a higher
    proportion of rented accommodation.
  • LLTI increases in districts with a higher
    proportion of non car owners, and in districts
    with a higher proportion of two car owners.
  • LLTI increases in districts with a higher
    proportion of individuals in social class
    III manual, in social class III non-manual,
    and in social classes IV and V.
  • LLTI decreases in districts with a higher
    proportion of unemployed and increases in
    districts with higher proportion of inactive
    people.

20
The Ecological Fallacy Effect
Figure 3 Identified relationships between
unemployment and LLTI
21
Model Parameters
  • The parameter estimates highlight the ecological
    fallacy at its most extreme.
  • Relationships are counterintuitive to what
    would be expected from the individual level
    analysis.

22
3. Modified Ecological Analysis (i)
  • Proposed by Lancaster and Green (2002)
  • Dependent variable expanded by age and sex.
  • Six observations per district (552 in total).
  • Age and gender terms included as binary
    indicator variables, which allows between-district
    age/sex interaction terms to be fitted.
  • Covariates: only 1 unique value per district.
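
The expanded layout can be sketched as below. This assumes the six observations per district are three age bands crossed with two sexes; the band labels, district names and covariate values are hypothetical.

```python
from itertools import product

districts = [f"district_{k:02d}" for k in range(1, 93)]  # 92 districts
age_bands = ["age_band_1", "age_band_2", "age_band_3"]   # hypothetical labels
sexes = ["male", "female"]

# Hypothetical district-level covariate: one value per district.
prop_unemployed = {d: 0.05 + 0.001 * i for i, d in enumerate(districts)}

rows = []
for district in districts:
    for age_band, sex in product(age_bands, sexes):
        rows.append({
            "district": district,
            # Binary indicator variables for age and gender.
            "is_age_band_2": int(age_band == "age_band_2"),
            "is_age_band_3": int(age_band == "age_band_3"),
            "is_female": int(sex == "female"),
            # The covariate is repeated on all six rows of the district.
            "prop_unemployed": prop_unemployed[district],
        })

print(len(rows))  # 92 districts x 6 age-sex groups
```

Each district contributes six rows that differ only in the age and gender indicators, so the socio-economic covariates carry no within-district information.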

 
23
Expanded data set
Table 4 Example of data set
24
Binomial Model
  • Poisson model appropriate when illness rates
    are small.
  • When the SAS data are expanded, the age and
    gender groups 55 to 64 have high rates of LLTI
    (males 32% and females 25%).
  • Binomial model more appropriate.
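  • Includes offset term

$$\operatorname{logit}\left(\frac{e_i}{n_i}\right) = \log\left(\frac{e_i}{n_i - e_i}\right)$$
where $e_i$ is the expected frequency for
observation i and $n_i$ is the number of individuals
included in observation i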
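
A small sketch of a binomial offset, assuming it is the logit of the expected proportion (expected frequency divided by the number of individuals). The helper names and the example numbers are hypothetical.

```python
import math

def binomial_offset(expected: float, n: float) -> float:
    """Logit of the expected proportion expected / n (assumed offset form)."""
    if not 0 < expected < n:
        raise ValueError("expected frequency must lie strictly between 0 and n")
    return math.log(expected / (n - expected))

def inverse_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical observation: 100 individuals with an expected 32 cases.
offset = binomial_offset(expected=32.0, n=100.0)
print(round(offset, 4), round(inverse_logit(offset), 2))
```

With an expected rate as high as 32%, the logit and the log of the rate differ noticeably, which is one way to see why a binomial model is preferred over a Poisson model for these groups.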
25
Estimated Parameters (Standard Error)
Table 5 Modified Ecological Model (i)
26
Relationships Identified by the Models
  • Relationships identical to those identified by
    the standard ecological model.
  • Demonstrates that the offset is effective in
    correcting for age/gender differences in illness
    rates between districts.
  • No improvement on the standard ecological model
    in reducing ecological bias in this example, but
    could try more complex interactions between
    age/sex and socio-economic variables.

27
Electoral Ward Level
  • The modified ecological analysis model (i) does
    not improve upon the standard ecological model
    when studying main effects.
  • May be due to the large geographical area at
    Local Authority District level. Alternative:
    Electoral Ward level analysis.
  • Aggregated data expanded by age and gender:
    2144 wards, 6 observations per ward, each
    covariate value (proportion) repeated 6 times
    per ward, for a total of 12864 observations in
    the data set.

28
Parameter Estimates (Standard Error)
Table 6 Electoral Ward Level Model
29
Relationships Identified by the Models
  • Significant improvement on Local Authority
    District level models.
  • Risk of developing LLTI decreases in wards with
    a high proportion of car owners and increases in
    wards with a high proportion of individuals who
    do not own a car.
  • Risk increases in wards with a high proportion
    of unemployed.
  • Ecological bias still remains: LLTI decreases
    in wards with a high proportion of individuals
    in social classes IV and V.

30
3. Modified Ecological Analysis (ii)
  • Incorporates individual level SAR data into
    the analysis.
  • SAR data aggregated over district and by age
    and gender (384 observations).
  • Unique covariate proportion for each age-gender
    category (6 per district).
  • Binomial model: no offset term included in
    the model.
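
The aggregation step can be sketched as follows. The records, district names and group labels are invented for illustration; the point is that each district by age-gender cell gets its own covariate proportion, rather than one value shared by the whole district.

```python
from collections import defaultdict

# Hypothetical individual records:
# (district, age-gender group, unemployed indicator, LLTI indicator)
records = [
    ("D1", "M 30-44", 1, 1),
    ("D1", "M 30-44", 0, 0),
    ("D1", "M 30-44", 0, 0),
    ("D1", "F 55-64", 0, 1),
    ("D1", "F 55-64", 1, 1),
    ("D2", "M 30-44", 0, 0),
    ("D2", "M 30-44", 1, 0),
]

cells = defaultdict(lambda: {"n": 0, "llti": 0, "unemployed": 0})
for district, group, unemployed, llti in records:
    cell = cells[(district, group)]
    cell["n"] += 1
    cell["llti"] += llti
    cell["unemployed"] += unemployed

# One row per district x age-gender cell, with a cell-specific proportion.
table = {
    key: {
        "n": c["n"],
        "llti": c["llti"],
        "prop_unemployed": c["unemployed"] / c["n"],
    }
    for key, c in cells.items()
}

for key, row in sorted(table.items()):
    print(key, row)
```

Each row then supplies a binomial observation (llti cases out of n) together with covariate proportions specific to that age-gender group.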

31
Expanded data set
Table 7 Example of data set
 
32
Estimated Parameters (Standard Error)
Table 8 Modified Ecological Model (ii)
 
33
Relationships Identified by the Models
  • Three significant improvements have been
    identified.
  • Increased risk of developing LLTI in districts
    with a high proportion of individuals in lower
    social classes, i.e. III manual, IV and V.
  • Risk of developing LLTI increases in districts
    with a high proportion of unemployed.
  • Risk of LLTI decreases with car ownership.
  • Some ecological bias still remains: housing
    tenure, qualifications.

34
Summary of Results
  • Modified Ecological Model (i): Electoral Ward
    level analysis improves upon Local Authority
    District analysis.
  • This implies that the method may not work well
    for large geographical areas.
  • It may improve with age/sex interaction terms,
    but the results are then more complex to
    interpret.
  • Modified Ecological Model (ii): incorporating
    individual level SAR data into the aggregate
    model improves upon Modified Ecological Model (i)
    in reducing ecological bias.

35
4. Aggregated Individual Level Model
  • Proposed by Prentice and Sheppard (1995).
  • Appealing model as it combines aggregate level
    illness rates with individual level covariate
    information.
  • Model constructed by aggregating individual
    level relative rate models over each Local
    Authority District.
  • The observed mean illness rate for each
    district is then regressed on the mean relative
    rate model for each district.

36
Relative Rate Model
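$$\bar{\pi}_k = \frac{1}{n_k}\sum_{i=1}^{n_k} \pi_{ik}, \qquad \pi_{ik} = \exp\Big(\sum_j \beta_j x_{jik}\Big)$$
where
$\pi_{ik}$ is the probability that the ith individual in district k develops a LLTI,
$n_k$ is the number of individuals in district k,
$x_{jik}$ is the value of covariate j for individual i in district k, and
$\beta_j$ is the parameter to be estimated for the jth covariate.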
37
Estimation of the Relative Rate Parameters
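  • The model parameters $\beta$ are the solution to
    the score equations

$$\sum_k D_k^{T} V_k^{-1} \left(\bar{y}_k - \bar{\pi}_k\right) = 0$$
where
$\bar{y}_k$ is the mean observed illness rate for district k,
$\bar{\pi}_k$ is the mean expected rate,
$V_k$ is a working variance for $\bar{y}_k$, and
$D_k$ is the derivative of $\bar{\pi}_k$ with respect to $\beta$.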
38
Convergence Problems
  • Score equations are solved iteratively using
    the Newton-Raphson procedure.
  • Did not converge for some combinations of
    covariates.
  • One parameter had a large negative value tending
    to minus infinity.
  • This caused a row of zeros in matrix D, which
    consequently could not be inverted, and the
    iteration procedure crashed.
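
The failure mode can be reproduced in a toy setting. This is a sketch under stated assumptions, not the authors' implementation: an exponential relative rate model with an intercept and one binary covariate, an identity working variance, and invented district proportions. As the covariate's parameter becomes very negative, its row of derivatives underflows to zero and the matrix used in the Newton-Raphson update becomes singular.

```python
import math

# Hypothetical share of individuals with the covariate equal to 1, per district.
share_exposed = [0.10, 0.25, 0.40, 0.55]

def derivative_rows(beta0: float, beta1: float):
    """Derivatives of the district mean rate with respect to beta0 and beta1.

    Assumed mean rate: exp(beta0) * ((1 - q) + q * exp(beta1)) for share q.
    """
    base = math.exp(beta0)
    row_beta0 = [base * ((1 - q) + q * math.exp(beta1)) for q in share_exposed]
    row_beta1 = [base * q * math.exp(beta1) for q in share_exposed]
    return row_beta0, row_beta1

def information_determinant(rows) -> float:
    """Determinant of the 2 x 2 matrix D D^T (identity working variance)."""
    r0, r1 = rows
    a = sum(v * v for v in r0)
    b = sum(u * v for u, v in zip(r0, r1))
    c = sum(v * v for v in r1)
    return a * c - b * b

rows_ok = derivative_rows(-3.0, -1.0)
rows_bad = derivative_rows(-3.0, -800.0)  # parameter drifting towards minus infinity

det_ok = information_determinant(rows_ok)
det_bad = information_determinant(rows_bad)
print("moderate parameter, determinant:", det_ok)
print("diverging parameter, determinant:", det_bad)
```

A zero determinant means the update step has no unique solution, so an implementation that inverts this matrix without a safeguard will stop at that iteration.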

39
Results Obtained
  • Data set restricted to a single age and gender
    category: males aged 30 to 44.
  • Compared with equivalent Poisson/Binomial model
    for Modified Ecological Regression (ii).

Table 9 Aggregated individual level model.
40
Relationships Identified by the Models
  • All models identify a negative relationship
    between no car ownership and LLTI, i.e. reduced
    risk of developing LLTI.
  • Results inconclusive: unstable algorithm.
  • Model is still constructed at aggregate level,
    so it may still be sensitive to ecological bias.

41
5. Aggregated Compound Multinomial Model
  • Proposed by Brown and Payne (1986) and Forcina
    and Marchetti (1989).
  • Method to find internal cell probabilities for
    an r x s contingency table.

Table 10 Contingency table for district k
42
Aggregated Compound Multinomial Model
  • In Table 10 the X and Y marginal totals are
    known and the transitional probabilities
    ($p_{ij}$) are to be estimated.
  • District level covariates (z) can be
    incorporated to model log odds ratios.
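
Why a model is needed here can be seen from a small invented example: two different 2 x 2 tables can share exactly the same marginal totals while implying very different transitional probabilities. The counts below are hypothetical, and the code only illustrates the identification problem; it is not the compound multinomial estimator itself.

```python
def margins(table):
    """Row totals and column totals of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals

def transition_probabilities(table):
    """Cell counts divided by their row total."""
    return [[cell / sum(row) for cell in row] for row in table]

# Two hypothetical district tables (rows: X categories, columns: Y categories).
table_a = [[30, 10], [20, 40]]
table_b = [[10, 30], [40, 20]]

print(margins(table_a), margins(table_b))
print(transition_probabilities(table_a))
print(transition_probabilities(table_b))
```

Both tables have row totals 40 and 60 and column totals 50 and 50, so the margins alone cannot distinguish them; extra structure, such as assumptions across districts or district level covariates, is what makes the internal cells estimable.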

43
Results Obtained
  • Example: age and illness for males, no
    covariates, i.e. assumes no variation across
    districts.
  • Internal probabilities severely underestimate
    the observed illness rates.

Table 11 Aggregated Compound Multinomial Model
44
Conclusions
  • Individual Level Model gives the best results
    in terms of identifying the perceived correct
    relationships between the covariates and LLTI.
  • Standard Ecological Model performs poorly at
    the Local Authority District level because of
    ecological bias.
  • Modified Ecological Model (i) at electoral
    ward level is an improvement on the Local
    Authority District level model when using
    aggregate SAS data.
  • Modified Ecological Model (ii), incorporating
    individual level SAR data into the aggregate
    model, is most effective in reducing ecological
    bias.
  • Aggregated Individual Level Model may assist
    in reducing the ecological fallacy if the
    convergence problems can be overcome.
  • Aggregated Compound Multinomial Model is a
    complex algorithm to use and underestimates
    illness rates in our example.