1
Development of Small Area Estimation at ONS
  • Philip Clarke and Denise Silva

2
Outline
  • Small Area Estimation Problem
  • History and current provision
  • Development in progress
  • Wider research
  • Consultancy service

3
1. Small Area Estimation Problem
  • Official statistics provide an indispensable
    element in the information system of a democratic
    society (Fundamental Principles of Official
    Statistics, UNSD)
  • Sample surveys are used to provide estimates for
    target parameters at population (or national)
    level and also for subpopulations or domains of
    study
  • However, implementation in a small area context
    is challenging

4
Small Area Estimation Problem
  • In small areas/domains sample sizes are usually
    not large enough to provide reliable estimates
    using classical design-based methods.
  • The small area estimation problem refers to SMALL
    SAMPLE SIZES (or none at all) in the domain or
    area of interest.

5
2. History
  • Small area estimation in the UK began as a
    research project in the late 1990s.
  • In response to calls for locally focussed
    information in many different areas:
  • Environmental
  • Business
  • Social, e.g. health, housing, deprivation,
    unemployment.
  • Also calls for more general domain estimation
  • e.g. cross classifications by age/sex,
    occupation.
  • Initial experimental studies on mental health
    estimation for DoH.

6
Developing alternative methodology
  • Purpose
  • To enable production of reliable estimates of
    characteristics of interest for small areas or
    domains based on very small or no sample.
  • To assess the quality (precision) of estimates.
  • Several years of research and development (since
    1995)
  • Partnership work with universities and Statistics
    Finland
  • The EURAREA project
  • Research programme funded by Eurostat to enhance
    techniques to meet European needs (2001-2004)

7
Basis of Approach: Relax the Survey Restriction
  • Borrow strength by removing the isolation of
    depending solely on the survey and solely on
    respondents in a given area.
  • Widen the class of respondents for a given area
    by pooling together similar areas.
  • Widen the class of respondents by taking past
    period respondents into account.
  • Take advantage of other related data sources
    which are not sample survey based.
  • Known as auxiliary data.
  • e.g. Administrative data or census data which
    are available for all areas/domains.

8
Model based estimation
  • All approaches detailed are based on an implicit
    or explicit model.
  • Using auxiliary data together with survey data
    from all areas is the approach currently adopted
    in the UK.
  • Borrows strength nationally.
  • Uses an explicit statistical model to represent
    the relationship between the survey variable of
    interest and auxiliary data.
  • Dependent variable is survey variable of
    interest.
  • Independent variables are certain auxiliary data
    variables known as covariates.
  • The model is fitted using sample data and assumed
    to apply generally.
  • The model is then used to obtain area/domain
    estimates.

9
Outline of a model structure
  • Suppose the variable of interest, Y, in an area j
    is linearly related to a single covariate X
  • A possible model structure is given by
    $\bar{Y}_j = \alpha + \beta X_j$
  • where $\bar{Y}_j$ is the mean of Y in area j
  • This is a deterministic structure, so we need to
    add some random variability

10
  • Obtain $\bar{Y}_j = \alpha + \beta X_j + u_j$
  • $u_j$ represents random area differences from the
    deterministic value.
  • The variance of $u_j$, $\sigma_u^2$, represents
    variability between areas.

11
Model fitting
  • Fit the model using direct survey estimates for
    each area.
  • This introduces additional sampling variability:
    unit level sampling variability gives rise to
    additional area level sampling variability.

Estimating from the model
  • Once the model is fitted, estimate for area j by
    using parameter estimates
  • Estimate of mean squared error given by
  • Modelling success measured by obtaining estimates
    with high precision based on low mean squared
    errors.
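
The estimation step can be illustrated with a minimal Python sketch. It assumes the simple area-level structure outlined earlier (area mean = alpha + beta × covariate) and, as a simplification, fits it by ordinary least squares on direct estimates rather than by the mixed-model fitting a real application would use. The function names and the toy figures are hypothetical, and the mean squared error calculation is not attempted here.

```python
import numpy as np

def fit_area_model(x, direct):
    """Least squares fit of direct = alpha + beta * x over the sampled areas.

    Simplification: ignores the random area term and sampling variances.
    """
    x = np.asarray(x, dtype=float)
    direct = np.asarray(direct, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, direct, rcond=None)
    return coef  # array([alpha_hat, beta_hat])

def synthetic_estimate(coef, x_new):
    """Plug the fitted parameters into the model for any area with known x."""
    alpha_hat, beta_hat = coef
    return alpha_hat + beta_hat * np.asarray(x_new, dtype=float)

# Hypothetical direct estimates for four sampled areas.
x_sampled = [1.0, 2.0, 3.0, 4.0]
direct = [3.1, 4.9, 7.2, 8.8]
coef = fit_area_model(x_sampled, direct)

# Estimates for two areas, one of which (x = 5.0) has no sample at all.
estimates = synthetic_estimate(coef, [2.5, 5.0])
```

Because the estimate only needs the covariate value, it is available for every area, sampled or not, which is the point of borrowing strength through the model.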

15
Current provision
  • SAEP (Small Area Estimation Project): a generic
    methodology for application to variables from
    household based surveys.
  • Mean household income based on Family Resources
    Survey published as Experimental Statistics for
    wards in 1998/99, 2001/02 and for middle layer
    super output areas 2004/05
  • Specialised methodology for labour market
    estimation of unemployment from Labour Force
    Survey.
  • Unemployment levels and rates routinely published
    quarterly as National Statistics for Local
    Authority Districts in Great Britain.

16
SAEP methodology and income estimation
  • SAEP methodology is derived from the outlined
    model-based approach, but it:
  • is based on a unit (household)/area multilevel
    model
  • borrows strength across areas using multivariate
    area level auxiliary data (covariates)
  • can model transformation of variable of interest
    if required
  • adapted for estimating at ward/middle layer super
    output area (MSOA) from customary ONS clustered
    design household sample surveys

17
Application to income estimation: Response
Variable
  • Income value for each household sampled in Family
    Resources Survey (FRS).
  • 3,300 MSOAs in England and Wales with sample in
    2004/05,
  • 21,500 total responding households.
  • But not a simple random sample.
  • Clustered design with primary sampling units as
    postcode sectors,
  • 1,500 sampled postcode sectors.

18
Coping with design clustering
  • Samples are random samples of postcode sectors
  • So random terms are around postcode sectors,
    indexed by j
  • Estimation is required for geographically
    distinct wards or middle layer super output
    areas
  • So covariates are for these areas, indexed by d
  • For estimation, covariates must be known for all
    areas not just sampled areas.

19
SAEP model and estimator structure for income
estimation
  • Multilevel structure gives rise to unit level
    random term replacing area sampling
    variability
  • Logarithmic transformation of income taken
    because of positive skewness of income
    distribution
  • Model
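
The model formula is not present in this transcript. A plausible form, consistent with the bullets above and with the indexing used for the clustered design, is sketched below. The notation is assumed rather than taken from the slide: $y_{ijd}$ is the income of household $i$ in postcode sector $j$ and area $d$, $\mathbf{x}_d$ is the vector of area-level covariates, $u_j$ is the postcode sector random term and $e_{ij}$ the unit level random term. The normality assumptions are the usual ones for this kind of model, not something the slide states.

```latex
\log y_{ijd} = \mathbf{x}_d^{\top}\boldsymbol{\beta} + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)
```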

20
SAEP model fitting procedure
  • Create a dataset containing
  • Variable of interest from individual household
    responses to survey.
  • Values of a large number of administrative and
    census variables for the household's area of
    residence which we believe could affect the
    variable of interest, e.g. census variables, DWP
    social benefit claimant rates, council tax band
    proportions

21
SAEP model fitting procedure (cont.)
  • Starting with a null model, fit covariates in a
    stepwise manner in order of significance using
    specialised multilevel software, e.g. MLwiN or
    SAS PROC MIXED.
  • In this way select a set of significant
    covariates and fit an accepted model.
  • Use diagnostic techniques to check the model
    against its assumptions, e.g. randomness of
    residuals, unbiasedness of predictions.
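
The stepwise idea can be sketched in Python as a greedy forward selection. This is an illustration only: it swaps in ordinary least squares for the multilevel fit performed in MLwiN or SAS PROC MIXED, and the function name, the |t| threshold of 2 and the simulated data are all assumptions made for the example.

```python
import numpy as np

def forward_select(X, y, names, t_threshold=2.0):
    """Add covariates one at a time, most significant first, until no
    remaining candidate has |t| at or above the threshold (OLS stand-in)."""
    n = len(y)
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining:
        best = None
        for k in remaining:
            design = np.column_stack([np.ones(n), X[:, selected + [k]]])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            resid = y - design @ coef
            sigma2 = resid @ resid / (n - design.shape[1])
            cov = sigma2 * np.linalg.inv(design.T @ design)
            # t statistic of the candidate (last column of the design).
            t_stat = abs(coef[-1]) / np.sqrt(cov[-1, -1])
            if best is None or t_stat > best[1]:
                best = (k, t_stat)
        if best[1] < t_threshold:
            break
        selected.append(best[0])
        remaining.remove(best[0])
    return [names[k] for k in selected]

# Simulated data: only the first covariate truly affects the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
chosen = forward_select(X, y, ["cov_a", "cov_b", "cov_c"])
```

The first name in `chosen` is the strongest predictor; whether a noise covariate also passes the threshold depends on the random draw, which is one reason the diagnostics step matters.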

23
Estimator and mean squared error
  • Estimator on log income scale
  • A synthetic estimator is used omitting the random
    area terms
  • Mean squared error

Converting to raw income scale
  • Need to make allowance for
  • mean(log) ≠ log(mean)
  • Area estimate
  • Confidence interval
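
The need for an allowance can be seen in a small Python sketch. Exponentiating the mean of log incomes gives the geometric mean, which is never above the arithmetic mean. The correction shown, exp(mu + sigma²/2), is the standard result for a lognormal variable; whether the ONS estimator uses exactly this form is not stated in the transcript, so it is an assumption here, as are the toy income values.

```python
import math

incomes = [200.0, 400.0, 1600.0]  # hypothetical weekly household incomes

arithmetic_mean = sum(incomes) / len(incomes)
mean_log = sum(math.log(v) for v in incomes) / len(incomes)

# Naive back-transformation: this is the geometric mean, not the mean.
naive_back_transform = math.exp(mean_log)

def lognormal_mean(mu, sigma2):
    """Mean of Y when log(Y) is normal with mean mu and variance sigma2."""
    return math.exp(mu + sigma2 / 2.0)

# Variance of the log incomes (population form, for illustration only).
var_log = sum((math.log(v) - mean_log) ** 2 for v in incomes) / len(incomes)
corrected = lognormal_mean(mean_log, var_log)
```

Here `naive_back_transform` falls below `arithmetic_mean`, and the variance term pushes `corrected` upwards from the naive value.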

26
Actual model for ward estimation of income in
2004/05
  • phrpman: proportion of household reference
    persons aged 16-74 who are in professional or
    managerial occupations.
  • lnphrpecac: logit of proportion of household
    reference persons aged 16-74 who are
    economically active.
  • lnphhtype1: logit of proportion of one person
    households.
  • engegh: proportion of council tax band GH
    dwellings for England.
  • pcgeo: proportion of people aged 60 and over
    claiming pension credit (guarantee element
    only).

28
Income estimation outputs
  • Estimates obtained of sufficient precision for
    publication and acceptable to user community.
  • Accredited as Experimental Statistics
  • Placed on Neighbourhood Statistics website
    together with user guides and technical
    documentation.

29
Estimation of unemployment at local authority
level
  • BACKGROUND
  • Unemployment is a key indicator and is used for
    policy making and resource allocation
  • Official UK measure of unemployment follows the
    International Labour Organisation (ILO)
    definition
  • ILO unemployment is estimated via the Labour
    Force Survey (national level)
  • Small (local) sample sizes in the LFS for some
    areas

30
Features of Labour Force Survey
  • A rotating panel survey
  • Roughly 60,000 households surveyed each quarter
  • Each household remains in sample for 5 quarters
    (waves 1 to 5) then drops out
  • Waves 1 and 5 respondents for last four quarters
    used to obtain an annual local labour force
    survey dataset of about 90,000 independent
    households.
  • Unclustered survey design giving a sample in
    each LAD.
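
Why waves 1 and 5 give independent households can be checked with a short Python sketch. It labels each household by the quarter of its first interview (its cohort) and assumes, for illustration, five equally sized waves per quarter; the quarter numbering and function name are hypothetical.

```python
def wave_1_and_5_cohorts(last_quarter, n_quarters=4, n_waves=5):
    """List (quarter, wave, cohort) for wave 1 and wave 5 respondents over
    the last n_quarters, where cohort = quarter of first interview."""
    rows = []
    for quarter in range(last_quarter - n_quarters + 1, last_quarter + 1):
        for wave in (1, n_waves):
            cohort = quarter - (wave - 1)
            rows.append((quarter, wave, cohort))
    return rows

rows = wave_1_and_5_cohorts(last_quarter=20)
cohorts = [cohort for _, _, cohort in rows]

# No cohort appears twice, so no household is counted twice.
no_repeats = len(set(cohorts)) == len(cohorts)

# Rough size check, assuming equal waves: 60,000 / 5 per wave, 8 wave-quarters.
households_per_wave = 60_000 // 5
upper_bound = households_per_wave * len(rows)
```

Under these assumptions the upper bound is 96,000 households, in line with the figure of about 90,000 once some non-response is allowed for.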

31
Features of unemployment modelling
  • Unclustered LFS design means
  • direct estimates available for each LAD
  • availability of estimated random area terms in
    LAD estimation
  • However
  • low precision of direct survey estimates due to
    small sample sizes
  • need for better precision model-based estimates
  • Availability of a highly correlated covariate:
    number of claimants of unemployment benefit/job
    seekers allowance
  • Eliminates need for model fitting to a range of
    possible covariates on each occasion.

32
The small area estimation model
  • A LOGISTIC multilevel model by local authority
    (d) and six age/sex classes (i). It models the
    probability p_di that an individual in age/sex
    class i of local authority d is unemployed.
  • Response variable: proportion of unemployed
    individuals in LFS in each age/sex class of each
    local authority (logit transformed).
  • Covariate data:
  • Benefit data: the logit of the claimant
    proportion of job seekers allowance in each
    age/sex class within each local authority, and
    also for all age/sex classes combined
  • Age/sex class: male/female for age groups (16
    to 24, 25 to 49, 50 and over)
  • Geographical region: the 12 government office
    regions (GOR)
  • ONS area classification: 7 categories under the
    National Statistics Area Classification for
    Local Authorities
33
  • The model used to link p_di with the auxiliary
    data is a binomial linear mixed model with a
    logistic link function
  • The linear predictor includes an area random
    effect

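
The model equation is not present in this transcript. A generic form of a binomial mixed model with logistic link and an area random effect is sketched below. The notation is assumed: $y_{di}$ is the number of unemployed among the $n_{di}$ sampled individuals in age/sex class $i$ of local authority $d$, $\mathbf{x}_{di}$ holds the covariates listed earlier, and $u_d$ is the area random effect. The normal distribution for $u_d$ is the conventional choice, not something the slide confirms.

```latex
y_{di} \mid p_{di} \sim \mathrm{Binomial}(n_{di},\, p_{di}),
\qquad
\log\!\left(\frac{p_{di}}{1 - p_{di}}\right)
  = \mathbf{x}_{di}^{\top}\boldsymbol{\beta} + u_d,
\qquad u_d \sim N(0, \sigma_u^2)
```
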
34
Estimator from model
  • The model-based estimator of proportion
    unemployed in each age/sex group of each LAD is
    then given after fitting model by
  • Note the use of the estimated random area term
    in the estimator, as it is now available for
    each LAD.

35
Model-based estimate for Unemployment
  • Model has estimated a proportion at each age/sex
    group
  • This is converted into an estimate of
    unemployment level at each LAD by
  • multiplying each proportion estimate by the LFS
    estimate of population unsampled
  • adding those sampled and found unemployed
  • summing the age/sex group estimates
  • Final estimator for unemployment level for area d
    is the sum over the 6 age-sex groups

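
The three steps above can be written as a short Python sketch. The dictionary keys and the figures are hypothetical; `p_hat` stands for the model-based proportion for an age/sex group, and the unsampled population is taken as the population estimate minus the sample size.

```python
def unemployment_level(groups):
    """Sum, over age/sex groups, of sampled unemployed plus the model
    proportion applied to the unsampled population."""
    total = 0.0
    for g in groups:
        unsampled = g["population"] - g["sample_size"]
        total += g["sampled_unemployed"] + g["p_hat"] * unsampled
    return total

# Two illustrative groups only; the real estimator sums over six.
example_groups = [
    {"population": 1000, "sample_size": 100, "sampled_unemployed": 5, "p_hat": 0.10},
    {"population": 500, "sample_size": 50, "sampled_unemployed": 2, "p_hat": 0.20},
]
level = unemployment_level(example_groups)
```

For these toy figures the first group contributes 5 + 0.10 × 900 and the second 2 + 0.20 × 450.
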
36
LAD Estimation of unemployment rate
  • The estimate of unemployment rate is obtained
    using the model-based estimate of unemployment
    level and the direct survey estimate of
    employment

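
A minimal Python sketch of the rate calculation follows. The formula on the slide is not present in this transcript, so the sketch assumes the usual definition of the unemployment rate, unemployed divided by the economically active (unemployed plus employed). The function name and figures are hypothetical.

```python
def unemployment_rate(unemployed_model_based, employed_direct):
    """Unemployed as a share of the economically active population."""
    active = unemployed_model_based + employed_direct
    if active <= 0:
        raise ValueError("economically active population must be positive")
    return unemployed_model_based / active

# Hypothetical LAD: 50 unemployed (model-based), 950 employed (direct).
rate = unemployment_rate(50.0, 950.0)
```
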
37
Precision of Estimates
  • The mean squared error (MSE) for the unemployment
    level estimates in LAD d is given by several
    components
  • G1 and G2 come from the uncertainty in estimating
    the coefficients and u in the model
  • G3 arises because we have estimated the
    variance of u
  • G4 is necessary because the model estimates
    actual values rather than means
  • G5 is the additional variance component due to
    the estimation of population size in each LAD

38
Unemployment estimates publication
  • The standard errors of the model-based estimates
    were found to be smaller than the corresponding
    direct standard errors in each LAD.
  • Model-based estimates have been accredited as
    National Statistics and are now published
    quarterly in Labour Market statistics releases.
  • (http://www.statistics.gov.uk/StatBase/Product.asp?vlnk=14160)

39
3. Developments in progress
  • Labour Market area
  • Consistent estimation of all three labour market
    states - employed, not economically active,
    unemployed
  • Currently Local Authority labour market
    estimates are
  • Model-based estimates for unemployment
  • Direct survey estimates for economic
    inactivity and employment figures
  • Now developing a multivariate model to estimate
    concurrently the number of unemployed, employed
    and economically inactive people by local
    authority

40
Compositional data
  • The proportions of individuals classified in each
    category are bounded between 0 and 1 and subject
    to a unity-sum constraint.
  • Multinomial Logistic model to relate labour
    market probabilities with auxiliary data for all
    categories is therefore defined with only 2
    equations.
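
Why two equations are enough for three states can be shown with a small Python sketch of a multinomial logistic transform. One state acts as the reference with its linear predictor fixed at zero; choosing "employed" as the reference is an assumption made for this example, as are the function and argument names.

```python
import math

def labour_market_probabilities(eta_unemployed, eta_inactive):
    """Turn two linear predictors into three probabilities that sum to one.

    The reference state (employed here, by assumption) has predictor 0.
    """
    denom = 1.0 + math.exp(eta_unemployed) + math.exp(eta_inactive)
    return {
        "employed": 1.0 / denom,
        "unemployed": math.exp(eta_unemployed) / denom,
        "inactive": math.exp(eta_inactive) / denom,
    }

probs = labour_market_probabilities(eta_unemployed=-2.5, eta_inactive=-1.0)
total = sum(probs.values())
```

The unity-sum constraint is satisfied by construction, so the third probability never needs its own equation.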

41
Multinomial Logistic Model

42
Multinomial Logistic Model
Then
43
The Model
  • Relates the probabilities of labour market states
    to the following predictors
  • Age/sex group, geographical region and ONS area
    classification
  • Benefit data: claimant proportions (JSA) and
    incapacity benefit
  • Other variables will be tested (e.g. income
    support)

44
Model-based estimate for all Labour Market States
  • Model estimates a proportion for each labour
    market state at each age/sex group
  • Final estimator for a labour market state j for
    area d sums over the 6 age-sex groups, and is
    calculated for all labour market states

45
Development stage of multinomial model
  • Current stage
  • Development of SAS programs to calculate
    precision of the multinomial estimates based on
    methodology proposed by Saei (2006)
  • Model selection and test of other covariates
  • Model cross validation including several time
    periods
  • Up to now
  • Implementation of the multinomial model indicates
    that plausible estimates can be obtained for all
    labour market states when simultaneously modelled

46
Developments in progress (cont.)
  • Labour Market area
  • Unemployment estimation at Parliamentary
    constituency level
  • Non-nested geography but with certain matching
    areas
  • Issue here is to ensure consistency with local
    authority estimates at comparable areas
  • Model developed and estimates likely to become
    available in the coming year

47
Developments in progress (cont.)
  • Income estimation
  • Estimation at local authority level
  • Clustered survey design requires a modification
    of the SAEP framework
  • Currently in development
  • Estimation of poverty: proportion of households
    below a threshold
  • Currently being developed for MSOA/local
    authority level

48
4. Wider research activities
  • In conjunction with academic partners
  • Estimation of change over time
  • Current work is confined to single point-in-time
    estimation, but users would like an indication of
    progress over time, particularly in relation to
    funding
  • Estimation of poverty using M-quantile modelling
  • Research using FRS data by Nikos Tzavidis
  • Models incorporating spatial relationships
  • Preliminary investigation of spatial relationship
    in unemployment model in conjunction with Ayoub
    Saei at Southampton University
  • Link with work at Imperial College by Nicky Best
    and Virgilio Gomez-Rubio

49
5. Methodology Consultancy Service
  • ONS is currently establishing a methodology
    consultancy service
  • To undertake and support statistical work by
    other government departments and public sector
    organisations.
  • Resource for assessment/quality improvement
  • Currently working with the Health and Safety
    Executive on small area estimation of incidence
    of work-related illness at local authority level.

50
References
  • Small Area Estimation Project Report. Model-Based
    Small Area Estimation Series No. 2, ONS, January
    2003.
  • Developments in small area estimation in UK with
    focus in current research. Clarke, P., McGrath,
    K., Chandra, H., Tzavidis, N. (2007). IASS
    Satellite Meeting on Small Area Estimation, Pisa.
  • Model Based Estimates of Income for Middle Layer
    Super Output Areas 2004/05 Technical Report, ONS,
    September 2007.
  • http://neighbourhood.statistics.gov.uk/HTMLDocs/images/Technical Report 2004_05 v2 - Final_tcm97-53513.pdf
  • http://neighbourhood.statistics.gov.uk/dissemination/MetadataDownloadPDF.do?downloadId=21704
  • Development of improved estimation methods for
    local area unemployment levels and rates. Labour
    Market Trends, vol. 111, no. 1.
  • www.statistics.gov.uk/cci/article.asp?id=372
  • Summary publication accompanying the publication
    of the 2003 unemployment estimates, November 2004.
  • http://www.statistics.gov.uk/downloads/theme_labour/ALALFS/AnnexA.pdf