Methodological challenges in integrating data collections in business statistics - PowerPoint PPT Presentation

1
Methodological challenges in integrating data collections in business statistics
  • Paul Smith
  • Office for National Statistics

2
Outline
  • Data quality for different sources
    • quality measures for survey and administrative inputs
    • quality measures for outputs
  • Combinations of sources
    • familiar and more advanced situations
  • Mode effects
  • Models
  • Discussion

3
Statistical data collections - quality
  • Relevance
    • generally, questions conform to the desired concepts
    • may be tailored for
      • practicality
      • consistency across collections, even if concepts differ
  • Accuracy
    • affected by sampling
    • impacts from non-response and measurement error
  • Timeliness
    • generally relatively timely

4
Administrative data - quality
  • Relevance
    • questions conform to administrative (not statistical) concepts
    • few concessions to statistical needs
  • Accuracy
    • unaffected by sampling
    • processes to discourage non-response
    • treatment of measurement error differs by variable
  • Timeliness
    • generally slow

5
Differences between types of source
  • Sampling accuracy is measurable for surveys, but not relevant for administrative data sources
    • confidence in quality reduced for admin data
    • balance of accuracy measures different
  • Building statistical requirements into administrative series
    • requires negotiation and agreement
    • VAT classification information in the UK
    • INSEE has statistical and accounting information well integrated

6
Questionnaire design
  • Questionnaire design principles mostly used in designing statistical collections
  • Administrative data seen as forms, not questionnaires
    • less attention to question phrasing to obtain the required answer
    • more on statutory requirements

7
Output data quality
  • Data quality of combined outputs can be challenging to measure
    • a function of the qualities of the input sources and of the methods used to combine them
    • some well-known general approaches
    • development of measures needed for particular cases (e.g. from models)

8
Combinations of sources - 1
  • Frame and sample information
    • sampling frames typically derived from administrative sources
    • multiple uses of frame information
      • sample design
      • sample selection
      • validation and editing
      • estimation and variance estimation
    • quality easily derived: the standard situation
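The "standard situation" can be made concrete with the textbook stratified expansion estimator: frame counts N_h (assumed here to come from a business register) fix the design weights, and the variance estimate follows directly from the design. This is a sketch assuming stratified simple random sampling without replacement and full response; the Stratum class and the example values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, variance


@dataclass
class Stratum:
    population_size: int        # N_h: frame count for the stratum (hypothetical register data)
    sample_values: list[float]  # y values for the n_h sampled units (n_h >= 2, or n_h == N_h)


def stratified_total(strata: list[Stratum]) -> tuple[float, float]:
    """Expansion estimate of a total and its estimated variance under stratified SRSWOR."""
    total = 0.0
    var = 0.0
    for s in strata:
        n = len(s.sample_values)
        N = s.population_size
        # Each sampled unit carries the design weight N_h / n_h.
        total += N * mean(s.sample_values)
        if n < N:
            # N_h^2 * (1 - n_h / N_h) * s_h^2 / n_h, including the finite population correction
            var += N**2 * (1 - n / N) * variance(s.sample_values) / n
        # A completely enumerated stratum (n_h == N_h) adds no sampling variance.
    return total, var


strata = [
    Stratum(population_size=100, sample_values=[10.0, 12.0, 14.0]),  # sampled stratum
    Stratum(population_size=2, sample_values=[50.0, 70.0]),          # fully enumerated stratum
]
total, var = stratified_total(strata)
```

Here the estimated total is 1320, and all of the estimated variance (about 12,933) comes from the sampled stratum.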

9
Combinations of sources - 2
  • Dual-frame surveys
    • more than one administrative source
    • Pension funds survey in the UK
      • Units
      • Business register
  • Challenges of population inflation if matching is not perfect
  • Estimate the probability that a unit appears in the sample from either source
    • use in an appropriate weighting procedure
    • adjustment for P(in both surveys) depends on survey type
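The weighting step can be sketched for the simplest case: two samples selected independently from two frames, with duplicates across frames already identified. A unit's probability of being in the combined sample is then 1 - (1 - pi_A)(1 - pi_B) = pi_A + pi_B - pi_A pi_B. That formula holds only under independent selection, which is one reason the adjustment depends on the survey type. Function names, field names and numbers are hypothetical, and other dual-frame estimators (such as Hartley's) are not shown.

```python
def inclusion_probability(pi_a: float, pi_b: float) -> float:
    """Probability of selection in at least one of two independently drawn samples.

    Use 0.0 for a frame that does not contain the unit.
    """
    return 1.0 - (1.0 - pi_a) * (1.0 - pi_b)


def dual_frame_total(units: list[dict]) -> float:
    """Horvitz-Thompson estimate of a total over the distinct selected units.

    Each unit is a dict with keys "y", "pi_a" and "pi_b". Every real unit must
    appear exactly once, i.e. records selected from both frames have been matched.
    """
    return sum(u["y"] / inclusion_probability(u["pi_a"], u["pi_b"]) for u in units)


matched = [
    {"y": 5.0, "pi_a": 0.1, "pi_b": 0.0},   # on frame A only: weight 10
    {"y": 14.0, "pi_a": 0.1, "pi_b": 0.2},  # on both frames: weight 1 / 0.28
    {"y": 3.0, "pi_a": 0.0, "pi_b": 0.5},   # on frame B only: weight 2
]

# The same data if the overlap unit was selected from both frames but its two
# records were not matched: it is treated as two separate single-frame units.
unmatched = [
    matched[0],
    {"y": 14.0, "pi_a": 0.1, "pi_b": 0.0},
    {"y": 14.0, "pi_a": 0.0, "pi_b": 0.2},
    matched[2],
]

print(dual_frame_total(matched), dual_frame_total(unmatched))
```

With matching the estimate is about 106; without it the overlap unit's contribution rises from 50 to 140 + 70, giving about 266, which is the population inflation the slide warns about.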

10
Combinations of sources - 3
  • Multiple surveys
    • different periodicity
      • summary information monthly, detail annually
      • for example, capital expenditure: quarterly breakdown, annual summary
  • Benchmarking
    • where short-period surveys are small (and variable) and annual surveys larger (and less variable)
  • Quality measures
    • account for sampling error in both sources
    • account for non-response and measurement errors in the larger survey
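A minimal sketch of benchmarking, assuming the simplest pro-rata method: each year's quarterly estimates are scaled by a common factor so that they add to the annual survey's total. The function name and data are hypothetical. Pro-rata adjustment can create a step between the last quarter of one year and the first quarter of the next, which is why smoother methods (such as Denton's) are often preferred; the sketch also does not propagate the sampling errors of either input, which the quality measures above would need.

```python
def pro_rata_benchmark(quarterly: list[float], annual: list[float]) -> list[float]:
    """Scale quarterly estimates so that each calendar year sums to its annual benchmark."""
    if len(quarterly) != 4 * len(annual):
        raise ValueError("expected exactly four quarters per annual benchmark")
    result = []
    for year, benchmark in enumerate(annual):
        quarters = quarterly[4 * year : 4 * year + 4]
        factor = benchmark / sum(quarters)  # annual-to-quarterly ratio for this year
        result.extend(q * factor for q in quarters)
    return result


# Small quarterly survey (years sum to 100 and 120) benchmarked to a larger annual survey (110 and 126)
benchmarked = pro_rata_benchmark([20.0, 25.0, 25.0, 30.0, 28.0, 30.0, 30.0, 32.0], [110.0, 126.0])
```

Each year now sums to its benchmark (110 and 126), and the within-year quarterly pattern of the small survey is preserved.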

11
Combinations of sources - 4
  • Auxiliary information
    • if the administrative concept is not close to the statistical concept, the data may still be useful
  • Auxiliary information in estimation
    • not required to be correct, only correlated with the outcome
    • the better the correlation, the better the accuracy
  • Auxiliary information in validation
    • use tax data to improve validation follow-up activity
  • Data confrontation
    • use multiple sources to identify discrepancies
  • Balancing
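Two of these uses of auxiliary information can be sketched briefly. The ratio estimator uses an administrative variable x known for the whole population (turnover from tax data is assumed here) without requiring it to measure the survey concept y; it gains precision to the extent that y is roughly proportional to x. The confrontation check flags returns whose ratio to the administrative value falls outside tolerance bounds, giving a candidate list for validation follow-up. Function names, the bounds and the data are hypothetical.

```python
def ratio_estimate(sample_y: list[float], sample_x: list[float], population_x_total: float) -> float:
    """Ratio estimator of the population total of y: X * (sum of y / sum of x) over the sample.

    x is an auxiliary variable known for every unit; it need only be correlated with y.
    """
    return population_x_total * sum(sample_y) / sum(sample_x)


def flag_discrepancies(returns: dict[str, tuple[float, float]],
                       lower: float = 0.5, upper: float = 2.0) -> list[str]:
    """Flag units whose survey value is implausible relative to an administrative value.

    returns maps a unit id to (survey_value, admin_value); lower and upper are
    hypothetical tolerance bounds on the survey / admin ratio.
    """
    flagged = []
    for unit, (survey, admin) in returns.items():
        # A missing or non-positive admin value cannot confirm the return, so flag it too.
        if admin <= 0 or not lower <= survey / admin <= upper:
            flagged.append(unit)
    return flagged


total_y = ratio_estimate([12.0, 30.0, 55.0], [10.0, 25.0, 50.0], population_x_total=1000.0)
to_check = flag_discrepancies({"A": (100.0, 95.0), "B": (400.0, 100.0), "C": (10.0, 0.0)})
```

Here unit B (survey value four times its tax value) and unit C (no positive tax value) would be queued for follow-up, while A passes.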

12
Mode effects
  • Mode effects manifest in several ways
    • differences in contact rate
    • differences in response rate given contact
    • differences in question replies given response
  • Test differences through a designed experiment (van den Brakel & Renssen 1998, 2005)
    • evaluates whole-process differences (not individual steps)
  • Non-response adjustment if there are good predictors of response among the auxiliary data (variance increases)
  • Model-based adjustments for other changes
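The core comparison in a mode experiment can be sketched as a difference in mean responses between two randomly assigned arms, with a standard error that allows unequal variances (Welch). This is a simplification under stated assumptions: it treats each arm as a simple random sample and ignores the survey's sampling design, which the design-based analyses of van den Brakel & Renssen take into account. Names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def mode_effect(mode_a: list[float], mode_b: list[float]) -> tuple[float, float, float]:
    """Difference in means (A minus B), its Welch standard error, and their ratio.

    Values are the reported responses in each randomly assigned arm, so the
    difference reflects the whole process (who responds and what they report).
    """
    diff = mean(mode_a) - mean(mode_b)
    se = sqrt(variance(mode_a) / len(mode_a) + variance(mode_b) / len(mode_b))
    return diff, se, diff / se


diff, se, z = mode_effect([10.0, 12.0, 14.0, 16.0], [9.0, 10.0, 11.0, 12.0])
```

Here the estimated effect is 2.5 with a standard error of about 1.44, so the ratio is about 1.73; with arms this small, a t reference distribution would be more appropriate than a normal one.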

13
Temporal differences
  • Administrative data often have a longer reference period than the statistical requirement
  • Implies temporal disaggregation (model-based; Dagum & Cholette 2006)
  • Quality implications
    • estimated data as inputs
    • sensitivity of the model to interesting changes
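Temporal disaggregation can be sketched with the additive first-difference Denton method, one standard technique in this area (chosen here for illustration; it is not necessarily the approach used in the paper). An indicator series at the required frequency, assumed here to be quarterly survey estimates, is adjusted so that each year's quarters add to the annual administrative total, while the adjustments change as smoothly as possible from quarter to quarter. The sketch solves the constrained least-squares problem directly through its Lagrangian (KKT) system with NumPy; the function name and data are hypothetical.

```python
import numpy as np


def denton_additive(indicator, annual_totals, periods_per_year=4):
    """Additive first-difference Denton: minimise sum of squared changes in (x - indicator)
    subject to each year's periods summing to its annual total."""
    i = np.asarray(indicator, dtype=float)
    a = np.asarray(annual_totals, dtype=float)
    n, m = i.size, a.size
    if n != periods_per_year * m:
        raise ValueError("indicator length must equal periods_per_year * number of years")
    D = np.diff(np.eye(n), axis=0)                         # (n-1, n) first-difference matrix
    C = np.kron(np.eye(m), np.ones(periods_per_year))      # (m, n) annual aggregation matrix
    DtD = D.T @ D
    # Stationarity: 2 D'D (x - i) + C' lambda = 0; constraints: C x = a
    kkt = np.block([[2 * DtD, C.T], [C, np.zeros((m, m))]])
    rhs = np.concatenate([2 * DtD @ i, a])
    return np.linalg.solve(kkt, rhs)[:n]


quarterly_indicator = [10.0, 10.0, 10.0, 10.0, 12.0, 12.0, 12.0, 12.0]
estimates = denton_additive(quarterly_indicator, [44.0, 52.0])
```

When the annual totals exceed the indicator's annual sums by the same amount each year, the result is the indicator shifted by a constant (here 11 and 13 per quarter), since that adjustment has zero first differences. Because the method favours smooth adjustments, a genuine change that appears only in the annual data is spread across quarters, which is one way the sensitivity noted above can arise.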

14
Models for combining data
  • Full flexibility in combining data is available through a modelling approach
  • Models sit at the boundary between statistical producer and user
    • ideally, statistical results should be insensitive to model assumptions
  • Small area estimates
    • useful for social surveys
    • challenges for business surveys not yet resolved
  • Modelling for unit structures: BRES (Business Register and Employment Survey)
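Small area estimates of this kind are often based on area-level models in the style of Fay and Herriot, which shrink a noisy direct estimate towards a model-based (synthetic) prediction. The sketch shows only that composite step, assuming the sampling variance of the direct estimate and the model variance are known; in practice both the regression prediction and the model variance are estimated, which adds to the mean squared error. Names and numbers are hypothetical. Re-running it over a range of assumed model variances is a simple form of the sensitivity check needed when results depend on model assumptions.

```python
def composite_estimate(direct: float, direct_var: float, synthetic: float,
                       model_var: float) -> tuple[float, float]:
    """Fay-Herriot-style composite for one area with known variances.

    gamma = model_var / (model_var + direct_var) is the weight on the direct estimate.
    Returns the composite estimate and the leading term of its MSE (gamma * direct_var),
    which ignores the extra uncertainty from estimating the model.
    """
    gamma = model_var / (model_var + direct_var)
    estimate = gamma * direct + (1 - gamma) * synthetic
    return estimate, gamma * direct_var


# A poorly sampled area (large direct variance) borrows heavily from the model...
small_area, small_mse = composite_estimate(120.0, 400.0, 100.0, model_var=100.0)
# ...while a well-sampled area stays close to its direct estimate.
large_area, large_mse = composite_estimate(120.0, 25.0, 100.0, model_var=100.0)

# Sensitivity of the first area's estimate to the assumed model variance
for sigma2 in (50.0, 100.0, 200.0):
    print(sigma2, composite_estimate(120.0, 400.0, 100.0, sigma2)[0])
```

With these values the poorly sampled area's estimate is 104 (leading MSE term 80, against a direct variance of 400) and the well-sampled area's is 116; across the assumed model variances the first estimate moves from about 102.2 to about 106.7.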

15
Discussion
  • Aim to get more from existing sources
    • often imperfect matches
    • modelling the only appropriate approach
      • subjective
      • should be robust to assumptions
      • sensitivity analysis
  • Mixed-mode collections
    • usability and low cost
    • data combination
    • quality components harder to measure

16
  • For more details, see the paper or contact paul.smith@ons.gov.uk