Methods of Geographical Perturbation for Disclosure Control

1
Methods of Geographical Perturbation for
Disclosure Control
POPFEST June 2006, Liverpool
  • Division of Social Statistics
  • And Department of Geography

Caroline Young
Supervised jointly by Prof. Chris Skinner
(Statistics) and Prof. David Martin (Geography)
2
Overview of Presentation
  • Part I - Description of Disclosure Control
  • Introduction to PhD topic - disclosure by
    differencing
  • Part II - Methodology to protect against
    differencing
  • Conclusions and Future Work

3
What is Disclosure Control?
  • Protecting confidentiality of statistical data,
    particularly the Census
  • UK Census: a promise is given to respondents to
    protect confidentiality (there are also legal obligations)
  • Disclosure control procedures are necessary to
    ensure confidentiality

4
How can Disclosure Occur?
5
What is Statistical Disclosure Control?
  • Statistical Disclosure Control refers to
    statistical methods which modify the data to
    control the disclosure risk

6
Disclosure by Differencing
Disclosure by geographical differencing occurs
when multiple geographies can be linked to reveal
new information
7
Differencing from two geographies
Census User A wants Geography A.
8
Differencing from two geographies
Census User B wants Geography B.
9
Differencing from two geographies
Differenced area
Nested geography
Ref: Duke-Williams and Rees (1998)
10
Disclosure by Geographical Differencing
Fictitious Table 1: Claimants in Small Area (to
larger boundary)
Fictitious Table 2: Claimants in Small Area A (to
smaller boundary)
11
Disclosure by Geographical Differencing
Calculated Table 2 Claimants in Differenced Area
Differenced area in yellow
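The subtraction behind differencing can be sketched in a few lines. The counts below are invented for illustration only (the original fictitious tables did not survive in this transcript), and the category names and the `difference_tables` helper are assumptions.

```python
# Hypothetical counts, invented purely for illustration.
larger_area = {"Claimant": 12, "Non-claimant": 88}   # table for the larger boundary
smaller_area = {"Claimant": 11, "Non-claimant": 80}  # table for the nested, smaller boundary


def difference_tables(outer, inner):
    """Subtract the nested area's counts from the enclosing area's counts."""
    return {category: outer[category] - inner[category] for category in outer}


differenced = difference_tables(larger_area, smaller_area)

# Cells equal to one in the differenced area are potential disclosures.
uniques = [category for category, count in differenced.items() if count == 1]

print(differenced)
print(uniques)
```

Neither published table contains a cell of one, yet the differenced area does, which is the risk the slides describe.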
12
Demand for Multiple Geographies
  • Increased user demand for flexible or
    non-standard geographies

13
  • Part II
  • Methodology to protect against Differencing

14
Random Record Swapping (UK Census 2001)
  • Introduce uncertainty into the true geographical
    location of a subset of households
  • Basic idea: swap the location of household A
    with the location of a similar household B
  • A unique household in an area (cell value of one)
    may not be the true household: it may have been
    swapped. Information cannot be disclosed with any
    certainty.
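A minimal sketch of the basic swap, assuming households are records with hypothetical `id`, `area` and `size` fields, and assuming that "similar" means "same household size". The real matching rules are not given in the slides.

```python
import random
from collections import Counter


def swap_locations(households, rng):
    """Swap the area codes of one randomly chosen pair of similar households.

    Assumption: two households are similar if they have the same size
    and currently live in different areas.
    Returns the ids of the swapped pair, or None if no pair exists.
    """
    order = list(range(len(households)))
    rng.shuffle(order)
    for i in order:
        a = households[i]
        candidates = [h for h in households
                      if h["size"] == a["size"] and h["area"] != a["area"]]
        if candidates:
            b = rng.choice(candidates)
            a["area"], b["area"] = b["area"], a["area"]
            return a["id"], b["id"]
    return None


# Hypothetical records for illustration.
households = [
    {"id": 1, "area": "A", "size": 2},
    {"id": 2, "area": "A", "size": 4},
    {"id": 3, "area": "B", "size": 2},
    {"id": 4, "area": "B", "size": 1},
]

before = Counter((h["area"], h["size"]) for h in households)
pair = swap_locations(households, random.Random(0))
after = Counter((h["area"], h["size"]) for h in households)

print(pair, before == after)
```

Because the pair is matched on size, the area-by-size counts are unchanged by the swap. Tables involving other variables can change, which is where the uncertainty comes from.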

15
Assessing Performance of a Swapping Method
  • Risk-Utility concept - finding a balance
  • MAXIMISE UTILITY
  • Measure of damage/utility: Average Absolute
    Deviation (AAD) per cell (averaged over all
    tables)
  • MINIMISE RISK
  • Measure of risk: % of true uniques in table
    (averaged over all tables)
  • Identification Rate: % of cell counts of one
    after swapping which relate to the same
    household as before swapping

Let $c_{ij}$ represent cell $j$ of table $i$,
and $n_i$ the number of cells in table $i$.
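The two table-level measures might be computed as follows. This is a sketch under stated assumptions: each table is a flat list of cell counts, and "true uniques" is read as the share of cells equal to one after swapping that were also one before swapping. The function names and the example counts are hypothetical, and the Identification Rate is not covered because it needs household-level linkage.

```python
def average_absolute_deviation(original, perturbed):
    """Mean absolute difference per cell between two versions of one table."""
    if len(original) != len(perturbed) or not original:
        raise ValueError("tables must be non-empty and have the same number of cells")
    return sum(abs(p - o) for o, p in zip(original, perturbed)) / len(original)


def percent_true_uniques(original, perturbed):
    """Of the cells equal to one after swapping, the % that were one before.

    Assumed reading of the risk measure; returns 0.0 if there are no uniques.
    """
    unique_cells = [o for o, p in zip(original, perturbed) if p == 1]
    if not unique_cells:
        return 0.0
    return 100.0 * sum(1 for o in unique_cells if o == 1) / len(unique_cells)


# Hypothetical (original, perturbed) cell counts for two small tables.
tables = [
    ([1, 0, 3, 1, 2], [1, 1, 2, 0, 2]),
    ([2, 2], [2, 2]),
]

aad_per_table = [average_absolute_deviation(o, p) for o, p in tables]
mean_aad = sum(aad_per_table) / len(aad_per_table)
risk_first_table = percent_true_uniques(*tables[0])

print(aad_per_table, mean_aad, risk_first_table)
```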
16
Experiments
  • Performed simulations on a synthetic census
    dataset
  • Random record swapping method (UK Census 2001)
    used as benchmark to assess new approaches
  • Examine disclosure risk at small area level
    (postcodes) since the aim is to protect slivers
    produced by differencing
  • Some simplified results here

17
Simulating Census Swaps
  • Full details of methods are unknown as they are
    confidential
  • MAKE A GUESS...
  • (1) UK Random Record Swap
  • Swap a random sample (10%) of households between
    Enumeration Districts (EDs) but not out of the Local
    Authority district. Pair similar households (plus
    other constraints)
  • (2) US Targeted Record Swap
  • Swap 10% of risky households only (households
    that are unique)
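The two selection rules can be contrasted in a short sketch. Everything here is an assumption made for illustration: the record fields (`la`, `ed`, `size`, `unique`), the use of household size as the similarity criterion, and the reading of the targeted rate as 10% of the risky households. The actual census procedures are confidential, as the slide notes.

```python
import random


def choose_households(households, rate, rng, targeted=False):
    """Pick the households to swap: a random sample, or a sample of uniques only."""
    pool = [h for h in households if h["unique"]] if targeted else list(households)
    return rng.sample(pool, round(rate * len(pool)))


def find_partner(household, households, rng):
    """Pick a similar household in a different ED but the same Local Authority."""
    candidates = [h for h in households
                  if h["la"] == household["la"]
                  and h["ed"] != household["ed"]
                  and h["size"] == household["size"]]
    return rng.choice(candidates) if candidates else None


# Hypothetical synthetic records: 2 Local Authorities, 4 EDs, 20 households.
households = [
    {"id": i, "la": "LA1" if i < 10 else "LA2", "ed": f"ED{i // 5}",
     "size": 1 + i % 3, "unique": i % 2 == 0}
    for i in range(20)
]

rng = random.Random(1)
random_sample = choose_households(households, 0.10, rng)
targeted_sample = choose_households(households, 0.10, rng, targeted=True)
partners = [find_partner(h, households, rng) for h in random_sample]

print(len(random_sample), len(targeted_sample))
```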

18
Disclosure Risk
In practice, other post-tabulation methods were
also used (small cell adjustment) to offer more
protection at small area level. But we need a
pre-tabulation method: one method that protects
data before aggregation.
19
100% swapping
  • Reduce disclosure risk: swap ALL households
  • Maximise utility: swap shorter distances
    (between adjacent postcodes instead of EDs)
  • Disclosure risk is much reduced at small area
    level
  • Too much damage at higher levels of aggregation

20
Distance Swap
  • Current swapping distances are dependent on
    pre-set geographies which have different shapes
    and population distributions. Boundaries also
    often change
  • New: Distance swap. Sample swapping distances
    from a distribution equivalent to the 100% random
    swap (truncated normal with the same mean and
    standard deviation)
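Sampling swap distances from a truncated normal could look like the sketch below. It uses simple rejection sampling from the standard library. The observed distances, the truncation bounds and the function name are hypothetical, and the mean and standard deviation are taken from those invented distances rather than from any real swap.

```python
import random
import statistics


def sample_truncated_normal(mean, std, lower, upper, rng):
    """Draw one value from a normal(mean, std) restricted to [lower, upper]."""
    if std <= 0 or lower >= upper:
        raise ValueError("need std > 0 and lower < upper")
    while True:  # rejection sampling: redraw until the value is in range
        value = rng.gauss(mean, std)
        if lower <= value <= upper:
            return value


# Hypothetical distances (km) observed in a benchmark random swap.
observed = [0.4, 1.1, 0.9, 2.5, 1.7, 0.6, 3.2, 1.3]
mean = statistics.mean(observed)
std = statistics.stdev(observed)
upper = max(observed)

rng = random.Random(42)
distances = [sample_truncated_normal(mean, std, 0.0, upper, rng) for _ in range(1000)]

print(min(distances), max(distances))
```

Truncation shifts the mean and spread of the sampled values away from the parameters passed in, so matching the benchmark exactly would need the parameters to be adjusted.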

21
Density Swap
  • How to improve distance swap?
  • Want more control over damage and risk.
  • Solution
  • Low density areas are more vulnerable to
    disclosure attacks - fewer people living there.
    These households require greater perturbation.
  • Households in high density areas are less risky
    and need to be perturbed over smaller distances
    (this also reduces damage).

22
Density Swap
  • Change sampling distribution: sample a number of
    households
  • Takes into account local population density
  • Distance is not Euclidean but in terms of number
    of households

Figure panels: Urban area, Rural area
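Measuring distance as a number of households can be illustrated with a simplified one-dimensional sketch. Households are assumed to be sorted along a line, with positions in metres. The positions, the value of `k` and the helper name are all hypothetical, and real data would need a two-dimensional ordering.

```python
def swap_partner_index(n_households, index, k):
    """Index of the household k places away in the sorted order, clamped to range."""
    return min(max(index + k, 0), n_households - 1)


# Hypothetical positions (metres) of households sorted along a line.
urban_positions = [0, 50, 100, 150, 200, 250]          # densely spaced
rural_positions = [0, 1000, 2000, 3000, 4000, 5000]    # sparsely spaced

k = 3  # swap distance expressed as a number of households

urban_partner = swap_partner_index(len(urban_positions), 0, k)
rural_partner = swap_partner_index(len(rural_positions), 0, k)

urban_distance = urban_positions[urban_partner] - urban_positions[0]
rural_distance = rural_positions[rural_partner] - rural_positions[0]

print(urban_distance, rural_distance)
```

The same household count moves a rural household much further than an urban one, which is the extra perturbation that low density areas are said to need.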
23
Effectiveness of Density Swap
  • Choice of sampling distribution is very important
    (normal, exponential, etc)
  • Sort households appropriately to control pairing
    of households
  • Match households appropriately: definition of
    similar households

24
Results of all 100% swaps
25
Conclusions and Further Work
  • Density Swap appears to be a good solution BUT
    need to examine other measures of damage and
    risk
  • Is the density swap better than the combination
    of methods used on the 2001 Census? (swapping
    plus small cell adjustment)
  • Discriminate between local-area uniques and
    wide-area uniques

26
References
  • Brown, D. (2003) Different Approaches to
    Disclosure Control Problems Associated with
    Geography. Joint ECE/Eurostat work session on
    statistical data confidentiality, Luxembourg.
  • Duke-Williams, O. and Rees, P. (1998) Can Census
    Offices publish statistics for more than one
    small area geography? An analysis of the
    differencing problem in statistical disclosure.
    International Journal of Geographical Information
    Science, 12, 579-605.
  • Elliot, M. J. (2005) An overview of Statistical
    Disclosure Control. Paper presented to RSS Social
    Statistics Committee conference on Linking survey
    and administrative data and statistical
    disclosure control, London, May 2005.
  • Willenborg, L. and de Waal, T. (1996) Statistical
    Disclosure Control in Practice. Springer-Verlag,
    New York.
  • Voas, D. and Williamson, P. (2000) An evaluation
    of the combinatorial optimisation approach to the
    creation of synthetic microdata. International
    Journal of Population Geography, 6, 349-366.