The Identification of Special Uniques - PowerPoint PPT Presentation

About This Presentation
Title:

The Identification of Special Uniques

Description:

The current project focuses on risk assessment but the work has implication for ... The number of unique pairs, triplets etc. contained within a record. Conclusions ... – PowerPoint PPT presentation

Slides: 20
Provided by: statist

Transcript and Presenter's Notes



1
The Identification of Special Uniques
  • Mark Elliot, Centre for Census and Survey Research
  • Anna Manning, Department of Computer Science
  • University of Manchester, UK

2
Elements of Statistical Disclosure Research
  • Disclosure Risk Assessment
  • Disclosure Control techniques
  • Effects on Analytical Power
  • The current project focuses on risk assessment,
    but the work has implications for the other two elements.

3
The non-monotonicity of the impact of
geographical detail on disclosure risk
4
First special uniques proposition
Records that are sample unique at a coarse level
of geographical detail are more likely to be
population unique at a finer level of
geographical detail than sample uniques at the
finer geographical level (as measured by the UUSU
ratio).
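A minimal Python sketch of the first proposition on invented toy data. It assumes each record carries a fine area code nested inside a coarse area code plus a basic key of age, sex and marital status, and it reads the UUSU ratio as the share of sample uniques that are also population unique (our reading of the abbreviation). The helper names and area codes are hypothetical, and the toy numbers only illustrate the calculation; they are not evidence for the proposition.

```python
from collections import Counter

# Hypothetical toy population: coarse area "A" contains fine areas "A1" and "A2".
FIELDS = ("id", "fine", "coarse", "age", "sex", "marcon")
POPULATION = [dict(zip(FIELDS, row)) for row in [
    (1, "A1", "A", 16, "F", "widowed"),
    (2, "A1", "A", 30, "M", "married"),
    (3, "A1", "A", 30, "M", "married"),
    (4, "A2", "A", 30, "M", "married"),
    (5, "A2", "A", 45, "F", "single"),
    (6, "A2", "A", 45, "F", "single"),
    (7, "A1", "A", 45, "F", "single"),
]]
SAMPLE = [r for r in POPULATION if r["id"] in {1, 2, 4, 5, 7}]

BASIC = ("age", "sex", "marcon")
FINE_KEY = ("fine",) + BASIC
COARSE_KEY = ("coarse",) + BASIC


def key(record, fields):
    """Values of `fields` for one record, as a hashable tuple."""
    return tuple(record[f] for f in fields)


def sample_unique_ids(records, fields):
    """Ids of records whose key on `fields` occurs exactly once in `records`."""
    counts = Counter(key(r, fields) for r in records)
    return {r["id"] for r in records if counts[key(r, fields)] == 1}


def share_population_unique(ids, sample, population, fields):
    """Fraction of the sample records in `ids` that are also unique in `population`."""
    pop_counts = Counter(key(r, fields) for r in population)
    chosen = [r for r in sample if r["id"] in ids]
    hits = sum(pop_counts[key(r, fields)] == 1 for r in chosen)
    return hits / len(chosen) if chosen else 0.0


fine_su = sample_unique_ids(SAMPLE, FINE_KEY)
# Special uniques: fine-level sample uniques that are also unique at the coarse level.
special = fine_su & sample_unique_ids(SAMPLE, COARSE_KEY)

print(sorted(fine_su), sorted(special))                                # [1, 2, 4, 5, 7] [1]
print(share_population_unique(fine_su, SAMPLE, POPULATION, FINE_KEY))  # 0.6 (UUSU ratio)
print(share_population_unique(special, SAMPLE, POPULATION, FINE_KEY))  # 1.0
```

With a nested geography a record that is unique in the whole coarse area is necessarily unique within its fine area, so the intersection simply makes the definition explicit.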
5
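Schematic representation of the greater
probability of population uniqueness given
special uniqueness.
[Figure: "Geographical detail" axis running from coarse to fine. Sample uniques at the coarse level form the special uniques among the fine-level sample uniques; their predictive probability of population uniqueness is > the UUSU ratio. The fine level also shows sample uniques and population uniques.]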
6
Two hypothetical keys showing number of bands
7
Second special uniques proposition
Records that are sample unique using broad
variable codings are more likely to be population
unique with more detailed variable codings than
sample uniques with the more detailed codings.
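A minimal Python sketch of the second proposition on invented data, using age as the variable whose coding changes. Single years stand in for the detailed coding and `age // 5` stands in for five-year bands; the real band edges and top-coding are not given in the slides, so the banding is an assumption, as are all names below. Records unique under the broad coding are flagged as special uniques among the records unique under the detailed coding.

```python
from collections import Counter

# Hypothetical toy sample with single-year ages.
SAMPLE = [
    {"id": 1, "age": 16, "sex": "F", "marcon": "widowed"},
    {"id": 2, "age": 31, "sex": "M", "marcon": "married"},
    {"id": 3, "age": 33, "sex": "M", "marcon": "married"},
    {"id": 4, "age": 46, "sex": "F", "marcon": "single"},
    {"id": 5, "age": 48, "sex": "F", "marcon": "single"},
    {"id": 6, "age": 48, "sex": "F", "marcon": "single"},
]


def five_year_band(age):
    """Assumed banding 0-4 -> 0, 5-9 -> 1, ...; the real band edges may differ."""
    return age // 5


def detailed_key(r):
    return (r["age"], r["sex"], r["marcon"])


def broad_key(r):
    return (five_year_band(r["age"]), r["sex"], r["marcon"])


def sample_unique_ids(records, key_fn):
    """Ids of records whose key (as computed by `key_fn`) occurs exactly once."""
    counts = Counter(key_fn(r) for r in records)
    return {r["id"] for r in records if counts[key_fn(r)] == 1}


detailed_su = sample_unique_ids(SAMPLE, detailed_key)
# Banding only merges categories, so broad-coding uniques are a subset of detailed uniques.
special = sample_unique_ids(SAMPLE, broad_key) & detailed_su

print(sorted(detailed_su), sorted(special))  # [1, 2, 3, 4] [1]
```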
8
Geographical coding
  • Our projects use a nested geography with
    fifteen levels.
  • For the purposes of this presentation:
    • level 1: population size 500,000
    • level 7: population size 30,000

9
Standard key variables
  • Standard key variable names consist of a name
    followed by the number of possible values.
    In this paper we use:
  • Basic key
    • Age94 (single years, top-coded)
    • Sex2
    • Marcon5 (marital status)
  • Other variables
    • Primecon11 (primary economic status)
    • Ethnic10 (ethnicity)
    • Age19 (five-year bands)
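Because each name ends in its number of categories, the number of cells a key can take is the product of those numbers; for the basic key that is 94 × 2 × 5 = 940. A small Python sketch of that arithmetic, where `parse_key_variable` and `key_cells` are hypothetical helpers:

```python
import math
import re


def parse_key_variable(name):
    """Split a key-variable name such as 'Marcon5' into ('Marcon', 5)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if match is None:
        raise ValueError(f"not a <name><count> key variable: {name!r}")
    return match.group(1), int(match.group(2))


def key_cells(names):
    """Number of possible value combinations (cells) of a key."""
    return math.prod(parse_key_variable(n)[1] for n in names)


BASIC_KEY = ["Age94", "Sex2", "Marcon5"]
print(key_cells(BASIC_KEY))                 # 940
print(key_cells(BASIC_KEY + ["Ethnic10"]))  # 9400
```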

10
Identifying Special Uniques: an example using
basic + ethnic10
11
Identifying Special Uniques: an example using
basic + ethnic10
12
Identifying Special Uniques: an example using
basic + primecon11
13
Identifying Special Uniques: aggregating on age
(basic + ethnic10)
14
Identifying Special Uniques: aggregating on
age + geography
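The tables behind slides 10 to 14 are not in the transcript. As a rough illustration of aggregating on age and geography together, this Python sketch lists the sample uniques for every combination of age coding and geographical level on invented data. The five-year banding, the area codes and all function names are hypothetical. In this toy sample neither aggregation alone removes any uniques, but together they leave only record 1 unique.

```python
from collections import Counter
from itertools import product

# Hypothetical toy sample: fine areas "A1"/"A2" nest inside coarse area "A".
SAMPLE = [
    {"id": 1, "fine": "A1", "coarse": "A", "age": 16, "sex": "F", "marcon": "widowed"},
    {"id": 2, "fine": "A1", "coarse": "A", "age": 31, "sex": "M", "marcon": "married"},
    {"id": 3, "fine": "A2", "coarse": "A", "age": 33, "sex": "M", "marcon": "married"},
    {"id": 4, "fine": "A2", "coarse": "A", "age": 46, "sex": "F", "marcon": "single"},
    {"id": 5, "fine": "A1", "coarse": "A", "age": 48, "sex": "F", "marcon": "single"},
]

# Age recodes from detailed to aggregated; five-year banding is an assumption.
AGE_CODINGS = {"single years": lambda age: age, "five-year bands": lambda age: age // 5}
GEO_LEVELS = ("fine", "coarse")


def sample_unique_ids(records, key_fn):
    """Ids of records whose key (as computed by `key_fn`) occurs exactly once."""
    counts = Counter(key_fn(r) for r in records)
    return {r["id"] for r in records if counts[key_fn(r)] == 1}


def uniqueness_profile(records):
    """Sample-unique ids for every combination of age coding and geographical level."""
    profile = {}
    for (age_name, recode), geo in product(AGE_CODINGS.items(), GEO_LEVELS):
        # Default arguments bind the current recode/geo into this key function.
        def key_fn(r, recode=recode, geo=geo):
            return (r[geo], recode(r["age"]), r["sex"], r["marcon"])
        profile[(age_name, geo)] = sample_unique_ids(records, key_fn)
    return profile


for combo, ids in uniqueness_profile(SAMPLE).items():
    print(combo, sorted(ids))
```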
15
Probability of a correct match given a unique
match, before and after controlling special uniques
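One standard way to express this probability, assuming an intruder takes population units and matches them against the released sample on a key: every population unit sharing the key of a sample unique gives a unique match, but only the sample unit itself gives a correct one. With f_k and F_k the sample and population counts of key k, Pr(correct match | unique match) = sum_k I(f_k = 1) / sum_k F_k I(f_k = 1). The Python sketch below computes this on invented data before and after a hypothetical control that simply withholds the special uniques; the slide does not say which control was actually applied, and all names here are illustrative.

```python
from collections import Counter

# Hypothetical toy population: coarse area "A" contains fine areas "A1" and "A2".
FIELDS = ("id", "fine", "coarse", "age", "sex", "marcon")
POPULATION = [dict(zip(FIELDS, row)) for row in [
    (1, "A1", "A", 16, "F", "widowed"),
    (2, "A1", "A", 30, "M", "married"),
    (3, "A1", "A", 30, "M", "married"),
    (4, "A2", "A", 30, "M", "married"),
    (5, "A2", "A", 45, "F", "single"),
    (6, "A2", "A", 45, "F", "single"),
    (7, "A1", "A", 45, "F", "single"),
]]
SAMPLE = [r for r in POPULATION if r["id"] in {1, 2, 4, 5, 7}]
BASIC = ("age", "sex", "marcon")


def key(record, fields):
    return tuple(record[f] for f in fields)


def sample_unique_ids(records, fields):
    counts = Counter(key(r, fields) for r in records)
    return {r["id"] for r in records if counts[key(r, fields)] == 1}


def prob_correct_match(sample, population, fields):
    """Number of sample uniques divided by the population units sharing their keys."""
    f = Counter(key(r, fields) for r in sample)
    F = Counter(key(r, fields) for r in population)
    unique_keys = [k for k, count in f.items() if count == 1]
    matches = sum(F[k] for k in unique_keys)
    return len(unique_keys) / matches if matches else 0.0


fine_key = ("fine",) + BASIC
# Coarse-level uniques; with nested areas these are also fine-level uniques.
special = sample_unique_ids(SAMPLE, ("coarse",) + BASIC)
# Hypothetical control: withhold the special uniques from the release.
controlled = [r for r in SAMPLE if r["id"] not in special]

before = prob_correct_match(SAMPLE, POPULATION, fine_key)
after = prob_correct_match(controlled, POPULATION, fine_key)
print(round(before, 3), round(after, 3))  # 0.714 0.667
```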
16
Data Mining and Statistical Disclosure
  • An ESRC-funded project to design and implement
    a new algorithm for discovering all special
    uniques in a set of anonymised microdata.
  • Two processes will be involved:
    • An exhaustive search for all potentially risky
      records
    • The isolation and grading of the special uniques.
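The project's own algorithm is not described in these slides. As a brute-force sketch of what an exhaustive search could look like, this Python function finds, for every record, its minimal sample-unique attribute sets: sets of key variables on which the record is unique in the sample and no proper subset of which is unique. The term, the search order and the toy data are our assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy sample; record 0 is a 16-year-old widow.
ATTRIBUTES = ("age", "sex", "marcon")
RECORDS = [
    {"age": 16, "sex": "F", "marcon": "widowed"},
    {"age": 16, "sex": "F", "marcon": "single"},
    {"age": 40, "sex": "F", "marcon": "widowed"},
    {"age": 40, "sex": "M", "marcon": "married"},
    {"age": 40, "sex": "F", "marcon": "married"},
]


def minimal_sample_uniques(records, attributes):
    """Map record index -> attribute sets on which it is unique, with no unique proper subset."""
    result = {}
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            counts = Counter(tuple(r[a] for a in subset) for r in records)
            for i, r in enumerate(records):
                if counts[tuple(r[a] for a in subset)] != 1:
                    continue
                found = result.setdefault(i, [])
                # Uniqueness is inherited by supersets, so visiting sizes in increasing
                # order means `subset` is minimal iff it contains no set already found.
                if not any(set(m) <= set(subset) for m in found):
                    found.append(subset)
    return result


for i, sets in sorted(minimal_sample_uniques(RECORDS, ATTRIBUTES).items()):
    print(i, sets)
```

Counting each subset once across all records keeps the sketch simple, but the outer loop over subsets is exactly where the cost grows exponentially with the number of key variables.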

17
The identification of all special uniques in a
sample of data
  • This process is combinatorially explosive, as
    illustrated by the following results from the 2%
    Individual SAR (Sample of Anonymised Records).
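The SAR results referred to here are not reproduced in the transcript. As a generic illustration of why the search explodes: with n key variables there are 2^n - 1 non-empty attribute subsets to check for every record, and capping the subset size only partly helps. `subsets_to_check` is a hypothetical helper.

```python
from math import comb


def subsets_to_check(n_variables, max_size=None):
    """Number of non-empty attribute subsets of size <= max_size (all sizes by default)."""
    if max_size is None:
        max_size = n_variables
    return sum(comb(n_variables, size) for size in range(1, max_size + 1))


for n in (5, 10, 20, 30):
    print(n, subsets_to_check(n))        # equals 2**n - 1
print(subsets_to_check(20, max_size=3))  # 1350
```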

18
Grading the riskiness of a record
  • We believe that there are two key issues:
    • The size of the attribute set responsible for
      uniqueness, e.g. size 2 (age and marital status)
      for a 16-year-old widow
    • The number of unique pairs, triplets, etc.
      contained within a record
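A hypothetical Python sketch that computes both quantities for a single record: the size of its smallest unique attribute set, and how many unique singles, pairs, triplets and so on it contains. It counts only minimal unique sets, so a pair that merely extends a unique single is not counted; the slide does not say which counting was intended, and it does not say how the two quantities combine into a grade, so the sketch stops at the raw profile. Data and names are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy sample; record 0 is a 16-year-old widow.
ATTRIBUTES = ("age", "sex", "marcon", "ethnic")
RECORDS = [
    {"age": 16, "sex": "F", "marcon": "widowed", "ethnic": "a"},
    {"age": 16, "sex": "F", "marcon": "single", "ethnic": "a"},
    {"age": 40, "sex": "F", "marcon": "widowed", "ethnic": "b"},
    {"age": 40, "sex": "M", "marcon": "married", "ethnic": "a"},
    {"age": 40, "sex": "F", "marcon": "married", "ethnic": "b"},
]


def risk_profile(records, index, attributes):
    """Return (size of smallest unique attribute set, {size: count of minimal unique sets})."""
    target = records[index]
    minimal = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            values = tuple(target[a] for a in subset)
            matches = sum(1 for r in records if tuple(r[a] for a in subset) == values)
            # Unique on `subset` and not merely a superset of a smaller unique set.
            if matches == 1 and not any(set(m) <= set(subset) for m in minimal):
                minimal.append(subset)
    by_size = Counter(len(m) for m in minimal)
    smallest = min(by_size) if by_size else None
    return smallest, dict(by_size)


print(risk_profile(RECORDS, 0, ATTRIBUTES))  # (2, {2: 2})
print(risk_profile(RECORDS, 3, ATTRIBUTES))  # (1, {1: 1, 2: 2})
```

In this toy sample the widow (record 0) needs a pair of attributes to be identified, while record 3 is already unique on sex alone, which is the kind of contrast the grading is meant to capture.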

19
Conclusions
  • The Special Uniques Identification method
    provides a useful way of identifying risky
    records
  • Further work is to enable more sophisticated
    classification of specials
  • High-performance computing will enable
    exhaustive search and identification of specials.