Title: The Identification of Special Uniques
- Mark Elliot, Centre for Census and Survey Research
- Anna Manning, Department of Computer Science
- University of Manchester, UK
Elements of Statistical Disclosure Research
- Disclosure Risk Assessment
- Disclosure Control techniques
- Effects on Analytical Power
- The current project focuses on risk assessment, but the work has implications for the other two aspects
The non-monotonicity of the impact of geographical detail on disclosure risk
First special uniques proposition
Records that are sample unique at a coarse level of geographical detail are more likely to be population unique at a finer level of geographical detail than sample uniques at the finer geographical level (as measured by the UUSU ratio, the proportion of sample uniques that are also population unique).
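The UUSU ratio that serves as the benchmark here can be computed directly whenever population counts are available. A minimal Python sketch, assuming each record is reduced to a tuple of its key values, that the sample is drawn from the population, and that the UUSU ratio is the share of sample uniques that are also population unique; the function name `uusu_ratio` and the toy records are hypothetical.

```python
from collections import Counter


def uusu_ratio(population, sample):
    """Share of sample uniques that are also population uniques.

    Assumption: `population` and `sample` are lists of key tuples (one per
    record) and every sample record also appears in the population.
    """
    pop_counts = Counter(population)
    samp_counts = Counter(sample)
    # Sample uniques: key combinations that occur exactly once in the sample.
    sample_uniques = [key for key, n in samp_counts.items() if n == 1]
    if not sample_uniques:
        return 0.0
    # Unique-uniques: sample uniques that also occur once in the population.
    unique_unique = sum(1 for key in sample_uniques if pop_counts[key] == 1)
    return unique_unique / len(sample_uniques)


# Hypothetical toy data: (sex, age, marital status) keys.
population = [("F", 16, "widowed"), ("M", 40, "married"), ("M", 40, "married"),
              ("F", 70, "widowed"), ("F", 70, "widowed")]
sample = [("F", 16, "widowed"), ("M", 40, "married"), ("F", 70, "widowed")]
print(uusu_ratio(population, sample))  # only the 16-year-old widow is unique in both
```

Special uniques support the proposition if their own rate of population uniqueness exceeds this ratio for the finer-level sample uniques.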
Schematic representation of the greater probability of population uniqueness given special uniqueness
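[Figure: schematic plotted against geographical detail (coarse to fine), showing coarse-level sample uniques, fine-level sample uniques, population uniques and special uniques. Special uniques have a predictive probability of population uniqueness greater than the UUSU ratio.]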
Two hypothetical keys showing the number of bands
Second special uniques proposition
Records that are sample unique using broad
variable codings are more likely to be population
unique with more detailed variable codings than
sample uniques with the more detailed codings.
Geographical coding
- Our projects use a nested geography with fifteen levels.
- For the purposes of this presentation:
- level 1: population size 500,000
- level 7: population size 30,000
Standard key variables
- Each variable name consists of a name and a number indicating the number of possible values (e.g. Sex2 has two values). In this paper we use:
- Basic key
- Age94 (single years, top-coded)
- Sex2
- Marcon5 (marital status)
- Other variables
- Primecon11 (primary economic status)
- Ethnic10 (ethnicity)
- Age19 (five-year bands)
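Because each name ends in its number of categories, the number of cells a key can take is the product of those numbers. This is an upper bound, since some combinations (such as a married two-year-old) cannot occur. A minimal sketch, assuming every variable name ends in its category count; `n_values` and `key_cells` are hypothetical helpers.

```python
import re
from math import prod


def n_values(var_name):
    """Category count encoded as the trailing digits of a name, e.g. 'Sex2' -> 2."""
    match = re.search(r"(\d+)$", var_name)
    if match is None:
        raise ValueError(f"no category count in {var_name!r}")
    return int(match.group(1))


def key_cells(variables):
    """Upper bound on the number of value combinations (cells) in a key."""
    return prod(n_values(v) for v in variables)


basic = ["Age94", "Sex2", "Marcon5"]
print(key_cells(basic))                         # 940
print(key_cells(basic + ["Ethnic10"]))          # 9400
print(key_cells(basic + ["Primecon11"]))        # 10340
print(key_cells(["Age19", "Sex2", "Marcon5"]))  # 190: banding age shrinks the key
```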
Identifying Special Uniques: an example using Basic key + Ethnic10
Identifying Special Uniques: an example using Basic key + Primecon11
Identifying Special Uniques: aggregating on age (Basic key + Ethnic10)
Identifying Special Uniques: aggregating on age + geography
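Read together, the two propositions suggest a simple test: find the sample uniques on the full key, recode one variable more coarsely (geography from level 7 to level 1, or Age94 to Age19), and keep the records that are still sample unique. The sketch below is one reading of the slides, not the authors' implementation; the record layout, `age_band` and the `parent` area lookup are hypothetical.

```python
from collections import Counter


def sample_uniques(records, key):
    """Indices of records whose combination of values on `key` occurs exactly once."""
    cells = [tuple(r[v] for v in key) for r in records]
    counts = Counter(cells)
    return {i for i, cell in enumerate(cells) if counts[cell] == 1}


def special_uniques(records, key, var, coarsen):
    """Records that remain sample unique after `var` is recoded with `coarsen`.

    A record unique at the coarse coding is also unique at the fine coding,
    so these are the fine-level sample uniques that survive aggregation.
    """
    coarse = [{**r, var: coarsen(r[var])} for r in records]
    return sample_uniques(coarse, key)


def age_band(age):
    """Hypothetical recode of single years of age into five-year bands."""
    return age // 5


records = [
    {"area": "A1", "age": 16, "sex": "F", "marcon": "widowed"},
    {"area": "A1", "age": 40, "sex": "M", "marcon": "married"},
    {"area": "A2", "age": 41, "sex": "M", "marcon": "married"},
    {"area": "A2", "age": 42, "sex": "M", "marcon": "married"},
    {"area": "A2", "age": 16, "sex": "F", "marcon": "widowed"},
]
key = ["area", "age", "sex", "marcon"]
parent = {"A1": "R1", "A2": "R1"}  # hypothetical fine-to-coarse area lookup

print(sorted(sample_uniques(records, key)))                               # [0, 1, 2, 3, 4]
print(sorted(special_uniques(records, key, "age", age_band)))             # [0, 1, 4]
print(sorted(special_uniques(records, key, "area", parent.__getitem__)))  # [1, 2, 3]
```

Every record is unique on the fine key, but only some survive each aggregation; those survivors are the candidates the propositions flag as more likely to be population unique.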
Probability of a correct match given a unique match, before and after controlling special uniques
Data Mining and Statistical Disclosure
- An ESRC-funded project to design and implement a new algorithm for discovering all special uniques in a set of anonymised microdata.
- Two processes will be involved:
- An exhaustive search for all potentially risky records
- The isolation and grading of the special uniques
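The exhaustive search can be pictured as visiting every subset of the key variables and recording, for each record, the smallest attribute sets on which it is sample unique. This brute-force Python sketch is not the algorithm the project is developing; it only makes the search concrete. `minimal_uniques` and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def minimal_uniques(records, variables):
    """Map record index -> minimal attribute sets on which it is sample unique.

    A set is minimal if the record is unique on it but on none of its proper
    subsets. Brute force: every non-empty subset of `variables` is examined,
    smallest first, so any superset of a set already found can be skipped.
    """
    result = {i: [] for i in range(len(records))}
    for size in range(1, len(variables) + 1):
        for subset in combinations(variables, size):
            cells = [tuple(r[v] for v in subset) for r in records]
            counts = Counter(cells)
            for i, cell in enumerate(cells):
                if counts[cell] != 1:
                    continue
                # Not minimal if it contains a unique set found earlier.
                if any(set(m) <= set(subset) for m in result[i]):
                    continue
                result[i].append(subset)
    return result


records = [
    {"age": 16, "sex": "F", "marcon": "widowed"},
    {"age": 16, "sex": "M", "marcon": "single"},
    {"age": 70, "sex": "F", "marcon": "widowed"},
    {"age": 70, "sex": "M", "marcon": "widowed"},
]
msus = minimal_uniques(records, ["age", "sex", "marcon"])
print(msus[0])  # [('age', 'sex'), ('age', 'marcon')]
```

Record 0 is the 16-year-old widow: age and marital status together make her unique, an attribute set of size 2.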
The identification of all special uniques in a sample of data
- This process is combinatorially explosive, as illustrated by the following results from the 2% Individual SAR
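The explosion follows from counting: with k key variables there are 2^k - 1 non-empty attribute sets, and a brute-force search makes a pass over the data for each one. A short check using the hypothetical helper `n_attribute_sets`; the values of k are illustrative and are not the SAR results.

```python
from math import comb


def n_attribute_sets(k):
    """Non-empty attribute subsets an exhaustive search over k key variables must examine."""
    return sum(comb(k, r) for r in range(1, k + 1))


for k in (6, 10, 20, 30):
    print(k, n_attribute_sets(k))
# 6 63
# 10 1023
# 20 1048575
# 30 1073741823
```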
Grading the riskiness of a record
- We believe that there are two key issues:
- The size of the attribute set responsible for uniqueness, e.g. size 2 for a 16-year-old widow (age and marital status)
- The number of unique pairs, triplets, etc. contained within a record
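The two factors above are named but not combined. One hypothetical scheme ranks records first by the size of their smallest unique attribute set (smaller is riskier) and then by how many unique sets of that size they contain (more is riskier). It counts only minimal unique sets, which is an assumption, and `grade` and `rank_records` are illustrative names rather than part of the project.

```python
from collections import Counter


def grade(msus):
    """Return (smallest unique-set size, {size: count}) for one record.

    `msus` is a list of attribute tuples on which the record is unique
    (assumed minimal). A record with no unique set returns (None, {}).
    """
    if not msus:
        return None, {}
    by_size = Counter(len(m) for m in msus)
    return min(by_size), dict(sorted(by_size.items()))


def rank_records(msus_by_record):
    """Order record ids from most to least risky under the hypothetical scheme."""
    def sort_key(i):
        smallest, by_size = grade(msus_by_record[i])
        if smallest is None:
            return (float("inf"), 0)  # never unique: least risky
        return (smallest, -by_size[smallest])
    return sorted(msus_by_record, key=sort_key)


msus_by_record = {
    0: [("age", "sex"), ("age", "marcon")],
    1: [("marcon",), ("age", "sex")],
    2: [("age", "sex")],
    3: [],
}
print(grade(msus_by_record[1]))      # (1, {1: 1, 2: 1})
print(rank_records(msus_by_record))  # [1, 0, 2, 3]
```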
Conclusions
- The Special Uniques Identification method provides a useful way of identifying risky records.
- Further work aims to enable more sophisticated classification of specials.
- High-performance computing will enable exhaustive search for, and identification of, specials.