Title: Li Feng
1 State-of-the-Art in Privacy Preservation
2 Statistics
Table 1 Statistics on privacy preservation
1
1
12
14
Table 2 Statistics on anonymization
4
2
6
12
3 Outline
- New problems
- New techniques
- Improvement
- Discussion
4 New Problems: Schema and data matching
- Record matching
- Identifying common information shared by two data sources
- Proposes a protocol for privacy-preserving record matching between two parties that can have different schemas and privacy requirements, including at the schema level
- Novelty
- Obtains privacy by embedding the records of each of the two parties in a vector space (SparseMap)
- Can handle different schemas
- Approximate matching
Privacy Preserving Schema and Data Matching, SIGMOD, 2007
5 New Problems: Schema and data matching (cont.)
6 New Problems: Schema and data matching (cont.)
7 New Problems: Schema and data matching (cont.)
- Limitation
- Relies on a trusted third party
- Only applicable to the two-party scenario
8 New Problems: Hiding the presence
- All existing anonymization methods are concerned with weakening the linkage between individuals and sensitive values
- This is the first work to answer the question: is somebody in the published dataset?
Hiding the Presence of Individuals from Shared Databases, SIGMOD, 2007
9 New Problems: Hiding the presence (cont.)
The probability that an individual's record is in the generalized table T must lie between δ_min and δ_max
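The presence probability bounded above can be illustrated with a minimal Python sketch. It assumes the adversary knows a public table of quasi-identifiers, that the published table is a subset of it, and that both are generalized with the same function. The names `presence_probabilities`, `satisfies_presence_bounds`, `by_decade`, `d_min` and `d_max`, and the sample data, are hypothetical and not taken from the cited paper.

```python
from collections import Counter

def presence_probabilities(public_rows, private_rows, generalize):
    """Estimate P(individual is in the published table) per generalized group.

    public_rows  : quasi-identifier tuples the adversary is assumed to know
    private_rows : quasi-identifier tuples of the published subset
    generalize   : hypothetical function mapping a tuple to its generalized form
    """
    public_groups = Counter(generalize(r) for r in public_rows)
    private_groups = Counter(generalize(r) for r in private_rows)
    # Within one generalized group, every public record is treated as
    # equally likely to be one of the published records.
    return {g: private_groups.get(g, 0) / n for g, n in public_groups.items()}

def satisfies_presence_bounds(probs, d_min, d_max):
    """True if every group's presence probability lies in [d_min, d_max]."""
    return all(d_min <= p <= d_max for p in probs.values())

def by_decade(row):
    """Illustrative generalization: age to its decade, ZIP code to 3 digits."""
    age, zipcode = row
    return (age // 10 * 10, zipcode[:3])

public = [(23, "47906"), (27, "47907"), (25, "47905"), (41, "47302"), (45, "47304")]
private = [(23, "47906"), (41, "47302")]
probs = presence_probabilities(public, private, by_decade)
```

Here the three people in their twenties share one published record and the two in their forties share another, so tightening the lower bound rejects the first group while the second still passes.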
10 New Problems: Re-publication of dynamic datasets
- Static dataset
- Record insertion
- Record insertion and deletion
?
m-Invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets, SIGMOD, 2007
Privacy, Accuracy, and Consistency Too: A Holistic Solution to Contingency Table Release, PODS, 2007
Maintaining K-Anonymity against Incremental Updates, SSDBM, 2007
12 New Problems: Re-publication of dynamic datasets (cont.)
- Most methods for static datasets are not secure if the dataset is dynamic
- In a dynamic environment, previously released data adds to the adversary's background knowledge, so a privacy breach is easier
- Inserted, deleted, and retained records should be handled with different strategies
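A minimal Python sketch of why earlier releases help the adversary: if the adversary can locate a person's group in each release, intersecting the sensitive values of those groups narrows the candidates. The function `candidate_values`, the release layout, and the sample data are assumptions made for illustration and are not taken from the cited papers.

```python
def candidate_values(releases, person):
    """Intersect the sensitive-value sets of every group that contains `person`.

    releases : list of releases; each release is a list of
               (members, sensitive_values) pairs (hypothetical layout)
    """
    candidates = None
    for release in releases:
        for members, values in release:
            if person in members:
                values = set(values)
                candidates = values if candidates is None else candidates & values
                break
    return candidates if candidates is not None else set()

# Release 1: each group holds two distinct sensitive values.
release1 = [({"Ann", "Bob"}, {"flu", "cancer"}),
            ({"Cid", "Dee"}, {"flu", "asthma"})]
# Release 2: Bob was deleted and Eve inserted, so Ann's group changed.
release2 = [({"Ann", "Eve"}, {"cancer", "asthma"}),
            ({"Cid", "Dee"}, {"flu", "asthma"})]

ann = candidate_values([release1, release2], "Ann")
cid = candidate_values([release1, release2], "Cid")
```

Each release looks safe on its own, yet Ann's value is pinned down once both are combined, while Cid's group, whose value set did not change, reveals nothing new.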
13 Outline
- New problems
- New techniques
- Improvement
- Discussion
14 New techniques: space mapping
- Map the multi-dimensional data to 1D data
- It's easier to anonymize 1D data
Fast Data Anonymization with Low Information Loss, VLDB, 2007
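The mapping idea can be sketched in Python as follows. This is an illustration under stated assumptions: it uses a Z-order (Morton) key as a simple stand-in mapping, which is not necessarily the mapping used in the cited paper, and the names `z_order_key`, `anonymize_1d` and `bounding_box` are hypothetical.

```python
def z_order_key(x, y, bits=8):
    """Interleave the bits of two non-negative integers (Morton / Z-order code)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def anonymize_1d(records, k):
    """Sort 2D records by their 1D key and cut the sorted list into groups of >= k.

    Assumes len(records) >= k; otherwise the single group is smaller than k.
    """
    ordered = sorted(records, key=lambda r: z_order_key(r[0], r[1]))
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # Merge a short trailing group into its predecessor.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups

def bounding_box(group):
    """Generalize a group to the (min, max) range of each attribute."""
    xs = [r[0] for r in group]
    ys = [r[1] for r in group]
    return (min(xs), max(xs)), (min(ys), max(ys))

records = [(1, 1), (2, 7), (3, 3), (6, 2), (7, 7), (0, 5), (5, 5)]
groups = anonymize_1d(records, 3)
boxes = [bounding_box(g) for g in groups]
```

Records that are close in the 1D order tend to be close in the original space, so the bounding boxes stay reasonably tight; how tight depends on the chosen mapping.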
15 New techniques: space mapping (cont.)
- A double-edged sword!
- With space-mapping techniques, we only need to concentrate on optimizing 1D data, but
- Finding a proper mapping method is itself a problem
- The proper mapping may differ from one dataset to another
16 New techniques: spatial index
- Observation
- K-anonymizing a data set is similar to building a spatial index over the data set
The performance and characteristics of the anonymization depend entirely on the index technique!
K-Anonymization as Spatial Indexing: Toward Scalable and Incremental Anonymization, VLDB, 2007
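To make the observation concrete, here is a minimal Python sketch in which the leaves of a simple index act as the anonymization groups. It assumes a kd-tree-style median split as the index, chosen only for brevity and not the index structure of the cited paper; `build_leaves` and `leaf_ranges` are hypothetical names.

```python
def build_leaves(points, k, depth=0):
    """Split points at the median, cycling through the axes.

    A node is split only while both halves keep at least k points, so every
    leaf holds >= k points (assuming the input itself has >= k points).
    """
    if len(points) < 2 * k:
        return [points]
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (build_leaves(ordered[:mid], k, depth + 1)
            + build_leaves(ordered[mid:], k, depth + 1))

def leaf_ranges(leaf):
    """Publish each leaf as the (min, max) range of every attribute."""
    dims = range(len(leaf[0]))
    return [(min(p[d] for p in leaf), max(p[d] for p in leaf)) for d in dims]

points = [(25, 50), (27, 52), (31, 48), (33, 61), (38, 55),
          (41, 70), (44, 66), (47, 59), (52, 73), (58, 64)]
leaves = build_leaves(points, k=2)
published = [leaf_ranges(leaf) for leaf in leaves]
```

The split rule and fan-out of the index decide how large and how tight the published ranges are, which is the dependence the slide points out.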
17 Outline
- New problems
- New techniques
- Improvement
- Discussion
18 Improvement
- Relax/enhance constraints on background knowledge
- Concentrate on a specific application problem
- Safer anonymization principles
19 Improvement: based on background knowledge
- Minimality principle
- A k-anonymization should not generalize, suppress, or distort the data more than is necessary to achieve k-anonymity
- If the adversary knows
- 1) that the published data is generalized according to the minimality principle
- 2) the anonymization goal (e.g. k-anonymity, l-diversity)
- Then the existing anonymization methods are not secure!
Enhanced background knowledge
Minimality Attack in Privacy Preserving Data Publishing, VLDB, 2007
20 Improvement: based on background knowledge
- All existing anonymization methods are based on the following background knowledge assumption
- The adversary can obtain a background knowledge table, and we know exactly what the adversary has
- Background knowledge table
- Assume the data owner knows only how much background knowledge the adversary has: the number of basic units of knowledge
Worst Case Background Knowledge, ICDE, 2007
21 Improvement: specific application problem
Original microdata table
- Query 1. The sum of salaries of those with age in [35, 55]
- Goal: provide privacy preservation while answering aggregate queries as accurately as possible
- Privacy preservation
- (k, e)-anonymous
- Query answering
- Help table, optimal partition algorithm
Result: 210K, 615K
Result: 530K, 540K
3-anonymous microdata table
3-anonymous microdata table after permutation
Aggregate Query Answering on Anonymized Tables, ICDE, 2007
22 Improvement: safer anonymization principle
- K-anonymity: every record in the published table is identical, on its quasi-identifier attributes, to at least k - 1 other records
- L-diversity: every QI group contains at least l well-represented sensitive values
- K-anonymity is concerned only with the quasi-identifier
- l-diversity is concerned with both the quasi-identifier and the sensitive attribute (l-diversity is still not safe!)
- t-closeness
- The distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table
Figure labels: a 2-anonymity table and an external database; a 2-diverse table and an external database; sensitive attribute; distribution of sensitive attribute; information loss
t-Closeness: Privacy Beyond k-Anonymity and l-Diversity, ICDE, 2007
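The three principles above can be compared with a minimal Python sketch. It makes two simplifying assumptions: "well-represented" is read as "distinct", and closeness is measured by total variation distance between distributions, which may differ from the distance used in the cited paper. All function names and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def equivalence_classes(rows):
    """Group sensitive values by quasi-identifier. rows: (qi_tuple, sensitive)."""
    classes = defaultdict(list)
    for qi, sensitive in rows:
        classes[qi].append(sensitive)
    return classes

def is_k_anonymous(rows, k):
    """Every equivalence class has at least k records."""
    return all(len(v) >= k for v in equivalence_classes(rows).values())

def is_distinct_l_diverse(rows, l):
    """Every equivalence class has at least l distinct sensitive values."""
    return all(len(set(v)) >= l for v in equivalence_classes(rows).values())

def distribution(values):
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}

def max_distance_to_overall(rows):
    """Largest total variation distance between a class and the whole table."""
    overall = distribution([s for _, s in rows])
    worst = 0.0
    for values in equivalence_classes(rows).values():
        local = distribution(values)
        dist = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        worst = max(worst, dist)
    return worst

rows = [(("2*", "476**"), "flu"), (("2*", "476**"), "flu"),
        (("3*", "479**"), "flu"), (("3*", "479**"), "cancer")]
worst = max_distance_to_overall(rows)
```

With these rows the table is 2-anonymous but not 2-diverse, because the first class contains only one sensitive value; under this sketch's distance it would pass a closeness threshold t only for t of at least `worst`.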
23 Outline
- New problems
- New techniques
- Improvement
- Discussion
24 Discussion: Critical factors of data anonymization
- 1. The form of the original data
- Single table/multi-table, sequence, stream
- Single/multi-dimensional
- Static/dynamic
- Sensitive attribute: single/multiple, numeric/hierarchical
- 2. The adversary's background knowledge
- Strong/weak background knowledge
- Table form, conjunction rule, or statistics
- The amount of background knowledge
- 3. The metrics of privacy
- Disclosure risk/the amount of disclosed data
- 4. The metrics of information loss
- The size of equivalence class/number of generalizations
25 Discussion: Critical factors of data anonymization (cont.)
- 5. The anonymization techniques
- Generalization/suppression/randomization/mapping/encryption
- 6. The usage of the anonymized data
- General purpose/specific purpose
- 7. Attacks on anonymized data
26 Thanks!
?