Li Feng


About This Presentation

Slides: 27

Provided by: lif7

Transcript and Presenter's Notes

Title: Li Feng

1
State-of-the-Art in Privacy Preservation

Li Feng
Sept. 20

2
Statistics
Table 1. Statistics on privacy preservation
1  1  12  14
Table 2. Statistics on anonymization
4  2  6  12
3
Outline

New problems
New techniques
Improvement
Discussion

4
New Problems: Schema and data matching

Record matching
Identifying common information shared by two data
sources
Proposes a protocol for privacy-preserving record
matching between two parties that may have
different schemas and privacy requirements, including
at the schema level

Novelty
Obtains privacy by embedding the records of
each of the two parties in a vector space
(SparseMap)
Can handle different schemas
Supports approximate matching

Privacy Preserving Schema and Data
Matching, SIGMOD, 2007
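
The embedding idea can be sketched as follows. This is a plain Lipschitz-style embedding with edit distance, not SparseMap itself, and it carries none of the protocol's privacy guarantees. The names `embed`, `is_match`, `REFERENCE_SETS`, and the threshold are hypothetical choices for illustration.

```python
import math

def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]

def embed(record, reference_sets):
    """One coordinate per reference set: distance to its closest string."""
    return [min(edit_distance(record, r) for r in refs)
            for refs in reference_sets]

def is_match(u, v, threshold):
    """Approximate matching: vectors close in Euclidean distance match."""
    return math.dist(u, v) <= threshold

# Hypothetical reference sets agreed on by both parties.
REFERENCE_SETS = [["smith", "jones"], ["garcia"]]

a = embed("smith", REFERENCE_SETS)   # held by party 1
b = embed("smyth", REFERENCE_SETS)   # held by party 2
c = embed("garcia", REFERENCE_SETS)  # held by party 2
print(is_match(a, b, 1.5), is_match(a, c, 1.5))
```

Only the vectors are compared, so similar strings can be matched approximately without comparing the raw values directly.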
5
New Problems: Schema and data matching (cont.)
6
New Problems: Schema and data matching (cont.)
7
New Problems: Schema and data matching (cont.)

Limitation
Relies on a trusted third party
Only applicable to the two-party scenario

8
New Problems: Hiding the presence

All existing anonymization methods are
concerned with weakening the linkage between
individuals and sensitive values
This is the first work to answer: Is somebody in the
published dataset?

Hiding the Presence of Individuals from Shared
Databases, SIGMOD, 2007
9
New Problems: Hiding the presence (cont.)
The probability that a record is present in a generalized
table T lies between δ_min and δ_max (δ-presence)
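
A minimal sketch of checking such a presence bound, assuming each record has already been assigned to a generalized group and that the adversary knows a public table covering everyone. The bound names `d_min` and `d_max`, the function names, and the toy data are hypothetical.

```python
from collections import Counter

def presence_probabilities(public_groups, private_groups):
    """For each generalized group: |private records| / |public records|."""
    pub = Counter(public_groups)
    priv = Counter(private_groups)
    return {g: priv.get(g, 0) / n for g, n in pub.items()}

def satisfies_presence_bounds(public_groups, private_groups, d_min, d_max):
    """True if every group's presence probability lies in [d_min, d_max]."""
    probs = presence_probabilities(public_groups, private_groups)
    return all(d_min <= p <= d_max for p in probs.values())

# Toy data: group label of each public record and each private record.
public_groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
private_groups = ["A", "A", "B"]

probs = presence_probabilities(public_groups, private_groups)
print(probs)
print(satisfies_presence_bounds(public_groups, private_groups, 0.2, 0.6))
```

Widening the generalization changes how many public records fall in each group, which is how the probabilities are pushed inside the bounds.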
10
New Problems: Re-publication of dynamic datasets

Static dataset
Record insertion
Record insertion and deletion

?
m-Invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets, SIGMOD, 2007
Privacy, Accuracy, and Consistency Too: A Holistic Solution to Contingency Table Release, PODS, 2007
Maintaining K-Anonymity against Incremental Updates, SSDBM, 2007
12
New Problems: Re-publication of dynamic
datasets (cont.)

Most methods designed for static datasets are not
secure when the dataset is dynamic
In a dynamic environment, previously released
data enhances the adversary's background
knowledge, so privacy breaches become easier
Inserted, deleted, and retained records should
each be handled with a different strategy
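
Why earlier releases help the adversary can be illustrated with a small intersection attack. This is a toy sketch: the release layout (a list of groups, each a pair of member set and sensitive-value set), the names, and the data are all hypothetical.

```python
def candidate_values(releases, person):
    """Intersect the sensitive-value sets of every group containing person."""
    candidates = None
    for release in releases:
        for members, values in release:
            if person in members:
                seen = set(values)
                candidates = seen if candidates is None else candidates & seen
    return candidates

# Release 1: alice and bob share a group with two sensitive values.
release_1 = [({"alice", "bob"}, {"flu", "cancer"})]
# Release 2: bob was deleted and carol inserted; the group is still 2-diverse.
release_2 = [({"alice", "carol"}, {"cancer", "asthma"})]

leaked = candidate_values([release_1, release_2], "alice")
print(leaked)
```

Each release looks safe on its own, but only one value is consistent with both, so the combination pins down the retained record's sensitive value.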

13
Outline

New problems
New techniques
Improvement
Discussion

14
New techniques: space mapping

Map the multi-dimensional data to 1D data
It is easier to anonymize 1D data

Fast Data Anonymization with Low Information
Loss, VLDB, 2007
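
A rough sketch of the mapping idea for two integer quasi-identifiers. It uses a Z-order (Morton) curve as a simple stand-in for whatever mapping is actually chosen, then cuts the sorted 1D sequence into runs of at least k records. All function names and the sample points are assumptions made for illustration, and the input is assumed to hold at least k points.

```python
def z_order(x, y, bits=8):
    """Interleave the bits of non-negative ints x and y into one 1D key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def partition_1d(points, k):
    """Sort by the 1D key and cut into consecutive groups of size >= k."""
    ordered = sorted(points, key=lambda p: z_order(p[0], p[1]))
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())  # merge a short tail into its neighbor
    return groups

def generalize(group):
    """Replace a group by the bounding ranges of its two attributes."""
    xs = [p[0] for p in group]
    ys = [p[1] for p in group]
    return ((min(xs), max(xs)), (min(ys), max(ys)))

points = [(1, 1), (2, 1), (7, 6), (6, 7), (1, 2), (7, 7), (2, 2)]
groups = partition_1d(points, k=3)
boxes = [generalize(g) for g in groups]
print(boxes)
```

In this toy run the second group spans a wide range on both attributes, which hints at how much the result depends on the mapping and on where the 1D sequence is cut.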
15
New techniques: space mapping (cont.)

A double-edged sword!
By adopting space-mapping techniques, we only need
to concentrate on optimizing 1D data, but
finding a proper mapping method is itself a problem
For different data, the proper mapping may be different

16
New techniques: spatial index

Observation
K-anonymizing a data set is similar to
building a spatial index over the data set

The performance and characteristics of
anonymization rely entirely on the index
technique!
K-Anonymization as Spatial Indexing: Toward
Scalable and Incremental Anonymization, VLDB, 2007
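
The observation can be made concrete with a small sketch. It uses a kd-tree style median split as a simple stand-in for a spatial index, which is not necessarily the index structure used in the cited paper: each leaf holds at least k records, and its bounding box serves as the generalized value. Function names and sample points are hypothetical.

```python
def kd_partition(points, k, depth=0):
    """Recursively split at the median; stop when a split would break size k."""
    if len(points) < 2 * k:
        return [points]                      # leaf: k <= size < 2k at any split
    axis = depth % len(points[0])            # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (kd_partition(ordered[:mid], k, depth + 1)
            + kd_partition(ordered[mid:], k, depth + 1))

def bounding_box(leaf):
    """Per-dimension (min, max) ranges covering every point in the leaf."""
    dims = range(len(leaf[0]))
    return tuple((min(p[d] for p in leaf), max(p[d] for p in leaf))
                 for d in dims)

points = [(i, (i * 7) % 10) for i in range(10)]
leaves = kd_partition(points, k=2)
for leaf in leaves:
    print(len(leaf), bounding_box(leaf))
```

Swapping in a different index changes how the leaves are shaped, and with them the information loss and the cost of handling updates.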
17
Outline

New problems
New techniques
Improvement
Discussion

18
Improvement

Relax or strengthen constraints on background knowledge
Concentrate on a specific application problem
Safer anonymization principles

19
Improvement: based on background knowledge

Minimality principle
A k-anonymization should not generalize,
suppress, or distort the data more than is
necessary to achieve k-anonymity
If the adversary knows
1) that the published data is generalized according
to the minimality principle
2) the anonymization goal (e.g. k-anonymity,
l-diversity)
then the existing anonymization methods are not
secure!

Enhanced background knowledge
Minimality Attack in Privacy Preserving Data
Publishing, VLDB, 2007
20
Improvement: based on background knowledge

All existing anonymization methods are based on
the following background knowledge assumption:
the adversary can obtain a background knowledge
table, and we know exactly what the adversary has

background knowledge table

Instead, assume the data owner only knows how much
background knowledge the adversary has: the
number of basic units of knowledge

Worst Case Background Knowledge, ICDE, 2007
21
Improvement: specific application problem
Original microdata table

Query 1: The sum of salaries of those with age in
[35, 55]

Goal: Provide privacy preservation while
answering aggregate queries as accurately as possible
Privacy preservation
(k, e)-anonymity
Query answering
Help table, optimal partition algorithm

Result: 210K, 615K
Result: 530K, 540K
3-anonymous microdata table
3-anonymous microdata table after permutation
Aggregate Query Answering on Anonymized
Tables, ICDE, 2007
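
A hedged sketch of the permutation idea: sensitive values are shuffled within each group, which breaks the link to individuals while leaving group-level aggregates intact. The check `is_k_e_anonymous` assumes one common reading of (k, e)-anonymity (each group has at least k distinct sensitive values spanning a range of at least e), which the slide itself does not spell out. All names and figures below are invented for illustration.

```python
import random

def permute_within_groups(groups, seed=0):
    """Shuffle the salaries inside each group; ages stay in place."""
    rng = random.Random(seed)
    permuted = []
    for group in groups:
        ages = [age for age, _ in group]
        salaries = [salary for _, salary in group]
        rng.shuffle(salaries)
        permuted.append(list(zip(ages, salaries)))
    return permuted

def is_k_e_anonymous(groups, k, e):
    """Assumed definition: >= k distinct salaries with range >= e per group."""
    for group in groups:
        salaries = {salary for _, salary in group}
        if len(salaries) < k or max(salaries) - min(salaries) < e:
            return False
    return True

# Toy (age, salary in K) records, already partitioned into two groups.
groups = [[(30, 40), (38, 55), (45, 70)],
          [(50, 60), (58, 90), (62, 120)]]
published = permute_within_groups(groups)

original_sums = [sum(s for _, s in g) for g in groups]
published_sums = [sum(s for _, s in g) for g in published]
print(original_sums, published_sums)
```

A query that covers whole groups is answered exactly from the permuted table; a query that cuts through a group can only be bounded, which is why the choice of partition matters for accuracy.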
22
Improvement: Safer anonymization principle

K-anonymity: every record in the published table
is identical to at least k - 1 other records on the quasi-identifier

L-diversity: every QI group contains at least l
well-represented sensitive values

K-anonymity is concerned only with the quasi-identifier;
l-diversity is concerned with both the quasi-identifier and
the sensitive attribute (l-diversity is still not
safe!)
t-closeness
The distribution of a sensitive attribute in any
equivalence class is close to the distribution of
the attribute in the overall table

sensitive attribute
A 2-anonymity table
An external database
A 2-diverse table
An external database
distribution of sensitive attribute
Information loss
t-Closeness: Privacy Beyond k-Anonymity and
l-Diversity, ICDE, 2007
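
The three principles can be compared on a toy table. This is a minimal sketch under simplifying assumptions: l-diversity is read as "at least l distinct values", and closeness is measured with total variation distance rather than any particular distance from the cited paper. Column names, function names, and data are hypothetical.

```python
from collections import Counter, defaultdict

def groups_by_qi(rows, qi):
    """Group rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in qi)].append(row)
    return list(groups.values())

def is_k_anonymous(rows, qi, k):
    return all(len(g) >= k for g in groups_by_qi(rows, qi))

def is_l_diverse(rows, qi, sensitive, l):
    """Simplified check: at least l distinct sensitive values per group."""
    return all(len({r[sensitive] for r in g}) >= l
               for g in groups_by_qi(rows, qi))

def distribution(rows, sensitive):
    counts = Counter(r[sensitive] for r in rows)
    return {v: c / len(rows) for v, c in counts.items()}

def max_distance(rows, qi, sensitive):
    """Largest total variation distance between a group and the whole table."""
    overall = distribution(rows, sensitive)
    worst = 0.0
    for g in groups_by_qi(rows, qi):
        local = distribution(g, sensitive)
        d = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        worst = max(worst, d)
    return worst

rows = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "148**", "age": ">=30", "disease": "flu"},
    {"zip": "148**", "age": ">=30", "disease": "cancer"},
]
QI = ["zip", "age"]
print(is_k_anonymous(rows, QI, 2), is_l_diverse(rows, QI, "disease", 2))
print(max_distance(rows, QI, "disease"))
```

The table is 2-anonymous but not 2-diverse, since the first group reveals its sensitive value outright. A table would satisfy t-closeness under this simplified distance when `max_distance` is at most t.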
23
Outline