Li Feng - PowerPoint PPT Presentation



1
State-of-the-Art in Privacy Preservation
  • Li Feng
  • Sept. 20

2
Statistics
Table 1: Statistics on privacy preservation
Table 2: Statistics on anonymization
3
Outline
  • New problems
  • New techniques
  • Improvement
  • Discussion

4
New Problems: Schema and data matching
  • Record matching
  • Identifying common information shared by two data
    sources
  • Proposes a protocol for privacy-preserving record
    matching between two parties that can have
    different schemas and privacy requirements, also
    at the schema level
  • Novelty
  • Obtains privacy by embedding the records of
    each of the two parties in a vector space
    (SparseMap)
  • Can handle different schemas
  • Approximate matching

Privacy Preserving Schema and Data Matching, SIGMOD, 2007
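A minimal sketch of the vector-space idea, assuming string-valued records and edit distance as the metric. It shows a plain Lipschitz embedding, the family of embeddings SparseMap builds on. It is not the cited protocol: the reference sets, the threshold, and the helper names (`levenshtein`, `embed`, `linf`, `candidate_matches`) are hypothetical, and SparseMap's coordinate-selection heuristics and the third party's role are left out. Each coordinate is the distance from a value to the closest member of one reference set, so the parties only need to compare vectors.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating a Lipschitz embedding; not the paper's protocol.


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def embed(value: str, reference_sets: Sequence[Sequence[str]]) -> List[int]:
    """Coordinate i is the distance from `value` to its closest string in reference set i."""
    return [min(levenshtein(value, r) for r in refs) for refs in reference_sets]


def linf(u: Sequence[int], v: Sequence[int]) -> int:
    """L-infinity (Chebyshev) distance between two embedded vectors."""
    return max(abs(x - y) for x, y in zip(u, v))


def candidate_matches(left: Sequence[str], right: Sequence[str],
                      reference_sets: Sequence[Sequence[str]],
                      threshold: int) -> List[Tuple[str, str]]:
    """Pairs whose embeddings are within `threshold`; only vectors are compared here."""
    el = [embed(s, reference_sets) for s in left]
    er = [embed(s, reference_sets) for s in right]
    return [(a, b) for a, va in zip(left, el)
            for b, vb in zip(right, er) if linf(va, vb) <= threshold]


# Hypothetical reference sets and records held by the two parties.
refs = [["smith"], ["jones", "brown"], ["garcia"]]
print(candidate_matches(["smith", "lee"], ["smyth", "garcia"], refs, 1))
```

For every reference set R, |d(x, R) - d(y, R)| <= d(x, y). The L-infinity distance between two embeddings therefore never exceeds the true edit distance. A threshold on the vectors can admit false positives but never drops a truly close pair.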
5
New Problems: Schema and data matching (cont.)
6
New Problems: Schema and data matching (cont.)
7
New Problems: Schema and data matching (cont.)
  • Limitation
  • Requires a trusted third party
  • Only applicable to the two-party scenario

8
New Problems: Hiding the presence
  • All existing anonymization methods are
    concerned with weakening the linkage between
    individuals and their sensitive values
  • This is the first work to answer: is somebody in the
    published dataset?

Hiding the Presence of Individuals from Shared Databases, SIGMOD, 2007
9
New Problems: Hiding the presence (cont.)
The probability that an individual's record is in the
generalized table T lies between δ_min and δ_max (δ-presence)
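The cited paper calls this guarantee δ-presence. For everyone in a public population table P, the probability of being in the private table T, given the published generalization, must stay within [δ_min, δ_max]. The sketch below rests on several assumptions: T ⊆ P, and each generalized class's probability is the ratio of private rows to public rows in that class. The helper names and the data are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from typing import Callable, Dict, Hashable, Sequence

# Hypothetical helpers; a sketch of the delta-presence check, not the paper's algorithm.


def presence_probabilities(public_qis: Sequence[Hashable],
                           private_qis: Sequence[Hashable],
                           generalize: Callable[[Hashable], Hashable]
                           ) -> Dict[Hashable, Fraction]:
    """For each generalized QI class: (# private rows) / (# public rows).
    Assumes every private individual also appears in the public table."""
    public = Counter(generalize(q) for q in public_qis)
    private = Counter(generalize(q) for q in private_qis)
    return {g: Fraction(private.get(g, 0), n) for g, n in public.items()}


def is_delta_present(public_qis, private_qis, generalize,
                     d_min: Fraction, d_max: Fraction) -> bool:
    """True if every class's presence probability lies in [d_min, d_max]."""
    probs = presence_probabilities(public_qis, private_qis, generalize)
    return all(d_min <= p <= d_max for p in probs.values())


def by_decade(age: int) -> int:
    """Hypothetical generalization: replace an age by its decade."""
    return age // 10 * 10


# Hypothetical ages: the whole population vs. those in the private table.
population = [23, 27, 31, 35, 38, 44]
in_private_table = [27, 35, 44]
# The 40s class contains one person, who is in T, so that presence is revealed.
print(presence_probabilities(population, in_private_table, by_decade))
```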
10
New Problems: Re-publication of dynamic datasets
  • Static dataset
  • Record insertion
  • Record insertion and deletion

m-Invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets, SIGMOD, 2007
Privacy, Accuracy, and Consistency Too: A Holistic Solution to Contingency Table Release, PODS, 2007
Maintaining K-Anonymity against Incremental Updates, SSDBM, 2007
11
12
New Problems: Re-publication of dynamic
datasets (cont.)
  • Most methods designed for static datasets are not
    secure if the dataset is dynamic
  • In a dynamic environment, previously released
    data enlarges the adversary's background
    knowledge, so a privacy breach becomes easier
  • Inserted, deleted, and retained records should be
    processed with different strategies
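The point about previously released data can be illustrated with the classic intersection attack. The adversary knows which QI group a target falls into in each release and intersects the candidate sensitive values across releases. The releases and helper names below are hypothetical. Each group is 2-diverse on its own, yet the intersection pins down a single value. m-invariance, from the first cited paper, counters this by requiring that a record's group keep the same set of sensitive values (its signature) in every release where the record appears.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical model: a QI group = (individuals known to be in it, its sensitive values).
Group = Tuple[Set[str], List[str]]
Release = List[Group]


def candidates_in(release: Release, person: str) -> Optional[Set[str]]:
    """Sensitive values the person could have, judging by one release alone."""
    for members, values in release:
        if person in members:
            return set(values)
    return None  # person not present in this release


def intersection_attack(releases: List[Release], person: str) -> Optional[Set[str]]:
    """Intersect the candidate sets across all releases that contain the person."""
    result: Optional[Set[str]] = None
    for release in releases:
        cands = candidates_in(release, person)
        if cands is None:
            continue
        result = cands if result is None else result & cands
    return result


# Hypothetical releases: Alice is deleted and Eve is inserted between them.
r1 = [({"Bob", "Alice"}, ["flu", "HIV"]), ({"Carol", "Dan"}, ["dyspepsia", "flu"])]
r2 = [({"Bob", "Carol"}, ["HIV", "dyspepsia"]), ({"Dan", "Eve"}, ["flu", "gastritis"])]
print(intersection_attack([r1, r2], "Bob"))  # prints {'HIV'}
```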

13
Outline
  • New problems
  • New techniques
  • Improvement
  • Discussion

14
New techniques: space mapping
  • Map the multi-dimensional data to 1D data
  • It is easier to anonymize 1D data

Fast Data Anonymization with Low Information
Loss, VLDB, 2007
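A minimal sketch of space mapping, assuming two integer quasi-identifiers on an n × n grid (n a power of two). Each record is mapped to its position on a Hilbert curve and the records are sorted by that 1D key. Consecutive runs of k records then form the QI groups. The Hilbert conversion follows the standard iterative algorithm. The grouping step (`anonymize_1d`, a hypothetical name) is a simple greedy cut and not necessarily the partitioning the cited paper uses.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Flip/transpose a quadrant so sub-curves are visited in the right orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve over an n x n grid
    (n must be a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def anonymize_1d(points: List[Point], n: int, k: int) -> List[List[Point]]:
    """Hypothetical greedy step: sort by Hilbert index and cut the 1D order into
    runs of k; a short final run is merged into the previous group."""
    if len(points) < k:
        raise ValueError("need at least k records")
    ordered = sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups[-1]) < k:
        last = groups.pop()
        groups[-1].extend(last)
    return groups


# Hypothetical (age bucket, zip bucket) pairs on a 4 x 4 grid, k = 3.
records = [(0, 0), (3, 3), (1, 0), (2, 2), (0, 3), (3, 0), (1, 1)]
for group in anonymize_1d(records, n=4, k=3):
    print(group)
```

Records that are close on the curve tend to be close in the original space, so each run of k usually spans a small multi-dimensional box. This locality is what makes the 1D reduction worthwhile.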
15
New techniques: space mapping (cont.)
  • A double-edged sword!
  • By adopting space-mapping techniques, we only need
    to concentrate on optimizing the 1D case, but
  • Finding a proper mapping method is itself a problem
  • The proper mapping may differ from one dataset to another

16
New techniques: spatial index
  • Observation
  • K-anonymizing a data set is similar to
    building a spatial index over the data set

The performance and characteristics of the
anonymization depend entirely on the indexing
technique!
K-Anonymization as Spatial Indexing: Toward
Scalable and Incremental Anonymization, VLDB, 2007
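The observation can be illustrated with a kd-tree-style sketch. The records are split recursively at the median of one quasi-identifier, and the recursion stops before a child would hold fewer than k records. Each leaf is then an equivalence class, published as its bounding box. This is a simplified stand-in: the cited paper's own index structure and its incremental-insertion logic are not reproduced, and the function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]

# Hypothetical kd-tree-style partitioning; not the index used in the cited paper.


def kd_partition(points: List[Point], k: int, depth: int = 0) -> List[List[Point]]:
    """Median-split the records, cycling through QI dimensions, until a split
    would leave a child with fewer than k records. Each leaf holds k..2k-1 records."""
    if len(points) < k:
        raise ValueError("need at least k records")
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    left, right = ordered[:mid], ordered[mid:]
    if len(left) < k:  # right is never smaller than left
        return [ordered]
    return kd_partition(left, k, depth + 1) + kd_partition(right, k, depth + 1)


def bounding_box(group: Sequence[Point]) -> Tuple[Tuple[int, int], ...]:
    """Generalize a leaf to its per-dimension (min, max) range."""
    return tuple((min(p[d] for p in group), max(p[d] for p in group))
                 for d in range(len(group[0])))


# Hypothetical (age, zip-digit) records, k = 2.
records = [(25, 1), (31, 7), (29, 4), (47, 2), (52, 9),
           (38, 5), (44, 3), (23, 8), (60, 6), (35, 0)]
for leaf in kd_partition(records, k=2):
    print(bounding_box(leaf), len(leaf))
```

Replacing the split rule or the node-capacity rule changes the resulting equivalence classes. This is the sense in which the anonymization inherits the characteristics of the index.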
17
Outline
  • New problems
  • New techniques
  • Improvement
  • Discussion

18
Improvement
  • Relax or enhance the constraints on background knowledge
  • Concentrate on specific application problems
  • Safer anonymization principles

19
Improvement: based on background knowledge
  • Minimality principle
  • A k-anonymization should not generalize,
    suppress, or distort the data more than is
    necessary to achieve k-anonymity
  • If the adversary knows
  • 1) that the published data is generalized according
    to the minimality principle, and
  • 2) the anonymization goal (e.g. k-anonymity,
    l-diversity),
  • then existing anonymization methods are not
    secure!

Enhanced background knowledge
Minimality Attack in Privacy Preserving Data
Publishing, VLDB, 2007
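The attack can be reproduced by brute force on one published group. The sketch rests on these assumptions: the requirement is "at most a fraction α of each group has HIV", the adversary knows every record's original QI value, and the group was formed by merging original QI subgroups. It enumerates every placement of the HIV values that is consistent with the release. When the adversary also knows the minimality principle, it keeps only the placements in which some original subgroup violated the requirement, since a minimal algorithm would not have generalized otherwise. The data and function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical requirement and helpers; a sketch of the reasoning, not the paper's code.


def violates(hiv: int, size: int, alpha: Fraction) -> bool:
    """Assumed requirement: at most a fraction alpha of a group is HIV."""
    return Fraction(hiv, size) > alpha


def posterior_hiv(original_qis: List[str], n_hiv: int, alpha: Fraction,
                  target: int, knows_minimality: bool) -> Fraction:
    """P(record `target` is HIV) over all placements of n_hiv HIV values
    among the records of one published (generalized) group."""
    worlds = []
    for hiv_slots in combinations(range(len(original_qis)), n_hiv):
        hiv = set(hiv_slots)
        if knows_minimality:
            # A minimal algorithm merges the original QI subgroups only if at
            # least one of them violated the requirement on its own.
            stats: Dict[str, Tuple[int, int]] = {}
            for i, q in enumerate(original_qis):
                size, count = stats.get(q, (0, 0))
                stats[q] = (size + 1, count + (i in hiv))
            if not any(violates(c, s, alpha) for s, c in stats.values()):
                continue  # this world would not have been generalized
        worlds.append(hiv)
    if not worlds:
        raise ValueError("no original table is consistent with the release")
    return Fraction(sum(target in w for w in worlds), len(worlds))


# Hypothetical group: two records with QI q1, four with q2; two HIV values in total.
qis = ["q1", "q1", "q2", "q2", "q2", "q2"]
half = Fraction(1, 2)
print(posterior_hiv(qis, 2, half, target=0, knows_minimality=False))  # 1/3
print(posterior_hiv(qis, 2, half, target=0, knows_minimality=True))   # 1
```

Without the minimality assumption, record 0 has HIV with probability 1/3. With it, the probability rises to 1. The q2 subgroup (four records, at most two HIV) could never violate the requirement on its own, so only the q1 subgroup can have triggered the generalization.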
20
Improvement: based on background knowledge
  • All existing anonymization methods are based on
    the following assumption about background knowledge:
  • The adversary can obtain a background knowledge
    table, i.e., we know exactly what the adversary has
  • Background knowledge table
  • Instead, assume the data owner knows only how much
    background knowledge the adversary has, i.e., the
    number of basic units of knowledge

Worst-Case Background Knowledge, ICDE, 2007
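The cited paper formalizes each unit of knowledge as a basic implication. As a much simpler stand-in, this sketch assumes each unit is a statement "the target does not have value v". For one QI group, it computes the largest probability the adversary can assign to any sensitive value of the target after k such statements. The target is assumed equally likely to be any record in the group. The function name and data are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from typing import Sequence

# Hypothetical simplification: knowledge units are negations about the target only.


def worst_case_disclosure(bucket: Sequence[str], k: int) -> Fraction:
    """Max over sensitive values s, and over sets of k excluded values other
    than s, of P(target has s | target's value is not excluded)."""
    counts = Counter(bucket)
    n = len(bucket)
    best = Fraction(0)
    for s in counts:
        others = [v for v in counts if v != s]
        for excluded in combinations(others, min(k, len(others))):
            remaining = n - sum(counts[v] for v in excluded)
            best = max(best, Fraction(counts[s], remaining))
    return best


bucket = ["flu", "flu", "HIV", "cancer"]  # hypothetical QI group
for k in range(3):
    print(k, worst_case_disclosure(bucket, k))
```

The worst case rises from 1/2 to 2/3 to 1 as k grows from 0 to 2. Bounding only the amount of knowledge still lets the data owner compute a guarantee, as the slide suggests.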
21
Improvement: specific application problems
Original microdata table
  • Query 1: the sum of salaries of those with age in
    [35, 55]
  • Goal: provide privacy preservation while
    answering aggregate queries as accurately as possible
  • Privacy preservation
  • (k, e)-anonymous
  • Query answering
  • Helper table, optimal partition algorithm

Result: [210K, 615K]
Result: [530K, 540K]
3-anonymous microdata table
3-anonymous microdata table after permutation
Aggregate Query Answering on Anonymized
Tables, ICDE, 2007
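A minimal sketch of how interval answers arise from a permutation-based release. It assumes each published group keeps its records' exact ages but shuffles their salaries within the group. If m records of a group match the query, their salaries are some m of the group's values. The group's contribution to SUM(salary) therefore lies between its m smallest and its m largest salaries. The (k, e) check used here (at least k distinct salaries and a salary range of at least e per group) is stated as an assumption. The groups and function names are hypothetical and do not reproduce the cited paper's algorithm.

```python
from typing import List, Sequence, Tuple

# Hypothetical release model: (exact QI ages, salaries permuted within the group).
Group = Tuple[Sequence[int], Sequence[int]]


def is_k_e_anonymous(groups: List[Group], k: int, e: int) -> bool:
    """Each group needs >= k distinct salaries spanning a range >= e."""
    return all(len(set(sal)) >= k and max(sal) - min(sal) >= e
               for _, sal in groups)


def sum_bounds(groups: List[Group], lo: int, hi: int) -> Tuple[int, int]:
    """Bounds on SUM(salary) WHERE lo <= age <= hi."""
    low = high = 0
    for ages, salaries in groups:
        m = sum(1 for a in ages if lo <= a <= hi)  # matching records in the group
        ordered = sorted(salaries)
        low += sum(ordered[:m])                    # m smallest salaries
        high += sum(ordered[len(ordered) - m:])    # m largest (empty when m == 0)
    return low, high


# Hypothetical groups, salaries in thousands.
groups = [([30, 40, 50], [30, 50, 70]),
          ([36, 52], [45, 90]),
          ([60, 65], [20, 80])]
print(is_k_e_anonymous(groups, k=2, e=30))  # True
print(sum_bounds(groups, 35, 55))           # (215, 255)
```

A group whose records all match (the second one here) contributes an exact sum. Only partially matching groups widen the interval, which is why the choice of partition matters for accuracy.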
22
Improvement: safer anonymization principles
  • k-anonymity: every record in the published table
    is identical on the quasi-identifier to at least k − 1 other records
  • l-diversity: every QI group contains at least l
    well-represented sensitive values
  • k-anonymity only concerns the quasi-identifier
  • l-diversity concerns both the quasi-identifier and
    the sensitive attribute (but l-diversity is still not
    safe!)
  • t-closeness
  • The distribution of a sensitive attribute in any
    equivalence class is close to (within a threshold t of)
    the distribution of the attribute in the overall table
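The first two principles can be checked directly on a generalized table. In this sketch "well-represented" is taken in its simplest form, distinct l-diversity (at least l distinct sensitive values per group); stronger variants exist. The records and helper names are hypothetical. The example table is 2-anonymous, but one group contains only "flu", so the table is not 2-diverse. This is the gap between the two principles that the slide describes.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]

# Hypothetical helpers; "distinct" l-diversity is the simplest instantiation.


def qi_groups(records: List[Record], qi: Sequence[str]) -> Dict[Tuple[str, ...], List[Record]]:
    """Group records by their (generalized) quasi-identifier values."""
    groups: Dict[Tuple[str, ...], List[Record]] = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in qi)].append(r)
    return groups


def is_k_anonymous(records: List[Record], qi: Sequence[str], k: int) -> bool:
    """Every QI group has at least k records."""
    return all(len(g) >= k for g in qi_groups(records, qi).values())


def is_distinct_l_diverse(records: List[Record], qi: Sequence[str],
                          sensitive: str, ell: int) -> bool:
    """Every QI group has at least ell distinct sensitive values."""
    return all(len({r[sensitive] for r in g}) >= ell
               for g in qi_groups(records, qi).values())


table = [
    {"zip": "476**", "age": "2*", "disease": "flu"},
    {"zip": "476**", "age": "2*", "disease": "flu"},
    {"zip": "479**", "age": "3*", "disease": "flu"},
    {"zip": "479**", "age": "3*", "disease": "cancer"},
]
print(is_k_anonymous(table, ["zip", "age"], 2))                   # True
print(is_distinct_l_diverse(table, ["zip", "age"], "disease", 2))  # False
```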

Figures: a 2-anonymous table and a 2-diverse table, each with an external database; distribution of the sensitive attribute; information loss
t-Closeness: Privacy Beyond k-Anonymity and
l-Diversity, ICDE, 2007
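For a numerical (ordered) sensitive attribute, "close" can be measured with the Earth Mover's Distance. With m ordered values, moving one unit of probability mass between adjacent values costs 1/(m-1), and the distance is (1/(m-1)) · Σ_i |Σ_{j≤i} (p_j − q_j)|. The salaries and helper names below are hypothetical, and the threshold t is chosen only for illustration. A class whose salaries are clustered at the low end lies farther from the overall distribution than a class that spreads across the range.

```python
from fractions import Fraction
from typing import List, Sequence

# Hypothetical helpers; ordered-distance EMD for a numerical sensitive attribute.


def distribution(values: Sequence[int], domain: Sequence[int]) -> List[Fraction]:
    """Empirical distribution of `values` over an ordered domain."""
    n = len(values)
    return [Fraction(sum(1 for v in values if v == d), n) for d in domain]


def ordered_emd(p: Sequence[Fraction], q: Sequence[Fraction]) -> Fraction:
    """EMD where moving one unit of mass between adjacent values costs 1/(m-1)."""
    m = len(p)
    carried = Fraction(0)  # net mass that must still move past the current value
    total = Fraction(0)
    for p_i, q_i in zip(p, q):
        carried += p_i - q_i
        total += abs(carried)
    return total / (m - 1)


def is_t_close(class_values: Sequence[int], all_values: Sequence[int],
               domain: Sequence[int], t: Fraction) -> bool:
    """True if the class distribution is within EMD t of the overall one."""
    return ordered_emd(distribution(class_values, domain),
                       distribution(all_values, domain)) <= t


# Hypothetical salaries (in thousands) over the ordered domain 3..11.
domain = list(range(3, 12))
table = domain  # the overall table holds each salary once
print(ordered_emd(distribution([3, 4, 5], domain), distribution(table, domain)))   # 3/8
print(ordered_emd(distribution([6, 8, 11], domain), distribution(table, domain)))  # 1/6
```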
23
Outline
  • New problems
  • New techniques
  • Improvement
  • Discussion

24
Discussion: Critical factors of data anonymization
  • 1. The form of the original data
  • Single table/multi-table, sequence, stream
  • Single/multi-dimensional
  • Static/dynamic
  • Sensitive attribute: single/multiple,
    numeric/hierarchical
  • 2. The adversary's background knowledge
  • Strong/weak background knowledge
  • Table form, conjunction rules, or statistics
  • The amount of background knowledge
  • 3. The metrics of privacy
  • Disclosure risk/the amount of disclosed data
  • 4. The metrics of information loss
  • The size of equivalence classes/number of
    generalization steps
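Two commonly used information-loss measures are based on equivalence-class size. One is the discernibility metric: every record is charged the size of its class, giving a sum of squared class sizes (suppression penalties are ignored here). The other is the average class size normalized by k. The sketch below is illustrative, and the class sizes are hypothetical.

```python
from fractions import Fraction
from typing import Sequence

# Hypothetical helpers for two size-based information-loss measures.


def discernibility(class_sizes: Sequence[int]) -> int:
    """Sum of squared equivalence-class sizes (no suppressed records assumed)."""
    return sum(size * size for size in class_sizes)


def normalized_avg_class_size(class_sizes: Sequence[int], k: int) -> Fraction:
    """(records / number of classes) / k; at least 1 for a k-anonymous table,
    and lower means less information loss."""
    return Fraction(sum(class_sizes), len(class_sizes)) / k


sizes = [2, 3, 5]  # hypothetical class sizes of a 2-anonymous table
print(discernibility(sizes), normalized_avg_class_size(sizes, k=2))  # 38 5/3
```

Both measures prefer many small classes, so they reward finer partitions. They do not capture how far each attribute was generalized, which is what the "number of generalization steps" alternative measures.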

25
Discussion: Critical factors of data anonymization
(cont.)
  • 5. The anonymization techniques
  • Generalization/suppression/randomization/mapping/encryption
  • 6. The usage of the anonymized data
  • General purpose/specific purpose
  • 7. Attacks on anonymized data

26
Thanks!