Privacy Protection - PowerPoint PPT Presentation

Title: Privacy Protection


1
Privacy Protection
  • This presentation was prepared by Yufei Tao:
    http://www.cse.cuhk.edu.hk/taoyf

2
Motivation
  • A hospital has a table, the microdata, to publish.

3
Motivation
  • Removing Name?

4
Linking attack
[Figure: linking attack. Labels: voter registration list, published table, quasi-identifier (QI) attributes, an adversary.]
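A linking attack is essentially a join on the QI attributes. The sketch below is only an illustration: the two tables, the names, and the column names (age, zipcode, disease) are made up, since the slide's actual tables are not part of this transcript.

```python
# Hypothetical data: neither table comes from the presentation.
voter_list = [
    {"name": "Alice", "age": 31, "zipcode": "53703"},
    {"name": "Bob", "age": 45, "zipcode": "53711"},
]
published = [  # names removed, QI attributes kept
    {"age": 45, "zipcode": "53711", "disease": "bronchitis"},
    {"age": 31, "zipcode": "53703", "disease": "flu"},
]
QI = ("age", "zipcode")


def linking_attack(voters, table, qi=QI):
    """Map each voter name to the sensitive values of the published rows
    that share the voter's QI values."""
    index = {}
    for row in table:
        key = tuple(row[a] for a in qi)
        index.setdefault(key, []).append(row["disease"])
    return {v["name"]: index.get(tuple(v[a] for a in qi), []) for v in voters}


matches = linking_attack(voter_list, published)
for name, diseases in matches.items():
    print(name, diseases)
```

When a voter's QI values match exactly one published row, removing the Name column has protected nothing.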
5
Real threats
  • Fact: 87% of Americans can be uniquely identified
    by Zipcode, gender, and date of birth.
  • Sweeney [International Journal on Uncertainty,
    Fuzziness and Knowledge-based Systems, 2002]
    shows that she can identify the medical record of
    an ex-governor of Massachusetts from a real
    publication.
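One way to measure this kind of risk on a dataset is to count how many records have a QI combination that nobody else shares. A minimal sketch, assuming records are plain dictionaries; the sample records and field names are hypothetical.

```python
from collections import Counter


def unique_fraction(records, qi):
    """Fraction of records whose QI combination appears exactly once."""
    keys = [tuple(r[a] for a in qi) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Hypothetical records, for illustration only.
people = [
    {"zipcode": "53711", "gender": "M", "dob": "1970-01-02"},
    {"zipcode": "53711", "gender": "M", "dob": "1970-01-02"},
    {"zipcode": "53703", "gender": "F", "dob": "1985-06-30"},
    {"zipcode": "53706", "gender": "M", "dob": "1990-11-15"},
]
print(unique_fraction(people, ("zipcode", "gender", "dob")))
```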

6
Real threats (cont.)
  • Banking data
  • Tax records
  • Exam scores
  • Credit card transactions

7
Privacy protection
  • Distort the dataset before releasing it.
  • Concerns:
  • Privacy
  • Utility: the dataset must be useful for research.
  • Paradox: privacy ↑, utility ↓.

8
Main issues
  • Privacy principle: what do we mean by adequate
    privacy protection?
  • Distortion algorithm: how to achieve the above
    principle?

9
Generalization
  • Replace a QI-value with a fuzzier form.

[Figure: generalized table. Labels: QI attributes, sensitive attribute, 4 QI groups.]
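Generalization can be sketched as a pair of helpers that coarsen individual QI values. The choice of a 10-year age interval and a 3-digit Zipcode prefix is an assumption made for illustration; the slide's table is not in this transcript.

```python
def generalize_age(age, width=10):
    """Replace an exact age with the interval of the given width containing it."""
    low = (age // width) * width
    return f"[{low}, {low + width - 1}]"


def generalize_zipcode(zipcode, keep=3):
    """Keep the first `keep` digits and mask the remaining ones."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)


print(generalize_age(45), generalize_zipcode("53711"))
```

Tuples whose generalized QI values coincide fall into the same QI group.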
10
k-anonymity [Sweeney, International Journal on
Uncertainty, Fuzziness and Knowledge-based
Systems, 2002]
  • Each QI-group has at least k tuples.
  • 2-anonymous generalization
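The k-anonymity condition can be checked by grouping tuples on their (generalized) QI values and looking at the group sizes. A minimal sketch; the table below is hypothetical and not the one shown on the slide.

```python
from collections import Counter


def qi_group_sizes(table, qi):
    """Number of tuples in each QI group (tuples with identical QI values)."""
    return Counter(tuple(row[a] for a in qi) for row in table)


def is_k_anonymous(table, qi, k):
    """True if every QI group has at least k tuples."""
    return all(size >= k for size in qi_group_sizes(table, qi).values())


# Hypothetical generalized table with two QI groups of two tuples each.
generalized = [
    {"age": "[20, 29]", "zipcode": "537**", "disease": "flu"},
    {"age": "[20, 29]", "zipcode": "537**", "disease": "bronchitis"},
    {"age": "[40, 49]", "zipcode": "537**", "disease": "pneumonia"},
    {"age": "[40, 49]", "zipcode": "537**", "disease": "pneumonia"},
]
print(is_k_anonymous(generalized, ("age", "zipcode"), 2))
```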

11
Defects of k-anonymity
  • What is the disease of Joe?

[Figure labels: no diversity in this QI group; a voter registration list.]
12
l-diversity [Machanavajjhala et al., ICDE 2006]
  • Each QI group should have at least l
    well-represented sensitive values.
  • Different ways to define "well-represented".
  • Naive: l different values.

[Figure label: l = 2]
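The naive interpretation (at least l different sensitive values per QI group) can be sketched as follows. The table and column names are hypothetical.

```python
from collections import defaultdict


def is_naively_l_diverse(table, qi, sensitive, l):
    """True if every QI group contains at least l distinct sensitive values."""
    groups = defaultdict(set)
    for row in table:
        groups[tuple(row[a] for a in qi)].add(row[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical table: the second QI group holds a single disease.
table = [
    {"age": "[20, 29]", "disease": "flu"},
    {"age": "[20, 29]", "disease": "bronchitis"},
    {"age": "[40, 49]", "disease": "pneumonia"},
    {"age": "[40, 49]", "disease": "pneumonia"},
]
print(is_naively_l_diverse(table, ("age",), "disease", 2))
```

This table is 2-anonymous but not 2-diverse, because everyone in the second group has the same disease.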
13
Defects of the naive interpretation
  • Assume that Joe is identified in the following QI
    group. What is the probability that he contracted
    HIV?
  • Implication: the frequency of the most frequent
    sensitive value in a QI group should be bounded
    by 1 / l.
  • A very popular definition of l-diversity.

[Figure labels: a QI group with 100 tuples; 98 tuples.]
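The frequency-based definition can be checked per QI group. In the sketch below it is an assumption that the 98 tuples on the slide all carry HIV and that the remaining two carry two other diseases; the transcript only preserves the counts.

```python
from collections import Counter
from fractions import Fraction


def max_frequency(sensitive_values):
    """Relative frequency of the most frequent sensitive value in a QI group."""
    counts = Counter(sensitive_values)
    return Fraction(max(counts.values()), len(sensitive_values))


def satisfies_frequency_bound(sensitive_values, l):
    """True if the most frequent value covers at most 1 / l of the group."""
    return max_frequency(sensitive_values) <= Fraction(1, l)


# Assumed composition of the 100-tuple group (only "98" is in the transcript).
group = ["HIV"] * 98 + ["pneumonia", "bronchitis"]
print(max_frequency(group), satisfies_frequency_bound(group, 2))
```

The group has 3 different values, so it passes the naive test for l = 3, yet an adversary who knows Joe is in it is 98% sure about his disease.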
14
Exclusive-value attacks
  • A friend of Joe has the knowledge: "Joe does not
    have pneumonia".
  • How likely would this friend assume that Joe had
    HIV?

[Figure labels: a QI group with 100 tuples; 50 tuples; 49 tuples.]
15
Battling exclusive-value attacks
  • Even if an adversary can eliminate pneumonia,
    s/he can infer that Joe has HIV only with 40 / 70
    probability.

[Figure labels: a QI group with 100 tuples; 40 tuples; 30 tuples; 30 tuples.]
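The 40 / 70 figure follows from discarding the eliminated value and renormalizing. A small sketch: the counts for HIV (40) and pneumonia (30) follow from the slide's 40 / 70, while the name of the third value ("bronchitis") is an assumption.

```python
from fractions import Fraction


def posterior(counts, target, excluded):
    """Probability of `target` once the adversary has ruled out the values
    in `excluded`, assuming each remaining tuple is equally likely."""
    remaining = {v: c for v, c in counts.items() if v not in excluded}
    return Fraction(remaining.get(target, 0), sum(remaining.values()))


group = {"HIV": 40, "pneumonia": 30, "bronchitis": 30}
print(posterior(group, "HIV", set()))          # before the attack
print(posterior(group, "HIV", {"pneumonia"}))  # after ruling out pneumonia
```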
16
Battling 3-exclusive-values attacks
[Figure: a QI group. Labels: the most frequent value, the 2nd most frequent value, the 3rd most frequent value, the 4th most frequent value, the other values.]
17
Battling 3-exclusive-values attacks
[Figure: a QI group. Labels: the most frequent value; the other values (as many as the red box).]
18
Battling 3-exclusive-values attacks
  • Assume that Joe is a person in the QI group.
  • Property: if an adversary can eliminate only ≤ 3
    diseases, s/he can correctly guess the disease of
    Joe with at most 50% probability.

[Figure: a QI group. Labels: HIV, pneumonia, bronchitis, cancer, the other values.]
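The property can be read as a bound on the adversary's best case: eliminate the 2nd to 4th most frequent values, then guess the most frequent one. The sketch below computes that confidence for any number m of eliminated values. The counts are hypothetical; only the disease names HIV, pneumonia, bronchitis, and cancer appear on the slide.

```python
from fractions import Fraction


def worst_case_confidence(counts, m):
    """Highest probability with which an adversary who can rule out up to m
    sensitive values can guess a group member's value: guess the most frequent
    value after eliminating the next m most frequent ones."""
    ordered = sorted(counts.values(), reverse=True)
    top = ordered[0]
    survivors = top + sum(ordered[m + 1:])  # tuples left after the elimination
    return Fraction(top, survivors)


# Hypothetical QI group: the "other values" total 30, as many as HIV.
group = {"HIV": 30, "pneumonia": 20, "bronchitis": 15, "cancer": 12,
         "flu": 10, "gastritis": 10, "dyspepsia": 10}
print(worst_case_confidence(group, 3))
```

The bound of 50% holds exactly when the most frequent value has no more tuples than all values beyond the 4th most frequent put together.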
19
A short summary
  • Why data privacy?
  • How to protect it?
  • A very active research topic with urgent
    applications.