Title: Defining and Achieving Differential Privacy


1
Defining and Achieving Differential Privacy
  • Cynthia Dwork, Microsoft

2
Meaningful Privacy Guarantees
  • Statistical databases
  • Medical
  • Government Agency
  • Social Science
  • Searching / click stream
  • Learn non-trivial trends while protecting privacy
    of individuals and fine-grained structure

3
Linkage Attacks
  • Using innocuous data in one dataset to identify
    a record in a different dataset containing both
    innocuous and sensitive data
  • At the heart of the voluminous research on hiding
    small cell counts in tabular data

4
(No Transcript)
5
The Netflix Prize
  • Netflix Recommends Movies to its Subscribers
  • Offers $1,000,000 for a 10% improvement in its
    recommendation system
  • Not concerned here with how this is measured
  • Publishes training data
  • Nearly 500,000 records, 18,000 movie titles
  • The ratings are on a scale from 1 to 5
    (integral) stars. To protect customer privacy,
    all personal information identifying individual
    customers has been removed and all customer ids
    have been replaced by randomly-assigned ids. The
    date of each rating and the title and year of
    release for each movie are provided.
  • Some ratings not sensitive, some may be sensitive
  • OK for Netflix to know, not OK for public to know

6
A Publicly Available Set of Movie Rankings
  • Internet Movie Database (IMDb)
  • Individuals may register for an account and rate
    movies
  • Need not be anonymous
  • Visible material includes ratings, dates,
    comments
  • By definition, these ratings not sensitive

7
The Fiction of Non-PII [Narayanan, Shmatikov 2006]
  • Movie ratings and dates are PII
  • With 8 movie ratings (of which we allow 2 to be
    completely wrong) and dates that may have a 3-day
    error, 96% of Netflix subscribers whose records
    have been released can be uniquely identified in
    the dataset.
  • Linkage attack prosecuted using the IMDb (a toy
    sketch of the matching step follows this list).
  • Link ratings in IMDb to (non-sensitive) ratings
    in Netflix, revealing sensitive ratings in
    Netflix.
  • N&S draw conclusions about the user.
  • May be wrong, may be right. The user is harmed
    either way.
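A toy sketch of the matching step described above; this is not the actual
Narayanan-Shmatikov scoring algorithm, and the data layout, movie names, and
dates are hypothetical. The idea: an anonymized record is a candidate match
if it agrees with the attacker's auxiliary (movie, rating, date) observations
up to 2 wrong ratings and a 3-day date error.

    # Toy linkage sketch (hypothetical data; not the actual N&S scoring rule).
    from datetime import date

    def plausibly_matches(aux, record, max_wrong=2, date_slack_days=3):
        wrong = 0
        for movie, rating, when in aux:            # attacker's IMDb-side observations
            entry = record.get(movie)              # anonymized Netflix-side record
            if (entry is None or entry[0] != rating
                    or abs((entry[1] - when).days) > date_slack_days):
                wrong += 1
        return wrong <= max_wrong

    aux = [("Movie A", 4, date(2005, 3, 1)),
           ("Movie B", 2, date(2005, 6, 10))]
    record = {"Movie A": (4, date(2005, 3, 2)),
              "Movie B": (2, date(2005, 6, 9)),
              "Movie C": (5, date(2005, 7, 4))}    # extra, possibly sensitive, ratings
    print(plausibly_matches(aux, record))          # True: candidate re-identification

Roughly, the real attack scores every record in the released dataset and
declares a match only when the best score stands well clear of the runner-up;
the remaining ratings in that record are the sensitive payoff.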

8
What Went Wrong?
  • What is Personally Identifiable Information?
  • Typically syntactic, not semantic
  • E.g., a genome sequence is not considered PII (?!)
  • Suppressing PII doesn't rule out linkage
    attacks
  • Famously observed by Sweeney, circa 1998
  • AOL debacle
  • Need a more semantic approach to privacy

9
Semantic Security Against an Eavesdropper
[Goldwasser, Micali 1982]
  • Vocabulary
  • Plaintext: the message to be transmitted
  • Ciphertext: the encryption of the plaintext
  • Auxiliary information: anything else known to the
    attacker
  • The ciphertext leaks no information about the
    plaintext.
  • Formalization
  • Compare the ability of someone seeing aux and
    ciphertext to guess (anything about) the
    plaintext, to the ability of someone seeing only
    aux to do the same thing. Difference should be
    tiny.

10
Semantic Security for Statistical Databases?
  • Dalenius, 1977
  • Anything that can be learned about a respondent
    from the statistical database can be learned
    without access to the database
  • An ad omnia guarantee
  • Happily, formalizes to Semantic Security
  • Recall: Anything about the plaintext that can be
    learned from the ciphertext can be learned
    without the ciphertext
  • Popular intuition: prior and posterior views
    about an individual shouldn't change too much.
  • Clearly Silly
  • My (incorrect) prior is that everyone has 2 left
    feet.
  • Very popular in literature nevertheless
  • Definitional awkwardness even when used correctly

11
Semantic Security for Statistical Databases?
  • Unhappily, Unachievable
  • Can't achieve cryptographically small levels of
    "tiny"
  • Intuition: the (adversarial) user is supposed to
    learn unpredictable things about the DB; this
    translates to learning more than a
    cryptographically tiny amount about a respondent
  • Relax "tiny"?

12
Relaxed Semantic Security for Statistical
Databases?
  • Relaxing Tininess Doesn't Help
  • [Dwork, Naor 2006]
  • Database teaches average heights of population
    subgroups
  • Terry Gross is two inches shorter than the
    average Lithuanian woman
  • Access to the DB teaches Terry's height
  • Terry's height learnable from the DB, not
    learnable otherwise
  • Formal proof extends to essentially any notion of
    privacy compromise, uses extracted randomness
    from the SDB as a one-time pad.
  • Bad news for k-anonymity, l-diversity,
    m-invariance, etc.
  • Attack Works Even if Terry Not in DB!
  • Suggests new notion of privacy risk incurred by
    joining DB
  • Differential Privacy
  • Privacy, when existence of DB is stipulated
  • Before/after interacting vs. risk when in / not
    in the DB

13
Differential Privacy
  • K gives ε-differential privacy if for all values
    of DB and Me and all transcripts t:

Pr[K(DB + Me) = t] / Pr[K(DB − Me) = t] ≤ e^ε ≈ 1 + ε
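A minimal numeric check of the inequality, using randomized response on a
single bit as the mechanism; this mechanism and the chosen ε are illustrative
assumptions, not the sanitizer K of the later slides. Each respondent reports
their bit truthfully with probability p = e^ε/(1+e^ε), so changing Me's bit
changes the probability of any transcript by a factor of at most p/(1−p) = e^ε.

    # Check the epsilon-DP inequality for randomized response on one bit
    # (illustrative mechanism only).
    import math

    eps = math.log(3)
    p = math.exp(eps) / (1 + math.exp(eps))     # report truthfully with probability p

    def pr_report(report, true_bit):
        return p if report == true_bit else 1 - p

    # Neighboring databases differ only in Me's bit, so for every transcript
    # the probability ratio is bounded by e^eps in both directions.
    for report in (0, 1):
        ratio = pr_report(report, true_bit=1) / pr_report(report, true_bit=0)
        assert max(ratio, 1 / ratio) <= math.exp(eps) + 1e-12
    print("epsilon-DP bound holds for every transcript")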
14
Differential Privacy is an Ad Omnia Guarantee
  • No perceptible risk is incurred by joining DB.
  • Anything the adversary can do to me, it could do
    without Me (my data).

15
An Interactive Sanitizer K
[Dwork, McSherry, Nissim, Smith 2006]

(Diagram: a query f goes to the sanitizer K, which returns f(DB) + noise.)

f : DB → R
K(f, DB) = f(DB) + noise
E.g., Count(P, DB) = # rows in DB with Property P
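A minimal sketch of K for counting queries, matching the formula above; the
database rows, the predicate, and the noise scale are assumptions for
illustration (the right scale is derived on the next two slides).

    # Sketch of the interactive sanitizer: K(f, DB) = f(DB) + noise.
    import numpy as np

    def count(has_property, db):
        return sum(1 for row in db if has_property(row))

    def K(f, db, scale):
        return f(db) + np.random.laplace(loc=0.0, scale=scale)   # symmetric noise

    db = [{"age": 29}, {"age": 34}, {"age": 51}, {"age": 67}]
    over_30 = lambda row: row["age"] > 30
    print(K(lambda d: count(over_30, d), db, scale=2.0))          # true count 3, plus noise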
16
Sensitivity of a Function f
  • How much can f(DB + Me) exceed f(DB − Me)?
  • Recall K(f, DB) = f(DB) + noise
  • The question asks: what difference must the noise
    obscure?
  • Δf = max over DB, Me of |f(DB + Me) − f(DB − Me)|
  • E.g., ΔCount = 1 (see the sketch below)
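A small brute-force illustration of Δf on a toy domain; the rows are
hypothetical and the true Δf is a maximum over all databases, not just this
sample. Adding or removing one person changes a count by at most 1.

    # Brute-force look at sensitivity: Δf = max over DB, Me of |f(DB+Me) − f(DB−Me)|.
    from itertools import combinations

    rows = [18, 25, 34, 51, 67]                       # hypothetical ages
    f = lambda db: sum(1 for age in db if age > 30)   # Count(age > 30)

    delta_f = max(abs(f(list(db) + [me]) - f(list(db)))
                  for db in combinations(rows, 3)     # a small sample of databases
                  for me in rows)
    print(delta_f)                                    # 1, i.e. ΔCount = 1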

17
Calibrate Noise to Sensitivity
Δf = max over DB, Me of |f(DB + Me) − f(DB − Me)|
Theorem: To achieve ε-differential privacy, add symmetric noise with
density ∝ exp(−|x|/R), i.e., Laplace noise with scale R = Δf/ε.

(Figure: the Laplace density centered at 0, with tick marks at −4R, …, 5R.)

Pr[x] ∝ exp(−|x|/R). Increasing R flattens the curve: more privacy.
The noise depends on Δf and ε, not on the database.
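A numeric check of the theorem; the ε, the counts, and the output grid are
assumptions for illustration. For two databases whose true counts differ by
Δf = 1, the Laplace densities at any output differ by a factor of at most
exp(Δf/R) = e^ε, which is exactly the differential privacy condition.

    # With scale R = Δf/ε, the density ratio for neighboring databases
    # never exceeds e^ε.
    import numpy as np

    eps, delta_f = 0.5, 1.0
    R = delta_f / eps

    def lap_density(t, center):
        return np.exp(-np.abs(t - center) / R) / (2 * R)

    t = np.linspace(-50.0, 150.0, 4001)                  # grid of possible outputs
    ratio = lap_density(t, 41.0) / lap_density(t, 40.0)  # count with Me / without Me
    print(bool(ratio.max() <= np.exp(eps) + 1e-12))      # True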
18
Calibrate Noise to Sensitivity
Δf = max over DB, Me of |f(DB + Me) − f(DB − Me)|
Theorem: To achieve ε-differential privacy, add symmetric noise with
density ∝ exp(−|x|/R), i.e., Laplace noise with scale R = Δf/ε.

(Figure repeated from the previous slide.)
19
Multiple Queries
  • For a query sequence f1, …, fd, ε-differential
    privacy is achieved with noise generation
    parameter R ≥ Σi Δfi / ε for each response (see
    the sketch after this list).
  • Can sometimes do better.
  • Noise must increase with the sensitivity of the
    query sequence. Naively, more queries means
    noisier answers
  • Dinur and Nissim 2003 et sequelae
  • Speaks to the Non-Interactive Setting
  • Any non-interactive solution permitting too
    accurate answers to too many questions is
    vulnerable to attack.
  • The privacy mechanism is at an even greater
    disadvantage than in the interactive case; this
    can be exploited.
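A minimal sketch of the rule in the first bullet, with the composition bound
as reconstructed from the slide; the queries and data are hypothetical. Every
query is answered with Laplace noise of scale R = Σi Δfi / ε, so the whole
interaction is ε-differentially private.

    # Answer d queries with noise scaled to the summed sensitivity of the sequence.
    import numpy as np

    def answer_all(queries, sensitivities, db, eps):
        R = sum(sensitivities) / eps                    # shared noise scale
        return [f(db) + np.random.laplace(0.0, R) for f in queries]

    db = [29, 34, 51, 67]                               # hypothetical ages
    queries = [lambda d: sum(1 for x in d if x > 30),   # Count(age > 30), Δ = 1
               lambda d: sum(1 for x in d if x > 60)]   # Count(age > 60), Δ = 1
    print(answer_all(queries, [1, 1], db, eps=0.5))     # two noisy counts

This is the naive rule; as the slide notes, one can sometimes do better.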

20
Future Work
  • Investigate Techniques from Robust Statistics
  • Area of statistics devoted to coping with
  • Small amounts of wild data entry errors
  • Rounding errors
  • Limited dependence among samples
  • Problem: the statistical setting makes strong
    assumptions about the existence and nature of an
    underlying distribution
  • Differential Privacy for Social Networks, Graphs
  • What are the utility questions of interest?
  • Definitional and Algorithmic Work for Other
    Settings
  • Differential approach more broadly useful
  • Several results discussed in next few hours
  • Porous Boundary Between Inside and Outside?
  • Outsourcing, bug reporting, combating DDoS
    attacks and terror

21
Privacy is a natural resource. It's non-renewable,
and it's not yours. Conserve it.