Title: When Random Sampling Preserves Privacy
1 When Random Sampling Preserves Privacy
- Kamalika Chaudhuri, U.C. Berkeley
- Nina Mishra, U. Virginia
2 The Problem
[Figure: Database -> Sanitizer -> Sanitized Database]
- Setting
- Table: set of rows
- Sanitizer: releases each row with probability p
- What are the conditions under which this
sanitizer preserves privacy?
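A minimal sketch of the sanitizer described on this slide, assuming the table is a Python list of rows and that rows are released independently. The function name `sanitize` and the optional `rng` argument are illustrative choices, not part of the original talk.

```python
import random

def sanitize(table, p, rng=None):
    """Release each row of `table` independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so a row is kept with probability p
    return [row for row in table if rng.random() < p]

# Example: sample roughly 30% of a ten-row table with a fixed seed
sanitized = sanitize(list(range(10)), 0.3, random.Random(0))
```

Passing a seeded `random.Random` makes a run reproducible, which is convenient for experiments but is not something one would do in a real release.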
3 Search Data
- AOL released user search data
- Replaced usernames with random ids
4 Search Data
[Figure: search histories of three users (Kamalika, Cynthia, Nina). Queries shown include: Berkeley restaurants; Low degree spanning trees; Tickets to India; Privacy sampling; Airfare Santa Barbara; Traffic on 101N; Restaurants Mountain View; Rank Aggregation; Memory bound functions; Crypto registration; Falafel Charlottesville; Query Auditing; Clustering streaming; Tickets to SFO]
5 U.S. Census Data
- Random sample of preprocessed data
- Removing unique values
- Merging cells with less than a threshold number
of individuals
6 Privacy Definition [DMNS06]
[Figure: tables T and T', sanitizer output S]
- $\epsilon$-Indistinguishability
- Two tables $T$, $T'$ differ by a single row
- $S$: output of the sanitizer
- $\Pr[S \mid T] \le (1 + \epsilon)\,\Pr[S \mid T']$
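7 An Example
[Figure: tables T and T', sanitizer output S]
- Cannot always get $\epsilon$-indistinguishability with random sampling
- $T$: $n$ rows with value 0
- $T'$: $n-1$ rows with value 0, 1 row with value 1
- $S$: 1 row with value 1, $s-1$ rows with value 0
- Here $\Pr[S \mid T] = 0$ while $\Pr[S \mid T'] > 0$, so their ratio is unbounded and no $\epsilon$ suffices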
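The counterexample can be checked numerically. The sketch below assumes tables and samples are represented as dictionaries mapping a value to its number of rows, and that rows with the same value are interchangeable; `sample_probability` and the concrete numbers `n`, `s`, `p` are illustrative choices, not taken from the talk.

```python
from math import comb

def sample_probability(table_counts, sample_counts, p):
    """Probability that sampling each row with rate p yields exactly this
    multiset of values (rows with equal values are interchangeable)."""
    # A value that appears more often in the sample than in the table
    # makes the sample impossible.
    for value, kept in sample_counts.items():
        if kept > table_counts.get(value, 0):
            return 0.0
    prob = 1.0
    for value, total in table_counts.items():
        kept = sample_counts.get(value, 0)
        # Binomial probability of keeping `kept` of the `total` rows
        prob *= comb(total, kept) * p**kept * (1 - p)**(total - kept)
    return prob

n, s, p = 10, 4, 0.3             # illustrative numbers only
T = {0: n}                       # n rows with value 0
T_prime = {0: n - 1, 1: 1}       # n-1 rows with value 0, one row with value 1
S = {0: s - 1, 1: 1}             # one row with value 1, s-1 rows with value 0

pr_T = sample_probability(T, S, p)              # the sample cannot come from T
pr_T_prime = sample_probability(T_prime, S, p)  # but it can come from T'
```

Because `pr_T` is exactly zero and `pr_T_prime` is positive, no multiplicative factor of the form $(1 + \epsilon)$ can relate the two.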
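8 Privacy Definition [DKMMiNa06, BDMN05]
[Figure: tables T and T', sanitizer output S]
- $(\epsilon, \delta)$-Indistinguishability
- Two tables $T$, $T'$ differ by a single row
- $S$: output of the sanitizer
- With probability at least $1 - \delta$,
- $\Pr[S \mid T] \le (1 + \epsilon)\,\Pr[S \mid T']$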
9 An Example
[Figure: tables T and T', sanitizer output S]
- Cannot always get $(\epsilon, \delta)$-indistinguishability for all tables
- A table where all rows have unique values
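One way to see why unique values are a problem is to add up the probability of all samples that contain the single row on which the two tables differ: such a sample cannot arise from the neighbouring table. The sketch below does this by brute force for a tiny table. It assumes the failure probability is measured over the sanitizer's output on one of the two tables; the helper name and the five-row table are hypothetical.

```python
from itertools import combinations

def mass_of_impossible_samples(t_prime, differing_row, p):
    """Total probability, when sampling t_prime at rate p, of outputs that
    contain differing_row and so cannot arise from the neighbouring table."""
    n = len(t_prime)
    total = 0.0
    for size in range(n + 1):
        for subset in combinations(t_prime, size):
            if differing_row in subset:
                # Probability of releasing exactly the rows in `subset`
                total += p**size * (1 - p)**(n - size)
    return total

t_prime = ["a", "b", "c", "d", "e"]   # all values unique; the neighbour lacks "e"
bad_mass = mass_of_impossible_samples(t_prime, "e", 0.2)
```

Under these assumptions the bad mass equals the sampling rate p (here 0.2 up to floating-point rounding), so the failure probability cannot be pushed below p for such a table.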
10 When does Random Sampling preserve Privacy?
- Parameters
- $(\epsilon, \delta)$-indistinguishability
- $k$: number of distinct values in $T$
- $t$: number of values which occur at most $\log(k/\delta)/\epsilon$ times in $T$
- Theorem: This can be guaranteed if
- $p < \epsilon$ (if $t = 0$)
- $p < \tilde{O}(\epsilon \delta / t)$
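The parameters of the theorem can be computed directly from a table. The sketch below is a rough illustration only: it assumes the logarithm is natural and sets the constants and logarithmic factors hidden in $\tilde{O}(\cdot)$ to 1, neither of which is stated on the slide. The function names are hypothetical.

```python
from collections import Counter
from math import log

def sampling_parameters(table, epsilon, delta):
    """Return (k, t, threshold): k distinct values, and t values occurring
    at most threshold = log(k/delta)/epsilon times (natural log assumed)."""
    counts = Counter(table)
    k = len(counts)
    threshold = log(k / delta) / epsilon
    t = sum(1 for c in counts.values() if c <= threshold)
    return k, t, threshold

def sampling_rate_bound(t, epsilon, delta):
    """Upper bound on p from the theorem, with every factor hidden by the
    soft-O notation set to 1 (an assumption made for illustration)."""
    return epsilon if t == 0 else epsilon * delta / t

table = [0] * 500 + [1] * 500 + [2] * 3   # value 2 occurs only three times
k, t, threshold = sampling_parameters(table, epsilon=0.5, delta=0.01)
bound = sampling_rate_bound(t, epsilon=0.5, delta=0.01)
```

With these numbers a single low-count value drops the illustrative bound from 0.5 to 0.005, which shows how strongly such values restrict the sampling rate.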
11 Classification of Values
For $(\epsilon, \delta)$-indistinguishability
Rare Value
Infrequent Value
Common Value
12 Rare Values
[Figure: tables T and T', sanitizer output S]
- If a rare value $v$ is observed in a random sample,
- $\Pr[S \mid T] > (1 + \epsilon/\log(k/\delta))\,\Pr[S \mid T']$
13 Common Values
[Figure: tables T and T', sanitizer output S]
- For a common value $v$,
- $\Pr[S \mid T] \approx \Pr[S \mid T']$
- Typically, the number of rows with a common value is close to its expectation
14 Infrequent Values
[Figure: tables T and T', sanitizer output S]
- For an infrequent value $v$,
- $\Pr[S \mid T] \approx \Pr[S \mid T']$
- Typically, the number of rows with an infrequent value is at most $\log(k/\delta)$ away from its expected value
15 Properties of a Good Sample
- A sample $S$ is $\epsilon$-indistinguishable if
- No rare values
- The number of rows with common value $v$ is within a constant factor of expectation
- The number of rows with infrequent value $v$ is at most an additive $O(\log(k/\delta))$ more than its expected value
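The three conditions can be turned into a simple check on a concrete sample. The sketch below relies on several assumptions that the slides do not pin down: a value is treated as rare when it occurs at most $\log(k/\delta)/\epsilon$ times (natural log), the cutoff between infrequent and common values is a caller-supplied `common_threshold`, and the constants `factor` and `additive_const` stand in for the unspecified constant factor and the constant inside $O(\cdot)$. All names are hypothetical.

```python
from collections import Counter
from math import log

def is_good_sample(table, sample, p, epsilon, delta,
                   common_threshold, factor=2.0, additive_const=1.0):
    """Check the three 'good sample' conditions under the stated assumptions."""
    table_counts = Counter(table)
    sample_counts = Counter(sample)
    k = len(table_counts)
    log_term = log(k / delta)
    rare_threshold = log_term / epsilon       # assumed definition of "rare"
    for value, count in table_counts.items():
        observed = sample_counts.get(value, 0)
        expected = p * count                  # expected rows in the sample
        if count <= rare_threshold:
            # Rare value: must not appear in the sample at all
            if observed > 0:
                return False
        elif count >= common_threshold:
            # Common value: within a constant factor of its expectation
            if not (expected / factor <= observed <= expected * factor):
                return False
        else:
            # Infrequent value: at most an additive log term above expectation
            if observed > expected + additive_const * log_term:
                return False
    return True

table = [0] * 1000 + [1] * 50 + [2] * 2
good = [0] * 100 + [1] * 5        # no rare value, counts near expectation
bad = good + [2]                  # same sample plus one rare value
good_ok = is_good_sample(table, good, 0.1, 0.5, 0.01, common_threshold=100)
bad_ok = is_good_sample(table, bad, 0.1, 0.5, 0.01, common_threshold=100)
```

In this example the first sample passes all three checks, while the second fails only because a rare value was released.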
16 When does Random Sampling preserve Privacy?
- Such a sample occurs with probability at least $1 - \delta$ if
- $p < \epsilon$ (if $t = 0$)
- $p < \tilde{O}(\epsilon \delta / t)$
17 Utility of Random Sampling
- Assuming no rare values
- Error in the frequency of each value: additive $1/\sqrt{n}$
- [DMNS06]: Estimates histogram with an additive error of $1/n$ in each frequency
- Sampling may give a compact representation of the histogram
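To make the utility claim concrete, the sketch below estimates value frequencies from a sample by rescaling sample counts, and computes the standard deviation of that estimate from the binomial variance. The estimator and function names are illustrative assumptions, not taken from the talk; for a constant sampling rate p, the standard deviation is on the order of $1/\sqrt{n}$.

```python
from collections import Counter
from math import sqrt

def estimate_frequencies(sample, p, n):
    """Estimate each value's frequency in the original n-row table by
    rescaling its sample count by 1/(p*n)."""
    return {value: count / (p * n) for value, count in Counter(sample).items()}

def estimate_std(true_frequency, p, n):
    """Standard deviation of the estimate above for a value with the given
    true frequency f: sqrt(f * (1 - p) / (p * n)), from the binomial variance."""
    return sqrt(true_frequency * (1 - p) / (p * n))

# With p = 1 every row is released, so the estimate is exact
exact = estimate_frequencies([0, 0, 1], p=1.0, n=3)

# Spread of the estimate for a value with frequency 1/2, p = 1/2, n = 10000
spread = estimate_std(0.5, 0.5, 10000)
```

The returned dictionary has one entry per value seen in the sample, which is the compact histogram representation the slide alludes to.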
18 Conclusions
- Random sampling preserves privacy only when there are few rare values
- With rare values, the probability of failure can be high
- $\delta = \Omega(1/n)$ as opposed to $1/2^n$ [DKMMiNa06, BDMN05]
- Error in estimating the frequency of each value can be high
- Additive $1/\sqrt{n}$ as opposed to $1/n$ of [DMNS06]
20 The Problem
- What are the conditions under which this
sanitizer preserves privacy?