Title: Data Collection With Self-Enforcing Privacy
1. Data Collection With Self-Enforcing Privacy
- Philippe Golle, PARC
- Frank McSherry, Microsoft Research
- Ilya Mironov, Microsoft Research
2. Roadmap
- Problem
- Solutions
  - Scheme 1: no data disclosure
  - Scheme 2: randomized response
  - Scheme 3: accurate data, interactive process
- Future research directions
3. A pollster conducts a survey
7. [Diagram relating a better perception of privacy to participation, accuracy, and data quality]
8. From the horse's mouth
- Survey of the Census Bureau's field staff
  - 18% believe that ACCIDENTAL release of confidential data may occur within the next 5-10 years
  - 19% believe that MALICIOUS release of confidential data may occur within the next 5-10 years
- Source: T. Mayer, "Interviewer attitudes about privacy and confidentiality," 2001
9. Solutions?
- Encryption (TLS)
  - Stops an eavesdropper, not the pollster
- Solemn pledge (aka privacy policy)
  - Why should we believe it?
- Privacy-preserving data mining and disclosure
  - Assumes an honest pollster
- Randomized response (aka lying)
  - May hurt utility; offers only limited privacy
10. Threat Model
- Pollster may be corrupt!
- Privacy goal
  - Deter a corrupt pollster from releasing any sensitive information submitted by individual respondents
11. Solution: self-enforcing privacy
- Basic idea: punish the pollster if it leaks sensitive information
- A mechanism for submitting data to an untrustworthy pollster such that
  - leakage of sensitive data can be caught and publicly verified (PRIVACY FOR RESPONDENTS)
  - if sensitive data is not leaked, the probability of wrongly indicting the pollster is negligible (SECURITY FOR THE POLLSTER)
12. Self-enforcing privacy solutions?
- Auditors to check the pollster's compliance with its privacy policy
  - But audits are expensive and incomplete
  - Audits do not help with post-mortem or forensic evidence
- Tainted data
  - Users submit data they can easily recognize
  - e.g., use a unique e-mail address and monitor it
  - But cannot prove misbehavior to a third party
- Our approach
  - A kind of publicly verifiable tainted data
13. [Diagram: pollster, bond, bounty-hunter, respondents]
14. [Diagram: pollster, bond, bounty-hunter, and respondents, with three respondent submissions labeled "bait"]
15. Homomorphic encryption
- Public-key encryption
- $E(M), E(N) \to E(M+N)$
- $E(M), a \to E(aM)$
- $E(M) \to E(M)$ (re-randomization: a fresh ciphertext of the same plaintext)
- Example: ElGamal of $g^M$
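To make these properties concrete, here is a minimal Python sketch of ElGamal over $g^M$ (often called exponential ElGamal). The parameters `P`, `Q`, `G` and all function names are illustrative assumptions, not taken from the talk; the group is far too small to be secure, and decryption recovers $M$ by brute-force search, which is feasible only because $M$ is small (e.g., a bit or a count).

```python
import random

# Toy group parameters (illustrative assumption, NOT secure): P = 2*Q + 1 is a
# safe prime and G = 4 generates the subgroup of quadratic residues, of order Q.
P, Q, G = 2039, 1019, 4


def keygen(rng):
    """Return (secret key x, public key h = G^x mod P)."""
    x = rng.randrange(1, Q)
    return x, pow(G, x, P)


def encrypt(h, m, rng):
    """ElGamal of G^m: E(m) = (G^r, G^m * h^r) for fresh random r."""
    r = rng.randrange(1, Q)
    return pow(G, r, P), pow(G, m, P) * pow(h, r, P) % P


def add(c1, c2):
    """E(M), E(N) -> E(M + N): multiply the ciphertexts componentwise."""
    return c1[0] * c2[0] % P, c1[1] * c2[1] % P


def scale(c, a):
    """E(M), a -> E(a*M) for an integer a >= 0: raise both components to a."""
    return pow(c[0], a, P), pow(c[1], a, P)


def rerandomize(h, c, rng):
    """E(M) -> a new ciphertext of the same M: homomorphically add E(0)."""
    return add(c, encrypt(h, 0, rng))


def decrypt(x, c, max_m=1000):
    """Strip the mask to get G^m, then search for m (feasible only for small m)."""
    gm = c[1] * pow(c[0], Q - x, P) % P  # c[0] has order Q, so c[0]^(Q-x) = c[0]^(-x)
    for m in range(max_m + 1):
        if pow(G, m, P) == gm:
            return m
    raise ValueError("plaintext outside the searchable range")


# Usage: a seeded PRNG keeps the demo reproducible; real code would use `secrets`.
rng = random.Random(2007)
x, h = keygen(rng)
c = add(encrypt(h, 3, rng), encrypt(h, 4, rng))
print(decrypt(x, c))                        # 7
print(decrypt(x, scale(c, 5)))              # 35
print(decrypt(x, rerandomize(h, c, rng)))   # 7
```

Working "in the exponent" is what turns ElGamal's native multiplicative homomorphism into the additive one listed above, at the cost of a small plaintext space.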
16. Scheme 1: Self-enforcing privacy
17. Scheme 1: Self-enforcing privacy
[Figure: Alice, Bob, Charlie, and David submit the bits 0, 1, 1, 1]
18. Scheme 1: Privacy for respondents
[Figure: Alice, Bob, Charlie, and David submit the bits 0, 1, 1, 1]
19. Theorem
- If
  - $k$ secret bits
  - the pollster adds $(\tfrac{1}{2}-\varepsilon)$ noise
  - an $\alpha$-fraction are baits
- Then
  - with more than $k/(\alpha\varepsilon^2)$ leaked bits, the bond can be claimed
20. Example
- If
  - 160 secret bits
  - the pollster adds 10% noise
  - 10% are baits
- Then
  - with more than 8,000 leaked bits, the bond can be claimed
21. Security for the pollster
[Figure: Alice, Bob, Charlie, and David submit the bits 0, 1, 1, 1]
22. 1600 Pennsylvania Ave, DC
23. Scheme 1
- OK, except NO meaningful release of data
24. Randomized response (Warner, 1965)
- A method for getting honest responses to sensitive questions
- Assume the respondent must answer a binary (Yes/No) question
  - Example: "Did you cut the cherry tree?"
- The respondent flips a biased coin
  - Answers truthfully with probability p > .5
    - George Washington: "Yes"
  - Lies with probability 1 - p
    - George Washington: "No"
- The respondent does not reveal the outcome of the coin flip
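A minimal simulation of the procedure above, assuming each respondent answers truthfully with probability p and lies otherwise. Individual answers are noisy, yet the aggregate is recoverable: the observed "Yes" rate λ satisfies λ = pπ + (1 - p)(1 - π), so π = (λ - (1 - p))/(2p - 1). The function names and the 30% population are hypothetical.

```python
import random


def randomized_response(truth, p, rng):
    """Report the true answer with probability p (> 0.5), otherwise lie.
    The outcome of the biased coin itself is never reported."""
    return truth if rng.random() < p else not truth


def estimate_yes_fraction(answers, p):
    """Unbiased estimate of the true 'Yes' fraction pi from randomized answers.
    The observed rate is lam = p*pi + (1 - p)*(1 - pi); solve for pi."""
    lam = sum(answers) / len(answers)
    return (lam - (1 - p)) / (2 * p - 1)


# Usage with a hypothetical population in which 30% would truthfully say "Yes".
rng = random.Random(1965)
p = 0.75
true_answers = [rng.random() < 0.30 for _ in range(100_000)]
reported = [randomized_response(t, p, rng) for t in true_answers]
print(round(estimate_yes_fraction(reported, p), 3))  # close to 0.3
```

The factor 1/(2p - 1) shows the utility cost: as p approaches .5, each answer carries less information and the estimate's variance grows.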
25. Randomized response
[Figure: Alice, Bob, Charlie, and David each submit a randomized bit]
26. Differential privacy
- Differential privacy: a privacy definition that
  - guarantees uncertainty about any one record
  - permits disclosure of aggregate information
- Details: Cynthia Dwork, "Differential Privacy," ICALP 2006
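For reference, the standard ε-differential privacy condition from the cited paper, written for a randomized mechanism $\mathcal{M}$, any two datasets $D, D'$ that differ in a single record, and any set of outputs $S$ (the notation here is ours):

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S]
\qquad \text{for all } D, D' \text{ differing in one record and all } S.
```

Smaller ε means that any single record has less influence on what the pollster can release.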
27. Randomized response
[Figure: randomized bits submitted by Alice, Bob, Charlie, and David, annotated "$f(\varepsilon)$-differential privacy" and "$\varepsilon$-differential privacy"]
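Why randomized response satisfies this definition can be checked directly: if a respondent reports the true bit with probability p > 1/2, then for either reported value the probability under one true answer is at most p/(1 - p) times the probability under the other, so the mechanism is ln(p/(1 - p))-differentially private. The sketch below computes this; the function names are hypothetical. Writing p = 1/2 + ε, as in the noise parameterization of the theorem, gives ln((1 + 2ε)/(1 - 2ε)); reading this as the f(ε) on the slide is an assumption.

```python
import math


def output_prob(true_bit, reported_bit, p):
    """Probability that randomized response reports `reported_bit`."""
    return p if reported_bit == true_bit else 1 - p


def worst_case_ratio(p):
    """Largest ratio Pr[output | true bit 0] / Pr[output | true bit 1] over outputs.
    By symmetry, swapping the roles of 0 and 1 gives the same maximum."""
    return max(output_prob(0, out, p) / output_prob(1, out, p) for out in (0, 1))


def rr_epsilon(p):
    """Differential-privacy parameter of randomized response: ln(p / (1 - p))."""
    if not 0.5 < p < 1:
        raise ValueError("expected 1/2 < p < 1")
    return math.log(p / (1 - p))


print(rr_epsilon(0.75))        # ln 3, about 1.0986
print(worst_case_ratio(0.75))  # 3.0
```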
28. Scheme 2 (randomized response)
- release of aggregate data
- imprecise responses
29. Precise answers
[Figure: Alice, Bob, Charlie, and David submit the bits 0, 1, 1, 1; annotations "$f(r)$" and "dense (RSA)"]
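The "dense (RSA)" annotation appears to refer to a trapdoor permutation whose images fill its whole domain. The toy sketch below, using the classic textbook parameters N = 61 · 53 = 3233, e = 17, d = 2753 (our choice, insecure), illustrates only that property: anyone can compute f(r) = r^e mod N, only the holder of d can invert it, and every element of Z_N is an image. It is not a reconstruction of how the scheme uses f.

```python
# Toy textbook RSA (illustrative assumption, NOT secure):
# N = 61 * 53, and E * D = 46801 = 15 * 3120 + 1, where 3120 = (61 - 1) * (53 - 1).
N, E, D = 3233, 17, 2753


def f(r):
    """Public direction: anyone can compute f(r) = r^E mod N."""
    return pow(r, E, N)


def f_inverse(y):
    """Trapdoor direction: only the holder of D can compute y^D mod N."""
    return pow(y, D, N)


# Because N is squarefree and E*D = 1 mod (p-1)(q-1), f permutes all of Z_N:
# every value in [0, N) is the image of exactly one r, so a uniformly random
# y is always a valid image.
images = {f(r) for r in range(N)}
print(len(images) == N)                             # True
print(all(f_inverse(f(r)) == r for r in range(N)))  # True
```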
30. Indictment process
- "Your honor, Exhibit 1: $r_1$, $f(r_1)$ decrypts to 0. Exhibit 100: $r_{100}$, $f(r_{100})$ decrypts to 1."
- "No! $f(r_1)$ decrypts to 1!"
- Outcomes shown: "Guilty as charged", "Not guilty", "No contest"
31. Analysis
- Privacy of the respondents
- Security for the pollster
- differential privacy definition
32. Scheme 3
- release of aggregate data
- accurate responses
- interactive indictment process
33. Three schemes
34. Research directions
- Achieve all three properties
  - release of aggregate data
  - accurate responses
  - non-interactive indictment process
- Better schemes
  - assume some coordination?
- Tighter analysis of disclosure policies
  - variants of, or alternatives to, differential privacy?
- Rational adversary: connection to game theory