Private Analysis of Data Sets - PowerPoint PPT Presentation

Provided by: Ben5155

1
Private Analysis of Data Sets
  • Benny Pinkas
  • HP Labs, Princeton

2
A story
We're experiencing a lot of fraud lately.
Here too.
I can't find a pattern to recognize fraud in advance.
Neither can I.
Maybe we should share information.
But what about:
  • Patients' privacy
  • Business secrets

Have you heard of secure function evaluation?
This is all theory. It can't be efficient.
3
New Opportunities for Interaction
  • Between:
    • Enterprises and government agencies holding
      sensitive data
    • P2P users
    • Mobile wireless crowds (PDAs, cell phones)
  • What about privacy?
  • A bidirectional approach:
    • Finding what is actually needed
    • Designing useful and efficient cryptographic tools

4
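Cryptographic Protocols for Privacy Preserving Computation
[Figure: one party holds input x, the other holds input y. Output: F(x,y) and nothing else, as if a trusted party had received x and y and returned F(x,y) to each party.]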
5
Does the trusted party scenario make sense?
[Figure: the parties send x and y to a trusted party, which returns F(x,y) to each of them.]
  • We cannot hope for more privacy
  • Does the trusted party scenario make sense?
  • Are the parties motivated to submit their true
    inputs?
  • Can they tolerate the disclosure of F(x,y)?
  • If so, we can implement the scenario without a
    trusted party.

6
Secure Function Evaluation [Yao, GMW, BGW]
  • F(x,y): a public function.
  • Represented as a Boolean circuit C(x,y).
  • Implementation:
    • O(|X|) oblivious transfers, O(|C|) communication.
    • Pretty efficient for small circuits! (But what
      about larger circuits?)

7
An equality circuit
Output: 1 if x = y, 0 otherwise
[Figure: equality circuit with inputs x and y.]
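
As a rough illustration of what an equality circuit computes, the sketch below compares two inputs bit by bit: XOR each pair of bits, OR the results together, and negate. The function name `equality_circuit` and the 8-bit default width are assumptions made for this example; the slides do not specify the gate layout.

```python
def equality_circuit(x: int, y: int, bits: int = 8) -> int:
    """Evaluate a Boolean equality circuit on two `bits`-bit inputs."""
    any_diff = 0
    for i in range(bits):
        xi = (x >> i) & 1        # i-th input bit of x
        yi = (y >> i) & 1        # i-th input bit of y
        any_diff |= xi ^ yi      # XOR gate feeding a chain of OR gates
    return 1 - any_diff          # NOT gate: 1 iff no bit pair differed


print(equality_circuit(5, 5), equality_circuit(5, 6))
```

The gate count grows linearly with the bit width, which is why such a circuit is a natural candidate for secure function evaluation.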
8
Cryptographic methods vs. randomization methods
[Figure labels: overhead, inaccuracy, lack of privacy; a point is marked "Our goal".]
9
Examples of Simple Privacy Preserving Primitives
(with reasonable solutions)
  • Is X = Y? Is X > Y?
  • What is X ∩ Y? What is the median of X ∪ Y?
  • Auctions (negotiations). Many parties, private
    bids. Compute the winning bidder and the sale
    price, but nothing else. [NPS]
  • Voting
  • Add privacy to data mining algorithms (ID3 [LP])

10
Private Set Intersection
  • with
  • Mike Freedman, NYU
  • Kobbi Nissim, MSR

11
Applications of Set Intersection
[Figure: government agencies A and B hold two lists, people on welfare and expensive car buyers.]
Compute the intersection and nothing else.
12
Computing the Intersection
  • Private Equality Test (PET)
    • Alice holds x, Bob holds y.
    • Output: 1 iff x = y
  • Privacy preserving solutions
    • Cannot use hash functions alone
    • [Yao, FNW, NP]
  • Generalization: list intersection
    • X = {x_1, ..., x_n}, Y = {y_1, ..., y_n}
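
One way to see why hash functions alone are not enough: when the inputs come from a small domain, whoever receives a plain hash can simply try every candidate. The sketch below assumes a toy domain of four-digit PINs and an unkeyed SHA-256 hash; both are choices made for this illustration, not details from the slides.

```python
import hashlib


def h(value: str) -> str:
    """Plain (unkeyed) hash that a naive equality test might exchange."""
    return hashlib.sha256(value.encode()).hexdigest()


def dictionary_attack(digest: str, domain):
    """Recover the hashed input by trying every candidate in the domain."""
    for candidate in domain:
        if h(candidate) == digest:
            return candidate
    return None


# Assumed toy domain: all four-digit PINs.
domain = [f"{i:04d}" for i in range(10_000)]
alice_message = h("4821")                  # what Alice would send to Bob
print(dictionary_attack(alice_message, domain))
```

Bob learns Alice's input even when it differs from his own, which is more than the single equality bit the test is supposed to reveal.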

13
The basic tool: Homomorphic Encryption
  • Semantically secure public key encryption
  • Given Enc(M_1), Enc(M_2), can compute (without
    knowing the decryption key):
    • Enc(M_1 + M_2)
    • Enc(c · M_1) for any constant c
  • I.e., Enc(a_0) · Enc(a_1)^x · ... · Enc(a_n)^(x^n) = Enc(P(x))
  • Examples: El Gamal, Paillier, DJ.
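
The two homomorphic operations can be tried out with a toy version of Paillier, one of the schemes named above. This is a minimal sketch under stated assumptions: the primes are far too small to be secure, the generator is fixed to N + 1, and the helper names `enc`, `dec`, `add`, and `scale` are invented for this example. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random

# Toy Paillier parameters (assumed for illustration; far too small to be secure).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                                # valid because g = N + 1


def enc(m: int) -> int:
    """Encrypt m (mod N) with fresh randomness r."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def dec(c: int) -> int:
    """Decrypt with the private values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the plaintexts: Enc(M_1 + M_2)."""
    return c1 * c2 % N2


def scale(c: int, k: int) -> int:
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k % N, N2)


# Evaluate P(x) = 6 + 5x + x^2 at x = 7 using only encrypted coefficients.
enc_coeffs = [enc(a) for a in (6, 5, 1)]
x = 7
acc = enc(0)
for i, c in enumerate(enc_coeffs):
    acc = add(acc, scale(c, x ** i))
print(dec(acc))
```

Only the holder of `LAM` and `MU` can decrypt; the polynomial evaluation itself uses the public modulus alone.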

14
The Scenario
  • Client: X = {x_1, ..., x_n}
  • Server: Y = {y_1, ..., y_n}
  • Output:
    • Client learns X ∩ Y.
    • Server learns nothing.

15
The Protocol
  • Client defines a polynomial of degree n whose
    roots are x_1, ..., x_n:
    • P(y) = (x_1 - y)(x_2 - y)...(x_n - y)
    •      = a_n y^n + ... + a_1 y + a_0
  • Sends to the server homomorphic encryptions of the
    coefficients:
    • Enc(a_n), ..., Enc(a_0)
    • (only the client can decrypt)
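
The client's first step, expanding the product into coefficients, can be sketched as follows. The helper names `poly_from_roots` and `evaluate` and the modulus are assumptions for this example; in the protocol the arithmetic would be done modulo the plaintext modulus of the encryption scheme.

```python
def poly_from_roots(roots, modulus):
    """Coefficients [a_0, ..., a_n] of P(y) = (x_1 - y)...(x_n - y) mod modulus."""
    coeffs = [1]                                      # the constant polynomial 1
    for x in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, a in enumerate(coeffs):
            nxt[i] = (nxt[i] + x * a) % modulus       # contribution of x * a_i y^i
            nxt[i + 1] = (nxt[i + 1] - a) % modulus   # contribution of -a_i y^(i+1)
        coeffs = nxt
    return coeffs


def evaluate(coeffs, y, modulus):
    """Evaluate the polynomial at y with Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = (result * y + a) % modulus
    return result


M = 1_000_003                       # assumed modulus for the demonstration
coeffs = poly_from_roots([3, 8, 15], M)
print(evaluate(coeffs, 8, M), evaluate(coeffs, 4, M))
```

Evaluating at a root gives 0, while any other point gives a nonzero value, which is the property the next step of the protocol relies on.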

16
The Protocol
  • Server uses homomorphic properties to compute
    • ∀y ∈ Y: Enc(r · P(y) + y) (r is random)
    • If y ∈ X ∩ Y, the result is Enc(r · 0 + y) = Enc(y); otherwise
      the result is Enc(random).
  • Server sends the (permuted) results to the client C.
  • C decrypts and compares to its list.
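
Putting the two protocol slides together, a minimal end-to-end sketch might look like the following. It assumes a toy Paillier instance with insecure parameter sizes, and the function names `client_request`, `server_response`, and `client_output` are invented here; it illustrates the message flow rather than reproducing the authors' implementation. Python 3.8 or later is needed for the modular inverse via `pow`.

```python
import math
import random

# Toy Paillier key pair owned by the client (assumed parameters, not secure).
P, Q = 104729, 1299709
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)


def enc(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def client_request(xs):
    """Client: encrypt the coefficients a_0..a_n of P(y) = prod (x_i - y)."""
    coeffs = [1]
    for x in xs:
        nxt = [0] * (len(coeffs) + 1)
        for i, a in enumerate(coeffs):
            nxt[i] = (nxt[i] + x * a) % N
            nxt[i + 1] = (nxt[i + 1] - a) % N
        coeffs = nxt
    return [enc(a) for a in coeffs]


def server_response(enc_coeffs, ys):
    """Server: for each y, compute Enc(r * P(y) + y) without the secret key."""
    out = []
    for y in ys:
        enc_p = enc(0)
        for i, c in enumerate(enc_coeffs):
            enc_p = enc_p * pow(c, pow(y, i, N), N2) % N2   # adds a_i * y^i
        r = random.randrange(1, N)
        out.append(pow(enc_p, r, N2) * enc(y) % N2)         # r * P(y) + y
    random.shuffle(out)                                     # permute the results
    return out


def client_output(xs, responses):
    """Client: decrypt and keep the values that appear in its own list."""
    return {dec(c) for c in responses} & set(xs)


X = [11, 22, 33, 44]
Y = [22, 44, 55, 66]
print(client_output(X, server_response(client_request(X), Y)))
```

For a y outside X, the decrypted value is r · P(y) + y for a random r, so it matches an element of X only with small probability; that probability is negligible only when the plaintext space is large.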

17
Security
  • Bad server? The server only sees semantically
    secure encryptions. Learning about C's input
    ⇒ breaking the encryption.
  • Bad client? The client can, given only the output
    X ∩ Y, simulate her view in the protocol. (I.e.,
    she generates encryptions of items in X ∩ Y, and of
    random items.)

18
Efficiency
  • Client encrypts and decrypts n values
  • Communication is O(n)
  • Server
    • For each input, computes Enc(r · P(y) + y), i.e. n
      exponentiations.
    • Total: O(n^2) exponentiations
  • Can use hashing to reduce overhead to O(n ln ln n).

19
Is Approximation easier?
  • Can we approximate the size of the intersection (i.e.
    a scalar product) with sublinear overhead?
  • Lower bound:
    • Approximating |X ∩ Y| within a 1 ± ε factor requires
      Ω(n) communication (∀ constant ε).
    • True even for randomized algorithms.
    • Proof: reduction to Razborov's lower bound for
      Disjointness.
  • Upper bound: protocols with matching overhead.
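
The remark that the intersection size is a scalar product can be checked on plain data: represent each set by its 0/1 characteristic vector over a common universe and take the dot product. The universe and the sets below are assumed toy values, and nothing here is private; it only illustrates the identity.

```python
def characteristic_vector(items, universe):
    """0/1 vector with a 1 in each position whose element belongs to `items`."""
    return [1 if u in items else 0 for u in universe]


universe = range(10)                 # assumed small common universe
X = {1, 3, 5, 7}
Y = {3, 4, 5, 6}
vx = characteristic_vector(X, universe)
vy = characteristic_vector(Y, universe)
dot = sum(a * b for a, b in zip(vx, vy))
print(dot, len(X & Y))
```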

20
Secure Computation of the Kth-ranked element
  • with
  • Gagan Aggarwal, Stanford
  • Nina Mishra, HPL

21
Secure Computation of the Kth-ranked element
  • Inputs
    • A: S_A, B: S_B
    • Large sets of unique items (∈ D).
    • There's also the multi-party scenario
  • Output: x ∈ S_A ∪ S_B
    • s.t. |{y | y < x, y ∈ S_A ∪ S_B}| = k - 1
    • Median: k = (|S_A| + |S_B|) / 2

22
Motivation
  • Basic statistical analysis of distributed data
  • E.g. histogram of salaries in competing businesses
    in the same area
  • Sometimes the parties might want to hide the size
    of their inputs

23
Some information is always revealed
  • The Kth-ranked element reveals some information
  • Suppose S_A = {x_1, ..., x_1000}
  • Median of S_A ∪ S_B = x_400
  • Party A now learns that S_B contains at least 200
    elements smaller than x_400
  • But she shouldn't learn more

24
Results, and previous work
  • Previous work: generic constructions, with overhead
    at least linear in k.
  • New results
    • Two-party: log k secure comparisons of (log D)-bit
      numbers.
    • Multi-party: log D simple computations with (log D)-bit
      numbers.

25
An (insecure) two-party median protocol
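[Figure: S_A is split at its median m_A into a left part L_A and a right part R_A; S_B is split at its median m_B into L_B and R_B. Case shown: m_A < m_B.]
L_A lies below the median, R_B lies above the
median. The new median is the same as the original median.
Recursion ⇒ need log n rounds (suppose each set
contains 2^i items)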
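
The recursion can be sketched in plain Python, with no security: here the medians are compared in the clear. The sketch assumes distinct items, two sets of the same power-of-two size, and the "lower median" convention (the n-th smallest of 2n items). To keep both sets the same size in every round, the party with the smaller median drops its median along with its lower half; this tie-handling and the function name `two_party_median` are assumptions of the sketch, not taken from the slide.

```python
def two_party_median(sa, sb):
    """Return (lower median of sa ∪ sb, number of rounds)."""
    sa, sb = sorted(sa), sorted(sb)
    assert len(sa) == len(sb) and len(sa) & (len(sa) - 1) == 0
    rounds = 0
    while len(sa) > 1:
        half = len(sa) // 2
        m_a, m_b = sa[half - 1], sb[half - 1]   # each party's own lower median
        if m_a < m_b:                           # the only cross-party step
            sa, sb = sa[half:], sb[:half]       # A drops L_A, B drops R_B
        else:
            sa, sb = sa[:half], sb[half:]       # A drops R_A, B drops L_B
        rounds += 1
    return min(sa[0], sb[0]), rounds


print(two_party_median([1, 5, 6, 7], [2, 3, 4, 8]))
```

Each round halves both sets, so sets of 2^i items need i rounds plus one final comparison of the two remaining items.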
26
Secure two-party median protocol
A finds the median of S_A, call it m_A. B finds the
median of S_B, call it m_B.
Secure comparison (e.g. a small circuit): is m_A < m_B?
YES: A deletes x ∈ S_A s.t. x < m_A. B deletes x ∈ S_B s.t.
x > m_B.
NO: A deletes x ∈ S_A s.t. x > m_A. B deletes x ∈ S_B s.t.
x < m_B.
Repeat with the remaining items.
27
Proof of security
  • Simulation: given the protocol's output, each
    party can simulate the execution of the protocol

[Figure: S_A with the median marked. First comparison: m_A < m_B. Second comparison: m_A > m_B.]
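
The simulation argument can be made concrete for party A: knowing only S_A and the final median, A can predict every comparison result, because in each round m_A < m_B holds exactly when m_A lies below the overall median. The sketch below checks this against a real run. It assumes distinct items, equal power-of-two set sizes, and the lower-median convention; the function names are invented for this example.

```python
def run_protocol(sa, sb):
    """Real execution: returns (median, list of comparison bits m_A < m_B)."""
    sa, sb, bits = sorted(sa), sorted(sb), []
    while len(sa) > 1:
        half = len(sa) // 2
        bit = sa[half - 1] < sb[half - 1]
        bits.append(bit)
        sa, sb = (sa[half:], sb[:half]) if bit else (sa[:half], sb[half:])
    return min(sa[0], sb[0]), bits


def simulate_party_a(sa, median):
    """Party A's simulator: rebuild every comparison bit from S_A and the output."""
    sa, bits = sorted(sa), []
    while len(sa) > 1:
        half = len(sa) // 2
        bit = sa[half - 1] < median      # m_A < m_B exactly when m_A < median
        bits.append(bit)
        sa = sa[half:] if bit else sa[:half]
    return bits


s_a, s_b = [1, 5, 6, 7], [2, 3, 4, 8]
median, real_bits = run_protocol(s_a, s_b)
print(real_bits, simulate_party_a(s_a, median))
```

Since the simulated bits match the real ones without any access to S_B, the comparisons reveal nothing to A beyond what the output already implies.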
28
Arbitrary inputs, arbitrary k
[Figure labels: S_A, S_B, k, 2^i.]
Now, compute the median of two sets of size k.
The size should be a power of 2.
Median of new inputs = kth element of original
inputs.
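
One possible way to realize this reduction is sketched below. The padding scheme is an assumption of this example, not a detail given on the slide: each party keeps its k smallest items and pads with +∞ up to k items, then A adds further +∞ values and B adds the same number of −∞ values to reach a power-of-two size. The check uses plain sorting rather than the secure protocol, and running the protocol itself on padded inputs would additionally need a rule for ties among the padding values.

```python
import math


def pad_for_median(sa, sb, k):
    """Build two equal power-of-two-sized lists whose lower median is the k-th element."""
    size = 1 << (k - 1).bit_length()            # smallest power of two >= k
    a = sorted(sa)[:k]                          # only the k smallest can matter
    b = sorted(sb)[:k]
    a += [math.inf] * (size - len(a))           # A pads on the high side only
    b += [math.inf] * (k - len(b))              # bring B up to k items
    b += [-math.inf] * (size - k)               # B balances A's extra padding
    return a, b


s_a, s_b = [10, 30, 50, 70, 90], [20, 40, 60]
a, b = pad_for_median(s_a, s_b, 5)
print(sorted(a + b)[len(a) - 1])                # lower median of the padded inputs
```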
29
Conclusions
  • Efficient privacy preserving primitives for basic
    tasks
  • Open problems
    • Intersection: approximate matching?
    • Median: clustering?
  • Theory and applications can and should interact
    • Tools from the theory of cryptography (e.g. SFE)
      can be used in applications
    • Applications can benefit from rigorous analysis
  • There's a lot more to be done