Title: Privacy Preserving Data Mining
1. Secure Multiparty Computation: Multi-round protocols
Li Xiong CS573 Data Privacy and Security
2. Secure multiparty computation
- General circuit-based secure multiparty computation methods
- Specialized secure multiparty computation protocols
- Decision tree mining across horizontally partitioned data
- Secure sum, secure union
- Association rule mining across horizontally partitioned data
- Most of them rely on cryptographic primitives and are still expensive
- Multi-round protocols as an alternative
3. Multi-round protocols
- Max/min, top k
- k-th element protocol using secure comparison (Aggarwal 04)
- Multi-round probabilistic protocols (Xiong 07)
- OR (Union)
- Commutative encryption based
- Multi-round probabilistic protocols (Bawa 03)
Secure computation of the k-ranked element, Aggarwal, 2004
Preserving Privacy for outsourcing aggregation services, Xiong, 2007
Privacy Preserving Indexing of Documents on the Network, Bawa, 2003
4. K-th element (Aggarwal 04)
- Input
- D_i, i = 1, 2, ..., s
- k
- Range of data values [alpha, beta]
- Size of the union of the databases, n
- Output
- The k-th ranked element in the union of the D_i
Secure computation of the k-ranked element,
Aggarwal, 2004
5. kth element protocol
- Initialize
- Each party ranks its elements in ascending order. Initialize current range [a, b] to [alpha, beta], set n = sum |D_i|
- Repeat until done
- Set m = (a+b)/2
- Each party computes l_i, the number of its elements less than m, and g_i, the number of its elements greater than m
- If sum(l_i) <= k-1 and sum(g_i) <= n-k, done: m is the k-th ranked element
- If sum(l_i) >= k, set b = m-1, output 0
- If sum(g_i) >= n-k+1, set a = m+1, output 1
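The search loop above can be sketched in Python as follows. This is only a simulation under stated assumptions: ordinary sums and comparisons stand in for the secure sum and secure comparison primitives, values are integers, and the midpoint is rounded down. The function name `kth_ranked_element` is hypothetical, not taken from the paper.

```python
from typing import List


def kth_ranked_element(databases: List[List[int]], k: int,
                       alpha: int, beta: int) -> int:
    """Binary search over [alpha, beta] for the k-th smallest value in the union.

    Plain sums and comparisons are used here; in the real protocol each
    would be a secure sum or a secure comparison, so this sketch gives no privacy.
    """
    n = sum(len(d) for d in databases)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    a, b = alpha, beta
    while a <= b:
        m = (a + b) // 2  # assumption: midpoint rounded down
        # Each party counts locally; the totals stand in for secure sums.
        total_less = sum(sum(1 for x in d if x < m) for d in databases)
        total_greater = sum(sum(1 for x in d if x > m) for d in databases)
        if total_less <= k - 1 and total_greater <= n - k:
            return m  # m is the k-th ranked element
        if total_less >= k:
            b = m - 1  # the k-th element lies below m
        else:
            a = m + 1  # total_greater >= n - k + 1: it lies above m
    raise ValueError("data values fall outside [alpha, beta]")
```

For example, with databases `[[3, 9, 14], [1, 9, 20], [7]]` and range `[0, 31]`, the union in ascending order is 1, 3, 7, 9, 9, 14, 20, so `k = 3` yields 7.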
6. Cost
- Number of rounds: log M, where M is the range size
- Each round requires two secure sums and two secure comparisons
7. Multi-round protocols
- Can we get away from cryptographic primitives?
- Multi-round protocols idea
- Use randomization (random response)
- Utilize inherent network anonymity of multiple nodes
- Multi-round protocols
- May not be completely secure
- May not be completely accurate
8. Multi-round protocols
- Multi-round probabilistic protocols for max/min and top k (Xiong 07)
- Multi-round OR (union) protocol (Bawa 03)
9. Protocol Structure
- Randomized response (Warner 1965)
- Multi-round randomized protocol
- Randomized local computation
- Multi-node anonymity
- Assumption: semi-honest model
Preserving Privacy for outsourcing aggregation
services, Xiong, 2007
10. A Naïve Max/Min Protocol
         g_{i-1} >= v_i    g_{i-1} < v_i
g_i      g_{i-1}           v_i
- Add in randomization: how, when, and how much?
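A minimal sketch of the naive max protocol, assuming the nodes sit on a ring and the global value starts at 0. The function name `naive_ring_max` and the sample values are hypothetical. It records what each node passes on, which shows why randomization is needed: the second node sees the first node's value exactly.

```python
from typing import List


def naive_ring_max(values: List[int], start: int = 0) -> List[int]:
    """Pass a running maximum around the ring.

    Returns the global value g_i that each node i sends to its successor.
    """
    g = start
    sent = []
    for v in values:
        # Naive rule: keep g_{i-1} if it is at least v_i, otherwise send v_i.
        g = g if g >= v else v
        sent.append(g)
    return sent
```

With local values `[30, 10, 40, 20]`, the nodes send 30, 30, 40, 40 in turn, so the first node's private value is exposed in the clear.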
11. Max Protocol: Random response
- Random response at node i
            g_{i-1}(r) >= v_i    g_{i-1}(r) < v_i
g_i(r)      g_{i-1}(r)           w/ prob Pr: random number
                                 w/ prob 1-Pr: v_i
12. Max Protocol: multi-round random response
- Multiple rounds
- Randomization probability at round r
- Pr(r)
- Local algorithm at round r and node i
            g_{i-1}(r) >= v_i    g_{i-1}(r) < v_i
g_i(r)      g_{i-1}(r)           w/ prob Pr: rand[g_{i-1}(r), v_i)
                                 w/ prob 1-Pr: v_i
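The local algorithm can be simulated as below. This is a sketch, not the paper's implementation. It assumes integer values, a ring that starts from global value 0, and a randomization probability of the form Pr(r) = p0 * d^(r-1), which is inferred from the p0 and d parameters that appear on the correctness and security slides. The function name `randomized_ring_max` is hypothetical.

```python
import random
from typing import List, Optional


def randomized_ring_max(values: List[int], rounds: int,
                        p0: float = 1.0, d: float = 0.5, start: int = 0,
                        rng: Optional[random.Random] = None) -> int:
    """Multi-round random-response max over a ring of nodes."""
    rng = rng or random.Random()
    g = start
    for r in range(1, rounds + 1):
        p_r = p0 * d ** (r - 1)  # assumed schedule: Pr(r) = p0 * d^(r-1)
        for v in values:
            if g >= v:
                continue  # pass g_{i-1}(r) on unchanged
            if rng.random() < p_r:
                g = rng.randrange(g, v)  # random value in [g_{i-1}(r), v_i)
            else:
                g = v  # reveal the real local value
    return g
```

Because a random value never exceeds the node's own value, the global value never overshoots the true maximum, and as Pr(r) shrinks the node holding the maximum eventually reports it.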
13. Max Protocol - Illustration
[Figure: nodes on a ring (D2, D3, D4, ...) pass the global value from node to node over multiple rounds, starting from 0.]
14. PrivateTopK Protocol
G'_i(r) = topk(G_{i-1}(r) U V_i)
V'_i = G'_i(r) - G_{i-1}(r)
m = |V'_i|
if m = 0 then
  G_i(r) = G_{i-1}(r)
else
  with probability 1 - Pr(r):
    G_i(r) = G'_i(r)
  with probability Pr(r):
    G_i(r)[1:k-m] = G_{i-1}(r)[1:k-m]
    G_i(r)[k-m+1:k] = a sorted list of m random values generated from
      [min(G'_i(r)[k] - delta, G_{i-1}(r-1)[k-m+1]), G'_i(r)[k])
end
15. Min/Max Protocol - Correctness
- Precision bound
- Converges with r
- Smaller p0 and d provide faster convergence
16. Min/Max Protocol - Cost
- Communication cost
- Single round: O(n)
- Minimum number of rounds given a precision guarantee (1 - ε)
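As a rough illustration of how the minimum number of rounds could be computed, the sketch below assumes the randomization probability is Pr(r) = p0 * d^(r-1). Under that assumption, the node holding the maximum fails to report it only if it randomizes in every one of the first r rounds, which happens with probability p0^r * d^(r(r-1)/2). This bound is derived here from that assumption and is not quoted from the slides; the function names are hypothetical.

```python
def failure_bound(r: int, p0: float, d: float) -> float:
    """Probability that the max holder randomizes in all of rounds 1..r."""
    return (p0 ** r) * (d ** (r * (r - 1) // 2))


def min_rounds(epsilon: float, p0: float = 1.0, d: float = 0.5,
               max_rounds: int = 1000) -> int:
    """Smallest r whose failure bound is at most epsilon (precision >= 1 - epsilon)."""
    for r in range(1, max_rounds + 1):
        if failure_bound(r, p0, d) <= epsilon:
            return r
    raise ValueError("precision target not reached within max_rounds")
```

With p0 = 1 and d = 1/2, a precision target of 0.999 (ε = 0.001) needs 5 rounds, since (1/2)^10 ≈ 0.00098 while (1/2)^6 ≈ 0.0156. Total communication is then the number of rounds times the O(n) cost of a single round.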
17. Min/Max Protocol - Security
- Probability/confidence based metric: P(C | IR, R), where C is a claim, IR the intermediate results, and R the final result
- Different types of exposures based on claim
- Data value: v_i = a
- Data ownership: V_i contains a
- Loss of Privacy (LoP) = P(C | IR, R) - P(C | R)
- Information entropy based metric
- Loss of privacy as a measure of randomness of information: H(D | R) - H(D | IR, R)
18. Min/Max Protocol Security (Analysis)
- Upper bound for average expected LoP
- max_r (1/2^{r-1}) (1 - p0 d^{r-1})
- Larger p0 and d provide better privacy
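To see how the bound behaves, the sketch below evaluates it numerically, reading the expression as the maximum over rounds r >= 1 of (1/2^(r-1)) * (1 - p0 * d^(r-1)). That reading, the function name `lop_upper_bound`, and the cap on the number of rounds examined are assumptions made for illustration.

```python
def lop_upper_bound(p0: float, d: float, max_round: int = 50) -> float:
    """Evaluate max over r of (1 - p0 * d^(r-1)) / 2^(r-1) for r = 1..max_round."""
    return max((1.0 - p0 * d ** (r - 1)) / 2 ** (r - 1)
               for r in range(1, max_round + 1))
```

For p0 = 1 and d = 1/2 the maximum is 0.25, reached at r = 2. Lowering p0 to 1/2 raises the bound to 0.5 (reached at r = 1), which matches the observation that larger p0 gives better privacy.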
19. Min/Max Protocol Security (Experiments)
- Loss of privacy decreases with increasing number of nodes
- Probabilistic protocol achieves better privacy (close to 0)
- When n is large, anonymous protocol is actually okay!
20. Union
- Commutative encryption based approach
- Number of rounds: 2
- Each round: encryption and decryption
- Multi-round random-response approach?
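The commutative encryption approach can be illustrated with a toy simulation. The slide does not name a cipher, so this sketch assumes modular exponentiation over a prime field (in the style of Pohlig-Hellman), a deliberately small prime, and items encoded as integers in 1..P-1. The names `keygen` and `commutative_union` are hypothetical, and the code is not secure as written.

```python
import math
import random

P = 2**61 - 1  # a Mersenne prime; toy size, far too small for real use


def keygen(rng):
    """Return (e, d) with e * d = 1 mod (P - 1), so (x^e)^d = x mod P."""
    while True:
        e = rng.randrange(3, P - 1)
        if math.gcd(e, P - 1) == 1:
            return e, pow(e, -1, P - 1)  # modular inverse (Python 3.8+)


def commutative_union(item_sets, rng):
    """Union of integer item sets via one encryption pass and one decryption pass."""
    keys = [keygen(rng) for _ in item_sets]
    n = len(keys)
    # Pass 1: every item is encrypted by all parties, starting with its owner.
    # The order differs per owner; commutativity makes the result identical.
    pool = set()
    for owner, items in enumerate(item_sets):
        for x in items:
            c = x
            for step in range(n):
                e, _ = keys[(owner + step) % n]
                c = pow(c, e, P)
            pool.add(c)  # duplicates collapse without revealing who held them
    # Pass 2: every party strips its own layer from the merged pool.
    union = set()
    for c in pool:
        for _, d in keys:
            c = pow(c, d, P)
        union.add(c)
    return union
```

Since x -> x^e mod P is a bijection when gcd(e, P - 1) = 1, two fully encrypted values are equal exactly when the underlying items are equal, which is what lets duplicates be merged before decryption.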
21. Vector
[Figure: nodes p1, p2, ..., pc each hold a boolean vector; the group vector VG is the element-wise OR of these vectors.]
- Each database has a boolean vector of the data items
- Union vector is a logical OR of all vectors
Privacy Preserving Indexing of Documents on the
Network, Bawa, 2003
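22. Group Vector Protocol
Processing of VG at p_s in round r:
P_ex = 1/2^r, P_in = 1 - P_ex
for (i = 1; i <= L; i++)
  if (V_s[i] = 1 and V_G[i] = 0)
    Set V_G[i] = 1 with prob. P_in
  if (V_s[i] = 0 and V_G[i] = 1)
    Set V_G[i] = 0 with prob. P_ex
[Figure: VG is passed around the ring p1, p2, ..., pc]
r = 1: P_ex = 1/2, P_in = 1/2
r = 2: P_ex = 1/4, P_in = 3/4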
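The per-node processing above can be simulated as follows. This sketch assumes the group vector starts as all zeros and is passed around the ring once per round; the slide fixes neither detail. The function name `group_vector_or` is hypothetical.

```python
import random
from typing import List, Optional


def group_vector_or(vectors: List[List[int]], rounds: int,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Multi-round randomized OR of boolean vectors held by ring nodes."""
    rng = rng or random.Random()
    length = len(vectors[0])
    vg = [0] * length  # assumption: the group vector starts as all zeros
    for r in range(1, rounds + 1):
        p_ex = 1.0 / 2 ** r  # probability of excluding (clearing) a bit
        p_in = 1.0 - p_ex    # probability of including (setting) a bit
        for vs in vectors:   # each node p_s processes VG in turn
            for i in range(length):
                if vs[i] == 1 and vg[i] == 0:
                    if rng.random() < p_in:
                        vg[i] = 1
                elif vs[i] == 0 and vg[i] == 1:
                    if rng.random() < p_ex:
                        vg[i] = 0
    return vg
```

In early rounds a node may clear bits it does not hold or withhold bits it does hold, which hides who contributed what. As r grows, P_ex tends to 0 and P_in to 1, so the vector converges to the true OR. A bit that no node holds is never set.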
23. Open issues
- Tradeoff between accuracy, efficiency, and security
- How to quantify security
- How to design adjustable protocols
- Can we generalize the algorithms for a set of operators based on their properties?
- Operators: sum, union, max, min
- Properties: commutative, associative, invertible, randomizable
24. Enjoy the spring break!