Title: CS 656 Paper Presentation
1. CS 656 Paper Presentation
- General k-Anonymization is Hard
Presented By Shashank Gupta
k-Anonymity Model
2. What is k-Anonymity?
Source: Minimality Attack in Privacy Preserving Data Publishing, Raymond Chi-Wing Wong, Ada Wai-Chee Fu, Ke Wang, Jian Pei
5. What to prove?
NP-hardness of optimal k-anonymity: for a sufficiently large alphabet, k-anonymity is hard for any $k \ge 3$.
Optimal k-anonymity: given a list of records, minimize the number of fields suppressed, such that for each record r, there are k - 1 other records that are indistinguishable from r.
Source: On the Complexity of Optimal K-Anonymity, Ryan Williams, Adam Meyerson, PODS 2004
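To make the definition concrete, here is a minimal Python sketch of the k-anonymity condition under suppression. The names `suppress`, `is_k_anonymous`, and the `*` marker are assumptions chosen for illustration, and the three sample records are made up; nothing here is taken from the paper.

```python
from collections import Counter

STAR = "*"  # assumed marker for a suppressed field


def suppress(table, cells):
    """Return a copy of `table` with each (row, column) pair in `cells` replaced by STAR."""
    out = [list(row) for row in table]
    for r, c in cells:
        out[r][c] = STAR
    return out


def is_k_anonymous(table, k):
    """True if every record is identical to at least k - 1 other records."""
    counts = Counter(tuple(row) for row in table)
    return all(count >= k for count in counts.values())


# Made-up sample data: three records that differ only in the second field.
records = [["a", "x"], ["a", "y"], ["a", "z"]]
suppressed = suppress(records, [(0, 1), (1, 1), (2, 1)])

print(is_k_anonymous(records, 3))     # False
print(is_k_anonymous(suppressed, 3))  # True
```

Checking the condition is easy; the hard part, and the subject of the proof, is finding the smallest set of cells to suppress.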
6. Why to prove?
Ends justify the means! Always?
Just because you have got the right answer (the end) does not mean that the method (the means) you employed to obtain it is correct.
Goal: more efficient algorithms. (Many a time, getting a correct solution late is as bad as getting a wrong solution.)
7. What is NP-Hard?
P contains all decision problems that can be solved by a deterministic Turing machine using a polynomial amount of computation time (polynomial time).
NP is the set of decision problems solvable in polynomial time by a non-deterministic Turing machine.
NP-complete (NPC) is a class of problems having two properties:
- Any given solution to the problem can be verified quickly (in polynomial time); the set of problems with this property is called NP.
- If the problem can be solved quickly (in polynomial time), then so can every problem in NP.
(In a sense, NPC problems are the problems in NP most likely to be hard.)
8. What is NP-Hard?
NP-hard (nondeterministic polynomial-time hard), in computational complexity theory, is a class of problems that are, informally, "at least as hard as the hardest problems in NP".
A problem H is NP-hard if and only if there is an NP-complete problem L that is polynomial-time Turing-reducible to H.
Reducibility: a problem $Q$ is reducible to $Q'$ ($Q \le_p Q'$) if any instance of $Q$ can be easily rephrased (in polynomial time) as an instance of $Q'$.
Example: linear equations are reducible to quadratic equations, since $ax + b = 0$ becomes $0x^2 + ax + b = 0$.
If a decision problem is NPC, then the corresponding optimization problem is NP-hard.
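The linear-to-quadratic example can be sketched in Python as follows. The function names are hypothetical, and `solve_quadratic` is a toy solver written for this illustration; the point is only that an instance of the easier problem is rephrased, in constant time, as an instance of the more general one and then handed to a solver for the latter.

```python
import math


def solve_quadratic(a2, a1, a0):
    """Real solutions of a2*x^2 + a1*x + a0 = 0 (toy solver; must accept a2 == 0)."""
    if a2 == 0:
        if a1 == 0:
            raise ValueError("degenerate equation: no unique solution")
        return [-a0 / a1]
    disc = a1 * a1 - 4 * a2 * a0
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted({(-a1 - root) / (2 * a2), (-a1 + root) / (2 * a2)})


def linear_to_quadratic(a, b):
    """The reduction: ax + b = 0 becomes 0x^2 + ax + b = 0."""
    return (0, a, b)


def solve_linear(a, b):
    """Solve a linear equation using only the reduction and the quadratic solver."""
    return solve_quadratic(*linear_to_quadratic(a, b))


print(solve_linear(2, -4))  # [2.0]
```

The hardness proof uses the same idea in the other direction: if a known hard problem can be rephrased as k-anonymity, then k-anonymity is at least as hard.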
9. What is NP-Hard?
Source: Wikipedia, NP-Hard
10. Hardness of k-anonymity
Optimal k-anonymity: given a list of records, minimize the number of fields suppressed, such that for each record r, there are k - 1 other records that are indistinguishable from r.
We will give a reduction from k-dimensional perfect matching to the above problem.
k-dimensional perfect matching: given a collection $C$ of k-sets over a universe $U$, is there a subset $S \subseteq C$ such that
- every $x \in U$ is in some k-set $s$ in $S$, and
- the sets of $S$ are disjoint, i.e. for every pair of distinct $s_1, s_2 \in S$, $s_1 \cap s_2 = \emptyset$?
Note: when $k = 2$, this is polynomial-time solvable (but the problem is NP-hard for $k \ge 3$).
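The two conditions are easy to check for a proposed S, even though finding S is hard. A minimal Python sketch of such a check follows; the function name, argument order, and the sample instance are assumptions made for illustration.

```python
def is_perfect_matching(universe, collection, chosen, k):
    """Check that `chosen` is a k-dimensional perfect matching of `universe` drawn from `collection`."""
    covered = set()
    for s in chosen:
        if s not in collection or len(s) != k:
            return False  # not one of the given k-sets
        if covered & s:
            return False  # overlaps a previously chosen set
        covered |= s
    return covered == universe  # every element is in some chosen set


# Sample instance (assumed for illustration).
U = {1, 2, 3, 4, 5, 6}
C = [{1, 2, 3}, {1, 4, 5}, {4, 5, 6}, {2, 3, 6}]

print(is_perfect_matching(U, C, [{1, 2, 3}, {4, 5, 6}], 3))  # True
print(is_perfect_matching(U, C, [{1, 2, 3}, {1, 4, 5}], 3))  # False
```

The check runs in time polynomial in the size of the instance, which is what places the decision problem in NP.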
11. From 3-D perfect matching to 3-anonymity
Given an instance of 3-dimensional perfect matching: $U = \{x_1, x_2, \ldots, x_n\}$, $C = \{s_1, \ldots, s_m\}$ such that for all $j = 1, \ldots, m$, $s_j \subseteq U$ and $|s_j| = 3$.
Define a table $T$ of records where
- records (rows) correspond to $x_i \in U$
- attributes (columns) correspond to $s_j \in C$
More precisely, $T_{i,j} = 0$ if $x_i \in s_j$, and $T_{i,j} = i$ otherwise.
We then ask: does the optimal 3-anonymized solution suppress at most $n(m - 1)$ fields?
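The construction of T can be sketched in a few lines of Python. The function name `build_table` and the list-of-lists representation are assumptions for illustration; the sample instance is the one with U = {1, ..., 6} and four 3-sets.

```python
def build_table(universe, collection):
    """Build T with T[i][j] = 0 if x_i is in s_j, and i (the 1-based row index) otherwise."""
    return [
        [0 if x in s else i for s in collection]
        for i, x in enumerate(universe, start=1)
    ]


U = [1, 2, 3, 4, 5, 6]
C = [{1, 2, 3}, {1, 4, 5}, {4, 5, 6}, {2, 3, 6}]

T = build_table(U, C)
for row in T:
    print(row)

# Suppression budget asked about by the reduction: n(m - 1).
threshold = len(U) * (len(C) - 1)
print(threshold)  # 18
```

The table has n rows and m columns and is filled in a single pass, so the reduction itself takes polynomial time.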
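12. Example of reduction in action
$U = \{1, 2, 3, 4, 5, 6\}$ and $C = \{\{1, 2, 3\}, \{1, 4, 5\}, \{4, 5, 6\}, \{2, 3, 6\}\}$.
The reduction results in the table (rows are elements of U, columns are the sets of C in the order listed):
     s1  s2  s3  s4
1     0   0   1   1
2     0   2   2   0
3     0   3   3   0
4     4   0   0   4
5     5   0   0   5
6     6   6   0   0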
13. Perfect Matching 1
3-D perfect matching $\{1, 2, 3\}, \{4, 5, 6\}$ corresponds to the 3-anonymized table
     s1  s2  s3  s4
1     0   *   *   *
2     0   *   *   *
3     0   *   *   *
4     *   *   0   *
5     *   *   0   *
6     *   *   0   *
14. Perfect Matching 2
3-D perfect matching $\{1, 4, 5\}, \{2, 3, 6\}$ corresponds to
     s1  s2  s3  s4
1     *   0   *   *
2     *   *   *   0
3     *   *   *   0
4     *   0   *   *
5     *   0   *   *
6     *   *   *   0
Some observations:
- If a set $s_j$ doesn't appear in the perfect matching, then its column is all *s.
- If $s_j$ does appear, then 3 entries in its column are not *s.
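The correspondence between a matching and an anonymized table can be sketched as follows. The function name and the `*` marker are assumptions for illustration: each row keeps only the entry in the column of the matched set that contains its element (that entry is always 0), and every other field is suppressed.

```python
STAR = "*"  # assumed marker for a suppressed field


def anonymize_from_matching(universe, collection, matching):
    """Suppress every field except the 0 in the column of the matched set containing the row's element."""
    table = []
    for x in universe:
        row = [0 if (s in matching and x in s) else STAR for s in collection]
        table.append(row)
    return table


U = [1, 2, 3, 4, 5, 6]
C = [{1, 2, 3}, {1, 4, 5}, {4, 5, 6}, {2, 3, 6}]

anonymized = anonymize_from_matching(U, C, [{1, 4, 5}, {2, 3, 6}])
for row in anonymized:
    print(row)

stars = sum(row.count(STAR) for row in anonymized)
print(stars)  # 18, which equals n(m - 1) = 6 * 3
```

Each row keeps exactly one field, so the total number of stars is n(m - 1) whenever the chosen sets really form a perfect matching.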
15. Why does this work?
(Recall: m = number of sets in the collection = number of columns in the table.)
- A group of 3 rows needs at least $3(m - 1)$ stars in order for the group to become indistinguishable. This follows from $T_{i,j} = i$ if $x_i \notin s_j$: distinct rows can agree in column j only where their entries are 0, and at most one set in C contains all three elements.
- A group of 3 rows corresponds to the elements of a set $s_j$ if and only if exactly $3(m - 1)$ stars are required: the rows have 0 in the jth column and differ in the other columns.
- Thus there is a perfect matching iff for every group of 3 rows, exactly $3(m - 1)$ stars are necessary $\Rightarrow$ $n(m - 1)$ stars in total.
So there is a 3-D perfect matching if and only if the number of entries suppressed in the optimal 3-anonymized solution is $n(m - 1)$.
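The counting argument can be checked by brute force on a small instance. In this Python sketch, `stars_needed` is a hypothetical helper that stars every column in which a group of rows disagrees, and `T` is the table that the reduction yields for U = {1, ..., 6} and C = {{1, 2, 3}, {1, 4, 5}, {4, 5, 6}, {2, 3, 6}}.

```python
from itertools import combinations


def stars_needed(table, rows):
    """Fewest suppressions that make the given rows identical: star each column where they disagree."""
    group = [table[r] for r in rows]
    differing = sum(1 for column in zip(*group) if len(set(column)) > 1)
    return differing * len(group)


# Table for the sample instance (row index r holds element r + 1).
T = [
    [0, 0, 1, 1],
    [0, 2, 2, 0],
    [0, 3, 3, 0],
    [4, 0, 0, 4],
    [5, 0, 0, 5],
    [6, 6, 0, 0],
]
m = len(T[0])

print(stars_needed(T, (0, 1, 2)))  # 9, i.e. 3(m - 1): rows for the set {1, 2, 3}
print(stars_needed(T, (0, 1, 3)))  # 12: elements 1, 2, 4 share no set

# Groups of 3 rows that reach the lower bound 3(m - 1).
tight = [g for g in combinations(range(len(T)), 3) if stars_needed(T, g) == 3 * (m - 1)]
print(tight)  # [(0, 1, 2), (0, 3, 4), (1, 2, 5), (3, 4, 5)]
```

The four tight groups are exactly the four sets of C, which is the "if and only if" in the second bullet.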
16. Thank You
17. What is a Hypergraph?
Sample hypergraph: $X = \{v_1, v_2, v_3, v_4, v_5, v_6, v_7\}$, $E = \{e_1, e_2, e_3, e_4\} = \{\{v_1, v_2, v_3\}, \{v_2, v_3\}, \{v_3, v_5, v_6\}, \{v_4\}\}$.
A hypergraph is a generalization of a graph, where edges can connect any number of vertices.
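One simple way to represent the sample hypergraph in Python is a mapping from edge names to vertex sets. The representation and the helper name `isolated_vertices` are assumptions for illustration. An instance of k-dimensional perfect matching can be viewed the same way, as a hypergraph whose edges all have exactly k vertices.

```python
vertices = {f"v{i}" for i in range(1, 8)}  # v1 .. v7
edges = {
    "e1": {"v1", "v2", "v3"},
    "e2": {"v2", "v3"},
    "e3": {"v3", "v5", "v6"},
    "e4": {"v4"},
}


def isolated_vertices(vertices, edges):
    """Vertices that belong to no edge."""
    return vertices - set().union(*edges.values())


# Unlike an ordinary graph, edge sizes are not fixed at 2.
edge_sizes = {name: len(members) for name, members in edges.items()}

print(edge_sizes)                          # {'e1': 3, 'e2': 2, 'e3': 3, 'e4': 1}
print(isolated_vertices(vertices, edges))  # {'v7'}
```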