Title: Inference Problem - Privacy Preserving Data Mining
1. Inference Problem - Privacy Preserving Data Mining
2. Readings and Assignments
- Required
  - Pfleeger, Chapter 6.5
- Interesting reading
  - I. Moskowitz, M. H. Kang: Covert Channels - Here to Stay? http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.itd.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf
  - Jajodia, Meadows: Inference Problems in Multilevel Secure Database Management Systems, http://www.acsac.org/secshelf/book001/book001.html, essay 24
3. Indirect Information Flow Channels
- Covert channels
- Inference channels
4. Communication Channels
- Overt channel: designed into a system and documented in the user's manual
- Covert channel: not documented. Covert channels may be deliberately inserted into a system, but most such channels are accidents of the system design.
5. Covert Channel
- Timing channel: based on system times
- Storage channel: communication that is not time related
- Each kind can be turned into the other
6. Inference Channels
- Non-sensitive information + Meta-data → Sensitive information
7. Inference Channels
- Statistical Database Inferences
- General Purpose Database Inferences
8. Statistical Databases
- Goal: provide aggregate information about groups of individuals
  - E.g., average grade point of students
- Security risk: specific information about a particular individual
  - E.g., grade point of student John Smith
- Meta-data
  - Working knowledge about the attributes
  - Supplementary knowledge (not stored in database)
9. Types of Statistics
- Macro-statistics: collections of related statistics presented in 2-dimensional tables

  Sex \ Year   1997   1998   Sum
  Female          4      1     5
  Male            6     13    19
  Sum            10     14    24

- Micro-statistics: individual data records used for statistics after identifying information is removed

  Sex   Course     GPA   Year
  F     CSCE 590   3.5   2000
  M     CSCE 590   3.0   2000
  F     CSCE 790   4.0   2001
10. Statistical Compromise
- Exact compromise: find the exact value of an attribute of an individual (e.g., John Smith's GPA is 3.8)
- Partial compromise: find an estimate of an attribute value corresponding to an individual (e.g., John Smith's GPA is between 3.5 and 4.0)
11. Methods of Attacks and Protection
- Small/Large Query Set Attack
  - C: characteristic formula that identifies groups of individuals
  - If C identifies a single individual I, e.g., count(C) = 1
    - Find out existence of property
      - If count(C and D) = 1, then I has property D
      - If count(C and D) = 0, then I does not have D
    - OR
    - Find value of property
      - sum(C, D) gives the value of D
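The small query set attack above can be sketched in a few lines. This is a minimal illustration, not a real statistical database: the rows reuse the micro-statistics table from the "Types of Statistics" slide, while the `honors` flag (standing in for property D) and the helper names `count` and `total` are assumptions made for the example.

```python
# Rows follow the micro-statistics table; "honors" is a hypothetical
# sensitive property added only for illustration.
records = [
    {"sex": "F", "course": "CSCE 590", "gpa": 3.5, "year": 2000, "honors": True},
    {"sex": "M", "course": "CSCE 590", "gpa": 3.0, "year": 2000, "honors": False},
    {"sex": "F", "course": "CSCE 790", "gpa": 4.0, "year": 2001, "honors": True},
]


def count(pred):
    """Statistical COUNT over the records satisfying pred."""
    return sum(1 for r in records if pred(r))


def total(pred, attr):
    """Statistical SUM of attr over the records satisfying pred."""
    return sum(r[attr] for r in records if pred(r))


def C(r):
    # Characteristic formula that happens to match a single individual I.
    return r["sex"] == "M" and r["course"] == "CSCE 590"


def D(r):
    # Property whose existence the attacker wants to learn.
    return r["honors"]


# Step 1: confirm that C isolates exactly one individual.
assert count(C) == 1

# Step 2a: existence of property D for that individual.
has_d = count(lambda r: C(r) and D(r)) == 1

# Step 2b: value of an attribute; the sum over one record is its value.
leaked_gpa = total(C, "gpa")

print(has_d, leaked_gpa)
```

Only aggregate queries are issued, yet both answers describe one person.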
12. Small/Large Query Set Attack cont.
- Protection from small/large query set attack: query-set-size control
  - A query q(C) is permitted only if
    N - n ≥ |C| ≥ n, where n ≥ 0 is a parameter of the database, |C| is the number of records satisfying C, and N is the number of records in the database
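A minimal sketch of query-set-size control, assuming an in-memory list of records and a count statistic. The factory name `make_count_query` and the sample data are hypothetical. The upper bound matters because a very large query set is the complement of a very small one.

```python
def make_count_query(records, n):
    """Return a COUNT query function guarded by query-set-size control."""
    N = len(records)

    def q(pred):
        size = sum(1 for r in records if pred(r))
        # Permit the query only if n <= |C| <= N - n.
        if not (n <= size <= N - n):
            raise PermissionError(
                f"query set size {size} outside [{n}, {N - n}]"
            )
        return size

    return q


ages = list(range(20, 30))          # hypothetical data, N = 10
q = make_count_query(ages, n=2)

print(q(lambda a: a < 25))          # query set of size 5: permitted

for pred in (lambda a: a == 23,     # size 1: too small
             lambda a: a != 23):    # size 9: too large
    try:
        q(pred)
    except PermissionError as err:
        print("refused:", err)
```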
13. Tracker attack
- q(C) is disallowed
- C = C1 and C2; tracker T = C1 and not C2
- q(C) = q(C1) - q(T)
14. Tracker attack
- q(C and D) is disallowed
- C = C1 and C2; tracker T = C1 and not C2
- q(C and D) = q(T or (C and D)) - q(T)
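The two tracker identities can be checked against a database that enforces query-set-size control. The sketch below assumes the tracker is T = C1 and not C2, so that T and C are disjoint and together make up C1. The records, the attribute names, and the `scholarship` property playing the role of D are all hypothetical.

```python
people = [
    {"name": "Ann",  "dept": "CS",   "sex": "F", "scholarship": True},
    {"name": "Bob",  "dept": "CS",   "sex": "M", "scholarship": False},
    {"name": "Carl", "dept": "CS",   "sex": "M", "scholarship": True},
    {"name": "Dan",  "dept": "CS",   "sex": "M", "scholarship": False},
    {"name": "Eve",  "dept": "Math", "sex": "F", "scholarship": False},
    {"name": "Fay",  "dept": "Math", "sex": "F", "scholarship": True},
    {"name": "Gus",  "dept": "Math", "sex": "M", "scholarship": False},
    {"name": "Hal",  "dept": "Math", "sex": "M", "scholarship": True},
]
N, n = len(people), 2               # permitted query set sizes: 2..6


def q(pred):
    """COUNT query under query-set-size control."""
    size = sum(1 for r in people if pred(r))
    if not (n <= size <= N - n):
        raise PermissionError("query set size not permitted")
    return size


def C1(r): return r["dept"] == "CS"
def C2(r): return r["sex"] == "F"
def C(r):  return C1(r) and C2(r)        # matches one person: disallowed
def T(r):  return C1(r) and not C2(r)    # the tracker
def D(r):  return r["scholarship"]


# The direct query q(C) is refused by the size control.
try:
    q(C)
    direct_allowed = True
except PermissionError:
    direct_allowed = False

# q(C) = q(C1) - q(T), using only permitted queries.
count_c = q(C1) - q(T)

# q(C and D) = q(T or (C and D)) - q(T), again with permitted queries only.
count_c_and_d = q(lambda r: T(r) or (C(r) and D(r))) - q(T)

print(direct_allowed, count_c, count_c_and_d)
```

Every query the attacker submits has a query set of size 3 or 4, so the size control never triggers, yet the results show that C identifies one individual and that this individual has property D.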
15. Query overlap attack
- q(John) = q(C1) - q(C2)
- (Figure: overlapping query sets C1 and C2 drawn over the individuals Kathy, Paul, John, Eve, Max, Fred, and Mitch.)
- Protection: query-overlap control
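A minimal sketch of the overlap attack and of one possible overlap control. The names come from the slide, but the salary figures, the membership of C1 and C2, and the `OverlapControlledDB` class with its `max_overlap` parameter are assumptions for illustration.

```python
salary = {"Kathy": 50, "Paul": 60, "John": 70, "Eve": 55,
          "Max": 65, "Fred": 45, "Mitch": 75}   # hypothetical values


def q(names):
    """SUM query over the given query set."""
    return sum(salary[name] for name in names)


# Attack: two query sets that differ only in John.
C1 = {"Kathy", "Paul", "John"}
C2 = {"Kathy", "Paul"}
johns_salary = q(C1) - q(C2)


class OverlapControlledDB:
    """Refuses a query whose set overlaps an answered one too much."""

    def __init__(self, data, max_overlap):
        self.data = data
        self.max_overlap = max_overlap
        self.answered = []          # query sets already answered

    def q(self, names):
        names = frozenset(names)
        for previous in self.answered:
            if len(names & previous) > self.max_overlap:
                raise PermissionError("query overlaps an earlier query")
        self.answered.append(names)
        return sum(self.data[name] for name in names)


db = OverlapControlledDB(salary, max_overlap=1)
first = db.q(C1)                    # answered
try:
    db.q(C2)                        # shares 2 records with C1: refused
    second_refused = False
except PermissionError:
    second_refused = True

print(johns_salary, first, second_refused)
```

The control has to remember every answered query set, which is the main cost of this kind of protection.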
16. Insertion/Deletion Attack
- Observing changes over time
  - q1 = q(C)
  - insert(i)
  - q2 = q(C)
  - q(i) = q2 - q1
- Protection: insertions/deletions performed as pairs
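The before/after difference can be illustrated as follows. The salary figures are hypothetical, and `q` stands for a sum statistic over the records matching C.

```python
def q(rows):
    """SUM statistic over the records currently matching C."""
    return sum(rows)


# Attack: query, wait for a single insertion, query again.
matching = [38000, 42000]           # hypothetical records matching C
q1 = q(matching)
matching.append(45000)              # insert(i)
q2 = q(matching)
leaked = q2 - q1                    # exactly the inserted record's value

# Protection: insertions are applied in pairs, so the difference only
# reveals the sum of two new records, not either value alone.
paired = [38000, 42000]
p1 = q(paired)
paired.extend([45000, 51000])       # two records inserted together
p2 = q(paired)
pair_sum = p2 - p1

print(leaked, pair_sum)
```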
17. Statistical Inference Theory
- Given an unlimited number of statistics and correct statistical answers, all statistical databases can be compromised (Ullman)
18. Inferences in General-Purpose Databases
- Queries based on sensitive data
- Inference via database constraints
- Inferences via updates
19. Queries based on sensitive data
- Sensitive information is used in the selection condition but not returned to the user.
- Example: Salary secret, Name public
  - Π_Name σ_(Salary = 25,000)
- Protection: apply the query to database views at different security levels
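A minimal sketch of this leak and of the view-based protection, assuming a small in-memory relation. The employee names, the helper names `select_names` and `public_view`, and the choice to mask the salary as `None` in the public view are assumptions for illustration.

```python
employees = [                       # hypothetical relation
    {"name": "Black", "salary": 38000},
    {"name": "Green", "salary": 25000},
]


def select_names(rows, condition):
    """Project Name over the rows that satisfy the selection condition."""
    return [r["name"] for r in rows if condition(r)]


def earns_25k(row):
    return row["salary"] == 25000


# Unprotected: the secret Salary never appears in the output, but the
# returned names reveal who earns exactly 25,000.
leak = select_names(employees, earns_25k)


def public_view(rows):
    """View for public users: the secret attribute is masked."""
    return [{"name": r["name"], "salary": None} for r in rows]


# Protected: the same query evaluated against the public view cannot
# use the secret values in its selection condition.
safe = select_names(public_view(employees), earns_25k)

print(leak, safe)
```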
20. Database Constraints
- Integrity constraints
- Database dependencies
- Key integrity
21. Integrity Constraints
- C = A + B
- A = public, C = public, and B = secret
- B can be calculated from A and C, i.e., secret information can be calculated from public data
22. Database Dependencies
- Metadata
- Functional dependencies
- Multi-valued dependencies
- Join dependencies
- etc.
23. Functional Dependency
- FD: A → B, that is, for any two tuples in the relation, if they have the same value for A, they must have the same value for B.
- Example: FD Rank → Salary
  - Secret information: Name and Salary together
  - Query 1: Name and Rank
  - Query 2: Rank and Salary
  - Combine the answers to queries 1 and 2 to reveal Name and Salary together
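The combination step amounts to a join on Rank. The sketch below assumes hypothetical names, ranks, and salaries. Because of the FD Rank → Salary, each rank maps to exactly one salary, so the join yields each person's salary with certainty.

```python
# Answers to two individually permitted queries (hypothetical data).
query1 = [("Black", "Assistant"), ("Red", "Associate"), ("Green", "Assistant")]
query2 = [("Assistant", 38000), ("Associate", 42000)]

# The FD Rank -> Salary guarantees one salary per rank, so the answer
# to query 2 can be treated as a lookup table.
salary_by_rank = dict(query2)

# Joining on Rank reveals the secret association of Name with Salary.
name_salary = {name: salary_by_rank[rank] for name, rank in query1}

print(name_salary)
```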
24. Key integrity
- Every tuple in the relation has a unique key
- Users at different levels see different versions of the database
- Users might attempt to update data that is not visible to them
25. Example
(P = public, S = secret)
Secret View
  Name (key)   Salary      Address
  Black  P     38,000  P   Columbia  S
  Red    S     42,000  S   Irmo      S
Public View
  Name (key)   Salary      Address
  Black  P     38,000  P   Null      P
26. Updates
Public User
  Name (key)   Salary      Address
  Black  P     38,000  P   Null      P
- Update Black's address to Orlando
- Add new tuple (Red, 22,000, Manassas)
- If the system chooses to
  - Refuse the update: covert channel
  - Allow the update
    - Overwrite high data: may be incorrect
    - Create new tuple (polyinstantiation): which data is correct? Violates key constraints
27. Updates
Secret User
  Name (key)   Salary      Address
  Black  P     38,000  P   Columbia  S
  Red    S     42,000  S   Irmo      S
- Update Black's salary to 45,000
- If the system chooses to
  - Refuse the update: denial of service
  - Allow the update
    - Overwrite low data: covert channel
    - Create new tuple (polyinstantiation): which data is correct? Violates key constraints
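A minimal sketch of polyinstantiation, using the Red tuples from the two Updates slides. It simplifies the slides' per-attribute labels to one classification level per tuple, and the class `MLSRelation` with its methods is hypothetical. The effective key becomes (Name, level), so a low insert never collides with a hidden high tuple, at the price of duplicate Name values in the high view.

```python
PUBLIC, SECRET = 0, 1               # assumed ordering: PUBLIC < SECRET


class MLSRelation:
    """Toy multilevel relation with tuple-level classification."""

    def __init__(self):
        self.rows = {}              # (name, level) -> (salary, address)

    def insert(self, user_level, name, salary, address):
        # Polyinstantiation: keying on (name, level) means the insert is
        # never refused and never overwrites a tuple at another level.
        self.rows[(name, user_level)] = (salary, address)

    def view(self, user_level):
        """Tuples visible to a user cleared for user_level."""
        return sorted(
            (name, level, salary, address)
            for (name, level), (salary, address) in self.rows.items()
            if level <= user_level
        )


rel = MLSRelation()
rel.insert(SECRET, "Red", 42000, "Irmo")        # existing secret tuple
rel.insert(PUBLIC, "Red", 22000, "Manassas")    # public user's new tuple

public_rows = rel.view(PUBLIC)
secret_rows = rel.view(SECRET)
names_seen_by_secret = [row[0] for row in secret_rows]

print(public_rows)
print(secret_rows)
```

The public user learns nothing about the secret tuple, but the secret user now sees two tuples named Red and must decide which one is correct, which is the key-constraint problem noted on the slide.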
28. Inference Problem
- No general technique is available to solve the problem
- Need assurance of protection
- Hard to incorporate outside knowledge
29. The Inference Problem
- General Purpose Database
  - Non-confidential data + Metadata ⇒ Undesired Inferences
- Web Enabled Data
  - Non-confidential data + Metadata (data and application semantics) + Computational Power + Connectivity ⇒ Undesired Inferences
30. Correlated Inference
(Figure: an ontology over the concepts Object, waterSource, basin, place, district, address, base, and fort, used to correlate public data about a Base, a Place, and a Water Source.)
31. Inference Control
(Figure labels: Access Control; Organizational Data partitioned into Confidential, Public, and Misinfo; Attacker; blocked paths marked X.)
32. Inference Control
(Figure labels: Organizational Data partitioned into Confidential, Public, and Misinfo.)
- ACCESS and INFERENCE CONTROL POLICY
- Logic-based inference detection
- Exact and partial disclosure
- Data and metadata protection
- Heterogeneous data manipulation
- Metadata discovery
33. Data Mining and Privacy
- Statistical inference
- K-anonymity
- Correlation
- General inference
- Pattern → metadata
- Biased learning
34. Next Class