Inference Problem Privacy Preserving Data Mining - PowerPoint PPT Presentation

Title: CSCE 790 Secure Database Systems Author: FARKAS Last modified by: FARKAS, CSILLA Created Date: 1/17/2001 9:31:44 AM Document presentation format – PowerPoint PPT presentation

Slides: 35
Provided by: FARKAS3
Transcript and Presenter's Notes

1
Inference Problem Privacy Preserving Data Mining
2
Readings and Assignments
  • Required
      • Pfleeger, Chapter 6.5
  • Interesting reading
      • I. Moskowitz, M. H. Kang, "Covert Channels Here to Stay?",
        http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.itd.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf
      • Jajodia, Meadows, "Inference Problems in Multilevel Secure
        Database Management Systems",
        http://www.acsac.org/secshelf/book001/book001.html, essay 24

3
Indirect Information Flow Channels
  • Covert channels
  • Inference channels

4
Communication Channels
  • Overt channel: designed into a system and documented in the
    user's manual
  • Covert channel: not documented. Covert channels may be
    deliberately inserted into a system, but most such channels are
    accidents of the system design.

5
Covert Channel
  • Timing channel: based on system times
  • Storage channel: communication that is not time related
  • Timing and storage channels can be turned into each other

6
Inference Channels
  • Non-sensitive information + Meta-data → Sensitive information

7
Inference Channels
  • Statistical Database Inferences
  • General Purpose Database Inferences

8
Statistical Databases
  • Goal: provide aggregate information about groups of individuals
      • E.g., average grade point of students
  • Security risk: specific information about a particular
    individual
      • E.g., grade point of student John Smith
  • Meta-data
      • Working knowledge about the attributes
      • Supplementary knowledge (not stored in database)

9
Types of Statistics
  • Macro-statistics: collections of related statistics presented in
    2-dimensional tables
  • Micro-statistics: individual data records used for statistics
    after identifying information is removed

Macro-statistics example:
Sex\Year   1997   1998   Sum
Female        4      1     5
Male          6     13    19
Sum          10     14    24

Micro-statistics example:
Sex   Course     GPA   Year
F     CSCE 590   3.5   2000
M     CSCE 590   3.0   2000
F     CSCE 790   4.0   2001
10
Statistical Compromise
  • Exact compromise: find the exact value of an attribute of an
    individual (e.g., John Smith's GPA is 3.8)
  • Partial compromise: find an estimate of an attribute value
    corresponding to an individual (e.g., John Smith's GPA is
    between 3.5 and 4.0)

11
Methods of Attacks and Protection
  • Small/Large Query Set Attack
  • C: characteristic formula that identifies groups of individuals
  • If C identifies a single individual I, i.e., count(C) = 1:
      • Find out existence of a property
          • count(C and D) = 1 means I has property D
          • count(C and D) = 0 means I does not have D
      • OR
      • Find the value of a property
          • sum(C, D) gives the value of D for I
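
The small query set attack above can be sketched in a few lines. This is a minimal illustration, not the course's code: the records, the `honors` attribute, and the helper names `count` and `total` are all hypothetical.

```python
# Hypothetical micro-data; every value here is made up for illustration.
RECORDS = [
    {"sex": "F", "course": "CSCE 590", "year": 2000, "gpa": 3.5, "honors": False},
    {"sex": "M", "course": "CSCE 590", "year": 2000, "gpa": 3.0, "honors": False},
    {"sex": "F", "course": "CSCE 790", "year": 2001, "gpa": 4.0, "honors": True},
    {"sex": "M", "course": "CSCE 790", "year": 2001, "gpa": 3.2, "honors": False},
]


def count(formula):
    """Statistical query: number of records satisfying the formula."""
    return sum(1 for r in RECORDS if formula(r))


def total(formula, attribute):
    """Statistical query: sum of an attribute over the matching records."""
    return sum(r[attribute] for r in RECORDS if formula(r))


def C(r):
    # Characteristic formula built only from attributes the attacker knows.
    return r["sex"] == "F" and r["course"] == "CSCE 790"


def D(r):
    # Property whose existence the attacker wants to test.
    return r["honors"]


identifies_one = count(C) == 1                        # C isolates one individual I
has_property = count(lambda r: C(r) and D(r)) == 1    # I has property D
leaked_gpa = total(C, "gpa")                          # sum over one record = I's GPA
```

Because the query set of C contains a single record, every aggregate over it is a statement about that one individual.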

12
Small/Large Query Set Attack cont.
  • Protection from small/large query set attack: query-set-size
    control
  • A query q(C) is permitted only if N - n ≥ |C| ≥ n, where n ≥ 0
    is a parameter of the database, N is the number of records in
    the database, and |C| is the size of the query set
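
Query-set-size control amounts to a bounds check before answering. The sketch below assumes a count query over an in-memory list; `guarded_count`, `QueryRefused`, and the sample records are hypothetical names chosen for the example.

```python
class QueryRefused(Exception):
    """Raised when a query set is too small or too large."""


def guarded_count(records, formula, n):
    """Answer count(C) only if n <= |C| <= N - n."""
    N = len(records)
    size = sum(1 for r in records if formula(r))
    if not (n <= size <= N - n):
        raise QueryRefused(f"query set size {size} outside [{n}, {N - n}]")
    return size


# Hypothetical database of N = 10 records.
RECORDS = [{"id": i, "sex": "F" if i < 5 else "M"} for i in range(10)]


def is_refused(formula, n=2):
    """Helper for the demo: True if the guarded query is rejected."""
    try:
        guarded_count(RECORDS, formula, n)
        return False
    except QueryRefused:
        return True
```

With n = 2, a query matching one record is refused, and so is a query matching nine records, since the complement of a large query set is a small one.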

13
Tracker attack
  • q(C) is disallowed
  • C = C1 and C2; tracker T = C1 and not C2
  • q(C) = q(C1) - q(T)
[Figure: Venn diagram of C1 and C2, marking C and the tracker T]
14
Tracker attack
  • q(C and D) is disallowed
  • C = C1 and C2; tracker T = C1 and not C2
  • q(C and D) = q(T or (C and D)) - q(T)
[Figure: Venn diagram of C1, C2, and D, marking C, the tracker T, and C and D]
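
The two tracker identities can be checked against a size-controlled count query. This is a sketch under assumptions: the records, the parameter value n = 2, and the choice of C1, C2, and D are invented for the example, and the tracker is taken to be T = C1 and not C2.

```python
N_MIN = 2  # assumed value of the size-control parameter n

# Hypothetical records; salary is in thousands.
PEOPLE = [
    {"name": "Kathy", "sex": "F", "dept": "EE", "salary": 40},
    {"name": "Eve", "sex": "F", "dept": "CS", "salary": 70},
    {"name": "Ann", "sex": "F", "dept": "ME", "salary": 45},
    {"name": "Sue", "sex": "F", "dept": "EE", "salary": 50},
    {"name": "Paul", "sex": "M", "dept": "CS", "salary": 65},
    {"name": "John", "sex": "M", "dept": "CS", "salary": 55},
    {"name": "Max", "sex": "M", "dept": "EE", "salary": 60},
    {"name": "Fred", "sex": "M", "dept": "ME", "salary": 42},
]


def q(formula):
    """Count query guarded by query-set-size control."""
    size = sum(1 for r in PEOPLE if formula(r))
    if not (N_MIN <= size <= len(PEOPLE) - N_MIN):
        raise PermissionError("query set size not permitted")
    return size


def C1(r):
    return r["sex"] == "F"


def C2(r):
    return r["dept"] == "CS"


def C(r):
    return C1(r) and C2(r)        # identifies a single person


def T(r):
    return C1(r) and not C2(r)    # the tracker


def D(r):
    return r["salary"] >= 60


try:
    q(C)
    direct_refused = False
except PermissionError:
    direct_refused = True

# Every query below has a permitted size, yet together they answer q(C).
count_c = q(C1) - q(T)
count_c_and_d = q(lambda r: T(r) or (C(r) and D(r))) - q(T)
```

The identities hold because T and C are disjoint and together make up C1, so padding a forbidden query set with T and subtracting q(T) afterwards recovers the forbidden answer.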
15
Query overlap attack
  • q(John) = q(C1) - q(C2)
  • Protection: query-overlap control
[Figure: two overlapping query sets C1 and C2 over the individuals Kathy, Paul, John, Eve, Max, Fred, and Mitch]
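
A sketch of the overlap attack and of one possible query-overlap control follows. The salary values, the membership of C1 and C2, and the `OverlapControl` class are assumptions made for illustration; the control shown simply refuses a query whose query set shares more than `max_overlap` records with any previously answered one.

```python
# Hypothetical salaries (in thousands) for the individuals on the slide.
SALARY = {"Kathy": 52, "Paul": 48, "John": 61, "Eve": 55,
          "Max": 47, "Fred": 50, "Mitch": 58}


def q_sum(names):
    """Unprotected sum query over a set of individuals."""
    return sum(SALARY[name] for name in names)


# Assumed query sets: C2 is C1 without John.
C1 = {"Kathy", "Paul", "John", "Eve"}
C2 = {"Kathy", "Paul", "Eve"}
john_salary = q_sum(C1) - q_sum(C2)


class OverlapControl:
    """Refuse queries that overlap an earlier query set too much."""

    def __init__(self, max_overlap):
        self.max_overlap = max_overlap
        self.answered = []  # query sets answered so far

    def q_sum(self, names):
        names = frozenset(names)
        for previous in self.answered:
            if len(names & previous) > self.max_overlap:
                raise PermissionError("query overlaps an earlier query set")
        self.answered.append(names)
        return sum(SALARY[name] for name in names)


guard = OverlapControl(max_overlap=1)
first_answer = guard.q_sum(C1)
try:
    guard.q_sum(C2)
    second_refused = False
except PermissionError:
    second_refused = True
```

The price of this control is that the database has to remember every answered query set, and legitimate queries that happen to overlap are refused as well.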
16
Insertion/Deletion Attack
  • Observing changes over time
  • q1 = q(C)
  • insert(i)
  • q2 = q(C)
  • q(i) = q2 - q1
  • Protection: insertions/deletions performed in pairs
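
The attack and the pairing defence can be sketched with a sum query over a dictionary. Names and salary values are hypothetical, and `q_sum` stands in for any statistic that the attacker may ask before and after the update.

```python
def q_sum(db):
    """Statistical query: total salary over the whole query set."""
    return sum(db.values())


# Attack: query, wait for a single insertion, query again.
db = {"Kathy": 52, "Paul": 48, "Eve": 55}
q1 = q_sum(db)
db["John"] = 61            # insert(i)
q2 = q_sum(db)
leaked = q2 - q1           # q(i) = q2 - q1, John's exact salary

# Protection: updates are applied in pairs, so the difference between
# two snapshots only reveals a combined value for two individuals.
paired_db = {"Kathy": 52, "Paul": 48, "Eve": 55}
before = q_sum(paired_db)
paired_db.update({"John": 61, "Max": 47})
after = q_sum(paired_db)
pair_total = after - before
```

In the paired case the attacker learns only that the two new salaries add up to `pair_total`, which is at most a partial compromise.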

17
Statistical Inference Theory
  • Given an unlimited number of statistics and correct statistical
    answers, all statistical databases can be compromised (Ullman)

18
Inferences in General-Purpose Databases
  • Queries based on sensitive data
  • Inference via database constraints
  • Inferences via updates

19
Queries based on sensitive data
  • Sensitive information is used in the selection condition but not
    returned to the user.
  • Example: Salary is secret, Name is public
      • π_Name σ_(Salary = 25,000)
  • Protection: apply the query to database views at different
    security levels
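
The example and its protection can be tried with Python's built-in `sqlite3` module. This is a sketch: the table name `employee`, the view name `public_employee`, and the rows are hypothetical, and a plain view stands in for a real multilevel-secure mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Black", 38000), ("Red", 25000), ("Green", 25000)],
)

# Leaky evaluation: only names are returned, but the secret salary
# appears in the selection condition, so each answer reveals a salary.
leaky = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE salary = 25000 ORDER BY name"
    )
]

# Protection: a public user's query runs against a view that contains
# only public attributes, so the condition on salary cannot be evaluated.
conn.execute("CREATE VIEW public_employee AS SELECT name FROM employee")
try:
    conn.execute("SELECT name FROM public_employee WHERE salary = 25000")
    blocked = False
except sqlite3.OperationalError:
    blocked = True
```

Against the base table the query returns the names of everyone earning 25,000; against the public view it fails because the view has no salary column.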

20
Database Constraints
  • Integrity constraints
  • Database dependencies
  • Key integrity

21
Integrity Constraints
  • C = A + B
  • A is public, C is public, and B is secret
  • B can be calculated from A and C, i.e., secret information can be
    calculated from public data
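
As a small worked instance, assuming the constraint is C = A + B, anyone who can read A and C can recompute B. The rows and their values are hypothetical.

```python
# Public projection of a hypothetical relation: B is withheld.
public_rows = [
    {"id": 1, "A": 1200, "C": 5000},
    {"id": 2, "A": 800, "C": 4300},
]

# The integrity constraint C = A + B is metadata known to the user,
# so the secret attribute follows as B = C - A.
inferred_B = {row["id"]: row["C"] - row["A"] for row in public_rows}
```

Classifying B as secret therefore gives no protection unless A or C is classified as well.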

22
Database Dependencies
  • Metadata
  • Functional dependencies
  • Multi-valued dependencies
  • Join dependencies
  • etc.

23
Functional Dependency
  • FD: A → B, that is, for any two tuples in the relation, if they
    have the same value for A, they must have the same value for B.
  • Example FD: Rank → Salary
  • Secret information: Name and Salary together
  • Query 1: Name and Rank
  • Query 2: Rank and Salary
  • Combine the answers to Query 1 and Query 2 to reveal Name and
    Salary together
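
The combination step is an ordinary join on Rank. In this sketch the two query answers are hard-coded; the rank labels and the pairing of names with ranks are hypothetical.

```python
# Answers to two individually permitted queries (hypothetical data).
query1 = [("Black", "Assistant"), ("Red", "Full")]      # Name, Rank
query2 = [("Assistant", 38000), ("Full", 42000)]        # Rank, Salary

# Rank -> Salary is a functional dependency, so each rank maps to
# exactly one salary and the join is unambiguous.
salary_of_rank = dict(query2)
name_salary = {name: salary_of_rank[rank] for name, rank in query1}
```

Neither query returns Name and Salary together, yet the functional dependency makes the joined result exact rather than an estimate.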

24
Key integrity
  • Every tuple in the relation has a unique key
  • Users at different levels see different versions of the database
  • Users might attempt to update data that is not visible to them

25
Example
Secret View (P = public, S = secret)
Name (key)   Salary       Address
Black (P)    38,000 (P)   Columbia (S)
Red (S)      42,000 (S)   Irmo (S)

Public View
Name (key)   Salary       Address
Black (P)    38,000 (P)   Null (P)
26
Updates
Public User
Name (key)   Salary       Address
Black (P)    38,000 (P)   Null (P)
  • Update Black's address to Orlando
  • Add new tuple (Red, 22,000, Manassas)
  • If the update is refused: covert channel
  • If the update is allowed:
      • Overwrite high data: may be incorrect
      • Create new tuple (polyinstantiation): which data is correct?
        Violates key constraints

27
Updates
Secret User
Name (key)   Salary       Address
Black (P)    38,000 (P)   Columbia (S)
Red (S)      42,000 (S)   Irmo (S)
  • Update Black's salary to 45,000
  • If the update is refused: denial of service
  • If the update is allowed:
      • Overwrite low data: covert channel
      • Create new tuple (polyinstantiation): which data is correct?
        Violates key constraints
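
Polyinstantiation can be sketched by making the security label part of the stored key. This is a simplification made for the example: labels are attached to whole tuples rather than to individual elements as on the slides, Black's secret address is left out, and the names `relation`, `view`, and `insert` are hypothetical.

```python
LEVEL = {"P": 0, "S": 1}  # public is dominated by secret

# (name, tuple label) -> (salary, address)
relation = {
    ("Black", "P"): (38000, None),
    ("Red", "S"): (42000, "Irmo"),
}


def view(user_level):
    """Tuples whose label is dominated by the user's level."""
    return sorted(
        (name, label, salary, address)
        for (name, label), (salary, address) in relation.items()
        if LEVEL[label] <= LEVEL[user_level]
    )


def insert(user_level, name, salary, address):
    """Never refused: a tuple with the same name at another level is kept."""
    relation[(name, user_level)] = (salary, address)


# The public user cannot see the secret Red tuple and adds a new one.
insert("P", "Red", 22000, "Manassas")

public_view = view("P")
secret_view = view("S")
red_versions = [t for t in secret_view if t[0] == "Red"]
```

The insert succeeds without revealing that a secret Red exists, so no covert channel opens, but the secret view now holds two tuples for the key Red and the user has to decide which one is correct.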

28
Inference Problem
  • No general technique is available to solve the
    problem
  • Need assurance of protection
  • Hard to incorporate outside knowledge

29
The Inference Problem
  • General Purpose Database
      • Non-confidential data + Metadata → Undesired Inferences
  • Web Enabled Data
      • Non-confidential data + Metadata (data and application
        semantics) + Computational Power + Connectivity → Undesired
        Inferences

30
Correlated Inference
[Figure: concept hierarchy relating Object, waterSource, basin, place, district, address, base, and fort; diagram labels: Base, Place, Water source, Public]
31
Inference Control
[Figure: access control over organizational data; labels: Confidential, Public, Misinfo, Attacker]
32
Inference Control
[Figure: organizational data; labels: Confidential, Public, Misinfo]
  • ACCESS and INFERENCE CONTROL POLICY
  • Logic-based inference detection
  • Exact and partial disclosure
  • Data and metadata protection
  • Heterogeneous data manipulation
  • Metadata discovery

33
Data Mining and Privacy
  • Statistical inference
  • K-anonymity
  • Correlation
  • General inference
  • Pattern → metadata
  • Biased learning

34
Next Class
  • Midterm exam review