Private Matching - PowerPoint PPT Presentation

About This Presentation

Title:

Private Matching

Description:

Technological changes erode privacy: ubiquitous computing, cheap storage. ... (credit card purchases, magazine subscriptions, bank deposits, flights) ... – PowerPoint PPT presentation

Slides: 38

Provided by: Ben5153

Transcript and Presenter's Notes

1
Privacy Preserving Data Mining, Lecture 1: Motivating Privacy Research, Introducing Crypto
Benny Pinkas, HP Labs, Israel
2
Course structure

Lecture 1
Introduction to privacy
Introduction to cryptography, in particular to rigorous cryptographic analysis:
Definitions
Proofs of security
Lecture 2
Cryptographic tools for privacy-preserving data mining
Lecture 3
Non-cryptographic tools for privacy-preserving data mining, in particular answer perturbation

3
Privacy-Preserving Data Mining

Allow multiple data holders to collaborate in order to compute important information, while protecting the privacy of other information. Examples:
Security-related information
Public health information
Marketing information
Advantages of privacy protection:
Protection of personal information
Protection of proprietary or sensitive information
Enabling collaboration between different data owners (they may be more willing or able to collaborate if they need not reveal their information)
Compliance with the law

4
Privacy Preserving Data Mining

Two papers appeared in 2000:
Privacy preserving data mining, Agrawal and Srikant, SIGMOD 2000 (statistical approach)
Privacy preserving data mining, Lindell and Pinkas, Crypto 2000 (cryptographic approach)
Why privacy now?
Technological changes erode privacy: ubiquitous computing, cheap storage.
Public awareness: health coverage, employment, personal relationships.
Historical changes: small towns vs. cities vs. a connected society.
Privacy is a real problem that needs to be solved.

5
Some data privacy cases: hospital data

Hospital data contains:
Identifying information: name, ID, address
General information: age, marital status
Medical information
Billing information
Database access issues:
Your doctor should get all the information that is required to take care of you.
Emergency rooms should get all the medical information that is required to take care of whoever comes in.
The billing department should only get information relevant to billing.
Problem: how to stop employees from getting information about family, neighbors, celebrities?
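One common way to enforce the first three access rules is a per-role field filter. A minimal Python sketch, assuming a hypothetical record layout and a hypothetical role-to-fields policy (neither comes from the slides):

```python
# Hypothetical patient record; the field names are illustrative only.
RECORD = {
    "name": "Jane Doe",
    "id": "123456789",
    "address": "1 Main St",
    "age": 42,
    "marital_status": "married",
    "diagnosis": "influenza",
    "medications": ["oseltamivir"],
    "insurer": "ACME Health",
    "amount_due": 250.0,
}

# Assumed policy: which fields each role may read.
POLICY = {
    "doctor": set(RECORD),  # everything needed for treatment
    "emergency_room": {"name", "age", "diagnosis", "medications"},
    "billing": {"name", "id", "address", "insurer", "amount_due"},
}


def view(record: dict, role: str) -> dict:
    """Return only the fields that `role` is allowed to read."""
    allowed = POLICY.get(role, set())  # unknown roles see nothing
    return {field: value for field, value in record.items() if field in allowed}


billing_view = view(RECORD, "billing")
print(sorted(billing_view))  # ['address', 'amount_due', 'id', 'insurer', 'name']
```

A filter like this does not address the final problem on the slide: a billing clerk with legitimate access to the billing view can still look up a neighbor's record.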

6
Some data privacy cases: medical research

Medical research:
Trying to learn patterns in the data, in aggregate form.
Problem: how to enable learning from aggregate data without revealing personal medical information?
Hiding names is not enough, since there are many ways to uniquely identify a person.
A single hospital or medical researcher might not have enough data.
How can different organizations share research data without revealing personal data?

7
Public Data

Many public records are available in electronic form: birth records, property records, voter registration.
Your information serves as an error-correcting code of your identity.
Latanya Sweeney:
Date of birth uniquely identifies 12% of the population of Cambridge, MA.
Date of birth + gender: 29%
Date of birth + gender + (9-digit) ZIP code: 95%
Sweeney was therefore able to get her medical information from an anonymized database.
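The percentages above measure how many records are singled out by a combination of attributes. A minimal Python sketch of that computation, assuming a small hypothetical table of (date of birth, gender, ZIP) tuples; the records are invented for illustration, and the numbers the sketch prints say nothing about Cambridge:

```python
from collections import Counter

# Hypothetical toy records: (date_of_birth, gender, zip9). Not real data.
RECORDS = [
    ("1970-01-02", "F", "02138-1234"),
    ("1970-01-02", "M", "02138-1234"),
    ("1970-01-02", "F", "02139-5678"),
    ("1985-07-14", "M", "02139-5678"),
    ("1985-07-14", "M", "02140-0001"),
    ("1990-03-30", "F", "02140-0001"),
]


def unique_fraction(records, columns):
    """Fraction of records whose values on `columns` match no other record."""
    keys = [tuple(record[c] for c in columns) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)


for columns, label in [((0,), "DOB"), ((0, 1), "DOB+gender"), ((0, 1, 2), "DOB+gender+ZIP")]:
    print(f"{label:15s} {unique_fraction(RECORDS, columns):.2f}")
```

On this toy table the unique fraction rises from 0.17 (date of birth alone) to 0.33 and then 1.00 as attributes are combined, the same effect Sweeney measured on real data.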

8
Census data

A trusted party (the census bureau) collects information about individuals.
Collected data:
Explicitly identifying data (names, addresses, etc.)
Implicitly identifying data (a combination of several attributes)
Private data
The data is collected to help decision making.
Partial or aggregate data should therefore be made public.

9
Total Information Awareness (TIA)

Collects information about transactions (credit card purchases, magazine subscriptions, bank deposits, flights).
Early detection of terrorist activity:
Checking out a chemistry book from the library, buying something at a hardware store and something at a pharmacy.
Early detection of epidemic outbreaks:
Early symptoms of anthrax are similar to the flu.
Check non-traditional data sources: grocery and pharmacy data, school attendance records, etc.
Such systems are being developed and used.
Could the collection of data be done in a privacy-preserving manner (without learning about individuals)?

10
Basic Scenarios

Single (centralized) database, e.g., census data:
This is often a simple abstraction of a more complicated scenario, so we had better solve this one.
Need to collect data and present it in a privacy-preserving way.
Published data (e.g., on a CD):
A trusted party collects data and then publishes a sanitized version.
Users can do any computation they wish with the sanitized data, for example, statistical tabulations.

11
Basic Scenarios

Multi-database scenarios:
Two or more parties with private data want to cooperate.
Horizontally split: each party has a large database. The databases have the same attributes but are about different subjects. For example, the parties are banks, each of which has information about its customers.
Vertically split: each party has some information about the same set of subjects. For example, the participating parties are government agencies, each with some data about every citizen.

[Figure: horizontal split (bank 1, bank 2) vs. vertical split (bank, houses, taxes) over users u1 ... un]
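The two partitionings can be illustrated on a single table. A minimal Python sketch, assuming a hypothetical table of four users and three attributes; the party names mirror the figure labels (bank 1, bank 2; bank, houses, taxes), while the helper names are hypothetical:

```python
# Hypothetical toy table: one row per user, one column per attribute.
TABLE = {
    "u1": {"balance": 1200, "owns_house": True, "tax_paid": 300},
    "u2": {"balance": 50, "owns_house": False, "tax_paid": 0},
    "u3": {"balance": 9000, "owns_house": True, "tax_paid": 2100},
    "u4": {"balance": 400, "owns_house": False, "tax_paid": 90},
}


def split_horizontally(table, user_groups):
    """Each party gets complete rows (all attributes) for its own users."""
    return [{user: dict(table[user]) for user in users} for users in user_groups]


def split_vertically(table, attribute_groups):
    """Each party gets some of the attributes, for every user."""
    return [
        {user: {attr: row[attr] for attr in attrs} for user, row in table.items()}
        for attrs in attribute_groups
    ]


bank1, bank2 = split_horizontally(TABLE, [["u1", "u2"], ["u3", "u4"]])
bank, houses, taxes = split_vertically(TABLE, [["balance"], ["owns_house"], ["tax_paid"]])

print(sorted(bank1), sorted(bank2))  # ['u1', 'u2'] ['u3', 'u4']: different users
print(bank["u1"], taxes["u1"])       # {'balance': 1200} {'tax_paid': 300}: same user
```

In the horizontal case a joint computation must combine rows held by different parties; in the vertical case it must join columns about the same user without revealing them.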
12
Issues and Tools

Best privacy can be achieved by not giving any data, but...
Privacy tools: cryptography [LP00]
Encryption: data is hidden unless you have the decryption key. However, we also want to use the data.
Secure function evaluation: two or more parties with private inputs can compute any function they wish without revealing anything else.
Strong theory. Starting to be relevant to real applications.
Non-cryptographic tools [AS00]
Query restriction: prevent certain queries from being answered.
Data/input/output perturbation: add errors to inputs to hide personal data while keeping aggregates accurate (randomization, rounding, data swapping).
Can these be understood as well as we understand crypto? Do they provide the same level of security as crypto?
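Input perturbation by randomization can be sketched in a few lines. This is a minimal Python illustration of simple additive noise, assuming hypothetical age data and an arbitrarily chosen noise range; it is not claimed to be the specific scheme of [AS00]:

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so runs are reproducible

# Hypothetical private values, e.g. the ages of 10,000 individuals.
ages = [rng.randint(18, 90) for _ in range(10_000)]

# Randomization: add independent, zero-mean noise drawn uniformly from [-NOISE, NOISE].
NOISE = 20
perturbed = [age + rng.uniform(-NOISE, NOISE) for age in ages]

true_mean = statistics.mean(ages)
noisy_mean = statistics.mean(perturbed)

# Each individual value may be off by up to NOISE years, but the noise
# averages out, so the aggregate (the mean) stays close to the true value.
print(f"true mean {true_mean:.2f}, mean of perturbed data {noisy_mean:.2f}")
print(f"first record: true {ages[0]}, perturbed {perturbed[0]:.1f}")
```

How much noise to add, and whether the noisy values still leak something about individuals, are exactly the questions the slide raises about understanding these tools as well as crypto.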

13
Introduction to Cryptography
14
Why learn/use crypto to solve privacy issues?

Why are we referring to crypto?
Cryptography is one of the tools we can use for preserving privacy.
A mature research area, with many useful results and tools.
It can shape our thinking: how is security defined in cryptography? How should we define privacy?

15
What is Cryptography?
Traditionally: how to maintain secrecy in communication.
Alice and Bob talk while Eve tries to listen.
[Figure: Alice and Bob communicating while Eve listens]
16
History of Cryptography

Very ancient occupation
Up to the mid-70s: mostly classified military work
Exceptions: Shannon, Turing
Since then: explosive growth
Commercial applications
Scientific work: tight relationship with computational complexity theory
Major works: Diffie-Hellman; Rivest, Shamir and Adleman (RSA)
Recently: more involved models for more diverse tasks.
Scope: how to maintain the secrecy, integrity and functionality of computer and communication systems.

17
Relation to computational hardness

Cryptography uses problems that are infeasible to solve.
It uses the intractability of some problems in order to construct secure systems.
Feasible: computable in probabilistic polynomial time (PPT).
Infeasible: not solvable by any probabilistic polynomial time algorithm.
Usually average-case hardness is needed.
For example, the discrete log problem.

18
The Discrete Log Problem

Let G be a group and g an element in G.
Given $y \in G$, let x be the minimal non-negative integer satisfying the equation $y = g^x$.
x is called the discrete log of y to base g.
Example: $y = g^x \bmod p$ in the multiplicative group of $\mathbb{Z}_p$ (p is prime). (For example, $p = 7$, $g = 3$, $y = 4 \Rightarrow x = 4$, since $3^4 = 81 \equiv 4 \pmod{7}$.)
In general, it is easy to exponentiate (using repeated squaring and the binary representation of x).
Computing the discrete log is believed to be hard in $\mathbb{Z}_p$ if p is large. (E.g., p is a prime, $|p| > 768$ bits, $p = 2q + 1$ and q is also a prime.)
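The asymmetry between the two directions can be made concrete. In this minimal Python sketch, `mod_exp` implements exponentiation by repeated squaring, and `discrete_log_bruteforce` recovers x by trying every exponent; both names are hypothetical. Brute force is only the naive attack: faster algorithms exist, but no known classical algorithm runs in time polynomial in the bit length of p, which is why the toy modulus 7 is easy and a 768-bit p is not.

```python
def mod_exp(g: int, x: int, p: int) -> int:
    """Compute g**x mod p by repeated squaring over the binary digits of x."""
    result = 1
    base = g % p
    while x > 0:
        if x & 1:  # current binary digit of x is 1
            result = (result * base) % p
        base = (base * base) % p  # square for the next binary digit
        x >>= 1
    return result


def discrete_log_bruteforce(y: int, g: int, p: int):
    """Smallest x >= 0 with g**x = y (mod p), found by trying every exponent.

    Takes up to p - 1 steps, i.e. exponential in the bit length of p.
    """
    value = 1
    for x in range(p - 1):
        if value == y % p:
            return x
        value = (value * g) % p
    return None  # y is not a power of g mod p


print(mod_exp(3, 4, 7))                  # 4, since 81 mod 7 = 4
print(discrete_log_bruteforce(4, 3, 7))  # 4, the slide's example
```

Python's built-in three-argument `pow(g, x, p)` performs the same modular exponentiation and would replace `mod_exp` in practice.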

19
Encryption

Alice wants to send a message $m \in \{0,1\}^n$ to Bob.
The set-up phase is secret.
Symmetric encryption: Alice and Bob share a secret key k.
They want to prevent Eve from learning anything about the message.

[Figure: Alice sends $E_k(m)$ to Bob; both hold the key k, and Eve sees only $E_k(m)$]
20
Public key encryption

Alice generates a private/public key pair (SK,PK)
Only Alice knows the secret key SK
Everyone (even Eve) knows the public key PK, and
can encrypt messages to Alice
Only Alice can decrypt (using SK)

[Figure: Bob and Charlie, who know PK, send $E_{PK}(m)$ to Alice, who holds SK; Eve also knows PK]
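Public key encryption can be illustrated with textbook ElGamal, whose security rests on the discrete log problem. ElGamal is a swapped-in example, not a scheme named on this slide, and this minimal Python sketch uses hypothetical toy parameters ($p = 23 = 2 \cdot 11 + 1$) that offer no security; it assumes Python 3.8+ for `pow(x, -1, p)`. Alice runs `keygen`; anyone holding PK can run `encrypt`; only the holder of SK can run `decrypt`.

```python
import secrets

# Toy parameters, far too small to be secure: P = 2Q + 1 with Q prime,
# and G = 4 generates the subgroup of order Q = 11 in Z_23^*.
P, Q, G = 23, 11, 4


def keygen():
    """Alice: secret key SK is a random exponent, public key PK = G^SK mod P."""
    sk = secrets.randbelow(Q - 1) + 1  # uniform in [1, Q-1]
    pk = pow(G, sk, P)
    return sk, pk


def encrypt(pk: int, m: int):
    """Anyone holding PK (Bob, Charlie, even Eve) can encrypt m in Z_P^*."""
    r = secrets.randbelow(Q - 1) + 1  # fresh randomness for every message
    return pow(G, r, P), (m * pow(pk, r, P)) % P


def decrypt(sk: int, ciphertext):
    """Only the holder of SK can remove the mask PK^r = c1^SK."""
    c1, c2 = ciphertext
    mask = pow(c1, sk, P)
    return (c2 * pow(mask, -1, P)) % P  # modular inverse needs Python 3.8+


sk, pk = keygen()
m = 9
print(decrypt(sk, encrypt(pk, m)) == m)  # True
```

Note that the encryption is randomized: encrypting the same m twice usually yields different ciphertexts, which the security definitions on the following slides require. Textbook ElGamal also needs messages encoded as subgroup elements to meet those definitions, one of several details this toy omits.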
21
Rigorous Specification of Security

To define the security of a system we must specify:
What constitutes a failure of the system
The power of the adversary:
computational
access to the system
what it means to break the system.

22
What does "learn" mean?

Even if Eve has some prior knowledge of m, she should not gain any advantage in:
the probability of guessing m, the probability of guessing whether m is $m_0$ or $m_1$, the probability of computing any other function f of m, or even computing m.
Ideally, the message sent (the ciphertext) is independent of the message m.
This implies all of the above.
Achievable: the one-time pad (symmetric encryption)
Let $r \in_R \{0,1\}^n$ be the shared key.
Let $m \in \{0,1\}^n$.
To encrypt m, send $z = r \oplus m$.
To decrypt z, compute $m = z \oplus r$.
Shannon: achievable only if the entropy of the shared secret is at least as large as that of m.
Therefore, one must use a long key.
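The one-time pad described above fits in a few lines. A minimal Python sketch over byte strings rather than bit strings, using the standard `secrets` module for the uniform key; the function names are hypothetical:

```python
import secrets


def keygen(n_bytes: int) -> bytes:
    """Shared key r, chosen uniformly at random and as long as the message."""
    return secrets.token_bytes(n_bytes)


def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("one-time pad key must be exactly as long as the message")
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(r: bytes, m: bytes) -> bytes:
    return xor(r, m)  # z = r XOR m


def decrypt(r: bytes, z: bytes) -> bytes:
    return xor(z, r)  # m = z XOR r


m = b"attack at dawn"
r = keygen(len(m))
z = encrypt(r, m)
print(decrypt(r, z))  # b'attack at dawn'
```

Independence follows because for any ciphertext z and any candidate message m' of the same length, exactly one key, $r' = z \oplus m'$, encrypts m' to z; with r uniform, every m' is equally consistent with z. The same algebra shows why the key cannot be reused: encrypting $m_0$ and $m_1$ under the same r lets Eve compute $m_0 \oplus m_1$.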

23
Defining security

The power of the adversary:
Computational: probabilistic polynomial time machine (PPTM)
Access to the system: e.g., can it change messages?
Passive adversary, (adaptive) chosen plaintext attack, chosen ciphertext attack
What constitutes a failure of the system?
Defining failure as recovering the plaintext from the ciphertext is not enough:
It allows for the leakage of partial information.
In general, it is hard to answer which partial information may or should not be leaked. Application dependent.
How would partial information that the adversary already holds be combined with what he learns, and affect privacy?
Better: prevent learning anything about an encrypted message.
There are two common, equivalent definitions.

24
Security of Encryption: Definition 1, Indistinguishability of Encryptions

Adversary A chooses any $X_0, X_1 \in \{0,1\}^n$.
It receives an encryption of $X_b$ for $b \in_R \{0,1\}$.
It has to decide whether $b = 0$ or $b = 1$.
For every PPTM A choosing a pair $X_0, X_1 \in \{0,1\}^n$:
$\left| \Pr[A(E(X_0)) = 1] - \Pr[A(E(X_1)) = 1] \right| \le \mathrm{neg}(n)$
(The probability is over the choice of keys, the randomization in the encryption and A's coins.)
Note that a proof of security must be rigorous.
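The indistinguishability game can be run empirically against one fixed adversary. A minimal Python sketch, assuming a hypothetical adversary that outputs 1 exactly when the ciphertext equals $X_1$, and comparing a one-time pad with a deliberately broken "identity" encryption. Running one adversary can expose an insecure scheme but never proves security, since the definition quantifies over every PPTM A; that is why the proof must be rigorous.

```python
import secrets

N = 16  # message length in bytes (toy value)


def otp_encrypt(m: bytes) -> bytes:
    """One-time pad with a fresh uniform key for every message."""
    key = secrets.token_bytes(len(m))
    return bytes(a ^ b for a, b in zip(key, m))


def identity_encrypt(m: bytes) -> bytes:
    """A deliberately broken 'encryption' that outputs the plaintext."""
    return m


def ind_advantage(encrypt, trials: int = 2000) -> float:
    """Estimate |Pr[A(E(X0)) = 1] - Pr[A(E(X1)) = 1]| for one fixed adversary A."""
    x0, x1 = bytes(N), bytes([0xFF]) * N  # A's chosen pair of messages

    def adversary(c: bytes) -> int:  # A outputs 1 iff c equals X1
        return 1 if c == x1 else 0

    hits0 = sum(adversary(encrypt(x0)) for _ in range(trials))
    hits1 = sum(adversary(encrypt(x1)) for _ in range(trials))
    return abs(hits0 - hits1) / trials


print(ind_advantage(identity_encrypt))  # 1.0: this adversary always wins
print(ind_advantage(otp_encrypt))       # about 0.0
```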

25
Computational Indistinguishability

Definition: two sequences of distributions $\{D_n\}$ and $\{D'_n\}$ on $\{0,1\}^n$ are computationally indistinguishable if for every probabilistic polynomial time adversary A that receives input $y \in \{0,1\}^n$ and tries to decide whether y was sampled from $D_n$ or $D'_n$, for every polynomial p(n) and all sufficiently large n:
$\left| \Pr_{y \leftarrow D_n}[A(y) = 0] - \Pr_{y \leftarrow D'_n}[A(y) = 0] \right| < 1/p(n)$
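The quantity in the definition is a distinguishing advantage, which can be estimated by sampling. A minimal Python sketch, assuming two hypothetical distributions on n-bit strings (fair bits, and bits that are 1 with probability 0.6) and a hypothetical threshold distinguisher. These two sequences are a counterexample: they are not computationally indistinguishable, because the estimated advantage approaches 1 as n grows instead of falling below every $1/p(n)$.

```python
import random


def sample_uniform(n: int, rng: random.Random) -> list:
    """D_n: n independent fair bits."""
    return [rng.getrandbits(1) for _ in range(n)]


def sample_biased(n: int, rng: random.Random, p_one: float = 0.6) -> list:
    """D'_n (hypothetical): each bit is 1 with probability p_one."""
    return [1 if rng.random() < p_one else 0 for _ in range(n)]


def distinguisher(y: list) -> int:
    """Outputs 0 ('looks like D_n') unless more than 55% of the bits are 1."""
    return 1 if sum(y) > 0.55 * len(y) else 0


def estimated_advantage(n: int, trials: int = 500, seed: int = 1) -> float:
    """Estimate |Pr_{y<-D_n}[A(y) = 0] - Pr_{y<-D'_n}[A(y) = 0]|."""
    rng = random.Random(seed)
    zeros_d = sum(distinguisher(sample_uniform(n, rng)) == 0 for _ in range(trials))
    zeros_d_prime = sum(distinguisher(sample_biased(n, rng)) == 0 for _ in range(trials))
    return abs(zeros_d - zeros_d_prime) / trials


for n in (10, 100, 1000):
    print(n, estimated_advantage(n))
```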

26
Security of Encryption: Definition 2, Semantic Security

Simulation: whatever adversary A can compute given an encryption of $X \in \{0,1\}^n$, so can a simulator S that does not get to see the encryption of X.
A selects a distribution $D_n$ on $\{0,1\}^n$ and a relation R(X, Y) computable in PPT (e.g., R(X, Y) = 1 iff Y is the last bit of X).
$X \in_R D_n$ is sampled.
Given E(X), A outputs Y, trying to satisfy R(X, Y).
The simulator S does the same without access to E(X).
The simulation is successful if A and S have the same success probability (up to a negligible difference).
Successful simulation implies semantic security.
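The simulation experiment can be run for one concrete choice of A, $D_n$, and R. A minimal Python sketch, assuming $D_n$ uniform over 8-bit strings, the slide's example relation (Y is the last bit of X), an adversary that reads the last bit of the ciphertext, and a simulator that guesses; the two encryption functions are hypothetical. Semantic security requires a successful simulator for every A, $D_n$ and R, so this only illustrates the definition:

```python
import secrets

N_BITS = 8  # toy message length


def sample_x() -> int:
    """D_n: uniform over n-bit strings (an assumed choice)."""
    return secrets.randbits(N_BITS)


def relation(x: int, y: int) -> bool:
    """R(X, Y) = 1 iff Y is the last bit of X."""
    return y == (x & 1)


def otp(x: int) -> int:
    return x ^ secrets.randbits(N_BITS)  # one-time pad with a fresh key


def leaky(x: int) -> int:
    return x  # broken 'encryption': the ciphertext is the plaintext


def adversary(ciphertext: int) -> int:
    return ciphertext & 1  # A: read the last bit of E(X)


def simulator() -> int:
    return secrets.randbits(1)  # S: no ciphertext, so it can only guess


def success_rates(encrypt, trials: int = 4000):
    xs = [sample_x() for _ in range(trials)]
    a_wins = sum(relation(x, adversary(encrypt(x))) for x in xs)
    s_wins = sum(relation(x, simulator()) for x in xs)
    return a_wins / trials, s_wins / trials


print("leaky:", success_rates(leaky))  # A about 1.0, S about 0.5: simulation fails
print("otp:  ", success_rates(otp))    # both about 0.5: simulation succeeds
```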

27
Security of Encryption (2): Semantic Security

More formally:
For every PPTM A there is a PPTM S so that for all PPTM relations R, for $X \in_R D_n$,
$\left| \Pr[R(X, A(E(X)))] - \Pr[R(X, S())] \right|$
is negligible.
In other words, the outputs of A and S are indistinguishable even for a test that is aware of X.

28
Which is the Right Definition?

Semantic security seems to convey that the message is protected.
But it is usually easier to prove indistinguishability of encryptions.
We would like to argue that the two definitions are equivalent.
We must define the attack: chosen plaintext attack.
The adversary can obtain the encryption of any message it chooses, in an adaptive manner.
More severe attacks: chosen ciphertext.
The Equivalence Theorem:
A cryptosystem is semantically secure if and only if it has the indistinguishability of encryptions property.

29
Equivalence Proof (informal)

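Semantic security implies indistinguishability of encryptions
Suppose there is no indistinguishability:
A chooses a pair $X_0, X_1 \in \{0,1\}^n$ for which it can distinguish encryptions with non-negligible advantage $\varepsilon$.
Choose:
Distribution $D_n$: uniform over $\{X_0, X_1\}$
Relation R: equality with X
For every S that does not get E(X) and outputs Y, we have $\Pr[R(X, Y)] \le 1/2$.
Given $E(X_b)$, run $A(E(X_b))$, get output $b' \in \{0,1\}$, and set $Y = X_{b'}$.
Now, $\Pr[A(E(X_b)) = 1 \mid b = 1] - \Pr[A(E(X_b)) = 1 \mid b = 0] > \varepsilon$.
Therefore, $\Pr[R(X, Y)] = \tfrac{1}{2} + \tfrac{1}{2}\left(\Pr[b' = 1 \mid b = 1] - \Pr[b' = 1 \mid b = 0]\right) > \tfrac{1}{2} + \varepsilon / 2$, so A beats every simulator by more than $\varepsilon / 2$, contradicting semantic security.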

30
Equivalence Proof (informal)

Indistinguishability of encryptions implies semantic security
Suppose there is no semantic security: A chooses some distribution $D_n$ and some relation R.
Choose $X_0, X_1 \in_R D_n$, choose $b \in_R \{0,1\}$, and compute $E(X_b)$.
Give $E(X_b)$ to A, and ask A to compute $Y_b = A(E(X_b))$.
For $X_0, X_1 \in_R D_n$ let $p_0 = \Pr[R(X_0, Y_b)]$ and $p_1 = \Pr[R(X_1, Y_b)]$.
With noticeable probability, $|p_0 - p_1|$ is non-negligible, since otherwise $Y_b$ could be computed without the encryption.
If $|p_0 - p_1|$ is non-negligible, then we can distinguish between an encryption of $X_0$ and an encryption of $X_1$.

31
Lessons learned?

Rigorous approach to cryptography
Defining security
Proving security

32
References

Books
O. Goldreich, Foundations of Cryptography, Vol. 1: Basic Tools, Cambridge, 2001 (pseudorandomness, zero-knowledge)
Vol. 2: Basic Applications, to be available May 2004 (encryption, secure function evaluation)
Other volumes: www.wisdom.weizmann.ac.il/oded/books.html
Web material/courses
S. Goldwasser and M. Bellare, Lecture Notes on Cryptography, http://www-cse.ucsd.edu/mihir/papers/gb.html
M. Naor, 9th EWSCS, http://www.cs.ioc.ee/yik/schools/win2004/naor.php

33
Secure Function Evaluation

A major topic of cryptographic research.
How to let n parties $P_1, \ldots, P_n$ compute a function $f(x_1, \ldots, x_n)$, where input $x_i$ is known to party $P_i$.
The parties learn the final output and nothing else.

34
The Millionaires' Problem [Yao]
[Figure: Alice holds x, Bob holds y]
Whose value is greater?
Leak no other information!
35
Comparing Information without Leaking It
[Figure: Alice holds x, Bob holds y]

Output: Is x = y?
The following solution is insecure:
Use a one-way hash function H().
Alice publishes H(x), Bob publishes H(y).
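The hash-based solution fails because H is deterministic and public: when inputs come from a small or guessable domain, anyone can hash every candidate and invert H(x) by search, learning x itself rather than only whether x = y. A minimal Python sketch, using SHA-256 from the standard `hashlib` module as a stand-in for H and a hypothetical domain of 100,000 values:

```python
import hashlib


def H(value: int) -> str:
    """Stand-in one-way hash: SHA-256 of the decimal representation."""
    return hashlib.sha256(str(value).encode()).hexdigest()


DOMAIN = range(100_000)  # hypothetical: inputs are small integers
alice_x = 73_451         # hypothetical private inputs
bob_y = 73_451

published_x, published_y = H(alice_x), H(bob_y)
print("x == y?", published_x == published_y)  # the intended output: True

# The leak: Bob (or any eavesdropper) hashes every candidate value
# until one matches Alice's published hash, recovering x itself.
recovered_x = next(v for v in DOMAIN if H(v) == published_x)
print("recovered x:", recovered_x)  # 73451
```

Even over a large domain, equal published hashes reveal equality to everyone who sees them, not only to Alice and Bob.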

36
Secure two-party computation - definition
[Figure: Alice (input x) and Bob (input y) each receive the output F(x,y) and nothing else, as if both had sent their inputs to a trusted third party that returned F(x,y)]
37
Leak no other information

A protocol is secure if it emulates the ideal solution.
Alice learns F(x,y), and can therefore compute everything that is implied by x, her prior knowledge of y, and F(x,y).
Alice should not be able to compute anything else.
Simulation:
A protocol is considered secure if for every adversary in the real world there exists a simulator in the ideal world which, given access only to the information that the adversary is allowed to learn, outputs an indistinguishable transcript.
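The ideal world that a secure protocol must emulate is easy to write down; the hard part, achieving the same outcome without the trusted third party, is not shown here. A minimal Python sketch of the trusted party in the definition, instantiated with the millionaires' comparison; the function names are hypothetical:

```python
from typing import Callable, Tuple


def trusted_party(f: Callable[[int, int], bool], x: int, y: int) -> Tuple[bool, bool]:
    """Ideal world: Alice sends x and Bob sends y to an incorruptible third party,
    which returns F(x, y) to both and reveals nothing else."""
    result = f(x, y)
    return result, result  # (Alice's output, Bob's output)


def greater(x: int, y: int) -> bool:
    """The millionaires' problem: is Alice's value x greater than Bob's value y?"""
    return x > y


alice_out, bob_out = trusted_party(greater, 5_000_000, 3_000_000)
print(alice_out, bob_out)  # True True

# Alice's entire view in the ideal world is (x, F(x, y)). A simulator that is
# given only this pair must be able to produce a transcript indistinguishable
# from Alice's view of the real protocol.
alice_view = {"input": 5_000_000, "output": alice_out}
print(alice_view)
```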