Title: Private Keyword Search on Streaming Data
1. Private Keyword Search on Streaming Data
Rafail Ostrovsky, William Skeith
UCLA
(patent pending)
2. Motivating Example
- The intelligence community collects data from
multiple sources that might potentially be
useful for future analysis.
- Network traffic
- Chat rooms
- Web sites, etc.
- However, what is useful is often classified.
3. Current Practice
- Continuously transfer all data to a secure
environment.
- After data is transferred, filter in the
classified environment and keep only a small fraction
of documents.
4. [Figure: Filter and Storage handling three document streams, D(1,1)..D(1,3), D(2,1)..D(2,3) and D(3,1)..D(3,3).]
Filter rules are written by an analyst and are
classified!
5. Current Practice
- Drawbacks
- Communication
- Processing
6. How to improve performance?
- Distribute work to many locations on a network
- Seemingly ideal solution, but
- Major problem
- Not clear how to maintain privacy, which is the
focus of this talk
7. [Figure: each stream D(i,1), D(i,2), D(i,3) has its own Filter and Storage; the stores hold E(D(1,2)), E(D(1,3)) and E(D(2,2)), which are decrypted into a store holding D(1,2), D(1,3) and D(2,2).]
8. Example Filter
- Look for all documents that contain special
classified keywords, selected by an analyst
- Perhaps an alias of a dangerous criminal
- Privacy
- Must hide what words are used to create the
filter
- Output must be encrypted
9. More generally
- We define the notion of Public Key Program
Obfuscation
- Encrypted version of a program
- Performs same functionality as un-obfuscated
program, but
- Produces encrypted output
- Impossible to reverse engineer
- A little more formally
10. Public Key Program Obfuscation
11. Privacy
12. Related Notions
- PIR (Private Information Retrieval)
[CGKS, KO, CMS]
- Keyword PIR [KO, CGN, FIPR]
- Program Obfuscation [BGIRSVY]
- Here output is identical to un-obfuscated
program, but in our case it is encrypted.
- Public Key Program Obfuscation
- A more general notion than PIR, with lots of
applications
13. What we want
[Figure: the document stream D(1,1), D(1,2), D(1,3) with Filter and Storage.]
14. [Figure: a stream of six documents, three matching ("This is matching document 1", 2 and 3) interleaved with three non-matching ones.]
15. How to accomplish this?
16. Several Solutions based on Homomorphic Encryption
- For this talk: Paillier Encryption
- Properties
- Plaintext set: Z_n
- Ciphertext set: Z_{n^2}
- Homomorphic, i.e., E(x) · E(y) = E(x + y)
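To make the homomorphic property concrete, here is a minimal Python sketch of textbook Paillier encryption. It is not taken from the talk: the tiny primes, the choice g = n + 1, and the function names `keygen`, `encrypt`, and `decrypt` are assumptions made for illustration, and the parameters are far too small to be secure.

```python
import math
import secrets

def keygen(p: int, q: int):
    """Build a toy Paillier key pair from two distinct odd primes (assumed inputs)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    g = n + 1                                           # common simple generator choice
    mu = pow(lam, -1, n)                                # decryption constant for g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m: int) -> int:
    """E(m) = g^m * r^n mod n^2 for a random r coprime to n."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c: int) -> int:
    """Recover m from c using L(u) = (u - 1) / n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

pub, priv = keygen(1009, 1013)      # toy primes, illustration only
n2 = pub[0] ** 2
cx, cy = encrypt(pub, 20), encrypt(pub, 22)

# Multiplying ciphertexts adds the plaintexts: E(x) * E(y) = E(x + y).
print(decrypt(pub, priv, (cx * cy) % n2))    # 42
# Raising a ciphertext to a power multiplies the plaintext: E(x)^k = E(k * x).
print(decrypt(pub, priv, pow(cx, 5, n2)))    # 100
```

The second property follows from the first and is what lets a public party scale an encrypted value by a known constant without ever decrypting it.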
17. Simplifying Assumptions for this Talk
- All keywords come from some poly-size dictionary
- Truncate documents beyond a certain length
18. [Figure: a document D is checked against the dictionary, and the resulting pair (g, g^D) is written to the output buffer.]
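The slide gives only the picture, so the following Python sketch is one hedged reading of how the dictionary, the pair (g, g^D), and the output buffer could fit together with Paillier encryption. It assumes the filter holds E(1) for each keyword and E(0) for every other dictionary word, that g is the product of the entries for the words in a document, and that the pair is multiplied into a few random buffer slots. All names (`make_filter`, `process`, `recover`, `copies`) and the toy Mersenne-prime key are hypothetical and insecure.

```python
import math
import secrets

P, Q = 2**61 - 1, 2**89 - 1          # toy Mersenne primes; far too small for real use
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def enc(m: int) -> int:
    """Paillier encryption with g = N + 1 (r is coprime to N except with negligible probability)."""
    r = secrets.randbelow(N - 2) + 2
    return ((1 + m * N) % N2) * pow(r, N, N2) % N2

def dec(c: int) -> int:
    return ((pow(c, LAM, N2) - 1) // N) * MU % N

def make_filter(dictionary, keywords):
    """Analyst side: E(1) for keywords, E(0) otherwise; the table itself hides which is which."""
    return {w: enc(1 if w in keywords else 0) for w in dictionary}

def process(table, buffer, text, copies=3):
    """Public side: fold one document into the encrypted buffer without learning if it matched."""
    g = enc(0)
    for w in set(text.split()):
        if w in table:
            g = g * table[w] % N2                 # g = E(number of distinct keywords present)
    d = int.from_bytes(text.encode(), "big")      # document as an integer, must be < N
    gd = pow(g, d, N2)                            # g^D = E(count * D)
    for _ in range(copies):
        i = secrets.randbelow(len(buffer))
        a, b = buffer[i]
        buffer[i] = (a * g % N2, b * gd % N2)

def recover(buffer):
    """Analyst side: a non-zero count marks a slot that received at least one match."""
    found = set()
    for a, b in buffer:
        count = dec(a)
        if count:
            d = dec(b) * pow(count, -1, N) % N    # garbage if several documents collided
            found.add(d.to_bytes((d.bit_length() + 7) // 8, "big").decode(errors="replace"))
    return found

dictionary = ["meet", "alias", "at", "dawn", "buy", "milk", "today"]
table = make_filter(dictionary, keywords={"alias"})
buf = [(enc(0), enc(0)) for _ in range(8)]
process(table, buf, "buy milk today")             # non-matching: contributes only E(0)
process(table, buf, "meet alias at dawn")         # matching
print(recover(buf))
```

With a single matching document no collision can occur, so every touched slot decodes to that document. Once two matching documents land in the same slot, the decoded value is meaningless.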
19. Collisions cause two problems
- 1. Good documents are destroyed
- 2. Non-existent documents could be fabricated
[Figure: output buffer with "This is matching document 1", "This is matching document 2", "This is matching document 3" and "Here's another matching document".]
20. We'll make use of two combinatorial lemmas
22. How to detect collisions?
- Append a highly structured (yet random) k-bit
string to the message
- The sum of two or more such strings will be
another such string only with probability negligible
in k
- Specifically, partition the k bits into triples of
bits, and set exactly one bit from each triple to
1
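23. Example: the bitwise sum (XOR) of three valid strings contains the triple 111, so it is not valid
  100001100010010100001010010
⊕ 010001010001100001100001010
⊕ 010100100100010001010001010
= 100100010111100100111010010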
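The collision check can be sketched in a few lines of Python. The sketch assumes that the "sum" of strings is a bitwise XOR, which is what the four bit strings on this slide work out to. The helper names `random_tag`, `is_valid_tag`, and `xor_tags` are hypothetical.

```python
import secrets

def random_tag(k: int) -> str:
    """Return a k-bit string (k divisible by 3) with exactly one 1 in each 3-bit triple."""
    assert k % 3 == 0
    return "".join(secrets.choice(["100", "010", "001"]) for _ in range(k // 3))

def is_valid_tag(tag: str) -> bool:
    """A tag is valid when every triple of bits contains exactly one 1."""
    return len(tag) % 3 == 0 and all(
        tag[i:i + 3].count("1") == 1 for i in range(0, len(tag), 3)
    )

def xor_tags(*tags: str) -> str:
    """Bitwise XOR of equal-length bit strings, keeping leading zeros."""
    total = 0
    for t in tags:
        total ^= int(t, 2)
    return format(total, f"0{len(tags[0])}b")

# The three strings from the slide and their combination.
a = "100001100010010100001010010"
b = "010001010001100001100001010"
c = "010100100100010001010001010"
combined = xor_tags(a, b, c)

print(combined)                  # 100100010111100100111010010
print(is_valid_tag(a))           # True
print(is_valid_tag(combined))    # False: two triples are 111
```

Under this reading, a buffer slot whose appended string fails `is_valid_tag` is treated as a collision and discarded. Two colliding tags never yield a valid string, because each triple becomes either 000 or a pattern with two 1s. With three or more tags a single triple can still look valid by chance, and the probability that all k/3 triples do so shrinks exponentially in k.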
24. Detecting Overflow (> m)
- Double buffer size from m to 2m
- If m < number of documents < 2m, output "overflow"
- If number of documents > 2m, then the expected number of
collisions is large, thus output "overflow" in
this case as well.
- Not yet in the ePrint version; will appear soon, as
well as some other extensions.
25. More from the paper that we don't have time to
discuss
- Reducing program size below dictionary size
(using Φ-Hiding from CMS)
- Queries containing AND (using BGN machinery)
- Eliminating negligible error (using perfect
hashing)
- Scheme based on arbitrary homomorphic encryption
26. Conclusions
- Private searching on streaming data
- Public key program obfuscation, more general than
PIR
- Practical, efficient protocols
- Many open problems
27. Thanks for Listening!