Title: Executing SQL over Encrypted Data in Database-Service-Provider Model
1Executing SQL over Encrypted Data in
Database-Service-Provider Model
- Hakan Hacigumus
- University of California, Irvine
- Bala Iyer
- IBM Silicon Valley Lab.
- Chen Li
- University of California, Irvine
- Sharad Mehrotra
- University of California, Irvine
- SIGMOD 2002, Madison, Wisconsin, USA
2What do we want to do?
User Data
Encrypted User Database
Distrusted
- We want to store the data on a server
Server
- But the problem is we do not trust the server
for sensitive information! - encrypt the data and store it
- but still be able to run queries over the
encrypted data - do most of the work at the server
- If the server is trusted, ICDE 2002
3Why is it important anyway?
User Data
Encrypted User Database
Distrusted Server
(Distrusted) Application Service Provider
- Application Service Provider (ASP) Model for
Database - DB management transferred to service provider for
- backup, administration, restoration, space
management, upgrades etc. - use the database as a service provided by an
ASP - use SW, HW, human resources of ASP, instead of
your own
4Talk Outline
- Service Provider Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
5Service Provider Architecture
Server Site
Client Site
Encrypted Results
Client Side Query
?
Server Side Query
Service Provider
Original Query
?
Actual Results
?
6Talk Outline
- Service Provider Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
7Talk Outline
- Service Provider Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
8Relational Encryption
etuple N_ID S_ID P_ID
fErf!Q!!vddfgtgtlt/ 50 1 10
F3wgfErf! 65 2 10
gfsdf343vltl 50 2 20
33wgfs! 65 2 20
NAME SALARY PID
John 50000 2
Marry 110000 2
James 95000 3
Lisa 105000 4
Server Site
- Store an encrypted string etuple for each
tuple in the original table - This is called row level encryption
- Any kind of encryption technique can be used
- Blowfish encryption algorithm is used for this
work - Create an index for each (or selected)
attribute(s) in the original table
9Building the IndexPartition and Identification
Functions
- Partition function divides domain values into
partitions (buckets) - Partition (R.A) 0,200, (200,400,
(400,600, (600,800, (800,1000 - partitioning function has an impact on
performance as well as privacy
10Mapping Functions
- Mapping function maps a value v in the domain of
attribute A to the id of the partition which
value v belongs to - e.g. MapR.A( 250 ) 7, MapR.A( 620 ) 1
11Storing Encrypted Data
- R lt A, B, C gt ? RS lt etuple, A_id, B_id,
C_id gt - etuple encrypt ( A B C )
- A_id MapR.A( A ), B_id MapR.B( B ), C_id
MapR.C( C ) -
Table EMPLOYEES
Table EMPLOYEE
Etuple N_ID S_ID P_ID
fErf!Q!!vddfgtgtlt/ 50 1 10
F3wgfErf! 65 2 10
gfsdf343vltl 50 2 20
33wgfs! 65 2 20
NAME SALARY PID
John 50000 2
Marry 110000 2
James 95000 3
Lisa 105000 4
12Talk Outline
- Service Provider Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
13Talk Outline
- Service Provider Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
14Mapping Conditions
- Q SELECT name, pname FROM emp, proj
- WHERE emp.pidproj.pid AND salary gt 100k
- Server stores attribute indices determined by
mapping functions - Client stores metadata and utilizes that to
translate the query - Conditions
- Condition ? Attribute op Value
- Condition ? Attribute op Attribute
- Condition ? (Condition ? Condition) (Condition
? Condition) - (not Condition)
15Mapping Conditions (2)
- Example
- Attribute Value
- Mapcond( A v ) ? AS MapA( v )
- Mapcond( A 250 ) ? AS 7
16Mapping Conditions (3)
- Attribute1 Attribute2
- Mapcond( A B ) ? ?N (AS identA( pk ) ? BS
identB( pl )) -
- where N is pk ? partition (A), pl ? partition
(B), pk ? pl ? ? -
Partitions A_id
0,100 2
(100,200 4
(200,300 3
Partitions B_id
0,200 9
(200,400 8
- C A B ? C (AS 2 ? BS 9)
- ? (AS 4 ? BS 9)
- ? (AS 3 ? BS 8)
17Talk Outline
- Service Provider Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
18Relational Operators over Encrypted Relations
- Partition the computation of the operators across
client and server - Compute (possibly) superset of answers at the
server - Filter the answers at the client
- Objective minimize the work at the client and
process the answers as soon as they arrive
without requiring storage at the client - Operators studied
- Selection
- Join
- Grouping and Aggregation
- Sorting
- Duplicate Elimination
- Set Difference
- Union
- Projection
19Selection Operator
?c( R ) ?c( D (?SMapcond(c)( RS ) )
Example
20Join Operator
R c T ?c( D ( RS SMapcond(c) TS )
Example
C A B ? C (A_id 2 ? B_id 9) ?(A_id
4 ? B_id 9) ?(A_id 3 ? B_id 8)
Partitions A_id
0,100 2
(100,200 4
(200,300 3
Partitions B_id
0,200 9
(200,400 8
21Talk Outline
- Service Provider Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
22Talk Outline
- Service Provider Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
23Query Decomposition
- Q SELECT name, pname FROM emp, proj
- WHERE emp.pidproj.pid AND salary gt 100k
Client Query
24Query Decomposition (2)
Client Query
?name,pname
Client Query
?name,pname
e.pid p.pid
?salary gt100k
D
e.pid p.pid
?salary gt100k
D
D
E_PROJ
D
?s_id 1 v s_id 2
E_PROJ
E_EMP
E_EMP
Server Query
Server Query
25Query Decomposition (3)
Client Query
Client Query
?name,pname
?name,pname
?salary gt100k ? e.pid p.pid
e.pid p.pid
?salary gt100k
D
D
D
e.p_id p.p_id
?s_id 1 v s_id 2
E_PROJ
?s_id 1 v s_id 2
E_PROJ
E_EMP
E_EMP
Server Query
Server Query
26Query Decomposition (4)
Client Query
?name,pname
- Q SELECT name, pname FROM emp,
proj - WHERE emp.pidproj.pid AND salary gt
100k - QS SELECT e_emp.etuple, e_proj.etuple FROM
e_emp, e_proj - WHERE e.p_idp.p_id AND
s_id 1 OR s_id 2 - QC SELECT name, pname FROM temp
- WHERE emp.pidproj.pid AND
salary gt 100k
?salary gt100k ? e.pid p.pid
D
e.p_id p.p_id
E_PROJ
?s_id 1 v s_id 2
E_EMP
Server Query
27Talk Outline
- Service Provider Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
28Talk Outline
- Service Provider Architecture
- How to create Metadata Relational Encryption and
Storage Model - Query Decomposition and Relational Operators
- Query Decomposition Examples
- Experimental Results
- Conclusion
29Experimental Evaluation
- Data
- TPC-H database, scale factor 0.1
- Queries
- TPC-H Queries, versions of Q6 and Q3
- Partitioning Strategy
- Equi-depth histograms for the first set of
experiments - Equi-width histograms for the second set of
experiments
30Effect of Number of Buckets in Non-Join Query
- Client and communications costs decreases with
increasing number of buckets due to better
filtering at the server - Server cost doesnt decrease as much, table scan
remains best choice in the optimizer
31Effect of Number of Buckets in Non-Join Query
- Single Server Server is trusted and performs all
operations including decryption on site - Shows that proposed query execution protocol
doesnt introduce significant overhead
32Effect of Number of Buckets in Join Query
- Sharp decrease in query response time with
increase in the number of buckets due to better
filtering at the server - Client side query response time is greater than
server side query response time due to dominant
decryption cost on the query (second graph)
33Effect of Number of Buckets in Join Query
- Single Server Server is trusted and performs all
operations including decryption on site - Consistent with the previous results showing
proposed communication protocol doesnt introduce
significant overhead
34Conclusion
- ASP model is a promising solution for enterprise
computing in Internet era - We studied data privacy problem
- in the context of ASP model
- when the ASP is not trusted
- Proposed solution
- encrypts data, creates coarse indexes and
stores the data at ASP - allows only data owner to decrypt the data
- With query decomposition
- most of query execution performed at ASP
- client only performs filtering and continues to
benefit from ASP model