An Improved Indexing Scheme for Range Queries

Slides: 21

Provided by: yvonn54

1
An Improved Indexing Scheme for Range Queries

2
Database-as-a-Service

3
Database-as-a-Service

Security of the data is not guaranteed
Service providers are untrusted
Store only an encrypted form of the data on the remote server
Only users with the correct key(s) can access it
How, then, can we query the encrypted data?
Retrieve and decrypt the entire table, then apply the SQL statements to it locally. Too expensive!
A more practical approach has been proposed

4
Database-as-a-Service
5
Bucketization

Various approaches to building the meta-data: B-tree-based, hash-based, and bucket-based
What is bucketization?
Partitioning an attribute's values into several buckets
Each bucket is identified by an ID
Bucket IDs are stored, along with the encrypted data, on the remote server
The client keeps the partition information as meta-data
General bucketization approaches
Equi-width
Equi-depth
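The bucketization scheme above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the method from the slides: the client's meta-data is a sorted list of inclusive bucket upper bounds, bucket IDs are 0-based indexes rather than names like Bucket_1, and plaintext values stand in for the ciphertexts the server would actually hold. All function names are hypothetical.

```python
from bisect import bisect_left


def equi_width_uppers(lo, hi, m):
    """Inclusive upper bounds of m equal-width buckets over the integer domain [lo, hi]."""
    size = hi - lo + 1
    return [lo + ((i + 1) * size) // m - 1 for i in range(m)]


def equi_depth_uppers(values, m):
    """Inclusive upper bounds of at most m buckets holding roughly equal numbers of values."""
    vs = sorted(values)
    n = len(vs)
    uppers = []
    for i in range(m):
        u = vs[max(((i + 1) * n) // m - 1, 0)]
        if not uppers or u > uppers[-1]:  # repeated values can merge two buckets
            uppers.append(u)
    return uppers


def bucket_id(uppers, v):
    """ID (0-based index) of the first bucket whose upper bound is >= v."""
    return bisect_left(uppers, v)


def server_bucket_ids(uppers, a, b):
    """Bucket IDs the client sends to the server for the range query a <= x <= b."""
    first = bucket_id(uppers, a)
    if a > b or first == len(uppers):
        return []  # the range lies beyond every bucket
    last = min(bucket_id(uppers, b), len(uppers) - 1)
    return list(range(first, last + 1))


def run_range_query(values, uppers, a, b):
    """Simulate one query; returns (tuples shipped by the server, tuples the client keeps)."""
    # The server stores (bucket ID, ciphertext); plaintext stands in for ciphertext here.
    stored = [(bucket_id(uppers, v), v) for v in values]
    wanted = set(server_bucket_ids(uppers, a, b))
    shipped = [v for bid, v in stored if bid in wanted]
    kept = [v for v in shipped if a <= v <= b]  # client decrypts and drops false positives
    return shipped, kept


if __name__ == "__main__":
    values = [1, 2, 2, 3, 5, 8, 8, 9, 12, 15, 20, 21]
    uppers = equi_depth_uppers(values, 4)  # [2, 8, 12, 21]
    shipped, kept = run_range_query(values, uppers, 4, 10)
    print(shipped, kept)  # [3, 5, 8, 8, 9, 12] [5, 8, 8, 9]
```

In the usage above, the query 4 <= x <= 10 pulls back buckets 1 and 2 in full, so the values 3 and 12 are false positives that only the client, after decryption, can discard.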

6
Example 1
7
Example 1
8
Example 1

9
Query Optimal Bucketization

10
Query Optimal Bucketization

11
Example 2
12
Example 2

Qserver
SELECT *
FROM egrades
WHERE gpaID = Bucket_1 OR
gpaID = Bucket_2 OR
gpaID = Bucket_3
The server query has the same form as in the general bucketization method
In most cases, QOB outperforms the conventional bucketization strategy (fewer false positives), but not always
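The server query in Example 2 comes from rewriting the client's range predicate into a disjunction over bucket IDs. Below is a hedged sketch of that rewrite, assuming inclusive bucket ranges that tile the attribute's domain. The GPA bucket boundaries are invented for illustration (they are not the slide's actual buckets), rendering the IDs as quoted string literals is an assumption, and the function names are hypothetical.

```python
def overlapping_ids(buckets, a, b):
    """IDs of buckets whose inclusive value range [low, high] intersects [a, b]."""
    return [bid for bid, low, high in buckets if low <= b and a <= high]


def server_query(table, id_column, buckets, a, b):
    """Rewrite the client predicate 'a <= attr <= b' into a query over bucket IDs only."""
    ids = overlapping_ids(buckets, a, b)
    if not ids:
        return None  # no bucket can hold an answer, so nothing is sent
    # Plain string building keeps the sketch short; a real client would use bound parameters.
    cond = " OR ".join(f"{id_column} = '{bid}'" for bid in ids)
    return f"SELECT * FROM {table} WHERE {cond}"


def client_filter(decrypted_gpas, a, b):
    """After decrypting the returned tuples, drop the false positives."""
    return [g for g in decrypted_gpas if a <= g <= b]


# Hypothetical client-side meta-data for a GPA attribute: (bucket ID, low, high).
GPA_BUCKETS = [("Bucket_1", 0.0, 1.9), ("Bucket_2", 2.0, 2.9), ("Bucket_3", 3.0, 4.0)]

if __name__ == "__main__":
    print(server_query("egrades", "gpaID", GPA_BUCKETS, 1.5, 3.2))
```

For the range 1.5 to 3.2 every bucket overlaps the query, so all three IDs appear in the server query, just as in Example 2; the client then removes grades such as 1.2 or 3.9 after decryption.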

13
Deviation Bucketization

Built on top of QOB; takes the same parameters
Has two levels of buckets
First level: the same buckets as those produced by QOB
Second level: a bucketization of the deviation values, i.e., the difference between each value and the average of its first-level bucket
Each first-level bucket has at most M second-level buckets
Hence QOB has at most M buckets, while DB has at most M² buckets
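A minimal sketch of building the two-level index, assuming the first-level buckets (for example, QOB's output) are already given as lists of values. The slides do not say how the deviations are split into second-level buckets; this sketch uses a simple equi-width split over each bucket's deviation range, which is an assumption. IDs follow the Bucket_i / Bucket_i_j naming of Example 3; the function names are hypothetical.

```python
from statistics import fmean


def second_level(devs, m):
    """Split a bucket's deviation range into at most m equal-width sub-ranges (assumed rule)."""
    lo, hi = min(devs), max(devs)
    if lo == hi:
        return [(lo, hi)]  # every value equals the mean: one sub-bucket is enough
    width = (hi - lo) / m
    # The last upper bound is pinned to hi so rounding cannot leave a value uncovered.
    return [(lo + i * width, hi if i == m - 1 else lo + (i + 1) * width) for i in range(m)]


def deviation_index(first_level, m):
    """Build the two-level index.

    first_level: one list of values per first-level bucket (e.g. QOB's output).
    Returns (meta, tags): meta holds (first-level ID, bucket mean, deviation ranges)
    for the client; tags pairs every value with its second-level ID for the server.
    """
    meta, tags = [], []
    for i, values in enumerate(first_level, start=1):
        avg = fmean(values)
        devs = [v - avg for v in values]  # deviation = value minus bucket average
        ranges = second_level(devs, m)    # at most m sub-buckets, so at most m * m overall
        meta.append((f"Bucket_{i}", avg, ranges))
        for v, d in zip(values, devs):
            # First sub-range whose upper bound covers the deviation.
            j = next(k for k, (_, d_hi) in enumerate(ranges, start=1) if d <= d_hi)
            tags.append((v, f"Bucket_{i}_{j}"))
    return meta, tags


if __name__ == "__main__":
    meta, tags = deviation_index([[1, 2, 3], [10, 11, 15, 18]], m=2)
    print(meta)
    print(tags)
```

With m = 2, the bucket {10, 11, 15, 18} has mean 13.5 and deviations -3.5, -2.5, 1.5, 4.5, so it splits into Bucket_2_1 (10 and 11) and Bucket_2_2 (15 and 18).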

14
Deviation Bucketization

15
Example 3
16
Example 3

Qserver
SELECT *
FROM egrades
WHERE gpaID = Bucket_1 OR
gpaID = Bucket_2 OR
gpaID = Bucket_3_1 OR
gpaID = Bucket_3_2
In this case, no false positives are returned
In general, false positives will still be returned, but their number is greatly reduced
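One plausible way to produce a server query like Example 3's is sketched below: a first-level bucket lying entirely inside the query range is requested by its first-level ID, while a partially overlapping bucket is refined to the second-level buckets whose value range [mean + low deviation, mean + high deviation] intersects the query. This rule is an assumption consistent with Example 3, not necessarily the exact algorithm of the slides; it also assumes the server can match a row by either its first-level or its second-level ID. The meta-data, rows, and function names are hypothetical.

```python
# Hypothetical client meta-data: (first-level ID, bucket mean, deviation ranges).
META = [
    ("Bucket_1", 2.0, [(-1.0, 0.0), (0.0, 1.0)]),
    ("Bucket_2", 13.5, [(-3.5, 0.5), (0.5, 4.5)]),
]
# Hypothetical server rows: (plaintext stand-in for the ciphertext, second-level ID).
STORED = [(1, "Bucket_1_1"), (2, "Bucket_1_1"), (3, "Bucket_1_2"),
          (10, "Bucket_2_1"), (11, "Bucket_2_1"), (15, "Bucket_2_2"), (18, "Bucket_2_2")]


def two_level_ids(meta, a, b):
    """Bucket IDs to request for the range query a <= x <= b."""
    ids = []
    for bid, avg, ranges in meta:
        lo, hi = avg + ranges[0][0], avg + ranges[-1][1]  # value span of the bucket
        if b < lo or hi < a:
            continue  # no overlap: skip the bucket entirely
        if a <= lo and hi <= b:
            ids.append(bid)  # fully covered: one first-level ID, no false positives
            continue
        for j, (d_lo, d_hi) in enumerate(ranges, start=1):
            if avg + d_lo <= b and a <= avg + d_hi:
                ids.append(f"{bid}_{j}")  # refine the partially covered bucket
    return ids


def server_match(stored, ids):
    """Rows whose second-level ID, or its first-level prefix, was requested."""
    wanted = set(ids)
    return [v for v, tag in stored if tag in wanted or tag.rsplit("_", 1)[0] in wanted]


if __name__ == "__main__":
    ids = two_level_ids(META, 0, 12)
    print(ids)                        # ['Bucket_1', 'Bucket_2_1']
    print(server_match(STORED, ids))  # [1, 2, 3, 10, 11]
```

For the query 0 <= x <= 12 the server returns exactly the five qualifying values, mirroring the no-false-positive case of Example 3; a query boundary that cut through a second-level bucket would still bring back some false positives.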

17
Experiments

Two datasets
Synthetic dataset: 10^5 integers from [0, 999]
Real dataset: 10^3 data points from the Aspect column of the Forest CoverType database in the UCI KDD Archive
Two sets of queries
Qsyn
Qreal