Title: Cardinality-based Inference Control in OLAP Systems: An Information-Theoretic Approach
1Cardinality-based Inference Control in OLAP Systems: An Information-Theoretic Approach
- Nan Zhang
- Texas A&M University
- This is joint work with Dr. Wei Zhao and Dr. Jianer Chen
2Privacy Concern
- Growing Privacy Concern in Database Applications on the Internet (e.g., Data Mining)
- 17% privacy fundamentalists, 56% pragmatic majority, 27% marginally concerned (AT&T Survey)
- Challenge: Can we build accurate models of the aggregate data without access to the precise values of individual data?
3Problem Definition
- Will the application invade privacy?
[Diagram: Data Providers, Randomization, OLAP Server, Application (Data Miner)]
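One way to read the randomization step on this slide: each data provider perturbs its value before submitting it, so the server never sees exact individual values but can still estimate aggregates. The sketch below assumes additive zero-mean uniform noise. The noise model, the function names `randomize` and `estimate_mean`, and the sample values are all assumptions made for illustration, not details taken from the talk.

```python
import random
from typing import List


def randomize(value: float, spread: float, rng: random.Random) -> float:
    """Return value plus zero-mean uniform noise drawn from [-spread, spread]."""
    return value + rng.uniform(-spread, spread)


def estimate_mean(perturbed: List[float]) -> float:
    """Estimate the true mean; zero-mean noise averages out over many providers."""
    return sum(perturbed) / len(perturbed)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical private values held by 10,000 data providers.
true_values = [float(20 + (i % 40)) for i in range(10_000)]

# What the server actually receives: perturbed values only.
perturbed_values = [randomize(v, 50.0, rng) for v in true_values]

true_mean = sum(true_values) / len(true_values)
estimated_mean = estimate_mean(perturbed_values)
```

Each perturbed value can be off by as much as 50, yet the estimated mean stays close to the true mean because the noise has zero mean and is averaged over many providers.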
4Inference Problem
5Inference Problem
6Goal
- Reject queries that may result in an inference problem
- Answer as many other queries as we can
[Diagram: Application (Data Miner), OLAP Server, Database, Data Warehouse]
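To make the goal concrete, the following sketch shows a deliberately naive guard for SUM queries. It rejects a query when the query alone, or its difference with one previously answered query, pins down a single cell. This is only an illustration of "reject some queries, answer the rest": it is not the cardinality-based criterion of the talk, and the class name `NaiveInferenceGuard`, the cell layout, and the sales figures are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

Cell = Tuple[str, str]  # (book type, month); hypothetical dimensions


class NaiveInferenceGuard:
    """Answer SUM queries unless they isolate one cell by pairwise differencing."""

    def __init__(self, data: Dict[Cell, float]) -> None:
        self.data = data
        self.answered: List[FrozenSet[Cell]] = []

    def _isolates_cell(self, cells: FrozenSet[Cell]) -> bool:
        # A query over a single cell reveals that cell directly.
        if len(cells) == 1:
            return True
        # Subtracting two nested SUM queries reveals the cells in between.
        for prev in self.answered:
            if prev < cells and len(cells - prev) == 1:
                return True
            if cells < prev and len(prev - cells) == 1:
                return True
        return False

    def query_sum(self, cells: Iterable[Cell]) -> Optional[float]:
        """Return the sum over the cells, or None if the query is rejected."""
        cell_set = frozenset(cells)
        if self._isolates_cell(cell_set):
            return None
        self.answered.append(cell_set)
        return sum(self.data[c] for c in cell_set)


sales = {
    ("new", "Jan"): 10.0,
    ("used", "Jan"): 4.0,
    ("new", "Feb"): 12.0,
    ("used", "Feb"): 7.0,
}
guard = NaiveInferenceGuard(sales)

feb_total = guard.query_sum([("new", "Feb"), ("used", "Feb")])  # answered
single_cell = guard.query_sum([("used", "Feb")])                # rejected
grand_total = guard.query_sum(sales.keys())                     # answered
all_but_one = guard.query_sum(
    [("new", "Jan"), ("used", "Jan"), ("new", "Feb")]
)  # rejected: grand_total minus this sum would reveal ("used", "Feb")
```

A guard like this misses inferences that combine three or more answers, which is one reason a criterion that can be checked from the structure of the data cube is attractive.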
7Related Work
- A lot of work on statistical databases
- Survey
- Differences
- Restriction on OLAP queries
- Structure of data cube
- Online response time
8Related Work
- A similar scheme
- Our Advantages
- Much easier approach
- A tighter bound
- More general framework
9Definition: Query
1-dimensional queries
2-dimensional queries
10Data Cube and Lattice of Cuboids
11Definition: Query
-
- There exists a unique cuboid S such that a cell of S is the aggregation of W.
- Suppose that S is a k-dimensional cuboid. The dimensionality of Q is defined to be n - k.
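The definition above can be sketched in a few lines: enumerate the lattice of cuboids of an n-dimensional data cube, aggregate the base cells into one cuboid, and read off the dimensionality n - k of a query that corresponds to a cell of a k-dimensional cuboid. The dimension names, the sample values, and the helper names (`lattice_of_cuboids`, `cuboid_cells`, `query_dimensionality`) are assumptions made for this illustration, and SUM is assumed as the aggregate function.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Coords = Tuple[str, ...]


def lattice_of_cuboids(dims: Sequence[str]) -> List[Tuple[str, ...]]:
    """List every cuboid, identified by the subset of dimensions it keeps."""
    return [
        combo
        for k in range(len(dims) + 1)
        for combo in combinations(dims, k)
    ]


def cuboid_cells(
    base: Dict[Coords, float], dims: Sequence[str], keep: Sequence[str]
) -> Dict[Coords, float]:
    """Aggregate (SUM) the base cells onto the dimensions listed in keep."""
    idx = [dims.index(d) for d in keep]
    out: Dict[Coords, float] = defaultdict(float)
    for coords, value in base.items():
        out[tuple(coords[i] for i in idx)] += value
    return dict(out)


def query_dimensionality(dims: Sequence[str], keep: Sequence[str]) -> int:
    """A query matching a cell of a k-dimensional cuboid has dimensionality n - k."""
    return len(dims) - len(keep)


dims = ("book_type", "month", "store")  # n = 3, hypothetical dimensions
base = {
    ("new", "Jan", "A"): 10.0,
    ("used", "Jan", "A"): 4.0,
    ("new", "Feb", "A"): 12.0,
    ("used", "Feb", "B"): 7.0,
}

lattice = lattice_of_cuboids(dims)             # 2**3 cuboids
by_month = cuboid_cells(base, dims, ("month",))  # a 1-dimensional cuboid
month_query_dim = query_dimensionality(dims, ("month",))  # 3 - 1
```

Here a query for the total sales in one month is a cell of the 1-dimensional cuboid on `month`, so it is a 2-dimensional query: it aggregates over the two remaining dimensions.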
12Definition: Compromisability
SU: Sales amount of used books in Feb
13Definition: Compromisability
- Compromisability
- direct inference
- Compromisability < 1
14Cardinality-based Inference Control
S3, ST: Minimum compromisability 2, $2^1(4+3) - 2^2 \cdot 2 - 1 = 5 > 2$
S1, SB: Minimum compromisability 2, $2^1(4+3) - 2^2 \cdot 2 - 1 = 5 = 5$
S1, SD: Minimum compromisability 2, $2^1(4+3) - 2^2 \cdot 2 - 1 = 5 > 4$
15Our Approach
- A k-dimensional query Q(F, W) can be safely answered if every (k+1)-dimensional dice of X that
- Contains W as a subset
- Can be queried as a cell of an (n-k-1)-dimensional cuboid
- satisfies
-
16Comparison with Previous Result
17Proof of Our Bound
18An Information-Theoretic Definition
19An Information-Theoretic Definition
- Let
- we have
- Thus, no inference problem exists in a data cube
X if
20Bounds on f_max(t_0)
21Maximum Non-Compromisable Data Cube
22Main Theorem
23Final Remarks
- Future Work
- Quantitative measure of the inference problem
- Combination of randomization and inference
control approaches
24Thank you