On the Anonymization of Sparse High-Dimensional Data

1
On the Anonymization of Sparse High-Dimensional Data
Gabriel Ghinita1 Yufei Tao2 Panos Kalnis1

  • 1 National University of Singapore
  • {ghinitag, kalnis}@comp.nus.edu.sg
  • 2 Chinese University of Hong Kong
  • taoyf@cse.cuhk.edu.hk

2
Publishing Transaction Data
  • Publishing transaction data
  • Retail chain-owned shopping cart data
  • Infer consumer spending patterns
  • Correlations among purchased items
  • e.g., 90% of cereal buyers also buy milk
  • What about privacy?
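The cereal and milk example is the confidence of an association rule. A minimal sketch of how such a figure could be computed, assuming each transaction is a Python set of item names; the helper name `confidence` and the toy carts are illustrative and not taken from the presentation:

```python
def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)


# Hypothetical shopping carts, for illustration only.
carts = [
    {"cereal", "milk"},
    {"cereal", "milk", "bread"},
    {"cereal"},
    {"bread"},
]

# Two of the three cereal buyers also bought milk.
print(confidence(carts, "cereal", "milk"))
```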

3
Privacy Threat
Quasi-identifying Items
Sensitive Items
4
Privacy Paradigm
  • l-diversity
  • prevent association between quasi-identifier and
    sensitive attributes
  • Create groups of transactions
  • frequency of a sensitive attribute (SA) value in a group < 1/p
  • Objective
  • Enforce privacy
  • Preserve correlations among items
  • Challenge: high data dimensionality
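The group condition above can be checked mechanically. A minimal sketch, assuming a group is given as a list holding each transaction's set of sensitive items (empty for non-sensitive transactions); the function name and the `strict` flag are assumptions added here, with `strict=True` following the bound as stated on the slide:

```python
from collections import Counter


def satisfies_privacy(group, p, strict=True):
    """Check that every sensitive item is rare enough within the group.

    group  : list of sets, one set of sensitive items per transaction
    p      : privacy degree
    strict : True tests frequency < 1/p, False tests frequency <= 1/p
    """
    counts = Counter(item for items in group for item in items)
    size = len(group)
    # Compare count/size against 1/p using integers to avoid rounding issues.
    if strict:
        return all(c * p < size for c in counts.values())
    return all(c * p <= size for c in counts.values())


# Hypothetical group: one sensitive transaction among four.
group = [{"pregnancy test"}, set(), set(), set()]
print(satisfies_privacy(group, p=3))
print(satisfies_privacy(group, p=4))
print(satisfies_privacy(group, p=4, strict=False))
```

With the strict bound a group of exactly p transactions containing one occurrence of a sensitive item does not qualify; the non-strict variant accepts it.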

5
Data Re-organization
PRESERVES CORRELATIONS!
Band Matrix Organization
6
Published Data
Summary of Sensitive Items
7
Contributions
  • Novel data representation
  • Preserves correlation among items
  • Efficient heuristic for group formation
  • Linear time to data size
  • Supports multiple sensitive items

8
State-of-the-art: Mondrian [FWR06]
  • Generalization-based
  • data-space partitioning
  • similar to k-d-trees
  • split recursively until privacy condition does
    not hold
  • constrained global recoding

[Figure: Mondrian partitioning example for k = 2; axes: Age, Weight]
GENERALIZATION + HIGH DIMENSIONALITY: UNACCEPTABLE INFORMATION LOSS
[FWR06] K. LeFevre et al. Mondrian Multidimensional k-anonymity. Proceedings of the 22nd International Conference on Data Engineering (ICDE), 2006
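The recursive splitting described on this slide can be sketched roughly as follows. This is a simplified illustration, not the implementation from [FWR06]: it assumes numeric records, splits at the median of the widest dimension, and stops when no split leaves at least k records on each side. The names `mondrian` and `generalize` and the sample points are hypothetical.

```python
def mondrian(records, k):
    """Recursively partition `records` (tuples of numbers) into groups of size >= k."""
    dims = range(len(records[0]))
    # Try dimensions from the widest value range to the narrowest.
    by_width = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in by_width:
        ordered = sorted(records, key=lambda r: r[d])
        median = ordered[len(ordered) // 2][d]
        left = [r for r in ordered if r[d] < median]
        right = [r for r in ordered if r[d] >= median]
        # Only split if both halves still satisfy the k-anonymity condition.
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)
    return [records]


def generalize(partition):
    """Replace exact values with the (min, max) range of each dimension."""
    dims = range(len(partition[0]))
    return tuple((min(r[d] for r in partition), max(r[d] for r in partition)) for d in dims)


# Hypothetical (age, weight) records.
points = [(25, 50), (30, 55), (45, 70), (50, 75), (55, 90), (35, 60)]
parts = mondrian(points, k=2)
for part in parts:
    print(generalize(part), len(part))
```

Every published range covers all records of its partition, which is why the ranges grow, and information loss with them, as the number of dimensions increases.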
9
State-of-the-art: Anatomy [XT06]
  • Permutation-based method
  • discloses exact QID values

RANDOM GROUP FORMATION DOES NOT PRESERVE CORRELATIONS
G! permutations

Anatomized table
Age  ZipCode  Disease
42   52000    Ulcer(1) Pneumonia(1)
47   43000

51   32000    Flu(1) Dyspepsia(1)
62   41000

55   27000    Gastritis(1) Dyspepsia(1)
67   55000

Age  ZipCode  Disease
42   52000    Ulcer
47   43000    Pneumonia
51   32000    Flu
55   27000    Gastritis
62   41000    Dyspepsia
67   55000    Dyspepsia
[XT06] X. Xiao and Y. Tao. Anatomy: simple and effective privacy preservation. Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), 2006
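The publishing step of a permutation-based method can be sketched on the six records of this slide. The sketch below assumes the grouping is already given (pairs of records, as read off the slide's tables) and only shows how exact QID values and a per-group summary of diseases are separated; the function name `anatomize` and the explicit group identifiers are assumptions made for illustration.

```python
from collections import Counter

# Records from the slide: (Age, ZipCode, Disease).
records = [
    (42, 52000, "Ulcer"),
    (47, 43000, "Pneumonia"),
    (51, 32000, "Flu"),
    (55, 27000, "Gastritis"),
    (62, 41000, "Dyspepsia"),
    (67, 55000, "Dyspepsia"),
]


def anatomize(records, groups):
    """Split records into a QID table and a per-group sensitive summary.

    groups: list of lists of record indices (the grouping is an input here).
    """
    qid_table, sensitive_table = [], []
    for gid, members in enumerate(groups, start=1):
        for i in members:
            age, zipcode, _ = records[i]
            qid_table.append((age, zipcode, gid))  # exact QID values are disclosed
        counts = Counter(records[i][2] for i in members)
        for disease, count in sorted(counts.items()):
            sensitive_table.append((gid, disease, count))
    return qid_table, sensitive_table


groups = [[0, 1], [2, 4], [3, 5]]
qid_table, sensitive_table = anatomize(records, groups)
for row in sensitive_table:
    print(row)
```

Within a group, any assignment of the listed diseases to the listed QID rows is equally plausible, which is what hides the exact association.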
10
Band Matrix Representation
  • Bandwidth = U + L + 1
  • Minimizing bandwidth is NP-hard
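Taking U and L to be the upper and lower bandwidths, that is, the largest distance of a nonzero entry above and below the main diagonal (an assumption about the slide's notation), the bandwidth of a small 0/1 matrix could be computed as in this sketch; the function name and the sample matrices are illustrative:

```python
def bandwidth(matrix):
    """Return U + L + 1 for a matrix given as a list of rows."""
    upper = lower = 0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value:
                upper = max(upper, j - i)  # distance above the diagonal
                lower = max(lower, i - j)  # distance below the diagonal
    return upper + lower + 1


tridiagonal = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
scattered = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
print(bandwidth(tridiagonal))  # 3
print(bandwidth(scattered))    # 4
```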

11
Reverse Cuthill-McKee (RCM)
  • Heuristic Bandwidth Minimization
  • Solves corresponding graph labeling problem
  • Permutes rows and columns
  • Complexity: O(N D log D)
  • N: number of matrix rows (transactions)
  • D: maximum degree of any vertex
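The graph labeling heuristic can be sketched in plain Python. This is a textbook-style Cuthill-McKee traversal (breadth-first, neighbours visited in order of increasing degree) followed by a reversal; it starts each component from a lowest-degree vertex, which is a simplification, and it does not show how the transaction matrix is turned into a graph. The adjacency dict and the helper `max_edge_span` are hypothetical.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """adj: dict mapping each vertex to the set of its neighbours (undirected)."""
    degree = {v: len(ns) for v, ns in adj.items()}
    visited, order = set(), []
    # Handle every connected component, starting from a lowest-degree vertex.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Enqueue unvisited neighbours by increasing degree.
            for n in sorted(adj[v] - visited, key=lambda u: (degree[u], u)):
                visited.add(n)
                queue.append(n)
    return order[::-1]


def max_edge_span(adj, order):
    """Largest distance between the positions of two adjacent vertices."""
    pos = {v: i for i, v in enumerate(order)}
    return max(abs(pos[u] - pos[v]) for u in adj for v in adj[u])


# A path 0-3-1-4-2 whose labels are scrambled.
graph = {0: {3}, 1: {3, 4}, 2: {4}, 3: {0, 1}, 4: {1, 2}}
order = reverse_cuthill_mckee(graph)
print(order)                                # [2, 4, 1, 3, 0]
print(max_edge_span(graph, sorted(graph)))  # 3 with the original labels
print(max_edge_span(graph, order))          # 1 after relabeling
```

Relabeling the vertices in the returned order places adjacent vertices next to each other, which corresponds to moving the nonzero entries of the matrix towards the diagonal.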

12
Group Formation
  • Correlation-aware Anonymization of
    High-Dimensional Data (CAHD)
  • Use the order given by RCM
  • Consecutive transactions highly correlated
  • O(pN) complexity
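The idea of grouping consecutive transactions can be illustrated with a greatly simplified greedy pass. This is not the authors' CAHD heuristic: it assumes the transactions are already in RCM order, gives each sensitive transaction its p - 1 nearest unassigned neighbours that share no sensitive item with the group, and puts whatever remains into one final group without re-checking it. The function name and the input encoding are assumptions, and the simple outward scan is not tuned to meet the O(pN) bound.

```python
def form_groups(sensitive, p):
    """sensitive: list, in RCM order, of each transaction's set of sensitive items.

    Returns a list of groups, each a sorted list of positions.
    """
    n = len(sensitive)
    assigned = [False] * n
    groups = []
    for i in range(n):
        if assigned[i] or not sensitive[i]:
            continue
        group, used = [i], set(sensitive[i])
        left, right = i - 1, i + 1
        # Expand outwards from position i until the group has p members.
        while len(group) < p and (left >= 0 or right < n):
            for j in (left, right):
                if not 0 <= j < n or len(group) == p:
                    continue
                if assigned[j] or sensitive[j] & used:
                    continue  # taken already, or would repeat a sensitive item
                group.append(j)
                used |= sensitive[j]
            left -= 1
            right += 1
        if len(group) < p:
            break  # not enough compatible neighbours; stop grouping
        for j in group:
            assigned[j] = True
        groups.append(sorted(group))
    leftover = [j for j in range(n) if not assigned[j]]
    if leftover:
        groups.append(leftover)
    return groups


# Hypothetical sequence of eight transactions; "a" and "b" are sensitive items.
sequence = [set(), {"a"}, set(), set(), {"b"}, set(), {"a"}, set()]
print(form_groups(sequence, p=2))  # [[0, 1], [3, 4], [5, 6], [2, 7]]
```

Each group built around a sensitive transaction holds p transactions and at most one occurrence of any sensitive item, so the item's frequency in the group is 1/p.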

13
Group Formation
14
Experimental Evaluation
15
RCM Visualization
16
Experimental Setting
  • BMS dataset
  • Compare with hybrid PermMondrian (PM)
  • Combines Mondrian with Anatomy
  • Query Workload
  • Reconstruction Error

17
Reconstruction Error vs. p
18
Execution Time
19
Conclusions
  • Anonymizing transaction data
  • High-dimensionality
  • Preserving correlation
  • Future work
  • Different encodings for data representation
  • Enhance correlation among consecutive rows