Privacy-Preserving Support Vector Machines via Random Kernels


Transcript and Presenter's Notes



1
Privacy-Preserving Support Vector Machines via
Random Kernels
The 2008 International Conference on Data Mining
  • Olvi Mangasarian
  • UW Madison and UCSD La Jolla
  • Edward Wild
  • UW Madison

2
Data
Horizontally Partitioned Data
[Figure: the m × n data matrix A (rows = examples 1 … m, columns = features 1 … n) partitioned row-wise into blocks A1, A2, A3]
3
Problem Statement
  • Entities with related data wish to learn a
    classifier based on all data
  • The entities are unwilling to reveal their data
    to each other
  • If each entity holds a different set of examples
    with all features, then the data is said to be
    horizontally partitioned
  • Our approach: a privacy-preserving support
    vector machine (PPSVM) using random kernels
  • Provides accurate classification
  • Does not reveal private information

4
Outline
  • Support vector machines (SVMs)
  • Reduced and random kernel SVMs
  • Privacy-preserving SVM for horizontally
    partitioned data
  • Summary

5
Support Vector Machines
Linear kernel: $(K(A, B'))_{ij} = (AB')_{ij} = A_iB_j' = K(A_i, B_j')$
Gaussian kernel with parameter $\mu$: $(K(A, B'))_{ij} = \exp(-\mu\|A_i' - B_j'\|^2)$
SVMs
  • $x \in R^n$
  • SVM defined by parameters $u$ and threshold $\gamma$
    of the nonlinear surface
  • $A$ contains all data points
  • $A^+ \subset A$: the examples labeled $+1$
  • $A^- \subset A$: the examples labeled $-1$
  • $e$ is a vector of ones

$K(A^+, A')u \ge e\gamma + e - y$
$K(A^-, A')u \le e\gamma - e + y$
Minimize $e'y$ (hinge loss, the plus function $\max\{\cdot, 0\}$) to fit the data
Minimize $e's$ ($= \|u\|_1$ at the solution) to reduce overfitting
Slack variable $y \ge 0$ allows points to be on the wrong side of the bounding surfaces
Bounding surface for $A^+$: $K(x', A')u = \gamma + 1$
Separating surface: $K(x', A')u = \gamma$
Bounding surface for $A^-$: $K(x', A')u = \gamma - 1$
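
As a concrete illustration, here is a minimal numeric sketch of the two kernels and of evaluating the classifier sign(K(x', A')u - gamma); the data A, weights u, and threshold gamma are made-up stand-ins, not values from the talk:

import numpy as np

def linear_kernel(A, B):
    # (K(A, B'))_ij = A_i B_j': inner products of rows of A and rows of B
    return A @ B.T

def gaussian_kernel(A, B, mu=1.0):
    # (K(A, B'))_ij = exp(-mu ||A_i' - B_j'||^2)
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq)

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # 5 made-up examples with 3 features
u = rng.standard_normal(5)        # hypothetical solution weights u
gamma = 0.1                       # hypothetical threshold gamma

def classify(x):
    # sign(K(x', A')u - gamma) gives the predicted class of point x
    return np.sign(gaussian_kernel(x[None, :], A) @ u - gamma).item()

print(linear_kernel(A, A).shape)         # the full linear kernel AA' is 5 x 5
print(classify(rng.standard_normal(3)))  # +1.0 or -1.0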
6
Support Vector Machine → Reduced Support Vector Machine → Random Reduced Support Vector Machine
Reduced SVM (Lee and Mangasarian, 2001): replace the kernel matrix K(A, A′) with K(A, Ā′), where Ā consists of a randomly selected subset of the rows of A
Random reduced SVM (Mangasarian and Thompson, 2006): replace the kernel matrix K(A, A′) with K(A, B′), where B is a completely random matrix
Using the random kernel K(A, B′) is a key result for generating a simple and accurate privacy-preserving SVM
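
A short sketch contrasting the three kernel choices above, using the linear kernel and made-up dimensions for brevity:

import numpy as np

rng = np.random.default_rng(1)
m, n, k = 100, 30, 10    # m examples, n features, k reduced/random rows (made up)
A = rng.standard_normal((m, n))

K_full = A @ A.T                                  # full kernel K(A, A'): m x m
A_bar = A[rng.choice(m, size=k, replace=False)]   # random subset of rows of A
K_reduced = A @ A_bar.T                           # reduced kernel K(A, A_bar'): m x k
B = rng.standard_normal((k, n))                   # completely random matrix
K_random = A @ B.T                                # random kernel K(A, B'): m x k

print(K_full.shape, K_reduced.shape, K_random.shape)  # (100, 100) (100, 10) (100, 10)

Both reduced variants shrink the kernel from m × m to m × k; the random variant additionally decouples the kernel columns from any actual data point, which is what the privacy argument below exploits.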
7
Error of Random Kernels is Comparable to Full Kernels: Linear Kernels
B is a random matrix with the same number of columns as A and either 10% as many rows as A or one fewer row than it has columns
[Scatter plot: random kernel AB′ error (vertical axis) vs. full kernel AA′ error (horizontal axis); each point represents one of 7 datasets from the UCI repository; the diagonal marks equal error for random and full kernels]
8
Error of Random Kernels is Comparable to Full Kernels: Gaussian Kernels
[Scatter plot: random kernel K(A, B′) error (vertical axis) vs. full kernel K(A, A′) error (horizontal axis); each point represents one of the same 7 UCI datasets]
9
Horizontally Partitioned Data
Each entity holds different examples with the same features
[Figure: the data matrix partitioned row-wise into blocks A1, A2, A3, one block per entity]
10
Privacy Preserving SVMs for Horizontally
Partitioned Data via Random Kernels
  • Each of q entities privately owns a block of data
    A1, …, Aq that it is unwilling to share with the
    other q − 1 entities
  • The entities all agree on the same random matrix B
  • Each entity j computes K(Aj, B′) and distributes it
    to all the other entities
  • Stacking the blocks K(Aj, B′) yields the full random
    kernel K(A, B′), on which a classifier can be trained
  • Aj cannot be recovered uniquely from K(Aj, B′), as
    the sketch below illustrates
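
A minimal sketch of this sharing protocol, assuming made-up sizes and data; a real run would pass the stacked kernel and the class labels to a standard SVM solver:

import numpy as np

rng = np.random.default_rng(2)
n, k, q = 30, 29, 3     # n features; B has k < n rows; q entities (made up)

# Step 1: all entities agree on the same random matrix B.
B = rng.standard_normal((k, n))

# Step 2: each entity j privately holds its own block A_j of examples.
blocks = [rng.standard_normal((20, n)) for _ in range(q)]   # A1, A2, A3

# Step 3: each entity distributes only K(A_j, B') = A_j B', never A_j itself.
shared = [Aj @ B.T for Aj in blocks]

# Step 4: stacking the shared blocks gives the full random kernel K(A, B'),
# on which an SVM can be trained (class labels, omitted here, are also needed).
K = np.vstack(shared)
print(K.shape)   # (60, 29)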

11
Privacy Preservation: Infinite Number of Solutions for Ai Given AiB′
Feng and Zhang, 2007: every square submatrix of a random matrix has full rank
  • Given $P_i = A_iB'$, consider solving for row $r$ of $A_i$,
    $1 \le r \le m_i$, from the equation $BA_{ir}' = P_{ir}'$,
    with $A_{ir}' \in R^n$
  • Every square submatrix of the random $k \times n$ matrix $B$
    (with $k < n$) is nonsingular
  • There are at least $\binom{n}{k}$ solutions $A_{ir}'$ to each
    such underdetermined system: one for every choice of $k$ of
    the $n$ unknowns, with the remaining $n - k$ set to zero
  • Thus there are at least $\binom{n}{k}^{m_i}$ solutions $A_i$
    to the equation $BA_i' = P_i'$
  • If each entity has 20 points in $R^{30}$ and $B$ has one
    fewer row than columns, there are $\binom{30}{29}^{20} = 30^{20}$
    solutions
  • Furthermore, each of the infinite number of matrices in the
    affine hull of these matrices is a solution
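
This non-uniqueness is easy to check numerically: since B has fewer rows than columns, Bx = p is underdetermined, and adding any null-space direction of B to one solution yields another. A sketch with made-up sizes:

import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 29                      # B has one fewer row than columns (made up)
B = rng.standard_normal((k, n))

a_true = rng.standard_normal(n)    # one private row A_ir' of entity i's data
p = B @ a_true                     # the quantity revealed by sharing: B A_ir'

# The last right singular vector of B spans its (n - k)-dimensional null space.
z = np.linalg.svd(B)[2][-1]

# Every vector a_true + t z solves B x = p, so p never pins down the row.
a_other = a_true + 5.0 * z
print(np.allclose(B @ a_other, p))    # True: same shared data
print(np.allclose(a_other, a_true))   # False: different private row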

12
Results for PPSVM on Horizontally Partitioned Data
  • Compare classifiers that share examples with
    classifiers that do not
  • Seven datasets from the UCI repository
  • Simulate a situation in which each entity has
    only a subset of about 25 examples

13
Error Rate of Sharing Data is Better than Not Sharing: Linear Kernels
[Scatter plot: error rate with sharing (vertical axis) vs. error rate without sharing (horizontal axis); the 7 UCI datasets are represented by one point each]
14
Error Rate of Sharing Data is Better than Not Sharing: Gaussian Kernels
[Scatter plot: error rate with sharing (vertical axis) vs. error rate without sharing (horizontal axis) for the same 7 UCI datasets]
15
Summary
  • Privacy preserving SVM for horizontally
    partitioned data
  • Based on using the random kernel K(A, B′)
  • Learn classifier using all data, but without
    revealing privately held data
  • Classification accuracy is better than an SVM
    without sharing, and comparable to an SVM where
    all data is shared
  • Related work
  • A similar approach for vertically partitioned
    data is to appear in ACM TKDD
  • Liu et al., 2006: properties of multiplicative
    data perturbation based on random projection
  • Yu et al., 2006: secure computation of K(A, A′)

16
Questions
  • Websites with links to papers and talks
  • http://www.cs.wisc.edu/~olvi
  • http://www.cs.wisc.edu/~wildt