Title: Achieving Private SVD-based Recommendations on Inconsistently Masked Data
- Ibrahim Yakut and Huseyin Polat
- iyakut,polath_at_anadolu.edu.tr
- Department of Computer Engineering
- Anadolu University, Turkey
Collaborative Filtering (CF)
Problem: Information overload
Solution: Collaborative filtering (CF)
Collaborative Filtering (CF)
- A recent technique for filtering and recommendation
- A relatively new but very popular concept
- Used to cope with information overload
- Widely used by online vendors
- Many important applications in:
- E-commerce
- Search engines
- Direct recommendations (books, movies, CDs, news, etc.)
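Collaborative Filtering Process
Figure: user-item rating matrix with users u1, u2, ..., ua (the active user), ..., un and items i1, i2, ..., iq (the item for which a prediction is sought), ..., im; Paq denotes the prediction on item q for the active user a.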
SVD-based CF
- Singular value decomposition (SVD)-based CF schemes have been proposed
SVD-based CF
Figure: SVD of the m x n rating matrix; the resulting factors are stored and used as a model
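The model-building step can be illustrated with a minimal NumPy sketch. It assumes a common SVD-based CF formulation: missing ratings are filled with item means, each user's row is centred on its mean, and only the k largest singular values are kept. This may not be the exact algorithm used in this work. The function name `svd_predict`, the rank parameter `k`, the users x items orientation, and the use of `np.nan` for unrated cells are all illustrative assumptions.

```python
import numpy as np


def svd_predict(R: np.ndarray, k: int) -> np.ndarray:
    """Return a users x items matrix of predicted ratings from a rank-k SVD model.

    R holds known ratings, with np.nan marking unrated cells (assumption).
    """
    # Fill each unrated cell with that item's mean rating.
    item_means = np.nanmean(R, axis=0)
    filled = np.where(np.isnan(R), item_means, R)
    # Centre every user's row on that user's mean.
    user_means = filled.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(filled - user_means, full_matrices=False)
    # Keep the k largest singular values; U_k, S_k, V_k^T form the stored model.
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # The prediction P_aq for active user a on item q is entry [a, q].
    return user_means + approx


R = np.array([[5.0, 3.0, np.nan, 1.0],
              [4.0, np.nan, 1.0, 1.0],
              [1.0, 1.0, 5.0, np.nan]])
P = svd_predict(R, k=2)
print(round(P[0, 2], 2))  # predicted rating of user 0 on item 2
```

A smaller `k` keeps less of the original signal but a more compact model. With `k` equal to the full rank, the sketch simply reproduces the mean-filled matrix.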
Motivation
- Privacy-preserving CF schemes proposed so far are based on consistently masked data
- However, privacy concerns differ from user to user
- Users might decide to mask their data differently to achieve the privacy levels they require
- Can we still provide CF services on inconsistently masked data?
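Proposed Scheme
- Use randomized perturbation techniques (RPT) to mask users' data inconsistently
- Employ each user's mean in normalizing ratings through z-scores: $z_{uq} = (r_{uq} - \bar{r}_u) / \sigma_u$, where $r_{uq}$ is user u's rating on item q, $\bar{r}_u$ is u's mean rating, and $\sigma_u$ is the standard deviation of u's ratings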
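Here is a minimal sketch of the z-score normalization. It assumes unrated cells are stored as `np.nan` and are set to 0 (that is, to the user's mean) after normalization. The helper name `zscore_rows` and the zero-variance guard are illustrative assumptions and do not come from the slides.

```python
import numpy as np


def zscore_rows(R: np.ndarray):
    """Normalize each user's ratings to z-scores over the items that user rated.

    Returns (Z, means, stds); a normalized prediction z_hat maps back to the
    rating scale as means[u] + stds[u] * z_hat.
    """
    means = np.nanmean(R, axis=1, keepdims=True)
    stds = np.nanstd(R, axis=1, keepdims=True)
    # Assumption: a user who gives every item the same rating keeps z = 0.
    stds[stds == 0] = 1.0
    Z = (R - means) / stds
    # Assumption: unrated cells become 0, i.e. the user's mean rating.
    return np.nan_to_num(Z, nan=0.0), means, stds


R = np.array([[1.0, 3.0, np.nan],
              [2.0, 2.0, 5.0]])
Z, means, stds = zscore_rows(R)
print(Z[0])  # [-1.  1.  0.]
```

Keeping `means` and `stds` alongside `Z` lets predictions made in z-score space be converted back to each user's own rating scale.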
Randomized Perturbation Techniques (RPT)
Figure: users User1, User2, ..., Usern send their data R1, R2, ..., Rn, disguised with RPT, to a central database, where collaborative filtering is performed
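The RPT idea in the diagram can be shown with a minimal sketch. It assumes zero-mean Gaussian noise with one fixed σ for every user, which matches the consistent setting used by earlier schemes. Each user perturbs their own ratings locally, and the central database only ever stores the perturbed rows. The function names `perturb` and `collect` are hypothetical.

```python
import numpy as np


def perturb(ratings: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """User side: add independent N(0, sigma^2) noise to every rating."""
    return ratings + rng.normal(0.0, sigma, size=ratings.shape)


def collect(user_rows, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Server side: stack the perturbed rows R'_1..R'_n into the central database."""
    return np.vstack([perturb(row, sigma, rng) for row in user_rows])


rng = np.random.default_rng(0)
true_rows = [rng.uniform(1.0, 5.0, size=50) for _ in range(2000)]
database = collect(true_rows, sigma=1.0, rng=rng)
# Individual ratings are hidden, but aggregates survive: the noise averages out.
print(np.mean(true_rows), database.mean())
```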
Data Disguising Ways
- Users are divided into two groups:
- Those having privacy concerns
- Those having no concerns about divulging private data
- Users with privacy concerns decide how, and how much of, their data to disguise:
- Uniform or Gaussian perturbation
- A different σ over (0, ?) per user
- How many cells to disguise
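The per-user choices listed above can be sketched as follows. This is an illustrative sketch, not the authors' code. The function `mask_user` and its parameters `concerned`, `dist`, `sigma` and `frac` are hypothetical, and unrated cells are assumed to be `np.nan`. Uniform noise is drawn from U(-√3σ, √3σ) so that both distributions have standard deviation σ; this is an assumption about how σ is matched across the two distributions.

```python
import numpy as np


def mask_user(ratings, rng, concerned, dist="gaussian", sigma=1.0, frac=1.0):
    """Disguise one user's rated cells according to that user's own choices.

    ratings: 1-D float array, np.nan = unrated (assumption).
    dist: "uniform" or "gaussian"; sigma: this user's noise std;
    frac: fraction of the user's rated cells to disguise.
    Returns (masked ratings, boolean array marking disguised cells).
    """
    masked = ratings.copy()
    disguised = np.zeros(ratings.shape, dtype=bool)
    if not concerned:
        # Users without privacy concerns send their data as is.
        return masked, disguised
    rated = np.flatnonzero(~np.isnan(ratings))
    n_pick = int(round(frac * rated.size))
    picked = rng.choice(rated, size=n_pick, replace=False)
    if dist == "uniform":
        a = np.sqrt(3.0) * sigma  # U(-a, a) has variance a^2 / 3 = sigma^2
        noise = rng.uniform(-a, a, size=n_pick)
    elif dist == "gaussian":
        noise = rng.normal(0.0, sigma, size=n_pick)
    else:
        raise ValueError(f"unknown distribution: {dist}")
    masked[picked] += noise
    disguised[picked] = True
    return masked, disguised


rng = np.random.default_rng(1)
r = np.array([5.0, np.nan, 3.0, 4.0, np.nan, 1.0])
masked, disguised = mask_user(r, rng, concerned=True, dist="uniform", sigma=2.0, frac=0.5)
print(disguised)  # which rated cells this user chose to disguise
```

Running `mask_user` once per user, each with their own `dist`, `sigma` and `frac`, yields an inconsistently masked user-item matrix.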
Inconsistently Masked User-Item Matrix
Estimation From Inconsistently Masked Data (A)
- SVD-based CF relies on scalar products and sums
- We therefore show how to estimate the SVD from differently perturbed data
- To remove the contribution of the random numbers from the diagonal entries of $A^T A$, we need:
- The average σ of the random numbers
- The number of disguised cells
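The diagonal correction can be sketched in NumPy under several assumptions. Unrated cells are stored as 0 and are never disguised, and the random numbers are zero-mean and independent across cells. The aggregate available to the server is taken to be the mean noise variance over all disguised cells. If only the average σ is disclosed, its square gives a smaller approximation, since (mean σ)² ≤ mean of σ². The function `estimate_gram` and its parameters are hypothetical.

```python
import numpy as np


def estimate_gram(A_masked: np.ndarray, disguised_per_item: np.ndarray,
                  mean_noise_var: float) -> np.ndarray:
    """Estimate A^T A from A' = A + V (users x items, unrated cells = 0).

    The noise terms in off-diagonal entry (j, k) involve two different cells
    and have zero mean, so those entries are left as is. Diagonal entry j also
    contains sum_u v_uj^2, whose expected value is approximately
    (disguised cells in column j) * mean_noise_var; that amount is subtracted.
    """
    G = A_masked.T @ A_masked
    G[np.diag_indices_from(G)] -= disguised_per_item * mean_noise_var
    return G


rng = np.random.default_rng(42)
n_users, n_items = 20000, 3
A = rng.normal(size=(n_users, n_items))
sigmas = rng.uniform(0.5, 1.5, size=(n_users, 1))   # a different sigma per user
disguised = rng.random((n_users, n_items)) < 0.5     # which cells each user masks
V = rng.normal(size=(n_users, n_items)) * sigmas * disguised
A_masked = A + V

counts = disguised.sum(axis=0)
mean_var = (sigmas**2 * disguised).sum() / disguised.sum()
G_est = estimate_gram(A_masked, counts, mean_var)
G_naive = A_masked.T @ A_masked
G_true = A.T @ A
print(np.diag(G_true), np.diag(G_naive), np.diag(G_est))
```

The uncorrected diagonal is inflated by roughly the number of disguised cells times the noise variance. The corrected one lands close to the diagonal of $A^T A$ computed from the undisguised data.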
Estimation From Inconsistently Masked Data
A is not disguised; B is disguised by V, so $B' = B + V$
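Here is the scalar product computed from the disguised data, written out. It assumes the entries $v_i$ of V are independent, zero-mean random numbers with variances $\sigma_i^2$:

```latex
A \cdot B' = \sum_i a_i (b_i + v_i) = A \cdot B + \sum_i a_i v_i,
\qquad
\mathbb{E}\Big[\sum_i a_i v_i\Big] = 0,
\qquad
\operatorname{Var}\Big(\sum_i a_i v_i\Big) = \sum_i a_i^2 \sigma_i^2 .
```

Under these assumptions, $A \cdot B'$ is an unbiased estimate of $A \cdot B$. Likewise, $\mathrm{Sum}(B') = \mathrm{Sum}(B) + \sum_i v_i$ is an unbiased estimate of $\mathrm{Sum}(B)$.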
Estimation From Inconsistently Masked Data
A masked with uniform noise; B masked with Gaussian noise
Sum(A') and Sum(B') can be estimated similarly
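Here is the same reasoning for two differently masked vectors. It assumes $A' = A + V$ with uniform noise V and $B' = B + W$ with Gaussian noise W, where all noise entries are independent and zero-mean:

```latex
\begin{aligned}
A' \cdot B' &= A \cdot B + A \cdot W + V \cdot B + V \cdot W,\\
\mathbb{E}[A' \cdot B'] &= A \cdot B
  \quad\text{since } \mathbb{E}[A \cdot W] = \mathbb{E}[V \cdot B] = \mathbb{E}[V \cdot W] = 0,\\
\mathbb{E}[\mathrm{Sum}(A')] &= \mathrm{Sum}(A) + \textstyle\sum_i \mathbb{E}[v_i] = \mathrm{Sum}(A).
\end{aligned}
```

For these first-order estimates, only zero mean and independence matter, not whether the noise is uniform or Gaussian. The users' σ values matter only for squared terms, such as the diagonal entries of $A^T A$.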
Estimation From Inconsistently Masked Data
A and B masked, with different numbers of disguised cells
ds of commonly filled cells
Experiments
- Data sets
- Jester: web-based joke rating data
- 17,988 users, 100 jokes
- Continuous ratings in the range (-10, 10)
- 50% of all ratings are present
- MovieLens Public (MLP)
- 943 users, 1,682 movies
- Discrete ratings from 1 to 5
- 100,000 ratings in total
Results
Figure: varying the percentage of disguising users
Results
Figure: changing the level of perturbation
Results
Figure: varying the percentage of Gaussian-disguised data
Results
Figure: varying the percentage of disguised cells
Conclusion
- We showed how to achieve CF tasks using SVD-based algorithms on inconsistently masked data
- Future work:
- How to extend our schemes to other CF algorithms
- How to increase accuracy when aggregate information is disclosed
THANKS FOR YOUR INTEREST... QUESTIONS?