Anonymity and Privacy Issues --- re-identification - PowerPoint PPT Presentation

About This Presentation

Slides: 34

Provided by: zhangy8

Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes

1
Anonymity and Privacy Issues --- re-identification

Yimeng Zhang
12/4/07

2
Index

Views on Privacy of Social Media
Overview of Re-identification
You Are What You Say: Privacy Risks of Public Mentions (Frankowski et al., SIGIR 2006)

3
Improper Use of Personal Information Online
4
Top Privacy Concerns
5
Remaining Anonymous
6
True Information Provided While Registering
7
Ability to Remain Anonymous
8
Importance of Controlling Personal Information
9
Specifying Who Can View Personal Information
10
Conclusion

Around 40% of people would like to remain anonymous on social media or social networking sites
Most people provide their true personal information while registering
Most people think it is important to have control of their personal information online

Re-identification techniques can identify the users of an anonymous dataset
11
Privacy Loss through Re-identification

Re-identification: linking datasets that have explicit identifiers to datasets that lack them, through attributes the two have in common
Datasets without explicit identifiers:
Public data made anonymous by users
Public data released by research groups (after suitable anonymizing)
Public data from government agencies (census)

Information people wish to keep private
12
Example of Re-identification
Massachusetts voter registration list purchased for only $20
87% of the 1990 U.S. population is likely to be uniquely identified based only on ZIP code, birth date, and sex
Sweeney, 2002
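
The linkage in this example can be sketched in a few lines of Python. This is only an illustration, not Sweeney's procedure: the field names (name, zip, birth, sex) and the function link_records are hypothetical, and a record counts as re-identified only when exactly one identified record shares its ZIP code, birth date, and sex.

```python
def link_records(identified, anonymous, keys=("zip", "birth", "sex")):
    """Link anonymous records to names through shared quasi-identifiers.

    identified: list of dicts that include a "name" field (e.g. a voter list).
    anonymous:  list of dicts without names (e.g. released medical records).
    Returns (name, anonymous_record) pairs for unique matches only.
    """
    # Index the identified dataset by its quasi-identifier tuple.
    index = {}
    for rec in identified:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)

    links = []
    for rec in anonymous:
        matches = index.get(tuple(rec[k] for k in keys), [])
        if len(matches) == 1:  # an ambiguous match does not identify anyone
            links.append((matches[0]["name"], rec))
    return links
```

Two people who share all three attributes are not linked, which is why the fraction of the population that is unique on these attributes determines the risk.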
13
The Rebus Form

The governor's medical records!
From Frankowski et al., SIGIR 2006
14
Example of face identification
Friendster: without explicitly identified profiles
Facebook: with explicitly identified profiles
Face recognizer matches profiles across the two sites: identity violation!
Gross and Acquisti, WPES 05
15
You Are What You Say: Privacy Risks of Public Mentions

Dan Frankowski, Dan Cosley, Shilad Sen, Loren
Terveen, John Riedl
University of Minnesota
SIGIR 2006

16
Main Idea

People can be identified by their preferences and what they talk about:
Reviews of books, movies, songs
Mentions on forums or blogs
Friend lists on Facebook
Wish or purchase lists on Amazon
Method for re-identification:
Datasets are represented as sparse relation spaces
Re-identification can be done by matching two sparse relation spaces

17
Sparse Relation Space

Relates people to items
Sparse: few relationships are recorded per person
A dataset that can be represented as a sparse relation space is vulnerable

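(Figure: a people-by-items matrix with rows p1 to p3 and columns i1 to i3; an X marks a recorded relationship, one per person.)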
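
A sparse relation space can be held in memory as a mapping from each person to the set of items they are related to. The sketch below is a minimal illustration under that assumption; the function names and the toy pairs in the usage note are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_relation_space(pairs):
    """Map each person to the set of items they are related to."""
    space = defaultdict(set)
    for person, item in pairs:
        space[person].add(item)
    return dict(space)


def density(space, n_items):
    """Fraction of all possible person-item cells that hold a relationship."""
    filled = sum(len(items) for items in space.values())
    return filled / (len(space) * n_items)
```

For example, build_relation_space([("p1", "i1"), ("p2", "i2"), ("p3", "i3")]) stores only three relationships out of nine possible cells; storing sets rather than a full matrix keeps the cost proportional to the relationships that actually exist.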

18
Research Questions

Risks of dataset release
What are the risks to user privacy when releasing a dataset?
Altering the dataset
How can dataset owners alter the dataset to preserve user privacy?
Self-defense
How can users protect their own privacy?

19
Experiment Dataset: MovieLens
Dataset 1: movie ratings. Users do not allow these to be revealed; released for research use as an anonymous dataset
Dataset 2: movie reviews. Public
20
Feature of the dataset

Both ratings and mentions follow a power law
This is an important feature of real-world sparse relation spaces

Frankowski, SIGIR 06
21
Evaluation Measure
Input: the mentions by user t and the ratings dataset
Re-identification algorithm
Output: the top k ratings users, ranked by the likelihood that they are user t
k-identified: t is among the k users returned by the algorithm
k-identification rate: the fraction of users who are k-identified
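
The two definitions above translate directly into code. In this sketch the input format is an assumption: rankings maps each target user t to the ranked list of ratings users that some re-identification algorithm returned for t.

```python
def is_k_identified(ranked_users, target, k):
    """True if the target is among the first k users returned for it."""
    return target in ranked_users[:k]


def k_identification_rate(rankings, k):
    """Fraction of target users who are k-identified.

    rankings: dict mapping each target user to the ranked candidate list
              produced for that user's mentions (most likely first).
    """
    if not rankings:
        return 0.0
    hits = sum(1 for target, ranked in rankings.items()
               if is_k_identified(ranked, target, k))
    return hits / len(rankings)
```

With k = 1 the measure counts only users whom the algorithm ranks first, which is the strictest case.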
22
Set Intersection Algorithm for Re-identification

Likely list: users in the ratings database who have rated every movie mentioned by user t
Problem:
Users mention movies that they have not rated
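
A minimal sketch of the set intersection idea, assuming the ratings dataset is a dict from user to the set of movies that user rated (a hypothetical layout chosen for illustration):

```python
def set_intersection_candidates(mentions, ratings):
    """Return the users who have rated every movie in `mentions`.

    mentions: set of movies mentioned by the target user t.
    ratings:  dict mapping each ratings user to the set of movies they rated.
    """
    return sorted(user for user, rated in ratings.items() if mentions <= rated)
```

The problem named on the slide shows up immediately: if t mentions a single movie that t never rated, t is excluded from the likely list, and the list can even come back empty.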

23
TF-IDF Algorithm

Mentions of a user: a vector of the movies the user mentioned
Ratings of a user: a vector of the movies the user rated
Likelihood: TF-IDF cosine similarity
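
The slide does not give the exact weighting, so the sketch below makes two assumptions that should be read as illustrative only: term frequency is binary (a movie is either present or not), and the inverse document frequency of a movie is log(N / number of users who rated it).

```python
import math


def idf_weights(ratings):
    """IDF per movie: log(number of users / number of users who rated it)."""
    n_users = len(ratings)
    doc_freq = {}
    for rated in ratings.values():
        for movie in rated:
            doc_freq[movie] = doc_freq.get(movie, 0) + 1
    return {movie: math.log(n_users / count) for movie, count in doc_freq.items()}


def tfidf_cosine(mentions, rated, idf):
    """Cosine similarity between two binary-TF, IDF-weighted movie vectors."""
    dot = sum(idf.get(m, 0.0) ** 2 for m in mentions & rated)
    norm_mentions = math.sqrt(sum(idf.get(m, 0.0) ** 2 for m in mentions))
    norm_rated = math.sqrt(sum(idf.get(m, 0.0) ** 2 for m in rated))
    if norm_mentions == 0.0 or norm_rated == 0.0:
        return 0.0
    return dot / (norm_mentions * norm_rated)


def rank_by_tfidf(mentions, ratings):
    """Rank ratings users by similarity to the mention vector, best first."""
    idf = idf_weights(ratings)
    return sorted(ratings,
                  key=lambda user: tfidf_cosine(mentions, ratings[user], idf),
                  reverse=True)
```

Under this weighting a movie that every user has rated gets an IDF of zero and contributes nothing, so rarely rated movies dominate the match.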

24
Scoring Algorithm

Scoring
Emphasizes mentions of rarely rated movies
De-emphasizes the number of ratings a user has

Score for one mention (movie) m of a user: the fraction of users who have not rated m
Score for a user: the product of the scores for all mentions of this user
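
A sketch of the scoring rule as stated on this slide. One detail is not on the slide and is an assumption here: when a candidate has not rated a mentioned movie, the mention receives a small constant score (unrated_penalty, set to a hypothetical 0.05) rather than zero, so that a single unrated mention does not eliminate the candidate.

```python
def mention_score(movie, candidate_rated, ratings, unrated_penalty=0.05):
    """Score one mention against one candidate ratings user.

    If the candidate rated the movie, the score is the fraction of users
    who have NOT rated it, so rarely rated movies score close to 1.
    Otherwise the assumed constant penalty is returned.
    """
    if movie not in candidate_rated:
        return unrated_penalty
    raters = sum(1 for rated in ratings.values() if movie in rated)
    return 1.0 - raters / len(ratings)


def user_score(mentions, candidate_rated, ratings, unrated_penalty=0.05):
    """Product of the per-mention scores for one candidate."""
    score = 1.0
    for movie in mentions:
        score *= mention_score(movie, candidate_rated, ratings, unrated_penalty)
    return score


def rank_by_score(mentions, ratings, k):
    """Return the k candidates with the highest score, best first."""
    ranked = sorted(ratings,
                    key=lambda user: user_score(mentions, ratings[user], ratings),
                    reverse=True)
    return ranked[:k]
```

Note that, taken literally, a movie rated by every user scores zero and zeroes the whole product; a real implementation would need to guard against that case.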
25
Scoring Algorithm with Ratings

Suppose we have a magic analyzer that can guess the rating of a movie from the mention
E.g., using the context of that mention
Algorithms
ExactRating: the analyzer can perfectly determine the rating
FuzzyRating: the analyzer can guess the rating value within ±1
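
The slide does not say how the guessed ratings enter the score. One plausible reading, sketched here as an assumption, is that a mention only counts as matched when the candidate's stored rating agrees with the guessed rating within a tolerance: 0 for ExactRating and 1 for FuzzyRating. The function name and data layout are hypothetical.

```python
def consistent_mentions(guessed, user_ratings, tolerance):
    """Return the mentioned movies whose guessed rating agrees with the candidate.

    guessed:      dict movie -> rating guessed from the mention.
    user_ratings: dict movie -> rating stored for one candidate ratings user.
    tolerance:    0 for an ExactRating-style match, 1 for a FuzzyRating-style match.
    """
    return {
        movie
        for movie, rating in guessed.items()
        if movie in user_ratings and abs(user_ratings[movie] - rating) <= tolerance
    }
```

The resulting set could then stand in for "movies the candidate has rated" when scoring, which makes each matched mention more discriminating than presence alone.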

26
Percent of users identified by different
algorithms
27
1-identification rate
28
RQ2: Altering the dataset

How can dataset owners alter the dataset they release to preserve user privacy?
Data suppression
Algorithm: drop rarely rated movies
Not a big problem for industry, but harmful for research
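
Dropping rarely rated movies can be sketched as a filter over the ratings dataset. The threshold parameter min_raters is a hypothetical name; the slide does not state which cutoff was used.

```python
def suppress_rare_items(ratings, min_raters):
    """Remove every movie rated by fewer than `min_raters` users.

    ratings: dict mapping each user to the set of movies they rated.
    Returns a new dict; the input is left unchanged.
    """
    counts = {}
    for rated in ratings.values():
        for movie in rated:
            counts[movie] = counts.get(movie, 0) + 1
    return {
        user: {movie for movie in rated if counts[movie] >= min_raters}
        for user, rated in ratings.items()
    }
```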

29
Dataset-level suppression
Does not work!
30
RQ3: Self-defense

How can users protect their own privacy?
Suppression
Do not mention rarely rated movies
Misdirection
Mention items they have not rated
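
Both user-side strategies can be sketched as transformations of a user's mention set. The function names, the min_raters cutoff, and the choice to add the most-rated unrated movies as decoys are assumptions made for illustration.

```python
def suppress_mentions(mentions, rating_counts, min_raters):
    """Suppression: keep only mentions of movies with at least `min_raters` raters."""
    return {m for m in mentions if rating_counts.get(m, 0) >= min_raters}


def misdirect(mentions, own_rated, rating_counts, n_extra):
    """Misdirection: add mentions of movies the user has not rated.

    Decoys are the `n_extra` most-rated movies that the user has neither
    rated nor already mentioned (ties broken by movie name).
    """
    decoys = sorted(
        (m for m in rating_counts if m not in own_rated and m not in mentions),
        key=lambda m: (-rating_counts[m], m),
    )
    return set(mentions) | set(decoys[:n_extra])
```

Here rating_counts maps each movie to the number of users who rated it, which a user could only estimate in practice.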

31
User-level suppression
Does not work!
32
Misdirection
Works when users mention popular items
33
Conclusion

Simple data mining algorithms can identify users who mention items in a sparse relation space and think they are anonymous
Other uses of the algorithms, e.g., finding paper reviewers (future work of Frankowski et al.)
Privacy risks for users on social media sites
Privacy is hard to preserve
Do not reveal private information, even when a site seems to be anonymous
