Anonymity and Privacy Issues --- re-identification

Slides: 34
Provided by: zhangy8
Learn more at: http://www.cs.cmu.edu

1
Anonymity and Privacy Issues--- re-identification
  • Yimeng Zhang
  • 12/4/07

2
Index
  • Views on Privacy of Social Media
  • Overview of Re-identification
  • You Are What You Say: Privacy Risks of Public
    Mentions (Frankowski et al., SIGIR 2006)

3
Improper Use of Personal Information Online
4
Top Privacy Concerns
5
Remaining Anonymous
6
True Information Provided While Registering
7
Ability to Remain Anonymous
8
Importance of Controlling Personal Information
9
Specifying Who Can View Personal Information
10
Conclusion
  • Around 40% of people would like to remain
    anonymous on social media or social networking
    sites
  • Most people provide their true personal
    information while registering
  • Most people think it is important to have
    control over their personal information online

Re-identification Techniques can identify the
users of an anonymous dataset
11
Privacy Loss through Re-identification
  • Re-identification: linking datasets that contain
    explicit identifiers with datasets that lack them,
    through attributes common to both
  • Datasets without explicit identifiers include:
  • Public data made anonymous by users
  • Public data released by research groups (after
    suitable anonymization)
  • Public data from government agencies (census)

These are data that people wish to keep private
12
Example of Re-identification
Massachusetts voter registration list purchased
for only $20
87% of the 1990 U.S. population are likely to be
uniquely identified based only on ZIP code, birth
date, and sex
Sweeney, 2002
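
The linkage in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the records, names, and field names below are made up, and `link` is a hypothetical helper, not code from Sweeney's study. It joins an identified list with an "anonymous" one on the shared attributes and keeps only unambiguous matches.

```python
# Hypothetical toy records; every name and value here is invented for illustration.
voter_list = [
    {"name": "Alice Smith", "zip": "02138", "birth": "1945-07-31", "sex": "F"},
    {"name": "Bob Jones", "zip": "02139", "birth": "1960-01-15", "sex": "M"},
]
medical_records = [
    {"zip": "02138", "birth": "1945-07-31", "sex": "F", "diagnosis": "condition A"},
    {"zip": "02141", "birth": "1971-03-02", "sex": "M", "diagnosis": "condition B"},
]

# Attributes present in both datasets (the common attributes used for linkage).
QUASI_IDENTIFIERS = ("zip", "birth", "sex")


def link(identified, anonymous, keys=QUASI_IDENTIFIERS):
    """Join two datasets on shared attributes, keeping only unique matches."""
    index = {}
    for row in identified:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = []
    for row in anonymous:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # exactly one candidate: the record is re-identified
            matches.append({**candidates[0], **row})
    return matches


linked = link(voter_list, medical_records)
```

In this toy data only one medical record shares all three attributes with a voter, so only that record is linked to a name.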
13
The Rebus Form


Governor's medical records!
From Frankowski et al., SIGIR 2006
14
Example of face identification
  • Friendster: without explicitly identified profiles
  • Facebook: with explicitly identified profiles
  • A face recognizer links profiles across the two
    sites: identity violation!
Gross and Acquisti, WPES 05
15
You Are What You Say: Privacy Risks of Public
Mentions
  • Dan Frankowski, Dan Cosley, Shilad Sen, Loren
    Terveen, John Riedl
  • University of Minnesota
  • SIGIR 2006

16
Main Idea
  • People can be identified by their preferences and
    what they talk about
  • Reviews of books, movies, songs
  • Mentions on forums or blogs
  • Friend list on Facebook
  • Wish or purchase list on Amazon
  • Method for Re-identification
  • Datasets are represented in Sparse Relation
    Spaces
  • Re-identification can be done by matching two
    Sparse Relation Spaces

17
Sparse Relation Space
  • Relates people to items
  • Sparse: few relationships are recorded per
    person
  • A dataset that can be represented in a sparse
    relation space is vulnerable

i1 i2 i3
p1 X
p2 X
p3 X
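
One way to hold such a space in code is a mapping from each person to the set of items they are related to. This is a minimal sketch: the (person, item) pairs below are hypothetical, since the transcript does not preserve which cells of the grid are marked, and `build_space` and `density` are illustrative names.

```python
from collections import defaultdict

# Hypothetical (person, item) pairs; which cells are marked is an assumption.
relations = [("p1", "i2"), ("p2", "i1"), ("p3", "i3")]


def build_space(pairs):
    """Map each person to the set of items they are related to."""
    space = defaultdict(set)
    for person, item in pairs:
        space[person].add(item)
    return dict(space)


def density(space, n_items):
    """Fraction of person-item cells that hold a recorded relationship."""
    recorded = sum(len(items) for items in space.values())
    return recorded / (len(space) * n_items)


space = build_space(relations)
```

Storing only the recorded relationships keeps memory proportional to the number of marks rather than to people times items, which matters when most cells are empty.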

18
Research Questions
  • Risks of dataset release: what are the risks to
    user privacy when releasing a dataset?
  • Altering the dataset: how can dataset owners
    alter the dataset to preserve user privacy?
  • Self-defense: how can users protect their own
    privacy?

19
Experiment Dataset: MovieLens
  • Dataset 1: movie ratings. Users have not agreed to
    reveal them; released for research use as an
    anonymous dataset
  • Dataset 2: movie reviews. Public
20
Feature of the dataset
  • Both ratings and mentions follow a power law
  • Important feature for real world sparse relation
    space

Frankowski, SIGIR 06
21
Evaluation Measure
  • Input: the mentions made by user t, plus the
    ratings dataset
  • Output of the re-identification algorithm: the
    top k users from the ratings dataset, ranked by
    the likelihood that they are user t
  • k-identified: t is among the k users returned by
    the algorithm
  • k-identification rate: the fraction of
    k-identified users
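
The measure can be computed as in the sketch below. It assumes each target user t comes with the ranked candidate list an algorithm returned for t; the user ids and rankings are hypothetical.

```python
def is_k_identified(ranked_users, target, k):
    """True if the target appears among the top k ranked candidates."""
    return target in ranked_users[:k]


def k_identification_rate(rankings, k):
    """rankings maps each target user t to the ranked list returned for t."""
    if not rankings:
        return 0.0
    hits = sum(is_k_identified(ranked, t, k) for t, ranked in rankings.items())
    return hits / len(rankings)


# Hypothetical rankings for three target users.
rankings = {
    "u1": ["u1", "u7", "u3"],  # found at rank 1
    "u2": ["u5", "u2", "u9"],  # found at rank 2
    "u3": ["u4", "u8", "u6"],  # not found
}
rate_1 = k_identification_rate(rankings, 1)
rate_2 = k_identification_rate(rankings, 2)
```

Here one of three users is 1-identified and two of three are 2-identified, so the rate can only grow as k increases.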
22
Set Intersection Algorithm for Re-identification
  • Likely list: users in the ratings database who
    have rated every movie mentioned by user t
  • Problem
  • Users mention movies but do not rate them
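
A minimal sketch of the set intersection idea, assuming the ratings database is a mapping from user id to the set of rated movies (the ids below are hypothetical):

```python
def likely_list(mentions, ratings):
    """Users whose rated movies include every movie mentioned by the target."""
    mentioned = set(mentions)
    return sorted(user for user, rated in ratings.items() if mentioned <= rated)


# Hypothetical ratings database: user id -> set of rated movie ids.
ratings = {
    "u1": {"m1", "m2", "m3"},
    "u2": {"m1", "m3"},
    "u3": {"m2"},
}

candidates = likely_list({"m1", "m3"}, ratings)
# A single mentioned-but-unrated movie ("m4") empties the list.
no_candidates = likely_list({"m1", "m4"}, ratings)
```

The second call illustrates the problem noted on the slide: one mention of a movie the target never rated removes the target from the likely list.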

23
TF-IDF Algorithm
  • Mentions of a user: a vector of the movies the
    user mentioned
  • Ratings of a user: a vector of the movies the
    user rated
  • Likelihood: TF-IDF cosine similarity
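
The sketch below shows one common way to compute such a similarity. The weighting is an assumption, not necessarily the exact scheme used in the paper: term frequency is binary (a movie is either present or not) and the inverse document frequency is log(number of users / number of raters). All function names and data are illustrative.

```python
import math


def idf_weights(ratings):
    """IDF per movie: log(number of users / number of users who rated it)."""
    n_users = len(ratings)
    counts = {}
    for rated in ratings.values():
        for movie in rated:
            counts[movie] = counts.get(movie, 0) + 1
    return {movie: math.log(n_users / c) for movie, c in counts.items()}


def tfidf_vector(movies, idf):
    """Binary TF times IDF; movies absent from the ratings data get weight 0."""
    return {movie: idf.get(movie, 0.0) for movie in movies}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(movie, 0.0) for movie, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_users(mentions, ratings):
    """Rank rating users by similarity to the mention vector, best first."""
    idf = idf_weights(ratings)
    mention_vec = tfidf_vector(mentions, idf)
    scores = {u: cosine(mention_vec, tfidf_vector(r, idf)) for u, r in ratings.items()}
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical data: "popular" is rated by everyone, so it carries no weight.
ratings = {
    "u1": {"popular", "rare"},
    "u2": {"popular"},
    "u3": {"popular", "other"},
}
ranked = rank_users({"popular", "rare"}, ratings)
```

Because the universally rated movie gets an IDF of zero, only the rarely rated movie separates the candidates in this toy data.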

24
Scoring Algorithm
  • Scoring
  • emphasize the mentions of rarely rated movies
  • de-emphasize the number of ratings a user has

Score for one mention (movie) of a user: the
fraction of users who have not rated mention m
Score for a user: the product of the scores for
all mentions of this user
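
A rough sketch of this scoring rule follows. The slide does not say what score a candidate receives for a mentioned movie they have not rated, so `MISSING_SCORE` is an assumed small constant chosen for this sketch; the data and function names are also hypothetical.

```python
# Assumption: a small constant score when the candidate has not rated the
# mentioned movie. The transcript does not specify this value.
MISSING_SCORE = 0.05


def mention_score(user, movie, ratings):
    """Score one mention: the fraction of users who have NOT rated the movie."""
    if movie not in ratings[user]:
        return MISSING_SCORE
    n_users = len(ratings)
    raters = sum(1 for rated in ratings.values() if movie in rated)
    return (n_users - raters) / n_users


def user_score(user, mentions, ratings):
    """Score a candidate: the product of the scores over all mentions."""
    score = 1.0
    for movie in mentions:
        score *= mention_score(user, movie, ratings)
    return score


# Hypothetical ratings database: user id -> set of rated movie ids.
ratings = {
    "u1": {"a", "b"},
    "u2": {"a"},
    "u3": {"c"},
    "u4": {"a", "c"},
}
mentions = ["a", "b"]
scores = {u: user_score(u, mentions, ratings) for u in ratings}
best = max(scores, key=scores.get)
```

A rarely rated movie contributes a factor close to 1 for the few users who rated it, while the product does not depend on how many other movies a candidate has rated, which matches the two goals listed above.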
25
Scoring Algorithm with Ratings
  • Suppose we have a magic analyzer that can guess
    the rating of a movie from the mention
  • E.g., using the context of that mention
  • Algorithms
  • ExactRating: the analyzer can perfectly determine
    the rating
  • FuzzyRating: the analyzer can guess the rating
    value within ±1
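
The difference between the two variants can be illustrated with a tolerance parameter: 0 for an exact guess, 1 for a guess within one point. This is a simplified sketch that uses the guessed ratings as a filter; how the paper combines the guesses with the scoring algorithm is not stated in the transcript, and all names and data here are hypothetical.

```python
def rating_matches(user_ratings, movie, guessed, tolerance):
    """True if the user rated the movie within `tolerance` of the guessed value."""
    actual = user_ratings.get(movie)
    return actual is not None and abs(actual - guessed) <= tolerance


def consistent_users(guessed_mentions, ratings, tolerance):
    """Users whose ratings agree with every guessed rating, within tolerance."""
    return sorted(
        user
        for user, user_ratings in ratings.items()
        if all(
            rating_matches(user_ratings, movie, guessed, tolerance)
            for movie, guessed in guessed_mentions.items()
        )
    )


# Hypothetical ratings database: user id -> {movie id: rating}.
ratings = {
    "u1": {"m1": 5, "m2": 2},
    "u2": {"m1": 4, "m2": 2},
    "u3": {"m1": 2, "m2": 2},
}
guessed = {"m1": 5, "m2": 2}  # ratings the analyzer inferred from the mentions

exact = consistent_users(guessed, ratings, tolerance=0)
fuzzy = consistent_users(guessed, ratings, tolerance=1)
```

The exact variant narrows the candidates more sharply, while the tolerant variant keeps more users in play.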

26
Percent of users identified by different
algorithms
27
1-identification rate
28
RQ2: Altering the dataset
  • How can dataset owners alter the dataset they
    release to preserve user privacy?
  • Data suppression
  • Algorithm: drop rarely rated movies
  • Not a big problem for industry, but harmful for
    research
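
Dropping rarely rated movies could look like the sketch below. The threshold `min_raters` and the toy data are assumptions for illustration; the transcript does not give the cutoff used in the experiments.

```python
def suppress_rare_items(ratings, min_raters):
    """Remove every movie rated by fewer than `min_raters` users."""
    counts = {}
    for rated in ratings.values():
        for movie in rated:
            counts[movie] = counts.get(movie, 0) + 1
    keep = {movie for movie, c in counts.items() if c >= min_raters}
    return {user: rated & keep for user, rated in ratings.items()}


# Hypothetical ratings database: user id -> set of rated movie ids.
ratings = {
    "u1": {"hit", "obscure"},
    "u2": {"hit"},
    "u3": {"hit", "cult"},
}
released = suppress_rare_items(ratings, min_raters=2)
```

After suppression every user in this toy data looks identical, which also shows the cost: the rare items that researchers often care about are gone.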

29
Dataset-Level Suppression
Does not work!
30
RQ3: Self-defense
  • How can users protect their own privacy?
  • Suppression
  • Do not mention rarely rated movies
  • Misdirection
  • Mention items they have not rated
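
Both user-side strategies can be sketched as transformations of a user's mention set. The helper names, the `min_raters` threshold, the choice of the most popular unrated items as decoys, and the data are all assumptions made for this illustration.

```python
def item_popularity(ratings):
    """Count how many users rated each item."""
    counts = {}
    for rated in ratings.values():
        for item in rated:
            counts[item] = counts.get(item, 0) + 1
    return counts


def suppress_mentions(mentions, popularity, min_raters):
    """Suppression: keep only mentions of items rated by enough users."""
    return {m for m in mentions if popularity.get(m, 0) >= min_raters}


def misdirect(mentions, own_ratings, popularity, n_decoys):
    """Misdirection: also mention the most popular items the user has not rated."""
    decoys = sorted(
        (m for m in popularity if m not in own_ratings and m not in mentions),
        key=lambda m: (-popularity[m], m),
    )
    return set(mentions) | set(decoys[:n_decoys])


# Hypothetical ratings database; "t" is the user defending their privacy.
ratings = {
    "t": {"rare", "hit"},
    "u2": {"hit", "blockbuster"},
    "u3": {"hit", "blockbuster"},
    "u4": {"blockbuster", "indie"},
}
popularity = item_popularity(ratings)
mentions = {"rare", "hit"}

suppressed = suppress_mentions(mentions, popularity, min_raters=2)
misdirected = misdirect(mentions, ratings["t"], popularity, n_decoys=1)
```

In this toy data suppression removes the rare item from the mentions, while misdirection adds a popular item the user never rated.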

31
User-Level Suppression
Does not work!
32
Misdirection
Works when users mention popular items
33
Conclusion
  • Simple data mining algorithms can identify users
    who make mentions in a sparse relation space and
    think they are anonymous
  • Other uses of the algorithms, e.g., finding paper
    reviewers (future work of Frankowski et al.)
  • Privacy risks for users of social media sites
  • Privacy is hard to preserve
  • Don't reveal private information, even if the
    setting seems to be anonymous