Geoffrey Greenwell, IHSN/PARIS21 - PowerPoint PPT Presentation

1
Development of Microdata Anonymization Tools by the IHSN
  • Geoffrey Greenwell, IHSN/PARIS21
  • IASSIST Conference
  • Tampere, Finland, May 2009

Olivier Dupriez, World Bank; Francois Fonteneau,
IHSN/P21; Mark McConaghy, DFID
2
About IHSN
  • International Household Survey Network
  • A network of international agencies
  • Based in Paris at the OECD at PARIS21
  • A coordinating mechanism to:
  • Improve quality and use of household survey data
    in developing countries
  • Harmonize international recommendations for
    survey design, data analysis, etc.
  • Produce and disseminate international good
    practices


3
Accelerated Data Program
  • Implementing the IHSN tools in the countries
  • Technical and financial support to establish
    national data archives (in more than 50 countries)
  • Many datasets documented (DDI)
  • Improved access to data by researchers, but not
    yet satisfactory. We can measure demand through
    the NADA.
  • The need to anonymize data remains the most
    frequently expressed concern and obstacle to data
    access.
  • The ADP has provided some guidance, but there is a
    lack of simple and intuitive tools and guidelines
    available to ADP countries.

4
ADP/IHSN in the world
5
Setting up Catalogs
6
Focus Nigeria
  • Effects of data availability on MDG
  • Halving the population without sustainable
    access to safe drinking water.
  • Providing robust estimates to inform policy
    makers and sector monitoring.
  • Water and Sanitation Sector. Workshop with
    WHO/UNICEF

7
Effects of Data Availability
  • Nigeria and the MDG: rural access to improved
    water source

8
Resistance in the countries
  • Nigeria: the statistics law (Statistical Act of 2007)
    obliges microdata release after due
    anonymization. The legal framework exists.
  • Willing institution (the NBS in Nigeria)
  • However, current anonymization strategies are
    limited to the removal of direct identifiers.
  • Other countries are unable to articulate a proper
    policy for dissemination and tend to use
    confidentiality as a barrier to mask political
    resistance or inertia.
  • IHSN anonymization tools will be a way to deal
    with both real ethical concerns and
    political resistance.

9
Better use of survey data
  • Lots of survey data remain under-exploited
    because they are not accessible to researchers/users
  • Obstacles:
  • Technical
  • Psychological
  • Financial: support by many sponsors
  • Legal
  • Ethical
  • Political

IHSN data documentation and cataloguing tools
and guidelines
IHSN Dissemination Policy Guidelines
Missing piece: SDC tools
10
Anonymize Process
  • Direct identifiers, which are variables such as
    names, addresses, or identity card numbers. They
    permit direct identification of a respondent but
    are not needed for statistical or research
    purposes, and should thus be removed from the
    published dataset.
  • Indirect identifiers, which are characteristics
    that may be shared by several respondents, and
    whose combination could lead to the
    re-identification of one of them. For example,
    the combination of variables such as district of
    residence, age, sex, and profession would be
    identifying if only one individual of that
    particular sex, age and profession lived in that
    particular district. Such variables are needed
    for statistical purposes, and should thus not be
    removed from the published data files.
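
The distinction above can be illustrated with a minimal sketch. It is not one of the IHSN plug-ins: the records, the variable names and the helper functions are hypothetical, chosen only to mirror the district/age/sex/profession example. Direct identifiers are dropped, and records whose combination of indirect identifiers occurs only once in the file are flagged.

```python
from collections import Counter

# Hypothetical survey records: one direct identifier (name) plus
# four indirect identifiers. All values are invented for illustration.
records = [
    {"name": "A", "district": "North", "age": 34, "sex": "F", "profession": "teacher"},
    {"name": "B", "district": "North", "age": 34, "sex": "F", "profession": "teacher"},
    {"name": "C", "district": "North", "age": 61, "sex": "M", "profession": "doctor"},
    {"name": "D", "district": "South", "age": 25, "sex": "M", "profession": "farmer"},
    {"name": "E", "district": "South", "age": 25, "sex": "M", "profession": "farmer"},
]

DIRECT = ("name",)
INDIRECT = ("district", "age", "sex", "profession")


def drop_direct(rows, direct=DIRECT):
    """Return copies of the rows without the direct identifiers."""
    return [{k: v for k, v in row.items() if k not in direct} for row in rows]


def key_frequencies(rows, keys=INDIRECT):
    """Count how many rows share each combination of indirect identifiers."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_rows(rows, keys=INDIRECT):
    """Rows whose combination of indirect identifiers appears exactly once."""
    freq = key_frequencies(rows, keys)
    return [row for row in rows if freq[tuple(row[k] for k in keys)] == 1]


published = drop_direct(records)
at_risk = unique_rows(published)
# Only the 61-year-old male doctor in North is unique on the four keys.
print(at_risk)
```

A record that is unique in the sample is not necessarily unique in the population, which is why risk measures for survey data also take the sampling weights into account.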

11
Defining the problem
Once all direct identifiers have been removed,
we can still have a disclosure problem: the
indirect identifiers remain to be dealt with.
The IHSN anonymization tools will
approach this problem by building on a great
deal of technical work undertaken by experts in
the field. The IHSN hosted an expert meeting in
October 2008 to present its tools and
acknowledges the work done by the University of
Manchester, ISTAT (Italian Statistics), Cornell
University, and ICPSR.
12
Developing SDC tools
  • Building on existing work
  • Not an integrated software package
  • A collection of specialized tools for:
  • Measuring the risk
  • Reducing the risk
  • Assessing the information loss
  • 12 plug-ins developed in C that interface with
    SPSS, Stata, or directly with a server (Windows/Linux).
  • Need to be thoroughly tested.

13
12 Plug-ins
  • µ-Argus risk for weighted samples
  • Re-identification rate to individual risk
    threshold
  • Individual risk to household risk
  • L-diversity for unweighted data
  • SUDA2 DIS-sample data
  • k-anonymity (Kanon) micro-aggregation
  • Local recoding
  • Fixed-length micro-aggregation
  • Noise addition
  • PRAM (Post-Randomization)
  • Rank swapping
  • Sampling
Risk Measures
Risk Reduction
Intruder Scenarios: What does the intruder know?
What does the intruder want?
14
Measuring Disclosure Risk
Based on CENEX Handbook on Statistical Disclosure
Control Version 1.01
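
As an illustration of one risk measure named in the plug-in list (individual risk to household risk), the sketch below uses the common convention that a household is disclosed if at least one of its members is re-identified. It assumes that member risks are independent and that individual risks are already available. The function name and input layout are hypothetical, and this is not the plug-in's actual code.

```python
from collections import defaultdict


def household_risk(individual_risks):
    """Map household id -> probability that at least one member is re-identified.

    individual_risks: iterable of (household_id, risk) pairs, risk in [0, 1].
    Assumption of this sketch: member risks are independent, so the
    household risk is 1 - product(1 - r_i) over the household's members.
    """
    prob_no_member_identified = defaultdict(lambda: 1.0)
    for household_id, risk in individual_risks:
        if not 0.0 <= risk <= 1.0:
            raise ValueError(f"risk must be in [0, 1], got {risk}")
        prob_no_member_identified[household_id] *= 1.0 - risk
    return {hh: 1.0 - p for hh, p in prob_no_member_identified.items()}


# Invented individual risks for two households.
risks = [("h1", 0.5), ("h1", 0.5), ("h2", 0.1)]
hh_risk = household_risk(risks)
print(hh_risk)
```

Under this convention the household risk is never lower than the highest individual risk in the household, so hierarchical files are generally riskier than their individual records suggest.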
15
Reducing disclosure risk
Based on CENEX Handbook on Statistical Disclosure
Control Version 1.01
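
As a rough illustration of one risk-reduction technique from the plug-in list, the sketch below shows a simple univariate, fixed-size micro-aggregation: values are sorted, cut into consecutive groups of k, and each value is replaced by its group mean. The function name, the handling of the last group and the sample values are assumptions of this sketch, and the IHSN plug-in may work differently.

```python
def microaggregate(values, k=3):
    """Replace each value by the mean of its group of about k neighbours.

    Values are ranked, split into consecutive groups of size k, and a
    short tail is folded into the last group so that every group has at
    least k members (unless there are fewer than k values in total).
    The result is returned in the original record order.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # record indices by rank
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # fewer than k values would be left over
            end = n
        group = order[start:end]
        group_mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = group_mean
        start = end
    return result


# Invented incomes (in thousands) for seven respondents.
incomes = [10, 50, 12, 48, 11, 52, 49]
masked = microaggregate(incomes, k=3)
print(masked)
```

Because every value is replaced by its group mean, the overall total is preserved while individual values can no longer be matched exactly.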
16
Measuring Information Loss
Based on CENEX Handbook on Statistical Disclosure
Control Version 1.01
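
A minimal sketch of how information loss might be summarized for one numeric variable is given below. The three measures (mean absolute change, shift in the mean, ratio of variances) are generic illustrations and not necessarily the ones used in the handbook or in the IHSN plug-ins. The function name and the data are hypothetical.

```python
from statistics import mean, pvariance


def information_loss(original, anonymized):
    """Compare a variable before and after anonymization.

    Returns the mean absolute change per record, the shift in the mean,
    and the ratio of variances (anonymized / original). The ratio is
    NaN when the original variable has zero variance.
    """
    if len(original) != len(anonymized) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(original)
    var_original = pvariance(original)
    return {
        "mean_abs_change": sum(abs(a - b) for a, b in zip(original, anonymized)) / n,
        "mean_shift": mean(anonymized) - mean(original),
        "variance_ratio": (
            pvariance(anonymized) / var_original if var_original > 0 else float("nan")
        ),
    }


# Invented values: an original variable and a masked version in which
# each value has been replaced by the mean of a small group.
original = [10, 50, 12, 48, 11, 52, 49]
anonymized = [11, 49.75, 11, 49.75, 11, 49.75, 49.75]
loss = information_loss(original, anonymized)
print(loss)
```

A variance ratio below one indicates that the masking has removed some of the variability, which matters for analyses that depend on dispersion rather than on means.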
17
Developing SDC tools: Proposal
  • In Stata (SPSS, SAS) using C plugins
  • Stata version 9 or later
  • Log file for easy replication of procedure
  • Informative output
  • Or command-line (plugins with data server)
  • Why Stata (SPSS/SAS)?
  • Because most countries use/know these software packages
  • Can use all tabulation and analysis functions

18
Beta Interface
19
Target use
  • Large, imperfect datasets in under-resourced
    countries
  • For use by official data producers in developing
    countries (IHSN objective)
  • Relevant for other users as well
  • Free to all, public source code

20
Work Program for 2009
  • Testing, calibrating and documenting
  • Cornell, IHSN, selected countries
  • Development/implementation of training and
    technical assistance (TA) program
  • Detailed documentation and guidelines
  • Reference manual and training materials
  • Possibly launched before end of the year (IHSN
    website)
  • Participation of others welcome

21
IHSN
  • Adding to the tools that facilitate data access in
    developing countries
  • Tools:
  • Metadata Editor
  • CDROM/HTML developer
  • Web Based National Data Archives
  • Question Bank
  • Guidelines:
  • Data Dissemination
  • Documentation Guide
  • Survey Quality Assessment Framework

22
The End
Thank you.