Title: Methods of Ensuring Confidentiality
1Methods of Ensuring Confidentiality
- Michael Griffey
- Energy Information Administration
- February 2000
2DISCLOSURE AVOIDANCE
- Procedures to ensure that confidential, company-identifiable
data are not disclosed in tables where company-specific
responses may be proprietary and prohibited from public
disclosure by law.
3OBJECTIVES
- To provide useful statistical information to data users
- To assure that the responses of individuals are protected
4TYPES OF DATA
- Microdata - Records containing information for a single person, establishment, etc.
- Frequency - Tables that show percentage of population with certain characteristics.
- Magnitude - Tables showing aggregates of reported values.
5SENSITIVE DATA VALUE
- A data item which could be used to calculate
another company's data
6CELL SUPPRESSION
- A disclosure avoidance technique in which the sensitive data in the publication is replaced with a W.
- PRIMARY SUPPRESSION
- Suppression of sensitive data is called primary suppression.
7COMPLEMENTARY SUPPRESSION
- The suppression of nonsensitive cells to prevent
users from calculating the sensitive cells.
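A minimal sketch of why complements are needed, using made-up numbers: when a row total is published and only one cell in the row is suppressed, the hidden value follows by subtraction. The helper name `recover_by_subtraction` and the data are hypothetical.

```python
def recover_by_subtraction(cells, total):
    """Return the hidden value if exactly one cell in the row is suppressed.

    cells: list of published values, with None marking a suppressed ("W") cell.
    Returns None when zero or several cells are suppressed, because the row
    alone then gives no unique value.
    """
    suppressed = [i for i, v in enumerate(cells) if v is None]
    if len(suppressed) != 1:
        return None
    return total - sum(v for v in cells if v is not None)


row_total = 100
primary_only = [40, None, 25]       # only the sensitive cell is suppressed
with_complement = [40, None, None]  # a nonsensitive cell is suppressed as well

print(recover_by_subtraction(primary_only, row_total))     # 35
print(recover_by_subtraction(with_complement, row_total))  # None
```

With the complement in place, a user who knows only this row can bound the two hidden cells but cannot compute either one exactly.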
8IDENTIFYING SENSITIVE DATA
- Threshold Rule - for example, n < 5.
- n,k rule
- p-percent rule
- Linear Equations - pq rule
- Special Rules
- Quality Suppression
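The first two rules in this list can be sketched as follows. The slide names them without defining them, so the definitions here follow their common statements: the threshold rule flags a cell with fewer than a minimum number of respondents, and the (n,k) rule flags a cell when the n largest respondents account for more than k percent of the cell total. The function names and the parameter values (5, n = 2, k = 80) are illustrative assumptions, not agency parameters.

```python
def threshold_sensitive(values, min_respondents=5):
    """Threshold rule: flag a cell with fewer than min_respondents contributors."""
    return len(values) < min_respondents


def nk_sensitive(values, n=2, k=80.0):
    """(n,k) rule as commonly stated: flag a cell when the n largest
    contributors account for more than k percent of the cell total."""
    total = sum(values)
    if total <= 0:
        return False  # assumption: empty or zero cells are not flagged here
    top_n = sum(sorted(values, reverse=True)[:n])
    return 100.0 * top_n / total > k


cell = [600, 300, 25, 25, 25, 25]  # made-up respondent values
print(threshold_sensitive(cell))   # False: six respondents
print(nk_sensitive(cell))          # True: top two hold 900 of 1000
```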
9P PERCENT RULE
- $R = T - X_1 - X_2$
- $T$ = total value of a cell
- $X_1$ = value of largest unit
- $X_2$ = value of 2nd largest unit
- $p$ = percent of protection required
- Suppress if $R < (p/100) X_1$
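The rule on this slide can be sketched in a few lines. The function name `p_percent_sensitive`, the sample cells, and the value p = 15 are assumptions chosen for illustration only.

```python
def p_percent_sensitive(values, p):
    """Return True when R = T - X1 - X2 is less than (p/100) * X1.

    values: individual respondent values that make up one cell.
    p: percent of protection required.
    """
    ordered = sorted(values, reverse=True)
    x1 = ordered[0] if ordered else 0.0            # largest unit
    x2 = ordered[1] if len(ordered) > 1 else 0.0   # second largest unit
    remainder = sum(values) - x1 - x2              # R
    return remainder < (p / 100.0) * x1


# T = 1000, X1 = 800, X2 = 100, so R = 100 and (15/100) * 800 = 120.
print(p_percent_sensitive([800, 100, 60, 40], p=15))    # True
# T = 1000, X1 = 400, X2 = 300, so R = 300 and (15/100) * 400 = 60.
print(p_percent_sensitive([400, 300, 200, 100], p=15))  # False
```

The intuition is that the second largest unit knows its own value and the total, so the remainder R is all that keeps it from estimating the largest unit's value closely.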
10USING NOISE FOR DISCLOSURE LIMITATION
- Perturb each respondent's data, e.g. to perturb by about 10%, multiply by close to 1.1 or 0.9.
- Sensitive cells contain the most noise and are flagged to warn the user.
- Noise can be used with raking to preserve utility of the data.
11NOISE PROCEDURES
- Assign noise factor/multiplier to each unit in universe in a systematic way.
- Whenever unit is in survey, multiply all survey values by assigned noise factor.
- Noise added prior to tabulation.
- Overall distribution of the multipliers should be symmetric about 1.
- All establishments for same company should be perturbed in same direction.
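The procedure above might be sketched as follows. This is an illustration under stated assumptions, not the scheme of any particular agency: directions alternate from one company to the next (one possible "systematic" assignment), and each multiplier is drawn near 1.1 or 0.9 with a small hypothetical spread. The names `assign_noise_factors`, `perturb`, `base`, and `spread` are invented for the example.

```python
import random


def assign_noise_factors(units, base=0.10, spread=0.02, seed=0):
    """Assign a multiplier to every unit in the universe.

    units: list of (unit_id, company_id) pairs.
    Returns a dict mapping unit_id to a multiplier near 1 + base or 1 - base.
    """
    rng = random.Random(seed)
    direction = {}  # company_id -> +1 or -1
    factors = {}
    for unit_id, company_id in units:
        if company_id not in direction:
            # Alternate directions across companies so that the multipliers
            # are roughly symmetric about 1 overall.
            direction[company_id] = 1 if len(direction) % 2 == 0 else -1
        magnitude = base + rng.uniform(-spread, spread)
        # Every establishment of a company moves in the company's direction.
        factors[unit_id] = 1.0 + direction[company_id] * magnitude
    return factors


def perturb(record, factor):
    """Multiply all survey values for one unit by its assigned factor."""
    return {item: value * factor for item, value in record.items()}


universe = [("e1", "A"), ("e2", "A"), ("e3", "B"), ("e4", "C")]
factors = assign_noise_factors(universe)
noisy = perturb({"sales": 100.0, "volume": 40.0}, factors["e1"])
print(factors)
print(noisy)
```

Because the factors are applied to the microdata before tabulation, any table built from the perturbed records carries the same noise, which is what makes the method independent of table layout.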
12NOISE - ADVANTAGES
- Simpler than cell suppression/not table specific.
- If normally many complementary suppressions, provides more information.
- Allows easier fulfillment of requests for special tabulations, customized data products, and user defined tables.
13NOISE - DISADVANTAGES
- Insufficient protection for single-unit cells.
- Respondent concern about confidentiality.
- Respondent/user perception of data quality.
- Affects all estimates, not just those at risk.
- Not required if survey has little suppression.
14PETROLEUM MARKETING SUPPRESSION PROGRAM
- Monthly publication
- 8,100 cells contained in 50 tables
- Four data dimensions: Geography, Product, Sales Type, Seller Type
15Initial Linear Sensitivity Formula
16p/q Rule
- $S(T) = x_1 - (p/q)(T - x_1 - x_2)$
- Sensitive if $S(T)$ is nonnegative.
- $x_1$ = largest reported value in cell
- $x_2$ = second largest reported value
- $p/q$ = input sensitivity parameter
- $T$ = cell total
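The sensitivity formula on this slide can be evaluated directly. In the sketch below the p/q input parameter is passed as a single number `ratio`; the function names, the sample cells, and the value 5 are assumptions for illustration.

```python
def pq_sensitivity(values, ratio):
    """Compute S(T) = x1 - ratio * (T - x1 - x2), where ratio is the p/q input."""
    ordered = sorted(values, reverse=True)
    x1 = ordered[0] if ordered else 0.0            # largest reported value
    x2 = ordered[1] if len(ordered) > 1 else 0.0   # second largest reported value
    total = sum(values)                            # T
    return x1 - ratio * (total - x1 - x2)


def pq_sensitive(values, ratio):
    """A cell is sensitive when S(T) is nonnegative."""
    return pq_sensitivity(values, ratio) >= 0


print(pq_sensitivity([800, 100, 60, 40], ratio=5))    # 300
print(pq_sensitive([800, 100, 60, 40], ratio=5))      # True
print(pq_sensitivity([400, 300, 200, 100], ratio=5))  # -1100
print(pq_sensitive([400, 300, 200, 100], ratio=5))    # False
```

Rearranging the inequality shows that S(T) is nonnegative exactly when T - x1 - x2 is at most x1 divided by the ratio, so a larger ratio makes fewer cells sensitive.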
17Complementary Suppression Objectives
- Minimize number of cells suppressed
- Suppress small volume cells
- Minimize suppression of totals
- Use consistent pattern month to month
- Automate the process
- Don't suppress key items
18Two 4-Dimensional Arrays
- SW array - Holds status of suppression switches for each cell.
- N array - Holds corresponding volumes for each cell in SW array.
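One way to picture the two arrays is as a pair of lookups keyed by the same four indices. The sketch below is an assumption-laden illustration, not the program described in these slides: the dimension sizes, the switch codes (0 = publish, 1 = primary, 2 = complement), and the helper `complement_along` are all hypothetical, and marginal totals are not modeled. It shows a single pass along one dimension that picks the smallest-volume unsuppressed cell as the complement, in line with the objective of suppressing small volume cells.

```python
from itertools import product

# Hypothetical dimension sizes: geography, product, sales type, seller type.
SHAPE = (2, 2, 3, 1)


def make_array(fill):
    """Build a 4-dimensional array as a dict keyed by index tuples."""
    return {idx: fill for idx in product(*(range(n) for n in SHAPE))}


def complement_along(sw, n, fixed, axis):
    """Check the line of cells that varies along `axis`, other indices from `fixed`.

    If exactly one cell on the line is suppressed, flag the unsuppressed cell
    with the smallest volume as its complement and return that cell's index.
    Otherwise return None and leave the switches unchanged.
    """
    line = [fixed[:axis] + (i,) + fixed[axis + 1:] for i in range(SHAPE[axis])]
    suppressed = [idx for idx in line if sw[idx] != 0]
    if len(suppressed) != 1:
        return None
    candidates = [idx for idx in line if sw[idx] == 0]
    if not candidates:
        return None
    choice = min(candidates, key=lambda idx: n[idx])
    sw[choice] = 2  # assumed code for a complementary suppression
    return choice


sw = make_array(0)   # suppression switches
n = make_array(0.0)  # corresponding volumes
for i, volume in enumerate([120.0, 15.0, 60.0]):
    n[(0, 0, i, 0)] = volume
sw[(0, 0, 0, 0)] = 1  # primary suppression found by the sensitivity rule

print(complement_along(sw, n, fixed=(0, 0, 0, 0), axis=2))  # (0, 0, 1, 0)
```

Keeping the switches and the volumes in parallel structures lets a selection step read volumes from N while recording its decisions only in SW.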
19Step One: IDENTIFY SUPPRESSED CELLS
[Figure: SW array and N array]
20Step Two: IDENTIFY CELLS FOR ADDITIONAL SUPPRESSION
[Figure: SW array and N array]
21Step Three: SUPPRESS A CELL AS A COMPLEMENT
[Figure: SW array and N array]
22Step Four: SELECT ADDITIONAL CELLS FOR SUPPRESSION
[Figure: SW array and N array]
23Step Five: UPDATE SUPPRESSION FLAGS
[Figure: SW array and N array]
24Step Six: MINIMIZE THE NUMBER OF SUPPRESSED CELLS
[Figure: SW array]
25Step Seven: CONTINUOUS REVIEW OF ALL TABLES
[Figure: SW array and N array]
26Step Eight: FINAL TABLE
27NO DATA ADJUSTMENT (1)
28NO DATA ADJUSTMENT (2)
29WITH DATA ADJUSTMENT (1)
30WITH DATA ADJUSTMENT (2)
31RESID SALES TO END USERS
32RECOMMENDED PRACTICES (1)
- Publish rules, not parameters.
- Do not disclose sample members
- Be aware of coalitions
- Use only subadditive primary disclosure rules
33RECOMMENDED PRACTICES (2)
- Consult with respondents and users
- Inform respondents of confidentiality provisions
- Use consistent policies throughout the organization
- Do not reveal parameter values.
- Audit tables to ensure protection
34RECOMMENDED PRACTICES (3)
- Omit names from analysis files
- Use only number codes to link respondents to questionnaires
- Destroy questionnaires after responses entered into computer
35OTHER CONFIDENTIALITY ISSUES
- Embargo of data.
- Data sharing.
- Granting access through licensing or Research
Data Centers.
36INTERAGENCY CONFIDENTIALITY DATA ACCESS GROUP
(ICDAG)
- Formed by FCSM to
  - promote goals of SPWP22.
  - increase sharing of disclosure methods.
- Sixteen statistical agencies are members.
- Developing audit software.
- Developed a Web site of papers.
37ICDAG AUDIT SOFTWARE
- Issued task for development of software using linear programming and SAS code.
- Will measure quality of suppression using measures such as protection range.
- Will audit tables up to five dimensions.
- Must perform on different platforms, i.e. OS/390, UNIX, and PCs.
38THE END