Title: Access to Confidential Data for Statistical Analysis
1Access to Confidential Data for Statistical
Analysis
Kenneth Harris, Director of Research Data Center
2National Center for Health Statistics (NCHS)
- Despite the wide dissemination of its data
through publications, CD-ROMs, etc., the
inability to release files with, for instance,
lower levels of geography, severely limits the
utility of some data for research, policy, and
programmatic purposes and sets a boundary on one
of the Center's goals to increase its capacity to
provide state and local area estimates.
3NCHS (cont.)
- In pursuit of this goal and in response to the
research community's interest in restricted data,
NCHS established the Research Data Center (RDC),
a mechanism whereby researchers can access
detailed data files in a secure environment,
without jeopardizing the confidentiality of the
respondents.
4Research Data Center
- The NCHS Research Data Center, established in
1998, is a facility at the NCHS headquarters in
Hyattsville, Maryland, where researchers are
granted access to restricted data files needed to
complete approved projects. Restricted data
files may contain information, such as lower
levels of geography, but do not contain direct
identifiers (e.g., name or social security
number).
5Data Restrictions
- Section 308 (d) of the Public Health Service Act
and the NCHS Staff Confidentiality Manual do not
permit the release of data that are either
identified or identifiable to persons outside of
NCHS.
6Data Restrictions (cont.)
- Identifiable data include not only direct
identifiers such as name, social security number,
etc., but also data that can serve to allow
inferential identification of either individual
or institutional respondents by a number of means.
7Data Restrictions (cont.)
- Research indicates that identifiability is
greatly enhanced if geographic identifiers for
state, county, census tract, block-group or block
are released on public use files.
8Key Issues for Research Data Availability
- CONFIDENTIALITY
- The dissemination of data in a manner that would
allow public identification of the respondent or
would in any way be harmful to him/her is
prohibited and the data are immune from legal
process.
9Key Issues for Research Data Availability (cont.)
- DISCLOSURE
- Disclosure relates to inappropriate attribution
of information to a data subject, whether an
individual or an organization. Disclosure occurs
when a data subject is identified from a released
file (identity disclosure), sensitive information
about a data subject is revealed through the
released file (attribute disclosure), or the
released data make it possible to determine the
value of some characteristic of an individual
more accurately than otherwise would have been
possible (inferential disclosure).
10Appendix I Rules for the Release of Micro Data
Files
- The data file must not contain any detailed
information about the subject that could
facilitate identification and that is not
essential for research purposes (e.g., exact date
of the subject's birth).
- Geographic places that have fewer than 100,000
people are not to be identified on the data file.
- Characteristics of an area are not to appear on
the data file if they would uniquely identify an
area of less than 100,000 people.
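The 100,000-person rule above can be sketched as a simple masking step. This is only an illustration, not NCHS's actual procedure: the record layout, the `place` field, and the `populations` lookup are hypothetical names introduced here.

```python
def mask_small_places(records, populations, threshold=100_000):
    """Blank the place code on records from places below the population threshold."""
    masked = []
    for record in records:
        copy = dict(record)  # leave the caller's data untouched
        # A place with unknown population is treated as small (a conservative assumption).
        if populations.get(copy["place"], 0) < threshold:
            copy["place"] = None
        masked.append(copy)
    return masked


# Hypothetical example data: place "B" is below the threshold and gets masked.
records = [{"id": 1, "place": "A"}, {"id": 2, "place": "B"}]
populations = {"A": 250_000, "B": 40_000}
print(mask_small_places(records, populations))
```

Masking the code rather than dropping the record keeps the respondent in the file while removing the small-area identifier.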
11Appendix I Rules for the Release of Micro Data
Files (cont.)
- Information on the drawing of the sample which
might assist in identifying a data subject must
not be released outside the Center. Thus, the
identities of primary sampling units are not to
be made available outside the Center.
- Before any new or revised micro data files are
published, they, together with their full
documentation, must be approved for publication
by the NCHS Director or Deputy Director.
- A micro data file containing confidential data on
unidentified individuals or facilities may not be
released to any person or organization outside
NCHS until that person, or a responsible
representative of that organization, has first
signed the statement on the Order Form, whereby
he gives assurance that the data provided will be
used only for statistical reporting or research
purposes.
12Why NCHS Does Not Release Files With Lower Levels
of Geography
- Research suggests that, in the case of personal
surveys, nine commonly collected variables are
enough to identify the share of the sample shown
in the table below.

Population of Geopolitical Area    Percent of Sample Identifiable
 25,000                            24
 50,000                            20
100,000                            14
200,000                             8
300,000                             5
400,000                             4
500,000                             3
13Why NCHS Does Not Release Files With Lower Levels
of Geography (cont.)
- Notes: A geopolitical area may be a county,
city, town, or other place with well-defined
boundaries.
- In this case, identification refers to
certainty identification.
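To make the table concrete, the sketch below turns it into an expected count of identifiable respondents for a given sample. The percentages are copied from the slide; using the tabulated row at or below the area's population (which overstates rather than understates risk) is an assumption made here, as are the function names.

```python
from bisect import bisect_right

# (population of geopolitical area, percent of sample identifiable), from the table.
IDENTIFIABLE = [(25_000, 24), (50_000, 20), (100_000, 14), (200_000, 8),
                (300_000, 5), (400_000, 4), (500_000, 3)]


def percent_identifiable(population):
    """Look up the tabulated percentage for the largest listed area size <= population."""
    sizes = [size for size, _ in IDENTIFIABLE]
    index = bisect_right(sizes, population) - 1
    if index < 0:
        raise ValueError("population is below the smallest tabulated area")
    return IDENTIFIABLE[index][1]


def expected_identifiable(sample_size, population):
    """Expected number of identifiable respondents in a sample from one area."""
    return sample_size * percent_identifiable(population) / 100


print(expected_identifiable(1000, 100_000))
```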
14How Does RDC Operate?
- On-Site Access
- Remote Access
- Staff Assisted Analytical Session
15User Procedures
- To gain access to NCHS restricted data through
either method, the user must:
- Submit a research proposal.
- An advisory and proposal review committee
receives, reviews, and approves researcher
proposals.
- Proposals are evaluated primarily on the
confidentiality disclosure risk.
- Scientific merit is not an evaluation criterion.
- Sign an affidavit of confidentiality and promise
not to use any method to attempt to identify
respondents.
16User Procedures (cont.)
- Not take any materials or equipment into the RDC
unless approved by RDC staff.
- Submit data files to be merged onto NCHS data
ahead of time; all merging is done by RDC staff.
- Subject all output and/or materials removed from
the RDC to a disclosure limitation review.
- Not remove any NCHS restricted data files or
linked data files.
17Researcher Affidavit of Confidentiality
- I certify that no confidential data or
information viewed or otherwise obtained while I
am a researcher in the National Center for
Health Statistics (NCHS), Research Data Center
(RDC) will be removed from NCHS. Further, I
understand that NCHS will perform a disclosure
review and must provide approval to me before I
remove any data from the RDC, whether it be in
electronic or paper form. I acknowledge NCHS
Confidentiality Statute, 308(d) of the Public
Health Service Act stated below and fully
understand my legal obligations to NCHS to
protect all confidential data. Further I
understand any violation I may perform is
punishable under 18 United States Code (USC),
1001 which carries a fine of up to $10,000 or up
to 5 years in prison.
18Researcher Affidavit of Confidentiality (cont.)
- NCHS 308(d) Confidentiality Statute - No
information, if an establishment or person
supplying the information or described in it is
identified, obtained in the course of activities
undertaken or supported under section 304, 305,
306, 307, or 309 may be used for any purpose
other than the purpose for which it was supplied
unless such establishment or person has consented
to its use for such other purpose and in the case
of information obtained in the course of health
statistical or epidemiological activities under
section 304 or 306, such information may not be
published or released in other form if the
particular establishment or person supplying the
information or described in it is identifiable
unless such establishment or person has consented
to its publication or release in other form.
19Researcher Affidavit of Confidentiality (cont.)
- 18 United States Code, 1001 - Deliberately making
a false statement in any matter within the
jurisdiction of any Department or Agency of the
Federal Government violates 18 USC 1001 and is
punishable by a fine of up to $10,000 or up to 5
years in prison.
- ____________________ _______________
Researcher's Signature Date
- ____________________ _______________
NCHS Witness Date
20Can Researcher Merge his/her Data with NCHS?
- Researchers must interact with RDC staff to
ensure that their data can be merged with the
NCHS data.
- User-supplied data will be merged with
NCHS data by RDC staff only.
- The NCHS RDC policy states that merged
and user-supplied data will not be made
available for analysis to anyone without
the written consent of the user.
21The Cost per Project
- On Site
- $200 per day (2-day minimum)
- Remote Access
- NSFG-CDF: $500/year
- NHIS-polio: $500/year
- NHIS Linked Mort. File: $250/month
- NHANES Linked Mort. File: $250/month
22The Cost per Project (cont.)
- Files < 130k records: $500 per month
- Files > 130k records: $1,000 per month
- Staff Assisted: Variable
- File Construction and Setup
- For Mortality Files: $250 per day
- For all Other Files: $500 per day
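As a worked example of the fee schedule, the sketch below computes two of the charges. It assumes the amounts are in U.S. dollars, that the two-day minimum means at least two days are billed, and that a file of exactly 130k records falls in the higher tier (the slides do not say); the function names are hypothetical.

```python
ON_SITE_RATE_PER_DAY = 200   # on-site rate from the slide
ON_SITE_MINIMUM_DAYS = 2     # "2 day minimum"


def on_site_cost(days):
    """Cost of an on-site visit, billing at least the two-day minimum."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return ON_SITE_RATE_PER_DAY * max(days, ON_SITE_MINIMUM_DAYS)


def remote_file_cost(records, months):
    """Monthly remote-access charge by file size (tier boundary is an assumption)."""
    rate = 500 if records < 130_000 else 1000
    return rate * months


print(on_site_cost(1))               # one day is billed as two
print(remote_file_cost(100_000, 3))  # three months at the lower tier
```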
23Do Doctors perform defensive Cesareans?
- Overview: This topic re-examined the issues of
defensive medicine and state reforms designed
to limit malpractice risk on the use of cesarean
section delivery.
- NCHS Data Used: National Hospital Discharge
Survey (NHDS)
- Years of Data Used: 1980 through 1992,
inclusive.
- User's Data Merged with NCHS? Yes
- Method of Access to NCHS Data: Remote and
On-site Access
- Statistical Software Used: SAS
24Economic Model to Explain the Incidence of Sexual
Activity, Contraceptive Use, STD, and Pregnancy
Among Teenage Girls.
- Overview: National Survey of Family Growth data
provide extensive socio-demographic information
and reports of the sexual histories of these
women. The researcher focused on the effects of a
number of policies measured at the state level.
These included:
- Parental notification of consent laws.
- Medicaid funding of abortions.
- Welfare generosity.
- NCHS Data Used: National Survey of Family Growth
(NSFG)
- User's Data Merged with NCHS? Yes
- Method of Access to NCHS Data: Remote Access
- Statistical Software Used: SAS
25Nursing Home Admission and Payment Source?
- Overview: This project tested if patients with
Medicare were being discriminated against because
their reimbursement rate was significantly below
the private pay rate for nursing homes.
- NCHS Data Used: National Nursing Home Survey
(NNHS)
- Years of Data Used: 1985, 1995, and 1997
- User's Data Merged with NCHS? No
- Method of Access to NCHS Data: Remote Access
- Statistical Software Used: SAS
26Hardware and Software
- All RDC hardware and software are standard.
- Hardware
- Pentium IV computers with Windows 2000
- Software
- SAS (only language on ANDRE)
- Sudaan
- Fortran
- HLM
- Stata
- Limdep
- text editors/viewers
- Onsite workstations do NOT have email or internet
access
- Only access to printer is through RDC staff
27Record Linkage for Epidemiologic Research:
Accessing Linked Data at the NCHS Research Data
Center
Christine S. Cox, NCHS Data Users Conference, July
12, 2006
28What is Record Linkage?
Administrative records
NCHS Surveys
Linked Data File
29NCHS Linked Data Major Activities
- Mortality
- National Death Index
- Health Care Utilization and Costs
- Medicare Data
- Retirement and Disability
- Social Security Data
30NCHS Linked Data Mortality
- Eligibility status
- Assigned vital status
- Date of death
- Age at death
- Underlying and multiple causes of death
- Adjusted sample weights
31Research Potential of Linked Mortality Data
The Income-Associated Burden of Disease in the
United States. P Muennig, P Franks, H Jia, E
Lubetkin, and MR Gold
Excess Deaths Associated with Underweight,
Overweight, and Obesity. KM Flegal, BI Graubard,
DF Williamson, MH Gail. JAMA. 2005;293:1861-1867.
Living and Dying in the USA: Behavioral, Health,
and Social Differentials of Adult Mortality. RG
Rogers, CB Nam, RA Hummer
A Semiparametric Analysis of the Body Mass
Index's Relationship to Mortality. JT Gronniger
32NCHS Linked Data Medicare
- Medicare entitlement and health care utilization
and payment data for 1991-2000
- Denominator file
- MEDPAR Inpatient hospitalization
- MEDPAR Skilled nursing facility
- Hospital outpatient
- Home Health Care
- Hospice
- Carrier (physician/supplier Part B file)
- Durable Medical Equipment
33Research Potential of Linked Medicare Data
- Examine risk factors for health conditions
- Examine reliability of survey data
- Examine survey report of disability with program
participation eligibility criteria
- Compare survey reported health conditions to
claims records
- Examine disparities in Medicare service
utilization
34NCHS Linked Data Retirement/Disability
- Social Security data from Retirement, Survivors,
and Disability Insurance (RSDI) and Supplemental
Security Income (SSI) programs
- Master Beneficiary Record (MBR)
- 1962-2003
- Payment History Update System (PHUS)
- 1984-2003
- Supplemental Security Record (SSR)
- 1974-2003
35Research Potential of Linked Social Security Data
- Examine reliability of survey information for SSA
program participation and benefits
- Compare the health characteristics of those who
take early (age 62) Social Security benefits to
those who postpone benefits
- Policy analysis using validated survey data
- Predicting the number of people who will become
disabled based upon survey reported health
conditions
- Determining whether current disability
entitlement funding levels will be adequate as
the population ages
36Summary NCHS Data Linkage
37www.cdc.gov/nchs/rd/nchs_datalinkage/data_linkage_activities.htm
38Why can't you just give me the data?
- NCHS does not own the linked administrative
data
- NCHS data confidentiality rules prohibit the
release of potentially identifiable data
- Special considerations concerning the protection
of linked data
- The RDC is the only option for access for now.
39Overview Data Access Procedures
- Proposal Requirements
- Access Methods
- Helpful Tips
- Where to get help?
40Proposal Requirements
- Proposal is evaluated by review committee
- Review criteria
- Scientific and technical feasibility
- Availability of RDC resources
- Disclosure risk for restricted information
- The extent to which project is in accordance with
the mission of NCHS
- Special note: NCHS does not try to determine if
proposals are duplicative
41Proposal Requirements
- Cover letter
- Project title
- Abstract (maximum 300 words summarizing project)
- Full contact information
- Institutional affiliation
- Mail address, phone, email
- Dates of proposed time at RDC (or indication of
using remote access)
- Source of funding for proposed research
42Proposal Requirements
- Study background
- Key study questions or hypotheses
- Public health benefits
- Methods
- Analytic approach and statistical methods
- Statistical software requirements
- Description of intended output for nondisclosure
review, e.g.
- Table shells
- Model equations
- Test statistics that researcher plans to remove
from RDC
43Proposal Requirements
- Explanation of why restricted data are needed,
e.g. describe why publicly available data are
insufficient
- Summary of data requirements to be included in
analytic file
- Identification of sample
- Identification of variables
- Description of additional data to be supplied by
researcher to be merged with NCHS or other data
source (must clearly identify source of other
data)
44Proposal Requirements Appendices
- Current Curriculum Vitae or resume for each
investigator
- Data dictionary: complete listing of specific
data requested and its source(s), indicating
whether each item is a public use or restricted
access variable
- specific files and years
- sample
- variables (dependent, independent,
matching/linking)
45Proposal Requirements Appendices
- For remote-access applicants
- Description of the computer and email system to
be used to receive output
- Security provisions for the computer and email
systems
- For students
- Letter from department chair or academic advisor
stating that student is working under the
direction of the department
46Overview RDC Data Access Procedures
- Proposal Requirements
- Access Methods
- Helpful Tips
- Where to get help?
47Access Methods
- Once approved, three methods to access restricted
data:
- on-site: use local computing resources in the
NCHS RDC, Hyattsville, MD
- remote: submit programs electronically to be
executed in the RDC with output returned by email
- staff assisted: RDC staff provide on-site
programming for off-site approved researchers
- For all methods of access, restricted data files
remain in RDC and output is inspected for
disclosure violations
48On-Site Access
- RDC staff constructs necessary data files,
including merged user data
- Most statistical packages available with
sufficient lead time
- Output subject to disclosure review
- Open only during normal working hours
49Remote Access Method
- RDC staff constructs necessary data files,
including merged user data
- SAS programs only (certain procedures and
functions not allowed); additional software
options expected
- Both submitted programs and output undergo a
programmed disclosure limitation review
50RDC Staff-assisted Programming Method
- Subcontract with the RDC staff to perform
programming tasks
- Useful for those planning to use statistical
software not available for the remote system and
who are not able to travel to the RDC facility
- Cost is estimated for each research project
51Overview RDC Data Access Procedures
- Proposal Requirements
- Access Methods
- Helpful Tips
- Where to get help?
52RDC Helpful Tips
- Be clear about research and data requirements
(helps to determine feasibility of project)
- Clearly identify the sample to be included in the
analytic file
- Provide data dictionaries for both
- Public use data
- Restricted data
- Provide examples of expected output
53Overview RDC Data Access Procedures
- Proposal Requirements
- Access Methods
- Helpful Tips
- Where to get help?
54Visit the RDC at www.cdc.gov/nchs/rd/rdc.htm
or email rdca@cdc.gov
55LINKED DATA, CONTEXTUAL DATA, and
GEO-CODING: ON-SITE and STAFF-ASSISTED DATA ACCESS
Christopher Rogers, Research Data
Center, cor2@cdc.gov
56Why Link Data Sets?
- Improve modeling and make use of existing data.
- Compensate for increased difficulties taking
surveys.
- Open your mind.
- Common Example
- Economic variables versus Ethnic variables
57Historical Trends
- More linking of scientific data sets between
government agencies. Confidential Information
Protection and Statistical Efficiency Act of 2002
(CIPSEA).
- Confused political and social situation in US.
58Quality NCHS Resources
- Linked Birth and Infant Death Data with Fetal
Death Data.
- Geo-coded NHIS 1986-2003 (2004-2005).
- Geo-coded NHANES III.
- Cycles 4, 5, and 6 NSFG Contextual Data.
- Linked Data Sets described earlier.
59Linked Birth and Infant Death
- Designed to study factors in infant death.
- Links birth and death certificates for deaths
under one year of age. Includes fetal deaths for
1995-1997.
- Years: 1983-1991 and 1995-1997
- Numerator File (for deceased children): Parental
information and behavior, prenatal care, infant
health variables, demographics, cause of death.
- Denominator File (for control group): Parental
information and behavior, prenatal health, infant
health, demographics.
- Fetal Death Data: 1995-1997
- Restricted Data: County/City of mother's
residence or County of child's birth or death
when population is under 250,000 (under 100,000
starting 1989).
60Data Example
- From the Division of Vital Statistics. Proposals
or questions can go either to the RDC or the DVS.
- Fetal Death Data portion. Given 1989-1999.
- Linked to county level contextual data.
- Goal: to model fetal death with emphasis on ground
water quality. Estimates death rates for each
county.
61Geo-Coded NHIS
- National Health Interview Survey. RDC has access
to files from 1963 to present. Previously
geo-coded households for 1986-1994. Recently
geo-coded by RDC from 1995-2003. 2004-2005
coding in progress.
- State (2 digits), County (3 digits), Tract (6
digits), Block Group (1 digit), and Block (3-4
digits) levels. Households coded to 1990 and
2000 Censuses.
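The digit widths listed above can be illustrated by splitting a concatenated block-group code into its parts. This is a sketch only: it assumes the components are stored as one 12-digit string (state + county + tract + block group), which is the standard Census block-group GEOID layout but not necessarily how the RDC files store them. The function name and the sample value are hypothetical.

```python
def split_block_group_geoid(geoid):
    """Split a 12-digit block-group code into state, county, tract, and block group."""
    if len(geoid) != 12 or not (geoid.isascii() and geoid.isdigit()):
        raise ValueError("expected a 12-digit string")
    return {
        "state": geoid[0:2],        # 2 digits
        "county": geoid[2:5],       # 3 digits
        "tract": geoid[5:11],       # 6 digits
        "block_group": geoid[11],   # 1 digit
    }


# Illustrative value only; the digits are made up for the example.
print(split_block_group_geoid("240330001023"))
```

Keeping the parts as strings preserves leading zeros, which would be lost if the codes were read as numbers.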
62Geo-Coded NHANES III
- NHANES III is also linked to NDI Mortality data.
- NHANES III has been geo-coded twice. The RDC has
done it at the same level of detail as NHIS.
- Continuous NHANES has not been geo-coded yet.
- Example: Large project with neighborhood,
economic, ethnic, and individual medical and
behavioral variables. Multi-level models.
63NSFG Contextual Data
- Contextual variables available with Cycles 4, 5,
and 6. Supplied for each individual in sample.
- Cycle 6: 1,054 contextual variables at the state,
county, tract, and block group levels. For
respondent addresses in 2000 and 2002.
- Contextual data include both economic and
demographic characteristics of locations. Easily
merged by case ID to individual characteristics,
behaviors, and histories.
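A merge by case ID of the kind described above might look like the following minimal sketch. The field names (`case_id`, `age`, `county_median_income`) are hypothetical and chosen only for illustration; at the RDC the merge itself is performed by staff.

```python
def merge_by_case_id(respondents, contextual):
    """Attach contextual variables to each respondent record that shares its case ID."""
    context_by_id = {row["case_id"]: row for row in contextual}
    merged = []
    for person in respondents:
        # Respondents without a contextual match are kept with their own fields only.
        context = context_by_id.get(person["case_id"], {})
        merged.append({**context, **person})
    return merged


respondents = [{"case_id": 101, "age": 17}, {"case_id": 102, "age": 19}]
contextual = [{"case_id": 101, "county_median_income": 41000}]
print(merge_by_case_id(respondents, contextual))
```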
64Simple NSFG Example
- A simple example relating economics on state
level, ethnicity, and behavior, but not using
contextual variables.
- Treatment: States given waiver to offer more
family planning services (FPS).
- Questions
- FPS effects on behavior
- FPS effect on pregnancy rates
- Differential impacts across demographic subgroups?
65Change of Topic Accessing Data
- On-site access to data at the RDC in Hyattsville.
- Staff-assisted remote access to data via e-mail.
- Researchers often use both types of access.
- Potential Designated Agent status. (CIPSEA)
- The RDC has put many resources into automated
remote access.
66On-Site Access
- Rules in 24-page file GuidelinesRDC11-8-05.pdf
available on-line.
- The RDC and NCHS surveys have knowledgeable
professional staffs that review proposals
carefully. Clients can only remove what has been
approved. Checked by staff.
- Exploratory Data Analysis. If needed, ask.
Recent example: Checking general shapes of
variables for model validity. OKed by survey.
- Modeling needs. Recent example: Nested
randomized geo-codes.
- Estimation problems. Example: Single PSU in a
Stratum.
67Staff-Assisted Remote Access
- Analysis done through a particular staff member.
Usually efficient, but could be very busy.
- Staff member determines costs based on time.
- Staff usually not asked to do much programming.
- Staff creates data, runs e-mailed programs,
checks, and returns output to researcher.
- Staff can do exploratory analysis, if needed.
- Staff can help check modeling problems.
- Commonly done after on-site visit.
68Our Mission
- The RDC has a professional staff dedicated to
helping researchers uncover knowledge and advance
understanding.
69Remote Access System
Vijay Gambhir
70Remote Access System
- Envisioned as an integral Part of RDC
- Pre onsite usage
- Post onsite usage
- Super store/ Convenience store
71Basics of Remote Access System
- Object oriented, event driven system based upon
the principles of distributed computing
- About two years of development efforts
- Set of applications called in service by resident
component
- Advanced pattern recognition techniques
72Analytic Data Research by Email (ANDRE)
- NCHS has been providing remote data access to
researchers through ANDRE since April 1998.
- In the past five years, ANDRE has served 45
different data analysts and executed over 9,500
SAS programs for their research programs.
73Main Features of ANDRE
- Completely automated system
- Operates round the clock
- without any human intervention
- Registered subscribers only
- Proposals already reviewed and approved
- Have an agreement with NCHS/RDC
- Unlimited Access during the subscription period
74Data Requests
- Registered user can submit data requests by email
from anywhere and at any time.
- Results of the data request released to a
specified email address that has been certified
as secure by the subscriber and approved by
NCHS/RDC.
75Authentication
- Multi-levels of system security
- Submission syntax
- User id
- Password
- Email/code word
- Package
- Path info
76Data Request Analysis
- Compliance with the disclosure limitation
constraints of NCHS
- Integrity of the system
- Resource constraints (CPU time, storage
requirements)
- Protection of ANDRE's work environment
77Prevention of Direct Disclosure
- Cleaning up of the Log File
- Categorization of SAS commands/words
- Forbidden Commands
- Modifications to the Commands
- Output suppression
78Sample Original Log
- 1 options nocenter;
- 2 Data one;
- 3 Infile 'd:\nchs\respnd95.dat' lrecl=13064;
- 4 Input
- 5 TODAYSPG 6847-6847
- 6 CONSTAT1 11934-11935
- 7 CONSTAT2 11936-11937
- 8 CONSTAT3 11938-11939
- 9 CONSTAT4 11940-11941
- 10 SEX1MTHD 11945-11946
- 11 POST_WT 12350-12359;
- 12 if constat1 = 'ab' then vjvar=1; else vjvar=2;
- 13 WGT1000=POST_WT/1000;
- 14 title 'NSFG cycle 1995';
-
- NOTE: Character values have been converted to
numeric values at the places given by:
(Line):(Column).
- 12:15
- NOTE: The infile 'd:\nchs\respnd95.dat' is:
- File Name=d:\nchs\respnd95.dat,
79Sample Original Log (cont.)
- 12901 11232521101 0526721310303392181193101
1103 01030000000321120000392702210611511200403
1344 1316
- 13001 622501001006034
- TODAYSPG=1 CONSTAT1=5 CONSTAT2=88 CONSTAT3=88
CONSTAT4=88 SEX1MTHD=1 POST_WT=2545.7569 vjvar=2
WGT1000=2.5457569 _ERROR_=1
- _N_=20
- NOTE: 10847 records were read from the infile
'd:\nchs\respnd95.dat'.
- The minimum record length was 13064.
- The maximum record length was 13064.
- NOTE: The data set WORK.ONE has 10847
observations and 9 variables.
- NOTE: DATA statement used:
- real time 39.88 seconds
- cpu time 12.10 seconds
-
- 15 proc freq;
- 16 tables CONSTAT1 vjvar;
- 17 run;
80Sample Cleaned Log
- 1 options nocenter;
- 2 Data one;
- 3 Infile 'd:\nchs\respnd95.dat' lrecl=13064;
- 4 Input
- 5 TODAYSPG 6847-6847
- 6 CONSTAT1 11934-11935
- 7 CONSTAT2 11936-11937
- 8 CONSTAT3 11938-11939
- 9 CONSTAT4 11940-11941
- 10 SEX1MTHD 11945-11946
- 11 POST_WT 12350-12359;
- 12 if constat1 = 'ab' then vjvar=1; else vjvar=2;
- 13 WGT1000=POST_WT/1000;
- 14 title 'NSFG cycle 1995';
-
- NOTE: Character values have been converted to
numeric values at the places given by:
(Line):(Column).
- 12:15
- NOTE: The infile 'd:\nchs\respnd95.dat' is:
- File Name=d:\nchs\respnd95.dat,
81Sample Cleaned Log (cont.)
- NOTE: 10847 records were read from the infile
'd:\nchs\respnd95.dat'.
- The minimum record length was 13064.
- The maximum record length was 13064.
- NOTE: The data set WORK.ONE has 10847
observations and 9 variables.
- NOTE: DATA statement used:
- real time 39.88 seconds
- cpu time 12.10 seconds
-
- 15 proc freq;
- 16 tables CONSTAT1 vjvar;
- 17 run;
-
- NOTE: There were 10847 observations read from the
data set WORK.ONE.
- NOTE: PROCEDURE FREQ used:
- real time 0.49 seconds
- cpu time 0.04 seconds
82Forbidden Commands
- Commands That Pose Unacceptable Disclosure
Risks
- OR
- Disallowed to Protect Integrity/Internal
Environment of ANDRE
- Add, firstobs, report, iml
- Print, first., Pctn, nofreq
- Obs, last., Pctsum, nocum
- Firstobs, nocol, tabulate, editor
- Browse, summary, list, put
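A keyword scan of a submitted program could be sketched as below. The keyword list is copied from the slide; everything else (the tokenizing regular expression, skipping quoted strings, the function name) is an assumption for illustration and is not a description of how ANDRE is implemented.

```python
import re

# Keywords copied from the slide; "first." and "last." are prefixes (e.g. first.id).
FORBIDDEN_WORDS = {
    "add", "firstobs", "report", "iml", "print", "pctn", "nofreq", "obs",
    "pctsum", "nocum", "nocol", "tabulate", "editor", "browse", "summary",
    "list", "put",
}
FORBIDDEN_PREFIXES = ("first.", "last.")


def find_forbidden(program):
    """Return forbidden keywords found in a SAS program, in order of first appearance."""
    found = []
    # Drop quoted strings so that titles and file paths are not scanned.
    code = re.sub(r"'[^']*'|\"[^\"]*\"", " ", program)
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*\.?", code):
        word = token.lower()
        if word not in FORBIDDEN_PREFIXES:
            word = word.rstrip(".")
        if (word in FORBIDDEN_WORDS or word in FORBIDDEN_PREFIXES) and word not in found:
            found.append(word)
    return found


print(find_forbidden("proc print data=one (obs=5); run;"))
print(find_forbidden("proc freq; tables CONSTAT1 vjvar; run;"))
```

Matching whole tokens rather than substrings means that `input` is not mistaken for `put`.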
83Commands Modification
- Modify user's program to enforce restrictions on
options allowed with certain SAS procedures to
prevent objectionable info appearing in the
output
- PROC MEANS: n mean std
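One way to picture this kind of command modification is a rewrite of the `PROC MEANS` statement so that only N, mean, and standard deviation are requested. The sketch below is an assumption about how such a rewrite could work, not ANDRE's actual code; `STAT_KEYWORDS` is an illustrative subset of PROC MEANS statistic keywords, and the function name is hypothetical.

```python
import re

ALLOWED_STATS = ("n", "mean", "std")
# Illustrative subset of PROC MEANS statistic keywords that a user might request.
STAT_KEYWORDS = {"n", "nmiss", "mean", "std", "min", "max", "range", "sum",
                 "var", "median", "p1", "p5", "p95", "p99", "q1", "q3"}


def restrict_proc_means(program):
    """Replace any statistics requested on PROC MEANS with the allowed set."""
    def rewrite(match):
        options = match.group(2).split()
        kept = [opt for opt in options if opt.lower() not in STAT_KEYWORDS]
        return " ".join([match.group(1)] + kept + list(ALLOWED_STATS)) + ";"

    # Only the PROC MEANS statement itself (up to its first semicolon) is rewritten.
    return re.sub(r"(?i)\b(proc\s+means)\b([^;]*);", rewrite, program)


print(restrict_proc_means("proc means data=one min max; var x; run;"))
```

Because only the text up to the first semicolon is touched, the later `var x;` statement is left alone even though `var` is also a statistic keyword.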
84Output Suppression
- Wiping out of extreme values from the output of
Proc Univariate.
- Suppressing complete output line (Procs Means,
corr, Univariate, etc) where sample size less
than the minimum acceptable value.
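The second rule, suppressing a whole output line when the sample size is too small, can be sketched as follows. The slides do not state the minimum acceptable value, so `min_n` is a hypothetical parameter, and the row layout and sample values are made up for illustration.

```python
def suppress_small_n(rows, min_n):
    """Replace the statistics of any row whose N is below min_n with a notice."""
    lines = []
    for variable, n, mean, std in rows:
        if n < min_n:
            lines.append(f"{variable}  Values Suppressed")
        else:
            lines.append(f"{variable}  {n}  {mean}  {std}")
    return lines


# Hypothetical rows: (variable, N, mean, std dev).
rows = [("VAR_A", 5424, 5.08, 1.40), ("VAR_B", 4, 0.25, 0.50)]
for line in suppress_small_n(rows, min_n=10):
    print(line)
```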
85Proc Means Suppression
The MEANS Procedure

Variable  Label                                          N          Mean       Std Dev
---------------------------------------------------------------------------------------
EXPEND_R  Current expend/pupil in public schl/1000    5424     5.0830820     1.3958710
          Values Suppressed
RPUB87    exp. for contr. serv. and supplies 1997     5424   23472052.60   18806802.86
RPUB92    exp. for contr. serv. and supplies 1997     5424   34800922.98   30481634.59
PRGPRO    Coordinated Pregnancy Prevention Program    1708     0.0679157     0.2516749
HIVED     HIV/AIDS Education                          1708     3.5146370     0.8044378
          Values Suppressed
PRGPRO87  Coordinated Pregnancy Prevention Program    5424     0.0540192     0.2260764
HIVED87   HIV/AIDS Education                          5424     3.4968658     0.8008324
WT_PER15  Wt females aged 15-19/total 15-19           5424     0.7279681     0.1265796
BK_PER15  Bk females aged 15-19/total 15-19           5424     0.1409869     0.0932332
HS_PER15  Hs females aged 15-19/total 15-19           5424     0.0962413     0.1055191
TEENMMC2  Teenmom by cohort (1,2,3r)                  1201     1.7119067     0.7715351
C18_2_1S  R in C2 (vs 1) at 18-19 endpt (1,2)         1770     1.5248588     0.4995228
TM2_1S18  R tnmm in Coh 2 (vs 1)-age 18 @ ext          358     1.4804469     0.5003168
86Proc Univariate Output: Unsuppressed
The SAS System                                                  9
14:09 Sunday, October 24, 1999

Univariate Procedure
Variable=AVHRATET

Moments                                    Quantiles(Def=5)
N          2283      Sum Wgts  2283        100% Max  -0.25314   99%  -1.62008
Mean       -4.66219  Sum       -10643.8     75% Q3   -3.56179   95%  -2.37588
Std Dev    1.892017  Variance  3.57973      50% Med  -4.50491   90%  -2.79152
Skewness   -2.11919  Kurtosis  6.892929     25% Q1   -5.30374   10%  -6.07639
USS        57792.36  CSS       8168.944      0% Min  -13.5463    5%  -7.19645
CV         -40.5821  Std Mean  0.039598                          1%  -12.7402
T:Mean=0   -117.738  Pr>|T|    0.0001      Range     13.29321
Num ^= 0   2283      Num > 0   0           Q3-Q1     1.741949
M(Sign)    -1141.5   Pr>=|M|   0.0001      Mode      -13.5463
Sgn Rank   -1303593  Pr>=|S|   0.0001

Extremes
Lowest     Obs       Highest   Obs
-13.5463(  1547)     -0.90519(  649)
-13.5397(  1836)     -0.81756( 1094)
87Proc Univariate Output: Suppressed
The SAS System                                                  9
14:09 Sunday, October 24, 1999

Univariate Procedure
Variable=AVHRATET

Moments                                    Quantiles(Def=5)
N          2283      Sum Wgts  2283        100% Max  -0.25314   99%  -1.62008
Mean       -4.66219  Sum       -10643.8     75% Q3   -3.56179   95%  -2.37588
Std Dev    1.892017  Variance  3.57973      50% Med  -4.50491   90%  -2.79152
Skewness   -2.11919  Kurtosis  6.892929     25% Q1   -5.30374   10%  -6.07639
USS        57792.36  CSS       8168.944      0% Min  -13.5463    5%  -7.19645
CV         -40.5821  Std Mean  0.039598                          1%  -12.7402
T:Mean=0   -117.738  Pr>|T|    0.0001      Range     13.29321
Num ^= 0   2283      Num > 0   0           Q3-Q1     1.741949
M(Sign)    -1141.5   Pr>=|M|   0.0001      Mode      -13.5463
Sgn Rank   -1303593  Pr>=|S|   0.0001
88Proc Univariate Output: Suppressed (sample size
1)
- Univariate Procedure
- Variable=FREQ (sum) freq
- Moments / Quantiles(Def=5)
- Serious Disclosure limitation Violations
- Values too low to release
- Output of Proc Univariate withheld
89Proc Freq Suppression (One-Way Tables)
- Suppress at least two consecutive rows to prevent
derivation of suppressed values from cumulative
totals.
- Disallow single row output.
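The reason for the two-row rule is that a frequency table with cumulative counts lets a reader recover a single hidden count by subtracting the neighbouring cumulative totals. A minimal sketch of the rule follows; the threshold `min_count`, the choice of which neighbour to hide, and the function name are assumptions, since the slides do not describe the actual algorithm.

```python
def suppress_one_way(freqs, min_count):
    """Return a True/False flag per row; True means the row must be hidden."""
    n = len(freqs)
    if n < 2:
        # Single-row output is disallowed, so the whole table is withheld.
        return [True] * n
    hide = [count < min_count for count in freqs]
    for i in range(n):
        if not hide[i]:
            continue
        has_left = i > 0 and hide[i - 1]
        has_right = i + 1 < n and hide[i + 1]
        if not (has_left or has_right):
            # An isolated hidden row could be recovered from cumulative totals,
            # so hide a neighbour too (the next row, or the previous one at the end).
            partner = i + 1 if i + 1 < n else i - 1
            hide[partner] = True
    return hide


print(suppress_one_way([5, 1, 6, 7], min_count=3))
```

With two adjacent rows hidden, only their combined count can be derived from the cumulative column.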
90One-Way Freq Table: Suppressed
                                     Cumulative  Cumulative
LOGRNTOPAT     Frequency   Percent    Frequency     Percent
-----------------------------------------------------------------
0.2277839309       ?????     ?????        ?????       ?????
0.2277839309       ?????     ?????        ?????       ?????
0.2305236586           5      0.08         6429       97.99
0.231111721            5      0.08         6434       98.06
0.232058915        ?????     ?????        ?????       ?????
0.232058915        ?????     ?????        ?????       ?????
0.2436220827       ?????     ?????        ?????       ?????
0.2436220827       ?????     ?????        ?????       ?????
0.2498117984           6      0.09         6456       98.40
0.2504106777           6      0.09         6462       98.49
0.2513144283          18      0.27         6480       98.77
0.2595111955           6      0.09         6486       98.86
0.2670627852       ?????     ?????        ?????       ?????
0.2670627852       ?????     ?????        ?????       ?????
0.2736958305           5      0.08         6500       99.07
0.2814124594           5      0.08         6505       99.15
91One-Way Freq Table: Suppressed (cont.)
                                     Cumulative  Cumulative
LOGRNTOPAT     Frequency   Percent    Frequency     Percent
-----------------------------------------------------------------
0.3403258059       ?????     ?????        ?????       ?????
0.3403258059       ?????     ?????        ?????       ?????
0.3715635564           6      0.09         6537       99.63
0.3856624808       ?????     ?????        ?????       ?????
0.3856624808       ?????     ?????        ?????       ?????
0.6931471806           6      0.09         6550       99.83
1.2527629685       ?????     ?????        ?????       ?????
1.2527629685       ?????     ?????        ?????       ?????
1.2527629685       ?????     ?????        ?????       ?????
92Proc Freq Suppression (Two-way Tables)
- Row and column totals preserved
- Cells with values less than the acceptable
minimum are suppressed
- Additional suppressions to ensure that no row and
no column has a single suppression.
- Logical stitching of horizontal and vertical
splits.
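Because row and column totals are preserved, a single hidden cell in a row or column could be recovered by subtraction, which is why extra cells are suppressed. The sketch below illustrates that idea with a simple repeat-until-stable loop. It is not the ANDRE algorithm: picking the smallest visible cell as the extra suppression, the `min_count` parameter, and the function name are assumptions made here.

```python
def suppress_two_way(table, min_count):
    """Return a grid of True/False flags; True means the cell must be hidden."""
    rows, cols = len(table), len(table[0])
    hide = [[cell < min_count for cell in row] for row in table]
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            hidden = [c for c in range(cols) if hide[r][c]]
            shown = [c for c in range(cols) if not hide[r][c]]
            if len(hidden) == 1 and shown:
                # A lone hidden cell equals the row total minus the visible cells.
                extra = min(shown, key=lambda c: table[r][c])
                hide[r][extra] = True
                changed = True
        for c in range(cols):
            hidden = [r for r in range(rows) if hide[r][c]]
            shown = [r for r in range(rows) if not hide[r][c]]
            if len(hidden) == 1 and shown:
                # The same argument applies to the column total.
                extra = min(shown, key=lambda r: table[r][c])
                hide[extra][c] = True
                changed = True
    return hide


# Hypothetical counts: the small cells in column 0 force extra suppressions in column 1.
for flags in suppress_two_way([[94, 388, 792], [2, 9, 22], [1, 6, 10]], min_count=5):
    print(flags)
```

The loop ends because every pass that changes anything hides at least one more cell, and the table is finite.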
93Proc Freq Two-way Tables Suppression
TABLE OF FAMREL BY FAMSIZER

FAMREL     FAMSIZER

Frequency
Percent
Row Pct
Col Pct          2        3        4        5    Total
------------------------------------------------------
3               94      388      792      533     2206
              3.97    16.40    33.47    22.53    93.24
              4.26    17.59    35.90    24.16
             98.95    96.28    96.12    94.34
------------------------------------------------------
4           ??????        9       22       27      104
            ??????     0.38     0.93     1.14     4.40
            ??????     8.65    21.15    25.96
            ??????     2.23     2.67     4.78
------------------------------------------------------
6           ??????        6       10        5       56
            ??????     0.25     0.42     0.21     2.37
94Proc Freq Two-way Tables Suppression (Cont.)
checking frequencies                                 4
12:01 Thursday, May 6, 1999

TABLE OF FAMREL BY FAMSIZER

FAMREL     FAMSIZER

Frequency
Percent
Row Pct
Col Pct          6        7        8        9    Total
------------------------------------------------------
3              209       98       19       73     2206
              8.83     4.14     0.80     3.09    93.24
              9.47     4.44     0.86     3.31
             90.48    83.05    59.38    74.49
------------------------------------------------------
4               13       10   ??????       12      104
              0.55     0.42   ??????     0.51     4.40
             12.50     9.62   ??????    11.54
              5.63     8.47   ??????    12.24
------------------------------------------------------
95Fully Automated and Expert system?
- Fully automated?
- Reboot to deal with memory leakage.
- Confidentiality Expert? How reliable?
- As good as underlying algorithms. Needs constant
monitoring
101What is new?
- Improved and expanded hardware platform
- Two machines dedicated to heavy remote access
usage
- Three additional machines dedicated to general
remote access usage
102What is New?
- Sudaan now available to remote access users
- Proc Crosstab
- Proc Rlogist
- Proc Regress
- Proc Multilog
- Proc Survival
103What is new
- Proc Descript
- Other new Sudaan procedures will be made
available shortly
- Plans to make Stata available through remote
access
104What is new
- Web Component of ANDRE under construction.
- On-line scanning of users' code
- Valuable research tools and information readily
available to the users.
105Contact Information
- For general Questions/Comments
- Email rdca@cdc.gov Phone (301) 458-4732
- For On-site Info
- Email Neb9@cdc.gov Phone (301) 458-4097
- For Remote Access Info
- Email vgambhir@cdc.gov Phone (301) 458-4226