NHANES 1999-2004

Provided by: cdcGo

Learn more at: https://www.cdc.gov

1
NHANES 1999-2004 Analytic Strategies
Deanna Kruszon-Moran, MS
2
Analyzing Data NHANES 1999-2004: Preparing your
data files

Downloading demographic, questionnaire, exam and
lab files.
Files are no longer available as self-extracting
zip files.
Documentation and procedure files are now in
Adobe PDF format and can be viewed or accessed
directly via the web link
Clicking on the data link will allow you to store
the data file or open it directly with SAS.
Data files are in SAS transport (.xpt) format.

3
Know your data

Read the documentation !!
Read the documentation !!
Read the documentation !!
Read the documentation!!

4
Preparing your data files

Merging
Merge all files by sequence number to the
demographic file.
Verify the numbers of records merged and the
final sample number against the published
frequencies on the web.
Be sure they are what you expected and all merges
worked correctly.

5
Know your data

Run basic frequencies.
Know your target population.
Understand how each item was measured
(how the item is defined, topcoded, recoded).
Recode variables as necessary
(for example, age groups, positive/negative lab
tests, high/low BP, high/low cholesterol, etc.).
Recode unknown/refusals as missing data
(77, 99 recode to missing).
Check your coding: run frequencies in SAS.
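
A minimal Python sketch of the recoding step, for readers not working in SAS. The record layout, the `item` field, and the age cut points are hypothetical. The 77/99 codes come from the slide, but each item's documentation lists its own refusal and unknown codes.

```python
from collections import Counter

# Refusal/unknown codes named on the slide. Other items may use different
# codes (an assumption to verify in each item's documentation).
SPECIAL_CODES = {77, 99}


def recode_missing(value, special_codes=SPECIAL_CODES):
    """Return None for refusal/unknown codes, otherwise the value unchanged."""
    return None if value in special_codes else value


def age_group(age_years):
    """Assign an illustrative age group (cut points are hypothetical)."""
    if age_years is None:
        return None
    if age_years < 20:
        return "<20"
    if age_years < 40:
        return "20-39"
    if age_years < 60:
        return "40-59"
    return "60+"


# Hypothetical records: seqn = sequence number, item = a questionnaire response.
records = [
    {"seqn": 1, "age": 25, "item": 3},
    {"seqn": 2, "age": 61, "item": 77},
    {"seqn": 3, "age": 44, "item": 99},
    {"seqn": 4, "age": 15, "item": 1},
]

recoded = [
    {**r, "item": recode_missing(r["item"]), "age_group": age_group(r["age"])}
    for r in records
]

# Check the coding by running frequencies, missing values included.
item_freq = Counter(r["item"] for r in recoded)
age_freq = Counter(r["age_group"] for r in recoded)
```

The recode is applied only to the response field, never to age, because 77 is a legitimate age.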

6
Know your data

Continuous Outcome Data
Look for outliers in your measure.
Run Proc Univariate.
Look for outliers among the weights.
Use Proc Univariate on the weight variable.
Outlying values, especially those with large
weights, can strongly influence your estimates.
Look at normality.
Consider transformations.
Log, square root, power.

7
NHANES Sample Design

NHANES is a complex, multistage,
probability cluster design of the civilian,
noninstitutionalized US population.

8
Sample Weights

To analyze NHANES data you must use the sample
weights to account for:

9
1. The base probability of selection
10
2. Oversampling

NHANES 1999-2004 oversampled:
African Americans
Mexican Americans
Persons with low income
Adolescents aged 12-19
Persons aged 60 and older

11
3. Non-response to the interview and exam
[Figure: sample persons age 20]
12
Non-response issues for NHANES

Non-response
Most components have some level of individual
item or component non-response.
ONLY non-response to the interview and exam has
already been accounted for in the weights.
All additional non-response to the outcome
measure of interest should be examined against
all possible predictors.
Potential biases should be discussed.
If non-response is high, re-weighting should be
considered.

13
Why weight?
Sample subdomain       US population (%)   Sample, unweighted (%)   Sample, weighted (%)
Non-Hispanic Blacks    13                  25                       12
Mexican Americans      9                   28                       9
12-19 year olds        12                  24                       12
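
The effect of weighting on a subdomain's share can be sketched in a few lines of Python. The four records and their weights are invented for illustration; they are not NHANES values.

```python
def proportion(flags, weights=None):
    """Share of records with a true flag; unweighted if weights is None."""
    if weights is None:
        weights = [1.0] * len(flags)
    total = sum(weights)
    return sum(w for f, w in zip(flags, weights) if f) / total


# Hypothetical sample: the first two records belong to an oversampled group,
# so each of them represents fewer people (smaller sample weight).
in_group = [True, True, False, False]
sample_weight = [100.0, 100.0, 300.0, 300.0]

unweighted_share = proportion(in_group)
weighted_share = proportion(in_group, sample_weight)
```

Here the oversampled group is half of the records but only a quarter of the weighted total, which is the pattern the table shows for the oversampled subdomains.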
14
Sample weights: Which weights?
Weight variables to use:
Years of data                                      Household interview data ONLY   ANY data from exam/lab/MEC interview
Any 2 yrs (1999-2000 or 2001-2002 or 2003-2004)    WTINT2YR                        WTMEC2YR
4 yrs (1999-2002)                                  WTINT4YR                        WTMEC4YR
4 yrs (2001-2004) or 6 yrs (1999-2004)             Combine appropriate 2 or 4 year weights as follows (both columns)
15
Two, Four, Six, Eight - How can we estimate?

For 4 years of data from 2001-2004:
MEC4YR = 1/2 * WTMEC2YR;
For 6 years of data from 1999-2004:
if sddsrvyr in (1,2) then
MEC6YR = 2/3 * WTMEC4YR; /* for 1999-2002 */
else if sddsrvyr = 3 then
MEC6YR = 1/3 * WTMEC2YR; /* for 2003-2004 */
When analyzing only 1999-2002, do not combine the
2 year weights; use the 4 year weights provided.

16
Two, Four, Six, Eight - How can we estimate?

Future years of data will be combined similarly.
For 6 years of data from 2001-2006:
if sddsrvyr in (2,3,4) then
MEC6YR = 1/3 * WTMEC2YR;
For 8 years of data from 1999-2006:
if sddsrvyr in (1,2) then
MEC8YR = 1/2 * WTMEC4YR; /* for 1999-2002 */
else if sddsrvyr in (3,4) then
MEC8YR = 1/4 * WTMEC2YR; /* for 2003-2006, etc. */
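
The combining rule on these two slides can be written as one small function. This is a Python sketch, not the presenter's SAS code. It assumes SDDSRVYR codes 1 to 4 denote the 1999-2000 through 2005-2006 cycles and that the analysis uses consecutive cycles; the function name and arguments are hypothetical.

```python
def combined_mec_weight(sddsrvyr, wtmec2yr, wtmec4yr, cycles):
    """Return the multi-year MEC weight for one record.

    cycles lists the SDDSRVYR codes in the analysis, e.g. (1, 2, 3) for
    1999-2004. Records outside those cycles get None.
    """
    cycles = sorted(set(cycles))
    if cycles != list(range(cycles[0], cycles[-1] + 1)):
        raise ValueError("cycles must be consecutive")
    if sddsrvyr not in cycles:
        return None
    n_cycles = len(cycles)
    # 1999-2002 records use the 4 year weight whenever both cycles are included;
    # it counts for two of the n cycles.
    if sddsrvyr in (1, 2) and 1 in cycles and 2 in cycles:
        return 2 * wtmec4yr / n_cycles
    # Every other record uses its 2 year weight, counting for one cycle.
    return wtmec2yr / n_cycles
```

For example, `combined_mec_weight(3, wtmec2yr, None, (1, 2, 3))` applies the 1/3 factor to a 2003-2004 record, and with `cycles=(1, 2)` the function simply returns the 4 year weight.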

17
Sample Weights - Subsamples

Subsamples and appropriate weights
Look at your primary variable of interest and the
corresponding weight.
Look at all other variables you want to combine
with it.
Are all from the interview? Exam? Subsample (e.g.
fasting, audiometry, dioxin, VOCs)?
Use the weight from the smallest subsample for
your analysis.
Be consistent!

18
Sample Weights - Subsamples

Subsamples and appropriate weights
Be careful about combining subsamples beyond MEC
with VOCs, interview with dioxin, etc.
Combining subsamples such as environmental and AM
fasting could be problematic.
Some subsamples are mutually exclusive.
Weights were not designed for combining
subsamples and may not produce good estimates.

19
Preparing for Analyses

Subsetting the data for SUDAAN
If using MEC exam weights - SUBSET the data on
those MEC EXAMINED in SAS before using SUDAAN.
If using other subsample weights subset the
data on those in the subsample corresponding to
the weights you are using.
Then use the SUBPOPN statement in the SUDAAN
procedure to further subset your data by age,
gender etc. to reflect the target population you
are interested in analyzing.

20
Sample Weights

Example
You are interested in examining the association
of high triglycerides, blood pressure, and body
mass index (BMI) controlling for race/ethnicity
on females age 20-59 from the 6 years of data
from 1999-2004.

21
Sample Weights

Step 1: Identify the smallest subsample in the
analysis; it determines the correct weight to use.
Race/ethnicity, gender and age are in the
interview.
Blood pressure and body weight come from the MEC
exam, a subset of those interviewed.
Triglycerides were measured on a subsample of
those MEC examined who fasted for 8 hours and
came to the AM MEC exam.
Therefore, the fasting subsample is the smallest
subsample in the analysis and you would use the
AM fasting weights (WTSAF2YR and WTSAF4YR).

22
Sample Weights

Step 2: Combine weights in SAS prior to the
SUDAAN procedure for the 6 years from 1999-2004.
if sddsrvyr in (1,2) then
WEIGHT6 = 2/3 * WTSAF4YR; /* 1999-2002 */
else if sddsrvyr = 3 then
WEIGHT6 = 1/3 * WTSAF2YR; /* 2003-2004 */

23
Sample Weights

Step 3: Subset your data set in SAS to reflect
the weight being used (AM fasting weights
WTSAF2YR or WTSAF4YR).
SAS code:
IF WTSAF2YR ne . or WTSAF4YR ne .;

24
Sample Weights

Step 4: Last, specify the correct weight using
the WEIGHT statement in SUDAAN, and subset your
data to the subpopulation of interest (females
age 20-59) using the SUBPOPN statement in SUDAAN.
WEIGHT WEIGHT6;
SUBPOPN riagendr=2 and ridageyr gt 19 and
ridageyr lt 60;
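
Steps 2 to 4 of this example can be sketched end to end in Python. The records below are invented, and lower-case field names stand in for the NHANES variables; this is an illustration of the logic, not a replacement for the SAS and SUDAAN steps.

```python
def weight6(rec):
    """Step 2: six-year fasting weight for 1999-2004, or None if unavailable."""
    cycle = rec["sddsrvyr"]
    if cycle in (1, 2) and rec.get("wtsaf4yr") is not None:
        return 2 * rec["wtsaf4yr"] / 3
    if cycle == 3 and rec.get("wtsaf2yr") is not None:
        return rec["wtsaf2yr"] / 3
    return None


def in_subpopulation(rec):
    """Step 4: females (riagendr == 2) aged 20-59."""
    return rec["riagendr"] == 2 and 19 < rec["ridageyr"] < 60


# Hypothetical records; None marks a missing weight (not in the fasting subsample).
records = [
    {"seqn": 1, "sddsrvyr": 1, "riagendr": 2, "ridageyr": 35,
     "wtsaf2yr": 6000.0, "wtsaf4yr": 3000.0},
    {"seqn": 2, "sddsrvyr": 3, "riagendr": 1, "ridageyr": 45,
     "wtsaf2yr": 9000.0, "wtsaf4yr": None},
    {"seqn": 3, "sddsrvyr": 2, "riagendr": 2, "ridageyr": 30,
     "wtsaf2yr": None, "wtsaf4yr": None},
    {"seqn": 4, "sddsrvyr": 3, "riagendr": 2, "ridageyr": 59,
     "wtsaf2yr": 4500.0, "wtsaf4yr": None},
]

analysis = []
for rec in records:
    w = weight6(rec)
    if w is None:
        continue  # Step 3: keep only records in the fasting subsample
    # Flag the subpopulation instead of deleting rows, mirroring SUBPOPN.
    analysis.append({**rec, "weight6": w, "in_subpop": in_subpopulation(rec)})
```

Records outside the subpopulation are flagged rather than dropped, so the full fasting subsample remains available for variance estimation, which is the role SUBPOPN plays in SUDAAN.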

25
NHANES 1999-2000: Variance Estimation

Why must you use the sample design to estimate
the variance?
NHANES is a cluster design
Individuals within a cluster are more similar to
each other than to those in other clusters.
This homogeneity or clustering results in a
reduction of our effective sample size because we
choose individuals within clusters rather than
randomly throughout the population.
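
The loss of effective sample size can be illustrated with Kish's approximation, deff = 1 + (m - 1) * rho. This is a standard textbook formula and is not taken from the slides; the cluster size and intra-cluster correlation below are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Kish's approximate design effect for equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Sample size of a simple random sample with the same precision."""
    return n / design_effect(cluster_size, icc)


# Hypothetical design: 5000 people, 51 per cluster, intra-cluster correlation 0.02.
deff = design_effect(51, 0.02)
n_eff = effective_sample_size(5000, 51, 0.02)
```

Even a small intra-cluster correlation roughly halves the effective sample size in this example, because each cluster contributes many similar people.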

26
NHANES 1999-2004: Variance Estimation

Why must you use the sample design to estimate
the variance?
Variance estimates that do not account for this
intra-cluster correlation are too low and
therefore biased.
Survey software such as SUDAAN or the SAS survey
procedures must be used to account for the
complex design and produce unbiased variance
estimates.
These procedures require information on the
sample design (i.e. identification of the PSU and
strata) for each sample person.

27
NHANES 1999-2000: Variance Estimation

For the initial 1999-2000 data release we
recommended:
Using the JK-1 (jackknife, "leave-one-out")
procedure.
This required 52 replicate weights, one for each
of the 52 groups created. They were provided only
for 1999-2000.
Can still be used if you have software that can
produce the replicate weights.
Replicate weights for this procedure will no
longer be created on the data set.
Too cumbersome
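
For readers unfamiliar with JK-1, here is a toy Python sketch of the delete-one-group jackknife using the usual formula, variance = (G - 1)/G times the sum of squared differences between each replicate estimate and the full-sample estimate. The three-record data set and its replicate weights are invented; the 1999-2000 release used 52 groups.

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def jk1_variance(full_estimate, replicate_estimates):
    """JK-1 variance from one replicate estimate per deleted group."""
    g = len(replicate_estimates)
    if g < 2:
        raise ValueError("need at least two replicates")
    return (g - 1) / g * sum(
        (est - full_estimate) ** 2 for est in replicate_estimates
    )


# Hypothetical data: three records, each its own group (G = 3).
y = [1.0, 2.0, 3.0]
full_weights = [1.0, 1.0, 1.0]
# Each replicate zeroes one group and scales the others by G / (G - 1).
replicate_weights = [
    [0.0, 1.5, 1.5],
    [1.5, 0.0, 1.5],
    [1.5, 1.5, 0.0],
]

full = weighted_mean(y, full_weights)
reps = [weighted_mean(y, rw) for rw in replicate_weights]
variance = jk1_variance(full, reps)
```

The estimate is recomputed once per replicate weight, which is why the approach needs a full set of replicate weights on the data file.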

28
NHANES 1999-2004: Variance Estimation

We now recommend
Using the Taylor series (linearization) method
Same as that used in NHANES III.
We now provide Masked Variance Units (MVUs) in
place of primary sampling units (PSUs) to
maintain confidentiality.
Design variables are called - SDMVSTRA and
SDMVPSU.

29
Design Variables

SDMVSTRA and SDMVPSU
Found in the demographic file.
Found in all two year data sets and can be
combined for 4 or 6 year data sets.
Can be used the same as the actual stratum and
PSU variables.
Produce variance estimates close to those using
the true design.
Data MUST be sorted by SDMVSTRA and SDMVPSU
first, before using SUDAAN.
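
To show what survey software does with the stratum and PSU variables, here is a simplified Python sketch of the Taylor series (linearization) variance of a weighted mean. It treats PSUs as sampled with replacement within strata, which is an assumption of this sketch, and it is not a substitute for SUDAAN or the SAS survey procedures. The rows are invented.

```python
from collections import defaultdict


def linearized_mean_variance(rows):
    """rows: iterable of (stratum, psu, weight, y). Returns (mean, variance)."""
    rows = list(rows)
    w_total = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / w_total

    # Linearized value of each record, totalled within its PSU.
    psu_totals = defaultdict(float)
    for stratum, psu, w, y in rows:
        psu_totals[(stratum, psu)] += w * (y - mean) / w_total

    by_stratum = defaultdict(list)
    for (stratum, _), z in psu_totals.items():
        by_stratum[stratum].append(z)

    # Between-PSU variation within each stratum, summed over strata.
    variance = 0.0
    for totals in by_stratum.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(totals) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in totals)
    return mean, variance


# Hypothetical rows: (stratum, psu, weight, y), playing the roles of
# SDMVSTRA, SDMVPSU, the sample weight, and the outcome.
toy_rows = [
    (1, 1, 2.0, 1.0),
    (1, 1, 2.0, 3.0),
    (1, 2, 1.0, 2.0),
    (2, 1, 1.0, 4.0),
    (2, 2, 3.0, 5.0),
]
toy_mean, toy_variance = linearized_mean_variance(toy_rows)
```

The variance comes entirely from differences between PSU totals within a stratum, which is why every record needs both design variables.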

30
Sample SUDAAN Code
31
Preparing for Analysis: Setting up the procedure
in SAS Surveymeans
32
Other data analysis issues from NHANES

Calculating Population Totals
Estimates of the number of persons in the U.S.
population with a particular condition must be
done carefully.
The recommended procedure is to:
First, estimate the proportion with the condition
for each subdomain of interest.
Then multiply that by the population control
total for that subdomain.
Tables are available on the NCHS web site with
the current March 2001 CPS control totals as part
of the analytic guidelines.

33
Other data analysis issues from NHANES

Calculating Population Totals
Estimates of number of persons with a condition
can be obtained by summing the weights of those
positive.
These estimates will be less reliable due to
item non-response and sampling error.
This is not the recommended method.
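
The two ways of estimating a population total can be compared on a toy example in Python. The records, weights, and the control total of 700 are invented; real control totals come from the CPS tables in the analytic guidelines.

```python
def weighted_proportion(flags, weights):
    """Weighted proportion positive among records with a non-missing flag."""
    answered = [(f, w) for f, w in zip(flags, weights) if f is not None]
    total = sum(w for _, w in answered)
    return sum(w for f, w in answered if f) / total


def total_from_control(proportion, control_total):
    """Recommended: proportion times the population control total."""
    return proportion * control_total


def total_from_weights(flags, weights):
    """Not recommended: sum the weights of those positive."""
    return sum(w for f, w in zip(flags, weights) if f)


# Hypothetical subdomain: None marks item non-response.
has_condition = [True, False, None, True]
sample_weight = [100.0, 300.0, 200.0, 100.0]
control_total = 700.0  # hypothetical control total for this subdomain

p = weighted_proportion(has_condition, sample_weight)
recommended = total_from_control(p, control_total)
summed = total_from_weights(has_condition, sample_weight)
```

Summing the weights gives a smaller total here because the record with a missing value contributes nothing, while the recommended method spreads the estimated proportion over the whole subdomain.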

34
Analyzing within NHANES 1999-2004

Things to consider
Data released in two year cycles.
We STRONGLY RECOMMEND using two or more cycles (4
or more years) to produce reliable estimates.
Verify data items collected were comparable in
wording and methods.
When combining years remember to use correct
combined weights.

35
Analyzing trends with NHANES: NHANES III to
NHANES 1999-2004

Things to consider
What is your sample from each survey? Age?
How differently was the question worded, and how
different were the interview methods?
How different were the lab or exam methodologies?
Cutoffs used? Definitions?
For the current NHANES 1999-2004, sample sizes may
be smaller depending on the number of years
measured, especially in subdomains.
Larger sampling variation.
May need to limit comparisons.

36
Race/Ethnicity NHANES 1999-2004

Two variables available
RIDRETH1
RIDRETH2

37
Race/Ethnicity NHANES 1999-2004

RIDRETH1: Use for analyses of 1999-2004 data
alone.
1 = Mexican American
2 = other Hispanic
3 = non-Hispanic white
4 = non-Hispanic black
5 = other races including multiracial
For 2 and 4 years of data we know there is
insufficient sample size to analyze other
Hispanics (group 2) alone or to analyze all
Hispanics.
Analyses to evaluate whether 6 years of data
(1999-2004) are sufficient to analyze these
Hispanic groups are ongoing.
Groups 2 and 5 can AND should continue to be
combined to represent all other races.

38
Race/Ethnicity NHANES 1999-2004

RIDRETH2:
Use for analyzing trends from NHANES III to
NHANES 1999-2004.
Most comparable to the race/ethnicity variable
collected in NHANES III.
Coded as:
1 = non-Hispanic white
2 = non-Hispanic black
3 = Mexican American
4 = other including multiracial
5 = other Hispanic

39
Analyzing data from NHANES 1999-2004

Crude versus Age Standardized Estimates
Age distributions within survey samples vary by
racial/ethnic group.
Age distributions also vary by survey (NHANES
III vs. NHANES 1999-2004).
When comparing estimates across racial/ethnic
groups or between surveys you may need to age
standardize.
Also present all age specific estimates!

40
Analyzing data from NHANES 1999-2004

When Age Standardizing
Use the 2000 U.S. Census population for
consistency, both for NHANES III and for NHANES
1999-2000 and later.
For guidelines and population proportions, see
the Klein and Schoenborn HP2010 Statistical Notes
on age adjustment using the 2000 projected U.S.
population at the website below.
http://www.cdc.gov/nchs/data/statnt/statnt20.pdf

41
Analyzing data from NHANES 1999-2004

When Age Standardizing
In SUDAAN, use the STDVAR and STDWGT statements.
STDVAR: variable name for the age groups.
STDWGT: corresponding proportion of the 2000
U.S. Census population for each age subgroup.
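
Direct age standardization itself is a weighted average of the age-specific estimates, using the standard population proportions as weights. The Python sketch below uses invented prevalences and invented proportions; the real 2000 U.S. standard proportions are in the Klein and Schoenborn Statistical Note.

```python
def age_standardized(rates_by_age, std_proportions):
    """Directly standardized rate: sum of age-specific rate times std proportion."""
    if set(rates_by_age) != set(std_proportions):
        raise ValueError("age groups must match")
    total = sum(std_proportions.values())
    return sum(
        rates_by_age[group] * std_proportions[group] for group in rates_by_age
    ) / total


# Hypothetical age-specific prevalences for one racial/ethnic group.
rates = {"20-39": 0.02, "40-59": 0.04, "60+": 0.10}
# Hypothetical standard population proportions (the role of STDWGT).
standard = {"20-39": 0.4, "40-59": 0.4, "60+": 0.2}
# Hypothetical age distribution of the group itself, used for the crude estimate.
own_distribution = {"20-39": 0.2, "40-59": 0.3, "60+": 0.5}

standardized = age_standardized(rates, standard)
crude = age_standardized(rates, own_distribution)
```

In this toy case the group is older than the standard population, so the crude estimate is higher than the standardized one even though the age-specific prevalences are identical.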

42
Age standardization for NHANES

Crude vs. Age Standardized Estimates Example

Hepatitis B, NHANES III   Non-Hispanic White   Non-Hispanic Black   Mexican American
Crude Prevalence          3.1 (2.6-3.6)        11.9 (10.6-13.2)     3.6 (2.8-4.6)
Age Standardized          2.6 (2.2-3.1)        11.9 (10.7-13.3)     4.4 (3.4-5.6)
43
Analyzing Data from NHANES 1999-2004

Analytic Guidelines
Detailed guidelines for working with NHANES data
can be found at
http://www.cdc.gov/nchs/nhanes.htm
This document contains everything discussed today
and will continue to grow to include guidelines
for statistical tests, multivariate analyses,
modeling and more!
A web-based tutorial is also currently in development.
Target date for release is Dec 31st 2006.