Issue in Analysis and Presentation - PowerPoint PPT Presentation

Provided by: ibmsSin

Title: Issue in Analysis and Presentation


1
  • Issue in Analysis and Presentation
  • of Dietary Data
  • Nutritional Epidemiology
  • Walter Willett

2
  • The underlying objectives of data analysis and
    presentation are to learn as much as possible
    from the available data and to present what has
    been learned to readers completely and w/ maximum
    clarity.
  • The approaches for presentation should vary
    depending on the intended readership
  • - simpler analytic approaches and greater use
    of figures may be appropriate for a general
    medical journal
  • - for an epidemiologic publication, more
    complex methods and primarily tabular results may
    be best.

3
  • DATA CLEANING: BLANKS and OUTLIERS
  • A common issue w/ dietary data is the treatment
    of questionnaires in which some food items have
    been left blank.
  • Two issues arise frequently: (1) should subjects
    w/ more than a specific number of blanks be
    excluded, and (2) how should blanks be treated in
    calculating nutrient intakes?
  • It is useful to understand why participants may
    not have completed a response for a specific
    food. This could be due to inattention or
    carelessness or because the participant did not
    eat the food (even though they should have
    answered "never").
  • Several patterns can be seen when examining
    questionnaires w/ multiple blank items:
  • - For many forms, blank items are interspersed
    w/ plausible and seemingly carefully completed
    responses to other foods and the "never" category
    is not used, suggesting that blanks meant that
    the food was not consumed.
  • - In occasional questionnaires, whole sections
    are left blank, suggesting that they were missed.
  • → In calculating nutrient intakes, it seems best
    to consider intermittent blanks as no consumption
    of the food.

4
  • A firm rule for allowable number of blanks cannot
    be made for all situations, and it is desirable
    to conduct evaluations of decision rules whenever
    possible
  • - The Nurses' Health Study allowed up to 70
    blanks (out of about 130 items) as long as no
    whole sections or pages were blank.
  • - This criterion has been evaluated empirically
    within a validation study by examining the
    correlation between number of blanks on a
    questionnaire and measurement error (calculated
    for each person as the absolute value of the
    difference between the FFQ and diet record
    values).
  • - For all nutrients examined there was no
    appreciable correlation.
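
The blank-handling rules above can be sketched in a few lines. This is only an illustration: the data layout (sections mapping to lists of responses, with None for a blank) and the function names are assumptions, and the threshold of 70 is the one quoted for a form of about 130 items.

```python
from typing import Dict, List, Optional

# Hypothetical layout: each questionnaire section maps to a list of responses,
# where None marks a blank item and a number is the reported frequency.
Questionnaire = Dict[str, List[Optional[float]]]

MAX_BLANKS = 70  # threshold quoted above for a form of about 130 items


def is_usable(form: Questionnaire, max_blanks: int = MAX_BLANKS) -> bool:
    """Accept a form unless it has too many blanks or a wholly blank section."""
    total_blanks = 0
    for responses in form.values():
        blanks = sum(1 for r in responses if r is None)
        if responses and blanks == len(responses):
            return False  # an entire section was skipped
        total_blanks += blanks
    return total_blanks <= max_blanks


def fill_blanks(form: Questionnaire) -> Questionnaire:
    """Treat intermittent blanks as no consumption (frequency 0)."""
    return {
        section: [0.0 if r is None else r for r in responses]
        for section, responses in form.items()
    }
```

A study would normally call is_usable first and apply fill_blanks only to the forms that pass, so that a skipped section is never silently counted as zero intake.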

5
  • Once nutrients are calculated, some responses
    will be implausibly high or low, necessitating
    additional decisions regarding allowable ranges.
  • - The use of total energy intake as a primary
    criterion can be justified because it is the only
    nutrient for which intake is physiologically
    fixed within a fairly narrow and predictable
    range.
  • - It is generally considered that total energy
    intakes below approximately 1.2 times the resting
    or basal metabolic rate estimated from age,
    gender and weight are unlikely to be correct, and
    intakes of >4000 kcal/day are unlikely to be true
    for even relatively active men.
  • - Usually an arbitrary allowable range of 500
    to 3500 kcal/day for women and 800 to 4000
    kcal/day for men is used. Although the extremes
    within this range rarely are correct, adjustment
    of nutrient intakes for total energy intake will,
    to a large extent, compensate for overall under
    or overreporting.
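
As a rough sketch, the energy criteria above translate into two simple checks. The sex codes "F" and "M" and the function names are assumptions; the basal metabolic rate is taken as an input because the slides do not give the equation used to estimate it.

```python
# Allowable total energy ranges (kcal/day) quoted in the slide above.
ENERGY_RANGE = {"F": (500.0, 3500.0), "M": (800.0, 4000.0)}


def plausible_energy(kcal_per_day: float, sex: str) -> bool:
    """Return True when reported energy intake lies inside the allowable range."""
    low, high = ENERGY_RANGE[sex]
    return low <= kcal_per_day <= high


def below_bmr_floor(kcal_per_day: float, bmr_kcal: float,
                    factor: float = 1.2) -> bool:
    """Flag intakes below about 1.2 times the estimated basal metabolic rate."""
    return kcal_per_day < factor * bmr_kcal
```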

6
  • Dietary data are highly sensitive to coding and
    data entry errors because nutrients are
    calculated from large numbers of foods.
  • - Miscoding a teaspoon as a cup for one food on
    a questionnaire or in a 1-week food record can
    seriously misclassify an individual for many
    nutrients.
  • - Multiple choice formats and machine-readable
    questionnaires are less prone to such errors, but
    if any hand-coded or open-ended questions are
    included, such errors may occur.
  • - Extreme values are primarily at the high end
    of the distribution due to the skewed
    distributions of most nutrients, and they can be
    heavily influential when nutrients are considered
    as continuous variables.
  • - Sometimes such extreme values will be
    indicative of improper completion of
    questionnaires, such as marking the top category
    for all foods in a section.
  • - In other cases, coding, data entry, or food
    composition database errors may be discovered.
  • - Some values will just reflect unusual food
    intake patterns without obvious error.

7
  • CATEGORIZED vs. CONTINUOUS PRESENTATION of
    INDEPENDENT VARIABLES
  • Intakes of nutrients and food groups are
    primarily continuous.
  • As the traditional presentation of epidemiologic
    data has been in the form of rate ratios and rate
    differences for levels of exposure, and
    statistical methods have been developed for such
    purposes, it is not surprising that most
    continuous dietary data have been categorized for
    analysis in nutritional epidemiologic studies.
  • Approaches for the creation of categories: (1)
    use of arbitrarily defined quantiles (e.g.,
    quartiles or quintiles); (2) use of standard
    round-numbered cut points; (3) use of cut points
    that are determined a priori to have biologic
    relevance, such as the RDA or the intake at which
    an enzyme is saturated.
  • - Finer divisions of extreme categories may
    often be useful to extend an examination of the
    dose-response relationship.
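
A minimal sketch of approach (1), quantile-based categories, using only the Python standard library. The function names are illustrative, and the "inclusive" interpolation method is one of several reasonable conventions for placing cut points. The within-category median is included because it is a convenient representative intake for each group.

```python
from statistics import median, quantiles
from typing import Dict, List


def quintile_cutpoints(intakes: List[float]) -> List[float]:
    """Four cut points that split the intakes into five roughly equal groups."""
    return quantiles(intakes, n=5, method="inclusive")


def assign_quintile(value: float, cutpoints: List[float]) -> int:
    """Return 1..5; a value equal to a cut point goes to the lower group."""
    return 1 + sum(1 for c in cutpoints if value > c)


def quintile_medians(intakes: List[float]) -> Dict[int, float]:
    """Median intake within each quintile."""
    cuts = quintile_cutpoints(intakes)
    groups: Dict[int, List[float]] = {}
    for x in intakes:
        groups.setdefault(assign_quintile(x, cuts), []).append(x)
    return {q: median(xs) for q, xs in sorted(groups.items())}
```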

8
  • Arguments for using continuous variables:
  • (1) the greatest statistical power is
    provided by a continuous variable if the function
    reasonably fits the data, although this advantage
    may be slight w/ the use of five or more
    categories combined w/ an overall test for trend.
  • (2) when used as a covariate, a crudely
    categorized variable may not fully account for
    the effect of that variable, resulting in
    possible residual confounding.
  • (3) the use of continuous variables may
    facilitate comparisons among studies because a
    single relative risk is reported for an
    arbitrarily specified increment of intake (e.g.,
    RR for 100 mg of cholesterol per day) that does
    not depend on the distribution of the dietary
    factor in the particular population or on the
    choice of cut-points of individual studies.
  • Tests such as the addition of a quadratic term
    can be used to evaluate the presence of
    nonlinearity.
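
Arguments (3) and the quadratic test can be written compactly. This sketch assumes a log-linear risk model (for example logistic or proportional hazards regression, which the slides do not name); x is intake, RR(x) is the relative risk compared with zero intake, and the symbols β1, β2 and Δ are introduced here for illustration.

```latex
\begin{align}
\log \mathrm{RR}(x) &= \beta_1 x + \beta_2 x^2,
  \qquad H_0\colon \beta_2 = 0 \quad \text{(no departure from linearity)} \\
\mathrm{RR}_{\Delta} &= \exp(\beta_1 \Delta)
  \quad \text{when } \beta_2 = 0,\ \text{e.g. } \Delta = 100 \text{ mg of cholesterol per day}
\end{align}
```

Under the linear model the relative risk for an increment Δ depends only on β1 and Δ, which is why it does not depend on the intake distribution or on the cut points chosen in a particular study.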

9
  • GRAPHICAL PRESENTATION of DATA
  • The central data of epidemiologic studies should
    be presented in numerical form to provide the
    actual numbers of exposed subjects and the
    numbers of endpoints.
  • Judicious ancillary use of figures summarizing
    the primary findings can be helpful to many
    readers. Also, a clear and attractive summary
    figure is likely to enhance the probability that
    others will include your data in their
    presentations.
  • Single Variable Effects
  • For presenting the effects of one or a few
    dichotomous exposure variables, a graphic display
    provides little additional perspective and tables
    should be used.
  • With multiple ordinal categories, a figure can
    assist in visualizing an overall relationship
    (Fig 13-1).
  • The use of histograms to present RRs has
    generally been disfavored because this parameter
    is more correctly represented as a point, and
    C.I.s are less readily presented.

11
  • It might be argued that, if absolute risk is
    really of interest, then the dependent variable
    should be expressed as absolute risk. However,
    one reason that relative rather than absolute
    risks are generally utilized in epidemiologic
    studies is that age is usually the most powerful
    determinant; thus, absolute risks are usually
    arbitrary depending on the age to which the data
    have been standardized.
  • The actual width of categories of the exposure
    variable should be represented in the graphic
    display.
  • Example: in Fig 13-1, trans-fatty acid intake
    was divided into quintiles → the distances
    between quintiles in the figure were made to be
    proportional to the differences between quintile
    medians.
  • Usual quantiles:
  • two groups, tertiles, quartiles, quintiles

12
  • Actual vs. Predicted Relationships
  • In graphic displays of the relationship between
    two variables, one issue that naturally arises is
    whether to provide the actual data or the
    prediction from a model derived from the data.
  • - Many epidemiologists believe that the actual
    data should be provided so that the reader can
    view the findings w/o being forced to assume the
    appropriateness of any model assumptions, such as
    whether a linear relationship adequately
    describes the relationship.
  • - The best solution may be to do both: for
    example, provide the data points for categories
    and superimpose the fitted regression line (Fig
    13-2).

14
  • One clearly inappropriate approach is to
    analyze the data as continuous, but then present
    the findings as though they were categorical, for
    example, by displaying the odds ratios (and C.Is)
    for multiple discrete levels of intake that are
    all based on a single regression coefficient.
  • → This provides the potentially misleading
    impression of a clearly monotonic relationship
    and C.I.s that
    are too narrow for a specific level because they
    are based on the overall data.

15
  • Display of Joint Effects
  • A common approach to represent the joint effects
    of two exposure variables is the 3-D histogram
    (Fig 13-3A).
  • - This has been criticized by some for the same
    reasons that the use of histograms to present
    univariate findings has been discouraged.
  • → An alternative display uses points (Fig
    13-3B).
  • - A display of C.I. is usually incompatible w/
    the 3-D histogram, but this may also be
    problematic w/ the use of points as their C.I.
    are frequently overlapping.
  • → Critical C.I.s will usually need to be
    presented in tabular form or in text.

17
  • Locally Smoothed Regression Curves and Regression
    Splines
  • Locally smoothed regression curves and regression
    splines have been used increasingly to display
    epidemiologic findings w/ a continuous exposure
    variable.
  • - With smoothed regression curves, the values
    of the dependent variable, such as a relative
    risk, are estimated for a continuously moving
    window of values of the independent variable.
  • - In using regression splines, separate linear
    or nonlinear functions are fit between specified
    points (knots) on the exposure distribution,
    and functions are connected at the knots to
    produce a smooth curve.
  • The principal advantages of these approaches are
    that a priori assumptions are not imposed
    regarding the shape of the dose-response
    relationship and that maximal use is made of the
    continuous nature of the independent variable.
  • Example: relation of alcohol intake to risk of
    breast cancer (Fig 13-4)
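
The idea of fitting separate functions between knots and joining them can be illustrated with the simplest case, a piecewise-linear spline. This is a sketch under stated assumptions: the truncated-linear basis below is one standard construction, the function names are illustrative, and the coefficients would come from whatever regression model is being fit. A linear spline is continuous at the knots but has visible bends there; cubic terms are what give a visually smooth curve.

```python
from typing import List, Sequence


def linear_spline_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Basis terms [x, (x-k1)+, (x-k2)+, ...] for a piecewise-linear spline."""
    return [x] + [max(0.0, x - k) for k in knots]


def spline_value(x: float, intercept: float, coefs: Sequence[float],
                 knots: Sequence[float]) -> float:
    """Evaluate the curve; coefs[0] is the slope below the first knot and
    each later coefficient is the change in slope at its knot."""
    basis = linear_spline_basis(x, knots)
    return intercept + sum(c * b for c, b in zip(coefs, basis))
```

Entering the basis terms as ordinary covariates lets a standard regression program estimate a different slope in each segment without any special spline software.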

18
  • Example: those data, fit using regression
    splines, provide a strong sense that the
    relationship is approximately linear and that a
    significant increase in risk is seen even at
    about 10 g (1 drink) per day.

19
  • Example 2: use of smoothing techniques to
    evaluate the relationship between vitamin A
    intake and risk of neural crest congenital
    malformations (Fig 13-5)
  • - This analysis suggested the existence of a
    threshold at 10,000 IU, above which risk
    increased substantially.
  • - One possible concern is that inflection
    points may be accepted too literally, especially
    when the data are sparse (e.g., the method does
    not provide a C.I. for the apparent threshold,
    which would probably be quite wide).
  • → This might be regarded as an over-fitting of
    the data (analogous to an extreme form of
    selecting optimal cut-points to demonstrate a
    relationship).

20
  • The degree to which the conclusions are affected
    by somewhat arbitrary choices such as the width
    of the window used for smoothing and the number
    and spacing of knots deserves further
    consideration.
  • Although smoothing methods and regression splines
    for analyses involving dietary intake may prove
    valuable, particularly in exploratory data
    analysis, their use deserves further evaluation.
  • EXAMINATION of FOODS and NUTRIENTS
  • A full evaluation of the relationship between
    diet and a disease should involve the analysis of
    data on both food and nutrient intakes.
  • - If an association w/ disease is found for a
    specific nutrient, it is important to examine and
    report whether the major foods contributing to
    this nutrient (as seen in the same dataset,
    defined in terms of either absolute contribution
    or contribution to between-person variance) are
    also related similarly to risk of disease.

21
  • One serious problem w/ analyses of specific foods
    is the large number of items on a typical
    questionnaire.
  • - Some argue that the p value used for
    statistical significance should be adjusted
    according to the number of variables examined.
  • - The general consensus in epidemiology is that
    this unduly reduces power and that individual
    associations should be evaluated on their own
    merits, and conclusions should be made in the
    light of consistency w/ other information
    internal and external to the study.
  • - When a large number of foods (or nutrients)
    are screened for associations w/o a prior
    hypothesis, the likelihood that some
    statistically significant relationships will
    occur by chance must be considered when
    interpreting the findings.
  • - This issue is complicated by the large number
    of foods because an association w/ a food is
    generally more likely to be reported if
    statistically significant, particularly if
    consistent w/ prior expectations. Reporting the
    association for each food on a questionnaire, and
    possibly for groups of foods, is impossible in
    most journals.
  • → The literature on foods is likely to be
    highly biased, and any summary of the published
    literature cannot avoid this bias.

22
  • No simple solution exists for publication bias
  • - A partial solution may be to deposit data in
    the National Auxiliary Publications Service or on
    the internet for all foods when an analysis on a
    particular disease is published.
  • → At least such data will be available to
    others attempting to summarize the literature.
  • - The best approach for avoiding publication
    bias on specific foods is probably to analyze
    collaboratively the primary data from all
    available studies on a topic.

23
  • The widespread use of multiple vitamins and other
    nutritional supplements adds complexity to
    dietary analyses, but can also provide important
    insight by greatly extending the range of
    observable nutrient intakes.
  • - Details on dose and duration of supplement
    use are usually obtainable w/ substantially
    greater precision than is possible for foods.
  • - When examining associations w/ foods or w/
    nutrient intakes, not including supplements, it
    will be important to conduct analyses excluding
    supplement users, because any effects of
    nutrients from foods may be swamped by the
    relatively high levels of intakes from
    supplements.
  • - Stratification by nutrient intake from foods
    may also be important when examining the effect
    of the same nutrient from supplements because
    little effect of supplementation might be
    expected when intake from foods is high.
  • - The greatest contrast in risk would usually
    be expected when long-term supplement users w/
    high intakes from diet are compared w/
    nonsupplement users w/ low intakes from diet.

24
  • The Effect of Time
  • Dietary factors may operate at various stages of
    the sequence of events (e.g., an antioxidant
    might reduce the effect of ionizing radiation,
    which is an early event, or alcohol may influence
    endogenous hormone metabolism, which is
    likely to be most important later in
    carcinogenesis).
  • The effect of diet may also be cumulative so that
    risk is related to a function of both dose and
    duration of exposure.
  • Dietary factors may also have effects at specific
    periods in life far removed from the time of
    diagnosis (e.g., high growth rates before puberty
    appear to increase breast cancer risk by
    advancing the age at menarche, and effects of
    maternal diet during pregnancy on the offspring's
    risk of breast cancer have been hypothesized).
  • Because our knowledge is often not sufficient to
    be confident that an effect of diet would be
    limited to only a particular period, it will
    usually be difficult to exclude an effect of a
    dietary factor until a fairly wide range of
    temporal relationships has been examined.

25
  • Ideally, a comprehensive dietary assessment would
    include a measurement of current diet and also
    diet at various times in the past.
  • - Unfortunately, a comprehensive assessment of
    even current diet is already a major burden on
    participants. Moreover, the validity of recall
    diminishes w/ time, and the reporting of past
    diet is heavily influenced by current diet, so
    that truly independent retrospective assessments
    of various periods appear impossible.
  • - In practice, in cohort studies an assessment
    of current diet (usually over the past year) is
    used, and in case-control studies a period in the
    past thought to be most plausibly relevant to the
    disease (typically 5 to 10 years ago for cancer)
    is the focus of recall.
  • For intake of vitamin and mineral supplements,
    information on duration of use can be readily
    collected and can be critical (e.g., for vitamin E
    supplements in relation to CHD risk and for
    vitamin C supplements in relation to risk of
    cataracts, the associations were limited to
    longer-term users).

26
  • Prospective studies provide important additional
    means of assessing temporal relationships w/
    diet.
  • - Baseline data on current diet can be examined
    in relation to disease incidence at various
    follow-up periods; under the reasonable
    assumption that diet varies over time, the
    maximum relative risk should provide information
    on the true induction period.
  • - The limitation of this approach is largely
    practical as few cohorts will be sufficiently
    large to provide statistically stable estimates
    of risk during multiple time periods.
  • Prospective studies w/ replicate dietary
    assessment provide the opportunity to examine
    various intervals between dietary intake and
    disease diagnosis w/ much greater power.

27
  • THE USE of MULTIPLE DIETARY ASSESSMENTS in
    PROSPECTIVE STUDIES
  • A powerful feature of cohort studies is the
    opportunity to collect repeated dietary data over
    time. Such repeated measurements of dietary
    intake provide many possible analytic
    opportunities to reduce the effects of
    measurement error and to evaluate various
    hypothesized temporal relationships between the
    dietary factor and the disease outcome (Table
    13-2).

28
  • Measured changes in diets of individuals over
    time are a mix of true variation and measurement
    error.
  • - The comparison of persons whose intakes are
    consistently high w/ those whose intakes are
    consistently low provides a strong test of
    cumulative exposure, as well as both long and
    short latency, because it is highly likely that
    these persons were truly high or truly low over
    long durations.
  • - The major limitation of this strategy is the
    loss of power due to the exclusion of the many
    persons who changed categories and the need to
    exclude cases that occur before the repeated
    measurement if the analyses are to be truly
    prospective.
  • The use of cumulative average measurements (i.e.,
    the average of all measurements for an individual
    up to the start of each follow-up interval) takes
    advantage of all prior data and thus should
    provide a statistically more powerful test of an
    association of cumulative exposure.
  • - This approach deserves further methodologic
    development, though, to take into account the
    different degrees of measurement error and
    information provided at each interval.
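
The cumulative average defined above is simple to compute. A minimal sketch, assuming one intake value per questionnaire cycle in chronological order (the function name is illustrative); it gives every measurement equal weight, which is exactly the simplification the last bullet says deserves further development.

```python
from typing import List


def cumulative_averages(measurements: List[float]) -> List[float]:
    """Average of all measurements available at the start of each interval.

    measurements[i] is the intake reported on the i-th questionnaire; the
    i-th result is the exposure used for the follow-up interval that starts
    right after that questionnaire.
    """
    averages = []
    running_total = 0.0
    for count, value in enumerate(measurements, start=1):
        running_total += value
        averages.append(running_total / count)
    return averages
```

For example, intakes of 10, 20 and 30 on three successive questionnaires give exposures of 10, 15 and 20 for the three follow-up intervals.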

29
  • Because our understanding of disease etiology is
    often inadequate to specify a temporal
    relationship w/ confidence, the use of several
    rather than just one analytic strategy to examine
    various temporal relationships will generally be
    appropriate.
  • Clear evidence that an association is strongest
    w/ a particular temporal relationship can provide
    important information on the pathogenetic process
    and possibilities for intervention.
  • - If no association is observed, the
    demonstration that this lack of relationship is
    seen when a full range of temporal relationships
    is examined provides the most compelling evidence
    that an important association has not been
    missed.
  • Example: analysis of the relationship between
    coffee consumption and risk of CHD in women
    (Table 13-3)

31
  • MULTIVARIATE ANALYSES
  • Multivariate methods may be particularly
    important in nutritional epidemiology because
    dietary factors tend to be intercorrelated,
    sometimes strongly so.
  • A common reason for using multivariate analysis
    in a study of diet and disease is to address the
    question of whether an observed association
    between a specific dietary factor and disease
    risk is only secondary to its correlation w/
    another, truly causal dietary factor.
  • - A standard approach is simply to include both
    variables together in the same model.
  • - As the focus will typically be on dietary
    composition rather than on absolute amounts, the
    specific nutrients should usually be expressed as
    energy-adjusted residuals or nutrient densities,
    and total energy should be included as a term
    unless it is unrelated to disease risk.
  • - Many possible alternative dietary factors
    could be considered as potential confounding
    variables, and the temptation may be to simply
    include these all simultaneously in a model.
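
The two ways of expressing dietary composition mentioned above can be sketched as follows. This assumes a simple linear regression of nutrient intake on total energy for the residual method and expresses nutrient density as percent of energy; the function names are illustrative, and real analyses often work on log-transformed intakes, which is not shown here.

```python
from statistics import mean
from typing import List


def energy_adjusted_residuals(nutrient: List[float],
                              energy: List[float]) -> List[float]:
    """Residuals from a simple linear regression of nutrient intake on energy."""
    mean_e, mean_n = mean(energy), mean(nutrient)
    sxx = sum((e - mean_e) ** 2 for e in energy)
    sxy = sum((e - mean_e) * (n - mean_n) for e, n in zip(energy, nutrient))
    slope = sxy / sxx
    intercept = mean_n - slope * mean_e
    return [n - (intercept + slope * e) for n, e in zip(nutrient, energy)]


def nutrient_density(nutrient_kcal: List[float],
                     energy: List[float]) -> List[float]:
    """Percentage of total energy supplied by the nutrient."""
    return [100.0 * n / e for n, e in zip(nutrient_kcal, energy)]
```

By construction the residuals are uncorrelated with total energy, so they capture variation in composition rather than in overall amount eaten.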

32
  • A problem w/ including a large number of dietary
    factors simultaneously is that the remaining
    independent variation in the primary dietary
    factor may become quite small because,
    collectively, the other variables can account for
    almost all of its variation.
  • An alternative strategy is to conduct a series of
    analyses that include the standard non-dietary
    factors and add the alternative dietary factors
    one at a time. In this process, it may be possible
    to eliminate several or all alternative variables
    by showing that they have no independent
    association w/ disease and that the association
    w/ the primary variable remains.
  • There is no clear limit as to the number of
    nutrients that may be included simultaneously as
    this will depend on their intercorrelation as
    well as the size of the dataset. However, because
    many dietary variables are strongly correlated w/
    many others, the maximum number is likely to be
    modest before C.I.s become uninformatively wide.

33
  • Another common situation arises when one or more
    dietary factors are subcomponents of another
    (e.g., saturated, monounsaturated, and
    polyunsaturated fats are the components of total
    fat); entering all four variables simultaneously
    is impossible as they are redundant.
  • - Options (using types of fat as an example)
    are listed in Table 13-4.
  • - Model 1a is a standard multivariate model and
    does address the independent effect of saturated
    fat. The term total fat no longer has the
    biologic meaning of total fat because a major
    component, saturated fat, is included separately;
    its meaning then becomes mono- and
    polyunsaturated fat.

34
  • - Model 1b: total fat (residual on energy) + sat
    fat (residual on total fat) + energy
  • The residual from the regression of
    energy-adjusted saturated fat on energy-adjusted
    total fat is included. This will provide the
    same coefficient for saturated fat as in model
    1a, but the full biologic meaning of total fat is
    retained → this model describes disease risk in
    relation both to the total fat composition of the
    diet and to the type of fat.
  • - Model 1c: total fat/E + sat fat/E + energy
  • The term for saturated fat (as a nutrient
    density) can be interpreted as substituting a
    certain percentage of energy from saturated fat
    for the same amount of other types of fat. The
    term for total fat as a nutrient density reflects
    primarily the energy density of mono- and
    polyunsaturated fats.
  • - Model 2a: total fat + sat fat + poly fat +
    energy
  • Model 2b: total fat (residual on energy) + sat
    fat (residual on total fat) + poly fat (residual
    on total fat) + energy
  • Model 2c: total fat/E + sat fat/E + poly fat/E +
    energy
  • Models 2a-2c are analogous to models 1a-1c, but
    more specifically address the substitution of
    saturated fat for monounsaturated fat because
    polyunsaturated fat is included as a separate
    term. They can be used for testing the general
    question: Does the type of fat add independently
    to the prediction of disease above and beyond
    total fat?

35
  • - Model 3a: sat fat + mono fat + energy
  • Model 3b: sat fat (residual on energy) + mono
    fat (residual on energy) + poly fat (residual on
    energy) + energy
  • Model 3c: sat fat/E + mono fat/E + poly fat/E +
    energy
  • Models 3a-3c are not fat substitution models
    because the total fat composition of the diet is
    not constrained. These models address a somewhat
    different issue: Is each of the types of fat,
    substituted for other sources of energy,
    independently associated w/ disease risk?
  • Another common example of a dietary factor w/
    nested subcomponents is alcohol intake, where the
    question frequently arises whether an observed
    association is due to a certain type of alcoholic
    beverage.
  • The interpretation of multivariate analyses
    including two or more dietary factors should
    always be tempered by knowledge that none of the
    dietary variables is measured perfectly.
    Moreover, the degree of measurement error may
    vary among different dietary factors.
  • - One dietary factor may appear to be the true
    predictor and the other a confounder only because
    the former is better measured.

36
  • EMPIRICAL DIETARY SCORES
  • Two traditional methods of combining data on
    intakes of various foods have been (1) to compute
    nutrient intakes using food composition tables,
    and (2) to create food groups based on
    similarities in nutrient content.
  • - Use of global scores to describe dietary
    patterns or quality has also been suggested.
  • Factor Analysis
  • Can be used to identify two or more uncorrelated
    dietary patterns based on foods that tend to be
    used (or avoided) by the same persons.
  • A score is created for each person for each
    factor by assigning weights to their frequency of
    use of each food. Once the scores are computed
    for each person, their relation to risk of
    disease can be examined.

37
  • The role of factor analysis or other multivariate
    methods (e.g., principal components or cluster
    analysis) to create scores for dietary patterns
    in nutritional epidemiology remains unclear.
  • - In contrast to calculations of nutrient
    intakes, there is no biologic basis for these
    scores.
  • - This approach may be useful for describing
    the intercorrelation of foods and thus the
    identification of potential confounders.
  • Empirically Selected Variable Score
  • A tempting strategy for developing a prediction
    score is to examine the relation of each food (or
    nutrient) w/ risk of disease, pick the
    significant associations, and create a summary
    score comprised of these variables.
  • - The problem w/ this approach can be
    appreciated by considering that, if 100 foods are
    examined as disease predictors, by chance alone
    about 5 will be statistically significant. A
    score based on these 5 variables will be
    extremely significantly predictive of disease,
    all on the basis of chance.
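
The arithmetic behind the "about 5 of 100" figure can be checked directly. This sketch assumes a significance level of 0.05 and, for the second function, that the tests are independent, which is a simplification because foods are correlated; the function names are illustrative.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of significant results when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests
```

With 100 foods the expected count is 5, and under independence the chance of at least one spurious finding exceeds 99 percent, so a score built only from the "significant" foods is almost guaranteed to contain noise.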

38
  • One common strategy for cross-validation is to
    divide the dataset into halves, create an
    empirical prediction score in one half (training
    set) and evaluate the score in the other half
    (test or validation set).
  • More statistically efficient alternatives for
    cross-validation exist, such as the jackknife,
    which involves successively leaving out one
    observation and fitting the model w/ the
    remaining data to predict the omitted
    observation.
  • Nutrient Prediction Models Using an Independent
    Gold Standard
  • In calculating nutrient intakes from a FFQ, foods
    are weighted by their frequency of use and their
    nutrient content using a food composition
    database.
  • - Ideally, the weight would also take into
    account the validity w/ which intake of each food
    was assessed and the bioavailability of the
    nutrient from each food.
  • - This additional weighting can be accomplished
    by using an independent quantitative assessment
    of nutrient intake or a biochemical indicator of
    nutrient intake in a sample of the population.
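
The standard FFQ calculation, and the optional extra weighting described above, can be sketched as a weighted sum over foods. Everything named here is hypothetical: the function, the food names, and the numbers are made up for illustration and do not come from a real food composition table.

```python
from typing import Dict, Optional


def nutrient_intake(servings_per_day: Dict[str, float],
                    content_per_serving: Dict[str, float],
                    extra_weight: Optional[Dict[str, float]] = None) -> float:
    """Sum over foods of frequency x nutrient content x optional extra weight."""
    extra_weight = extra_weight or {}
    total = 0.0
    for food, freq in servings_per_day.items():
        weight = extra_weight.get(food, 1.0)  # 1.0 means standard weighting
        total += freq * content_per_serving[food] * weight
    return total


# Made-up numbers, for illustration only (not from a real composition table).
ffq = {"milk": 1.0, "cheese": 0.5, "spinach": 2.0}
calcium_mg = {"milk": 300.0, "cheese": 200.0, "spinach": 100.0}

standard = nutrient_intake(ffq, calcium_mg)
# Down-weight one food, e.g. for lower bioavailability (hypothetical factor).
weighted = nutrient_intake(ffq, calcium_mg, {"spinach": 0.5})
```

In the approach the slides describe, the extra weights would be estimated from an independent standard rather than chosen by hand as they are here.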

39
  • Diet records, which are often used to assess the
    validity of a food-frequency instrument, could be
    used as an independent estimate of true nutrient
    intake to develop a prediction score from foods
    on a FFQ.
  • - To do so, the nutrient intake from the diet
    record would be used as the dependent variable
    in a stepwise multiple regression analysis in
    which all foods from the FFQ are allowed to
    enter. Foods that explain
    the most between-person variance in the nutrient
    intake enter first.
  • - If the validity of a food item on the
    questionnaire is low (e.g., if it was worded
    poorly), it should not contribute appreciably to
    the prediction of the nutrient.
  • - The nutrient score based on the coefficients
    from the stepwise regression could then be
    computed for each person and used in analyses
    predicting disease.
  • - Limitations of this approach: (1) a large
    number of subjects is needed to provide stable
    estimates of regression coefficients, probably
    considerably larger than in most validation
    studies; (2) the regression coefficients will
    reflect in part the respondent characteristics
    (a desirable feature because they may influence
    validity), but this means that they may not be
    generalizable to other populations.

40
  • A biochemical indicator could also serve as the
    standard for developing a prediction score from
    foods on a FFQ.
  • - Such an approach would take into account
    factors such as the bioavailability of the
    nutrient in each food and the validity of each
    food item.
  • Example: Giovannucci et al. (1995) used this
    approach to develop a prediction score for
    lycopene intake using plasma lycopene levels as a
    standard. They found that cooked tomato products
    predicted plasma lycopene levels better than did
    raw tomato products, so this was incorporated
    into the empirical prediction equation. When
    this empirical score was used to examine the
    relation between lycopene intake and risk of
    prostate cancer in the total cohort, a stronger
    association was observed than using the standard
    calculation of intake.
  • - The size of the substudy is a critical issue
    because it determines the precision of the
    coefficients, but the desirable size is not
    clear.

41
  • SUBGROUP ANALYSES AND INTERACTIONS
  • The effects of most dietary factors are likely to
    vary among subgroups, depending on the intake of
    other dietary factors and characteristics of the
    subjects. This issue is generally known as
    effect modification or interaction.
  • - A fundamental issue is whether the
    interaction should be assessed on an absolute
    scale (whether the rate difference is constant
    across categories of the third variable) or
    relative scale (whether the rate ratio or RR is
    constant across these categories).
  • One general concern has been that an extensive
    search for interactions and associations within
    subgroups of other variables creates a high
    likelihood of statistically significant
    associations arising by chance.
  • Although there is no simple solution for
    evaluating subgroups w/ confidence, this should
    not deter investigators from examining them.
  • - Some subgroup analyses are so important a
    priori that they must be examined, and failure to
    find evidence of effect modification may even
    cast doubt on an association.
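
The distinction between the absolute and relative scales can be illustrated w/ a small calculation. The counts below are hypothetical and chosen only to show that the same data can display no interaction on one scale and clear interaction on the other.

```python
# Hypothetical (cases, person-years) by exposure within two strata of a
# third variable; these numbers are illustrative, not from any study.
strata = {
    "stratum_1": {"unexposed": (10, 1000), "exposed": (20, 1000)},
    "stratum_2": {"unexposed": (30, 1000), "exposed": (60, 1000)},
}

def effect_measures(cells):
    """Return the rate difference and rate ratio for one stratum."""
    r0 = cells["unexposed"][0] / cells["unexposed"][1]
    r1 = cells["exposed"][0] / cells["exposed"][1]
    return {"rate_difference": r1 - r0, "rate_ratio": r1 / r0}

results = {name: effect_measures(cells) for name, cells in strata.items()}
for name, m in results.items():
    print(name,
          f"RD per 1,000 = {1000 * m['rate_difference']:.1f}",
          f"RR = {m['rate_ratio']:.2f}")
```

Here the rate ratio is the same in both strata (no interaction on the relative scale), while the rate difference is three times larger in the second stratum (interaction on the absolute scale).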

42
  • When two dietary factors act by a similar
    mechanism, it may be difficult to observe an
    association by examining only one of these
    variables at a time; an examination of joint
    exposures may be most powerful.
  • Purely exploratory analysis of associations among
    subgroups of known risk factors is also a good
    practice, even when little a priori reason
    exists, because new knowledge may be gained.
  • - Such explorations w/o strong prior
    expectations should be clearly described as such,
    and the reader should be skeptical of any
    findings.
  • - Some would suggest not reporting p values in
    such circumstances.
  • Ultimately, the only fail-safe protection against
    spurious conclusions based on subgroups is
    demonstration of reproducibility, perhaps over
    time within the same study and, most importantly,
    in other independent datasets.

43
  • ERROR CORRECTION
  • Methods to correct observed associations for
    errors in measurement of exposure variables
    require data on either reproducibility or
    validity; this information is now being collected
    as part of many large studies.
  • The most common use of error correction
    procedures is the de-attenuation of correlation
    coefficients, probably because this only requires
    replicates of one or both measurements being
    compared.
  • - This procedure has become quite routine in
    validation studies, where a small number of days
    of diet records or 24-hour recall data are
    collected as an independent representation of
    true long-term intake.
  • In studies of disease incidence, the calculation
    of RRs and C.I.s adjusted for measurement error
    has usually been done as a secondary analysis.
  • - These analyses address 3 interrelated
    objectives: (1) to obtain the best estimate of
    the RR after accounting for attenuation due to
    imperfect measurement of the primary exposure;
    (2) to obtain the best estimate of the true C.I.,
    which is particularly important when little
    association is seen, because the central question
    becomes whether the adjusted C.I. is
    sufficiently narrow to be informative; (3) to
    account for residual confounding by imperfectly
    measured covariates.
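
The two corrections mentioned on this slide can be sketched as follows. The slide does not specify formulas, so this assumes the commonly used de-attenuation factor sqrt(1 + lambda/n), where lambda is the ratio of within- to between-person variance in the reference method and n is the number of replicates per person, and a simple regression-calibration correction of the RR, where gamma is the slope from regressing true intake on questionnaire intake in a validation substudy. All function names and input values are hypothetical.

```python
import math

def deattenuate_correlation(r_observed, within_var, between_var, n_replicates):
    """Correct a correlation for random within-person error in the
    reference method (e.g., a few days of diet records)."""
    lam = within_var / between_var
    return r_observed * math.sqrt(1 + lam / n_replicates)

def correct_relative_risk(rr_observed, se_log_rr, gamma, se_gamma):
    """Regression-calibration sketch: divide the log RR by the calibration
    slope gamma and widen the C.I. for uncertainty in both estimates
    (delta-method variance)."""
    beta = math.log(rr_observed)
    beta_corrected = beta / gamma
    var_corrected = (se_log_rr ** 2 / gamma ** 2
                     + beta ** 2 * se_gamma ** 2 / gamma ** 4)
    half_width = 1.96 * math.sqrt(var_corrected)
    return (math.exp(beta_corrected),
            math.exp(beta_corrected - half_width),
            math.exp(beta_corrected + half_width))

# Hypothetical inputs: observed r = 0.5 against the mean of 4 record days,
# w/ within-person variance 3 times the between-person variance.
r_true = deattenuate_correlation(0.5, within_var=3.0, between_var=1.0,
                                 n_replicates=4)

# Hypothetical observed RR of 1.5 w/ a calibration slope of 0.5.
rr_c, ci_low, ci_high = correct_relative_risk(1.5, se_log_rr=0.10,
                                              gamma=0.5, se_gamma=0.05)
print(round(r_true, 3), round(rr_c, 2), round(ci_low, 2), round(ci_high, 2))
```

The corrected RR moves away from 1 and its C.I. widens, which is why the second objective above (whether the adjusted C.I. is narrow enough to be informative) matters most when little association is seen.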

44
  • ROLE of META-ANALYSIS and POOLED ANALYSIS in
    NUTRITIONAL EPIDEMIOLOGY
  • The place of meta-analysis in epidemiology has
    been controversial.
  • - Some have argued that the combining of data
    from randomized trials is appropriate because
    statistical power is increased w/o concern for
    validity, since the comparison groups have been
    randomized, but that in observational
    epidemiology validity is determined largely by
    confounding and bias rather than by limitations
    of statistical power; the great statistical
    precision obtained by combining data may
    therefore be misleading because the findings may
    still be invalid.
  • - The combining of all available epidemiologic
    data can be of great value, particularly when a
    body of evidence becomes substantial and
    difficult to assimilate at once, provided that
    the potential for bias is not ignored.

45
  • An alternative to the combining of published
    epidemiologic data is to pool and analyze the
    primary data from all available studies on a
    topic that meet specified criteria.
  • - Because of the complexity of dietary data,
    this approach has great advantages in nutritional
    epidemiology and can address many limitations of
    individual studies.
  • Any attempt to combine published data on diet and
    disease is immediately confronted w/ the problem
    that investigators have usually used different
    approaches for presenting their findings, which
    makes them difficult to combine.
  • - Sometimes RRs are given for arbitrary
    quantiles and other times for specified
    increments using continuous variables.
  • - Adjustments for total energy intake are often
    done using a variety of methods or not at all,
    and the inclusion of other covariates typically
    differs among studies.
  • A major advantage of pooling primary data is that
    all data can be analyzed simultaneously using
    common approaches and definitions of exposure.

46
  • In a pooled analysis, the range of dietary
    factors that can be addressed can be considerably
    greater than in the separate analyses because any
    one study will have few subjects in the extremes
    of intake and, sometimes, because the studies
    will vary in distribution of dietary factors.
  • Example: in the pooled analysis of
    prospective studies of diet and breast cancer, it
    was possible to evaluate associations from <15%
    to >45% of energy from fat, which was a far
    greater range than possible in the individual
    studies.
  • - Although few pooled analyses have been
    conducted at the level of specific foods, this
    can provide a less biased assessment of
    relationships based on the total body of
    evidence.
  • Evaluation of the consistency of findings in
    subgroup analyses across studies will reduce the
    likelihood of overinterpreting findings that may
    have occurred by chance.

47
  • The data quality may differ among studies due to
    differences in the questionnaires used, study
    designs, or populations.
  • - This can be addressed if each study includes
    a validation/calibration substudy so that
    corrections can be made for the study-specific
    measurement error, and the studies w/ more valid
    assessments of diet can be given more weight.
  • - The advantages of pooled analyses in
    nutritional epidemiology are so substantial that
    this should become common practice for important
    issues.
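
One simple way to combine study-specific results is inverse-variance weighting of the log RRs, sketched below. This is an assumption for illustration; the slides do not state which weighting scheme is intended. If each study's estimate has already been corrected for its own measurement error, studies w/ less valid dietary assessment will tend to have larger corrected variances and therefore receive less weight. The study values are hypothetical.

```python
import math

def pool_log_rr(estimates):
    """Fixed-effect inverse-variance pooling.
    estimates: list of (log_rr, standard_error) pairs, one per study."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical study-specific RRs and standard errors of the log RR.
studies = [(math.log(1.2), 0.10), (math.log(1.0), 0.20), (math.log(1.4), 0.15)]

pooled, pooled_se = pool_log_rr(studies)
rr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * pooled_se), math.exp(pooled + 1.96 * pooled_se))
print(round(rr, 2), tuple(round(v, 2) for v in ci))
```

A fixed-effect model is used here only for brevity; when results differ across studies more than chance would explain, a random-effects model or an explicit examination of the sources of heterogeneity would be more appropriate.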