Title: Epidemiology for mathematicians


1
Epidemiology for mathematicians: Looking at
wildflowers from horseback
  • David Ozonoff, MD, MPH
  • Boston University
  • School of Public Health

DIMACS Working Group on Order Theory in
Epidemiology, March 7, 2005
2
Tutorial overview and goals
  • The landscape of epidemiology
  • What is epidemiology?
  • Who is an epidemiologist?
  • Who employs them?
  • Kinds of epidemiology
  • How epidemiologists think
  • What kinds of things do they work with?
  • What kinds of things are they interested in?

3
Tutorial overview and goals, cont'd
  • Some language and concepts of epidemiology
  • Language of occurrence measures
  • Study designs
  • Causal inference

4
I. Landscape, perspective, language
  • What is epidemiology?
  • Who is an epidemiologist?
  • Who employs epidemiologists?
  • Flavors of epidemiology: descriptive, analytic
  • Epi and mathematics: models and patterns
  • Some examples of epidemiological thinking

5
Some definitions of epidemiology
  • Study of health and illness in populations
    (Kleinbaum, Kupper and Morgenstern)
  • Study of the distribution and determinants of
    disease frequency in human populations (MacMahon
    and Pugh; Susser)
  • Study of the occurrence of illness (Rothman I)
  • Theoretical epidemiology: discipline of how to
    study the occurrence of phenomena of interest in
    the health field (Miettinen). NB: not illness
    centered

6
Some more (cynical) definitions
  • Rothman II: "Unfortunately, there seem to be more
    definitions of epidemiology than there are
    epidemiologists. Some have defined it in terms of
    its methods. While the methods of epidemiology
    may be distinctive, it is more typical to define
    a branch of science in terms of its subject
    matter rather than its tools. If the subject of
    epidemiologic inquiry is taken to be the
    occurrence of disease and other health outcomes,
    it is reasonable to infer that the ultimate goal
    of most epidemiologic research is the elaboration
    of causes that can explain patterns of disease
    occurrence."
  • Schneiderman: "Epidemiology is the practice of
    criticizing other epidemiologists"

7
Consensus notions
  • Deals with populations, not individuals
  • Deals with (frequency of) occurrences of health
    related events
  • Has a (major but not exclusive) concern with
    causes (determinants) of disease patterns in
    populations

8
Remarks
  • Public health perspective
  • Flavors: analytic versus descriptive
    epidemiology
  • Causal inference assumptions:
  • Disease occurrence is not random.
  • Systematic investigation of different populations
    can identify causal and preventive factors
  • Observational versus experimental sciences
  • Chronic disease and infectious disease
    epidemiology
  • What is theoretical epidemiology?

9
Some examples
  • Do environmental exposures increase risk of
    disease?
  • John Snow: cholera epidemic of 1854
  • Contaminated water and leukemia in Woburn, MA
  • Are vitamin supplements beneficial?
  • Does vitamin E lower risk of Alzheimer's disease?
  • Folic acid and risk of neural tube (birth)
    defects
  • Do behavioral interventions reduce risk
    behaviors?
  • Community-based studies to change diets
  • Peer interventions to reduce HIV-risk behaviors

10
Who is an epidemiologist?
  • Relatively new in medical science
  • Precursors: John Graunt (17th century), John Snow
    (19th century)
  • Rise as a profession: Wade Hampton Frost at JHU
  • 1950s and 1960s: CDC and consolidation as
    professional discipline, still mainly physicians
  • 1960s: infectious disease -> chronic disease epi
  • Professionalization
  • Doctoral degrees in epidemiology
  • Now most epidemiologists are not docs

11
Who employs epidemiologists?
  • Public sector
  • State and federal health officials
  • Communicable and chronic disease programs
  • Infectious disease, outbreak investigations
  • Cancer registries, environmental studies, program
    areas in substance abuse, health services, etc.,
    etc.
  • Research at CDC, NIH, academia, etc.
  • Private sector
  • Industry (chemical companies, drug companies)
  • Consultants
  • Academia, NGOs

12
Flavors of epidemiology
  • Descriptive epidemiology
  • Analytic epidemiology (finding risk factors,
    a.k.a. causes)

13
Descriptive epidemiology
  • Describe patterns of disease by person, place,
    time
  • Good for monitoring the public's health (e.g.,
    surveillance, vital events)
  • Used for administrative purposes (e.g., planning)
  • Good for generating hypotheses

14
NB: Disease patterns and the "science of patterns"
15
Description
  • Two kinds:
  • Tabulations or summaries only (no inference or
    estimation)
  • Inference
  • Prediction to other populations
    (generalization: surveys and polling)
  • True value in face of noise
  • May also assume data produced by underlying
    population model and try to describe it
  • Parametric: particular functional form assumed
  • Parameter: value that indexes family of functions,
    e.g., mean and std deviation of Normal
    distribution
  • Non-parametric: data-driven estimate of
    underlying density or distribution

16
A word about models and patterns (our usage)
  • Models are high level, global descriptions of
    all or most of dataset
  • Descriptive or inferential component
  • Examples
  • Regression models, mixture models, Markov models
  • Patterns are local features of data
  • Perhaps only a few people or a few variables
  • Also descriptive or inferential
  • Descriptive: look for people with unusual
    features
  • Inferential: predict which people have unusual
    features
  • Examples: association rules, mode or gap in
    density function, outliers, inflection point in
    regression, symptom clusters, geographic hot
    spots, predict disease from symptoms

17
Models and patterns, cont'd
  • Epidemiologists use both but more interested in
    patterns, i.e., more interested in structure
    that is local than structure that is global
  • George Box: "All models are wrong but some models
    are useful" describes epi viewpoint
  • But epidemiologists tend to think of patterns as
    real, even if misleading

18
Warning: the word "model" differs by context but is
usually some kind of metaphor
  • Metaphor: a figure of speech literally denoting
    one kind of thing but used to represent or reason
    about another kind of thing
  • Examples: fashion model, model citizen (represent
    an ideal); scale model; animal model;
    mathematical model; model of an axiomatic system;
    regression model

19
Question: What do we learn from the following
examples?
Describing populations by person, place and time:
illustrating how epidemiologists think
20
Person (age, sex, race): Death rates per 100,000 US
population from coronary disease by age and sex,
1981

Age      White Men   White Women
25-34            9             4
35-44           60            16
45-54          265            71
55-64          708           243
65-74         1670           769
75-84         3752          2359
85+           8596          7215
21
Place
  • Where are the rates of disease the highest and
    lowest?
  • Malignant Melanoma of Skin

22
Place
23
A Variation on Place: Migrant Studies
Mortality rates (per 100,000) due to stomach cancer

Japanese in Japan                     58.4
Japanese Immigrants to California     29.9
Sons of Japanese Immigrants           11.7
Native Californians (Caucasians)       8.0
24
Time: Does frequency of disease differ now from in
the past?
25
  • What is a Population?
  • How an epidemiologist would put it
  • Group of people with a common characteristic,
    like age, race, sex, geographic location,
    occupation, etc.
  • Two types of populations, based on whether
    membership is permanent or transient
  • Fixed population or cohort: membership is
    permanent and defined by an event
  • Ex. atomic bomb survivors, persons born in
    1980
  • Dynamic population: membership is transient and
    defined by being in or out of a "state"
  • Ex. members of HMO Blue, residents of the
    City of Boston

26
First step, summary description
  • Tabulate data by selected features of person,
    place, time
  • What are characteristics of population members?
    (how many of each sex, race, etc.) And
    combinations of these features (How many white
    women? Employed? Etc.)

27
Constructing contingency table from raw data
  • Raw data consist of a listing of each subject
    and his or her attributes

28
One-way tables
  • One dimensional Contingency Table (CT) is just a
    frequency table, i.e., a table that gives number
    of subjects with each attribute

29
Two-way tables
  • Most contingency tables are (at least) two-way,
    i.e., they cross-classify two attributes

30
Or in more familiar form:
Sex by handedness and age
But this is only part of the possible two-way
tables, as it does not represent handedness versus
age, for example
31
What is a Population? How a mathematician might
put it
  • A population is a triple, (G, M, I)
  • Two sets, G and M: G is a set of people or
    subjects, M is a set of features the subjects
    might have
  • A relation I, I ⊆ G × M
  • Interpretation: r = (g, m) ∈ I means that subject
    g ∈ G has attribute m ∈ M

32
Contingency tables (cross-tabs)
  • Mainstay of data preparation, inspection and
    analysis
  • Requires study design based operations
  • Sampling → set of n subjects in set G
  • Variable selection (classification scheme) → set
    of m variables in set M
  • E.g., age, sex, disease status (as indicator
    variables)
  • Measurement → binary relation I ⊆ G × M
  • E.g., ordered pair (case 2, female=yes) is
    typical member of I
  • We call the triple (G, M, I) a data structure for
    the contingency table (also called a formal
    context in FCA literature)
  • Simple formulation allows use of rich
    mathematical theory
  • Much more about this from Alex Pogel
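
The (G, M, I) formulation can be sketched in a few lines of Python. This is only an illustration: the four subjects and their attributes are made up, and the helper names one_way and two_way are hypothetical, not part of any FCA library.

```python
# Hypothetical raw data: each subject listed with the attributes he or she has.
raw_data = {
    "case 1": {"male", "right-handed"},
    "case 2": {"female", "left-handed"},
    "case 3": {"female", "right-handed"},
    "case 4": {"male", "right-handed"},
}

G = set(raw_data)                      # set of subjects
M = set().union(*raw_data.values())    # set of attributes
# Binary relation I, a subset of G x M: (g, m) is in I when subject g has attribute m.
I = {(g, m) for g, attrs in raw_data.items() for m in attrs}


def one_way(attributes):
    """One-way table: number of subjects having each listed attribute."""
    return {m: sum((g, m) in I for g in G) for m in attributes}


def two_way(rows, cols):
    """Two-way table: number of subjects having both the row and column attribute."""
    return {
        (r, c): sum((g, r) in I and (g, c) in I for g in G)
        for r in rows
        for c in cols
    }


sex_table = one_way(["male", "female"])
cross_tab = two_way(["male", "female"], ["right-handed", "left-handed"])
print(sex_table)
print(cross_tab)
```

Every cell of a contingency table is then just a count over the relation I, which is what lets the order-theoretic machinery apply to ordinary cross-tabs.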

33
Quantification Disease frequency
  • Goal will be to see if occurrence of disease
    differs in populations with different
    characteristics or experiences (note comparison
    is at heart of this)
  • Quantify disease occurrence in a population at
    certain point or period of time
  • Population (counting, absolute scale)
  • How big?
  • Composition?
  • Occurrence (counting, absolute scale)
  • Existing cases? New cases?
  • Time
  • Calendar time? (NB interval scale, preserved
    under pos. lin. xform)
  • Duration of time (NB ratio scale, preserved
    under similarity xform)
  • More about this in Fred Roberts's tutorial

34
  • Ex. Hypothetical Frequency of AIDS in Two Cities

             New cases   Time period   Population
    City A          58   1985              25,000
    City B          35   1985-86            7,000

  • Annual "rate" of AIDS
  • City A = 58/25,000/1 yr = 232/100,000/yr
  • City B = 35/7,000/2 yrs = 17.5/7,000/yr
    = 250/100,000/yr
  • Make it easy to compare rates (i.e., make them
    commensurable) by using same population unit
    (say, per 100,000 people) and time period (say, 1
    year)
  • NB: Commensurability is property of underlying
    relational system used in measurement (treated in
    Roberts's tutorial)
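
The two-city arithmetic can be checked with a short sketch. The function name annual_rate_per_100k is made up for illustration; the inputs are the hypothetical city figures from this slide.

```python
def annual_rate_per_100k(new_cases, population, years):
    """New cases per 100,000 people per year."""
    return new_cases / population / years * 100_000


city_a = annual_rate_per_100k(58, 25_000, 1)  # approximately 232
city_b = annual_rate_per_100k(35, 7_000, 2)   # approximately 250

print(f"City A: {city_a:.0f} per 100,000 per year")
print(f"City B: {city_b:.0f} per 100,000 per year")
```

City B has fewer cases but the higher rate once both are put on the same population and time units.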

35
Three kinds of quantitative measures of frequency
of occurrence
Used to relate number of cases of disease, size of
population, time
  • Proportion: numerator is subset of denominator,
    often expressed as a percentage
  • Ratio: division of one number by another, numbers
    don't have to be related
  • Rate: time (sometimes space) is intrinsic part of
    denominator, term is often misused (e.g.,
    birthrate)
  • Need to specify if measure represents events or
    people

36
(Point) Prevalence (P): Quantifies number of
existing cases of disease in a population at a
point in time
  • P = Number of existing cases of disease (at a
    given point in time) / total population
  • Ex. City A has 7,000 people with arthritis on Jan
    1st, 2002
  • Population of City A = 70,000
  • Prevalence of arthritis on Jan 1st = 0.10 or 10%
Prevalence is a proportion
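
A minimal sketch of the arthritis example, with a check that the numerator really is a subset of the denominator. The function name point_prevalence is illustrative only.

```python
def point_prevalence(existing_cases, total_population):
    """Proportion of the population that has the disease at one point in time."""
    if not 0 <= existing_cases <= total_population:
        raise ValueError("cases must be a subset of the population")
    return existing_cases / total_population


p = point_prevalence(7_000, 70_000)
print(f"{p:.2f} or {p:.0%}")  # prints "0.10 or 10%"
```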
37
  • Incidence quantifies number of
  • (a) new cases of disease that
  • (b) develop in a population at risk
  • (c) during a specified time period
  • Three key ideas:
  • New disease events, or for diseases that can
    occur more than once, usually first occurrence of
    disease
  • Population at risk (candidate population) - can't
    have disease already, should have relevant organs
  • Enough time must pass for a person to move from
    health to disease

38
Two Types of Incidence Measures
  • Cumulative Incidence
  • (Attack Rate) (Abbreviated Cum Inc. CI)
  • Incidence Rate
  • (Incidence Density) (Abbreviated I, IR, ID)

39
  • Incidence rate (I, IR) = number of new cases
    of disease / total person-time of observation
  • Also called incidence density (ID)

40
Accrual of Person-Time
             Jan              Jan 1981              Jan 1982
  Subject 1  -----------------------x                       1.1 person-years (PY)
  Subject 2  -------------------------x                     1.2 PY
  Subject 3  --------------------------------------------   2.2 PY
  Total                                                     4.5 PY

x = outcome of interest; incidence rate = 2/4.5 PY
41
Some Ways to Accrue 100 PY
  • 100 people followed 1 year each = 100 py
  • 10 people followed 10 years each = 100 py
  • 50 people followed 1 year plus 25 people followed
    2 years = 100 py
  • Time unit for person-time: year, month or day
  • Person-time = person-year, person-month,
    person-day
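
Person-time bookkeeping is easy to sketch. The code below uses the three-subject accrual example (1.1, 1.2 and 2.2 person-years, two outcomes) and the three ways of accruing 100 person-years listed above; the function names are made up for illustration.

```python
def total_person_years(follow_up_years):
    """Sum the follow-up time contributed by each subject."""
    return sum(follow_up_years)


def incidence_rate(new_cases, follow_up_years):
    """New cases divided by total person-time of observation."""
    return new_cases / total_person_years(follow_up_years)


# Three subjects, two of whom develop the outcome of interest.
follow_up = [1.1, 1.2, 2.2]
py = total_person_years(follow_up)     # approximately 4.5 PY
rate = incidence_rate(2, follow_up)    # approximately 0.44 per PY


def accrued(groups):
    """Person-years from (number of people, years each) groups."""
    return sum(people * years for people, years in groups)


ways_to_100 = [
    accrued([(100, 1)]),
    accrued([(10, 10)]),
    accrued([(50, 1), (25, 2)]),
]
print(f"{py:.1f} PY, rate = {rate:.2f} per PY, accruals = {ways_to_100}")
```

Very different cohorts can contribute the same denominator, which is why the incidence rate says nothing by itself about how many people were followed or for how long.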

42
Ex. (Cohort) study of risk of breast cancer
among women with hyperthyroidism
  • Followed 1,762 women -> 30,324 py
  • Average of 17 years of follow-up per woman
  • Ascertained 61 cases of breast cancer
  • Incidence rate = 61/30,324 py = 0.00201/y
  • = 201/100,000 py (0.00201 x 100,000
    p/100,000 p)
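
The cohort figures above can be reproduced directly. This is a sketch of the arithmetic only; the variable names are illustrative.

```python
women = 1_762
person_years = 30_324
cases = 61

mean_follow_up = person_years / women   # about 17 years per woman
rate_per_py = cases / person_years      # about 0.00201 per person-year
rate_per_100k = rate_per_py * 100_000   # about 201 per 100,000 person-years

print(f"mean follow-up: {mean_follow_up:.1f} years")
print(f"incidence rate: {rate_per_py:.5f}/y = {rate_per_100k:.0f}/100,000 py")
```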

43
Dimensions
  Prevalence = people / people: no dimension
  Cumulative incidence = people / people: no dimension
  Incidence rate = people / people-time: dimension is time^-1
44
Types of (instantaneous) rates
Relative rate (person-time or incidence rate)
Absolute rate (used in infectious disease epi and
health services)
Also where units do not involve time, such as
accidents per passenger mile or cases per square
area
45
Relationship between prevalence and incidence
  • P = IR x D
  • Prevalence depends on incidence rate and duration
    of disease (duration lasts from onset of disease
    to its termination)
  • If incidence is low but duration is long -
    prevalence is relatively high
  • If incidence is high but duration is short -
    prevalence is relatively low
  • This is an example of Little's equation in
    queuing theory: time-avg number of units in the
    system = arrival rate x avg delay time/unit
  • This equation is true if ...

46
Conditions for equation to be true
  • Steady state
  • IR constant
  • Distribution of durations constant
  • Prevalence of disease is low (less than 10%)
  • In queuing theory terms: strictly stationary
    process in steady state conditions
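
Why the low-prevalence condition matters can be seen numerically. The sketch below assumes the standard steady-state relation for a closed population, in which the prevalence odds P/(1-P) equal IR x D; that relation is not on the slides and is brought in here as an assumption. The (IR, D) pairs are hypothetical.

```python
def steady_state_prevalence(ir, d):
    """Prevalence proportion when the prevalence odds P/(1-P) equal IR x D."""
    x = ir * d
    return x / (1 + x)


# Hypothetical (incidence rate per person-year, mean duration in years) pairs.
examples = [(0.001, 1.0), (0.01, 5.0), (0.05, 10.0)]

for ir, d in examples:
    approx = ir * d                          # the simple product P = IR x D
    exact = steady_state_prevalence(ir, d)   # under the assumed odds relation
    print(f"IR x D = {approx:.3f}   steady-state P = {exact:.3f}")
```

When the product is small the two agree closely; when it is large, the simple product overstates prevalence because it ignores that people who already have the disease are no longer at risk.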

47
Figuring duration from prevalence and incidence
  • Lung cancer incidence rate = 45.9/100,000 py
  • Prevalence of lung cancer = 23/100,000
  • D = P / IR = (23/100,000 p) / (45.9/100,000 py)
    = 0.5 years
  • Conclusion: individuals with lung cancer survive,
    on average, 6 months from diagnosis to death
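
Rearranging P = IR x D gives the duration directly. A short sketch with the lung cancer figures from this slide (the function name mean_duration is illustrative):

```python
def mean_duration(prevalence, incidence_rate):
    """Average disease duration implied by P = IR x D, in the time unit of IR."""
    return prevalence / incidence_rate


d_years = mean_duration(23 / 100_000, 45.9 / 100_000)
print(f"D = {d_years:.2f} years, about {d_years * 12:.0f} months")
```

The per-100,000 population units cancel, leaving years, which is the dimensional check from the earlier "Dimensions" slide in action.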

48
Uses of Prevalence and Incidence Measures
  • Prevalence: administration, planning
  • Incidence: etiologic research (problems with
    prevalence since it combines IR and D), planning

49
  • Common measures of disease frequency for public
    health
  • Crude death (mortality) rate = total number of
    deaths from all causes per 1,000 people, for one
    year
  • (also cause-specific, age-specific,
    race-specific death rate)
  • Live-birth rate = total number of live births per
    1,000 people (sometimes women of childbearing
    age), for one year
  • Infant mortality rate = deaths of infants under 1
    year of age per 1,000 live births, for one year
50
Frequency measures used in infectious disease
epidemiology
  • Attack rate = number of cases of disease that
    develop during defined period / number in pop.
    at risk at start of period
  • (usually used for infectious disease outbreaks)
  • Case fatality rate = number of deaths / number of
    cases of disease, for a defined period of time
  • Survival rate = number of living cases / number
    of cases of disease, for a defined period of time
51
Tutorial part 2: Exposure-Disease Relationship
  • Analytic epidemiology

52
Reprise: Epidemiology is a science within public
health
  • This means that it adopts a population
    perspective
  • As a science, it is also quantitative
  • As a science, it is also interested in
    explanation and prediction, not just describing

53
Questions asked by communities
  • Exposure driven questions
  • What will happen to me, my family, my community?
  • Outcome driven questions
  • Why me, why my child, why us?
  • Mixed
  • Are we sicker than our neighbors?

54
The usual notion of causation: John Stuart Mill's
Method of Difference
  • A causes B if, all else being held constant, a
    change in A is accompanied by a subsequent change
    in B.
  • This of course does not mean that nothing else
    can produce a change in B.
  • The formal method to detect such an occurrence is
    the Experiment, whereby all things are held
    constant except A and B, A is varied, and B
    observed

55
Exptl vs. Observational Science
  • Epidemiology is an observational science
  • We do not control the independent variable (or
    most other variables)
  • What is the implication of this for the status of
    epidemiology as a science?
  • What does it mean about epidemiology's ability to
    prove causation?

56
Sources of information
  • Case studies
  • Experimental studies
  • Observational studies

Once results are observed, it remains to explain
or interpret the observation, whether the result
is a difference or a lack of a difference in the
compared entities.
57
Types of observational study designs
  • Descriptive
  • Case study and case-series
  • No comparison: person, place and time
  • Cross-sectional comparison (Are we sicker than
    our neighbors?)
  • ecological (comparing communities/environments
    not individual level)
  • Notice how descriptive and analytic shade into
    each other (as per examples we did earlier)
  • Cohort (What's going to happen to me?)
  • Analog of the laboratory experiment
  • Case-control (Why me?)

58
Central idea: compare frequencies of occurrence
in two groups
  • Example: summarize relationship between exposure
    and disease by comparing two measures of disease
    frequency
  • Overall rate of disease in an exposed group says
    nothing about whether exposure is a risk factor
    for (causes) a disease
  • This can be evaluated by comparing disease
    incidence in an exposed group to another group
    that is not exposed, (a comparison group)
  • Comparison or contrast is the essence of
    epidemiology

59
Two Main Options for Comparing disease
frequencies
  • 1. Calculate ratio of two measures of disease
    frequency (a measure in exposed group and a
    measure in unexposed comparison group)
  • 2. Calculate difference between two measures of
    disease frequency (a measure in exposed group and
    a measure in unexposed comparison group)
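
Both options can be tried on the coronary death rates from the "Person" table earlier in the tutorial. Sex is not an exposure in the usual sense, so treating men as the index group and women as the comparison group is purely for illustration, and the last age label is assumed to mean 85 and over.

```python
# Coronary death rates per 100,000, US 1981: (white men, white women).
rates = {
    "25-34": (9, 4),
    "35-44": (60, 16),
    "45-54": (265, 71),
    "55-64": (708, 243),
    "65-74": (1670, 769),
    "75-84": (3752, 2359),
    "85+": (8596, 7215),  # label assumed to mean ages 85 and over
}


def rate_ratio(index_rate, comparison_rate):
    """Option 1: ratio of two measures of disease frequency."""
    return index_rate / comparison_rate


def rate_difference(index_rate, comparison_rate):
    """Option 2: difference between two measures of disease frequency."""
    return index_rate - comparison_rate


for age, (men, women) in rates.items():
    print(f"{age:>5}  ratio = {rate_ratio(men, women):5.2f}  "
          f"difference = {rate_difference(men, women):5d} per 100,000")

largest_ratio = max(rates, key=lambda a: rate_ratio(*rates[a]))
largest_difference = max(rates, key=lambda a: rate_difference(*rates[a]))
```

The two options can tell different stories: in these data the ratio is largest in early middle age, while the absolute difference is largest among the old.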

60
At the heart of an epidemiological study ...
  • Lies a comparison
  • Between 2 rates, ratios, proportions
  • Is the difference/lack of difference due to
  • Bias?
  • Chance?
  • Real effect?

61
Determinants of the comparison
  • Compared measures differ or they don't (ℝ is
    linearly ordered)
  • Either way, the comparison may be affected by
  • Chance (sample variation)
  • Bias
  • Real effect or lack of effect
  • To interpret the comparison and evaluate the last
    factor, we need to account for effects of the
    first two

62
Role of statistics
  • Evaluates role that chance might play in the
    absence of any other factor
  • Also used for summary purposes or to express a
    model mathematically
  • Not the main preoccupation of epidemiologists,
    however
  • Bias is main preoccupation of epidemiologists

63
Evaluating the role of bias
  • Epidemiology is observational discipline, so
    uncontrolled variables abound
  • Most of training is in recognizing and accounting
    for sources of bias, often extremely subtle
  • Less emphasis on role of chance, often handed
    over to biostatisticians
  • Extent to which content area (real effect) is
    taken into account varies with investigator and
    who collaborators are

64
I. Definition of Bias
  • Bias is a systematic error that results in
    an incorrect (invalid) estimate of the measure of
    association
  • A. Can create spurious association when there
    really is none (bias away from the null)
  • B. Can mask an association when there really is
    one (bias towards the null)
  • C. Bias is primarily introduced by the investigator
    or study participants

65
I. Definition of Bias (cont)
  • D. Bias does not mean that the investigator is
    prejudiced or not objective
  • E. Bias can arise in all study types:
    experimental, cohort, case-control
  • F. Bias occurs in the design and conduct of a
    study. It cannot be fixed in the analysis phase.
  • G. Two main types of bias are selection and
    information bias, but there are many other types
    of bias
  • H. We will consider only selection and
    information bias for purposes of illustration of
    epidemiologic practice

66
II. Selection Bias
  • A. Results from procedures used to select
    subjects into a study that lead to a result
    different from what would have been obtained from
    the entire population targeted for study
  • B. Most likely to occur in case-control or
    retrospective cohort because exposure and outcome
    have occurred at time of study selection

67
Selection Bias in a Case-Control Study
  • A. Occurs when controls or cases are more (or
    less) likely to be included in study if they have
    been exposed -- that is, inclusion in study is
    not independent of exposure
  • B. Result: relationship between exposure and
    disease observed among study participants is
    different from relationship between exposure and
    disease in individuals who would have been
    eligible but were not included. The odds ratio
    (OR) from a study that suffers from selection
    bias will incorrectly represent the relationship
    between exposure and disease in the overall study
    population

68
Selection Bias Case-Control Study
  • Question: Do PAP smears prevent cervical cancer?
    Cases diagnosed at a city hospital. Controls
    randomly sampled from households in same city by
    canvassing the neighborhood on foot. Here is the
    true relationship:

OR = (100)(100) / (150)(150) = 0.44. There was a
56% reduced risk of cervical cancer among women
who had PAP smears as compared to women who did
not. (40% of cases had PAP smears versus 60% of
controls.)
69
Selection Bias Case-Control Study (cont)
  • Recall Cases from the hospital and controls come
    from the neighborhood around the hospital.
  • Now for the bias Only controls who were at home
    at the time the researchers came around to
    recruit for the study were actually included in
    the study. Women at home were more likely not to
    work and were less likely to have regular
    checkups and PAP smears. Therefore, being
    included in the study as a control is not
    independent of the exposure. The resulting data
    are as follows

70
Selection Bias (cont)
OR = (100)(150) / (150)(100) = 1.0. There is no
association between PAP smears and the risk of
cervical cancer. Here, 40% of cases and 40% of
controls had PAP smears.
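
The two odds ratios can be reproduced from 2x2 cell counts. The tables themselves did not survive in this transcript, so the counts below are inferred from the OR formulas and the stated percentages (250 cases and 250 controls is an assumption consistent with both); the function name odds_ratio is illustrative.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product odds ratio for a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


# True relationship: 40% of cases and 60% of controls had PAP smears.
true_or = odds_ratio(
    exposed_cases=100, unexposed_cases=150,
    exposed_controls=150, unexposed_controls=100,
)

# Biased sample: at-home controls, only 40% of whom had PAP smears.
biased_or = odds_ratio(
    exposed_cases=100, unexposed_cases=150,
    exposed_controls=100, unexposed_controls=150,
)

print(f"true OR = {true_or:.2f}, biased OR = {biased_or:.2f}")
```

The cases are identical in both tables; only the exposure distribution among controls changes, and that alone moves the estimate from a protective association to the null.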
71
Selection Bias Case-Control Study (cont)
  • Ramifications of using women who were at home
    during the day as controls
  • These women were not representative of the whole
    study population that produced the cases. They
    did not accurately represent the distribution of
    exposure in the study population that produced
    the cases, and so they gave a biased estimate of
    the association.

72
When interpreting study results, ask yourself
these questions
  • Given conditions of the study, could bias have
    occurred?
  • Is bias actually present?
  • Are consequences of the bias large enough to
    distort the measure of association in an
    important way?
  • Which direction is the distortion? Is it towards
    the null or away from the null?

73
Imputation of Causality
  • What are the roles of
  • Bias: the critique checklist
  • Chance: statistical significance
  • Real effect
  • The Hill viewpoints
  • Not necessary criteria (not even criteria)
  • Not a checklist
  • The way it's really done...

74
Marks of causality
  • Strength of association
  • Biologically plausible
  • Biological gradient (dose-response)
  • Appropriate temporal relationship
  • Specificity
  • Consistency

75
The Fundamental Question (according to Hill)
  • "Clearly none of these nine viewpoints can bring
    indisputable evidence for or against a
    cause-and-effect hypothesis and equally none can
    be required as a sine qua non. What they can do,
    with greater or less strength, is to help us to
    answer the fundamental question--is there any
    other way of explaining the set of facts before
    us, is there any other answer equally, or more,
    likely than cause and effect?"

76
How it's really done
  • Assemble the evidence from the literature. What
    are the pieces of the jigsaw?
  • How do you decide?
  • Where do they fit?
  • How do you decide?

77
Interpretation
  • Evaluate the evidence (a study) for internal
    validity
  • Evaluate the evidence for external validity
  • Bottom line
  • What roles are played by bias, chance, real
    effect?

78
Assemble the jigsaw pieces into a picture
  • The picture is your version of causality
  • Your picture may disagree with other scientists'
  • Disagreement among scientists is the rule, not the
    exception

79
Mathematics in epidemiology
  • Traditional
  • Evaluate role of chance (statistical hypothesis
    testing, estimation)
  • Descriptive (compact summary or generative model)
  • Infectious disease epidemiology: dynamics

80
Comparing chronic and infectious disease
epidemiology
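[Figure: two compartment diagrams, one with states S and P and one with states S, I and R; the arrows are labeled with rate parameters]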
81
[Figure: the same two compartment diagrams (S, P and S, I, R)]
Arrow labels: birth rate or migration in-rate;
incidence rate or infectivity rate; mortality and
recovery rates, with mortality split into case
fatality rate and background mortality rate.
Prevalence rate = P/(S+P)
82
Comparing chronic and infectious epi (cont'd)
  • Chronic
  • Usually concentrate on the incidence rate because
    interested in etiology
  • Have to account for fact that incidence is a
    function of calendar time and age, exposure (and
    metric), sex, race, SES, occupation, co-morbid
    conditions, latency
  • But not usually population size or density,
    number of other cancer cases, etc.
  • Infectious
  • Interest in the incidence (infectivity) rate
    usually limited to its value as a parameter; we
    know the etiology
  • Interested in dynamics over time and space,
    existence of thresholds or periods, effect of
    parameters and initial conditions like size of
    initial population, infectivity, mode of contact

Difference is one of emphasis and interest, not
concepts
83
Some new uses for mathematics in epidemiology
  • Formalization and theoretical tools
  • Pattern and rule detection (data mining)
  • Descriptive modeling
  • Prediction from data
  • Classification
  • Taxonomy
  • Data organization and retrieval from large
    databases
  • Patient confidentiality/coding/cryptography
  • Multi-scale inference
  • Network construction/applications, etc.