Title: Rate versus Risk
1 Rate versus Risk
- Two basic measures of the occurrence of new events (disease):
  - Cumulative incidence: risk, a probability
  - Incidence rate: rate, events per unit of time
- Last week we discussed the concept of cumulative incidence
  - Commonly calculated by the Kaplan-Meier method when follow-up times differ
- The incidence rate of disease is somewhat less intuitive but is the more fundamental measure
2 Main Points to be Covered
- Cumulative incidence and person-time incidence rate: different but related
- Hazard
- Calculating a person-time incidence rate
- Uses of person-time incidence rates
- STATA commands for rates
- Assumptions of survival and person-time analyses
3 The Three Elements in Measures of Disease Incidence
- E: an event, such as a disease diagnosis or death
- N: number of at-risk persons in the population under study
- T: time period during which the events are observed
4 Two Measures of Incidence
- The proportion of individuals who experience the event in a defined time period (E/N during some time T): cumulative incidence
- The number of events divided by the amount of person-time observed (E/NT): incidence rate
5 Person-Time Incidence Rates
- The numerator is the same as for incidence based on the proportion of persons: the number of events (E)
- The denominator is the sum of the follow-up times for each individual
- The resulting ratio E/NT is not a proportion; it may be greater than 1
- Its value depends on the unit of time used
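A minimal sketch of the E/NT calculation, using made-up follow-up data for five people (the numbers are hypothetical, chosen only for illustration). Because the denominator is summed follow-up time rather than a head count, the rate can exceed 1 even though the proportion with the event cannot.

```python
# Hypothetical follow-up data for five people (illustration only):
# years each person was observed, and whether follow-up ended in the event.
follow_up_years = [0.5, 0.2, 0.3, 1.0, 0.4]
had_event = [True, True, False, True, True]


def person_time_rate(times, events):
    """Incidence rate E/NT: number of events divided by summed follow-up time."""
    total_person_time = sum(times)  # denominator: sum of individual follow-up
    n_events = sum(events)          # numerator: True counts as 1
    return n_events / total_person_time


rate = person_time_rate(follow_up_years, had_event)
proportion = sum(had_event) / len(had_event)  # E/N, can never exceed 1

print(f"rate = {rate:.3f} events per person-year")  # greater than 1 here
print(f"proportion with event = {proportion:.2f}")
```

Here 4 events over 2.4 person-years give about 1.67 events per person-year, while the proportion experiencing the event is 0.80.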
6 Incidence rate value depends on the time units used
- An incidence rate of 8 cases per 100 person-years
- is equivalent to 0.67 cases per 100 person-months
- and to 0.15 cases per 100 person-weeks
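The conversions above are simple divisions by the number of smaller units in a year. A short sketch, assuming 12 months and 52 weeks per year:

```python
# Convert a rate expressed per 100 person-years into other time units.
# Assumes 12 months and 52 weeks per year (simple approximations).
rate_per_100_py = 8.0

per_100_person_months = rate_per_100_py / 12  # each person-year = 12 person-months
per_100_person_weeks = rate_per_100_py / 52   # each person-year ~ 52 person-weeks

print(f"{per_100_person_months:.2f} per 100 person-months")  # 0.67
print(f"{per_100_person_weeks:.2f} per 100 person-weeks")    # 0.15
```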
7 Assumption of Person-Time Incidence Estimation
- A time units of follow-up on B persons is the same as B time units on A persons (both give A × B units of person-time)
  - Observing 20 deaths in 200 persons followed for 50 years gives the same incidence rate as 20 deaths in 10,000 persons followed for 1 year
- The rate is constant for the time period during which it is calculated
  - Rates calculated over long time periods may be less meaningful, because the true rate is more likely to change over a long period
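The equivalence in the example can be checked directly: both designs accumulate 10,000 person-years, so 20 deaths give the same rate. A minimal sketch, assuming every person in a group is followed for the same length of time:

```python
def rate_per_person_year(events, persons, years_each):
    """Incidence rate when every person is followed for the same number of years."""
    person_years = persons * years_each  # A time units x B persons
    return events / person_years


small_long = rate_per_person_year(20, 200, 50)      # 200 persons x 50 years
large_short = rate_per_person_year(20, 10_000, 1)   # 10,000 persons x 1 year
print(small_long, large_short)  # 0.002 0.002
```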
8 Understanding the Difference between a Rate and Cumulative Incidence
- A rate can be thought of as how likely an event is to happen at any moment in time
- Cumulative incidence is the result of applying that rate to a defined population for a specified period of time
- A rate is calculated using data from a time period, but the rate is assumed constant during that period (i.e., at any moment in time during the period the rate is the same)
9 Illustration of Rate versus Cumulative Incidence
- The mortality rate in the U.S. population in 2001 was 855 per 100,000 person-years (or 0.855 per 100 person-years)
- If everyone alive at the beginning of the period were followed for 5 years, the cumulative incidence of death (if the rate held constant) would be 4.2% at 5 years; at 10 years it would be 8.2%
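The 4.2% and 8.2% figures appear consistent with applying a constant rate through the exponential relationship F(t) = 1 - exp(-rate × t). A minimal sketch, assuming the 2001 rate stays constant over the whole follow-up:

```python
import math

rate = 855 / 100_000  # deaths per person-year (U.S., 2001)


def cumulative_incidence(rate, t):
    """F(t) = 1 - exp(-rate * t), assuming the rate stays constant over [0, t]."""
    return 1 - math.exp(-rate * t)


for years in (5, 10):
    print(f"{years} years: {100 * cumulative_incidence(rate, years):.1f}%")
# 5 years: 4.2%
# 10 years: 8.2%
```

Multiplying rate by time directly (0.855% × 5 = 4.3%) slightly overstates the risk, because people who have already died are no longer at risk.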
10 Relationship between Incidence Rate and Cumulative Incidence
- A constant rate produces an exponential cumulative incidence (or survival) distribution
- If the instantaneous incidence rate is known, the cumulative incidence/survival function can be derived, or vice versa:
  $$F(t) = 1 - e^{-\lambda t}$$
- where F(t) = cumulative incidence and
- 1 - F(t) = cumulative survival
- e = 2.71828..., λ = rate, t = time units
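Going the other way ("vice versa"), the same formula can be solved for the rate: λ = -ln(1 - F(t)) / t. A minimal sketch of that inversion, again assuming the rate is constant; the round trip uses the 2001 U.S. rate purely as a check value:

```python
import math


def cumulative_incidence(rate, t):
    """F(t) = 1 - exp(-rate * t) for a constant rate."""
    return 1 - math.exp(-rate * t)


def rate_from_cumulative_incidence(F, t):
    """Invert F(t) = 1 - exp(-rate * t): rate = -ln(1 - F) / t.

    math.log1p(-F) computes ln(1 - F) accurately when F is small.
    """
    return -math.log1p(-F) / t


F5 = cumulative_incidence(0.00855, 5)
print(f"{rate_from_cumulative_incidence(F5, 5):.5f}")  # 0.00855
```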
11 [Figure: panels titled "Constant Rate" and "Increasing Rate"]
12 Effect of high and low constant incidence rates on cumulative incidence
13 Hazard
- Hazard is an instantaneous incidence rate:
  $$h(t) = \lim_{\Delta t \to 0} \frac{P(\text{event in interval between } t \text{ and } t + \Delta t \mid \text{alive at } t)}{\Delta t}$$
- Hazard function
  - Shape of the relationship between time and hazard
  - The constant rate and constantly increasing rate shown in previous slides are particular examples
  - Can take on any shape
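The hazard definition can be checked numerically: for a constant rate, the conditional probability of an event in [t, t + Δt), divided by Δt, should approach that rate as Δt shrinks. A minimal sketch with a hypothetical constant rate of 0.2 per person-year (the value and the names `LAMBDA`, `survival`, `approx_hazard` are illustrative only):

```python
import math

LAMBDA = 0.2  # hypothetical constant rate, events per person-year


def survival(t):
    """S(t) = exp(-LAMBDA * t) for a constant rate."""
    return math.exp(-LAMBDA * t)


def approx_hazard(t, dt):
    """P(event in [t, t + dt) | alive at t) / dt, the quantity inside the limit."""
    p_event_given_alive = (survival(t) - survival(t + dt)) / survival(t)
    return p_event_given_alive / dt


for dt in (1.0, 0.1, 0.001):
    print(f"dt={dt}: {approx_hazard(3.0, dt):.4f}")
```

As Δt shrinks, the printed values rise toward 0.2, and the result does not depend on the chosen t, which is what "constant hazard" means.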
14 Hazard function for mortality in general population [figure; x-axis: years]
15 Note on Person-Time Rates
- The person-time concept may seem unfamiliar because rates are often described as an "annual rate" or an "annual rate per 100,000 persons" (i.e., the person-time denominator is not made explicit)
- Example: The incidence of pediatric cardiomyopathy in two regions of the United States (NEJM, 2003)
  - 467 cases of cardiomyopathy in a registry of 38 centers (New England, Southwest), 1996-1999
  - Denominator: population estimates from the 1990 census with an in- and out-migration algorithm, ages 1-18
  - Overall annual incidence of 1.13 per 100,000 children
- Better to make person-time explicit: incidence among children was 1.13 per 100,000 person-years
16 How to Calculate a Person-Time Rate: Obtaining the Denominator
- Method 1: If you have exact entry, censoring, and event times for each person, sum the person-time of all persons to get the denominator
- Method 2: If you have no individual data but have the time interval and the average population size, take their product as the denominator
  - Some datasets may only have the average population size at risk
18 Rate = 6 / 9.583 person-years = 0.626 per person-year = 62.6 per 100 person-years
19 Method 2: Using the average number of persons at risk during the time interval
- 10 persons at baseline; 1 person at the end of 2 years (6 deaths + 3 censored before 2 years = 9 losses)
- Formula: average number of persons at risk = (N baseline + N end) / 2 = (10 + 1) / 2 = 5.5
- Rate = 6 / (5.5 persons × 2 years) = 6 / 11 person-years = 0.545 per person-year, or 54.5 per 100 person-years
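The average-population calculation can be written as a small function. This is a sketch of the group-data method only (the function name is hypothetical); it assumes events and censoring occur uniformly over the interval, as noted for the life table.

```python
def average_population_rate(events, n_start, n_end, interval_years):
    """Group-data (Method 2) rate: events / (average persons at risk x interval length)."""
    average_at_risk = (n_start + n_end) / 2          # (10 + 1) / 2 = 5.5
    person_years = average_at_risk * interval_years  # 5.5 x 2 = 11 person-years
    return events / person_years


rate = average_population_rate(events=6, n_start=10, n_end=1, interval_years=2)
print(f"{rate:.3f} per person-year, {100 * rate:.1f} per 100 person-years")
```

With the numbers above this prints 0.545 per person-year and 54.5 per 100 person-years.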
20 Person-time incidence based on grouped vs. individual data
- Szklo and Nieto use "incidence rate" when the calculation is based on group data (average population at risk) and "incidence density" when it is based on individual data
  - This terminology distinction is not followed by most authors
- The average population method assumes uniform occurrence of events and of censoring during the interval (like the life table)
21 Waiting Time Property of Incidence Rates
- The average waiting time to an event is the reciprocal of the incidence rate (1/rate)
- E.g., if the rate = 300 per 100 person-years (3 per person-year), the reciprocal is
  - 1 / (300/100 person-years) = 1/3 person-year
- Average waiting time between events is 0.33 person-year = 4 person-months
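The reciprocal calculation, as a short sketch; it assumes the rate is constant, which is the setting in which the mean waiting time equals 1/rate:

```python
rate_per_person_year = 300 / 100  # 300 per 100 person-years = 3 per person-year
mean_wait_years = 1 / rate_per_person_year
mean_wait_months = 12 * mean_wait_years

print(f"{mean_wait_years:.2f} person-years = {mean_wait_months:.0f} person-months")
# 0.33 person-years = 4 person-months
```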
22 Why Use Incidence Rates?
- To calculate incidence from population-based
disease registries
23 (1) Calculating a rate from a population-based registry of diagnoses
- Research question: What is the incidence rate of first diagnoses of breast cancer in Marin County, and how does it compare with rates in other counties?
- Nearly all new breast cancer diagnoses are reported to the SEER cancer registry
- How do we obtain a denominator for the rate?
24 Large Population Person-Time Rates
"Since the production of stable rates for cancers at most individual sites requires a population of at least one million subjects, the logistic and financial problems of attempting to maintain a constant surveillance system of everyone in the population are usually prohibitive." (Breslow and Day, Statistical Methods in Cancer Research)
Solution: conduct surveillance of all cancer diagnoses and estimate the population denominator to get person-time at risk. Obtaining a person-time denominator for an incidence rate by the group method requires only an estimate of the average population size during the year (the population at mid-year).
25 Average Population (Group data) rates versus individual data rates
- If losses are perfectly uniform, the total person-time for the denominator (and thus the rate) is the same whether it is based on average population size or on individual follow-up
- For large populations, the rate will be nearly identical whichever method is used
26 Potential Weakness of Using Census Data
- Calculating rates from census population data is very useful, but caution is required because a full census is done only every 10 years
- The Census Bureau makes interim estimates of population change, but over 10 years the denominators may become inaccurate
27 Invasive Breast Cancer Rates for Marin County versus Other California, 1995-2000 [figure]
Rates per 100,000 person-years, excluding 5 Bay Area counties
28 Census Denominators for Incidence Rates are Estimates
The estimates of breast cancer incidence (number
of new cancers per year) most recently reported
for Marin and other areas of the country were
based on 1990 census information. Data from
Census 2000 have enabled researchers to
recalculate rates for Marin. Preliminary results
show that revised incidence rates for Marin
County based on the 2000 census are substantially
lower than the rates calculated using 1990 census
information. The discrepancy between using the
1990 and 2000 census data is due to projected
population growth differing considerably from
actual population growth.
29 Why Use Incidence Rates?
- To calculate incidence from population-based disease registries
- To compare disease incidence in a cohort (individual-level data) with the rate in the general population, OR to compare incidence between 2 or more general populations
30 (2) Comparing a rate from a cohort to the rate in the general population
- A cohort study of petroleum refinery workers followed subjects for mortality for 36 years and found 765 deaths
- Research question: Was mortality incidence in the cohort high, low, or just average for those calendar years?
- How would you calculate the mortality incidence in the cohort?
31 Example of Using Person-Time Rates for Cohort Analysis
- Cohort of petrochemical workers
  - 6,588 white male employees of a Texas plant
  - Mortality determined from 1941-1977
  - 137,745 person-years of follow-up time
  - 765 deaths
- Overall death rate = 765 / 137,745 person-years = 5.6 per 1000 person-years
- Question: Is this a high death rate?
Austin SG, et al., J Occupat Med, 1983
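The crude death rate is a single division, scaled to 1000 person-years for readability:

```python
deaths = 765
person_years = 137_745

rate_per_1000 = 1000 * deaths / person_years
print(f"{rate_per_1000:.1f} deaths per 1000 person-years")  # 5.6
```

On its own this crude rate cannot answer the question of whether 5.6 is "high"; that requires a reference rate for a comparable population.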
32 Cohort of petrochemical workers
- We could calculate a KM estimate of cumulative incidence (for 36 years of follow-up), but what is the comparison group?
- Using the person-time rate, the observed rate can be compared with the rate that would be expected if the person-time rate from a reference population (e.g., the U.S. population) were applied to the cohort
33 Standardized Mortality Ratio
- If U.S. death rates for age-sex-race-calendar period groups are applied to the cohort, 924 deaths would be expected in the cohort versus the 765 observed
- The ratio of 765 observed / 924 expected = 0.83 is called a Standardized Mortality Ratio (SMR)
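The expected count behind an SMR is, in the usual indirect-standardization approach, the sum over strata of the cohort's person-years times the reference population's rate for the same stratum. The sketch below uses three made-up strata to show that step (these are not the Austin et al. strata, which are not given here) and then applies the ratio to the totals reported on the slide. The function names are hypothetical.

```python
def expected_deaths(person_years_by_stratum, reference_rates):
    """Expected deaths = sum over strata of cohort person-years x reference rate."""
    return sum(py * ref for py, ref in zip(person_years_by_stratum, reference_rates))


def smr(observed, expected):
    """Standardized Mortality Ratio: observed / expected deaths."""
    return observed / expected


# Hypothetical three-stratum cohort, for illustration only
cohort_py = [60_000, 50_000, 20_000]        # person-years in each stratum
reference = [0.0015, 0.0060, 0.0250]        # reference deaths per person-year
expected_example = expected_deaths(cohort_py, reference)  # 90 + 300 + 500

# Totals reported on the slide
smr_slide = smr(765, 924)
print(f"hypothetical expected deaths: {expected_example:.1f}")
print(f"SMR (slide totals): {smr_slide:.2f}")  # 0.83
```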
34 Obtaining an expected rate for comparison
35 Cause-Specific SMRs
Austin SG, et al., J Occupat Med, 1983
36 Example of using both cumulative incidence and incidence rates in the same analysis for different purposes
- End-stage renal disease: cumulative incidence (survival) within cohorts defined by age at diagnosis
- Ratios of mortality incidence rates in children with renal disease compared with national child mortality rates
McDonald et al., NEJM 2004
37 Another example of SMR: Is mortality higher after a fracture?
Bliuc et al., JAMA 2009
38 (2b) Comparing hip fracture incidence in different populations [table]
Rates per 100,000 person-years
Footnote: standardized to the 1990 non-Hispanic white US population
39 Why Use Incidence Rates?
- To calculate incidence from population-based disease registries
- To compare disease incidence in a cohort with a rate from the general population, OR to compare incidence in 2 or more populations
- To compare incidence from a time-varying exposure in persons while exposed and unexposed
40 (3) To compare incidence from a time-varying exposure in persons while exposed and unexposed
- Research question: In a Medicaid database, is there an association between use of non-aspirin non-steroidal anti-inflammatory drugs (NSAIDs) and coronary artery disease (CAD)?
- How would you study the relationship between NSAID use and CAD?
41 Calculating stratified person-time incidence rates in cohorts
- For persons followed in a cohort, some potential risk factors may be fixed while others may vary
  - Gender is fixed
  - Taking medications or getting regular exercise are behaviors that can change over time
- Adding up the person-time in each exposure category to get a denominator of time at risk is a way to deal with risk factors that change over time
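A minimal sketch of accumulating person-time by exposure status, using hypothetical interval records in which each person's follow-up is split into periods with a single exposure status. This is a simplified illustration, not the Ray et al. method; it ignores matching and the washout period used in that study.

```python
from collections import defaultdict

# Hypothetical follow-up records (illustration only): each person's time is split
# into intervals with one exposure status; event=True marks an interval ending in the event.
intervals = [
    # (person_id, exposed, years, event)
    (1, False, 2.0, False),
    (1, True, 1.5, True),
    (2, True, 0.5, False),
    (2, False, 3.0, False),
    (3, False, 1.0, True),
    (4, True, 2.0, False),
]


def rates_by_exposure(intervals):
    """Accumulate person-time and events separately for exposed and unexposed time."""
    person_time = defaultdict(float)
    events = defaultdict(int)
    for _person, exposed, years, event in intervals:
        person_time[exposed] += years   # one person can add time to both categories
        events[exposed] += int(event)
    return {status: events[status] / person_time[status] for status in person_time}


rates = rates_by_exposure(intervals)
rate_ratio = rates[True] / rates[False]
print(f"exposed {rates[True]:.3f}, unexposed {rates[False]:.3f}, ratio {rate_ratio:.2f}")
```

Person 1 contributes 2.0 unexposed and 1.5 exposed years, which is exactly the situation that a baseline-defined cumulative incidence comparison cannot handle.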
42 Analysis of changing exposure and disease incidence
- Tennessee Medicaid database, 1987-1998: are NSAIDs associated with CAD risk?
- The same person could both use and not use NSAIDs at different times over the 11 years
- Can't use cumulative incidence, because groups would have to be defined by baseline characteristics without accounting for changes in subsequent behavior
Ray, Lancet, 2002
43 Analysis of changing exposure with person-time rates
- Person-time totaled for using and not using NSAIDs; outcome: MI or CAD death
- 181,441 periods of new NSAID use in 128,002 individuals; 181,441 periods of non-use in 134,642 individuals (matched by age, sex, and calendar date)
- A person can contribute to the denominator for both use and non-use, but only after a 365-day washout period between use and non-use
44 Analysis of changing exposure with person-time rates
- Rate ratio = 1.01
- Concluded there was no evidence that NSAIDs reduced the risk of CHD events
Ray, Lancet, 2002
45 Calculating Rates in STATA
- Declare the data set as survival data: . stset timevar, fail(failvar)
- . strate gives person-years and the rate
- . strate groupvar gives rates within groups
Example: biliary cirrhosis time-to-death data
. use "biliary cirrhosis data", clear
. stset time, fail(d)
. strate
       D         Y      Rate     Lower     Upper
      96    747.04    0.1285    0.1052    0.1570
. strate treat
     treat     D       Y     Rate    Lower    Upper
   Placebo    49   355.0    0.138    0.104    0.183
    Active    47   392.0    0.120    0.090    0.160
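The rates above are simply D/Y. The confidence limits shown on the slide are reproduced, to the digits printed, by a normal approximation on the log-rate scale, rate × exp(±1.96/√D). That this is how `strate` computes its limits is an assumption based on the numbers matching, not something stated in the slide. A sketch:

```python
import math


def rate_with_ci(deaths, person_time, z=1.96):
    """Rate D/Y with a CI from a normal approximation on the log-rate scale:
    rate * exp(+/- z / sqrt(D))."""
    rate = deaths / person_time
    half_width = z / math.sqrt(deaths)  # approximate SE of log(rate) is 1/sqrt(D)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)


for label, d, y in [("All", 96, 747.04), ("Placebo", 49, 355.0), ("Active", 47, 392.0)]:
    rate, lower, upper = rate_with_ci(d, y)
    print(f"{label:8s} {rate:.4f} {lower:.4f} {upper:.4f}")
```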
46 Hazard function in Stata [figure; K-M survival curve for the same data]
- Incidence rate (from previous slide): 0.13 deaths per person-year
- 10-year cumulative incidence: 0.2375
47 Immediate Commands in STATA
STATA can be used like a calculator for various computations without a data set, through so-called immediate commands. For example, to calculate the confidence interval around a person-time rate:
. cii person-time events, poisson
E.g., 6 events occur in 10 person-years of follow-up:
. cii 10 6, poisson
95% CI: 0.220 to 1.306 per person-year
(In Stata 14 and later, the equivalent command is . cii means 10 6, poisson)
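The limits 0.220 and 1.306 match the exact (Garwood) Poisson interval for 6 events, divided by 10 person-years. A standard-library sketch that finds those limits by bisection on the Poisson CDF (the helper names are hypothetical, and bisection is used here only to avoid a SciPy dependency):

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def solve_decreasing(f, target, lo=0.0, hi=1000.0, iters=200):
    """Bisection: find mu with f(mu) = target, for f decreasing in mu."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # still above target: root lies at larger mu
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_rate_ci(events, person_time, alpha=0.05):
    """Exact (Garwood) CI for a Poisson count, converted to a rate per unit person-time."""
    # Lower limit: P(X >= events | mu) = alpha/2, i.e. P(X <= events - 1 | mu) = 1 - alpha/2
    lower = 0.0 if events == 0 else solve_decreasing(
        lambda mu: poisson_cdf(events - 1, mu), 1 - alpha / 2)
    # Upper limit: P(X <= events | mu) = alpha/2
    upper = solve_decreasing(lambda mu: poisson_cdf(events, mu), alpha / 2)
    return lower / person_time, upper / person_time


low, high = exact_poisson_rate_ci(6, 10)
print(f"95% CI: {low:.3f} to {high:.3f} per person-year")
```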
49 [Figure: panels titled "Incidence rate" and "Cumulative incidence"]
50 Survival changing over calendar time
51 Summary Points
- Person-time incidence rate (or density)
  - E/NT
  - Related to cumulative incidence
  - Not a proportion
- Person-time incidence rate can be calculated with individual or average population data
- Allows incidence estimates in large populations that are not completely enumerated
- Allows comparison with population reference rates from other data sources
- Allows accumulation of time at risk for different exposure strata