Title: Relating models to data: A review
- P.D. O'Neill
- University of Nottingham
Caveats
- Scope is strictly limited
- Review with a view to future challenges
Outline
- Why relate models to data?
- How to relate models to data
- Present and future challenges
1. Why relate models to data?
- 1. Scientific hypothesis testing
  - e.g. Can within-host heterogeneity of susceptibility to HIV explain decreasing prevalence?
  - e.g. Did control measures alone control SARS in Hong Kong?
- 2. Estimation
  - e.g. What is R0?
  - e.g. What is the efficacy of a vaccine?
- 3. What-if scenarios
  - e.g. What would have happened if transport restrictions had been in place sooner in the UK foot-and-mouth outbreak?
  - e.g. How much would school closure reduce the spread of influenza?
- 4. Real-time analyses
  - e.g. Has the epidemic finished yet?
  - e.g. Are control measures effective?
- 5. Calibration/parameterisation
  - e.g. What range of parameter values is sensible for simulation studies?
2. How to relate models to data
- 2.1 Fitting deterministic models
  - Options include:
    - (i) Estimation from the literature
    - (ii) Least squares / minimising a metric
    - (iii) Fitting can be Bayesian (Elderd, Dukic and Dwyer, 2006)
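As an illustration of option (ii), the following is a minimal sketch of a least-squares fit of a deterministic SIR model. The Euler integrator, the grid search, the parameter names (beta, gamma) and the noise-free synthetic data are all assumptions made for this example; they are not taken from any of the cited papers.

```python
def sir_prevalence(beta, gamma, s0, i0, n_days, steps_per_day=20):
    """Euler integration of dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I.

    Returns the number of infectives at the end of days 0..n_days.
    """
    n = s0 + i0
    s, i = float(s0), float(i0)
    dt = 1.0 / steps_per_day
    prevalence = [i]
    for _ in range(n_days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
        prevalence.append(i)
    return prevalence


def sum_of_squares(beta, gamma, observed, s0, i0):
    """Squared distance between the model prevalence curve and the data."""
    model = sir_prevalence(beta, gamma, s0, i0, len(observed) - 1)
    return sum((m - o) ** 2 for m, o in zip(model, observed))


def grid_search(observed, s0, i0, betas, gammas):
    """Return the (beta, gamma) grid point with the smallest sum of squares."""
    candidates = [(b, g) for b in betas for g in gammas]
    return min(candidates, key=lambda bg: sum_of_squares(bg[0], bg[1], observed, s0, i0))


# Hypothetical data: generated from the model itself with known parameters.
observed = sir_prevalence(0.5, 0.2, 990, 10, 30)
betas = [0.1 * k for k in range(1, 11)]
gammas = [0.05 * k for k in range(1, 11)]
beta_hat, gamma_hat = grid_search(observed, 990, 10, betas, gammas)
```

In practice a numerical optimiser would usually replace the grid search, and other metrics (e.g. weighted squares) can be minimised in the same way.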
- 2.2 Fitting stochastic models
  - Available methods depend heavily on the model and the data.
  - (i) Explicit likelihood
    - e.g. Longini-Koopman model for household data (Longini and Koopman, 1982)
[Figure: SEIR model within household]
P(Avoid infection from housemate) = p
P(Avoid infection from outside) = q
Given data on final outcome in (independent) households, can formulate likelihood L(p, q)
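A minimal sketch of how L(p, q) can be evaluated, assuming the standard Longini-Koopman recursion for final-size probabilities: m_{jk} = C(k, j) m_{jj} (q p^j)^(k-j), with m_{kk} = 1 minus the sum of m_{jk} over j < k. The function names and the example counts are hypothetical, and all household members are assumed to be initially susceptible.

```python
import math


def final_size_probs(k, p, q):
    """Return [m_0, ..., m_k]: m_j = P(j of the k household members are ever infected)."""
    # full[s] = probability that everyone in a household of size s is infected
    full = []
    for s in range(k + 1):
        others = sum(
            math.comb(s, j) * full[j] * (q * p ** j) ** (s - j) for j in range(s)
        )
        full.append(1.0 - others)
    return [
        math.comb(k, j) * full[j] * (q * p ** j) ** (k - j) for j in range(k + 1)
    ]


def log_likelihood(p, q, data):
    """data maps household size k to counts [n_0, ..., n_k] of households by final size."""
    total = 0.0
    for k, counts in data.items():
        probs = final_size_probs(k, p, q)
        for n_j, m_j in zip(counts, probs):
            if n_j > 0:
                if m_j <= 0.0:
                    return -math.inf
                total += n_j * math.log(m_j)
    return total


# Hypothetical data: ten households of size 2 with 0, 1 and 2 members infected.
example_data = {2: [6, 3, 1]}
ll = log_likelihood(0.8, 0.9, example_data)
```

Maximising this function over (p, q) gives maximum likelihood estimates; the same function can serve as the likelihood in a Bayesian analysis.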
  - (i) Explicit likelihood (continued)
    - Examples of related household models:
      - Bayesian analysis (O'Neill et al., 2000)
      - Multi-type models (van Boven et al., 2007)
    - Methods include:
      - Maximum likelihood (e.g. Longini and Koopman, 1982)
      - EM algorithm (e.g. Becker, 1997)
      - MCMC (e.g. O'Neill et al., 2000)
      - Rejection sampling (e.g. Clancy and O'Neill, 2007)
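To make the last method concrete, here is a minimal sketch of a rejection sampler for the posterior of (p, q) in a household final-size model, assuming independent Uniform(0, 1) priors. It is a generic textbook scheme and not necessarily the algorithm of Clancy and O'Neill (2007). The likelihood is bounded above by its value at the empirical proportions (a property of multinomial counts), which gives the acceptance ratio. The data and all function names are hypothetical.

```python
import math
import random


def final_size_probs(k, p, q):
    """Final-size probabilities [m_0, ..., m_k] (Longini-Koopman form, assumed here)."""
    full = []
    for s in range(k + 1):
        others = sum(
            math.comb(s, j) * full[j] * (q * p ** j) ** (s - j) for j in range(s)
        )
        full.append(1.0 - others)
    return [
        math.comb(k, j) * full[j] * (q * p ** j) ** (k - j) for j in range(k + 1)
    ]


def likelihood(p, q, data):
    """Product over households of final-size probabilities (constant factor dropped)."""
    value = 1.0
    for k, counts in data.items():
        for n_j, m_j in zip(counts, final_size_probs(k, p, q)):
            value *= m_j ** n_j
    return value


def likelihood_bound(data):
    """Upper bound: a multinomial likelihood is largest at the empirical proportions."""
    bound = 1.0
    for counts in data.values():
        n = sum(counts)
        for c in counts:
            if c > 0:
                bound *= (c / n) ** c
    return bound


def rejection_sample(data, n_samples, rng):
    """Draw exact posterior samples of (p, q) under uniform priors."""
    bound = likelihood_bound(data)
    samples = []
    while len(samples) < n_samples:
        p, q = rng.random(), rng.random()  # proposal from the prior
        if rng.random() * bound < likelihood(p, q, data):
            samples.append((p, q))
    return samples


data = {2: [6, 3, 1]}  # hypothetical counts of size-2 households by final size
posterior = rejection_sample(data, 200, random.Random(1))
q_mean = sum(q for _, q in posterior) / len(posterior)
```

The accepted draws are independent samples from the posterior, so no convergence diagnostics are needed; the cost is that the acceptance rate falls quickly as the data set grows.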
  - (ii) No explicit likelihood
    - Can arise due to model complexity and/or insufficient data
[Figure: Two-level mixing model. Labels: ever-infected, never-infected, sample, unseen.]
Individual-based transmission models involve unseen infection times.
Even detailed data from studies generally only give bounds on unseen infection times, e.g. infection occurs between the last -ve test and the first +ve test.
  - (ii) No explicit likelihood (continued)
    - Solutions include:
      - Use a simpler approximating model, e.g. use pseudolikelihood (e.g. Ball, Mollison and Scalia-Tomba, 1997)
[Figure: Two-level mixing model with explicit interactions between households. Labels: ever-infected, never-infected.]
[Figure: Two-level mixing model -> independent households model. Labels: ever-infected, never-infected.]
In a large population, households are approximately independent.
  - (ii) No explicit likelihood (continued)
    - Solutions include (continued):
      - Use a simpler approximating model, e.g. a discrete-time model instead of a continuous-time model (e.g. Lekone and Finkenstädt, 2006)
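A minimal sketch of the discrete-time idea: a chain-binomial SEIR model in which the numbers of new exposures, new infectives and new removals in each time step are binomial, so the likelihood is a product of binomial probabilities when all three counts are observed. This is written in the spirit of Lekone and Finkenstädt (2006) but is not their code; the parameter names (beta, rho, gamma), the step length h and the simulated data are assumptions for illustration.

```python
import math
import random


def binom_logpmf(k, n, prob):
    """Log of the Binomial(n, prob) probability of k successes."""
    if k < 0 or k > n:
        return -math.inf
    if prob <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if prob >= 1.0:
        return 0.0 if k == n else -math.inf
    return math.log(math.comb(n, k)) + k * math.log(prob) + (n - k) * math.log1p(-prob)


def step_probs(beta, rho, gamma, i, n, h):
    """Per-step probabilities of S->E, E->I and I->R transitions."""
    return (
        1.0 - math.exp(-beta * h * i / n),
        1.0 - math.exp(-rho * h),
        1.0 - math.exp(-gamma * h),
    )


def simulate(beta, rho, gamma, s0, e0, i0, n_steps, rng, h=1.0):
    """Simulate per-step counts (new exposed, new infective, new removed)."""
    n = s0 + e0 + i0
    s, e, i = s0, e0, i0
    path = []
    for _ in range(n_steps):
        p_se, p_ei, p_ir = step_probs(beta, rho, gamma, i, n, h)
        b = sum(1 for _ in range(s) if rng.random() < p_se)
        c = sum(1 for _ in range(e) if rng.random() < p_ei)
        d = sum(1 for _ in range(i) if rng.random() < p_ir)
        path.append((b, c, d))
        s, e, i = s - b, e + b - c, i + c - d
    return path


def log_likelihood(beta, rho, gamma, path, s0, e0, i0, h=1.0):
    """Log-likelihood of fully observed transition counts."""
    n = s0 + e0 + i0
    s, e, i = s0, e0, i0
    total = 0.0
    for b, c, d in path:
        p_se, p_ei, p_ir = step_probs(beta, rho, gamma, i, n, h)
        total += binom_logpmf(b, s, p_se) + binom_logpmf(c, e, p_ei) + binom_logpmf(d, i, p_ir)
        s, e, i = s - b, e + b - c, i + c - d
    return total


path = simulate(0.6, 0.3, 0.2, 195, 0, 5, 40, random.Random(42))
ll_true = log_likelihood(0.6, 0.3, 0.2, path, 195, 0, 5)
```

When some of the counts are not observed (for example, new exposures), they would have to be imputed or summed over, which is where data augmentation enters.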
  - (ii) No explicit likelihood (continued)
    - Solutions include (continued):
      - Direct approach, e.g. martingale methods (Becker, 1989)
      - Data augmentation: add in missing data or extra model parameters to formulate a likelihood
  - (ii) No explicit likelihood: data augmentation (continued)
    - Common example:
      - model describes individual-to-individual transmission
      - observe times of case ascertainment, test results, etc., but not times of infection/exposure
      - augment data with missing infection/exposure times
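The common example above can be sketched for a Markov SIR model: once the unobserved infection times are added to the observed removal times, the likelihood has a closed form. The version below assumes a per-pair infection rate beta, a removal rate gamma, a single initial infective and a closed population; these choices and the function name are assumptions for illustration rather than the formulation of any one cited paper.

```python
import math


def augmented_loglik(beta, gamma, infection_times, removal_times, n_never_infected):
    """Log-likelihood of a Markov SIR model given infection AND removal times."""
    cases = list(zip(infection_times, removal_times))
    if any(r <= i for i, r in cases):
        return -math.inf  # removal must follow infection
    initial = min(range(len(cases)), key=lambda j: cases[j][0])

    loglik = 0.0
    # Each non-initial case is infected at rate beta * (number infectious just before).
    for j, (i_j, _) in enumerate(cases):
        if j == initial:
            continue
        n_infectious = sum(1 for i_k, r_k in cases if i_k < i_j < r_k)
        if n_infectious == 0:
            return -math.inf  # nobody was available to infect case j
        loglik += math.log(beta * n_infectious)

    # Integrated infection pressure: time each infective j spends exposing each
    # other individual k while k is still susceptible.
    pressure = 0.0
    for j, (i_j, r_j) in enumerate(cases):
        for k, (i_k, _) in enumerate(cases):
            if k != j:
                pressure += min(r_j, i_k) - min(i_j, i_k)
        pressure += n_never_infected * (r_j - i_j)
    loglik -= beta * pressure

    # Exponential infectious periods with rate gamma.
    infectious_time = sum(r - i for i, r in cases)
    loglik += len(cases) * math.log(gamma) - gamma * infectious_time
    return loglik


# Hypothetical two-case outbreak with imputed infection times 0 and 1.
value = augmented_loglik(0.5, 1.0, [0.0, 1.0], [2.0, 3.0], 0)
```

In a data-augmentation MCMC scheme, the infection times passed to this function would themselves be updated at each iteration, alongside beta and gamma.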
[Figure, Höhle et al. (2005): timeline for one individual. Labels: exposure time TE, infectivity starts TI, infectivity ends, -ve test, +ve test, not observed, observed data.]
  - (ii) No explicit likelihood: data augmentation (continued)
    - Data-augmentation methods include:
      - MCMC (e.g. Gibson and Renshaw, 1998; O'Neill and Roberts, 1999; Auranen et al., 2000)
      - EM algorithm (e.g. Becker, 1997)
    - Data-augmentation methods can also be used in less obvious settings
      - e.g. final size data for complex models
[Figure: Two-level mixing model. Labels: ever-infected, never-infected, data (marked "?").]
Augment parameter space using links to describe potential infections (Demiris and O'Neill, 2005)
3. Present and future challenges
- 3.1 Large populations/complex models
  - Current methods often struggle with large-scale problems, e.g.
    - Large population
    - Many missing data
    - Many hard-to-estimate parameters/covariates
- 3.1 Large populations/complex models (continued)
  - e.g. UK foot-and-mouth outbreak, 2001
    - Keeling et al. (2001): stochastic discrete-time model, parameterised via likelihood estimation and tuning/simulation.
    - Attempting to fit this kind of model using standard Bayesian/MCMC methods does not work well.
  - A large data set and many missing data can cause problems for standard (and also non-standard) MCMC.
- 3.1 Large populations/complex models (continued)
  - e.g. Measles data
    - Cauchemez and Ferguson (2008) discuss the problems that arise when fitting a standard SIR model to large-scale, temporally aggregated data in a large population using standard methods.
  - Problems of this kind are usually tackled via approximations (e.g. of the model itself).
  - Challenge: Can generic non-approximate methods be found?
- 3.2 Data augmentation
  - Comment: this technique is surprisingly powerful and is (probably) under-developed.
  - e.g. Cauchemez and Ferguson (2008) use a novel MCMC data-augmentation scheme using a diffusion model to approximate an SIR epidemic model.
  - e.g. For final size data, instead of imputing a graph describing infection pathways, one could impute generations of infection (joint work with Simon White).
    - This can lead to much faster MCMC algorithms.
[Figure: Two-level mixing model, imputing edges in graph. Labels: ever-infected, never-infected.]
[Figure: Two-level mixing model with numbered individuals. Labels: ever-infected, never-infected; infection chain 1, 3, 1, 2, 1.]
- 3.2 Data augmentation (continued)
  - e.g. Augmented data can also (sometimes) be used to bound quantities of interest.
    - Clancy and O'Neill (2008) show how to obtain stochastic bounds on R0 and other quantities by considering minimal and maximal configurations of unobserved infection times in an SIR model.
[Figures: three timelines showing observed removal times and imputed infection times (marked x); the second is labelled "Soon as possible" and the third "Late as possible".]
Can show that "Soon as possible" maximises R0, but that the minimal value is not necessarily given by "Late as possible"; use linear programming to find the actual solution.
The general idea is also applicable to final outcome data.
- 3.3 Model fit and model choice
  - Various methods are used in the literature to assess model fit, e.g.
    - Simulation-based methods
    - Use of the Bayesian predictive distribution
    - Standard methods where applicable
    - Bayesian p-values
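A minimal sketch of a simulation-based fit check of the kind listed above: a Bayesian (posterior predictive) p-value for the final size of an outbreak. A Reed-Frost chain-binomial model is assumed purely for illustration, and the "posterior draws" are placeholder values standing in for MCMC output; none of this comes from the cited papers.

```python
import random


def reed_frost_final_size(n_susceptible, n_infective, escape_prob, rng):
    """Simulate a Reed-Frost epidemic; return the number of initial susceptibles infected."""
    s, i, total = n_susceptible, n_infective, 0
    while i > 0 and s > 0:
        p_infected = 1.0 - escape_prob ** i  # escape all i current infectives, or not
        new_cases = sum(1 for _ in range(s) if rng.random() < p_infected)
        s -= new_cases
        total += new_cases
        i = new_cases
    return total


def predictive_p_value(observed, posterior_draws, n_susceptible, n_infective, rng):
    """Fraction of replicated final sizes at least as large as the observed one."""
    exceed = 0
    for escape_prob in posterior_draws:
        replicate = reed_frost_final_size(n_susceptible, n_infective, escape_prob, rng)
        if replicate >= observed:
            exceed += 1
    return exceed / len(posterior_draws)


rng = random.Random(0)
# Placeholder posterior sample for the pairwise escape probability (hypothetical).
posterior_draws = [rng.betavariate(80, 2) for _ in range(500)]
p_val = predictive_p_value(12, posterior_draws, 50, 1, rng)
```

Values of the p-value close to 0 or 1 suggest that the observed final size is unusual under the fitted model; other summary statistics can be checked in the same way.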
- 3.3 Model fit and model choice (continued)
  - Likewise for model choice: methods include AIC and RJMCMC.
  - Challenge: Better understanding of pros and cons of such methods
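For reference, the usual definition of AIC, stated here in assumed notation: k is the number of free parameters and L(\hat{\theta}) is the likelihood at the maximum likelihood estimate. Among candidate models fitted to the same data, the one with the smallest AIC is preferred.

```latex
\mathrm{AIC} = 2k - 2 \log L(\hat{\theta})
```

Note that AIC requires a likelihood that can be maximised, so for models without an explicit likelihood it is not directly available.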
References
- B.D. Elderd, V.M. Dukic and G. Dwyer (2006) Uncertainty in predictions of disease spread and public health responses to bioterrorism and emerging diseases. PNAS 103, 15693-15697.
- I.M. Longini, Jr and J.S. Koopman (1982) Household and community transmission parameters from final distributions of infections in households. Biometrics 38, 115-126.
- P.D. O'Neill, D.J. Balding, N.G. Becker, M. Eerola and D. Mollison (2000) Analyses of infectious disease data from household outbreaks by Markov Chain Monte Carlo methods. Applied Statistics 49, 517-542.
- M. van Boven, M. Koopmans, M.D.R. van Beest Holle, A. Meijer, D. Klinkenberg, C.A. Donnelly and H.A.P. Heesterbeek (2007) Detecting emerging transmissibility of Avian Influenza virus in human households. PLoS Computational Biology 3, 1394-1402.
- D. Clancy and P.D. O'Neill (2007) Exact Bayesian inference and model selection for stochastic models of epidemics among a community of households. Scandinavian Journal of Statistics 34, 259-274.
- N.G. Becker (1997) Uses of the EM algorithm in the analysis of data on HIV/AIDS and other infectious diseases. Statistical Methods in Medical Research 6, 24-37.
- F.G. Ball, D. Mollison and G-P. Scalia-Tomba (1997) Epidemic models with two levels of mixing. Annals of Applied Probability 7, 46-89.
- M. Höhle, E. Jørgensen and P.D. O'Neill (2005) Inference in disease transmission experiments by using stochastic epidemic models. Applied Statistics 54, 349-366.
- N.G. Becker (1989) Analysis of Infectious Disease Data. Chapman and Hall, London.
- G. Gibson and E. Renshaw (1998) Estimating parameters in stochastic compartmental models using Markov chain methods. IMA Journal of Mathematics Applied in Medicine and Biology 15, 19-40.
- P.D. O'Neill and G.O. Roberts (1999) Bayesian inference for partially observed stochastic epidemics. Journal of the Royal Statistical Society Series A 162, 121-129.
- K. Auranen, E. Arjas, T. Leino and A.K. Takala (2000) Transmission of pneumococcal carriage in families: a latent Markov process model for binary longitudinal data. Journal of the American Statistical Association 95, 1044-1053.
- P.E. Lekone and B.F. Finkenstädt (2006) Statistical inference in a stochastic epidemic SEIR model with control intervention: Ebola as a case study. Biometrics 62, 1170-1177.
- M.J. Keeling, M.E.J. Woolhouse, D.J. Shaw, L. Matthews, M. Chase-Topping, D.T. Haydon, S.J. Cornell, J. Kappey, J. Wilesmith and B.T. Grenfell (2001) Dynamics of the 2001 UK Foot and Mouth Epidemic: Stochastic Dispersal in a Heterogeneous Landscape. Science 294, 813-817.
- S. Cauchemez and N.M. Ferguson (2008) Likelihood-based estimation of continuous-time epidemic models from time-series data: application to measles transmission in London. Journal of the Royal Society Interface 5, 885-897.
- D. Clancy and P.D. O'Neill (2008) Bayesian estimation of the basic reproduction number in stochastic epidemic models. Bayesian Analysis, in press.