Title: Relating models to data: A review

1
Relating models to data: A review

P.D. O'Neill
University of Nottingham

2
Caveats

Scope is strictly limited
Review with a view to future challenges

3
Outline

Why relate models to data?
How to relate models to data
Present and future challenges

4
Outline

Why relate models to data?
How to relate models to data
Present and future challenges

5
1. Why relate models to data?

1. Scientific hypothesis testing
e.g. Can within-host heterogeneity of
susceptibility to HIV explain decreasing
prevalence?
e.g. Did control measures alone control SARS in
Hong Kong?

6
1. Why relate models to data?

2. Estimation
e.g. What is R0?
e.g. What is the efficacy of a vaccine?

7
1. Why relate models to data?

3. What-if scenarios
e.g. What would have happened if transport
restrictions were in place sooner in the UK foot
and mouth outbreak?
e.g. By how much would school closure reduce the
spread of influenza?

8
1. Why relate models to data?

4. Real-time analyses
e.g. Has the epidemic finished yet?
e.g. Are control measures effective?

9
1. Why relate models to data?

5. Calibration/parameterisation
e.g. What range of parameter values is
sensible for simulation studies?

10
Outline

Why relate models to data?
How to relate models to data
Present and future challenges

11
2. How to relate models to data

2.1 Fitting deterministic models
Options include
(i) Estimation from the literature
(ii) Least-squares / minimise metric
(iii) Can be Bayesian (Elderd, Dukic and Dwyer
2006)
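
A minimal sketch of option (ii), assuming a deterministic SIR model and daily incidence data. The function names, the Euler integration, the grid search and the synthetic data are illustrative assumptions and do not come from the slides.

```python
def sir_incidence(beta, gamma, n, i0, days, steps_per_day=20):
    """Daily new infections from a deterministic SIR model (Euler steps)."""
    s, i = float(n - i0), float(i0)
    dt = 1.0 / steps_per_day
    daily = []
    for _ in range(days):
        new_cases = 0.0
        for _ in range(steps_per_day):
            infections = beta * s * i / n * dt
            removals = gamma * i * dt
            s -= infections
            i += infections - removals
            new_cases += infections
        daily.append(new_cases)
    return daily


def sum_of_squares(observed, fitted):
    """The metric to minimise: squared distance between data and model."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))


def fit_least_squares(observed, n, i0, beta_grid, gamma_grid):
    """Grid search for the (beta, gamma) pair with the smallest squared error."""
    best = None
    for beta in beta_grid:
        for gamma in gamma_grid:
            fitted = sir_incidence(beta, gamma, n, i0, len(observed))
            sse = sum_of_squares(observed, fitted)
            if best is None or sse < best[0]:
                best = (sse, beta, gamma)
    return best


# Synthetic "data" generated from the model itself (hypothetical values).
data = sir_incidence(beta=0.5, gamma=0.2, n=1000, i0=5, days=40)
beta_grid = [k / 100 for k in range(30, 75, 5)]   # 0.30, 0.35, ..., 0.70
gamma_grid = [k / 100 for k in range(10, 35, 5)]  # 0.10, 0.15, ..., 0.30
sse, beta_hat, gamma_hat = fit_least_squares(data, 1000, 5, beta_grid, gamma_grid)
```

A grid search is used here only to keep the example dependency-free; in practice a numerical optimiser would usually replace it.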

12
2. How to relate models to data

2.2 Fitting stochastic models
Available methods depend heavily on the model and
the data.

13
2. How to relate models to data

2.2 Fitting stochastic models
(i) Explicit likelihood
e.g. Longini-Koopman model for household data
(Longini and Koopman, 1982)

14
2. How to relate models to data
P(avoid infection from housemate) = p
P(avoid infection from outside) = q
SEIR model within household
Given data on final outcome in (independent)
households, can formulate likelihood L(p, q)
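
A sketch of how such a likelihood can be computed, using the standard final-size recursion for this type of household model, with p and q as defined on the slide. The function names and the example household counts are hypothetical.

```python
from math import comb, log


def final_size_probs(k, p, q):
    """P(j of k initially susceptible household members are ever infected), j = 0..k.

    p: probability of avoiding infection from one infected housemate
    q: probability of avoiding infection from outside the household
    """
    # table[n][j] = P(j ever infected in a household of n susceptibles)
    table = []
    for n in range(k + 1):
        row = []
        for j in range(n):
            # A given set of j members is infected among themselves (table[j][j]);
            # each of the other n - j escapes the outside (q) and all j
            # infected housemates (p ** j).
            row.append(comb(n, j) * table[j][j] * (q * p ** j) ** (n - j))
        row.append(1.0 - sum(row))  # remaining probability: all n infected
        table.append(row)
    return table[k]


def log_likelihood(p, q, households):
    """households maps (size k, number infected j) to the number of such households."""
    cache = {}
    total = 0.0
    for (k, j), count in households.items():
        if k not in cache:
            cache[k] = final_size_probs(k, p, q)
        total += count * log(cache[k][j])
    return total


# Hypothetical final-outcome data: (household size, number infected) -> count
example_data = {(1, 0): 30, (1, 1): 8, (2, 0): 40, (2, 1): 15, (2, 2): 9}
example_value = log_likelihood(0.8, 0.8, example_data)
```

Independence between households is what allows the log-likelihood to be a simple sum over households.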
15
2. How to relate models to data

2.2 Fitting stochastic models
(i) Explicit likelihood (continued)
Related household models: examples
Bayesian analysis (O'Neill et al., 2000)
Multi-type models (van Boven et al., 2007)

16
2. How to relate models to data

2.2 Fitting stochastic models
(i) Explicit likelihood (continued)
Methods include
Max likelihood (e.g. Longini and Koopman, 1982)
EM algorithm (e.g. Becker, 1997)
MCMC (e.g. O'Neill et al., 2000)
Rejection sampling (e.g. Clancy and O'Neill,
2007)
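
To illustrate the MCMC option, here is a minimal random-walk Metropolis sketch. It assumes households of size 2 only (so the likelihood is available in closed form), uniform priors on p and q, and hypothetical counts. It is not the algorithm of any of the cited papers.

```python
import math
import random


def log_likelihood(p, q, counts):
    """Households of size 2; counts = (n0, n1, n2) households with 0, 1, 2 infected.

    p: P(avoid infection from housemate), q: P(avoid infection from outside).
    """
    prob0 = q * q
    prob1 = 2.0 * (1.0 - q) * q * p
    prob2 = 1.0 - prob0 - prob1
    return (counts[0] * math.log(prob0)
            + counts[1] * math.log(prob1)
            + counts[2] * math.log(prob2))


def metropolis(counts, n_iter=5000, step=0.02, seed=1):
    """Random-walk Metropolis on (p, q) with Uniform(0, 1) priors."""
    rng = random.Random(seed)
    p, q = 0.5, 0.5
    current = log_likelihood(p, q, counts)
    chain = []
    for _ in range(n_iter):
        p_new = p + rng.gauss(0.0, step)
        q_new = q + rng.gauss(0.0, step)
        # Proposals outside (0, 1) have zero prior density and are rejected.
        if 0.0 < p_new < 1.0 and 0.0 < q_new < 1.0:
            proposed = log_likelihood(p_new, q_new, counts)
            if rng.random() < math.exp(min(0.0, proposed - current)):
                p, q, current = p_new, q_new, proposed
        chain.append((p, q))
    return chain


# Hypothetical data: 1000 households of size 2
chain = metropolis((640, 256, 104))
kept = chain[1000:]  # discard burn-in
p_mean = sum(p for p, _ in kept) / len(kept)
q_mean = sum(q for _, q in kept) / len(kept)
```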

17
2. How to relate models to data

2.2 Fitting stochastic models
(ii) No explicit likelihood
Can arise due to model complexity and/or
insufficient data

18
2. How to relate models to data
Two-level mixing model
[Figure labels: ever-infected, never-infected; sample, unseen]
19
2. How to relate models to data
Individual-based transmission models involve
unseen infection times
20
2. How to relate models to data
Even detailed data from studies generally only
give bounds on unseen infection times, e.g.
infection occurs between last -ve test and
first +ve test
21
2. How to relate models to data

2.2 Fitting stochastic models
(ii) No explicit likelihood
Solutions include
Use a simpler approximating model
e.g. use pseudolikelihood (e.g. Ball, Mollison
and Scalia-Tomba, 1997)

22
2. How to relate models to data
Two-level mixing model
[Figure labels: ever-infected, never-infected]
Explicit interactions between households
23
2. How to relate models to data
Two-level mixing model -> independent households
model
[Figure labels: ever-infected, never-infected]
In a large population, households are
approximately independent
24
2. How to relate models to data

2.2 Fitting stochastic models
(ii) No explicit likelihood
Solutions include
Use a simpler approximating model
e.g. discrete-time model instead of a continuous
time model (e.g. Lekone and Finkenstädt, 2006)
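
A sketch of why a discrete-time model helps: with binomial transitions per time step, the likelihood of observed counts is explicit. For brevity this assumes an SIR rather than an SEIR structure; the parameter names (beta, gamma, step length h) and values are illustrative assumptions.

```python
import math
import random


def simulate_chain_binomial(n, i0, beta, gamma, h=1.0, steps=60, seed=0):
    """Discrete-time stochastic SIR: binomial infections and removals per step."""
    rng = random.Random(seed)
    s, i, r = n - i0, i0, 0
    path = [(s, i, r)]
    p_rem = 1.0 - math.exp(-gamma * h)
    for _ in range(steps):
        p_inf = 1.0 - math.exp(-beta * h * i / n)
        new_inf = sum(rng.random() < p_inf for _ in range(s))
        new_rem = sum(rng.random() < p_rem for _ in range(i))
        s, i, r = s - new_inf, i + new_inf - new_rem, r + new_rem
        path.append((s, i, r))
    return path


def log_binomial_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), handling p = 0 and p = 1."""
    if k < 0 or k > n:
        return -math.inf
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1.0 - p)


def log_likelihood(path, n, beta, gamma, h=1.0):
    """Explicit log-likelihood of a fully observed (S, I, R) path."""
    p_rem = 1.0 - math.exp(-gamma * h)
    total = 0.0
    for (s, i, r), (s_next, _, r_next) in zip(path, path[1:]):
        p_inf = 1.0 - math.exp(-beta * h * i / n)
        total += log_binomial_pmf(s - s_next, s, p_inf)
        total += log_binomial_pmf(r_next - r, i, p_rem)
    return total


path = simulate_chain_binomial(n=200, i0=2, beta=0.6, gamma=0.25)
ll = log_likelihood(path, 200, 0.6, 0.25)
```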

25
2. How to relate models to data

2.2 Fitting stochastic models
(ii) No explicit likelihood
Solutions include
Direct approach e.g. Martingale methods
(Becker, 1989)

26
2. How to relate models to data

2.2 Fitting stochastic models
(ii) No explicit likelihood
Solutions include
Data augmentation: add in missing data or extra
model parameters to formulate a likelihood

27
2. How to relate models to data

2.2 Fitting stochastic models
(ii) No explicit likelihood: data augmentation
(continued)
Common example:
- model describes individual-to-individual
transmission
- observe times of case ascertainment, test
results, etc, but not times of infection/exposure
- augment data with missing infection/exposure
times
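
A sketch of the likelihood that becomes available once infection times are added to the data. It assumes a Markov SIR model with infection rate beta/n per susceptible-infective pair and exponential infectious periods with rate gamma, and treats the earliest infection as the initial infective. The function name and these modelling choices are assumptions for illustration.

```python
import math


def augmented_log_likelihood(beta, gamma, n, infection_times, removal_times):
    """Log-likelihood of infection and removal times for the ever-infected.

    infection_times[j] and removal_times[j] belong to the same individual;
    n is the total population size.
    """
    if any(r <= e for e, r in zip(infection_times, removal_times)):
        return -math.inf  # removal must follow infection
    events = [(t, 0) for t in infection_times] + [(t, 1) for t in removal_times]
    events.sort()
    s, i = n, 0           # susceptibles and infectives before the first infection
    log_lik = 0.0
    pressure = 0.0        # integral of S(t) * I(t) over time
    prev_t = events[0][0]
    first = True
    for t, kind in events:
        pressure += s * i * (t - prev_t)
        if kind == 0:     # infection event
            if not first:
                if i == 0:
                    return -math.inf  # nobody available to cause this infection
                log_lik += math.log(beta / n * i)
            first = False
            s -= 1
            i += 1
        else:             # removal event
            i -= 1
        prev_t = t
    infectious_total = sum(r - e for e, r in zip(infection_times, removal_times))
    log_lik += len(removal_times) * math.log(gamma) - gamma * infectious_total
    log_lik -= beta / n * pressure
    return log_lik


# Hypothetical example: 3 individuals, 2 ever infected
value = augmented_log_likelihood(1.5, 0.5, 3, [0.0, 0.5], [1.0, 2.0])
```

In a data-augmentation MCMC scheme the infection times would be unknowns, updated alongside beta and gamma, with a function of this kind evaluated at each proposed configuration.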

28
2. How to relate models to data
[Figure: timeline of one infection. Labels: infectivity starts, infectivity ends, TI, TE (exposure time), +ve test, -ve test; quantities are marked either "not observed" or "observed data".]
Höhle et al. (2005)
29
2. How to relate models to data

2.2 Fitting stochastic models
(ii) No explicit likelihood: data augmentation
(continued)
Data-augmentation methods include
MCMC (e.g. Gibson and Renshaw, 1998; O'Neill and
Roberts, 1999; Auranen et al., 2000)
EM algorithm (e.g. Becker, 1997)

30
2. How to relate models to data

2.2 Fitting stochastic models
(ii) No explicit likelihood: data augmentation
(continued)
Data-augmentation methods can also be used in
less obvious settings
e.g. final size data for complex models

31
2. How to relate models to data
Two-level mixing model
[Figure labels: ever-infected, never-infected; data marked "?"]
Augment parameter space using links to describe
potential infections
(Demiris and O'Neill, 2005)
32
Outline

Why relate models to data?
How to relate models to data
Present and future challenges

33
3. Present and future challenges

3.1 Large populations/complex models
Current methods often struggle with large-scale
problems,
e.g.
Large population,
Many missing data,
Many hard-to-estimate parameters/covariates

34
3. Present and future challenges

3.1 Large populations/complex models
e.g. UK foot and mouth outbreak, 2001
Keeling et al. (2001): stochastic discrete-time
model, parameterised via likelihood estimation
and tuning/simulation.
Attempting to fit this kind of model using
standard Bayesian/MCMC methods does not work
well.

35
3. Present and future challenges
Large data set and many missing data can cause
problems for standard (and also non-standard) MCMC
36
3. Present and future challenges

3.1 Large populations/complex models
e.g. Measles data
Cauchemez and Ferguson (2008) discuss the
problems that arise when fitting a standard SIR
model to large-scale, temporally aggregated data
in a large population using standard methods.

37
3. Present and future challenges

3.1 Large populations/complex models
Problems of this kind are usually tackled via
approximations (e.g. of the model itself).
Challenge: can generic non-approximate methods be
found?

38
3. Present and future challenges

3.2 Data augmentation
Comment: this technique is surprisingly powerful
and is (probably) under-developed.

39
3. Present and future challenges

3.2 Data augmentation
e.g. Cauchemez and Ferguson (2008) use a novel
MCMC data-augmentation scheme using a diffusion
model to approximate an SIR epidemic model.

40
3. Present and future challenges

3.2 Data augmentation
e.g. For final size data, instead of imputing a
graph describing infection pathways, could
instead impute generations of infection (joint
work with Simon White).
This can lead to much faster MCMC algorithms.

41
3. Present and future challenges
Two-level mixing model
[Figure labels: ever-infected, never-infected]
Imputing edges in graph
42
3. Present and future challenges
Two-level mixing model
[Figure: ever-infected and never-infected individuals; each ever-infected individual is labelled with a generation number from 1 to 5]
Infection chain: 1, 3, 1, 2, 1
43
3. Present and future challenges

3.2 Data augmentation
e.g. Augmented data can also (sometimes) be used
to bound quantities of interest.
Clancy and O'Neill (2008) show how to obtain
stochastic bounds on R0 and other quantities by
considering "minimal" and "maximal"
configurations of unobserved infection times in
an SIR model.

44
3. Present and future challenges

3.2 Data augmentation

[Figure: timeline with observed removal times and imputed infection times, each marked x]
45
3. Present and future challenges

3.2 Data augmentation

[Figure: timeline with observed removal times and imputed infection times, each marked x; configuration "soon as possible"]
46
3. Present and future challenges

3.2 Data augmentation

[Figure: timeline with observed removal times and imputed infection times, each marked x; configuration "late as possible"]
Can show that "soon as possible" maximises R0,
but that the minimal value is not necessarily
given by "late as possible"; use linear
programming to find the actual solution.
General idea also applicable to final outcome data
47
3. Present and future challenges

3.3 Model fit and model choice
Various methods are used in the literature to
assess model fit, e.g.
Simulation-based methods; use of Bayesian
predictive distribution; standard methods where
applicable; Bayesian p-values
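
One simulation-based check, sketched under stated assumptions: a posterior predictive (Bayesian) p-value compares a statistic of the observed data with the same statistic computed on data simulated from posterior draws. The toy model, the Beta draws standing in for MCMC output, and all names are hypothetical.

```python
import random


def posterior_predictive_p_value(observed_stat, posterior_draws, simulate_stat, seed=0):
    """Fraction of posterior-predictive simulations at least as large as the observed statistic."""
    rng = random.Random(seed)
    at_least = sum(simulate_stat(theta, rng) >= observed_stat for theta in posterior_draws)
    return at_least / len(posterior_draws)


def simulate_final_size(theta, rng, group_size=20):
    """Toy model: each of group_size individuals is infected independently w.p. theta."""
    return sum(rng.random() < theta for _ in range(group_size))


# Stand-in for posterior samples of theta (in practice these come from MCMC).
draw_rng = random.Random(42)
posterior_draws = [draw_rng.betavariate(8, 14) for _ in range(2000)]

# Observed statistic: 7 infected out of 20 (hypothetical)
p_value = posterior_predictive_p_value(7, posterior_draws, simulate_final_size)
```

Values of p_value close to 0 or 1 would suggest that the model struggles to reproduce the chosen statistic.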

48
3. Present and future challenges

3.3 Model fit and model choice
Likewise for model choice: methods include AIC,
RJMCMC
Challenge: better understanding of pros and cons
of such methods

49
References

B.D. Elderd, V.M. Dukic and G. Dwyer (2006) Uncertainty in predictions of disease spread and public health responses to bioterrorism and emerging diseases. PNAS 103, 15693-15697.
I.M. Longini, Jr and J.S. Koopman (1982) Household and community transmission parameters from final distributions of infections in households. Biometrics 38, 115-126.
P.D. O'Neill, D.J. Balding, N.G. Becker, M. Eerola and D. Mollison (2000) Analyses of infectious disease data from household outbreaks by Markov Chain Monte Carlo methods. Applied Statistics 49, 517-542.
M. van Boven, M. Koopmans, M.D.R. van Beest Holle, A. Meijer, D. Klinkenberg, C.A. Donnelly and H.A.P. Heesterbeek (2007) Detecting emerging transmissibility of Avian Influenza virus in human households. PLoS Computational Biology 3, 1394-1402.
D. Clancy and P.D. O'Neill (2007) Exact Bayesian inference and model selection for stochastic models of epidemics among a community of households. Scandinavian Journal of Statistics 34, 259-274.
N.G. Becker (1997) Uses of the EM algorithm in the analysis of data on HIV/AIDS and other infectious diseases. Statistical Methods in Medical Research 6, 24-37.
F.G. Ball, D. Mollison and G-P. Scalia-Tomba (1997) Epidemic models with two levels of mixing. Annals of Applied Probability 7, 46-89.
M. Höhle, E. Jørgensen and P.D. O'Neill (2005) Inference in disease transmission experiments by using stochastic epidemic models. Applied Statistics 54, 349-366.

50
References

N.G. Becker (1989) Analysis of Infectious Disease Data. Chapman and Hall, London.
G. Gibson and E. Renshaw (1998) Estimating parameters in stochastic compartmental models using Markov chain methods. IMA Journal of Mathematics Applied in Medicine and Biology 15, 19-40.
P.D. O'Neill and G.O. Roberts (1999) Bayesian inference for partially observed stochastic epidemics. Journal of the Royal Statistical Society Series A 162, 121-129.
K. Auranen, E. Arjas, T. Leino and A.K. Takala (2000) Transmission of pneumococcal carriage in families: a latent Markov process model for binary longitudinal data. Journal of the American Statistical Association 95, 1044-1053.
P.E. Lekone and B.F. Finkenstädt (2006) Statistical inference in a stochastic epidemic SEIR model with control intervention: Ebola as a case study. Biometrics 62, 1170-1177.
M.J. Keeling, M.E.J. Woolhouse, D.J. Shaw, L. Matthews, M. Chase-Topping, D.T. Haydon, S.J. Cornell, J. Kappey, J. Wilesmith and B.T. Grenfell (2001) Dynamics of the 2001 UK Foot and Mouth Epidemic: Stochastic Dispersal in a Heterogeneous Landscape. Science 294, 813-817.
S. Cauchemez and N.M. Ferguson (2008) Likelihood-based estimation of continuous-time epidemic models from time-series data: application to measles transmission in London. Journal of the Royal Society Interface 5, 885-897.
D. Clancy and P.D. O'Neill (2008) Bayesian estimation of the basic reproduction number in stochastic epidemic models. Bayesian Analysis, in press.