Relating models to data: A review - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Relating models to data: A review


1
Relating models to data: A review
  • P.D. O'Neill
  • University of Nottingham

2
Caveats
  • Scope is strictly limited
  • Review with a view to future challenges

3
Outline
  • Why relate models to data?
  • How to relate models to data
  • Present and future challenges

5
1. Why relate models to data?
  • 1. Scientific hypothesis testing
  • e.g. Can within-host heterogeneity of
    susceptibility to HIV explain decreasing
    prevalence?
  • e.g. Did control measures alone control SARS in
    Hong Kong?

6
1. Why relate models to data?
  • 2. Estimation
  • e.g. What is R0?
  • e.g. What is the efficacy of a vaccine?

7
1. Why relate models to data?
  • 3. What-if scenarios
  • e.g. What would have happened if transport
    restrictions were in place sooner in the UK foot
    and mouth outbreak?
  • e.g. How much would school closure prevent
    spread of influenza?

8
1. Why relate models to data?
  • 4. Real-time analyses
  • e.g. Has the epidemic finished yet?
  • e.g. Are control measures effective?

9
1. Why relate models to data?
  • 5. Calibration/parameterisation
  • e.g. What range of parameter values are
    sensible for simulation studies?

10
Outline
  • Why relate models to data?
  • How to relate models to data
  • Present and future challenges

11
2. How to relate models to data
  • 2.1 Fitting deterministic models
  • Options include
  • (i) Estimation from the literature
  • (ii) Least-squares / minimise metric
  • (iii) Can be Bayesian (Elderd, Dukic and Dwyer
    2006)
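
Option (ii) can be sketched as follows. This is a minimal illustration, not a method taken from the talk: it assumes a deterministic SIR model integrated by Euler steps, a sum-of-squares metric on the number of infectives, and a coarse grid search. The function names, the grid and the synthetic data are all hypothetical.

```python
def sir_curve(beta, gamma, s0, i0, days, steps_per_day=20):
    """Euler integration of the deterministic SIR model.

    Returns the number of infectives I(t) at t = 0, 1, ..., days.
    """
    n = s0 + i0
    s, i = float(s0), float(i0)
    dt = 1.0 / steps_per_day
    infectives = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
        infectives.append(i)
    return infectives


def sum_of_squares(params, observed, s0, i0):
    """Squared distance between the model curve and the observed curve."""
    beta, gamma = params
    fitted = sir_curve(beta, gamma, s0, i0, days=len(observed) - 1)
    return sum((f - o) ** 2 for f, o in zip(fitted, observed))


def grid_search(observed, s0, i0, betas, gammas):
    """Return the (beta, gamma) pair on the grid that minimises the metric."""
    candidates = [(b, g) for b in betas for g in gammas]
    return min(candidates, key=lambda p: sum_of_squares(p, observed, s0, i0))


# Synthetic "data" generated from known parameter values (illustration only).
data = sir_curve(0.5, 0.2, s0=990, i0=10, days=30)
best = grid_search(data, 990, 10, betas=[0.3, 0.4, 0.5, 0.6], gammas=[0.1, 0.2, 0.3])
print(best)
```

In practice a numerical optimiser would replace the grid, and the metric would be chosen to suit the observation process.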

12
2. How to relate models to data
  • 2.2 Fitting stochastic models
  • Available methods depend heavily on the model and
    the data.

13
2. How to relate models to data
  • 2.2 Fitting stochastic models
  • (i) Explicit likelihood
  • e.g. Longini-Koopman model for household data
    (Longini and Koopman, 1982)

14
2. How to relate models to data
SEIR model within household
  • P(avoid infection from housemate) = p
  • P(avoid infection from outside) = q
  • Given data on the final outcome in (independent)
    households, we can formulate the likelihood L(p, q)
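
The likelihood L(p, q) can be sketched with the standard final-size recursion associated with the Longini-Koopman model. The sketch assumes every household member is initially susceptible and that the data are counts of households by size and number ever infected. The function names and the example counts are hypothetical.

```python
from math import comb, log


def final_size_probs(k, p, q):
    """P(j of k initially susceptible household members are ever infected), j = 0..k.

    p: probability of avoiding infection from one infected housemate
    q: probability of avoiding infection from outside the household
    """
    # m[n][j] = P(j ever infected in a household of n susceptibles)
    m = [[0.0] * (n + 1) for n in range(k + 1)]
    m[0][0] = 1.0
    for n in range(1, k + 1):
        for j in range(n):
            # j members form a complete outbreak among themselves; the other
            # n - j escape both the community and each of the j infectives.
            m[n][j] = comb(n, j) * m[j][j] * q ** (n - j) * p ** (j * (n - j))
        m[n][n] = 1.0 - sum(m[n][:n])
    return m[k]


def log_likelihood(counts, p, q):
    """counts maps (household size, number ever infected) to number of households."""
    total = 0.0
    for (k, j), n_households in counts.items():
        total += n_households * log(final_size_probs(k, p, q)[j])
    return total


# Hypothetical data: 18 households of size 2.
counts = {(2, 0): 10, (2, 1): 5, (2, 2): 3}
print(log_likelihood(counts, p=0.8, q=0.9))
```

Maximising this function over (p, q) gives maximum likelihood estimates; the same function can serve as the likelihood term in a Bayesian analysis.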
15
2. How to relate models to data
  • 2.2 Fitting stochastic models
  • (i) Explicit likelihood (continued)
  • Related household models, examples:
  • Bayesian analysis (O'Neill et al., 2000)
  • Multi-type models (van Boven et al., 2007)

16
2. How to relate models to data
  • 2.2 Fitting stochastic models
  • (i) Explicit likelihood (continued)
  • Methods include
  • Max likelihood (e.g. Longini and Koopman, 1982)
  • EM algorithm (e.g. Becker, 1997)
  • MCMC (e.g. O'Neill et al., 2000)
  • Rejection sampling (e.g. Clancy and O'Neill,
    2007)

17
2. How to relate models to data
  • 2.2 Fitting stochastic models
  • (ii) No explicit likelihood
  • Can arise due to model complexity and/or
    insufficient data

18
2. How to relate models to data
[Figure: two-level mixing model. Labels: ever-infected, never-infected, sample, unseen.]
19
2. How to relate models to data
Individual-based transmission models involve
unseen infection times
20
2. How to relate models to data
Even detailed data from studies generally only
give bounds on unseen infection times, e.g.
infection occurs between the last -ve test and
the first +ve test
21
2. How to relate models to data
  • 2.2 Fitting stochastic models
  • (ii) No explicit likelihood
  • Solutions include
  • Use a simpler approximating model
  • e.g. use pseudolikelihood, e.g. Ball, Mollison
    and Scalia-Tomba, 1997

22
2. How to relate models to data
[Figure: two-level mixing model. Labels: ever-infected, never-infected.]
Explicit interactions between households
23
2. How to relate models to data
[Figure: two-level mixing model -> independent households
model. Labels: ever-infected, never-infected.]
In a large population, households are
approximately independent
24
2. How to relate models to data
  • 2.2 Fitting stochastic models
  • (ii) No explicit likelihood
  • Solutions include
  • Use a simpler approximating model
  • e.g. discrete-time model instead of a continuous
    time model (e.g. Lekone and Finkenstädt, 2006)
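
A discrete-time approximation of this kind can be sketched as a chain-binomial SEIR model. The sketch assumes transition probabilities of the form 1 - exp(-rate x h) over a time step h. It is written in the spirit of the cited approach rather than as a reproduction of it, and all names and parameter values are hypothetical.

```python
import math
import random


def binomial(rng, n, prob):
    """Draw from Binomial(n, prob) as a sum of Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < prob)


def simulate_seir(beta, rho, gamma, s0, e0, i0, steps, h=1.0, seed=0):
    """Simulate a discrete-time stochastic SEIR epidemic.

    Returns a list of (S, E, I, R) tuples, one per time point.
    """
    rng = random.Random(seed)
    n = s0 + e0 + i0
    s, e, i, r = s0, e0, i0, 0
    path = [(s, e, i, r)]
    for _ in range(steps):
        p_infection = 1.0 - math.exp(-beta * i / n * h)  # S -> E
        p_onset = 1.0 - math.exp(-rho * h)               # E -> I
        p_removal = 1.0 - math.exp(-gamma * h)           # I -> R
        b = binomial(rng, s, p_infection)
        c = binomial(rng, e, p_onset)
        d = binomial(rng, i, p_removal)
        s, e, i, r = s - b, e + b - c, i + c - d, r + d
        path.append((s, e, i, r))
    return path


path = simulate_seir(beta=0.6, rho=0.3, gamma=0.2, s0=95, e0=0, i0=5, steps=40)
print(path[-1])
```

Because each step is a product of binomial probabilities, the likelihood is explicit once the transition counts are known, which is the attraction of the discrete-time approximation.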

25
2. How to relate models to data
  • 2.2 Fitting stochastic models
  • (ii) No explicit likelihood
  • Solutions include
  • Direct approach e.g. Martingale methods
    (Becker, 1989)

26
2. How to relate models to data
  • 2.2 Fitting stochastic models
  • (ii) No explicit likelihood
  • Solutions include
  • Data augmentation: add in missing data or extra
    model parameters to formulate a likelihood

27
2. How to relate models to data
  • 2.2 Fitting stochastic models
  • (ii) No explicit likelihood: Data augmentation
    (continued)
  • Common example
  • - model describes individual-to-individual
    transmission
  • - observe times of case ascertainment, test
    results, etc, but not times of infection/exposure
  • - augment data with missing infection/exposure
    times
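
Once infection times are added to the data, the likelihood becomes tractable. The sketch below writes it out for a Markov SIR model, assuming a pairwise infection rate beta (no division by population size), exponential infectious periods with rate gamma, and a single initial case whose infection time is given. The function name and the parameterisation are assumptions made for illustration.

```python
import math


def augmented_log_likelihood(beta, gamma, infection_times, removal_times,
                             n_never_infected):
    """Log-likelihood of a Markov SIR epidemic given complete (augmented) data.

    infection_times[j] and removal_times[j] belong to ever-infected individual j.
    The earliest infection is treated as the given initial case.
    """
    cases = list(zip(infection_times, removal_times))
    initial = infection_times.index(min(infection_times))
    log_lik = 0.0
    # Each later infection contributes log(beta * number infective just before it).
    for k, (i_k, _) in enumerate(cases):
        if k == initial:
            continue
        infective = sum(1 for i_j, r_j in cases if i_j < i_k < r_j)
        if infective == 0:
            return -math.inf  # impossible configuration: nobody to infect k
        log_lik += math.log(beta * infective)
    # Integral of S(t) * I(t) dt, written as a sum over (infective, susceptible) pairs.
    pressure = 0.0
    for i_j, r_j in cases:
        for i_k, _ in cases:
            pressure += min(r_j, i_k) - min(i_j, i_k)
        pressure += n_never_infected * (r_j - i_j)
    log_lik -= beta * pressure
    # Exponential infectious periods.
    for i_j, r_j in cases:
        log_lik += math.log(gamma) - gamma * (r_j - i_j)
    return log_lik


print(augmented_log_likelihood(0.5, 2.0, [0.0, 0.5], [1.0, 2.0], n_never_infected=0))
```

Returning minus infinity for impossible configurations lets an MCMC sampler reject proposed infection times that leave a case with no possible infector.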

28
2. How to relate models to data
[Figure, after Höhle et al. (2005): timeline for one individual.
Labels: exposure time (TE), infectivity starts (TI), infectivity ends,
-ve test, +ve test. Legend: not observed, observed data.]
29
2. How to relate models to data
  • 2.2 Fitting stochastic models
  • (ii) No explicit likelihood: Data augmentation
    (continued)
  • Data-augmentation methods include
  • MCMC (e.g. Gibson and Renshaw, 1998; O'Neill and
    Roberts, 1999; Auranen et al., 2000)
  • EM algorithm (e.g. Becker, 1997)
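
One ingredient of a data-augmentation MCMC scheme can be sketched briefly. Given the augmented data, a Markov SIR model with independent gamma priors has gamma full conditionals for the infection and removal rates, so they can be drawn directly. The sketch assumes that setting; the function name, prior values and summary statistics are hypothetical, and the updates of the unobserved infection times (typically Metropolis-Hastings steps) are not shown.

```python
import random


def gibbs_rates(n_infections, pressure, n_removals, total_infectious_time,
                rng, prior_shape=1.0, prior_rate=0.001):
    """Draw (beta, gamma) from their full conditionals given augmented data.

    n_infections excludes the initial case; pressure is the integral of
    S(t) * I(t) dt; total_infectious_time is the sum of infectious periods.
    Assumes independent Gamma(prior_shape, prior_rate) priors on both rates.
    """
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    beta = rng.gammavariate(prior_shape + n_infections,
                            1.0 / (prior_rate + pressure))
    gamma = rng.gammavariate(prior_shape + n_removals,
                             1.0 / (prior_rate + total_infectious_time))
    return beta, gamma


rng = random.Random(0)
print(gibbs_rates(n_infections=9, pressure=100.0, n_removals=10,
                  total_infectious_time=50.0, rng=rng))
```

A full sampler would alternate these draws with proposals that move, add or delete unobserved infection times, recomputing the summary statistics after each accepted move.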

30
2. How to relate models to data
  • 2.2 Fitting stochastic models
  • (ii) No explicit likelihood: Data augmentation
    (continued)
  • Data-augmentation methods can also be used in
    less obvious settings
  • e.g. final size data for complex models

31
2. How to relate models to data
[Figure: two-level mixing model. Labels: ever-infected, never-infected, "? Data".]
Augment parameter space using links to describe
potential infections (Demiris and O'Neill, 2005)
32
Outline
  • Why relate models to data?
  • How to relate models to data
  • Present and future challenges

33
3. Present and future challenges
  • 3.1 Large populations/complex models
  • Current methods often struggle with large-scale
    problems.
  • e.g.
  • Large population,
  • Many missing data,
  • Many hard-to-estimate parameters/covariates

34
3. Present and future challenges
  • 3.1 Large populations/complex models
  • e.g. UK foot and mouth outbreak 2001
  • Keeling et al. (2001): stochastic discrete-time
    model, parameterised via likelihood estimation
    and tuning/simulation.
  • Attempting to fit this kind of model using
    standard Bayesian/MCMC methods does not work
    well.

35
3. Present and future challenges
A large data set and many missing data can cause
problems for standard (and also non-standard) MCMC
36
3. Present and future challenges
  • 3.1 Large populations/complex models
  • e.g. Measles data
  • Cauchemez and Ferguson (2008) discuss the
    problems that arise when fitting a standard SIR
    model to large-scale temporally aggregated data in
    a large population using standard methods.

37
3. Present and future challenges
  • 3.1 Large populations/complex models
  • Problems of this kind are usually tackled via
    approximations (e.g. of the model itself).
  • Challenge: Can generic non-approximate methods be
    found?

38
3. Present and future challenges
  • 3.2 Data augmentation
  • Comment: this technique is surprisingly powerful
    and is (probably) under-developed.

39
3. Present and future challenges
  • 3.2 Data augmentation
  • e.g. Cauchemez and Ferguson (2008) use a novel
    MCMC data-augmentation scheme using a diffusion
    model to approximate an SIR epidemic model.

40
3. Present and future challenges
  • 3.2 Data augmentation
  • e.g. For final size data, instead of imputing a
    graph describing infection pathways, could
    instead impute generations of infection (joint
    work with Simon White).
  • This can lead to much faster MCMC algorithms.

41
3. Present and future challenges
[Figure: two-level mixing model. Labels: ever-infected, never-infected.]
Imputing edges in graph
42
3. Present and future challenges
[Figure: two-level mixing model. Labels: ever-infected, never-infected;
individuals carry numeric labels.]
Infection chain 1, 3, 1, 2, 1
43
3. Present and future challenges
  • 3.2 Data augmentation
  • e.g. Augmented data can also (sometimes) be used
    to bound quantities of interest.
  • Clancy and ONeill (2008) show how to obtain
    stochastic bounds on R0 and other quantities by
    considering minimal and maximal
    configurations of unobserved infection times in
    an SIR model.

44
3. Present and future challenges
  • 3.2 Data augmentation

[Figure: timeline marking observed removal times (x) and imputed infection times.]
45
3. Present and future challenges
  • 3.2 Data augmentation

[Figure: timeline marking observed removal times (x) and imputed infection times,
labelled "Soon as possible".]
46
3. Present and future challenges
  • 3.2 Data augmentation

[Figure: timeline marking observed removal times (x) and imputed infection times,
labelled "Late as possible".]
Can show that "Soon as possible" maximises R0,
but that the minimal value is not necessarily given
by "Late as possible"; use linear programming to
find the actual solution.
The general idea is also applicable to final outcome data
47
3. Present and future challenges
  • 3.3 Model fit and model choice
  • Various methods are used in the literature to
    assess model fit, e.g.
  • Simulation-based methods; use of the Bayesian
    predictive distribution; standard methods where
    applicable; Bayesian p-values
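
A Bayesian (posterior predictive) p-value can be sketched generically: simulate a replicate data set for each posterior draw and record how often a chosen statistic is at least as large as the observed value. The toy model, the stand-in posterior draws and every name below are hypothetical.

```python
import random


def posterior_predictive_pvalue(posterior_draws, simulate, statistic,
                                observed_stat, rng):
    """Fraction of replicated data sets whose statistic is >= the observed one."""
    exceed = 0
    for theta in posterior_draws:
        replicate = simulate(theta, rng)
        if statistic(replicate) >= observed_stat:
            exceed += 1
    return exceed / len(posterior_draws)


def simulate_outbreak_sizes(theta, rng, n_households=50, size=3):
    """Toy model: each household member is infected independently with prob. theta."""
    return [sum(rng.random() < theta for _ in range(size))
            for _ in range(n_households)]


rng = random.Random(1)
# Stand-in for MCMC output: draws from a Beta(30, 70) distribution.
draws = [rng.betavariate(30, 70) for _ in range(500)]
# Statistic: the largest household outbreak; the observed value 3 is hypothetical.
p_value = posterior_predictive_pvalue(draws, simulate_outbreak_sizes, max,
                                      observed_stat=3, rng=rng)
print(p_value)
```

Values close to 0 or 1 suggest that the fitted model rarely reproduces the observed feature of the data.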

48
3. Present and future challenges
  • 3.3 Model fit and model choice
  • Likewise for model choice: methods include AIC,
    RJMCMC
  • Challenge: Better understanding of pros and cons
    of such methods
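
For reference, AIC is 2k - 2 log L, where k is the number of fitted parameters and L the maximised likelihood; lower values are preferred. A minimal sketch, with hypothetical model names and log-likelihood values:

```python
def aic(max_log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * max_log_likelihood


def rank_by_aic(models):
    """models maps a name to (max_log_likelihood, n_params); lowest AIC first."""
    return sorted(models, key=lambda name: aic(*models[name]))


# Hypothetical fits of two competing epidemic models to the same data.
models = {"SIR": (-104.2, 2), "SEIR": (-103.9, 3)}
print(rank_by_aic(models))
```

Here the extra parameter of the larger model buys too little improvement in fit to offset the penalty, so the simpler model ranks first.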

49
References
  • B. D. Elderd, V. M. Dukic, and G. Dwyer (2006)
    Uncertainty in predictions of disease spread and
    public health responses to bioterrorism and
    emerging diseases. PNAS 103, 15693-15697.
  • I.M. Longini, Jr and J.S. Koopman (1982)
    Household and community transmission parameters
    from final distributions of infections in
    households. Biometrics 38, 115-126.
  • P.D. O'Neill, D. J. Balding, N. G. Becker, M.
    Eerola and D. Mollison (2000) Analyses of
    infectious disease data from household outbreaks
    by Markov Chain Monte Carlo methods. Applied
    Statistics 49, 517-542.
  • M. Van Boven, M. Koopmans, M. D. R. van Beest
    Holle, A. Meijer, D. Klinkenberg, C. A. Donnelly
    and H.A.P. Heesterbeek (2007) Detecting emerging
    transmissibility of Avian Influenza virus in
    human households. PLoS Computational Biology 3,
    1394-1402.
  • D. Clancy and P.D. O'Neill (2007) Exact Bayesian
    inference and model selection for stochastic
    models of epidemics among a community of
    households. Scandinavian Journal of Statistics
    34, 259-274.
  • N.G. Becker (1997) Uses of the EM algorithm in
    the analysis of data on HIV/AIDS and other
    infectious diseases. Statistical Methods in
    Medical Research 6, 24-37.
  • F.G. Ball, D. Mollison and G-P. Scalia-Tomba
    (1997) Epidemic models with two levels of mixing.
    Annals of Applied Probability 7, 46-89.
  • M. Höhle, E. Jørgensen and P.D. O'Neill (2005)
    Inference in disease transmission experiments by
    using stochastic epidemic models. Applied
    Statistics 54, 349-366.

50
References
  • N. G. Becker (1989) Analysis of Infectious
    Disease Data. Chapman and Hall, London.
  • G. Gibson and E. Renshaw (1998). Estimating
    parameters in stochastic compartmental models
    using Markov chain methods. IMA Journal of
    Mathematics Applied in Medicine and Biology 15,
    19-40.
  • P.D. O'Neill and G.O. Roberts (1999) Bayesian
    inference for partially observed stochastic
    epidemics. Journal of the Royal Statistical
    Society Series A 162, 121-129.
  • K. Auranen, E. Arjas, T. Leino and A. K. Takala
    (2000) Transmission of pneumococcal carriage in
    families: a latent Markov process model for
    binary longitudinal data. Journal of the American
    Statistical Association 95, 1044-1053.
  • P.E. Lekone and B.F. Finkenstädt (2006)
    Statistical Inference in a stochastic epidemic
    SEIR model with control intervention: Ebola as a
    case study. Biometrics 62, 1170-1177.
  • M.J. Keeling, M.E.J. Woolhouse, D.J. Shaw, L.
    Matthews, M. Chase-Topping, D.T. Haydon, S.J.
    Cornell, J. Kappey, J. Wilesmith, B.T. Grenfell
    (2001). Dynamics of the 2001 UK Foot and Mouth
    Epidemic: Stochastic Dispersal in a Heterogeneous
    Landscape. Science 294, 813-817.
  • S. Cauchemez and N.M. Ferguson (2008).
    Likelihood-based estimation of continuous-time
    epidemic models from time-series data:
    application to measles transmission in London.
    Journal of the Royal Society Interface 5,
    885-897.
  • D. Clancy and P.D. O'Neill (2008) Bayesian
    estimation of the basic reproduction number in
    stochastic epidemic models. Bayesian Analysis, in
    press.