State Space Models for Survival Analysis

1
State Space Models for Survival Analysis
  • Weiming Ke
  • The University of Memphis
  • (Joint work with Dr. Wai-yuan Tan)
  • June 29, 2004

2
Outline
  • Introduction
  • State Space Model
  • Estimation of Parameters
    - Multi-level Gibbs Sampling Procedure
    - Weighted Bootstrap Procedure
  • Estimation of the Survival Probabilities
  • An Illustrative Example
  • Computer Simulation
  • Summary

3
Introduction
  • Many diseases such as AIDS, cancer and infectious
    diseases are often very complicated biologically.
  • Most of these diseases are complex stochastic
    processes whose unknown parameters are often
    very difficult to estimate, especially when
    few data are available.

4
Introduction (cont.)
  • In this article we propose a state space modeling
    approach by combining stochastic models with
    statistical models to describe the process of a
    disease.
  • Then we will apply the Gibbs sampling method and
    the weighted bootstrap method to estimate the
    unknown parameters and the state variables of the
    model.
  • By using these estimates, we can validate the
    model and then estimate the survival
    probabilities.

5
State Space Model
  • The state space model of a system consists of
    two sub-models:
  • 1. The stochastic system model, which is the
    stochastic model of the system.
  • 2. The observation model, which is a statistical
    model relating some available data to the
    system.
  • It extracts biological information from the
    system via its stochastic system model and
    integrates this information with those from the
    data through its observation model.

6
State Space Model (cont.)
  • The state space model was originally proposed by
    Kalman in the early 60s for engineering control
    and communication (Kalman, 1960). Since then it
    has been successfully used as a powerful tool in
    aerospace research.
  • It was first proposed by Tan and his associates
    for AIDS and cancer research (Tan et al., 1998,
    1999, 2000, 2001, 2002).
  • Apparently state space models can be extended to
    other diseases as well, such as infectious
    diseases.

7
Advantages of the State Space Models
  • The state space model of a system is advantageous
    over the stochastic model of the system alone or
    the statistical model of the system alone since
    it combines information and advantages from both
    of these models.
  • The following are some specific advantages of
    state space models:

8
Advantages of the State Space Models (cont.)
  • The state space model provides an optimal
    procedure for updating the model with new data
    that may become available in the future. This is
    the smoothing step of the state space model.
  • The state space model provides an optimal
    procedure via the Gibbs sampling method and the
    weighted bootstrap method to estimate
    simultaneously the unknown parameters and the
    state variables of interest.

9
The stochastic system model
  • In many cases, we can derive stochastic equations
    for the state variables of a system from the
    basic biological mechanisms of the disease.
  • We will illustrate the state space modeling
    approach by using a birth-death-illness-cure
    process for a disease such as tuberculosis.

10
The stochastic system model (cont.)
  • Consider a population of individuals who are at
    risk for a disease, such as tuberculosis.
  • In this population, there are two types of
    people: the normal healthy people, who do not
    have the disease, and the sick people, who have
    contracted the disease.
  • Let N1(t) and N2(t) denote the numbers of the
    normal people and the sick people respectively in
    the population at time t.

11
The stochastic system model (cont.)
  • To derive stochastic differential equations for
    the state variables N1(t) and N2(t), the
    following transition variables are used:
  • F1(t) = number of normal healthy people who
    become sick during [t, t + Δt),
  • F2(t) = number of sick people who are cured by
    the drug during [t, t + Δt),
  • B1(t), B2(t) = numbers of births of N1 and N2
    people during [t, t + Δt),
  • D1(t), D2(t) = numbers of deaths of N1 and N2
    people during [t, t + Δt),
  • R1(t), R2(t) = numbers of immigrants of N1 and
    N2 people during [t, t + Δt).

12
The stochastic system model (cont.)
  • Let a1 denote the disease rate and a2 the cure
    rate; that is, a1 and a2 are the transition
    rates of N1 → N2 and N2 → N1, respectively.
  • Let b1, d1 and λ1 denote the birth rate, death
    rate and immigration rate of the N1 people.
  • Let b2, d2 and λ2 denote the birth rate, death
    rate and immigration rate of the N2 people.

13
The stochastic system model (cont.)
  • Assume that during the time interval [t, t + Δt),
    the birth-death-illness-cure processes follow
    multinomial distributions with parameters
    {b1Δt, d1Δt, a1Δt} and {b2Δt, d2Δt, a2Δt}.
  • Assume that immigration follows Poisson
    distributions with means λ1(t)Δt and λ2(t)Δt.

14
The stochastic system model (cont.)
  • Then, given N1(t) and N2(t), the conditional
    probability distributions of {B1(t), D1(t),
    F1(t)} and {B2(t), D2(t), F2(t)} are given by
  • $\{B_1(t), D_1(t), F_1(t)\} \mid N_1(t) \sim \mathrm{Multinomial}\{N_1(t);\ b_1\Delta t,\ d_1\Delta t,\ a_1\Delta t\}$,
  • $\{B_2(t), D_2(t), F_2(t)\} \mid N_2(t) \sim \mathrm{Multinomial}\{N_2(t);\ b_2\Delta t,\ d_2\Delta t,\ a_2\Delta t\}$.

15
The stochastic system model (cont.)
  • The conditional probability distributions of
    R1(t) and R2(t) are given respectively by
  • $R_1(t) \mid N_1(t) \sim$ Poisson with mean $N_1(t)\,\lambda_1(t)\,\Delta t$,
  • $R_2(t) \mid N_2(t) \sim$ Poisson with mean $N_2(t)\,\lambda_2(t)\,\Delta t$.

16
The stochastic system model (cont.)
  • Then, we have the following stochastic
    differential equations for N1(t) and N2(t):
  • $N_1(t + \Delta t) = N_1(t) + R_1(t) + F_2(t) + B_1(t) - F_1(t) - D_1(t)$,
  • $N_2(t + \Delta t) = N_2(t) + R_2(t) + F_1(t) + B_2(t) - F_2(t) - D_2(t)$.
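
The transition scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names (`draw_poisson`, `draw_multinomial`, `step`) and the dictionary keys are hypothetical, the immigration mean follows the conditional form N(t)·rate·Δt, and the rate values are the ones listed later in the talk for the computer simulation, with birth rates set to 0 as an assumption.

```python
import math
import random


def draw_poisson(rng, mean):
    """Poisson draw by Knuth's product-of-uniforms method (small means only)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def draw_multinomial(rng, n, probs):
    """Assign each of n people to at most one event; probs must sum to <= 1."""
    counts = [0] * len(probs)
    for _ in range(n):
        u, cumulative = rng.random(), 0.0
        for j, p in enumerate(probs):
            cumulative += p
            if u < cumulative:
                counts[j] += 1
                break
    return counts


def step(rng, n1, n2, r, dt=1.0):
    """Advance (N1, N2) over [t, t + dt) with the two balance equations."""
    births1, deaths1, sick1 = draw_multinomial(
        rng, n1, [r["b1"] * dt, r["d1"] * dt, r["a1"] * dt])
    births2, deaths2, cured2 = draw_multinomial(
        rng, n2, [r["b2"] * dt, r["d2"] * dt, r["a2"] * dt])
    immig1 = draw_poisson(rng, n1 * r["lam1"] * dt)
    immig2 = draw_poisson(rng, n2 * r["lam2"] * dt)
    new_n1 = n1 + immig1 + cured2 + births1 - sick1 - deaths1
    new_n2 = n2 + immig2 + sick1 + births2 - cured2 - deaths2
    return new_n1, new_n2


# Assumed example values (birth rates set to 0 for illustration).
rates = {"b1": 0.0, "b2": 0.0, "d1": 0.04, "d2": 0.1,
         "a1": 0.2, "a2": 0.4, "lam1": 0.05, "lam2": 0.03}
rng = random.Random(0)
state = (1000, 10)
path = [state]
for _ in range(10):
    state = step(rng, state[0], state[1], rates)
    path.append(state)
```

Because each person experiences at most one event per interval, the counts leaving a compartment can never exceed its size, so the simulated numbers stay non-negative.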

17
The observation model
  • If some observed data are available from this
    system, then we can derive some statistical
    models to relate the data to the system.
  • For the observation model, assume that observed
    numbers of N1 and N2 people at times t_k,
    k = 1, ..., n, are available.
  • Let Y1(k) and Y2(k) be the observed numbers of N1
    and N2 people at time t_k.

18
The observation model (cont.)
  • Assume that $\epsilon_1(k)$ and $\epsilon_2(k)$
    are normally distributed with mean 0 and
    variances $\sigma_1^2$ and $\sigma_2^2$,
    independently for $k = 1, \ldots, n$. The
    observation models are represented by the
    statistical models
  • $Y_1(k) = N_1(t_k) + [N_1(t_k)]^{1/2}\,\epsilon_1(k)$,
  • $Y_2(k) = N_2(t_k) + [N_2(t_k)]^{1/2}\,\epsilon_2(k)$,
  • for $k = 1, \ldots, n$.
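
As a small illustration of this observation model, the sketch below generates one pair of observations and evaluates the implied normal density. It assumes the form Y = N + N^(1/2)·e with e normal with mean 0; the function names `observe` and `obs_log_density` are hypothetical.

```python
import math
import random


def observe(rng, n1, n2, sigma1, sigma2):
    """Generate (Y1, Y2) given the true counts (N1, N2) at one time point."""
    y1 = n1 + math.sqrt(n1) * rng.gauss(0.0, sigma1)
    y2 = n2 + math.sqrt(n2) * rng.gauss(0.0, sigma2)
    return y1, y2


def obs_log_density(y, n, sigma):
    """Log density of Y given N = n: Y is normal with mean n, variance n * sigma^2."""
    var = n * sigma ** 2
    return -0.5 * math.log(2.0 * math.pi * var) - (y - n) ** 2 / (2.0 * var)
```

The scaling by the square root of N means the observation variance grows in proportion to the size of the compartment being counted.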

19
Estimation of Parameters
  • From the state space models, we can estimate
    simultaneously the unknown parameters and the
    state variables through the multi-level Gibbs
    sampling method and the weighted bootstrap
    method.
  • For the above example, the birth rates b1, b2,
    death rates d1, d2 and immigration rates
    λ1, λ2 of the normal and sick people, and the
    transition rates a1, a2 of N1 → N2 and N2 → N1,
    are the parameters to be estimated.

20
Estimation of Parameters (cont.)
  • To illustrate, let X be the collection of all the
    state variables {N1(t), N2(t)}, T the collection
    of all unknown parameters {b1, b2, d1, d2, λ1,
    λ2, a1, a2}, and Y the collection of vectors of
    observed data {Y1(k), Y2(k), k = 1, ..., n}.
  • Let P(T) be the prior distribution of the
    parameters T, P(X | T) the conditional
    probability density of X given the parameters T,
    and P(Y | X, T) the conditional probability
    density of Y given X and T.

21
Estimation of Parameters (cont.)
  • The joint probability density function of (X, Y,
    T) is
  • $P(X, Y, T) = P(T)\,P(X \mid T)\,P(Y \mid X, T)$.
  • From the above, we can derive the conditional
    probability density function P(X | T, Y) of X
    given (T, Y) and the conditional probability
    density function P(T | X, Y) of T given (X, Y),
    respectively, as
  • $P(X \mid T, Y) \propto P(X \mid T)\,P(Y \mid X, T)$,
  • and $P(T \mid X, Y) \propto P(T)\,P(X \mid T)\,P(Y \mid X, T)$.

22
Gibbs sampling procedure
  • The multi-level Gibbs sampling method is an
    extension of the Gibbs sampling method to the
    multivariate cases.
  • The method was first proposed by Sheppard
    (1994).
  • This method is a useful tool for estimating the
    unknown parameters and state variables through a
    sequential procedure.

23
Gibbs sampling procedure (cont.)
  • The multi-level Gibbs sampling procedure for
    estimating the unknown parameters T and the state
    variables X is given by the following loop:
  • (1) Given initial values of T and observed data
    Y, generate X from P(X | Y, T) through the
    weighted bootstrap method due to Smith and
    Gelfand (1992).

24
Gibbs sampling procedure (cont.)
  • (2) Generate T from the conditional distribution
    P(T | X, Y), where X is the value obtained
    in step (1).
  • (3) Using T obtained from (2) as initial values,
    go back to (1) to generate X, and repeat the
    (1), (2) loop until convergence.

25
Gibbs sampling procedure (cont.)
  • At convergence, we then generate a random sample
    of X from the conditional distribution P(X | Y),
    and a random sample of T from the posterior
    distribution P(T | Y).
  • Repeating these procedures, we then generate a
    random sample of X and a random sample of T. We
    may then use the sample means as the estimates of
    X and T, and use the sample variances as the
    variances of these estimates.

26
Weighted bootstrap procedure
  • In practice it is often very difficult to
    derive P(X | Y, T), whereas it is easy to
    generate X from P(X | T).
  • We therefore developed an indirect method, using
    the weighted bootstrap method due to Smith and
    Gelfand (1992), to generate X from P(X | Y, T)
    through the generation of X from P(X | T).
  • The proof of this algorithm has been given in Tan
    (2002).

27
Weighted bootstrap procedure (cont.)
  • The algorithm of the weighted bootstrap method is
    given by the following steps:
  • (a) Given T and X(i), generate a large
    random sample of size m on X(i+1) by using
    P{X(i+1) | X(i)} from the stochastic system
    model; denote it by
  • $\{X^{(1)}(i+1), \ldots, X^{(m)}(i+1)\}$.
  • (b) Compute $w_k = P\{Y(i+1) \mid X^{(k)}(i+1), T\}$
    by using the statistical model, and compute
  • $q_k = w_k / (w_1 + w_2 + \cdots + w_m)$
  • for $k = 1, \ldots, m$.

28
Weighted bootstrap procedure (cont.)
  • (c) Construct a population $\Pi$ with elements
    $\{E_1, \ldots, E_m\}$ and with $P(E_k) = q_k$.
  • Draw an element randomly from $\Pi$. If the
    outcome is $E_k$, then $X^{(k)}(i+1)$ is the
    element of X(i+1) generated from the conditional
    distribution of X given the observed data Y and
    the parameter T.
  • (d) Start with i = 1 and repeat (a)-(c) until
    i = t_M to generate a random sample of X from
    P(X | Y, T).
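
Steps (a) to (c) for a single time point can be sketched as follows. This is an illustrative sketch only: `weighted_bootstrap_draw`, `propose` and `weight` are hypothetical names, `propose` stands in for a draw from the stochastic system model given X(i) and T, and `weight` stands in for the observation density of Y(i+1) given a candidate.

```python
import math
import random


def normal_obs_density(y, n, sigma):
    """Density of Y = n + sqrt(n) * e, with e normal, mean 0, variance sigma^2."""
    var = n * sigma ** 2
    return math.exp(-((y - n) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def weighted_bootstrap_draw(rng, propose, weight, m=1000):
    """Return one draw approximately from the conditional law of X(i+1) given Y."""
    candidates = [propose(rng) for _ in range(m)]   # step (a): sample of size m
    w = [weight(x) for x in candidates]             # step (b): unnormalised weights
    total = sum(w)
    if total <= 0.0:
        raise ValueError("all candidates have zero weight")
    q = [wk / total for wk in w]                    # step (b): normalised q_k
    return rng.choices(candidates, weights=q, k=1)[0]  # step (c): resample


# Toy usage: candidates for a count near 100, observed value 100.0 (made-up numbers).
rng = random.Random(0)
draw = weighted_bootstrap_draw(
    rng,
    propose=lambda r: r.randint(80, 120),
    weight=lambda x: normal_obs_density(100.0, x, 0.5),
)
```

Step (d) would call this function once per time point, feeding each accepted draw into the proposal for the next time point.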

29
Estimating Parameters
  • Now we explain how to generate the parameters
    T = {T1, T2, ..., Tk} from P(T | X, Y) by using
    the multi-level Gibbs sampling method.
  • With no loss of generality, we illustrate the
    method with k = 3 and write T = {T1, T2, T3}.
  • The procedure for estimating the parameters goes
    through the following loop:

30
Estimating Parameters (cont.)
  • (1) Given initial values of T and X, generate
    T1 from P(T1 | Y, X, T2, T3), and
    denote it by T1(1).
  • (2) Generate T2 from P(T2 | Y, X, T1(1), T3) and
    denote it by T2(1), where T1(1) is the value
    obtained in step (1).

31
Estimating Parameters (cont.)
  • (3) Generate T3 from P(T3 | Y, X, T1(1), T2(1))
    and denote it by T3(1), where T1(1) and T2(1) are
    the values obtained in (1) and (2).
  • (4) Using {T1(1), T2(1), T3(1)} obtained from
    (1)-(3) as initial values, generate X from
    P(X | Y, T) and denote it by X(1).

32
Estimating Parameters (cont.)
  • (5) Using T = {T1(1), T2(1), T3(1)} and X = X(1)
    obtained from (1)-(4) as initial values, go
    back to repeat (1)-(4) to generate T1(2),
    T2(2), T3(2) and X(2), and repeat the loop
    until convergence.
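
The loop in steps (1) to (5) has the following general shape. This is a sketch under stated assumptions, not the authors' implementation: `gibbs`, `component_samplers` and `state_sampler` are hypothetical names, a fixed burn-in replaces a formal convergence check, and the toy conditionals (a bivariate normal with one parameter component) merely stand in for the model's conditional distributions, with the data Y held fixed inside the samplers.

```python
import random
import statistics


def gibbs(rng, theta0, x0, component_samplers, state_sampler, n_iter, burn_in):
    """Cycle through the parameter components, then the states, n_iter times."""
    theta, x = list(theta0), x0
    theta_draws, x_draws = [], []
    for it in range(n_iter):
        # Steps (1)-(3): update each component given X and the other components.
        for j, sample_j in enumerate(component_samplers):
            theta[j] = sample_j(rng, theta, x)
        # Step (4): update the states given the refreshed parameters.
        x = state_sampler(rng, theta)
        if it >= burn_in:  # fixed burn-in stands in for a convergence check
            theta_draws.append(tuple(theta))
            x_draws.append(x)
    return theta_draws, x_draws


# Toy target (assumed): (T1, X) bivariate normal, unit variances, correlation RHO.
RHO = 0.5
SD = (1.0 - RHO ** 2) ** 0.5
draws_t, draws_x = gibbs(
    random.Random(0), [0.0], 0.0,
    component_samplers=[lambda rng, theta, x: rng.gauss(RHO * x, SD)],
    state_sampler=lambda rng, theta: rng.gauss(RHO * theta[0], SD),
    n_iter=5000, burn_in=1000,
)
t1_values = [t[0] for t in draws_t]
t1_mean = statistics.mean(t1_values)      # point estimate of T1
t1_var = statistics.variance(t1_values)   # spread of the retained draws
```

In the state space setting, `state_sampler` would be the weighted bootstrap procedure and each entry of `component_samplers` would draw one parameter from its full conditional distribution.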

33
Estimating Parameters (cont.)
  • At convergence, we can generate a random sample
    of X from the conditional distribution P(X | Y)
    of X given Y, and a random sample of T from the
    posterior distribution P(T | Y) of T given Y.
  • Repeating these procedures we then generate a
    random sample of X and a random sample of T.

34
Estimating Parameters (cont.)
  • We may then use the sample means to derive the
    estimates of the parameters T and the state
    variables X, and use the sample variances as the
    variances of these estimates.
  • The convergence of these procedures is proved by
    using the basic theory of homogeneous Markov
    chains; see Tan (2002, Chapter 3).

35
Estimation of the survival probabilities
  • By using the estimates of the parameters, we can
    estimate the survival probabilities.
  • We will illustrate the procedure by using the
    same example.
  • Let d1 and d2 denote the death rates of the
    normal and sick people, respectively, and let
    a1 and a2 be the transition rates of N1 → N2
    and N2 → N1, respectively.

36
Estimation of the survival probabilities (cont.)
  • Let S1(t) and S2(t) denote the probabilities
    that a normal person and a sick person,
    respectively, survive to time t when the
    population is at risk for the disease.
  • Then we have the following system of equations:
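
The system itself is not reproduced in this transcript. As a hedged sketch, assuming the standard backward equations for a two-state process in which a normal person dies at rate d1 or becomes sick at rate a1, and a sick person dies at rate d2 or is cured at rate a2, the system would read:

```latex
\begin{aligned}
\frac{d}{dt} S_1(t) &= -(a_1 + d_1)\, S_1(t) + a_1\, S_2(t), \\
\frac{d}{dt} S_2(t) &= a_2\, S_1(t) - (a_2 + d_2)\, S_2(t),
\qquad S_1(0) = S_2(0) = 1 .
\end{aligned}
```

Under this assumption the characteristic roots are $-\tfrac{1}{2}(a_1 + a_2 + d_1 + d_2 \pm \omega)$ with $\omega^2 = (a_1 - a_2 + d_1 - d_2)^2 + 4a_1a_2 = (a_1 + a_2 + d_1 - d_2)^2 + 4a_2(d_2 - d_1)$, which is the same combination of rates that appears in the quantity w on the following slide.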

37
Estimation of the survival probabilities (cont.)
  • The solution of the above equations is expressed
    in terms of
  • $\omega = \{(a_1 + a_2 + d_1 - d_2)^2 + 4a_2(d_2 - d_1)\}^{1/2}$,
  • $\delta_1 = a_1 + a_2 + d_1 + d_2 + \omega$,
  • $\delta_2 = a_1 + a_2 + d_1 + d_2 - \omega$.
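
As a numerical illustration only, the sketch below evaluates survival curves under an assumed system S1' = -(a1 + d1) S1 + a1 S2, S2' = a2 S1 - (a2 + d2) S2 with S1(0) = S2(0) = 1. Under that assumption each curve is a mixture of two exponentials with decay rates (a1 + a2 + d1 + d2 ± w)/2. The function name `survival` and the derivation of the mixing weights belong to this sketch, not to the slides; the rate values in the usage line are the ones listed for the computer simulation.

```python
import math


def survival(t, a1, a2, d1, d2):
    """Return (S1(t), S2(t)) under the assumed two-state backward equations."""
    total = a1 + a2 + d1 + d2
    omega = math.sqrt((a1 + a2 + d1 - d2) ** 2 + 4.0 * a2 * (d2 - d1))
    fast = 0.5 * (total + omega)  # larger decay rate
    slow = 0.5 * (total - omega)  # smaller decay rate

    def mix(d):
        # Weights chosen so that S(0) = 1 and S'(0) = -d.
        c_fast = (d - slow) / omega
        return c_fast * math.exp(-fast * t) + (1.0 - c_fast) * math.exp(-slow * t)

    return mix(d1), mix(d2)


# Survival probabilities after 5 time units for the simulation's rate values.
s1_at_5, s2_at_5 = survival(5.0, a1=0.2, a2=0.4, d1=0.04, d2=0.1)
```

The sketch requires a1 and a2 to be positive so that the two decay rates are distinct.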

38
An illustrative example
  • To illustrate the above methods, consider the
    disease tuberculosis (TB) which is curable by
    drugs.
  • Table 1 gives the numbers of TB cases in the
    USA from 1980 to 1992 reported by the CDC,
    together with the total USA population sizes
    over these years (CDC Report, 1993).
  • Figure 1 shows the numbers of TB people.
  • In this data set, the number of TB cases in the
    USA declined to its lowest level in 1985 and
    then increased, presumably due to the effects
    of HIV (CDC Report, 1993).

39
Table 1. Observed numbers of total people, TB people and normal people

40
Figure 1. Observed numbers of TB people

41
An illustrative example (cont.)
  • To fit these data, we assume that the
    infection rate is a1 = a1(1) before January 1985
    and a1 = a1(2) after January 1985.
  • Assume that other parameters are not affected by
    HIV and other factors. Because TB is rare
    in children, we may ignore births, so that the
    unknown parameters are T = {λ1, λ2, d1, d2,
    a1(1), a1(2), a2}.
  • Let t0 = 0 denote January 1980, so that
    N1(0) = 226517805 and N2(0) = 28000.

42
An illustrative example (cont.)
  • Using the data given in Table 1, we apply the
    Gibbs sampling method and the weighted bootstrap
    method to estimate the unknown parameters and
    the state variables.
  • Because we do not have previous knowledge about
    the parameters, we assumed a non-informative
    uniform prior for the parameters.
  • The estimates of the parameters are given in
    Table 2.
  • The estimates of the numbers of TB people are
    plotted in Figure 2, together with the observed
    numbers.
  • The estimates of the survival probabilities
    of the sick people are plotted in Figure 3.

43
Table 2. Estimates of parameters and standard errors

44
Figure 2. Estimated and observed numbers of TB people (--- Estimated --- Observed)

45
Figure 3. Estimates of the survival probabilities of TB people

46
An illustrative example (cont.)
  • Figure 2 shows that the estimated numbers
    of sick people are close to the observed
    numbers.
  • In Table 2, the estimate of a1(2) is slightly
    greater than that of a1(1), indicating that
    since 1985 HIV and/or other causes have
    increased the infection rate of TB slightly.

47
Computer Simulation
  • To further examine this approach, we assumed
    some parameter values and generated Monte Carlo
    data by computer.
  • The generated numbers are given in Table 3.
  • The values of the parameters for generating these
    data are λ1 = 0.05, λ2 = 0.03,
    d1 = 0.04, d2 = 0.1, a1 = 0.2, a2 = 0.4,
    N1(0) = 1000, N2(0) = 10.

48
Table 3. Generated numbers of normal people and sick people

49
Computer Simulation (cont.)
  • Using the data in Table 3 and assuming a
    non-informative uniform prior for the parameters,
    we have applied the Gibbs sampling procedure and
    the weighted bootstrap procedure to estimate the
    parameters and the state variables.
  • Given in Table 4 are the estimates of the
    parameters and their true values.
  • Plotted in Figure 4 are the estimates of the
    numbers of the sick people together with the
    generated numbers.
  • Plotted in Figure 5 are the estimates of the
    survival probabilities.

50
Table 4. Estimates of parameters and standard errors

51
Figure 4. Estimated and observed numbers of sick people (--- Estimated --- Observed)

52
Figure 5. Estimates of the survival probabilities of sick people

53
Computer Simulation (cont.)
  • The results in Table 4 show that the
    estimates are very close to their true values.
  • Figure 4 shows that the estimates of the state
    variables are also very close to the generated
    numbers.
  • These results indicate that the proposed methods
    recover both the parameters and the state
    variables well in this simulation.

54
Summary
  • In this article, we have developed a state space
    model for a birth-death-illness-cure process.
  • We have developed a procedure (the Gibbs
    sampling method together with the weighted
    bootstrap method) to estimate the unknown
    parameters and the state variables, and hence the
    survival probabilities.
  • The numerical example and the computer simulation
    indicate that the methods are quite useful and
    promising.

55
Discussion
  • In recent years, Tan and his associates have
    developed some state space models for cancers and
    AIDS (Tan and Chen, 1998; Tan and Xiang, 1998;
    Tan and Xiang, 1999; Tan and Ye, 2000; Wu and
    Tan, 2000; Tan et al., 2001; Tan et al., 2002).
  • The present article extends this modeling
    approach to other human diseases such as
    tuberculosis. This type of modeling approach
    should also be useful for other diseases, such
    as heart disease, and for risk assessment of
    environmental agents. More research is needed
    in this respect.

56
  • Thank you!