Design of Experiments for Generalised Linear Models

Provided by: johnecc


1
Design of Experiments for Generalised Linear
Models
  • John Eccleston
  • University of Queensland
  • (Sue Lewis and Dave Woods (U Southampton, UK), Ken
    Russell (UOW))

2
Motivation
Potato packaging: an investigation of several factors
that influence the protected-atmosphere packaging of
potatoes with a one-week shelf life. After one week each
package was checked for the presence or absence (coded
0 or 1) of liquid in the pack. Variables of interest:
vitamin concentration in the pre-packaging dip and the
levels of two gases in the packing atmosphere.

3
Motivation (cont.)
A central composite design (CCD) was used, with each
factor at two levels, axial points and two centre points:
a 16-point design. This is a standard response surface
(RSM) design, which is optimal under the assumption of an
additive linear model and normal data. MLEs could not be
obtained, so penalised likelihood was used to obtain
estimates and standard errors. Some estimates were zero
and all standard errors were large. It was a poor
experiment: the optimality of the standard design assumes
a standard (normal) response.

4
Motivation (cont.)
Chemistry experiment. Wanted: the probability that a new
product will be formed in a chemical reaction. There are
48 wells in a tray and 4 factors/variables: type of
solvent, volume of composition, evaporation rate and rate
of agitation during mixing. The response is the presence
or absence of a new product (0 or 1). Three replicates of
a $2^4$ factorial would be the normal-theory experimental
design (a standard screening design).

5
GLMs
  • Model
  • $E(Y) = \mu$
  • Linear predictor: $\eta = X\beta$
  • Link function: $\eta = g(\mu)$
  • Bernoulli data: $p = g^{-1}(\eta)$
  • Logit link: $g(p) = \log[p/(1-p)]$
  • Probit link: $g(p) = \Phi^{-1}(p)$
  • Compl. log-log link: $g(p) = \log[-\log(1-p)]$
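
The three link functions listed on this slide, with their inverses, can be written out as a minimal plain-Python sketch. The function names are illustrative choices, not taken from the talk; the probit link relies on `statistics.NormalDist` from the standard library.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def logit(p):
    """Logit link: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def probit(p):
    """Probit link: inverse of the standard normal CDF."""
    return _STD_NORMAL.inv_cdf(p)


def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


def inv_logit(eta):
    """Inverse logit: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse probit: p = Phi(eta)."""
    return _STD_NORMAL.cdf(eta)


def inv_cloglog(eta):
    """Inverse complementary log-log: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


# Round trip: each inverse link should recover the probability it started from.
for link, inverse in [(logit, inv_logit), (probit, inv_probit), (cloglog, inv_cloglog)]:
    print(link.__name__, round(inverse(link(0.3)), 6))
```

The logit and probit links are symmetric about p = 0.5 (both map 0.5 to 0), whereas the complementary log-log link is not.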

6
Estimation
  • The MLEs of the coefficients, the $\beta$s, are found by
    iterative procedures
  • The (asymptotic) variance-covariance matrix of the MLE of $\beta$ is $(X^\top W X)^{-1}$
  • where
  • $X^\top W X$ is the information matrix (denoted by $I$)
  • $X$ is the design matrix
  • $W$ is a diagonal weight matrix, e.g. for the logit link $w = p(1-p)$
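
As a rough illustration of these quantities, the sketch below builds the information matrix for a logit model on a $2^4$ factorial in coded units and evaluates its determinant. It is plain Python with a small hand-written determinant routine; the helper names and the second parameter vector (`beta_far`) are assumptions made for the example, not values from the talk.

```python
import math
from itertools import product


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]  # work on a copy
    n, d = len(a), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0  # exactly singular
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d  # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def logit_information(X, beta):
    """Return X^T W X with W = diag(p_i (1 - p_i)) for a logit model."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for row in X:
        eta = sum(x * b for x, b in zip(row, beta))
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)
        for i in range(k):
            for j in range(k):
                info[i][j] += w * row[i] * row[j]
    return info


# 2^4 factorial in coded units (-1/+1) with an intercept column: 16 runs, 5 parameters
X = [[1.0, *levels] for levels in product((-1.0, 1.0), repeat=4)]

beta_null = [0.0] * 5                  # every p = 0.5, so every weight is 0.25
beta_far = [3.0, 1.0, 1.0, 1.0, 1.0]   # hypothetical values pushing many p towards 1

d_null = det(logit_information(X, beta_null))
d_far = det(logit_information(X, beta_far))
print(d_null, d_far)
```

With all parameters at zero the information matrix is 0.25 times X'X, the normal-theory case up to a constant. Any other parameter vector shrinks the weights and hence the determinant for the same design.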

7
GLM nonlinear model
  • To find the optimal $X$ we must have knowledge of $\beta$,
    because the weights in $W$ depend on $\beta$; this is the
    major difference between linear and non-linear optimal design
  • Designs optimal for a specific $\beta$ are called
    locally optimal
  • We may also want to distinguish between link
    functions
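
To make local optimality concrete, here is a small sketch for a one-variable logit model with linear predictor b0 + b1*x and a two-point design. For two points the determinant of X'WX reduces to w1 * w2 * (x2 - x1)^2, so a grid search is enough. The design region [-3, 3], the grid step and the two parameter vectors are assumptions chosen for illustration.

```python
import math


def logit_weight(eta):
    """Logit weight p(1 - p) at linear predictor eta."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return p * (1.0 - p)


def best_two_point_design(beta0, beta1, lo=-3.0, hi=3.0, step=0.02):
    """Grid search for the two-point design maximising det(X^T W X).

    With X = [[1, x1], [1, x2]] and W = diag(w1, w2),
    det(X^T W X) = det(X)^2 * det(W) = (x2 - x1)^2 * w1 * w2.
    Returns (D-value, x1, x2) with x1 < x2.
    """
    n = int(round((hi - lo) / step)) + 1
    xs = [lo + i * step for i in range(n)]
    ws = [logit_weight(beta0 + beta1 * x) for x in xs]
    best = (-1.0, None, None)
    for i in range(n):
        for j in range(i + 1, n):
            d = ws[i] * ws[j] * (xs[j] - xs[i]) ** 2
            if d > best[0]:
                best = (d, xs[i], xs[j])
    return best


# Two hypothetical parameter vectors give two different "optimal" designs.
design_a = best_two_point_design(0.0, 1.0)
design_b = best_two_point_design(1.0, 2.0)
print(design_a[1:], design_b[1:])
```

In both cases the search settles on points where the linear predictor is close to plus or minus 1.54, but the corresponding x values differ, which is why a design that is optimal for one parameter vector need not be good for another.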

8
Literature
  • Some references
  • Abdelbasit, K. M. and Plackett, R. L. (1983)
  • Minkin, S. (1987)
  • Ford, I., Torsney, B. and Wu, C. J. F. (1992)
  • Muller, W. G. and Ponce De Leon, A. C. M. (1996)
  • Torsney, B and Gunduz, N. (2000)
  • Review by Atkinson and Haines (1996) (Handbook of
    Stats Vol 13 Ch 14) includes GLMs, also Atkinson
    and Donev (1992)
  • These concentrate on continuous or approximate
    designs
  • The focus here is on discrete or exact designs

9
Performance of Factorial Designs
  • A GLM analysis is performed to estimate the
    probabilities (p) in both examples
  • How good are standard factorial or fractional
    factorial experiments for a binary response
    (logit model)?
  • Is it reasonable to use a standard design which
    is optimal under normal theory in a binary data
    situation?

10
Evaluation of Designs
  • D-optimality is used to compare designs
  • The D-value of a design is $\det(X^\top W X)$
  • $X$ is the design matrix
  • $W$ is a diagonal weight matrix, e.g. for the logit link $w = p(1-p)$
  • The design with the maximum D-value is the D-optimal design
  • d1 is better than d2 if the D-value of d1 > the D-value of
    d2, denoted D1 > D2

11
Simulation study: parameter space
  • First-order model: a standard $2^4$ factorial main-effects
    design as the initial experiment, d0
  • Chemistry example: four factors
  • Model: $\eta = \beta_0 + x_1\beta_1 + x_2\beta_2 + x_3\beta_3 + x_4\beta_4 = X\beta$
  • The response is binary
  • Investigate different parameter spaces
  • Compare designs using the D-value, D = det(information
    matrix); the design with the maximum D-value is D-optimal

12
Simulation (cont)
13
Simulation (cont)
  • A sample of 10,000 values of each of the $\beta$s is
    drawn at random from each of B1, B2, B3
  • The D-optimal design, ds, is computed (by algorithm)
  • The efficiency of the standard factorial d0 relative
    to ds is D0/Ds

14
Simulation Results - Efficiency for each
parameter space
  • B1: median 0.43, lower quartile 0.31, max close to 1,
    min very close to zero
  • P(no MLEs) = 0.77
  • B2: median 0.48, min 0.16, max 0.75
  • B3: median 0.07, max 0.28, min 0.003, P(no
    MLEs) approx. 1

15
Comments
  • Factorial designs can perform badly, especially when the
    logit model is poorly approximated by a linear
    model (p not close to 0.5), which is often the case.
  • In our examples factorial designs perform poorly:
  • i) chemistry: p high
  • ii) potato: p close to zero
  • So what should be done when there is no knowledge of the $\beta$s?

16
An Approach
  • optimise across various parameter spaces
  • Compound criterion (Atkinson and Cox (1974))
  • Consider n parameter sets
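  • Product of D-values: $\prod_{i=1}^{n} \det(I_i)^{1/p_i}$
  • $\det(I_i)^{1/p_i}$ is the D-value for model $i$, where $p_i$ is its number of parameters
  • Compromise design
  • Find the $X$ which maximises $\prod_{i=1}^{n} \det(I_i)^{1/p_i}$, i.e. the
    product of D-values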
  • The design is robust for several parameter sets
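
The compound criterion can be sketched on a toy problem: a one-variable logit model, symmetric two-point designs at -x and +x, and two candidate parameter sets. Everything specific here (the parameter sets, the grid of half-widths and the helper names) is an assumption for illustration; the talk's own search works over full multi-factor designs.

```python
import math


def d_value(xs, beta):
    """det(X^T W X)^(1/p) for the logit model b0 + b1*x at design points xs."""
    s0 = s1 = s2 = 0.0
    for x in xs:
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
        w = p * (1.0 - p)
        s0 += w
        s1 += w * x
        s2 += w * x * x
    det = s0 * s2 - s1 * s1          # determinant of the 2 x 2 information matrix
    return max(det, 0.0) ** (1.0 / len(beta))


def compound(xs, betas):
    """Product of D-values over a list of parameter sets."""
    out = 1.0
    for beta in betas:
        out *= d_value(xs, beta)
    return out


GRID = [0.02 * k for k in range(1, 151)]  # candidate half-widths x in (0, 3]


def best_half_width(betas):
    """Half-width x of the design {-x, +x} maximising the compound criterion."""
    return max(GRID, key=lambda x: compound([-x, x], betas))


def efficiency(x, x_opt, beta):
    """D-value of {-x, +x} relative to the locally optimal {-x_opt, +x_opt}."""
    return d_value([-x, x], beta) / d_value([-x_opt, x_opt], beta)


beta_a, beta_b = (0.0, 1.0), (0.0, 3.0)     # hypothetical parameter sets
x_a = best_half_width([beta_a])             # locally optimal for beta_a
x_b = best_half_width([beta_b])             # locally optimal for beta_b
x_c = best_half_width([beta_a, beta_b])     # compromise across both

print("local a, local b, compromise:", x_a, x_b, x_c)
print("compromise efficiencies:", efficiency(x_c, x_a, beta_a), efficiency(x_c, x_b, beta_b))
print("wrong-guess efficiencies:", efficiency(x_b, x_a, beta_a), efficiency(x_a, x_b, beta_b))
```

In this toy case the compromise half-width falls between the two locally optimal ones, and its worst-case efficiency is higher than that of either locally optimal design evaluated under the other parameter set.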

17
What to do? Compromise across a parameter set
  • Can the experimenter give some ranges for the $\beta$s? If
    so, then
  • this specifies a volume of potential $\beta$s
  • Compare:
  • $2^{5-3}$ fractional factorial: levels are the end points of
    the ranges of the $\beta$s
  • (9-design compromise)
  • Centroid design: levels are the midpoints of the ranges (one
    design)
  • Coverage design (9-design compromise)
  • Coverage design:
  • Select a random sample of $n_o$ sets of $\beta$s, one value for
    each $\beta$ from its respective range at equally spaced
    intervals
  • Find the optimal compromise design

18
Compromise Design Study
19
Compromise Design Study (cont.)
  • B1: centred on the origin, large volume; B2: centred on
    the origin, small volume; B3: large volume and centre
    far from the origin
  • Results
  • B2: use the centroid design
  • B3: use the coverage design
  • B1: grey area, centroid or coverage?
  • Both centroid and coverage are in general better
    than standard factorial

20
Comments
  • Our investigations reveal
  • If ranges centred around the origin and not too
    wide then centroid design is reasonable
  • Coverage design is reasonable otherwise.
  • Issues
  • Chance of poor design remains
  • The more components in the compromise criterion,
    the weaker the final compromise design
    (a trade-off)
  • Open problem

21
GLM nonlinear model
  • To find the optimal $X$ we must have knowledge of $\beta$;
    suppose it is known
  • Designs optimal for a specific $\beta$ are called
    locally optimal
  • We may also want to distinguish between link
    functions

22
Compromise across models and link functions
  • Suppose we have two variables x1 and x2
  • Explanatory model (linear or interaction)
  • $\beta_0 + x_1\beta_1 + x_2\beta_2$ OR $\beta_0 + x_1\beta_1 + x_2\beta_2 + \beta_{12}x_1x_2$
  • Link function either probit or comp. log-log
  • Want a design, X, which is efficient across
    models and link functions a compromise design

23
Example
  • n = 6 design points (in practice replicated)
  • $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ denote the linear and interaction
    model parameter values
  • $\boldsymbol{\beta}_1 = (3.0, 1.6, 4.1)$, $\boldsymbol{\beta}_2 = (1.2, 1.7, 5.4, -1.7)$
  • Probit and comp. log-log link functions
  • Models: s3 = (probit, lin), s4 = (probit, int),
    s5 = (comp, lin), s6 = (comp, int)
  • Compromise across the 4 models s3, s4, s5, s6
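
For links other than the logit, the weight for a binary response takes the general form w = (dp/d eta)^2 / (p(1 - p)). The sketch below evaluates the D-value of one candidate design under the four models s3 to s6, using the parameter values quoted on this slide. The 6-point candidate design and all helper names are hypothetical, chosen only so that the example runs; they are not the design found in the talk.

```python
import math


def probit_weight(eta):
    """w = phi(eta)^2 / (Phi(eta) * (1 - Phi(eta))), with tails computed via erfc."""
    phi = math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)
    lower = 0.5 * math.erfc(-eta / math.sqrt(2.0))  # Phi(eta)
    upper = 0.5 * math.erfc(eta / math.sqrt(2.0))   # 1 - Phi(eta)
    return phi * phi / (lower * upper)


def cloglog_weight(eta):
    """w = exp(2*eta - exp(eta)) / p, where p = 1 - exp(-exp(eta))."""
    p = -math.expm1(-math.exp(eta))
    return math.exp(2.0 * eta - math.exp(eta)) / p


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, d = len(a), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def d_value(points, beta, weight, interaction):
    """det(X^T W X)^(1/p) for a two-variable model, with or without x1*x2."""
    rows = [[1.0, x1, x2] + ([x1 * x2] if interaction else []) for x1, x2 in points]
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for r in rows:
        w = weight(sum(x * b for x, b in zip(r, beta)))
        for i in range(k):
            for j in range(k):
                info[i][j] += w * r[i] * r[j]
    return max(det(info), 0.0) ** (1.0 / k)


beta_lin = (3.0, 1.6, 4.1)         # linear model values from the slide
beta_int = (1.2, 1.7, 5.4, -1.7)   # interaction model values from the slide

# Hypothetical 6-point candidate design on [-1, 1]^2 (not from the talk)
points = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 0.0), (1.0, 0.0), (0.0, -0.5), (1.0, -0.5)]

d_values = {
    "s3": d_value(points, beta_lin, probit_weight, False),
    "s4": d_value(points, beta_int, probit_weight, True),
    "s5": d_value(points, beta_lin, cloglog_weight, False),
    "s6": d_value(points, beta_int, cloglog_weight, True),
}
compound = math.prod(d_values.values())  # objective a compromise search would maximise
print(d_values, compound)
```

A search routine would then vary the six design points to maximise `compound`; the locally optimal design for a single model maximises just that model's entry in `d_values`.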

24
Example (cont.)
  • Choose $X$ to maximise the product of the D-values of the four models,
  • $D_{s3} \cdot D_{s4} \cdot D_{s5} \cdot D_{s6}$
  • Compare the compromise design d(comp) against the
    optimal design for each model,
  • e.g. optimal design for s3 is denoted by d3
  • A computer algorithm developed to search for
    compromise and locally optimal designs
  • SA and CE routines

25
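Tables of Efficiencies
[Table of efficiencies by design: not included in the transcript]
  • The link function is not important
  • The design for the linear model has zero efficiency for the interaction model
  • The design for the interaction model has poor efficiency for estimating
    the linear model
  • The compromise design has good efficiency across all
    models, i.e. it is robust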

26
Simulation Study for bs
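  • A set of $\boldsymbol{\beta}_2$ values is drawn from $MN(0, \sigma^2 I)$
  • $\beta_{j1} = \beta_{j2} + z$, $z \sim N(0, \sigma_{j0}^2)$, $j = 1, 2, 3$
  • $\sigma_{j0}^2 = K \beta_{j2}$
  • For each set $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$, a compromise design, dc,
    and locally optimal designs, d1 and d2, are
    examined through relative efficiencies (ratios of
    optimality-criterion values)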
  • 10000 samples

27
Efficiencies from Simulation (median, min)
The compromise design is more robust to the choice of model
than the locally optimal designs, and also with respect to
parameter values. Median efficiencies are sensitive to the
value of K: as K increases, efficiency decreases; the
compromise design appears less sensitive (?).

28
Comments
  • Compromise design through compound criteria
    yields efficient designs here
  • If there is no knowledge of the $\beta$s then we need to optimise
    across the parameter space and the $X$ space (some results)
  • Applications (so far)
  • nonlinear models, e.g. pharmacokinetic/dynamic,
  • GLMs: chemical reactions, etc.
  • GLMs (Nonlinear models) present many interesting
    challenges
  • So far have considered simple models only

30
Potato Example
  • A CCD: a $2^3$ factorial with six axial points at
    ±1.218 and two centre points, a 16-point design.
  • Logit regression was to be fitted
  • Finite MLEs could not be obtained
  • Penalised likelihood (Firth (1993)) gave
    estimates and standard deviations for models
    m1 = linear, m2 = m1 + interaction terms and
    m3 = m2 + quadratic terms

32
Compromise Design Study
  • A comparison is made between designs that
    compromise across different forms of the linear
    predictor, the CCD used and locally optimal
    designs.
  • Use the estimates from the analysis to obtain a
    compromise design across m1, m2 and m3 and to
    obtain individual locally optimal designs.

33
Compromise Design
  • Compromise design
  • Maximise $\det(I_{m1})^{1/p_{m1}} \cdot \det(I_{m2})^{1/p_{m2}} \cdot \det(I_{m3})^{1/p_{m3}}$
  • The efficiency of design da compared to design db is
    D-value(da) / D-value(db).

34
Table of Efficiencies
[Table of efficiencies by design: not included in the transcript]
  • Comments
  • The compromise design is robust (high efficiency) across all
    models
  • Locally optimal designs are risky: zero efficiency for
    some models
  • The CCD has modest efficiency

35
Simulation Study
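  • Vary parameter values to determine whether the efficiency of dc
    is robust with respect to parameter values.
  • 10,000 iterations from $MVN(\boldsymbol{\beta}_3, \sigma_3)$,
  • with $\boldsymbol{\beta}_2$ and $\boldsymbol{\beta}_1$ perturbations of $\boldsymbol{\beta}_3$, as before
  • In general the compromise design is better than the CCD
    under each model (greater median, better spread)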
  • An obvious advantage is that a compromise design
    incorporates prior knowledge

36
Conclusions
  • Compromise design through compound criteria
    yields efficient designs here
  • Account for uncertainty in model and parameter
    values
  • If there is no knowledge of the $\beta$s then we need to optimise
    across the parameter space and the $X$ space (some results)
  • Continuing research
  • Qualitative variables, blocking
  • Other models, e.g. Poisson data, ordinal data
  • No prior knowledge case, random effects
  • General nonlinear models, compartmental models
  • Model discrimination etc., computational issues