Design of Experiments for Generalised Linear Models - PowerPoint PPT Presentation

About This Presentation

Title:

Design of Experiments for Generalised Linear Models

Slides: 37

Provided by: johnecc

Transcript and Presenter's Notes

Title: Design of Experiments for Generalised Linear Models

1
Design of Experiments for Generalised Linear Models

John Eccleston
University of Queensland
(with Sue Lewis and Dave Woods (University of Southampton, UK) and Ken Russell (UOW))

2
Motivation
Potato packaging: investigation of several factors that influence the protected-atmosphere packaging of potatoes (one-week shelf life).
After one week the package was checked for presence or absence (0 or 1) of liquid in the pack.
Variables of interest: vitamin concentration in the pre-packaging dip and the levels of two gases in the packing atmosphere.
3
Motivation (cont.)
A central composite design (CCD) with each factor at two levels, axial points and two centre points: a 16-point design.
This is a standard design (RSM), which is optimal under the assumption of an additive linear model and normal data.
MLEs could not be obtained; penalised likelihood was used to obtain estimates and standard errors.
Some estimates were zero, and all standard errors were large.
Poor experiment: the optimality of the standard design assumes a standard (normal) response.
4
Motivation (cont.)
Chemistry experiment
Want: the probability that a new product will be formed in a chemical reaction.
48 wells in a tray.
4 factors/variables: type of solvent, volume of composition, evaporation rate, rate of agitation during mixing.
Response: presence or absence of a new product (0 or 1).
3 replicates of a 2^4 factorial would be a normal-theory experimental design (standard screening design).
5
GLMs

Model
E(Y) = μ
Linear predictor: η = Xβ
Link function: η = g(μ)
Bernoulli data: p = g^(-1)(η)
Logit link: g(p) = log[p/(1 − p)]
Probit link: g(p) = Φ^(-1)(p)
Complementary log-log link: g(p) = log[−log(1 − p)]
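
A minimal Python sketch of the three link functions on this slide and their inverses, for a single Bernoulli probability p. The function names are illustrative and do not come from the presentation; the normal CDF is written with math.erfc so that only the standard library is assumed.

```python
import math

SQRT2 = math.sqrt(2.0)


def logit(p):
    """Logit link: g(p) = log[p / (1 - p)]."""
    return math.log(p / (1.0 - p))


def logit_inverse(eta):
    """Inverse logit: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit_inverse(eta):
    """Inverse probit: p = Phi(eta), the standard normal CDF."""
    return 0.5 * math.erfc(-eta / SQRT2)


def cloglog(p):
    """Complementary log-log link: g(p) = log[-log(1 - p)]."""
    return math.log(-math.log(1.0 - p))


def cloglog_inverse(eta):
    """Inverse complementary log-log: p = 1 - exp(-exp(eta))."""
    return -math.expm1(-math.exp(eta))


if __name__ == "__main__":
    # Probability implied by each link at a few values of the linear predictor.
    for eta in (-2.0, 0.0, 2.0):
        print(eta, logit_inverse(eta), probit_inverse(eta), cloglog_inverse(eta))
```

The logit and probit inverses are symmetric about a linear predictor of zero (both give p = 0.5 there), while the complementary log-log inverse is not, which is one reason a design may need to hedge across link functions.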

6
Estimation

The MLEs of the coefficients, β, are found by iterative procedures.
The variance-covariance matrix of the MLE of β is (X'WX)^(-1),
where
X'WX is the information matrix (denoted by I),
X is the design matrix,
W is a diagonal weight matrix, e.g. for the logit link w = p(1 − p).
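
As a small illustration of the information matrix X'WX for the logit link, the sketch below builds it for a 2^2 factorial with an intercept column. The design, the coefficient values and the function names are assumptions chosen for the example, not taken from the slides.

```python
import math


def logit_weights(X, beta):
    """Diagonal of W for the logit link: w = p(1 - p) at each design point."""
    weights = []
    for row in X:
        eta = sum(x * b for x, b in zip(row, beta))
        p = 1.0 / (1.0 + math.exp(-eta))
        weights.append(p * (1.0 - p))
    return weights


def information_matrix(X, weights):
    """Accumulate X'WX one design point at a time."""
    k = len(X[0])
    info = [[0.0] * k for _ in range(k)]
    for row, w in zip(X, weights):
        for i in range(k):
            for j in range(k):
                info[i][j] += w * row[i] * row[j]
    return info


# Hypothetical design: 2^2 factorial, columns are (intercept, x1, x2).
X = [[1.0, x1, x2] for x1 in (-1.0, 1.0) for x2 in (-1.0, 1.0)]

# The same design evaluated at two assumed coefficient vectors.
info_zero = information_matrix(X, logit_weights(X, [0.0, 0.0, 0.0]))
info_far = information_matrix(X, logit_weights(X, [3.0, 0.0, 0.0]))

if __name__ == "__main__":
    print("coefficients all zero:", info_zero)
    print("intercept = 3        :", info_far)
```

With all coefficients at zero every weight is 0.25 and the information matrix is the 3 x 3 identity; moving the intercept to 3 shrinks every weight, so the same X carries much less information. This dependence on the unknown coefficients is what separates the GLM design problem from the linear one.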

7
GLM: nonlinear model

To find the optimal X we must have knowledge of β; this is the major difference between linear and non-linear optimal design.
Designs optimal for a specific β are called locally optimal.
We may also want to distinguish between link functions.

8
Literature

Some references
Abdelbasit, K. M. and Plackett, R. L. (1983)
Minkin, S. (1987)
Ford, I., Torsney, B. and Wu, C. J. F. (1992)
Müller, W. G. and Ponce de Leon, A. C. M. (1996)
Torsney, B. and Gunduz, N. (2000)
Review by Atkinson and Haines (1996) (Handbook of Statistics, Vol. 13, Ch. 14) includes GLMs; also Atkinson and Donev (1992)
- these concentrate on continuous or approximate designs
- the focus here is on discrete or exact designs

9
Performance of Factorial Designs

A GLM analysis is performed to estimate the
probabilities (p) in both examples
How good are standard factorial or fractional
factorial experiments for a binary response
(logit model)?
Is it reasonable to use a standard design which
is optimal under normal theory in a binary data
situation?

10
Evaluation of Designs

D-optimality is used to compare designs.
The D-value of a design is det(X'WX), where
X is the design matrix and
W is a diagonal weight matrix, e.g. for the logit link w = p(1 − p).
The design with the maximum D-value is the D-optimal design.
d1 is better than d2 if the D-value of d1 > the D-value of d2, denoted D1 > D2.
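
The comparison rule can be sketched in a few lines of Python. The two candidate designs and the coefficient vector below are hypothetical, chosen only to show the mechanics of computing det(X'WX) for a logit model and ranking designs by it; the determinant routine is a plain Gaussian elimination so that no external library is assumed.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def d_value(X, beta):
    """D-value det(X'WX) for a logit model, with w = p(1 - p)."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for row in X:
        eta = sum(x * b for x, b in zip(row, beta))
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)
        for i in range(k):
            for j in range(k):
                info[i][j] += w * row[i] * row[j]
    return determinant(info)


# Assumed coefficients (intercept, x1, x2) and two hypothetical 4-point designs.
beta = [0.0, 1.0, 1.0]
d1 = [[1.0, x1, x2] for x1 in (-1.0, 1.0) for x2 in (-1.0, 1.0)]  # levels at -1, +1
d2 = [[1.0, x1, x2] for x1 in (-0.5, 0.5) for x2 in (-0.5, 0.5)]  # levels at -0.5, +0.5

D1 = d_value(d1, beta)
D2 = d_value(d2, beta)

if __name__ == "__main__":
    print("D1 =", D1, "D2 =", D2)
    print("d1 is better" if D1 > D2 else "d2 is better")
```

The ranking is only valid at the coefficient values used to compute the weights; a different choice of coefficients can reverse it.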

11
Simulation study: parameter space

First order: the standard 2^4 factorial main-effects design as an initial experiment, d0
Chemistry example: four factors
Model: η = β0 + β1x1 + β2x2 + β3x3 + β4x4 = Xβ
Response is binary
Investigate different parameter spaces
Compare designs using the D-value, D = det(information matrix); the design with the maximum D-value is D-optimal

12
Simulation (cont)
13
Simulation (cont)

A sample of 10,000 values of β is drawn at random from each of B1, B2 and B3.
For each sampled β, the D-optimal design, ds, is computed (by algorithm).
Efficiency of the standard factorial d0 compared to ds, as Ds/D0
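
A rough Python sketch of this kind of study, under several assumptions that are not in the slides: the parameter space is taken to be the hypothetical cube [-2, 2]^5, only 5 draws are used instead of 10,000, and the "optimal" design is approximated by a greedy coordinate exchange over three levels per factor, which is a stand-in and not the authors' algorithm. The ratio is oriented as D(d0)/D(ds) so that values lie in (0, 1], the scale on which the results are reported; that orientation is also an assumption.

```python
import math
import random


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def d_value(points, beta):
    """det(X'WX) for a first-order logit model with an intercept."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for pt in points:
        row = [1.0] + list(pt)
        eta = sum(x * b for x, b in zip(row, beta))
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)
        for i in range(k):
            for j in range(k):
                info[i][j] += w * row[i] * row[j]
    return determinant(info)


def improve_design(start, beta, levels=(-1.0, 0.0, 1.0), max_passes=3):
    """Greedy coordinate exchange: change one coordinate at a time, keep improvements."""
    design = [list(pt) for pt in start]
    best = d_value(design, beta)
    for _ in range(max_passes):
        improved = False
        for pt in design:
            for j in range(len(pt)):
                keep = pt[j]
                for level in levels:
                    pt[j] = level
                    value = d_value(design, beta)
                    if value > best * (1.0 + 1e-12):
                        best, keep, improved = value, level, True
                pt[j] = keep  # restore the best level found for this coordinate
        if not improved:
            break
    return design, best


rng = random.Random(1)
# d0: the 2^4 factorial in four factors (16 runs).
d0 = [(a, b, c, d) for a in (-1.0, 1.0) for b in (-1.0, 1.0)
      for c in (-1.0, 1.0) for d in (-1.0, 1.0)]

efficiencies = []
for _ in range(5):
    beta = [rng.uniform(-2.0, 2.0) for _ in range(5)]  # hypothetical parameter space
    _, Ds = improve_design(d0, beta)
    efficiencies.append(d_value(d0, beta) / Ds)

if __name__ == "__main__":
    print(sorted(efficiencies))
```

Because the search starts from d0 and only accepts improvements, the ratio cannot exceed 1 here; a proper search would usually find better designs and so report lower efficiencies for d0 than this sketch does.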

14
Simulation results: efficiency for each parameter space

B1: median 0.43, lower quartile 0.31, max close to 1, min very close to zero, P(no MLEs) = 0.77
B2: median 0.48, min 0.16, max 0.75
B3: median 0.07, max 0.28, min 0.003, P(no MLEs) approx. 1

15
Comments

Factorial designs can perform badly, especially when a logit model is poorly approximated by a linear model (p not close to 0.5), which is often the case.
In our examples factorial designs are no good:
i) chemistry: p high
ii) potato: p close to zero
So what to do in the case of no knowledge?

16
An Approach

Optimise across various parameter spaces
Compound criterion (Atkinson and Cox (1974))
Consider n parameter sets
Product of D-values: ∏ det(I_i)^(1/p_i), i = 1, ..., n
det(I_i)^(1/p_i) is the D-value for model i
Compromise design
Find X which maximises ∏ det(I_i)^(1/p_i), i.e. the product of D-values
The design is robust for several parameter sets
17
What to do? Compromise across a parameter set

Can the experimenter give some ranges for the βs? If so, then
this specifies a volume of potential βs.
Compare:
2^(5-3) fractional factorial: levels are the end points of the ranges of the βs (9-design compromise)
Centroid design: levels are the midpoints of the ranges (one design)
Coverage design (9-design compromise)
Coverage design:
Select a random sample of no sets of βs, one for each from the respective range at equally spaced intervals
Find the optimal compromise design

18
Compromise Design Study
19
Compromise Design Study (cont)

B1: centred on the origin, large volume; B2: centred on the origin, small volume; B3: large volume and centre far from the origin
Results
B2: use centroid design
B3: use coverage design
B1: grey area, centroid or coverage?
Both centroid and coverage designs are in general better than the standard factorial

20
Comments

Our investigations reveal:
If the ranges are centred around the origin and not too wide, then the centroid design is reasonable.
The coverage design is reasonable otherwise.
Issues
A chance of a poor design remains.
The more components in the compromise criterion, the weaker the final compromise design: a trade-off.
Open problem

21
GLM: nonlinear model

To find the optimal X we must have knowledge of β: suppose it is known.
Designs optimal for a specific β are called locally optimal.
We may also want to distinguish between link functions.

22
Compromise across models and link functions

Suppose we have two variables, x1 and x2
Explanatory model (linear or interaction):
η = β0 + β1x1 + β2x2 OR η = β0 + β1x1 + β2x2 + β12x1x2
Link function: either probit or complementary log-log
Want a design, X, which is efficient across models and link functions: a compromise design

23
Example

n = 6 (design points); in practice replicated
β1 and β2 denote the linear and interaction model parameter values
β1 = (3.0, 1.6, 4.1), β2 = (1.2, 1.7, 5.4, -1.7)
Probit and complementary log-log link functions
Models: s3 = (probit, lin), s4 = (probit, int), s5 = (comp, lin), s6 = (comp, int)
Compromise across the 4 models s3, s4, s5 and s6

24
Example (cont.)

Choose X to maximise the product of the D-values of the four models,
prod(Ds3 . Ds4 . Ds5 . Ds6)
Compare the compromise design d(comp) against the optimal design for each model,
e.g. the optimal design for s3 is denoted by d3
A computer algorithm was developed to search for compromise and locally optimal designs
SA and CE routines
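
The sketch below evaluates such a product for one candidate 6-point design across the four (link, predictor) combinations, using the parameter values quoted in the example. The six design points are hypothetical, the design region is assumed to be [-1, 1]^2, and the weights use the standard binary-response GLM form w = (dp/dη)^2 / [p(1 − p)]; the search step itself (SA or CE in the talk) is not reproduced.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def probit_weight(eta):
    """w = phi(eta)^2 / [Phi(eta) * (1 - Phi(eta))], using 1 - Phi(eta) = Phi(-eta)."""
    pdf = math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)
    return pdf * pdf / (normal_cdf(eta) * normal_cdf(-eta))


def cloglog_weight(eta):
    """w = exp(2 eta) * (1 - p) / p with p = 1 - exp(-exp(eta))."""
    p = -math.expm1(-math.exp(eta))
    return math.exp(2.0 * eta) * math.exp(-math.exp(eta)) / p


def d_value(points, beta, weight, interaction):
    """det(X'WX)^(1/p) for one model; p is the number of parameters."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for x1, x2 in points:
        row = [1.0, x1, x2] + ([x1 * x2] if interaction else [])
        w = weight(sum(x * b for x, b in zip(row, beta)))
        for i in range(k):
            for j in range(k):
                info[i][j] += w * row[i] * row[j]
    return max(determinant(info), 0.0) ** (1.0 / k)


BETA_LIN = [3.0, 1.6, 4.1]        # linear-predictor values from the example
BETA_INT = [1.2, 1.7, 5.4, -1.7]  # interaction-model values from the example
MODELS = {
    "s3": (probit_weight, False, BETA_LIN),
    "s4": (probit_weight, True, BETA_INT),
    "s5": (cloglog_weight, False, BETA_LIN),
    "s6": (cloglog_weight, True, BETA_INT),
}


def compromise_criterion(points):
    """Product of the four D-values."""
    value = 1.0
    for weight, interaction, beta in MODELS.values():
        value *= d_value(points, beta, weight, interaction)
    return value


# Hypothetical 6-point candidate design (x1, x2).
candidate = [(-1.0, -1.0), (1.0, -1.0), (-1.0, -0.5),
             (1.0, -0.5), (0.0, -1.0), (-1.0, 0.0)]

if __name__ == "__main__":
    for name, (weight, interaction, beta) in MODELS.items():
        print(name, d_value(candidate, beta, weight, interaction))
    print("compromise criterion:", compromise_criterion(candidate))
```

A design that supports only three distinct points would give a zero D-value under s4 and s6, which is how a design that is locally optimal for a linear predictor can end up with zero efficiency for the interaction model.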

25
Tables of Efficiencies
Design

Link function is not important
The design for the linear model has zero efficiency for the interaction model
The design for the interaction model has poor efficiency for estimating the linear model
The compromise design has good efficiency across all models: robust

26
Simulation Study for βs

A set of β2 values is drawn from MN(0, σ²I)
β_j1 = β_j2 + z, z ~ N(0, σ_j0²), j = 1, 2, 3
σ_j0² = K β_j2
For each set β1 and β2, a compromise design, dc, and locally optimal designs, d1 and d2, are examined through relative efficiencies (ratio of optimality values)
10,000 samples

27
Efficiencies from Simulation (median, min)
The compromise design is more robust to the choice of model than the locally optimal designs, and also with respect to parameter values.
Median efficiencies are sensitive to the value of K: as K increases, efficiency decreases; the compromise design appears less so (?).
28
Comments

Compromise design through compound criteria yields efficient designs here
If no knowledge of βs, then need to optimise across parameter space and X space: some results
Applications (so far):
nonlinear models, e.g. pharmacokinetic/dynamic; GLMs, chemical reactions, etc.
GLMs (nonlinear models) present many interesting challenges
So far have considered simple models only

29
(No Transcript)
30
Potato Example

A CCD: 2^3 factorial with six axial points at ±1.218 and two centre points, a 16-point design.
Logit regression was to be fitted.
Finite MLEs could not be obtained.
Penalised likelihood (Firth (1993)) gave estimates and standard deviations for models m1 = linear, m2 = m1 + interaction terms and m3 = m2 + quadratic terms.

31
(No Transcript)
32
Compromise Design Study

A comparison is made between designs that
compromise across different forms of the linear
predictor, the CCD used and locally optimal
designs.
Use the estimates from the analysis to obtain a
compromise design across m1, m2 and m3 and to
obtain individual locally optimal designs.

33
Compromise Design

Compromise design
max det(I_m1)^(1/p_m1) · det(I_m2)^(1/p_m2) · det(I_m3)^(1/p_m3)
Efficiency of design da compared to design db is D-value(db)/D-value(da).

34
Table of Efficiencies
Design

Comments
Compromise design: robust (high efficiency) across all models
Locally optimal designs: risky, zero efficiency for some models
CCD: modest efficiency

35
Simulation Study

Vary parameter values to determine if the efficiency of dc is robust with respect to parameter values.
10,000 iterations: MVN(β3, σ3),
β2 and β1 are perturbations of β3 as before
In general the compromise design is better than the CCD under each model (greater median, better spread)
An obvious advantage is that a compromise design incorporates prior knowledge

36
Conclusions

Compromise design through compound criteria
yields efficient designs here
Account for uncertainty in model and parameter values
If no knowledge of βs, then need to optimise across parameter space and X space: some results
Continuing research:
Qualitative variables, blocking
Other models, e.g. Poisson data, ordinal data
No-prior-knowledge case, random effects
General nonlinear models, compartmental models
Model discrimination etc., computational issues