Modeling Wim Buysse RUFORUM 1 December 2006 - PowerPoint PPT Presentation

About This Presentation
Title:

Modeling Wim Buysse RUFORUM 1 December 2006

Description:

Title: Modeling Author: Wim Buysse Description: RUFORUM 1 December 2006 Last modified by: Wim Buysse Created Date: 6/1/2004 12:01:30 PM Document presentation format – PowerPoint PPT presentation


Transcript and Presenter's Notes

Title: Modeling Wim Buysse RUFORUM 1 December 2006


1
Modeling. Wim Buysse. RUFORUM, 1 December 2006
Research Methods Group
2
Part 1. General Linear Models
3
General Linear Models
Dataset from
4
General Linear Models
Dataset from p. 89 - 95
5
General Linear Models
Effects of three levels of sorbic acid (Sorbic)
and six levels of water activity (Water) on
survival of Salmonella typhimurium
(Density). Density is recorded as log(density/ml).
6
General Linear Models
ANOVA approach
7
General Linear Models
Results
8
General Linear Models
The same data, but each treatment is presented
as a dummy variable. (Warning: for educational
purposes only.)
9
General Linear Models
Regression with a first independent variable.
10
General Linear Models
We add a second independent variable.
11
General Linear Models
We add a third one.
12
General Linear Models
We add a fourth one.
13
General Linear Models
We continue to construct the model.
14
General Linear Models
Finally, the results.
15
General Linear Models
Comparison of the two approaches.
16
General Linear Models
  • Comparison of the two approaches
  • They give the same results (in terms of sums of squares, SS).
  • The approach to choose depends on what you want
    to know.
  • The regression approach still works when the
    ANOVA approach is not possible anymore (for
    instance when there are missing values).
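
The equality of the two approaches can be checked numerically. The following is a minimal sketch in Python (not GenStat), using small made-up data for three treatment groups rather than the Salmonella dataset. The function names are illustrative assumptions. It computes the treatment sum of squares the ANOVA way and the regression sum of squares from a model with dummy variables.

```python
# Hypothetical data: three treatment groups (not the dataset from the slides).
groups = {
    "A": [4.1, 3.9, 4.4],
    "B": [5.0, 5.3, 4.8],
    "C": [6.2, 5.9, 6.4],
}


def anova_treatment_ss(groups):
    """Between-group (treatment) sum of squares, ANOVA style."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(values) / len(values)
    return sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
        for ys in groups.values()
    )


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def regression_ss(groups):
    """Regression SS for an intercept plus one 0/1 dummy per group
    (the first group is the reference level)."""
    labels = list(groups)
    rows, y = [], []
    for label in labels:
        for value in groups[label]:
            dummies = [1.0 if label == other else 0.0 for other in labels[1:]]
            rows.append([1.0] + dummies)
            y.append(value)
    k = len(labels)
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    return sum((f - mean_y) ** 2 for f in fitted)


print(anova_treatment_ss(groups), regression_ss(groups))
```

The two printed values agree up to rounding error, because the fitted values of the dummy-variable regression are the group means.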

17
Example modelling approach with normally
distributed data.
Protocol and dataset.
18
Example modelling approach with normally
distributed data.
Data: Screening of suitable species for
three-year fallow. File: Fallow N.xls.
Protocol: p. 13
19
Example modelling approach with normally
distributed data.
The analysis approach is written down in
chapter 19 of "Good statistical practice for
natural resources research".
20
Modelling approach: general
  • 5 steps:
  • 1. (Visual) exploration, to discover trends and
    relationships
  • 2. Choose a possible model, based on
      • the trend you see
      • knowledge of the experimental design
      • biological/scientific knowledge of the process
  • 3. Fitting: estimation of parameters
  • 4. Check: assessing the fit
  • 5. Interpretation, to answer the objectives.

21
Expanding the model
  • ANOVA and regression
  • Same calculations
  • Data = pattern + noise
  • = systematic component + random component
  • Same assumptions
  • Systematic components are additive
  • Variability of the groups is similar
  • The random component is (rather) normally
    distributed. The random variability of y around
    the systematic component is not affected by this
    systematic component.
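
As an illustration of "data = pattern + noise", here is a small Python sketch under simple assumptions: the pattern is taken to be the group mean and the noise is the residual. The data and the function name `decompose` are made up for the example.

```python
from statistics import mean, variance

# Hypothetical measurements for three treatments.
data = {
    "T1": [10.2, 9.8, 10.5, 10.1],
    "T2": [12.0, 11.6, 12.3, 12.5],
    "T3": [14.1, 13.7, 14.4, 13.8],
}


def decompose(data):
    """Split each observation into pattern (group mean) and noise (residual)."""
    parts = []
    for group, ys in data.items():
        pattern = mean(ys)
        for y in ys:
            parts.append((group, y, pattern, y - pattern))
    return parts


parts = decompose(data)
for group, y, pattern, noise in parts:
    print(f"{group}: {y:5.1f} = {pattern:6.3f} + {noise:+.3f}")

# Similar group variances support the assumption of similar variability.
for group, ys in data.items():
    print(group, "variance:", round(variance(ys), 3))
```

Comparing the printed variances is only a rough check of the "similar variability" assumption. A residual plot would normally be used as well.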

22
GENERAL LINEAR MODELS
23
GENERAL LINEAR MODELS
24
GENERAL LINEAR MODELS
Data = pattern + noise
Pattern is explained by a linear combination of
the independent variables (Data ~ N(m, v) and the
variance is rather constant across the different
groups).
Noise ~ N(0, 1) and the variance is rather
constant across the different groups.
25
Expanding the model
  • If the data are not normally distributed or if
    the variance of the different groups is not
    similar
  • Possible approach: transformation of the data
    ("linearising" the model)
  • Problems
  • You no longer work on a scale that has a
    biological meaning.
  • Retransforming the standard errors back to the
    original scale is no longer possible.

26
Expanding the model
Better solution: GENERAL LINEAR MODELS ->
GENERALIZED LINEAR MODELS
  • Fewer restrictions: two essential differences
  • Data can be distributed according to the
    exponential family of distributions: Normal,
    Binomial, Poisson, Gamma, Negative binomial
  • Link function: the link between E(Y) and the
    independent variables no longer has to be the
    linear combination itself. The linear
    combination of the independent variables can
    also equal a function of E(Y).
    (We don't transform the dependent variable but
    include the transformation in the model.)
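
To make the idea of a link function concrete, the sketch below lists three common links and their inverses in Python. The pairing in the comments (Normal with identity, Poisson with log, Binomial with logit) is the usual default choice, not something stated on the slide. The dictionary and function names are illustrative assumptions.

```python
import math

# Each link g maps the mean E(Y) to the linear predictor eta = g(E(Y)).
# The inverse link maps the linear predictor back to the mean.
LINKS = {
    # name: (link, inverse link)
    "identity": (lambda mu: mu, lambda eta: eta),            # usual for Normal
    "log": (math.log, math.exp),                             # usual for Poisson
    "logit": (
        lambda mu: math.log(mu / (1 - mu)),                  # usual for Binomial
        lambda eta: 1 / (1 + math.exp(-eta)),
    ),
}


def predictor_from_mean(link, mu):
    """Linear predictor eta for a given mean, eta = g(mu)."""
    return LINKS[link][0](mu)


def mean_from_predictor(link, eta):
    """Mean on the original scale for a given linear predictor."""
    return LINKS[link][1](eta)


for name in LINKS:
    eta = predictor_from_mean(name, 0.3)
    print(name, round(eta, 4), round(mean_from_predictor(name, eta), 4))
```

The data themselves are never transformed here. Only the mean is passed through the link, which is why predictions come back on the original scale.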

27
Expanding the model
Better solution: GENERAL LINEAR MODELS ->
GENERALIZED LINEAR MODELS
  • Also
  • The systematic component (linear combination
    of independent variables) can include both
    continuous and categorical variables and even
    polynomials
  • But still
  • The variance is constant across the different
    groups (or has become constant because of the
    transformation through the link function)

28
Generalised linear models
The statistical theory is more difficult, but the
menus in GenStat and the way you interpret the
output are very similar to what we know from
ANOVA and regression.
29


30
Example 1. Logistic regression
Example: cardiovascular disease according to age.
File: age and chd.xls
31
Example 1. Logistic regression
Example: same data, but according to age group
32
Example 1. Logistic regression
Example: linear regression is not an
appropriate model, and the predictions at the
extremes will not be correct
33
Example 1. Logistic regression
Example: χ² test, limited information
34
Example 1. Logistic regression
  • Bernoulli process: an (independent) event that
    can have two possible outcomes (1 or 0,
    success or failure, ...) with a given probability
    of success
  • Tossing a coin: head or tail, p = 0.5
  • Throwing a 6 with a die (success) compared to
    throwing any other number: p = 1/6
  • Conducting a survey: is the head of the household
    male or female? Calculate p from the proportion
    found in the collected data
  • Screening for cardiovascular disease: p(disease) =
    43 out of 100 individuals = 0.43

35
Example 1. Logistic regression
  • In GenStat

36
Example 1. Logistic regression
  • Logistic function

37
Example 1. Logistic regression
  • Logistic function
  • Sigmoid form
  • Linear in the middle
  • The probability is restricted between 0 and 1
  • Small values flatten towards 0; large values
    flatten towards 1

38
Example 1. Logistic regression
  • GenStat output
  • Similar, but deviance instead of variance and
    χ² test instead of F test

39
Example 1. Logistic regression
  • GenStat output
  • Model:
  • Logit(CHD) = -5.31 + 0.1109 AGE

40
Example 1. Logistic regression
  • Logit(CHD) = -5.31 + 0.1109 AGE
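
The fitted model can be turned back into probabilities with the inverse of the logit. The Python sketch below uses the two coefficients from the slide (read as intercept -5.31 and slope 0.1109 per year of age). The ages and all function and variable names are illustrative assumptions.

```python
import math

INTERCEPT = -5.31  # coefficient from the slide
SLOPE = 0.1109     # coefficient from the slide, per year of age


def logit_chd(age):
    """Linear predictor on the logit scale."""
    return INTERCEPT + SLOPE * age


def prob_chd(age):
    """Predicted probability of CHD: inverse logit of the linear predictor."""
    return 1 / (1 + math.exp(-logit_chd(age)))


# Illustrative ages, not taken from the dataset.
for age in (30, 50, 70):
    print(age, round(prob_chd(age), 3))

# Age at which the predicted probability equals 0.5 (logit = 0).
age_at_half = -INTERCEPT / SLOPE

# Multiplicative change in the odds of CHD for each 10 extra years of age.
odds_ratio_per_10_years = math.exp(10 * SLOPE)
print(round(age_at_half, 1), round(odds_ratio_per_10_years, 2))
```

With these coefficients the predicted probability reaches 0.5 at an age of about 48 years, and the odds of CHD roughly triple for every ten additional years.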

41
Example 1. Logistic regression
42
Example 1. Logistic regression
  • Binomial distribution: when we repeat the
    Bernoulli process, the order of success or
    failure can change
  • Example: head of household in a survey

43
Example 1. Logistic regression
  • Calculation of probabilities if success =
    female-headed household, with p = 0.2
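
A short Python sketch of this calculation, assuming a success probability of p = 0.2 as on the slide. The number of households n = 5 is an assumption for illustration only.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 5    # hypothetical number of households surveyed
p = 0.2  # probability of a female-headed household (from the slide)

probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(probs):
    print(f"P({k} female-headed households out of {n}) = {prob:.4f}")
```

The factor `comb(n, k)` counts the different orders in which the k successes can occur, which is the step from the Bernoulli process to the binomial distribution.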

44
Example 1. Logistic regression
  • Calculated probabilities for obtaining success
  • We can now construct a frequency distribution of
    obtaining success
  • Probability = long-run frequency = frequency when
    there are very many data
  • binomial distribution

45
Example 1. Logistic regression
  • Binomial distribution
  • Counts of a categorical variable
  • Example: experiment on survival of trees from
    different provenances
  • File: survival trees.xls

46
Example 1. Logistic regression
  • Several approaches possible: approach 1
47
Example 1. Logistic regression
  • Several approaches possible: approach 1
48
Example 1. Logistic regression
  • Several approaches possible: approach 2
49
Example 1. Logistic regression
  • Several approaches possible: approach 2
50
Example 1. Logistic regression
  • Several approaches possible: approach 3
51
Example 1. Logistic regression
  • Several approaches possible: approach 3
52
Example 1. Logistic regression
  • The Bernoulli distribution is a special case of
    the binomial distribution
  • There exist families of distributions.

53
Example 1. Logistic regression
  • There is of course a difference in the
    variability that is explained by approaches
    1, 2 and 3.

54
Example 2. Modelling counts
  • We used logistic regression to analyse counts.
  • Bernoulli distribution: distribution of the
    success of events that follow a Bernoulli
    process (1 or 0, yes or no)
  • Binomial distribution: distribution of possible
    (and independent) combinations of Bernoulli
    events
  • So, more like analysis of proportions.
  • Next: Poisson distribution = distribution of
    counts of Bernoulli events

55
Example 2. Modelling counts
  • Poisson distribution: distribution of counts of
    Bernoulli events
  • BUT
  • p is very small
  • n is very big
  • pn < 5
  • Events happen randomly and independently of each
    other.
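
Under these conditions the Poisson distribution with mean np is close to the binomial distribution. The Python sketch below compares the two for hypothetical values n = 1000 and p = 0.002 (so np = 2). These values are assumptions chosen only to satisfy the conditions above.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of a count of k when the expected count is lam."""
    return exp(-lam) * lam**k / factorial(k)


n, p = 1000, 0.002  # hypothetical: n very big, p very small, np = 2 < 5
lam = n * p

for k in range(7):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))

# Largest absolute difference between the two distributions for k = 0..10.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
print("largest difference:", max_gap)
```

The two columns are nearly identical, which is why the Poisson distribution is often described as the distribution of rare events.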

56
Example 2. Modelling counts
  • Poisson distribution: distribution of rare
    events
  • Number of civil airplane crashes (when there is
    no war) in the whole world during several years.
  • Number of infected seeds in seed lots that are
    certified by a controlling agency.
  • Number of individuals of a rare tree species in a
    square kilometre in the same Agro Ecological Zone.

57
Example 2. Modelling counts
  • THUS
  • The distribution that best describes counts is
    not automatically a Poisson distribution.
  • It depends on the context.

58
Example 2. Modelling counts
  • Some mathematical statistics

The ratio mean/variance must be 1.
Poisson index in GenStat: (s² - m)/m
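
Both quantities are easy to compute by hand. The Python sketch below uses the formula exactly as written on the slide, (s² - m)/m, with s² the sample variance and m the mean. The counts and the function name are made up, and this is not GenStat's own implementation.

```python
from statistics import mean, variance


def poisson_summary(counts):
    """Mean, sample variance, mean/variance ratio and the index (s2 - m)/m."""
    m = mean(counts)
    s2 = variance(counts)  # sample variance, divisor n - 1
    return {
        "mean": m,
        "variance": s2,
        "ratio": m / s2,         # close to 1 for Poisson-like counts
        "index": (s2 - m) / m,   # close to 0 for Poisson-like counts
    }


# Hypothetical counts, for example individuals of a rare species per plot.
counts = [0, 2, 1, 3, 0, 1, 2, 1, 0, 4, 1, 2]
summary = poisson_summary(counts)
for name, value in summary.items():
    print(name, round(value, 3))
```

An index well above 0 indicates more variability than a Poisson distribution would give, and an index below 0 indicates less.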
59
Example 2. Modelling counts
We have already briefly seen other counts: the χ²
test.
χ² test: is there evidence of an association
between two discrete variables? H0: no
association. H1: association.
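
As a reminder of how the test statistic is built, here is a minimal Python sketch of the Pearson χ² statistic for a two-way table, without continuity correction. The table is hypothetical, and the p-value helper only covers the case of 1 degree of freedom (a 2 x 2 table).

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under H0 (no association).
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


table = [[30, 10], [20, 40]]  # hypothetical 2 x 2 table of counts
stat, df = chi_square_statistic(table)
print(round(stat, 3), df, p_value_df1(stat))
```

A small p-value is evidence against H0, but the test says nothing about the form of the association, which is the limited information mentioned earlier.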
60
Example 2. Modelling counts
We could use another kind of probability to
calculate the test statistic
61
Example 2. Modelling counts
But now we look at the table in another way. If
we consider the counts in the table as a
variable, we could construct a frequency
distribution.
62
Example 2. Modelling counts
  • Long-run frequency distribution = probability
    distribution
  • We just expanded the binomial distribution into
    the multinomial distribution.
  • Binomial distribution:
  • Independent observations
  • p(success) is the same everywhere. The probability
    that an individual observation falls into a
    specific cell of the table is the same for all
    cells.
  • Multinomial distribution:
  • The total number of observations is fixed.

63
Example 2. Modelling counts
If the total number of observations is not
fixed -> Poisson distribution. BUT: thanks to a
lot of difficult statistical theory we can also
use the Poisson distribution even if the total
number of observations is fixed.
64
Example 2. Modelling counts
CONCLUSION: Even though the context is
important to decide whether we can use the
Poisson distribution to analyse counts
(distribution of rare events), generally:
analysis of multiway contingency tables ->
Poisson distribution + logarithm link = LOGLINEAR
MODELING
65
Example 2. Modelling counts
  • Analysis of counts
  • Often we can use the Poisson distribution
  • But not always

66
Example 2. Loglinear modelling

67
Example 2. Loglinear modelling
Adding interactions
68
Example 2. Loglinear modelling
χ² test

Loglinear modelling
69
Example 2. Loglinear modelling
  • Modelling of complex datasets
  • Adding or dropping terms and interactions in the
    model and changing their order
  • Good model (good fit) when the residual
    deviance becomes almost equal to the number of
    degrees of freedom (or mean deviance close to 1)
  • At that moment we can assume that the remaining
    residual variability is caused by the random
    variability (noise)
  • Adding too many terms: residual deviance -> 0
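
The residual deviance can be computed by hand for the simplest loglinear model of a two-way table, the model with main effects only (no interaction, so rows and columns are independent). The Python sketch below is an illustration under that assumption, with a made-up table and made-up function names. It is not the GenStat procedure.

```python
import math


def independence_fit(table):
    """Fitted counts for the loglinear model without interaction:
    log(mu_ij) = constant + row effect + column effect."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def residual_deviance(table, fitted):
    """Poisson deviance 2 * sum(O * ln(O / E)). The extra term sum(O - E)
    vanishes here because the fitted counts reproduce the table total."""
    return 2 * sum(
        o * math.log(o / e)
        for obs_row, fit_row in zip(table, fitted)
        for o, e in zip(obs_row, fit_row)
        if o > 0  # empty cells contribute nothing
    )


table = [[30, 10], [20, 40]]  # hypothetical counts
fitted = independence_fit(table)
deviance = residual_deviance(table, fitted)
df = (len(table) - 1) * (len(table[0]) - 1)
print("residual deviance:", round(deviance, 3), "on", df, "df")
print("mean deviance:", round(deviance / df, 3))
```

A mean deviance far above 1 suggests that a term is missing, here the interaction. Adding the interaction to a two-way table gives the saturated model, whose residual deviance is 0.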

70
Example 2. Loglinear modelling
  • Example: lambs.xls
