Title: Accounting for heterogeneous variances (heteroskedasticity) in genetic evaluations
1. Accounting for heterogeneous variances (heteroskedasticity) in genetic evaluations
Slides available at http://www.msu.edu/tempelma/nbcec1.pdf
National Animal Breeding Seminar Series, Fall Semester 2004
- Robert J. Tempelman
- Michigan State University
2. A typical genetic evaluation model for postweaning gain (PWG)
$y = X_1 b_1 + X_2 b_2 + Z_1 u_1 + Z_2 u_2 + e$
Fixed effects
- Non-genetic effects $b_1$ (age of dam, length of PW period, calf sex)
- Genetic effects $b_2$ (breed, dominance, and recombination loss effects)
Random effects
- Random contemporary group effects $u_1$: Var($u_1$) -> autoregressive (years within herds) or NIID
- Random additive genetic effects $u_2$: Var($u_2$) -> function of one or more (multibreed) components
Compactly: $y = Xb + Zu + e$
3. Homoskedastic error models
$e \sim N(0, I\sigma^2_e)$
A common $\sigma^2_e$ across environments, factors, etc. may not be a suitable assumption.
4. Example of heterogeneous variances
- Garrick et al. (1989)
- Separate genetic ($\sigma^2_g$) and residual ($\sigma^2_e$) variances estimated by percent Simmental and sex for postweaning gain.
[Figure: genetic ($\sigma^2_g$) and residual ($\sigma^2_e$) variance estimates]
5. Structural (mixed effects) modeling of variances (Foulley et al., 1992)
- Model residual and genetic variances as a function of fixed and random effects.
- Example: consider the residual variance unique to fixed calf sex j and random CG k.
- Log-linear mixed effects model on the log variance.
- Antilog both sides (multiplicative model).
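The equations for this slide did not survive in the transcript. A minimal reconstruction, assuming an intercept $\mu$, a fixed calf-sex effect $\tau_j$ and a random CG effect $v_k$ on the log scale (these symbol names are assumptions, not taken from the original slide), would read:

```latex
% Log-linear mixed effects model on the log residual variance
\log \sigma^2_{e_{jk}} = \mu + \tau_j + v_k

% Antilog of both sides: multiplicative model
\sigma^2_{e_{jk}} = e^{\mu} \, e^{\tau_j} \, e^{v_k}
```

Here $e^{\mu}$ plays the role of a baseline variance, and $e^{\tau_j}$ and $e^{v_k}$ are multiplicative scaling factors relative to that baseline.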
6. First known application of structural variance model to beef cattle data
- San Cristobal et al. (1993), analyzing muscular development scores in French Maine-Anjou cattle
- Scored on a 0 to 100 scale.
- Considered structural variance model on both residual AND genetic variances.
- Effects considered:
  - Classifier (random), condition score (fixed), year (random), month (random) for residual variance
  - Sex for genetic variance
7. Representative results from San Cristobal et al. (1993) (multiplicative scale)
Factor            Level   Estimate
Baseline          1       97.57
Classifier        1       1.17
                  2       1.07
                  3       1.06
Condition score   1       1
                  2       0.74
                  3       0.65
Year              1       0.98
                  2       1.14
Month             1       0.94
                  2       1.02
                  3       1.00
For example, an animal evaluated by Classifier 2 with condition score 2, born in year 1 and month 2, has a residual variance of 97.57 x 1.07 x 0.74 x 0.98 x 1.02 = 77.23.
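The arithmetic in the example above can be checked with a short sketch. The numbers are copied from the table; the variable names are illustrative only.

```python
from math import prod

# Multiplicative estimates copied from the table of representative results.
baseline = 97.57
multipliers = {
    "classifier 2": 1.07,
    "condition score 2": 0.74,
    "year 1": 0.98,
    "month 2": 1.02,
}

# Under the multiplicative model, the residual variance is the baseline
# times the product of the factor-level multipliers that apply to the animal.
residual_variance = baseline * prod(multipliers.values())
print(round(residual_variance, 2))
```

Swapping in other levels from the table (for example condition score 3, multiplier 0.65) shows how strongly a single factor can scale the residual variance.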
8. The underlying model for calving ease (1-5 scale)
Colored areas = probability of occurrence
1 = Unassisted calving
5 = Caesarean section
[Figure: distribution of the underlying liability (l), partitioned into calving ease categories 1 to 5]
9. Heterogeneous variances for calving ease (CE)?
- Genetic evaluations based on threshold mixed effects model.
- Underlying liability (l) is typically modeled as a function of fixed (e.g. calf sex) and random effects (herd-year-season) plus an IID residual (e).
- Heteroskedastic theory provided by Foulley and Gianola (1996)
- Demonstrated that statistically significant calf sex by age of dam interactions for CE in homoskedastic error threshold models may be an artifact of heterogeneous residual variances
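10. ALLOWING FOR HETEROGENEOUS RESIDUAL VARIANCES IN THRESHOLD MODELS
[Figure: liability distributions with different residual variances, partitioned into categories 1 to 5]
Note how the probabilities of extreme outcomes depend particularly on the residual variance.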
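The note above can be illustrated with a small sketch of a threshold model with normally distributed liability. The thresholds, the liability mean and the two residual standard deviations are hypothetical values chosen only for illustration; they are not estimates from any of the cited studies.

```python
from math import erf, sqrt


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of N(mean, sd^2)."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def category_probabilities(mean, residual_sd, thresholds):
    """Probabilities of ordered categories 1..K when liability ~ N(mean, residual_sd^2).

    Category k is observed when the liability falls between thresholds k-1 and k.
    """
    cumulative = [normal_cdf(t, mean, residual_sd) for t in thresholds] + [1.0]
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]


# Hypothetical thresholds separating the five calving ease scores.
thresholds = [0.0, 1.0, 2.0, 3.0]

low = category_probabilities(mean=0.5, residual_sd=1.0, thresholds=thresholds)
high = category_probabilities(mean=0.5, residual_sd=1.5, thresholds=thresholds)

for score, (p_low, p_high) in enumerate(zip(low, high), start=1):
    print(f"score {score}: sd=1.0 -> {p_low:.3f}, sd=1.5 -> {p_high:.3f}")
```

With these made-up inputs, raising the residual standard deviation moves probability mass toward both extreme scores (1 and 5) while the mean liability is unchanged.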
11. Genetic evaluations accounting for calving ease
- French Holstein, Normande, and Montbeliarde breeds (Ducrocq, 2000)
  - Heteroskedasticity is breed dependent
  - 15% lower residual variance in winter versus summer.
  - Larger residual variance (1.07-1.18x) for male calves.
- Italian Holsteins (Canavesi et al., 2003)
  - Larger residual variance (1.03x) for males
  - Regional differences for residual variance
- Both evaluations only consider fixed effects models for residual variances
12. Fixed and random effects for log residual variances in threshold models for calving ease
- Kizilkaya and Tempelman (2005, GSE)
- First parity Italian Piedmontese cattle
                                           Linear mixed model analysis   Threshold mixed model analysis
                                           of birth weights              of calving ease
Parameter                                  Estimate ± SE                 Estimate ± SE
Sire variance                              1.13 ± 0.20                   0.13 ± 0.02
MGS variance                               0.50 ± 0.11                   0.02 ± 0.01
Sire-MGS covariance                        0.35 ± 0.11                   0.02 ± 0.01
CG variance                                1.68 ± 0.19                   0.13 ± 0.02
Male residual variance                     14.44 ± 1.03                  1.09 ± 0.09
Female residual variance                   10.19 ± 0.73                  0.71 ± 0.06
Sex difference in residual variances (F)   4.26 ± 0.53                   0.38 ± 0.05
CV for herd-specific variances (R)         0.60 ± 0.09                   0.74 ± 0.14
F = fixed effects and R = random effects for residual heteroskedasticity
13. Estimates and 95% credible sets for herd-specific variances for CE relative to baseline (1.0)
[Figure: herd-specific variance estimates with 95% credible sets; CV = 0.74]
Note: because a sire-MGS model was used, residual heteroskedasticity may be partly genetic.
14. Impact on calving ease EPDs? Heteroskedastic vs. homoskedastic error
15. Impact of residual heteroskedasticity across CG on sire EPDs for birth weights (Kizilkaya and Tempelman, 2005)
CV = 0.60
Implications of ranking herds for product uniformity!
[Figure annotations: Herd 66; Sire A]
All of Sire A's progeny were from Herd 66.
16. Multiple Breed Populations
- Might naturally expect heterogeneous genetic variances (for different breed groups and different levels of heterozygosity)
17. Multibreed genetic modeling
- Additive model (Lo et al., 1993)
- For any individual j, its additive genetic effect $a_j$ has a variance built from the following components:
  - Expected allelic contribution due to breed b in individual/parent j
  - Additive genetic variance of breed b
  - Variance due to genetic segregation between breeds b and b'
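The variance expression itself is missing from the transcript. A hedged reconstruction, based on how the additive multibreed model is commonly written (for instance in Cardoso and Tempelman, 2004), is given here; the exact notation on the original slide is unknown, and the superscripts $s_j$ and $d_j$ (sire and dam of individual j) are assumed labels.

```latex
\operatorname{Var}(a_j)
  = \sum_{b=1}^{B} f_b^{\,j}\, \sigma^2_{A_b}
  + 2 \sum_{b=1}^{B-1} \sum_{b'>b}
      \left( f_b^{\,s_j} f_{b'}^{\,s_j} + f_b^{\,d_j} f_{b'}^{\,d_j} \right)
      \sigma^2_{S_{bb'}}
```

In this form $f_b^{\,j}$ is the expected allelic contribution of breed b to individual j (or to its sire or dam), $\sigma^2_{A_b}$ is the additive genetic variance of breed b, and $\sigma^2_{S_{bb'}}$ is the segregation variance between breeds b and b'.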
18. Simple two-breed example
[Diagram: parental breeds P1 and P2 crossed to give F1; F1 intercrossed to give F2]
Theory used for QTL mapping in pig breed crosses: better power than Haley-Knott regression (Perez-Enciso and Varona, 2000)
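The two-breed case can be sketched numerically. The function below assumes the additive multibreed variance takes the commonly written form (breed-weighted additive variances plus twice the sum, over breed pairs, of sire and dam cross-products times the segregation variance); this form and all variance values are assumptions for illustration, not figures from the talk.

```python
def additive_variance(f_sire, f_dam, breed_var, seg_var):
    """Additive genetic variance of an individual under an additive multibreed model.

    f_sire, f_dam: dict breed -> breed proportion in the sire / dam.
    breed_var:     dict breed -> breed-specific additive variance.
    seg_var:       dict frozenset({b, b2}) -> segregation variance for that breed pair.
    """
    breeds = sorted(breed_var)
    # Breed proportions of the individual are the average of its parents'.
    f_ind = {b: 0.5 * (f_sire.get(b, 0.0) + f_dam.get(b, 0.0)) for b in breeds}
    total = sum(f_ind[b] * breed_var[b] for b in breeds)
    for i, b in enumerate(breeds):
        for b2 in breeds[i + 1:]:
            cross = (f_sire.get(b, 0.0) * f_sire.get(b2, 0.0)
                     + f_dam.get(b, 0.0) * f_dam.get(b2, 0.0))
            total += 2.0 * cross * seg_var[frozenset((b, b2))]
    return total


# Hypothetical variances for two parental breeds and their segregation term.
breed_var = {"P1": 100.0, "P2": 60.0}
seg_var = {frozenset(("P1", "P2")): 20.0}

pure_p1 = {"P1": 1.0}
pure_p2 = {"P2": 1.0}
f1 = {"P1": 0.5, "P2": 0.5}

var_p1 = additive_variance(pure_p1, pure_p1, breed_var, seg_var)
var_p2 = additive_variance(pure_p2, pure_p2, breed_var, seg_var)
var_f1 = additive_variance(pure_p1, pure_p2, breed_var, seg_var)
var_f2 = additive_variance(f1, f1, breed_var, seg_var)
print(var_p1, var_p2, var_f1, var_f2)
```

Under these assumptions the F1 shows no segregation variance because each parent is purebred, whereas the F2 picks up the full segregation term because both parents are themselves crossbred.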
19. Application: Nelore-Hereford data (Fernando Cardoso, PhD)
- Data set
  - 22,717 post-weaning gain (PWG) records on Hereford and Nelore x Hereford calves raised in Brazil (1974-2000)
  - 40,082 animals (including ancestors in pedigree file)
  - Breed compositions of animals with records ranged from purebred Hereford to 7/8 Nelore
  - Purebred Herefords and F1s represent 90% of the data
21. But maybe the residual variances are heterogeneous too!
- Beef cattle performance is recorded across diverse production systems and environments, with data quality often compromised by, e.g.,
  - Recording error, preferential treatment, disease, etc.
- Hierarchical model constructions have been independently used to address
  - heteroskedasticity (Foulley et al., 1992; San Cristobal et al., 1993) and
  - robustness to outliers (Stranden and Gianola, 1998, 1999).
- Important to discern outliers from high-variance subclasses
22. First stage: specify the linear mixed model
$y = X_1 b_1 + X_2 b_2 + Z_1 u_1 + Z_2 u_2 + e$
Fixed effects
- Non-genetic effects $b_1$ (age of dam, length of PW period, calf sex)
- Genetic effects $b_2$ (breed additive, dominance, and recombination loss effects)
Random effects
- Random contemporary group effects $u_1$
- Random additive genetic effects $u_2$
OR
$y = Xb + Zu + e$
23. Second stage: structural variance model
Components of the residual variance model:
- Baseline
- Regression parameters (examples: breed proportion, breed heterozygosity)
- Fixed classification effects (example: calf sex)
- Random classification effects (example: CG)
- Lack-of-fit term with mean 0
24. Distributional assumptions on random effects
- Location parameters
  - u includes 940 CG ($u_{CG}$) and 40,082 additive genetic effects ($u_A$)
  - $u_{CG} \sim N(0, I\sigma^2_{CG})$
  - $u_A \sim N(0, G(\varphi))$, where $\varphi$ includes breed-specific variances and segregation variances.
- Residual variance
  - $v = [v_1\ v_2\ \ldots\ v_{940}]$ includes random relative variances for the 940 CG
  - $v_i \sim$ IID inverted-gamma with mean 1 and standard deviation $\sigma_v$
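An inverted-gamma distribution with mean 1 and a given standard deviation can be parameterized in closed form, which the following sketch illustrates. It assumes the usual shape/scale parameterization (mean = scale / (shape - 1), variance = scale^2 / ((shape - 1)^2 (shape - 2))); the value 0.72 is used only as an illustrative standard deviation, and the function names are hypothetical.

```python
import random
from statistics import fmean


def inverse_gamma_params(sd):
    """Shape and scale of an inverted-gamma with mean 1 and standard deviation sd."""
    shape = 2.0 + 1.0 / sd ** 2   # from variance = 1 / (shape - 2) when the mean is 1
    scale = shape - 1.0           # forces mean = scale / (shape - 1) = 1
    return shape, scale


def draw_relative_variance(shape, scale, rng):
    """Draw one relative variance v_i.

    If g ~ Gamma(shape, rate=scale), then 1/g ~ inverted-gamma(shape, scale).
    random.gammavariate takes (shape, scale), so the gamma scale is 1/scale.
    """
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


rng = random.Random(940)
shape, scale = inverse_gamma_params(sd=0.72)
draws = [draw_relative_variance(shape, scale, rng) for _ in range(100_000)]
sample_mean = fmean(draws)
print(f"shape={shape:.3f}, scale={scale:.3f}, sample mean={sample_mean:.3f}")
```

Constraining the mean to 1 means each $v_i$ acts as a multiplier on a shared baseline variance, so the standard deviation alone describes how much CG differ in residual variability.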
25. Need to consider one more thing
- Recall the structural variance model from the second stage.
- What about $w_j$?
  - Lack-of-fit term
26.
- 1) If $w_j \sim$ Gamma($\nu/2$, $\nu/2$), then this is equivalent to specifying a Student t error with $\nu$ degrees of freedom. Demonstrated to be resistant to outliers (Stranden and Gianola, 1998, 1999).
- 2) If $w_j = 1$ for all j, then the error is normal.
- Many other options! See Rosa et al. (2003).
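The equivalence in option 1 can be checked by simulation. The sketch assumes the standard scale-mixture construction, in which the residual is normal with variance $\sigma^2 / w_j$ given $w_j$; the transcript does not show how $w_j$ enters the model, so that placement is an assumption. The degrees of freedom value 7.33 and the unit scale are illustrative.

```python
import random
from statistics import pvariance


def draw_student_t_residual(nu, sigma, rng):
    """Draw e_j by mixing: w_j ~ Gamma(nu/2, rate nu/2), e_j | w_j ~ N(0, sigma^2 / w_j)."""
    # random.gammavariate takes (shape, scale); a rate of nu/2 is a scale of 2/nu.
    w = rng.gammavariate(nu / 2.0, 2.0 / nu)
    return rng.gauss(0.0, sigma / w ** 0.5)


rng = random.Random(2004)
nu, sigma = 7.33, 1.0
draws = [draw_student_t_residual(nu, sigma, rng) for _ in range(100_000)]

sample_var = pvariance(draws)
# Variance of a Student t with nu > 2 degrees of freedom and scale sigma.
theoretical_var = sigma ** 2 * nu / (nu - 2.0)
print(f"sample variance {sample_var:.3f} vs theoretical {theoretical_var:.3f}")
```

Setting every $w_j$ to 1 in the same function would return plain normal draws with variance $\sigma^2$, which is option 2.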
27. Now (at least) four distributional possibilities!
- 2 x 2 factorial based on distribution (normal versus Student t) and homoskedastic versus heteroskedastic residuals
  - Homoskedastic normal
  - Homoskedastic Student t
  - Heteroskedastic normal
  - Heteroskedastic Student t
28. Some results
- Based on Pseudo Bayes Factors (PBF), the heteroskedastic Student t error model provided the best data fit and the homoskedastic normal model the worst.
- The posterior mean of the degrees of freedom parameter ($\nu$) was 7.33 ± 0.48, indicating a heavier-tailed residual distribution than normal ($\nu = \infty$) for PWG data.
29. Heteroskedastic residual variance results
Parameter                          Est.   SE     95% PPI
Fixed effects
  Gender ($\tau_1$)                1.13   0.09   (0.97, 1.31)
  Nelore proportion ($\gamma_1$)   1.15   0.45   (0.48, 2.20)
  Heterozygosity ($\gamma_2$)      0.70   0.16   (0.46, 1.06)
Random effects
  CG ($\sigma_v$)                  0.72   0.06   (0.62, 0.86)
Heterozygosity estimate below 1: evidence of genetic homeostasis? (Lerner, 1954)
30. What do these estimates mean again?
- Example: a male F1 calf in a herd (Herd 5) with above-average variability
  - Nelore proportion
  - Heterozygosity
  - Estimated residual variability
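The numbers for this worked example are missing from the transcript, so the following is only a sketch of how such a calculation might go. It assumes a multiplicative model in which the regression-type multipliers are raised to the power of their covariates, that the gender multiplier of 1.13 applies to males, and that an F1 has Nelore proportion 0.5 and heterozygosity 1. The Herd 5 multiplier of 1.5 is a made-up placeholder, since the actual estimate is not preserved.

```python
def relative_residual_variance(nelore_proportion, heterozygosity,
                               sex_multiplier, cg_multiplier,
                               gamma_nelore=1.15, gamma_het=0.70):
    """Residual variance relative to baseline under an assumed multiplicative model.

    gamma_nelore and gamma_het default to the posterior estimates reported for
    Nelore proportion and heterozygosity; each is raised to its covariate value,
    which is equivalent to a linear regression on the log-variance scale.
    """
    return (gamma_nelore ** nelore_proportion
            * gamma_het ** heterozygosity
            * sex_multiplier
            * cg_multiplier)


# Male F1 calf: half Nelore, fully heterozygous for breed of origin.
# sex_multiplier=1.13 assumes the reported gender effect is for males.
# cg_multiplier=1.5 is hypothetical (Herd 5's estimate is not in the transcript).
f1_male_herd5 = relative_residual_variance(
    nelore_proportion=0.5, heterozygosity=1.0,
    sex_multiplier=1.13, cg_multiplier=1.5,
)
print(round(f1_male_herd5, 3))
```

Multiplying the result by the baseline residual variance (also not shown in the transcript) would give the calf's residual variance on the original scale.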
31. Posterior densities of heritabilities under homoskedastic normal error model
(Cardoso and Tempelman, 2004)
32. Posterior densities of heritabilities under heteroskedastic normal error model
Why the flip-flop from homoskedastic normal error? -> Some of the most variable herds were exclusively Herefords.
Posterior densities look very similar under Student t heteroskedastic error.
33. Where do we go from here?
- Genetic evaluation for residual variability?
- Relevance: uniformity of product premium.
- San Cristobal-Gaudy et al. (1998, 2001); Sorensen and Waagepetersen (2003)
A = numerator relationship matrix; $\rho$ = genetic correlation between location and log variance effects
34. Litter size in sheep (San Cristobal-Gaudy et al., 2001)
[Figure: sire EPD for litter size variability (v) and sire EPD for litter size (u), with $\rho$ annotated]
For litter size in pigs, a negative $\rho$ was estimated (Sorensen and Waagepetersen, 2003).
35. Multiple trait analysis?
- The standard for genetic evaluations today
- Perhaps genetic covariances/correlations between
traits are heterogeneous across environments too.
- Hopefully, these issues will be investigated
further.
36. References
- Cardoso, F.F., and R.J. Tempelman. 2004. Hierarchical Bayes multiple-breed inference with an application to genetic evaluation of a Nelore-Hereford population. Journal of Animal Science 82:1589-1601.
- Canavesi, F., S. Biffani, and A.B. Samore. 2003. Revising the genetic evaluation for calving ease in the Italian Holstein Friesian. Interbull Bulletin 30:82-85. http://www-interbull.slu.se/bulletins/framesida-pub.htm
- Ducrocq, V. 2000. Calving ease evaluation of French dairy bulls with a heteroskedastic threshold model with direct and maternal effects. Interbull Bulletin 30:82-85. http://www-interbull.slu.se/bulletins/framesida-pub.htm
- Foulley, J.L. 1997. ECM approaches to heteroskedastic mixed models with constant variance ratios. Genetics, Selection, Evolution 29:297-315.
- Foulley, J.L., M. San Cristobal, D. Gianola, and S. Im. 1992. Marginal likelihood and Bayesian approaches to the analysis of heterogeneous residual variances in mixed linear Gaussian models. Computational Statistics & Data Analysis 13:291-305.
- Foulley, J.L., and D. Gianola. 1996. Statistical analysis of ordered categorical data via a structural heteroskedastic threshold model. Genetics, Selection, Evolution 28:249-273.
- Garrick, D.J., E.J. Pollak, R.L. Quaas, and L.D. Van Vleck. 1989. Variance heterogeneity in direct and maternal weight traits by sex and percent purebred for Simmental-sired calves. Journal of Animal Science 67:2515-2528.
- Kachman, S.D., and R.W. Everett. 1993. A multiplicative model when the variances are heterogeneous. Journal of Dairy Science 76:859-867.
- Kizilkaya, K., and R.J. Tempelman. 2005. A general approach to mixed effects modeling of residual variances in generalized linear mixed models. Genetics, Selection, Evolution (in press).
- Lo, L.L., R.L. Fernando, and M. Grossman. 1993. Covariance between relatives in multibreed populations: additive model. Theoretical and Applied Genetics 87:423-430.
- Mark, T. 2004. Applied genetic evaluations for production and functional traits in dairy cattle. Journal of Dairy Science 87:2641-2652.
- Meuwissen, T.H.E., G. DeJong, and B. Engel. 1996. Joint estimation of breeding values and heterogeneous variances of large data files. Journal of Dairy Science 79:310-316.
- Perez-Enciso, M., and L. Varona. 2000. Quantitative trait loci mapping in F2 crosses between outbred lines. Genetics 155:391-405.
- Robinson, G.K. 1991. That BLUP is a good thing: the estimation of random effects. Statistical Science 6:15-51.
- Robert-Granie, C., B. Bonati, D. Boichard, and A. Barbat. 1999. Accounting for variance heterogeneity in French dairy cattle genetic evaluation. Livestock Production Science 60:343-357.
- Robert-Granie, C., B. Heude, and J.L. Foulley. 2002. Modeling the growth curve of Maine-Anjou beef cattle using heteroskedastic random coefficients models. Genetics, Selection, Evolution 43:423-445.
- Rodriguez-Almeida, F.A., L.D. Van Vleck, L.V. Cundiff, and S.D. Kachman. 1995. Heterogeneity of variance by sire breed, sex, and dam breed in 200-day and 365-day weights of beef cattle from a top cross experiment. Journal of Animal Science 73:2579-2588.
- Rosa, G.J.M., C.R. Padovani, and D. Gianola. 2003. Robust linear mixed models with normal/independent distributions and Bayesian MCMC implementation. Biometrical Journal 45:573-590.
- San Cristobal, M., J.L. Foulley, and E. Manfredi. 1993. Inference about multiplicative heteroskedastic components of variance in a mixed linear Gaussian model with an application to beef cattle breeding. Genetics, Selection, Evolution 25:3-30.
- San Cristobal-Gaudy, M., J.M. Elsen, L. Bodin, and C. Chevalet. 1998. Prediction of the response to a selection for canalisation of a continuous trait in animal breeding. Genetics, Selection, Evolution 30:423-451.
- San Cristobal-Gaudy, M., L. Bodin, J.M. Elsen, and C. Chevalet. 2001. Genetic components of litter size variability in sheep. Genetics, Selection, Evolution 33:249-271.
- Sorensen, D.A., and R. Waagepetersen. 2003. Normal linear models with genetically structured residual heterogeneity: a case study. Genetical Research (Cambridge) 82:207-222.
- Stranden, I., and D. Gianola. 1998. Attenuating effects of preferential treatment with Student t mixed linear models: a simulation study. Genetics, Selection, Evolution 30:565-583.
- Stranden, I., and D. Gianola. 1999. Mixed effects linear models with t-distributions for quantitative genetic analysis: a Bayesian approach. Genetics, Selection, Evolution 31:25-42.