Accounting for heterogeneous variances (heteroskedasticity) in genetic evaluations

Slides: 38
Provided by: robert1265
Learn more at: https://www.nbcec.org

1
Accounting for heterogeneous variances
(heteroskedasticity) in genetic evaluations
Slides available at http://www.msu.edu/tempelma/nbcec1.pdf
National Animal Breeding Seminar Series, Fall
Semester 2004
  • Robert J. Tempelman
  • Michigan State University

2
A typical genetic evaluation model for
postweaning gain (PWG)
$y = X_1 b_1 + X_2 b_2 + Z_1 u_1 + Z_2 u_2 + e$
Fixed effects
  • Non-genetic effects $b_1$ (age of dam, length of PW
    period, calf sex)
  • Genetic effects $b_2$ (breed, dominance, and
    recombination loss effects)
Random effects
  • Random contemporary group effects $u_1$: Var($u_1$) ->
    autoregressive across years within herds, or NIID
  • Random additive genetic effects $u_2$: Var($u_2$) ->
    function of one or more (multibreed) components
More compactly: $y = Xb + Zu + e$
3
Homoskedastic error models
$e \sim N(0, I\sigma_e^2)$
A common $\sigma_e^2$ across environments, factors, etc. may
not be a suitable assumption.
4
Example of Heterogeneous Variances
  • Garrick et al. (1989)
  • Separate genetic ($\sigma_g^2$) and residual ($\sigma_e^2$)
    variances estimated by percent Simmental and sex for
    postweaning gain.

[Figure: genetic ($\sigma_g^2$) and residual ($\sigma_e^2$) variance estimates]
5
Structural (mixed effects) modeling of variances
(Foulley et al., 1992)
  • model residual and genetic variances as a
    function of fixed and random effects
  • Example: consider the residual variance
    unique to fixed calf sex j and random CG k.

A log-linear mixed effects model is specified on the log variance.
Taking the antilog of both sides gives a multiplicative model.
6
First known application of structural variance
model to beef cattle data
  • San Cristobal et al. (1993) analyzing muscular
    development scores in French Maine Anjou cattle
  • Scored on 0 to 100 scale.
  • Considered structural variance model on both
    residual AND genetic variances.
  • Effects considered:
  • classifier (random), condition score (fixed),
    year (random), and month (random) for the residual
    variance
  • sex for the genetic variance

7
Representative results from San Cristobal
(multiplicative scale)
Factor           Level  Estimate
Baseline         1      97.57
Classifier       1      1.17
                 2      1.07
                 3      1.06
Condition score  1      1
                 2      0.74
                 3      0.65
Year             1      0.98
                 2      1.14
Month            1      0.94
                 2      1.02
                 3      1.00
For example, an animal evaluated by Classifier 2
with condition score 2, born in year 1 and month 2,
has a residual variance of
97.57 × 1.07 × 0.74 × 0.98 × 1.02 = 77.23
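The arithmetic in this example can be checked with a short sketch. The estimates are the ones listed on the slide; the dictionary layout and the function name `residual_variance` are illustrative choices, not part of the original analysis. The last lines show that multiplying on the variance scale is the same as adding on the log scale.

```python
from math import exp, log, prod

# Multiplicative estimates as listed on the slide.
BASELINE = 97.57
FACTORS = {
    "classifier": {1: 1.17, 2: 1.07, 3: 1.06},
    "condition_score": {1: 1.00, 2: 0.74, 3: 0.65},
    "year": {1: 0.98, 2: 1.14},
    "month": {1: 0.94, 2: 1.02, 3: 1.00},
}


def residual_variance(levels):
    """Baseline variance times the multiplier of each chosen factor level."""
    return BASELINE * prod(FACTORS[name][lvl] for name, lvl in levels.items())


animal = {"classifier": 2, "condition_score": 2, "year": 1, "month": 2}
example = residual_variance(animal)
print(round(example, 2))  # 77.23, matching the slide

# Same result from the additive (log-linear) form of the model.
log_variance = log(BASELINE) + sum(log(FACTORS[n][l]) for n, l in animal.items())
example_from_logs = exp(log_variance)
```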
8
The underlying model for calving ease (1-5 scale)
[Figure: distribution of the underlying liability (l), divided into calving ease categories 1 to 5. Colored areas = probability of occurrence. 1 = unassisted calving; 5 = Caesarean section.]
9
Heterogeneous variances for calving ease (CE)?
  • Genetic evaluations based on threshold mixed
    effects model.
  • Underlying liability (l) is typically modeled as
    a function of fixed (e.g. calf sex) and random
    (e.g. herd-year-season) effects plus an IID residual (e),
    i.e. $l = Xb + Zu + e$
  • Heteroskedastic theory provided by Foulley and
    Gianola (1996)
  • Demonstrated that statistically significant calf
    sex by age of dam interactions for CE in
    homoskedastic error threshold models may be an
    artifact of heterogeneous residual variances

10
ALLOWING FOR HETEROGENEOUS RESIDUAL VARIANCES IN
THRESHOLD MODELS
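[Figure: liability distributions over calving ease categories 1 to 5 under different residual variances]
Note how the probabilities of extreme outcomes
depend particularly on the residual variance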
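A minimal sketch of this point, assuming a normally distributed liability cut by four thresholds into five categories. The threshold values and the two residual standard deviations are hypothetical, chosen only to illustrate the effect; they are not estimates from the talk.

```python
from math import erf, sqrt


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of N(mean, sd^2)."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def category_probabilities(thresholds, mean, sd):
    """Probability of each ordered category given liability ~ N(mean, sd^2)."""
    cuts = [float("-inf")] + list(thresholds) + [float("inf")]
    return [normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)
            for lo, hi in zip(cuts, cuts[1:])]


# Hypothetical thresholds separating scores 1|2, 2|3, 3|4, 4|5.
THRESHOLDS = [0.5, 1.2, 1.8, 2.4]

p_low = category_probabilities(THRESHOLDS, mean=0.0, sd=1.0)
p_high = category_probabilities(THRESHOLDS, mean=0.0, sd=1.3)

for score, (a, b) in enumerate(zip(p_low, p_high), start=1):
    print(f"score {score}: sd=1.0 -> {a:.4f}   sd=1.3 -> {b:.4f}")
```

With the mean held fixed, the larger residual standard deviation moves more probability mass past the last threshold, so the probability of score 5 rises much faster in relative terms than the probabilities of the middle scores.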
11
Genetic evaluations accounting for calving ease
  • French Holstein, Normande, and Montbeliarde
    breeds (Ducrocq, 2000)
  • Heteroskedasticity is breed dependent
  • 15% lower residual variance in winter versus
    summer.
  • Larger residual variance (1.07-1.18x) for male
    calves.
  • Italian Holsteins (Canavesi et al., 2003)
  • Larger residual variance (1.03x) for males
  • Regional differences for residual variance
  • Both evaluations only consider fixed effects
    models for residual variances

12
Fixed and random effects for log residual
variances in threshold models for calving ease
  • Kizilkaya and Tempelman (2005, GSE)
  • First parity Italian Piedmontese cattle

Estimates ± SE from a linear mixed model analysis of birth weights (BW)
and a threshold mixed model analysis of calving ease (CE)

Parameter                             BW estimate ± SE  CE estimate ± SE
Sire variance                         1.13 ± 0.20       0.13 ± 0.02
MGS variance                          0.50 ± 0.11       0.02 ± 0.01
Sire-MGS covariance                   0.35 ± 0.11       0.02 ± 0.01
CG variance                           1.68 ± 0.19       0.13 ± 0.02
Male residual variance                14.44 ± 1.03      1.09 ± 0.09
Female residual variance              10.19 ± 0.73      0.71 ± 0.06
Sex difference in residual variances  4.26 ± 0.53       0.38 ± 0.05
CV for herd-specific variances        0.60 ± 0.09       0.74 ± 0.14

Fixed effects (F) and random effects (R) for residual
heteroskedasticity
13
Estimates and 95% credible sets of herd-specific
variances for CE relative to baseline (1.0)
Note: because a sire-MGS model was used, residual
heteroskedasticity may be partly genetic
CV = 0.74
14
Impact on calving ease EPDs? Heteroskedastic vs.
homoskedastic error
15
Impact of residual heteroskedasticity across CG
on Sire EPDs for birthweights (Kizilkaya and
Tempelman, 2005)
CV = 0.60
Implications of ranking herds for product
uniformity!
[Figure labels: Herd 66; Sire A]
All of Sire A's progeny were from Herd 66
16
Multiple Breed Populations
  • Might naturally expect heterogeneous genetic
    variances (for different breed groups and
    different levels of heterozygosity)

17
Multibreed genetic modeling
  • Additive model (Lo et al., 1993)
  • For any individual j, its additive genetic effect
    $a_j$ has a variance that combines:
  • the expected allelic contribution due to Breed b in
    individual/parent j
  • the additive genetic variance of Breed b
  • the variance due to genetic segregation between
    Breeds b and b'
18
Simple two breed example
[Figure: two-breed cross with parental breeds P1 and P2, their F1, and the F2]
Theory used for QTL mapping in pig breed crosses, with
better power than Haley-Knott regression
(Perez-Enciso and Varona, 2000)
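The formula itself did not survive in this transcript. The sketch below assumes the additive multibreed variance in the form usually attributed to Lo et al. (1993): breed variances weighted by the individual's breed fractions, plus twice the segregation variance weighted by the product of breed fractions within each parent. The function name and all numeric variances are hypothetical.

```python
def additive_variance(f_ind, f_sire, f_dam, breed_var, seg_var):
    """Additive genetic variance of one animal in a two-breed population.

    f_ind, f_sire, f_dam: expected fraction of alleles from breed 1 in the
    individual, its sire, and its dam (breed 2 gets 1 - f).
    breed_var: (variance of breed 1, variance of breed 2).
    seg_var: segregation variance between the two breeds.
    Assumed form, not taken from the slide:
      f1*var1 + f2*var2 + 2*(f1_s*f2_s + f1_d*f2_d)*seg_var
    """
    breed_part = f_ind * breed_var[0] + (1.0 - f_ind) * breed_var[1]
    seg_weight = 2.0 * (f_sire * (1.0 - f_sire) + f_dam * (1.0 - f_dam))
    return breed_part + seg_weight * seg_var


# Hypothetical variance components, for illustration only.
BREED_VAR = (40.0, 60.0)
SEG_VAR = 10.0

p1 = additive_variance(1.0, 1.0, 1.0, BREED_VAR, SEG_VAR)  # purebred P1
p2 = additive_variance(0.0, 0.0, 0.0, BREED_VAR, SEG_VAR)  # purebred P2
f1 = additive_variance(0.5, 1.0, 0.0, BREED_VAR, SEG_VAR)  # P1 x P2
f2 = additive_variance(0.5, 0.5, 0.5, BREED_VAR, SEG_VAR)  # F1 x F1
print(p1, p2, f1, f2)
```

Under this form the F1 carries no segregation variance because both of its parents are purebred, while the F2 variance exceeds the F1 variance by exactly the segregation variance.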
19
Application: Nelore-Hereford data (Fernando
Cardoso PhD)
  • Data set
  • 22,717 post-weaning gain (PWG) records on
    Hereford and Nelore x Hereford calves raised in
    Brazil (from 1974-2000)
  • 40,082 animals (including ancestors in pedigree
    file)
  • Breed compositions of animals with records ranged
    from purebred Hereford to 7/8 Nelore
  • Purebred Herefords and F1s represent 90% of the
    data

21
But maybe the residual variances are
heterogeneous too!
  • Beef cattle performance is recorded across
    diverse production systems and environments, with
    data quality often compromised by, e.g.
  • Recording error, preferential treatment, disease,
    etc.
  • Hierarchical model constructions have been
    independently used to address
  • heteroskedasticity (Foulley et al., 1992;
    San Cristobal et al., 1993) and
  • robustness to outliers (Stranden and Gianola,
    1998, 1999).
  • Important to discern outliers from high-variance
    subclasses

22
First stage: specify the linear mixed model
$y = X_1 b_1 + X_2 b_2 + Z_1 u_1 + Z_2 u_2 + e$
Fixed effects
  • Non-genetic effects $b_1$ (age of dam, length of PW
    period, calf sex)
  • Genetic effects $b_2$ (breed additive, dominance,
    and recombination loss effects)
Random effects
  • Random contemporary group effects $u_1$
  • Random additive genetic effects $u_2$
OR
$y = Xb + Zu + e$
23
Second stage: structural variance model
Terms of the model, with examples:
  • Baseline
  • Regression parameters (examples: breed proportion,
    breed heterozygosity)
  • Fixed classification effects (example: calf sex)
  • Random classification effects (example: CG)
  • Lack-of-fit term with mean 0
Distributional assumptions on random effects
  • Location parameters
  • u includes 940 CG ($u_{CG}$) and 40,082 additive
    genetic effects ($u_A$)
  • $u_{CG} \sim N(0, I\sigma^2_{CG})$
  • $u_A \sim N(0, G(\varphi))$, where $\varphi$ includes breed-specific
    variances and segregation variances.
  • Residual variance
  • $v = [v_1, v_2, \ldots, v_{940}]'$ includes random relative
    variances for 940 CG
  • $v_i$ IID inverted-gamma with mean 1 and standard
    deviation $\sigma_v$
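As a rough illustration of the last assumption, an inverted-gamma distribution with mean 1 and a given standard deviation can be obtained by moment matching: with shape a and scale b, the mean is b/(a - 1) and the variance is b^2 / ((a - 1)^2 (a - 2)). The sketch below uses only the Python standard library; the function names are hypothetical, and the value 0.72 is the CG estimate reported later in the talk.

```python
import random
from statistics import fmean


def inverse_gamma_params(sd):
    """Shape and scale of an inverted-gamma with mean 1 and the given sd."""
    shape = 2.0 + 1.0 / sd ** 2  # variance is 1 / (shape - 2) when the mean is 1
    scale = shape - 1.0          # mean is scale / (shape - 1) = 1
    return shape, scale


def draw_relative_variances(n, sd, seed=1):
    """Draw n contemporary-group variance multipliers."""
    rng = random.Random(seed)
    shape, scale = inverse_gamma_params(sd)
    # If g ~ Gamma(shape, rate=scale), then 1/g ~ Inverted-gamma(shape, scale).
    # random.gammavariate takes (shape, scale of the gamma), hence 1/scale.
    return [1.0 / rng.gammavariate(shape, 1.0 / scale) for _ in range(n)]


v = draw_relative_variances(940, sd=0.72)
print(round(fmean(v), 2), round(min(v), 2), round(max(v), 2))
```

The multipliers average close to 1, so the baseline residual variance keeps its interpretation as a typical contemporary-group variance.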

25
Need to consider one more thing
  • Recall
  • What about $w_j$?
  • Lack-of-fit term

where
26
  • 1) If $w_j \sim \text{Gamma}(\nu/2, \nu/2)$, then this is
    equivalent to specifying a Student t error, which has
    been demonstrated to be resistant to outliers
    (Stranden and Gianola, 1998, 1999)
  • 2) If $w_j = 1$ for all j, then the error is normal

Many other options! See Rosa et al. (2003)
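The equivalence in item 1 can be checked by simulation. This sketch assumes that $w_j$ divides the residual variance, i.e. $e_j \mid w_j \sim N(0, \sigma_e^2 / w_j)$, which is the usual scale-mixture construction but is not spelled out in the transcript. The degrees of freedom value is the posterior mean reported in the results of this talk; the function name is hypothetical.

```python
import random


def student_t_errors(n, nu, sigma2=1.0, seed=1):
    """Draw errors as a gamma scale mixture of normals."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        # Gamma with shape nu/2 and rate nu/2; gammavariate takes (shape, scale).
        w = rng.gammavariate(nu / 2.0, 2.0 / nu)
        errors.append(rng.gauss(0.0, (sigma2 / w) ** 0.5))
    return errors


nu = 7.33
e = student_t_errors(200_000, nu)

# A Student t with nu degrees of freedom and unit scale has variance nu/(nu-2).
sample_var = sum(x * x for x in e) / len(e)
theory_var = nu / (nu - 2.0)
print(round(sample_var, 3), round(theory_var, 3))
```

Setting every w to 1 in the same function would return plain normal errors with variance sigma2, which is the second case on the slide.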
27
Now (At least) four distributional possibilities!
  • 2 × 2 factorial based on distribution (normal
    versus Student t) and homoskedastic versus
    heteroskedastic residuals
  • Homoskedastic normal
  • Homoskedastic Student t
  • Heteroskedastic normal
  • Heteroskedastic Student t

28
Some results
  • Based on Pseudo Bayes Factors (PBF), the heteroskedastic
    Student t error model provided the best data fit and
    the homoskedastic normal model the worst.
  • The posterior mean of the degrees of freedom
    parameter ($\nu$) was 7.33 ± 0.48, indicating a
    heavier-tailed residual distribution than normal
    ($\nu = \infty$) for PWG data

29
Heteroskedastic residual variance results

Parameter                         Est.  SE    95% PPI
Fixed effects
  Gender ($\tau_1$)               1.13  0.09  (0.97, 1.31)
  Nelore proportion ($\gamma_1$)  1.15  0.45  (0.48, 2.20)
  Heterozygosity ($\gamma_2$)     0.70  0.16  (0.46, 1.06)
Random effects
  CG ($\sigma_v$)                 0.72  0.06  (0.62, 0.86)

Evidence of genetic homeostasis? (Lerner, 1954)
30
What do these estimates mean again?
  • Example: a male F1 calf in a herd (Herd 5) with
    above-average variability
  • Nelore proportion
  • Heterozygosity
  • Estimated residual variability
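The numbers for this example were lost in the transcript, so the sketch below only illustrates how such an estimate could be assembled. It assumes a log-linear model in which classification effects act as multipliers and regression covariates act as exponents, and that the gender estimate applies to males. The three estimates come from the results table; the herd multiplier is hypothetical. An F1 from two pure breeds has Nelore proportion 0.5 and heterozygosity 1.

```python
# Posterior means from the results table.
TAU_MALE = 1.13      # gender multiplier (assumed here to apply to males)
GAMMA_NELORE = 1.15  # multiplier per unit of Nelore proportion
GAMMA_HET = 0.70     # multiplier per unit of heterozygosity


def relative_residual_variance(is_male, nelore_prop, heterozygosity, herd_multiplier):
    """Residual variance relative to baseline under the assumed log-linear model.

    Regression terms are additive on the log scale, so each covariate
    enters as an exponent on the variance scale.
    """
    sex_term = TAU_MALE if is_male else 1.0
    return (sex_term
            * GAMMA_NELORE ** nelore_prop
            * GAMMA_HET ** heterozygosity
            * herd_multiplier)


HERD_5 = 1.5  # hypothetical multiplier for an above-average herd

f1_male = relative_residual_variance(True, 0.5, 1.0, HERD_5)
print(round(f1_male, 3))
```

Multiplying the result by the baseline residual variance would give the estimated residual variance for this calf.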

31
Posterior densities of heritabilities under
homoskedastic normal error model
Cardoso and Tempelman, 2004
32
Posterior densities of heritabilities under
heteroskedastic normal error model
Why the flip-flop from the homoskedastic normal
error model? -> Some of the most variable herds were
exclusively Herefords
Posterior densities look very similar under
Student t heteroskedastic error
33
Where do we go from here?
  • Genetic evaluation for residual variability?
  • Relevance: uniformity of product premium.
  • San Cristobal-Gaudy et al. (1998, 2001); Sorensen
    and Waagepetersen (2003)

A = numerator relationship matrix; r = genetic
correlation between location and log variance
effects
34
Litter size in sheep (San Cristobal et al., 2003)
[Figure: sire EPD for litter size variability (v) plotted against sire EPD for litter size (u), with genetic correlation r]
For litter size in pigs, a negative r was
estimated (Sorensen and Waagepetersen, 2003)
35
Multiple trait analysis?
  • The standard for genetic evaluations today
  • Perhaps genetic covariances/correlations between
    traits are heterogeneous across environments too.
  • Hopefully, these issues will be investigated
    further.

36
References
  • Cardoso, F.F., and R.J. Tempelman. 2004.
    Hierarchical Bayes multiple-breed inference with
    an application to genetic evaluation of a
    Nelore-Hereford population. Journal of Animal
    Science 82:1589-1601.
  • Canavesi, F., S. Biffani, and A.B. Samore. 2003.
    Revising the genetic evaluation for calving ease in
    the Italian Holstein Friesian. Interbull Bulletin
    30:82-85.
    http://www-interbull.slu.se/bulletins/framesida-pub.htm
  • Ducrocq, V. 2000. Calving ease evaluation of French
    dairy bulls with a heteroskedastic threshold
    model with direct and maternal effects. Interbull
    Bulletin 30:82-85.
    http://www-interbull.slu.se/bulletins/framesida-pub.htm
  • Foulley, J.L. 1997. ECM approaches to
    heteroskedastic mixed models with constant
    variance ratios. Genetics, Selection, Evolution
    29:297-315.
  • Foulley, J.L., M. San Cristobal, D. Gianola, and
    S. Im. 1992. Marginal likelihood and Bayesian
    approaches to the analysis of heterogeneous
    residual variances in mixed linear Gaussian
    models. Computational Statistics & Data Analysis
    13:291-305.
  • Foulley, J.L., and D. Gianola. 1996. Statistical
    analysis of ordered categorical data via a
    structural heteroskedastic threshold model.
    Genetics, Selection, Evolution 28:249-273.
  • Garrick, D.J., E.J. Pollak, R.L. Quaas, and L.D.
    Van Vleck. 1989. Variance heterogeneity in
    direct and maternal weight traits by sex and
    percent purebred for Simmental-sired calves.
    Journal of Animal Science 67:2515-2528.
  • Kachman, S.D., and R.W. Everett. 1993. A
    multiplicative model when the variances are
    heterogeneous. Journal of Dairy Science
    76:859-867.
  • Kizilkaya, K., and R.J. Tempelman. 2005. A
    general approach to mixed effects modeling of
    residual variances in generalized linear mixed
    models. Genetics, Selection, Evolution (in
    press).
  • Lo, L.L., R.L. Fernando, and M. Grossman. 1993.
    Covariance between relatives in multibreed
    populations: additive model. Theoretical and
    Applied Genetics 87:423-430.
  • Mark, T. 2004. Applied genetic evaluations for
    production and functional traits in dairy cattle.
    Journal of Dairy Science 87:2641-2652.
  • Meuwissen, T.H.E., G. DeJong, and B. Engel. 1996.
    Joint estimation of breeding values and
    heterogeneous variances of large data files.
    Journal of Dairy Science 79:310-316.
  • Perez-Enciso, M., and L. Varona. 2000.
    Quantitative trait loci mapping in F2 crosses
    between outbred lines. Genetics 155:391-405.

37
References (cont'd)
  • Robinson, G.K. 1991. That BLUP is a good thing:
    the estimation of random effects. Statistical
    Science 6:15-51.
  • Robert-Granie, C., B. Bonati, D. Boichard, and A.
    Barbat. 1999. Accounting for variance
    heterogeneity in French dairy cattle genetic
    evaluation. Livestock Production Science
    60:343-357.
  • Robert-Granie, C., B. Heude, and J.L. Foulley.
    2002. Modeling the growth curve of Maine-Anjou
    beef cattle using heteroskedastic random
    coefficients models. Genetics, Selection,
    Evolution 34:423-445.
  • Rodriguez-Almeida, F.A., L.D. Van Vleck, L.V.
    Cundiff, and S.D. Kachman. 1995. Heterogeneity
    of variance by sire breed, sex, and dam breed in
    200-day and 365-day weights of beef cattle from a
    top cross experiment. Journal of Animal Science
    73:2579-2588.
  • Rosa, G.J.M., C.R. Padovani, and D. Gianola.
    2003. Robust linear mixed models with
    normal/independent distributions and Bayesian
    MCMC implementation. Biometrical Journal
    45:573-590.
  • San Cristobal, M., J.L. Foulley, and E.
    Manfredi. 1993. Inference about multiplicative
    heteroskedastic components of variance in a mixed
    linear Gaussian model with an application to
    beef cattle breeding. Genetics, Selection,
    Evolution 25:3-30.
  • San Cristobal-Gaudy, M., J.M. Elsen, L. Bodin, and
    C. Chevalet. 1998. Prediction of the response to
    a selection for canalisation of a continuous
    trait in animal breeding. Genetics, Selection,
    Evolution 30:423-451.
  • San Cristobal-Gaudy, M., L. Bodin, J.M. Elsen, and
    C. Chevalet. 2001. Genetic components of litter
    size variability in sheep. Genetics, Selection,
    Evolution 33:249-271.
  • Sorensen, D.A., and R. Waagepetersen. 2003. Normal
    linear models with genetically structured
    residual heterogeneity: a case study. Genetical
    Research, Cambridge 82:207-222.
  • Stranden, I., and D. Gianola. 1998. Attenuating
    effects of preferential treatment with Student t
    mixed linear models: a simulation study.
    Genetics, Selection, Evolution 30:565-583.
  • Stranden, I., and D. Gianola. 1999. Mixed effects
    linear models with t-distributions for
    quantitative genetic analysis: a Bayesian
    approach. Genetics, Selection, Evolution
    31:25-42.