Title: Missing Heritability
ICAR-Indian Agricultural Statistical Research Institute
Missing Heritability
Shashank Kshandakar, Ph.D. (Agricultural Statistics), Roll No. 10684
2. Contents
- Introduction
- Dissolving the Missing Heritability Problem
- Estimation of Heritability from Common Variants
- Haseman-Elston Regression Method
- Heritability from Case-control Study
- Liability Threshold Model
- Illustration
- Conclusions
- References
3. Introduction
- Heritability is a key genetic parameter that helps in understanding the genetic architecture of complex traits.
- Heritability (in the narrow sense) is defined as the proportion of total phenotypic variation that is due to additive genetic factors.
- In GWAS, the statistically significant variants explain only a small fraction of the heritability, and the heritability estimates obtained from GWAS are much lower than those from traditional quantitative methods.
4. Introduction
- Estimating and understanding missing heritability is important because disease susceptibility is known to be due to genetic factors, and understanding this genetic variation may contribute to better prevention, diagnosis and treatment of disease.
- Knowledge of missing heritability helps in planning research strategies to uncover the genetic risk factors (Maher, 2008).
5. Heritability and number of loci for several traits (Manolio et al. 2009)
Disease                  Number of loci   Proportion of heritability explained by GWAS (%)
ARMD                     5                50
Crohn's disease          32               20
Fasting glucose          4                1.5
HDL cholesterol          7                5.2
Height                   40               5
Myocardial infarction    9                2.8
Type 2 diabetes          18               6
6. Probable Reasons for Missing Heritability
- Nadeau (2009) points out that the missing heritability is due to epigenetic factors.
- Epigenetics is the study of heritable changes in gene expression that are not caused by changes in DNA sequence.
- Epigenetic modifications are influenced by environmental factors such as smoking, stress and nutrients.
- Trans-generational epigenetic inheritance that contributes to disease risk would not be detectable in GWAS but may contribute to average risk and to similarities among relatives (Richards, 2006).
7. Probable Reasons for Missing Heritability
- Manolio et al. (2009) termed missing heritability the "dark matter" of GWAS, in the sense that one is sure it exists and can detect its influence, but simply cannot see it (yet).
- Kong (2009) suggests that rare variants contribute to missing heritability in two ways: first, they are more difficult to discover; second, even if discovered, their contribution to heritability would be underestimated when evaluated under models that do not take parental origin into account.
8. Probable Reasons for Missing Heritability
- Eichler et al. (2010) explore the impact of large variants (deletions, duplications and inversions) that are individually rare but collectively common in the population.
- Rare variants that can potentially affect many genes and various biological pathways in an individual organism are inaccessible to most existing genotyping and sequencing technologies (Zuk et al. 2014).
9. Dissolving the Missing Heritability Problem by Traditional Method
- Heritability estimation from family study
  - Twin study
  - Parent-offspring regression
- Heritability estimation from GWAS study
10. Estimation of Missing Heritability
- In quantitative genetics, we assume the absence of gene-environment interaction and correlation (Falconer et al. 1996), so that
- $V_P = V_G + V_E$ and $V_G = V_A + V_D + V_I$
- $H^2 = V_G / V_P$ and $h^2 = V_A / V_P$
- where $V_P$ is the phenotypic variance of a population
- $V_G$ is the genotypic variance
- $V_E$ is the environmental variance
- $V_A$ is the additive genetic variance
- $V_D$ is the dominance variance
- $V_I$ is the epistatic variance
11. Estimation
- $\mathrm{Cov}(P_1, P_2) = \mathrm{Cov}(A_1 + D_1 + I_1 + E_1,\ A_2 + D_2 + I_2 + E_2)$
- $= \mathrm{Cov}(A_1, A_2) + \mathrm{Cov}(D_1, D_2) + \mathrm{Cov}(I_1, I_2) + \mathrm{Cov}(E_1, E_2)$
- Twin studies are the most commonly used traditional method for estimating heritability (Pierrick et al. 2016).
- Twin studies are a special type of epidemiological study designed to measure the contribution of genes, as opposed to the environment, to a given trait.
- Monozygotic and dizygotic twins share almost 100% and 50% of their genetic material, respectively.
12. Estimation
- The environment is typically divided into
  - Shared environment (C): the part of the environment that affects both twins in the same way
  - Unique environment (U): the part of the environment that affects one twin but not the other (Silventoinen et al. 2003)
- In the absence of interaction and correlation between C and U, we have
- $E = C + U$
13. Estimation
- Assuming epistatic effects to be negligible (an assumption in twin studies),
- $\mathrm{Cov}(P_{T1}, P_{T2}) = \mathrm{Cov}(A_{T1} + D_{T1} + C_{T1} + U_{T1},\ A_{T2} + D_{T2} + C_{T2} + U_{T2})$
- $= \mathrm{Cov}(A_{T1}, A_{T2}) + \mathrm{Cov}(D_{T1}, D_{T2}) + \mathrm{Cov}(C_{T1}, C_{T2}) + \mathrm{Cov}(U_{T1}, U_{T2})$
- where the indexes T1 and T2 represent the two twins of each twin pair studied
- $\mathrm{Cov}(U_{T1}, U_{T2})$ is zero for both monozygotic and dizygotic twins, as each twin's unique environment is by definition independent of that of the other twin.
14. Estimation
- Since variance is a special case of covariance in which the two variables are identical, and since for monozygotic twins $A_{T1}$, $D_{T1}$ and $C_{T1}$ equal $A_{T2}$, $D_{T2}$ and $C_{T2}$ respectively,
- $\mathrm{Cov}_{MT}(P_{T1}, P_{T2}) = V_A + V_D + V_C$
- $\mathrm{Cov}_{DT}(P_{T1}, P_{T2}) = \frac{1}{2} V_A + \frac{1}{4} V_D + V_C$
- $H^2_{TS} = \frac{2\,\mathrm{Cov}_{MT}(P_{T1}, P_{T2}) - 2\,\mathrm{Cov}_{DT}(P_{T1}, P_{T2})}{V_P} = \frac{V_A}{V_P} + \frac{3}{2}\frac{V_D}{V_P}$
- where $H^2_{TS}$ is called the broad-sense heritability from twin studies: the resulting estimate equals neither $H^2$ nor $h^2$ exactly, but it is closer to $H^2$ than to $h^2$ (Falconer and Mackay, 1996).
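The twin-study relations above can be checked numerically. The following is a minimal sketch in R; the variance components are hypothetical values chosen only for illustration, and epistasis is assumed negligible, as on the slide.

```r
# Hypothetical variance components (illustrative values, not from the source)
VA <- 0.40  # additive genetic variance
VD <- 0.10  # dominance variance
VC <- 0.20  # shared-environment variance
VU <- 0.30  # unique-environment variance
VP <- VA + VD + VC + VU

# Expected covariances between twins (epistasis assumed negligible)
cov_mz <- VA + VD + VC                 # monozygotic pairs
cov_dz <- 0.5 * VA + 0.25 * VD + VC    # dizygotic pairs

# Twin-study estimate: twice the MZ-DZ covariance difference over VP
H2_twin <- 2 * (cov_mz - cov_dz) / VP

h2 <- VA / VP          # narrow-sense heritability
H2 <- (VA + VD) / VP   # broad-sense heritability when VI = 0

print(c(h2 = h2, H2 = H2, H2_twin = H2_twin))
```

With these values the twin estimate exceeds both h2 and H2, because it carries the extra (3/2)VD term, and it lies closer to H2 than to h2.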
15. Estimation
- The covariance between the trait of parents (one parent or the mean of both) and the mean of their offspring is (Falconer and Mackay 1996)
- $\mathrm{Cov}(P_P, P_O) = \mathrm{Cov}(A_P + D_P + I_P + E_P,\ A_O + D_O + I_O + E_O)$
- $= \mathrm{Cov}(A_P, A_O) + \mathrm{Cov}(D_P, D_O) + \mathrm{Cov}(I_P, I_O) + \mathrm{Cov}(E_P, E_O)$
- Doolittle (2012) assumes that $\mathrm{Cov}(D_P, D_O)$ and $\mathrm{Cov}(E_P, E_O)$ are zero
- Environments experienced by individuals are likely to be more similar within a family line, so $\mathrm{Cov}(E_P, E_O)$ may be non-zero (Guo et al. 2014)
16. Estimation
- The covariance between parents and their offspring equals half of the additive genetic variance plus a variance term representing effects due to dominance and similarities between environments
- $\mathrm{Cov}(P_P, P_O) = \mathrm{Cov}(A_P, A_O) + \mathrm{Cov}(D_P, D_O) + \mathrm{Cov}(E_P, E_O) = \frac{1}{2} V_A + V_{DEC}$
- $H^2_{PO} = \frac{2\,\mathrm{Cov}(P_P, P_O)}{V_P} = \frac{V_A}{V_P} + \frac{2 V_{DEC}}{V_P}$
- Heritability estimates from both twin studies and parent-offspring regression include an extra term when compared to $h^2$, but they do not correspond to the broad-sense $H^2$ either
- $H^2 = h^2 + h^2_{other}$
- where $H^2$ here denotes the estimate obtained by traditional methods and $h^2_{other}$ is the part of heritability contributed by the extra component(s) representing non-additive variance.
17. Estimation
- Some epigenetic factors can lead to additive genetic effects (Pierrick et al. 2017). Their additive variance ($V_{A_{epi}}$) should be added to the additive variance of DNA sequences ($V_{A_{DNA}}$) to obtain $V_A$. Assuming there is no interaction between $V_{A_{DNA}}$ and $V_{A_{epi}}$,
- $V_A = V_{A_{DNA}} + V_{A_{epi}}$
- $h^2 = \frac{V_{A_{DNA}}}{V_P} + \frac{V_{A_{epi}}}{V_P}$
- $= h^2_{DNA} + h^2_{epi}$
- $h^2_{DNA} = h^2 - h^2_{epi}$
18. Estimation
- Missing heritability (MH) equals the difference between the estimates obtained by traditional quantitative methods ($H^2$) and the estimates obtained by GWAS ($h^2_{DNA}$). Thus,
- $MH = H^2 - h^2_{DNA}$
- $= h^2_{other} + h^2_{epi}$
- Missing heritability results from the part of heritability originating from epigenetic factors stably transmitted across generations, plus the part of heritability originating from non-additive factors.
19. Estimation of Heritability from Common Variants
20. Estimation of heritability from common variants
- SNPs identified by GWAS explain only a small fraction of the heritability.
- Genome-wide complex trait analysis (GCTA; Yang et al. 2011) estimates the variance explained by all the SNPs to address the missing heritability problem.
- The basic concept behind GCTA is to fit the effects of all the SNPs as random effects, using a linear mixed model (Hayes et al. 2009) and the H-E regression method (Haseman et al. 1972).
21. Haseman-Elston regression method
- The Haseman-Elston regression method provides an unbiased estimator of $\sigma_g^2$
- Let $Y_j = (x_{1j} - x_{2j})^2$ be the squared pair difference for the jth sib pair
- $x_{1j} = \mu + g_{1j} + e_{1j}$ and $x_{2j} = \mu + g_{2j} + e_{2j}$
- $g_{ij} = a,\ d,\ -a$ for BB, Bb and bb individuals
- $\pi_j$ (0, 1/2 or 1) is the proportion of genes identical by descent (IBD) for the jth sib pair
- Conditional expectation of the squared pair differences:
- $E(Y_j \mid \pi_j) = (\sigma_e^2 + 2\sigma_g^2) - 2\sigma_g^2\,\pi_j = \alpha + \beta \pi_j$
- where $\alpha = \sigma_e^2 + 2\sigma_g^2$ and $\beta = -2\sigma_g^2$, so that $\sigma_g^2 = -\beta/2$
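The regression of squared sib-pair differences on IBD sharing can be illustrated by simulation. The sketch below is written in R under stated assumptions: the sample size, variances and IBD probabilities (1/4, 1/2, 1/4) are illustrative choices, and the genetic values are drawn as normal variables with covariance $\pi_j \sigma_g^2$ rather than from an explicit a/d/-a genotype model. With independent residuals of variance `sigma2_e1` per sib, the intercept term $\sigma_e^2$ of the slide corresponds to the variance of the residual difference, `2 * sigma2_e1`.

```r
set.seed(1)
n_pairs   <- 50000   # number of sib pairs (assumed)
sigma2_g  <- 1.0     # genetic variance at the locus (assumed)
sigma2_e1 <- 1.0     # residual variance per sib (assumed)

# Proportion of genes shared IBD by each sib pair: 0, 1/2 or 1
pi_j <- sample(c(0, 0.5, 1), n_pairs, replace = TRUE,
               prob = c(0.25, 0.5, 0.25))

# Genetic values with Var(g) = sigma2_g and Cov(g1, g2 | pi_j) = pi_j * sigma2_g
shared <- rnorm(n_pairs, sd = sqrt(sigma2_g))
g1 <- sqrt(pi_j) * shared + sqrt(1 - pi_j) * rnorm(n_pairs, sd = sqrt(sigma2_g))
g2 <- sqrt(pi_j) * shared + sqrt(1 - pi_j) * rnorm(n_pairs, sd = sqrt(sigma2_g))

mu <- 10
x1 <- mu + g1 + rnorm(n_pairs, sd = sqrt(sigma2_e1))
x2 <- mu + g2 + rnorm(n_pairs, sd = sqrt(sigma2_e1))

# Haseman-Elston regression: squared pair difference on IBD proportion
Y   <- (x1 - x2)^2
fit <- lm(Y ~ pi_j)
beta_hat     <- unname(coef(fit)["pi_j"])
sigma2_g_hat <- -beta_hat / 2

print(c(beta = beta_hat, sigma2_g = sigma2_g_hat))
```

The slope should be close to $-2\sigma_g^2$, so halving it and changing the sign recovers the simulated genetic variance up to sampling error.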
22. Estimation of heritability from common variants
- In GWAS, the association between an individual SNP and the trait is represented by the following simple regression model
- $y_j = \mu + x_{ij} a_i + e_j$
- $e_j \sim N(0, \sigma_e^2)$
- where $y_j$ is the phenotypic value of the jth individual; $\mu$ is the general mean; $a_i$ is the allele substitution effect of the ith SNP; $x_{ij}$ is an indicator variable that takes the value 0, 1 or 2 if the genotype of the jth individual at the ith SNP is bb, Bb or BB, respectively; and $e_j$ is the residual effect.
23. Estimation of heritability from common variants
- Suppose m causal variants are genotyped; then the model is
- $y_j = \mu + g_j + e_j$ and $g_j = \sum_{i=1}^{m} z_{ij} u_i$
- where $g_j$ is the total genetic effect of the jth individual; m is the number of causal loci; $u_i$ is the additive effect of the ith causal variant; and $z_{ij}$ is the element of the design matrix that allocates causal-allele effects to the trait, determined by the genotype of the jth individual at the ith locus:
- $z_{ij} = -2p_i / \sqrt{2p_i(1-p_i)}$ for genotype qq
- $z_{ij} = (1-2p_i) / \sqrt{2p_i(1-p_i)}$ for genotype Qq
- $z_{ij} = 2(1-p_i) / \sqrt{2p_i(1-p_i)}$ for genotype QQ
- with $p_i$ the frequency of allele Q at the ith locus
24. Estimation of heritability from common variants
- In matrix notation,
- $y = \mu 1 + g + e$ and $g = Zu$
- The variance-covariance matrix of y (the vector of observations) can be expressed as
- $\mathrm{var}(y) = ZZ'\sigma_u^2 + I\sigma_e^2 = \frac{ZZ'}{m}\sigma_g^2 + I\sigma_e^2 = G\sigma_g^2 + I\sigma_e^2$
- $u \sim N(0, I\sigma_u^2)$ and $g_j \sim N(0, \sigma_g^2)$ with $\sigma_g^2 = m\sigma_u^2$
- where $\sigma_u^2$ is the variance of causal (random) effects; $\sigma_g^2$ is the variance of total additive genetic effects; I is an n x n identity matrix; and $G = ZZ'/m$ is the genetic relationship matrix between pairs of individuals at causal loci.
25. Estimation of heritability from common variants
- The number and positions of the causal variants are not known exactly, so the G matrix cannot be obtained directly
- The genome-wide relationship matrix (A) is obtained from a genome-wide sample of SNPs
- $A = \frac{WW'}{N}$ with $w_{ij} = \frac{x_{ij} - 2p_i}{\sqrt{2p_i(1-p_i)}}$
- where W is the standardized genotype matrix with ijth element $w_{ij}$; N is the number of SNPs; $x_{ij}$ is the number of copies of the reference allele for the ith SNP of the jth individual; and $p_i$ is the frequency of that allele
26. Estimation of heritability from common variants
- The genome-wide relationship (A) between individuals j and k can be estimated by the following equation
- $A_{jk} = \frac{1}{N}\sum_{i=1}^{N} \frac{(x_{ij} - 2p_i)(x_{ik} - 2p_i)}{2p_i(1-p_i)}$ when $j \neq k$
- $A_{jk} = 1 + \frac{1}{N}\sum_{i=1}^{N} \frac{x_{ij}^2 - (1 + 2p_i)x_{ij} + 2p_i^2}{2p_i(1-p_i)}$ when $j = k$
- $G_{jk}$ is not known, so the model is fitted and the genetic variance ($\sigma_g^2$) is estimated using A, the estimate of the relationship matrix based on genome-wide SNPs.
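The relationship matrix can be computed directly from a genotype matrix. The following R sketch uses simulated genotypes (the sample sizes and allele-frequency range are assumptions made for illustration) stored with SNPs in rows and individuals in columns, so that the index i runs over SNPs and j over individuals as in the formulas above. Allele frequencies are estimated from the sample.

```r
set.seed(42)
n_ind <- 50    # individuals (assumed)
n_snp <- 200   # SNPs (assumed)

# Simulated genotypes coded 0/1/2: rows = SNPs (i), columns = individuals (j)
p_true <- runif(n_snp, 0.1, 0.9)
X <- t(sapply(p_true, function(pk) rbinom(n_ind, size = 2, prob = pk)))

# Sample allele frequencies; drop monomorphic SNPs to avoid division by zero
p_hat <- rowMeans(X) / 2
keep  <- p_hat > 0 & p_hat < 1
X     <- X[keep, , drop = FALSE]
p_hat <- p_hat[keep]
N     <- nrow(X)

# Standardized genotypes w_ij = (x_ij - 2 p_i) / sqrt(2 p_i (1 - p_i));
# p_hat is recycled down the rows, i.e. one value per SNP
W <- (X - 2 * p_hat) / sqrt(2 * p_hat * (1 - p_hat))

# Off-diagonal elements: average cross-product over SNPs
A <- crossprod(W) / N

# Diagonal elements from the separate j = k formula
num <- X^2 - (1 + 2 * p_hat) * X + 2 * p_hat^2
diag(A) <- 1 + colMeans(num / (2 * p_hat * (1 - p_hat)))

round(A[1:3, 1:3], 3)
```

For unrelated simulated individuals the off-diagonal entries scatter around zero and the diagonal entries around one.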
27. Estimation of heritability from common variants
- Randomly sample 2N SNPs from all the SNPs across the genome and randomly split them into two groups (N SNPs in each group).
- Calculate $A_{jk}$ using all the SNPs in the first group.
- Calculate $G_{jk}$ using SNPs with minor allele frequency (MAF) $\le \theta$ in the second group.
- Regress $G_{jk}$ on $A_{jk}$ for $j \neq k$ (use $G_{jk} - 1$ and $A_{jk} - 1$ when $j = k$). The regression coefficient is
- $\beta = \frac{\mathrm{cov}(G_{jk}, A_{jk})}{\mathrm{var}(A_{jk})}$
- Repeat the procedure using different numbers of SNPs.
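28. Estimation of heritability from common variants
- $Y_{jk} = (z_j - z_k)^2$ is the squared z-score difference between individuals j and k
- $G_{jk}$ is not known, so we replace it by an estimate $\hat{A}_{jk}$ such that
- $E(G_{jk} \mid \hat{A}_{jk}) = \hat{A}_{jk}$
- $E(Y_{jk}) = E(\alpha + \beta G_{jk}) = \alpha + \beta \hat{A}_{jk}$
- $Y_{jk}$ is plotted against $\hat{A}_{jk}$, i.e. $Y_{jk}$ is regressed on $\hat{A}_{jk}$, with slope $\beta = -2\sigma_g^2$, so that $\sigma_g^2 = -\beta/2$
- The relationship at causal loci is predicted with error by the observed SNPs, and the error is $c + 1/N$, so the regression coefficient of $G_{jk}$ on $A_{jk}$ is
- $\beta = 1 - \frac{c + 1/N}{\mathrm{var}(A_{jk})}$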
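The regression of squared z-score differences on estimated relationships can be sketched for unrelated individuals. The R example below is a simplified illustration, not the GCTA implementation: it assumes that every simulated SNP is causal and independent, so the SNP-based matrix plays the role of both A and G and no correction for the prediction error $c + 1/N$ is applied. All sizes and the simulated heritability are assumed values.

```r
set.seed(7)
n_ind   <- 1000   # unrelated individuals (assumed)
n_snp   <- 500    # independent SNPs, all treated as causal (assumed)
h2_true <- 0.5    # simulated heritability (assumed)

# Genotypes coded 0/1/2: rows = individuals, columns = SNPs
p <- runif(n_snp, 0.1, 0.9)
X <- sapply(p, function(pk) rbinom(n_ind, size = 2, prob = pk))

# Standardize with sample allele frequencies and build the relationship matrix
p_hat <- colMeans(X) / 2
W <- scale(X, center = 2 * p_hat, scale = sqrt(2 * p_hat * (1 - p_hat)))
A <- tcrossprod(W) / n_snp

# Phenotype y = g + e with var(g) close to h2_true and var(e) = 1 - h2_true
u <- rnorm(n_snp, sd = sqrt(h2_true / n_snp))
g <- as.vector(W %*% u)
y <- g + rnorm(n_ind, sd = sqrt(1 - h2_true))
z <- as.vector(scale(y))   # phenotype z-scores

# Squared z-score differences and relationships for all pairs j < k
pair_idx <- which(upper.tri(A), arr.ind = TRUE)
Y_jk <- (z[pair_idx[, 1]] - z[pair_idx[, 2]])^2
A_jk <- A[pair_idx]

# Slope of Y_jk on A_jk is -2 * sigma_g^2 on the z-score scale
beta_hat     <- cov(Y_jk, A_jk) / var(A_jk)
sigma2_g_hat <- -beta_hat / 2
print(sigma2_g_hat)
```

Because the phenotype is standardized to unit variance, the recovered $\sigma_g^2$ can be read as a heritability estimate; with unrelated individuals the relationships vary little, so the estimate is noisy unless the sample is large.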
29. Estimation of Heritability from Case-control Study
30. Liability Threshold Model
- Liability describes all the genetic and environmental factors that contribute to the development of a multifactorial disorder.
- The level of liability at which the population is divided into cases and controls is referred to as the threshold.
31. Liability Threshold Model
- Liability is best represented by a standard normal distribution, as most individuals, whether affected or unaffected, will possess some degree of liability
- $l_i = g_i + e_i$
- where $l_i$ is the unobserved liability of the ith individual, and a person is assumed to be a case if their liability exceeds a threshold t
- $g_i$ is a genetic random effect, which can be correlated across individuals
- $e_i$ is the environmental random effect; these effects are assumed to be independent of each other and of the genetic effects.
32. Estimation of heritability from case-control study
- The advantage of working on the scale of liability is that population parameters such as variance components and heritability are independent of prevalence
- $l = \mu 1_N + g + e$
- where $l \sim N(0, 1)$ and $g \sim N(0, \sigma_g^2)$
- The mean of the distribution of liability is zero ($\mu = 0$) when there is no ascertainment
- The total phenotypic variance ($\sigma_p^2$) on the scale of liability is by definition equal to 1
- The heritability on the liability scale is
- $h_l^2 = \sigma_g^2$
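33. Estimation of heritability from case-control study
- Applying the properties of truncated normal distributions, the mean liability is
- $i = E(l \mid y = 1) = z/K$ for cases, and
- $i_2 = E(l \mid y = 0) = -z/(1-K)$ for controls
- where z is the height of the standard normal curve at the threshold t and K is the population prevalence
- The mean squared liability is
- $E(l^2 \mid y = 1) = 1 + it$ for cases
- $E(l^2 \mid y = 0) = 1 + i_2 t$ for controls
- The covariance between y (unaffected/affected status, coded 0/1) and l (liability) describes the relationship between the phenotypes on the two scales
- $\mathrm{Cov}(y, l) = E(y \cdot l) - E(y)E(l) = K \cdot 1 \cdot i + (1-K) \cdot 0 \cdot i_2 = Ki = z$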
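These truncated-normal results can be verified numerically. The R sketch below assumes a prevalence of K = 0.1 (an illustrative value) and compares the closed-form expressions with direct numerical integration of the standard normal density; the variable names are chosen here for illustration.

```r
K   <- 0.1                 # population prevalence (assumed)
thr <- qnorm(1 - K)        # threshold t on the liability scale
z   <- dnorm(thr)          # height of the normal curve at t

i  <- z / K                # mean liability of cases
i2 <- -z / (1 - K)         # mean liability of controls

# Same quantities by numerical integration over each tail
mean_case <- integrate(function(l) l * dnorm(l), thr, Inf)$value / K
mean_ctrl <- integrate(function(l) l * dnorm(l), -Inf, thr)$value / (1 - K)
msq_case  <- integrate(function(l) l^2 * dnorm(l), thr, Inf)$value / K
msq_ctrl  <- integrate(function(l) l^2 * dnorm(l), -Inf, thr)$value / (1 - K)

print(rbind(
  closed_form = c(i, i2, 1 + i * thr, 1 + i2 * thr),
  integration = c(mean_case, mean_ctrl, msq_case, msq_ctrl)
))

# Covariance between 0/1 status and liability: K * i, which equals z
cov_yl <- K * i
```

The two rows of the printed matrix should agree to within the accuracy of the numerical integration.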
34. Estimation of heritability from case-control study
- The genetic value of an individual on the observed 0-1 risk scale (u) is defined as
- $u = c + bg$
- where c is a constant and b is the linear regression coefficient that links the two scales, derived from the regression of the phenotype on the observed scale (y) on the additive genetic effect on the scale of liability (g)
- $b = \frac{\mathrm{cov}(y, g)}{\sigma_g^2} = \frac{E(y \cdot g) - E(y)E(g)}{h_l^2} = \frac{K i h_l^2}{h_l^2} = z$
- $u = c + bg = c + zg$
- $\sigma_u^2 = \mathrm{var}(zg) = z^2 \sigma_g^2$
35. Estimation of heritability from case-control study
- Heritability on the observed scale ($h_o^2$) is the genetic variance on that scale as a proportion of the total variance of the 0-1 observations, which is the Bernoulli variance K(1 - K), and can be written as
- $h_o^2 = \frac{\sigma_u^2}{K(1-K)} = \frac{\sigma_g^2\,[\mathrm{cov}(y, g)/\sigma_g^2]^2}{K(1-K)} = \frac{\sigma_g^2 b^2}{K(1-K)} = \frac{h_l^2 z^2}{K(1-K)}$
- $h_l^2 = \frac{h_o^2\, K(1-K)}{z^2}$
- The mean of the estimated genetic values is
- $E(\hat{u} \mid y = 1) = z\, i\, \sigma_g^2$ for cases
- $E(\hat{u} \mid y = 0) = z\, i_2\, \sigma_g^2$ for controls
36. Selection probabilities
- When the study is observational, the probability of being included in the study is independent of the phenotype.
- In a case-control study, cases are usually greatly over-represented relative to the population (ascertainment)
- $\frac{K\,P_{case}}{(1-K)\,P_{control}} = \frac{P}{1-P}$, so that $P_{control} = \frac{K(1-P)}{P(1-K)}\,P_{case}$
- where $P_{case}$ and $P_{control}$ are the probabilities that a case and a control, respectively, would be selected for the study
- K is the prevalence of the condition in the population
- P is the proportion of cases in the study sample
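The relation between the selection probabilities can be illustrated with a small calculation. In the R sketch below, the function name `control_selection_prob` and the numerical values of K, P and the case selection probability are hypothetical choices made for illustration.

```r
# Probability that a control is selected, given the population prevalence K,
# the proportion of cases in the sample P, and the case selection probability
control_selection_prob <- function(K, P, p_case = 1) {
  stopifnot(K > 0, K < 1, P > 0, P < 1, p_case > 0, p_case <= 1)
  K * (1 - P) / (P * (1 - K)) * p_case
}

K      <- 0.1   # population prevalence (assumed)
P      <- 0.5   # proportion of cases in the study sample (assumed)
p_case <- 1     # every case is sampled (assumed)

p_control <- control_selection_prob(K, P, p_case)

# Check: the implied proportion of cases in the sample recovers P
P_implied <- K * p_case / (K * p_case + (1 - K) * p_control)

print(c(p_control = p_control, P_implied = P_implied))
```

With a prevalence of 10% and a balanced design, only about one control in nine needs to be sampled when all cases are included.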
37. Non-normality of the liability
- When the cases and controls are not a random sample from the population, liability is no longer normally distributed in the sample.
- The mean and variance for case-control disease status ($y_{cc}$), disease liability ($l_{cc}$) and genetic liability ($g_{cc}$) are
- $E(y_{cc}) = P$ (usually, P = 1/2)
- $\mathrm{var}(y_{cc}) = P(1-P)$, which is the phenotypic variance on the observed scale in the case-control sample
- where P is the proportion of cases in the case-control study sample
38. Non-normality of the liability
- $E(l_{cc}) = Pi + (1-P)i_2 = \frac{i(P-K)}{1-K} = i\lambda$, where $\lambda = \frac{P-K}{1-K}$
- $\mathrm{var}(l_{cc}) = \sigma_{l_{cc}}^2 = E(l_{cc}^2) - E(l_{cc})^2$
- $= P(1 + it) + (1-P)(1 + i_2 t) - i^2\lambda^2$
- $= 1 + Pit - \frac{(1-P)\,t\,iK}{1-K} - i^2\lambda^2$
- $= 1 + i\lambda(t - i\lambda)$
- $= 1 + \theta$, where $\theta = i\lambda(t - i\lambda)$
- $\mathrm{var}(l_{cc}) > 1$ in a case-control study, because individuals from the tails of the distribution of liability have been selected.
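The mean and variance of liability in an ascertained sample can be evaluated for concrete values. The R sketch below assumes K = 0.1 and P = 0.5 (illustrative values) and checks that the mixture moments of cases and controls agree with the compact forms $i\lambda$ and $1 + \theta$.

```r
K <- 0.1   # population prevalence (assumed)
P <- 0.5   # proportion of cases in the sample (assumed)

thr <- qnorm(1 - K)     # threshold t
z   <- dnorm(thr)       # normal density at the threshold
i   <- z / K            # mean liability of cases
i2  <- -z / (1 - K)     # mean liability of controls

lambda <- (P - K) / (1 - K)
theta  <- i * lambda * (thr - i * lambda)

# Moments of liability in the case-control sample (mixture of the two groups)
mean_lcc <- P * i + (1 - P) * i2
var_lcc  <- P * (1 + i * thr) + (1 - P) * (1 + i2 * thr) - mean_lcc^2

print(c(mean_lcc = mean_lcc, i_lambda = i * lambda,
        var_lcc = var_lcc, one_plus_theta = 1 + theta))
```

For these values the variance of liability in the sample is roughly 1.39, noticeably above the population value of 1.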
39. Non-normality of the liability
- The mean of genetic liability depends on the mean liability of the case-control sample and the heritability of liability
- $E(g_{cc}) = h_l^2 E(l_{cc}) = h_l^2 [Pi + (1-P)i_2] = h_l^2\, i\lambda$
- The variance of genetic liability is
- $\mathrm{var}(g_{cc}) = \sigma_{g_{cc}}^2 = E(g_{cc}^2) - E(g_{cc})^2$
- $= h_l^2 [P(1 + h_l^2\, it) + (1-P)(1 + h_l^2\, i_2 t)] - h_l^4\, i^2\lambda^2$
- $= h_l^2 (1 + h_l^2 \theta)$
40. Non-normality of the liability
- The regression of the phenotype on the observed risk scale on genetic liability in the case-control study is
- $b_{cc} = \frac{\mathrm{cov}(y_{cc}, g_{cc})}{\mathrm{var}(g_{cc})} = \frac{E(y_{cc} \cdot g_{cc}) - E(y_{cc})E(g_{cc})}{\mathrm{var}(g_{cc})}$
- $= \frac{h_l^2\, iP - P h_l^2\, i\lambda}{\sigma_{g_{cc}}^2} = \frac{P h_l^2\, i(1-\lambda)}{\sigma_{g_{cc}}^2}$
- $= z\,\frac{P(1-P)\,\sigma_g^2}{K(1-K)\,\sigma_{g_{cc}}^2} = z\omega$, where $\omega = \frac{P(1-P)\,\sigma_g^2}{K(1-K)\,\sigma_{g_{cc}}^2}$
- Here $\omega$ quantifies the change in the regression coefficient due to ascertainment in a regression of the phenotype on the observed risk scale onto genetic factors on the scale of liability
41. Non-normality of the liability
- The genetic value on the observed scale ($u_{cc}$) for an individual in a case-control study is
- $u_{cc} = c + b_{cc}\, g_{cc}$
- $= c + z\omega\, g_{cc}$
- $= c + z\,\frac{P(1-P)\,\sigma_g^2}{K(1-K)\,\sigma_{g_{cc}}^2}\, g_{cc}$
- and,
- $\mathrm{var}(u_{cc}) = \sigma_{u_{cc}}^2 = b_{cc}^2\, \sigma_{g_{cc}}^2 = \left[ z\,\frac{P(1-P)\,\sigma_g^2}{K(1-K)\,\sigma_{g_{cc}}^2} \right]^2 \sigma_{g_{cc}}^2$
42. Non-normality of the liability
- The mean of the estimated genetic values, when samples are ascertained, is
- $E(\hat{u}_{cc} \mid y_{cc} = 1) = \frac{P(1-P)(1-P)}{K(1-K)(1-K)}\, E(\hat{u} \mid y = 1)$ for cases
- $E(\hat{u}_{cc} \mid y_{cc} = 0) = \frac{P(1-P)P}{K(1-K)K}\, E(\hat{u} \mid y = 0)$ for controls
- $b_{cc}^2$ is a squared regression coefficient that transforms the estimate of genetic variance on the observed risk scale to the liability scale
- Taking $\sigma_{g_{cc}}^2 \approx \sigma_g^2$, so that $b_{cc} \approx z\,\frac{P(1-P)}{K(1-K)}$: $\sigma_g^2 \approx \frac{\sigma_{u_{cc}}^2}{b_{cc}^2} \approx \frac{\sigma_{u_{cc}}^2}{z^2} \left[ \frac{K(1-K)}{P(1-P)} \right]^2$
43. Non-normality of the liability
- The mean genetic liability of cases and controls, expressed as a deviation from the sample mean, is transformed to the observed scale by $b_{cc}$:
- $b_{cc}\; i\,\frac{1-P}{1-K}\; \sigma_g^2$ for cases, and
- $b_{cc}\; i_2\,\frac{P}{K}\; \sigma_g^2$ for controls, where $i_2\,\frac{P}{K} = -\frac{zP}{K(1-K)}$
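44. Non-normality of the liability
- $h_l^2 = h_{o_{cc}}^2\, \frac{K(1-K)}{z^2}\, \frac{K(1-K)}{P(1-P)}$
- $h_l^2 = h_{o_{cc}}^2 \left[ \frac{1}{z}\, \frac{K(1-K)}{\sqrt{P(1-P)}} \right]^2$
- $\mathrm{var}(h_l^2) = \mathrm{var}(h_{o_{cc}}^2) \left[ \frac{1}{z}\, \frac{K(1-K)}{\sqrt{P(1-P)}} \right]^4$
- where $h_{o_{cc}}^2$ is the heritability on the observed scale estimated in the case-control sample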
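The transformation from the observed scale in an ascertained sample to the liability scale is a simple rescaling, which the following R sketch makes explicit. The function name `h2_liability` and the input values are hypothetical; the inputs are not the results of the illustration slides.

```r
# Transform an observed-scale heritability estimate from a case-control sample
# (and optionally its sampling variance) to the liability scale.
#   K: population prevalence; P: proportion of cases in the sample
h2_liability <- function(h2_obs_cc, K, P, var_h2_obs_cc = NA_real_) {
  stopifnot(K > 0, K < 1, P > 0, P < 1)
  thr <- qnorm(1 - K)    # liability threshold
  z   <- dnorm(thr)      # normal density at the threshold
  scale_factor <- (K * (1 - K))^2 / (z^2 * P * (1 - P))
  list(h2_l     = h2_obs_cc * scale_factor,
       var_h2_l = var_h2_obs_cc * scale_factor^2)
}

# Hypothetical inputs for illustration only
res <- h2_liability(h2_obs_cc = 0.20, K = 0.1, P = 0.5, var_h2_obs_cc = 0.0004)
print(res)
```

When P equals K (no ascertainment) the scale factor reduces to K(1 - K) / z^2, the transformation for an unselected population sample.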
45. Illustration
- 1. Estimate the heritability of a trait from a GWAS study when fitting the significant SNPs and when fitting all SNPs simultaneously.
- Solution: Phenotypic and marker data were simulated in R; the dimensions of the phenotypic data and marker data are 200 x 1 and 500 x 200, respectively. The model is
- $y = \mu 1 + g + e$
- The rrBLUP package is used to estimate the random additive genetic effects from the SNP information. From the 500 SNPs, the 10 most significant SNPs are selected by LASSO. The estimated heritability of the trait was 0.2998 when fitting the significant SNPs and 0.3755 when fitting all SNPs.
46. Illustration
- 2. Estimate the heritability on the liability scale, along with its standard error, from ascertained case-control data ($K = 0.1$, $l > 1.282\,\sigma_l$)
- Solution:
- Phenotypic data for 2500 cases and 2500 controls are simulated (mvrnorm function in the R package MASS). The heritability estimated from the observed data is 0.1021, and the heritability estimated on the liability scale is
- $h_l^2 = \sigma_g^2 = 0.1032$
- $\mathrm{var}(h_l^2) = 0.00017$
47. Conclusions
- Heritability is not missing but hidden
  - in the form of common variants of small effect scattered across the genome
  - in the form of low-frequency variants only partially tagged by common variants
- Estimates of heritability from traditional methods are inflated
- If there is physical material (epigenetic factors) other than DNA that can affect the phenotype and be transmitted stably across generations, then it should also be considered to contribute to additive genetic effects.
48. Conclusions
- There are many characters of biological or economic interest that vary in a discontinuous manner but are not inherited in a simple Mendelian manner; for this type of trait, estimation of heritability based on the liability threshold model provides an unbiased estimate.
- The missing heritability of complex traits can be resolved by estimating the heritability explained by all genotyped SNPs.
- The general framework for heritability estimation, GCTA, based on the HE regression method, provides unbiased estimates of heritability.
49. References
- Doolittle, D. P. (2012). Population Genetics: Basic Principles (16). Springer Science & Business Media.
- Eichler, E. E., Flint, J., Gibson, G., Kong, A., Leal, S. M., Moore, J. H. and Nadeau, J. H. (2010). Missing Heritability and Strategies for Finding the Underlying Causes of Complex Disease. Nature Reviews Genetics. 11(6): 446-450.
- Falconer, D. S. and Mackay, T. F. C. (1996). Introduction to Quantitative Genetics (4th Edn.). Addison Wesley Longman Ltd.
- Golan, D., Lander, E. S. and Rosset, S. (2014). Measuring Missing Heritability: Inferring the Contribution of Common Variants. Proceedings of the National Academy of Sciences. 111(49): E5272-E5281.
- Guo, G., Wang, L., Liu, H. and Randall, T. (2014). Genomic Assortative Mating in Marriages in the United States. PLoS ONE. 9(11): e112322.
50. References
- Gusev, A., Bhatia, G., Zaitlen, N., Vilhjalmsson, B. J., Diogo, D., Stahl, E. A. and Plenge, R. M. (2013). Quantifying Missing Heritability at Known GWAS Loci. PLoS Genetics. 9(12): e1003993.
- Haseman, J. K. and Elston, R. C. (1972). The Investigation of Linkage Between a Quantitative Trait and a Marker Locus. Behavior Genetics. 2(1): 3-19.
- Hayes, B. J., Visscher, P. M. and Goddard, M. E. (2009). Increased Accuracy of Artificial Selection by Using the Realized Relationship Matrix. Genetics Research. 91(1): 47-60.
- Lee, S. H., Wray, N. R., Goddard, M. E. and Visscher, P. M. (2011). Estimating Missing Heritability for Disease from GWAS. The American Journal of Human Genetics. 88(3): 294-305.
- Maher, B. S. (2008). The Case of the Missing Heritability. Nature. 456: 18-21.
51. References
- Manolio, T. A., Collins, F. S., Cox, N. J., Goldstein, D. B., Hindorff, L. A., Hunter, D. J., McCarthy, M. I., Ramos, E. M., Cardon, L. R. and Chakravarti, A. (2009). Finding the Missing Heritability of Complex Diseases. Nature. 461(7265): 747-753.
- Moore, J. H. and Bush, W. S. (2012). Genome-Wide Association Studies. PLoS Computational Biology. 8(12): e1002822.
- Pierrick, B. and Lu, Q. (2016). Dissolving the Missing Heritability Problem.
- Richards, E. J. (2006). Inherited Epigenetic Variation: Revisiting Soft Inheritance. Nature Reviews Genetics. 7: 395-401.
- Silventoinen, K., Sammalisto, S., Perola, M., Boomsma, D. I., Cornes, B. K., Davis, C., Leo D., Lange, M. D., Harris, J. R. and Hjelmborg, J. V. B. (2003). Heritability of Adult Body Height: A Comparative Study of Twin Cohorts in Eight Countries. Twin Research. 6(5): 399-408.
52. References
- Visscher, P. M., et al. (2006). Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing Between Full Siblings. PLoS Genetics. 2: e41.
- Yang, J., Benyamin, B., McEvoy, B. P., Gordon, S., Henders, A. K., Nyholt, D. R. and Madden, P. A. (2010). Common SNPs Explain a Large Proportion of the Heritability for Human Height. Nature Genetics. 42(7): 565-569.
- Yang, J., Lee, S. H., Goddard, M. E. and Visscher, P. M. (2011). GCTA: A Tool for Genome-Wide Complex Trait Analysis. American Journal of Human Genetics. 88(1): 76-82.
- Zuk, O., Hechter, E., Sunyaev, S. R. and Lander, E. S. (2012). The Mystery of Missing Heritability: Genetic Interactions Create Phantom Heritability. Proceedings of the National Academy of Sciences. 109(4): 1193-1198.
- Zuk, O., Schaffner, S. F., Samocha, K., Do, R., Hechter, E., Kathiresan, S. and Lander, E. S. (2014). Searching for Missing Heritability: Designing Rare Variant Association Studies. Proceedings of the National Academy of Sciences. 111(4): 455-464.
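54. LASSO
    /* SAS: training and validation data sets, each with response y and 500 SNPs. */
    /* The data lines go between "cards;" and the closing ";" (omitted here).     */
    data train;
      input y x1-x500;
      cards;
    ;
    data test;
      input y x1-x500;
      cards;
    ;
    /* LASSO selection: at most 10 steps, model chosen on the validation data */
    proc glmselect data=train valdata=test plots=coefficients;
      model y = x1-x500 / selection=lasso(steps=10 choose=validate);
    run;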
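55. Estimation of heritability
    # R: heritability when fitting all 500 SNPs
    library(rrBLUP)
    x <- read.csv(file.choose())    # select the marker file "g1"
    y <- read.csv(file.choose())    # select the phenotype file "p1"
    x <- as.matrix(x)
    y <- as.matrix(y)
    ans <- mixed.solve(y, Z = x)    # SNP effects fitted as random effects
    beta <- ans$u                   # predicted SNP effects
    g <- x %*% beta                 # predicted genetic values
    sigma2g <- var(g)
    sigma2p <- var(y)
    h2 <- sigma2g / sigma2p
    h2
    # 0.3755903 (value reported on the slide)

    # R: heritability when fitting only the 10 SNPs selected by LASSO
    x <- read.csv(file.choose())    # select the marker file "g2"
    y <- read.csv(file.choose())    # select the phenotype file "p1"
    x <- as.matrix(x)
    y <- as.matrix(y)
    ans <- mixed.solve(y, Z = x)
    beta <- ans$u
    g <- x %*% beta
    sigma2g <- var(g)
    sigma2p <- var(y)
    h2 <- sigma2g / sigma2p
    h2
    # 0.2998782 (value reported on the slide)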
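56. Illustration No. 2 (code)
    library(MASS)   # mvrnorm
    library(pps)    # stratsrs
    sigmag <- 1     # genetic variance
    sigmae <- 3     # residual standard deviation passed to rnorm
    # Covariance of genetic values within a family of 100: variance 1, covariance 0.05
    m1 <- matrix(sigmag, 100, 100)
    m2 <- diag(sigmag, 100, 100)
    m1 <- 0.05 * (m1 - m2)
    sigma <- m1 + m2
    mu <- rep(0, 100)
    # 50 families (rows) of 100 members (columns)
    gv <- mvrnorm(n = 50, mu, sigma, tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
    # Flatten family by family so that the order matches the id vector below
    gv <- as.vector(t(gv))
    eveff <- rnorm(5000, 0, sigmae)
    l <- gv + eveff                      # liability
    sigmal <- sd(l)
    thres <- 1.282 * sigmal              # threshold for prevalence K = 0.1
    cc1 <- ifelse(l > thres, 1, 0)       # 1 = case, 0 = control
    id <- rep(c(1:50), each = 100)       # family identifier
    cc <- as.data.frame(cbind(id, cc1))
    names(cc) <- c("id", "cc1")
    csum <- tapply(cc[, 2], cc[, 1], sum)   # number of cases per family
    s1 <- cc[cc[, 2] == 1, ]                # cases
    s2 <- cc[cc[, 2] == 0, ]                # controls
    s3 <- stratsrs(s2[, 1], csum)           # sample as many controls as cases within each family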
57. If y follows a standard normal distribution with a truncation point at t (t > 0) such that the fraction of y larger than t is K, then
- $E(y \mid y > t) = i = z/K$
- $E(y \mid y < t) = -\frac{iK}{1-K} = -\frac{z}{1-K}$
- $\mathrm{Var}(y \mid y > t) = 1 - i(i - t)$
- $\mathrm{Var}(y \mid y < t) = 1 - \frac{iK}{1-K}\left( \frac{iK}{1-K} + t \right)$
- where z is the height of the normal curve at point t
Fig. 2
58. Fig. 3: $\mathrm{var}(l_{cc}) > 1$ in a case-control study, because individuals from the tails of the distribution of liability have been selected.