Title: Introduction to Genetic Epidemiology
- 5th Annual Interdisciplinary Genetic Research Course: Medical, Public Health, Biostatistical & Bioethical Approaches
- October 6, 2008
- T.H. Beaty, Johns Hopkins School of Public Health
Genetic Epidemiology
A hybrid science focusing on complex diseases (where both genetic & environmental factors contribute to the etiology of disease)
The parent sciences (genetics & epidemiology) share common goals, but they differ in their histories & perspectives.
Landmarks in Genetics
Year Event
1865 Gregor Mendel publishes work on peas describing the fundamentals of inheritance
1871 DNA is isolated from the cell nucleus
1900 Three people independently rediscover Mendel's work (Correns, de Vries & von Tschermak)
1901-02 Garrod discovers a human example of Mendelian disease (alkaptonuria); Landsteiner discovers the 1st genetic marker (ABO)
1908 Hardy & Weinberg lay the foundation for modeling genes in populations
1918 Fisher describes how Mendelian genes can account for quantitative phenotypes
Landmarks in Genetics (contd)
1930s Biometrical school of genetics develops statistical models for genes in families & populations
1944 One gene-one protein model is developed
1953 Double helix structure of DNA identified by Watson & Crick (& R. Franklin)
1966 Genetic code established (3 nucleotides per codon)
1972 Recombinant DNA techniques developed
1987 Human Genome Project proposed
2001 Draft sequence of human genome available
Summarizing Genetic History
Century: 19th → 20th → 21st
Landmarks: Mendel → Mendel rediscovered → DNA structure → Human Genome Project
Disease: Rare Mendelian → Multifactorial or complex → All? → ?
Mapping: Linkage (2-point → genome-wide) → LD mapping → Sequencing → ?
Genes: Simple genetic models (4) → One gene, one protein → Models for complex traits → One gene, >1 proteins → Multiple genes & interaction → ?
Gene expression: Gene defined by mutants → Gene expression (mRNA) → Gene regulation → ?
Testing: Neonatal screening → Diagnostic testing → Susceptibility → ?
Public health applications
Landmarks in Epidemiology
Year Event
1832 Cholera epidemic in Britain prompted epidemiologic studies by T. Proudfoot & H. Gaulter
1843 W. Farr develops epidemiologic concepts based on vital statistics
1854 J. Snow defines transmission mechanism for cholera
1900s As part of the bacteriologic revolution, epidemiology evolves from a descriptive to an analytical science
1920s Epidemiologic methods are applied to chronic diseases
1935 Descriptive studies of lung cancer patterns implicate cigarette smoking
Landmarks in Epidemiology (contd)
Year Event
1948 Large cohorts established to study cardiovascular disease (Framingham)
1950 Definitive case-control studies of lung cancer & smoking published in US & UK
1960s Large scale studies (both cohorts & trials) develop statistical tools for multiple risk factors
1979 Eradication of smallpox achieved
1980s HIV emerges as major public health threat
2000- Epidemiology faces a new millennium: a mix of science, public health practice & policy
50 years of genetic epidemiology: Morton (2006) J Hum Genet 51:269-277
- Before DNA polymorphisms (1956-1979)
  - Linkage analysis (LOD scores) began with very few markers
  - Population genetics measured linkage disequilibrium (LD) by D' & r²
  - Heritability (h²) began with twin studies & evolved into path analysis with genetic & non-genetic causal factors
- Pre-genome period (1980-2001)
  - Linkage analysis expanded to multipoint analysis
    - Even for Mendelian diseases, peaks are hard to narrow
    - With complex phenotypes, it is worse
  - Association studies proposed as an alternative (Risch & Merikangas 1996 SCIENCE 273:1516-1517)
    - Family based association tests are a useful alternative
50 years of genetic epidemiology (contd)
- Post-genome period (2002 onward)
  - The sequence of the human genome allows greater understanding of the structure of genes, their physical position & (maybe) their function
  - HapMap offers virtually unlimited markers, but SNPs are not equally polymorphic in all populations
  - LD & haplotype blocks vary among populations
  - Associated markers are only predictive, but can identify causal genes
  - Study design will be critical
    - Is 500K always better than 100K?
Comparing genetics & epidemiology
Origins
- Genetics: Biologic science of inheritance; breeding & experimental
- Epidemiology: Methodologic science of Public Health; descriptive & analytic
Focus
- Genetics: Mechanisms of inheritance & gene expression; often on rare diseases
- Epidemiology: Etiologic factors in disease & methods of control; mostly on common diseases
Comparing genetics & epidemiology (contd)
Tools
- Genetics: Family studies; laboratory methods; statistical models; population models
- Epidemiology: Descriptive studies; case/control & cohort designs; clinical trials
Goals
- Genetics: Understand mechanisms of inheritance
- Epidemiology: Understand disease etiology & distribution
Overlap
- Both: Diseases of interest (heart disease, cancer, diabetes); prevent/control disease
Central questions in Genetic Epidemiology
- Does the trait cluster in families?
- Can familial clustering be explained by genes or shared environment?
- What is the best model of inheritance?
- Can we locate genes for complex diseases/traits?
- How does the gene control risk of disease?
Extending basic questions in genetic epidemiology (Burton et al. 2005 Lancet 366:941-951)
Useful References from Lancet 2005
- Sharp D. Genetic epidemiology: strengths, weaknesses, and opportunities. Lancet 366:880, 2005.
- Burton P, Tobin MD, Hopper JL. Genetic Epidemiology 1: Key concepts in genetic epidemiology. Lancet 366:941-951, 2005.
- Teare MD, Barrett JH. Genetic Epidemiology 2: Genetic linkage studies. Lancet 366:1036-1044, 2005.
- Cordell HJ, Clayton DG. Genetic Epidemiology 3: Genetic association studies. Lancet 366:1121-1131, 2005.
- Palmer LJ, Cardon LR. Genetic Epidemiology 4: Shaking the tree: mapping complex disease genes with linkage disequilibrium. Lancet 366:1223-1234, 2005.
- Hattersley AT, McCarthy MI. Genetic Epidemiology 5: What makes a good association study? Lancet 366:1315-1323, 2005.
- Hopper JL, Bishop DT, Easton DF. Genetic Epidemiology 6: Population-based family studies in genetic epidemiology. Lancet 366:1397-1406, 2005.
- Davey-Smith G, Ebrahim S, Lewis S, Hansell AL, Palmer LJ, Burton PR. Genetic Epidemiology 7: Genetic epidemiology and public health: hope, hype, and future prospects. Lancet 366:1484-1498, 2005.
Study designs for central questions
1. Does the disease cluster in families? Case-control studies & familial correlation
2. Is this due to genes or shared environments? Heritability & variance components
3. What is the best model of inheritance? Segregation analysis
Study designs for central questions (contd)
4. Can we locate causal genes for complex diseases? Linkage analysis & linkage disequilibrium studies
5. How does the gene control risk? Gene-environment interaction
Nature is not linear
[Diagram linking: Case-Control Designs; Familial Correlation/Aggregation; Gene-Environment Interaction; Linkage Analysis & Linkage Disequilibrium; Heritability/Variance Components; Segregation Analysis]
Different levels of study
- Population comparisons (ecological studies) can suggest a role for genes
- Cohorts of individuals or complete populations
- Case-control studies can be used to test how genes affect risk
  - Information about genetic risk factors in complex diseases may reflect a direct/indirect effect on risk
  - Test for interactions or heterogeneity
- Family studies are generally more informative about genetic mechanisms, but you must know how families were sampled
Study designs can (& do) overlap
1. Population Studies: Ecological comparisons; Migrant studies; Admixture
2. Case-control studies: Case only; Case-unrelated control
3. Family Studies: Twins & sibs; Nuclear families; Pedigrees; Linkage studies in multiplex families; Case-related control; Family based association
Overlapping designs: Outcrossing studies; Genealogic registries; Adoption registries
1. Population based designs
- Population comparisons (ecological design)
- Migrant studies
  - Do people who move from a low risk environment to a high risk environment change their risk?
  - Consider issues of self-selection, assimilation, etc.
- Admixture studies
  - Does disease risk parallel genetic admixture (% of genes of distinct ancestry)?
  - Admixture is only estimated
  - Human populations are not constant
- Vital records can be an important resource, especially birth defects & disease registries
  - Does risk of disease change among offspring of incross vs. outcross matings?
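Because admixture is only estimated, a small sketch may help show how. It assumes a hypothetical two-population mixing model in which, at every marker, the admixed allele frequency equals m × p1 + (1 - m) × p2, with the parental frequencies p1 and p2 treated as known. Combining markers by least squares and the function name `admixture_proportion` are illustrative choices, not a method from this course.

```python
from typing import Sequence, Tuple


def admixture_proportion(markers: Sequence[Tuple[float, float, float]]) -> float:
    """Least-squares estimate of m, the share of ancestry from parental population 1.

    Each marker is (p_admixed, p_pop1, p_pop2): frequencies of the same allele in
    the admixed group and in the two parental populations. Assumes (hypothetically)
    p_admixed = m * p_pop1 + (1 - m) * p_pop2 at every marker.
    """
    num = 0.0
    den = 0.0
    for p_adm, p1, p2 in markers:
        delta = p1 - p2  # markers with equal parental frequencies carry no information
        num += (p_adm - p2) * delta
        den += delta * delta
    if den == 0:
        raise ValueError("no informative markers (parental frequencies all equal)")
    return num / den


# Two hypothetical markers, both consistent with m = 0.25
markers = [(0.35, 0.8, 0.2), (0.40, 0.1, 0.5)]
print(round(admixture_proportion(markers), 3))  # 0.25
```

With a single marker this reduces to (p_admixed - p2) / (p1 - p2); with real data, sampling error in every frequency means each marker gives a somewhat different answer, which is why admixture is an estimate rather than a measurement.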
2. Case-Control Designs
- Case-unrelated control designs can identify genetic risk factors
  - Genetic index (e.g. inbreeding)
  - Genetic marker
- A genetic marker can be a risk factor due to
  - Direct effect of the marker in the causal pathway
  - Indirect effect due to linkage disequilibrium (LD): association between a high risk allele at an unobserved causal gene & an observed marker allele
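The LD between a marker allele and an allele at another locus is usually summarized by D and its normalized forms D' (Lewontin) and r². A minimal sketch for two biallelic loci, assuming the haplotype frequency is already known (in practice it is usually estimated, e.g. from unphased genotypes); the function name `ld_measures` is hypothetical.

```python
def ld_measures(p_AB: float, p_A: float, p_B: float) -> tuple:
    """Return (D, D', r^2) for two biallelic loci.

    p_AB: frequency of the haplotype carrying A at locus 1 and B at locus 2
    p_A, p_B: allele frequencies of A and B
    """
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    D = p_AB - p_A * p_B  # departure from independence of the two alleles
    # Lewontin's normalization: the largest |D| allowed by these allele frequencies
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max
    r2 = D * D / denom  # squared correlation between the allelic indicators
    return D, d_prime, r2


D, d_prime, r2 = ld_measures(0.25, 0.6, 0.3)
print(round(D, 3), round(d_prime, 3), round(r2, 3))  # 0.07 0.583 0.097
```

The two measures answer different questions: D' reaches 1 whenever at least one of the four haplotypes is absent, whereas r² reaches 1 only when the marker is a perfect proxy, which is what matters when a marker stands in for an unobserved causal allele.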
2. Case-control designs (contd)
- Conventional case-control design
  - Representative sample from case & control populations
  - Tests for difference in allele or genotypic frequencies
  - Problems with confounding (population stratification)
- Case-related control design
  - Representative sample of cases & their unaffected sibs (or cousins)
  - Minimizes chances of confounding
  - Overmatched for genetic background, giving less statistical power
  - Can test for linkage directly
Variations on case-control design
- Incomplete case-control designs can test for gene-environment interaction (GxE)
  - Case-only designs
  - Incomplete variations (G & E on cases, only E on controls, etc.)
- Family based controls
  - Create controls from parental mating type
  - Under Ho, marker alleles are transmitted to the case as often as not
  - Rejecting Ho implies linkage & linkage disequilibrium (association)
  - Simplex families can now contribute to tests for linkage
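Two of these designs reduce to very small calculations. One well-known family-based test of "transmitted as often as not" is the transmission/disequilibrium test (TDT), which counts transmissions from heterozygous parents to affected offspring; the case-only design estimates GxE as the G-E odds ratio among cases. A minimal sketch with hypothetical counts and function names; the case-only estimate is interpretable as a multiplicative interaction only if G and E are independent in the source population (and the disease is rare).

```python
import math


def chi2_1df_pvalue(x: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def tdt(transmitted: int, untransmitted: int):
    """TDT from heterozygous parents of affected offspring.

    transmitted:   times the marker allele was passed to the case (b)
    untransmitted: times the other allele was passed instead (c)
    Under Ho, b and c are expected to be equal.
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (transmitted - untransmitted) ** 2 / n
    return stat, chi2_1df_pvalue(stat)


def case_only_interaction_or(g1e1: int, g1e0: int, g0e1: int, g0e0: int) -> float:
    """Odds ratio for the G-E association among cases only (G+E+, G+E-, G-E+, G-E-)."""
    return (g1e1 * g0e0) / (g1e0 * g0e1)


print(tdt(60, 40)[0])                                       # 4.0
print(round(case_only_interaction_or(40, 20, 60, 80), 3))   # 2.667
```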
3. Family Designs
- Family designs
  - Fixed sets of relatives
    - Twins
    - Adoption studies
  - Nuclear families (parents & offspring)
  - Pedigrees of arbitrary structure
Families come in different shapes & sizes
- Sample fixed sets of relatives
  - Adoption studies address fundamental questions about genes vs. environment
    - Adoptee, adoptive parents, biological parents, unrelated sibs in adoptive family
  - Twins estimate heritability by comparing MZ & DZ twins
  - Affected sib pairs (& parents) to test for linkage
- Are these representative of all families?
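Comparing MZ & DZ twins is often summarized with Falconer's approximation, h² = 2(r_MZ - r_DZ), swapped in here as one classical estimator (variance-components models are the more formal route). It assumes purely additive genetic effects, equal shared environments for MZ and DZ pairs, and no GxE. The function names and correlations below are hypothetical; twin correlations are computed by double entry because twin order is arbitrary.

```python
def intraclass_correlation(pairs):
    """Double-entry Pearson correlation for unordered twin pairs."""
    xs = [a for a, b in pairs] + [b for a, b in pairs]
    ys = [b for a, b in pairs] + [a for a, b in pairs]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so they share one mean
    sxx = sum((x - mean) ** 2 for x in xs)  # equals syy for the same reason
    if sxx == 0:
        raise ValueError("trait has no variance")
    sxy = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    return sxy / sxx


def falconer(r_mz: float, r_dz: float):
    """Split phenotypic variance into (h2, c2, e2) from twin correlations."""
    h2 = 2 * (r_mz - r_dz)  # additive genetic share
    c2 = 2 * r_dz - r_mz    # shared (family) environment
    e2 = 1 - r_mz           # unique environment plus measurement error
    return h2, c2, e2


print([round(v, 2) for v in falconer(0.7, 0.4)])  # [0.6, 0.1, 0.3]
```

A negative c2 or an h2 above 1 is a warning sign that the assumptions (for example, no dominance) do not hold for the trait.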
Families come in different shapes & sizes (contd)
- Sample nuclear families (parents & offspring)
  - Measure familial aggregation/correlation
  - Fit models of inheritance
- Collect data on family history in extended families
  - Expected risk of disease can be computed as (person-years at risk) × (age-specific risk)
    - Requires good information on baseline incidence rates
  - Expected number of cases (E) based on population risk per person-year
  - Observed number of cases (O), typically by report
  - Compute Family History Score as a Poisson statistic comparing Observed (O) & Expected (E)
See also Silberberg et al (1999 GENET EPI 16:344-355)
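The expected count can be sketched directly from the description: sum person-years at risk times age-specific incidence over relatives and age bands. The slide names only Observed and Expected, so the standardized Poisson form (O - E)/√E used below is an assumption (one common way to express such a score), and the rates, age bands and function names are all hypothetical.

```python
import math


def expected_cases(relatives, rates):
    """E = sum over relatives and age bands of (person-years at risk) x (age-specific rate)."""
    return sum(py * rates[band] for rel in relatives for band, py in rel.items())


def family_history_score(observed: int, expected: float) -> float:
    """Standardized Poisson deviate (O - E) / sqrt(E); the exact form is assumed here."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (observed - expected) / math.sqrt(expected)


# Hypothetical incidence per person-year, and person-years per age band for three relatives
rates = {"40-49": 0.001, "50-59": 0.002, "60-69": 0.004}
relatives = [
    {"40-49": 10, "50-59": 10},
    {"40-49": 10, "50-59": 10, "60-69": 10},
    {"50-59": 5},
]
E = expected_cases(relatives, rates)
print(round(E, 3), round(family_history_score(1, E), 2))  # 0.11 2.68
```

The ratio O/E is an equally simple summary; the standardized form has the advantage of shrinking the apparent excess in small families where E is tiny.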
Family History Scores
- Summarize familial risk in families ascertained through probands (cases/controls)
- Kerber (1995 GENET EPI 12:291-301): breast cancer cases & controls drawn from the Utah Population Data Base
  - Can be used to identify highest risk families
- Schwartz et al (1988 AM J EPI 128:524-535): cancer risk in families of cases drawn from a cancer registry
  - Can be useful for public health
Public Health uses for family history
CDC resources: www.cdc.gov/genomics
CDC resources
Another CDC resource
CDC resources: http://hugenavigator.net/
Search by gene or phenotype
If you sample families in a representative manner,
- Quantitative traits or a common qualitative phenotype can be used to
  - Estimate heritability (h²) or
  - Find the best fitting model of inheritance (segregation analysis)
- If genetic markers are available, these families can be used to
  - Test for linkage to unobserved genes controlling a qualitative phenotype
    - Drs. Liang & Xu will discuss this tomorrow
  - Search for quantitative trait loci (QTL) that control quantitative phenotypes
Family studies & representative sampling (contd)
- Joint models for segregation analysis & linkage are feasible
- Linkage analysis is still limited to families informative for meiosis
  - Multiplex families have >1 affected member
  - Simplex families have only 1 affected member
- Linkage will always reflect a subset of all families
  - Heterogeneity between simplex & multiplex families should be considered
Families ascertained through proband
- A proband (typically affected) brings the rest of the family into the study
- Segregation analysis can identify the best model of inheritance if ascertainment is considered
  - Models have many parameters to estimate
  - Even so, they may not be completely correct
- Families vary considerably in information content
- Correcting for ascertainment bias is necessary
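Why ascertainment correction matters can be seen with the segregation ratio in sibships sampled through affected children. A minimal sketch, swapping in the classical Li-Mantel "singles" estimator p = (R - J)/(T - J), which assumes complete ascertainment (every sibship with at least one affected child is sampled); T is the number of children, R the number affected and J the number of sibships with exactly one affected child. The data and the function name are hypothetical.

```python
def segregation_estimates(sibships):
    """Naive and Li-Mantel segregation ratios.

    sibships: list of (sibship_size, number_affected), each ascertained through
    at least one affected child (complete ascertainment assumed).
    """
    for size, affected in sibships:
        if not 1 <= affected <= size:
            raise ValueError("each sibship needs between 1 and size affected children")
    T = sum(size for size, _ in sibships)                     # total children
    R = sum(affected for _, affected in sibships)             # total affected
    J = sum(1 for _, affected in sibships if affected == 1)   # sibships with one affected
    naive = R / T  # biased upward: sibships with no affected child were never sampled
    if T == J:
        raise ValueError("Li-Mantel estimator undefined when T equals J")
    li_mantel = (R - J) / (T - J)
    return naive, li_mantel


sibs = [(2, 1), (2, 1), (2, 2), (3, 1), (3, 2), (4, 1)]
naive, corrected = segregation_estimates(sibs)
print(naive, round(corrected, 3))  # 0.5 0.333
```

Other ascertainment schemes (e.g. single ascertainment) need different corrections, which is one reason full segregation models carry many parameters.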
Linkage vs. Association
Linkage
- Requires multiplex families
  - Bigger is better
- Guaranteed to work for Mendelian diseases
  - Genome wide studies are feasible
- Still useful for complex diseases
  - Locus heterogeneity (linked & unlinked families) is a problem
  - Meta-analysis may strengthen evidence, but narrowing peaks is still hard
Association
- Unrelated cases & controls can be used
  - Can incorporate tests for G, E, GxE, GxG, etc.
- Meta-analysis can measure consistency across studies
  - Or lack thereof
- Allelic heterogeneity is a problem
  - Different high risk alleles
- Genome wide studies are now feasible (but expensive)
  - Interpreting them is a challenge
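Linkage evidence is summarized as a LOD score. A minimal two-point sketch, assuming phase-known, fully informative meioses so that each meiosis is simply recombinant (R) or non-recombinant (NR); real pedigrees need likelihood calculations over unknown phase and missing genotypes. The function name `lod` is hypothetical. By convention, LOD ≥ 3 is the usual threshold for declaring linkage to a Mendelian trait.

```python
import math


def lod(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """LOD(theta) = log10[ theta^R (1 - theta)^NR / 0.5^(R + NR) ]."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out theta = 0
    log_l1 = nonrecombinants * math.log10(1 - theta)
    if recombinants > 0:
        log_l1 += recombinants * math.log10(theta)
    log_l0 = (recombinants + nonrecombinants) * math.log10(0.5)  # no linkage
    return log_l1 - log_l0


# 2 recombinants in 20 informative meioses; the maximum is at theta = R / (R + NR) = 0.1
print(round(lod(0.1, 2, 18), 2))  # 3.2
```

Because LOD scores from independent families add, "bigger is better": each informative meiosis contributes to the total, which is also why locus heterogeneity (unlinked families contributing negative scores) is so damaging.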
Genes as risk factors
- Epidemiologic study designs treat genetic markers as a risk factor
  - Test Ho: genotype (G) is independent of risk, P(case)
  - Odds ratio (OR) measures association between marker & risk of disease
    - OR(case|G+) = (A×D)/(C×B)
    - Dr. Liang will discuss this
      Case  Control
G+    A     B
G-    C     D
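The OR from the 2×2 table above is usually reported with a confidence interval. A minimal sketch using Woolf's log-scale interval, assuming no zero cells; the counts are hypothetical and the function name `odds_ratio` is illustrative.

```python
import math


def odds_ratio(A: int, B: int, C: int, D: int, z: float = 1.96):
    """OR = (A x D) / (B x C) with a Woolf (log-scale) confidence interval.

    A: G+ cases, B: G+ controls, C: G- cases, D: G- controls.
    """
    if min(A, B, C, D) == 0:
        raise ValueError("zero cell: Woolf interval undefined (adding 0.5 to each cell is one common fix)")
    or_ = (A * D) / (B * C)
    se_log = math.sqrt(1 / A + 1 / B + 1 / C + 1 / D)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


or_, lo, hi = odds_ratio(30, 20, 70, 80)
print(round(or_, 2))  # 1.71
```

With these counts the interval spans 1, so an apparently elevated OR would not, on its own, be convincing evidence of association.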
What can you do with a genetic risk factor?
- Are genes just inherited risk factors?
- How can you use genetic risk factors in public health?
  - Causal mutations can be used to screen
    - Women at high risk of breast cancer for BRCA1 & 2 mutations
    - Couples at risk of having a CF child
  - Linked markers can be used for genetic counseling or mapping
  - Genetic markers that are true risk factors can be used in screening
    - But you must be confident in the estimated risks
      - APOE ε4 for Alzheimer's disease?
    - Is there an intervention?
    - These may depend on population or environment
Big Picture: Public Health Genetics is different from Genetic Epidemiology
- Public health genetics is broader than genetic epidemiology
  - Application vs. research
- Screening, intervention & treatment are part of public health genetics
- Policy is a key part of public health genetics
Public Health Genetics (contd)
- Deals with both Mendelian & complex diseases
  - Mendelian diseases in the aggregate are a major public health burden
- Screening the population can identify high risk individuals or groups
  - Screening for complex diseases will be more demanding & will require greater efforts to validate estimates of risk
Trends in science: Genomic Medicine & Human Genome Epidemiology
- Khoury, Little & Burke (2004) Human Genome Epidemiology. Oxford Univ Press
- Recent advances in genetics hold considerable promise for medicine & public health
  - Many genes have been reported for common diseases, but few reports are consistent
  - There is some hype involved
- What to do with new information as it emerges?
  - How to validate it?
  - How to act on it?
Trends in science (contd)
- Genomic medicine could predict risk of common diseases based on genotypes
  - Genetic vs. genomic: one gene vs. many genes
- Pharmacogenomics could tailor pharmaceutical treatment based on genotypes
- Both require solid epidemiologic data to generate & confirm the predictive value of genotype on risk
  - This requires many studies, not one
  - This may vary among populations
  - This may depend on environment
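One reason the predictive value of a genotype varies among populations is arithmetic: the same genotype frequencies in cases and controls translate into very different absolute risks when disease prevalence differs. A Bayes'-rule sketch; all numbers are hypothetical, and it assumes P(G+ | case) and P(G+ | control) carry over unchanged between populations, which is itself an assumption that needs data.

```python
def risk_given_genotype(prevalence: float, p_g_case: float, p_g_control: float) -> float:
    """P(disease | G+) by Bayes' rule.

    prevalence:  P(disease) in the population of interest
    p_g_case:    P(G+ | disease)
    p_g_control: P(G+ | no disease)
    """
    num = prevalence * p_g_case
    return num / (num + (1 - prevalence) * p_g_control)


# Same genotype frequencies in cases and controls, two hypothetical prevalences
for prevalence in (0.01, 0.10):
    print(prevalence, round(risk_given_genotype(prevalence, 0.6, 0.3), 3))
# 0.01 0.02
# 0.1 0.182
```

The genotype roughly doubles risk in both settings, yet a carrier's absolute risk is about 2% in one population and about 18% in the other, a difference that matters for screening decisions.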
Continuum from gene discovery to disease prevention (Khoury et al, 2004)
[Diagram boxes: Generalized intervention; Assessing impact of genes on health; Developing & evaluating interventions; Integrating evidence; Gene discovery; Genotype-specific intervention (screening, treatment, etc.); Biology of gene → disease]
Shaded boxes denote steps where a good epidemiologic approach is critical
Summary of introduction
- Genetic epidemiology is a wide-ranging scientific discipline
  - Focus on identifying genes involved in complex diseases
  - A variety of study designs is used
  - A variety of statistical methods is available
- Complex diseases are complex
  - Nature has many surprises awaiting us