Title: Human Comparative Genomics
1. Human Comparative Genomics
2.
- What is the sequence of the normal human genome?
- What accounts for the genetic differences between individuals?
3. Finding Segmental Duplications in the Human Genome
Bailey et al. (2002) Science 297:1003-07
4. Segmental Duplications in the Human Genome
Bailey et al. (2002) Science 297:1003-07
5. Polymorphism in Segmental Duplications
Iafrate et al. (2004) Nat Genet 36:949-51
6. Polymorphism in Segmental Duplications
- CGH studies find many copy number polymorphisms in segmental duplications (about 12 per individual)
- Rare and common polymorphisms
- Many overlap coding regions
- Critical for the interpretation of amplifications in cancers
- Responsible for phenotypic differences between people?
7. SNPs/HapMap
- http://www.hapmap.org/
- About 1 SNP per 1000 bp
The International HapMap Project is a multi-country effort to identify and catalog genetic similarities and differences in human beings. Using the information in the HapMap, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. The Project is a collaboration among scientists and funding agencies from Japan, the United Kingdom, Canada, China, Nigeria, and the United States. All of the information generated by the Project will be released into the public domain.
8. Clustering human populations by genotype
- K-means clustering of gene expression data
  1. Pick a number (k) of cluster centers
  2. Assign every gene to its nearest cluster center
  3. Move each cluster center to the mean of its assigned genes
  4. Repeat 2-3 until convergence
- EM-based clustering of genotype data
  1. Pick a number (k) of sub-populations
  2. Assign every individual to a sub-population based on the allele frequencies in the sub-population
  3. Recalculate the allele frequencies in each sub-population
  4. Repeat 2-3 until convergence
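The k-means steps listed on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used in any cited study: the function name `kmeans`, the toy two-condition "expression profiles", and the choice of the first k points as starting centers are all assumptions made so that the sketch is small and deterministic.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def kmeans(points, k, max_iter=100):
    """Plain k-means; `points` is a list of equal-length tuples."""
    # Step 1: pick k cluster centers (here simply the first k points, an
    # arbitrary choice made so that the sketch is deterministic).
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest cluster center.
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # Step 4: stop once assignments are stable.
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers


# Hypothetical expression values for six genes measured in two conditions.
genes = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (4.8, 5.2), (0.9, 1.1), (5.1, 4.9)]
labels, centers = kmeans(genes, k=2)
print(labels)
print(centers)
```

The genotype version follows the same loop, with "nearest center" replaced by "sub-population whose allele frequencies make the genotype most probable" and "mean" replaced by "recalculated allele frequencies".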
9. An Example
12 individuals genotyped at three independent biallelic loci:
I1 (A1,B1,C2)   I2 (A1,B1,C2)   I3 (A1,B2,C2)   I4 (A2,B2,C1)
I5 (A1,B1,C1)   I6 (A1,B1,C2)   I7 (A1,B1,C2)   I8 (A2,B2,C2)
I9 (A1,B2,C1)   I10 (A2,B1,C2)  I11 (A2,B2,C2)  I12 (A2,B2,C2)
10.
k1: I1 (A1,B1,C2)   I2 (A1,B1,C2)   I3 (A1,B2,C2)   I4 (A2,B2,C1)
k2: I5 (A1,B1,C1)   I6 (A1,B1,C2)   I7 (A1,B1,C2)   I8 (A2,B2,C2)
k3: I9 (A1,B2,C1)   I10 (A2,B1,C2)  I11 (A2,B2,C2)  I12 (A2,B2,C2)
F(A1)k1 = 0.75   F(B1)k1 = 0.5    F(C1)k1 = 0.25
F(A1)k2 = 0.75   F(B1)k2 = 0.75   F(C1)k2 = 0.25
F(A1)k3 = 0.25   F(B1)k3 = 0.25   F(C1)k3 = 0.25
Consider individual I1 (A1,B1,C2):
P(I1 in k1) = (.75)(.5)(.75) = 0.28
P(I1 in k2) = (.75)(.75)(.75) = 0.42
P(I1 in k3) = (.25)(.25)(.75) = 0.047
Therefore reassign I1 to k2.
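The arithmetic on this slide can be checked with a short script. It is a sketch of one reassignment step only, assuming the initial grouping I1-I4, I5-I8, I9-I12 and treating loci as independent, as the example states. The helper names `allele1_freqs` and `genotype_prob` are made up for illustration.

```python
from math import prod  # Python 3.8+

# Genotypes from the example: one allele per locus (A, B, C), coded 1 or 2.
genotypes = {
    "I1": (1, 1, 2), "I2": (1, 1, 2), "I3": (1, 2, 2), "I4": (2, 2, 1),
    "I5": (1, 1, 1), "I6": (1, 1, 2), "I7": (1, 1, 2), "I8": (2, 2, 2),
    "I9": (1, 2, 1), "I10": (2, 1, 2), "I11": (2, 2, 2), "I12": (2, 2, 2),
}
# Initial assignment: I1-I4 in k1, I5-I8 in k2, I9-I12 in k3.
clusters = {
    "k1": ["I1", "I2", "I3", "I4"],
    "k2": ["I5", "I6", "I7", "I8"],
    "k3": ["I9", "I10", "I11", "I12"],
}


def allele1_freqs(members):
    """Frequency of allele 1 at each locus among the given individuals."""
    n = len(members)
    return tuple(sum(genotypes[i][locus] == 1 for i in members) / n
                 for locus in range(3))


def genotype_prob(genotype, freqs):
    """Product over loci of the frequency of the allele actually carried."""
    return prod(f if allele == 1 else 1 - f
                for allele, f in zip(genotype, freqs))


freqs = {k: allele1_freqs(members) for k, members in clusters.items()}
scores = {k: genotype_prob(genotypes["I1"], f) for k, f in freqs.items()}
best = max(scores, key=scores.get)
for k, s in scores.items():
    print(k, freqs[k], round(s, 3))
print("reassign I1 to", best)
```

Repeating the same calculation for every individual, then recomputing the frequencies, gives one full pass of the loop described on slide 8.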
11. Questions
- How many sub-populations (k) best partition the data?
- How strong is the evidence for the clusters?
- Do the inferred clusters correspond to our notions of race, ethnicity, ancestry, or geography?
- Given the inferred clusters, can we accurately classify new individuals?
- Can we identify population admixture or migration events?
12. Attempts to group humans by genotype
13. π and Fst
- π, average nucleotide diversity (about 1 in 1000 bp)
- Fst, proportion of genetic variation that can be ascribed to differences between populations (about 10%)
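To make the Fst definition concrete, one common textbook form is Fst = (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within sub-populations and H_T is the expected heterozygosity of the pooled population. The slide does not say which estimator was used, so the sketch below is an assumption. It also assumes equally sized sub-populations and Hardy-Weinberg proportions, and the function name `fst` is illustrative.

```python
def fst(freqs):
    """Fst for one biallelic locus from per-sub-population allele frequencies.

    Assumes equally sized sub-populations and Hardy-Weinberg proportions.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                           # pooled population
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # within sub-populations
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# F(A1) in k1, k2, k3 from the toy clustering example (not real human data).
print(round(fst([0.75, 0.75, 0.25]), 3))
# Nearly identical frequencies give a value close to zero.
print(round(fst([0.50, 0.52, 0.48]), 4))
```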
14. Summary of Findings
- π and Fst are small
- Diversity within African populations is highest
- Unsupervised clustering tends to support either 3 or 4 sub-populations, depending on the number and type of markers and individuals included in the study, but the composition of the groups is often different in different studies
15. An example: Bamshad et al. (2003) Am. J. Hum. Genet. 72:578-89
16. But: Bamshad et al. (2003) Am. J. Hum. Genet. 72:578-89
17. A contradiction?
- Although they differ on the extent and composition of sub-populations, all studies so far have found evidence of significant sub-structure in human populations
- And yet, all studies agree that Fst is small (between 3% and 15%)
See review by Jorde and Wooding (2004) Nature Genet. 36:S28-S33
18. Small Fst does not imply lack of structure
[Figure: diagram of alleles at five biallelic loci (A1/A2, B1/B2, C1/C2, D1/D2, E1/E2)]
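A small simulation can illustrate why a small Fst is compatible with detectable structure. It is a sketch under invented assumptions, not a reconstruction of the slide's figure: two hypothetical populations whose allele frequencies differ only slightly (0.55 versus 0.45) at every locus, independent loci, and haploid individuals. With the (H_T - H_S) / H_T form, the per-locus Fst for these frequencies is about 0.01. The function name `classify_accuracy` and all parameter values are illustrative.

```python
import random
from math import log


def classify_accuracy(n_loci, p0=0.55, p1=0.45, n_per_pop=100, seed=0):
    """Fraction of simulated haploid individuals assigned to their true
    population by comparing genotype log-likelihoods under each population."""
    rng = random.Random(seed)
    correct = 0
    for true_pop, p in ((0, p0), (1, p1)):
        for _ in range(n_per_pop):
            # Draw one allele per locus; True means "allele 1".
            alleles = [rng.random() < p for _ in range(n_loci)]
            ll0 = sum(log(p0 if a else 1 - p0) for a in alleles)
            ll1 = sum(log(p1 if a else 1 - p1) for a in alleles)
            guess = 0 if ll0 >= ll1 else 1
            correct += guess == true_pop
    return correct / (2 * n_per_pop)


# Accuracy from a single locus is close to a coin flip; it rises as loci are combined.
for n_loci in (1, 10, 100, 1000):
    print(n_loci, classify_accuracy(n_loci))
```

Each locus carries very little information on its own, but the evidence accumulates across many independent loci, which is how small Fst and clear clustering can coexist.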
19. Pharmacogenomics
- Many drugs never reach the market because of side effects in a small minority of patients
- Many drugs on the market are efficacious in only a small fraction of the population
- This variation is (in part) due to genetic determinants
  - Iressa → EGFR mutations
  - Codeine → cytochrome P450 alleles
20.
Question: Is race, ancestry, ethnicity, geography, or genetic substructure a reasonable proxy for genotype at alleles relevant for drug metabolism?
Answer: So far, no. It still looks as if we will have to genotype the relevant loci before making any guesses.
21. Population genetic structure of variable drug response. Wilson et al. (2001) Nat Genet. 29:265-269
[Figure: panels for CYP1A2, GSTM1, CYP2C19, DIA4, NAT2, and CYP2D6; A = African, B = European, C = Asian]
22. Evidence for Archaic Asian Ancestry on the Human X Chromosome. Garrigan et al. (2005) Mol. Biol. Evol. 22:189-192
- Pseudogene on the X chromosome
- 18 substitutions between human and chimp
- 15 substitutions between two human alleles
- Assuming a molecular clock, the split between the two human alleles is about 2 million years old
- Both alleles found in southern Asia, only one allele found in Africa
- Only human gene tree to root in Asia
23. Garrigan et al. (2005) Mol. Biol. Evol. 22:189-192
24. Human evolution in a nutshell
[Figure: tree with time points at 5-6 mya, 1 mya, 0.5 mya, and 0.2 mya; lineages: chimps, H. ergaster, H. erectus, H. neanderthalensis, H. sapiens]
25. Garrigan et al. (2005) Mol. Biol. Evol. 22:189-192
26. So what happened?
- Strong selection for the Asian allele in southern Asia
  - not likely, since this is a pseudogene locus
  - fails Tajima's D test
- Gene flow between H. sapiens and H. erectus in southern Asia
  - branch lengths are about right for 2 million years of divergence
  - H. erectus was in southern Asia until 18,000 years ago (Morwood et al. and Brown et al. in Nature (2004) vol. 431)
  - supporting evidence from genetic analysis of lice and other human parasites (Reed et al. (2004) PLoS Biol. 2:1972-83)
27. Human evolution in a nutshell
[Figure: tree with time points at 5-6 mya, 1 mya, 0.5 mya, and 0.2 mya; lineages: chimps, H. ergaster, H. erectus, H. neanderthalensis, H. sapiens; a "?" annotation is added]