Title: Common Deletion Polymorphisms in the Human Genome
1Common Deletion Polymorphisms in the Human Genome
Instructor Yao-Ting Huang
Bioinformatics Laboratory, Department of Computer
Science and Information Engineering, National Chung
Cheng University.
2Polymorphisms in the Human Population
- So far we have extensively focused on Single
Nucleotide Polymorphisms (SNPs).
3Polymorphisms in the Human Population
- However, we are distinguished from one another by
all sorts of mutations in the human genome.
- e.g., deletion, insertion, and inversion.
[Figure: two aligned copies of the sequence A C T T T G C T C, labeled "Deletion polymorphisms"]
4Types of Deletions
- Recall that the human genome is diploid.
- Each DNA segment should be observed in two
copies if there is no deletion.
C CT TT TC CC CA AC C
5Types of Deletions
- Recall that the human genome is diploid.
- Deletion may occur in one or both chromosomes.
- Homozygous deletion: both copies are deleted.
- Hemizygous deletion: only one copy is deleted.
C CT TT CC CC CA AC C
C CT TT CC CC CA AC C
6Disease Association with Deletion
- Deletions are known to lead to a variety of
serious disorders.
- Tumor suppressor genes are often found to be
deleted in cancer cells.
- With one remaining copy, a person may stay healthy.
- However, the remaining copy of the tumor
suppressor gene can be inactivated by a point
mutation, leaving no tumor suppressor gene to
protect the body.
7References
- McCarroll et al. Common deletion polymorphisms in
the human genome. Nature Genetics, 2006.
- Conrad et al. A high-resolution survey of
deletion polymorphism in the human genome. Nature
Genetics, 2006.
- Hinds et al. Common deletions and SNPs are in
linkage disequilibrium in the human genome.
Nature Genetics, 2006.
- Eichler E.E. Widening the spectrum of human genetic
variation. Nature Genetics, 2006.
8The International HapMap Project
- 269 samples from four populations are genotyped
to provide a dense map of SNPs.
- SNP genotypes are generated at ten genotyping
centers using seven different genotyping
technologies.
- SNPs are discarded if they fail quality control
(genotyping errors or failures).
- These authors exploit several signatures that
deletions leave in SNP genotyping data.
- That is, SNPs with genotyping errors or failures are
re-examined to detect the presence of deletion
polymorphisms.
9The International HapMap Project
- Deletions usually leave two common signatures
in SNP genotyping.
- (1) Homozygous deletions result in null
genotypes.
- The SNP alleles of these genotypes cannot be
obtained (i.e., missing data on the microarray
chip).
- Null genotypes can be due to either genotyping
failure or deletion.
- (2) Hemizygous deletions result in SNP
genotypes miscalled as homozygous.
- SNP alleles are still obtained for this type of
deletion, but only from the undeleted chromosome.
10Footprints of Deletions in SNPs (1/3)
- (1) Homozygous deletions tend to result in
clusters of null genotypes.
- Genotyping failures are less likely to be
clustered and specific to an individual.
[Figure: SNP genotypes of four samples; labels: "deletions", "SNP1", "SNP2", "Genotyping failure"]
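The first footprint can be illustrated with a small sketch that looks for runs of consecutive null genotypes within one sample. It assumes that null calls are represented as `None` and that SNPs are listed in chromosomal order; the function name `null_runs` and the minimum run length of 2 are illustrative choices, not taken from the papers.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def null_runs(calls: List[Optional[str]], min_len: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, inclusive, of runs of at least
    min_len consecutive null genotypes (None) in one sample."""
    runs = []
    index = 0
    for is_null, group in groupby(calls, key=lambda c: c is None):
        length = len(list(group))
        if is_null and length >= min_len:
            runs.append((index, index + length - 1))
        index += length
    return runs


# Three adjacent nulls look like a homozygous deletion;
# the isolated null further along looks like a genotyping failure.
sample = ["AC", None, None, None, "CT", "CC", None, "AA"]
print(null_runs(sample))  # [(1, 3)]
```

An isolated null call is ignored here, which mirrors the idea that genotyping failures tend not to be clustered.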
11Footprints of Deletions in SNPs (2/3)
- (2) Hemizygous deletions result in SNP
genotypes miscalled as homozygous.
- These hemizygous deletions often appear to violate
the laws of Mendelian inheritance.
- The child's called genotype could not have been
inherited from the parents' called genotypes.
A CT CA CG CA CA CC T
A CT TT TC CC C A AC T
C CT CT CC CC CA CC C
12Gregor Mendel
- Gregor Mendel is called the father of modern
genetics.
- His study of the inheritance of traits in pea
plants (1866) is the fundamental basis of genetics.
13Footprints of Deletions in SNPs (2/3)
- (2) Hemizygous deletions result in SNP
genotypes miscalled as homozygous.
- These hemizygous deletions often lead to Mendel
inconsistencies.
- e.g., a TT genotype in the child violates Mendel's laws.
A CT CA CG CA CA CC T
A CT TT TC CC C A AC T
C CT CT CC CC CA CC C
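The Mendel check behind this footprint can be sketched as follows. This is a minimal illustration, assuming unphased genotypes written as two-character strings such as "CT"; the function name `is_mendel_consistent` and the example trio are hypothetical and not taken from the papers.

```python
from itertools import product


def is_mendel_consistent(father: str, mother: str, child: str) -> bool:
    """Return True if the child's genotype can be formed by taking
    one allele from the father and one allele from the mother."""
    child_alleles = sorted(child)
    for paternal, maternal in product(father, mother):
        if sorted(paternal + maternal) == child_alleles:
            return True
    return False


# A father miscalled as CC (truly C/- with a hemizygous deletion) and a
# child miscalled as TT (truly T/-) give an apparent Mendel inconsistency.
print(is_mendel_consistent("CC", "CT", "TT"))  # False
print(is_mendel_consistent("CT", "CT", "TT"))  # True
```

The first call fails because neither paternal allele is T; with a hemizygous deletion the child actually inherited the deleted copy from the father.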
14Hardy-Weinberg Equilibrium
- Hardy-Weinberg Equilibrium describes a
relationship between the frequencies of alleles
and the genotypes observed in a population.
- Frequencies of alleles: Pr(A) and Pr(a).
- Frequencies of genotypes: Pr(AA), Pr(Aa), and
Pr(aa).
Father
Mother
15Hardy-Weinberg Equilibrium
- Hardy-Weinberg Equilibrium describes a
relationship between the frequencies of alleles
and the genotypes observed in a population.
- Let $\Pr(A) = p$ and $\Pr(a) = q$.
- Ideally, $\Pr(AA) = p^2$, $\Pr(Aa) = 2pq$, and
$\Pr(aa) = q^2$.
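As a brief worked example of these relations: with only two alleles, p + q = 1, so the three genotype frequencies sum to one. The allele frequency 0.7 below is an arbitrary illustrative value, not a figure from the papers.

```latex
\begin{align*}
\Pr(AA) + \Pr(Aa) + \Pr(aa) &= p^2 + 2pq + q^2 = (p + q)^2 = 1, \\
p = 0.7,\ q = 0.3 \;&\Rightarrow\; \Pr(AA) = 0.49,\ \Pr(Aa) = 0.42,\ \Pr(aa) = 0.09 .
\end{align*}
```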
16Deviation of HWE
- Inbreeding
- Breeding between close relatives, which causes an
increase in homozygosity for all genes.
- Assortative mating
- Individuals tend to mate with individuals like
themselves (or unlike themselves), which reduces
(or increases) genetic diversity.
- Small population size (sampling)
- Random changes in genotypic frequencies are easily
observed, particularly when the population is very
small.
17Footprints of Deletions in SNPs (3/3)
- Clusters of deviation from Hardy-Weinberg
equilibrium.
- We have $\Pr(A) = p > 0$ and $\Pr(a) = q > 0$.
- But no heterozygous genotypes (e.g., AC) are found.
18Methods for Finding Clusters of Null Genotypes
(1/3)
- Formulate each SNP as a binary vector.
- 0 = normal genotype; 1 = null genotype.
19Methods for Finding Clusters of Null Genotypes
(1/3)
- Formulate each SNP as a binary vector.
- Genotyping failures tend to be scattered in the matrix.
20Genotyping Errors
- Formulate each SNP as a binary vector.
- Homozygous deletions tend to be clustered.
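The binary-vector formulation can be sketched in a few lines. The layout is an assumption: rows are SNPs, columns are samples, and a null genotype is stored as `None`; the function name `to_null_vectors` and the toy genotypes are hypothetical.

```python
from typing import List, Optional


def to_null_vectors(genotypes: List[List[Optional[str]]]) -> List[List[int]]:
    """Encode each SNP (row) as a binary vector over samples (columns):
    0 for a called genotype, 1 for a null (missing) genotype."""
    return [[1 if call is None else 0 for call in snp] for snp in genotypes]


# Rows are SNPs, columns are samples; None marks a null genotype.
genotypes = [
    ["AC", "CC", "AA", "AC"],
    ["TT", None, "CT", None],
    ["AA", None, "AG", None],
    ["GG", "GG", None, "AG"],
]
vectors = to_null_vectors(genotypes)
print(vectors)
# [[0, 0, 0, 0], [0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 1, 0]]
```

In this toy matrix the second and third SNPs share the same null pattern across samples, as adjacent SNPs inside a homozygous deletion would, while the single null in the last SNP stands alone.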
21Methods for Finding Clusters of Null Genotypes
(2/3)
- The clusters are determined based on the
correlation coefficient $r^2$.
22Methods for Finding Clusters of Null Genotypes
(2/3)
- The clusters are determined based on the
correlation coefficient $r^2$.
- If SNP3, SNP4, and SNP5 have $r^2 > 0.8$, they are
considered to have similar patterns and are put in
one cluster.
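A minimal sketch of the similarity test, assuming that r^2 here is the squared Pearson correlation between the binary vectors of two SNPs and that pairs are compared one at a time. The helper names `r_squared` and `similar_patterns` and the toy vectors are hypothetical.

```python
from typing import List, Tuple


def r_squared(x: List[int], y: List[int]) -> float:
    """Squared Pearson correlation between two equal-length vectors.
    Returns 0.0 if either vector is constant (correlation undefined)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov * cov / (var_x * var_y)


def similar_patterns(vectors: List[List[int]], threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose vectors have r^2 above threshold."""
    return [
        (i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
        if r_squared(vectors[i], vectors[j]) > threshold
    ]


snp3 = [0, 1, 0, 1, 0, 0]
snp4 = [0, 1, 0, 1, 0, 0]
snp5 = [0, 1, 0, 1, 0, 0]
snp6 = [1, 0, 0, 0, 0, 0]
print(similar_patterns([snp3, snp4, snp5, snp6]))
# [(0, 1), (0, 2), (1, 2)]
```

The three identical vectors pair up with each other, while the unrelated pattern of the fourth SNP (r^2 = 0.1 against the others) is left out.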
23Methods for Finding Clusters of Null Genotypes
(3/3)
- Highly similar patterns ($r^2 > 0.8$) of null
genotypes or Mendel failures tend to be
physically clustered in the genome.
24Step 1 Identify a list of clusters
- A list of candidate clusters was determined by
considering every consecutive pair and
consecutive trio of observations of that pattern,
together with any other intervening SNP assays.
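One way to read this step is sketched below. It assumes SNPs on a chromosome are indexed in physical order and that `hits` holds the indices at which a given pattern is observed; each candidate is reported as a (start, end, number of observations) tuple, so the intervening SNPs are simply those between start and end. The function name and tuple layout are illustrative, not the papers' implementation.

```python
from typing import List, Tuple


def candidate_clusters(hits: List[int]) -> List[Tuple[int, int, int]]:
    """Given SNP indices (along one chromosome) at which a pattern is observed,
    return candidate clusters as (start, end, n_observations) tuples.
    Each cluster spans a consecutive pair or trio of observations and
    therefore includes any intervening SNPs between start and end."""
    hits = sorted(hits)
    clusters = []
    for size in (2, 3):
        for i in range(len(hits) - size + 1):
            clusters.append((hits[i], hits[i + size - 1], size))
    return clusters


print(candidate_clusters([10, 11, 14, 250]))
# [(10, 11, 2), (11, 14, 2), (14, 250, 2), (10, 14, 3), (11, 250, 3)]
```

Wide candidates such as (14, 250, 2) are still listed at this stage; they are expected to be filtered out later by the cluster P value.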
25Step 2 Compute P value for each cluster
Chromosome
CandidateClusters
- For each candidate cluster, a cluster P value is
computed to assess the binomial probability of
observing a cluster at least as tight as that
cluster, given
- the background frequency of the pattern,
- the number of SNPs spanned by the cluster, and
- the total number of SNPs genotyped.
26Clustering P Value
- Suppose the pattern frequency is 0.001.
-
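The slides do not give the exact formula, so the following is only a sketch under stated assumptions: the tightness of a cluster is scored by a binomial tail probability (at least k of the n spanned SNPs show the pattern), and the total number of SNPs enters through a simple multiplication by the number of possible windows. The function names and the numbers other than the pattern frequency 0.001 are made up for illustration.

```python
from math import comb


def binomial_tail(k: int, n: int, f: float) -> float:
    """P(X >= k) for X ~ Binomial(n, f): the chance that at least k of the
    n SNPs spanned by a cluster show a pattern with background frequency f."""
    return sum(comb(n, i) * f**i * (1 - f) ** (n - i) for i in range(k, n + 1))


def cluster_p_value(k: int, n_spanned: int, n_total: int, f: float) -> float:
    """Assumed correction (not from the papers): multiply the tail probability
    by the number of windows of n_spanned consecutive SNPs, capped at 1."""
    windows = n_total - n_spanned + 1
    return min(1.0, windows * binomial_tail(k, n_spanned, f))


# Pattern frequency 0.001; 3 observations within a span of 4 SNPs,
# out of 10,000 SNPs genotyped (the last three numbers are made up).
p = cluster_p_value(k=3, n_spanned=4, n_total=10_000, f=0.001)
print(f"{p:.3g}")
```

With a rare pattern, even a handful of observations packed into a short span yields a very small P value, which is what makes tight clusters stand out from scattered genotyping failures.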
27Clustering P Value
- A significance threshold is chosen somewhat to
the left of a knee in the distribution.
28Flow of the Process
[Flow diagram: identify clusters of null genotypes or Mendel
inconsistency, producing candidate clusters]
29Results
- They identify 541 deletions ranging from 1 kb to
745 kb in size.
- 94 are novel and 120 are homozygous deletions.
30Experimental Validation
31How Many Copy Number Polymorphisms (CNPs) are
Deletions?
- CNPs imply a deletion or a duplication.
- Copy number polymorphisms in the human genome,
Science, 2004. (76 CNPs)
- Detection of large-scale variation in the human
genome, Nature Genetics, 2004. (255 CNPs)
- Segmental duplications and copy number variation
in the human genome, AJHG, 2005. (119 CNPs)
- Only 11 deletions overlap with previously
identified CNPs.
- They speculate that most of these CNPs
result from duplications rather than deletions.
32Exons of Genes Removed by Deletions
- 266 genes are affected by deletions in at least
one individual.
- Ten genes are found to be deleted with sufficient
frequency.
33Linkage Disequilibrium (LD) between SNPs and
Deletions (1/2)
- If deletions result from ancestral events, they
will often be in LD with nearby SNPs.
- Ten deletions, genotyped (+/+, +/-, -/-) by
quantitative PCR over 269 HapMap samples, are
tested.
- The phase of each deletion and of all SNPs within
200 kb flanking it is obtained with Haploview.
34Linkage Disequilibrium (LD) between SNPs and
Deletions (2/2)
35Capture Deletions by Tag SNPs
- SNP3 can be the tag SNP for this deletion.
- i.e., the allele C at this locus indicates a
deletion.
- This implies that the upcoming genome-wide
association studies using tag SNPs are also
sufficient to detect deletions.
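How well a SNP allele tags a deletion can be quantified with the usual LD measure r^2. The sketch below assumes phased haplotypes are available as (SNP allele, deletion state) pairs, with "-" marking the deleted copy and "+" the intact one; the function name `ld_r_squared` and the six haplotypes are invented for illustration.

```python
from typing import List, Tuple


def ld_r_squared(haplotypes: List[Tuple[str, str]], snp_allele: str,
                 del_allele: str = "-") -> float:
    """r^2 between a SNP allele and a deletion allele, computed from phased
    haplotypes given as (snp_allele, deletion_state) pairs."""
    n = len(haplotypes)
    p_a = sum(1 for s, _ in haplotypes if s == snp_allele) / n
    p_b = sum(1 for _, d in haplotypes if d == del_allele) / n
    p_ab = sum(1 for s, d in haplotypes if s == snp_allele and d == del_allele) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # one of the loci is monomorphic; r^2 is undefined
    disequilibrium = p_ab - p_a * p_b
    return disequilibrium * disequilibrium / denom


# Allele C at the SNP always travels with the deletion ("-"); T never does.
haps = [("C", "-"), ("C", "-"), ("T", "+"), ("T", "+"), ("T", "+"), ("C", "-")]
print(ld_r_squared(haps, snp_allele="C"))  # 1.0
```

An r^2 of 1 means that genotyping the SNP recovers the deletion genotype exactly, which is the sense in which a tag SNP can stand in for the deletion.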
36Linkage Disequilibrium (LD) between SNPs and
Deletions
- Six of the ten deletions have a perfect proxy.
- These deletions are associated with the same allele
across populations.
- This implies they are ancestral deletions that arose
before the migration from Africa to Europe or to Asia.
37More Evidence from Larger Collection
- Based on 51 deletions validated by PCR, the
haplotype homozygosity of SNPs flanking
homozygous deletions is significantly elevated.
- This indicates that the deletions travel only on
specific haplotypes.
38Concluding Remarks
- These papers provide a similar method for
identifying deletion polymorphisms using SNPs.
- Mainly based on clusters of SNPs containing Mendel
failures or null genotypes.
- The CNPs mostly result from duplications
instead of from deletions.
- This speculation is not convincing since
recently discovered deletions are shown to
overlap poorly.
- The deletion polymorphisms and SNPs are in high
LD.
- A single genotyping platform is sufficient to
detect both SNPs and deletions.
39Notes of Methods
- Patterns of Mendel inconsistency are processed
similarly to those of null genotypes.
- The vector is changed to a length of 60, containing
two elements per trio in CEU and YRI.
- Hardy-Weinberg disequilibrium is only used
for extending the clusters identified by using
null genotypes or Mendel inconsistency.
- It is estimated by the ratio het_obs/het_exp, with
a cutoff of 0.4 or 0.7.
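The heterozygosity ratio can be sketched as follows, assuming biallelic genotypes written as two-character strings and expected heterozygosity computed under Hardy-Weinberg equilibrium from the observed allele frequencies. The function name `het_ratio` is hypothetical, and treating a ratio below the cutoff as a heterozygote deficit is an interpretation of the slide rather than a detail it states.

```python
from collections import Counter
from typing import List


def het_ratio(genotypes: List[str]) -> float:
    """Observed / expected heterozygosity for one SNP.
    Genotypes are two-character strings such as "AC"."""
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    het_obs = sum(1 for g in genotypes if g[0] != g[1]) / n
    # Expected heterozygosity under HWE: 1 - sum of squared allele
    # frequencies (equals 2pq when there are two alleles).
    het_exp = 1.0 - sum((c / (2 * n)) ** 2 for c in allele_counts.values())
    if het_exp == 0:
        raise ValueError("SNP is monomorphic; expected heterozygosity is zero")
    return het_obs / het_exp


# Both alleles are present but no heterozygotes are called.
calls = ["AA"] * 6 + ["CC"] * 4
print(het_ratio(calls))  # 0.0

# Genotype counts that match Hardy-Weinberg proportions for p = q = 0.5.
calls_hwe = ["AA"] * 5 + ["AC"] * 10 + ["CC"] * 5
print(het_ratio(calls_hwe))
```

A SNP inside a hemizygous deletion tends toward the first case, since every carrier is called homozygous, whereas a SNP in equilibrium gives a ratio near 1.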