Title: SNP and Variation
1SNP and Variation
- Ka-Lok Ng
- Asia University
2References
- http://www.mun.ca/biology/scarr/4241rm_chapter31.html
- http://www.bioinfo.rpi.edu/bystrc/courses/biol4540/lecture21/lec21.pdf
3Introduction
- Having sequenced a genome, the next step is to study the nature and distribution of variation between individuals
- Variation at the DNA level: nucleotide insertions, deletions, and single nucleotide polymorphisms (SNPs), also called small nucleotide polymorphisms
- A SNP is any site where two or more different nucleotides are segregating in a population. A cluster of linked SNPs is a haplotype.
- SNPs and haplotypes are an increasingly important component of biological studies, which range from ecology and evolution to biomedicine (disease association studies)
- These variations are used to characterize population structure and history and to study gene function. They are indispensable for recombination mapping (linkage analysis) and serve as positional markers for physical mapping
- SNPs are the most common genetic variation, occurring once every 100 to 300 bases.
4The Nature of Single Nucleotide Polymorphisms
- Classification of SNPs
- Most common: a change from one base to another
- This can be either a transversion or a transition
- SNPs can also be insertions and deletions, termed indels
- Some geneticists count two-nucleotide changes and small insertions/deletions of a few nucleotides as SNPs, so "simple-nucleotide polymorphism" may be a better description
- Microsatellites, longer sequence repeats, and any other molecular polymorphism (transposable element insertions, deletions, chromosome inversions and translocations, and aneuploidy) are not regarded as SNPs
- Aneuploidy is an error in cell division that leaves the "daughter" cells with the wrong number of chromosomes. In some cases a chromosome is missing, while in others there is an extra one.
5Classification of SNPs
- SNPs are classified by the nature of the affected nucleotide
- Noncoding SNP: 5' or 3' nontranscribed region (NTR), 5' or 3' untranslated region (UTR), intron, or intergenic
63.1 (Part 1) Human promoter SNPs that affect
gene expression
- Coding SNP: replacement polymorphisms (change the amino acid encoded) or synonymous polymorphisms (change the codon but not the amino acid)
- Nonreplacement polymorphisms include both synonymous and noncoding polymorphisms, but they can still affect gene function through an effect on transcriptional or translational regulation, splicing, or RNA stability.
- This type of polymorphism is an important contributor to genetic variation (Fig. 3.1).
- Fig. 3.1: a collection of over 140 human promoter SNPs that have been associated with an effect on gene expression or TF binding and, in many cases, a clinical outcome
Fig. 3.1. Human promoter SNPs that affect gene expression. These loci, which are dispersed throughout the human genome, are ones for which a SNP has been implicated in modulation of transcript levels, either by statistical association or by a biochemical assay in cell lines. The figure shows where some of these nonreplacement polymorphisms lie and affect gene expression.
73.1 (Part 2) Human promoter SNPs that affect
gene expression
- Fig. 3.1. Human promoter SNPs that affect gene
expression.
8- SNPs can also be classified as transitions or transversions
- Transitions change a purine to a purine (A ↔ G) or a pyrimidine to a pyrimidine (C ↔ T)
- Transversions change a purine to a pyrimidine and vice versa (A or G ↔ C or T)
- Transitions occur at least as frequently as transversions and are actually more prevalent, despite transversions having twice as many possible changes
- This holds broadly true for both coding and noncoding SNPs
- In part this results from differences in the mechanisms by which certain types of mutations arise and are repaired
- Due to the nature of the genetic code, transitions are less likely to change the amino acid than transversions
- Transitions are therefore thought to have a higher probability of preserving the encoded protein sequence
- (number of transitions)/(number of transversions) > 1 in coding regions
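The transition/transversion rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `classify_substitution` and `ts_tv_ratio` are made up for this example and do not come from the lecture.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) = transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(substitutions) -> float:
    """Transitions divided by transversions for a list of (ref, alt) pairs."""
    kinds = [classify_substitution(ref, alt) for ref, alt in substitutions]
    ts = kinds.count("transition")
    tv = kinds.count("transversion")
    return ts / tv if tv else float("inf")


# All 12 possible directed substitutions: 4 are transitions, 8 are transversions,
# which is the "twice as many possible changes" noted on the slide.
ALL_CHANGES = [(r, a) for r in "ACGT" for a in "ACGT" if r != a]
print(ts_tv_ratio(ALL_CHANGES))  # 0.5 if every change were equally likely
```

A ratio above 0.5 in real data therefore already indicates an excess of transitions relative to equal mutation rates, and the slide's statement is that the observed ratio exceeds 1 in coding regions.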
9- Synonymous
- TGT → TGC results in Cys → Cys
- Nonsynonymous replacement
- TGT → TGG results in Cys → Trp
- can be conservative or nonconservative
- Nonsynonymous nonsense mutation, introduction of a stop codon
- TGT → TGA results in Cys → stop
- Nonsynonymous read-through mutation
- TAA → TTA results in stop → Leu
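The four categories on this slide can be checked against the standard genetic code. The sketch below builds the standard codon table (DNA alphabet, bases in the conventional T, C, A, G order) and labels a codon change. The function name `classify_codon_change` and the category labels are chosen for this example only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change with one of the four categories from the slide."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"        # a stop codon is introduced
    if ref_aa == "*":
        return "read-through"    # a stop codon is lost
    return "replacement"         # one amino acid is swapped for another


for ref, alt in [("TGT", "TGC"), ("TGT", "TGG"), ("TGT", "TGA"), ("TAA", "TTA")]:
    print(ref, alt, CODON_TABLE[ref], CODON_TABLE[alt], classify_codon_change(ref, alt))
```

Whether a replacement is conservative or nonconservative depends on the chemical similarity of the two amino acids, which this sketch does not attempt to score.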
10- SNP and disease
- Sickle-cell anaemia: a disease caused by a specific SNP. An A→T mutation (GTGAG → GTGTG) in the β-globin gene changes Glu → Val, creating a sticky surface on the haemoglobin molecule that leads to polymerization of the deoxy form
- SNPs and blood groups: the A, B and O alleles
- The A and B alleles differ by four SNP substitutions
- They code for related enzymes that add different saccharide (sugar, general formula (CH2O)n) units to an antigen on the surface of red blood cells (rbc)
  Allele   Sequence          Saccharide
  A        .gctggtgacccctt   N-acetylgalactosamine
  B        .gctcgtcaccgcta   galactose
  O        .cgtggt-acccctt   --
- The O allele has undergone a mutation causing a phase shift (frameshift) and produces no enzyme. The rbc of type O contain neither the A nor the B antigen, which is why people with type O blood are universal donors in blood transfusions.
- The loss of activity of the protein does not seem to carry any adverse consequences.
The ABO antigens are terminal sugars found at the end of long sugar chains (oligosaccharides) that are attached to lipids on the red cell membrane. The A and B antigens are the last sugar added to the chain. The "O" antigen is the absence of the A and B antigens; type O cells carry the largest amount of the next-to-last terminal sugar, which is called the H antigen.
http://matcmadison.edu/is/hhps/mlt/mljensen/BloodBank/lectures/abo_blood_group_system.htm
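The statement that the A and B alleles differ by four substitutions can be checked on the short fragments quoted on this slide. The sketch below compares only those 14-base fragments, not the full alleles, and the helper name `differing_positions` is made up for this example.

```python
# Allele fragments as quoted on the slide (leading ellipsis dropped).
ALLELE_A = "gctggtgacccctt"
ALLELE_B = "gctcgtcaccgcta"


def differing_positions(seq1: str, seq2: str) -> list:
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(seq1, seq2)) if x != y]


positions = differing_positions(ALLELE_A, ALLELE_B)
print(positions)        # [4, 7, 11, 14]
print(len(positions))   # 4
```

The O fragment contains a one-base deletion, so a position-by-position comparison like this one does not apply to it without first aligning the sequences.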
11- In classical population genetic theory, a genetic locus is regarded as polymorphic only if the frequency of the most common allele is < 95%, i.e. the other alleles together account for at least 5%
- Most SNPs are first detected in a sample of fewer than 10 individuals, so the frequency criterion is not applied: all single nucleotide changes are described initially as candidate SNPs.
- NCBI dbSNP: http://www.ncbi.nlm.nih.gov/SNP/index.html
- Seattle SNP: http://pga.mbt.washington.edu
12- From Fig. 3.1 → chromosome 1, FY; then do an NCBI search
- NCBI → SNP → keyword → FY AND homo → refSNP ID rs17851571
13- Comment: polymorphisms vs. mutations
- Confusion arises over the distinction between polymorphisms and mutations, largely because of the dual usage of the term mutation.
- All SNPs arise as mutations, in the sense that the conversion of one nucleotide into another is a mutational event. But by the time a sequence variant is observed in a population, the event that created it is usually long past, so the observed SNP is no longer a mutation: it is just a rare sequence variant or a polymorphism.
- Since the distinction only applies to a small fraction of all SNPs, the term polymorphism is more general.
14Distribution of SNPs
- The distribution of SNPs lies within the domain of population genetics
- The study of the relationship between SNPs and phenotypic variation lies in the domain of quantitative genetics
- Application of SNPs → quantitative trait loci (QTL), which are loci that contribute to polygenic phenotypic variation
- Neutral theory of molecular evolution
- Balance between mutation and genetic drift
- Rate at which mutations are introduced into a population = rate at which polymorphisms are lost
- Most mutations, whether deleterious, advantageous or neutral in effect, are lost within a few generations
- Selection acts to reduce the frequency of slightly deleterious alleles, but on occasion favors a new allele (positive selection) or maintains two or more alleles (balancing selection) at some loci
15- Three key concepts are important in characterizing SNP variation
- Allele frequency distribution
- Linkage disequilibrium
- Population stratification
- Aspects of frequency distribution
- Population structure: for example, a SNP can be more frequent in one population than in another. As migration is a potent source of diversity, isolation affects the rate at which variation is lost to drift (eventually leaving no variation).
- Nucleotide diversity: the average fraction of nucleotides that differ between a pair of alleles chosen at random from a population
- Homo sapiens (Hs): lower nucleotide diversity, with an average of one SNP every kbp between the chromosomes of any two individuals
- Fly and maize: an order of magnitude greater polymorphism, with one SNP every 50-100 bp
- Linkage disequilibrium and haplotype maps
- Linkage disequilibrium (LD): non-random association of alleles
- LD allows mapping of disease loci in large populations
- In humans, LD is commonly observed for several tens of kbp, and in many cases 100 kbp, on either side of a SNP
- LD has an effect on haplotypes, which display a clustered distribution
- Broad approximation
- Genome: tens of thousands of blocks
- Each block: up to 100,000 bases, with 3-5 common haplotypes
- Each haplotype: tens or hundreds of SNPs in LD
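Two of the quantities defined on this slide are simple to compute from aligned sequences. The sketch below computes nucleotide diversity as the average fraction of differing sites over all pairs of sequences, and the basic LD coefficient D = p(AB) - p(A)p(B) for two sites. The toy sequences, haplotypes, and function names are invented for illustration, and D is only one of several LD measures in common use.

```python
from itertools import combinations


def nucleotide_diversity(sequences) -> float:
    """Average fraction of sites that differ between all pairs of sequences."""
    seqs = list(sequences)
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


def linkage_disequilibrium(haplotypes, allele1: str, allele2: str) -> float:
    """D = p(allele1, allele2) - p(allele1) * p(allele2) over two-site haplotypes."""
    haps = [tuple(h) for h in haplotypes]
    n = len(haps)
    p1 = sum(h[0] == allele1 for h in haps) / n
    p2 = sum(h[1] == allele2 for h in haps) / n
    p12 = sum(h == (allele1, allele2) for h in haps) / n
    return p12 - p1 * p2


# Hypothetical sample of four aligned 10-base sequences.
sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
print(nucleotide_diversity(sample))  # 6 differences / (6 pairs * 10 sites) = 0.1

# Hypothetical two-site haplotypes in complete association.
haps = [("A", "G")] * 5 + [("T", "C")] * 5
print(linkage_disequilibrium(haps, "A", "G"))  # 0.5 - 0.5 * 0.5 = 0.25
```

A value of D = 0 corresponds to random association of the two alleles; nonzero values indicate linkage disequilibrium.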
163.2 (Part 1) Nucleotide diversity in natural
populations
- Fig. 3.2 Nucleotide diversity in natural populations. (A) Observed and expected SNP frequencies for 874 SNPs from 75 candidate human hypertension loci. Rare alleles are the most frequent, and the number of SNPs in each frequency class declines as the rarer allele becomes more common.
In a sample of several hundred alleles, the most common class of SNPs is singletons (which appear only once in the sample), followed by doubletons, tripletons, and so on. Only between 1/3 and ½ of all SNPs are common in the sense that the rarer allele is present in more than 5% of the individuals.
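The frequency classes described in this caption (singletons, doubletons, and so on) amount to a tally of how many times the rarer allele of each SNP appears in the sample. The sketch below builds that tally and the fraction of "common" SNPs. The counts are hypothetical, not the 874 SNPs of Fig. 3.2, and the function names are made up for this example.

```python
from collections import Counter


def minor_allele_spectrum(minor_counts) -> dict:
    """Map each minor-allele count (1 = singleton, 2 = doubleton, ...) to the number of SNPs."""
    return dict(sorted(Counter(minor_counts).items()))


def fraction_common(minor_counts, sample_size: int, threshold: float = 0.05) -> float:
    """Fraction of SNPs whose rarer allele exceeds the frequency threshold."""
    counts = list(minor_counts)
    common = sum(1 for c in counts if c / sample_size > threshold)
    return common / len(counts)


# Hypothetical minor-allele counts for 10 SNPs in a sample of 100 alleles.
counts = [1, 1, 1, 1, 2, 2, 3, 8, 20, 45]
print(minor_allele_spectrum(counts))   # {1: 4, 2: 2, 3: 1, 8: 1, 20: 1, 45: 1}
print(fraction_common(counts, 100))    # 0.3
```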
173.2 (Part 2) Nucleotide diversity in natural
populations
- (B) LD (D) decays with time (number of generations) at a rate set by the recombination rate r.
- (C) The level of nucleotide diversity is a function of recombination rate, and hence of chromosomal position, as in this example for the fruit fly.
(B) As the number of generations increases, SNPs increasingly segregate independently (no more clustering), so LD decreases. (C) As r increases, nucleotide diversity increases.
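Panel (B) can be illustrated with the standard textbook model for a large random-mating population, in which D shrinks by a factor (1 - r) each generation, so D_t = D_0 (1 - r)^t. That formula is an assumption brought in for this sketch; the slide itself only states that D decays with the number of generations at a rate depending on r. The function names are illustrative.

```python
import math


def ld_after_generations(d0: float, r: float, generations: int) -> float:
    """D after t generations under the model D_t = D_0 * (1 - r) ** t."""
    if not 0 <= r <= 0.5:
        raise ValueError("recombination rate r must lie between 0 and 0.5")
    return d0 * (1 - r) ** generations


def ld_half_life(r: float) -> float:
    """Number of generations for D to halve under the same model (requires 0 < r <= 0.5)."""
    return math.log(0.5) / math.log(1 - r)


# Unlinked loci (r = 0.5) lose half of D every generation.
print(ld_after_generations(0.25, 0.5, 1))   # 0.125
# Tightly linked loci keep their association far longer.
for r in (0.001, 0.01, 0.1):
    print(r, round(ld_half_life(r), 1))
```

The slow decay at small r is what allows LD to extend over tens of kbp around a SNP.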
18- NCBI dbSNP: http://www.ncbi.nlm.nih.gov/SNP/index.html
dbSNP accepts submissions for SNPs, microsatellite repeats, and small-scale deletion and insertion polymorphisms
dbSNP summary for various species
19- Submitted data
- The submitter HANDLE is a short tag that uniquely identifies each submitting laboratory in the database
- Each submitted SNP record gets a unique ssSNP identifier, such as ss4923558 (HANDLE: YUSUKE)
- Keyword: ss4923558 AND homo
- The keyword ss4923558 alone returns multiple records (more than 11 rsSNP records)
- More than one submitter → more than one ssSNP → these ssSNPs are clustered into a reference SNP identifier → rsSNP
20Alleles: A/G
Ancestral allele: G
Handle: YUSUKE, EGP_SNPS, PERLEGEN, ABI
Fasta seq.: >gnl|dbSNP|rs3737559|allelePos=301|totalLen=601|taxid=9606|snpclass=1|alleles='A/G'|mol=Genomic|build=126
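The FASTA header on this slide packs the record's metadata into pipe-separated fields. The sketch below splits such a header into a dictionary. It assumes the header reads `>gnl|dbSNP|rs3737559|allelePos=301|...` with `|` separators and `key=value` fields, which is a reconstruction of the punctuation lost from the slide text, and `parse_dbsnp_header` is a made-up helper, not an NCBI tool.

```python
def parse_dbsnp_header(header: str) -> dict:
    """Split a dbSNP-style FASTA header into a dictionary of its fields."""
    fields = header.strip().lstrip(">").split("|")
    # fields[0] is the 'gnl' tag, fields[1] the database name, fields[2] the rs ID.
    record = {"database": fields[1], "rs_id": fields[2]}
    for field in fields[3:]:
        key, _, value = field.partition("=")
        record[key] = value.strip("'")
    return record


# Header as assumed from the slide (separators reconstructed).
HEADER = (
    ">gnl|dbSNP|rs3737559|allelePos=301|totalLen=601|taxid=9606"
    "|snpclass=1|alleles='A/G'|mol=Genomic|build=126"
)

record = parse_dbsnp_header(HEADER)
print(record["rs_id"], record["alleles"])   # rs3737559 A/G

# allelePos=301 of totalLen=601 puts the SNP in the middle of the sequence,
# with 300 flanking bases on each side.
flank_left = int(record["allelePos"]) - 1
flank_right = int(record["totalLen"]) - int(record["allelePos"])
print(flank_left, flank_right)              # 300 300
```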
21 22- Go to the bottom of the page
- JBIC: sample size 1270, allele frequencies of A and G
- Other populations have a smaller sample size
23- Click NCBI Assay ID → ss4923558, which shows:
- Japanese Millennium Genome Project
- Measured in a group of East Asian DNA samples
- There is no individual genotype data for ss4923558
- Click Handle|Submitter ID → YUSUKE|IMS-JST082810, which shows:
- Allele frequency
- G 0.8929
- A 0.1071
- Sample size 1270 (number of chromosomes)
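The reported frequencies and sample size can be turned back into approximate allele counts, which is a quick sanity check on the record. The sketch below does that arithmetic, and also computes the expected heterozygosity 1 - Σp², which assumes Hardy-Weinberg proportions and is an addition for illustration rather than something reported on the dbSNP page. Function names are made up.

```python
def allele_counts(frequencies: dict, n_chromosomes: int) -> dict:
    """Convert allele frequencies into rounded chromosome counts."""
    return {allele: round(freq * n_chromosomes) for allele, freq in frequencies.items()}


def expected_heterozygosity(frequencies: dict) -> float:
    """1 - sum(p_i ** 2), the heterozygosity expected under Hardy-Weinberg proportions."""
    return 1 - sum(freq ** 2 for freq in frequencies.values())


# Values quoted on the slide for ss4923558.
freqs = {"G": 0.8929, "A": 0.1071}
counts = allele_counts(freqs, 1270)
print(counts)                                     # {'G': 1134, 'A': 136}
print(sum(counts.values()))                       # 1270
print(round(expected_heterozygosity(freqs), 4))   # 0.1913
```

Because the sample size is given in chromosomes, 1270 corresponds to 635 diploid individuals.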
24- Entrez SNP search terms
- http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Snp
25- SNP integration in genome browsers
- Ensembl: http://www.ensembl.org/index.html
- rs3737559
- SNP rs3737559 is located in the following transcripts
- Genotype and allele frequencies per population
BioMart
26- The local DNA seq. within 100 kb on either side
of the SNP is shown.
27The different types of SNPs are color coded as to
type (e.g. coding, intronic, flanking or other).
Deletion and insertion polymorphisms are
indicated with a triangle. The letters (K, M, R,
S, W, Y) inside the SNP squares indicate the type
of SNP using IUPAC ambiguity codes.
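The six letters shown inside the SNP squares are the IUPAC ambiguity codes for the six possible pairs of bases. The sketch below lists them and looks up the code for a biallelic SNP. The helper name `iupac_code` is chosen for this example.

```python
# IUPAC two-base ambiguity codes.
IUPAC_BIALLELIC = {
    "R": {"A", "G"},  # purine
    "Y": {"C", "T"},  # pyrimidine
    "S": {"C", "G"},  # strong pairing
    "W": {"A", "T"},  # weak pairing
    "K": {"G", "T"},  # keto
    "M": {"A", "C"},  # amino
}


def iupac_code(allele1: str, allele2: str) -> str:
    """Return the IUPAC ambiguity code for a SNP with two different alleles."""
    alleles = {allele1.upper(), allele2.upper()}
    for code, bases in IUPAC_BIALLELIC.items():
        if bases == alleles:
            return code
    raise ValueError(f"no two-base IUPAC code for {allele1}/{allele2}")


print(iupac_code("A", "G"))  # R
print(iupac_code("C", "T"))  # Y
```

An A/G SNP such as rs3737559 would therefore be drawn with the letter R, and the two transition pairs (A/G and C/T) map to R and Y.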
28- UCSC Genome Browser: http://genome.ucsc.edu/cgi-bin/hgGateway
- BRCA1 gene
29- NCBI Entrez Gene
- Gene: BRCA1
- http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=search&term=brca1
SNP GeneView
The coding SNPs in the BRCA1 gene. Those that do not change the amino acid (aa) are colored green; those that result in a different aa are colored red.
30SNP association studies
- Association studies
- A case group of people vs. a control group of people
- The case group: people who are diagnosed with some disease (e.g. cystic fibrosis), react to some type of medicine, or are even especially healthy (e.g. more than 100 years old)
- The control group: people who do not exhibit the feature selected for the case group
- For case-control studies, a selection of SNPs is genotyped in both the case and control groups
- Allele frequency (case group) > allele frequency (control group) → potential markers for the observed phenotype
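One common way to judge whether an allele is more frequent in cases than in controls is a chi-square test on the 2x2 table of allele counts. The sketch below implements that test and the allelic odds ratio in plain Python. The choice of test, the function names, and the counts are all illustrative assumptions; the slide only states the comparison of allele frequencies.

```python
def allele_chi_square(case_alt: int, case_ref: int, control_alt: int, control_ref: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele-count table."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column of the table must be non-empty")
    return n * (a * d - b * c) ** 2 / denominator


def odds_ratio(case_alt: int, case_ref: int, control_alt: int, control_ref: int) -> float:
    """Odds of carrying the allele in cases relative to controls."""
    return (case_alt * control_ref) / (case_ref * control_alt)


# Hypothetical counts: 100 cases and 100 controls, i.e. 200 chromosomes per group.
chi2 = allele_chi_square(60, 140, 40, 160)
print(round(chi2, 3))                            # 5.333
print(round(odds_ratio(60, 140, 40, 160), 3))    # 1.714
# With 1 degree of freedom, a statistic above 3.841 corresponds to p < 0.05.
print(chi2 > 3.841)                              # True
```

When many SNPs are tested in the same study, the threshold needs a multiple-testing correction, which this sketch leaves out.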
31SNP and disease
- Functional variation: a SNP may be associated with a nonsynonymous substitution in a coding region