Title: Single Nucleotide Polymorphism
Slide 1: Single Nucleotide Polymorphism
Anshu Bhardwaj, Research Fellow
Centre for Cellular and Molecular Biology, Hyderabad
8th November, 2003
Slide 2: Single Nucleotide Polymorphism
Single base-pair differences occurring in a population with a frequency of >1%
...C C A T T G A C...
...G G T A A C T G...

...C C G T T G A C...
...G G C A A C T G...
Slide 3: SNPs can be found in...
- NON-CODING REGION
  - 5' and 3' UTRs
  - Introns
  - Splice sites
- CODING REGION
  - Non-synonymous: amino acid substitution
  - Synonymous: silent
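The distinction between synonymous and non-synonymous coding SNPs can be illustrated with a minimal sketch that translates the reference and variant codons using the standard genetic code. The function name `classify_coding_snp` and the example codons are illustrative assumptions, not part of the presentation.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Return 'synonymous' if both codons encode the same amino acid.

    A change to or from a stop codon ('*') is reported as non-synonymous.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


# Hypothetical example codons (not taken from the slides).
silent = classify_coding_snp("GAA", "GAG")    # Glu -> Glu
missense = classify_coding_snp("GAG", "GTG")  # Glu -> Val
```

Here `silent` is a synonymous change and `missense` is an amino acid substitution.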
Slide 5: GENOTYPIC FREQUENCY
Relative distribution of genotypes in a population for a particular locus
Slide 6: ALLELIC FREQUENCY
The relative abundance of an allele of a particular gene with reference to its other alleles
Let $p = f(M)$ and $q = f(N)$. Thus, $p = f(MM) + \tfrac{1}{2} f(MN)$ and $q = f(NN) + \tfrac{1}{2} f(MN)$.
Slide 7: ALLELIC FREQUENCY
Location     Genotype (percent)       Allele frequency
             MM      MN      NN       p       q
Greenland    83.5    15.6    0.9      0.92    0.08
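As a worked instance of the relation between genotypic and allelic frequency, the sketch below applies p = f(MM) + ½ f(MN) and q = f(NN) + ½ f(MN) to the Greenland genotype percentages. The function name `allele_frequencies` is an illustrative assumption. With these inputs the formula gives p ≈ 0.913 and q ≈ 0.087, slightly different from the 0.92 and 0.08 listed on the slide, which may reflect rounding or the underlying counts.

```python
def allele_frequencies(f_mm: float, f_mn: float, f_nn: float) -> tuple[float, float]:
    """Compute allele frequencies (p, q) from genotype frequencies.

    Inputs are fractions (not percent) and must sum to 1.
    """
    total = f_mm + f_mn + f_nn
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"genotype frequencies sum to {total}, expected 1")
    p = f_mm + 0.5 * f_mn  # frequency of allele M
    q = f_nn + 0.5 * f_mn  # frequency of allele N
    return p, q


# Greenland genotype percentages from the slide, converted to fractions.
p, q = allele_frequencies(0.835, 0.156, 0.009)
```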
Slide 8: WHY SNPs?
- SNPs are distributed non-randomly throughout the genome
- On average, a significant SNP is found for every 1 kb of the human genome,
  resulting in approximately 3 million SNPs
- Large number
- Unambiguous assay techniques
- High levels of polymorphism in the population
- Most of the phenotypic differences arise from SNPs in genes,
  but these form only a small fraction of the total number
Slide 9: dbSNP DENSITY DISTRIBUTION IN HUMAN
- Mean Density
  - 0.001765 SNPs per base (17.652 SNPs per 10 kb)
- Mean Spacing
  - 566.5118 bases per SNP
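Mean density and mean spacing are reciprocals of each other, which the following sketch makes explicit. The function names are illustrative, and the genome size of 3 billion bases used for the "one SNP per 1 kb" estimate is an assumed round figure rather than a value from the slides.

```python
def spacing_from_density(snps_per_base: float) -> float:
    """Mean number of bases per SNP for a given SNP density."""
    return 1.0 / snps_per_base


def density_per_10kb(snps_per_base: float) -> float:
    """Express a per-base density as SNPs per 10 kb."""
    return snps_per_base * 10_000


def expected_snp_count(genome_size_bases: int, bases_per_snp: float) -> float:
    """Expected number of SNPs for a genome of the given size."""
    return genome_size_bases / bases_per_snp


mean_spacing = spacing_from_density(0.0017652)        # close to 566.5 bases per SNP
per_10kb = density_per_10kb(0.0017652)                # close to 17.652 SNPs per 10 kb
rough_total = expected_snp_count(3_000_000_000, 1000)  # assumed 3 Gb genome, 1 SNP per kb
```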
Slide 10: SNP Discovery
- SNP discovery refers to the initial identification of new SNPs
- The established method is electrophoresis (DNA sequencing) with subsequent
  data analysis. Some indirect discovery techniques (e.g., dHPLC, SSCP) only
  indicate that a SNP (or other mutation) exists
- DNA sequencing of multiple individuals is used to determine the point and
  type of polymorphism
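Determining the point and type of a polymorphism from sequences of several individuals can be sketched as a column-by-column comparison of aligned sequences. This is a minimal illustration that assumes the sequences are already aligned and of equal length; the function names and the three-individual example (built from the two alleles shown on slide 2) are assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a: str, b: str) -> str:
    """Classify a base change as a transition or a transversion."""
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def find_snps(aligned: list[str]) -> list[tuple[int, str, str, str]]:
    """Return (1-based position, allele 1, allele 2, type) for biallelic sites."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for i in range(length):
        alleles = sorted({seq[i] for seq in aligned})
        if len(alleles) == 2 and all(a in "ACGT" for a in alleles):
            snps.append((i + 1, alleles[0], alleles[1], substitution_type(*alleles)))
    return snps


individuals = ["CCATTGAC", "CCGTTGAC", "CCATTGAC"]
candidates = find_snps(individuals)
```

For these sequences the only candidate is an A/G transition at position 3.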
Slide 11: SNP Validation
- SNP validation refers to genetic validation, the process of ensuring that
  the SNP is not due to sequencing error
- Confirmation of SNPs found in discovery
- Larger numbers of individual samples are used to get statistical data on
  occurrence in the population
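One way to obtain statistical data on occurrence is to estimate the minor allele frequency from genotype counts in a larger sample and check it against the >1% frequency used in the definition of a SNP. The sketch below uses a normal-approximation 95% confidence interval, which is an assumption chosen for brevity; the genotype counts are hypothetical.

```python
from math import sqrt


def minor_allele_frequency(n_major_hom: int, n_het: int, n_minor_hom: int):
    """Return (frequency, lower, upper) for the minor allele.

    The interval is a normal-approximation 95% confidence interval over
    the 2N sampled chromosomes.
    """
    chromosomes = 2 * (n_major_hom + n_het + n_minor_hom)
    minor_alleles = n_het + 2 * n_minor_hom
    maf = minor_alleles / chromosomes
    se = sqrt(maf * (1 - maf) / chromosomes)
    return maf, maf - 1.96 * se, maf + 1.96 * se


# Hypothetical genotype counts for 200 individuals.
maf, lower, upper = minor_allele_frequency(180, 18, 2)
passes_one_percent = lower > 0.01
```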
Slide 12: THE EXPERIMENTAL APPROACH
- RESTRICTION FRAGMENT LENGTH POLYMORPHISM
- SINGLE STRANDED CONFORMATIONAL POLYMORPHISM
- DENATURING HIGH PRESSURE LIQUID CHROMATOGRAPHY
- HYBRIDIZATION METHOD
- MALDI-TOF METHOD
- SEQUENCING AND ALIGNMENT THEREAFTER
Slide 14: IN SILICO SNP PREDICTION
- POLYBAYES
- SEAN SNP Prediction Program
- SNP Finder
Slide 16: Restriction Fragment Length Polymorphisms
Botstein et al. (1980)
CHANGES IN MIGRATION PATTERNS THAT REPRESENT ALLELIC VARIATION
[Figure: panels A, B and C showing Homolog 1 and Homolog 2, with a 3 kb fragment versus 1 kb and 2 kb fragments detected by the PROBE]
CAN BE USED TO DETECT SNPs DIFFERENTIALLY IN HOMOZYGOUS AND HETEROZYGOUS INDIVIDUALS
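How a SNP inside a restriction site changes the fragment pattern can be sketched with a toy digest. The example uses the EcoRI recognition site GAATTC (cut after the first G); the short sequences and the function name `fragment_lengths` are hypothetical, and a real analysis would consider both strands and the probe position.

```python
def fragment_lengths(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Lengths of fragments produced by cutting at every occurrence of `site`.

    `cut_offset` is the cut position relative to the start of the site.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical alleles: the SNP (A -> G) destroys the EcoRI site in allele 2.
allele_1 = "A" * 10 + "GAATTC" + "T" * 14
allele_2 = "A" * 10 + "GAGTTC" + "T" * 14

cut_pattern = fragment_lengths(allele_1, "GAATTC", 1)    # two fragments
uncut_pattern = fragment_lengths(allele_2, "GAATTC", 1)  # one fragment
# A heterozygous individual would show the bands of both alleles.
heterozygote_bands = sorted(set(cut_pattern + uncut_pattern))
```

A homozygote for allele 1 shows two bands, a homozygote for allele 2 shows one, and a heterozygote shows all three, which is the kind of difference in migration pattern the slide describes.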
Slide 17: MALDI-TOF METHOD
Matrix-assisted laser desorption/ionization time-of-flight
Slide 19: SEQUENCING METHOD
Slide 20: POLYBAYES, A BAYESIAN INFERENCE ENGINE TO CALCULATE THE PROBABILITY THAT A GIVEN SITE IS POLYMORPHIC
Slide 21: SNP DETECTION IN REDUNDANT SEQUENCE DATA
SEQUENCE CLUSTERING → CLUSTER REFINEMENT → MULTIPLE ALIGNMENT → SNP DETECTION
Slide 22: The PolyBayes Approach
- Use genomic sequence as reference
  - cluster and align all available sequences
  - remove repeats/paralogs
- Use Bayesian statistics to
  - distinguish polymorphic sites from artifacts
  - estimate likelihood
- Marth GT, Korf I, Yandell MD, Yeh RT, Gu Z, Zakeri H, Stitziel NO, Hillier L,
  Kwok P-Y, Gish WR. A general approach to single-nucleotide polymorphism
  discovery. Nature Genet. 1999; 23: 452-456.
Slide 24:
1. Known repeat sequences are masked using RepeatMasker
2. FRAGMENT CLUSTERING
   (a) WU-BLAST used to search against dbEST
   (b) Sequence traces processed with PHRED for base calling and quality values
   (c) Distinct groups of matching ESTs registered as clusters
3. Each cluster member is pair-wise aligned to the genomic anchor sequence
   with CROSS_MATCH
Slide 25: PARALOGUE IDENTIFICATION
1. Paralogues may give rise to false SNP predictions, which points to
   difficulties during marker development
2. Calculate the probability $P_{NAT}$ that a cluster member is derived from
   the genomic region
3. Distinguish between
   - less accurate sequences that nevertheless originate from the same
     underlying genomic location
   - more accurate sequences with high-quality discrepancies that are likely
     to be paralogous
4. Using a threshold value $P_{NAT,MIN}$, paralogous cluster members are removed
Slide 26:
$D_{NAT} = L \times P_{POLY,2} + E$, with $P_{POLY,2} = 0.001$
$D_{PAR} = L \times P_{PAR} + E$, with $P_{PAR} = 0.02$
$d$ = discrepancies
$P(\mathrm{MODEL}_{NAT} \mid d)$
$P_{NAT,MIN} = 0.75$
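The paralogue filter can be sketched as a two-model Bayesian comparison. The sketch reads L as the alignment length and E as the expected number of sequencing errors, models the number of discrepancies d as Poisson-distributed around the expected values for the native and paralogous models, and uses equal prior probabilities for the two models. The Poisson form, the equal priors, the function names and the example numbers are all assumptions made for illustration; only the rates 0.001 and 0.02 and the threshold 0.75 come from the slide.

```python
from math import exp, lgamma, log

P_POLY_2 = 0.001  # pairwise polymorphism rate (from the slide)
P_PAR = 0.02      # pairwise paralogous difference rate (from the slide)
P_NAT_MIN = 0.75  # threshold for keeping a cluster member (from the slide)


def poisson_pmf(k: int, mean: float) -> float:
    """Poisson probability of observing k events with the given mean."""
    return exp(k * log(mean) - mean - lgamma(k + 1))


def p_native(d: int, length: int, expected_errors: float, prior_native: float = 0.5) -> float:
    """Posterior probability of the native model given d discrepancies."""
    d_nat = length * P_POLY_2 + expected_errors
    d_par = length * P_PAR + expected_errors
    like_nat = prior_native * poisson_pmf(d, d_nat)
    like_par = (1 - prior_native) * poisson_pmf(d, d_par)
    return like_nat / (like_nat + like_par)


# Hypothetical cluster members: 500-base alignments, one expected sequencing error.
keep = p_native(d=1, length=500, expected_errors=1.0) >= P_NAT_MIN
discard = p_native(d=12, length=500, expected_errors=1.0) < P_NAT_MIN
```

With these inputs a member showing one discrepancy is retained, while a member showing twelve is treated as a likely paralogue and removed.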
Slide 27: MULTIPLE ALIGNMENT
- Depth of coverage
- The base-quality values of the sequences
- The a priori expected rate of polymorphic sites in the region
- $P_{SNP}$: PROBABILITY THAT THE SITE IS POLYMORPHIC
- DISTRIBUTION OF PROBABILITY SCORES EXHIBITS A HIGH LEVEL OF SPECIFICITY
Slide 28: THRESHOLD VALUE $P_{SNP} = 0.4$
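To show how depth of coverage, base-quality values and an a priori polymorphism rate can combine into a probability that a site is polymorphic, here is a deliberately simplified Bayesian sketch. It is not the published PolyBayes calculation: it compares a "one true base" model against a "two equally likely alleles" model for a single alignment column, and the prior of 0.001, the function names and the example columns are assumptions.

```python
from itertools import combinations
from math import prod

BASES = "ACGT"


def phred_to_error(quality: float) -> float:
    """Convert a Phred quality value to an error probability."""
    return 10 ** (-quality / 10)


def read_likelihood(observed: str, true_base: str, error: float) -> float:
    """Probability of observing a base given the true base and error rate."""
    return 1 - error if observed == true_base else error / 3


def snp_probability(column, prior_polymorphic: float = 0.001) -> float:
    """Posterior probability that a column of (base, quality) pairs is polymorphic."""
    reads = [(base, phred_to_error(q)) for base, q in column]
    # Monomorphic model: every read comes from the same true base.
    mono = sum(
        prod(read_likelihood(b, t, e) for b, e in reads) for t in BASES
    ) / len(BASES)
    # Polymorphic model: each read comes from one of two alleles with equal chance.
    pairs = list(combinations(BASES, 2))
    poly = sum(
        prod(0.5 * read_likelihood(b, a1, e) + 0.5 * read_likelihood(b, a2, e)
             for b, e in reads)
        for a1, a2 in pairs
    ) / len(pairs)
    weighted_poly = prior_polymorphic * poly
    return weighted_poly / (weighted_poly + (1 - prior_polymorphic) * mono)


THRESHOLD = 0.4  # threshold value from the slide
balanced = snp_probability([("A", 40)] * 4 + [("G", 40)] * 4)
uniform = snp_probability([("A", 40)] * 8)
one_poor_read = snp_probability([("A", 40)] * 7 + [("G", 10)])
```

In this sketch a column with four high-quality A reads and four high-quality G reads scores above the threshold, while a uniform column or a single low-quality mismatch scores below it.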
Slide 29: THE POLYBAYES SOFTWARE
Slide 30: OTHER SNP PREDICTION AND SNP FINDING SOFTWARE
- SEAN: search for localized SNPs and predict SNPs
  (http://zebrafish.doc.ic.ac.uk/Sean/)
- SNP Finder: for analyzing user-submitted trace data
  (http://gai.nci.nih.gov/)
Slide 35: SIGNIFICANCE OF SNPs
- IN DISEASE DIAGNOSIS
- IN FINDING PREDISPOSITION TO DISEASES
- IN DRUG DISCOVERY AND DEVELOPMENT
- IN DRUG RESPONSES
- INVESTIGATION OF MIGRATION PATTERNS
- ALL THESE ASPECTS WILL HELP TO LOOK FOR MEDICATION AND DIAGNOSIS AT THE INDIVIDUAL LEVEL
Slide 36: SNP Screening
- Two different screening strategies
  - Many SNPs in a few individuals
  - A few SNPs in many individuals
- Different strategies will require different tools
- Important in determining markers for complex genetic states
Slide 37: SNP genotyping methods for detecting genes contributing to susceptibility or resistance to multifactorial diseases and adverse drug reactions
→ case-control association analysis
[Figure: sequences from the case and control groups: .GCCGTTGAC. .GCCATTGAC. .GCCATTGAC. .GCCATTGAC.]
Compared between case and control:
- allele frequency: A, G
- genotype frequency: AA, AG, GG
- haplotype frequency: SNP1, SNP2, SNP3
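A case-control comparison of allele frequencies can be sketched as a chi-square test on a 2×2 table of allele counts. The allele counts below are hypothetical, and the one-degree-of-freedom p-value uses the identity between the chi-square tail and the complementary error function; this is a minimal illustration, not a complete association analysis.

```python
from math import erfc, sqrt


def allele_association(case_a: int, case_g: int, control_a: int, control_g: int):
    """Return (chi-square, p-value, odds ratio) for a 2x2 allele count table."""
    n = case_a + case_g + control_a + control_g
    numerator = n * (case_a * control_g - case_g * control_a) ** 2
    denominator = (
        (case_a + case_g)
        * (control_a + control_g)
        * (case_a + control_a)
        * (case_g + control_g)
    )
    chi_square = numerator / denominator
    p_value = erfc(sqrt(chi_square / 2))  # chi-square tail, 1 degree of freedom
    odds_ratio = (case_a * control_g) / (case_g * control_a)
    return chi_square, p_value, odds_ratio


# Hypothetical allele counts: A and G alleles in cases and in controls.
chi_square, p_value, odds_ratio = allele_association(60, 40, 40, 60)
```

The same table layout can be applied to genotype or haplotype counts, with more rows and correspondingly more degrees of freedom.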
Slide 38: HAPLOTYPE
A set of closely linked genetic markers present on one chromosome which tend
to be inherited together (not easily separable by recombination)
Slide 39: SNP-Haplotype
No.  Phenotype   DNA Sequence       Haplotype
1    BLACK EYE   GATATTCGTACGGA-T   AG-
2    BROWN EYE   GATGTTCGTACTGAAT   GTA
3    BLACK EYE   GATATTCGTACGGA-T   AG-
4    BLUE EYE    GATATTCGTACGGAAT   AGA
5    BROWN EYE   GATGTTCGTACTGAAT   GTA
6    BROWN EYE   GATGTTCGTACTGAAT   GTA
Haplotypes: AG- 2/6 (BLACK EYE), GTA 3/6 (BROWN EYE), AGA 1/6 (BLUE EYE)
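The haplotype counts on this slide can be reproduced by collecting the variable columns of the six aligned sequences and reading off the bases at those columns for each sequence. The function names are illustrative assumptions; the sequences are the ones shown on the slide, in order 1 to 6.

```python
from collections import Counter


def variable_columns(aligned: list[str]) -> list[int]:
    """Zero-based indices of columns where the sequences differ."""
    return [
        i for i in range(len(aligned[0]))
        if len({seq[i] for seq in aligned}) > 1
    ]


def haplotypes(aligned: list[str]) -> list[str]:
    """Haplotype string of each sequence over the variable columns."""
    columns = variable_columns(aligned)
    return ["".join(seq[i] for i in columns) for seq in aligned]


sequences = [
    "GATATTCGTACGGA-T",
    "GATGTTCGTACTGAAT",
    "GATATTCGTACGGA-T",
    "GATATTCGTACGGAAT",
    "GATGTTCGTACTGAAT",
    "GATGTTCGTACTGAAT",
]
haplotype_counts = Counter(haplotypes(sequences))
```

The three variable columns give haplotype counts of 2, 3 and 1 out of 6, matching the frequencies listed for the three eye colours.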
Slide 40: HAPLOTYPE CORRELATION WITH PHENOTYPE
- The haplotype-centric approach combines the information of adjacent SNPs
  into composite multilocus haplotypes.
- Haplotypes are not only more informative but also capture the regional
  linkage disequilibrium (LD) information, which is assumed to be robust
  and powerful.
- Association of haplotype frequencies with the frequencies of the desired
  phenotypes in the population will help in utilizing the maximum potential
  of SNPs as markers.
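The regional LD information carried by haplotypes can be quantified with the standard pairwise measures D and r². The sketch below computes both for two-locus haplotypes; the function name and the six example haplotypes (two sites, with alleles A/G at the first and G/T at the second) are illustrative assumptions.

```python
def linkage_disequilibrium(haplotypes: list[str], allele_a: str, allele_b: str):
    """Return (D, r^2) for allele_a at the first site and allele_b at the second."""
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h[0] == allele_a and h[1] == allele_b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # deviation from independent assortment
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else float("nan")
    return d, r_squared


# Hypothetical two-site haplotypes from six chromosomes.
two_site = ["AG", "GT", "AG", "AG", "GT", "GT"]
d, r_squared = linkage_disequilibrium(two_site, "A", "G")
```

In this example the two sites are in complete LD (r² = 1), so genotyping either one predicts the other.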
Slide 41: ADVANTAGES
- SNPs ARE THE MOST FREQUENT FORM OF DNA VARIATION
- THEY ARE THE DISEASE-CAUSING MUTATIONS IN MANY GENES
- THEY ARE ABUNDANT AND HAVE SLOW MUTATION RATES
- EASY TO SCORE
- MAY WORK AS THE NEXT GENERATION OF GENETIC MARKERS
Slide 42: LIMITATIONS
1. EXPERIMENTAL DETECTION OF SNPs REQUIRES
IMPLEMENTATION OF EXPENSIVE TECHNOLOGIES
2. NEED FOR LARGE POPULATION DATASETS FOR
ASSOCIATION STUDIES
Slide 43: Some important SNP database resources
1. dbSNP (http://www.ncbi.nlm.nih.gov/SNP/)
   LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink/list.cgi)
2. TSC (http://snp.cshl.org/)
3. SNPper (http://snpper.chip.org/bio/)
4. JSNP (http://snp.ims.u-tokyo.ac.jp/search.html)
5. GeneSNPs (http://www.genome.utah.edu/genesnps/)
6. HGVbase (http://hgvbase.cgb.ki.se/)
7. PolyPhen (http://dove.embl-heidelberg.de/PolyPhen/)
   OMIM (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM)
8. Human SNP database (http://www-genome.wi.mit.edu/snp/human/)
Feb. 25. 2003 SI Hung