Title: SNP-Marking the Human Genome
1SNP-Marking the Human Genome
Daniel C. Koboldt and Raymond D. Miller
Department of Genetics, Washington
University School of Medicine, St. Louis,
Missouri, USA
2What is a SNP?
A DNA sequence is composed of four letters, or
"bases": A, C, G, and T. The human genome
sequence contains some 3.2 billion base pairs.
Single Nucleotide Polymorphisms, or SNPs
(pronounced "snips"), are single-base locations in
the DNA sequence that vary between individuals.
Typically, SNPs have just two forms, or
alleles. SNPs are the most common form of DNA
sequence variation between individuals.
3The HapMap Project
The International HapMap Project is a
multi-country effort to identify and characterize
the genetic differences in humans. In the first
phase of the HapMap, over one million SNPs were
genotyped in 269 samples from four populations.
Information from the HapMap will help
researchers find genes that affect health,
disease risk, and response to pharmaceuticals.
4SNPs with Neighbors
Millions of SNPs were added to public databases
during the HapMap Project. Using HapMap data, we
have shown that SNPs are clustered near one
another, not distributed randomly. More than a
third (34.1%) have a neighbor SNP within 30 bp.
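The neighbor statistic above can be illustrated with a short calculation. The following is a minimal sketch, not the analysis used on the HapMap data: it assumes all positions lie on one chromosome, treats "within 30 bp" as a distance of 30 or less, and uses made-up toy coordinates. The function name `fraction_with_neighbor` is hypothetical.

```python
def fraction_with_neighbor(positions, window=30):
    """Return the fraction of SNP positions that have at least one
    other SNP within `window` bp (single chromosome assumed)."""
    pos = sorted(set(positions))
    if not pos:
        return 0.0
    hits = 0
    for i, p in enumerate(pos):
        # In a sorted list, the nearest neighbors are the adjacent entries.
        near_left = i > 0 and p - pos[i - 1] <= window
        near_right = i + 1 < len(pos) and pos[i + 1] - p <= window
        if near_left or near_right:
            hits += 1
    return hits / len(pos)


# Toy coordinates (illustrative only, not HapMap data).
toy_positions = [100, 120, 500, 900, 925, 2000]
print(fraction_with_neighbor(toy_positions))
```

In the toy data, four of the six positions (100, 120, 900, 925) have a neighbor within 30 bp.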
5The SNP-In-Primer Problem
It is critical to avoid designing primers that
overlap polymorphic sites, since even a single-base
mismatch can affect primer hybridization.
SNP at the 6th base (C/G):
Target TTAGACACGGAACC
Primer TTAGACACGGAACC (perfect match)
Target TTAGAGACGGAACC
Primer TTAGACACGGAACC (single-base mismatch)
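The example above can be checked programmatically. This is a minimal sketch that follows the slide's simplified display, in which the primer and target are written in the same orientation (a real primer anneals to the complementary strand). The helper name `mismatch_positions` is hypothetical.

```python
def mismatch_positions(primer, target):
    """Return the 0-based positions at which primer and target differ."""
    if len(primer) != len(target):
        raise ValueError("primer and target must have the same length")
    return [i for i, (p, t) in enumerate(zip(primer, target)) if p != t]


primer = "TTAGACACGGAACC"
target_allele_c = "TTAGACACGGAACC"  # target carrying the C allele
target_allele_g = "TTAGAGACGGAACC"  # target carrying the G allele

print(mismatch_positions(primer, target_allele_c))  # []
print(mismatch_positions(primer, target_allele_g))  # [5]
```

The primer matches one allele perfectly but mismatches the other at the SNP position, so hybridization may differ between individuals carrying different alleles.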
6Increased Genotyping Failures
- Genotype failure rates, by platform, for assays
with or without SNP-in-primer.
- Passed HapMap data from release 16.
- Passed and failed HapMap data from release 16.
- Affy: Affymetrix
- Bead: BeadArray
- FP: FP-TDI
- Inv: Invader
- MIP: MIP
- Perl: Perlegen
- Seq: Sequenom
7Our SNP-Marking Algorithm
8Ambiguity Codes for SNPs
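For reference, the standard IUPAC nucleotide ambiguity codes can be used to mark a SNP position with a single letter that stands for its alleles. The sketch below is an illustration of that idea under stated assumptions, not the authors' algorithm: the function name `mark_snps`, the 0-based coordinates, and the input format (a dictionary of position to alleles) are all hypothetical.

```python
# Standard IUPAC ambiguity codes for sets of two or more bases.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
    frozenset("CGT"): "B",
    frozenset("AGT"): "D",
    frozenset("ACT"): "H",
    frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def mark_snps(sequence, snps):
    """Replace each SNP position with the IUPAC code for its alleles.

    `snps` maps a 0-based position to its alleles, e.g. {5: "CG"}.
    A position with fewer than two distinct alleles raises KeyError.
    """
    marked = list(sequence.upper())
    for pos, alleles in snps.items():
        marked[pos] = IUPAC[frozenset(a.upper() for a in alleles)]
    return "".join(marked)


# The C/G SNP from the primer example becomes "S".
print(mark_snps("TTAGACACGGAACC", {5: "CG"}))  # TTAGASACGGAACC
```

A primer-design tool that reads the marked sequence can then skip any candidate primer site containing a letter other than A, C, G, or T.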
9Get the SNP-Marked Sequence
Our Site: http://snp.wustl.edu/
UCSC Genome Browser: ftp://hgdownload.cse.ucsc.edu
10Future Directions
- SNPs are not the only sequence variation in the
genome that can affect primer hybridization.
- Deletion-Insertion Polymorphisms, or DIPs, are
more difficult to detect and to annotate because
they alter the length of DNA sequence.
- We need a systematic way to identify and
describe DIPs and mark their positions in the
human genome.
11Our Thanks To...
- International HapMap Consortium
- Mummi Thorisson
- National Institutes of Health
- The UCSC Genome Browser
- Heather Trumbower
- Our Collaborators: Pui Kwok's group, UCSF.
12Key Points
The more than 10 million SNPs in the genome are
clustered near one another, not randomly
distributed. This has profound implications for
assay design, since even a single base mismatch
can prevent a primer from annealing
correctly. We developed a version of the human
genome assembly in which SNP positions are marked
by ambiguity codes, making it well suited to assay
design.