Title: SNP Resources: Finding SNPs, Databases and Data Extraction
1SNP Resources: Finding SNPs, Databases and Data Extraction
Debbie Nickerson, debnick_at_u.washington.edu
SeattleSNPs
2Complex inheritance/disease
[Figure labels: Many Other Genes, Variant Gene, Environment, Disease]
Examples: Diabetes, Heart Disease, Schizophrenia, Obesity, Multiple Sclerosis, Celiac Disease, Cancer, Asthma, Autism
Two hypotheses:
1. Common disease/common variant?
2. Common disease/many rare variants?
3Genomic Variation
[Figure: Human Genetic Variation. Variant classes (Single Nucleotide Polymorphisms, Small indels, Copy-Number Variants, structural variation) plotted by Frequency against Size, from 1 bp to 1 chr (cytogenetic). Annotations: "Abundant"; "Gene-rich, e.g. immune response, drug metabolism".]
4Total sequence variation in humans
Population size: 6 x 10^9 (diploid)
Mutation rate: 2 x 10^-8 per bp per generation
Expected hits: 240 for each bp
=> Every variant compatible with life exists in the population, BUT most are vanishingly rare
Compare 2 haploid genomes: 1 SNP per 1331 bp
The International SNP Map Working Group, Nature 409: 928-933 (2001)
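The "expected hits" figure follows from multiplying the number of chromosomes in the population by the per-bp mutation rate. A minimal sketch of that arithmetic, using only the slide's values (the variable names are illustrative):

```python
# Back-of-the-envelope check of the "expected hits" figure on the slide.
population_size = 6e9          # people (slide value)
chromosomes_per_person = 2     # diploid
mutation_rate = 2e-8           # per bp per generation (slide value)

# Each chromosome copy is an independent chance for a given bp to mutate.
expected_hits_per_bp = population_size * chromosomes_per_person * mutation_rate
print(f"Expected new mutations per bp per generation: {expected_hits_per_bp:.0f}")
```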
5Building Maps of Single Nucleotide Polymorphisms (SNPs)
ATTCGGCATGAA
ATTCGGGATGAA
- Developed in two overlapping phases
- SNP Discovery
- SNP Genotyping
6Finding SNPs: Sequence-based SNP Mining
[Figure labels: Genomic; RRS Library; Random Shotgun; DNA SEQUENCING; Shotgun Overlap; Align to Reference]
RANDOM Sequence Overlap - SNP Discovery
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
> 11 Million SNPs
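Sequence-overlap discovery comes down to comparing overlapping reads base by base and flagging positions where they disagree. The sketch below does this for the two example reads on the slide, assuming they are already aligned and of equal length; the function name is illustrative, and real pipelines also weigh base quality and alignment uncertainty, which this ignores.

```python
def find_mismatches(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch.

    Assumes the two reads are already aligned and have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]

read_1 = "GTTACGCCAATACAGGATCCAGGAGATTACC"
read_2 = "GTTACGCCAATACAGCATCCAGGAGATTACC"
candidates = find_mismatches(read_1, read_2)
print(candidates)  # [(16, 'G', 'C')]
```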
7Increasing Sample Size Improves SNP Discovery
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
2 chromosomes
[Figure: Fraction of SNPs Discovered]
New 1000 Genome Program
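One way to see why sample size matters: a SNP is only discovered if the sampled chromosomes carry both alleles. The sketch below uses the textbook probability 1 - p^n - (1 - p)^n for a SNP with minor allele frequency p in n chromosomes. This formula is not taken from the slides, and it assumes chromosomes are sampled independently from a single population.

```python
def discovery_probability(minor_allele_freq: float, n_chromosomes: int) -> float:
    """Chance that a sample of n chromosomes shows both alleles of a SNP."""
    p = minor_allele_freq
    q = 1.0 - p
    # Subtract the two ways the SNP is missed: all-minor or all-major samples.
    return 1.0 - p ** n_chromosomes - q ** n_chromosomes

for n in (2, 8, 24, 96):
    row = ", ".join(
        f"MAF {maf:.2f}: {discovery_probability(maf, n):.3f}"
        for maf in (0.01, 0.05, 0.20)
    )
    print(f"{n:>3} chromosomes -> {row}")
```

With only 2 chromosomes, most low-frequency SNPs are missed; the probability rises steadily as more chromosomes are sequenced.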
8Genotype - Phenotype Studies
You have a candidate gene/region/pathway of interest and samples ready to study.
- What SNPs are available?
- How do I find the common SNPs?
- What is the validation/quality of the SNPs?
- Are these SNPs informative in my population/samples?
- What information can I download?
- How do I pick the best SNPs? - Dana Crawford
9Minimal SNP information for genotyping/characterization
- What is the SNP? Flanking sequence and alleles.
  - FASTA format
    >snp_name
    ACCGAGTAGCCAG
    A/G
    ACTGGGATAGAAC
  - dbSNP reference SNP (rs )
- Where is the SNP mapped? Exon, promoter, UTR, etc.
- How was it discovered? Method
- What assurances do you have that it is real? Validated how?
- What population? African, European, etc.
- What is the allele frequency of each SNP? Common (>5%), rare
- Are other SNPs associated - redundant?
- Is genotyping data for control populations available?
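The flanking-sequence record above can be held in a small structure and written out as a FASTA-style entry. This is a minimal sketch: the class and field names are hypothetical, and putting the alleles in square brackets between the flanks is a common convention assumed here, not something the slide specifies.

```python
from dataclasses import dataclass

@dataclass
class SnpRecord:
    name: str
    left_flank: str
    alleles: tuple[str, str]
    right_flank: str

    def to_fasta(self) -> str:
        """Render as a FASTA-style entry with the alleles in [A/G] brackets."""
        allele_text = "/".join(self.alleles)
        return f">{self.name}\n{self.left_flank}[{allele_text}]{self.right_flank}"

# Values copied from the slide's example record.
record = SnpRecord("snp_name", "ACCGAGTAGCCAG", ("A", "G"), "ACTGGGATAGAAC")
print(record.to_fasta())
```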
10Finding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?
1. SeattleSNPs - Candidate gene website
2. Other web applications: GVS, HapMap Genome Browser
3. Entrez Gene - dbSNP - Entrez SNP
12Finding SNPs Seattle SNPs Candidate Genes
pga.gs.washington.edu
13Finding SNPs SeattleSNPs Candidate Genes
Example - PCSK9
14Finding SNPs SeattleSNPs Candidate Genes
15Finding SNPs SeattleSNPs Candidate Genes
18AD
ED
19SNP_pos <tab> Ind_ID <tab> allele1 <tab> allele2
Repeat for all individuals
Repeat for next SNP
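A four-column, tab-delimited layout like this (position, individual, allele 1, allele 2) is easy to parse line by line. The sketch below counts alleles per SNP and derives a minor allele frequency. The example rows, positions and sample IDs are made up for illustration, and missing-data codes are not handled.

```python
from collections import Counter, defaultdict

# Made-up example rows in the four-column layout; positions and IDs are hypothetical.
example_text = (
    "1001\tD001\tA\tG\n"
    "1001\tD002\tA\tA\n"
    "1001\tD003\tG\tG\n"
    "2050\tD001\tC\tC\n"
    "2050\tD002\tC\tT\n"
    "2050\tD003\tC\tC\n"
)

def allele_counts(text: str) -> dict[int, Counter]:
    """Count alleles per SNP position from tab-delimited genotype rows."""
    counts: dict[int, Counter] = defaultdict(Counter)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        snp_pos, _ind_id, allele1, allele2 = line.split("\t")
        counts[int(snp_pos)].update([allele1, allele2])
    return dict(counts)

def minor_allele_frequency(counter: Counter) -> float:
    """Frequency of the least common observed allele (0.0 if monomorphic)."""
    total = sum(counter.values())
    if len(counter) < 2:
        return 0.0
    return min(counter.values()) / total

for pos, counter in allele_counts(example_text).items():
    print(pos, dict(counter), round(minor_allele_frequency(counter), 3))
```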
20PolyPhen - Polymorphism Phenotyping
Structural protein characteristics and evolutionary comparison
SIFT - Sorting Intolerant From Tolerant
Evolutionary comparison of non-synonymous SNPs
21Finding SNPs SeattleSNPs Candidate Genes
pga.gs.washington.edu
23GVS Genome Variation Server
http://gvs.gs.washington.edu/GVS/
- Provides rapid analysis of 4.5 million genotyped SNPs from dbSNP and the HapMap
- Mapped to human genome build 36 (hg18)
- Displays genotype data in text and image formats
- Displays tagSNPs or clusters of informative SNPs in text and image formats
- Displays linkage disequilibrium (LD) in text and image formats
- Online tutorial provided at OpenHelix.com
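For readers new to LD, the r^2 statistic that tools of this kind report can be computed directly from phased haplotypes. The sketch below uses the standard definition r^2 = D^2 / (pA(1-pA) pB(1-pB)); it is not GVS code, and the haplotypes are hypothetical.

```python
def r_squared(haplotypes: list[tuple[str, str]], allele_a: str, allele_b: str) -> float:
    """Pairwise LD (r^2) between two SNPs from phased haplotypes.

    Each haplotype is (allele at SNP 1, allele at SNP 2); allele_a and
    allele_b are the alleles tracked at SNP 1 and SNP 2 respectively.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h == (allele_a, allele_b) for h in haplotypes) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b  # the disequilibrium coefficient D
    return d * d / denominator

# Hypothetical phased haplotypes for two SNPs (not real data).
perfect_ld = [("A", "C"), ("A", "C"), ("G", "T"), ("G", "T")]
partial_ld = [("A", "C"), ("A", "T"), ("G", "T"), ("G", "T")]
print(r_squared(perfect_ld, "A", "C"))
print(r_squared(partial_ld, "A", "C"))
```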
24GVS Genome Variation Server
LDLR
http://gvs.gs.washington.edu/GVS/
26GVS Genome Variation Server
27GVS Genome Variation Server
- Table of genotypes
- Image of visual genotypes
28GVS Genome Variation Server
Genotypes displayed in prettybase table and
visual genotype graphic
29GVS Genome Variation Server
30GVS Genome Variation Server
Dense genotypes around a candidate gene can be
integrated with broader HapMap genotypes
31GVS Genome Variation Server
Dense genotypes around a candidate gene can be
integrated with lower-density HapMap genotypes
32GVS Genome Variation Server
A. Common samples - combined variations
B. Combined samples - common variations
C. Combined samples - combined variations
[Figure labels: Common, Combined]
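The three merge options on these slides can be read as set operations on two axes, samples and variations. The sketch below takes "common" to mean intersection and "combined" to mean union; that reading is an assumption, not GVS's documented behaviour, and all sample and SNP IDs are hypothetical.

```python
# Hypothetical sample IDs and SNP IDs for two data sets (not real data).
dense_set = {"samples": {"S1", "S2", "S3"}, "snps": {"snpA", "snpB", "snpC"}}
broad_set = {"samples": {"S2", "S3", "S4"}, "snps": {"snpB", "snpC", "snpD"}}

def merge(first: dict, second: dict, samples: str, variations: str) -> dict:
    """Merge two data sets; each axis is 'common' (intersection) or 'combined' (union)."""
    pick = {"common": set.intersection, "combined": set.union}
    return {
        "samples": pick[samples](first["samples"], second["samples"]),
        "snps": pick[variations](first["snps"], second["snps"]),
    }

option_a = merge(dense_set, broad_set, samples="common", variations="combined")
option_b = merge(dense_set, broad_set, samples="combined", variations="common")
option_c = merge(dense_set, broad_set, samples="combined", variations="combined")
for label, result in (("A", option_a), ("B", option_b), ("C", option_c)):
    print(label, sorted(result["samples"]), sorted(result["snps"]))
```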
33GVS Genome Variation Server
A. Common samples - combined variations
[Figure labels: Common samples, Combined variations]
34GVS Genome Variation Server
B. Combined samples - common variations
[Figure labels: SeattleSNPs, HapMap, Combined samples]
35GVS Genome Variation Server
C. Combined samples - combined variations
[Figure labels: Combined variations, Combined samples]
39www.hapmap.org
40Finding SNPs HapMap Browser
41Finding SNPs HapMap Browser
- HapMap data sets are useful because individual genotype data in deeply sampled populations can be used to determine optimal genotyping strategies (tagSNPs) or perform population genetic analyses (linkage disequilibrium)
- Data are specific to the HapMap project (not all dbSNP)
- HapMap data are available in dbSNP
- Visualization of data and direct access to SNP data, individual genotypes, and LD analysis are possible in the browser, and formats can be saved for Haploview
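To make the tagSNP idea concrete, here is a minimal greedy sketch: SNPs are grouped into bins so that one tag per bin has pairwise r^2 at or above a threshold with every other SNP in that bin. The 0.8 threshold, the function name and the r^2 values are assumptions for illustration; this is not necessarily the algorithm used by the HapMap browser, GVS or Haploview.

```python
def pick_tag_snps(r2: dict[frozenset, float], snps: list[str], threshold: float = 0.8) -> dict[str, list[str]]:
    """Greedy binning: repeatedly pick the SNP that covers the most untagged SNPs.

    A SNP 'covers' another when their pairwise r^2 is >= threshold.
    Returns {tag SNP: [SNPs it covers, including itself]}.
    """
    def covered_by(snp: str, pool: set[str]) -> list[str]:
        return sorted(
            other for other in pool
            if other == snp or r2.get(frozenset((snp, other)), 0.0) >= threshold
        )

    remaining = set(snps)
    bins: dict[str, list[str]] = {}
    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=lambda s: len(covered_by(s, remaining)))
        bins[best] = covered_by(best, remaining)
        remaining -= set(bins[best])
    return bins

# Hypothetical pairwise r^2 values (not real data); missing pairs count as 0.
example_r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp1", "snp3")): 0.85,
    frozenset(("snp2", "snp3")): 0.60,
    frozenset(("snp4", "snp5")): 0.90,
}
print(pick_tag_snps(example_r2, ["snp1", "snp2", "snp3", "snp4", "snp5"]))
```

In this toy input, five SNPs reduce to two tags, which is the saving in genotyping effort that tagSNP selection aims for.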
43NCBI - Database Resource
PCSK9
www.ncbi.nlm.nih.gov
44Finding SNPs using NCBI databases
http://www.ncbi.nlm.nih.gov/
47Finding SNPs using NCBI databases
http://www.ncbi.nlm.nih.gov/
49PCSK9
52Finding SNPs - Entrez SNP Summary
- dbSNP is useful for investigating detailed information on a small number of SNPs, and it's good for a picture of the gene
- Entrez SNP is a direct, fast database for querying SNP data
- Data from Entrez SNP can be retrieved in batches for many SNPs
- Entrez SNP data can be limited to specific subsets of SNPs and formatted in plain text for easy parsing and manipulation
- More detailed queries can be formed using specific field tags for retrieving SNP data
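Field-tagged queries can also be sent programmatically through NCBI's E-utilities. The sketch below only builds an ESearch URL for the snp database and sends no request. The field tags in the example query are illustrative assumptions and should be checked against the current Entrez SNP help pages; the helper name is hypothetical.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_snp_search_url(term: str, max_records: int = 100) -> str:
    """Build an ESearch URL against the 'snp' database (no request is sent)."""
    params = {"db": "snp", "term": term, "retmax": max_records}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"

# The field tags below are illustrative assumptions; check the current
# Entrez SNP help pages for the tags the database actually accepts.
query = "PCSK9[Gene Name] AND human[Organism]"
url = build_snp_search_url(query, max_records=50)
print(url)
```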
53Summary - Finding SNPs: Databases and Extraction
- Reviewing candidate genes using views and resources in SeattleSNPs
- Integration of dense, gene-centric SNP maps with genomic HapMap SNPs - GVS
- HapMap viewer
- NCBI databases through the Entrez portal
  - Entrez Gene, dbSNP, Entrez SNP
  - many ways to retrieve and format data
54Genome Variation Server GVS
57New Variation to Consider - Structural Variation
Types of Structural Variants:
- Insertions/Deletions
- Inversions
- Duplications
- Translocations
Size:
- Large-scale (>100 kb)
- Intermediate-scale (500 bp-100 kb)
- Fine-scale (1-500 bp)
Nature 447: 161-165, 2007
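The size classes on this slide (read here as fine-scale 1-500 bp, intermediate-scale 500 bp-100 kb, large-scale over 100 kb) map naturally onto a small lookup function. The slide leaves the shared boundaries ambiguous, so assigning exactly 500 bp to fine-scale and exactly 100 kb to intermediate-scale is an assumption of this sketch.

```python
def size_class(length_bp: int) -> str:
    """Map a variant length in bp to the slide's three size classes."""
    if length_bp < 1:
        raise ValueError("length must be at least 1 bp")
    if length_bp <= 500:          # assumption: 500 bp counts as fine-scale
        return "fine-scale"
    if length_bp <= 100_000:      # assumption: 100 kb counts as intermediate-scale
        return "intermediate-scale"
    return "large-scale"

for length in (1, 500, 501, 100_000, 250_000):
    print(length, size_class(length))
```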
58Detection of Outliers of the Distribution
X-linked SNP
Unknown SNP
59Genetic Strategy - New Insights
[Figure: effect size (WEAK to STRONG) against allele frequency (LOW to HIGH), with regions labeled LINKAGE and ASSOCIATION; a further region is marked "Common Disease Many Rare Variants ??"]
Ardlie, Kruglyak & Seielstad (2002) Nat. Rev. Genet. 3: 299-309
60Sequencing Known Candidate Genes for Functional Variation From Individuals at the Tails of the Trait Distribution
[Figure: distribution of Individuals by High Density Lipoprotein (HDL), with the Low HDL and High HDL tails marked]
61ABCA1 and HDL-C
- Cohen et al., Science 305: 869-872, 2004
- Many examples emerging: Common Disease - Rare Variants
- Observed excess of rare, nonsynonymous variants in low HDL-C samples at ABCA1
- Demonstrated functional relevance in cell culture
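An "excess of rare variants in one tail" is commonly tested on a 2x2 table of carriers versus non-carriers in the two groups. The sketch below is a plain one-sided Fisher exact test; the source does not say which test the study used, and the counts are placeholders, not the published data.

```python
from math import comb

def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Probability of seeing a or more in the top-left cell, given fixed margins.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    largest = min(row1, col1)
    denominator = comb(total, col1)
    # Sum hypergeometric probabilities for top-left counts a, a+1, ..., largest.
    return sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, largest + 1)
    ) / denominator

# Placeholder counts, NOT the published data:
# rows = low / high HDL-C group, columns = carriers / non-carriers of a rare variant.
p_value = fisher_one_sided(3, 1, 1, 3)
print(round(p_value, 4))
```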
62Personalized Human Genome Sequencing
Solexa - an example