Title: Targeted Sequencing of Human Genomes, Transcriptomes, and Methylomes
1. Targeted Sequencing of Human Genomes, Transcriptomes, and Methylomes
- Jin Billy Li
- George Church Lab
- Harvard Medical School
- jli_at_genetics.med.harvard.edu
2. Genetic Loci X Sample Size Information
[Figure: methods (PCR seq, Mass-spec, SNP array, Shotgun seq, RNA-seq, ChIP-seq) placed on axes of samples versus genetic loci]
3. Target Capturing with Padlock Probes (aka MIPs)
[Figure: padlock probes for feature 1 through feature n; polymerase (pol) and ligase (lig) steps, followed by PCR (or RCA)]
Porreca et al., Nat Methods 2007
4. Mass Production of Padlock Oligos
[Figure: size scale marked at 150 nt, 100 nt, and 50 nt]
55k features of up to 200 nt
5. 10,000-fold Improvement Since Nov 2007
(1) longer hybridization time (2) more probes (3) right dNTP
[Figure: improvement steps labeled 1, 2, 3]
Li et al., in preparation
20-fold improvement already by better probe design and synthesis
8. Improved Technology -> Better Performance
[Figure: Sensitivity, Uniformity, and Correlation panels comparing Nov 2007 with Current]
95% captured; 85% within 100-fold range; 55% within 10-fold range
Li et al., in preparation
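The sensitivity and uniformity figures quoted on this slide can be computed from per-target read counts. The sketch below is one way to do it, not the authors' pipeline. It assumes that "captured" means at least one read, and that "within an N-fold range" means the largest share of all targets whose counts fit inside a single window whose maximum is at most N times its minimum. The function name `capture_metrics` and both definitions are illustrative assumptions.

```python
def capture_metrics(counts, fold=100):
    """Return (sensitivity, uniformity) for per-target read counts.

    sensitivity: fraction of targets with at least one read (assumed definition).
    uniformity:  largest fraction of all targets whose counts fit inside one
                 window [lo, lo * fold] (assumed definition).
    """
    total = len(counts)
    if total == 0:
        raise ValueError("no targets supplied")

    captured = sorted(c for c in counts if c > 0)
    sensitivity = len(captured) / total

    best = 0
    left = 0
    for right, high in enumerate(captured):
        # Advance the left edge until the window spans at most `fold`.
        while high > captured[left] * fold:
            left += 1
        best = max(best, right - left + 1)

    return sensitivity, best / total


if __name__ == "__main__":
    toy_counts = [0, 1, 5, 50, 100, 400, 20000]  # made-up example data
    print(capture_metrics(toy_counts, fold=100))
    print(capture_metrics(toy_counts, fold=10))
```

Calling the function with `fold=100` and `fold=10` gives the two uniformity figures for the same run.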
9. Summary of Improvements
                                    Nov 2007  Current
Specificity                         100%      100%
Sensitivity/Multiplexity (of 55k)   18%       95%
Uniformity (in 100-fold range)      16%       85%
Correlation of replicates (r)       0.35      0.98
Accuracy (heterozygous calls)       31%       99%
10. Targeted Capturing of
- Genomes
  - Exome PGP etc.
  - Contiguous regions or gene panels
  - SNPs
  - Hypermutable CpG dinucleotides
- Transcriptomes
  - Allelotyping
  - RNA editing sites
- Methylomes
  - CpG methylation
12. Predicting Putative Editing Sites
A -> I (G) RNA Editing
- Post-transcriptional A -> I
- I is read as G during translation
- Only 10 targets are known in human coding regions
A in the genome
G in some mRNAs or ESTs
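The prediction rule on this slide (A in the genome, G in some mRNAs or ESTs) can be sketched as a simple scan. This is a minimal illustration, not the authors' prediction pipeline. It assumes the transcripts are already aligned to the genomic sequence in the same coordinates and on the same strand. The function name `putative_editing_sites` is hypothetical, and the sketch does not filter out known genomic SNPs.

```python
def putative_editing_sites(genome, transcripts):
    """Return 0-based positions where the genome has A and any transcript has G.

    `genome` is a string; `transcripts` is a list of strings aligned to it
    position by position (gaps or uncovered bases may be '-' or 'N').
    """
    sites = []
    for pos, base in enumerate(genome.upper()):
        if base != "A":
            continue
        # A single supporting mRNA or EST is enough to nominate the site.
        if any(len(t) > pos and t[pos].upper() == "G" for t in transcripts):
            sites.append(pos)
    return sites


if __name__ == "__main__":
    genome = "ACAGA"                  # made-up example
    transcripts = ["ACGGA", "ACA-G"]  # made-up aligned mRNAs/ESTs
    print(putative_editing_sites(genome, transcripts))
```

For genes on the reverse strand the same event appears as T in the genome and C in the transcript, so a real scan would need to handle both orientations.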
13. Discovery of 100s of Novel Editing Sites
36,000 predicted editing sites; gDNA and 7 tissue cDNAs from an individual
Padlock + Solexa: 239 sites found to be edited
Validation (PCR + Sanger): 18 of 20 random sites are obviously edited
with Erez Levanon, in preparation
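Comparing gDNA with cDNA from the same individual separates true editing from genomic variation: an edited site should read as A in gDNA but show G in some cDNA reads. The sketch below illustrates that comparison under stated assumptions. The thresholds (`min_cov`, `max_gdna_g`, `min_cdna_g`) and the function names are hypothetical and are not taken from the slides.

```python
def editing_level(a_reads, g_reads):
    """Fraction of reads showing G at a site, or None if there is no coverage."""
    covered = a_reads + g_reads
    return g_reads / covered if covered else None


def is_edited(gdna, cdna, min_cov=20, max_gdna_g=0.05, min_cdna_g=0.10):
    """Call a site edited when gDNA is essentially all A and cDNA shows G.

    `gdna` and `cdna` are (A_reads, G_reads) tuples. All thresholds are
    illustrative assumptions.
    """
    # Require coverage in both samples before comparing levels.
    if sum(gdna) < max(min_cov, 1) or sum(cdna) < max(min_cov, 1):
        return False
    return (editing_level(*gdna) <= max_gdna_g
            and editing_level(*cdna) >= min_cdna_g)


if __name__ == "__main__":
    print(is_edited(gdna=(100, 0), cdna=(60, 40)))  # A in gDNA, G in cDNA
    print(is_edited(gdna=(50, 50), cdna=(50, 50)))  # looks like a genomic SNP
```

Running the check once per tissue gives a per-tissue editing level for each site.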
14. Example: VEZF1
[Figure: VEZF1 site shown in genomic DNA and in RNA from cerebellum, corpus callosum, frontal lobe, diencephalon, intestine, kidney, and adrenal]
15. Bisulfite Padlock Probes (BSP): CpG Methylation
Bisulfite-treated genome
3-base genome
High specificity of padlock
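Bisulfite treatment converts unmethylated cytosines to uracil, which is read as T after amplification, while methylated cytosines stay as C. A mostly unmethylated strand therefore collapses toward three bases (A, G, T), which is why probe specificity matters. The sketch below simulates the conversion on one strand. The function name `bisulfite_convert` and the `methylated_positions` argument are illustrative assumptions.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite conversion of one DNA strand.

    Every C becomes T unless its 0-based index is in `methylated_positions`.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")  # unmethylated C reads as T after PCR
        else:
            converted.append(base)
    return "".join(converted)


if __name__ == "__main__":
    seq = "ACGTCCGA"  # made-up example with a CpG at index 1
    print(bisulfite_convert(seq, methylated_positions={1}))
    print(bisulfite_convert(seq))
```

Probes for a bisulfite-treated genome have to be designed against the converted sequence, and the two strands are no longer complementary after conversion.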
16. Methylation Level Accurately Measured
[Figure: two panels. BSP-BSP correlation (methylation level, replicate 1 versus replicate 2) and BSP-Sanger correlation (methylation level measured by BSP sequencing versus methylation level estimated by Sanger sequencing). Reported r values: 0.979 and 0.966]
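A methylation level at a CpG site is commonly estimated from bisulfite reads as the fraction that still show C, and agreement between two measurements is summarized by Pearson's r. The sketch below shows both calculations under that assumption. The function names and the example numbers are made up for illustration and do not reproduce the data behind the reported r values.

```python
from math import sqrt


def methylation_level(c_reads, t_reads):
    """Fraction of bisulfite reads retaining C at a CpG site."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("site has no coverage")
    return c_reads / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    rep1 = [methylation_level(c, t) for c, t in [(30, 10), (5, 45), (20, 20)]]
    rep2 = [methylation_level(c, t) for c, t in [(28, 12), (8, 42), (22, 18)]]
    print(round(pearson_r(rep1, rep2), 3))
```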
17. Methylation Pattern around Genes: Gene-Body Methylation
with Madeleine Price Ball, in preparation (poster)
18. Acknowledgements
Padlock technology: Kun Zhang, John Aach, Abraham Rosenbaum, Jay Shendure, Greg Porreca, Annika Ahlford
RNA editing: Erez Levanon, Jung-Ki Yoon
CpG methylation: Madeleine Price Ball
Church Lab
Sequencing: Yuan Gao, Bin Xie, Bob Steen
Agilent: Emily Leproust, Wilson Woo
George Church
19. Superior Quality of Padlock Oligos
55k features of up to 200 nt
PCR (2x)
Solexa sequencing
[Figure: fraction of probes by length, marked at 150 nt, 100 nt, and 50 nt]
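20. From Agilent Oligos to Padlock Probes: amplification and selection
[Figure: workflow starting from an Agilent oligo (136 bp, with 18 bp labels at both ends and a DpnII site). Labeled steps: PCR; λ exonuclease; annealing with DpnII guide oligo; USER and DpnII; product: padlock probe. Other labels: T, UA, p, U, NN]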
21. Heterozygous Genotypes Correctly Called
[Figure: before and after panels showing three genotype groups: homozygous wild type, heterozygous variation, homozygous variation]
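The three genotype groups on this slide can be separated by the fraction of reads carrying the variant allele. The sketch below is a simple threshold caller written for illustration only. The cutoffs (`het_low`, `het_high`), the minimum coverage, and the function name `call_genotype` are assumptions, not values from the slides.

```python
def call_genotype(ref_reads, alt_reads, min_cov=10, het_low=0.2, het_high=0.8):
    """Classify a site by variant-allele fraction.

    Returns one of: "no call", "homozygous wild type",
    "heterozygous variation", "homozygous variation".
    Thresholds are illustrative assumptions.
    """
    total = ref_reads + alt_reads
    if total < min_cov:
        return "no call"  # too few reads to decide
    alt_fraction = alt_reads / total
    if alt_fraction < het_low:
        return "homozygous wild type"
    if alt_fraction > het_high:
        return "homozygous variation"
    return "heterozygous variation"


if __name__ == "__main__":
    for ref, alt in [(50, 0), (25, 25), (1, 49), (3, 2)]:
        print(ref, alt, call_genotype(ref, alt))
```

A caller like this gets heterozygous sites wrong when the two alleles are captured unevenly, so heterozygous accuracy depends on how evenly the alleles are captured.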
22. Methods in Comparison
                                      Padlock                  Array-based hyb
Upfront probe cost (10-20% of exome)  12,000 per 55k 100mers   600 per 385k 70mers
Probes amplifiable?                   Yes                      No
Reaction phase                        Solution, 10-20 µl       Surface, 200 µl
Enzymatic hyb?                        Yes                      No
gDNA required                         0.5-1 µg                 20 µg (WGA)
Efficiency (-> accuracy)              1                        N/A (<0.1?)
Uniformity                            100-fold range           10-fold range
Specificity                           100% on target           30-80% on or near target
23. Differential Clamping at Ligation Junction
24. GC vs. Capturing Efficiency
25. 99% Concordance Between Padlock and HapMap
26. The Editing Calls Are Well Correlated
r = 0.964
27. Bisulfite Padlock Probes (BSP): CpG Methylation
Bisulfite-treated genome
- 10k CpG sites tiling the ENCODE regions
- 1 CpG site every 3 kb region on average
- High specificity
  - 79 of 80 Sanger reads match correct locations
28. [Figure: workflow with labeled steps: collected in a tube; shearing, end polishing; PCR; adapter ligation; λ exonuclease; hybridization in closed-tube solution; strep; denaturing, PCR. Molecule labels: B, P]
Li et al., unpublished
29. Methods in Comparison
                                      Padlock                  Array-based hyb           Biotin-coupled hyb
Upfront probe cost (10-20% of exome)  12,000 per 55k 100mers   600 per 385k 70mers       500 per 244k 60mers
Probes amplifiable?                   Yes                      No                        Yes
Reaction phase                        Solution, 10-20 µl       Surface, 200 µl           Solution, 10-20 µl
Enzymes in hyb?                       Yes                      No                        No
gDNA required                         0.5-1 µg                 20 µg (WGA)               0.5-1 µg
Efficiency (-> accuracy)              1                        N/A (<0.1?)               10?
Uniformity                            100-fold range           10-fold range             10-fold range?
Specificity                           100% on target           30-80% on or near target  55% on or near target
30. Two Tech Replicates Are Well Correlated
[Figure, Uniformity: number of reads per site versus ranked target sites]
[Figure, Correlation of counts: counts, replicate 1 versus counts, replicate 2]