Title: DNA Sequencing: Present Status and Future Challenges
1DNA Sequencing: Present Status and Future
Challenges
- Elaine Mardis
- Washington University Genome Sequencing Center
2Genome Sequence Present Workflow
Genomic DNA
WGS assembly using ARACHNE algorithm to generate
contigs and supercontigs
3BAC Fingerprinting: Gel-based Fragment Separation
96 samples, 25 marker lanes
Marker every fifth lane
29,950 bp
HindIII Restriction Digestion
560 bp
1% agarose, 8 hours, 140 volts at 14 °C. Marra et
al., Genome Res., 7, 1072-1084 (1997)
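The lane counts above are consistent with a gel that starts with a marker lane and places another marker after every four samples. That arrangement is an assumption inferred from the numbers, not something the slide states. A minimal sketch to check the arithmetic (the function name and lane labels are hypothetical):

```python
def lane_layout(n_samples=96, samples_per_group=4):
    """Return lane labels with a marker lane flanking every group of samples."""
    lanes = ["M"]  # assumed: the gel starts with a marker lane
    for i in range(n_samples):
        lanes.append(f"S{i + 1}")
        if (i + 1) % samples_per_group == 0:
            lanes.append("M")  # close each group of samples with a marker
    return lanes


layout = lane_layout()
print(len(layout), layout.count("M"))
```

Under this assumption, 96 samples need 25 marker lanes, for 121 lanes in total, with a marker in every fifth lane.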
4Contig assembly: physical map
- Software (Image or Bandleader) is used to
identify overlapping clones with common
restriction fragments and assemble them into a
contig (FPC)
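The idea of "common restriction fragments" can be sketched as counting bands that two clone fingerprints share within a sizing tolerance. This is a simplified illustration only: it is not the statistic FPC uses, and the tolerance and fragment sizes are made-up values.

```python
def shared_bands(fp_a, fp_b, tolerance=7):
    """Count bands of fp_a that pair with a distinct band of fp_b within `tolerance` bp."""
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1  # bands match: consume both
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1       # band in a has no partner in b
        else:
            j += 1       # band in b has no partner in a
    return shared


# Hypothetical fingerprints (fragment sizes in bp)
clone_a = [560, 1200, 3400, 8000, 29950]
clone_b = [1203, 3395, 8004, 15000]
clone_c = [700, 2100, 5000]

print(shared_bands(clone_a, clone_b))  # many shared bands: likely overlap
print(shared_bands(clone_a, clone_c))  # no shared bands: no evidence of overlap
```

Clone pairs with many shared bands are candidates for placement in the same contig. A real tool also has to weigh how likely such matches are by chance.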
Clone
5Sequence data assembly: Supercontig creation and
gap filling
(A) A supercontig is constructed by successively
linking pairs of contigs that share at least two
forward-reverse links. Here, three contigs are
joined into one supercontig. (B) ARACHNE attempts
to fill gaps by using paths of contigs. The first
gap in the supercontig shown here is filled with
one contig, and the second gap is filled by a
path consisting of two contigs.
Genome Research, 12, 177-189 (2002)
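The linking rule in (A) can be sketched by counting forward-reverse links per contig pair and merging contigs that share at least two. This is a simplified illustration, not ARACHNE's implementation: it ignores read orientation, insert-size constraints and the order in which links are applied. All names are hypothetical.

```python
from collections import Counter


def build_supercontigs(contigs, links, min_links=2):
    """Group contigs joined by at least `min_links` forward-reverse links."""
    pair_counts = Counter(tuple(sorted(pair)) for pair in links)
    parent = {c: c for c in contigs}

    def find(c):
        # Union-find lookup with path halving
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), n in pair_counts.items():
        if n >= min_links:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


contigs = ["c1", "c2", "c3", "c4"]
# Each tuple is one read pair whose two ends land in different contigs
links = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c2", "c3"), ("c3", "c4")]
print(build_supercontigs(contigs, links))
```

Here c1, c2 and c3 are joined into one supercontig, while c4 stays separate because only a single link supports it.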
6Whole genome map assembly
Genome map
Edit contigs and align to map. Gaps between
clones can be filled with other clones, such as
fosmids, or by generating PCR products from BAC
clones or genomic DNA.
7Current GSC Production Workflow
- picking: Qpix
- prepping: PlateTrak DNATraks
- sequencing: Biomek FX
- detection: PE 3700/3730
- data transfer
Each process is documented by barcode entry
into our Oracle database
QC checks are used to assay quality at each
step in the pipeline
8Qpix picking robot
9PlateTrak 1 2 Robots
10Biomek FX robot
11ABI 3700 Sequencer
- Enhanced sensitivity relative to gel-based
systems
- Capillary-based separation of samples eliminates
gel pouring, gel loading, lane tracking
- Requires large volumes of buffer, polymer per run
- Moving parts (robot, sheath flow) increase
required maintenance and impact downtime
- Sheath flow detection limits sensitivity, laser
illumination scheme causes beam dispersion across
sheath flow
12New generation instrument
- In-capillary detection by fixed laser eliminates
L>R fade and sheath flow, improves sensitivity
- Direct load from reaction plate eliminates
robotic volume transfer, decreases minimal load
volume
- Increased plate capacity, decreased
buffer/polymer demand and automated plate
handling decrease operator intervention
ABI 3730 xl DNA Analyzer
14Improved results with lower template input
15Issue: Large clone end sequencing
Due to lower sensitivity, end-sequencing of BAC
and fosmid clones was not robust on the 3700.
To achieve reliable results, we have utilized the
ABI 3100s in a specialty group approach
- requires 1/4th x BDT reactions
- requires 100 cycles in the thermal cycler
- lower throughput capability
However, the increasing emphasis on large clone
linkage for WGS approach requires higher
throughput, lower cost for these templates
16High-throughput sequencing (c. 2002)
- GSC produces 2.6 M reads monthly
- Plasmid template preps by robotic SPRI
- Sequencing reactions in 384 well/Biomek FX
- Loading 120 ABI 3700s
- Combined WGS plasmid, fosmid and BAC end reads
with a physical map reference is becoming the
strategy of choice for de novo genome sequencing
- Our recent introduction of 30 x 3730 instruments
will increase read capacity to 3.2 M reads
monthly, and allow us to more efficiently and
cheaply end sequence large clone types such as
fosmids and BACs.
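A quick back-of-the-envelope check of the throughput figures above. The per-instrument average assumes that every current read comes from the 120 ABI 3700s, which the slide does not state.

```python
reads_now = 2_600_000      # monthly reads quoted above
reads_planned = 3_200_000  # monthly reads expected with the 3730s
n_3700 = 120               # ABI 3700 instruments being loaded

increase = reads_planned - reads_now
percent_increase = 100 * increase / reads_now
# Assumption: every current read comes from the 3700 fleet.
reads_per_3700 = reads_now / n_3700

print(f"extra reads per month: {increase:,}")
print(f"relative increase: {percent_increase:.1f}%")
print(f"average reads per 3700 per month: {reads_per_3700:,.0f}")
```

The planned capacity is roughly a 23% increase over current monthly output.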
17What are the future challenges to high-throughput
genome sequencing?
1. Most cost decreases have been incremental, rather
than monumental. Large cost decreases will require a
revolutionary approach to detection, perhaps not
based on light.
2. There is a fundamental disconnect between the
sample size produced by current prepping and
sequencing processes, and the sensitivity of current
instrumentation for detection/analysis.
3. There is a need for additional fluor
combinations to enable reaction multiplexing.
18What are the current trends in DNA sequencing?
Re-sequencing of the human genome is becoming a
key approach toward understanding certain
diseases
Characterizing the genetic differences between
affected vs. unaffected individuals
Characterizing the genetic differences between
diseased vs. normal cells
Developing diagnostic/prognostic assays for
disease
19What are the technical challenges of
re-sequencing human samples?
- Limited quantities of samples
- Large sample numbers w/multiple analyses
- Critical need to avoid sample mix-ups/QA
- Ultimately instrumentation and methods that
reduce cost per reaction to well below current
costs and require little/no hands-on sample
manipulation
- Informatics tools to assemble and analyze data
intelligently and correctly (!)
- Database tools/features to combine different data
types in a meaningful way that aids
interpretation
20General approach
Design exon- and/or intron-specific PCR primers
Annotated human sequence from Ensembl
- lowered emphasis on readlength, increased
emphasis on speed of fragment separation and
analysis
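One step that precedes primer design is choosing which genomic windows to amplify. The sketch below tiles an annotated exon, plus flanking intron sequence, into overlapping amplicons short enough to be read from both ends. It does not design primers, and the flank, amplicon length and overlap values are illustrative assumptions, not parameters from this talk.

```python
def tile_amplicons(exon_start, exon_end, flank=50, max_len=500, overlap=50):
    """Split [exon_start - flank, exon_end + flank] into overlapping amplicons."""
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    target_start = exon_start - flank
    target_end = exon_end + flank
    amplicons = []
    start = target_start
    while True:
        end = min(start + max_len, target_end)
        amplicons.append((start, end))
        if end == target_end:
            break
        start = end - overlap  # next amplicon re-covers the last `overlap` bases
    return amplicons


# Hypothetical exon coordinates
print(tile_amplicons(1000, 1200))  # short exon: a single amplicon
print(tile_amplicons(1000, 2000))  # longer exon: several overlapping amplicons
```

Short amplicons fit the point above: read length matters less than how quickly each fragment can be separated and analyzed.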
21Re-sequencing Data pipeline
- Sequence: sequence each end of the PCR fragment
- Phred: base-calling, quality determination
- Phrap: sequence alignment, final quality
determination
- PolyPhred: mutation/polymorphism detection
- Consed: sequence viewing, mutation/polymorphism
tagging
- Analysis
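The quality determination step rests on the Phred score, defined as Q = -10 log10(P), where P is the estimated probability that a base call is wrong. A minimal sketch of the conversion follows. The masking helper and its threshold of 20 are illustrative assumptions, not part of Phred itself.

```python
import math


def error_probability(q):
    """Estimated probability that a base with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def phred_quality(p_error):
    """Phred quality corresponding to an error probability."""
    return -10 * math.log10(p_error)


def mask_low_quality(bases, qualities, min_q=20):
    """Replace bases below min_q with N (hypothetical helper)."""
    return "".join(b if q >= min_q else "N" for b, q in zip(bases, qualities))


print(error_probability(20))   # about 1 error in 100
print(phred_quality(0.001))    # about Q30
print(mask_low_quality("ACGT", [40, 10, 25, 5]))
```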
22Laboratory Workflow Web interface to database
ORACLE Database: mutation data, laboratory
tracking data, gene feature data
Interactive Visual Tools: Data Quality Checking
(Courtesy of D. Nickerson)
23Challenges for Re-Sequencing Data Analysis
1. Need improved signal processing software for
traces: better background subtraction to eliminate
false positives in detecting sequence differences
2. Need improved software for detecting
differences between aligned sequences: less
manual review of traces and alignments, more
analytical view of results/output
3. Statistical packages that help make sense of
re-sequencing data in the context of genetics,
probability, mutation rates,
prognosis/outcome, etc.
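One simple way to cut false positives when comparing aligned sequences is to report a difference only where the sample base call has high quality. The sketch below assumes an ungapped alignment of equal-length strings and a hypothetical quality cutoff. It is not how PolyPhred works, and it does not handle heterozygous positions.

```python
def candidate_differences(reference, sample, qualities, min_q=30):
    """Return (position, ref_base, sample_base) for high-quality mismatches."""
    if not (len(reference) == len(sample) == len(qualities)):
        raise ValueError("inputs must have the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt, q) in enumerate(zip(reference, sample, qualities))
        if ref != alt and q >= min_q  # skip mismatches at low-quality bases
    ]


reference = "ACGTACGT"
sample = "ACGAACTT"
qualities = [40, 40, 40, 35, 40, 40, 12, 40]
print(candidate_differences(reference, sample, qualities))
```

The mismatch at position 6 is dropped because its quality is low, which leaves one candidate for manual review instead of two.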
24Trace data examination
25Data Organization and Visualization
Vg software tool is used to cluster and visualize
data from re-sequencing of the same
genomic regions of multiple individuals
26Acknowledgements
GSC
- Matt Hickenbotham
- Jim Eldred
- Darren OBrien
- Tom Erb
- Joe Strong
- Lisa Cook
- Donald Williams
- Nathan Sander
- Josh Conyers
- Todd Carter
- Lliam Christy
- Rick Wilson
- John McPherson
- Bob Waterston
- Pat Minx
University of Washington
- Debbie Nickerson