Title: Tomato Project Group
1. Second Tomato Finishing Workshop - Chromosome 4
- Tomato Project Group
- Wellcome Trust Sanger Institute
- 25th April 2008
2. Chromosome 4 Introduction
- Data Flow at WTSI
- Sequencing Method Used
- Finishing Strategies
- Use of Overlapping Data
- Chr4 Sequence Update
- Discussion points for Workshop
- Unmapped BACs
- Examples of Problem Clones
- Dealing with Large Repeats
3. UK - Chromosome 4
- Gene space estimate for Chromosome 4 is 19Mb
- Mapping, sequencing and finishing at Wellcome Trust Sanger Institute (WTSI)
- BAC by BAC sequencing approach
- Approximately 200 BACs
- Funding at WTSI ends October 31st 2008
4. Overview of WTSI Clone Pipeline
Mapping
- Clone Selection and Verification
- Clones entered into pipeline
- BACs assigned to chr4 sequencing project on SGN BAC registry
Subcloning
- Clone DNA Prep
- Digest Confirmation
- Library Construction (plasmid)
Shotgun Sequencing (Sequencing in Progress, HTGS Phase 1)
- Plasmid Prep
- Sequencing Processing
- Sequence contigs >2Kb available on Sanger FTP site and public databases
Finishing (HTGS Phase 2)
- Sequence Improvement
- Contig Orientation and Gap Closure
- Confirmation of Assembly (QC)
Finished Sequence (Complete Sequence, HTGS Phase 3)
- Sequences uploaded to SGN
- BAC Registry updated
- Final EMBL submission
5. Clone Selection and Verification
- BACs selected primarily from the HindIII (LE_HBa) and MboI (SL_MboI) libraries
- Using seed BACs from SGN, end sequence alignment and FPC analysis
- New BACs selected from in-house overgos for markers
- Selected 5 clones from the fosmid library based on end sequence alignments and fingerprints
6. Plasmid Prep and Shotgun Sequencing
- Optimised for 384 well prep and sequencing
- Capillary Sequencing
- AB3730s with AB Big Dye Terminator
- pUC118 Double Stranded Sequencing Vector
- 4-6Kb inserts, double end sequenced
BACs
- Aim for 6x-8x coverage
- Average insert 100-150Kb (LE_HBa and SL_MboI libraries)
- 2x or 3x 384 plates per BAC
- 750 paired end reads, 1500 reads in total
- Average 10-15 contigs
Fosmids
- Average insert 35Kb
- 1x 384 plates
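The plate counts and coverage targets above are linked by a simple estimate: total bases read divided by insert size. The following is a minimal sketch of that arithmetic. The 600 bp average usable read length is an assumption (the slide does not state a read length), and the function names are hypothetical.

```python
# ASSUMPTION: average usable read length; not given on the slide.
ASSUMED_READ_LENGTH_BP = 600


def reads_for_plates(n_plates: int, wells_per_plate: int = 384) -> int:
    """Each well holds one subclone, sequenced from both ends (2 reads)."""
    return n_plates * wells_per_plate * 2


def expected_coverage(n_reads: int, read_length_bp: float, insert_size_bp: float) -> float:
    """Average depth: total bases sequenced divided by the clone insert size."""
    return n_reads * read_length_bp / insert_size_bp


if __name__ == "__main__":
    for plates in (2, 3):
        reads = reads_for_plates(plates)
        for insert_kb in (100, 150):
            cov = expected_coverage(reads, ASSUMED_READ_LENGTH_BP, insert_kb * 1000)
            print(f"{plates} plates, {insert_kb}Kb insert: {reads} reads, ~{cov:.1f}x")
```

Failed reads and vector sequence reduce the real figure, so an estimate like this is an upper bound on the depth actually achieved.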
7. Clone Finishing
Gap4 (Staden) used to view and manipulate sequence data
Manual Finishing
QC Checking
8. Manual Finishing of BACs - BACs viewed in relation to map
- BACs are viewed in relation to the mapped minimal tile path
- Use in-house tpf visualisation tool, e.g. ctg503
9. Use of Overlapping Sequences
- From the minimal tile path, the region finished in each clone depends on the order in which the clones enter finishing
- Finish unique sequence with a 2000bp overlap between clones
[Diagram: BAC1, BAC2, BAC3 and BAC4 (gap closure) along the tile path, showing total BAC insert versus finished region]
Final order and orientation of finished BACs are given in the AGP file, e.g. BAC1-BAC2-BAC4-BAC3
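The rule on this slide (finish only unique sequence, plus a 2000bp overlap into neighbouring clones) can be illustrated with a short interval calculation. This is a sketch under stated assumptions, not the WTSI tooling: clone coordinates are hypothetical positions on the tile path, intervals are half-open, and the function names are invented for illustration.

```python
OVERLAP_BP = 2000  # overlap kept between neighbouring clones (from the slide)


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(span, finished):
    """Return the parts of `span` not covered by any interval in `finished`."""
    pieces = [span]
    for f_start, f_end in finished:
        remaining = []
        for start, end in pieces:
            if f_end <= start or f_start >= end:
                remaining.append((start, end))  # untouched by this interval
                continue
            if start < f_start:
                remaining.append((start, f_start))
            if f_end < end:
                remaining.append((f_end, end))
        pieces = remaining
    return pieces


def finished_regions(clones, overlap=OVERLAP_BP):
    """clones: (name, start, end) tuples in the order they enter finishing.

    Returns {name: [(start, end), ...]}: the unique sequence of each clone,
    extended by `overlap` into already finished neighbours and clipped to
    the clone's own insert.
    """
    finished = []
    plan = {}
    for name, start, end in clones:
        unique = subtract((start, end), finished)
        plan[name] = merge(
            (max(start, s - overlap), min(end, e + overlap)) for s, e in unique
        )
        finished = merge(finished + unique)
    return plan


if __name__ == "__main__":
    # Hypothetical coordinates; BAC4 enters last to close the BAC2-BAC3 gap.
    order = [
        ("BAC1", 0, 120_000),
        ("BAC2", 100_000, 220_000),
        ("BAC3", 300_000, 420_000),
        ("BAC4", 200_000, 320_000),
    ]
    for name, regions in finished_regions(order).items():
        print(name, regions)
```

Because BAC4 enters finishing last, only its middle section is finished, yet it sits between BAC2 and BAC3 in the final order, consistent with the BAC1-BAC2-BAC4-BAC3 example.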
10. Summary of Clone Gap Closure Strategies
- Make use of paired ends to order and orientate contigs
- Identify whether gaps are spanned or unspanned (Orchid example)
- Identify any repeats associated with gaps (Dotter example)
- Estimate gap sizes using restriction digest data
- This will determine the appropriate strategy for gap closure, e.g.
  - primer/oligo walking into regions of low quality or gaps spanned by paired end reads
  - PCR and direct walking on BAC DNA into regions of low quality and unspanned gaps (also attempted on unresolved spanned gaps)
- Use of alternative chemistries where appropriate
  - structural problems, mono- and di-nucleotide runs
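The spanned/unspanned decision above can be sketched in a few lines: a gap between two contigs counts as spanned when at least one subclone has its two end reads in the two different contigs. The data layout and names below are hypothetical and are not the Gap4 or Orchid data model.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    subclone: str     # plasmid subclone name (hypothetical identifiers)
    fwd_contig: str   # contig containing the forward read
    rev_contig: str   # contig containing the reverse read


def find_spanning_subclones(pairs):
    """Map each pair of contigs to the subclones whose end reads link them."""
    links = {}
    for pair in pairs:
        if pair.fwd_contig != pair.rev_contig:
            key = frozenset((pair.fwd_contig, pair.rev_contig))
            links.setdefault(key, []).append(pair.subclone)
    return links


def suggest_strategy(contig_a, contig_b, links):
    """Pick a first gap closure approach following the rules on the slide."""
    subclones = links.get(frozenset((contig_a, contig_b)))
    if subclones:
        return "spanned: primer/oligo walking on subclone(s) " + ", ".join(sorted(subclones))
    return "unspanned: PCR or direct walking on BAC DNA"


if __name__ == "__main__":
    pairs = [
        ReadPair("sc1", "c1", "c1"),
        ReadPair("sc2", "c1", "c2"),
        ReadPair("sc3", "c2", "c1"),
    ]
    links = find_spanning_subclones(pairs)
    print(suggest_strategy("c1", "c2", links))
    print(suggest_strategy("c2", "c3", links))
```

In practice the choice also depends on gap size and any associated repeats, which this sketch ignores.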
11. Orchid - Read Pair Visualisation Tool
Contiguous sequence with good read pair coverage
12. Visualising Repeats Associated with Gaps
Inverted Repeat
Direct Repeat
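The distinction between direct and inverted repeats can be shown with a simple exact k-mer comparison of a sequence against itself. This is only an illustration of the two repeat classes, not the algorithm used by Dotter (which draws a scored dot plot). The function names and the default k are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_repeats(seq, k=12):
    """Find exact k-mer repeats within one sequence.

    Returns (direct, inverted): lists of (i, j) with i < j where the k-mer
    at i recurs at j on the same strand (direct) or as its reverse
    complement (inverted). K-mers containing non-ACGT characters are skipped.
    """
    seq = seq.upper()
    index = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            index.setdefault(kmer, []).append(i)

    direct, inverted = [], []
    for kmer, positions in index.items():
        # Same k-mer seen more than once: direct repeat.
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                direct.append((positions[a], positions[b]))
        # Reverse complement seen downstream: inverted repeat.
        for i in positions:
            for j in index.get(revcomp(kmer), []):
                if i < j:
                    inverted.append((i, j))
    return sorted(direct), sorted(inverted)


if __name__ == "__main__":
    print(kmer_repeats("ACCGTA" + "NNNN" + "ACCGTA", k=6))
    print(kmer_repeats("ACCGTA" + "NNNN" + "TACGGT", k=6))
```

On a dot plot, direct repeats appear as diagonals parallel to the main diagonal and inverted repeats as diagonals perpendicular to it.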
13. Restriction Digests
- Minimum of three restriction enzymes used to confirm the assembly
- Selection depends on organism and the nature of the sequence
- S. lycopersicum BACs are digested with
  - BamHI
  - EcoRI
  - HindIII
- Comparison of real and virtual digest of entire BAC sequence
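A virtual digest of the kind compared here can be sketched by locating each enzyme's recognition site in the assembled sequence and listing the resulting fragment sizes. The sketch below makes several assumptions: the sequence is treated as linear with no vector, the 5% size tolerance is an arbitrary illustrative value, and the function names are hypothetical. The recognition sites and cut positions (G^GATCC, G^AATTC, A^AGCTT) are the standard ones for these enzymes.

```python
# Recognition site and cut offset within the site for each enzyme.
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
}


def virtual_digest(seq, enzyme):
    """Return fragment sizes (largest first) for a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


def digest_matches(virtual, observed, tolerance=0.05):
    """True if fragment counts agree and sizes match within `tolerance`.

    ASSUMPTION: 5% relative tolerance; real gel sizing error varies with
    fragment size and should be calibrated.
    """
    if len(virtual) != len(observed):
        return False
    return all(
        abs(v - o) <= tolerance * v
        for v, o in zip(sorted(virtual), sorted(observed))
    )


if __name__ == "__main__":
    demo = "AAAA" + "GGATCC" + "TTTT" + "GGATCC" + "CC"
    print(virtual_digest(demo, "BamHI"))
    print(virtual_digest(demo, "EcoRI"))
```

A mismatch in fragment count or size between the virtual and real digest points to a misassembly, a collapsed repeat or a wrongly sized gap.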
14. Confirm - WTSI In-house Digest Visualisation Tool
15. In-house Digest Visualisation Tool
16. Clone Gap Closure Strategies
- Make use of paired ends to order and orientate contigs
- Identify whether gaps are spanned or unspanned (Orchid)
- Identify any repeats associated with gaps (Dotter)
- Estimate gap sizes using restriction digest
- This will determine the appropriate strategy for gap closure, e.g.
  - primer/oligo walking into regions of low quality or gaps spanned by paired end reads
  - PCR and direct walking on BAC DNA into regions of low quality and unspanned gaps (also attempted on unresolved spanned gaps)
- Use of alternative chemistries where appropriate
  - structural problems, mono- and di-nucleotide runs
17. Sequencing Chemistries and Additives used in Finishing
- 4:1 mix of AB Big Dye Terminator to AB dGTP Terminator
  - used for general finishing reactions, not problem specific
- AB dGTP Terminator
  - used for di-nucleotide runs and inverted repeats
- Additive A (SequenceRx Enhancer Solution A - Invitrogen)
- Dimethyl sulfoxide (DMSO)
- Additive A + DMSO + dGTP
  - used for mono-nucleotide runs, inverted repeats
- Sequence Finishing Kit (SFK) (TempliPhi - Amersham)
  - used to increase DNA yield
  - useful for structural problems caused by inverted repeats
18. Alternative Gap Closure Strategies
- Specialist Subcloning
  - Small Insert Libraries (SIL)
  - Double Stranded pUC or Single Stranded M13
  - Large Insert Libraries (LIL)
  - Transposon Libraries (TIL)
  - Restriction Fragment SIL (RFSIL)
- Alternative Strategies for dealing with large repeats
  - points for further discussion on Tuesday
  - what repeats have other chromosomes found?
19. Clone Gap Closure Strategies
- Make use of paired ends to order and orientate contigs
- Identify whether gaps are spanned or unspanned (Orchid)
- Identify any repeats associated with gaps (Dotter)
- Estimate gap sizes using restriction digest
- This will determine the appropriate strategy for gap closure, e.g.
  - primer/oligo walking into regions of low quality or gaps spanned by paired end reads
  - PCR and direct walking on BAC DNA into regions of low quality and unspanned gaps (also attempted on unresolved spanned gaps)
- Use of alternative chemistries where appropriate
  - structural problems, mono- and di-nucleotide runs
20. Use of Misc_Feature Tags in EMBL/GenBank/DDBJ
- Used regularly on finished sequence to identify regions of
  - uni-directional chemistry when dGTP only
  - single subclone regions
  - including SIL and TIL only regions
  - PCR only
  - single reads from direct walks on BAC DNA
  - data only from overlapping BACs
  - E. coli transposon insertion sites
- tag SP6 and T7 ends of overlaps (tomato)
- gap sizes of force joins in tandem repeats
21. Misc_Feature Tag Example - Clone End Tags
Accession
Length of sequence
Whole Clone Finished
Both ends of clone cited
22. Misc_Feature Tag Example
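For readers without the original slide image, the general shape of a misc_feature entry in an EMBL flat file can be sketched as follows. The layout (feature key starting at column 6, location and qualifiers at column 22) follows the EMBL feature table format. The coordinates and note wording are hypothetical and are not the tags WTSI actually submits, and the helper name is invented. Wrapping of long notes across lines is not handled.

```python
def misc_feature(start, end, note):
    """Format one misc_feature as EMBL feature-table lines.

    The feature key starts at column 6 and the location and qualifier
    start at column 22, as in EMBL flat files.
    """
    location_line = f"FT   {'misc_feature':<16}{start}..{end}"
    qualifier_line = f'FT{"":19}/note="{note}"'
    return [location_line, qualifier_line]


if __name__ == "__main__":
    # Hypothetical region and note text, for illustration only.
    for line in misc_feature(1, 2000, "data only from overlapping BAC"):
        print(line)
```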
23. QC Check of Clone Assembly
- Before submission to public databases as HTGS phase 3 complete, all assembled BACs undergo several QC checks
  - all reasonable chemistry attempts have been made for any specific problem types
  - all bases are above phred30
  - orientation of paired end reads checked across assembly
  - assembly is confirmed by restriction digest data
  - correct misc_feature tags have been used to identify any regions where appropriate
Ensures high quality contiguous sequence with a low error rate
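The phred30 criterion corresponds to an estimated error probability of 1 in 1000 per base, since a phred score Q is defined as Q = -10 log10(P). A minimal sketch of the check is given below. It assumes per-base consensus qualities are available as a list of integers and treats any base below 30 as failing; whether a base at exactly 30 passes is an assumption, as the slide only says "above phred30". The function names are hypothetical.

```python
def error_probability(phred):
    """Convert a phred quality score to an estimated error probability."""
    return 10 ** (-phred / 10)


def low_quality_runs(quals, threshold=30):
    """Return half-open (start, end) runs of bases with quality < threshold."""
    runs = []
    run_start = None
    for i, q in enumerate(quals):
        if q < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            runs.append((run_start, i))
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(quals)))
    return runs


if __name__ == "__main__":
    consensus_quals = [40, 40, 20, 25, 40, 10]  # made-up values
    print(low_quality_runs(consensus_quals))
    print(error_probability(30))
```

Each run returned would need further finishing reactions, or an appropriate misc_feature tag, before the clone passes QC.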
24. Chromosome 4 Clone Pipeline
Additional 15 BACs finished - not on chromosome 4 according to FISH
25. Unmapped BACs moved from chr4
bTH82D4 (LE_HBa082D04) moved to chr7 (on FISH map)
bTH91D14 (LE_HBa091D14) moved to chr5 (on FISH map)
26. Points for Discussion at Workshop
- What problematic sequence have other groups encountered?
- Strategies for finishing repeats used by other chromosome groups?
- Unmapped BACs: any from other chromosomes?
27. Acknowledgements
- Cornell University
- Lukas Mueller
- Robert Buels
- Jim Giovannoni
- Steve Tanksley
- Colorado State University
- Stephen Stack
- Suzanne Royer
- Song-Bin Chang
- Arizona Genomics Institute
- Rod Wing
- Seunghee Lee
- MIPS/IBI Institute for Bioinformatics
- Klaus Mayer
- Remy Bruggmann
- Wageningen University
- Rene Klein Lankhorst
- Hans de Jong
- Dora Szinay
- Wellcome Trust Sanger Institute
- Karen McLaren
- Clare Riddle
- Sean Humphray
- Christine Nicholson
- Carol Scott
- Stuart McLaren
- Matt Jones
- Christine Lloyd
- Sarah Sims
- Karen Oliver
- Jane Rogers
- Imperial College London
- Gerard Bishop
- Daniel Buchan
- James Abbott
FUNDING