454 Next-generation Sequencing Quality Control and Benchmarks (PowerPoint presentation)

1
454 Next-generation Sequencing: Quality Control and Benchmarks
  • Jennifer M Taylor
  • Bioinformatics Leader, Plant Industry
  • June 2009

2
454 Quality Control
  • Technical Details
  • Roche 454 Workflow
  • Applications
  • Data
  • General features
  • Quality Metrics
  • Genome Sequencing
  • Variant Detection

3
Roche 454 Pyrosequencing
  • High-throughput sequencing-by-synthesis
    technology
  • GS FLX with Titanium series reagents
  • Current: 200-400 bp read lengths, 400 Mbp per 10-hour run
  • Projected: 500-700 bp read lengths (20 kb, 8 kb,
    3 kb paired-end reads)
  • Applications
    • Genome resequencing
    • Variant detection
    • De novo genome sequencing
    • Metagenomics
    • ChIP-Seq: protein-DNA interactions
    • RNA-Seq: transcript profiling, small RNA
      profiling

4
Roche 454 Pyrosequencing
  • Workflow
  • Library preparation
    • Fragmentation
    • Adaptors
  • Emulsion PCR
    • Immobilisation
    • Dilution to single bead / well
    • Emulsion-based amplification

5
Roche 454 Pyrosequencing
  • Workflow (contd)
  • Sequencing by Synthesis
    • Enrichment of beads carrying amplified fragments
    • Arrayed at 1 bead per well on a fibre-optic slide
    • Nucleotide flows (T, A, C, G)
    • Luciferase light signal measured for 1 or more
      nucleotide incorporations
  • Data Analysis
    • Integration of position and signal
    • Application of base-calling and quality filtering

6
Roche 454 General Data Features
  • Flowgram / Pyrogram
  • Does not read sequence bases directly
  • Example: TCAGGTTTTTAAG (sequence length 13) read in
    3 flow cycles (12 nucleotide flows)
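The flowgram example above can be reproduced with a short sketch. It assumes the cyclic flow order T, A, C, G listed in the workflow and an ideal, error-free read, so each flow's signal is simply the length of the homopolymer incorporated at that flow. `ideal_flowgram` is a hypothetical helper name, not part of any 454 software.

```python
from typing import List

FLOW_ORDER = "TACG"  # cyclic nucleotide flow order (T, A, C, G), as listed in the workflow


def ideal_flowgram(seq: str, flow_order: str = FLOW_ORDER) -> List[int]:
    """Return the ideal (noise-free) signal per flow: the homopolymer length incorporated."""
    if any(base not in flow_order for base in seq):
        raise ValueError("sequence contains bases that never appear in the flow order")
    signals: List[int] = []
    i = 0
    while i < len(seq):
        nucleotide = flow_order[len(signals) % len(flow_order)]
        run = 0
        # One flow incorporates the whole homopolymer run of the flowed nucleotide.
        while i < len(seq) and seq[i] == nucleotide:
            run += 1
            i += 1
        signals.append(run)
    return signals


if __name__ == "__main__":
    read = "TCAGGTTTTTAAG"
    flows = ideal_flowgram(read)
    cycle_len = len(FLOW_ORDER)
    for start in range(0, len(flows), cycle_len):
        print(FLOW_ORDER, flows[start:start + cycle_len])
    print(len(flows), "flows,", sum(flows), "bases")
```

For the slide's read this gives 12 flows in 3 cycles, [1, 0, 1, 0], [0, 1, 0, 2] and [5, 2, 0, 1], summing to 13 bases. The run of five Ts occupies a single flow, which is why homopolymer length has to be inferred from signal intensity rather than read base by base.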
7
Roche 454 Pyrosequencing
  • Sources of Error
  • Undercall / Overcall
    • T(0.49, 0), T(1.6, 2)
    • Typically result in insertions / deletions
  • Miscalls
    • Called TCTTG, true TCTCG (overcalled T and
      undercalled C)
  • Variation in sequence coverage
    • PCR amplification bias
  • PCR error (1)
    • Sanger sequencing can average out PCR error
    • NGS reads are derived from a single molecule, so
      PCR errors are transferred directly into the reads
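A naive base caller makes the undercall/overcall mechanism concrete. This sketch assumes calls are made by rounding each flow signal to the nearest integer (actual 454 base calling is more elaborate), reuses the signal values 0.49 and 1.6 from the slide, and uses a hypothetical function name `call_bases`. Perturbing two flows of the true read TCTCG reproduces the miscall TCTTG.

```python
import math
from typing import Sequence

FLOW_ORDER = "TACG"  # assumed cyclic flow order


def call_bases(signals: Sequence[float], flow_order: str = FLOW_ORDER) -> str:
    """Naive flowgram decoding: round each signal (half up) to a homopolymer length."""
    calls = []
    for k, signal in enumerate(signals):
        # A true 1-mer seen at 0.49 becomes 0 (undercall); seen at 1.6 it becomes 2 (overcall).
        length = max(0, math.floor(signal + 0.5))
        calls.append(flow_order[k % len(flow_order)] * length)
    return "".join(calls)


if __name__ == "__main__":
    # Flows:       T    A    C    G    T    A    C     G
    true_read = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0]  # TCTCG
    noisy = [1.0, 0.0, 1.0, 0.0, 1.6, 0.0, 0.49, 1.0]     # T overcalled, C undercalled
    print(call_bases(true_read))  # TCTCG
    print(call_bases(noisy))      # TCTTG
```

Relative to TCTCG, the overcalled T is an insertion and the undercalled C a deletion; after alignment the pair looks like a single substitution, which is how flow errors can masquerade as miscalls. `math.floor(s + 0.5)` is used instead of `round` because Python's `round` rounds halves to even.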

8
Roche 454 Pyrosequencing
Example sequences: TCAGGTTTTTAAG, TAACGGTTTACGG
  • For a sequence of length n read in f flows
    • Min(f) = n, Max(f) = 3n + 1
  • n fixed: f ~ N(µ, σ²)
    • µ and σ² increase linearly with sequence length
    • Nucleotide frequencies → equal: µ → max, σ² → min
  • f fixed: n ~ N(µ, σ²)
    • µ and σ² increase linearly with flow cycle number
    • Nucleotide frequencies → equal: µ → min, σ² → min

Kong, 2009
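The bounds and the linear growth of µ and σ² can be checked numerically. The sketch below is not Kong's model: it assumes independent bases with equal frequencies and the flow order T, A, C, G, and counts the flows needed to read each simulated sequence. Under those assumptions the first base needs 1 to 4 flows and every later base 0 to 3 further flows, each with probability ¼, giving µ = 1.5n + 1 and σ² = 1.25n; for a sequence without homopolymer runs the count lies between n and 3n + 1. Function names are hypothetical.

```python
import random
import statistics
from typing import Tuple

FLOW_ORDER = "TACG"  # assumed cyclic flow order


def flows_needed(seq: str, flow_order: str = FLOW_ORDER) -> int:
    """Number of flows until the last base of an error-free read is incorporated."""
    if any(base not in flow_order for base in seq):
        raise ValueError("sequence contains bases that never appear in the flow order")
    flows, i = 0, 0
    while i < len(seq):
        nucleotide = flow_order[flows % len(flow_order)]
        flows += 1
        while i < len(seq) and seq[i] == nucleotide:  # whole homopolymer in one flow
            i += 1
    return flows


def simulate_flow_counts(n: int, trials: int, seed: int = 0) -> Tuple[float, float]:
    """Sample mean and variance of flows_needed for random length-n reads (uniform bases)."""
    rng = random.Random(seed)
    counts = [flows_needed("".join(rng.choice("ACGT") for _ in range(n)))
              for _ in range(trials)]
    return statistics.mean(counts), statistics.variance(counts)


if __name__ == "__main__":
    print(flows_needed("TCAGGTTTTTAAG"))  # 12
    print(flows_needed("GCAT"))           # 13 = 3n + 1, the worst case for n = 4
    for n in (50, 100, 200):
        mean, var = simulate_flow_counts(n, trials=2000, seed=1)
        print(n, round(mean, 1), 1.5 * n + 1, round(var, 1), 1.25 * n)
```

The simulated mean and variance should track 1.5n + 1 and 1.25n. `GCAT` is the flow order reversed, so every base after the first waits three flows for its nucleotide.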
9
Roche 454 Quality Filters/Scores
[Figure: flow signal axis, 0.5 to 0.75]
  • Overlap region: 0.5 < signal < 0.7
  • Allow only reads with < 5% of flows in the overlap
    region
  • Excluded reads are trimmed from the end until
    • < 5% of flows are in the overlap region
    • < 82 flows (21 flow cycles)
  • Exclude reads with > 5 ambiguous calls (N)

Margulies et al., 2005
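The filtering rules above might be sketched as follows. This is an interpretation, not Roche's pipeline: it assumes flows are trimmed one at a time from the 3' end until fewer than 5% of the remaining flows fall in the 0.5-0.7 overlap region, that a read is then rejected if fewer than 82 flows remain, and that the N limit is applied to the called sequence. All names are hypothetical.

```python
from typing import List, Optional, Sequence

OVERLAP_LOW, OVERLAP_HIGH = 0.5, 0.7  # ambiguous signal band from the slide
MAX_OVERLAP_FRACTION = 0.05           # reads need < 5% of flows in the band
MIN_FLOWS = 82                        # minimum flows after trimming (slide value)
MAX_N = 5                             # reads with more than 5 Ns are excluded


def overlap_fraction(flows: Sequence[float]) -> float:
    """Fraction of flows whose signal lies strictly inside the overlap band."""
    if not flows:
        return 0.0
    in_band = sum(1 for s in flows if OVERLAP_LOW < s < OVERLAP_HIGH)
    return in_band / len(flows)


def trim_overlap(flows: Sequence[float]) -> List[float]:
    """Drop flows from the 3' end until the overlap fraction is below the threshold."""
    kept = list(flows)
    while kept and overlap_fraction(kept) >= MAX_OVERLAP_FRACTION:
        kept.pop()
    return kept


def filter_read(flows: Sequence[float], called_seq: str) -> Optional[List[float]]:
    """Return the trimmed flowgram if the read passes, otherwise None."""
    if called_seq.upper().count("N") > MAX_N:
        return None
    kept = trim_overlap(flows)
    return kept if len(kept) >= MIN_FLOWS else None


if __name__ == "__main__":
    # Toy flowgram: 100 clean flows followed by 10 ambiguous ones (signal 0.6).
    flows = [1.0] * 100 + [0.6] * 10
    result = filter_read(flows, "ACGT" * 25)
    print(None if result is None else len(result))  # 105: five 0.6 flows survive (5/105 < 5%)
    print(filter_read(flows, "N" * 6))             # None: more than 5 Ns
```

Recomputing the fraction after every pop is quadratic, which is fine at read lengths of a few hundred flows; a running count would make it linear.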
10
Roche 454 Quality Filters/Scores
  • n: length of homopolymer
  • s: signal
  • j: position in read
  • P(s|n) empirically determined to follow a
    Gaussian distribution
  • P(n) for a random nucleotide sequence is (¼)^n

Margulies et al., 2005
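The slide's formula did not survive extraction, so the following is only a sketch of the general idea behind the listed ingredients, not Margulies et al.'s exact quality score. It assumes a posterior P(n | s) ∝ P(s | n) P(n) with a Gaussian P(s | n) centred on n and the prior P(n) = (¼)^n, ignores the read position j, and reports a Phred-style quality Q = -10 log10(1 - P(n_hat | s)) for the most probable length n_hat. The spread function `sigma` and the cap `MAX_HOMOPOLYMER` are hypothetical, not fitted values.

```python
import math
from typing import Dict, Tuple

MAX_HOMOPOLYMER = 8  # hypothetical cap on homopolymer lengths considered


def sigma(n: int) -> float:
    """Hypothetical signal spread for an n-mer; widens with n (illustrative, not fitted)."""
    return 0.1 + 0.05 * n


def homopolymer_posterior(signal: float) -> Dict[int, float]:
    """P(n | s) proportional to P(s | n) P(n): Gaussian likelihood, prior (1/4)^n."""
    log_weights = {}
    for n in range(MAX_HOMOPOLYMER + 1):
        sd = sigma(n)
        log_likelihood = -0.5 * ((signal - n) / sd) ** 2 - math.log(sd)  # up to a constant
        log_prior = n * math.log(0.25)
        log_weights[n] = log_likelihood + log_prior
    top = max(log_weights.values())  # subtract the maximum to avoid underflow
    weights = {n: math.exp(w - top) for n, w in log_weights.items()}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


def phred_quality(signal: float) -> Tuple[int, float]:
    """Most probable homopolymer length and its Phred-style quality -10 log10 P(error)."""
    posterior = homopolymer_posterior(signal)
    n_hat = max(posterior, key=posterior.get)
    p_error = max(1.0 - posterior[n_hat], 1e-10)  # keeps Q finite (about 100) for certain calls
    return n_hat, -10.0 * math.log10(p_error)


if __name__ == "__main__":
    for s in (0.3, 0.49, 1.0, 1.6, 2.05, 4.2):
        n_hat, q = phred_quality(s)
        print(f"signal {s:4.2f} -> n = {n_hat}, Q = {q:5.1f}")
```

Each extra base multiplies the prior by ¼, so every long homopolymer call pays the same fixed penalty irrespective of the template, which is the behaviour Huse et al. (2007) describe as indiscriminate.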
11
454 Quality Control
  • Technical Details
  • 454 Workflow
  • Applications
  • Data
  • General features
  • Quality Metrics
  • Genome Sequencing
  • Variant Detection

12
Genome Sequencing: Accuracy and Coverage
  • Alignment of multiple reads
  • Integration with Sanger sequencing
  • Wicker et al., 2006 Barley Genome
  • Compared 454-derived and ABI-Sanger derived
    consensus sequences
  • Error rates of 0.07 / position
  • Moore et al., 2006 Plastid Genomes
  • Comparison across genomes w.r.t consensus
  • 0.031 0.043

13
Roche 454 Quality Filters/Scores
  • Modifications to Quality Filters / Scores (Huse
    et al., 2007)
    • 43 reference templates of known sequence
    • Divergent bacteria
    • > 340,000 reads
  • P(n) for a random nucleotide sequence is (¼)^n
    • Penalises long homopolymers indiscriminately

14
Accuracy and Coverage
  • Huse et al., 2007
  • Error rate of 0.49%
    • 39% were homopolymer effects (insertions 36%,
      deletions 27%)
    • < 2% of reads accounted for nearly 50% of errors
  • Ambiguous bases
    • Strong correlation between Ns and other types of
      errors
    • 454 quality control allows up to 5 Ns per read
    • Removal of all sequences containing Ns
      • Significant error rate improvement: 0.24%
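How a per-read filter changes an aggregate error rate can be sketched with a small helper. It assumes each read has already been aligned to its known reference template and summarised as aligned bases, errors and N count; the `ReadStats` record and the toy numbers are hypothetical, not Huse et al.'s data.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ReadStats:
    aligned_bases: int  # reference positions covered by the read's alignment
    errors: int         # substitutions, insertions, deletions and Ns against the reference
    n_count: int        # ambiguous base calls (N) in the read


def error_rate(reads: Iterable[ReadStats]) -> float:
    """Pooled per-base error rate: total errors / total aligned bases."""
    read_list = list(reads)
    bases = sum(r.aligned_bases for r in read_list)
    return sum(r.errors for r in read_list) / bases if bases else 0.0


def drop_reads_with_n(reads: Iterable[ReadStats]) -> List[ReadStats]:
    """Stricter than the default (up to 5 Ns allowed): discard any read containing an N."""
    return [r for r in reads if r.n_count == 0]


if __name__ == "__main__":
    # Toy data: 98 clean reads plus 2 N-containing reads that carry many errors.
    reads = [ReadStats(250, 1, 0)] * 98 + [ReadStats(250, 40, 3)] * 2
    print(f"all reads:    {error_rate(reads):.3%}")
    print(f"N-free reads: {error_rate(drop_reads_with_n(reads)):.3%}")
```

Because errors are pooled over bases, a handful of bad reads can dominate the rate, so removing the few reads flagged by Ns can lower it substantially while discarding little sequence.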

15
Accuracy and Coverage
  • Harismendy et al., 2009
  • 260 kb across 4 individuals: 454, Illumina, ABI
    SOLiD, ABI Sanger
  • Saturating coverage: 43× (Roche), 188×
    (Illumina), 841× (SOLiD)
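Saturating coverage is an empirical figure, but it can be turned into a rough read budget with the standard definition of mean coverage, C = N × L / G (N reads of length L over a target of length G). The sketch assumes the 260 kb target from the slide and illustrative read lengths: 400 bp for 454 (the top of the current range quoted for Titanium) and a hypothetical 36 bp for both short-read platforms. It ignores off-target reads and uneven coverage, so real requirements would typically be higher.

```python
PLATFORMS = {
    # platform: (saturating coverage from the slide, assumed read length in bp)
    "Roche 454": (43, 400),
    "Illumina": (188, 36),  # hypothetical read length
    "SOLiD": (841, 36),     # hypothetical read length
}
TARGET_BP = 260_000  # 260 kb target region


def reads_needed(coverage: int, target_len: int, read_len: int) -> int:
    """Smallest N with N * read_len / target_len >= coverage (integer ceiling division)."""
    return -((-coverage * target_len) // read_len)


def mean_coverage(n_reads: int, read_len: int, target_len: int) -> float:
    """Mean depth C = N * L / G."""
    return n_reads * read_len / target_len


if __name__ == "__main__":
    for name, (cov, read_len) in PLATFORMS.items():
        n = reads_needed(cov, TARGET_BP, read_len)
        achieved = mean_coverage(n, read_len, TARGET_BP)
        print(f"{name}: {n:,} reads of {read_len} bp -> {achieved:.2f}x mean coverage")
```

Integer ceiling division avoids floating-point rounding when the read count is not a whole number; with the assumed 400 bp reads, 43× over 260 kb works out to exactly 27,950 reads.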

16
Accuracy and Coverage: Variant Detection
  • Harismendy et al., 2009
  • Variant detection accuracy

[Table: variant detection accuracy; recoverable entry: ABI Sanger, FP 0.9, FN 0.31]
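The table's exact definitions of FP and FN are not recoverable from the transcript. As a hedged sketch, comparing a platform's calls against a reference call set reduces to set operations. Here FP is taken as the false discovery rate FP / (TP + FP) and FN as the false negative rate FN / (TP + FN), and variants are hypothetical (chromosome, position, ref, alt) tuples.

```python
from typing import NamedTuple, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, position, ref allele, alt allele), hypothetical


class Concordance(NamedTuple):
    tp: int
    fp: int
    fn: int
    false_discovery_rate: float
    false_negative_rate: float


def compare_calls(called: Set[Variant], truth: Set[Variant]) -> Concordance:
    """Count calls shared with, absent from, and missing relative to a reference call set."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    fdr = fp / (tp + fp) if tp + fp else 0.0
    fnr = fn / (tp + fn) if tp + fn else 0.0
    return Concordance(tp, fp, fn, fdr, fnr)


if __name__ == "__main__":
    truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 40, "G", "A"), ("chr2", 90, "T", "C")}
    called = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
              ("chr2", 40, "G", "A"), ("chr3", 12, "A", "C")}
    print(compare_calls(called, truth))
```

Exact tuple matching is the simplest possible comparison; real evaluations usually also need genotype-aware matching and a normalised representation of indels before calls can be compared.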
17
Accuracy and Coverage
  • Amplicon End Bias
    • 2.3% of total reference sequence
    • 56% of Illumina sequence reads
    • 11% of SOLiD reads
    • 5% of 454 reads
  • Bias after fragmentation
    • SOLiD and 454 library preparation adaptations
  • Repeats
    • SOLiD (1/2-fold coverage)
    • 454 (equal coverage)
    • Illumina (2-fold coverage)
  • Sequence composition
    • Low-coverage regions for short-read (SR) platforms
      tend to be AT-rich

18
Conclusions
  • 454 Quality control scores need optimisation
  • Naïve penalties for homopolymer length
  • Inadequate control of ambiguities
  • Lack of control of undercalling
  • Brockman et al., 2008 Genome Research
  • Uniformity of per-base sequence coverage needs to
    be improved
  • Pyrosequencing shows high specificity in variant
    detection and low error rates in the construction
    of consensus sequences IN THE PRESENCE OF
    saturating coverage.

19
Acknowledgements and References
  • Andrew Spriggs
  • Karl Gordon
  • David Townley
  • David Lovell
  • Brockman et al., Genome Research 2008, 18:763-770
  • Margulies et al., Nature 2005, 437(15):376-380
  • Kong, J. Comp. Biology 2009, 16(1):1-12
  • Huse et al., Genome Biology 2007, 8:R143
  • Harismendy et al., Genome Biology 2009, 10:R32

20
Thank you
Jennifer M Taylor, Bioinformatics Leader, Plant Industry
Phone 61 2 62464929
Email Jen.Taylor_at_csiro.au