Title: Molecular biology story: DNA "the Queen molecule"
1Molecular biology story: DNA "the Queen molecule"
Bioinformatics and Comparative Genome Analysis
Monday, March 19th, 2007, Tunis
Odile Ozier-Kalogeropoulos, Institut Pasteur,
Université Pierre et Marie Curie
E-mail: odozier_at_pasteur.fr
2Introduction
3Genomes: two views
4View of genomes for biologists
http://www.pasteur.fr/externe
http://genetique.snv.jussieu.fr
5View of genomes for computer scientists
Pasteur Genopole Île-de-France, Plate-forme
technologique 4
6DNA molecule: two views
7View 1
James Watson and Francis Crick (1953)
8View 2
Two antiparallel strands: one runs 5' to 3', the
other 3' to 5'
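The two strands in View 2 run in opposite directions, so the sequence of one strand fixes the sequence of the other. A minimal Python sketch of that relationship, assuming the standard A-T and G-C base pairing; the function name and the example sequence are illustrative and do not come from the slides:

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the partner strand, written 5' to 3'.

    The partner runs antiparallel, so we walk the input backwards
    while swapping each base for its complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


top = "ATGGCA"                    # hypothetical example, written 5' to 3'
bottom = reverse_complement(top)  # partner strand, also written 5' to 3'

# Display the duplex the way the slide does: the bottom strand is shown 3' to 5'.
print("5'-" + top + "-3'")
print("3'-" + bottom[::-1] + "-5'")
```

Applying the function twice returns the original strand, which is a quick sanity check on any implementation.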
9DNA sequence one view
10DNA sequence one view
11Sequencing DNA, "the Queen molecule"
12Sequencing DNA, "the Queen molecule"
Most sequencing methods are based on the natural
processes that living systems use to copy and
repair their own genomes
13Reminder!
Cell DNA synthesis
14Reminder!
Cell DNA synthesis
The main role of DNA polymerase
15Cell DNA synthesis
3'
http://www.snv.jussieu.fr/vie/dossiers/sequencage/sequence.htm
16Cell DNA synthesis
17Cell DNA synthesis
18Cell DNA synthesis
19 1. Foundation of the current state-of-the-art
production genome sequencing
23 1. Foundation of the current state-of-the-art
production genome sequencing
The Sanger method (1977)
30th year celebration!
24DNA isolation
Sample preparation
The Sanger method
Sequence production
Assembly and analysis
26The Sanger method
Focus on
Sequence production
27The Sanger method
http://www.snv.jussieu.fr/vie/dossiers/sequencage/sequence.htm
28The Sanger method
DNA polymerase
DNA polymerase
http://www.snv.jussieu.fr/vie/dossiers/sequencage/sequence.htm
29The Sanger method
http://www.snv.jussieu.fr/vie/dossiers/sequencage/sequence.htm
30The Sanger method
Fragment separation by electrophoresis on
acrylamide gel (resolution 1 base)
31The Sanger method
Reading progression
Fragment separation by electrophoresis on
acrylamide gel (resolution 1 base)
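The reading step can be sketched in Python. This is a toy model, not the laboratory protocol: it assumes four chain-termination reactions (one per base), idealizes each reaction as yielding one fragment for every position where that base occurs, and uses hypothetical names (`sanger_lanes`, `read_gel`) and a made-up sequence. Because the gel resolves fragments that differ by a single base, reading the bands from shortest to longest gives back the newly synthesized strand.

```python
def sanger_lanes(new_strand):
    """Idealized chain termination.

    For each base, list the lengths of the fragments that end with that
    base. In this toy model every possible termination point is observed.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the gel from the shortest band to the longest.

    With single-base resolution, each fragment length appears in exactly
    one lane, so sorting by length spells out the sequence.
    """
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_lanes("GATTACA")   # hypothetical newly synthesized strand
for base in "ACGT":
    print(base, lanes[base])
print("read:", read_gel(lanes))
```

If two fragments of different lengths could not be told apart on the gel, the sort above would be ambiguous, which is why the 1-base resolution matters.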
32 2. Current state-of-the-art production genome
sequencing in high-throughput sequencing
centers
36 2. Current state-of-the-art production genome
sequencing in high-throughput sequencing centers
Sanger production-scale genome sequencing
requires four successive steps:
1. DNA isolation
2. Sample preparation
3. Sequence production
4. Assembly and analysis
Resources involved: laboratory (steps 1-2), robots
(step 3), computers (step 4), humans
Chan E.Y. (2005), Mutation Res., 573, 13-40
37 2. Current state-of-the-art production genome
sequencing in high-throughput sequencing centers
DNA isolation and sample preparation: laboratory
Sequence production: sequencing robots
Photos: lab technician working with sequencing
machines; room filled with sequencing machines
(courtesy of Celera Genomics)
38 2. Current state-of-the-art production genome
sequencing in high-throughput sequencing centers
Sequencing robots
Assembly and analysis: computers
Photos: close-up of capillaries from a capillary
sequencing machine; lab with sequencing machines
(courtesy of Celera Genomics)
39 2. Current state-of-the-art production genome
sequencing in high-throughput sequencing centers
Assembly and analysis: computers
Plate-forme Génomique, Institut Pasteur
40 3. Sequencing statistics
41http://www.genomesonline.org
42Bacteria, Archaea, Eukarya, Metagenomes
http://www.genomesonline.org
43High-throughput sequencing centers by country
(chart labels: USA, UK, F, others)
http://www.genomesonline.org
44 4. Why continue sequencing?
45 4. Why continue sequencing?
- Comparative genomics
- Impact on biomedical research
- The personal genome project
47Figure 1. Evolutionary relationship between
metazoans that are sequenced or due for
sequencing. Simplified phylogenetic
relationships between the metazoans for which
complete, or nearly complete, genome sequences
are available or will be available soon.
Evolutionary distances are given in million years.
Abel Ureta-Vidal, Laurence Ettwiller and
Ewan Birney (2003), Nature Rev. Genet., 4,
pp. 251-262
50- International sequence databases: sequence
fragments from 100,000 species
- Estimated number of species: at least 14 million
Figure labels: vertebrates, arthropods, nematodes,
molluscs, worms...
Number of sequences in GenBank (log scale)
Shendure, 2004 and Wikipedia
The phylogenetic sequence deficit for the Metazoa
Mark Blaxter, 2002
51 4. Why continue sequencing?
- Comparative genomics
- Impact on biomedical research
- The personal genome project
52Single Nucleotide Polymorphism (SNP)
53HapMap Project
A freely available public resource to increase
the power and efficiency of genetic association
studies of medical traits
- High-density SNP genotyping across the genome
provides information about:
  - SNP validation, frequency, assay conditions
  - correlation structure of alleles in the genome
Mark J. Daly, PhD
54Associated alleles reported
Kirov 2004
Straub 2002 Van den Oord 2003
Williams 2004 Bray 2005
Van den Bogaert 2003 Funke 2004
Mark J. Daly, PhD
Schwab 2003
55 4. Why continue sequencing?
- Comparative genomics
- Impact on biomedical research
- The personal genome project
56Sequencing of individual human genomes as a
component of preventative medicine
The National Human Genome Research Institute
(NHGRI) solicits grant applications to develop
novel technologies that will enable extremely
low-cost genomic DNA sequencing. (2005-2006)
Revolutionary Genome Sequencing Technologies:
the $1000 genome for 2015
57 5. Improvements of the Sanger method during
these 30 years
58 5. Improvements of the Sanger method during
these 30 years
DNA isolation
Sample preparation
Sequence production
Assembly and analysis
59 5. Improvements of the Sanger method during
these 30 years
- Production of template DNA
- Labelling: from radioactivity to fluorescent dyes
- Analysis of the DNA fragments produced
  - Detection: from radioactivity detection to a
    laser within an automated DNA sequencing machine
  - Electrophoresis: from acrylamide gel to capillaries
DNA isolation
Sample preparation
Sequence production
Assembly and analysis
63- Production of template DNA
- around 1985
DNA isolation
Need for single-stranded DNA for sequencing
65- Sequencing of pure single-stranded DNA from
recombinant M13 particles
66- Production of template DNA
- around 1990
DNA isolation
- Double-stranded DNA from recombinant plasmids or
PCR products
- denatured by heat or alkali for sequencing
67DNA isolation
- Recent improvement of template DNA production:
multiple displacement amplification
Phi29 DNA polymerase is the replicative
polymerase from the Bacillus subtilis phage
phi29.
DNA templates can be amplified 10,000-fold in a
few hours.
(Blanco, L. and Salas, M. (1984) Proc. Natl. Acad.
Sci. USA, 81, 5325-5329)
71Recent improvement of template DNA production
Principle
Blanco, PNAS,1989
73DNA isolation
Applications of the multiple displacement
amplification
1. Whole human genome amplification using this
method
2. Sequencing the genome of a single cell
75DNA isolation
Applications of the multiple displacement
amplification
1. Whole human genome amplification using this
method
Phi29 DNA polymerase is able to amplify
linear DNA
Cascading strand displacement
Linear DNA
Circular DNA
(Dean et al, PNAS, 2002)
76DNA isolation
Applications of the multiple displacement
amplification
1. Whole human genome amplification using this
method
Phi29 DNA polymerase is able to amplify
linear DNA
1-10 copies of human genomic DNA yield 20-30 µg
of product
18 hours at 30 °C
DNA amplification yield after MDA
(Dean et al, PNAS, 2002)
77DNA isolation
Applications of the multiple displacement
amplification
1. Whole human genome amplification using this
method
Phi29 DNA polymerase is able to amplify
linear DNA
For:
- Genome sequencing
- Genetic analysis on blood, microdissected
tissues...
- Prenatal diagnosis
- Anthropological samples...
(Dean et al, PNAS, 2002)
78DNA isolation
Applications of the multiple displacement
amplification
2. Sequencing the genome of a single cell
(Zhang et al, Nature Biotech, 2006)
79Single-cell genomics. Clyde A Hutchison III and
J Craig Venter, Nature Biotechnology 24, 657-658
(2006), doi:10.1038/nbt0606-657
Phi29 DNA polymerase is the replicative
polymerase from the Bacillus subtilis phage
phi29. This polymerase has exceptional strand
displacement and processive synthesis properties.
The polymerase has an inherent 3'→5' proofreading
exonuclease activity (Blanco, L. and Salas,
M. (1984) Proc. Natl. Acad. Sci.
USA, 81, 5325-5329).
Figure 1. Sequencing the genome of a single
cell. A single cell is isolated by dilution or by
cell sorting. The cell is lysed and the
chromosome is denatured by alkaline treatment.
The cellular DNA is amplified >10^9-fold by
multiple displacement amplification (MDA) using
random primers. The hyperbranched DNA product is
resolved by shearing and enzymatic treatments,
then cloned and shotgun sequenced. Ideally, a
complete genome sequence could be assembled from
the data and then annotated.
80DNA isolation
Applications of the multiple displacement
amplification
2. Sequencing the genome of a single cell
A pioneering work and a new world
Polymerase cloning: "ploning"
The authors refer to the DNA populations
amplified from a single cell as polymerase clones,
or "plones".
- Two limitations in these first experiments:
  - Bias in "plonable" amplification
  - Chimeric plones (about 6)
(Zhang et al, Nature Biotech, 2006)
83DNA isolation
Applications of the multiple displacement
amplification
2. Sequencing the genome of a single cell
Most of the diversity of the biosphere remains
unsampled. The ability to sequence an entire
genome from a single uncultured cell should
make it possible to reveal this enormous
biodiversity.
Metagenomics
(Zhang et al, Nature Biotech, 2006)
84 6. Alternatives to the Sanger method:
sequencing single molecules of DNA
85Reminder!
The Sanger method is based on the analysis of
populations of DNA molecules
- Analysis of the DNA fragments produced
Radioactivity detection/ Laser within an
automated DNA sequencing machine
Sequence production
86 6. Alternatives to the Sanger method: sequencing
single molecules of DNA
Cycle extension method on single molecules
1- Template DNA is arrayed on a surface or in wells.
2- Sequencing reaction steps, including
nucleotide incorporation and washes, are
performed to identify each base.
3- The extended base is detected by fluorescence
or luminescence.
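The incorporation, wash and detection cycle can be sketched in Python. The slide does not specify the chemistry, so this toy model makes an assumption borrowed from pyrosequencing: one nucleotide type is offered per step, in a fixed repeating order, and the detected signal is proportional to the number of bases incorporated in that step. The names `cycle_extension` and `decode`, the flow order "TACG" and the example sequence are all hypothetical.

```python
def cycle_extension(new_strand, flow_order="TACG", max_cycles=50):
    """Simulate sequential base incorporation on one arrayed template.

    new_strand is the strand being synthesized (the complement of the
    template). Each step offers one nucleotide type; the strand is
    extended by as many bases as match, then the excess is washed away.
    Returns a list of (nucleotide, signal) pairs, one per step.
    """
    signals = []
    position = 0
    for _ in range(max_cycles):
        for nucleotide in flow_order:
            incorporated = 0
            while position < len(new_strand) and new_strand[position] == nucleotide:
                incorporated += 1
                position += 1
            signals.append((nucleotide, incorporated))  # detection step
            if position == len(new_strand):
                return signals
    return signals


def decode(signals):
    """Turn the recorded signals back into a base sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)


signals = cycle_extension("TTAGC")   # hypothetical read
print(signals)
print(decode(signals))
```

In this model a run of identical bases is read in a single step, so the sequence is only recovered correctly if the signal intensity can be converted reliably into a count.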
87Sequential base incorporation steps
Template
Primer
Surface
Chan E.Y. (2005), Mutation res, 573, 13-40
88Main features of cycle extension methods
compared to Sanger
- Massive parallelism
- Short read lengths
- Potential for cost reduction
89Pyrosequencing is the best-known cycle
extension method
90From Biotage, http://www.pyrosequencing.com
91Pyrosequencing
From Biotage, http://www.pyrosequencing.com
92From Biotage, http://www.pyrosequencing.com
94The two main problems of pyrosequencing
a, Read length distribution for the 306,178
high-quality reads of the M. genitalium
sequencing run. This distribution reflects the
base composition of individual sequencing
templates. b, Average read accuracy, at the
single read level, as a function of base position
for the 238,066 mapped reads of the same run.
From Biotage, http://www.pyrosequencing.com
95Pyrosequencing massive parallelism
Genome sequencing in microfabricated
high-density picolitre reactors
Margulies et al, 2005
96Genomic DNA is fragmented, ligated to adapters
and separated into single strands.
Fragments are bound to beads under conditions
that favor one fragment per bead. The beads are
captured in droplets of a
PCR-reaction-mixture-in-oil emulsion. PCR
amplification occurs within each droplet. At the
end of the PCR reaction, each bead carries
10 million copies of a unique DNA template.
Margulies et al, 2005, Nature, 437, pp. 376-380
97The emulsion is broken, the DNA strands
denatured and the beads carrying single stranded
DNA clones are deposited into wells of a
fibre-optic slide.
Smaller beads carrying immobilized enzymes
required for pyrosequencing are deposited into
each well.
Margulies et al, 2005
98Sequencing instrument
- a) Fluidic assembly
- b) The well-containing fibre-optic slide
- c) Computer providing the user interface and
the instrument control
Margulies et al, 2005
99De novo assembly of bacterial genomes: test on
Mycoplasma genitalium (580,000 bp)
14 hours!
Well density: 480 per mm²
Total number of wells on a slide: 1.6 million!
Margulies et al, 2005
100 7. Sequencing or resequencing?
101 7. Sequencing or resequencing?
- Sequencing: for studies of genomes of unknown
species, needing long read lengths
- Resequencing: for individual studies, using a
known genome as a guide
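Using a known genome as a guide can be sketched in Python: each short read is placed where it best matches the reference, and the remaining differences are reported as candidate variants. This is a deliberately naive sketch under stated assumptions (no insertions or deletions, one best position per read, brute-force search); real mappers use indexed and gapped alignment. The names `best_hit` and `variants` and both sequences are hypothetical.

```python
def best_hit(reference, read):
    """Return (start, mismatches) of the best ungapped placement of read.

    Every possible start position is tried; ties go to the leftmost one.
    """
    if len(read) > len(reference):
        raise ValueError("read is longer than the reference")
    best_start, best_mismatches = 0, len(read) + 1
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches < best_mismatches:
            best_start, best_mismatches = start, mismatches
    return best_start, best_mismatches


def variants(reference, read):
    """List (position, reference_base, read_base) differences at the best hit."""
    start, _ = best_hit(reference, read)
    return [
        (start + offset, reference[start + offset], base)
        for offset, base in enumerate(read)
        if reference[start + offset] != base
    ]


reference = "ACGTTGCAAGGCTA"   # hypothetical known genome
read = "TGCTAGG"               # hypothetical read from an individual
print(best_hit(reference, read))
print(variants(reference, read))
```

Without a reference, the same reads would have to be assembled from their overlaps alone, which is where longer reads help.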
102Comparison of sequencing methods
Sanger method
ABI 3730xl
Adapted from Chan E.Y. (2005), Mutation res, 573,
13-40
103Comparison of sequencing methods
Sanger method
ABI 3730xl
Adapted from Chan E.Y. (2005), Mutation res, 573,
13-40
454 technology
106Choice of sequencing method
Example of Neanderthal DNA
DNA from a fragment of a 38,000-year-old
Neanderthal fossil found in 1980 in Vindija cave
(Croatia)
Neanderthal DNA constraints
- Rare, short DNA fragments
- Many contaminants
Advantages of pyrosequencing
- No bacterial cloning
- No template competition for amplification
- Read length about 200 bp
- Each sequenced product stems from just one
original single-stranded template molecule of
known orientation (difference with PCR)
Green R.E. et al, 2006
107Principle
Lambert and Millar (2006), Green et al (2006)
http://www.454.com/
108Results
Analysis of one million base pairs of Neanderthal
DNA
Location on the human karyotype of Neanderthal
DNA
Schematic tree illustrating the number of
nucleotide changes inferred to have occurred on
hominoid lineages
Green et al, (2006)
113Conclusions
- Sequencing today is performed in big centers.
- The number of sequences is growing
exponentially...
But the bottleneck remains the analysis of
sequences...
The goal of the present course,
"Bioinformatics and Comparative Genome Analysis",
is precisely to give you the tools to contribute
to progress in this field.
114So... Good work on the Queen molecule!
Thanks to the organizers!
And thanks for your attention!
115Plan of the course
1
2
116Plan of the course (continued)
3