Title: Today
1Today
- Please read
- Science 291 1304-1315
2Human Genome Project Dissenters: My Brush with
Greatness?
- 1992: Two years into the HGP, two of the project's
biggest critics were:
- Sydney Brenner, who believed that the HGP should
focus on human EST collections and sequence the
genome of a simple vertebrate (Fugu).
- Craig Venter, who believed that the clone-by-clone
approach was not the most efficient way to
proceed, and suggested that shotgun approaches,
even a whole-genome approach, were feasible.
They were both right.
3Sydney Brenner
- 2002 Nobel Prize (Medicine/Physiology)
- Sydney Brenner and John E. Sulston, Britain
- H. Robert Horvitz, United States
- for discoveries concerning how genes regulate
organ development and a process of programmed
cell death.
4Expressed Sequence Tags (ESTs)
Brenner was right.
- End-sequenced cDNAs
- (complementary DNA)
- cDNA: synthetic DNA transcribed from an mRNA
template,
- through the action of an RNA-dependent DNA
polymerase called reverse transcriptase.
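The template-copying step can be sketched in a few lines. This is only an illustration of the base-pairing logic, not a model of the enzyme; the function name and the example sequence are made up for the sketch.

```python
# Watson-Crick pairing used when an RNA template is copied into DNA.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA for an mRNA, both written 5'->3'."""
    # The new strand is antiparallel to the template, so each base is
    # complemented and the order is reversed to keep the 5'->3' convention.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


# Hypothetical toy transcript: start codon, one codon, stop codon.
print(first_strand_cdna("AUGGCCUAA"))
```

An EST would then be a single sequencing read taken from one end of such a cDNA clone.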
Online Primer est.html
5- Still Sequencing cDNAs,
- first and easiest look into any genome,
- useful in understanding genomic sequence (gene
finding),
- helps determine splice site variants,
- shorter than genomic clones, fits in plasmids,
- etc.
6Tissue-specific ESTs are very useful.
7Whole Genome Assembly
Venter was right.
- 1995: 1.8 Mbp Haemophilus influenzae genome
sequenced,
- 1996 on: Mycoplasma, E. coli and others,
- 1999: Chromosome 2 of Arabidopsis,
- 2000: Drosophila (120 Mbp) genome,
- Human, Mosquito, etc
- Lots of genomes, several applications...
WGA of bacterial, viral populations...
9- 1 year, 120 megabases,
- Assembly algorithms could generate accurate
genomic sequences,
- Interim assemblies (or mapping) were not
necessary.
24 MARCH 2000 VOL 287 SCIENCE
10Big Biology
11Think About This
- the plasmid library construction is the first
critical step in shotgun sequencing,
- if the DNA libraries are not uniform in size,
non-chimeric, and do not randomly represent the
genome, then the subsequent steps cannot
accurately reconstruct the genome sequence.
- We used automated high-throughput DNA sequencing
and the computational infrastructure to enable
efficient tracking of enormous amounts of
sequence information (27.3 million sequence
reads; 14.9 billion bp of sequence).
12Whose DNA?
- 21 enrolled donors,
- age, sex, ethnographic group,
- one African-American,
- one Asian-Chinese,
- one Hispanic-Mexican,
- two Caucasians.
13Whose, Mostly?
15back to humans
Individuals, Libraries,
Sequence coverage, Clone coverage, Other?
What to know?
543 bp average sequence read
8 September 1999 - 25 June 2000
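As a rough check on "sequence coverage", the read count quoted earlier in the deck can be combined with the average read length. The genome size used here is an assumption (about 2.9 Gbp) and is not stated on these slides, so the coverage figure is only an estimate.

```python
READS = 27_300_000                 # sequence reads quoted in the deck
MEAN_READ_BP = 543                 # average read length quoted in the deck
ASSUMED_GENOME_BP = 2_900_000_000  # assumption: ~2.9 Gbp, not from the slides

# Total bases sequenced = number of reads x average read length.
total_bp = READS * MEAN_READ_BP

# Sequence coverage = total bases sequenced / genome size.
coverage = total_bp / ASSUMED_GENOME_BP

print(f"total sequence: {total_bp / 1e9:.1f} billion bp")
print(f"estimated sequence coverage: {coverage:.1f}x")
```

The product comes out close to the 14.9 billion bp quoted for the project, which suggests the two numbers on the slides are consistent with each other.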
17WGA Outline
18DNA in sized libraries
19back to humans
Individuals, Libraries,
Sequence coverage, Clone coverage, Other?
What to know?
543 bp average sequence read
8 September 1999 - 25 June 2000
20Whole Genome Assembly
- 1. Screener
- 2. Overlapper
- 3. Unitigger/Discriminator,
- 4. Scaffolder,
- 5. Repeat Resolver.
21Screener
- ...finds and masks microsatellite repeats,
known repeated regions and ribosomal DNA,
- masked regions not used to make contigs,
- marks the rest for overlapping.
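A minimal sketch of the masking idea, covering only the microsatellite case. The rule used here (a 1-6 bp unit repeated at least four times in a row) and the lower-case "soft mask" are assumptions chosen for illustration; the slides do not give the Screener's actual criteria, and known repeats and ribosomal DNA would need a reference library instead.

```python
import re

# Hypothetical rule: a 1-6 bp unit followed by at least 3 more copies of itself.
MICROSATELLITE = re.compile(r"([ACGT]{1,6}?)\1{3,}")


def mask_microsatellites(read: str) -> str:
    """Lower-case simple tandem repeats so later stages can skip them."""
    return MICROSATELLITE.sub(lambda m: m.group(0).lower(), read)


# Toy read with a (CA)5 microsatellite in the middle.
print(mask_microsatellites("GGTT" + "CA" * 5 + "TTGG"))
```

Soft masking keeps the bases in the read, so an overlap step can ignore the lower-case region without losing the sequence itself.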
22Overlapper
- ...looks for end-to-end overlaps of at least 40
bp with no more than 6% differences in match,
What's the significance?
...a one in 10^17 event,
given perfect randomness.
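The overlap criterion above can be sketched naively as follows. This is not the assembler's algorithm: it counts substitutions only (no insertions or deletions), checks one read pair at a time, and the function and parameter names are invented for the sketch.

```python
def end_overlap(a: str, b: str, min_len: int = 40, max_diff: float = 0.06) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`
    within the mismatch budget, or 0 if no overlap reaches `min_len`."""
    # Try the longest possible overlap first and shrink one base at a time.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        # Accept if the fraction of differing positions is within budget.
        if mismatches <= max_diff * length:
            return length
    return 0
```

Calling `end_overlap(read_a, read_b)` on two reads returns the accepted overlap length, so a result of 0 means the pair is not passed on. Comparing every read against every other read this way would be far too slow at the scale of millions of reads; a real overlapper needs an index to find candidate pairs first.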
23Good News
- ... uniquely assembled contigs (unitigs) are
readily identifiable,
- all of the assembled sequences match over all of
the known sequence,
- and -
...are consistent with an 8x sequence coverage.
24Whole Genome Assembly
- 1. Screener
- 2. Overlapper
- 3. Unitigger/Discriminator,
- 4. Scaffolder,
- 5. Repeat Resolver.
25Unitigs
But(t)
...the Screener doesn't include all of the low
frequency level repeats, ...so a majority of
the Overlapper outputs turned out to be bogus.
26What Now?
- over-collapsed assemblies are identified and
broken down into unitigs when possible...
- these too-large contig sets are sent to the
Unitigger/Discriminator.
27Unitigger
...differentiates between a true overlap and an
overlap that includes more than one locus.
28Discriminator
29Discriminator
...may yield u-unitigs.
Unitigger/Discriminator output: correctly
assembled contigs covering 73.6% of the genome.
30Scaffolder
- ...contigs the contigs,
- uses mate-pair information; two or more
consistent mate-pair matches yield 1 in 10^10
odds of being chance.
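The "two or more mate pairs" rule can be sketched as a simple link count between contigs. This sketch assumes each read has already been placed in at most one contig and ignores orientation and insert-size consistency, which a real scaffolder would also check; all names are hypothetical.

```python
from collections import Counter


def confident_links(mate_pairs, read_to_contig, min_links=2):
    """Return contig pairs joined by at least `min_links` mate pairs."""
    links = Counter()
    for left_read, right_read in mate_pairs:
        c1 = read_to_contig.get(left_read)
        c2 = read_to_contig.get(right_read)
        # Skip unplaced reads and mates that land in the same contig.
        if c1 is None or c2 is None or c1 == c2:
            continue
        # Sort so that (A, B) and (B, A) count as the same link.
        links[tuple(sorted((c1, c2)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}


# Toy data: contigs A and B share two mate pairs, A and C only one.
placement = {"r1": "A", "r2": "B", "r3": "A", "r4": "B", "r5": "A", "r6": "C"}
pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6")]
print(confident_links(pairs, placement))
```

Requiring a second, independent mate pair is what drives the chance odds down so sharply: a single misplaced clone can suggest a false join, but two agreeing ones are far less likely.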
31Repeat Resolver
...most of the remaining gaps were due to repeats.
- Rocks: use low Discriminator Value contig
sets to fill gaps; find two or more mate
pairs with unambiguous matches in the scaffold
near the gap (2 kb, 10 kb or 50 kb) (1 in
10^7),
- Stones: find mate pair matches 2 kb, 10 kb,
and 50 kb from the gap, place the mate in the gap,
and check to see if it's consistent with other
placed sequences.
33If that Doesn't Work
- ...find a mate-pair that spans the gap, and
sequence it,
Chromosome Walking
34Today/Friday
- Questions about WGA,
- CSA,
- Comparisons,
- Quality Control, etc.