DNA Sequencing and the Human Genome Project
PowerPoint presentation, provided by kenkr3


2
DNA Sequencing and the Human Genome Project
  • History
  • Technology
  • Analysis

3
Technology
4
Aspects of Sequencing Genomes
  • Sequencing method
  • Cloned DNA
  • Clone/Sequence Assembly

5
Sequencing Methods
  • Sanger chain-termination method: >90% of all
    sequencing
  • Relies on the ability of DNA polymerase to
    incorporate nucleotide analogs while synthesizing
    template-directed DNA
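The chain-termination idea can be sketched in a few lines. This is a toy model that assumes the reaction contains enough molecules for termination to occur at every position. Each product ends in a dideoxynucleotide (no 3'-OH, so extension stops), the products are separated by length as on a gel or capillary, and reading the terminal label from shortest to longest gives the newly synthesized strand. The helper names are illustrative only.

```python
import random

# Watson-Crick pairing used by the polymerase when copying the template
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_fragments(template):
    """Return every chain-terminated product as (length, terminal ddNTP).

    Assumes termination happens at every position somewhere in the reaction;
    `template` is written in the order the polymerase copies it.
    """
    new_strand = [COMPLEMENT[base] for base in template]
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]


def four_lane_gel(fragments):
    """Group products by terminating ddNTP, as in the classic four-lane gel."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, dd in fragments:
        lanes[dd].append(length)
    return lanes


def read_gel(lanes):
    """Read bands from shortest to longest across all four lanes."""
    bands = sorted((length, dd) for dd, lengths in lanes.items() for length in lengths)
    return "".join(dd for _, dd in bands)


if __name__ == "__main__":
    template = "TACGGATCCA"
    fragments = terminated_fragments(template)
    random.Random(1).shuffle(fragments)  # reaction products come out unordered
    new_strand = read_gel(four_lane_gel(fragments))
    recovered = "".join(COMPLEMENT[b] for b in new_strand)
    print(new_strand, recovered)
```

Automated instruments replace the four lanes with four fluorescent dyes in one lane or capillary, but the length-ordering logic is the same.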

6
Dideoxynucleotide-based Sequencing
7
Automating Sanger Sequencing
8
Other Automated Methods
  • Hybridization method
      • Hybridize to oligos on a chip
      • Affymetrix can do 30K resequencing
      • Limited by number of features and hybridization
        specificity
  • Single-molecule methods
      • Pore-based: threads DNA through a molecular pore in
        a membrane; bases determined by changes in
        conductance
  • Mass spec: currently best for small targets such as SNPs

9
Most-used hardware
  • ABI 377: gel-based, 96 lanes per run, read
    length 500 bp, run time 4-16 h; >40,000
    bases/run × 3 runs/day = 120,000 bases/day
  • ABI 3700: capillary-based, 48 capillaries,
    read length 500 bp, run time 40 minutes;
    >950,000

10
Whaddaya determine the sequence of?
  • Major problem: all methods give only ~500 bp per read
  • The shotgun method can help

11
Basic Shotgun Strategy
12
Power of Shotgun
  • Needs no prior knowledge of target
  • Requires no maps or landmarks
  • Got genome? Can start!

13
Limits of Shotgun
  • Requires redundant sequence
      • Not all bad: redundancy gives higher accuracy
  • Requires a representative library
      • No missing clones
  • Difficulty of locating overlaps
      • Harder with larger genomes
      • Vertebrate genomes have repetitive DNA
      • Especially human: ~50% repeats
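How much redundancy is "enough" can be estimated with the Lander-Waterman model for random shotgun sampling: at c-fold coverage, roughly a fraction e^-c of the target is hit by no read, and the expected number of gaps is about N·e^-c for N reads (ignoring the minimum detectable overlap). The sketch below compares that formula with a simulation on a circular target, which is an assumption made to avoid edge effects; the function names are hypothetical.

```python
import math
import random


def expected_uncovered_fraction(coverage):
    """Lander-Waterman: fraction of bases hit by no read is about e^-c."""
    return math.exp(-coverage)


def expected_gaps(n_reads, coverage):
    """Lander-Waterman: expected number of gaps (minimum detectable overlap ignored)."""
    return n_reads * math.exp(-coverage)


def simulated_uncovered_fraction(genome_len, read_len, coverage, seed=0):
    """Place reads at uniform random starts on a circular target; return uncovered fraction."""
    rng = random.Random(seed)
    n_reads = round(coverage * genome_len / read_len)
    delta = [0] * (genome_len + 1)  # difference array of per-base depth
    for _ in range(n_reads):
        start = rng.randrange(genome_len)
        end = start + read_len
        if end <= genome_len:
            delta[start] += 1
            delta[end] -= 1
        else:  # read wraps past the end of the circular target
            delta[start] += 1
            delta[genome_len] -= 1
            delta[0] += 1
            delta[end - genome_len] -= 1
    depth, uncovered = 0, 0
    for i in range(genome_len):
        depth += delta[i]
        if depth == 0:
            uncovered += 1
    return uncovered / genome_len


if __name__ == "__main__":
    for c in (1, 3, 5, 8):
        sim = simulated_uncovered_fraction(200_000, 500, c)
        print(c, round(expected_uncovered_fraction(c), 4), round(sim, 4))
```

At 1X about 37% of the target is still unread, while 5X leaves well under 1%; this is why shotgun projects aim for several-fold redundancy.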

14
Computers Help
  • When first begun:
      • PDP-11 (DEC) was state of the art
      • No PCs existed
      • Data input: punch cards
      • Data output: unformatted text!
  • Conclusion: you guys are fortunate!

15
Locating Overlap - considerations
  • Need rules
      • How much overlap to call it real?
      • How much mismatch to allow in the overlap?
      • How to determine error?
  • How to automate?
      • Do you let the CPU do it?
      • Do you edit? If so, how?
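The first two rules can be made concrete as parameters of an overlap test. The sketch below looks for the longest suffix of one read that matches a prefix of the next, accepting it only if it is at least `min_overlap` bases long and has no more than `max_mismatch_frac` mismatches. Both thresholds are illustrative assumptions, not values used by any particular assembler, and real assemblers also allow insertions and deletions.

```python
def best_overlap(left, right, min_overlap=10, max_mismatch_frac=0.1):
    """Return (length, mismatches) for the longest suffix of `left` that matches a
    prefix of `right` within the mismatch allowance, or (0, 0) if none qualifies.

    min_overlap answers "how much overlap to call it real?";
    max_mismatch_frac answers "how much mismatch in the overlap?".
    Gapped alignments (indels) are not considered in this sketch.
    """
    longest = min(len(left), len(right))
    for length in range(longest, min_overlap - 1, -1):
        suffix, prefix = left[-length:], right[:length]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_frac * length:
            return length, mismatches
    return 0, 0


if __name__ == "__main__":
    # Two reads from the sample contig on the next slides
    read_a = "GATTTAGTTATGTTATTGTGCAACTATC"
    read_b = "TTGTGCAACAATCGCTGCGGACGA"
    print(best_overlap(read_a, read_b))
```

Here the reads share a 13-base overlap with one disagreement, so whether they join depends entirely on where the thresholds are set.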

16
The CONTIG
  • CONTIG = CONTIGuous sequence built from overlapping
    reads into a consensus
  • Term coined by Roger Staden (MRC)
  • First level of assembly in shotgun sequencing
  • Definition of consensus carefully controlled

17
Sample Contig
GGCTCTTAGGAGATT
           GATTTAGTTATGTTATTGTGCAACTATC
                    ATGTTATTCTGCAACCATCGCTGCGGACGAATAGCTGT
                          TTGTGCAACAATCGCTGCGGACGA
GGCTCTTAGGAGATTTAGTTATGTTATTGTGCAACNATCGCTGCGGACGAATAGCTGT  (consensus)
Overlap?
What constitutes consensus?
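One simple answer to "what constitutes consensus?" is a per-column majority vote. The sketch below uses the four reads of the sample contig at 0-based start offsets 0, 11, 20 and 26 (offsets worked out by hand from the overlaps). It calls the majority base in each column, writes N when no base has a majority, and reports read depth per column. This is only an illustration: Phrap-style assemblers weight bases by PHRED quality rather than counting votes.

```python
from collections import Counter

# Reads from the sample contig slide, with 0-based start offsets on the contig
PLACED_READS = [
    (0, "GGCTCTTAGGAGATT"),
    (11, "GATTTAGTTATGTTATTGTGCAACTATC"),
    (20, "ATGTTATTCTGCAACCATCGCTGCGGACGAATAGCTGT"),
    (26, "TTGTGCAACAATCGCTGCGGACGA"),
]


def pileup(placed_reads):
    """Build one Counter of observed bases per contig column."""
    length = max(start + len(seq) for start, seq in placed_reads)
    columns = [Counter() for _ in range(length)]
    for start, seq in placed_reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    return columns


def call_base(column):
    """Majority rule: a tie (e.g. T/C/A) becomes N; an empty column is a gap."""
    ranked = column.most_common()
    if not ranked:
        return "-"
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "N"
    return ranked[0][0]


def consensus(placed_reads):
    """Return (consensus sequence, per-column depth as a digit string)."""
    columns = pileup(placed_reads)
    depth = "".join(str(sum(col.values())) for col in columns)
    return "".join(call_base(col) for col in columns), depth


if __name__ == "__main__":
    seq, depth = consensus(PLACED_READS)
    print(seq)
    print(depth)
```

Two columns disagree: one is outvoted 2 to 1 (G wins over C), and one has three different bases from three reads, which yields the N seen in the slide's consensus.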
18
PHRED, PHRAP, CONSED, FINISHER
  • Phil's Rapid Editor (PHRED)
      • Reads trace files from ABI sequencers
      • Calls bases using the best available graphical
        analyzer
      • Makes a quality assessment based on signal
        strength, background, overlapping bands, etc.
      • This gives a quantitative basis for
        establishing a contig
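PHRED expresses its quality assessment as a log-scaled score, Q = -10·log10(P), where P is the estimated probability that the base call is wrong (Q20 means a 1% error chance, Q30 means 0.1%). The sketch converts between the two and sums per-base error probabilities to get the expected number of miscalled bases in a read; the function names are hypothetical.

```python
import math


def phred_quality(error_prob):
    """Phred quality: Q = -10 * log10(P_error)."""
    return -10 * math.log10(error_prob)


def error_probability(quality):
    """Inverse transform: P_error = 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def expected_errors(qualities):
    """Expected number of miscalled bases in a read: sum of per-base error probabilities."""
    return sum(error_probability(q) for q in qualities)


if __name__ == "__main__":
    print(round(phred_quality(0.001)))        # Q for a 1-in-1000 error chance
    print(expected_errors([40, 30, 20, 10]))  # dominated by the single Q10 base
```

Because the scale is logarithmic, a handful of low-quality bases at the end of a read can contribute more expected errors than hundreds of high-quality ones.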

19
Phrap
  • Phil's Rapid Assembly Program
  • Takes PHRED output as input
  • Compares all PHRED reads to all other reads and
    contigs, then:
      • Makes tentative contigs with biases
      • End overlaps favored over middle overlaps
      • Overlap must reach a threshold score
      • Score is identity plus a PHRED quality factor
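The idea of combining identity with a quality factor can be illustrated with a toy score. The sketch below is hypothetical and is NOT Phrap's actual scoring: each aligned column earns a match reward or mismatch penalty weighted by the lower PHRED quality of the two bases (scaled to 0-1), so a disagreement involving a low-quality base costs little, and the overlap is accepted only above an arbitrary threshold.

```python
def weighted_overlap_score(a, qa, b, qb, match=1.0, mismatch=-2.0, q_max=40):
    """Hypothetical quality-weighted overlap score (not Phrap's algorithm).

    a, b: equal-length aligned overlap sequences; qa, qb: their Phred qualities.
    Each column is weighted by min(q1, q2) / q_max, capped at 1.
    """
    score = 0.0
    for x, y, q1, q2 in zip(a, b, qa, qb):
        weight = min(q1, q2, q_max) / q_max
        score += weight * (match if x == y else mismatch)
    return score


def accept_overlap(score, threshold=5.0):
    """Toy acceptance rule: the overlap must reach a threshold score."""
    return score >= threshold


if __name__ == "__main__":
    a, b = "ACGTACGTAC", "ACGTACCTAC"       # one disagreement at index 6
    qa = [40] * 10
    confident = [40] * 10                   # mismatching base called with high quality
    doubtful = [40] * 6 + [5] + [40] * 3    # mismatching base called with low quality
    print(weighted_overlap_score(a, qa, b, confident))
    print(weighted_overlap_score(a, qa, b, doubtful))
```

The same alignment scores higher when the disagreeing base was a low-quality call, which is the intuition behind feeding PHRED qualities into the assembler.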

20
Consed
  • CONtig Sequence Editor
  • Permits finisher to edit overlaps
  • Permits/confirms contig joins
  • Permits (but discourages) sequence editing
  • Allows identification of repeats
      • Uses RepeatMasker output

21
Finisher
  • Part of Consed
  • Makes suggestions for closure
  • Tells which clones to extend or reverse sequence
  • Derives PCR primers for gap filling
  • Finishing is estimated to take more than twice as
    long as sequencing

22
Workflow
BAC or Small Genomic DNA
Fragment - sonication preferred
Clone library - 5-10X representative
Sequence - Enough runs
Data to PHRED -> PHRAP -> Consed -> Finisher
Decide to fill gaps or done
Post to NCBI
23
Strategies
  • Divide and conquer
      • Create physical map
      • Create smaller and smaller subclones of mapped
        pieces
      • Carry out shotgun sequencing of the smallest pieces
  • Whole genome
      • Generate sequence-able clones
      • Determine sequences at random using shotgun
      • Use sequence overlap to reassemble into consensus

24
Philosophical issues regarding which to use
  • Spectrum, from most wet-bench intensive to most
    computationally intensive:
      • Fully map, then sequence
      • Partially sequence ends, construct map, then
        sequence
      • Full-genome shotgun
25
Limitations on shotgun sequencing genomes
  • Obtaining enough clones to cover all spots
  • Finding credible sequence overlaps
  • Repetitive DNA in Humans
  • Computational power

26
More on repeats
Reads are only 500 bp
Genome is 3,000,000,000 bp
Identical repeat sequences are interspersed throughout the
genome, so it is impossible to place repeat-containing reads.
GCTAGGCTAGTGGCATG
GCTAGGCTAGTGGCATG
GCTAGGCTAGTGGCATG
GCTAGGCTAGTGGCATG
GCTAGGCTAGTGGCATG
Clone 1 sequence
CGAGCGTGTTGTACGTGTGA GCTAGGCTAGTGGCATG
Clone 2 sequence
GGAGTGCTGAGTGGTGCA GCTAGGCTAGTGGCATG GGAGTGCTGAGTGGTGCA
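Why repeat-containing reads cannot be placed can be shown by searching for every exact match of a read. The sketch treats clone 1 as its flanking sequence followed directly by the repeat, and (as a hypothetical arrangement) lays the two clones side by side as a tiny "genome". A read made only of repeat sequence matches twice and is ambiguous; a read that spans unique flank plus repeat matches once.

```python
REPEAT = "GCTAGGCTAGTGGCATG"
CLONE_1 = "CGAGCGTGTTGTACGTGTGA" + REPEAT
CLONE_2 = "GGAGTGCTGAGTGGTGCA" + REPEAT + "GGAGTGCTGAGTGGTGCA"


def placements(read, genome):
    """All 0-based start positions where `read` occurs exactly (overlaps allowed)."""
    hits, start = [], genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


if __name__ == "__main__":
    genome = CLONE_1 + CLONE_2  # hypothetical arrangement of the two clones
    for read in (REPEAT, "GTGTGAGCTAGG"):
        hits = placements(read, genome)
        status = "unique" if len(hits) == 1 else "ambiguous"
        print(read, hits, status)
```

With 500 bp reads and repeats longer than a read, the same problem arises at genome scale: only reads that reach into unique flanking DNA can be anchored.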
27
Mapping first - Shotgun sequence later
M13 or plasmid subclones: 1 BAC (150 kb) needs 6000
sequence reads, or 2000-3000 clones
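The numbers on this slide follow from simple arithmetic: 6000 reads of 500 bp over a 150 kb BAC is 20-fold redundancy, and if each subclone is read from both ends (an assumption here) 6000 reads come from 3000 clones. A sketch with hypothetical helper names:

```python
import math


def fold_coverage(n_reads, read_len, target_len):
    """Redundancy: total bases sequenced divided by target length."""
    return n_reads * read_len / target_len


def reads_needed(target_len, read_len, coverage):
    """Reads required to reach a given fold coverage."""
    return math.ceil(coverage * target_len / read_len)


def clones_needed(n_reads, reads_per_clone=2):
    """Subclones needed if each one is read from both ends (assumed)."""
    return math.ceil(n_reads / reads_per_clone)


if __name__ == "__main__":
    print(fold_coverage(6000, 500, 150_000))   # redundancy implied by the slide
    print(clones_needed(6000))                 # both-end reads per subclone
    print(reads_needed(150_000, 500, 10))      # reads for 10X instead
```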
28
Clones
  • Large-insert clones
      • YACs (Yeast Artificial Chromosomes)
          • Useful for mapping; 1 Mb inserts
          • Unstable during construction and propagation
          • Not useful for sequencing
      • BACs (Bacterial Artificial Chromosomes)
          • 150 kb insert
          • Extremely stable and easy to propagate
          • Gold standard for sequencing targets and
            chromosome-scale maps
      • Cosmids
          • 50 kb insert
          • Extremely stable and easy to propagate
          • Useful for sequencing but too small for
            chromosome maps

29
Sequence-ready clones
  • Plasmids
      • 1-10 kb insert capacity
      • High copy number
      • Easy to sequence bi-directionally
      • Automated clone picking/DNA isolation possible
      • Examples: pUC18, pBR322
  • Single-stranded bacteriophage
      • 1-5 kb insert capacity
      • Grows at high copy as a plasmid and is shed into
        the medium as single-stranded DNA phage
      • Easy to isolate, pick, sequence
      • Easy to automate
      • M13 is used almost exclusively

30
Mapping
  • Human Genome Maps
      • BAC Fingerprint map
      • Genetic Map
      • Cytogenetic Map
      • STS-based physical map (YACs)
      • Radiation Hybrid Map

31
Clone map from UCSC
32
Genetic Map
  • Genethon and Marshfield
  • Used CEPH families to map
  • Used microsatellite markers (highly polymorphic)
  • Mapping only 100 families attained a 0.7 cM map
  • Gave 5000 well-ordered PHYSICAL markers
  • Can be used to order clones and contigs

33
Cytogenetics
  • FISH (Fluorescence in situ hybridization) -
    useful to locate clones

34
STS Content YAC Map
35
RH Map
36
RH principle
37
General RH mapping on Panels
38
RH Mapping Panels
  • Genebridge
      • Number of hybrids: 93
      • Retention: 32%
      • Avg. fragment size: 25 Mb
  • Stanford G3
      • Number of hybrids: 83
      • Retention: 16%
      • Avg. fragment size: 2.4 Mb

40
Output from Stanford
  • From: rhserver@paxil.stanford.edu
  • Date: Tue Sep 9, 2003 9:27:26 AM America/Denver
  • To: krauter@colorado.edu
  • Subject: SHGC RHSERVER
  • This email message has been sent automatically by
    the Stanford Human Genome Center RHserver in
    response to your submission. If you have questions
    or comments, please submit them to
    webmaster@shgc.stanford.edu and include the
    message ID krauter@colorado.edu1063121245.93
  • Duplicate markers are indicated with a (D) after
    the marker name; a LOD score is now given for
    duplicates.
  • Reference Number; Stanford RH Panel: G3; Lowest LOD
    Reported: 4; Chromosome Value: 0
  • Results for HUM_GEN
    ----------------------------------------
    Submitted vector:
    11000000100000001010001000001000000110011001000001100100010000011101000110000000110
    Rank  SHGC name    Chrom  LOD score  Dist. (cR)
    1     SHGC-57080   22     19.18      4
          Vector: 11000000000000001010001000001000000110011001000001100100010000011101000110000000110
    2     SHGC-7822    22     18.63      4
          Vector: 1100000000000000R010001000001000000110011001000001100100010000011101000110000000110
    3     SHGC-58507   22     17.15      7
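Each vector above records, for the 83 hybrids of the G3 panel, whether a marker was retained (1) or not (0); markers close together tend to be co-retained, so their vectors rarely disagree. The sketch below is a minimal two-point estimate under the simplest equal-retention model, where P(discordant hybrid) = 2r(1 - r)θ and distance D = -ln(1 - θ) Rays (1 Ray = 100 cR). It is not RHserver's method (which reports LOD-based results), symbols other than 0/1 (such as R) are simply skipped, and the retention r = 0.16 is taken from the G3 panel figure given earlier.

```python
import math

# Submitted marker and SHGC-57080, as printed in the RHserver output (83 hybrids)
SUBMITTED = (
    "1100000010000000101000100000100000"
    "0110011001000001100100010000011101000"
    "110000000110"
)
SHGC_57080 = (
    "1100000000000000101000100000100000"
    "0110011001000001100100010000011101000"
    "110000000110"
)


def discordance(v1, v2):
    """Count hybrids scored 0/1 in both vectors that disagree; other symbols are skipped."""
    scored = [(a, b) for a, b in zip(v1, v2) if a in "01" and b in "01"]
    return sum(a != b for a, b in scored), len(scored)


def rh_distance_cr(v1, v2, retention):
    """Two-point estimate: theta = discordant / (2 r (1 - r) n); D = -ln(1 - theta) Rays."""
    d, n = discordance(v1, v2)
    theta = d / (2 * retention * (1 - retention) * n)
    if theta >= 1:
        return math.inf  # effectively unlinked under this simple model
    return -100 * math.log(1 - theta)


if __name__ == "__main__":
    print(discordance(SUBMITTED, SHGC_57080))
    print(round(rh_distance_cr(SUBMITTED, SHGC_57080, retention=0.16), 1))
```

The two vectors differ in a single hybrid, which under these assumptions corresponds to a distance of a few cR, the same order as the server's reported value.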

41
END of LINE
  • At the end, one has:
      • Detailed marker and clone maps
      • Collection of BACs covering most of the genome
  • Sequence the minimal tiling path of BACs

42
Assembly-line process at MIT Genome Center
Grow in 2ml cultures
Pick from plate into dishes
Bar code 384-well dishes
ABI 3700 Sequencer
Multiposition robot preps DNA
Sanger rxns done in thermal cyclers
43
Technical sidebar
  • Literally hundreds of millions of clones must be
    sequenced
  • Must automate (i.e. use robots)
  • Methods to pick clones with inserts, prepare DNA,
    carry out sequencing reactions and load automated
    sequencers must be fully automatic (no human
    steps)

44
Both Celera and HGC compromised
  • Human Genome Consortium (HGC)
      • Derived STS maps
      • Sequenced BAC ends and fingerprinted BACs to make maps
      • Then sequenced the minimum tiling path of BACs
  • Celera
      • Did full random shotgun of 1-3 kb and 10-20 kb
        clones
      • Used STS, EST, and BAC maps to order small
        contigs into larger contigs

45
What's the diff?
  • Did the methods produce different outcomes?
  • No:
      • Both produced gapped sequences
      • Both lacked highly repeated segments
      • Both produced sequence of sufficient quality to
        begin detailed analyses
  • Yes:
      • Several regions had significantly different
        sequence orders
      • Not all genes in one were present in the other
      • HGC had, on average, smaller but better contigs
      • Celera had higher-redundancy (i.e., more accurate)
        sequence

46
Products
                HGC           Celera
  Reads         181 × 10^6    27 × 10^6
  Bases         23 × 10^9     14 × 10^9
  Av. contig    3 × 10^5      3 × 10^6
  No. gaps      1.5 × 10^5    1 × 10^5
  No. genes     24,500        26,383