Title: DNA Sequencing
1DNA Sequencing
2DNA sequencing
- How we obtain the sequence of nucleotides of a species
ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGAC
TACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG
ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT
3Which representative of the species?
- Which human?
- Answer one
- Answer two: it doesn't matter
- Polymorphism rate: number of letter changes between two different members of a species
- Humans: about 1/1,000
- Other organisms have much higher polymorphism rates
- Population size!
4Why humans are so similar
[Figure: Out of Africa; N = population size]
- A small population that interbred reduced the genetic variation
- Out of Africa 40,000 years ago
Heterozygosity $H$: $H = \frac{4Nu}{1 + 4Nu}$; with $u \approx 10^{-8}$ and $N \approx 10^{4}$, $H \approx 4 \times 10^{-4}$
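The heterozygosity estimate on this slide, H = 4Nu/(1 + 4Nu), can be checked numerically. The sketch below is only an illustration; the function name `heterozygosity` and its parameter names are assumptions made for this example, not part of the lecture.

```python
def heterozygosity(population_size: float, mutation_rate: float) -> float:
    """Expected heterozygosity H = 4Nu / (1 + 4Nu)."""
    theta = 4.0 * population_size * mutation_rate
    return theta / (1.0 + theta)


# Values from the slide: u about 1e-8 per base, N about 1e4 individuals.
h = heterozygosity(population_size=1e4, mutation_rate=1e-8)
print(f"H = {h:.2e}")  # H = 4.00e-04
```

Because 4Nu is tiny here, the denominator is almost exactly 1, so H is essentially 4Nu.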
5Human population migrations
- Out of Africa, Replacement
- Grandma of all humans (Eve): about 150,000 years ago
- Ancestor of all mtDNA
- Grandpa of all humans (Adam): about 100,000 years ago
- Ancestor of all Y-chromosomes
- Multiregional Evolution
- Fossil records show a continuous change of morphological features
- Proponents of the theory doubt mtDNA and other genetic evidence
- New fossil records bury multiregionalists
- Nice article in the Economist on that:
- http://www.economist.com/science/displaystory.cfm?story_id=9507453
6DNA Sequencing Overview
1975
- Gel electrophoresis
- Predominant, old technology by F. Sanger
- Whole genome strategies
- Physical mapping
- Walking
- Shotgun sequencing
- Computational fragment assembly
- The future: new sequencing technologies
- Pyrosequencing, single molecule methods,
- Assembly techniques
- Future variants of sequencing
- Resequencing of humans
- Microbial and environmental sequencing
- Cancer genome sequencing
2015
7DNA Sequencing
- Goal:
- Find the complete sequence of A, C, G, T's in DNA
- Challenge:
- There is no machine that takes long DNA as an input and gives the complete sequence as output
- Can only sequence about 900 letters at a time
8DNA Sequencing vectors
[Diagram: DNA -> shake -> DNA fragments -> vector with a known location (restriction site)]
Vector: circular genome (bacterium, plasmid)
9Different types of vectors
VECTOR                                 Size of insert (bp)   Notes
Plasmid                                2,000-10,000          Can control the size
Cosmid                                 40,000
BAC (Bacterial Artificial Chromosome)  70,000-300,000
YAC (Yeast Artificial Chromosome)      > 300,000             Not used much recently
10DNA Sequencing gel electrophoresis
- Start at primer (restriction site)
- Grow DNA chain
- Include dideoxynucleoside (modified a, c, g, t)
- Stops reaction at all possible points
- Separate products by length, using gel electrophoresis
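The chain-termination steps above can be sketched as a toy simulation. This is a deliberately simplified model, not a description of real chemistry: it ignores strand orientation and assumes every stop point is observed exactly once. The names `terminated_fragments` and `read_gel` are hypothetical helpers introduced for this example.

```python
import random
from typing import List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_fragments(template: str, seed: int = 0) -> List[str]:
    """Return every chain-terminated product, in shuffled (unsorted) order."""
    # The chain grown from the primer is complementary to the template.
    grown = "".join(COMPLEMENT[base] for base in template)
    # A dideoxynucleoside can stop the reaction after any base,
    # so the products are all prefixes of the grown chain.
    fragments = [grown[:i] for i in range(1, len(grown) + 1)]
    random.Random(seed).shuffle(fragments)
    return fragments


def read_gel(fragments: List[str]) -> str:
    """Sort products by length (as the gel does) and read each final base."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))


template = "ACGTGACT"
sequence = read_gel(terminated_fragments(template))
print(sequence)  # TGCACTGA
```

The key idea is that the last base of each product, read in order of product length, spells out the synthesized strand.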
11Method to sequence longer regions
genomic segment
cut many times at random (Shotgun)
Get one or two reads from each segment (about 900 bp per read)
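A minimal sketch of the shotgun step, assuming segments are cut at uniformly random positions and that one read is taken from each end of a segment. The function name `shotgun_read_pairs`, the segment length of 2,000, and the random toy genome are assumptions for illustration; a real second read would come from the opposite strand, which this sketch ignores.

```python
import random
from typing import List, Tuple


def shotgun_read_pairs(genome: str, n_fragments: int, fragment_len: int,
                       read_len: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Cut random segments and return a read from each end of every segment."""
    if not read_len <= fragment_len <= len(genome):
        raise ValueError("need read_len <= fragment_len <= len(genome)")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_fragments):
        # Pick a random cut position so the segment fits inside the genome.
        start = rng.randrange(len(genome) - fragment_len + 1)
        fragment = genome[start:start + fragment_len]
        pairs.append((fragment[:read_len], fragment[-read_len:]))
    return pairs


# Toy genome of random letters, used only to exercise the function.
toy_rng = random.Random(1)
genome = "".join(toy_rng.choice("ACGT") for _ in range(5000))
pairs = shotgun_read_pairs(genome, n_fragments=20, fragment_len=2000, read_len=900)
print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))
```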
12Reconstructing the Sequence (Fragment Assembly)
reads
Cover region with high redundancy
Overlap and extend reads to reconstruct the original genomic region
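The overlap-and-extend idea can be illustrated with a small greedy merger. This is one simple heuristic, assuming error-free reads from a single strand; it is not presented as the assembly method used in practice. The names `overlap` and `greedy_assemble` and the `min_len` threshold are choices made for this sketch.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Drop reads fully contained in another contig.
        contigs = [r for r in contigs
                   if not any(r != s and r in s for s in contigs)]
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [r for idx, r in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

Greedy merging works on this toy input because every overlap is unambiguous; repeated sequence is exactly what breaks that assumption.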
13Definition of Coverage
- Length of genomic segment: G
- Number of reads: N
- Length of each read: L
- Definition: coverage $C = N L / G$
- How much coverage is enough?
- Lander-Waterman model: $\Pr[\text{base pair not covered}] = e^{-C}$
- Assuming uniform distribution of reads, $C = 10$ results in 1 gapped region per 1,000,000 nucleotides
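The coverage definition C = N L / G and the Lander-Waterman estimate e^(-C) are easy to evaluate directly. The sketch below assumes uniformly placed reads, as the slide does; the function names and the example numbers (900-letter reads, a 3x10^9-letter genome) are chosen here for illustration.

```python
import math


def coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Coverage C = N * L / G."""
    return n_reads * read_len / genome_len


def prob_base_uncovered(c: float) -> float:
    """Lander-Waterman estimate of the chance a given base is in no read."""
    return math.exp(-c)


def reads_needed(target_coverage: float, read_len: int, genome_len: int) -> int:
    """Smallest N with N * L / G >= target coverage."""
    return math.ceil(target_coverage * genome_len / read_len)


c = coverage(n_reads=10_000, read_len=900, genome_len=900_000)
print(c)                                    # 10.0
print(f"{prob_base_uncovered(c):.2e}")      # 4.54e-05
print(reads_needed(10, 900, 3_000_000_000))  # 33333334
```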
14Repeats
- Bacterial genomes: 5%
- Mammals: 50%
- Repeat types
- Low-complexity DNA (e.g. ATATATATACATA)
- Microsatellite repeats $(a_1 \ldots a_k)^N$, with $k$ about 3-6
- (e.g. CAGCAGTAGCAGCACCAG)
- Transposons
- SINE (Short Interspersed Nuclear Elements)
- e.g., ALU: 300 bp long, $10^6$ copies
- LINE (Long Interspersed Nuclear Elements)
- 4,000 bp long, 200,000 copies
- LTR retroposons (Long Terminal Repeats (700 bp) at each end)
- cousins of HIV
- Gene families: genes duplicate, then diverge (paralogs)
- Recent duplications: 100,000 bp long, very similar copies
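As a small illustration of the microsatellite pattern, a short unit of 3 to 6 letters repeated back to back, the sketch below finds exact tandem repeats with a regular-expression backreference. It only detects perfect copies, so an imperfect repeat like the slide's example would be reported in pieces or missed. The function name `find_tandem_repeats` and its thresholds are assumptions for this example.

```python
import re
from typing import List, Tuple


def find_tandem_repeats(seq: str, min_unit: int = 3, max_unit: int = 6,
                        min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, unit, copies) for each exact tandem repeat found."""
    # Group 1 is the repeat unit; \1 requires further identical copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq)
    ]


hits = find_tandem_repeats("TTGACAGCAGCAGCAGTTGA")
print(hits)  # [(4, 'CAG', 4)]
```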
15Sequencing and Fragment Assembly
$3 \times 10^9$ nucleotides
50% of human DNA is composed of repeats
Error! Glued together two distant regions
16What can we do about repeats?
- Two main approaches
- Cluster the reads
- Link the reads
19Sequencing and Fragment Assembly
$3 \times 10^9$ nucleotides
ARB, CRD or ARD, CRB? (R is a repeat; A, B, C, D are the regions flanking its copies)
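The ARB, CRD versus ARD, CRB question can be made concrete with a toy example. When no read is long enough to span the repeat together with a letter on each side, the two layouts produce exactly the same reads. The sequences for A, B, C, D and R below are made up for this illustration, and `reads_of_length` is a hypothetical helper.

```python
from typing import List

# Hypothetical flanking regions and a repeat R (11 letters long).
A, B, C, D = "AAGT", "CCTA", "GGAC", "TTCG"
R = "ACGTACGGTCA"

layout_1 = [A + R + B, C + R + D]
layout_2 = [A + R + D, C + R + B]


def reads_of_length(regions: List[str], k: int) -> List[str]:
    """All length-k substrings of every region, sorted so lists can be compared."""
    return sorted(s[i:i + k] for s in regions for i in range(len(s) - k + 1))


# Reads shorter than the repeat cannot tell the two layouts apart.
print(reads_of_length(layout_1, 8) == reads_of_length(layout_2, 8))    # True
# Reads that span R plus one letter on each side (length 13) can.
print(reads_of_length(layout_1, 13) == reads_of_length(layout_2, 13))  # False
```

This is why longer-range information about how reads relate to each other is needed to decide between the layouts.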
21Strategies for whole-genome sequencing
1. Hierarchical: clone-by-clone
- Break genome into many long pieces
- Map each long piece onto the genome
- Sequence each piece with shotgun
- Example: Yeast, Worm, Human, Rat
2. Online version of (1): Walking
- Break genome into many long pieces
- Start sequencing each piece with shotgun
- Construct map as you go
- Example: Rice genome
3. Whole genome shotgun
- One large shotgun pass on the whole genome
- Example: Drosophila, Human (Celera),