Title: Optional Reading:
1 Optional Reading: Yeast as a Genomic Model
2 Midterm III, 5/24
- Closed Books, Closed Papers,
- Open Note Page
- one 8.5 x 11 inch page,
- two-sided OK,
- handwritten or typed (computer OK) text,
- NO CUT AND PASTE FIGURES,
- hand drawn figures are OK,
- Handed in with exam.
- Exceptions to these rules will not be tolerated.
3 Complete Genomic Sequences: DNA, Reagent for the 21st Century
- 2001
- 9 ARCHAEAL
- 36 BACTERIAL
- 6 EUKARYAL
- 2004 (May)
- 17 ARCHAEAL
- 144 BACTERIAL
- 39 EUKARYAL (rough and finished)
- 2004: 1,286 viral genomes, 547 organelles, others
4 Public Data Set
GBREL.TXT, Genetic Sequence Data Bank, February 15 2001
NCBI-GenBank Flat File Release 122.0, Distribution Release Notes
10,896,781 loci, 11,720,120,326 bases, from 10,896,781 reported sequences
GBREL.TXT, Genetic Sequence Data Bank, April 15 2004
NCBI-GenBank Flat File Release 141.0, Distribution Release Notes
33,676,218 loci, 38,989,342,565 bases, from 33,676,218 reported sequences
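The growth between the two releases can be worked out directly from the totals above. The following is a minimal sketch; the names `RELEASES`, `fold_increase`, and `mean_entry_length` are illustrative and not part of any GenBank tool.

```python
# Totals copied from the two GenBank release notes quoted on this slide.
RELEASES = {
    "122.0": {"date": "2001-02-15", "loci": 10_896_781, "bases": 11_720_120_326},
    "141.0": {"date": "2004-04-15", "loci": 33_676_218, "bases": 38_989_342_565},
}


def fold_increase(field, old="122.0", new="141.0"):
    """Ratio of a total ('loci' or 'bases') between two releases."""
    return RELEASES[new][field] / RELEASES[old][field]


def mean_entry_length(release):
    """Average number of bases per reported sequence in a release."""
    return RELEASES[release]["bases"] / RELEASES[release]["loci"]


if __name__ == "__main__":
    print(f"bases grew {fold_increase('bases'):.2f}-fold")
    print(f"loci grew {fold_increase('loci'):.2f}-fold")
    for name in RELEASES:
        print(f"release {name}: {mean_entry_length(name):.0f} bases per entry")
```

Both totals grew roughly threefold in a little over three years, while the average entry length changed only slightly.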
5 Genome Project Goals (General, From the Outset)
- Establish an integrated Web-based database and research interface,
- Assemble physical and genetic maps,
- Generate genomic and expressed (mRNA) gene sequences,
- Identify and annotate the complete set of genes encoded within a genome,
- Compile atlases of gene expression,
- Accumulate functional data (functional genomics, reverse genetics, proteomics, etc.),
- Characterize sequence diversity between and among organisms.
6 Post-Genomic Goals: 10 Year Schedule, Arabidopsis (2000 - 2010)
1- to 3-Year Goals
Develop essential genetic tools, including the following:
- comprehensive sets of sequence-indexed mutants, accessible via database search,
- whole-genome mapping and gene expression DNA chips,
- facile gene expression systems,
- produce antibodies against, or epitope tags on, all deduced proteins.
all done.
7 Post-Genomic Goals: 10 Year Schedule, Arabidopsis (2000 - 2010)
3- to 6-Year Goals
- Create a complete library of full-length cDNAs (cloned mRNAs).
- Construct defined deletions of linked, duplicated genes.
- Develop methods for directed mutations and site-specific recombination.
- Describe global mRNA expression profiles at organ, cellular, and subcellular levels under various environmental conditions.
- Develop global understanding of post-translational modification.
- Undertake global metabolic profiling at organ, cellular, and subcellular levels under various environmental conditions.
progressing.
8 Post-Genomic Goals: 10 Year Schedule, Arabidopsis (2000 - 2010)
10-Year Goals
- Artificial chromosomes.
- Identify cis regulatory sequences of all genes.
- Identify regulatory circuits controlled by each transcription factor.
- Determine biochemical function for every protein.
- Describe three-dimensional structures of members of every plant-specific protein family.
- Undertake systems analysis of the uptake, transport, and storage of ions and metabolites.
- Describe globally protein-protein, protein-nucleic acid, and protein-other interactions at organ, cellular, and subcellular levels under various environmental conditions.
- Survey genomic sequencing, and deep EST sampling from phylogenetic node species.
- Define a predictive basis for conservation versus diversification of gene function.
- Compare genomic sequences within species.
progressing.
9 Disclaimer: this review is heavily biased toward the public sequencing consortium.
10 Map First, then Sequence
Sequence First, then Map
11 Genome Sequencing Strategy 1
- Clone-by-Clone Approach
- Order clones along the genome, then sequence,
- not dependent on acceleration of sequencing capacity,
- not dependent on advanced computer analysis,
- not dependent on as-yet-undeveloped sequencing technologies,
- heavy up-front demand for human labor.
12 Clone-by-Clone Ordered Approach
Online Primer mapping.html
13 Genomic Libraries
How many clones to cover a genome?
14 Vectors (carry insert DNA)
Vector    Host      Inserts
Plasmid   E. coli   up to 15 kb
Phage     E. coli   up to 25 kb
Cosmid    E. coli   up to 45 kb (plasmid/phage hybrid)
BAC       E. coli   100-500 kb
YAC       Yeast     250-1000 kb
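15 Genomic Sequences and Coverage
$$N = \frac{\ln(1 - 0.9999)}{\ln(1 - v / 2{,}900{,}000{,}000)}$$
- N: number of clones needed for a 99.99% probability that any given sequence is represented,
- v: average vector insert size (bp); 2,900,000,000 bp is the genome size.
Vector (insert size)   Clones
plasmid (5 kb)         5.3 x 10^6
phage (20 kb)          1.3 x 10^6
BAC (125 kb)           2.2 x 10^5
YAC (500 kb)           27,000 clones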
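The coverage formula is easy to evaluate numerically. The sketch below assumes the same 2.9 x 10^9 bp genome and 99.99% probability used on the slide; the function name `clones_needed` and its parameters are illustrative.

```python
import math


def clones_needed(insert_bp, genome_bp=2_900_000_000, probability=0.9999):
    """Clones required so that a given sequence is present with `probability`.

    Implements N = ln(1 - P) / ln(1 - v / G), where v is the average insert
    size and G is the genome size, both in base pairs.
    """
    fraction = insert_bp / genome_bp
    # log1p(-x) computes ln(1 - x) accurately when x is very small.
    return math.log(1 - probability) / math.log1p(-fraction)


if __name__ == "__main__":
    for vector, insert_bp in [("plasmid", 5_000), ("phage", 20_000)]:
        print(f"{vector} ({insert_bp} bp): {clones_needed(insert_bp):.2e} clones")
```

Other insert sizes or genome sizes can be passed in the same way. Because the insert is tiny relative to the genome, N is close to inversely proportional to v.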
16 Bacterial Artificial Chromosomes (BACs)
- Universal Priming Sites,
- On the vector, flanking the genomic insert.
17 Clone-by-Clone Ordered Approach
18 Contigs (Contiguous Sequences)
Find overlapping ends, using:
- Sequence,
- Restriction Fragment Length Polymorphisms (RFLPs).
[Figure label: Clone 1]
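Finding overlapping ends by sequence can be illustrated with a small exact-match example. This is a simplified sketch: real assemblers tolerate sequencing errors and use quality scores, which this code does not attempt. The function names and the `min_overlap` parameter are illustrative.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_overlap` bases exists.
    """
    longest_possible = min(len(left), len(right))
    for k in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two reads into one contig if their ends overlap, else return None."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        return None
    return left + right[k:]


if __name__ == "__main__":
    read_a = "atatgtatattgaa"
    read_b = "attgaattacat"
    print(overlap_length(read_a, read_b))
    print(merge_reads(read_a, read_b))
```

Here the last six bases of `read_a` match the first six of `read_b`, so the two reads merge into a single 20-base contig.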
19 Sequence Contig
20 RFLP
Restriction enzymes cut DNA at specific recognition sequences,
Fragment lengths provide clone identification data.
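A restriction fingerprint can be simulated in a few lines. The sketch below assumes a linear clone and uses the EcoRI site (GAATTC, cut after the G) as the default; the function names are illustrative, and the test sequence is made up for the example.

```python
from collections import Counter


def digest(sequence, site="gaattc", cut_offset=1):
    """Fragment lengths from cutting a linear sequence at every `site`.

    `cut_offset` is the position within the site where the enzyme cuts
    (1 for EcoRI, which cuts between G and AATTC).
    """
    sequence = sequence.lower()
    site = site.lower()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


def shared_fragments(lengths_a, lengths_b):
    """Fragment lengths present in both fingerprints (multiset intersection)."""
    common = Counter(lengths_a) & Counter(lengths_b)
    return sorted(common.elements(), reverse=True)


if __name__ == "__main__":
    clone = "aaa" + "gaattc" + "cccc" + "gaattc" + "tt"
    print(digest(clone))
```

Two clones that share several fragment lengths are candidates for overlapping; on a gel the lengths are only measured approximately, so a real comparison would allow a tolerance rather than exact equality.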
22 Contigs (Contiguous Sequences)
Find overlapping ends
Merge good pairs of reads into longer contigs
- Find the minimal Tiling Path,
- minimum set of overlapping clones that cover the genome.
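Once clones have been placed on the map, choosing a minimal tiling path is an interval-covering problem. The following is a minimal sketch using the standard greedy approach, under the assumption that each clone's start and end coordinates are already known; the function name and the clone tuples are hypothetical.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Pick the fewest clones that cover [region_start, region_end).

    `clones` is a list of (name, start, end) tuples in map coordinates.
    Greedy rule: among clones that begin at or before the position covered
    so far, take the one that reaches furthest. Note that this simplified
    version accepts clones that merely abut; a real path would also require
    a minimum overlap between neighbors.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered_to = region_start
    index = 0
    while covered_to < region_end:
        best = None
        while index < len(ordered) and ordered[index][1] <= covered_to:
            if best is None or ordered[index][2] > best[2]:
                best = ordered[index]
            index += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


if __name__ == "__main__":
    clones = [("A", 0, 100), ("B", 50, 150), ("C", 90, 220),
              ("D", 140, 300), ("E", 210, 400), ("F", 250, 380)]
    print(minimal_tiling_path(clones, 0, 400))
```

In this made-up example, six clones cover the region but three of them (A, C, E) are enough, so only those three would need to be shotgun sequenced.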
23 Minimal Tiling Path
Shotgun Sequence Each Clone
24 Shotgun (self-quiz)
8x - 10x coverage: to shotgun sequence 10,000 bp, you'd need 80k - 100k bp of sequence, or 160 - 200 sequencing reactions.
But 10,000 bp, at 500 bp per sequencing reaction, could be done in as few as 20 sequencing reactions.
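The arithmetic on this slide can be checked with a short calculation. This sketch assumes the 500 bp read length stated above; the function name `reads_needed` is illustrative.

```python
import math


def reads_needed(target_bp, coverage, read_bp=500):
    """Sequencing reactions required to reach `coverage`-fold redundancy."""
    return math.ceil(target_bp * coverage / read_bp)


if __name__ == "__main__":
    for coverage in (1, 8, 10):
        print(f"{coverage}x coverage of 10,000 bp: "
              f"{reads_needed(10_000, coverage)} reactions")
```

At 1x the reads would have to tile the target perfectly end to end, which is the 20-reaction minimum; 8x and 10x correspond to 160 and 200 reactions.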
Why Shotgun?
25 Contigs
QC
27 Structural Genomic Strategies 2
- Whole Genome Assembly Approach
- Sequence first, then order,
- dependent on advances in computer analysis and sequencing technologies,
- dependent on automated labor.
28 WGA
29 Read Pairs (Mate End Pairs)
- Paired End Sequencing,
- sequence both ends of the vector insert, using vector derived primers,
- Maintain mate pair data.
[Figure: strand ends labeled 5' and 3']
30 Example Sequence Output (example 5 kb insert)
5' read (543 bp) - atatgtatattgaattacatacatattattaatg
cacatttttatccggagttgtggaccatagaaagacatattgactcctca
aagtaaattctgcatgttacattgaaatcataggctaaatttgagatgca
ctatttttagaaagtgtagagaaaaggacaggaagaaataagcgaaagct
ttggtaagccaccaaacctgattactggaagaaaagaaaaaagttccgag
aatagagttagatcgctggtgagggttttaaatggaacacaacaatggtt
gttttagagtgtgttattcttttgtatttataccttctcataggtttctt
gtaatacacgcttcttcctctctctccctctctcttatggcctcgtcttg
aaagcgtcttgcatgctaagagaaggctttagagcaaggagagaagggag
aagttgatttatacgtccatcggatatatcttctttttatatctgtctct
cttttaaggaagaaaaatggcgactgaattctcgtgggatgaaatcaaga
aagaaaatg...
- rest of insert (unsequenced, 3.9 kb) -
...ggcttgaaatatttggggcaaacaagcttgaagagaaatcagagaac
aagtttttgaaattcttggggttcatgtggaatcctctctcatgggttat
ggagtctgctgcaatcatggctattgttttagctaatggaggaggaaagg
cgccggattggcaagattttatcggtattatggtgttgcttatcatcaac
tccaccataagtttcatcgaggagaacaatgctggcaatgccgctgctgc
tctcatggcaaatcttgcaccaaagactaaggtatgcaaatttctcaata
catatatataggtatgtattttctaaaaaggagagttatataacctatgt
gtgaatgtaggtgttgagagatggtaaatggggggagcaagaggcttcaa
tcttggttccgggtgatttgataagcatcaaattgggtgacattgttcct
gctgatgctcgtctcctcgaaggagatcctttaaaaattgaccaatctgc
tcttactggtgaatcccttccaaccaccaaacacccaggagat - 3' read (540 bp)
plus trace data files associated with these
sequence runs.
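Maintaining mate pair data amounts to keeping the two end reads linked to their clone and its approximate insert size. The record below is a minimal sketch; the class name, field names, and clone identifier are hypothetical, and placeholder bases stand in for the actual reads shown above.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    """Two end reads from the same clone, kept together for assembly."""
    clone_id: str
    insert_bp: int       # approximate insert size of the clone
    read_5prime: str     # read from one end of the insert
    read_3prime: str     # read from the other end of the insert

    def unsequenced_bp(self):
        """Bases in the middle of the insert that neither read covers."""
        return self.insert_bp - len(self.read_5prime) - len(self.read_3prime)


if __name__ == "__main__":
    # Placeholder bases with the read lengths from the example slide.
    pair = MatePair("example_clone", 5_000, "a" * 543, "c" * 540)
    print(pair.unsequenced_bp())
```

With the read lengths from the example (543 bp and 540 bp on a 5 kb insert), about 3.9 kb remains unsequenced, matching the slide. The known insert size is what lets an assembler use the pair to constrain how far apart the two reads must lie.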
31 WGA
32 Structural Genomic Strategies 3 (Hybrid)
33 Monday
- WGA,
- Shotgun Sequencing,
- Hybrid Approach,
- Compartmentalized Shotgun Approach.
- Please read: Science 291: 1304-1315