Title: Analysis
1 Analysis and Synthesis of Omes
DOE, Wed 3-Nov-2004, 11:30 AM
Thanks to: Agencourt, Ambergen, Atactic, BeyondGenomics, Caliper, Genomatica, Genovoxx, Helicos, MJR, NEN, Nimblegen, ThermoFinnigan, Xeotron/Invitrogen.
For more info see arep.med.harvard.edu
2 Systems Biology Loop
Synthetic Biology Tools
Metabolic optimality
Models
Experimental designs
(Systematic) Data
Flux Competitive growth
DNA RNA Polony-Seq
Syntheses Perturbations
Proteasome targeting Genome engineering
3 DOE Synthetic Genomes: Why?
- Cheaper/faster "standard biology", hypothesis testing
- Systems Biology: multiple simultaneous tests
- Viruses: aid strain transfer, generate variants, new haplotypes
- Anti-viral vaccines and therapeutics (including variants)
- In vitro: make products toxic in E. coli
- Microbes: interspecific hybrids (e.g. codon usage)
- Structural biology variants
- Rapid vaccine response to engineered bioterrorism
- Cell-mediated immunity, humoral
- Fix mismatch between genome analysis and synthesis
4 DOE Synthetic Genomes: Why?
- In vitro
- Microbial, Human: antimutators
- Artificial ecosystems (laboratory scales)
- Energy: aiding pathway improvement
- Industrial production: enzymes, SingleCellProtein, protein-drugs
- Remediation: hybrid genomes (opt. codons), combinatorial pathway (Maxygen, Diversa). Xylose, Oil
- Pharmaceuticals: combinatorial syntheses
- Nano science: combinatorial syntheses, complex nanosystems, more general nanoassembly (in reach of polymerases and ribosome-like factories)
- Health research: 10X faster results per current (cost/benefit)
- Hypothesize and test unknown gene combinations
- Synthetic standards (arrays, MS, quantitation, etc.)
- Agriculture: salt, cold, drought, pest tolerant hybrid genomes
5 Motif co-occurrence, comparative genomics, RNA clusters, and/or ChIP2-location data
P: 10^-6 to 10^-11
Bulyk, McGuire, Masuda, Church. Genome Res. 14:201-208
6 Synthetic testing of DNA motif combinations
[Figure: RNA ratio (motif- to wild type) for each flanking gene. Values shown: 1.3, 2.4 (1.3 in ΔargR), 1.1, 1.3, 0.7, 2.5, 0.2, 1.4, 1.4, 3.5]
Bulyk, McGuire, Masuda, Church. Genome Res. 14:201-208
7 Synthetic Genomes and Proteomes: Why?
- Test or engineer cis-DNA/RNA-elements
- Access to any protein (complex) including
- post-transcriptional modifications
- Affinity agents for the above.
- Mass spectrometry standards, protein design
- Utility of molecular biology DNA-RNA-Protein
- in vitro "kits" (e.g. PCR, SP6, Roche)
- Toward these goals design a chassis
- 115 kbp genome. 150 genes.
- Nearly all 3D structures known.
- Comprehensive functional data.
8 (PURE) translation utility
- Removing tRNA-synthetases, translational release-factors, RNases, proteases
- Selection of scFvs specific for HBV DNA polymerase using ribosome display. Lee et al. 2004 J Immunol Methods 284:147
- Programming peptidomimetic syntheses by translating genetic codes designed de novo. Forster et al. 2003 PNAS 100:6353
- High level cell-free expression and specific labeling of integral membrane proteins. Klammt et al. 2004 Eur J Biochem 271:568
- Cell-free translation reconstituted with purified components. Shimizu et al. 2001 Nat Biotechnol. 19:751-5
9 In vitro genetic codes
[Figure: codon table laid out by 5' base, second base, and 3' base; cells shown include mS, yU, eU]
80% average yield per unnatural coupling.
bK = biotinyllysine, mS = O-methylserine, eU = 2-amino-4-pentenoic acid, yU = 2-amino-4-pentynoic acid
Forster, et al. (2003) PNAS 100:6353-7
10 Mirror world: enzyme, parasite, predator resistance; access 2^n diastereomers (n chiral atoms)
- L-amino acids, D-ribose (rNTPs, dNTPs)
- Transition: EF-Tu, peptidyl transferase, DNA-ligase
- D-amino acids, L-ribose (rNTPs, dNTPs)
Dedkova, et al. (2003) Enhanced D-amino acid incorporation into protein by modified ribosomes. J Am Chem Soc 125, 6616-7
11 Oligos for 150 and 776 synthetic genes (for E. coli minigenome and M. mobile whole genome, respectively)
Forster, Church
12 Up to 760K Oligos/Chip: 18 Mbp for 700 raw (6-18K genes)
- <1K Oxamer: electrolytic acid/base
- 8K Atactic/Xeotron/Invitrogen: photo-generated acid (Sheng, Zhou, Gulari, Gao; U. Houston)
- 24K Agilent: ink-jet, standard reagents
- 48K Febit
- 100K Metrigen
- 380K Nimblegen: photolabile 5' protection (Nuwaysir, Smith, Albert)
Tian, Gong, Church
13 Improve DNA Synthesis Cost
- Synthesis on chips in pools is 5000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12), and bimolecular kinetics slow with the square of the concentration decrease!
- Solution: amplify the oligos, then release them.
[Figure: 10 / 50 / 10 layout; ss-70-mer (chip) => ds-90-mer => ds-50-mer, using 20-mer PCR primers with restriction sites at the 50-mer junctions]
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
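The kinetics point on this slide can be checked with a few lines of arithmetic. The sketch below assumes a simple second-order reaction (rate proportional to the product of two reactant concentrations) at fixed volume, with both reactants diluted by the same factor; the function name is illustrative, not from the slide.

```python
def bimolecular_slowdown(usual_molecules, chip_molecules):
    """Factor by which a second-order (A + B) rate drops when both
    reactants are diluted equally at constant volume (an assumption)."""
    dilution = usual_molecules / chip_molecules
    # rate ~ [A][B], so diluting both by `dilution` slows it by dilution**2
    return dilution ** 2


dilution = 1e12 / 1e6
slowdown = bimolecular_slowdown(1e12, 1e6)
print(f"dilution: {dilution:.0e}x, rate slowdown: {slowdown:.0e}x")
```

Under these assumptions a million-fold drop in amount costs roughly a 1e12-fold drop in reaction rate, which is the motivation for amplifying before release.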
14 Improve DNA Synthesis Accuracy via mismatch selection
Other mismatch methods: MutS (H, L)
Tian, Church
15 Genome assembly
50 75 125 225 425 825
100*2^(n-1)
Moving forward:
1. Tandem, inverted and dispersed repeats (hierarchical assembly, size-selection and/or scaffolding)
2. Reduce mutations (goal <1e-6 errors) to reduce the number of intermediates
3. >30 kbp homologous (Nick Reppas)
4. Phage integrase site-specific recombination, also for counters.
Stemmer et al. 1995. Gene 164:49-53; Mullis 1986 CSHSQB.
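The size series on this slide (50, 75, 125, 225, 425, 825) is consistent with repeatedly joining two equal-length fragments that share a fixed overlap. The sketch below reproduces it under that reading; the 25 bp overlap is inferred from the numbers and is not stated on the slide, and the function name is illustrative.

```python
def assembly_sizes(start, overlap, rounds):
    """Product length after each round of pairwise joining, assuming two
    equal-length fragments overlap by `overlap` bases in every round."""
    sizes = [start]
    for _ in range(rounds):
        # two fragments of the current size, minus the shared overlap
        sizes.append(2 * sizes[-1] - overlap)
    return sizes


print(assembly_sizes(start=50, overlap=25, rounds=5))
# [50, 75, 125, 225, 425, 825]
```

Each round nearly doubles the product, so lengths grow exponentially with the number of rounds.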
16 All 30S-Ribosomal-protein DNAs (codon re-optimized)
[Figure labels: 1.7 kb, 0.3 kb]
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
17 Improving synthesis accuracy 9-fold
Method                      Total bp  Clones  Transition  Transversion  Deletion  Addition  Bp/error
Hyb selection, PCR             23641       9           7             3         5         2      1391
Gel selection, PCR             24546      35          28            12        11         3       455
No selection, ligation PCR      6093      25           6             6        22         4       160
No selection, PCR               9243      21          25            13        19         1       159
Tian, Church
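The Bp/error column matches total bp divided by the sum of the four error counts, and the 9-fold figure follows from comparing the best and worst rows. A small sketch using only the table values (variable and function names are illustrative):

```python
# (total bp, transitions, transversions, deletions, additions) per method
rows = {
    "Hyb selection, PCR": (23641, 7, 3, 5, 2),
    "Gel selection, PCR": (24546, 28, 12, 11, 3),
    "No selection, ligation PCR": (6093, 6, 6, 22, 4),
    "No selection, PCR": (9243, 25, 13, 19, 1),
}


def bp_per_error(total_bp, *error_counts):
    """Bases sequenced per observed error of any type."""
    return total_bp / sum(error_counts)


rates = {method: bp_per_error(*counts) for method, counts in rows.items()}
for method, rate in rates.items():
    print(f"{method:28s} {rate:7.0f} bp/error")

fold = rates["Hyb selection, PCR"] / rates["No selection, PCR"]
print(f"improvement: {fold:.1f}-fold")
```

The ratio comes out a little under 9, which rounds to the 9-fold in the slide title.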
18 Extreme mRNA makeover for protein expression in vitro
RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 detectable initially. RS-1, 3, 7, 8, 11, 14, 18, 19, 20 initially weak or undetectable.
Solution: iteratively resynthesize all mRNAs with less mRNA structure.
Western blot based on His-tags
Tian, Church
19 Systems Biology Loop
[Diagram repeated from slide 2]
20 Why sequence?
- Cancer: mutation sets for individual clones, loss-of-heterozygosity
- Pathogen "weather map", biowarfare sensors
- RNA splicing and chromatin modification patterns
- Synthetic biology and lab selections
- Antibodies or "aptamers" for any protein
- B and T-cell receptor diversity: temporal profiling, clinical
- Preventative medicine: genotype-phenotype associations
- Cell-lineage during development
- Phylogenetic footprinting, biodiversity
Shendure et al. 2004 Nature Rev Gen 5, 335.
21 Sequencing single molecules
Ecosystem studies really need single-cell amplification because of multiple chromosomes (and RNAs).
(Even an 80% genome coverage is better than 100 kb BACs.)
22 Single bacterial chromosome amplification
Ratio to unamplified hybridization along the chromosome of Escherichia and Prochlorococcus on Affymetrix chips.
23 Convergence on non-electrophoretic tag sequencing methods?
- Tag length (bp): >400, 14-26, 20, 100, 26 (2 ends)
- Method: EST, SAGE, MPSS, 454, Polony-Seq
- Single-molecule vs. amplified single molecule
- Array vs. bead packing vs. random
- Rapid scans vs. long scans (chemically limited, 454)
- Number of immobilized primers:
- 0: Chetverin '97 "Molecular Colonies"
- 1: Mitra '99 => Agencourt "Bead Polonies"
- 2: Kawashima '88, Adams '97 => Lynx/Solexa "Clusters"
http://arep.med.harvard.edu/Polonator/Plone.htm
24 Polony Fluorescent In Situ Sequencing Libraries
1 to 100 kb genomic
2x20 bp after MmeI (BceAI, AcuI)
[Figure labels: L, R, M, M; sequencing primers; PCR bead]
Greg Porreca, Abraham Rosenbaum
Dressman et al. PNAS 2003 (emulsion)
25 Cleavable dNTP-Fluorophore (terminators)
Reduce or photo-cleave
Mitra, RD, Shendure, J, Olejnik, J, Olejnik, EK, and Church, GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65
26 Polony-FISSeq: up to 2 billion beads/slide
0.5 of full gel area
27 Polony-FISSeq: up to 2 billion beads/slide
Cy5 primer (570nm), Cy3 dNTP (666nm)
Jay Shendure
28 Polony FISSeq Stats
- Number of bases sequenced (total): 23,703,953
- Number of bases sequenced (unique): 73
- Avg fold coverage: 324,711 X
- Pixels used per bead (analysis): 3.6
- Read length per primer: 14-15 bp
- Insertions: 0.5
- Deletions: 0.7
- Substitutions (raw): 4e-5
- Throughput: 360,000 bp/min
Current capillary sequencing: 1400 bp/min (600X speed/cost ratio, 5K/1X). (This may omit PCR, homopolymer, context errors.)
Shendure
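Two of the figures on this slide can be cross-checked from the others. The sketch below recomputes the average fold coverage and the raw throughput ratio against capillary sequencing. Note that the raw throughput ratio is not the same quantity as the 600X speed/cost ratio quoted on the slide, which also involves cost. Variable names are illustrative.

```python
total_bases = 23_703_953      # bases sequenced (total)
unique_bases = 73             # bases sequenced (unique)
polony_bp_per_min = 360_000
capillary_bp_per_min = 1_400

fold_coverage = total_bases / unique_bases
throughput_ratio = polony_bp_per_min / capillary_bp_per_min

print(f"average fold coverage: {fold_coverage:,.0f}X")
print(f"raw throughput ratio:  {throughput_ratio:.0f}X")
```

The coverage agrees with the slide's 324,711 X once the fractional part is dropped.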
29 Systems Biology Loop
[Diagram repeated from slide 2]
30.
31 High accuracy special case: homopolymers (e.g. AAA, CC, etc.)
- Use "compressed" tags , ACG ACCGACCCG
- Quantitate incorporation
- Reversible terminators
- "Wobble sequencing"
- All of these work.
- Maintenance of amplification fidelity using
linear amplification from initial genomic
fragment
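One reading of the "compressed" tags bullet is that each homopolymer run is collapsed to a single base, so tags that differ only in run length compare equal. A minimal sketch under that assumption (the function name is illustrative, not from the slide):

```python
from itertools import groupby


def compress_homopolymers(seq):
    """Collapse each run of identical bases to one base."""
    return "".join(base for base, _run in groupby(seq))


for tag in ("ACG", "ACCG", "ACCCG", "ACCGACCCG"):
    print(tag, "->", compress_homopolymers(tag))
```

The trade-off is that run lengths are discarded, so compressed tags are less specific than full-length tags of the same original size.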
32 "Wobble sequencing" for homopolymers
- 6 positions x 16 primers x 4 dNTPs => 13 bp (paired ends)
- CCTCATTCTCT AA dATP (then C, )
- CCTCATTCTCT AC dATP (then C, )
- . . .
- CCTCATTCTCTnnAA dATP (then C, )
- . . .
- CCTCATTCTCTnnNNnnNNnnTT dATP (then C, )
- 4.5/64 bp/cycle (for wobble sequencing) vs.
- 2.5/4 bp/cycle (for simple sequential base-extension)
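The primer pattern listed on this slide can be enumerated programmatically. The sketch below assumes each of the 6 positions adds two more degenerate bases (written here as lowercase n throughout, whereas the slide alternates nn/NN for readability) followed by one of the 16 dinucleotides; the constant and function names are illustrative.

```python
from itertools import product

ANCHOR = "CCTCATTCTCT"  # anchor sequence as shown on the slide


def wobble_primers(anchor=ANCHOR, positions=6):
    """Anchor + 2*p degenerate bases + each of the 16 dinucleotides."""
    primers = []
    for p in range(positions):
        spacer = "n" * (2 * p)
        for pair in product("ACGT", repeat=2):
            primers.append(anchor + spacer + "".join(pair))
    return primers


primers = wobble_primers()
print(len(primers), "primers,", len(primers) * 4, "primer/dNTP combinations")
print(primers[0], primers[1], primers[-1])

# Per-cycle yields quoted on the slide
print(f"wobble: {4.5 / 64:.3f} bp/cycle, sequential: {2.5 / 4:.3f} bp/cycle")
```

On this reading the dinucleotides cover 12 bases and the final single-base extension supplies the 13th; the per-cycle yield is lower than for simple sequential extension.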