Title: Gene Synthesis using DNAWorks
1Gene Synthesis using DNAWorks
- Dr. David Hoover
- Helix Systems, SCB, CIT, NIH
2Gene Synthesis
- Several methods
- ligation - incredibly tedious and inefficient
- FokI - sequence dependent (type IIs r.e.)
- serial cloning - sequence dependent
- assembly or self-priming PCR
3Gene Synthesis Methods
Thermodynamically Balanced Conventional
Thermodynamically Balanced Inside-Out
4Protein Expression
- Protein/Structure Independent Factors
- promoters and upstream elements
- translational initiation and termination
- mRNA stability
- codon bias
- Protein/Structure Dependent Factors
- folding and aggregation
- proteolysis and degradation
- secretion and localization
5Codon Bias
6Synthetic Genes
- Benefits
- Codon use optimized for host
- Flexibility in subcloning
- Ease of complex mutagenesis
- Problems
- Time consuming
- Complicated
- Error-prone
7Commercial Sources
- Blue Heron Biotechnology (http://www.blueheronbio.com)
- DNA 2.0 (http://www.dna20.com/)
- GenScript Corporation (http://www.genscript.com/)
- BioNexus Inc. (http://www.genesynthesis.net/)
- Entelechon (http://www.entelechon.com/)
- GeneArt (http://www.geneart.com/)
- Codon Devices (http://www.codondevices.com/)
8Commercial Sources
- Typical costs
- $0.79 - $3.60 / bp
- Complexities?
- Intellectual property?
- 800 bp = $1000 (GenScript)
9Genes From Scratch
- oligos: $0.20 / nt (NIH discount)
- PCR reagents: $2 / reaction
- sequencing: $20 / 600 bp
- electrophoresis: $3 / gel
- labor: $20 / hr
- GFP: 238 aa, 714 bp, 20 oligos, 1134 nt, 2 reactions, 2 gels, 4 sequences, 10 hrs = $517
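The GFP total can be reproduced from the unit costs listed on this slide. The snippet below is only a bookkeeping sketch: the function name and dictionary keys are made up for illustration, and the currency is assumed to be US dollars.

```python
# Unit costs as listed on the slide (currency assumed to be US dollars).
UNIT_COST = {
    "oligo_nt": 0.20,      # per nucleotide of oligo
    "pcr_reaction": 2.0,   # per PCR reaction
    "sequencing": 20.0,    # per read of about 600 bp
    "gel": 3.0,            # per electrophoresis gel
    "labor_hr": 20.0,      # per hour of labor
}

def synthesis_cost(oligo_nt, reactions, gels, reads, hours):
    """Return the total cost of one in-house gene synthesis."""
    return (
        oligo_nt * UNIT_COST["oligo_nt"]
        + reactions * UNIT_COST["pcr_reaction"]
        + gels * UNIT_COST["gel"]
        + reads * UNIT_COST["sequencing"]
        + hours * UNIT_COST["labor_hr"]
    )

# GFP example from the slide: 1134 nt of oligos, 2 reactions, 2 gels,
# 4 sequencing reads, 10 hours of labor.
gfp_total = synthesis_cost(oligo_nt=1134, reactions=2, gels=2, reads=4, hours=10)
print(f"GFP total: {gfp_total:.2f}")
print(f"GFP per bp: {gfp_total / 714:.2f}")
```

The total comes to about 517, which works out to roughly 0.72 per bp for the 714 bp gene.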
11How to design oligos
- reverse-translate protein into DNA, optimum codon usage
- break into fragments of equal overlap Tm
- optimize
- hairpins / mRNA structure
- repeats / mispriming
- restriction site inclusion / exclusion
- length
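The first step, reverse translation with optimum codon usage, can be sketched as a lookup of the most frequent codon per residue. This is not how DNAWorks itself is implemented; it is a minimal illustration. The `BEST_CODON` table is an assumption built from the codon frequency excerpt shown elsewhere in this deck, and it covers only the six amino acids whose codons are all listed there.

```python
# Most frequent codon per amino acid, read off the "Fraction" values in
# the codon frequency excerpt in this deck. Only amino acids with a
# complete codon listing in that excerpt are included (assumption).
BEST_CODON = {
    "G": "GGC",  # fraction 0.34
    "E": "GAG",  # fraction 0.58
    "D": "GAC",  # fraction 0.54
    "V": "GTG",  # fraction 0.46
    "A": "GCC",  # fraction 0.40
    "K": "AAG",  # fraction 0.57
}

def reverse_translate(protein):
    """Map each residue to its most frequent codon and join the codons."""
    codons = []
    for residue in protein.upper():
        if residue not in BEST_CODON:
            raise ValueError(f"no codon listed for residue {residue!r}")
        codons.append(BEST_CODON[residue])
    return "".join(codons)

print(reverse_translate("GEDVAK"))
```

Always picking the single best codon corresponds loosely to a strict codon choice; a real design also has to trade codon quality against hairpins, repeats, and overlap Tm.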
12DNAWorks
http://helixweb.nih.gov/dnaworks/
13DNAWorks Output
181 TCTGGTGAAGGCGAGGGTGACGCGACCTACGGTAAACTCACTCTCAAAT
agaccact TGCCATTTGAGTGAGAGTTTAAGTAGACGTGG
<--- 4
S G E G E G D A T Y G K L T L K F I C T
7 --->
241 ggttccttggccgaccctggttac taccttctcttacggtgttcag
TGCCCGTTTGACGGCCAAGGAACCGGCTGG tc
<--- 6
T G K L P V P W P T L V T T F S Y G V Q
14DNAWorks Options
15DNAWorks Options
- Codon Frequency Table
- E. coli (standard, class II), H. sapiens, C. elegans, D. melanogaster, M. musculus, P. pastoris, R. norvegicus, S. cerevisiae, X. laevis
- Custom CFT
16
AmAcid  Codon      Number   /1000  Fraction
Gly     GGG     599428.00   16.49      0.25
Gly     GGA     597986.00   16.45      0.25
Gly     GGT     392298.00   10.79      0.16
Gly     GGC     814464.00   22.41      0.34
Glu     GAG    1441162.00   39.65      0.58
Glu     GAA    1043166.00   28.70      0.42
Asp     GAT     789799.00   21.73      0.46
Asp     GAC     914677.00   25.16      0.54
Val     GTG    1028789.00   28.30      0.46
Val     GTA     257442.00    7.08      0.12
Val     GTT     399567.00   10.99      0.18
Val     GTC     528840.00   14.55      0.24
Ala     GCG     271820.00    7.48      0.11
Ala     GCA     579156.00   15.93      0.23
Ala     GCT     672416.00   18.50      0.26
Ala     GCC    1018345.00   28.02      0.40
Arg     AGG     432954.00   11.91      0.21
Arg     AGA     434655.00   11.96      0.21
Ser     AGT     441137.00   12.14      0.15
Ser     AGC     706723.00   19.44      0.24
Lys     AAG    1163126.00   32.00      0.57
Lys     AAA     879684.00   24.20      0.43
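The last column of this table is the share of each codon among the synonymous codons for its amino acid, which can be recomputed from the raw counts. A minimal sketch using the four Gly rows (the function name is illustrative):

```python
# Raw codon counts for glycine, copied from the table above.
GLY_COUNTS = {"GGG": 599428, "GGA": 597986, "GGT": 392298, "GGC": 814464}

def codon_fractions(counts):
    """Return each codon's share among the synonymous codons given."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

fractions = codon_fractions(GLY_COUNTS)
# Print codons from most to least frequent.
for codon, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(codon, f"{frac:.2f}")
```

The same calculation only reproduces the listed fractions for amino acids whose codons all appear in the excerpt; Arg and Ser are shown with two of their six codons each, so their fractions cannot be rebuilt from these rows alone.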
17DNAWorks Options
- Parameters
- Annealing Temperature
- Oligo Length (random)
- Codon Frequency Threshold (random, strict, scored)
- Oligonucleotide, Na/K, Mg2+ Concentrations
- Number of Solutions
- TBIO
- No gaps in assembly
18DNAWorks Options
- Balancing act
- Fast, simple, cheap?
- Slow, complex, expensive? - reliable
- Reusable and interchangeable oligos?
19DNAWorks Options
- Others
- Restriction Site Screen (non-degenerate, degenerate sequences)
- Custom Site Screen (mind the format!)
- Weights (experimental)
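A restriction site screen amounts to searching the designed sequence, on both strands, for recognition sequences that should be excluded. The sketch below handles non-degenerate sites only and is not DNAWorks code. The two enzymes are common examples chosen for illustration, and the function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Recognition sequences of two common enzymes (both are palindromic).
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}

def screen_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for sites on either strand."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        # A site on the reverse strand appears as its reverse complement
        # on the forward strand; the set removes palindrome duplicates.
        patterns = {site, reverse_complement(site)}
        positions = sorted(
            i
            for pattern in patterns
            for i in range(len(seq) - len(pattern) + 1)
            if seq.startswith(pattern, i)
        )
        if positions:
            hits[name] = positions
    return hits

print(screen_sites("AAGAATTCTTGGATCCA"))
```

An empty result means none of the listed sites occur in the sequence.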
20DNAWorks Options
- Sequences
- protein (X = stop)
- nucleotide (can be degenerate)
- almost any file format
- reverse sequence
- fix sequence in gap
21DNAWorks Output
- Web output
- Input for DNAWorks (standalone version)
- Header
- Initial parameters
- Optimization log
- Final scores
- Final summary
22DNAWorks Output
- Total output
- Sequence blocks
- CFT blocks
- Pattern block
- Trials
- Final Summary
23DNAWorks Output
- Trial outputs
- Initial parameters
- Final DNA sequence
- Assembly
- Final scores
- Codon report
- Histograms
- Oligo sequences
24Scores / Penalties
- codon usage
- length
- melting temperature
- repeat
- pattern
- mispriming
- AT/GC contents
- gapfix
25Mutant Run
- Design oligos based on previous set of oligos
- Parameters taken from previous run
- For single mutation, will output 1 or 2 oligos only
26What to look for
- Final Summary
- Avoid misprimes and repeats
- Make sure overlaps are > 12 nt (Short)
- Tm range should not be > 3 °C (TmRange)
- Don't depend entirely on scores
- Arbitrary, somewhat dependent on length
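The two numeric checks on this slide are easy to script once the overlap lengths and overlap Tm values have been copied out of a trial. The function below is a hypothetical helper, not part of DNAWorks, and the example numbers are invented for illustration.

```python
def check_design(overlap_lengths, overlap_tms, min_overlap=13, max_tm_range=3.0):
    """Return warnings for short overlaps and for a wide Tm spread.

    min_overlap=13 encodes "overlaps > 12 nt"; max_tm_range=3.0 encodes
    "Tm range should not be > 3 degrees C".
    """
    warnings = []
    short = [n for n in overlap_lengths if n < min_overlap]
    if short:
        warnings.append(f"{len(short)} overlap(s) shorter than {min_overlap} nt")
    tm_range = max(overlap_tms) - min(overlap_tms)
    if tm_range > max_tm_range:
        warnings.append(f"Tm range {tm_range:.1f} exceeds {max_tm_range:.1f}")
    return warnings

# Hypothetical values for illustration only.
print(check_design([18, 20, 12], [62.0, 63.5, 66.1]))
print(check_design([18, 20, 19], [62.0, 63.5, 63.0]))
```

Misprimes and repeats still have to be read from the final summary itself; this only covers the two thresholds.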
27Tricks
- Choosing codons
- random - slower optimization, less constrained
- strict - for the fussy
- scored - if codon score really matters
- Tm, Length ranges, Number of Solutions
- To find the "very best" solution
- no more than 999
28Tricks
- Design multi-use and interchangeable oligos
- Flanking primers with standard overlaps
- Intersperse nucleotide elements between protein elements
- Gap-fix restriction sites
- Allow for mutations later on
- Random mutagenesis
- Nucleotide sequences can be degenerate
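To see what a degenerate nucleotide element stands for, it helps to expand it into the explicit sequences it encodes. The sketch below uses the standard IUPAC ambiguity codes; the function name is illustrative, and how DNAWorks handles degenerate input internally is not shown here.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate(seq):
    """List every explicit sequence encoded by a degenerate sequence."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]

print(expand_degenerate("GAR"))        # the two Glu codons
print(len(expand_degenerate("NNK")))   # size of an NNK codon library
```

The number of expansions grows multiplicatively with each degenerate position, so this is only practical for short elements such as a single randomized codon.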
29Tricks
- Thermodynamically Balanced Inside-Out Mode
- Multi-step PCR
- More controlled, reliable method
- Gao X., et al., Nucleic Acids Res 2003
- Random oligo lengths
- Faster, better optimization
- For the not-so-fussy
- Probably best for DNA-only genes
30Tricks
- Set Tm higher
- 64 °C - 70 °C
- longer oligos, extra purification ()
31Always double check!
- Nothing is foolproof
- Think carefully about what you need BEFORE starting work
- Always run final sequences through alternate program (EMBOSS, GCG-Lite)
- Make sure oligos are what you intended
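A quick independent check is to translate the final DNA and compare it with the intended protein. The sketch below is not a substitute for EMBOSS or GCG-Lite. It builds the standard genetic code table and tests the first 48 nt of the sequence fragment shown on the DNAWorks Output slide against the residues printed beneath it.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(dna):
    """Translate DNA in frame 1; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# Fragment starting at position 181 in the example output, and the
# residues printed beneath it.
designed = "TCTGGTGAAGGCGAGGGTGACGCGACCTACGGTAAACTCACTCTCAAA"
intended = "SGEGEGDATYGKLTLK"
print(translate(designed) == intended)
```

A stop codon is reported as `*`, so an unexpected `*` in the middle of the translation is an immediate red flag.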
32PCR
- Mix all oligos and additives
- Specific PCR protocols
- Analytical gel
- Isolate desired products
33Assembly Protocol
Component   Volume   Stock         Final
Oligos      1 µl     625 nM each   25 nM each
dNTPs       2 µl     2.5 mM each   0.2 mM each
H2O         19 µl
Buffer      2.5 µl   10X           1X
Pfu pol.    0.5 µl

Temperature                    Time      Cycles
95 °C                          2.0 min   1X
95 °C                          0.5 min
65 -> 55 °C (-0.5 °C/cycle)    0.5 min   20X
72 °C                          0.5 min
72 °C                          5 min     1X
4 °C                           hold
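The annealing step is a touchdown: it starts at 65 °C and drops by 0.5 °C in each of the 20 cycles. The sketch below lists the per-cycle annealing temperatures. It assumes the first cycle anneals at the start temperature and the decrement is applied after each cycle, which puts the last cycle at 55.5 °C. Thermocyclers differ in how they apply the step, so treat that convention as an assumption.

```python
def touchdown_schedule(start=65.0, step=-0.5, cycles=20):
    """Return the annealing temperature (degrees C) used in each cycle.

    Assumes cycle 1 runs at `start` and `step` is added after every cycle.
    """
    return [start + step * i for i in range(cycles)]

schedule = touchdown_schedule()
for cycle, temp in enumerate(schedule, start=1):
    print(f"cycle {cycle:2d}: anneal at {temp:.1f} C")
```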
34Amplification Protocol
Component   Volume   Stock         Final
PCR mix     2 µl     ?             ?
dNTPs       8 µl     2.5 mM each   0.2 mM each
3' primer   4 µl     10 µM         400 nM
5' primer   4 µl     10 µM         400 nM
Buffer      10 µl    10X           1X
H2O         70 µl
Pfu pol.    2 µl

Temperature   Time      Cycles
95 °C         2.0 min   1X
95 °C         0.5 min
62 °C         0.5 min   20X
72 °C         0.5 min
72 °C         5 min     1X
4 °C          hold
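The final concentrations in this recipe follow from simple dilution (stock concentration times volume added, divided by total volume). A small sketch that recomputes them from the listed volumes; the helper name is illustrative.

```python
def final_conc(stock, volume_ul, total_ul):
    """Dilution: final concentration, in the same unit as the stock."""
    return stock * volume_ul / total_ul

# Volumes from the amplification recipe, in microliters.
volumes_ul = {
    "PCR mix": 2, "dNTPs": 8, "3' primer": 4, "5' primer": 4,
    "Buffer": 10, "H2O": 70, "Pfu pol.": 2,
}
total_ul = sum(volumes_ul.values())

dntp_mM = final_conc(2.5, volumes_ul["dNTPs"], total_ul)          # stock 2.5 mM each
primer_nM = final_conc(10_000, volumes_ul["3' primer"], total_ul)  # stock 10 uM = 10000 nM
buffer_x = final_conc(10, volumes_ul["Buffer"], total_ul)          # stock 10X

print(f"total volume: {total_ul} ul")
print(f"dNTPs: {dntp_mM:.2f} mM each, primers: {primer_nM:.0f} nM, buffer: {buffer_x:.0f}X")
```

The same helper applies to the assembly reaction by swapping in its volumes and the 25 µl total.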
36Problems
- No product (complete failure)
- Wrong size product (mispriming)
- Mutations (2 out of 3 correct, 2 errors/kb)
- Sequencing is warranted...
37Fixes
- Optimize PCR conditions
- Break gene synthesis into steps (TBIO)
38Errors
p = mutation rate / 1000 nt / duplication (Cline et al., Nucleic Acids Res 24 (1996))
Taq polymerase: 0.008
KOD (Novagen): 0.0027
PfuUltra (Stratagene): 0.00043
The probability of a gene n bp in length having no errors using a polymerase with mutation rate p:
$p' = (1 - p)^n$
Therefore, p' for a 738 bp gene:
$p' = (1 - 0.00043)^{738} = 0.728$
39Errors
The number of clones needed to screen to find a correct gene with 95% confidence:
$N = \log(0.05) / \log(1 - p')$
Thus, $\log(0.05) / \log(1 - 0.728) \approx 2.3$, so 3 clones need to be sequenced.
From Wu et al., J Biotech 124 (2006)
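Both formulas are easy to evaluate for any gene length and polymerase. The sketch below applies them exactly as written on these slides, using the listed mutation rates; the function and dictionary names are illustrative.

```python
import math

# Mutation rates p as listed on the Errors slide.
ERROR_RATE = {"Taq": 0.008, "KOD": 0.0027, "PfuUltra": 0.00043}

def p_error_free(p, n):
    """Probability that a gene of n bp carries no errors: (1 - p)^n."""
    return (1.0 - p) ** n

def clones_to_screen(p_correct, confidence=0.95):
    """Smallest whole number of clones giving at least one correct clone
    with the requested confidence: log(1 - confidence) / log(1 - p')."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_correct))

n_bp = 738
for polymerase, p in ERROR_RATE.items():
    p_ok = p_error_free(p, n_bp)
    print(f"{polymerase}: p' = {p_ok:.3f}, clones = {clones_to_screen(p_ok)}")
```

For PfuUltra this reproduces the 0.728 and 3 clones quoted on the slides. With the Taq rate the same formulas call for more than a thousand clones, which is the argument for a high-fidelity polymerase.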
40Time
- Find protein of interest, design oligos, order oligos
- Run PCR, integrate into sequencing vector, transform
- Pick colony, grow overnight culture
- Miniprep construct, integrate into expression vector, transform
- Pick colony, grow overnight culture
- Run expression growth trials
- 1 week between concept and initial trial (at best!!)
- Can be automated and parallelized (96 well plates?)