Title: DNA Computing: Mathematics with Molecules
1DNA Computing: Mathematics with Molecules
Russell Deaton
Professor, Comp. Sci. & Engr.
The University of Arkansas
Fayetteville, AR 72701
rdeaton_at_uark.edu
2What is DNA Computing (DNAC)?
The use of biological molecules, primarily DNA,
DNA analogs, and RNA, for computational purposes.
3Why Nucleic Acids?
- Density (Adleman, Baum)
- DNA: 1 bit per nm³, 10²⁰ molecules
- Video: 1 bit per 10¹² nm³
- Efficiency (Adleman)
- DNA: 10¹⁹ ops / J
- Supercomputer: 10⁹ ops / J
- Speed (Adleman)
- DNA: 10¹⁴ ops per s
- Supercomputer: 10¹² ops per s
4What makes DNAC possible?
- Great advances in molecular biology
- PCR (Polymerase Chain Reaction)
- DNA Microarrays
- New enzymes and proteins
- Better understanding of biological molecules
- Ability to produce massive numbers of DNA molecules with specified sequence and size
- DNA molecules interact through template-matching reactions
5What is the typical methodology?
- Encoding: Map problem instance onto set of biological molecules and molecular biology protocols
- Molecular Operations: Let molecules react to form potential solutions
- Extraction/Detection: Use protocols to extract result in molecular form
6PHYSICAL STRUCTURE OF DNA
[Figure: DNA double helix. Labels: 20 Å diameter; 34 Å per helical turn; minor groove; major groove; sugar-phosphate backbone; nitrogenous base; central axis; 5' and 3' strand ends (3' OH, 5' C).]
7What is an example?
- Molecular Computation of Solutions to Combinatorial Problems
- Adleman, Science, v. 266, p. 1021.
9Algorithm
- Generate random paths through the graph.
- Keep only those paths that begin with v_in and end with v_out.
- If the graph has n vertices, then keep only those paths that enter exactly n vertices.
- Keep only those paths that enter all the vertices at least once.
- If any paths remain, say Yes; otherwise, say No.
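The five steps can be imitated in software. The following is a minimal sketch only, not the laboratory procedure: random walks stand in for strands that ligate at random, and set filters stand in for the separation steps. The function names, the 10% stopping probability, the sample count, and the four-vertex example graph are all assumptions made for illustration.

```python
import random

def random_path(edges, vertices, max_len, rng):
    """Grow one random path by repeatedly following a random outgoing edge."""
    path = [rng.choice(vertices)]
    while len(path) < max_len:
        successors = edges.get(path[-1], [])
        # Stop at a dead end, or at random (a strand that stops growing).
        if not successors or rng.random() < 0.1:
            break
        path.append(rng.choice(successors))
    return tuple(path)

def hamiltonian_paths(edges, n, v_in, v_out, samples=5000, seed=0):
    """Apply the steps of the algorithm to a sampled pool of paths."""
    rng = random.Random(seed)
    vertices = list(range(n))
    # Step 1: generate random paths through the graph.
    pool = {random_path(edges, vertices, 2 * n, rng) for _ in range(samples)}
    # Step 2: keep paths that begin with v_in and end with v_out.
    pool = {p for p in pool if p[0] == v_in and p[-1] == v_out}
    # Step 3: keep paths that enter exactly n vertices.
    pool = {p for p in pool if len(p) == n}
    # Step 4: keep paths that enter every vertex at least once.
    pool = {p for p in pool if set(p) == set(vertices)}
    return pool

# Hypothetical 4-vertex directed graph (not the graph from the slides).
example_edges = {0: [1, 2], 1: [2, 3], 2: [1, 3], 3: []}
survivors = hamiltonian_paths(example_edges, n=4, v_in=0, v_out=3)
# Step 5: if any paths remain, say Yes; otherwise, say No.
print("Yes" if survivors else "No", sorted(survivors))
```

Because the pool is sampled, a "No" from this sketch is only as trustworthy as the sample size; the same caveat applies to any random generation step.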
10INTER-STRAND HYDROGEN BONDING
[Figure: hydrogen bonds between adenine and thymine, with partial charges marked on the bonding atoms; each base connects to the sugar-phosphate backbone.]
11STRAND HYBRIDIZATION
[Figure: double-stranded DNA separates on heating (100 °C) and re-forms on cooling.]
12DNA LIGATION
Ligase joins 5' phosphate to 3' hydroxyl
13Encoding
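[Figure: encoding a graph with vertices 0, 1, and 2 in DNA.]
Vertex sequences: GCATGGCC (vertex 0), AGCTTAGG, ATGGCATG
Linker sequences, each complementary to the junction of two vertex sequences: CCGGTCGA, CCGGTACC
Hybridized paths:
GCATGGCCATGGCATG held by linker CCGGTACC
GCATGGCCAGCTTAGG held by linker CCGGTCGA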
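The relation between the sequences on this slide can be checked with a few lines of code. In this sketch, which assumes the 8-base sequences are split into two 4-base halves, each linker is the base-by-base Watson-Crick complement of the last half of one vertex sequence followed by the first half of the next. The function names are illustrative.

```python
# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Base-by-base complement, written aligned under the original strand."""
    return seq.translate(COMPLEMENT)

def linker(v_from, v_to):
    """Strand that bridges the junction between two vertex sequences."""
    half = len(v_from) // 2
    return complement(v_from[-half:] + v_to[:half])

v0 = "GCATGGCC"
print(linker(v0, "ATGGCATG"))  # CCGGTACC
print(linker(v0, "AGCTTAGG"))  # CCGGTCGA
```

The complement is printed without reversal so that it lines up with the slide; read 5' to 3', the linker strand would be the reverse of each printed string.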
15Massively Parallel Search
17DNA Polymerase
18POLYMERASE CHAIN REACTION
19Start V0, Stop V6
21GEL ELECTROPHORESIS - SIZE SORTING
[Figure: gel electrophoresis apparatus. Labels: samples, gel, buffer, two electrodes; slower and faster bands.]
22Right Length
24ANTIBODY AFFINITY
[Figure: affinity separation steps, illustrated with the sequence GTGGTACACTG and a biotin label (B).]
- Add oligo with biotin label
- Anneal: heat and cool
- Add paramagnetic-streptavidin particles
- Bind
- Isolate with magnet
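The separation step can be pictured as a filter on the pool. This is an idealized sketch under two assumptions: GTGGTACACTG is taken to be the biotin-labeled probe, and a strand is captured only if it contains the probe's exact reverse complement. Real hybridization tolerates some mismatches, which the sketch ignores. The pool strands are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the strand that pairs with seq, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def affinity_separate(strands, probe):
    """Split strands into those the probe captures and those left behind."""
    target = reverse_complement(probe)
    bound = [s for s in strands if target in s]
    unbound = [s for s in strands if target not in s]
    return bound, unbound

probe = "GTGGTACACTG"  # assumed to be the biotin-labeled oligo
pool = [
    "AAAA" + reverse_complement(probe) + "TTTT",  # contains the binding site
    "ACGTACGTACGTACG",                            # does not
]
bound, unbound = affinity_separate(pool, probe)
print(bound)    # strands isolated with the magnet
print(unbound)  # strands washed away
```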
25Every Vertex
27Hamiltonian Path
28Mismatches
29DNA Word Design
- Importance of Template-Matching Hybridization Reactions in DNA Computing (DNAC)
- Sequence design should implement DNAC architecture.
- Planned Hybridizations
- Problem Size
- Subsequent Processing Reactions
- Designed sequences should minimize unplanned cross-hybridizations.
- Consequences of Bad Designs: Errors and Poor Efficiency
30DNA Word Design
- Design problem is hard.
- As the number of sequences required to represent the problem increases, this constraint increasingly conflicts with the requirement of non-crosshybridization.
- How much of DNA sequence space is available for computation?
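One common software stand-in for the non-crosshybridization requirement is a Hamming-distance test. The sketch below is a simplification and not the in vitro approach of these slides: it assumes equal-length words and flags a pair when the two words are too similar, or when one is too close to the reverse complement of the other, so that the two strands could bind directly. The function names and the distance threshold are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the strand that pairs with seq, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))

def conflicts(words, min_distance):
    """List word pairs that violate the distance constraint."""
    bad = []
    for i, a in enumerate(words):
        for j in range(i, len(words)):
            b = words[j]
            # Distinct words that are nearly identical are hard to tell apart.
            if i != j and hamming(a, b) < min_distance:
                bad.append((a, b, "similar"))
            # A word close to the reverse complement of another word
            # (or of itself) can hybridize with it.
            if hamming(a, reverse_complement(b)) < min_distance:
                bad.append((a, b, "complementary"))
    return bad

print(conflicts(["AAAA", "TTTT"], 2))
print(conflicts(["AAAA", "CCCC"], 2))
```

Hamming distance ignores shifted alignments and thermodynamics, so passing this test is a necessary check at best, not a guarantee of good behavior in the test tube.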
31Why In Vitro?
- In Vitro Selection and Evolution
- PCR as tool for selection
- Ability to synthesize huge, random starting populations
- Mutagenesis
- Oligos manufactured under conditions for use
- Use massive parallelism of DNAC to solve word
design problem
32Protocol Outline
- Start with huge population of random sequences with attached primers.
- Anneal rapidly to quench oligos in mismatched configurations.
- Using temperature as a control, melt most mismatched pairs.
- Amplify and purify
- Repeat
35Experimental Results
36Experimental Results
37Latest Results
38DNA Memories
39Overview
[Figure: DNA memory overview, with stages Learning, Recall, and Output. Labels:]
- Input DNAs (unknown sequence)
- Tag1
- Random probe
- Sequences complementary to input DNAs
- Memory DNA strands (with the 3' end complementary to the input DNAs)
- New unknown input DNAs
- Labeled tag sequence complements
- Separates memory DNA strands that match or partially match the new inputs from those that don't match
40Learning
- Learning: Information acquired from examples rather than programmed
- Protocol to store input DNAs (possibly of unknown sequence)
- Higher level representation of the input sequences
- Not individual sequence memories but whole populations
- Clustering of input sequences in vitro
- Massively random and parallel copying or sampling depending on number of inputs and probes
42Base-by-Base Amplification
[Figure labels: Input DNA, Tag, Probe, Extension]
43Sampling
[Figure labels: Input DNA, Tag, Probe, Extension]
44Energy Surface Manipulation through Learning
Before Learning
After Learning
45Tags
- Non-Crosshybridizing Sequences
- Convenient for Input/Output in absence of input sequence information
- Manipulate memory without input sequences
- Implement DNA2DNA Computations (Landweber and Lipton, DNA 3)
46Recall
- Hybridization to retrieve memories
- Similar sequence patterns matched
- Pattern matching done against whole memory
- Single memory associated with single tags
- Memory composite of output on multiple tags
49Experiments
- Test learning and recall with plasmid
- Test of sensitivity in concentration
- Test coverage of input sequence space with plasmids (5 kbp) and E. coli (5 Mbp)
- Test sequence resolution of protocols
51Learning
Input 1 is a 3 kb linear DNA (pBluescript)
Input 2 is a 5 kb linear DNA (φX174)
52Recall
Plasmid inputs learned, similar sequences
recalled, and dissimilar not matched.
53Concentration Sensitivity
- Plasmids digested with Hpa II
- 1 µg pBluescript
- 10 ng to 800 ng φX174
- Blotted with φX174 memory
- 1 φX174 detected in background of pBluescript
54Input Space Coverage
- Randomly digested input
- Learning on both inputs
- Blots nearly identical
55E. coli
- E. coli digested
- 219 bp fragment of φX174 added
- Learning with and without fragment
- Fragment distinguished when learned
56Application
57Team
- Russell Deaton, University of Arkansas, Computer Science and Engineering
- Junghuei Chen, University of Delaware, Chemistry and Biochemistry
- Hong Bi, University of Delaware, Chemistry and Biochemistry
- Max Garzon, University of Memphis, Computer Science
- Harvey Rubin, University of Pennsylvania, School of Medicine
- David Wood, University of Delaware, Computer and Information Science
58Acknowledgement
- This work was supported by the NSF QuBIC program,
award number EIA-0130385