DNA

1
DNA Sequence Comparison and Alignment
Nick Heppenstall (Biology), Michal Dvir (Mathematics/CS), Andrew Dittmore (Physics)
Under the guidance of Dr. Yung-Pin Chen (Mathematics)
2
Outline
  • Pi in base 4
  • DNA Overview
  • Markov Chains & Models
  • Sequence Alignment
  • Future Plans

3
π
in base 10: 3.14159
in base 4: 3.021003331
4
The Normality of Pi
Looking at π in base 4, the chance of seeing
2 is 1/4
22 is 1/16
222 is 1/64
2222 is 1/256
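The run probabilities above can be checked empirically. The following is a minimal sketch, not the presenters' code: it computes base-4 digits of π with Machin's formula in integer arithmetic and counts overlapping runs of 2s. The function names and the `guard` parameter are illustrative assumptions.

```python
def arctan_inverse(x, scale):
    """Integer approximation of scale * arctan(1/x) via the alternating Taylor series."""
    power = scale // x              # scale / x^(2k+1), starting at k = 0
    total = power
    x_squared = x * x
    k = 1
    sign = -1
    while power:
        power //= x_squared
        total += sign * (power // (2 * k + 1))
        sign = -sign
        k += 1
    return total


def pi_base4_digits(count, guard=16):
    """Return the integer digit 3 followed by `count` fractional base-4 digits of pi."""
    scale = 4 ** (count + guard)
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 16 * arctan_inverse(5, scale) - 4 * arctan_inverse(239, scale)
    pi_scaled //= 4 ** guard        # drop the guard digits that absorb rounding error
    digits = []
    for _ in range(count + 1):
        pi_scaled, digit = divmod(pi_scaled, 4)
        digits.append(digit)
    return digits[::-1]


def run_frequency(digits, run):
    """Fraction of overlapping windows of len(run) that equal `run`."""
    width = len(run)
    windows = len(digits) - width + 1
    hits = sum(1 for i in range(windows) if digits[i:i + width] == run)
    return hits / windows


digits = pi_base4_digits(5000)
for width in range(1, 5):
    observed = run_frequency(digits, [2] * width)
    print("2" * width, "observed", round(observed, 5), "expected", 1 / 4 ** width)
```

With only a few thousand digits the observed frequencies fluctuate around 1/4, 1/16, 1/64 and 1/256 rather than matching them exactly.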
5
Digits of π in base 4
3, 0, 2, 1, 0, 0, 3, 3, 3, 1, 2, 2, 2, 2, 0, 2,
0, 2, 0, 1, 1, 2, 2, 0, 3, 0, 0, 2, 0, 3, 1, 0,
3, 0, 1, 0, 3, 0, 1, 2, 1, 2, 0, 2, 2, 0, 2, 3,
2, 0, 0, 0, 3, 1, 3, 0, 0, 1, 3, 0, 3, 1, 0, 1,
0, 2, 2, 1, 0, 0, 0, 2, 1, 0, 3, 2, 0, 0, 2, 0,
2, 0, 2, 2, 1, 2, 1, 3, 3, 0, 3, 0, 1, 3, 1, 0,
0, 0, 0, 2, 0, 0, 2, 3, 2, 3, 3, 2, 2, 2, 1, 2, 0,
3, 2, 3, 0, 1, 0, 3, 2, 1, 2, 3, 0, 2, 0, 2, 1,
1, 0, 1, 1, 0, 2, 2, 0, 0, 2, 0, 1, 3, 2, 1, 2,
0, 3, 2, 0, 3, 1, 0, 0, 0, 1, 0, 3, 1, 3, 1, 3,
2, 3, 3, 2, 1, 1, 1, 0, 1, 2, 1, 2, 3, 0, 3, 3, 0,
3, 1, 0, 3, 2, 2, 1, 0, 0, 3, 0, 1, 2, 3, 0, 3,
0, 0, 0, 2, 2, 3, 0, 0, 2, 2, 1, 2, 3, 1, 3, 3,
0, 2, 1, 1, 3, 3, 0, 1, 1, 0, 0, 3, 1, 3, 1, 0,
3, 3, 3, 2, 0, 1, 0, 3, 1, 1, 1, 2, 3, 1, 1, 2,
3, 1, 1, 1, 0, 1, 3, 0, 0, 2, 1, 0, 1, 1, 3, 2,
1, 0, 2, 0, 1, 1, 2, 3, 1, 1, 1, 3, 1, 2, 1, 2,
0, 2, 1, 1, 3, 2, 1, 3, 3, 2, 3, 0, 1, 2, 3, 3,
1, 0, 1, 0, 3, 0, 1, 0, 0, 2, 3, 2, 2, 1, 2, 2,
1, 2, 0, 3, 1, 3, 3, 2, 3, 1, 1, 2, 2, 3, 0, 0,
2, 3, 3, 3, 3, 3, 1, 1, 3, 0, 2, 3, 1, 2, 3, 3,
1, 0, 0, 0, 1, 2, 2, 3, 1, 3, 3, 2, 3, 1, 3, 2,
3, 2, 0, 3, 2, 0, 1, 2, 2, 3, 3, 3, 2, 3, 1, 1,
2, 2, 2, 0, 2, 1, 2, 1, 3, 3, 2, 2, 1, 1, 2, 2,
3, 2, 2, 1, 3, 3, 0, 2, 1, 0, 0, 1, 0, 1, 1, 3,
3, 0, 1, 0, 2, 3, 0, 1, 3, 3, 3, 2, 1, 2, 1, 0,
2, 1, 0, 2, 2, 0, 1, 2, 1, 2, 1, 1, 0, 1, 3, 2,
3, 0, 3, 2, 1, 0, 1, 1, 2, 3, 0, 3, 3, 1, 3, 0,
0, 2, 0, 0, 0, 0, 1, 3, 3, 0, 2, 3, 2, 0, 2, 2,
0, 1, 1, 2, 0, 3, 2, 3, 3, 3, 0, 0, 1, 1, 2, 1,
2, 0, 3, 1, 2, 2, 1, 0, 2, 0, 0, 3, 1, 2, 0, 1,
3, 0 . . .
6
First 5000 digits of π in base 4.
7
Bases of the cowpox genome
t,a,g,t,a,a,a,a,t,t,a,a,a,t,t,a,a,t,t,a,t,a,a,a,a,
t,t,a,t,a,t,a,t,a,t,a,a,t,t,t,a,c,t,a,a,c,t,t,t,a,
g,t,t,a,g,a,t,a,a,a,t,t,a,a,t,a,a,t,a,t,a,t,a,a,g,
t,t,t,t,a,g,t,a,c,a,t,t,a,a,t,a,t,t,a,t,a,t,t,t,t,
a,a,a,t,a,t,t,t,t,a,t,t,t,a,g,t,g,t,c,t,a,g,a,a,a,
a,a,a,a,t,g,t,g,t,a,a,c,c,c,a,t,g,a,c,t,g,t,a,g,g,
a,a,a,c,t,c,t,a,g,a,g,g,g,t,a,a,g,a,a,a,g,a,t,c,g,a
,t,c,g,c,t,t,t,a,t,a,g,a,g,a,c,c,a,t,c,a,g,a,a,a,g
,a,g,g,t,t,t,a,a,t,a,t,t,t,t,t,g,t,g,a,g,a,c,c,a,t
,t,g,a,a,g,a,g,a,g,a,a,a,g,a,g,a,a,a,g,a,g,a,a,t,a
,a,a,a,a,t,a,t,t,t,t,a,g,t,g,a,c,t,c,c,a,t,c,a,g,a,
a,a,g,a,g,g,t,t,t,a,a,t,a,t,t,t,t,t,g,t,g,a,g,a,c,
c,a,t,t,g,a,a,g,a,g,a,g,a,a,a,g,a,g,a,a,a,g,a,g,a,
a,t,a,a,a,a,a,t,a,t,t,t,t,a,g,t,g,a,c,t,c,c,a,t,c,a
,g,a,a,a,g,a,g,g,t,t,t,a,a,t,a,t,t,t,t,t,g,t,g,a,g
,a,c,c,a,t,c,g,a,a,g,a,g,a,g,a,a,a,g,a,g,a,a,t,a,a
,a,a,a,t,a,t,t,t,t,t,g,t,a,a,a,a,c,t,t,t,t,t,t,a,t
,g,a,g,a,c,c,a,t,t,g,a,a,g,a,g,a,g,a,a,a,g,a,g,a,a
,t,a,a,a,a,a,t,a,t,t,t,t,t,g,t,a,a,a,a,c,t,t,t,t,t,
t,a,t,g,a,g,a,c,c,a,t,t,g,a,a,g,a,g,a,g,a,a,a
8
First 5000 bases of the cowpox genome.
9
Three-Dimensional Trajectories
[Figure panels: Pi (random) and DNA]
Source: H.T. Chang, N. Lo, W. Lu, C.J. Kuo,
Visualization and Comparison of DNA Sequences by
Use of Three-Dimensional Trajectories.
10
DNA
  • Deoxyribonucleic Acid
  • Double helix
  • Chain of nucleotide subunits
  • Four bases in DNA (A,T,C,G)
  • Hold information for maintaining life
  • Passed from parent(s) to offspring

11
Mutations
  • Environmental factors
  • Copying errors
  • Single base substitutions
  • Insertions/Deletions
  • Duplications
  • Translocations
  • Inversions

12
DNA sequence comparison
  • Homologous genes
  • Conserved sequences
  • Identify mutations
  • Forensics
  • Evolution

QUANTITATIVE!
13
Markov Chain
Definition: A collection of random variables
having the property that, given the present, the
future is conditionally independent of the past.
Example: Annual percentage migration between city
and country
[Diagram: two states, City and Country, with transition probabilities 0.03, 0.97, 0.95 and 0.05]
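A two-state chain like this one can be iterated directly. The sketch below assumes a particular reading of the diagram (5% of City residents move to the Country each year, 3% move the other way); the transcript does not say which arrow carries which number, so the labeling is an assumption, as are the function names.

```python
# Assumed labeling of the diagram: City -> Country 0.05, Country -> City 0.03.
# The slide may assign the probabilities the other way around.
TRANSITIONS = {
    "City": {"City": 0.95, "Country": 0.05},
    "Country": {"City": 0.03, "Country": 0.97},
}


def step(distribution):
    """Advance the population shares by one year."""
    return {
        target: sum(distribution[source] * TRANSITIONS[source][target]
                    for source in TRANSITIONS)
        for target in TRANSITIONS
    }


def evolve(distribution, years):
    """Apply `step` repeatedly; only the current shares matter, not the history."""
    for _ in range(years):
        distribution = step(distribution)
    return distribution


start = {"City": 1.0, "Country": 0.0}
print(evolve(start, 1))     # shares after one year
print(evolve(start, 200))   # shares after many years settle to a fixed split
```

Under this labeling the long-run split is the one that balances the two flows, 0.05 × City = 0.03 × Country, which gives City 0.375 and Country 0.625 regardless of the starting shares.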
14
Hidden Markov Model
A Hidden Markov Model is a Markov chain in which
each state (City/Country) generates an
observation, or emission (Pet). The hidden state
can be inferred by observing the emissions.
Example: Annual percentage migration between city
and country
[Diagram: two states, City and Country, with transition probabilities 0.03, 0.97, 0.95 and 0.05]
Emission probabilities, one line per state:
Cow 0.5, Dog 0.3, Cat 0.1, None 0.1
Cow 0.0, Dog 0.1, Cat 0.4, None 0.5
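Inferring the hidden states from a run of pet observations is commonly done with the Viterbi algorithm. The sketch below is one way to do it and rests on several assumptions that the transcript does not confirm: the first emission row (Cow 0.5) belongs to Country and the second (Cow 0.0) to City, City leaves with probability 0.05 and Country with 0.03, and both states are equally likely at the start.

```python
import math

STATES = ("City", "Country")
START = {"City": 0.5, "Country": 0.5}            # assumed, not on the slide
TRANSITIONS = {                                  # assumed labeling of the diagram
    "City": {"City": 0.95, "Country": 0.05},
    "Country": {"City": 0.03, "Country": 0.97},
}
EMISSIONS = {                                    # assumed row-to-state assignment
    "City": {"Cow": 0.0, "Dog": 0.1, "Cat": 0.4, "None": 0.5},
    "Country": {"Cow": 0.5, "Dog": 0.3, "Cat": 0.1, "None": 0.1},
}


def safe_log(p):
    """log(p), with log(0) treated as negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations):
    """Return the most probable state sequence for the observed pets."""
    score = {s: safe_log(START[s]) + safe_log(EMISSIONS[s][observations[0]])
             for s in STATES}
    back_pointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + safe_log(TRANSITIONS[r][s]))
            new_score[s] = (score[prev] + safe_log(TRANSITIONS[prev][s])
                            + safe_log(EMISSIONS[s][obs]))
            pointers[s] = prev
        back_pointers.append(pointers)
        score = new_score
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back_pointers):     # trace the best path backwards
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi(["None", "Cat", "Cat", "Cow"]))
```

Because City never emits a cow under these assumed numbers, any year with a cow is decoded as Country, while runs of cats and empty households are decoded as City.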
15
HMM State Transitions
[Diagram: transitions among the states Match, Mismatch and InDel]
States: Match, Mismatch and InDel
16
HMM Emissions
Match: A, C, G, T
Mismatch: A/C, A/G, A/T, C/G, C/T, G/T
InDel: A/-, C/-, G/-, T/-
Emissions: A, C, G and T
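To make the state and emission vocabulary concrete, the sketch below labels each column of an already aligned pair of sequences with its state and emission. The function names and the use of `-` for a gap are illustrative assumptions, and the example alignment is hypothetical.

```python
BASES = "ACGT"


def classify_column(x, y):
    """Return (state, emission) for one aligned column; '-' marks a gap."""
    if x == "-" and y == "-":
        raise ValueError("a column cannot pair a gap with a gap")
    if x == "-" or y == "-":
        base = y if x == "-" else x
        if base not in BASES:
            raise ValueError(f"unknown base: {base!r}")
        return "InDel", f"{base}/-"
    if x not in BASES or y not in BASES:
        raise ValueError(f"unknown base in column: {x!r}, {y!r}")
    if x == y:
        return "Match", x
    first, second = sorted((x, y))      # unordered pair, written alphabetically
    return "Mismatch", f"{first}/{second}"


def label_alignment(top, bottom):
    """Label every column of two equal-length aligned strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    return [classify_column(x, y) for x, y in zip(top, bottom)]


# Hypothetical aligned pair, for illustration only.
for state, emission in label_alignment("ACG-T", "ATGCT"):
    print(state, emission)
```

Writing mismatches as unordered pairs gives the six mismatch emissions listed on the slide rather than twelve ordered ones.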
17
Alignment/Comparison
  • Mutations are recorded in DNA
  • Allow for comparison/alignment
  • Types of alignment
      • Local
      • Global
      • Gapped
      • Ungapped

18
Scoring matrices
     A   C   G   T
A    1   0   0   0
C    0   1   0   0
G    0   0   1   0
T    0   0   0   1
Gap: -1
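The matrix above translates directly into a scoring function. This is a minimal sketch with illustrative names; the aligned pair in the usage line is hypothetical.

```python
GAP_SCORE = -1

# Identity matrix from the slide: 1 on the diagonal, 0 elsewhere.
SCORES = {(x, y): 1 if x == y else 0 for x in "ACGT" for y in "ACGT"}


def pair_score(x, y):
    """Score one aligned column; '-' marks a gap."""
    if x == "-" or y == "-":
        return GAP_SCORE
    return SCORES[(x, y)]


def alignment_score(top, bottom):
    """Sum the column scores of two equal-length aligned strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    return sum(pair_score(x, y) for x, y in zip(top, bottom))


# Hypothetical alignment: 4 matches, 1 mismatch, 1 gap.
print(alignment_score("ACGT-A", "ACCTTA"))
```

For the hypothetical pair, the total is 4(1) + 1(0) + 1(-1) = 3.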
19
Local alignment
Human: TATGGTGGCGAGCAAACGTTGCGTGCGTA
Mouse: GAGCAAA
20
Local alignment
Human: TATGGTGGCGAGCAAACGTTGCGTGCGTA
Mouse: GAGCAAA
Score: 0100000 = 1
21
Local alignment
Human: TATGGTGGCGAGCAAACGTTGCGTGCGTA
Mouse:  GAGCAAA
Score:  0010000 = 1
22
Local alignment
Human: TATGGTGGCGAGCAAACGTTGCGTGCGTA
Mouse:   GAGCAAA
Score:   0010000 = 1
23
Local alignment
Human: TATGGTGGCGAGCAAACGTTGCGTGCGTA
Mouse:          GAGCAAA
Score:          1111111 = 7
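The slide sequence above slides the short mouse fragment along the human sequence and scores each ungapped placement. A minimal sketch of that procedure, with illustrative function names, follows.

```python
HUMAN = "TATGGTGGCGAGCAAACGTTGCGTGCGTA"
MOUSE = "GAGCAAA"


def match_pattern(window, query):
    """Return a string of 1s (match) and 0s (mismatch), one per column."""
    return "".join("1" if a == b else "0" for a, b in zip(window, query))


def slide(target, query):
    """Yield (offset, pattern, score) for every ungapped placement of query on target."""
    for offset in range(len(target) - len(query) + 1):
        pattern = match_pattern(target[offset:offset + len(query)], query)
        yield offset, pattern, pattern.count("1")


for offset, pattern, score in slide(HUMAN, MOUSE):
    print(" " * offset + pattern, "=", score)

best_offset, best_pattern, best_score = max(slide(HUMAN, MOUSE), key=lambda item: item[2])
print("best offset:", best_offset, "score:", best_score)
```

The first three placements reproduce the patterns 0100000, 0010000 and 0010000 from the slides, and the placement nine bases in gives the full match 1111111 with a score of 7.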
24
Global alignment
Human: TATGGTGGCGAGCAAACGTTGCGTGCGTA
Mouse: CATTGTGGTGAGCAAAGCGGTGGGCGGGTA
25
Global alignment
14 matches, 16 mismatches
Score: 14(1) + 16(0) = 14
26
Global alignment
24 matches, 5 mismatches, 1 indel
Score: 24(1) + 5(0) + 1(-1) = 23
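Gapped global alignments are usually found by dynamic programming. The sketch below uses the standard Needleman-Wunsch recurrence with the scoring values from the matrix slide (match 1, mismatch 0, gap -1). The transcript does not say which method produced the slide's alignment, so this is a swapped-in standard technique, and its optimal score on the transcribed sequences need not equal the tally shown on the slide.

```python
HUMAN = "TATGGTGGCGAGCAAACGTTGCGTGCGTA"
MOUSE = "CATTGTGGTGAGCAAAGCGGTGGGCGGGTA"


def global_alignment_score(a, b, match=1, mismatch=0, gap=-1):
    """Best global alignment score of a and b (Needleman-Wunsch, linear gap cost)."""
    # previous[j] holds the best score for aligning a[:i-1] with b[:j].
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = previous[j] + gap          # gap in b
            left = current[j - 1] + gap     # gap in a
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]


print(global_alignment_score("ACGT", "AGT"))     # 3 matches and 1 gap
print(global_alignment_score(HUMAN, MOUSE))
```

Only two rows of the table are kept because the score alone is needed; recovering the alignment itself would require storing the full table or back-pointers.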
27
The scoring problem
28
Our Solution
What if we align the DNA sequence to a model,
instead of another sequence?
29
Why is this a solution?
Start with an initial model in which all
probabilities are equally likely. Then modify the
model recursively using one or more parent
sequences, so that the uniform starting
probabilities are replaced by updated values.
Initial model:
1/3   1/3   1/3
1/3   1/3   1/3
1/3   1/3   1/3
After recursive modification:
0.92  0.03  0.05
0.18  0.69  0.13
0.14  0.19  0.67
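The transcript does not say how the model is modified, so the following is only one plausible sketch: keep a table of transition counts that starts uniform (one pseudo-count per cell, giving 1/3 everywhere), add the transitions seen in each labeled state path, and renormalize each row. The counting rule, the function names and the example path are all assumptions.

```python
STATES = ("Match", "Mismatch", "InDel")


def uniform_counts(pseudo=1.0):
    """Start with equal pseudo-counts, i.e. a uniform 1/3 model."""
    return {s: {t: pseudo for t in STATES} for s in STATES}


def add_path(counts, path):
    """Add the transitions observed in one labeled state path."""
    for source, target in zip(path, path[1:]):
        counts[source][target] += 1
    return counts


def to_probabilities(counts):
    """Normalize each row of counts so that it sums to 1."""
    return {
        s: {t: counts[s][t] / sum(counts[s].values()) for t in STATES}
        for s in STATES
    }


counts = uniform_counts()
print(to_probabilities(counts)["Match"])            # uniform before any update

# Hypothetical state path from one parent sequence.
add_path(counts, ["Match", "Match", "Match", "Mismatch"])
print(to_probabilities(counts)["Match"])            # shifted toward Match/Match
```

Repeating `add_path` over more sequences moves each row further from 1/3 toward the observed transition frequencies, while every row keeps summing to 1.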
30
How does it score?
  1. Modification number
  2. Length of original sequence
  3. Transition matrix
  4. Each emission matrix

ACTGTGTAG
The Model
  • Match/Match
  • Match/Mismatch
  • Match/InDel
  • Mismatch/Match
  • ...

Without knowing the initial state, the algorithm
checks all possible state transitions and
emissions for a best fit to the model.
31
How does it score?
ACTGTGTAG
  1. Modification number
  2. Length of original sequence
  3. Transition matrix
  4. Each emission matrix

The Model
  • Match/Match
  • Match/Mismatch
  • Match/Indel

Now the previous state is defined, so we have
only 3 possible transitions to consider.
32
How does it score?
  1. Modification number
  2. Length of original sequence
  3. Transition matrix
  4. Each emission matrix

ACTGTGTAG
The Model
  • Mismatch/Match
  • Mismatch/Mismatch
  • Mismatch/Indel

This process will continue through the sequence,
calculating the score and remembering the best
fit to the model.
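Once the previous state is known, only three transitions need to be compared at each step. The sketch below shows that comparison and a running log-score for a whole state path. It assumes that the rows and columns of the modified matrix from the earlier slide are ordered Match, Mismatch, InDel, which the transcript does not confirm, and it leaves out emission probabilities because none are given.

```python
import math

STATES = ("Match", "Mismatch", "InDel")

# Assumption: matrix rows and columns are ordered Match, Mismatch, InDel.
TRANSITIONS = {
    "Match":    {"Match": 0.92, "Mismatch": 0.03, "InDel": 0.05},
    "Mismatch": {"Match": 0.18, "Mismatch": 0.69, "InDel": 0.13},
    "InDel":    {"Match": 0.14, "Mismatch": 0.19, "InDel": 0.67},
}


def best_next_state(previous):
    """Compare the three transitions out of `previous` and keep the most likely."""
    return max(STATES, key=lambda s: TRANSITIONS[previous][s])


def path_log_score(path):
    """Log-probability of a state path, assuming a uniform choice of first state."""
    score = math.log(1 / len(STATES))
    for source, target in zip(path, path[1:]):
        score += math.log(TRANSITIONS[source][target])
    return score


print(best_next_state("Mismatch"))
print(path_log_score(["Match", "Match", "Match"]))
print(path_log_score(["Match", "InDel", "Match"]))
```

Working in log space turns the product of many small probabilities into a sum, which keeps the score usable for long sequences.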
33
Future Plans
  • Create working Hidden Markov Model.
  • Find convergence as the Model is modified.
  • Apply similar model to codon analysis.
  • Develop DNA trajectories as an alternative
    approach to sequence comparison.

34
Modeling DNA with a Tetrahedron
35
Directional Vectors
G, A, C, T
36-48
AGTTCG
Directional vectors: G, A, C, T
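A trajectory of this kind can be sketched by giving each base a unit vector pointing toward one vertex of a regular tetrahedron and summing the vectors base by base. Which base goes with which vertex is not recoverable from the transcript, so the assignment below is an assumption, as are the names.

```python
import math

_S = 1 / math.sqrt(3)   # makes each direction a unit vector

# Assumed assignment of bases to the four tetrahedral directions.
DIRECTIONS = {
    "A": (_S, _S, _S),
    "C": (_S, -_S, -_S),
    "G": (-_S, _S, -_S),
    "T": (-_S, -_S, _S),
}


def trajectory(sequence):
    """Return the list of 3D points visited, starting at the origin."""
    point = (0.0, 0.0, 0.0)
    points = [point]
    for base in sequence.upper():
        dx, dy, dz = DIRECTIONS[base]
        point = (point[0] + dx, point[1] + dy, point[2] + dz)
        points.append(point)
    return points


for base, point in zip("AGTTCG", trajectory("AGTTCG")[1:]):
    print(base, tuple(round(c, 3) for c in point))
```

The four directions sum to zero, so a stretch with equal numbers of A, C, G and T returns to where it started, while a biased stretch drifts toward the over-represented bases.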
49
(No Transcript)
50
Change Points
51
Approximate Vectors Between Change Points
52
Quantify Regions Between Change Points
  • Trajectory Length
      • Tells the base count
  • Vector Direction
      • Tells the relative frequencies of each base
  • Vector Length vs. Trajectory Length
      • Tells how much the trajectory deviates from a
        straight line
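The three quantities above can be computed for any region once each base has a unit direction. The sketch below assumes one particular assignment of bases to tetrahedral directions (not given in the transcript) and uses illustrative names; how change points are detected is not covered.

```python
import math

_S = 1 / math.sqrt(3)

# Assumed assignment of bases to unit vectors toward tetrahedron vertices.
DIRECTIONS = {
    "A": (_S, _S, _S),
    "C": (_S, -_S, -_S),
    "G": (-_S, _S, -_S),
    "T": (-_S, -_S, _S),
}


def region_summary(sequence):
    """Summarize one region between change points."""
    net = [0.0, 0.0, 0.0]
    for base in sequence.upper():
        for axis, component in enumerate(DIRECTIONS[base]):
            net[axis] += component
    trajectory_length = len(sequence)        # one unit step per base
    vector_length = math.sqrt(sum(c * c for c in net))
    straightness = vector_length / trajectory_length if trajectory_length else 0.0
    return {
        "trajectory_length": trajectory_length,
        "vector": tuple(net),
        "vector_length": vector_length,
        "straightness": straightness,
    }


print(region_summary("AAAA")["straightness"])    # a straight line
print(region_summary("ACGT")["straightness"])    # returns to the start
print(region_summary("AGTTCG"))
```

A ratio near 1 means the region is dominated by one base and the trajectory is nearly straight; a ratio near 0 means the bases are balanced and the trajectory folds back on itself.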

53
DNA trajectories can be used to
  • Match patterns by grouping similar vectors
  • Find conserved regions (vectors that do not
    change from sequence to sequence)
  • Perform many local alignments to assemble global
    alignments

54
Thanks!
  • Kellar Autumn
  • Jeff Ely
  • Amanda Gassett
  • Deborah Lycan
  • Harvey Schmidt
  • Collin Trail
  • Greg Hermann
  • Matt Wilkinson

55
Work supported by
  • John S. Rogers Science Research Program