Physical Mapping of DNA - PowerPoint PPT Presentation


Provided by: Y860

Learn more at: https://www.cse.lehigh.edu

Transcript and Presenter's Notes

Title: Physical Mapping of DNA

1
Physical Mapping of DNA

Shanna Terry
March 2, 2004

2
Overview

Background
Types of Mapping
Mathematical Models
Enhanced Double Digest Problem

3
Background

Given a sequence of DNA, how do we figure out
where on some larger chromosome the sequence
lies?

4
Background II

Look for markers that match in both the
chromosome and the shorter sequence.
Markers: usually short, precisely defined
sequences.
match!

5
Background III

How do we create the original map?
Generate fingerprints (markers) with
Restriction site mapping
Hybridization

6
Background IV

But why? Can't we just extend the sequence
assembly techniques we've already learned?
NO! (with one exception)
Why not?
A chromosome isn't just 150k bp long.
Human chromosomes range in length from 51
million to 245 million base pairs.

7
Overview

Background
Types of Mapping
Mathematical Models
Enhanced Double Digest Problem

8
Restriction Site Mapping

In this situation, the fingerprint is the set of lengths
between restriction sites of given enzymes
(recall from previous lectures).
Make three copies of the target DNA: strings A, B, and C.
Apply one enzyme (α) to string A, another (β) to
string B, and both (α and β) to string C.
Line up the fragments in A and B so that they match C.
This is the double digest problem.

[Figure: restriction maps of A (6, 14, 7, 5), B (8, 3, 15, 4), and C (6, 2, 3, 9, 7, 1, 4)]

9
Restriction Site Mapping II

A variant is the partial digest approach.
Use only one enzyme, but allow it to act for
different time periods. Different restriction
sites will be cut in each run.
Fragment sizes: 6, 20, 27, 32; 14, 21, 26; 7, 12;
and 5.
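A minimal sketch of where those ten fragment sizes come from, assuming the underlying map is the fragment order 6, 14, 7, 5 and that every pair of cut positions (including the two ends) yields one fragment. The function name is hypothetical.

```python
from itertools import combinations


def partial_digest_lengths(fragment_order):
    """Return all pairwise distances between cut positions, sorted."""
    # Turn consecutive fragment lengths into absolute positions: 0, 6, 20, ...
    positions = [0]
    for length in fragment_order:
        positions.append(positions[-1] + length)
    # Every pair of positions bounds one possible partial-digest fragment.
    return sorted(hi - lo for lo, hi in combinations(positions, 2))


lengths = partial_digest_lengths([6, 14, 7, 5])
print(lengths)
```

The real partial digest problem runs the other way: given only the multiset of lengths, recover the positions.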

[Figure: restriction map with fragment lengths 6, 14, 7, 5]

10
Restriction Site Mapping III

[Figure: partial-digest fragments drawn against the map 6, 14, 7, 5: 6, 20, 14, 21, etc.]
Fragment sizes: 6, 20, 27, 32; 14, 21, 26; 7, 12;
and 5.

11
Hybridization Mapping

Check whether specific small sequences (called
probes) bind (hybridize) to fragments (clones).
The fingerprint is the subset of probes that
successfully hybridize to the clone.
If some portion of one clone's fingerprint
matches another's, the two clones are likely to come from
overlapping regions of the target.

12
Hybridization Mapping II

Probes x, y, and z bound to clone A; probes x, w, and z
bound to clone B. The clones overlap in x and z.
[Figure: probes along the target in the order y, x, z, w]
Except we don't know that much. We only know
which probes bind to which clones, not their order
or even their relative lengths!
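As a small illustration of comparing fingerprints, the sketch below stores each clone's fingerprint as a set of probe names and reports the probes shared by each pair of clones. The function name and the dictionary layout are assumptions made for this example.

```python
def shared_probes(fingerprints):
    """Map each pair of clones to the probes that hybridized to both."""
    clones = sorted(fingerprints)
    shared = {}
    for i, first in enumerate(clones):
        for second in clones[i + 1:]:
            common = fingerprints[first] & fingerprints[second]
            if common:  # keep only pairs with evidence of overlap
                shared[(first, second)] = common
    return shared


fingerprints = {"A": {"x", "y", "z"}, "B": {"x", "w", "z"}}
print(shared_probes(fingerprints))
```

Note that the result says nothing about the order of the probes along the target, which is exactly the missing information the slide points out.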

13
Overview

Background
Types of Mapping
Mathematical Models
Enhanced Double Digest Problem

14
Restriction Site Models

Back to the double digest problem: we've split
the strings A, B, C into fragments with two
enzymes.
We have the multisets made up of the fragment
lengths.
From the previous example:
A = {5, 6, 7, 14}
B = {3, 4, 8, 15}
C = {1, 2, 3, 4, 6, 7, 9}
Find permutations of A and B such that the
subintervals created by overlaying them are in
one-to-one correspondence with C. Not too bad, right?
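A brute-force sketch of this search, trying every ordering of A and B and comparing the resulting subintervals with C. One assumption is loud here: the slides print the largest B fragment as 15, but its subfragments listed later in the deck (1, 7, 9) sum to 17, and only 17 makes B add up to the same total length (32) as A and C, so the sketch uses 17. Function names are hypothetical.

```python
from itertools import accumulate, permutations


def double_digest_fragments(order_a, order_b):
    """Fragment lengths obtained when both sets of cut sites are applied."""
    cuts = sorted(set(accumulate(order_a)) | set(accumulate(order_b)))
    return [hi - lo for lo, hi in zip([0] + cuts, cuts)]


def solve_double_digest(a, b, c):
    """Return every (order of A, order of B) whose overlay matches C."""
    target = sorted(c)
    solutions = []
    for order_a in set(permutations(a)):
        for order_b in set(permutations(b)):
            if sorted(double_digest_fragments(order_a, order_b)) == target:
                solutions.append((order_a, order_b))
    return solutions


A = [5, 6, 7, 14]
B = [3, 4, 8, 17]  # assumption: 17 instead of the 15 printed on the slides
C = [1, 2, 3, 4, 6, 7, 9]
solutions = solve_double_digest(A, B, C)
print(len(solutions), "orderings match C")
```

The search space grows factorially with the number of fragments, which is why this only works for toy inputs.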

15
Restriction Site Models II

BAD NEWS
The double digest problem is NP-complete. It is
a generalization of the set-partition problem,
already known to be NP-complete.
To give you an idea: the number of solutions is
(k-1)! for k restriction sites.
BUT we will see a heuristic later.

16
Interval Graph Models

Model hybridization mapping in terms of interval
graphs.
Interval graph: a graph G derived from a
set of intervals. For each interval there is
a vertex in G. For each pair of intersecting
intervals there is an edge in G.
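The definition can be sketched directly in code. The interval coordinates below are made up for illustration (they are not taken from the slides), intervals are treated as closed, and the function name is hypothetical.

```python
from itertools import combinations


def interval_graph(intervals):
    """Build (vertices, edges) from a dict mapping name -> (start, end)."""
    vertices = sorted(intervals)
    edges = set()
    for u, v in combinations(vertices, 2):
        (lo_u, hi_u), (lo_v, hi_v) = intervals[u], intervals[v]
        # Two closed intervals intersect when each starts before the other ends.
        if lo_u <= hi_v and lo_v <= hi_u:
            edges.add((u, v))
    return vertices, edges


# Hypothetical intervals, chosen only to show the construction.
example = {"a": (0, 4), "b": (3, 8), "c": (6, 10), "d": (12, 15)}
vertices, edges = interval_graph(example)
print(vertices, sorted(edges))
```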

17
Interval Graph Models II

[Figure: example intervals a, b, c, d, e and the corresponding interval graph]

18
Interval Graph Models III

To apply this to the hybridization mapping
problem:
We create graphs with vertices representing
clones (fragments) and edges representing
overlaps between clones.
Two graphs: one for known overlaps and one for
known plus unknown overlap information (neither
is necessarily an interval graph).

19
Interval Graph Models IV

Now find the true interval graph (a subgraph of
the known-and-unknown graph) given the two
graphs.
[Figure: three graphs labeled Known Overlaps, Known/Unknown Overlaps, and Actual Interval Graph]
Hmm... not too easy.

20
Interval Graph Models V

MORE BAD NEWS!
This is NP-hard.
Maybe another model?
There are two other possible models for
hybridization mapping (described in the book).
But.
Those are NP-hard too!

21
Consecutive Ones Property

We're sick of NP-hard problems. Give us
something a little easier.
The Consecutive Ones Property (C1P) model can be
solved in linear time!
Assumptions:
The probes are unique.
There are no errors. (!!)
All of the correspondences of clones and probes
have been found. (!!!)

22
Consecutive Ones Property II

Build an n x m matrix, where n is the number of clones and m the number of
probes. Entry (i, j) is 1 if probe j hybridized to clone i and 0 otherwise.
[Figure: example matrix]
In the matrix above, probe 1 hybridized to clone 1, probe 2
hybridized to clone 1, probe 1 hybridized to
clone 3, and probe 4 hybridized to clone 3.

23
Consecutive Ones Property III

Find a permutation of the columns (probes) such
that all the 1s in each row (clone) are
consecutive.
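To make the goal concrete, here is a brute-force sketch that tries every column order. This is not the linear-time method the slides refer to (linear-time C1P testing is usually done with PQ-trees, following Booth and Lueker); it runs in factorial time and is only meant to show what a valid permutation looks like. Rows 1 and 3 follow the slide's example; row 2 is a made-up assumption, since the original matrix is not available. Function names are hypothetical.

```python
from itertools import permutations


def ones_are_consecutive(row):
    """True if all 1s in the row form one contiguous block (or there are none)."""
    ones = [i for i, bit in enumerate(row) if bit]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def find_c1p_order(matrix):
    """Return a column order giving every row consecutive 1s, or None."""
    for order in permutations(range(len(matrix[0]))):
        if all(ones_are_consecutive([row[j] for j in order]) for row in matrix):
            return order
    return None


matrix = [
    [1, 1, 0, 0],  # clone 1: probes 1 and 2 (from the slide)
    [0, 1, 1, 0],  # clone 2: probes 2 and 3 (assumed for illustration)
    [1, 0, 0, 1],  # clone 3: probes 1 and 4 (from the slide)
]
order = find_c1p_order(matrix)
print([j + 1 for j in order])  # probe numbers in a valid left-to-right order
```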

24
Consecutive Ones Property IV

This algorithm can be run in linear time!
Unfortunately, the assumption that there are no
errors isn't realistic, because biology isn't a
mathematical model. Probes may fail to bind, and DNA may
be replicated incorrectly.
And generalizations make the problem NP-hard
again!
We need a good heuristic.

25
Overview

Background
Types of Mapping
Mathematical Models
Enhanced Double Digest Problem

26
Enhanced Double Digest Problem

The Enhanced Double Digest (EDD) problem is
NP-hard in the general case, but if the lengths
of fragments in C (the string acted upon by both
types of enzymes) are distinct, it can be solved
in linear time!
Why do the fragments have to be distinct?
What if all the fragments are the same length?

27
Problem Formulation

We have the multisets A and B.
A = {6, 14, 7, 5}
B = {8, 3, 15, 4}
Take the actual fragments corresponding to each
member of either set (since the sets are only
lengths). Apply the other enzyme to each fragment
(i.e. apply enzyme β to fragments from A, and
vice versa) to create subfragments.
ABi is the multiset of subfragments created by
applying enzyme β to the i-th fragment of A; BAj is the multiset
created by applying enzyme α to the j-th fragment of B.

28
Problem Formulation II

Example
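8 -> {2, 6}, 5 -> {1, 4}
A = {5, 6, 7, 14}
B = {3, 4, 8, 15}
AB1 = {1, 4}, AB2 = {6}, AB3 = {7}, AB4 = {2, 3, 9}
BA1 = {3}, BA2 = {4}, BA3 = {2, 6}, BA4 = {1, 7, 9}
Note: the subfragments in BA4 sum to 17, not 15, so the length 15 given for that B fragment is probably a slide error for 17.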

[Figure: restriction maps of A (6, 14, 7, 5), B (8, 3, 15, 4), and C (6, 2, 3, 9, 7, 1, 4)]

29
Algorithm

Given A, B, ABi and BAj for all i, j, construct
an undirected graph that connects each element of
A and B to the elements of C in its corresponding ABi or BAj. Note that
all elements in C will be covered.
A: 5 6 7 14
C: 1 2 3 4 6 7 9
B: 3 4 8 15
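A sketch of this construction, using the node labels from the slides (for example "14A", "9C", "15B"). It assumes fragment lengths are distinct within A and within B, so they can serve as dictionary keys, and it adds a simple check that the result is a tree. Function and variable names are hypothetical.

```python
def build_edd_graph(ab, ba):
    """Connect each A and B fragment to its subfragments in C."""
    graph = {}
    for label, table in (("A", ab), ("B", ba)):
        for fragment, subfragments in table.items():
            for sub in subfragments:
                u, v = f"{fragment}{label}", f"{sub}C"
                graph.setdefault(u, set()).add(v)
                graph.setdefault(v, set()).add(u)
    return graph


def is_tree(graph):
    """A graph is a tree if it is connected and has |V| - 1 edges."""
    n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        for nbr in graph[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == len(graph) and n_edges == len(graph) - 1


# Subfragment tables from the slides' example (labels kept as printed).
AB = {5: [1, 4], 6: [6], 7: [7], 14: [2, 3, 9]}
BA = {3: [3], 4: [4], 8: [2, 6], 15: [1, 7, 9]}
graph = build_edd_graph(AB, BA)
print(sorted(graph["14A"]), is_tree(graph))
```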

30
Algorithm II

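Create a spanning tree.
Start at a random node and follow all paths from the
node; don't repeat edges.
[Figure: spanning tree. Main path: 6A - 6C - 8B - 2C - 14A - 9C - 15B - 1C - 5A - 4C - 4B. Hanging off 14A: 3C - 3B. Hanging off 15B: 7C - 7A.]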

31
Properties

The graph G will always be connected, and every
node in A and B will be adjacent only to nodes
from C. Each node from C connects to exactly one
node each from A and B.
If the problem can be solved, G will be a
spanning tree, and any subtree that hangs off
the longest path will be a 2-node path
(a dangler).

32
Properties II

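Danglers: 3C - 3B (hanging off 14A) and 7C - 7A (hanging off 15B).
[Figure: tree with main path 6A - 6C - 8B - 2C - 14A - 9C - 15B - 1C - 5A - 4C - 4B and the two danglers highlighted]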

33
Algorithm III

If the graph G is not a spanning tree, or if some
subtree hanging off the longest path is not a
dangler, then there is no valid permutation.
Otherwise, perform dangler-first search on the graph G.

34
Dangler-First Search

Traverse G starting at one end of a path S with
the largest number of edges, reading only the
nodes from C. On reaching a node with
degree greater than 2 (it must have a dangler), read
the C nodes of the hanging danglers first,
then continue along S. The resulting sequence is pc.
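One way to sketch this traversal, assuming G is already known to be a tree. The longest path is found with the standard two-pass breadth-first search for a tree's diameter, which is a choice made for this example and not something the slides specify. The edge list is read off the slides' tree figure; the `first_end` parameter and all function names are hypothetical.

```python
from collections import deque

EDGES = [
    ("6A", "6C"), ("6C", "8B"), ("8B", "2C"), ("2C", "14A"),
    ("14A", "3C"), ("3C", "3B"), ("14A", "9C"), ("9C", "15B"),
    ("15B", "7C"), ("7C", "7A"), ("15B", "1C"), ("1C", "5A"),
    ("5A", "4C"), ("4C", "4B"),
]


def adjacency(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, []).append(u)
    return graph


def farthest_path(graph, start):
    """BFS from start; return the path from start to a farthest node."""
    parent = {start: None}
    queue = deque([start])
    last = start
    while queue:
        last = queue.popleft()  # the final node popped is at maximum distance
        for nbr in graph[last]:
            if nbr not in parent:
                parent[nbr] = last
                queue.append(nbr)
    path = []
    while last is not None:
        path.append(last)
        last = parent[last]
    return path[::-1]


def dangler_first_search(graph, first_end=None):
    """Read the C nodes along a longest path S, danglers first."""
    far_end = farthest_path(graph, next(iter(graph)))[-1]
    spine = farthest_path(graph, far_end)  # a longest path in a tree
    if first_end is not None and spine[0] != first_end:
        spine.reverse()  # first_end is assumed to be an endpoint of S
    on_spine = set(spine)
    pc = []
    for node in spine:
        for nbr in sorted(graph[node]):
            if nbr in on_spine:
                continue
            # Anything hanging off S must be a dangler: a C node plus a leaf.
            tail = [x for x in graph[nbr] if x != node]
            if not nbr.endswith("C") or len(tail) != 1 or len(graph[tail[0]]) != 1:
                raise ValueError(f"subtree at {nbr} is not a dangler")
            pc.append(int(nbr[:-1]))
        if node.endswith("C"):
            pc.append(int(node[:-1]))
    return pc


pc = dangler_first_search(adjacency(EDGES), first_end="6A")
print(pc)
```

Starting from the other end of S gives the same sequence reversed, which corresponds to reading the map in the opposite direction.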

35
Dangler-First Search II

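pc = 6, 2, 3, 9, 7, 1, 4
[Figure: tree with main path 6A - 6C - 8B - 2C - 14A - 9C - 15B - 1C - 5A - 4C - 4B; dangler 3C - 3B hangs off 14A and dangler 7C - 7A hangs off 15B]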

36
Algorithm IV

The elements in each ABi form a consecutive
subsequence in pc. Likewise, the elements in
each BAj form a consecutive subsequence in
pc.
So pc is a valid permutation, meaning
you have the answer!
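A short sketch of this check, which also recovers the order of the A and B fragments from pc by sorting the fragments on where their blocks start. It assumes the lengths in C are distinct, as the linear-time case requires, and uses the fragment labels as printed on the slides. The function name is hypothetical.

```python
def fragment_order(table, pc):
    """Order fragments by the position of their subfragment block in pc.

    Raises ValueError if some fragment's subfragments are not consecutive.
    """
    position = {length: i for i, length in enumerate(pc)}
    starts = []
    for fragment, subfragments in table.items():
        spots = sorted(position[s] for s in subfragments)
        if spots[-1] - spots[0] + 1 != len(spots):
            raise ValueError(f"subfragments of {fragment} are not consecutive")
        starts.append((spots[0], fragment))
    return [fragment for _, fragment in sorted(starts)]


PC = [6, 2, 3, 9, 7, 1, 4]
AB = {5: [1, 4], 6: [6], 7: [7], 14: [2, 3, 9]}
BA = {3: [3], 4: [4], 8: [2, 6], 15: [1, 7, 9]}
print(fragment_order(AB, PC), fragment_order(BA, PC))
```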

37
Solution

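A: 6, 14, 7, 5
pc: 6, 2, 3, 9, 7, 1, 4
B: 8, 3, 15, 4
[Figure: tree with main path 6A - 6C - 8B - 2C - 14A - 9C - 15B - 1C - 5A - 4C - 4B; dangler 3C - 3B hangs off 14A and dangler 7C - 7A hangs off 15B]
[Figure: resulting restriction maps of A (6, 14, 7, 5), B (8, 3, 15, 4), and C (6, 2, 3, 9, 7, 1, 4)]

38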
Enhanced Double Digest Problem II

This can be solved in O(n) time!
A generalization (assuming a constant number of
duplicates) can also be solved in O(n) time.
The general enhanced double digest problem is
still NP-Hard. This can be shown by reduction
from the Hamiltonian Path problem.
