Physical Mapping of DNA


Title: Physical Mapping of DNA


1
Physical Mapping of DNA
  • Shanna Terry
  • March 2, 2004

2
Overview
  • Background
  • Types of Mapping
  • Mathematical Models
  • Enhanced Double Digest Problem

3
Background
  • Given a sequence of DNA, how do we figure out
    where on some larger chromosome the sequence
    lies?
  • ?

4
Background II
  • Look for markers that match in both the
    chromosome and the shorter sequence.
  • Markers: usually short, precisely defined
    sequences
  • match!

5
Background III
  • How do we create the original map?
  • Generate fingerprints (markers) with:
  • Restriction site mapping
  • Hybridization

6
Background IV
  • But why? Can't we just expand the sequence
    assembly techniques we've already learned?
  • NO! (with one exception)
  • Why not?
  • A chromosome isn't just 150k bps long.
  • Human chromosomes range in length from 51
    million to 245 million base pairs.

7
Overview
  • Background
  • Types of Mapping
  • Mathematical Models
  • Enhanced Double Digest Problem

8
Restriction Site Mapping
  • In this situation, the fingerprint is the length
    between restriction sites of given enzymes
    (recall from previous lectures).
  • Make three copies of the target DNA: strings A,
    B, and C.
  • Apply one enzyme (α) to string A, another (β) to
    string B, and both (α and β) to string C.
  • Line up the fragments in A and B so they match C:
    this is the double digest problem.

[Figure: fragment lengths in map order. A: 6, 14, 7, 5; B: 8, 3, 15, 4; C: 6, 2, 3, 9, 7, 1, 4]
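As a rough illustration of the three digests, the sketch below cuts a target at given positions and reports the fragment lengths. The helper `digest` and the cut positions are assumptions chosen so that A and C match the figure. They give 17 for the third fragment of B, where the slide shows 15, because the lengths in A and C sum to 32 and 9 + 7 + 1 = 17.

```python
def digest(length, cut_sites):
    """Return fragment lengths, left to right, after cutting at cut_sites."""
    points = [0] + sorted(cut_sites) + [length]
    return [b - a for a, b in zip(points, points[1:])]

# Hypothetical cut positions on a 32-unit target (not given in the slides).
alpha_sites = [6, 20, 27]   # enzyme alpha
beta_sites = [8, 11, 28]    # enzyme beta

A = digest(32, alpha_sites)               # alpha only
B = digest(32, beta_sites)                # beta only
C = digest(32, alpha_sites + beta_sites)  # both enzymes

print(A, B, C)
```

In the real problem only the unordered lengths are observed; the positions used here are exactly what the mapping has to recover.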
9
Restriction Site Mapping II
  • A variant is the partial digest approach
  • Use only one enzyme, but allow it to act for
    different time periods. Different restriction
    sites will be recognized.
  • Fragment sizes: 6, 20, 27, 32; 14, 21, 26; 7,
    12; and 5

[Figure: target cut into fragments 6, 14, 7, 5 in map order]
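The ten fragment sizes listed above are the distances between every pair of points when the target is laid out as 6, 14, 7, 5. A minimal sketch, assuming the end points and restriction sites sit at positions 0, 6, 20, 27, 32 (the function name `partial_digest` is illustrative):

```python
from itertools import combinations

def partial_digest(sites):
    """All pairwise distances between sites, as a sorted multiset."""
    return sorted(b - a for a, b in combinations(sorted(sites), 2))

# Positions implied by fragments 6, 14, 7, 5 laid end to end.
sites = [0, 6, 20, 27, 32]
fragments = partial_digest(sites)
print(fragments)
```

The mapping problem runs in the opposite direction: given only the multiset of distances, recover the positions.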
10
Restriction Site Mapping III
  • 6
  • 20
  • 14
  • 21
  • etc
  • Fragment sizes: 6, 20, 27, 32; 14, 21, 26; 7, 12;
    and 5

[Figure: fragments formed from adjacent pieces of 6, 14, 7, 5, e.g. 6, 6+14, 14, 14+7]
11
Hybridization Mapping
  • Check whether specific small sequences (called
    probes) bind (hybridize) to fragments (clones)
  • The fingerprint is the subset of probes that
    successfully hybridize to the clone.
  • If some portion of one clone's fingerprint
    matches another, they are likely to be from
    overlapping regions of the target.

12
Hybridization Mapping II
  • Probes x, y, z bound to clone A; x, w, and z
    bound to clone B; the clones overlap in x and z.
  • y x z w
  • Except we don't know that much. We only know
    which probes bind to which clones. Not ordering
    or even relative lengths!
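A fingerprint can be held as a plain set of probe names, and a shared subset then suggests overlap. The sketch below uses the two clones from this slide; the dictionary layout and the helper `shared_probes` are assumptions for illustration.

```python
# Fingerprints: which probes hybridized to which clone (from the slide).
fingerprints = {
    "A": {"x", "y", "z"},
    "B": {"x", "w", "z"},
}

def shared_probes(fp, clone1, clone2):
    """Probes bound to both clones; a non-empty result hints at overlap."""
    return fp[clone1] & fp[clone2]

shared = shared_probes(fingerprints, "A", "B")
print(sorted(shared))
```

Sets carry no order, which matches the caveat above: the data say which probes bind, not where.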

13
Overview
  • Background
  • Types of Mapping
  • Mathematical Models
  • Enhanced Double Digest Problem

14
Restriction Site Models
  • Back to the double digest problem: we've split
    the strings A, B, C into fragments with two
    enzymes.
  • We have the multisets made up of the fragment
    lengths.
  • From the previous example:
  • A = {5, 6, 7, 14}
  • B = {3, 4, 8, 15}
  • C = {1, 2, 3, 4, 6, 7, 9}
  • Find permutations of A and B such that there is a
    one-to-one correspondence between all the
    subintervals and C. Not too bad, right?
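To make the search concrete, here is a brute-force sketch that tries every ordering of A and B and keeps those whose combined cut points reproduce C. It is not the heuristic discussed later, and it is exponential in the number of fragments. The function names are illustrative. The example call uses 17 instead of 15 in B, an assumption made because A and C both sum to 32 while B as listed sums to 30.

```python
from itertools import permutations

def cut_points(fragments):
    """Interior cut positions implied by fragments laid end to end."""
    points, pos = [], 0
    for f in fragments[:-1]:
        pos += f
        points.append(pos)
    return points

def double_digest(A, B, C):
    """Return all (order of A, order of B) pairs consistent with C."""
    total = sum(C)
    if sum(A) != total or sum(B) != total:
        return []  # the three digests must cover the same length
    target = sorted(C)
    solutions = []
    for pa in set(permutations(A)):
        cuts_a = cut_points(pa)
        for pb in set(permutations(B)):
            points = sorted(set([0, total] + cuts_a + cut_points(pb)))
            frags = [q - p for p, q in zip(points, points[1:])]
            if sorted(frags) == target:
                solutions.append((pa, pb))
    return solutions

# B uses 17 rather than the slide's 15 (assumption, see the text above).
solutions = double_digest([5, 6, 7, 14], [3, 4, 8, 17], [1, 2, 3, 4, 6, 7, 9])
print(((6, 14, 7, 5), (8, 3, 17, 4)) in solutions)
```

Every solution also appears mirrored, since reversing both orderings gives the same map read from the other end.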

15
Restriction Site Models II
  • BAD NEWS
  • The double digest problem is NP-complete. It is
    a generalization of the set-partition problem,
    already known to be NP-complete.
  • To give you an idea: the number of solutions is
    (k-1)! for k restriction sites.
  • BUT we will see a heuristic later

16
Interval Graph Models
  • Model hybridization mapping in terms of interval
    graphs
  • Interval graph: a graph G built from a set of
    intervals. For each interval there is a vertex
    in G. For each pair of intersecting intervals
    there is an edge in G.
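The definition translates directly into code: one vertex per interval, one edge per intersecting pair. The intervals below are hypothetical, not the ones from the slide's example, and `interval_graph` is an illustrative name.

```python
from itertools import combinations

def interval_graph(intervals):
    """intervals: dict name -> (start, end), closed. Returns a set of edges."""
    edges = set()
    for (u, (s1, e1)), (v, (s2, e2)) in combinations(sorted(intervals.items()), 2):
        if s1 <= e2 and s2 <= e1:  # the two closed intervals intersect
            edges.add((u, v))
    return edges

# Hypothetical intervals forming a chain of overlaps.
intervals = {"a": (0, 3), "b": (2, 6), "c": (5, 9), "d": (8, 10)}
edges = interval_graph(intervals)
print(sorted(edges))
```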

17
Interval Graph Models II
  • [Figure: example intervals a, b, c, d, e and the
    interval graph built from them]

18
Interval Graph Models III
  • To apply this to the hybridization mapping
    problem
  • We create graphs with vertices representing
    clones (fragments), and edges representing
    overlaps between clones.
  • Two graphs: one for known overlaps and one with
    known and unknown overlap information (neither
    is necessarily an interval graph).

19
Interval Graph Models IV
  • Now find the true interval graph (a subgraph of
    the known and unknown graph) given the two
    graphs.
  • [Figure: three graphs labeled "Known Overlaps",
    "Known/Unknown Overlaps", and "Actual Interval
    Graph"]
  • Hmm... Not too easy.

20
Interval Graph Models V
  • MORE BAD NEWS!
  • This is NP-hard.
  • Maybe another model?
  • There are two other possible models for
    hybridization mapping (described in the book).
    But...
  • Those are NP-hard too!

21
Consecutive Ones Property
  • We're sick of NP-hard problems. Give us
    something a little easier.
  • The Consecutive Ones Property Model (C1P) can be
    solved in linear time!
  • Assumptions
  • The probes are unique.
  • There are no errors. (!!)
  • All of the correspondences of clones and probes
    have been found. (!!!)

22
Consecutive Ones Property II
  • Build an n x m matrix for n clones and m probes.
    Entry (i, j) is 1 if probe j hybridized to
    clone i, and 0 otherwise.
  • Above: probe 1 hybridized to clone 1, probe 2
    hybridized to clone 1, probe 1 hybridized to
    clone 3, probe 4 hybridized to clone 3.

23
Consecutive Ones Property III
  • Find a permutation of the columns (probes) such
    that all the 1s in each row (clone) are
    consecutive.
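The property itself is easy to state in code. The sketch below checks it by trying every column order, which takes factorial time; it is not the linear-time method, which relies on PQ-trees (Booth and Lueker). Rows 1 and 3 of the matrix follow the example above, row 2 is a hypothetical filler, and the function names are illustrative.

```python
from itertools import permutations

def ones_consecutive(row):
    """True if all 1s in the row sit in one unbroken run."""
    ones = [i for i, v in enumerate(row) if v == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)

def find_c1p_order(matrix):
    """Return a column order making every row's 1s consecutive, or None."""
    n_cols = len(matrix[0])
    for order in permutations(range(n_cols)):
        if all(ones_consecutive([row[j] for j in order]) for row in matrix):
            return order
    return None

matrix = [
    [1, 1, 0, 0],  # clone 1: probes 1 and 2 (from the slide)
    [0, 1, 1, 0],  # clone 2: hypothetical
    [1, 0, 0, 1],  # clone 3: probes 1 and 4 (from the slide)
]
order = find_c1p_order(matrix)
print(order)
```

A matrix in which three probes must be pairwise adjacent, such as rows 110, 011, 101, has no valid order and yields `None`.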

24
Consecutive Ones Property IV
  • This algorithm can be run in linear time!
  • Unfortunately, the assumption that there are no
    errors isn't realistic, because biology isn't a
    mathematical model. Probes may fail to bind, and
    DNA may be replicated incorrectly.
  • And generalizations make the problem NP-hard
    again!
  • We need a good heuristic

25
Overview
  • Background
  • Types of Mapping
  • Mathematical Models
  • Enhanced Double Digest Problem

26
Enhanced Double Digest Problem
  • The Enhanced Double Digest (EDD) problem is
    NP-hard in the general case, but if the lengths
    of fragments in C (the string acted upon by both
    types of enzymes) are distinct, it can be solved
    in linear time!
  • Why do the fragments have to be distinct?
  • What if all the fragments are the same length?

27
Problem Formulation
  • We have the multisets A and B.
  • A: 6, 14, 7, 5
  • B: 8, 3, 15, 4
  • Take the actual fragments corresponding to each
    member of either set (since the sets are only
    lengths). Apply the other enzyme to the fragment
    (i.e. apply enzyme β to fragments from A, and
    vice versa) to create subfragments.
  • ABi is the multiset of subfragments created by
    applying enzyme β to the i-th fragment of A; BAj
    is the multiset created by applying enzyme α to
    the j-th fragment of B.

28
Problem Formulation II
  • Example
  • 8 -> {2, 6}; 5 -> {1, 4}
  • A = {5, 6, 7, 14}
  • B = {3, 4, 8, 15}
  • AB1 = {1, 4}, AB2 = {6}, AB3 = {7}, AB4 = {2, 3, 9}
  • BA1 = {3}, BA2 = {4}, BA3 = {2, 6}, BA4 = {1, 7, 9}

[Figure: fragment lengths in map order. A: 6, 14, 7, 5; B: 8, 3, 15, 4; C: 6, 2, 3, 9, 7, 1, 4]
29
Algorithm
  • Given A, B, ABi and BAj for all i, j, construct
    an undirected graph that connects each element of
    A and B to the elements of its corresponding
    ABi or BAj. Note that all elements in C will be
    covered.
  • A: 5 6 7 14
  • C: 1 2 3 4 6 7 9
  • B: 3 4 8 15
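One way to sketch the construction is an adjacency map keyed by labels such as "14A" or "9C", mirroring the node names on the slides. The function `build_graph` is illustrative, and it assumes all lengths within A, within B, and within C are distinct so that a length identifies a node.

```python
def build_graph(A, AB, B, BA):
    """Link each fragment of A and B to its subfragments in C."""
    graph = {}

    def link(u, v):
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)

    for frag, subs in zip(A, AB):
        for s in subs:
            link(f"{frag}A", f"{s}C")
    for frag, subs in zip(B, BA):
        for s in subs:
            link(f"{frag}B", f"{s}C")
    return graph

# Data as listed on the slides.
A = [5, 6, 7, 14]
AB = [[1, 4], [6], [7], [2, 3, 9]]
B = [3, 4, 8, 15]
BA = [[3], [4], [2, 6], [1, 7, 9]]

graph = build_graph(A, AB, B, BA)
print(sorted(graph["14A"]))
```

Each C node ends up with exactly two neighbors, one from A and one from B, because every subfragment belongs to one fragment of each single digest.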

30
Algorithm II
  • Create a spanning tree
  • Start at a random node, follow all paths from the
    node, don't repeat edges.
  • Main path: 6A - 6C - 8B - 2C - 14A - 9C - 15B -
    1C - 5A - 4C - 4B
  • Hanging off the path: 14A - 3C - 3B and
    15B - 7C - 7A

31
Properties
  • The graph (G) will always be connected, and every
    node in A and B will be adjacent only to nodes
    from C. Each node from C connects to exactly one
    node from A and one node from B.
  • If the problem can be solved, G will be a
    spanning tree, and any subtree that hangs off
    the longest path will be a 2-node path
    (a dangler).
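The tree condition can be tested with a standard check: a graph is a tree when it is connected and has exactly one edge fewer than it has nodes. A minimal sketch, with an adjacency map of sets as an assumed representation and `is_tree` as an illustrative name:

```python
def is_tree(graph):
    """graph: dict node -> set of neighbors (undirected)."""
    nodes = list(graph)
    if not nodes:
        return False
    edge_count = sum(len(nbrs) for nbrs in graph.values()) // 2
    if edge_count != len(nodes) - 1:
        return False
    # Depth-first search to confirm every node is reachable.
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        for nb in graph[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(nodes)

# Small hypothetical graphs: a path (tree) and a triangle (cycle).
path = {"6A": {"6C"}, "6C": {"6A", "8B"}, "8B": {"6C"}}
triangle = {"u": {"v", "w"}, "v": {"u", "w"}, "w": {"u", "v"}}
print(is_tree(path), is_tree(triangle))
```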

32
Properties II
  • Danglers: 14A - 3C - 3B and 15B - 7C - 7A
  • Main path: 6A - 6C - 8B - 2C - 14A - 9C - 15B -
    1C - 5A - 4C - 4B

33
Algorithm III
  • If the graph G is not a spanning tree, or not
    all subtrees hanging off the longest path are
    danglers, then there is no valid permutation.
  • Otherwise, perform Dangler-first search on the
    graph G.
34
Dangler-First Search
  • Traverse G starting at one end of a path S with
    the largest number of edges, reading only the
    nodes from C. On reaching a node with degree
    greater than 2 (it must carry a dangler), read
    the C nodes of its hanging danglers first, then
    continue along S. The resulting sequence is pc.
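The traversal can be sketched as follows, assuming G is already known to be a tree and is stored as an adjacency map with labels like "14A". The longest path is found with two breadth-first searches, a standard technique for trees that the slides do not spell out. All function names are illustrative.

```python
from collections import deque

def make_graph(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def farthest(graph, start):
    """BFS from start; return the last node reached and the parent map."""
    parent = {start: None}
    queue = deque([start])
    last = start
    while queue:
        last = queue.popleft()
        for nb in sorted(graph[last]):
            if nb not in parent:
                parent[nb] = last
                queue.append(nb)
    return last, parent

def longest_path(graph):
    u, _ = farthest(graph, next(iter(graph)))
    v, parent = farthest(graph, u)
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def dangler_first_search(graph):
    """Return the C lengths in dangler-first order, or None if invalid."""
    path = longest_path(graph)
    on_path = set(path)
    order = []
    for node in path:
        for nb in sorted(graph[node]):
            if nb in on_path:
                continue
            # nb starts a subtree hanging off the path; it must be a dangler:
            # a C node followed by a single leaf.
            tail = graph[nb] - {node}
            if nb[-1] != "C" or len(tail) != 1 or len(graph[next(iter(tail))]) != 1:
                return None
            order.append(int(nb[:-1]))
        if node[-1] == "C":
            order.append(int(node[:-1]))
    return order

# Edges of the example graph, using the node labels from the slides.
edges = [("6A", "6C"), ("6C", "8B"), ("8B", "2C"), ("2C", "14A"),
         ("14A", "3C"), ("3C", "3B"), ("14A", "9C"), ("9C", "15B"),
         ("15B", "7C"), ("7C", "7A"), ("15B", "1C"), ("1C", "5A"),
         ("5A", "4C"), ("4C", "4B")]
pc = dangler_first_search(make_graph(edges))
print(pc)
```

Depending on which end of the longest path the traversal starts from, the result is 6, 2, 3, 9, 7, 1, 4 or its reverse; both describe the same map.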

35
Dangler-First Search II
  • pc = 6, 2, 3, 9, 7, 1, 4.
  • Main path: 6A - 6C - 8B - 2C - 14A - 9C - 15B -
    1C - 5A - 4C - 4B
  • Danglers: 14A - 3C - 3B and 15B - 7C - 7A

36
Algorithm IV
  • The elements in each ABi form a consecutive
    subsequence in pc. Likewise, the elements in
    each BAj also form a consecutive subsequence in
    pc.
  • This permutation is a valid permutation, meaning
    you have the answer!
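The validity condition can be checked directly: each ABi and each BAj must occupy a run of adjacent positions in pc. A minimal sketch using the example data from the slides, with illustrative function names and the assumption that lengths in C are distinct:

```python
def is_consecutive_block(block, order):
    """True if the elements of block sit in adjacent positions of order."""
    positions = sorted(order.index(x) for x in block)
    return positions[-1] - positions[0] + 1 == len(positions)

def is_valid(order, AB, BA):
    return all(is_consecutive_block(b, order) for b in AB + BA)

pc = [6, 2, 3, 9, 7, 1, 4]
AB = [[1, 4], [6], [7], [2, 3, 9]]
BA = [[3], [4], [2, 6], [1, 7, 9]]

print(is_valid(pc, AB, BA))
```

Swapping 2 and 3 in pc would separate 2 from 6, break the block BA3, and make the check fail.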

37
Solution
  • A: 6, 14, 7, 5
  • pc: 6, 2, 3, 9, 7, 1, 4
  • B: 8, 3, 15, 4
  • Main path: 6A - 6C - 8B - 2C - 14A - 9C - 15B -
    1C - 5A - 4C - 4B
  • Danglers: 14A - 3C - 3B and 15B - 7C - 7A

[Figure: final map. A: 6, 14, 7, 5; B: 8, 3, 15, 4; C: 6, 2, 3, 9, 7, 1, 4]
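Once pc is known, the orderings of A and B follow by sorting each fragment's subfragment block by where it first appears in pc. The sketch below is one way to do that; `order_fragments` is an illustrative name. It recomputes each fragment length as the sum of its block, so the third fragment of B comes out as 1 + 7 + 9 = 17 rather than the 15 shown on the slides.

```python
def order_fragments(pc, blocks):
    """Order fragments by the first position of their block in pc."""
    pos = {x: i for i, x in enumerate(pc)}
    ordered = sorted(blocks, key=lambda b: min(pos[x] for x in b))
    return [sum(b) for b in ordered]

pc = [6, 2, 3, 9, 7, 1, 4]
AB = [[1, 4], [6], [7], [2, 3, 9]]
BA = [[3], [4], [2, 6], [1, 7, 9]]

a_order = order_fragments(pc, AB)
b_order = order_fragments(pc, BA)
print(a_order, b_order)
```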
38
Enhanced Double Digest Problem II
  • This can be solved in O(n) time!
  • A generalization (assuming a constant number of
    duplicates) can also be solved in O(n) time.
  • The general enhanced double digest problem is
    still NP-hard. This can be shown by reduction
    from the Hamiltonian Path problem.