Aligning Multiple Genome Sequences With the Threaded Blockset Aligner - PowerPoint PPT Presentation

Slides: 30
Provided by: csieN

Transcript and Presenter's Notes



1
Aligning Multiple Genome Sequences With the
Threaded Blockset Aligner
  • Blanchette, M., Kent, W.J., Riemer, C., Elnitski,
    L., Smit, A.F.A., Roskin, K.M., Baertsch, R.,
    Rosenbloom, K., Clawson, H., Green, E.D.,
    Haussler, D., and Miller, W.
  • Genome Research 2004

2
Outline
  • Introduction
  • TBA
  • MULTIZ
  • How TBA was built
  • Evaluation of alignment accuracy
  • Accuracy of the Multiple Alignments
  • Experimental results

3
Introduction
  • Reference sequence idea
    • A sequence is fixed as the reference to which all
      other sequences are compared.

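Example (center-star alignment)
Input sequences:
  S1  ATGCTC
  S2  AGAGC
  S3  TTCTG
  S4  ATTGCATGC
Sum of pairwise distances for each sequence:
  S1: D(S1,S2) + D(S1,S3) + D(S1,S4) = 9
  S2: D(S2,S1) + D(S2,S3) + D(S2,S4) = 12
  S3: D(S3,S1) + D(S3,S2) + D(S3,S4) = 12
  S4: D(S4,S1) + D(S4,S2) + D(S4,S3) = 11
S1 has the smallest sum, so it serves as the reference (center), and the other sequences are aligned to it one at a time.
Step 1: align S2 to S1
  S1  A T G C T C
  S2  A - G A G C
Step 2: add S3
  S1  A T G C T C
  S2  A - G A G C
  S3  - T T C T G
Step 3: add S4 (final alignment)
  S1  A T - G C - T - C
  S2  A - - G A - G - C
  S3  - T - T C - T - G
  S4  A T T G C A T G C
Source: Gusfield, D., "Efficient methods for multiple sequence alignment with guaranteed error bounds", Bull. Math. Biol., 1993, Vol. 55, pp. 141-154.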
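The center-star procedure in this example can be sketched in a few lines of Python. This is a minimal sketch, assuming D is unit-cost edit distance (match 0, mismatch or gap 1); under that assumption the row sums for S1 to S4 come out as 9, 12, 12 and 11, as on the slide, so S1 becomes the center. Function names are hypothetical, and traceback tie-breaking may yield a different, equally scored alignment from the one shown.

```python
def align_pair(a: str, b: str) -> tuple[int, str, str]:
    """Unit-cost global alignment (edit distance) with one optimal traceback."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # (mis)match
                           dp[i - 1][j] + 1,                           # gap in b
                           dp[i][j - 1] + 1)                           # gap in a
    i, j, ra, rb = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            ra.append(a[i - 1])
            rb.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ra.append(a[i - 1])
            rb.append("-")
            i -= 1
        else:
            ra.append("-")
            rb.append(b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(ra)), "".join(reversed(rb))


def distance_sums(seqs: list[str]) -> list[int]:
    """Row sums of the distance matrix: sum over j != i of D(Si, Sj)."""
    return [sum(align_pair(s, t)[0] for k, t in enumerate(seqs) if k != i)
            for i, s in enumerate(seqs)]


def merge(msa: list[str], c_aln: str, s_aln: str) -> list[str]:
    """Add one sequence using its pairwise alignment to the center (msa[0]).
    Gaps already in the MSA are kept ("once a gap, always a gap")."""
    cm, cols, new = msa[0], [[] for _ in msa], []
    i = j = 0
    while i < len(cm) or j < len(c_aln):
        gap_msa = i < len(cm) and cm[i] == "-"
        gap_pair = j < len(c_aln) and c_aln[j] == "-"
        if gap_msa and not gap_pair:        # existing gap column: new row gets "-"
            for r, row in enumerate(msa):
                cols[r].append(row[i])
            new.append("-")
            i += 1
        elif gap_pair and not gap_msa:      # new gap in the center: add a gap column
            for col in cols:
                col.append("-")
            new.append(s_aln[j])
            j += 1
        else:                               # same center residue (or gap) in both
            for r, row in enumerate(msa):
                cols[r].append(row[i])
            new.append(s_aln[j])
            i, j = i + 1, j + 1
    return ["".join(col) for col in cols] + ["".join(new)]


def center_star(seqs: list[str]) -> tuple[int, list[str]]:
    sums = distance_sums(seqs)
    c = sums.index(min(sums))               # center = smallest row sum
    msa, order = [seqs[c]], [c]
    for k, s in enumerate(seqs):
        if k != c:
            _, c_aln, s_aln = align_pair(seqs[c], s)
            msa = merge(msa, c_aln, s_aln)
            order.append(k)
    rows = [""] * len(seqs)
    for k, row in zip(order, msa):          # restore the input order
        rows[k] = row
    return c, rows


seqs = ["ATGCTC", "AGAGC", "TTCTG", "ATTGCATGC"]
print(distance_sums(seqs))                  # [9, 12, 12, 11]
c, rows = center_star(seqs)
for k, row in enumerate(rows):
    print(f"S{k + 1} {' '.join(row)}")
```

The reference-based drawbacks on the next slide follow directly from this design: every column is defined relative to the center row.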
4
  • Benefit
    • Simplicity
  • Drawbacks
    • Regions conserved in a subset of the species, but
      absent from the reference sequence, are not
      identified.
    • Alignments generated with different reference
      sequences may be inconsistent.
  • Inconsistency
    • Two positions that are aligned to each other
      using one reference sequence might be aligned to
      different positions when another reference
      sequence is chosen.

(Figure: the example sequences S1-S4 and their multiple alignment, repeated from slide 3.)
5
TBA
  • Threaded Blockset Aligner
  • Block
    • A local alignment of the sequences
  • Blockset
    • A set of blocks

6
TBA
(Figure legend: h = human (400 bp), m = mouse (400 bp), r = rat (350 bp).)
7
TBA
  • Thread
    • A sequence S threads a blockset if every position
      in S appears exactly once in some block of the
      blockset.
  • Threaded blockset
    • A blockset that is threaded by each of the original
      sequences.
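The threading condition can be checked mechanically. This sketch assumes a hypothetical representation: a block is a list of rows (sequence name, 0-based start, gapped text), all on the forward strand, and a row accounts for as many consecutive positions of its sequence as it has non-gap characters.

```python
from typing import NamedTuple


class Row(NamedTuple):
    seq: str    # sequence name, e.g. "human"
    start: int  # 0-based position of the row's first residue (forward strand)
    text: str   # aligned text; "-" marks a gap


def covered(row: Row) -> range:
    """Positions of row.seq that this row accounts for."""
    return range(row.start, row.start + len(row.text) - row.text.count("-"))


def threads(blockset: list[list[Row]], seq: str, length: int) -> bool:
    """True if each position 0..length-1 of `seq` appears in exactly one block row."""
    positions = [p for block in blockset for row in block
                 if row.seq == seq for p in covered(row)]
    return sorted(positions) == list(range(length))


def is_threaded(blockset: list[list[Row]], lengths: dict[str, int]) -> bool:
    """A threaded blockset is threaded by every one of the original sequences."""
    return all(threads(blockset, s, n) for s, n in lengths.items())


blockset = [
    [Row("human", 0, "ACGT-A"), Row("mouse", 0, "ACGTTA")],  # human+mouse block
    [Row("human", 5, "GGCAT")],                              # human-only block
    [Row("mouse", 6, "CC")],                                 # mouse-only block
]
lengths = {"human": 10, "mouse": 8}
print(is_threaded(blockset, lengths))       # True
print(is_threaded(blockset[:2], lengths))   # False: mouse positions 6-7 are missing
```

Single-row blocks are what let unaligned stretches still satisfy "every position appears exactly once".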

8
TBA
(Figure legend: h = human (400 bp), m = mouse (400 bp), r = rat (350 bp).)
9
  • Ref-blockset
    • A blockset in which every block has a row from a
      particular sequence, designated as the reference
      for that ref-blockset.
  • Projection
    • Given a threaded blockset, generate an S-ref
      blockset for any sequence S.
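A minimal sketch of projection, under an assumption made only for illustration: the S-ref blockset keeps the blocks that contain a row from S, puts S's row first, and orders the blocks along S. The `Row` type and `project` name are hypothetical, and TBA's actual projection may do more than this.

```python
from typing import NamedTuple


class Row(NamedTuple):
    seq: str    # sequence name
    start: int  # 0-based start of the row's first residue (forward strand)
    text: str   # aligned text with "-" for gaps


def project(blockset: list[list[Row]], ref: str) -> list[list[Row]]:
    """Keep blocks containing `ref`, put the reference row first, and sort the
    blocks by the reference's start coordinate (assumed simplification)."""
    kept = []
    for block in blockset:
        ref_rows = [r for r in block if r.seq == ref]
        if ref_rows:
            kept.append(ref_rows + [r for r in block if r.seq != ref])
    return sorted(kept, key=lambda b: b[0].start)


blockset = [
    [Row("human", 0, "ACGT"), Row("mouse", 0, "ACGA"), Row("rat", 0, "ACTA")],
    [Row("human", 4, "TTG"), Row("mouse", 7, "TTG")],
    [Row("mouse", 4, "CCA"), Row("rat", 4, "CCA")],
    [Row("rat", 7, "GG")],
]
for ref in ("human", "mouse", "rat"):
    print(ref, [tuple(r.seq for r in b) for b in project(blockset, ref)])
# human [('human', 'mouse', 'rat'), ('human', 'mouse')]
# mouse [('mouse', 'human', 'rat'), ('mouse', 'rat'), ('mouse', 'human')]
# rat [('rat', 'human', 'mouse'), ('rat', 'mouse'), ('rat',)]
```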

10
TBA
  • Any two ref-blocksets generated by projection
    from the same threaded blockset are consistent.
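Consistency can be made concrete by recording, for every residue, which position of each other sequence it shares a column with, and requiring two blocksets never to disagree. The sketch below uses a hypothetical row type and a simplified projection (keep the blocks containing the reference); both are illustrative assumptions. Because every projected block is copied unchanged from the same threaded blockset, two projections cannot disagree, whereas two independently built alignments can.

```python
from itertools import combinations
from typing import NamedTuple


class Row(NamedTuple):
    seq: str    # sequence name
    start: int  # 0-based start of the row's first residue
    text: str   # aligned text with "-" for gaps


def aligned_map(blockset: list[list[Row]]) -> dict[tuple[str, int, str], int]:
    """Map (seq, pos, other_seq) -> other_pos for every pair of residues that
    share a column. Assumes each sequence appears at most once per block."""
    out = {}
    for block in blockset:
        pos = [r.start for r in block]            # next position in each row
        for col in range(len(block[0].text)):
            here = []
            for k, r in enumerate(block):
                if r.text[col] != "-":
                    here.append((r.seq, pos[k]))
                    pos[k] += 1
            for (s1, p1), (s2, p2) in combinations(here, 2):
                out[(s1, p1, s2)] = p2
                out[(s2, p2, s1)] = p1
    return out


def consistent(bs1: list[list[Row]], bs2: list[list[Row]]) -> bool:
    """False if some residue is aligned to different positions of the same
    other sequence in the two blocksets."""
    m1, m2 = aligned_map(bs1), aligned_map(bs2)
    return all(m2[k] == v for k, v in m1.items() if k in m2)


def project(blockset: list[list[Row]], ref: str) -> list[list[Row]]:
    """Simplified projection: keep the blocks that contain a row from `ref`."""
    return [b for b in blockset if any(r.seq == ref for r in b)]


threaded = [
    [Row("human", 0, "ACGT"), Row("mouse", 0, "ACGA"), Row("rat", 0, "ACTA")],
    [Row("human", 4, "TTG"), Row("mouse", 7, "TTG")],
    [Row("mouse", 4, "CCA"), Row("rat", 4, "CCA")],
    [Row("rat", 7, "GG")],
]
print(consistent(project(threaded, "human"), project(threaded, "mouse")))  # True

# Two hand-made human-rat alignments that disagree: human 0 pairs with rat 0
# in the first and with rat 1 in the second.
a = [[Row("human", 0, "ACG"), Row("rat", 0, "ACG")]]
b = [[Row("human", 0, "-ACG"), Row("rat", 0, "ACG-")]]
print(consistent(a, b))  # False
```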

11
TBA
  • Threaded Blockset Aligner
    • TBA produces a set of blocks in which each
      position in the given sequences to be aligned
      appears once and only once.
    • Any detected match among some or all of the
      sequences is represented among the blocks, and
      mutually consistent reference-sequence alignments
      can be extracted at will.

12
  • Alignment between the chloroplast genomes of
    Arabidopsis thaliana and Oenothera elata produced
    by PipMaker.
  • Blocks of a threaded blockset for the chloroplast
    genomes of Arabidopsis thaliana (a) and Oenothera
    elata (p).

13
Applying TBA to vertebrate HOX clusters
(Figure: panels labeled Tilapia, Mammals, and Fish.)
14
Applying TBA to vertebrate HOX clusters
(Figure: panels labeled Human, Mammals, and Fish.)
15
  • Assumption
    • The matching regions occur in the same order and
      orientation in all species.
  • Partial order
    • If, for some sequence S, S's segment in block A
      precedes S's segment in block B, we say that
      block A precedes block B.
  • Local alignment
    • Pairwise alignments: BLASTZ
    • Alignments of three or more sequences: MULTIZ
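The same-order assumption means the "precedes" relation between blocks must be acyclic, which can be tested with a topological sort. A minimal sketch, assuming each block is reduced to a hypothetical dict from sequence name to segment start coordinate; orientation (strand) is ignored here.

```python
from collections import defaultdict, deque
from typing import Optional


def block_order(blocks: list[dict[str, int]]) -> Optional[list[int]]:
    """blocks[i] maps a sequence name to the start of its segment in block i.

    Block x precedes block y if, for some sequence S, S's segment in x comes
    before S's segment in y. Returns the blocks in a topological order, or
    None if the precedence relation contains a cycle (an order conflict).
    """
    succ: dict[int, set[int]] = defaultdict(set)
    indeg = [0] * len(blocks)
    for s in {name for blk in blocks for name in blk}:
        # Consecutive blocks along S suffice; longer-range order is transitive.
        along = sorted((blk[s], i) for i, blk in enumerate(blocks) if s in blk)
        for (_, x), (_, y) in zip(along, along[1:]):
            if y not in succ[x]:
                succ[x].add(y)
                indeg[y] += 1
    queue = deque(i for i, d in enumerate(indeg) if d == 0)   # Kahn's algorithm
    order = []
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in sorted(succ[x]):
            indeg[y] -= 1
            if indeg[y] == 0:
                queue.append(y)
    return order if len(order) == len(blocks) else None


collinear = [{"human": 0, "mouse": 0}, {"human": 50, "rat": 10}, {"mouse": 80, "rat": 90}]
print(block_order(collinear))   # [0, 1, 2]
swapped = [{"human": 0, "mouse": 60}, {"human": 50, "mouse": 5}]
print(block_order(swapped))     # None: human and mouse disagree on the order
```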

16
MULTIZ
  • Deals with alignments among three or more
    sequences.
  • MULTIZ
    • Merges two blocksets with the help of a third,
      guiding blockset.
  • HUMOR
    • A specialized version of MULTIZ used by the Rat
      Genome Sequencing Consortium (2003).

17
How does it work?
18
How does it work? Cont.
  • Proceeds in order along S (the reference sequence
    for G, M, and the output).
  • For each position of S, uses G to access the
    corresponding portion of N.
  • Collects the aligned columns.
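The walk along S can be illustrated with a much-simplified sketch. It is not MULTIZ: it skips the guide blockset G and any realignment, and only fuses the columns of two S-ref alignments (row 0 of each is the same sequence S) wherever both place the same residue of S. The list-of-strings representation and the function name are assumptions.

```python
def merge_along_ref(m: list[str], n: list[str]) -> list[str]:
    """Fuse two S-ref alignments (row 0 of each is the same sequence S).

    Output rows: S, the other rows of m, then the other rows of n.
    """
    if m[0].replace("-", "") != n[0].replace("-", ""):
        raise ValueError("both alignments must share the reference row S")
    out: list[list[str]] = [[] for _ in range(len(m) + len(n) - 1)]
    i = j = 0
    while i < len(m[0]) or j < len(n[0]):
        if i < len(m[0]) and m[0][i] == "-":
            # Insertion relative to S in m: copy it, pad n's rows with gaps.
            col = [r[i] for r in m] + ["-"] * (len(n) - 1)
            i += 1
        elif j < len(n[0]) and n[0][j] == "-":
            # Insertion relative to S in n: pad S and m's rows with gaps.
            col = ["-"] * len(m) + [r[j] for r in n[1:]]
            j += 1
        else:
            # Both inputs sit on the same residue of S: fuse the two columns.
            col = [r[i] for r in m] + [r[j] for r in n[1:]]
            i += 1
            j += 1
        for row, ch in zip(out, col):
            row.append(ch)
    return ["".join(row) for row in out]


m = ["AC-GT", "ACTGT"]   # S aligned to a sequence A (hypothetical)
n = ["ACG-T", "A-GAT"]   # S aligned to a sequence B (hypothetical)
print(merge_along_ref(m, n))   # ['AC-G-T', 'ACTG-T', 'A--GAT']
```

Insertions relative to S from the two inputs are kept in separate columns here; deciding whether such residues should be aligned to each other is exactly the work this sketch leaves out.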

19
HUMOR
  • Stands for Human-Mouse-Rat
  • Starts with pairwise human-ref blocksets for
    human-mouse and for human-rat.
  • Trims columns from the ends of the blocks to make
    the human components identical.
  • Aligns the mouse and rat intervals to each
    other.
  • Aligns the human interval to the resulting
    mouse-rat block.
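The trimming step can be sketched as follows, assuming each pairwise block is stored as a hypothetical (human_start, human_text, other_text) tuple on the forward strand. After trimming, both blocks cover the same human interval; the mouse-rat alignment and the final realignment of human are not shown.

```python
Block = tuple[int, str, str]   # hypothetical: (human_start, human_text, other_text)


def human_span(block: Block) -> tuple[int, int]:
    start, hum, _ = block
    return start, start + len(hum) - hum.count("-")      # half-open [start, end)


def trim_to(block: Block, lo: int, hi: int) -> Block:
    """Drop end columns so the human row covers exactly positions [lo, hi)."""
    start, hum, oth = block
    keep, pos = [], start
    for col, ch in enumerate(hum):
        if ch != "-":
            if lo <= pos < hi:
                keep.append(col)
            pos += 1
    first, last = keep[0], keep[-1]       # interior gap columns are kept
    return lo, hum[first:last + 1], oth[first:last + 1]


def humor_trim(hm: Block, hr: Block):
    """Trim a human-mouse and a human-rat block to their shared human interval."""
    (s1, e1), (s2, e2) = human_span(hm), human_span(hr)
    lo, hi = max(s1, s2), min(e1, e2)
    if lo >= hi:
        return None                       # the human intervals do not overlap
    return trim_to(hm, lo, hi), trim_to(hr, lo, hi)


hm = (100, "ACGTAC", "ACTTAC")    # human 100-105 aligned to mouse
hr = (102, "GT-ACGG", "GTTACGA")  # human 102-107 aligned to rat
print(humor_trim(hm, hr))
# ((102, 'GTAC', 'TTAC'), (102, 'GT-AC', 'GTTAC'))
```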

20
How TBA was built
21
(No Transcript)
22
Evaluation of Alignment Accuracy
  • Simulate sequence evolution, starting from an
    ancestral sequence and applying mutations along
    the branches of a predetermined phylogenetic
    tree.
  • Score each program by the agreement between the
    true alignment, which is known from the
    simulation, and the program's output.
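The evaluation idea can be sketched by tagging every simulated residue with the index of the ancestral site it descends from: two residues are truly aligned exactly when they share that index. This is a minimal sketch under loud assumptions: the tree and per-branch probabilities are made up, only substitutions and deletions are simulated (no insertions), and "agreement" is measured as the fraction of true aligned pairs an aligner recovers, which may differ from the paper's exact score.

```python
import random

# Hypothetical tree: node -> (parent, per-site substitution prob, per-site deletion prob).
# Parents are listed before their children.
TREE = {
    "rodent": ("root", 0.05, 0.02),
    "mouse": ("rodent", 0.05, 0.02),
    "rat": ("rodent", 0.05, 0.02),
    "human": ("root", 0.10, 0.04),
}


def evolve(parent, p_sub, p_del, rng):
    """Mutate a list of (ancestral_index, base) pairs along one branch."""
    child = []
    for idx, base in parent:
        if rng.random() < p_del:
            continue                      # deleted: this ancestral site is lost
        if rng.random() < p_sub:
            base = rng.choice([b for b in "ACGT" if b != base])
        child.append((idx, base))
    return child


def simulate(length, rng):
    genomes = {"root": [(i, rng.choice("ACGT")) for i in range(length)]}
    for node, (parent, p_sub, p_del) in TREE.items():
        genomes[node] = evolve(genomes[parent], p_sub, p_del, rng)
    return genomes


def true_pairs(x, y):
    """Truly aligned position pairs: residues descending from the same ancestral site."""
    where = {idx: j for j, (idx, _) in enumerate(y)}
    return {(i, where[idx]) for i, (idx, _) in enumerate(x) if idx in where}


def accuracy(predicted, truth):
    """Fraction of true aligned pairs that the aligner recovered."""
    return len(predicted & truth) / len(truth) if truth else 1.0


rng = random.Random(1)
g = simulate(1000, rng)
human_seq = "".join(base for _, base in g["human"])   # what an aligner would receive
truth = true_pairs(g["human"], g["mouse"])
naive = {(i, i) for i in range(min(len(g["human"]), len(g["mouse"])))}  # a poor "aligner"
print(accuracy(truth, truth))          # 1.0
print(round(accuracy(naive, truth), 3))
```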

23
Accuracy of the Multiple Alignments (9 Mammals)
24
Accuracy of the Multiple Alignments (H,M,R)
25
Experimental results
  • Alignments of closely related sequences are more
    accurate than those of more diverged ones.
  • TBA uniformly stands out for the more diverged
    pairs.
  • For most programs, accuracy increases when fewer
    species are aligned. This indicates room for
    improvement, since more species should provide
    more information.

26
Experimental results
  • MULTIZ's accuracy suffers on the mouse-rat
    alignment.
  • Human-rat is also slightly worse than the
    human-mouse alignment, because rat is aligned to
    human only through mouse.
  • A score of 1.0 may be impossible to achieve,
    because some information is lost during sequence
    evolution.
  • A score of 1.0 is usually not necessary; some
    errors are inconsequential.
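Aligning rat to human only through mouse amounts to composing two pairwise position maps, so anything missed or misaligned in either step carries over to the human-rat result. A minimal sketch with hypothetical position maps:

```python
def compose(h_to_m: dict[int, int], m_to_r: dict[int, int]) -> dict[int, int]:
    """Human->rat pairs implied by chaining human->mouse and mouse->rat pairs."""
    return {h: m_to_r[m] for h, m in h_to_m.items() if m in m_to_r}


# Hypothetical position maps (human -> mouse, mouse -> rat).
h_to_m = {0: 0, 1: 1, 2: 3, 3: 4}
m_to_r = {0: 0, 1: 1, 2: 2, 4: 3}   # mouse 3 has no rat partner
print(compose(h_to_m, m_to_r))       # {0: 0, 1: 1, 3: 3}: human 2 is lost
```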

27
Experimental results
  • Running time
    • Only the four programs actually designed for
      aligning large regions (MULTIZ, TBA, MAVID,
      MLAGAN) run fast enough.
    • MAVID is very fast.

28
  • Thank you

29
(No Transcript)