Fire - PowerPoint PPT Presentation

Provided by: uni6171


1
FireµSat: An Algorithm to Detect Tandem
Repeats in DNA
2
Introduction
  • What are tandem repeats in DNA?
  • How are we going to detect tandem repeats in DNA?
  • Why would anybody want to detect tandem repeats
    in DNA?

3
Genetic sequences
  • DNA consists of four different nucleotides,
    namely
  • Adenine (A) Guanine (G)
  • Cytosine (C) Thymine (T)
  • Genetic databanks, e.g. GenBank, EMBOSS and Entrez,
    store DNA sequences as concatenated single-letter
    codes in FASTA format.
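
As a minimal sketch of what "concatenated single-letter codes in FASTA format" means in practice, the Python below reads FASTA text into plain strings. The function name parse_fasta and the example record are hypothetical and are not part of the presentation.

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text.

    A record starts with a '>' header line; the lines that follow
    are joined into one string of single-letter nucleotide codes.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical record, made up for illustration only.
example = """>seq1 hypothetical example record
ACGACGACG
ACGACG
"""
sequences = parse_fasta(example)
```

Here sequences maps the header text to the single string ACGACGACGACGACG, which is the form a repeat detector would scan.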

4
Tandem Repeats (TRs) in genome sequences
  • DNA molecules are subject to numerous mutational
    events. One of the consequences of these events
    that can be detected by computationally analyzing
    genome sequences is tandem duplication.
  • A TR or TR-zone is a string of nucleotides that
    is characterized by a certain motif that
    introduces the string, contiguously followed by a
    number of copies of the motif, e.g.
    ACGACGACGACGACG

5
Tandem Repeats
  • Perfect tandem repeat (PTR) if the copies are
    exact, e.g. ACGACGACGACGACG, which has five copies
    of the motif ACG.
  • Approximate tandem repeat (ATR) if the copies of
    the motif include non-exact copies, so that
    mutational events have most likely occurred, e.g.
    ACGACACGAGGACGAG.
  • In the absence of further qualification,
    reference to a tandem repeat should be construed
    as a reference to either a PTR or an ATR.
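
The PTR/ATR distinction can be checked mechanically. The sketch below only tests whether a string is a perfect tandem repeat of a given motif. The function names are assumptions, and the requirement of at least two copies is an illustrative choice rather than something the slide specifies.

```python
def count_perfect_copies(sequence, motif):
    """Count exact, back-to-back copies of motif at the start of sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    copies = 0
    step = len(motif)
    while sequence.startswith(motif, copies * step):
        copies += 1
    return copies


def is_perfect_tandem_repeat(sequence, motif):
    """True if sequence consists only of two or more exact motif copies."""
    copies = count_perfect_copies(sequence, motif)
    return copies >= 2 and copies * len(motif) == len(sequence)


ptr_copies = count_perfect_copies("ACGACGACGACGACG", "ACG")      # 5 copies
ptr_check = is_perfect_tandem_repeat("ACGACGACGACGACG", "ACG")   # True
atr_check = is_perfect_tandem_repeat("ACGACACGAGGACGAG", "ACG")  # False
```

The second example string fails the check because its copies are not all exact, which is what makes it an ATR.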

6
Tandem Repeat Elements
  • A PTR element (PTRE) is a TR element that matches
    the motif. If the motif is, for example, ACG, then
    the PTRE will also be ACG.
  • An ATR element (ATRE) is a TR element similar to
    the motif but not an exact copy thereof. If the
    motif is ACG, then an ATRE may, for example, be AC.

7
Microsatellites
  • TRs are classified by the length of their PTREs
    as satellites, minisatellites or microsatellites
  • Microsatellites are a subset of TRs
  • (conforming to Benson, Delgrange, Rivals and
    Abajian)

8
Formal problem statement
  • A PTR whose motif is ρ and that is repeated p
    times, where p > 1, is denoted by ρ^p. An ATR u
    that is derived from this PTR ρ^p must always
    have the motif (ρ) as its prefix. It therefore
    has the form ρ u_2 … u_p, where each ATRE
    u_k (k = 2, …, p) is the result of at most ε
    mutations on ρ. Here ε is the so-called motif
    error.
  • Besides the restrictions applicable to the motif
    error, threshold values are also introduced that
    constrain the attributes of the detected TR.

9
Tolerated error types
  • Errors regarding the motif or PTRE (motif
    errors)
      • deletions
      • mismatches
      • insertions
  • Errors related to the detected TR (TR errors)
      • in terms of the ratio between PTREs and ATREs
      • the minimum number of TREs to be reported
      • the maximum number of consecutive ATREs

10
Motif errors
  • Maximum of 50% error toleration
  • If the motif length |ρ| is 2 or 3, then the motif
    error ε is 0 or 1 (default 1)
  • If |ρ| is 4 or 5, then ε is 0, 1 or 2
    (default 2)
  • Consider the motif ACGTT: then ACT will be an ATRE
    where two deletions have occurred.
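
The limits on this slide can be written down directly. The sketch below encodes the default error limit per motif length and uses the standard Levenshtein edit distance, which is a swapped-in technique and not necessarily how FireµSat measures errors, to confirm the ACGTT/ACT example. All function names are hypothetical.

```python
def max_motif_error(motif_length):
    """Default error limit from the slide: at most 50% of the motif length."""
    if motif_length not in (2, 3, 4, 5):
        raise ValueError("the slide only covers motif lengths 2 to 5")
    return motif_length // 2


def edit_distance(a, b):
    """Levenshtein distance: fewest deletions, insertions and mismatches."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion from a
                               current[j - 1] + 1,       # insertion into a
                               previous[j - 1] + cost))  # match or mismatch
        previous = current
    return previous[-1]


def is_tre(candidate, motif, error=None):
    """True if candidate is within the allowed number of errors of motif."""
    if error is None:
        error = max_motif_error(len(motif))
    return edit_distance(motif, candidate) <= error


distance = edit_distance("ACGTT", "ACT")  # 2: the two deletions on the slide
accepted = is_tre("ACT", "ACGTT")         # True with the default limit of 2
```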

11
Motif errors: Types of Mutations
  • Deletion
      • Refers to the absence of a base pair in the
        motif.
  • Insertion
      • Refers to up to ε base pairs, where ε is the
        motif error, being inserted into any position
        of the PTRE.
  • Mismatch
      • Refers to the replacement of a base pair in
        the motif by another.
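
To make the three mutation types concrete, the following sketch enumerates every word reachable from a motif by a bounded number of deletions, insertions and mismatches. It is a brute-force illustration under assumed names (one_mutation, words_within) and does not claim to be the word-generation step used by FireµSat.

```python
ALPHABET = "ACGT"


def one_mutation(word):
    """All words obtained from word by exactly one mutation."""
    results = set()
    for i in range(len(word)):
        results.add(word[:i] + word[i + 1:])                # deletion
        for base in ALPHABET:
            if base != word[i]:
                results.add(word[:i] + base + word[i + 1:])  # mismatch
    for i in range(len(word) + 1):
        for base in ALPHABET:
            results.add(word[:i] + base + word[i:])          # insertion
    return results


def words_within(motif, max_errors):
    """The motif plus every word reachable by at most max_errors mutations."""
    seen = {motif}
    frontier = {motif}
    for _ in range(max_errors):
        frontier = {w for word in frontier for w in one_mutation(word)} - seen
        seen |= frontier
    return seen


atres_of_acg = words_within("ACG", 1) - {"ACG"}
```

With motif ACG and one allowed error, atres_of_acg contains AC (deletion), AGG (mismatch) and ACGT (insertion), among others.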

12
Detected TR errors: the substring error
  • The substring error must not exceed the maximum
    substring error allowed, where
  • substring error =
    ((n_d × p_d) + (n_i × p_i) + (n_m × p_m)) / n_ptre
  • and
      • n_d = number of deletions
      • n_i = number of insertions
      • n_m = number of mismatches
      • p_d = penalty allocated to deletions
      • p_i = penalty allocated to insertions
      • p_m = penalty allocated to mismatches
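
A small numeric sketch may help here. It assumes the expression above is the penalty-weighted sum of the error counts divided by n_ptre, that n_ptre is the number of PTREs, and that every penalty defaults to 1.0. None of these defaults is stated on the slide, and the function name is hypothetical.

```python
def substring_error(n_d, n_i, n_m, n_ptre, p_d=1.0, p_i=1.0, p_m=1.0):
    """Penalty-weighted error count divided by the number of PTREs.

    The unit penalties are placeholder assumptions, not values from
    the presentation.
    """
    if n_ptre <= 0:
        raise ValueError("n_ptre must be positive")
    return (n_d * p_d + n_i * p_i + n_m * p_m) / n_ptre


# ACG AC ACG AGG ACG AG: three PTREs, two deletions (AC, AG),
# one mismatch (AGG) and no insertions.
error = substring_error(n_d=2, n_i=0, n_m=1, n_ptre=3)
```

Under these assumptions the example evaluates to 1.0. Raising a penalty makes that error type count more heavily against the detected TR.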

13
Detected TR errors: the minimum number of TREs
  • tn_tre = tn_ptre + tn_atre
  • tn_tre must be at least the minimum-number
    threshold
  • the default value for this threshold is 2
  • to prevent the output of unwanted data
14
Detected TR errors: the maximum number of
consecutive ATREs
  • tn_atreC, the number of ATREs read consecutively,
    must not exceed the allowed maximum
  • tn_atreC is incremented for every ATRE read
  • tn_atreC is set to zero whenever a PTRE is read
  • the default of tn_atreC is 0
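
The counter updates described on this slide and the previous one can be sketched as a single pass over a list of elements that has already been split into TREs. The splitting itself is assumed to be done elsewhere, and the function and dictionary key names are illustrative only.

```python
def count_elements(elements, motif):
    """Tally PTREs and ATREs and track the consecutive-ATRE counter."""
    tn_ptre = 0
    tn_atre = 0
    tn_atre_c = 0    # consecutive ATREs seen so far
    longest_run = 0  # largest value tn_atre_c reached
    for element in elements:
        if element == motif:
            tn_ptre += 1
            tn_atre_c = 0  # reset whenever a PTRE is read
        else:
            tn_atre += 1
            tn_atre_c += 1  # incremented for every ATRE read
            longest_run = max(longest_run, tn_atre_c)
    return {
        "tn_tre": tn_ptre + tn_atre,
        "tn_ptre": tn_ptre,
        "tn_atre": tn_atre,
        "longest_atre_run": longest_run,
    }


counts = count_elements(["ACG", "AC", "ACG", "AGG", "ACG", "AG"], "ACG")
```

For this split of ACGACACGAGGACGAG, the counts are six TREs in total, three PTREs, three ATREs, and never more than one ATRE in a row.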

15
Deletion: Refers to the absence of a base pair in
the motif
FAD(ACG,1)
16
Mismatch: Refers to the replacement of a base pair
in the motif by another.
FAm(ACG,1)
17
High-level Description of FireµSat
  • generateWords(ρ, ε) generates a set of all words
    of length ρLength from the alphabet
    Σ = {A, C, G, T}.
  • createFATR(ρ, ε) returns FATR(ρ, ε) as discussed.
  • findIndices(gSeq, FATR, τ, α, β, p_m, p_d, p_i)
    returns a set of index pairs in gSeq of an
    identified TR.
  • The TR is such that it complies with the
    constraints specified by τ, α, β. Various
    counters have to be updated to ensure correct
    output.
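
To show the kind of output findIndices is described as returning, here is a deliberately simplified stand-in that reports index pairs for perfect repeats only. It scans the string directly instead of using an automaton, ignores ATREs and penalties, and uses a hypothetical name and signature. It should not be read as the FireµSat procedure.

```python
def find_perfect_tr_indices(g_seq, motif, min_tres=2):
    """Return (start, end) index pairs of perfect repeats of motif.

    end is exclusive. Only runs with at least min_tres copies are
    reported, mirroring the minimum-number-of-TREs threshold.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    step = len(motif)
    results = []
    i = 0
    while i + step <= len(g_seq):
        copies = 0
        while g_seq.startswith(motif, i + copies * step):
            copies += 1
        if copies >= min_tres:
            results.append((i, i + copies * step))
            i += copies * step  # continue after the reported repeat
        else:
            i += 1
    return results


pairs = find_perfect_tr_indices("TTACGACGACGTT", "ACG")
```

For this input, pairs is [(2, 11)]: three copies of ACG starting at index 2. A single isolated ACG is not reported because it falls below the minimum of two TREs.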

18
Why does anybody want to detect TRs in DNA?
  • The cause of several human diseases can be traced
    to having too many copies of a certain nucleotide
    triplet.
  • TRs play a role in the development of immune
    system cells.
  • TRs serve as genetic markers in plant and
    animal species.
  • Tandem repeats play a role in gene regulation and
    contribute to the breeding of disease resistant
    cultivars.

19
Conclusion
  • A new theoretical approach to detect TRs in DNA
    has been introduced. The time complexity of
    FireµSat is linear in gSeq.
  • The practical implementation of FireµSat is in
    progress. The following matters constitute a
    future research agenda:
      • the performance of FireµSat
      • the possibility of reducing FATR
      • and, if successful, the latter results could
        suggest ways of adapting FireµSat to detect
        minisatellites and satellites as well.