MicroRNA identification based on sequence and structure alignment - PowerPoint PPT Presentation


1
MicroRNA identification based on sequence and structure alignment
Xiaowo Wang, Jing Zhang, Fei Li, Jin Gu, Tao
He, Xuegong Zhang and Yanda Li
  • Presented by: Neeta Jain

2
Outline
  • Introduction
  • Motivation
  • Experiment
  • Materials
  • Methods
  • Results
  • Conclusion

3
Introduction
  • What are miRNAs and why are they important?
  • miRNAs are non-coding RNAs about 22 nt long
  • They are derived from precursors of about 70 nt,
    which typically fold into a hairpin structure
  • Importance of miRNAs
  • They have been found to regulate the expression
    of target genes through complementary
    base-pairing interactions with target mRNAs.

4
Motivation
  • Since miRNAs are short (22 nt), conventional
    sequence alignment methods can only find
    relatively close homologues
  • It has been reported that miRNA genes are more
    conserved in their secondary structure than in
    primary structure
  • This paper exploits this secondary structure
    conservation and proposes a novel computational
    approach to detect miRNAs based on both sequence
    and structure alignment
  • The authors developed a tool, miRAlign, and
    compared its performance with existing search
    methods such as BLAST and ERPIN

5
Experiment
  • Materials
  • Reference sets
  • The reference data consist of 1298 miRNAs from
    12 species, of which 1054 are animal miRNAs.
  • The 1054 animal miRNAs and their 1104 precursors
    make up the raw training set Train_All.
  • Train_Sub_1: all animal miRNAs except those from
    C. briggsae
  • Train_Sub_2: all animal miRNAs except those from
    C. briggsae and C. elegans
  • Genomic sequences
  • Genomic sequences of 6 species were used.
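The three reference sets differ only in which species are held out. A minimal Python sketch of that filtering, assuming each reference entry carries a species label; the `RefMiRNA` type, its fields and the toy entries are hypothetical and do not reflect the actual registry format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RefMiRNA:
    # Hypothetical record layout for one reference miRNA.
    name: str
    species: str
    mature: str     # ~22 nt mature miRNA
    precursor: str  # ~70 nt hairpin precursor

def subset_excluding(train_all, held_out_species):
    """Return the entries of train_all whose species is not held out."""
    held_out = set(held_out_species)
    return [r for r in train_all if r.species not in held_out]

# Toy entries with placeholder sequences, for illustration only.
train_all = [
    RefMiRNA("toy-1", "C. elegans", "ACGU", "ACGUACGU"),
    RefMiRNA("toy-2", "C. briggsae", "ACGU", "ACGUACGU"),
    RefMiRNA("toy-3", "H. sapiens", "ACGU", "ACGUACGU"),
]

train_sub_1 = subset_excluding(train_all, {"C. briggsae"})
train_sub_2 = subset_excluding(train_all, {"C. briggsae", "C. elegans"})
```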

6
  • Methods
  • Preprocessing
  • Known precursors from the training set are used
    as BLAST queries against the genome
  • Potential regions (BLAST hits) are cut from the
    genome with 70 nt of flanking sequence added to
    each end
  • These regions are scanned with a 100 nt window
    moved in 10 nt steps
  • Windows that overlap repeat sequences are
    discarded.
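The windowing step can be sketched in Python as below. Assumptions not stated on the slide: coordinates are 0-based and half-open, BLAST hits and repeat annotations are already available as (start, end) pairs, and the function names are hypothetical.

```python
def flanked_region(hit_start, hit_end, genome_len, flank=70):
    """Extend a BLAST hit [hit_start, hit_end) by `flank` nt on each side,
    clipped to the genome boundaries."""
    return max(0, hit_start - flank), min(genome_len, hit_end + flank)

def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end

def scan_windows(genome, hit, repeats, window=100, step=10, flank=70):
    """Yield (start, sequence) for each window over the flanked hit region,
    skipping windows that overlap any repeat interval."""
    reg_start, reg_end = flanked_region(hit[0], hit[1], len(genome), flank)
    # Last start that still fits a full window (just reg_start if the region is shorter).
    last_start = max(reg_start, reg_end - window)
    for start in range(reg_start, last_start + 1, step):
        end = min(start + window, reg_end)
        if any(overlaps(start, end, r_start, r_end) for r_start, r_end in repeats):
            continue
        yield start, genome[start:end]
```

For a hit at (200, 260) in a 500 nt toy genome with a repeat at (300, 310), the flanked region is (130, 330) and only the windows starting at 130 through 200 survive.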

7
  • Methods (contd)
  • miRAlign
  • Secondary Structure Prediction
  • Both the candidate sequence and its reverse
    complement are folded with RNAfold to predict
    hairpins.
  • Only hairpins with a minimum free energy (MFE)
    below -20 kcal/mol are retained.
  • Pairwise sequence alignment
  • Hairpin sequences from the previous step are
    aligned pairwise to all known ~22 nt mature miRNA
    sequences in the training set
  • The sequence similarity score between the
    candidate and each known mature miRNA is
    calculated with ClustalW.
  • If the score exceeds a user-defined threshold,
    the candidate/known-miRNA pair is kept for
    further analysis.
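A minimal Python sketch of the candidate-filtering steps on this slide. It does not run RNAfold itself: `fold` is any caller-supplied function returning a (dot-bracket structure, MFE) pair, for example a wrapper around RNAfold. The sequence score is a deliberately simple stand-in for the ClustalW score (the best fraction of identical bases over all ungapped placements of the mature miRNA along the hairpin), and all function names are hypothetical.

```python
# Complement table for RNA bases; T is mapped like U so DNA input also works.
_COMPLEMENT = str.maketrans("ACGUT", "UGCAA")

def reverse_complement(seq):
    """Reverse complement of a nucleotide sequence, in the RNA alphabet."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def hairpin_candidates(window, fold, mfe_cutoff=-20.0):
    """Fold a window and its reverse complement; keep strands whose MFE
    (kcal/mol) is lower than mfe_cutoff.

    `fold` is a caller-supplied function: sequence -> (dot-bracket, mfe).
    """
    kept = []
    forward = window.upper().replace("T", "U")
    for strand in (forward, reverse_complement(window)):
        structure, mfe = fold(strand)
        if mfe < mfe_cutoff:
            kept.append((strand, structure, mfe))
    return kept

def best_identity(candidate, mature):
    """Best fraction of identical bases over all ungapped placements of
    `mature` along `candidate` (a simple stand-in for a ClustalW score)."""
    n, m = len(candidate), len(mature)
    if m == 0 or n < m:
        return 0.0
    best = 0
    for offset in range(n - m + 1):
        same = sum(a == b for a, b in zip(candidate[offset:offset + m], mature))
        best = max(best, same)
    return best / m

def homologue_pairs(candidates, known_mature, threshold):
    """Keep (candidate, miRNA name, score) pairs whose score exceeds threshold.

    `known_mature` maps a miRNA name to its mature sequence.
    """
    pairs = []
    for cand in candidates:
        for name, mature in known_mature.items():
            score = best_identity(cand, mature)
            if score > threshold:
                pairs.append((cand, name, score))
    return pairs
```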

8
  • Methods (contd)
  • Checking the miRNA's position on the stem-loop
  • Three properties of the miRNA's position are
    considered:
  • It should not lie on the terminal loop of the
    hairpin
  • It should lie on the same arm of the hairpin as
    its known homologue
  • Its position on the hairpin should not differ too
    much from that of its known homologue
  • Position difference of the miRNA on precursors A
    and B
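The three position checks can be sketched in Python for a single stem-loop in dot-bracket notation. The slide's position-difference formula is not reproduced here, so `distance_to_loop` (bases between the miRNA and the terminal loop) and the `max_shift` tolerance are hypothetical stand-ins; coordinates are 0-based and half-open.

```python
def terminal_loop(structure):
    """Half-open (start, end) of the terminal loop of a single stem-loop in
    dot-bracket notation: the unpaired bases between the innermost '(' and
    the first ')'. Raises ValueError if the structure has no ')'."""
    first_close = structure.index(")")
    last_open = structure.rfind("(", 0, first_close)
    return last_open + 1, first_close

def arm_of(start, end, loop):
    """'5p' if [start, end) lies before the loop, '3p' if after it,
    None if it overlaps the terminal loop."""
    loop_start, loop_end = loop
    if end <= loop_start:
        return "5p"
    if start >= loop_end:
        return "3p"
    return None

def distance_to_loop(start, end, loop):
    """Hypothetical position measure: bases between the miRNA and the loop."""
    loop_start, loop_end = loop
    return loop_start - end if end <= loop_start else start - loop_end

def position_ok(cand, known, max_shift=5):
    """Apply the three checks to (structure, mirna_start, mirna_end) tuples.
    max_shift is a hypothetical tolerance, not a value from the paper."""
    c_struct, c_start, c_end = cand
    k_struct, k_start, k_end = known
    c_loop, k_loop = terminal_loop(c_struct), terminal_loop(k_struct)
    c_arm = arm_of(c_start, c_end, c_loop)
    if c_arm is None:                               # check 1: not on the loop
        return False
    if c_arm != arm_of(k_start, k_end, k_loop):     # check 2: same arm
        return False
    shift = abs(distance_to_loop(c_start, c_end, c_loop)
                - distance_to_loop(k_start, k_end, k_loop))
    return shift <= max_shift                       # check 3: similar position
```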

9
  • Methods (contd)
  • RNA secondary structure alignment
  • RNAforester computes a pairwise structure
    alignment and gives a similarity score
  • The score is the sum of the scores of all base
    (and base-pair) matches, insertions and deletions.
  • The normalized similarity score of structures C
    and m is given as

where C is the candidate sequence, m is a known
pre-miRNA, σ_local(C,m) is the raw local alignment
score between C and m, and σ(m,m) is the
self-alignment score of m.
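RNAforester aligns tree (forest) representations of the two structures and scores base and base-pair operations. As a much simpler stand-in, the Python sketch below aligns the two dot-bracket strings character by character with a Smith-Waterman local alignment (ignoring which bases pair with which) and then normalizes by the self-alignment score of m. The division σ_local(C,m) / σ(m,m) is an assumption suggested by the terms defined above; the scoring parameters and function names are hypothetical.

```python
def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score of two strings (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def normalized_structure_score(cand_struct, known_struct):
    """Assumed normalization: raw local score of (C, m) over self-score of m."""
    self_score = local_score(known_struct, known_struct)
    if self_score == 0:
        return 0.0
    return local_score(cand_struct, known_struct) / self_score
```

Dividing by the self-alignment score makes a candidate whose structure matches the known precursor exactly score 1.0, regardless of the precursor's length.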
10
  • Methods (contd)
  • Total similarity score
  • After aligning all potential homologue pairs, a
    total similarity score (tss) is assigned to each
    candidate sequence.

where C is the candidate sequence and R is the set
composed of all Cs.
11
Methods (contd)
Summary -
12
Results
  • Application to C. briggsae
  • Detection of miRNA homologues
  • miRAlign was applied to the C. briggsae data with
    training set Train_Sub_1, and sensitivity and
    specificity were recorded.
  • Identification of miRNAs in distantly related
    species
  • miRAlign was applied to the C. briggsae data with
    training set Train_Sub_2, and sensitivity and
    specificity were recorded.
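Sensitivity and specificity depend on the score threshold used to call a candidate a miRNA. A minimal Python sketch using the textbook definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the slide does not state how the paper defined or counted these, so both the definitions and the scores/labels input format are assumptions.

```python
def confusion_at(scores, is_mirna, threshold):
    """Count (TP, FP, TN, FN) when candidates scoring above threshold are called miRNAs."""
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, is_mirna):
        predicted = score > threshold
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity_specificity(scores, is_mirna, threshold):
    """Return (TP / (TP + FN), TN / (TN + FP)), using 0.0 for an empty denominator."""
    tp, fp, tn, fn = confusion_at(scores, is_mirna, threshold)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec
```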

13
Results (contd)
Graph 1 -
14
Results (contd)
Graph 2 -
15
Results (contd)
Comparison of miRAlign with BLAST -
16
Results (contd)
Comparison of miRAlign with ERPIN -
17
Results (contd)
  • Other results
  • miRAlign was applied to A. gambiae and detected
    59 putative miRNAs with tss > 35. This was
    supported when 38 A. gambiae miRNAs were reported
    in the microRNA Registry release 6.0, 37 of which
    were covered by miRAlign.
  • miRAlign was also applied to a plant, Zea mays,
    and detected 28 of the 40 known Z. mays miRNAs.
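As a quick check, the two coverage figures above, expressed as the fraction of known miRNAs in each species that miRAlign recovered:

```latex
\frac{37}{38} \approx 0.974 \;(97.4\%), \qquad \frac{28}{40} = 0.70 \;(70\%)
```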

18
Conclusion
  • By combining sequence and structure alignment,
    miRAlign performs better than the previously
    reported homologue search methods it was compared
    with (BLAST and ERPIN)
  • Although miRAlign was developed using animal
    data, the miRNAs it predicted in Zea mays indicate
    that it can also be applied to plants. Further
    investigation of this is underway.

19
THANK YOU
Questions?