Title: MicroRNA identification based on sequence and structure alignment
Xiaowo Wang, Jing Zhang, Fei Li, Jin Gu, Tao
He, Xuegong Zhang and Yanda Li
Presented by Neeta Jain
Outline
- Introduction
- Motivation
- Experiment
  - Materials
  - Methods
- Results
- Conclusion
Introduction
- What are miRNAs and why are they important?
  - miRNAs are non-coding RNAs about 22 nt long
  - They are derived from precursors of about 70 nt, which typically have a hairpin structure
- Importance of miRNAs
  - They regulate the expression of target genes via complementary base-pair interactions
Motivation
- Since miRNAs are short (about 22 nt), conventional sequence alignment methods can only find relatively close homologues
- It has been reported that miRNA genes are more conserved in their secondary structure than in their primary sequence
- This paper exploits this secondary structure conservation and proposes a novel computational approach to detect miRNAs based on both sequence and structure alignment
- The authors devised a tool, miRAlign, and compared its performance with existing search methods such as BLAST and ERPIN
Experiment
- Materials
  - Reference sets
    - The reference set consists of 1298 miRNAs from 12 species, of which 1054 are animal miRNAs
    - The 1054 animal miRNAs and their 1104 precursors form the raw training set Train_All
    - Train_Sub_1: all animal miRNAs except those from C. briggsae
    - Train_Sub_2: all animal miRNAs except those from C. briggsae and C. elegans
  - Genomic sequences
    - Sequences of 6 species were used
- Methods
  - Preprocessing
    - Known precursors from the training set are used as BLAST queries against the genome
    - Potential regions are cut from the genome with 70 nt of flanking sequence at each end
    - These regions are scanned using a 100 nt window with a 10 nt step
    - Sequences that overlap repeat sequences are discarded
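The window scan in the preprocessing step can be sketched as follows. This is a minimal illustration, not miRAlign's own code: the function name `sliding_windows` and the toy region are hypothetical, and only the 100 nt window and 10 nt step come from the slide.

```python
from typing import Iterator, Tuple


def sliding_windows(region: str, window: int = 100, step: int = 10) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each window over a genomic region."""
    if len(region) <= window:
        # A region no longer than one window is returned whole.
        yield 0, region
        return
    for start in range(0, len(region) - window + 1, step):
        yield start, region[start:start + window]


# Toy 120 nt region (made up for illustration).
region = "ACGU" * 30
windows = list(sliding_windows(region))
for start, subseq in windows:
    print(start, len(subseq))
```

Note that this sketch drops a trailing remainder shorter than one step; whether the original tool pads or keeps such a tail is not stated on the slide.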
- Methods (contd)
  - miRAlign
    - Secondary structure prediction
      - Both the candidate sequence and its reverse complement are analyzed by RNAfold to predict hairpins
      - Only hairpins with a minimum free energy (MFE) lower than -20 kcal/mol are retained
    - Pairwise sequence alignment
      - Sequences from the previous step are aligned pairwise to all the ~22 nt known miRNA sequences from the training set
      - The sequence similarity score between the candidate and each known mature miRNA is calculated by CLUSTALW
      - If the score exceeds a user-defined threshold, the candidate/known-miRNA pair is kept for further analysis
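The hairpin filter can be sketched as below. This is an assumption-laden outline rather than the authors' implementation: `retain_hairpins` and `toy_fold` are hypothetical names, and the folding step is passed in as a function so the example runs without external software. In practice `fold` might wrap RNAfold, for example through the ViennaRNA Python binding `RNA.fold`, which returns a structure and an energy.

```python
from typing import Callable, List, Tuple

# RNA alphabet only; genomic DNA input would need T handled as well.
COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def retain_hairpins(
    candidate: str,
    fold: Callable[[str], Tuple[str, float]],
    mfe_cutoff: float = -20.0,
) -> List[Tuple[str, str, float]]:
    """Fold both strands and keep those with MFE below the cutoff (kcal/mol)."""
    kept = []
    for strand in (candidate, reverse_complement(candidate)):
        structure, mfe = fold(strand)
        if mfe < mfe_cutoff:
            kept.append((strand, structure, mfe))
    return kept


def toy_fold(seq: str) -> Tuple[str, float]:
    """Stand-in for a real folding program; the energies are invented."""
    return "." * len(seq), (-25.0 if seq.startswith("G") else -10.0)


kept = retain_hairpins("GGGAAACCU", toy_fold)
print(kept)
```

Passing the folding function as a parameter keeps the -20 kcal/mol filter separate from whichever structure predictor is used.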
- Methods (contd)
  - Checking the miRNA's position on the stem-loop
    - Three properties of the miRNA's position are considered:
      - It should not lie on the terminal loop of the hairpin
      - It should lie on the same arm of the hairpin as its known homologue
      - Its position on the hairpin should not differ too much from that of its known homologues
    - Position difference of the miRNA on precursors A and B
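The three position checks can be sketched on a dot-bracket structure as follows. Everything here is an assumption made for illustration: the function names, the strict rule that any overlap with the terminal loop disqualifies a candidate, and above all the position-difference measure, which is a simple start-offset difference with a hypothetical `max_shift` threshold. The slide's own position-difference formula is not reproduced in this text, so this is a stand-in, not the authors' definition.

```python
from typing import Optional


def mirna_arm(structure: str, start: int, end: int) -> Optional[str]:
    """Return "5p" or "3p" for a miRNA at [start, end) on a single hairpin,
    or None if it overlaps the terminal loop or no stem exists."""
    if "(" not in structure or ")" not in structure:
        return None
    loop_start = structure.rfind("(") + 1  # first base after the last opening pair
    loop_end = structure.find(")")         # first closing pair
    if end <= loop_start:
        return "5p"
    if start >= loop_end:
        return "3p"
    return None


def position_ok(cand_arm: Optional[str], known_arm: str,
                cand_start: int, known_start: int, max_shift: int = 5) -> bool:
    """Same arm, not on the loop, and start offsets within max_shift (assumed)."""
    if cand_arm is None or cand_arm != known_arm:
        return False
    return abs(cand_start - known_start) <= max_shift


hairpin = "((((((....))))))"  # toy structure: 6 bp stem, 4 nt loop
print(mirna_arm(hairpin, 0, 6), mirna_arm(hairpin, 10, 16), mirna_arm(hairpin, 4, 9))
```

The arm test assumes one unbranched hairpin; a multi-branched structure would need a proper parse of the base pairs.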
- Methods (contd)
  - RNA secondary structure alignment
    - RNAforester computes a pairwise structure alignment and gives a similarity score
    - The score is the sum of the scores for all base and base-pair matches, insertions and deletions
    - The normalized similarity score of structures C and m is given as
      where
      - C: candidate sequence
      - m: known pre-miRNA
      - sigma_local(C, m): raw local alignment score between C and m
      - sigma(m, m): self-alignment score of m
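The formula on this slide was an image and does not survive in the text. One reading consistent with the listed definitions, offered as an assumption rather than as the authors' exact expression, is the raw local score divided by the self-alignment score of m (the symbol S is hypothetical):

```latex
\[
  S(C, m) = \frac{\sigma_{\mathrm{local}}(C, m)}{\sigma(m, m)}
\]
```

Under this reading, a candidate whose structure matches m exactly would score about 1, and less similar structures would score lower.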
- Methods (contd)
  - Total similarity score
    - After aligning all potential homologue pairs, a total similarity score (tss) is assigned to each candidate sequence
      where
      - C: candidate sequence
      - R: set composed of all Cs
Methods (contd)
Summary
Results
- Application to C. briggsae
  - Detection of miRNA homologues
    - miRAlign was applied to the C. briggsae data with training set Train_Sub_1, and sensitivity and specificity were recorded
  - Identification of miRNAs in distantly related species
    - miRAlign was applied to the C. briggsae data with training set Train_Sub_2, and sensitivity and specificity were recorded
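For reference, sensitivity and specificity can be computed as in the sketch below. It uses the standard textbook definitions; the slides do not say which definition of specificity the authors used, so treat that as an assumption. The counts are invented for illustration and are not results from the paper.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true miRNAs that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-miRNAs correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


# Made-up counts, for illustration only.
sens = sensitivity(tp=45, fn=5)
spec = specificity(tn=95, fp=5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```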
Results (contd)
Graph 1
Results (contd)
Graph 2
Results (contd)
Comparison of miRAlign with BLAST
Results (contd)
Comparison of miRAlign with ERPIN
Results (contd)
- Other results
  - miRAlign was applied to A. gambiae, and 59 putative miRNAs with tss > 35 were detected. This was later supported when 38 A. gambiae miRNAs were reported in the MicroRNA Registry 6.0, 37 of which were covered by miRAlign
  - miRAlign was also applied to a plant, Zea mays, and detected 28 of the 40 known Zea mays miRNAs
Conclusion
- By combining sequence and structure alignments, miRAlign performs better than previously reported homologue search methods
- Although miRAlign was based on animal data, the miRNAs predicted in Zea mays indicate that it can also be applied to plants. Further investigation of this is underway
Thank you
Questions?