Title: Lecture 8.2: RNA
1. Lecture 8.2: RNA
Jennifer Gardy
Centre for Microbial Diseases and Immunity Research
University of British Columbia
jennifer_at_cmdr.ubc.ca
2. Bad News Disclaimer
- Jenn is not an RNA researcher
3. Good News Disclaimer
- All-new RNA lecture! Now with less scary algorithms! And no dynamic programming exercises! AND NO MATH! NO, NO, NO, NO, NO, NO, NO MATH!
4. Outline
- What is RNA?
- RNA's many roles in the cell
- Levels of RNA structure
- Secondary structure
  - Elements
  - Predictive methods for single and multiple queries
- Tertiary structure prediction
- RNA databases
- Finding functional RNAs in the genome
5. What is RNA?
- Ribonucleic acid
- Ribose sugar
  - vs DNA's deoxyribose
  - Less stable
- Uracil base
  - vs DNA's thymine
  - Cheap to produce
- Single-stranded
http://en.wikipedia.org/wiki/RNA
6. DNA vs RNA Structure
- Single strand
  - RNA base pairing: G-C, A-U, G-U and more
    - Canonical, Watson-Crick
    - GU wobble
    - AU reverse Hoogsteen
- Double strand
  - DNA base pairing: G-C, A-T (canonical, Watson-Crick)
Single-stranded RNA will fold in on itself to form all sorts of structures!
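The pairing rules on this slide can be captured as a small lookup table. The sketch below is a minimal Python illustration, assuming only the pairs named above for RNA (canonical Watson-Crick G-C and A-U plus the G-U wobble) and for DNA (G-C and A-T); geometry-dependent non-canonical pairs such as reverse Hoogsteen A-U are deliberately left out. The names `RNA_PAIRS`, `DNA_PAIRS` and `can_pair` are hypothetical.

```python
# Pairing rules from the slide, stored as unordered pairs of bases.
# Assumption: only Watson-Crick pairs plus the RNA G-U wobble are modelled;
# geometry-dependent pairs (e.g. reverse Hoogsteen A-U) are omitted.
RNA_PAIRS = {frozenset("GC"), frozenset("AU"), frozenset("GU")}
DNA_PAIRS = {frozenset("GC"), frozenset("AT")}


def can_pair(a: str, b: str, rules=RNA_PAIRS) -> bool:
    """Return True if bases a and b can pair under the given rule set."""
    return frozenset((a.upper(), b.upper())) in rules


print(can_pair("G", "U"))             # True: the RNA wobble pair
print(can_pair("G", "T", DNA_PAIRS))  # False: no wobble in the DNA rule set
```

Using `frozenset` makes the check order-independent, so `can_pair("U", "G")` and `can_pair("G", "U")` agree without listing both orientations.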
7. From Structure to Function
- Single-stranded RNA can fold into a variety of secondary and tertiary structures
RNA is able to play a number of functional roles in the cell!
What are some of the types of RNA you've heard of?
8. Types of RNA: mRNA (Coding/Informational)
- Messenger RNA is the middleman between gene and protein
  - RNA pol transcribes a gene into mRNA
  - mRNA is processed and transported out of the nucleus
  - mRNA is translated into protein sequence at the ribosome
  - Used transcripts are degraded
Molecular Cell Biology, 4th ed.
9. Types of RNA: ncRNAs (Functional)
- Non-coding RNAs are RNAs that carry out their function without ever being translated into protein
- tRNA (transfer RNA)
  - Anticodon recognizes a triplet (codon) on the mRNA
  - Opposite end is charged with a specific amino acid
- rRNA (ribosomal RNA)
  - 80% of the cell's RNA
  - Together with certain proteins, forms the ribosome
Molecular Cell Biology, 4th ed.
10. Other families of ncRNAs
- ncRNAs are involved in a variety of cellular processes
  - Gene regulation (microRNAs/miRNAs, riboswitches)
    - miRNAs bind the 3′ UTR of specific genes to suppress translation
    - Riboswitches are cis-acting elements, typically in the 5′ UTR of certain genes
      - Bind a target molecule; translation is up- or down-regulated when bound
  - Modification of RNAs (small RNAs: sn/sno/gRNAs)
  - Catalysis of reactions (ribozymes)
  - Protein trafficking (small cytosolic RNAs/scRNAs)
  - Interference with protein synthesis (antisense, RNAi)
11. Take Home Messages
- The RNA world consists of MUCH more than mRNA
- Non-coding, or functional, RNAs (ncRNAs) perform a number of important functions and are of significant biological interest
- Virtually all RNA bioinformatics centres around the identification, analysis, and structure of ncRNAs
12. RNA Structure has Protein-like Hierarchy
Molecular Cell Biology, 4th ed.
13. Levels of RNA Structure
- Primary: the sequence itself
  - Single-stranded RNA
  - Bases want to pair with each other
  - Unusual pairings are possible
CGGUGCUCUUUCUGCCGCGCAGAAGAGCGGCGCGGAUCUGUUCGUUCUUG
CUGAGCACGGGCGUGGGAACCAGUGCCGGCUCGGGCGCCUCGACCUUGGC
GCUCUGUUUGAGGCUCAGGAAGCGCUUCUGCGCGACCAGCCGGUCUUCGG
CCGACAAGGGUUCGACCGGUUCGCCGUCGAUGUUGUGGCGCAUGGAAUCG
GGCUGGGCGCUGGCGAAAUAAUAGCGCCGGCAAUGGACAAAGGCAGCGGU
CGCGCGCCGGAGCGUCGUUACACCGGCCUCGGGCUUCAAGAGCGGACGCA
GCUCGUUGAAGAGGCCGACGGCGAAGGGAAGAACGGGAUCGCCGGGCUUG
GCCGGAAGCACGCCGACCGGCCGGAUCAACAUGCUGUUGAUCGCGUUGGC
CUUCUCCACGUCGAGCUCGGUCGCCGCAAUCGGCCCGCGGCUGAUCUUCC
AGGGUUUGUCCAUGCAUCCUCCAUAAAGGCCGACUGUGAUGGUAUCUUGA
CGGAUGGGGCAAUAGCGGUGCGGCCUGCAUAUUGCUAGCCCCGCUGCAAG
CCUGCGACGAGACGCCGUGUCCCGCGGAUAUAGGCCCGCCUUUCUCGGCC
GGCGAUUCUCUAUUCAAUCAAUUUUAUCGCCGAUUAUGCAUUGACCUCCA
GCAUCGAUUACACUUCUUAUCCGCGCCCAAGAUCAAUGCCGGCCGCGGGG
GAGGAUAUAUGCGGGUUUUCUCGAGCAUCGAUGAGCUGCGCCACACGCUC
GAUGCGCUCAAACGC
14. Levels of RNA Structure
- Secondary: patterns formed by base pairing
15. Levels of RNA Structure
- Tertiary: 3D structure formed by interactions between 2° elements
16. Secondary Structure is Interesting!
- Easy to study
  - Secondary structure usually determined before tertiary
  - Chemical modification assays chew up bases not involved in specific secondary structure interactions
- ncRNA functions correlated to specific secondary structure elements
  - E.g. hairpin loop shape involved in gene regulation
- Therefore, most RNA resources/methods are focused on secondary structure
  - Visualization
  - Prediction
  - Annotation of elements
17. Elements of Secondary Structure
Hairpin loops: backbone makes a 180° bend
18. Elements of Secondary Structure
Internal loops: pairing of both strands interrupted equally
19. Elements of Secondary Structure
Bulge loops: pairing of one strand interrupted (unequal)
20. Elements of Secondary Structure
Multibranch loops: AKA helical junction; joins two or more stems with no bulge
21. Elements of Secondary Structure
22. Elements of Secondary Structure
23. Visualizing 2° Structure
- Dot-bracket notation
  - ( ) paired bases (stems)
  - . unpaired base (loops)
GUUUGGUUCAAAAC
((((......))))
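A dot-bracket string can be turned into an explicit list of base pairs with a stack: each "(" is pushed, and each ")" pairs with the most recently opened "(". A minimal Python sketch; the function name `parse_dot_bracket` is hypothetical, and it assumes plain nested structures (no pseudoknot brackets such as "[" and "]").

```python
def parse_dot_bracket(structure: str) -> dict[int, int]:
    """Map each paired position (0-based) to its partner position.

    '(' opens a pair, ')' closes the most recently opened one, '.' is unpaired.
    """
    stack: list[int] = []
    partner: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partner


seq = "GUUUGGUUCAAAAC"
pairs = parse_dot_bracket("((((......))))")
for i in sorted(k for k, v in pairs.items() if k < v):
    print(f"{seq[i]}{i + 1}-{seq[pairs[i]]}{pairs[i] + 1}")
# G1-C14
# U2-A13
# U3-A12
# U4-A11
```

The output shows that the four-pair stem in the slide's example is one G-C pair followed by three U-A pairs, closing a six-base hairpin loop.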
24. Secondary Structure Prediction
- Two categories of 2° prediction methods
  - Ones that take a single sequence as input
    - Mfold, Vienna, RNAStructure, Sfold
  - Ones that require multiple (aligned) input sequences
    - Infernal, ConStruct, Alifold, Pfold, FOLDALIGN, Dynalign
- Important to both categories is the idea of free energy minimization
Molecules fold to achieve the lowest energy state possible (minimum free energy, or MFE). Given a set of potential structures, those with the lowest free energies are most stable and most likely to be found in a cell.
25. Calculating a 2° Structure's Free Energy
- Computed using nearest-neighbour parameters
  - Each possible pair of neighbouring structural elements has an associated free energy value (kcal/mol)
  - Negative values are good: found in stable, base-paired stems
  - Positive values are bad: found in loops and bulges
- Sum all values over the structure to get its overall free energy
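The "look up each term, then add them up" procedure can be sketched in a few lines. This is a deliberately toy model, assuming hypothetical energies: each stacked pair of base pairs contributes the sum of made-up per-pair values, and every hairpin or other loop gets a flat made-up penalty. The numbers are NOT the published nearest-neighbour (Turner) parameters that Mfold or Vienna use, and real tables key each stacking term on both pairs together rather than summing per-pair values.

```python
# Hypothetical per-pair contributions (kcal/mol), for illustration only.
PAIR_TERM = {"GC": -1.5, "CG": -1.5, "AU": -0.5, "UA": -0.5, "GU": -0.25, "UG": -0.25}
HAIRPIN_PENALTY = 3.0     # hypothetical flat cost of any hairpin loop
OTHER_LOOP_PENALTY = 1.0  # hypothetical flat cost of bulge/internal/multibranch loops


def pair_partners(structure: str) -> dict[int, int]:
    """Map every paired position in a dot-bracket string to its partner."""
    stack: list[int] = []
    partner: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def toy_free_energy(seq: str, structure: str) -> float:
    """Sum toy neighbour terms over every base pair (i, j) with i < j."""
    partner = pair_partners(structure)
    energy = 0.0
    for i, j in sorted((i, j) for i, j in partner.items() if i < j):
        if partner.get(i + 1) == j - 1:
            # (i+1, j-1) is stacked directly inside (i, j): favourable stem term
            energy += PAIR_TERM[seq[i] + seq[j]] + PAIR_TERM[seq[i + 1] + seq[j - 1]]
        elif all(k not in partner for k in range(i + 1, j)):
            # nothing paired inside (i, j): this pair closes a hairpin loop
            energy += HAIRPIN_PENALTY
        else:
            # (i, j) closes some other loop type
            energy += OTHER_LOOP_PENALTY
    return energy


print(toy_free_energy("GUUUGGUUCAAAAC", "((((......))))"))  # -1.0
```

The three stacking terms (-2.0, -1.0, -1.0) outweigh the single hairpin penalty (+3.0), so under these toy values the folded stem is more stable than the unfolded chain (0.0).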
26. 2° Structure Prediction Approach 1: MFE
- Used when you have a single sequence as input
- Forms the basis of the tools Mfold, Vienna, RNAStructure and Sfold
- Basic principle
  - Generate a series of possible secondary structures
  - Calculate the free energy of each
  - Return the lowest energy structure(s)
- Two possible implementations
  - Naïve MFE
  - Dynamic programming MFE
27. Naïve MFE Prediction
- Fold the query RNA into ALL possible secondary structures
- Calculate free energy for each structure
- Problem: a 50-base RNA can have over 5000 BILLION POSSIBLE STRUCTURES!!!
  - Naïve MFE prediction is virtually never used
  - Reminiscent of the database searching problem
- Solution: a heuristic approach that breaks down the RNA sequence
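To see why enumerating every structure is hopeless, the size of the search space can itself be counted recursively. A minimal Python sketch, assuming nested structures only (no pseudoknots), the G-C/A-U/G-U pairing rules from earlier, and a hypothetical minimum hairpin loop of 3 unpaired bases; exact counts depend on these assumptions, so it is not meant to reproduce the slide's figure precisely. The function names are hypothetical.

```python
from functools import lru_cache

# Assumed pairing rules (Watson-Crick plus G-U wobble) and minimum hairpin size.
PAIRS = {"GC", "CG", "AU", "UA", "GU", "UG"}
MIN_HAIRPIN = 3  # hypothetical: a hairpin must enclose at least 3 unpaired bases


def count_structures(seq: str) -> int:
    """Count all nested (pseudoknot-free) secondary structures of seq."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of structures on the subsequence seq[i..j] (inclusive).
        if i >= j:
            return 1  # empty or single base: only the unfolded state
        total = count(i + 1, j)  # case 1: base i stays unpaired
        for k in range(i + MIN_HAIRPIN + 1, j + 1):
            if seq[i] + seq[k] in PAIRS:  # case 2: base i pairs with base k
                total += count(i + 1, k - 1) * count(k + 1, j)
        return total

    return count(0, len(seq) - 1)


print(count_structures("GGAAACC"))  # 6
```

Even this 7-base example already has six structures (unfolded, four single-pair options, and one two-pair stem); the count grows exponentially with length, which is why the naïve approach of scoring each one is impractical.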
28. Dynamic Programming MFE Prediction
- Break RNA query into small subsequences
- Generate possible secondary structures for subsequences, select lowest free energy substructure
- Combine substructures into overall structure using dynamic programming
S. Eddy, Nat. Biotech. 2004
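The subsequence-then-combine idea can be illustrated with the Nussinov algorithm, a simpler relative of the energy-based recursions in tools like Mfold and Vienna: it fills a table of best scores for every subsequence from short to long, then traces back one optimal structure. Note the swap: this sketch maximizes the number of base pairs instead of minimizing nearest-neighbour free energy, and assumes G-C/A-U/G-U pairs and a hypothetical minimum hairpin loop of 3. It is not how any of the named tools score structures.

```python
PAIRS = {"GC", "CG", "AU", "UA", "GU", "UG"}  # assumed pairing rules
MIN_HAIRPIN = 3  # hypothetical minimum hairpin loop size


def nussinov(seq: str) -> tuple[int, str]:
    """Maximise base pairs by dynamic programming; return (pairs, dot-bracket)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of nested pairs within seq[i..j]
    best = [[0] * n for _ in range(n)]

    def paired_score(i: int, k: int, j: int) -> int:
        # Score if base k pairs with base j: best left part + best inside + 1
        left = best[i][k - 1] if k > i else 0
        return left + best[k + 1][j - 1] + 1

    # Fill: solve short subsequences first, then combine them into longer ones.
    for span in range(MIN_HAIRPIN + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # base j unpaired
            for k in range(i, j - MIN_HAIRPIN):
                if seq[k] + seq[j] in PAIRS:
                    score = max(score, paired_score(i, k, j))
            best[i][j] = score

    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_HAIRPIN:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_HAIRPIN):
            if seq[k] + seq[j] in PAIRS and paired_score(i, k, j) == best[i][j]:
                structure[k], structure[j] = "(", ")"
                stack.extend([(i, k - 1), (k + 1, j - 1)])
                break
    return best[0][n - 1], "".join(structure)


print(nussinov("GGAAACC"))  # (2, '((...))')
```

The table has O(n²) cells and each cell scans O(n) split points, so the whole fill is O(n³), polynomial rather than exponential in sequence length.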
29. Dynamic Programming MFE Prediction
- Typically yields one lowest energy secondary structure and a number of suboptimal structures
  - Not the lowest energy, but still energetically favourable
- Benefits of DPMFE
  - Only needs a single sequence as input; fast
- Pitfalls of DPMFE
  - Correctly predicts the structure of only 50-70% of bases in a given RNA
  - Thermodynamic parameters have a 5-10% error rate
  - Many known secondary structures are not the lowest free energy structure; they may be within 5-10 kcal/mol of it
  - Lowest free energy structure not always biologically correct
- Can improve structure predictions with constraint information
  - "This residue must base pair with this residue"
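Accuracy figures such as "50-70% of bases" come from comparing a predicted structure with a known reference. One simple per-base measure, sketched here as an assumption about how such a figure could be computed, counts a base as correct when its predicted partner (or unpaired state) matches the reference; published benchmarks often use base-pair sensitivity and positive predictive value instead. The function names are hypothetical.

```python
def partners(structure: str) -> list[int]:
    """Partner index of every base in a dot-bracket string, or -1 if unpaired."""
    partner = [-1] * len(structure)
    stack: list[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def per_base_accuracy(predicted: str, reference: str) -> float:
    """Fraction of bases whose predicted pairing state matches the reference."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("structures must be non-empty and of equal length")
    pred, ref = partners(predicted), partners(reference)
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)


reference = "((((......))))"
predicted = ".(((......)))."  # misses the outermost G-C pair
print(round(per_base_accuracy(predicted, reference), 3))  # 0.857
```

Missing a single pair costs two bases (both partners), so 12 of 14 bases are scored correct here.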
30. 2° Structure Prediction Approach 2: Comparative Sequence Analysis
- Used when you have multiple RELATED input RNAs
- Two underlying principles
  - Different RNA sequences (different primary structures) can fold into IDENTICAL secondary and tertiary structures. An ncRNA's structure and function are maintained throughout evolution.
  - A mutation in one member of a pair of interacting residues necessitates a change in the other member of the pair if base pairing is to be maintained. These are called compensatory base changes (CBCs).
31. The Two Principles in Graphical Form
- Different RNA sequences (different primary structures) can fold into IDENTICAL secondary and tertiary structures.
- A mutation in one member of a pair of interacting residues necessitates a change in the other member.
GUUUGGUUCAAAAC ((((......))))
GGAUGGUUCAAUCC ((((......))))
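Compensatory base changes can be picked out mechanically once two sequences share a structure. A minimal Python sketch, assuming the G-C/A-U/G-U pairing rules; it reports a pair only when BOTH bases differ and both versions can still pair (a change of only one base, for example G-C to G-U, is sometimes called a hemi-compensatory change and is not counted here). The function names are hypothetical.

```python
# Allowed pairs (assumption): Watson-Crick plus the G-U wobble.
ALLOWED = {"GC", "CG", "AU", "UA", "GU", "UG"}


def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Return the (i, j) base pairs of a dot-bracket string, 0-based, i < j."""
    stack: list[int] = []
    pairs = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return sorted(pairs)


def compensatory_changes(seq_a: str, seq_b: str, structure: str) -> list[tuple[int, int]]:
    """1-based pairs where BOTH bases differ between seq_a and seq_b, yet both still pair."""
    changes = []
    for i, j in base_pairs(structure):
        both_changed = seq_a[i] != seq_b[i] and seq_a[j] != seq_b[j]
        both_pair = seq_a[i] + seq_a[j] in ALLOWED and seq_b[i] + seq_b[j] in ALLOWED
        if both_changed and both_pair:
            changes.append((i + 1, j + 1))
    return changes


print(compensatory_changes("GUUUGGUUCAAAAC", "GGAUGGUUCAAUCC", "((((......))))"))
# [(2, 13), (3, 12)]
```

In the slide's pair of sequences, U2-A13 became G2-C13 and U3-A12 became A3-U12: the primary sequence changed at four positions, but the stem survived intact.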
32. From Alignment to 2° Structure
- Comparative sequence analysis (CSA) methods require an alignment of related RNAs as input
- Gaps and conserved columns are removed, leaving only variant columns
- Variant residues in sequence 1 that might base pair are noted
- Check for covariance: could these also base pair in seqs 2 and 3?
- Provides constraint information
GUUUGG-UCAAAAC
GGAUGGUUCAAUCC
GCAGGGU-CAAUGC
4/5 variant columns contain residues that may pair
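The procedure on this slide can be followed step by step on the three aligned sequences. A minimal Python sketch, assuming the G-C/A-U/G-U pairing rules and a hypothetical minimum of 3 columns between paired positions; real CSA tools weigh covariation statistically rather than with this all-or-nothing test. Function names are hypothetical.

```python
ALLOWED = {"GC", "CG", "AU", "UA", "GU", "UG"}  # assumed pairing rules
MIN_SEPARATION = 3  # hypothetical: at least 3 columns between paired positions


def variant_columns(alignment: list[str]) -> list[int]:
    """0-based columns that contain no gaps and are not fully conserved."""
    keep = []
    for c in range(len(alignment[0])):
        column = {seq[c] for seq in alignment}
        if "-" not in column and len(column) > 1:
            keep.append(c)
    return keep


def covarying_pairs(alignment: list[str]) -> list[tuple[int, int]]:
    """1-based column pairs (i, j) whose residues can base pair in EVERY sequence."""
    cols = variant_columns(alignment)
    pairs = []
    for a, i in enumerate(cols):
        for j in cols[a + 1:]:
            if j - i - 1 >= MIN_SEPARATION and all(s[i] + s[j] in ALLOWED for s in alignment):
                pairs.append((i + 1, j + 1))
    return pairs


alignment = ["GUUUGG-UCAAAAC", "GGAUGGUUCAAUCC", "GCAGGGU-CAAUGC"]
print([c + 1 for c in variant_columns(alignment)])  # [2, 3, 4, 12, 13]
print(covarying_pairs(alignment))                   # [(2, 13), (3, 12)]
```

Columns 2, 3, 12 and 13 can pair in all three sequences, giving the slide's "4/5" count; column 4 (U, U, G) has no consistent partner among the variant columns.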
33. CSA/Covariance Prediction Methods
- Constraint information derived from covariance analysis is combined with energy minimization and dynamic programming to generate a final prediction
- Disclaimer: this is a very simplified explanation. In reality, this type of analysis requires knowledge of concepts from information theory and math that are very, very scary indeed
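For the curious, one classic information-theory measure of covariation is the mutual information (MI) between two alignment columns: it is zero when the columns vary independently (or are conserved) and grows when a change in one column predicts a change in the other. This is a sketch of that one measure, not a description of what Infernal, Pfold, Alifold or the other tools actually compute; note that MI alone does not check whether the covarying bases can pair, which is one reason it is combined with pairing rules and energy models. The helper names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(col_x: list[str], col_y: list[str]) -> float:
    """Mutual information (bits) between two alignment columns of equal length."""
    n = len(col_x)
    fx, fy = Counter(col_x), Counter(col_y)
    fxy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), count in fxy.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((fx[x] / n) * (fy[y] / n)))
    return mi


alignment = ["GUUUGG-UCAAAAC", "GGAUGGUUCAAUCC", "GCAGGGU-CAAUGC"]


def column(c: int) -> list[str]:
    """Residues in 1-based alignment column c."""
    return [seq[c - 1] for seq in alignment]


print(round(mutual_information(column(2), column(13)), 3))  # 1.585: perfect covariation
print(round(mutual_information(column(3), column(12)), 3))  # 0.918
print(mutual_information(column(1), column(14)))            # 0.0: conserved, no signal
```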
34. Ha!
35. CSA/Covariance Methods: Scary but Useful
- Infernal, ConStruct, Alifold, Pfold, FOLDALIGN, Dynalign do the scary stuff for you
- Requires multiple related sequences for input, but provides MUCH better predictions
  - Limited to 500-base input sequences
- Secondary structures of as many as 97% of bases in a given RNA are correctly predicted using this method
  - vs. 50-70% with basic MFE methods
36. Tertiary Structure Prediction
- How do we go from 2° structure to 3° structure?
- We don't. Well, not easily, anyway.
37. Why is 3° Structure Prediction So Hard?
- Relative lack of 3D RNA structures available
  - NMR
    - Small loops
    - Practical limit of 50 base pairs
- Few automated methods mean that 3° structure prediction requires lots of user guidance and knowledge about the field
- Methods produce coarse-grain resolution structures: major features predicted correctly, finer atomic-level contacts incorrect
38. 3° Structure Prediction: MC-SYM
- Developed by François Major, U. Montreal
- Uses information derived from known 3D RNA structures to build a series of models
  - Assemble bases into structures matching known elements
  - Avoid elements not found in known structures
- Can incorporate constraint information
- Requires complex input
  - MC-SYM script
- Generates PDB files as output
39. Sample MC-SYM Script
sequence (r 31 ACUGAAGAU)

// Conformations -------------------------------------------
residue ( 31 helix 1
          39 helix 1
          32 38 type_A 15 )

// Relations -----------------------------------------------
connect ( 31 33 stack 20
          33 34 ! stack 20
          34 39 stack 20 )
pair ( 31 39 wct 1 )

// Building ------------------------------------------------
anticodon backtrack ( (31 39) (39 38 37 36 35 34) (31 32 33) )

// Constraint ----------------------------------------------
adjacency ( anticodon 1.0 2.5 )
res_clash ( anticodon fixed_distance 1.0 all no_hydrogen )

// Exploration ---------------------------------------------
explore ( anticodon
          rmsd (1.0 base_only no_hydrogen)
          file_pdb ("ANTI/anti-04d.pdb" zipped) )
40. RNA Databases
- NAR Database Portal lists 51 RNA DBs
  - http://www3.oup.co.uk/nar/database/cat/2
RNA databases tend to be specialized
41. Rfam: RNA Families Database
- http://www.sanger.ac.uk/Software/Rfam/
42. An Rfam Entry Represents 1 RNA Family
- Class of ncRNA with specific function and 2° structure
Entry sections: annotation, 2° structure, CSA/covariance alignments, literature refs
43. Rfam CSA/Covariance Structures
- Colour blocks show which bases pair with each other
  - Red with red, blue with blue, green with green
- Dot-bracket notation also helps in visualization
44. Searching Rfam
- Keyword search
  - Name of RNA or any word in annotation (function, interactor)
  - E.g. Spot 42, spf, regulation, galactose, OxyS
- EMBL ID search
- BLAST search
  - < 2 kb sequence allowed
- Browse
  - Gene, cis-reg, intron
- Genomes
  - Rfam ncRNAs identified in many genomes
  - How many families occur how many times in my genome?
45. SCOR: Structural Classification of RNA
- http://scor.lbl.gov/scor.html
46. SCOR Structural Hierarchy
47. SCOR: Structural Classification of RNA
- http://scor.lbl.gov/scor.html
48. SCOR Functional Hierarchy
49. RNABase: 3D Structures of RNA
50. RNABase
- Daily download of RNA structures from PDB and NDB
  - X-ray crystallography, NMR
- Provides annotations to go with structures
  - Measures of structure quality
- Can be searched/browsed by category, keyword, technique, resolution, structure quality
51. Finding ncRNAs in the Genome
- Rfam currently contains 503 families: why so few?
  - Rapid, accurate computational identification of ncRNAs from genome sequence is not a trivial task
  - Most available information was derived in the laboratory
- Three approaches
  - Similarity search
  - Transcription prediction
  - Comparative genome analysis
52. Similarity Searching
- BLAST/FASTA primary sequence alignments don't work very well
  - 4-letter alphabet (low information content)
  - RNA structure is more important than sequence
- How can we incorporate structure information into database similarity searching?
  - Need good secondary structure predictions
  - Need a good alignment scoring method that properly weights sequence and structural contributions
- Stochastic context-free grammars
  - Conceptually related to HMMs, but can model base pairs between positions that are very far apart
  - Computationally intensive
  - Will need heuristics and improved computer power
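To put a rough number on "low information content": assuming uniform base (or amino acid) composition, each position carries at most log2 of the alphabet size in bits, so a given short word turns up by chance far more often in nucleotide sequence than in protein sequence. The 10-mer below is an arbitrary illustrative length.

```latex
\begin{aligned}
H_{\max}^{\mathrm{RNA}} &= \log_2 4 = 2 \text{ bits per base} \\
H_{\max}^{\mathrm{protein}} &= \log_2 20 \approx 4.32 \text{ bits per residue} \\
P_{\mathrm{RNA}}(\text{given 10-mer}) &= 4^{-10} \approx 9.5 \times 10^{-7} \\
P_{\mathrm{protein}}(\text{given 10-mer}) &= 20^{-10} \approx 9.8 \times 10^{-14}
\end{aligned}
```

Real genomes have biased composition, so these are upper bounds on information per position rather than exact values.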
53. Transcription Prediction
- Gene-finding programs for DNA sequences look for signals indicating transcription initiation, termination and processing events: could we do the same for RNA?
- It would be difficult
  - ncRNA signals are not as strong as gene signals
  - ncRNAs don't show statistically significant biases in nucleotide composition
  - Some ncRNAs are not transcribed from their own promoters at all; they are excised out of introns
  - Different ncRNAs are transcribed by different RNA polymerases
54. Comparative Genome Analysis
- Most successful approach to date
- Compare 2 species
  - Identify regions that are conserved between the two species and are not involved in protein coding
  - Look for conserved secondary structure elements in these regions
55. Take Home Messages
- RNA is a unique biomolecule and requires unique computational analysis methods
- ncRNAs play a number of important roles in the cell and are an area of increasing research interest
- ncRNA function depends primarily on 2° structure
- Many methods for 2° structure prediction, and many 2° structures themselves, are available; however, few 3D RNA structures are available
- Many specialized and general-interest RNA databases are available over the web
- The identification of novel ncRNAs in the genome requires improved computational approaches