Lecture 8.2: RNA - PowerPoint PPT Presentation

Provided by: stephe78

1
Lecture 8.2: RNA
Jennifer Gardy
Centre for Microbial Diseases and Immunity Research
University of British Columbia
jennifer_at_cmdr.ubc.ca
2
Bad News Disclaimer
  • Jenn is not an RNA researcher

3
Good News Disclaimer
  • All new RNA lecture! Now with less scary
    algorithms! And no dynamic programming exercises!
    AND NO MATH! NO, NO, NO, NO, NO, NO, NO MATH!

4
Outline
  • What is RNA?
  • RNA's many roles in the cell
  • Levels of RNA structure
  • Secondary structure
  • Elements
  • Predictive methods for single and multiple
    queries
  • Tertiary structure prediction
  • RNA databases
  • Finding functional RNAs in the genome

5
What is RNA?
  • Ribonucleic acid
  • Ribose sugar
  • vs. DNA's deoxyribose
  • Less stable
  • Uracil base
  • vs. DNA's thymine
  • Cheap to produce
  • Single-stranded

http://en.wikipedia.org/wiki/RNA
6
DNA vs RNA Structure
  • RNA: single strand
  • RNA base pairing: G-C, A-U, G-U and more
  • Canonical, Watson-Crick
  • GU wobble
  • AU reverse Hoogsteen
  • DNA: double strand
  • DNA base pairing: G-C, A-T (canonical,
    Watson-Crick)

Single-stranded RNA will fold in on itself to
form all sorts of structures!
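
The pairing rules on this slide can be written down as a small lookup. This is a minimal sketch: the names `WATSON_CRICK`, `WOBBLE` and `pair_type` are made up for illustration, and non-canonical pairs such as the AU reverse Hoogsteen are left out because they depend on 3D geometry, not just on which two bases are involved.

```python
# Pairing rules from the slide: canonical Watson-Crick pairs plus the G-U wobble.
# All names here are illustrative, not taken from any RNA library.
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(a, b):
    """Return 'watson-crick', 'wobble', or None for two RNA bases."""
    pair = (a.upper(), b.upper())
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in WOBBLE:
        return "wobble"
    return None


if __name__ == "__main__":
    for a, b in [("G", "C"), ("G", "U"), ("A", "C")]:
        print(a, b, pair_type(a, b))
```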
7
From Structure to Function
  • Single-stranded RNA can fold into a variety of
    secondary and tertiary structures

RNA is able to play a number of functional roles
in the cell!
What are some of the types of RNA you've heard of?
8
Types of RNA: mRNA (Coding/Informational)
  • Messenger RNA is the middleman between gene and
    protein
  • RNA pol transcribes a gene into mRNA
  • mRNA is processed and transported out of nucleus
  • mRNA is translated into protein sequence at the
    ribosome
  • Used transcripts are degraded

Molecular Cell Biology, 4th ed.
9
Types of RNA: ncRNAs (Functional)
  • Non-coding RNAs are RNAs that carry out their
    function without ever being translated into
    protein
  • tRNA (transfer RNA)
  • Anticodon recognizes triplet on mRNA
  • Opposite end charged with specific amino acid
  • rRNA (Ribosomal RNA)
  • 80% of the cell's RNA
  • Together with certain proteins, form the ribosome

Molecular Cell Biology, 4th ed.
10
Other families of ncRNAs
  • ncRNAs are involved in a variety of cellular
    processes
  • Gene regulation (micro RNAs/miRNAs, riboswitches)
  • miRNAs bind the 3' UTR of a specific gene to
    suppress translation
  • Riboswitches are cis-acting elements in the UTR
    (most commonly the 5' UTR) of certain genes
  • Bind a target molecule; translation is
    up- or down-regulated when bound
  • Modification of RNAs (small RNAs: snRNAs, snoRNAs, gRNAs)
  • Catalysis of reactions (ribozymes)
  • Protein trafficking (small cytosolic RNAs/scRNAs)
  • Interference with protein synthesis (antisense,
    RNAi)

11
Take Home Messages
  • The RNA world consists of MUCH more than mRNA
  • Non-coding, or functional, RNAs (ncRNAs) perform
    a number of important functions and are of
    significant biological interest
  • Virtually all RNA bioinformatics centres around
    the identification, analysis, and structure of
    ncRNAs

12
RNA Structure has Protein-like Hierarchy
Molecular Cell Biology, 4th ed.
13
Levels of RNA Structure
  • Primary: the sequence itself
  • Single-stranded RNA
  • Bases want to pair with each other
  • Unusual pairings are possible

CGGUGCUCUUUCUGCCGCGCAGAAGAGCGGCGCGGAUCUGUUCGUUCUUG
CUGAGCACGGGCGUGGGAACCAGUGCCGGCUCGGGCGCCUCGACCUUGGC
GCUCUGUUUGAGGCUCAGGAAGCGCUUCUGCGCGACCAGCCGGUCUUCGG
CCGACAAGGGUUCGACCGGUUCGCCGUCGAUGUUGUGGCGCAUGGAAUCG
GGCUGGGCGCUGGCGAAAUAAUAGCGCCGGCAAUGGACAAAGGCAGCGGU
CGCGCGCCGGAGCGUCGUUACACCGGCCUCGGGCUUCAAGAGCGGACGCA
GCUCGUUGAAGAGGCCGACGGCGAAGGGAAGAACGGGAUCGCCGGGCUUG
GCCGGAAGCACGCCGACCGGCCGGAUCAACAUGCUGUUGAUCGCGUUGGC
CUUCUCCACGUCGAGCUCGGUCGCCGCAAUCGGCCCGCGGCUGAUCUUCC
AGGGUUUGUCCAUGCAUCCUCCAUAAAGGCCGACUGUGAUGGUAUCUUGA
CGGAUGGGGCAAUAGCGGUGCGGCCUGCAUAUUGCUAGCCCCGCUGCAAG
CCUGCGACGAGACGCCGUGUCCCGCGGAUAUAGGCCCGCCUUUCUCGGCC
GGCGAUUCUCUAUUCAAUCAAUUUUAUCGCCGAUUAUGCAUUGACCUCCA
GCAUCGAUUACACUUCUUAUCCGCGCCCAAGAUCAAUGCCGGCCGCGGGG
GAGGAUAUAUGCGGGUUUUCUCGAGCAUCGAUGAGCUGCGCCACACGCUC
GAUGCGCUCAAACGC
14
Levels of RNA Structure
  • Secondary structure
  • Patterns formed by base pairing

15
Levels of RNA Structure
  • Tertiary: 3D structure formed by interaction of
    2° elements

16
Secondary Structure is Interesting!
  • Easy to study
  • Secondary structure usually determined before
    tertiary
  • Chemical modification assays chew up bases not
    involved in specific secondary structure
    interactions
  • ncRNA functions correlated to specific secondary
    structure elements
  • E.g. hairpin loop shape involved in gene
    regulation
  • Therefore, most RNA resources/methods are focused
    on secondary structure
  • Visualization
  • Prediction
  • Annotation of elements

17
Elements of Secondary Structure
Hairpin Loops: Backbone makes a 180° bend
18
Elements of Secondary Structure
Internal Loops: Pairing of both strands
interrupted equally
19
Elements of Secondary Structure
Bulge Loops: Pairing of one strand interrupted,
unequal
20
Elements of Secondary Structure
Multibranch Loops: AKA helical junction, joins
two or more stems with no bulge
21
Elements of Secondary Structure
22
Elements of Secondary Structure
23
Visualizing 2° Structure
  • Graphical
  • Dot-bracket
  • ( ) = paired bases (stems)
  • . = unpaired base (loops)

GUUUGGUUCAAAAC
((((......))))
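
Dot-bracket notation is easy to read by machine: every "(" is matched with the nearest unmatched ")" that follows it. A minimal sketch using a stack is shown here; the function name `parse_dot_bracket` is an illustrative assumption, and positions are 0-based.

```python
def parse_dot_bracket(structure):
    """Return a sorted list of (i, j) paired positions (0-based)."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    seq = "GUUUGGUUCAAAAC"
    for i, j in parse_dot_bracket("((((......))))"):
        print(f"{seq[i]}{i + 1} pairs with {seq[j]}{j + 1}")
```

For the slide's example, the four stem pairs are G-C, U-A, U-A and U-A, and the six dots are the hairpin loop.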
24
Secondary Structure Prediction
  • Two categories of 2° structure prediction methods
  • Ones that take a single sequence as input
  • Mfold, Vienna, RNAStructure, Sfold
  • Ones that require multiple (aligned) input
    sequences
  • Infernal, ConStruct, Alifold, Pfold, FOLDALIGN,
    Dynalign
  • Important to both categories is the idea of free
    energy minimization

Molecules fold to achieve the lowest energy state
possible (minimum free energy, or MFE). Given a
set of potential structures, those with the
lowest free energies are most stable and most
likely to be found in a cell.
25
Calculating a 2° Structure's Free Energy
  • Computed using nearest-neighbour parameters
  • Each possible pair of neighbouring structural
    elements has an associated free energy value
    (kcal/mol)
  • Negative values: good, found in stable,
    base-paired stems
  • Positive values: bad, found in loops and bulges
  • Sum all values over structure to get overall free
    energy
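
The "sum over the structure" idea can be sketched with a toy scoring function. The energy values below are HYPOTHETICAL round numbers chosen only to show the sign convention (negative for stacked pairs, positive for loops); real tools use experimentally derived nearest-neighbour tables that depend on the specific bases involved. The function and constant names are also assumptions.

```python
# HYPOTHETICAL energies in kcal/mol, for illustration only.
STACK_ENERGY = -2.0     # a base pair stacked directly on the next pair (stem)
HAIRPIN_PENALTY = 4.0   # a pair that closes a hairpin loop
INTERIOR_PENALTY = 2.0  # a pair that closes a bulge, internal or multibranch loop


def toy_free_energy(structure):
    """Sum toy energy terms over a dot-bracket structure."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    pair_set = set(pairs)

    energy = 0.0
    for i, j in pairs:
        if (i + 1, j - 1) in pair_set:
            energy += STACK_ENERGY       # neighbouring pairs stack: stabilizing
        elif any(i < k and l < j for k, l in pairs):
            energy += INTERIOR_PENALTY   # pairing interrupted, more pairs inside
        else:
            energy += HAIRPIN_PENALTY    # nothing paired inside: hairpin loop
    return energy


if __name__ == "__main__":
    print(toy_free_energy("((((......))))"))
```

With these made-up values, a longer stem lowers the total while every loop raises it, which is the behaviour the slide describes.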

26
2° Structure Prediction Approach 1: MFE
  • Used when you have a single sequence as input
  • Forms the basis of the tools Mfold, Vienna,
    RNAStructure and Sfold
  • Basic principle
  • Generate a series of possible secondary
    structures
  • Calculate the free energy of each
  • Returns lowest energy structure(s)
  • Two possible implementations
  • Naïve MFE
  • Dynamic programming MFE

27
Naïve MFE Prediction
  • Fold the query RNA into ALL possible secondary
    structures
  • Calculate free energy for each structure
  • Problem: A 50-base RNA can have over 5000 BILLION
    POSSIBLE STRUCTURES!!!
  • Naïve MFE prediction is virtually never used
  • Reminiscent of database searching problem
  • Solution: Heuristic approach that breaks down the
    RNA sequence

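
To get a feel for how quickly the number of candidate structures grows, the possibilities can be counted without enumerating them. This is a sketch under stated assumptions: only G-C, A-U and G-U pairs are allowed, pairs may not cross, and a hairpin must enclose at least `min_loop` unpaired bases (3 here). The count depends on the sequence and on these assumptions, so it is not meant to reproduce the slide's figure exactly.

```python
from functools import lru_cache

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def count_structures(seq, min_loop=3):
    """Count non-crossing secondary structures for one sequence."""
    seq = seq.upper()

    @lru_cache(maxsize=None)
    def count(i, j):
        # Zero or one base left: only the unfolded state is possible.
        if j - i < 1:
            return 1
        total = count(i + 1, j)                      # base i stays unpaired
        for k in range(i + min_loop + 1, j + 1):     # base i pairs with base k
            if (seq[i], seq[k]) in PAIRS:
                total += count(i + 1, k - 1) * count(k + 1, j)
        return total

    return count(0, len(seq) - 1)


if __name__ == "__main__":
    for s in ["GAAAC", "GUUUGGUUCAAAAC"]:
        print(s, count_structures(s))
```

The count includes the completely unfolded structure, so a sequence that cannot pair at all gives 1.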
28
Dynamic Programming MFE Prediction
  • Break RNA query into small subsequences
  • Generate possible secondary structures for
    subsequences, select lowest free energy
    substructure
  • Combine substructures into overall structure
    using dynamic programming

S. Eddy, Nat. Biotech. 2004
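
The subsequence idea can be sketched with the simpler Nussinov algorithm, which maximizes the number of base pairs instead of minimizing free energy. This is a swapped-in stand-in, not what Mfold or Vienna actually compute: those tools fill a similar table with nearest-neighbour energies. The function name `nussinov` and the `min_loop` parameter are illustrative assumptions.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max number of pairs, one optimal dot-bracket structure)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = most pairs possible within seq[i..j]; short spans stay 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                   # base j unpaired
            for k in range(i, j - min_loop):         # base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        todo.append((i, k - 1))
                    todo.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAACCC"))
```

Each table cell is solved once and reused, which is why this takes polynomial time instead of visiting every possible structure.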
29
Dynamic Programming MFE Prediction
  • Typically yields one lowest energy secondary
    structure and a number of suboptimal structures
  • Not the lowest energy, but still energetically
    favourable
  • Benefits of DPMFE
  • Only needs single sequence as input, fast
  • Pitfalls of DPMFE
  • Correctly predicts structure of only 50-70% of
    bases in a given RNA
  • Thermodynamic parameters have 5-10% error rate
  • Many known secondary structures are not the
    lowest free energy structure, but may be within
    5-10 kcal/mol of it
  • Lowest free energy structure not always
    biologically correct
  • Can improve structure predictions with constraint
    information
  • E.g. "This residue must base pair with this residue"

30
2° Structure Prediction Approach 2: Comparative Sequence Analysis
  • Used when you have multiple RELATED input RNAs
  • Two underlying principles

Principle 1: Different RNA sequences (different primary
structures) can fold into IDENTICAL secondary and
tertiary structures. An ncRNA's structure and
function are maintained throughout evolution.
Principle 2: A mutation in one member of a pair of
interacting residues necessitates a change in the
other member of the pair. These are called
compensatory base changes (CBCs).
31
The Two Principles in Graphical Form
  • Different RNA sequences (different primary
    structures) can fold into IDENTICAL secondary and
    tertiary structures.
  • A mutation in one member of a pair of interacting
    residues necessitates a change in the other
    member.

GUUUGGUUCAAAAC
((((......))))
GGAUGGUUCAAUCC
((((......))))
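
The two example sequences can be checked for compensatory base changes directly. A minimal sketch, assuming both sequences have the same length and share the given dot-bracket structure; the function names are made up for illustration and positions are 0-based.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def structure_pairs(structure):
    """Return sorted (i, j) paired positions from a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def compensatory_changes(seq_a, seq_b, structure):
    """List pairs where both bases changed and pairing is still possible."""
    changes = []
    for i, j in structure_pairs(structure):
        pair_a = (seq_a[i], seq_a[j])
        pair_b = (seq_b[i], seq_b[j])
        both_pair = pair_a in PAIRS and pair_b in PAIRS
        both_changed = seq_a[i] != seq_b[i] and seq_a[j] != seq_b[j]
        if both_pair and both_changed:
            changes.append((i, j, "".join(pair_a), "".join(pair_b)))
    return changes


if __name__ == "__main__":
    result = compensatory_changes(
        "GUUUGGUUCAAAAC", "GGAUGGUUCAAUCC", "((((......))))"
    )
    for i, j, before, after in result:
        print(f"positions {i + 1}/{j + 1}: {before} -> {after}")
```

For the slide's sequences, the two middle stem pairs change (U-A becomes G-C, and U-A becomes A-U) while the stem itself is preserved.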
32
From Alignment to 2° Structure
  • Comparative sequence analysis (CSA) methods
    require an alignment of related RNAs as input
  • Gaps and conserved columns are removed, leaving
    only variant columns
  • Variant residues in sequence 1 that might base
    pair are noted
  • Check for covariance: could these also base pair
    in seqs 2 and 3?
  • Provides constraint information

GUUUGG-UCAAAAC
GGAUGGUUCAAUCC
GCAGGGU-CAAUGC

4/5 columns contain residues that may pair
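
The column filtering and covariance check can be sketched in a few lines. This is a simplified illustration, not how Infernal or the other tools work internally: it drops gapped and conserved columns, then reports pairs of variant columns whose residues could base pair (G-C, A-U or G-U) in every sequence. The function name `covarying_columns` is an assumption, and columns are numbered from 0.

```python
from itertools import combinations

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def covarying_columns(alignment):
    """Return (variant columns, column pairs that can pair in all sequences)."""
    columns = list(zip(*alignment))
    # Keep columns with no gaps and more than one distinct residue.
    variant = [
        c for c, col in enumerate(columns)
        if "-" not in col and len(set(col)) > 1
    ]
    pairs = [
        (c1, c2)
        for c1, c2 in combinations(variant, 2)
        if all((a, b) in PAIRS for a, b in zip(columns[c1], columns[c2]))
    ]
    return variant, pairs


if __name__ == "__main__":
    alignment = ["GUUUGG-UCAAAAC", "GGAUGGUUCAAUCC", "GCAGGGU-CAAUGC"]
    variant, pairs = covarying_columns(alignment)
    paired_cols = {c for pair in pairs for c in pair}
    print("variant columns:", variant)
    print("covarying pairs:", pairs)
    print(f"{len(paired_cols)}/{len(variant)} columns contain residues that may pair")
```

On the slide's alignment this finds five variant columns, four of which form two covarying pairs, matching the 4/5 figure.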
33
CSA/Covariance Prediction Methods
  • Constraint information derived from covariance
    analysis is combined with energy minimization and
    dynamic programming to generate a final
    prediction
  • Disclaimer: this is a very simplified
    explanation. In reality, this type of analysis
    requires knowledge of concepts from information
    theory and math that are very, very scary indeed

34
Ha!
35
CSA/Covariance Methods: Scary but Useful
  • Infernal, ConStruct, Alifold, Pfold, FOLDALIGN,
    Dynalign do the scary stuff for you
  • Requires multiple related sequences for input,
    but provides MUCH better predictions
  • Limited to 500-base input sequence
  • Secondary structures of as many as 97% of bases
    in a given RNA are correctly predicted using this
    method
  • Vs. 50-70% with basic MFE methods

36
Tertiary Structure Prediction
  • How do we go from 2° structure to 3° structure?
  • We don't. Well, not easily, anyway.

37
Why is 3° Structure Prediction So Hard?
  • Relative lack of 3D RNA structures available
  • NMR
  • Small loops
  • Practical limit of 50 base pairs
  • Few automated methods mean that 3° structure
    prediction requires lots of user guidance and
    knowledge about the field
  • Methods produce coarse-grain resolution
    structures: major features predicted correctly,
    finer atomic-level contacts incorrect

38
3° Structure Prediction: MC-SYM
  • Developed by Francois Major, U. Montreal
  • Uses information derived from known 3D RNA
    structures to build a series of models
  • Assemble bases into structures matching known
    elements
  • Avoid elements not found in known structures
  • Can incorporate constraint information
  • Requires complex input
  • MC-SYM script
  • Generates PDB files as output

39
Sample MC-SYM Script
sequence (r 31 ACUGAAGAU)

// Conformations -------------------------------------------
residue
(
  31 helix 1
  39 helix 1
  32 38 type_A 15
)

// Relations -----------------------------------------------
connect
(
  31 33 stack 20
  33 34 ! stack 20
  34 39 stack 20
)
pair (31 39 wct 1)

// Building ------------------------------------------------
anticodon backtrack
(
  (31 39)
  (39 38 37 36 35 34)
  (31 32 33)
)

// Constraint ----------------------------------------------
adjacency (anticodon 1.0 2.5)
res_clash
(
  anticodon
  fixed_distance 1.0
  all no_hydrogen
)

// Exploration ---------------------------------------------
explore
(
  anticodon
  rmsd (1.0 base_only no_hydrogen)
  file_pdb ("ANTI/anti-04d.pdb" zipped)
)
40
RNA Databases
  • NAR Database Portal lists 51 RNA DBs
  • http://www3.oup.co.uk/nar/database/cat/2

RNA databases tend to be specialized
41
Rfam: RNA Families Database
  • http://www.sanger.ac.uk/Software/Rfam/

42
An Rfam Entry Represents 1 RNA Family
  • Class of ncRNA with specific function and 2°
    structure

Annotation
2° structure
CSA/Covariance alignments
Lit refs
43
Rfam CSA/Covariance Structures
  • Colour blocks show which bases pair with each
    other
  • Red with red, blue with blue, green with green
  • Dot-bracket notation also helps in visualization

44
Searching Rfam
  • Keyword search
  • Name of RNA or any word in annotation (function,
    interactor)
  • E.g. Spot 42, spf, regulation, galactose, OxyS
  • EMBL ID search
  • BLAST search
  • < 2 kb sequence allowed
  • Browse
  • Gene, cis-reg, intron
  • Genomes
  • Rfam ncRNAs identified in many genomes
  • How many families occur how many times in my
    genome?

45
SCOR: Structural Classification of RNA
  • http://scor.lbl.gov/scor.html

46
SCOR Structural Hierarchy
47
SCOR: Structural Classification of RNA
  • http://scor.lbl.gov/scor.html

48
SCOR Functional Hierarchy
49
RNABase: 3D Structures of RNA
  • http://www.rnabase.org

50
RNABase
  • Daily download of RNA structures from PDB and NDB
  • X-ray crystallography and NMR
  • Provides annotations to go with structures
  • Measures of structure quality
  • Can be searched/browsed by category, keyword,
    technique, resolution, structure quality

51
Finding ncRNAs in the Genome
  • Rfam currently contains 503 families. Why so
    few?
  • Rapid, accurate computational identification of
    ncRNAs from genome sequence is not a trivial task
  • Most available information was derived in the
    laboratory
  • Three approaches
  • Similarity search
  • Transcription prediction
  • Comparative genome analysis

52
Similarity Searching
  • BLAST/FASTA primary sequence alignments don't
    work very well
  • 4-letter alphabet (low information content)
  • RNA structure is more important than sequence
  • How can we incorporate structure information into
    database similarity searching?
  • Need good secondary structure predictions
  • Need a good alignment scoring method that
    properly weights sequence and structural
    contributions
  • Stochastic context-free grammars
  • Conceptually related to HMMs, good over very long
    distances
  • Computationally intensive
  • Will need heuristics and improved computer power

53
Transcription Prediction
  • Gene-finding programs for DNA sequences look for
    signals indicating transcription initiation,
    termination and processing events. Could we do
    the same for RNA?
  • It would be difficult
  • ncRNA signals are not as strong as gene signals
  • ncRNAs don't show statistically significant
    biases in nucleotide composition
  • Some ncRNAs are not transcribed on their own; they
    are excised out of introns
  • Different ncRNAs are transcribed by different RNA
    polymerases

54
Comparative Genome Analysis
  • Most successful approach to date
  • Compare 2 species
  • Identify regions that are conserved between the
    two species that are not involved in
    protein-coding
  • Look for conserved secondary structure elements
    in these regions

55
Take Home Messages
  • RNA is a unique biomolecule and requires unique
    computational analysis methods
  • ncRNAs play a number of important roles in the
    cell and are an area of increasing research
    interest
  • ncRNA function depends primarily on 2° structure
  • Many methods for 2° structure prediction, and the
    structures themselves, are available; however, few
    3D RNA structures are available
  • Many specialized and general-interest RNA
    databases are available over the web
  • The identification of novel ncRNAs in the genome
    requires improved computational approaches