Gene Ontology (GO) - PowerPoint PPT Presentation

Centre for Integrative Bioinformatics VU. Master Course DNA/Protein Structure-function Analysis and Prediction, Lecture 12: DNA/RNA ...

Provided by: VictorAS

Transcript and Presenter's Notes



1
Master Course: DNA/Protein Structure-function
Analysis and Prediction
Lecture 12: DNA/RNA Structure Prediction
2
Epigenetics / Epigenomics: Gene Expression
  • Transcription factors (TFs) are essential for
    transcription initiation
  • Transcription of protein-coding genes is carried
    out by RNA polymerase II (eukaryotes)
  • mRNA must then move from the nucleus to the
    ribosomes (extranuclear) for translation
  • In eukaryotes there can be many TF-binding sites
    upstream of an ORF that together regulate
    transcription
  • Nucleosomes (chromatin structures composed of
    histones) are structures around which DNA
    coils. This blocks access of TFs to their
    binding sites

3
Epigenetics / Epigenomics: Gene Expression
[Figure labels: TF binding site (closed), TF binding site (open), nucleosome, TATA box, mRNA transcription]
4
Expression
  • Because DNA is flexible, bound TFs can move
    in order to interact with Pol II, which is
    necessary for transcription initiation (see next
    slide)
  • A recent theory of TF-based initiation includes a
    wave function (Carlsberg) of TF binding, which is
    supposed to proceed from left to right. In this
    way the TF-binding site nearest to the TATA box
    would be bound by a TF, which in turn binds Pol
    II.
  • It has been suggested that speckles have
    something to do with this (speckles are protein
    plaques observed in the nucleus)
  • Current prediction methods for gene
    co-expression, e.g. finding a single shared TF
    binding site, do not take this TF cooperativity
    into account (parking lot optimisation)

5
Expression (continued)
[Figure label: speckle]
6
DNA/RNA Structure-Function relationships
  • Apart from coding for proteins via genes, DNA is
    now known to code for many more RNA-based cell
    components (snRNA, rRNA, ...)
  • Structural features of DNA (e.g. bendability,
    histone binding, methylation) are increasingly
    recognised as important.
  • For the many different classes of RNA molecules,
    structure directly determines function
  • It is therefore important to analyse and predict
    DNA structure and, in particular, RNA structure

7
Canonical base pairs
The complementary bases, C-G and A-U form stable
base pairs with each other through the creation
of hydrogen bonds between donor and acceptor
sites on the bases. These are called Watson-Crick
base pairs and are also referred to as canonical
base pairs. In addition, we consider the weaker
G-U wobble pair, where the bases bond in a skewed
fashion. Other base pairs also occur, some of
which are stable. These are all called
non-canonical base pairs.
8
RNA secondary structure
The secondary structure of an RNA molecule is the
collection of base pairs that occur in its
3-dimensional structure. An RNA sequence will be
represented as $R = r_1, r_2, r_3, \ldots, r_n$, where $r_i$
is called the ith (ribo)nucleotide. Each $r_i$
belongs to the set {a, c, g, u}.
9
Secondary Structure and Pseudoknots
  • A secondary structure, or folding, on R is a set
    S of ordered pairs, written as i-j, satisfying
  • $j - i > 4$
  • If i-j and i'-j' are 2 base pairs (assuming
    without loss of generality that $i \le i'$), then
    either
  • $i = i'$ and $j = j'$ (they are the same base
    pair),
  • $i < j < i' < j'$ (i-j precedes i'-j'), or
  • $i < i' < j' < j$ (i-j includes i'-j')

The last condition excludes pseudoknots. These
occur when 2 base pairs, i-j and i'-j', satisfy
$i < i' < j < j'$.
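
The conditions above can be checked mechanically. The following is a minimal sketch, assuming base pairs are given as 1-based (i, j) tuples; the function name is illustrative and not part of any published tool.

```python
from itertools import combinations

def is_valid_secondary_structure(pairs, min_sep=4):
    """Check the slide's conditions for a set of (i, j) base pairs.

    Every pair needs j - i > min_sep, and any two distinct pairs must
    either follow one another or be nested. Anything else (a shared
    base or a crossing, i.e. a pseudoknot) is rejected.
    """
    pairs = sorted(set(pairs))  # sorting guarantees i <= i' below
    if any(j - i <= min_sep for i, j in pairs):
        return False
    for (i, j), (i2, j2) in combinations(pairs, 2):
        precedes = i < j < i2 < j2   # i-j precedes i'-j'
        includes = i < i2 < j2 < j   # i-j includes i'-j'
        if not (precedes or includes):
            return False             # e.g. i < i' < j < j': pseudoknot
    return True

print(is_valid_secondary_structure([(1, 10), (2, 9)]))   # nested pairs
print(is_valid_secondary_structure([(1, 10), (5, 15)]))  # crossing pairs
```

Because the inequalities are strict, two different pairs that share a base also fail the test, so no separate "each base pairs at most once" check is needed.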
10
Pseudoknots
Pseudoknots are not taken into account in
secondary structure prediction because energy
minimizing methods cannot deal with them. It is
not known how to assign energies to the loops
created by pseudoknots, and dynamic programming
methods that compute minimum energy structures
break down. For this reason, pseudoknots are
often considered as belonging to tertiary
structure. Nevertheless, pseudoknots are real and
important structural features, and
covariance methods (next slide) are able to
predict them from aligned, homologous RNA
sequences. The Figure on the next slide
represents a small pseudoknot model.
11
A 3D model of a pseudoknot
12
  • A 3D model of a pseudoknot
  • The 2 helices in the structure (preceding slide)
    are stacked coaxially.
  • RNA structure can be predicted from sequence
    data. There are two basic routes.
  • The first attempts structure prediction of single
    sequences based on minimizing the free energy of
    folding.
  • The second computes common foldings for a family
    of aligned, homologous RNAs. Usually, the
    alignment and secondary structure inference must
    be performed simultaneously, or at least
    iteratively (see next slide)

13
Predicting RNA Secondary Structure
  • By Thermodynamics Method
  • Minimize Gibbs Free Energy
  • By Phylogenetic Comparison Method (Covariance
    method)
  • Compare RNA Sequences of Identical Function From
    Different Organisms
  • By Combination of the Above Two Methods
  • In principle, this could be the most powerful
    method

14
Thermodynamics
  • Gibbs Free Energy, G
  • Describes the energetics of biomolecules in
    aqueous solution. The change in free energy, $\Delta G$,
    for a chemical process, such as nucleic acid
    folding, can be used to determine the direction
    of the process
  • $\Delta G = 0$: equilibrium
  • $\Delta G > 0$: unfavorable process
  • $\Delta G < 0$: favorable process
  • Thus the natural tendency for biomolecules in
    solution is to minimize the free energy of the
    entire system (biomolecules + solvent).

15
Thermodynamics
  • $\Delta G = \Delta H - T \Delta S$
  • $\Delta H$ is the enthalpy change, $\Delta S$ is the
    entropy change, and T is the temperature in Kelvin.
  • Molecular interactions, such as hydrogen bonds,
    van der Waals and electrostatic interactions
    contribute to the $\Delta H$ term. $\Delta S$ describes
    the change of order of the system.
  • Thus, both molecular interactions and the
    order of the system determine the direction of a
    chemical process.
  • For any nucleic acid solution, it is extremely
    difficult to calculate the free energy from first
    principles
  • Biophysical methods can be used to measure free
    energy changes
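
As a small worked illustration of the relation between free energy, enthalpy, entropy and temperature, the sketch below evaluates the free energy change and classifies the process by its sign. The enthalpy and entropy values are hypothetical placeholders chosen only to show the arithmetic; they are not measured parameters for any RNA.

```python
def gibbs_free_energy(delta_h, delta_s, temperature_k):
    """Return dG = dH - T * dS.

    delta_h in kcal/mol, delta_s in kcal/(mol K), temperature in Kelvin.
    """
    return delta_h - temperature_k * delta_s

def classify(delta_g, tol=1e-9):
    """Map the sign of dG to the three cases listed on the slide."""
    if abs(delta_g) <= tol:
        return "equilibrium"
    return "favorable" if delta_g < 0 else "unfavorable"

# Hypothetical folding parameters (illustrative only).
DELTA_H = -50.0    # kcal/mol
DELTA_S = -0.140   # kcal/(mol K)

for temperature in (310.15, DELTA_H / DELTA_S, 400.0):
    dg = gibbs_free_energy(DELTA_H, DELTA_S, temperature)
    print(f"T = {temperature:.2f} K: dG = {dg:.3f} kcal/mol ({classify(dg)})")
```

With both terms negative, folding is favorable at low temperature and becomes unfavorable above the temperature at which the two terms cancel.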

16
Thermodynamics
The Equilibrium Partition Function
  • For a population of structures S, a partition
    function Q and the probability of a particular
    folding s can be calculated
  • The heat capacity of the RNA can then be obtained
    from Q
  • Heat capacity Cp (heat required to change the
    temperature by 1 degree) can be measured
    experimentally, and can then be used to obtain
    information on G
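
The slide's equations did not survive in this transcript. As a hedged sketch, the code below assumes the standard Boltzmann form, Q = sum over s of exp(-dG(s)/RT) and P(s) = exp(-dG(s)/RT) / Q, which is the form used in McCaskill (1990). The free energies in the example are hypothetical.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal/(mol K)

def boltzmann_probabilities(free_energies, temperature_k=310.15):
    """Return (Q, [P(s)]) for structures with the given dG values in kcal/mol.

    Assumes Q = sum_s exp(-dG_s / RT) and P(s) = exp(-dG_s / RT) / Q.
    """
    rt = GAS_CONSTANT * temperature_k
    weights = [math.exp(-dg / rt) for dg in free_energies]
    q = sum(weights)
    return q, [w / q for w in weights]

# Three hypothetical foldings of the same sequence (illustrative values).
q, probabilities = boltzmann_probabilities([-3.0, -2.0, 0.0])
for dg, p in zip([-3.0, -2.0, 0.0], probabilities):
    print(f"dG = {dg:5.1f} kcal/mol -> P = {p:.4f}")
```

The lowest-energy folding dominates the population, but the other foldings keep non-zero probability, which is why the partition function carries more information than the single minimum energy structure.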
17
Zuker's Energy Minimization Method (mFOLD)
  • An RNA sequence is written $R = r_1, r_2, r_3, \ldots, r_n$,
    where $r_i$ is the ith ribonucleotide and belongs to
    the set {A, U, G, C}
  • A secondary structure of R is a set S of base
    pairs, i.j, which satisfies
  • $1 \le i < j \le n$
  • $j - i > 4$ (a loop cannot contain fewer than 4
    nucleotides)
  • If i.j and i'.j' are two base pairs (assume
    $i \le i'$), then either
  • $i = i'$ and $j = j'$ (same base pair),
  • $i < j < i' < j'$ (i.j precedes i'.j'), or
  • $i < i' < j' < j$ (i.j includes i'.j'); this
    excludes pseudoknots, for which $i < i' < j < j'$
  • If e(i,j) is the energy for the base pair i.j,
    the total energy for R is
    $E(S) = \sum_{i.j \in S} e(i,j)$
  • The objective is to minimize E(S).

18
Zuker's Energy Minimization Method (mFOLD)
Free Energy Parameters
  • An extensive database of free energies for the
    following RNA units has been obtained (the
    so-called Tinoco rules and Turner rules)
  • Single-strand stacking energy
  • Canonical (AU, GC) and non-canonical (GU)
    base pairs in duplexes
  • Accurate free energy parameters are still
    lacking for
  • Loops
  • Mismatches (AA, CA, etc.)
  • Using these energy parameters, the current
    version of mFOLD can predict 73% of
    phylogenetically deduced secondary structures.

19
Dynamic Programming (mFOLD)
  • An Example of W(i,j)
  • A matrix W(i,j) is computed that depends on
    the experimentally measured base pair energy
    e(i,j)
  • Recursion begins with i = 1, j = n
  • If W(i+1,j) = W(i,j), then i is not paired. Set
    i = i+1 and start the recursion again.
  • If W(i,j-1) = W(i,j), then j is not paired. Set
    j = j-1 and start the recursion again.
  • If W(i,j) = W(i,k) + W(k+1,j), the fragment
    k+1..j is put on a stack and the fragment i..k is
    analyzed by setting j = k and going back to the
    beginning of the recursion.
  • If W(i,j) = e(i,j) + W(i+1,j-1), a base pair is
    identified and added to the list; set
    i = i+1 and j = j-1.
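
The traceback steps above can be sketched in code. This is a simplified illustration, not mFOLD itself: it assumes W(i,j) is filled as the minimum over the same four cases, and it uses placeholder per-pair energies instead of the measured nearest-neighbor parameters. All names are hypothetical.

```python
# Placeholder pair energies in kcal/mol (illustrative, not measured values).
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0,
               ("A", "U"): -2.0, ("U", "A"): -2.0,
               ("G", "U"): -1.0, ("U", "G"): -1.0}
MIN_SEP = 4                      # require j - i > 4
INF = float("inf")

def e(seq, i, j):
    """Energy of pair i.j (0-based), or +inf if the pair is not allowed."""
    if j - i <= MIN_SEP:
        return INF
    return PAIR_ENERGY.get((seq[i], seq[j]), INF)

def fill(seq):
    """Fill W so that W[i][j] is the minimum energy of fragment i..j."""
    n = len(seq)
    W = [[0.0] * n for _ in range(n)]
    for span in range(MIN_SEP + 1, n):
        for i in range(n - span):
            j = i + span
            best = min(W[i + 1][j], W[i][j - 1], e(seq, i, j) + W[i + 1][j - 1])
            for k in range(i + 1, j):
                best = min(best, W[i][k] + W[k + 1][j])
            W[i][j] = best
    return W

def traceback(seq, W):
    """Recover base pairs (1-based) by following the slide's four cases."""
    pairs, stack = [], [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        while j - i > MIN_SEP:
            if W[i + 1][j] == W[i][j]:        # i is not paired
                i += 1
                continue
            if W[i][j - 1] == W[i][j]:        # j is not paired
                j -= 1
                continue
            split = next((k for k in range(i + 1, j)
                          if W[i][k] + W[k + 1][j] == W[i][j]), None)
            if split is not None:             # bifurcation: stack the right part
                stack.append((split + 1, j))
                j = split
                continue
            assert W[i][j] == e(seq, i, j) + W[i + 1][j - 1]
            pairs.append((i + 1, j + 1))      # base pair i.j identified
            i, j = i + 1, j - 1
    return sorted(pairs)

sequence = "GGGAAAACCC"
W = fill(sequence)
print(W[0][len(sequence) - 1], traceback(sequence, W))
```

The exact floating-point comparisons are safe here only because the placeholder energies are small whole numbers; with real parameters a tolerance or a stored back-pointer would be the more robust choice.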

20
Suboptimal Folding (mFOLD)
  • For any sequence of N nucleotides, the expected
    number of structures is greater than $1.8^N$
  • A sequence of 100 nucleotides has $3 \times 10^{25}$
    foldings. If a computer can calculate 1000
    structures per second, it would take $10^{15}$ years!
  • mFOLD generates suboptimal foldings whose free
    energies fall within a certain range of values.
    Many of these structures differ only in trivial
    ways. These suboptimal foldings can still be
    useful for designing experiments.
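
The figures quoted above follow from simple arithmetic, which the short sketch below reproduces. The growth factor 1.8 and the rate of 1000 structures per second come from the slide; the function name and the year length of 365.25 days are assumptions made here.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def enumeration_time_years(n, rate_per_second=1000.0, growth=1.8):
    """Return (number of structures, years needed to enumerate them all)."""
    structures = growth ** n
    return structures, structures / rate_per_second / SECONDS_PER_YEAR

structures, years = enumeration_time_years(100)
print(f"{structures:.2e} foldings, about {years:.1e} years at 1000 per second")
```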

21
A computer predicted folding of Bacillus subtilis
RNase P RNA
These three representations are equivalent.
22
Secondary Structure Prediction for Aligned RNA
Sequences
  • Both energy and RNA sequence covariation
    can be combined to predict RNA secondary
    structures
  • To quantify sequence covariation, let $f_i(X)$ be
    the frequency of base X at aligned position i and
    $f_{ij}(XY)$ be the frequency of finding X at i and Y
    at j. The mutual information score is then (Chiu
    & Kolodziejczak and Gutell & Woese)
    $M_{ij} = \sum_{XY} f_{ij}(XY) \log \frac{f_{ij}(XY)}{f_i(X) f_j(Y)}$
  • If, for instance, only GC and GU pairs occur at
    positions i and j, then $M_{ij} = 0$, because
    position i is invariant.
  • The total energy for the RNA is set to a linear
    combination of the measured free energy and the
    covariance contribution
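
A minimal sketch of the covariation score for two alignment columns follows. It assumes the standard mutual information definition with a base-2 logarithm (the slide does not state the base) and takes each column as a string with one character per aligned sequence. The function name is illustrative.

```python
import math
from collections import Counter

def mutual_information(column_i, column_j):
    """M_ij = sum over XY of f_ij(XY) * log2(f_ij(XY) / (f_i(X) * f_j(Y)))."""
    n = len(column_i)
    if n == 0 or n != len(column_j):
        raise ValueError("columns must be non-empty and of equal length")
    f_i = Counter(column_i)
    f_j = Counter(column_j)
    f_ij = Counter(zip(column_i, column_j))
    score = 0.0
    for (x, y), count in f_ij.items():
        p_xy = count / n
        score += p_xy * math.log2(p_xy / ((f_i[x] / n) * (f_j[y] / n)))
    return score

# Only GC and GU pairs: position i never changes, so there is no covariation.
print(mutual_information("GGGG", "CUCU"))
# GC in half the sequences, CG in the other half: the columns covary fully.
print(mutual_information("GGCC", "CCGG"))
```

The first call illustrates the slide's remark: a G that pairs with either C or U gives no covariation signal, even though both pairings are valid.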

23
Other Secondary Prediction Methods
  • Nussinov algorithm (historically important),
    Hogeweg and Hesper (1984)
  • Vienna: http://www.tbi.univie.ac.at/ivo/RNA/
  • Uses the same recursive method in searching the
    folding space
  • Adds the option of computing the population of
    RNA secondary structures by the equilibrium
    partition function
  • The specific heat of an RNA can be calculated by
    numerical differentiation from the equilibrium
    partition function
  • RNACAD: http://www.cse.ucsc.edu/research/compbio/ssurrna.html
  • An effort to improve multiple RNA sequence
    alignment by taking into account both primary and
    secondary structure information
  • Uses Stochastic Context-Free Grammars (SCFGs), an
    extension of the hidden Markov model (HMM) method
  • Bundschuh, R., and Hwa, T. (1999) RNA secondary
    structure formation: A solvable model of
    heteropolymer folding. PHYSICAL REVIEW LETTERS
    83, 1479-1482.
  • This work treats RNA as a heteropolymer and uses a
    simplified Go-like model to provide an exact
    solution for the RNA transition between its native
    and molten phases.

24
Running mFOLD
  • http://bioinfo.math.rpi.edu/mfold/rna/form1.cgi
  • Constraints can be entered
  • Force bases i, i+1, ..., i+k-1 to be double
    stranded by entering "F i 0 k" on 1 line in the
    constraint box.
  • Force consecutive base pairs i.j, i+1.j-1,
    ..., i+k-1.j-k+1 by entering "F i j k" on 1
    line in the constraint box.
  • Force bases i, i+1, ..., i+k-1 to be single
    stranded by entering "P i 0 k" on 1 line in the
    constraint box.
  • Prohibit the consecutive base pairs i.j, i+1.j-1,
    ..., i+k-1.j-k+1 by entering "P i j k" on 1
    line in the constraint box.
  • Prohibit bases i to j from pairing with bases k
    to l by entering "P i-j k-l" on 1 line in the
    constraint box.
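
To make the constraint syntax concrete, the sketch below expands a constraint line into the bases or base pairs it refers to. It is a hypothetical helper for reading the notation, it does not call mFOLD, and it deliberately leaves out the range form "P i-j k-l".

```python
def expand_constraint(line):
    """Expand 'F i j k', 'P i j k', 'F i 0 k' or 'P i 0 k'.

    Returns (meaning, items) where items is a list of bases when j == 0
    and a list of (i, j) base pairs otherwise.
    """
    kind, i, j, k = line.split()
    if kind not in ("F", "P"):
        raise ValueError("constraint must start with F or P")
    i, j, k = int(i), int(j), int(k)
    if j == 0:
        # F i 0 k: bases i..i+k-1 double stranded; P i 0 k: single stranded.
        meaning = "double-stranded" if kind == "F" else "single-stranded"
        return meaning, list(range(i, i + k))
    # F/P i j k: the k consecutive pairs i.j, i+1.j-1, ..., i+k-1.j-k+1.
    meaning = "force-pairs" if kind == "F" else "prohibit-pairs"
    return meaning, [(i + d, j - d) for d in range(k)]

print(expand_constraint("F 1 21 2"))
print(expand_constraint("P 3 0 4"))
```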

25
Running mFOLD: 5'-CUUGGAUGGGUGACCACCUGGG-3'
[Figure panels: no constraint; constraint F 1 21 2 entered]
26
Predicting RNA 3D Structures
  • Currently available RNA 3D structure prediction
    programs make use of the fact that a tertiary
    structure is built upon preformed secondary
    structures
  • So once a solid secondary structure can be
    predicted, it is possible to predict its 3D
    structure
  • The chances of obtaining a valid 3D structure can
    be increased by known spatial constraints among the
    different secondary segments (e.g. cross-linking,
    NMR results).
  • However, there are far fewer thermodynamic data on
    3D RNA structures, which makes 3D structure
    prediction challenging.

27
Mc-Sym
  • Mc-Sym uses a backtracking method to solve a
    general problem in computer science called the
    constraint satisfaction problem (CSP)
  • The backtracking algorithm organizes the search
    space as a tree where each node corresponds to the
    application of an operator
  • At each application, if the partially folded RNA
    structure is consistent with its RNA
    conformational database, the next operator is
    applied; otherwise the entire attached branch is
    pruned and the algorithm backtracks to the
    previous node.
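
The backtracking idea can be illustrated with a generic constraint satisfaction sketch. This is a toy example and not Mc-Sym's implementation: the "nucleotides" are placed on a line of integer positions, and the hypothetical consistency test stands in for Mc-Sym's check against its conformational database.

```python
def backtrack(variables, domains, consistent, assignment=None):
    """Depth-first search yielding every complete, consistent assignment.

    Each recursion level is a node of the search tree. A value that makes
    the partial assignment inconsistent prunes the whole branch below it.
    """
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        yield dict(assignment)
        return
    var = variables[len(assignment)]
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment):
            yield from backtrack(variables, domains, consistent, assignment)
        del assignment[var]  # backtrack to the previous node

def chain_without_clashes(assignment):
    """Toy constraints: no shared position, neighbours exactly 1 apart."""
    positions = [assignment[v] for v in sorted(assignment)]
    if len(set(positions)) != len(positions):
        return False
    return all(abs(a - b) == 1 for a, b in zip(positions, positions[1:]))

nucleotides = [1, 2, 3]
domains = {n: [0, 1, 2] for n in nucleotides}
solutions = list(backtrack(nucleotides, domains, chain_without_clashes))
print(solutions)
```

Checking consistency after every single placement, rather than only on complete structures, is what lets the search discard large parts of the tree early.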

28
Mc-Sym (Continued)
  • The selection of a spanning tree for a particular
    RNA is left to the user, but it is suggested that
    the nucleotides imposing the most constraints are
    introduced first
  • Users also supply a particular Mc-Sym
    conformation for each nucleotide. These
    conformers are derived from currently available
    3D databases

29
Mc-Sym (Continued)
Sample script:

    SEQUENCE 1 A r GAAUGCCUGCGAGCAUCCC
    DECLARE
      1 helixA   2 helixA
      3 helixA   4 helixA
      5 helixA   6 helixA
      19 helixA
    RELATIONS
      18 helix 19
      17 helix 18
      16 helix 17
      .
      5 helix 6
      4 helix 5
      3 helix 4
      2 helix 3
      1 helix 2
    BUILD
      19 18 17 16 15 14 13 12
      12 11 10 9 8 7 6 5
      4 3 2 1
    CONSTRAINTS

30
RNA-protein Interactions
  • There is currently no computational method that
    can predict RNA-protein interaction
    interfaces
  • Statistical methods have been applied to identify
    structural features at the protein-RNA interface.
    For instance, ENTANGLE finds that most atoms
    contributed by a protein to recognizing an RNA
    are from the main chain (C, O, N, H), not from side
    chains. But much remains to be done
  • Electrostatic potential is of primary importance in
    protein-RNA recognition because of the negatively
    charged phosphate backbone. Efforts are being made to
    quantify the electrostatic potential at the molecular
    surface of a protein and an RNA in order to predict
    the site of RNA interaction. This often provides
    a good prediction, at least for the site on the
    protein.

31
References
  • Predicting RNA secondary structures
  • good reviews
  • 1. Turner, D. H., and Sugimoto, N. (1988) RNA
    structure prediction. Annu Rev Biophys Biophys
    Chem 17, 167-92.
  • 2. Zuker, M. (2000) Calculating nucleic acid
    secondary structure. Curr Opin Struct Biol 10,
    303-10.
  • Obtaining experimental thermodynamics parameters
  • 3. Xia, T., SantaLucia, J., Jr., Burkard, M.
    E., Kierzek, R., Schroeder, S. J., Jiao, X., Cox,
    C., and Turner, D. H. (1998) Thermodynamic
    parameters for an expanded nearest-neighbor model
    for formation of RNA duplexes with Watson-Crick
    base pairs. Biochemistry 37, 14719-35.
  • 4. Borer, P. N., Dengler, B., Tinoco, I., Jr.,
    and Uhlenbeck, O. C. (1974) Stability of
    ribonucleic acid double-stranded helices. J Mol
    Biol 86, 843-53.
  • Thermodynamics Theory for RNA structure
    prediction
  • 5. Bundschuh, R., and Hwa, T. (1999) RNA
    secondary structure formation: A solvable model
    of heteropolymer folding. PHYSICAL REVIEW LETTERS
    83, 1479-1482.
  • 6. McCaskill, J. S. (1990) The equilibrium
    partition function and base pair binding
    probabilities for RNA secondary structure.
    Biopolymers 29, 1105-19.