A mathematical model of the genetic code: structure and applications

1
A mathematical model of the genetic code:
structure and applications
  • Antonino Sciarrino
  • Università di Napoli Federico II,
    INFN, Sezione di Napoli
  • TAG 2006, Annecy-le-Vieux, 9 November 2006

2
Mathematical Model of the Genetic Code
Work in collaboration with
Luc FRAPPAT, Paul SORBA, Diego COCURULLO
3
SUMMARY
  • Introduction
  • Description of the model
  • Applications: codon usage frequencies,
    free energy of DNA dimers
  • Work in progress

4
It is amazing that the complex biochemical
relations between DNA and proteins were very
quickly reduced to a mathematical model. Just a
few months after the Watson-Crick discovery, G.
Gamow proposed the diamond code.
5
Gamow diamond code
Gamow, Nature (1954)
Nucleotides are denoted by the numbers 1, 2, 3, 4
Amino acids fit the rhomb-shaped holes
formed by the 4 nucleotides
⇒ 20 a.a.!

6
Since 1954 many mathematical models of the
genetic code have been proposed
(based on information, thermodynamic,
symmetry, topology arguments)
Weak point of the models:
often poor explanatory and/or predictive power
7
The genetic code
8
Crystal basis model of
the genetic code
L. Frappat, A. Sciarrino, P. Sorba, Phys. Lett. A
(1998)
⇒ The 4 bases C, U/T (pyrimidines) and G, A
(purines) are identified by a couple of
"spin" labels
(+ ≡ 1/2, − ≡ −1/2)
Mathematically: C, U/T, G, A transform as the 4
basis vectors of the irrep. (1/2, 1/2) of
U_{q→0}(sl(2)_H ⊕ sl(2)_V)
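
The explicit assignment is not spelled out in the transcript. The one below is the assignment used in the published crystal basis model (Frappat, Sciarrino, Sorba, 1998); treat it as an assumption here. The first label refers to sl(2)_H and the second to sl(2)_V.

```latex
\begin{align*}
C   &\equiv \left(+\tfrac12,\, +\tfrac12\right), &
U/T &\equiv \left(-\tfrac12,\, +\tfrac12\right), \\
G   &\equiv \left(+\tfrac12,\, -\tfrac12\right), &
A   &\equiv \left(-\tfrac12,\, -\tfrac12\right).
\end{align*}
```

With this choice the vertical label separates pyrimidines (+) from purines (−).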

9
Crystal basis model of
the genetic code
  • Dinucleotides are composite states
    (⇒ 16 basis vectors of (1/2, 1/2)^⊗2)
    belonging to sets identified by two
    integer numbers J_H, J_V
  • In each set the dinucleotide is
    identified by two labels:
    −J_H ≤ J_H,3 ≤ J_H
    −J_V ≤ J_V,3 ≤ J_V
  • Ex.
    CU = (+,+) ⊗ (−,+)
    (J_H = 0, J_H,3 = 0; J_V = 1, J_V,3 = 1)
  • ⇒ Follows from a property of U_{q→0}(sl(2))

10
DINUCLEOTIDE
Representation Content


11
Crystal basis model of
the genetic code
  • Codons are composite states
    (⇒ 64 basis vectors of (1/2, 1/2)^⊗3)
    belonging to sets identified by half-
    integer J_H, J_V
    (set ≡ irreducible
    representation, irrep.)
  • Ex.
    CUA = (+,+) ⊗ (−,+) ⊗ (−,−)
    (J_H = 1/2, J_H,3 = −1/2; J_V = 1/2, J_V,3 = 1/2)
  • ⇒ Follows from a property of U_{q→0}(sl(2))
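
The content of the 64-dimensional space can be checked by dimension counting. The decomposition below is not taken from the slide; it follows from ordinary sl(2) angular-momentum addition applied separately to the H and V factors.

```latex
\begin{align*}
\left(\tfrac12\right)^{\otimes 3} &= \tfrac32 \,\oplus\, 2\cdot\tfrac12, \\
\left(\tfrac12,\tfrac12\right)^{\otimes 3} &=
  \left(\tfrac32,\tfrac32\right) \,\oplus\, 2\left(\tfrac32,\tfrac12\right)
  \,\oplus\, 2\left(\tfrac12,\tfrac32\right) \,\oplus\, 4\left(\tfrac12,\tfrac12\right), \\
64 &= 16 + 2\cdot 8 + 2\cdot 8 + 4\cdot 4 .
\end{align*}
```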

12
Codons in the crystal basis
13
Codon usage frequency
  • Synonymous codons are not used uniformly (codon
    bias)
  • Codon bias (not fully understood) is ascribed to
    evolutionary-selective effects
  • Codon bias depends on:
    - Biological species (b.sp.)
    - Sequence analysed
    - Amino acid (a.a.) encoded
    - Structure of the considered multiplet
    - Nature of codon XYZ
    - ...

14
Codon usage in Homo sap.
15
Our analysis deals with global codon usage, i.e.
computed over all the coding sequences (exonic
region) for each b.sp. of the considered sample

⇒ Aim: to bring out possible general features
of the standard eukaryotic genetic code
ascribable to its organisation and its evolution

16
Let us define the codon usage probability for
the codon XZN (X, Z, N ∈ {A, C, G, U}; U → T in
DNA):

P(XZN) = lim_{n→∞} n_XZN / N_tot

n_XZN = number of times the codon XZN is
used in the processes

N_tot = total number of codons in the same
processes

For fixed XZ, normalization: Σ_N P(XZN) = 1

Note: sextets are considered as
quartets + doublets ⇒ 8 quartets
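
As a minimal sketch of this definition, the function below estimates P(XZN) from raw codon counts, normalising within one quartet so that the four values sum to 1. The function name, the input format (a dictionary of counts) and the example numbers are assumptions made for illustration; they are not taken from the slides.

```python
def quartet_probabilities(counts, xz):
    """Return {N: P(XZN)} normalised so that the four values sum to 1.

    counts maps codon strings (RNA or DNA alphabet) to observed counts;
    xz is the fixed pair of first two nucleotides, e.g. "GG".
    """
    # Fold T into U so that DNA and RNA spellings are pooled.
    pooled = {}
    for codon, n in counts.items():
        key = codon.upper().replace("T", "U")
        pooled[key] = pooled.get(key, 0) + n
    xz = xz.upper().replace("T", "U")
    quartet = {n: pooled.get(xz + n, 0) for n in "ACGU"}
    total = sum(quartet.values())
    if total == 0:
        raise ValueError(f"no observations for quartet {xz}N")
    return {n: c / total for n, c in quartet.items()}


# Made-up counts, for illustration only.
example_counts = {"GGA": 10, "GGC": 20, "GGG": 30, "GGT": 40, "AAA": 5}
p_gg = quartet_probabilities(example_counts, "GG")
```

Codons outside the chosen quartet (here `AAA`) are ignored, which is what the normalization at fixed XZ requires.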
17
Def.: correlation coefficient r_XY for two
variables X ≡ P(..X), Y ≡ P(..Y)
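
The formula itself did not survive in the transcript. Assuming the standard Pearson coefficient is meant, with the index i running over the n biological species of the sample, it reads:

```latex
r_{XY} \;=\;
\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
     {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i .
```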
18
Sample (GenBank Release
149.0, 09/2005; N_codons > 100,000)
  • 26 VERTEBRATES
  • 28 INVERTEBRATES
  • 38 PLANTS
  • TOTAL - 92 Biological species

19
Correlation coefficient VERTEBRATES
20
Correlation coefficient PLANTS
21
Correlation coefficient INVERTEBRATES
22
Averaged value of P(..N)
23
Averaged value of P(..N)
24
Averaged value of the sum of two correlated P(N)
25
Ratios of σ²_obs(X+Y) and σ²_th(X+Y) = σ²_obs(X) +
σ²_obs(Y), averaged over the 8 a.a., for the sum of
two codon probabilities
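
The reason this ratio is informative can be made explicit. The relations below assume that the "theoretical" variance is the one expected for uncorrelated variables, and use the general identity for the variance of a sum, with r_XY the correlation coefficient of X and Y:

```latex
\begin{align*}
\sigma^2(X+Y) &= \sigma^2(X) + \sigma^2(Y) + 2\, r_{XY}\, \sigma(X)\, \sigma(Y), \\
\frac{\sigma^2_{\mathrm{obs}}(X+Y)}{\sigma^2_{\mathrm{th}}(X+Y)}
  &= 1 + \frac{2\, r_{XY}\, \sigma_{\mathrm{obs}}(X)\, \sigma_{\mathrm{obs}}(Y)}
              {\sigma^2_{\mathrm{obs}}(X) + \sigma^2_{\mathrm{obs}}(Y)} .
\end{align*}
```

A ratio close to 1 means the two probabilities are uncorrelated; a ratio well below 1 signals anticorrelation, and a ratio above 1 positive correlation.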
26
⇒ Indication of a correlation between the codon usage
probabilities P(A) and P(C) (⇒ P(U) and P(G))
for quartets.
27
Correlation between codon probabilities for
different a.a.
  • Correlation coefficients between the 28 couples
    P(XZN)-P(X′Z′N), where XZ (X′Z′) specify the 8
    quartets. The following pattern comes out for the
    whole eukaryote sample (n = 92)

28
The set of 8 quartets splits into 3 subsets
  • 4 a.a. with correlated codon usage (Ser,
    Pro, Ala, Thr)
  • 2 a.a. with correlated codon usage (Leu,
    Val)
  • 2 a.a. with generally uncorrelated codon usage
    (Arg, Gly)

29
  • Statistical analysis
  • ⇒ Correlation for P(XZA)-P(XZC), XZ ∈ quartets
  • ⇒ Correlation for P(N) between Ser, Pro, Thr,
    Ala and between Leu, Val

The observed correlations fit well in the
mathematical scheme of the crystal basis model
of the genetic code
30
In the crystal basis model P(XYZ) can be written
as a function of
31
ASSUMPTION
32
⇒ SUM RULES
K INDEPENDENT OF THE b.sp.
XZ ∈ QUARTETS
33
SUM RULES ⇒ theoretical correlation matrix

XZ = NC, CG, GG, CU, GU
34
Observed averaged value of the correlation
matrix; in red the theoretical value
36
Shannon Entropy
Let us define the Shannon entropy for the
amino acid specified by the first two nucleotides
XZ (8 quartets)
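
The defining formula is missing from the transcript. Assuming the usual form S_XZ = −Σ_N P(XZN) log P(XZN), a minimal sketch is given below; the function name and the choice of base 2 for the logarithm are assumptions, not something stated on the slide.

```python
import math


def shannon_entropy(probabilities, base=2.0):
    """Return -sum(p * log(p)) over the non-zero entries.

    probabilities: the four values P(XZN), N = A, C, G, U, for one quartet.
    base: logarithm base (2 gives bits); an illustrative default.
    """
    probabilities = list(probabilities)
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing (p * log p -> 0 as p -> 0).
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])
single = shannon_entropy([1.0, 0.0, 0.0, 0.0])
```

With base 2 the entropy of a quartet ranges from 0 (one codon used exclusively) to 2 bits (uniform usage).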
37
Shannon Entropy
Using the previous expression for P(XZN) we get
(?_N ≡ ?(XZN), H^bs_N ≡ H^bs(XZN), P_N ≡ P(XZN))
⇒
S_XZ largely independent of the b.sp.
38
Shannon Entropy
39
DNA dinucleotide free energy
Free energy for a pair of nucleotides, e.g. GC,
lying on one strand of DNA, coupled with the
complementary pair, CG, on the other
strand. CG read 5′ → 3′ is correlated with GC
read 3′ → 5′
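
One consequence of this pairing is that a dimer and its reverse complement describe the same double-stranded stack, so the 16 dinucleotides reduce to 10 distinct stacks. The sketch below enumerates them; it is a standard bookkeeping step in nearest-neighbour models and the function names are illustrative, not taken from the talk.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the opposite strand of seq, read 5' -> 3'."""
    return "".join(COMPLEMENT[n] for n in reversed(seq))


def distinct_dimer_stacks():
    """Group the 16 dinucleotides with their reverse complements.

    Each stack is represented by the alphabetically smaller of the
    two equivalent spellings.
    """
    stacks = set()
    for first in "ACGT":
        for second in "ACGT":
            dimer = first + second
            stacks.add(min(dimer, reverse_complement(dimer)))
    return sorted(stacks)


stacks = distinct_dimer_stacks()
```

Four dimers (AT, TA, CG, GC) are their own reverse complement; the remaining twelve pair up into six stacks.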
40
DINUCLEOTIDE
Representation Content


42
SUM RULES for FREE ENERGY
43
Comparison with exp. data
ΔG in kcal/mol
44
DINUCLEOTIDE Distribution
46
Comparison with experimental data
47
Work in progress and future perspectives
From the correspondence C, U/T, G, A ↔ irrep.
(1/2, 1/2) of U_{q→0}(sl(2)_H ⊕ sl(2)_V)
⇒
Any ordered sequence of N nucleotides ↔ vector of
an irrep. ⊂ (1/2, 1/2)^⊗N of U_{q→0}(sl(2)_H ⊕
sl(2)_V)
⇒
New parametrization of nucleotide sequences
48

Spin parametrisation
49
Algorithm for the spin parametrisation of an
ordered n-nucleotide sequence
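
The algorithm shown on this slide is not in the transcript. The sketch below is one way to obtain the four labels, under two assumptions that come from the published crystal basis model rather than from the slides: the spin assignment C = (+,+), U/T = (−,+), G = (+,−), A = (−,−), and the q → 0 tensor product rule in which a "+" immediately followed by a "−" forms a singlet and drops out. All function names are hypothetical.

```python
from fractions import Fraction

# Assumed spin labels (H, V); +1 stands for +1/2 and -1 for -1/2.
SPIN = {"C": (+1, +1), "U": (-1, +1), "T": (-1, +1),
        "G": (+1, -1), "A": (-1, -1)}


def sl2_labels(signs):
    """Return (J, J3) for a list of +1/-1 signs.

    Assumed rule: a '+' directly followed by a '-' (after earlier
    cancellations) is a singlet and is removed. What survives has the
    form '- ... - + ... +', and its length equals 2J. J3 is half the
    sum of all signs, since weights add under the tensor product.
    """
    stack = []
    for s in signs:
        if s == -1 and stack and stack[-1] == +1:
            stack.pop()            # '+-' pair cancels
        else:
            stack.append(s)
    return Fraction(len(stack), 2), Fraction(sum(signs), 2)


def crystal_labels(seq):
    """Return (J_H, J_H3, J_V, J_V3) for a nucleotide string."""
    h = [SPIN[n][0] for n in seq.upper()]
    v = [SPIN[n][1] for n in seq.upper()]
    return sl2_labels(h) + sl2_labels(v)


# Example: labels of a few dinucleotides.
for dimer in ("CA", "CG", "CU", "CC"):
    print(dimer, crystal_labels(dimer))
```

Under these assumptions the 16 dinucleotides fall into four sets with (J_H, J_V) = (1,1), (1,0), (0,1) and (0,0), of sizes 9, 3, 3 and 1.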
50
From this parametrisation
  • Alternative construction of a mutation model, where
    the mutation intensity does not depend on the
    Hamming distance between the sequences, but on
    the change of labels of the sets.
    C. Minichini, A.S., Biosystems
    (2006)
  • Characterization of particular sequences (exons,
    introns, promoters, 5′ or 3′ UTR sequences, ...)
    L. Frappat, P. Sorba, A.S., L. Vuillon, in
    progress

51
For each gene of Homo sapiens (28,000 genes in total)
  • Consider the N-nucleotide coding sequence (CDS)
  • Compute the labels J_H, J_3H, J_V, J_3V
    for any n-nucleotide subsequence
    (1 ≤ n ≤ N)
  • ⇒ Plot the labels versus n
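
The per-prefix computation can be done in a single pass over the sequence. The sketch below takes the n-nucleotide subsequences to be the prefixes of the CDS, and assumes the spin assignment C = (+,+), U/T = (−,+), G = (+,−), A = (−,−) together with the rule that a "+" followed by a "−" cancels as a singlet. These assumptions and the function name are illustrative; the procedure actually used for the plots is not given in the transcript.

```python
from fractions import Fraction

# Assumed spin labels (H, V); +1 stands for +1/2 and -1 for -1/2.
SPIN = {"C": (+1, +1), "U": (-1, +1), "T": (-1, +1),
        "G": (+1, -1), "A": (-1, -1)}


def label_profile(seq):
    """Yield (n, J_H, J_3H, J_V, J_3V) for every prefix of length n."""
    # Per channel (0 = H, 1 = V): counts of uncancelled '+' and '-' signs.
    plus = [0, 0]
    minus = [0, 0]
    for n, nucleotide in enumerate(seq.upper(), start=1):
        for channel, sign in enumerate(SPIN[nucleotide]):
            if sign == +1:
                plus[channel] += 1
            elif plus[channel] > 0:
                plus[channel] -= 1      # '-' cancels the most recent '+'
            else:
                minus[channel] += 1
        labels = []
        for channel in (0, 1):
            labels.append(Fraction(plus[channel] + minus[channel], 2))  # J
            labels.append(Fraction(plus[channel] - minus[channel], 2))  # J3
        yield (n, *labels)


profile = list(label_profile("CUA"))
```

Each tuple in `profile` gives one point per curve, so the four label curves versus n can be drawn directly from it with any plotting tool.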

52
Red: J_H, green: J_3H, blue: J_V, black: J_3V
53
Red: J_H, green: J_3H, blue: J_V, black: J_3V
54
Red: J_H, green: J_3H, blue: J_V, black: J_3V
55
Red: J_H, green: J_3H, blue: J_V, black: J_3V
56
Numerical estimator
  • Define for any sequence of length N

  • Plot the number of CDS with the same value of Diff
    (Sum) versus Diff (Sum)
  • Compute Diff (Sum) for 28,000 random sequences
    (300 < N < 4300) with uniform probability for
    each nucleotide
  • Comparison: number of CDS vs. random sequences
59
Conclusions
  • Correlations in codon usage frequencies computed
    over the whole exonic region fit well in the
    mathematical scheme of the crystal basis model
    of the genetic code. An explanation for the
    correlations is still missing.
  • The formalism of the crystal basis model is useful
    to parametrize the free energy of DNA dimers
  • More generally, the U_{q→0}(sl(2)_H ⊕ sl(2)_V)
    mathematical structure may be useful to describe
    sequences of nucleotides.