A mathematical model of the genetic code: structure and applications

1
A mathematical model of the genetic code:
structure and applications

Antonino Sciarrino
Università di Napoli Federico II
INFN, Sezione di Napoli
TAG 2006, Annecy-le-Vieux, 9 November 2006

2
Mathematical Model of the Genetic Code
Work in collaboration with
Luc FRAPPAT, Paul SORBA, Diego COCURULLO
3
SUMMARY

Introduction
Description of the model
Applications: codon usage frequencies,
DNA dimer free energy
Work in progress

4
It is amazing that the complex biochemical
relations between DNA and proteins were very
quickly reduced to a mathematical model. Just
a few months after the WATSON-CRICK discovery,
G. GAMOW proposed the diamond code.
5
Gamow diamond code
Gamow, Nature (1954)
Nucleotides are denoted by the numbers 1, 2, 3, 4
Amino acids FIT the rhomb-shaped holes
formed by the 4 nucleotides
→ 20 a.a.!

6
Since 1954 many mathematical models of the
genetic code have been proposed
(based on information, thermodynamic,
symmetry, topology arguments)
Weak point of the models:
often poor explanatory and/or predictive power
7
The genetic code
8
Crystal basis model of
the genetic code
L. Frappat, A. Sciarrino, P. Sorba, Phys. Lett. A
(1998)
The 4 bases C, U/T (pyrimidines) and G, A
(purines) are identified by a pair of
spin labels
($+ \equiv 1/2$, $- \equiv -1/2$)
Mathematically, C, U/T, G, A transform as the 4
basis vectors of the irrep. (1/2, 1/2) of
$U_{q \to 0}(sl(2)_H \oplus sl(2)_V)$

9
Crystal basis model of
the genetic code

Dinucleotides are composite states
(16 basis vectors of $(1/2, 1/2)^{\otimes 2}$)
belonging to sets identified by two
integer numbers $J_H$, $J_V$.
In each set the dinucleotide is
identified by two labels
$-J_H \le J_{H,3} \le J_H$
$-J_V \le J_{V,3} \le J_V$
Ex.
CU = $(+,+) \otimes (-,+)$
($J_H = 1/2$, $J_{H,3} = 1/2$; $J_V = 1/2$, $J_{V,3} = 1/2$)
→ Follows from a property of $U_{q \to 0}(sl(2))$

10
DINUCLEOTIDE
Representation Content

11
Crystal basis model of
the genetic code

Codons are composite states
(64 basis vectors of $(1/2, 1/2)^{\otimes 3}$)
belonging to sets identified by half-integer
$J_H$, $J_V$
(set ≡ irreducible representation, irrep.)
Ex.
CUA = $(+,+) \otimes (-,+) \otimes (-,-)$
($J_H = 1/2$, $J_{H,3} = 1/2$; $J_V = 1/2$, $J_{V,3} = 1/2$)
→ Follows from a property of $U_{q \to 0}(sl(2))$

12
Codons in the crystal basis
13
Codon usage frequency

Synonymous codons are not used uniformly (codon
bias)
Codon bias (not fully understood) is ascribed to
evolutionary and selective effects
Codon bias depends on:
- Biological species (b.sp.)
- Sequence analysed
- Amino acid (a.a.) encoded
- Structure of the considered multiplet
- Nature of codon XYZ
- ...

14
Codon usage in Homo sap.
15
Our analysis deals with global codon usage, i.e.
computed over all the coding sequences (exonic
region) for each b.sp. of the considered sample

→ To bring out possible general features
of the standard eukaryotic genetic code
ascribable to its organisation and its evolution

16
Let us define the codon usage probability for
the codon XZN (X, Z, N ∈ {A, C, G, U}; U → T in DNA):
$P(XZN) = \lim_{n \to \infty} n_{XZN} / N_{tot}$

$n_{XZN}$ = number of times codon XZN is
used in the processes

$N_{tot}$ = total number of codons in the same
processes
For fixed XZ, normalization: $\sum_N P(XZN) = 1$

Note: sextets are considered as
quartets + doublets →

8 quartets
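
A minimal sketch of how P(XZN) might be estimated from a finite sample. It assumes the counts are normalised within one quartet (so that the sum over N is 1, as in the normalization above) and that the input is a single in-frame coding sequence. The function name `quartet_usage` and the toy sequence are illustrative and do not come from the talk.

```python
from collections import Counter

def quartet_usage(cds, xz):
    """Estimate P(XZN), N in {A, C, G, U}, for the quartet starting with `xz`.

    Counts are normalised within the quartet, so the four values sum to 1.
    """
    seq = cds.upper().replace("T", "U")
    # Split into in-frame codons, ignoring an incomplete codon at the end.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c[:2] == xz)
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no codon of the quartet {xz}N in the sequence")
    return {xz + n: counts[xz + n] / total for n in "ACGU"}

# Hypothetical toy sequence (not real data): GGA GGC GGC GGT GGG CCA
toy_cds = "GGAGGCGGCGGTGGGCCA"
p = quartet_usage(toy_cds, "GG")
```

On a real data set the counts would be accumulated over all coding sequences of a species before normalising.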
17
Def.: correlation coefficient $r_{XY}$ for two
variables $X \equiv P(..X)$, $Y \equiv P(..Y)$
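
The formula on this slide is not in the transcript. The sketch below assumes the usual Pearson coefficient, $r_{XY} = (\langle XY \rangle - \langle X \rangle \langle Y \rangle) / (\sigma_X \sigma_Y)$, evaluated over the species of the sample. The function name and the numbers are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values of P(..A) and P(..C) for four species (not real data).
p_a = [0.10, 0.20, 0.30, 0.40]
p_c = [0.40, 0.30, 0.20, 0.10]
r = pearson_r(p_a, p_c)  # perfectly anti-correlated toy data, r close to -1
```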
18
Sample (GenBank Release
149.0, 09/2005; $N_{codons} > 100{,}000$)

26 VERTEBRATES
28 INVERTEBRATES
38 PLANTS
TOTAL: 92 biological species

19
Correlation coefficient VERTEBRATES
20
Correlation coefficient PLANTS
21
Correlation coefficient INVERTEBRATES
22
Averaged value of P(..N)
23
Averaged value of P(..N)
24
Averaged value of the sum of two correlated P(N)
25
Ratios of $\sigma^2_{obs}(X+Y)$ and $\sigma^2_{th}(X+Y) = \sigma^2_{obs}(X) + \sigma^2_{obs}(Y)$,
averaged over the 8 a.a., for the sum of
two codon probabilities
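
For two uncorrelated variables the variance of the sum equals the sum of the variances, so a ratio close to 1 means no correlation, while a ratio well below 1 signals anti-correlation. The sketch below illustrates that ratio under the assumption that population variances are used. The function name and the data are hypothetical.

```python
from statistics import pvariance

def variance_ratio(xs, ys):
    """Observed variance of X+Y over the uncorrelated expectation var(X)+var(Y)."""
    sums = [x + y for x, y in zip(xs, ys)]
    return pvariance(sums) / (pvariance(xs) + pvariance(ys))

# Toy data: Y = X is fully correlated, so var(X+Y) = 4 var(X) and the ratio is 2.
ratio_correlated = variance_ratio([1, 2, 3, 4], [1, 2, 3, 4])

# Toy data: X + Y is constant, so the observed variance of the sum vanishes.
ratio_anticorrelated = variance_ratio([1, 2, 3, 4], [4, 3, 2, 1])
```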
26
→ Indication of a correlation between the codon usage
probabilities P(A) and P(C) (⇒ P(U) and P(G))
for quartets.
27
Correlation between codon probabilities for
different a.a.

Correlation coefficients between the 28 couples
P(XZN)-P(X'Z'N), where XZ (X'Z') specify the 8
quartets. The following pattern emerges for the
whole eukaryote sample (n = 92)

28
The set of 8 quartets splits into 3 subsets

4 a.a. with correlated codon usage (Ser,
Pro, Ala, Thr)
2 a.a. with correlated codon usage (Leu,
Val)
2 a.a. with generally uncorrelated codon usage
(Arg, Gly)

29
Statistical analysis
→ Correlation for P(XZA)-P(XZC), XZ ∈ quartets
→ Correlation for P(N) between Ser, Pro, Thr,
Ala and between Leu, Val

The observed correlations fit well in the
mathematical scheme of the crystal basis model
of the genetic code
30
In the crystal basis model P(XYZ) can be written
as a function of
31
ASSUMPTION
32
SUM RULES
K INDEPENDENT OF THE b.sp.
XZ ∈ QUARTETS
33
SUM RULES →

Theoretical correlation matrix
XZ = NC, CG, GG, CU, GU
34
Observed averaged value of the correlation
matrix; theoretical values in red
35
(No Transcript)
36
Shannon Entropy
Let us define the Shannon entropy for the
amino acid specified by the first two nucleotides
XZ (8 quartets)
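
The defining formula is not in the transcript. The sketch below assumes the standard form $S_{XZ} = -\sum_N P(XZN) \log_2 P(XZN)$ over the four codons of a quartet; the base of the logarithm and the function name are assumptions.

```python
import math

def quartet_entropy(probs):
    """Shannon entropy (in bits) of the codon probabilities of one quartet."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 contribute nothing (limit of p log p as p -> 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = quartet_entropy([0.25, 0.25, 0.25, 0.25])  # no bias: maximal entropy
single = quartet_entropy([1.0, 0.0, 0.0, 0.0])       # one codon only: zero entropy
```

With this convention the entropy of a quartet lies between 0 and 2 bits, and codon bias lowers it below the uniform value.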
37
Shannon Entropy
Using the previous expression for P(XZN) we get
?N ? ?(XZN), HbsN ? Hbs(XZN), PN ? P(XZN)
→ $S_{XZ}$ largely independent of the b.sp.
38
Shannon Entropy
39
DNA dinucleotide free energy
Free energy for a pair of nucleotides, e.g. GC,
lying on one strand of DNA, coupled with the
complementary pair, CG, on the other
strand. CG read 5' → 3' is correlated with GC
read 3' → 5'
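
A small sketch of the strand pairing described above. Reading the partner dimer 5' → 3' on the opposite strand gives the reverse complement, and grouping each dimer with its partner leaves 10 distinct stacked pairs out of the 16 dimers, the usual count in nearest-neighbour descriptions of DNA. The helper names are illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def partner_3to5(dimer):
    """Bases facing the dimer on the other strand, read 3' -> 5'."""
    return dimer.translate(COMPLEMENT)

def partner_5to3(dimer):
    """The same partner read in its own 5' -> 3' direction (reverse complement)."""
    return partner_3to5(dimer)[::-1]

dimers = ["".join(pair) for pair in product("ACGT", repeat=2)]
# Each class holds a dimer and its partner; self-complementary dimers stand alone.
classes = {tuple(sorted({d, partner_5to3(d)})) for d in dimers}
```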
40
DINUCLEOTIDE
Representation Content

41
(No Transcript)
42
SUM RULES for FREE ENERGY
43
Comparison with exp. data
ΔG in kcal/mol
44
DINUCLEOTIDE Distribution
45
(No Transcript)
46
Comparison with experimental data
47
Work in progress and future perspectives
From the correspondence {C, U/T, G, A} ↔ irrep.
(1/2, 1/2) of $U_{q \to 0}(sl(2)_H \oplus sl(2)_V)$
⇓
Any ordered sequence of N nucleotides ↔ vector of
an irrep. contained in $(1/2, 1/2)^{\otimes N}$ of
$U_{q \to 0}(sl(2)_H \oplus sl(2)_V)$
⇓
New parametrization of nucleotide sequences
48

Spin parametrisation
49
Algorithm for the spin parametrisation of an
ordered n-nucleotide sequence
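
The algorithm itself is not in the transcript. The sketch below is one possible way to obtain such labels, using the standard bracketing rule for tensor products of sl(2) crystals; it is not necessarily the procedure shown on the slide. Two things are assumed: the spin assignment C = (+,+), U/T = (−,+), G = (+,−), A = (−,−), and the convention that a "+" followed by a "−" pairs off into a singlet. All names are illustrative.

```python
from fractions import Fraction

# Assumed (H, V) spin signs of the bases.
SPIN = {"C": (+1, +1), "U": (-1, +1), "T": (-1, +1), "G": (+1, -1), "A": (-1, -1)}

def sl2_labels(signs):
    """Return (J, J3) for a sequence of +1/-1 signs under the assumed convention.

    A '-' cancels the most recent unpaired '+'. The unpaired survivors then
    read '- ... - + ... +', giving J = (count)/2 and J3 = (pluses - minuses)/2.
    """
    plus = minus = 0
    for s in signs:
        if s > 0:
            plus += 1
        elif plus > 0:
            plus -= 1  # pair this '-' with an earlier '+': a singlet
        else:
            minus += 1
    return Fraction(plus + minus, 2), Fraction(plus - minus, 2)

def crystal_labels(seq):
    """Return (J_H, J_H3, J_V, J_V3) for a nucleotide sequence."""
    h_signs, v_signs = zip(*(SPIN[base] for base in seq.upper()))
    return (*sl2_labels(h_signs), *sl2_labels(v_signs))

labels_ca = crystal_labels("CA")  # both sl(2) factors pair off completely
labels_cc = crystal_labels("CC")  # highest-weight state in both factors
```

With the opposite pairing convention the labels of individual sequences change, so the convention has to be fixed once and used throughout.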
50
From this parametrisation

Alternative construction of a mutation model, where
the mutation intensity does not depend on the
Hamming distance between the sequences, but on
the change of the labels of the sets.
C. Minichini, A.S., Biosystems
(2006)
Characterization of particular sequences (exons,
introns, promoters, 5' or 3' UTR sequences, ...)
L. Frappat, P. Sorba, A.S., L. Vuillon, in
progress

51
For each gene of Homo sap. (28,000 genes in total)

Consider the N-nucleotide coding sequence (CDS)
Compute the labels $J_H$, $J_{H,3}$, $J_V$, $J_{V,3}$
for any n-nucleotide subsequence
($1 \le n \le N$)
→ Plot the labels versus n
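
A sketch of how the four label tracks could be produced in one pass over a CDS. It assumes that "n-nucleotide subsequence" means the first n nucleotides of the sequence, and it uses an assumed spin assignment and pairing convention (a "−" cancels the latest unpaired "+"); neither is confirmed by the transcript. Plotting is left out; the returned list can be passed to any plotting tool.

```python
# Assumed (H, V) spin signs of the bases.
SPIN = {"C": (+1, +1), "U": (-1, +1), "T": (-1, +1), "G": (+1, -1), "A": (-1, -1)}

def label_tracks(cds):
    """Return [(J_H, J_H3, J_V, J_V3), ...], one tuple per prefix length n = 1..N."""
    # Per sl(2) factor: [number of unpaired '+', number of unpaired '-'].
    unpaired = {"H": [0, 0], "V": [0, 0]}
    tracks = []
    for base in cds.upper():
        for factor, sign in zip("HV", SPIN[base]):
            counts = unpaired[factor]
            if sign > 0:
                counts[0] += 1
            elif counts[0] > 0:
                counts[0] -= 1  # '-' pairs with an earlier unpaired '+'
            else:
                counts[1] += 1
        (hp, hm), (vp, vm) = unpaired["H"], unpaired["V"]
        tracks.append(((hp + hm) / 2, (hp - hm) / 2, (vp + vm) / 2, (vp - vm) / 2))
    return tracks

tracks = label_tracks("ATG")  # toy input: a start codon only
```

Updating the counters incrementally keeps the cost linear in N, which matters when the scan is repeated over tens of thousands of genes.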

52
Red: $J_H$, Green: $J_{H,3}$, Blue: $J_V$, Black: $J_{V,3}$
53
Red: $J_H$, Green: $J_{H,3}$, Blue: $J_V$, Black: $J_{V,3}$
54
Red: $J_H$, Green: $J_{H,3}$, Blue: $J_V$, Black: $J_{V,3}$
55
Red: $J_H$, Green: $J_{H,3}$, Blue: $J_V$, Black: $J_{V,3}$
56
Numerical estimator

Define for any sequence of length N

Plot the number of CDS with the same value of Diff
(Sum) versus Diff (Sum)
Compute Diff (Sum) for 28,000 random sequences
(300 < N < 4300) with uniform probability for
each nucleotide
Comparison: number of CDS vs. random sequences
57
(No Transcript)
58
(No Transcript)
59
Conclusions

Correlations in codon usage frequencies computed
over the whole exonic region fit well in the
mathematical scheme of the crystal basis model
of the genetic code
An explanation for the correlations is still missing
The formalism of the crystal basis model is useful to
parametrize the free energy of DNA dimers
More generally, the $U_{q \to 0}(sl(2)_H \oplus sl(2)_V)$
mathematical structure may be useful to describe
sequences of nucleotides.