Title: The 7 Bridges in Königsberg
1The 7 Bridges in Königsberg and Compositional
Representation of Protein Sequences
- Bailin Hao
- (ITP BGI, CAS)
- Huimin Xie
- (Math Dept. Suzhou U)
- Shuyu Zhang
- (IP. Acad. Sinica)
2Compositional Approach in Prokaryote Phylogeny
- Justification of using K-tuples instead of primary protein sequences.
- Problem of uniqueness of reconstruction of a protein sequence from its constituent K-tuples.
- Picking out a special class of proteins without biological knowledge.
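The decomposition of a sequence into its constituent K-tuples can be sketched as follows. This is a minimal illustration, not the authors' code; the function names `k_tuples` and `composition` and the toy input are hypothetical.

```python
from collections import Counter

def k_tuples(seq, k):
    """Return the overlapping K-tuples of seq in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def composition(seq, k):
    """Count how often each distinct K-tuple occurs in seq."""
    return Counter(k_tuples(seq, k))

# Toy input (assumption): a short stretch of amino acid letters.
toy = "MALSLFTVGQ"
tuples = k_tuples(toy, 5)      # L - K + 1 = 6 overlapping 5-tuples
counts = composition(toy, 5)   # distinct 5-tuples with multiplicities
```

A sequence of length L yields L - K + 1 overlapping K-tuples; the counter keeps only the distinct ones together with how often each occurs.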
4The 7 bridges in Königsberg
- Euler (1736): 4 odd nodes. No!
- Dénes König, Theory of Finite and Infinite Graphs, 1st ed. (1936); Birkhäuser (1990)
- "From Königsberg to König's book, So runs the graphic tale"
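Euler's degree argument can be checked in a few lines. The sketch below assumes the usual historical layout of the seven bridges; the node labels A to D (island, two river banks, eastern land mass) are hypothetical names chosen for this example.

```python
from collections import Counter

# Assumed layout: A = central island, B and C = the two banks, D = eastern land mass.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

# Nodes touched by an odd number of bridges.
odd_nodes = sorted(n for n, d in degree.items() if d % 2 == 1)

# For a connected undirected graph: a closed walk over every edge needs
# zero odd nodes, an open walk needs exactly zero or two.
has_euler_loop = len(odd_nodes) == 0
has_euler_path = len(odd_nodes) in (0, 2)
```

With four odd nodes neither condition holds, which is the "No!" on the slide.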
5Basic Notions
- A graph G = (V, E), where V is a set of nodes (vertices) and E is a set of edges (bonds).
- Edges: undirected, (u,v) = (v,u), u and v adjacent; directed, (u,v) differs from (v,u), u incident to v.
- A weight may be associated with (u,v): cost, distance, transfer function, reaction rate, etc.
- Eulerian graph: each edge appears once and only once in a path.
- Hamiltonian graph: each vertex appears once and only once in a path. Hamiltonian cycle of minimal weight: Traveling Salesman Problem (TSP).
6- An Euler path
- An Euler loop
- Euler graph: has an Euler loop
- Semi-Euler graph: has an Euler path, but no loop
- Problem of Eulerian loops: simple, known solutions
- Problem of Hamiltonian paths: much harder
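One of the known solutions for finding an Euler loop is Hierholzer's algorithm, which the slides do not name; it is used here only as an illustration. The sketch assumes a connected directed graph in which every node has equal indegree and outdegree, and the function name `euler_loop` and the example arcs are hypothetical.

```python
def euler_loop(arcs, start):
    """Hierholzer's algorithm for a directed graph given as a list of (u, v) arcs.

    Assumes the graph is connected and every node has indegree == outdegree.
    Returns the loop as a list of nodes, beginning and ending at start.
    """
    out = {}
    for u, v in arcs:
        out.setdefault(u, []).append(v)
    stack, loop = [start], []
    while stack:
        u = stack[-1]
        if out.get(u):
            # Follow an unused arc out of u.
            stack.append(out[u].pop())
        else:
            # No unused arcs left at u: u is fixed in the final loop.
            loop.append(stack.pop())
    return loop[::-1]

example_arcs = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 0)]
loop = euler_loop(example_arcs, 0)
```

The running time is linear in the number of arcs, which is the sense in which the Eulerian problem is "simple" compared with the Hamiltonian one.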
7Hamiltonian Loops: much harder
- 10 nodes
- 15 arcs
- $d_i = 3$ for all nodes
- Traveling Salesman Problem
- NP-hard problems
[Figure labels: No! Yes!]
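The figure for this slide did not survive. A graph with 10 nodes, 15 arcs, and degree 3 at every node is read here as the Petersen graph, which is an assumption. The sketch below runs a plain backtracking search on it and, for contrast, on a pentagonal prism with the same node count, arc count, and degrees (a hypothetical second example, not taken from the slides).

```python
def adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def has_hamiltonian_loop(adj):
    """Exhaustive backtracking search; exponential in the worst case."""
    n = len(adj)
    start = next(iter(adj))

    def extend(node, visited):
        if len(visited) == n:
            return start in adj[node]   # can the walk close up?
        for nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)
                if extend(nxt, visited):
                    return True
                visited.remove(nxt)
        return False

    return extend(start, {start})

outer = [(i, (i + 1) % 5) for i in range(5)]            # outer 5-cycle
spokes = [(i, i + 5) for i in range(5)]                  # outer-to-inner links
star = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]      # inner pentagram
inner = [(5 + i, 5 + (i + 1) % 5) for i in range(5)]     # inner 5-cycle

petersen = adjacency(outer + spokes + star)
prism = adjacency(outer + spokes + inner)
```

Both graphs pass the same simple degree test, yet only the prism has a Hamiltonian loop; no local degree condition of the Eulerian kind decides the question.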
8- Graph: nodes and arcs
- Directed, labeled arcs and nodes
- Simple graph
- No rings at nodes
- No repeated arcs
[Figure: node labels i, j]
9- Indegree $d_{in}(i)$
- Outdegree $d_{out}(i)$
- Euler graph: $d_{in}(i) = d_{out}(i) \equiv d_i$ for all $i$
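The degree condition can be read directly off an adjacency matrix. A minimal sketch, assuming `A[i][j]` counts the arcs from node i to node j; the function names and the two toy matrices are hypothetical, and connectivity is not checked.

```python
def degrees(A):
    """Return (d_in, d_out) for an adjacency matrix A with A[i][j] = arcs i -> j."""
    n = len(A)
    d_out = [sum(A[i]) for i in range(n)]                       # row sums
    d_in = [sum(A[i][j] for i in range(n)) for j in range(n)]   # column sums
    return d_in, d_out

def satisfies_euler_condition(A):
    """True when indegree equals outdegree at every node."""
    d_in, d_out = degrees(A)
    return d_in == d_out

triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]   # arcs in both directions between every pair of nodes
chain = [[0, 1],
         [0, 0]]         # a single arc 0 -> 1
```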
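10Simple Euler Graph
- Diagonal matrix: $M = \mathrm{diag}(d_1, d_2, \ldots, d_n)$
- Adjacency matrix: $A = \{a_{ij}\}$, with $a_{ij} \in \{0, 1\}$ and $a_{ii} = 0$
- Kirchhoff matrix: $C = M - A$, with $\sum_i C_{ij} = 0$ and $\sum_j C_{ij} = 0$
- $\det(C) = 0$. All first minors (cofactors) of $C$ are equal. Denote this common minor by $\Delta$.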
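The properties of the Kirchhoff matrix can be verified numerically on a small case. The sketch below uses exact fractions and a hand-written determinant so that it needs only the standard library. The toy graph (a triangle with arcs in both directions) and all function names are assumptions, and "minor" is read as the signed first minor, i.e. the cofactor.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over exact fractions."""
    a = [[Fraction(x) for x in row] for row in m]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * result

def kirchhoff(A):
    """C = M - A with M = diag(d_1, ..., d_n), d_i being the outdegree of node i."""
    n = len(A)
    d = [sum(row) for row in A]
    return [[(d[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]

def cofactor(C, i, j):
    """Signed determinant of C with row i and column j removed."""
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(C) if r != i]
    return (-1) ** (i + j) * det(minor)

A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
C = kirchhoff(A)
cofactors = {cofactor(C, i, j) for i in range(3) for j in range(3)}
```

For this graph every row and column of C sums to zero, the determinant vanishes, and all nine cofactors coincide, so the set `cofactors` has a single element.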
11- Number of Euler loops in a simple Euler graph
- N. G. de Bruijn
- T. van Aardenne-Ehrenfest
- C. A. B. Smith
- W. T. Tutte
- BEST Theorem: $e(G) = \Delta \prod_i (d_i - 1)!$, where $\Delta$ is the common minor of the Kirchhoff matrix
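The BEST formula can be compared against a brute-force count on a small simple Euler graph. This is a sketch under stated assumptions: the toy graph is a triangle with arcs in both directions, loops are counted as cyclic arc sequences (the first arc is held fixed in the enumeration), and the names `best_count` and `brute_count` are hypothetical.

```python
from fractions import Fraction
from math import factorial

def det(m):
    """Determinant by Gaussian elimination over exact fractions."""
    a = [[Fraction(x) for x in row] for row in m]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * result

def best_count(A):
    """Delta * prod_i (d_i - 1)!, with Delta taken as the (0, 0) minor of C = M - A."""
    n = len(A)
    d = [sum(row) for row in A]
    C = [[(d[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    delta = det([row[1:] for row in C[1:]])
    prod = 1
    for di in d:
        prod *= factorial(di - 1)
    return int(delta * prod)

def brute_count(A):
    """Enumerate Euler loops directly, holding the first arc fixed."""
    n = len(A)
    arcs = [(i, j) for i in range(n) for j in range(n) for _ in range(A[i][j])]
    used = [False] * len(arcs)
    used[0] = True
    start = arcs[0][0]

    def walk(node, left):
        if left == 0:
            return 1 if node == start else 0
        total = 0
        for idx, (u, v) in enumerate(arcs):
            if not used[idx] and u == node:
                used[idx] = True
                total += walk(v, left - 1)
                used[idx] = False
        return total

    return walk(arcs[0][1], len(arcs) - 1)

A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
```

Here every $d_i = 2$, so the factorial product is 1 and the count reduces to the minor itself.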
12- Number of Eulerian loops in a general Euler graph
- Some $a_{ii} \neq 0$: rings
- Some $a_{ij} > 1$: parallel arcs
- Putting auxiliary nodes on these rings and parallel arcs makes the graph simple.
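13- No need to work with a bigger A matrix.
- Just let some $a_{ii} \neq 0$ and $a_{ij} > 1$ in the original A.
- Eliminate the redundancy caused by unlabeled arcs.
- Modified BEST Theorem: $e(G) = \dfrac{\Delta \prod_i (d_i - 1)!}{\prod_{ij} a_{ij}!}$, where $\Delta$ is the common minor of the Kirchhoff matrix $C = M - A$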
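The modified formula can be tested on a toy graph with one ring and one pair of parallel arcs. The sketch assumes that the graph contains at least one single (non-parallel) arc, which is used to mark where each loop begins; without such a marker a loop that repeats itself under rotation would make the division inexact. The matrix `A`, the choice of root arc, and the function names are all hypothetical.

```python
from fractions import Fraction
from math import factorial

def det(m):
    """Determinant by Gaussian elimination over exact fractions."""
    a = [[Fraction(x) for x in row] for row in m]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * result

def modified_best(A):
    """Delta * prod_i (d_i - 1)! / prod_ij a_ij!, with C = M - A built from A as given."""
    n = len(A)
    d = [sum(row) for row in A]
    C = [[(d[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    value = det([row[1:] for row in C[1:]])
    for di in d:
        value *= factorial(di - 1)
    for row in A:
        for a_ij in row:
            value /= factorial(a_ij)
    return value

def count_unlabeled_loops(A, root):
    """Count distinct Euler loops as node sequences that begin with the arc root."""
    left = [row[:] for row in A]
    i0, j0 = root
    left[i0][j0] -= 1

    def walk(node, remaining):
        if remaining == 0:
            return 1 if node == i0 else 0
        count = 0
        for nxt in range(len(A)):
            if left[node][nxt] > 0:
                left[node][nxt] -= 1
                count += walk(nxt, remaining - 1)
                left[node][nxt] += 1
        return count

    return walk(j0, sum(map(sum, A)) - 1)

# Two parallel arcs 0 -> 1, a ring at node 1, single arcs 1 -> 0, 1 -> 2, 2 -> 0.
A = [[0, 2, 0],
     [1, 1, 1],
     [1, 0, 0]]
formula = modified_best(A)
direct = count_unlabeled_loops(A, (2, 0))
```

The unmodified formula would give 4 for this graph, because it treats the two parallel arcs as distinguishable; dividing by $2!$ removes that redundancy.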
14- ANPA_PSEAM, 82 AA
- MALSLFTVGQLIFLFWTMRITEASPDPAAKAAPAAAAAPAAAAPDTASDAAAAAALTAANAKAAAELTAANAAAAAAATARG
- K = 5
- Graph nodes shown: MALS, ALSL, LSLF, SLFT, LFTV, FTVG, TVGQ, VGQL, AKAA
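The graph for this example can be rebuilt from the sequence. The sketch assumes, consistently with the 4-letter node labels and K = 5 on the slide, that each 5-tuple is an arc from its leading 4-tuple to its trailing 4-tuple; the function name `tuple_graph` is hypothetical.

```python
from collections import Counter

SEQ = ("MALSLFTVGQLIFLFWTMRITEASPDPAAKAAPAAAAAPAAAAPDTASDA"
       "AAAAALTAANAKAAAELTAANAAAAAAATARG")
K = 5

def tuple_graph(seq, k):
    """Map each K-tuple to an arc between its leading and trailing (K-1)-tuples."""
    arcs = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        arcs[(word[:-1], word[1:])] += 1
    return arcs

arcs = tuple_graph(SEQ, K)
nodes = {n for arc in arcs for n in arc}                      # distinct 4-tuples
total_arcs = sum(arcs.values())                               # L - K + 1
rings = sum(m for (u, v), m in arcs.items() if u == v)        # arcs from a node to itself
```

A ring arises only when five identical letters occur in a row, so in this sequence all rings sit at the node AAAA.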
15- 6 rings
- auxiliary arc
16From pdb.seq, a special selection of SWISSPROT: 2821 - 1 = 2820 proteins (May 2000)
- R: number of reconstructed AA sequences from a given protein decomposition
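For short sequences, R can be obtained by direct enumeration rather than from the determinant formula. The sketch below assumes that a reconstruction must use exactly the same K-tuples with the same multiplicities and must begin with the same (K-1)-tuple as the original; the function name and the toy sequence are hypothetical.

```python
from collections import Counter

def reconstructions(seq, k):
    """List every sequence with the same K-tuple counts and the same starting (K-1)-tuple."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    found = []

    def extend(current):
        if len(current) == len(seq):
            found.append(current)      # all L - K + 1 tuples have been used
            return
        tail = current[len(current) - (k - 1):]
        for word in sorted(counts):
            if counts[word] > 0 and word[:-1] == tail:
                counts[word] -= 1
                extend(current + word[-1])
                counts[word] += 1

    extend(seq[:k - 1])
    return found

toy = "ABACA"
r_k2 = reconstructions(toy, 2)   # tuples AB, BA, AC, CA
r_k3 = reconstructions(toy, 3)   # tuples ABA, BAC, ACA
```

On the toy input, K = 2 leaves two possible orderings while K = 3 already pins the sequence down, which illustrates why larger K tends to give R = 1.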
17Compositional Representation of Proteins
- The collection $\{W^K_j\}_{j=1}^{L-K+1}$ or $\{W^K_i, n_i\}_{i=1}^{M}$ may be used as an equivalent representation of the original protein sequence.
- A seemingly trivial result upon further reflection: random AA sequences have unique reconstruction as well.
- Compositional representation works equally well for random AA sequences and for most protein sequences.
- A given realization of a short random AA sequence is as specific as a real protein sequence.
18- Nucleotide correlations in DNA/RNA
- Much studied
- K = 2 correlation functions: 16, 9, 6
- See Wentian Li, Computers & Chemistry 21 (1997) 257-271.
- Amino acid correlations in proteins
- Almost no study
- Hard to comprehend: 400 correlation functions at K = 2
- Proteins too short to define correlation functions
- One should approach the problem from a more deterministic point of view
- Repeated AA segments in proteins are a strong manifestation of correlations!
19- Ongoing study: the other extreme
- Quite a few proteins have an enormous number of reconstructions.
- Transmembrane
- Antifreeze
- Fibrous collagens
- Coarse-graining: closer to biology by reducing the number of AAs
21- Preprint
- NSF ITP 01 018
- LANL E-archive physics/0103028
- arxiv.org or cn.arxiv.org
- Cross-referenced in q-bio since 15 Sept 2003