The 7 Bridges in K - PowerPoint PPT Presentation

Title: The 7 Bridges in Königsberg and Compositional Representation of Protein Sequences. Author: aaa. Last modified by: Hao Bailin.

1
The 7 Bridges in Königsberg and Compositional
Representation of Protein Sequences
  • Bailin Hao
  • (ITP BGI, CAS)
  • Huimin Xie
  • (Math Dept. Suzhou U)
  • Shuyu Zhang
  • (IP. Acad. Sinica)

2
Compositional Approach in Prokaryote Phylogeny
  • Justification of using K-tuples instead of
    primary protein sequences.
  • Problem of uniqueness of reconstruction of
    protein sequence from its constituent K-tuples.
  • Picking up a special class of proteins without
    biological knowledge.

3
(No Transcript)
4
7 bridges in Königsberg
  • Euler (1736): 4 odd nodes, so no walk crosses each bridge exactly once. No!
  • Dénes König, Theory of Finite and Infinite Graphs,
    1st ed. (1936), Birkhäuser (1990)
  • "From Königsberg to König's book, So runs the
    graphic tale"
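
Euler's argument on this slide can be checked by counting bridge ends at each land mass. The sketch below is only an illustration: the labels A, B, C, D for the four land masses are hypothetical names, not taken from the slides.

```python
from collections import Counter

# The seven bridges as an undirected multigraph.
# A is the island; B, C, D are the other land masses (hypothetical labels).
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges from the island to one bank
    ("A", "C"), ("A", "C"),  # two bridges from the island to the other bank
    ("A", "D"),              # island to the eastern land mass
    ("B", "D"),
    ("C", "D"),
]

def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def odd_nodes(edges):
    """Return the nodes of odd degree, sorted by name."""
    return sorted(n for n, d in degrees(edges).items() if d % 2 == 1)

odd = odd_nodes(BRIDGES)
print(dict(degrees(BRIDGES)))
print(len(odd), "odd nodes:", odd)
```

A closed walk over every bridge needs zero odd nodes and an open one needs exactly two, so four odd nodes rule out both.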
5
Basic Notions
  • A graph G(V, E), where V is a set of nodes
    (vertices) and E is a set of edges (bonds)
  • Edges: undirected, (u,v) = (v,u), u and v adjacent;
    directed, (u,v) differs from (v,u), u incident to
    v.
  • A weight may be associated with (u,v): cost,
    distance, transfer function, reaction rate, etc.
  • Eulerian graph: each edge appears once and only
    once in a path
  • Hamiltonian graph: each vertex appears once and
    only once in a path. Hamiltonian cycle of minimal
    weight: Traveling Salesman Problem (TSP)

6
  • An Euler path
  • An Euler loop
  • Euler graph: has an Euler loop
  • Semi-Euler graph: has an Euler path, but no loop
  • Problem of Eulerian loops: simple, known solutions
  • Problem of Hamiltonian paths: much harder

7
Hamiltonian loops: much harder
  • 10 nodes
  • 15 arcs
  • $d_i = 3$ for all nodes
  • Traveling Salesman Problem
  • NP-hard problems

[Figure: two example graphs, one labeled "No!" and one labeled "Yes!"]
8
  • Graph: nodes and arcs
  • Directed, labeled arcs and nodes
  • Simple graph:
  • No rings at nodes
  • No repeated arcs

[Figure: node labels i, i, j]
9
  • Indegree $d_{in}(i)$
  • Outdegree $d_{out}(i)$
  • Euler graph:
  • $d_{in}(i) = d_{out}(i) = d_i$ for all $i$
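
The degree condition above can be read directly off an adjacency matrix: out-degrees are row sums and in-degrees are column sums. The following is a minimal sketch; the 4-node matrix `A` is a hypothetical example, and the function checks only the degree balance, not connectivity.

```python
def out_degrees(A):
    """Row sums: number of arcs leaving each node."""
    return [sum(row) for row in A]

def in_degrees(A):
    """Column sums: number of arcs entering each node."""
    n = len(A)
    return [sum(A[i][j] for i in range(n)) for j in range(n)]

def degrees_balanced(A):
    """True if d_in(i) == d_out(i) for every node i."""
    return in_degrees(A) == out_degrees(A)

# Hypothetical example with arcs 0->1, 0->2, 1->2, 2->0, 2->3, 3->0.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]
print(out_degrees(A), in_degrees(A), degrees_balanced(A))
```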

10
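Simple Euler Graph
  • Diagonal matrix: $M = \mathrm{diag}(d_1, d_2, \ldots, d_n)$
  • Adjacency matrix: $A = (a_{ij})$, with $a_{ij} \in \{0, 1\}$ and $a_{ii} = 0$
  • Kirchhoff matrix: $C = M - A$, with
    $\sum_{i=1}^{n} C_{ij} = 0$ and $\sum_{j=1}^{n} C_{ij} = 0$
  • $\det(C) = 0$. All minors of C are equal.
    Denote this common minor by $\Delta$.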
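
The properties of the Kirchhoff matrix listed on this slide can be verified numerically on a small case. The sketch below assumes a hypothetical 4-node simple Euler digraph and reads "minor" as the signed minor (cofactor) obtained by deleting one row and one column; the helper names are illustrative only.

```python
from fractions import Fraction

def kirchhoff(A):
    """Return C = M - A, where M = diag(d_1, ..., d_n) holds the row sums of A."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]

def det(mat):
    """Determinant of an integer matrix by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in mat]
    n = len(m)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return 0  # no pivot in this column: the matrix is singular
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result  # a row swap flips the sign
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return int(result)

def cofactor(C, i, j):
    """Signed minor: delete row i and column j, then apply (-1)^(i+j)."""
    sub = [row[:j] + row[j + 1:] for r, row in enumerate(C) if r != i]
    return (-1) ** (i + j) * det(sub)

# Hypothetical example with arcs 0->1, 0->2, 1->2, 2->0, 2->3, 3->0.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]
C = kirchhoff(A)
cofactors = {cofactor(C, i, j) for i in range(4) for j in range(4)}
print("det(C) =", det(C))
print("distinct cofactor values:", cofactors)
```

For this example every row and column of C sums to zero, and all sixteen cofactors take the same value, which plays the role of $\Delta$.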
11
  • Number of Euler loops in a simple Euler graph
  • N. G. de Bruijn
  • T. van Aardenne-Ehrenfest
  • C. A. B. Smith
  • W. T. Tutte
  • BEST Theorem:
  • $e(G) = \Delta \prod_i (d_i - 1)!$
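
The BEST formula can be compared against direct enumeration on small graphs. The sketch below is an illustration under stated assumptions: the example matrices are hypothetical, every node is assumed to have $d_i \ge 1$, and the brute-force count fixes the first arc so that rotations of the same loop are counted once.

```python
from math import factorial

def det(m):
    """Integer determinant by Laplace expansion (adequate for small matrices)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def best_count(A):
    """e(G) = Delta * prod_i (d_i - 1)! for a simple Euler digraph with all d_i >= 1."""
    n = len(A)
    d = [sum(row) for row in A]
    C = [[(d[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    delta = det([row[1:] for row in C[1:]])  # minor with row 0 and column 0 removed
    result = delta
    for di in d:
        result *= factorial(di - 1)
    return result

def brute_force_count(A):
    """Count Euler loops by depth-first search, anchored on the first arc."""
    n = len(A)
    arcs = [(i, j) for i in range(n) for j in range(n) if A[i][j]]
    first = arcs[0]

    def walk(node, used):
        if len(used) == len(arcs):
            return 1 if node == first[0] else 0
        return sum(walk(j, used | {(i, j)}) for (i, j) in arcs
                   if i == node and (i, j) not in used)

    return walk(first[1], frozenset([first]))

# Hypothetical examples: a 4-node Euler digraph and the bidirected triangle.
A4 = [[0, 1, 1, 0], [0, 0, 1, 0], [1, 0, 0, 1], [1, 0, 0, 0]]
K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
for name, A in (("A4", A4), ("K3", K3)):
    print(name, best_count(A), brute_force_count(A))
```

The enumeration grows exponentially with the number of arcs, which is why the closed formula matters for graphs built from real sequences.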
12
  • Number of Eulerian loops in a general Euler graph
  • some $a_{ii} \neq 0$:
    rings
  • some $a_{ij} > 1$:
    parallel arcs
  • Putting auxiliary nodes on these rings and
    parallel arcs makes the graph simple.
13
  • No need to work with a bigger A matrix.
  • Just let some $a_{ii} \neq 0$, $a_{ij} > 1$ in the original A.
  • Eliminate redundancy caused by unlabeled arcs.
  • Modified BEST Theorem:
  • $e(G) = \dfrac{\Delta \prod_i (d_i - 1)!}{\prod_{ij} a_{ij}!}$
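
The modified formula can be tried on a small graph with a ring and parallel arcs. The sketch below is a hedged illustration rather than the authors' implementation: the example matrices are hypothetical, and the brute-force check assumes that at least one arc has multiplicity 1 so that every loop can be anchored on it. Parallel arcs are treated as indistinguishable by tracking remaining counts instead of arc labels.

```python
from math import factorial

def det(m):
    """Integer determinant by Laplace expansion (adequate for small matrices)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def modified_best(A):
    """Delta * prod_i (d_i - 1)! / prod_ij a_ij!, with rings and parallel arcs kept in A."""
    n = len(A)
    d = [sum(row) for row in A]
    C = [[(d[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    delta = det([row[1:] for row in C[1:]])
    numerator = delta
    for di in d:
        numerator *= factorial(di - 1)
    denominator = 1
    for row in A:
        for a in row:
            denominator *= factorial(a)
    return numerator // denominator

def brute_force_unlabeled(A):
    """Count distinct node sequences of Euler loops, anchored on an arc with a_ij == 1."""
    n = len(A)
    start = next((i, j) for i in range(n) for j in range(n) if A[i][j] == 1)
    remaining = [row[:] for row in A]
    remaining[start[0]][start[1]] -= 1
    total = sum(map(sum, remaining))

    def walk(node, left):
        if left == 0:
            return 1 if node == start[0] else 0
        count = 0
        for j in range(n):
            if remaining[node][j] > 0:
                remaining[node][j] -= 1
                count += walk(j, left - 1)
                remaining[node][j] += 1
        return count

    return walk(start[1], total)

# Hypothetical example: double arc 0->1, ring at node 1, single arcs 1->0, 1->2, 2->0.
A = [[0, 2, 0], [1, 1, 1], [1, 0, 0]]
print(modified_best(A), brute_force_unlabeled(A))
```

Without an arc of multiplicity 1 the division need not be exact (for instance two nodes joined by double arcs in both directions), so the anchoring assumption is doing real work here.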
14
K = 5: the 5-tuple MALSL is an arc from node MALS to node ALSL
  • ANPA_PSEAM, 82 AA
  • MALSLFTVGQLIFLFWTMRITEASPDPAAKAAPAAAAAPAAAAPDTASDA
    AAAAALTAANAKAAAELTAANAAAAAAATARG

[Figure: graph whose nodes are 4-tuples, among them LSLF, SLFT, LFTV, FTVG, TVGQ, VGQL, AKAA]
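
The decomposition on this slide can be sketched in a few lines: slide a window of width K along the sequence, and read each K-tuple as an arc from its first K-1 letters to its last K-1 letters. The function names below are hypothetical; the sequence is the ANPA_PSEAM string shown on the slide.

```python
from collections import Counter

# ANPA_PSEAM as transcribed on the slide, split over two lines for readability.
SEQ = ("MALSLFTVGQLIFLFWTMRITEASPDPAAKAAPAAAAAPAAAAPDTASDA"
       "AAAAALTAANAKAAAELTAANAAAAAAATARG")

def k_tuples(seq, k):
    """All overlapping K-tuples, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def arc_counts(seq, k):
    """Map each arc (prefix node, suffix node) to its multiplicity a_ij."""
    return Counter((w[:-1], w[1:]) for w in k_tuples(seq, k))

tuples = k_tuples(SEQ, 5)
arcs = arc_counts(SEQ, 5)
print(len(tuples), "K-tuples,", len(arcs), "distinct arcs")
print("first arcs:", [(w[:-1], w[1:]) for w in tuples[:3]])
print("parallel arcs:", {a: c for a, c in arcs.items() if c > 1})
```

Arcs with multiplicity greater than one and arcs whose two ends coincide correspond to the parallel arcs and rings of the general Euler graph discussed earlier.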
15
[Figure: graph with 6 rings and an auxiliary arc]
16
From pdb.seq, a special selection of
SWISS-PROT: 2821-12820 proteins (May 2000).
R = number of reconstructed AA
sequences from a given protein decomposition
17
Compositional Representation of Proteins
  • The collection $\{W_i^K\}_{i=1}^{L-K+1}$ or $\{W_j^K, n_j\}_{j=1}^{M}$
    may be used as an equivalent representation of
    the original protein sequence.
  • A seemingly trivial result upon further
    reflection: random AA sequences have unique
    reconstruction as well.
  • Compositional Representation works equally for
    random AA sequences and most protein
    sequences.
  • A given realization of a short random AA
    sequence is as specific as a real protein
    sequence.
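
Whether a collection of K-tuples determines the sequence can be tested directly on short strings. The sketch below counts reconstructions by exhaustive search instead of the closed formula, and it assumes the reconstruction must start with the same (K-1)-tuple as the original; the function name and toy sequences are hypothetical.

```python
from collections import Counter

def count_reconstructions(seq, k):
    """Count distinct sequences with the same K-tuple multiset and the same first node."""
    n_arcs = len(seq) - k + 1
    # Each K-tuple is an arc from its first K-1 letters to its last K-1 letters.
    arcs = Counter((seq[i:i + k - 1], seq[i + 1:i + k]) for i in range(n_arcs))
    successors = {}
    for (u, v) in arcs:
        successors.setdefault(u, []).append(v)

    def walk(node, left):
        if left == 0:
            return 1  # every arc used once: one complete reconstruction
        total = 0
        for v in successors.get(node, ()):
            if arcs[(node, v)] > 0:
                arcs[(node, v)] -= 1
                total += walk(v, left - 1)
                arcs[(node, v)] += 1
        return total

    return walk(seq[:k - 1], n_arcs)

for seq, k in (("ABCABD", 3), ("ABACA", 2), ("ABACA", 3)):
    print(seq, k, count_reconstructions(seq, k))
```

In the toy case ABACA, the 2-tuples also fit ACABA, while the 3-tuples pin the sequence down; the search is exponential in the worst case, so it is only meant for short examples.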
18
  • Nucleotide correlations in DNA/RNA
  • Much studied
  • K = 2 correlation functions: 16 → 9 → 6
  • See Wentian Li, Computers & Chemistry 21 (1997)
    257-271.
  • Amino acid correlations in proteins
  • Almost no study
  • Hard to comprehend 400 correlation
    functions at K = 2
  • Proteins are too short to define correlation
    functions
  • One should approach the problem from a more
    deterministic point of view
  • Repeated AA segments in proteins are a strong
    manifestation of correlations!

19
  • On-going study: the other extreme
  • Quite a few proteins have an enormous
    number of reconstructions.
  • Transmembrane
  • Antifreeze
  • Fibrous collagens
  • Coarse-graining: closer to biology by reducing
    the number of AAs

20
(No Transcript)
21
  • Preprint
  • NSF ITP 01 018
  • LANL E-archive
    physics/0103028
  • arxiv.org or
    cn.arxiv.org
  • Cross-referenced in q-bio since 15 Sept 2003