Conserved pathways within bacteria and yeast

Slides: 16
Provided by: songj
1
Conserved pathways within bacteria and yeast
  • Notes on the term project for
    Algorithmic Techniques for Biology

2
General
  • Given two protein graphs
  • Weighted protein relations between them
  • Combine the two graphs into one relation graph
  • Find a path of length k in the combination graph
    with maximum (or minimum) weight.
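
The last step can be sketched with an exhaustive search. This is only a minimal illustration, not necessarily the method the project intends: it assumes "length k" means a simple path with k vertices and that the weight of a path is the sum of its edge weights. The function name best_path and the toy graph are made up for this example.

```python
def best_path(edges, k, maximize=True):
    """Return (weight, path) for the best simple path with k vertices.

    edges: list of (u, v, weight) tuples for an undirected graph.
    Exhaustive depth-first search, so only suitable for small graphs.
    Returns None if no simple path with k vertices exists.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    sign = 1 if maximize else -1
    best = None  # (weight, path)

    def extend(path, weight):
        nonlocal best
        if len(path) == k:
            if best is None or sign * weight > sign * best[0]:
                best = (weight, list(path))
            return
        for nxt, w in adj.get(path[-1], []):
            if nxt not in path:          # keep the path simple
                path.append(nxt)
                extend(path, weight + w)
                path.pop()

    for start in adj:
        extend([start], 0)
    return best


# Hypothetical toy graph: a chain a-b-c-d-e.
toy = [("a", "b", 30), ("b", "c", 20), ("c", "d", 66), ("d", "e", 5)]
heaviest = best_path(toy, 3)
lightest = best_path(toy, 3, maximize=False)
```

The search visits every simple path, so its running time grows exponentially with k; it is meant to pin down the problem statement rather than to scale to the full protein graphs.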

3
Combining two graphs
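[Figure: graph G1 (vertices a to e) and graph G2 (vertices 1 to 5) with weighted edges, and the resulting combination graph with vertices a1, c4, d2, e2 and edge weights w1, w2, w3.]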
4
Combining graphs
  • Each relation becomes a vertex in the combination
    graph.
  • Two vertices are connected if

[Figure: three cases in which the combination-graph vertices c2 and e4 are connected, with edge weights W1, W2, and W3.]
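
A rough sketch of the construction follows. The exact connection conditions and the weights W1 to W3 come from the figure and are not reproduced here, so this sketch assumes only the simplest case: two combination vertices (u, x) and (v, y) are connected when u-v is an edge of G1 and x-y is an edge of G2. The function name combine, the default weight rule (sum of the two relation weights), and the example edges are all assumptions made for illustration.

```python
from itertools import combinations


def combine(g1_edges, g2_edges, relations, edge_weight=None):
    """Build a combination graph from two graphs and cross-graph relations.

    g1_edges, g2_edges: iterables of (u, v) undirected edges.
    relations: dict mapping (g1_node, g2_node) -> relation weight.
    Returns a list of ((u, x), (v, y), weight) edges.
    """
    if edge_weight is None:
        # Hypothetical default: add the weights of the two relations.
        edge_weight = lambda w1, w2: w1 + w2
    e1 = {frozenset(e) for e in g1_edges}
    e2 = {frozenset(e) for e in g2_edges}
    out = []
    for p, q in combinations(sorted(relations), 2):
        (u, x), (v, y) = p, q
        # Assumed rule: both underlying pairs must be edges (direct match only).
        if u != v and x != y and frozenset((u, v)) in e1 and frozenset((x, y)) in e2:
            out.append((p, q, edge_weight(relations[p], relations[q])))
    return out


# Hypothetical example data.
g1 = [("a", "c"), ("c", "e"), ("d", "e")]
g2 = [("1", "4"), ("4", "2")]
relations = {("a", "1"): 3, ("c", "4"): 2, ("e", "2"): 1, ("d", "2"): 1}
combo = combine(g1, g2, relations)
```

Passing a different edge_weight function is the place to plug in whatever weighting the project actually specifies.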
5
What we need to do
  • Find G1 (Hpylo20040704.tab)
  • Find G2 (Scere20040704.tab)
  • Find protein relations between G1 and G2
  • Combine the two graphs
  • Find the path in the combination graph

6
Where to get data and software
  • Get data from http://dip.doe-mbi.ucla.edu/
  • Choose Files -> SPECIES to get
    Hpylo20040704.tab and Scere20040704.tab
  • Choose Files -> FASTA to get fasta20040704.seq
  • (registration is required to get these files)
  • Get the BLAST software from
    http://www.ncbi.nlm.nih.gov/BLAST/
  • Choose FAQs -> Which BLAST program should I use?
    -> FTP location ftp://ftp.ncbi.nih.gov/blast/executables/
    to get a build such as blast-2.0.10-ia32-win32.exe

7
The Format of Hpylo20040704.tab
  • DIP:4305E DIP:3048N PIR:B64526 GI:2313123 DIP:3047N SWP:O24853 PIR:A64520 GI:2313078
  • DIP:4306E DIP:3049N SWP:O25122 PIR:C64564 GI:2313456 DIP:3047N SWP:O24853 PIR:A64520 GI:2313078
  • DIP:4307E DIP:3050N PIR:H64618 GI:2313921 DIP:3047N SWP:O24853 PIR:A64520 GI:2313078
  • DIP:4308E DIP:3051N PIR:B64520 GI:2313079 DIP:3051N PIR:B64520 GI:2313079
  • DIP:4309E DIP:3052N SWP:P56036 PIR:H64669 GI:2314362 DIP:3051N PIR:B64520 GI:2313079

The first field of each row (ending in E, e.g. 4305E) is the edge number.
The other DIP fields (ending in N, e.g. 3048N and 3047N) are the node numbers of the two proteins.
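
Reading one row can be sketched as below. This assumes whitespace-separated fields of the form LABEL:value, with exactly one DIP edge field (ending in E) and two DIP node fields (ending in N) per row; the function name parse_dip_row is made up for this example.

```python
def parse_dip_row(line):
    """Return (edge_id, node_a, node_b) from one row of a DIP .tab file.

    Assumes fields look like 'DIP:4305E' (edge) and 'DIP:3048N' (node).
    """
    dip_ids = [f.split(":", 1)[1] for f in line.split() if f.startswith("DIP:")]
    edges = [d for d in dip_ids if d.endswith("E")]
    nodes = [d for d in dip_ids if d.endswith("N")]
    if len(edges) != 1 or len(nodes) != 2:
        raise ValueError(f"unexpected row: {line!r}")
    return edges[0], nodes[0], nodes[1]


row = ("DIP:4305E DIP:3048N PIR:B64526 GI:2313123 "
       "DIP:3047N SWP:O24853 PIR:A64520 GI:2313078")
edge = parse_dip_row(row)

# A row may describe a protein interacting with itself.
self_row = "DIP:4308E DIP:3051N PIR:B64520 GI:2313079 DIP:3051N PIR:B64520 GI:2313079"
self_edge = parse_dip_row(self_row)
```

Collecting the (node_a, node_b) pairs over all rows gives the edge list of G1 (or of G2 for the Scere file).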
8
The format of fasta20040704.seq
  • >DIP:1N|sw:P19527|pir:A21762|gi:112046
  • KARMSSLARAELEKRIDSLMDEIAFLKKVHEEEIAELQAQIQYAQISVE
  • >DIP:2N|sw:|pir:A23003|gi:83621
  • MKKQNLNSILLMYINYIINYFNNIHKNQLKKDWIMGYEYM
  • >DIP:3N|sw:P06778|pir:A23282|gi:83448
  • MAFLSYFATENQQMQTRRLPRTAEGSGGFGVLLMNEIMDMDEKKPV
  • >DIP:4N|sw:P04925|pir:A23544|gi:91067
  • MANLGYWLLALFVTMWTDVGLCKKRPKPGG

Each header line starts with the node number (e.g. 1N); the line after it is the protein sequence.
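
A small reader for this file might look like the following sketch. It assumes headers of the form >DIP:1N|sw:...|pir:...|gi:... (the ":" and "|" delimiters are an assumption about the original file); the function name read_fasta is made up for this example.

```python
def read_fasta(lines):
    """Map DIP node number -> sequence, assuming '>DIP:<node>|...' headers."""
    seqs = {}
    node = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            first = line[1:].split("|", 1)[0]   # e.g. 'DIP:1N'
            node = first.split(":", 1)[1]       # e.g. '1N'
            seqs[node] = ""
        elif node is not None:
            seqs[node] += line                  # join wrapped sequence lines
    return seqs


sample = [
    ">DIP:1N|sw:P19527|pir:A21762|gi:112046",
    "KARMSSLARAELEKRIDSLMDEIAFLKKVHEEEIAELQAQIQYAQISVE",
    ">DIP:2N|sw:|pir:A23003|gi:83621",
    "MKKQNLNSILLMYINYIINYFNNIHKNQLKKDWIMGYEYM",
]
sequences = read_fasta(sample)
```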
9
How to get protein relation-1
  • Build the protein sequence file hp.seq from
    Hpylo20040704.tab and fasta20040704.seq: take the
    node numbers in the .tab file and copy the matching
    entries from the FASTA file.
  • DIP:4305E DIP:3048N PIR:B64526 GI:2313123 DIP:3047N SWP:O24853 PIR:A64520 GI:2313078
  • DIP:4306E DIP:3049N SWP:O25122 PIR:C64564 GI:2313456 DIP:3047N SWP:O24853 PIR:A64520 GI:2313078
  • >DIP:3047N|sw:O24853|pir:A64520|gi:2313078
  • MATRTQARGAVVELLYAFESGNEEIKKIASSMLEEKKIKNNQLA
  • >DIP:3048N|sw:|pir:B64526|gi:2313123
  • MIQIYHADAFEIIKDFYQQNLKVDAIITDPPYNISVKNNFPT
  • >DIP:3049N|sw:O25122|pir:C64564|gi:2313456
  • MKTKAPMKNIRNFSIIAHIDHGKSTLADCLISECNAISNREMKSQVMDT

10
How to get protein relation-2
  • The format of the hp.seq file
  • >DIP:3047N|sw:O24853|pir:A64520|gi:2313078
  • MATRTQARGAVVELLYAFESGNEEIKKIASSMLEEKKIKNNQLAFAL
  • >DIP:3048N|sw:|pir:B64526|gi:2313123
  • MIQIYHADAFEIIKDFYQQNLKVDAIITDPPYNISVKNNFPTLKSAKRQGI
  • >DIP:3049N|sw:O25122|pir:C64564|gi:2313456
  • MKTKAPMKNIRNFSIIAHIDHGKSTLADCLISECNA
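
Producing hp.seq amounts to filtering the FASTA file by the node numbers that occur in the .tab file. The sketch below assumes the same field and header layouts as the examples on these slides (the ":" and "|" delimiters are an assumption); the function names nodes_in_tab and select_entries are made up for illustration.

```python
def nodes_in_tab(rows):
    """Collect every DIP node number (fields like 'DIP:3047N') in the rows."""
    nodes = set()
    for row in rows:
        for field in row.split():
            if field.startswith("DIP:") and field.endswith("N"):
                nodes.add(field[4:])
    return nodes


def select_entries(fasta_lines, wanted):
    """Yield the FASTA lines (header and sequence) of the wanted nodes."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            node = line[1:].split("|", 1)[0].split(":", 1)[-1].strip()
            keep = node in wanted
        if keep:
            yield line


rows = ["DIP:4305E DIP:3048N PIR:B64526 GI:2313123 "
        "DIP:3047N SWP:O24853 PIR:A64520 GI:2313078"]
fasta = [
    ">DIP:1N|sw:P19527|pir:A21762|gi:112046",
    "KARMSSLARAELEKRIDSLMDEIAFLKKVHEEEIAELQAQIQYAQISVE",
    ">DIP:3047N|sw:O24853|pir:A64520|gi:2313078",
    "MATRTQARGAVVELLYAFESGNEEIKKIASSMLEEKKIKNNQLA",
]
hp_lines = list(select_entries(fasta, nodes_in_tab(rows)))
```

Writing hp_lines to a file, one per line, gives hp.seq; running the same two functions on Scere20040704.tab gives database.seq.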

11
How to get protein relation-3
  • Build the protein sequence file database.seq from
    Scere20040704.tab and fasta20040704.seq in the
    same way as hp.seq.
  • Download BLAST and extract it to one directory.
  • Copy hp.seq and database.seq to the same directory
    as BLAST.

12
How to get protein relation-4
  • At the command prompt, go to the BLAST directory
  • Run formatdb -i database.seq -p T -o T to
    build the index files for database.seq
  • Run blastall -p blastp -d database.seq -i
    hp.seq -o relation.out to create the relation file
    relation.out
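
For repeated runs, the two commands can be wrapped in a small script. This is a sketch that uses only the flags shown on this slide; it assumes the formatdb and blastall executables can be found on the PATH, and the function names blast_commands and run_all are made up for this example.

```python
import shutil
import subprocess


def blast_commands(database="database.seq", query="hp.seq", out="relation.out"):
    """Return the two command lines from the slide as argument lists."""
    return [
        ["formatdb", "-i", database, "-p", "T", "-o", "T"],
        ["blastall", "-p", "blastp", "-d", database, "-i", query, "-o", out],
    ]


def run_all(commands):
    """Run each command in order; raise if a program is missing or fails."""
    for argv in commands:
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} not found on PATH")
        subprocess.run(argv, check=True)


commands = blast_commands()
```

Calling run_all(commands) from the BLAST directory would then index database.seq and write relation.out.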

13
How to get protein relation-5
  • The format of the relation.out file
  • Query= DIP:3549N|sw:P71408|pir:E64653|gi:2314219
    (632 letters)
  • Database: database.seq
    4772 sequences; 2,345,789 total letters
  • Sequences producing significant alignments, with
    score (bits) and E value:
      200_database.seq     404   e-113
      276_database.seq     397   e-111
      828_database.seq     184   4e-047
      2039_database.seq    182   9e-047
      1871_database.seq    177   5e-045
      4195_database.seq    176   1e-044
  • The query header gives the node number in graph G1
    (here 3549N). The node number in G2 is looked up from
    the matching sequence in the database.seq file.
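
Extracting the hit list from relation.out could be sketched as follows. The parser assumes the plain-text report layout shown on this slide: a "Query=" line, then a "Sequences producing significant alignments" header followed by rows of name, bit score, and E value, ending at a blank line. The function name parse_hits is made up, and E values printed as "e-113" are read as 1e-113.

```python
def parse_hits(lines):
    """Return {query_node: [(hit_name, bit_score, e_value), ...]}."""
    hits = {}
    query = None
    in_table = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("Query="):
            header = line[len("Query="):].strip()
            query = header.split("|", 1)[0].split(":", 1)[-1]
            hits[query] = []
            in_table = False
        elif line.startswith("Sequences producing significant alignments"):
            in_table = True
        elif in_table and query is not None:
            parts = line.split()
            if len(parts) < 3:
                # Blank line: either before the first row or after the last.
                if hits[query]:
                    in_table = False
                continue
            name, score, evalue = parts[0], parts[-2], parts[-1]
            if evalue.startswith("e"):
                evalue = "1" + evalue       # 'e-113' -> '1e-113'
            hits[query].append((name, float(score), float(evalue)))
    return hits


report = """Query= DIP:3549N|sw:P71408|pir:E64653|gi:2314219
         (632 letters)

Database: database.seq
           4772 sequences; 2,345,789 total letters

                                                 Score    E
Sequences producing significant alignments:      (bits) Value

200_database.seq                                   404   e-113
828_database.seq                                   184   4e-047

>200_database.seq
"""
parsed = parse_hits(report.splitlines())
```

The bit score or the E value of each hit is a natural candidate for the relation weight, though the slides do not say which one the project uses.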
14
How to get protein relation-6
  • The database.seq file
  • >DIP:801N|sw:P29539|pir:S46157|gi:626910
  • MSKDFSDKKKHTIDRIDQHILRRSQHDNYSNGSSPWMKTNLPPPSPQAH
  • >DIP:802N|sw:P39925|pir:S46611|gi:626985
  • MMMWQRYARGAPRSLTSLSFGKASRISTVKPVLRSRMPVHQRLQTLS
  • >DIP:2883N|sw:P33299|pir:S34354|gi:422126
  • MPPKEDWEKYKAPLEDDDKKPDDDKIVPLTEGDIQVLKSYGAAPYAAK
  • >DIP:2884N|sw:Q03656|pir:S55098|gi:1078546
  • MGSSINYPGFVTKSAHLADTSTDASISCEEATSSQEAKKNFFQRDYNMMKK
  • Hit 200_database.seq is the 200th sequence in
    database.seq, i.e. row 400; hit 2039_database.seq is
    the 2039th sequence, i.e. row 4078.
  • [Figure: G1 node 3549N linked to G2 nodes 802N and 2883N.]
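
The row arithmetic can be written down as a short lookup. This sketch assumes that the number in a hit name such as 200_database.seq is the 1-based position of the sequence in database.seq, and that every entry is exactly one header line followed by one sequence line, so entry n has its header on row 2n-1 and its sequence on row 2n. The function name node_for_hit and the four-line sample are made up for illustration.

```python
def node_for_hit(hit_name, database_lines):
    """Return the DIP node number of a hit such as '200_database.seq'."""
    n = int(hit_name.split("_", 1)[0])
    header = database_lines[2 * n - 2]      # row 2n-1, as a 0-based index
    if not header.startswith(">"):
        raise ValueError("row %d is not a header line" % (2 * n - 1))
    return header[1:].split("|", 1)[0].split(":", 1)[-1].strip()


# Hypothetical two-entry database.seq.
db = [
    ">DIP:801N|sw:P29539|pir:S46157|gi:626910",
    "MSKDFSDKKKHTIDRIDQHILRRSQHDNYSNGSSPWMKTNLPPPSPQAH",
    ">DIP:802N|sw:P39925|pir:S46611|gi:626985",
    "MMMWQRYARGAPRSLTSLSFGKASRISTVKPVLRSRMPVHQRLQTLS",
]
second = node_for_hit("2_database.seq", db)
```

If sequences in database.seq were wrapped over several lines, the two-rows-per-entry rule would no longer hold and the file would have to be scanned by counting header lines instead.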
15
Summary
  • Get the data: Hpylo20040704.tab, Scere20040704.tab,
    fasta20040704.seq.
  • Build the sequence files hp.seq and database.seq.
  • Create the protein relation file relation.out.
  • Build the combination graph.
  • Find the maximum-weight path of length k in the
    combination graph.