Nothing in computational biology makes

Provided by: Koo
1
Using (and abusing) sequence analysis to make
biological discoveries
Nothing in (computational) biology makes sense
except in the light of evolution
after Theodosius Dobzhansky (1973)
2
Significant sequence similarity is evidence of homology.
Only a small fraction of amino acid residues is directly involved in protein function (including enzymatic); the rest of the protein serves largely as a structural scaffold.
Conserved sequence motifs are determinants of conserved ancestral functions.
3
The evolving roles of computational analysis in
biology
5
Sequence complexity: a measure of the randomness of a sequence
Random sequence: highest complexity (entropy); globular protein domains
Homopolymer: lowest complexity (entropy); non-globular structures
Algorithmic complexity
QQQQQQQQQQQQQ = (Q)n
KRKRKRKRKRKR = (KR)n
ASDFGHKLCVNM: random sequence; no algorithm to derive it from a simpler one
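The link between complexity and entropy can be illustrated with a short calculation. The sketch below is not part of the slides: it assumes that compositional Shannon entropy (bits per residue) is an acceptable stand-in for sequence complexity, and the function name `shannon_entropy` is made up for this illustration. Note that it looks only at residue composition, so it ignores residue order and does not capture algorithmic complexity in the strict sense.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy of the residue composition, in bits per residue."""
    n = len(seq)
    # p * log2(1/p) summed over the residue types present in the sequence
    return sum((c / n) * math.log2(n / c) for c in Counter(seq).values())


# The three example sequences from the slide
for s in ("QQQQQQQQQQQQQ", "KRKRKRKRKRKR", "ASDFGHKLCVNM"):
    print(s, round(shannon_entropy(s), 3))
```

The homopolymer scores 0 bits, the dipeptide repeat 1 bit, and the 12-residue sequence with no repeated residue reaches the maximum possible for its length, log2(12), about 3.585 bits.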
6
seg BRCA1 45 3.4 3.7 > BRCA1.seg
>gi|728984|sp|P38398|BRC1_HUMAN Breast cancer type 1 susceptibility protein
1-388
MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ
CPLCKNDITKRSLQESTRFSQLVEELLKIICAFQLDTGLEYANSYNFAKKENNSPEHLKD
EVSIIQSMGYRNRAKRLLQSEPENPSLQETSLSVQLSNLGTVRTLRTKQRIQPQKTSVYI
ELGSDSSEDTVNKATYCSVGDQELLQITPQGTRDEISLDSAKKAACEFSETDVTNTEHHQ
PSNNDLNTTEKRAAERHPEKYQGSSVSNLHVEPCGTNTHASSLQHENSSLLLTKDRMNVE
KAEFCNKSKQPGLARSQHNRWAGSKETCNDRRTPSTEKKVDLNADPLCERKEWNKQKLPC
SENPRDTEDVPWITLNSSIQKVNEWFSR
389-458
sdellgsddshdgesesnakvadvldvlnevdeysgssekidllasdphealickservh
sksvesnied
459-526
KIFGKTYRKKASLPNLSHVTENLIIGAFVTEPQIIQERPLTNKLKRKRRPTSGLHPEDFI
KKADLAVQ
527-635
ktpeminqgtnqteqngqvmnitnsghenktkgdsiqneknpnpieslekesafktkaep
isssisnmelelnihnskapkknrlrrksstrhihalelvvsrnlsppn
636-995
CTELQIDSCSSSEEIKKKKYNQMPVRHSRNLQLMEGKEPATGAKKSNKPNEQTSKRHDSD
TFPELKLTNAPGSFTKCSNTSELKEFVNPSLPREEKEEKLETVKVSNNAEDPKDLMLSGE
RVLQTERSVESSSISLVPGTDYGTQESISLLEVSTLGKAKTEPNKCVSQCAAFENPKGLI
HGCSKDNRNDTEGFKYPLGHEVNHSRETSIEMEESELDAQYLQNTFKVSKRQSFAPFSNP
GNAEEECATFSAHSGSLKKQSPKVTFECEQKEENQGKNESNIKPVQTVNITAGFPVVGQK
DKPVDNAKCSIKGGSRFCLSSQFRGNETGLITPNKHGLLQNPYRIPPLFPIKSFVKTKCK
996-1089
knlleenfeehsmsperemgnenipstvstisrnnirenvfkeasssninevgsstnevg
ssineigssdeniqaelgrnrgpklnamlrlgvl
1090-1238
QPEVYKQSLPGSNCKHPEIKKQEYEEVVQTVNTDFSPYLISDNLEQPMGSSHASQVCSET
PDDLLDDGEIKEDTSFAENDIKESSAVFSKSVQKGELSRSPSPFTHTHLAQGYRRGAKKL
ESSEENLSSEDEELPCFQHLLFGKVNNIP
1239-1312
sqstrhstvateclsknteenllslknslndcsnqvilakasqehhlseetkcsaslfss
qcseledltantnt
1313-1316
QDPF
Lowercase: non-globular regions
Uppercase: globular domains
7
1422-1513
GSQPSNSYPSIISDSSALEDLRNPEQSTSEKAVLTSQKSSEYPISQNPEGLSADKFEVSA
DSSTSKNKEPGVERSSPSKCPSLDDRWYMHSC
1514-1616
sgslqnrnypsqeelikvvdveeqqleesgphdltetsylprqdlegtpylesgislfsd
dpesdpsedrapesarvgnipsstsalkvpqlkvaesaqspaa
1617-1863
AHTTDTAGYNAMEESVSREKPELTASTERVNKRMSMVVSGLTPEEFMLVYKFARKHHITL
TNLITEETTHVVMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKERKMLNEHDFEV
RGDVVNGRNHQGPKRARESQDRKIFRGLEICCYGPFTNMPTDQLEWMVQLCGASVVKELS
SFTLGTGVHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSVALYQCQELDTYLIP
QIPHSHY
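How a low-complexity filter arrives at a lowercase/uppercase partition like the one above can be sketched with a sliding window. This is a deliberately simplified illustration and not the SEG algorithm itself: SEG uses a trigger window followed by an extension and refinement stage (the three numbers on the command line are, in SEG's usual usage, the window length and the trigger and extension complexity cutoffs). The function names, the window of 12 and the cutoff of 2.2 bits are assumptions chosen for a short demo.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Compositional Shannon entropy of one window, in bits per residue."""
    n = len(window)
    return sum((c / n) * math.log2(n / c) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12, cutoff: float = 2.2) -> str:
    """Lowercase every residue covered by at least one window whose
    entropy falls below the cutoff; leave all other residues uppercase."""
    seq = seq.upper()
    low = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) < cutoff:
            for j in range(i, i + window):
                low[j] = True
    return "".join(r.lower() if m else r for r, m in zip(seq, low))


# Hypothetical test sequence: two BRCA1-derived stretches joined by a poly-Q run
demo = "MDLSALRVEEVQNVINAMQK" + "Q" * 15 + "IKEPVSTKCDHIFCKFCMLK"
print(mask_low_complexity(demo))
```

Because any window that dips below the cutoff masks all of its positions, a few residues flanking the poly-Q run may also be lowercased; a real filter trims such edges.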
16
Paradigm shift in database searching
Traditional (PSI-BLAST): query sequence -> sequence database -> set of homologs -> PSSM
New (RPS-BLAST): query sequence -> PSSM database -> domain architecture
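The two search directions can be made concrete with a toy position-specific scoring matrix (PSSM). The sketch below is only an illustration under strong assumptions: ungapped alignments, a uniform background, simple pseudocounts and queries restricted to the 20 standard residues. It does not reproduce how PSI-BLAST or RPS-BLAST build or score profiles, and all names (`build_pssm`, `best_hit`, `domain_architecture`, `toyA`, `toyB`) and the toy alignments are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Traditional direction: turn a set of aligned homologs (equal length,
    no gaps) into column-wise log-odds scores against a uniform background."""
    background = 1.0 / len(ALPHABET)
    pssm = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def best_hit(query, pssm):
    """Best ungapped placement of the PSSM on the query: (score, offset)."""
    width = len(pssm)
    return max(
        (sum(pssm[j][query[i + j]] for j in range(width)), i)
        for i in range(len(query) - width + 1)
    )


def domain_architecture(query, pssm_db, threshold=0.0):
    """New direction: scan one query against a library of named PSSMs and
    list the domains found, ordered by position in the query."""
    hits = []
    for name, pssm in pssm_db.items():
        if len(query) < len(pssm):
            continue  # query too short to contain this domain
        score, offset = best_hit(query, pssm)
        if score > threshold:
            hits.append((offset, name))
    return [name for _, name in sorted(hits)]


# Toy library built from made-up alignments
pssm_db = {
    "toyA": build_pssm(["CPICLE", "CPLCKN", "CPVCLE"]),
    "toyB": build_pssm(["GKWVVS", "GRWVLS", "GKWIVS"]),
}
print(domain_architecture("MDLSCPICLELIKAGGKWVVSYF", pssm_db))
```

The design point is the inversion: in the traditional mode the profile is the output of the search, while in the new mode a precomputed profile library is the database and the output is the query's domain architecture.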
25
DOMAIN ARCHITECTURE OF SELECTED BRCT PROTEINS
[Figure: domain diagrams]
Proteins shown: BRCA1; BARD1; BRCA1/BARD homolog (plant); REV1 (yeast); DPB11 (yeast); PARP (vertebrates); DNA ligase III (human); TdT (eukaryotes); RFC1 (eukaryotes); DNA ligase (bacteria)
Domain labels: BRCT; RING; PHD-l; CMP-trans; AZF; PARP; ATP-dep ligase; HhH; polX; ATP and PCNA-binding; NAD-dep ligase
41
Use of profile libraries to examine domain representation in individual proteomes
Proteomes: yeast (6,200 proteins), worm (20,000 proteins)
Profile library
Detect domains using PSI-BLAST, IMPALA
Compare domain distributions
Chervitz SA, Aravind L, Sherlock G, Ball CA, Koonin EV, Dwight SS, Harris MA, Dolinski K, Mohr S, Smith T, Weng S, Cherry JM, Botstein D. 1998. Comparison of the complete protein sets of worm and yeast: orthology and divergence. Science 282:2022-8.
42
Normalized domain counts in worm and yeast
1. Hormone receptor
2. POZ
3. EGF
4. MATH
5. PTPase
6. Cation channels
7. PDZ
8. SH2
9. FNIII
10. Homeodomain
11. LRR
12. EF hands
13. Ankyrin
14. RING finger
15. C2H2 finger
16. Small GTPase
17. RRM
18. AAA
19. C6 finger
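The slide does not say how the counts were normalized. One plausible reading, shown here purely as an assumption, is to divide each domain count by the size of the proteome so that a 20,000-protein worm set and a 6,200-protein yeast set become comparable. The domain names and counts in the sketch are invented placeholders, not data from the presentation; only the two proteome sizes come from the slides.

```python
def normalized_counts(domain_counts, proteome_size, per=1000):
    """Domain occurrences per `per` proteins in the proteome."""
    return {d: per * n / proteome_size for d, n in domain_counts.items()}


# Proteome sizes as given on the slides; counts are made-up placeholders.
yeast = normalized_counts({"domain_A": 31, "domain_B": 0}, 6200)
worm = normalized_counts({"domain_A": 100, "domain_B": 140}, 20000)

for domain in yeast:
    print(domain, yeast[domain], worm[domain])
```

With these placeholder numbers, `domain_A` is equally represented in both proteomes once size is accounted for (5 per 1,000 proteins), even though the raw worm count is about three times larger, while `domain_B` is present only in the worm set.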
43
  • Searching a domain library is often easier and more informative
    than searching the entire sequence database. However, the latter
    yields complementary information and should not be skipped
    if details are of interest.
  • Varying the search parameters, e.g. switching composition-based
    statistics on and off, can make a difference.
  • Using subsequences, preferably chosen according to objective
    criteria, e.g. separation from the rest of the protein by a
    low-complexity linker, may improve search performance.
  • Trying different queries is a must when analyzing
    protein (super)families.
  • Even hits below the threshold of statistical significance often
    are worth analyzing, albeit with extreme care.
  • Transferring functional information between homologs on the basis
    of a database description alone is dangerous. Conservation of
    domain architectures, active sites and other features needs to be
    analyzed (hence automated identification of protein families is
    difficult and automated prediction of functions is extremely
    error-prone).
  • Always do a reality check!
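The advice about choosing subsequences by an objective criterion can be sketched as follows. The example assumes the query has already been run through a low-complexity filter that writes low-complexity linkers in lowercase and the remaining regions in uppercase; the function name `globular_subqueries` and the `min_length` parameter are hypothetical, and the cutoff of 50 residues is an arbitrary choice for illustration.

```python
import re


def globular_subqueries(masked: str, min_length: int = 50):
    """Return (start, end, subsequence) for every uppercase run of at least
    min_length residues; coordinates are 1-based and inclusive."""
    return [
        (m.start() + 1, m.end(), m.group())
        for m in re.finditer(r"[A-Z]+", masked)
        if len(m.group()) >= min_length
    ]


# Tiny made-up example; a short min_length keeps the demo readable
for start, end, sub in globular_subqueries("AAAAqqqCCCCCCkkDD", min_length=4):
    print(f"{start}-{end} {sub}")
```

Each returned segment can then be submitted as a separate query, which keeps the low-complexity linkers from contributing spurious hits.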