1
Identification of specificity-determining
positions in protein alignments
  • Mikhail Gelfand
  • Research and Training Center Bioinformatics
  • Institute for Information Transmission Problems,
    RAS
  • ECCB2005, Madrid

2
Motivation
  • Large protein families with general function
    assigned by homology, not much functional
    information
  • Much less structural data. Not many structures
    with substrates, cofactors etc.
  • Some specificity assignments from comparative
    genomics
  • => Search for specificity-determining positions in
    alignments
  • identification of functional sites
  • prediction of specificity
  • understanding and eventually re-design of function

3
Specificity (of transporters) from comparative
genomics: three examples. 1. New specificities
in a little-studied family
Legend: S-box (rectangle frame), MetJ (circle
frame), LYS-element (circles), Tyr-T-box
(rectangles)
malate/lactate
4
2. Misleading homology: the PnuC family of
transporters
The THI elements
The RFN elements
5
3. A nightmare. The NiCoT family of nickel-cobalt
transporters
6
SDP (Specificity-Determining Position)
  • Alignment position that is conserved within
    groups of proteins having the same specificity
    (specificity groups) but differs between them

An SDP is not equivalent to a functionally important
position
7
Measure of specificity: mutual information
  • count of amino acid a in group i at position p
    divided by the total number of sequences
  • frequency of amino acid a in position p
  • fraction of proteins in group i
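The three quantities listed above are the ingredients of the standard mutual information between the residue at a position and the group label. A minimal sketch, assuming the usual definition I = sum over a, i of f(a,i) * log(f(a,i) / (f(a) * f(i))); the function name and the list-based input format are illustrative assumptions, and gaps are treated as just another symbol:

```python
import math
from collections import Counter

def mutual_information(column, groups):
    """Mutual information between residue identity and group label.

    column: residues at one alignment position, one per sequence.
    groups: specificity-group label of each sequence, same order.
    """
    n = len(column)
    joint = Counter(zip(column, groups))  # counts of (amino acid a, group i)
    residue = Counter(column)             # counts of amino acid a
    group = Counter(groups)               # number of proteins in group i
    mi = 0.0
    for (a, g), count in joint.items():
        f_ag = count / n                  # joint frequency f(a, i)
        mi += f_ag * math.log(f_ag / ((residue[a] / n) * (group[g] / n)))
    return mi
```

A column that separates two equal-sized groups perfectly gives log 2; a column whose residues are distributed identically in both groups gives 0.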

8
Taking into account the structure of the
phylogenetic tree: random shuffling and linear
regression
linear regression
→ min
  • Z-score

=> positions that are more specific than expected
given the tree
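The shuffling step can be illustrated as follows. This is only a sketch under stated assumptions: it shuffles the residues of one column across sequences and reports how far the observed mutual information lies from the shuffled mean, in units of standard deviation. The linear regression step named on the slide is not reproduced here, and the function names, the number of shuffles, and the fixed seed are all illustrative choices.

```python
import math
import random
from collections import Counter

def mutual_information(column, groups):
    """Mutual information between residue identity and group label."""
    n = len(column)
    joint, residue, group = Counter(zip(column, groups)), Counter(column), Counter(groups)
    return sum(
        (c / n) * math.log((c / n) / ((residue[a] / n) * (group[g] / n)))
        for (a, g), c in joint.items()
    )

def shuffled_z_score(column, groups, n_shuffles=1000, seed=0):
    """Z-score of the observed mutual information against column shuffles."""
    rng = random.Random(seed)
    observed = mutual_information(column, groups)
    shuffled = list(column)
    samples = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the residue/group association
        samples.append(mutual_information(shuffled, groups))
    mean = sum(samples) / n_shuffles
    variance = sum((s - mean) ** 2 for s in samples) / n_shuffles
    if variance == 0:          # e.g. a fully conserved column
        return 0.0
    return (observed - mean) / math.sqrt(variance)
```

A fully conserved column gets a Z-score of 0 in this sketch, which matches the point that a functionally important position is not necessarily an SDP.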
9
Smoothing: pseudocounts and similarity between
amino acid residues
  • m(a→b): amino acid substitution matrix
  • n(a,i): count of amino acid a at position i
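One common way to combine counts with a substitution matrix is sketched below. The exact smoothing formula is not recoverable from the transcript, so the form used here is an assumption: each observed residue b contributes a pseudocount to residue a in proportion to m(b→a), with a hypothetical weight `kappa`, and the rows of the matrix are assumed to be probabilities that sum to 1. The three-letter matrix is toy data for illustration only.

```python
def smoothed_frequencies(counts, subst, kappa=1.0):
    """Smoothed residue frequencies at one position.

    counts: dict residue -> observed count n(a, i) at this position.
    subst:  dict of dicts, subst[b][a] = m(b -> a), each row summing to 1.
    kappa:  total pseudocount weight (hypothetical parameter).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("position has no observed residues")
    freqs = {}
    for a in subst:
        # Pseudocount for a: substitutions into a from the observed residues.
        pseudo = sum(counts.get(b, 0) / total * subst[b][a] for b in subst)
        freqs[a] = (counts.get(a, 0) + kappa * pseudo) / (total + kappa)
    return freqs

# Toy substitution probabilities over a three-letter alphabet (not real data).
TOY_MATRIX = {
    "L": {"L": 0.80, "I": 0.15, "D": 0.05},
    "I": {"L": 0.15, "I": 0.80, "D": 0.05},
    "D": {"L": 0.05, "I": 0.05, "D": 0.90},
}
```

With four observed leucines, the similar residue I receives a larger share of the pseudocount than the dissimilar residue D, and the frequencies still sum to 1.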

10
Automated threshold setting: the Bernoulli
estimator
  • Are 5 SDPs with Z-score > 12 better than 10 SDPs
    with Z-score > 9?
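The formula on this slide did not survive in the transcript. One reading of a Bernoulli-type estimator, given here as an assumption rather than as the authors' exact procedure: rank the positions by Z-score, and for each k ask how likely it is that at least k of n positions would reach the k-th highest score by chance if Z-scores were standard normal. The cutoff is the k that minimizes this probability. All function names are illustrative.

```python
import math

def normal_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly over the upper tail."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def bernoulli_cutoff(z_scores):
    """Return (k, probability): the number of top positions to keep."""
    ranked = sorted(z_scores, reverse=True)
    n = len(ranked)
    best_k, best_prob = 0, 1.0
    for k, z in enumerate(ranked, start=1):
        prob = binomial_tail(n, k, normal_tail(z))
        if prob < best_prob:
            best_k, best_prob = k, prob
    return best_k, best_prob
```

Under this reading, the question on the slide is answered by comparing the two chance probabilities: whichever set is less likely to arise from random Z-scores is preferred. The direct binomial sum is meant for alignments of up to a few hundred columns.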
11
Other similar techniques
  • Evolutionary trace (Lichtarge et al. 1996, 1997):
    needs structure; gradual construction of
    group-specific consensus
  • Evolutionary rate shifts (DIVERGE, Gu et al.
    2002): positions with group-specific
    evolutionary rate
  • Surface patches of slowly evolving residues
    (Rate4Site, Pupko et al. 2002): needs structure
  • PCA in the sequence space (Casari et al., 1995)
  • Correlated mutations (Pazos and Valencia, 2002)
  • Prediction of functional sub-types (Hannenhalli
    and Russell, 2000): relative entropy of HMM
    profiles for groups

12
SDPpred: Web interface
Input: multiple alignment of proteins divided
into specificity groups
[Example input alignment (truncated in the transcript).
Group AQP: sp|Q9L772|AQPZ_BRUME, sp|P48838|AQPZ_ECOLI, tr|Q92ZW9.
Group GLP: sp|P11244|GLPF_ECOLI, sp|P44826|GLPF_HAEIN.]
13
SDPpred: Output
Alignment of the family with the SDPs
highlighted (Alignment view)
Detailed description of each SDP (List of SDPs)
Plot of probabilities used by the Bernoulli
estimator to set the cutoff (Probability plot
view)
14
Transcription factors from the LacI family
  • Training set: 459 sequences,
  • average length 338 amino acids,
  • 85 specificity groups

44 SDPs
10 residues contact NPF (analog of the effector)
7 residues in the effector contact zone
(5 Å < dmin < 10 Å)
6 residues in the intersubunit contacts
5 residues in the intersubunit contact zone
(5 Å < dmin < 10 Å)
7 residues contact the operator sequence
6 residues in the operator contact zone
(5 Å < dmin < 10 Å)
LacI from E. coli
15
SDP clusters at the subunit contact region
Cluster I
Effector
Cluster II
DNA operator
LacI (lactose repressor) from E. coli (1jwl)
16
Overall statistics (LacI of E. coli)
Non-contacting residues (distance to the DNA,
effector, or the other subunit > 10 Å)
  • Total: 348 amino acids
  • 44 SDPs

Contact zone (may be functional)
Contacting residues (distance to the DNA,
effector, or the other subunit < 5 Å)
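The three classes used in these statistics follow from the minimum distance dmin between a residue and its partner (DNA, effector, or the other subunit). A minimal sketch, assuming atoms are given as (x, y, z) tuples in Å; the function names and the treatment of the exact boundary values 5 Å and 10 Å are assumptions:

```python
import math

def min_distance(atoms_a, atoms_b):
    """Smallest distance between any atom of one set and any atom of the other."""
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)

def classify_residue(d_min, contact=5.0, zone=10.0):
    """Assign a residue to a class by its minimum distance (in Å) to a partner."""
    if d_min < contact:
        return "contacting"
    if d_min < zone:
        return "contact zone"
    return "non-contacting"
```

For example, a residue whose closest atom lies 7 Å from the effector falls in the contact zone.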
17
Membrane channels of the MIP family
  • Training set: 17 sequences,
  • average length 280 amino acids,
  • 2 specificity groups
  • Aquaporins and glyceroaquaporins

21 SDPs
8 residues contact glycerol (substrate) (dmin < 5 Å)
8 residues oriented to the channel
5 residues in the contacts with other subunits
GlpF from E. coli
18
Two SDP clusters at the contact of subunits
forming the tetramer
Cluster II
Cluster I
20Leu, 24Ile, 108Tyr of one subunit, 193Ser of
another subunit
Glu43
Substrate (glycerol)
Subunit I
GlpF (glycerol facilitator) from E. coli (1fx8)
19
Overall statistics (GlpF from E. coli)
Non-contacting residues (distance to the
substrate, or another subunit > 10 Å)
  • Total: 281 amino acids
  • 21 SDPs

Contact zone (may be functional)
Contacting residues (distance to the substrate,
or another subunit < 5 Å)
20
Isocitrate/isopropylmalate dehydrogenases:
combinations of specificities towards substrate
and cofactor
  • IDH: catalyzes the oxidation of isocitrate to
    α-ketoglutarate and CO2 (TCA) using either NAD or
    NADP as a cofactor in organisms from prokaryotes
    to higher eukaryotes
  • IMDH: catalyzes oxidative decarboxylation of
    3-isopropylmalate into 2-oxo-4-methylvalerate
    (leucine biosynthesis) in prokaryotes and fungi;
    the cofactor is NAD

Eukaryota
Archaea Bacteria Eukaryota
Mitochondria
Archaea Bacteria
21
Selecting specificity groups
1. By substrate: all IDHs vs. all IMDHs
2. By cofactor: all NAD-dependent vs. all
NADP-dependent
3. Four groups: IDH (NADP) type II, IDH (NAD),
IMDH (NAD), IDH (NADP) type I
22
Predicted SDPs
most SDPs near the substrate
SDPs near the substrate and the cofactor
SDPs near the substrate, the cofactor and the
other subunit
23
SDPs, the cofactor and the substrate
Substrate (isocitrate)
100Lys, 104Thr, 105Thr, 107Val, 337Ala,
341Thr: substrate-specific and four-group SDPs,
functionally not characterized
Cofactor (NADP)
Nicotinamide nucleotide
Adenine nucleotide
344Lys, 345Tyr, 351Val: cofactor-specific
SDPs, known determinants of specificity to
cofactor
NADP-dependent IDH from E. coli (1ai2)
24
SDPs predicted for different groupings
substrate-specific SDPs
cofactor-specific SDPs
208Arg
337Ala
100Lys
300Ala
105Thr
341Thr
229His
154Glu
103Leu
233Ile
97Val
158Asp
115Asn
305Asn
308Tyr
98Ala
155Asn
231Gly
327Asn
287Gln
344Lys
164Glu
345Tyr
351Val
241Phe
38Gly
40Asp
104Thr
Color code: contacts cofactor; contacts substrate
AND cofactor; contacts substrate; contacts
substrate AND the other subunit; contacts the
other subunit
107Val
152Phe
161Ala
232Asn
245Gly
323Ala
31Tyr
36Gly
162Gly
Four groups
45Met
25
Overview
  • Transcription factors: contacts with the cofactor
    and the DNA
  • Transporters: contacts with the substrate
  • Enzymes: contacts with the substrate and the
    cofactor
  • And all: contacts between subunits

26
Protein-DNA interactions
Entropy at aligned sites (blue plots) and the
number of contacts (red: heavy atoms in a base
pair at a distance < cutoff from a protein atom)
CRP
PurR
IHF
TrpR
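The entropy plotted for each aligned site position can be computed from the nucleotides observed at that position across the aligned binding sites. A minimal sketch, assuming plain Shannon entropy in bits with no pseudocounts or background correction (the slides do not say which variant was used); the function name is illustrative:

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the symbols observed at one aligned position."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
```

A fully conserved position has entropy 0 and a position with all four nucleotides equally frequent has entropy 2 bits, so low entropy marks the conserved positions that the slide compares with the number of protein contacts.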
27
The observed correlation does not depend on the
distance cutoff
28
CRP/FNR family of regulators
29
Correlation between contacting nucleotides and
amino acid residues
  • CooA in Desulfovibrio spp.
  • CRP in Gamma-proteobacteria
  • HcpR in Desulfovibrio spp.
  • FNR in Gamma-proteobacteria

Contacting residues: REnnnR. TG: 1st arginine; GA:
glutamate and 2nd arginine
DD COOA ALTTEQLSLHMGATRQTVSTLLNNLVR
DV COOA ELTMEQLAGLVGTTRQTASTLLNDMIR
EC CRP  KITRQEIGQIVGCSRETVGRILKMLED
YP CRP  KXTRQEIGQIVGCSRETVGRILKMLED
VC CRP  KITRQEIGQIVGCSRETVGRILKMLEE
DD HCPR DVSKSLLAGVLGTARETLSRALAKLVE
DV HCPR DVTKGLLAGLLGTARETLSRCLSRMVE
EC FNR  TMTRGDIGNYLGLTVETISRLLGRFQK
YP FNR  TMTRGDIGNYLGLTVETISRLLGRFQK
VC FNR  TMTRGDIGNYLGLTVETISRLLGRFQK
TGTCGGCnnGCCGACA
TTGTGAnnnnnnTCACAA
TTGTgAnnnnnnTcACAA
TTGATnnnnATCAA
30
The correlation holds for other factors in the
family
31
Plans and perspectives. Protein-DNA interactions
LacI family of transcriptional regulators (each
branch represents a subfamily)
32
and their signals
1605 regulators from 189 genomes, forming 302
groups of orthologs and binding 2518 sites
33
Plans and perspectives. Experimental verification
  • A new family of Ni/Co transporters
  • No structural data
  • Specificity predicted by comparative genomics
  • Predicted SDPs form several clusters in the
    alignment, are located on the same sides of
    alpha-helices
  • Mutational analysis

34
Terminators of translation in prokaryotes /
decoding of stop-codons. Specificity of RF1
(UAG, UAA) and RF2 (UGA, UAA)
Fragment of the alignment (117 pairs). SDPs are
shown by black boxes above the alignment.
35
Interesting positions: invariant, SDPs,
variable rate.
36
SDPs and invariant positions: two decoding sites?
37
Plans and perspectives
  • Use of 3D structures, when available.
    Identification of functional sites as spatial
    clusters of SDPs and conserved positions
  • Automated identification of specificity groups
    based on the analysis of the phylogenetic tree
  • Protein-DNA interactions
  • Identification of protein-protein contact surfaces

38
Publications
  • N.J.Oparina, O.V.Kalinina, M.S.Gelfand,
    L.L.Kisselev (2005) Common and specific amino
    acid residues in the prokaryotic polypeptide
    release factors RF1 and RF2: possible functional
    implications. Nucleic Acids Research 33 (in
    press).
  • O.V.Kalinina, A.A.Mironov, M.S.Gelfand,
    A.B.Rakhmaninova (2004) Automated selection of
    positions determining functional specificity of
    proteins by comparative analysis of orthologous
    groups in protein families. Protein Science 13:
    443-456.
  • O.V.Kalinina, P.S.Novichkov, A.A.Mironov,
    M.S.Gelfand, A.B.Rakhmaninova (2004) SDPpred: a
    tool for prediction of amino acid residues that
    determine differences in functional specificity
    of homologous proteins. Nucleic Acids Research
    32: W424-W428.
  • O.V.Kalinina, M.S.Gelfand, A.A.Mironov,
    A.B.Rakhmaninova (2003) Amino acid residues
    forming specific contacts between subunits in
    tetramers of the membrane channel GlpF.
    Biophysics (Moscow) 48: S141-S145.
  • L.A.Mirny, M.S.Gelfand (2002) Using orthologous
    and paralogous proteins to identify specificity
    determining residues in bacterial transcription
    factors. Journal of Molecular Biology 321: 7-20.
  • L.Mirny, M.S.Gelfand (2002) Structural analysis
    of conserved base-pairs in protein-DNA complexes.
    Nucleic Acids Research 30: 1704-1711.
  • http://math.belozersky.msu.ru/psn/

39
Acknowledgements
  • Leonid Mirny (Harvard, MIT)
  • Olga Kalinina
  • Andrei A. Mironov
  • Alexandra B. Rakhmaninova
  • Dmitry Rodionov
  • Olga Laikova
  • Howard Hughes Medical Institute
  • Ludwig Institute of Cancer Research
  • Russian Fund of Basic Research
  • Russian Academy of Sciences, programs "Molecular
    and Cellular Biology" and "Origin and Evolution
    of the Biosphere"
