Title: Evolution of paralogous proteins
- Level 3 Molecular Evolution and Bioinformatics
- Jim Provan
- Patthy, Sections 7.1-7.3
Advantageous duplications
- Duplication of a complete protein-coding gene results in two identical duplicons
  - Both encode the same protein and express it in the same way as the original
  - Duplication will be advantageous if an increased supply of gene product is advantageous (e.g. histones or ribosomal proteins)
- Duplication by retroposition results in a processed duplicon
  - Same protein encoded, but expression is likely to be different
  - Duplication will be advantageous if the change in expression pattern (tissue specificity, developmental stage) has a biological advantage
- Fully redundant functions (neutral duplications) may be lost through drift, but may persist long enough to acquire an advantageous function
- New proteins generally arise through modification of existing ones
Advantageous duplication of unprocessed genes
- Duplications are advantageous, and maintained by positive selection, if having multiple genes performing the same function provides enhanced efficiency (e.g. histones)
- Positive selection for advantageous gene duplications can be observed in:
  - Insects exposed to insecticides
  - Tumours and protozoa exposed to drugs
  - Stomach lysozymes of ruminants
    - Seven closely related mRNAs encode lysozymes
    - Multiple lysozyme genes arose through duplication
    - Serial duplication of the lysozyme gene was an adaptive response that increased the expression of lysozyme
Advantageous duplication of processed genes
- Because of changes in chromosomal environment and/or 5′ regulatory regions, a processed gene may have altered regulatory features and may not compete with its progenitor
- The lower fidelity of reverse transcription means that the duplicon may carry deleterious mutations and be lost
- Two examples of positive selection increasing the chance of survival of testis-specific processed genes:
  - Phosphoglycerate kinase retrogene
  - Pyruvate dehydrogenase E1α retrogene
The phosphoglycerate kinase retrogene
- Two functional PGK loci in the mammalian genome
  - PGK-1 is X-linked and expressed constitutively in all somatic cells
  - PGK-2 is a functional autosomal gene expressed in a tissue-specific manner, exclusively in the late stages of spermatogenesis
  - PGK-2 lacks introns and has a poly(A) tail: it is a processed gene
- Evolution of PGK-2 was a compensatory response to inactivation of the X-linked gene before meiosis
  - Mature spermatozoa require PGK to metabolise fructose in semen
  - X-inactivation called for a functional autosomal PGK locus
  - Unequal crossing-over would not have solved this problem, since it produces a tandem copy that stays on the X chromosome and is inactivated along with it
- The processed gene had an initial advantage because it permitted expression of PGK in a tissue where the X-linked gene was inactivated
  - It subsequently evolved a testis-specific promoter
The pyruvate dehydrogenase E1α retrogene
- The gene for the E1α subunit of the PDH complex is located on the X chromosome and is expressed in somatic tissues
- Another PDH E1α locus is found on chromosome 4
  - Testis-specific and expressed in postmeiotic spermatogenic cells
  - Lacks introns, has a downstream poly(A) tract and is flanked by a pair of 10 bp direct repeats (hallmarks of retroposition)
- After the last meiotic division, spermatids rely on energy from pyruvate for the maturation process, so PDH E1α is essential
  - The X chromosome is inactivated in postmeiotic spermatogenic cells
  - Only half the cells contain an X chromosome
  - Hence the evolution of an alternative, non-X-linked gene
Neutral duplications
- If there is no selective advantage from a duplication event, functional constraints protecting the new gene from deleterious mutations may be relaxed
  - The copy may ultimately be converted to a pseudogene
  - Many clusters of duplicated genes contain pseudogenes
- Advantageous mutations, either in the coding sequence or in the regulatory region, may lead to positive selection
- Where there is a major change of function, several critical sites may be involved
  - The new function might not be fully manifested until several sites have adapted
  - Early mutational steps may be selectively neutral
Visual pigment proteins
- Old World primates have three colour-sensitive proteins
  - Green- and red-absorbing photoreceptors are encoded by a pair of closely related (96% identity), closely linked genes on the X chromosome, suggesting a very recent gene duplication
  - The blue photoreceptor is encoded by an autosomal gene
- New World monkeys have only one X-linked pigment gene
  - The duplication must have occurred in the ancestor of Old World monkeys after divergence from New World monkeys
  - Humans, apes and Old World monkeys can discriminate three colours, whereas New World monkeys can only distinguish two
- Prior to the emergence of three pigments, the rate of non-synonymous substitution exceeded the synonymous rate, suggesting positive selection for three-colour vision
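The comparison of non-synonymous and synonymous rates in the last point can be expressed as a simple decision rule. The sketch below is a hypothetical helper, not a published method: it assumes KA (non-synonymous) and KS (synonymous) rates have already been estimated from aligned coding sequences (for example by a counting method such as Nei-Gojobori, which is not implemented here), and the `tolerance` band around the neutral expectation of 1 is an illustrative assumption.

```python
def classify_ka_ks(ka: float, ks: float, tolerance: float = 0.1) -> str:
    """Classify the likely selection regime from pre-computed KA and KS rates.

    Hypothetical helper: ka and ks are substitutions per non-synonymous and
    per synonymous site; tolerance is an illustrative band around KA/KS = 1.
    """
    if ks <= 0:
        raise ValueError("KS must be positive to form a ratio")
    ratio = ka / ks
    if ratio > 1 + tolerance:
        # More amino-acid-changing substitutions than expected under neutrality
        return "positive selection"
    if ratio < 1 - tolerance:
        # Amino acid changes are being removed: functional constraint
        return "purifying selection"
    return "approximately neutral"
```

With invented rates, `classify_ka_ks(0.05, 0.02)` gives a ratio of 2.5 and is classed as positive selection, the pattern described for the red/green pigment lineage; a real analysis would also need a statistical test rather than a fixed band.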
Serine proteinases and their inhibitors
- Pancreatic proteinases (trypsin, chymotrypsin and elastase) illustrate paralogues with minor modifications of function
  - Strikingly similar three-dimensional structures
  - Very different substrate specificities
    - Elastase cleaves after residues with small, non-polar side chains (Ala, Val etc.)
    - Chymotrypsin cleaves after bulky, hydrophobic residues (Phe-X, Tyr-X etc.)
    - Trypsin cleaves only Arg-X or Lys-X bonds
- The advantage of having multiple digestive proteinases is clear: their combined activities ensure more efficient utilisation of proteins in foodstuffs
- The original duplicons survived because they acquired advantageous mutations that diversified their function
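The three specificities above can be illustrated with a toy in-silico digest. This is a minimal sketch under stated simplifications: the `P1_PREFERENCE` table is an assumption that reduces each enzyme to the residue on the N-terminal side of the scissile bond (elastase is given Ala, Val, Gly and Ser as its "small" residues), and real rules such as trypsin's reduced cleavage before Pro are ignored.

```python
# Simplified P1 preferences (assumption): each enzyme cuts the peptide bond on
# the C-terminal side of these residues (one-letter amino acid codes).
P1_PREFERENCE = {
    "trypsin": set("KR"),        # Lys-X, Arg-X
    "chymotrypsin": set("FYW"),  # bulky aromatic residues
    "elastase": set("AVGS"),     # small side chains
}


def cleave(peptide: str, enzyme: str) -> list[str]:
    """Return the fragments produced by cutting after every preferred P1 residue."""
    preferred = P1_PREFERENCE[enzyme]
    fragments, start = [], 0
    for i, residue in enumerate(peptide):
        # Cut after position i, unless it is the last residue (no bond follows it)
        if residue in preferred and i < len(peptide) - 1:
            fragments.append(peptide[start:i + 1])
            start = i + 1
    fragments.append(peptide[start:])
    return fragments


for enzyme in P1_PREFERENCE:
    print(enzyme, cleave("GAKFRWLS", enzyme))
```

On the invented peptide `GAKFRWLS`, each enzyme cuts at different bonds (trypsin gives `GAK`, `FR`, `WLS`; chymotrypsin gives `GAKF`, `RW`, `LS`), so the three together cut more bonds than any one alone, which is the point about combined digestive efficiency.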
Serine proteinases and their inhibitors
- The molecular basis of the differences in substrate specificity can be rationalised from the three-dimensional structure and an understanding of the catalytic mechanism
- Specificity of trypsin for Arg or Lys residues is due to:
  - A deep substrate-binding pocket that can accommodate their long side chains
  - Asp-189 at the bottom of the pocket, which neutralises the positive charge of the substrate's Arg/Lys side chain
- Specificity of chymotrypsin for bulky aromatic residues is due to:
  - A large, hydrophobic substrate-binding pocket
  - A small, neutral (usually Ser) residue at position 189
- Elastase has a shallower binding site
  - Residues 216 and 226 have small side chains (Gly) in trypsin and chymotrypsin
  - Elastase has bulkier residues (Val, Thr) at these positions, which partly occlude the pocket
Serine proteinases and their inhibitors
- Substitution of a few key residues can alter sequence specificity without eliminating enzyme activity
  - Replacing Asp-189 of trypsin with a Ser residue (to mimic chymotrypsin) greatly diminishes activity towards Lys or Arg substrates and increases specificity for hydrophobic substrates 10- to 50-fold
  - The lack of a complete change of substrate preference suggests that other readjustments had to occur during divergence of trypsin and chymotrypsin from their common ancestor
  - This supports the notion that a new function may emerge by continual improvement of function
- Correlated with functional adaptation of serine proteinase inhibitors
  - Porcine elafin genes show 93-98% conservation in introns but only 60-77% similarity in exon 2, which encodes the inhibitor domain
  - This reflects an accelerated rate of amino-acid-changing substitution in exon 2: KA >> KS, indicating positive selection
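The intron-versus-exon contrast for elafin is a comparison of percent identity between aligned regions. A minimal sketch, assuming the two sequences are already aligned to equal length; excluding gapped columns from the denominator is one convention among several (an assumption here), and the sequences used below are invented toy fragments, not elafin data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are excluded (assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))  # one mismatch in ten columns
```

Applied separately to aligned intron and exon 2 segments of two paralogues, a function like this would yield the kind of high-intron, low-exon identity contrast quoted above.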
UDP-glucuronosyltransferases (UDPGTs)
- UDPGTs detoxify hundreds of compounds by conjugation, increasing their water solubility to facilitate excretion
- In mammals, bilirubin (a by-product of haem turnover) must be detoxified by conjugation to glucuronic acid
- Glucuronidation is carried out by a large family of UDPGTs with different, but overlapping, substrate specificities
  - This means that UDPGTs also affect the levels of several hormones
  - Overproduction (through whole-gene duplication) can therefore be deleterious
- UDPGTs have distinct domains serving different functions
  - N-terminal globular domain, which binds the toxic substrate
  - C-terminal globular domain, involved in binding UDP-glucuronic acid
- Substitutions in restricted regions of the N-terminal domain led to diversification of substrate specificities
UDP-glucuronosyltransferases (UDPGTs)
- Two families of UDPGTs in mammals differ markedly in the evolutionary strategy used for functional diversification
- The UDPGT2B subfamily has evolved through the classic process of whole-gene duplication, resulting in several isoforms
  - Clustered gene family on chromosome 4
  - Primary substrates include 4-hydroxyestrone and hyodeoxycholic acid
- The single UDPGT1 gene complex on chromosome 2 has diversified by duplication of exon 1 only, which encodes the substrate-binding domain
  - The human UDPGT1 gene complex has six closely related exon 1 variants
  - A single set of four exons encodes the C-terminal part of the UDPGTs
  - mRNAs of the different isoforms are produced by differential splicing of one of the exon 1 variants onto the constant C-terminal exons
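The exon-sharing strategy of the UDPGT1 complex can be pictured as a small combinatorial model. This is a sketch only: the exon labels are hypothetical placeholders rather than official nomenclature, and splice sites, promoters and reading frames are not modelled. It shows how six alternative first exons, each joined to the same four downstream exons, yield six isoforms that differ only in their N-terminal, substrate-binding region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Isoform:
    name: str
    exons: tuple[str, ...]


def ugt1_isoforms(exon1_variants: list[str], shared_exons: list[str]) -> list[Isoform]:
    """Build one transcript per exon 1 variant, each spliced onto the shared exons."""
    return [
        Isoform(name=f"isoform_{variant}", exons=(variant, *shared_exons))
        for variant in exon1_variants
    ]


# Hypothetical labels: six alternative first exons plus four constant exons
variants = [f"exon1_{letter}" for letter in "ABCDEF"]
constant = ["exon2", "exon3", "exon4", "exon5"]
for isoform in ugt1_isoforms(variants, constant):
    print(isoform.name, "-".join(isoform.exons))
```

The contrast with the UDPGT2B subfamily is that there each isoform needs a whole gene, whereas here a new substrate specificity only requires one additional first exon.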
Major change of function in paralogous genes
- Some members of the serine protease family (e.g. haptoglobin, hepatocyte growth factor, azurocidins) have lost their capacity to act as proteinases
  - They have lost one or more of the residues of the catalytic triad
  - They have other important biological functions
    - Haptoglobin binds haemoglobin released from lysed erythrocytes
    - Hepatocyte growth factor acts through specific receptor tyrosine kinases to stimulate cell growth
    - Azurocidin has bactericidal activity
- Careful analysis sometimes indicates a plausible pathway for the transition from one function to another
Evolution of azurocidins
- Azurophil granules of neutrophils contain several proteins implicated in the killing of microorganisms
  - Serine proteinases that degrade connective tissues (cathepsin G, neutrophil elastase, proteinase 3)
  - Azurocidin is similar to these but lacks His-57 and Ser-195, and thus has no proteolytic activity
- The bactericidal activity of azurocidin is mediated by tight binding to anionic lipopolysaccharide, a component of the Gram-negative bacterial envelope
  - The serine proteinase fold is used as a scaffold for endotoxin binding
- The fact that azurocidins share their most recent common ancestor with proteinases that have antibacterial activity suggests that this was a function of the common ancestor
  - The new function probably emerged before the original function was lost
Major change of function by domain acquisition
- In proteinases involved in blood coagulation, very large segments are joined to the trypsin-homologous region
- These non-proteinase parts of plasma proteinases consist of multiple structural-functional domains that were introduced by exon shuffling
- Function was modified not only by point mutations but also by domain insertions and duplications
  - The proteinase domains retained proteolytic activity, but point mutations led to an altered (usually narrower) sequence specificity
- The value of domain-acquisition mutations is that they can endow novel binding specificities and lead to dramatic changes in regulation and targeting
Modular structure of blood coagulation and fibrinolytic proteinases
Domain acquisition in the evolution of plasma proteinases
- The selective value of the domains joined to the proteinase domain is illustrated by the fact that they are usually involved in interactions with cofactors, substrates or inhibitors
- The vitamin K-dependent calcium-binding (Gla) domains of prothrombin, coagulation factors VII, IX and X, and protein C anchor these proteinases to phospholipid membranes, ensuring proper regulation of the cascade
- The kringle domains of plasmin and plasminogen are critical for binding of the proteinase to its primary substrate, fibrin
  - The serine proteinase domain has a specificity very similar to that of trypsin (Lys-X and Arg-X)
  - Fibrin specificity arises because the kringle domains have specific fibrin-binding sites that target the enzyme to fibrin
Similarities and differences in the evolution of paralogous and orthologous proteins
- Common protein folds are conserved in both paralogues and orthologues, and structural elements generally accept mutations at similar rates in the two
- One difference is that orthologous proteins are likely to fulfil very similar functions in different species, whereas paralogous proteins are more likely to have diversified in function
  - When comparing orthologous proteins, residues that are critical for structure, function and specificity are equally likely to be conserved
  - When comparing paralogous proteins that fulfil different functions, only residues essential for structure are likely to be conserved
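The conservation contrast in the final two points is usually assessed column by column in a multiple alignment. A minimal sketch, assuming an ungapped alignment of equal-length protein sequences; scoring each column as the fraction of sequences that carry its most common residue is one simple choice among many, and the sequences below are invented toy data rather than real orthologues or paralogues.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences carrying the most common residue in each column.

    Assumes an ungapped alignment; 1.0 means the column is fully conserved.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment]
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(alignment))
    return scores


print(column_conservation(["MKVLAT", "MKVLAT", "MKILAS"]))
```

Under the reasoning above, one would expect orthologue alignments to keep high scores at structural, functional and specificity-determining columns, while in alignments of functionally divergent paralogues only the structural columns should stay near 1.0.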