Title: Identification and Characterization of Lin12/Notch Repeats (LNRs): A Bioinformatics Approach
Fathima F. Jahufar, Framingham High School '07; Didem Vardar-Ulu, Chemistry Department
- Abstract
- Lin12/Notch Repeats (LNRs) are Ca2+-binding,
cysteine-rich protein domains. They were first
found in a block of three in a transmembrane
receptor protein called Notch. Since then they
have also been found in other types of
multidomain proteins such as the
Pregnancy-associated Plasma Protein (PAPP) and
Stealth proteins. In these proteins, the LNRs
are present in varying numbers and
arrangements.
- For this project, we have used a variety of
different bioinformatics tools to identify,
align, and compile information on different LNRs
from different protein sources. These tools
include BLAST, ClustalW, ExPASy Proteomics Tools,
and UniProt. Using these tools, we have been able
to compile a list of different LNRs along with
certain physicochemical properties of each,
including the theoretical pI, the molecular
weight, the number of acidic and basic residues
and the extinction coefficients. We have also
tabulated, for each residue position relative to
the cysteines, the percentage of each amino acid
and of each amino acid type.
- Our preliminary results indicate that although
all LNRs, regardless of their origin, are small,
acidic sequences, there are important subtle
differences in the details of each LNR sequence
that might shed light on their unique
biological function within the larger multidomain
protein scaffold. The compilations presented in
this work are useful for comparing different LNRs
and deciding which LNRs would be valuable for
further study.
- Introduction
- Lin12/Notch Repeats (LNRs) are relatively short
protein domains (only about 35-40 amino acids
long) found in a variety of protein
families. LNRs were first found in a block of
three in Notch, a transmembrane receptor
protein, where they help maintain the
receptor in a resting, metalloprotease-resistant
conformation prior to ligand binding (1). LNRs
are also found in other multidomain proteins such
as PAPP proteins and Stealth proteins. Like Notch,
PAPP proteins have three LNRs; however, the third
LNR is separated from the second by more than
1000 amino acids (2). LNRs in PAPP are thought to
determine the proteolytic specificity of PAPP,
which cleaves insulin-like growth factor-binding
proteins (2). In Stealth proteins, LNRs occur
singly or in pairs, but are
not found in all Stealth proteins (3).
- The average natural abundance of cysteine in
proteins is about 2.3% (4). However, most LNRs
are 15-17% cysteine. Hence, they are very
cysteine rich and require Ca2+ to fold properly
into their native forms. Most LNRs have six
cysteines, while a few have only four. These
cysteines form three (or two) specific
disulfide bridges that help give LNRs their
structure. LNRs also contain several aspartic
acid and asparagine residues that coordinate the
binding of Ca2+ ions.
- Using bioinformatics to study LNRs involves the
use of websites such as UniProt, BLAST,
ClustalW2, and ExPASy Proteomics Tools. UniProt
allows keyword/text searches to identify amino
acid sequences from different databases. It
also matches input sequences to sequences within
proteins in a database and provides basic
information about these proteins. Protein BLAST
(Basic Local Alignment Search Tool) compares
amino acid sequence inputs to those in the
protein database and outputs significant matches.
ClustalW2 is an online tool that aligns multiple
amino acid sequences, facilitating one-to-one
amino acid comparisons. Finally, ExPASy (Expert
Protein Analysis System) Proteomics Tools allow
information to be gathered and predictions to be
made about amino acid sequences. We first used
UniProt and BLAST to identify different LNR
sequences within the protein database and to
determine their location within their
corresponding protein sources. Then, we used
ClustalW to align these LNRs, after which we
improved these automated alignments manually
based on the positions of the cysteines and the
Ca2+-coordinating residues that define an LNR.
Finally, in order to better understand and
predict the biochemical and biophysical
characteristics of LNRs, we used ExPASy
Proteomics Tools to compile a list of
physicochemical properties for each of the
identified LNR sequences. The alignments of the
LNRs (each slot numbered) and small sections of
the tables detailing the properties of the LNRs
and of each slot in the alignments are presented
here.
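As a quick check on the cysteine content discussed in the Introduction, the percentage of cysteine in an LNR can be computed directly from its sequence. The sketch below is illustrative only: the helper name `cysteine_percent` is an assumption made here and is not part of any of the tools named above, and the three sequences are the human Notch1 LNRs as listed in Fig. 5.

```python
# Illustrative sketch: percent cysteine in an LNR sequence.
# `cysteine_percent` is a hypothetical helper, not a function of any tool
# named in the text.


def cysteine_percent(sequence: str) -> float:
    """Return the percentage of residues in `sequence` that are cysteine (C)."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("empty sequence")
    return 100.0 * sequence.count("C") / len(sequence)


# Human Notch1 LNR sequences, copied from Fig. 5.
HN1_LNRS = {
    "hN1 LNRA": "EEACELPECQEDAGNKVCSLQCNNHACGWDGGDCS",
    "hN1 LNRB": "LNFNDPWKNCTQSLQCWKYFSDGHCDSQCNSAGCLFDGFDCQ",
    "hN1 LNRC": "RAEGQCNPLYDQYCKDHFSDGHCDQGCNSAECEWDGLDCA",
}

if __name__ == "__main__":
    for name, seq in HN1_LNRS.items():
        print(f"{name}: {len(seq)} residues, {cysteine_percent(seq):.1f}% Cys")
```

For these three sequences the values come out at roughly 14-17%, several times the average cysteine abundance cited in reference (4).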
Residue types: Hydr. = hydrophobic (G, A, V, L, I, M, P); Arom. = aromatic (F, Y, W); Basic (H, K, R); Acidic (D, E); Polar (S, C, T, N, Q).
Slot | Total | Highest perc. (%, residue) | Type counts (columns 4-8: Hydr., Arom., Basic, Acidic, Polar; empty cells omitted) | Predominant type (%) | Predominant type
1 | 9 | 78 L | 9 | 100 | Hydrophobic
2 | 10 | 40 N | 3 1 6 | 60 | Polar
3 | 12 | 33 F | 8 4 | 67 | Hydrophobic
4 | 12 | 25 N | 5 1 2 4 | 42 | Hydrophobic
5 | 18 | 33 D | 4 1 4 6 3 | 33 | Acidic
6 | 19 | 47 P | 13 2 4 | 68 | Hydrophobic
7 | 29 | 31 E | 7 8 2 10 2 | 34 | Acidic
8 | 29 | 21 K | 15 8 3 3 | 52 | Hydrophobic
9 | 29 | 28 N | 7 1 4 17 | 59 | Polar
10 | 29 | 100 C | 29 | 100 | Polar
11 | 2 | 50 V, E | 1 1 | 50 | Acid/Hydr.
12 | 3 | 33 D, V, T | 1 1 1 | 33 | Hydr/Acid/Polar
13 | 3 | 33 Y, S, L | 1 1 1 | 33 | Hydr/Arom/Polar
14 | 3 | 33 Q, N, R | 1 2 | 67 | Polar
15 | 10 | 50 N | 1 1 1 7 | 70 | Polar
16 | 10 | 60 P | 7 1 2 | 70 | Hydrophobic
17 | 18 | 33 L | 11 1 6 | 61 | Hydrophobic
18 | 28 | 25 Y | 5 8 1 7 7 | 29 | Aromatic
19 | 32 | 22 D | 13 4 8 7 | 41 | Hydrophobic
20 | 32 | 22 Q | 13 7 3 9 | 41 | Hydrophobic
Fig. 4 Alignment slot statistics (selected slots).
Each slot (see Fig. 3) is analyzed for the
most abundant amino acid (Column 3, Highest
Percentage) and then for the different types
of amino acids (Columns 4-8). Many slots are made
up predominantly of a certain type of amino acid.
Information for slots 1-20 is shown.
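The per-slot analysis described in the Fig. 4 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used for the poster: the function names (`residue_type`, `slot_statistics`) are hypothetical, the residue classes are taken from the table header, and the toy alignment is built from the C-terminal ends of the four human LNRA sequences in Fig. 5 with one gap added by hand. It is not the alignment of Fig. 3.

```python
from collections import Counter

# Residue classes as given in the column headers of the slot table.
RESIDUE_TYPES = {
    "Hydrophobic": set("GAVLIMP"),
    "Aromatic": set("FYW"),
    "Basic": set("HKR"),
    "Acidic": set("DE"),
    "Polar": set("SCTNQ"),
}


def residue_type(residue: str) -> str:
    """Map a one-letter amino acid code to its class in RESIDUE_TYPES."""
    for type_name, members in RESIDUE_TYPES.items():
        if residue in members:
            return type_name
    raise ValueError(f"unclassified residue: {residue!r}")


def slot_statistics(aligned, gap="-"):
    """Summarize each alignment column (slot), ignoring gap characters."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    stats = []
    for slot, column in enumerate(zip(*aligned), start=1):
        residues = [r for r in column if r != gap]
        if not residues:
            continue  # slot contains only gaps
        total = len(residues)
        counts = Counter(residues)
        top = max(counts.values())
        type_counts = Counter(residue_type(r) for r in residues)
        top_type = max(type_counts.values())
        stats.append({
            "slot": slot,
            "total": total,
            "top_percent": round(100 * top / total),
            # Ties are reported together, as in slots 11-14 of the table.
            "top_residues": sorted(r for r, n in counts.items() if n == top),
            "type_counts": dict(type_counts),
            "top_type_percent": round(100 * top_type / total),
            "top_types": [t for t in RESIDUE_TYPES if type_counts[t] == top_type],
        })
    return stats


# Hypothetical toy alignment (hN1, hN2, hN3, hN4 LNRA C-termini; "-" is a gap).
TOY_ALIGNMENT = ["CGWDGGDCS", "CQWDGGDC-", "CGWDGGDCS", "GNWDGGDCS"]

if __name__ == "__main__":
    for row in slot_statistics(TOY_ALIGNMENT):
        print(row["slot"], row["total"], row["top_percent"],
              ",".join(row["top_residues"]), "/".join(row["top_types"]))
```

Gaps are excluded from each slot's total, which is one way to account for the different totals per slot in the table; whether the original analysis handled gaps this way is an assumption.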
Name | Accession | Sequence | Residues | Cys | MW | pI (theor.) | Neg. | Pos. | Instability index | Aliphatic | Aromatic | Basic | Acidic | Total | Ext. coeff. (all Cys as half-cystines)
hN1 LNRA | P46531 | EEACELPECQEDAGNKVCSLQCNNHACGWDGGDCS | 1447-1481 | 6 | 3715.9 | 3.89 | 8 | 1 | 92.14 | 10 | 1 | 2 | 13 | 35 | 5875
hN1 LNRB | P46531 | LNFNDPWKNCTQSLQCWKYFSDGHCDSQCNSAGCLFDGFDCQ | 1482-1523 | 6 | 4827.2 | 4.28 | 5 | 2 | 53.47 | 7 | 7 | 3 | 13 | 42 | 12865
hN1 LNRC | P46531 | RAEGQCNPLYDQYCKDHFSDGHCDQGCNSAECEWDGLDCA | 1524-1563 | 6 | 4484.7 | 4.12 | 9 | 2 | 20.63 | 9 | 4 | 4 | 14 | 40 | 8855
hN2 LNRA | Q04721 | PATCLSQYCADKARDGVCDEACNSHACQWDGGDC | 1422-1455 | 6 | 3593.8 | 4.17 | 6 | 2 | 52.41 | 10 | 2 | 3 | 9 | 34 | 7365
hN2 LNRB | Q04721 | LTMENPWANCSSPLPCWDYINNQCDELCNTVECLFDNFECQ | 1457-1497 | 6 | 4804.3 | 3.26 | 7 | 0 | 50.62 | 7 | 5 | 0 | 15 | 41 | 12865
hN2 LNRC | Q04721 | GNSKTCKYDKYCADHFKDNHCDQGCNSEECGWDGLDCA | 1498-1535 | 6 | 4261.5 | 4.64 | 8 | 4 | 36.11 | 7 | 4 | 6 | 12 | 38 | 8855
hN3 LNRA | Q9UM47 | EPRCPRAACQAKRGDQRCDRECNSPGCGWDGGDCS | 1384-1418 | 6 | 3785.1 | 6.31 | 6 | 6 | 69.21 | 8 | 1 | 6 | 9 | 35 | 5875
hN3 LNRB | Q9UM47 | LSVGDPWRQCEALQCWRLFNNSRCDPACSSPACLYDNFDCH | 1419-1459 | 6 | 4722.2 | 4.75 | 5 | 3 | 95.14 | 9 | 5 | 4 | 10 | 41 | 12865
hN3 LNRC | Q9UM47 | AGGRERTCNPVYEKYCADHFADGRCDQGCNTEECGWDGLDCA | 1460-1501 | 6 | 4616.9 | 4.35 | 9 | 4 | 34.81 | 12 | 4 | 5 | 12 | 42 | 8855
hN4 LNRA | Q5STG5 | CEGRSGDGACDAGCSGPGGNWDGGDCS | 1180-1206 | 4 | 2490.5 | 3.71 | 5 | 1 | 48.32 | 11 | 1 | 1 | 6 | 27 | 5750
 | Q5STG5 | PGAKGCEGRSGDGACDAGCSGPGGNWDGGDCS | 1175-1206 | 4 | 2900.9 | 4.04 | 5 | 2 | 37.03 | 14 | 1 | 2 | 6 | 32 | 5750
hN4 LNRB | Q5STG5 | LGVPDPWKGCPSHSRCWLLFRDGQCHPQCDSEECLFDGYDCE | 1207-1248 | 6 | 4830.3 | 4.57 | 8 | 3 | 83.37 | 9 | 5 | 5 | 10 | 42 | 12865
hN4 LNRC | Q5STG5 | TPPACTPAYDQYCHDHFHNGHCEKGCNTAECGWDGGDCR | 1249-1287 | 6 | 4297.6 | 5.2 | 6 | 2 | 69.48 | 8 | 4 | 6 | 9 | 39 | 8855
mN1 LNRA | Q01705 | EEACELPECQVDAGNKVCNLQCNNHACGWDGGDCS | 1446-1480 | 6 | 3713 | 3.95 | 7 | 1 | 74.68 | 11 | 1 | 2 | 13 | 35 | 5875
mN1 LNRB | Q01705 | LNFNDPWKNCTQSLQCWKYFSDGHCDSQCNSAGCLFDGFDCQ | 1481-1522 | 6 | 4827.2 | 4.28 | 5 | 2 | 53.47 | 7 | 7 | 3 | 13 | 42 | 12865
mN1 LNRC | Q01705 | LTEGQCNPLYDQYCKDHFSDGHCDQGCNSAECEWDGLDCA | 1523-1562 | 6 | 4471.7 | 3.93 | 9 | 1 | 25.45 | 9 | 4 | 3 | 14 | 40 | 8855
dN LNRA | P07207 | RAMCDKRGCTECQGNGICDSDCNTYACNFDGNDCS | 1479-1513 | 7 | 3771 | 4.17 | 6 | 3 | 60.46 | 7 | 2 | 3 | 11 | 35 | 1865
Fig. 5 Physicochemical characteristics of LNRs.
Each LNR sequence is characterized using ExPASy
Proteomics Tools. Information such as the theoretical
pI and total number of residues tells us that all
LNRs are acidic and are less than 45 amino acids
long. This table shows a few of the
characteristics and compiled information for some
selected LNR sequences.
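Several of the Fig. 5 columns can be recomputed from the sequence alone. The sketch below is a hedged illustration, not the ExPASy implementation: the function name `lnr_properties` is hypothetical, and the conventions are assumed to match ProtParam's (negative residues = Asp + Glu, positive residues = Arg + Lys, and a 280 nm extinction coefficient of 5500 per Trp, 1490 per Tyr, and 125 per cystine, with all cysteines taken to be paired). Under these assumptions it reproduces the tabulated values for the sequences checked in the usage note.

```python
# Illustrative sketch of a few ProtParam-style properties.
# `lnr_properties` is a hypothetical helper; the coefficients are assumed to
# follow the ProtParam convention (units: M^-1 cm^-1 at 280 nm).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def lnr_properties(sequence: str) -> dict:
    """Return length, Cys count, charged-residue counts, and ext. coefficient."""
    seq = sequence.strip().upper()
    n_cys = seq.count("C")
    return {
        "total": len(seq),
        "cys": n_cys,
        "neg": seq.count("D") + seq.count("E"),
        "pos": seq.count("K") + seq.count("R"),
        # Every pair of cysteines is assumed to form one cystine; an unpaired
        # cysteine (odd count) contributes nothing.
        "ext_coeff": (EXT_TRP * seq.count("W")
                      + EXT_TYR * seq.count("Y")
                      + EXT_CYSTINE * (n_cys // 2)),
    }


if __name__ == "__main__":
    # hN1 LNRA sequence, copied from Fig. 5.
    print(lnr_properties("EEACELPECQEDAGNKVCSLQCNNHACGWDGGDCS"))
```

For hN1 LNRA this gives 35 residues, 6 Cys, 8 negative and 1 positive residue, and an extinction coefficient of 5875, matching the corresponding row of the table. Molecular weight, pI, and instability index need residue-specific parameter tables and are left to ProtParam itself.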
- Conclusion/Future Work
- Our preliminary results
indicate that although all LNRs, regardless of
their origin, are small, acidic sequences, there
are important subtle differences in the details
of each LNR sequence that might shed light on
their unique biological function within the
larger multidomain protein scaffold. We have
also found that some slots in the alignment of
the LNRs predominantly contain a certain type of
amino acid: acidic, basic, hydrophobic,
polar, or aromatic. In the future, this compiled
information will be used to decide which LNRs
are relevant for further experimental
characterization. Comparing the
bioinformatics data with experimental results
will give us a clearer understanding of the
characteristics of LNRs from such a diverse
set of protein families.
- References
1. Notch Subunit Heterodimerization and Prevention of Ligand-Independent Proteolytic Activation Depend, Respectively, on a Novel Domain and the LNR Repeats. Cheryll Sanchez-Irizarry, Andrea C. Carpenter, Andrew P. Weng, Warren S. Pear, Jon C. Aster, and Stephen C. Blacklow. Molecular and Cellular Biology, Nov. 2004, Vol. 24, No. 21, p. 9265-9273.
2. The Lin12-Notch Repeats of Pregnancy-associated Plasma Protein-A Bind Calcium and Determine its Proteolytic Specificity. Henning B. Boldt, Kasper Kjaer-Sorensen, Michael T. Overgaard, Kathrin Weyer, Christine B. Poulsen, Lars Sottrup-Jensen, Cheryl A. Conover, Linda C. Giudice, and Claus Oxvig. Journal of Biological Chemistry, Sept. 2004, Vol. 279, No. 37, p. 38525-38531.
3. Stealth Proteins: In Silico Identification of a Novel Protein Family Rendering Bacterial Pathogens Invisible to Host Immune Defense. Peter Sperisen, Christoph D. Schmid, Philipp Bucher, and Olav Zilian. PLoS Comput Biol. 1(6): e63. 2005.
4. Number of Cysteines Histogram. UCSC Genome Bioinformatics. Updated 12 Feb. 2004. <http://genome.ucsc.edu/google/goldenPath/help/pbTracksHelpFiles/pbcCnt.shtml>
- Acknowledgements
- National Science Foundation Research Experiences for Undergraduates (NSF-REU) in Chemistry and Physics
- Professor Didem Vardar-Ulu, Christina Hao, Sharline Madera, and Ursela Siddiqui.