Facts and Artefacts

Provided by: davidh63

1
Facts and Artefacts: Database Anomalies Revealed
by the Analysis of Rat Ly6 Proteins
Dr Christopher Southan, Oxford GlycoSciences (UK) Ltd
Harwell Seminar, November 2002
2
Introduction: Quirks that Lurk in Databases
  • The sequence deluge into the primary databases
    necessitates automated pipelines to produce
    'value added' secondary databases
  • But, however sophisticated the data parsing or
    curation, anomalies will get through
  • Most things that could have gone wrong, have
  • Although the overall quirk frequency is low, they
    present pitfalls for the unwary
  • Responsibility for primary annotation and
    sequence quality lies solely with submitting
    authors
  • Few originating authors correct, update or
    withdraw their primary sequence entries
  • It is difficult to discriminate between in vitro
    artefacts and rare in vivo events

3
Outline
  • Proteomic analysis of rat urine led to the
    identification of 2 novel secreted proteins in
    EST data
  • Further searching expanded these findings to a
    large family of rat and mouse proteins, and
    vertebrate homologues of short Ly-6 proteins with
    unknown biology
  • Bioinformatic analysis of database matches
    exposed a swathe of anomalies, including:
      • chimeric and pre-mRNAs
      • sequence errors
      • naming ambiguities
      • equivocal functional data
  • The OGS Protein Atlas of the human genome
    includes peptide data from one short Ly6
    homologue, Lynx1
  • Combining proteomic data with sequence analysis
    delineated the Lynx1 gene product and inferred
    biochemical properties of the protein

4
Rat Urine → 2D gel → Trypsin → MS/MS → PepSea
Search → EST hits
  • Spot area 1 gave two different peptide matches:
      • CTSFDSTGFCHVGR, contained within rat EST
        AA893514
      • CESLDSTGLCR, contained within the nucleotide
        sequence of EST AA800439
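
The peptide-to-EST matching step can be sketched as a six-frame translation search. This is a minimal illustration, not the PepSea algorithm: the function names are hypothetical, and the demonstration sequence is a synthetic back-translation of the CESLDSTGLCR peptide using arbitrarily chosen codons, not the real EST AA800439.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case ACGT string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from the first base; codons not in the table (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_hits(dna, peptide):
    """List (strand, frame) pairs whose translation contains the peptide."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            if peptide in translate(seq[offset:]):
                hits.append((strand, offset + 1))
    return hits


# Synthetic demonstration only: codon choices are arbitrary assumptions.
HYPOTHETICAL_CODONS = {"C": "TGT", "E": "GAA", "S": "TCT", "L": "CTG",
                       "D": "GAT", "T": "ACT", "G": "GGT", "R": "CGT"}
peptide = "CESLDSTGLCR"
coding = "".join(HYPOTHETICAL_CODONS[aa] for aa in peptide)
# Shift the coding region by two bases and flip the strand, as a cDNA read might be.
toy_est = reverse_complement("GA" + coding)
print(six_frame_hits(toy_est, peptide))  # [('-', 3)]
```

Searching all six frames matters because an EST can be deposited in either orientation, so a peptide may only appear in a reverse-strand translation.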

5
Rat Urine → HPLC → MALDI → N-Terminal Sequence
6
EST AA893514 vs dbEST: 30 Rat Hits at 95 to
100% Identity
7
Assembly of Rat Urinary Proteins 1 and 2
  • 9 EST sequences, the MS/MS sequences, and the
    N-terminal data, were all consistent with two
    paralogous proteins
  • 90% identical at the AA level and 96% identical
    at the DNA level
  • One N-glycosylation site
  • Secreted forms abundant in male rat urine by HPLC
  • Highly represented in rat liver ESTs
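
As a rough check on the amino-acid identity figure, the ungapped identity between two equal-length sequences can be counted directly. The sketch below uses the RUP1 and RUP2 precursor sequences as they appear on the RUP-3 slide; the function name is hypothetical, and a position-by-position comparison is an assumption that only holds because the two sequences align without gaps.

```python
def percent_identity(a, b):
    """Return (matching positions, percent identity) for two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, 100.0 * matches / len(a)


# Precursor sequences transcribed from the RUP-3 slide (residues 1-60, then 61-101).
RUP1 = ("MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD"
        "GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP")
RUP2 = ("MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD"
        "GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR")

matches, pct = percent_identity(RUP1, RUP2)
print(f"{matches} of {len(RUP1)} positions identical ({pct:.1f}%)")
```

The result depends on which region is compared (full precursor or mature protein), so it need not agree exactly with the figure quoted on the slide.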

8
RUP-3 Confirmed by Data from Wait et al.,
Electrophoresis 22, 3043-3052 (2001)
Residues 1-60:
RUP1  MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD
RUP2  MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD
RUP3  MGKHILLLPLGLSLLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEICAWVVVTTRD
Residues 61-101:
RUP1  GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP  (EST AA800439)
RUP2  GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR  (EST AA893514)
RUP3  GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR  (EST AA893518)
9
TIGR: One Assembly but at Least Two Gene Products
10
Solid Matches between RUP2 and Three Unrelated
mRNAs
  • Rat mitochondrial IF1 protein mRNA, L07806, 883
    bp
  • Rat casein kinase II alpha subunit (CK2), L15618,
    2180 bp
  • Rat mitochondrial succinyl-CoA synthetase alpha
    subunit, J03621, 1684 bp
  • Matches of 92 to 100% identity over 300-500
    bases
  • Two in reverse-frame, one in forward frame

11
Searches Against Rat ESTs Confirmed the Three
mRNAs as Chimeras
[Figure panels: L07806, L15618, J03621]
12
Translation Matches for the Chimeras Reveal a
Cryptic Ly-6 Protein, Verified by a Rat Genomic Hit

L07806 Rattus rattus mitochondrial IF1 protein mRNA
RUP-2   28  TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR  101
            TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR
L07806 417  TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR  196

L15618 Rat casein kinase II alpha subunit (CK2) mRNA
RUP-2   59  RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR  101
            RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR
L15618 708  RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR  580

J03621 Rat mitochondrial succinyl-CoA synthetase alpha subunit
RUP-2   24  CFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMV  99
            CF C    S G C      C   P E CA  V T  DGKFVYGNQSCAEC   TVEHGSLIVSTNCCSAT FCN V
J03621  50  CFECGNLNSMGICNFRTAVCYAHPGEVCA-SVLTYKDGKFVYGNQSCAECSGRTVEHGSLIVSTNCCSATSFCNIV  274
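
The coordinates in these translated matches carry the strand information: a subject range that runs from a higher to a lower nucleotide position indicates a reverse-strand match, and the nucleotide span should equal three times the number of aligned subject residues. A minimal sketch of that arithmetic follows, using the coordinates from this slide; the function name and the returned field names are hypothetical.

```python
def describe_hit(q_start, q_end, s_start, s_end):
    """Summarise a protein-vs-translated-nucleotide hit from its coordinates."""
    strand = "+" if s_end >= s_start else "-"
    nucleotides = abs(s_end - s_start) + 1
    codons, remainder = divmod(nucleotides, 3)
    if remainder:
        raise ValueError("subject span is not a whole number of codons")
    return {
        "strand": strand,
        "subject_codons": codons,
        "query_residues": q_end - q_start + 1,
    }


# (query start, query end, subject start, subject end) as listed on the slide.
hits = {
    "L07806": (28, 101, 417, 196),
    "L15618": (59, 101, 708, 580),
    "J03621": (24, 99, 50, 274),
}
for accession, coords in hits.items():
    print(accession, describe_hit(*coords))
```

For J03621 the query covers one residue more than the subject has codons, which is accounted for by the single gap in the subject line of the alignment.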
13
Comparing the J03621 Chimera EST Matches against
Rat Genome HTG
J03621 mRNA vs rat ESTs
J03621 mRNA vs rat genomic
14
What Caused the Chimeras?
  • Each of the chimeric cDNAs was submitted by a
    different research group, 1988-1993
  • All prepared from rat cDNA libraries
  • Two of these genes are nuclear-encoded
    mitochondrial proteins
  • L07806 has 2 non-chimeric counterparts
  • The 3 'host' transcripts are at different loci in
    humans (no rat map data yet)
  • The 5' insertions are different sequences,
    lengths and orientations
  • Only L15618 has a single-exon insert
  • Hits to unfinished rat genome data confirm the
    chimeras
  • Is the insertion of RUP2-like genes an in vitro
    artefact or a rare event in vivo?

15
mRNA Anomaly No. 4: Unspliced?
  • LOCUS AF368860, 1197 bp, mRNA, 13-JUN-2001
  • cds 10..96, "MGKHILLLPLVLSLLMSSLQDSCGHEPS"
    (TrEMBL Q91XP0)
  • Rattus norvegicus 3' non-translated
    beta-F1-ATPase mRNA-binding protein mRNA,
    complete cds. "Identification of a liver
    specific cDNA clone chaperoning the differential
    assembly of ribonucleoprotein complexes at the 3'
    UTR of the mRNAs of oxidative phosphorylation"

BLAST vs Rat ESTs
16
The L07806 Chimera Caused Major Errors in the
UniGene Cluster for Rn.1658
Atpi ATPase inhibitor (rat mitochondrial IF1
protein)
17
Sequence Conflict arising from the L07806-Derived
Chimeric ORF is Flagged by SwissProt
  • But the L07806-derived protein, without the
    targeting sequence, was expressed as a
    maltose-binding protein fusion in E. coli and
    was fully active!

18
RUP Homologues Expand a New Family of Secreted
Ly-6 Proteins but not (yet?) Recognised by
InterPro
19
Confusion Over Caltrin: 5 Different Sequences in
SwissProt, 22 PubMed Citations
  • Caltrin: inhibition of Ca2+ uptake into
    spermatozoa
  • CALTRIN PRECURSOR (CALCIUM TRANSPORT INHIBITOR).
    - Mus musculus (a Ly-6 protein)
  • CALTRIN PRECURSOR (CALCIUM TRANSPORT INHIBITOR)
    (SEMINALPLASMIN) (SPLN). - Bos taurus (PYY-like)
  • CALTRIN-LIKE PROTEIN I. - Cavia porcellus (weak
    protease inhibitor match)
  • CALTRIN-LIKE PROTEIN II. - Cavia porcellus
    (elastase inhibitor like)
  • PANCREATIC SECRETORY TRYPSIN INHIBITOR II
    PRECURSOR (PSTI-II) (CALTRIN) (CALCIUM TRANSPORT
    INHIBITOR). - Rattus norvegicus (trypsin
    inhibitor identity)

20
Mouse Ly-6-like Caltrin: Sequence Errors,
Unverified Reported Function, New Name and New
Function?
21
ARS component B, Antineoplastic Urinary Protein
and Secreted Mammalian Ly-6/uPAR Related Protein:
Updates for SwissProt P55000
22
Linking Sequence to Function: the Lost Keyword
Problem (PubMed Queries in red)
  • Adermann et al. "Structural and phylogenetic
    characterisation of human SLURP-1, the first
    secreted mammalian member of the Ly-6/uPAR
    protein superfamily". Protein Sci. 1999: ... from
    blood and urine peptide libraries. SLURP-1 is
    encoded by the ARS (component B)-81/s locus, and
    appears to be the first mammalian member of the
    Ly-6/uPAR family lacking a GPI-anchoring signal
    sequence ... SLURP-1 (+) Ly-6 (+) ANUP (-)
  • Katz et al. "A partial catalog of proteins
    secreted by epidermal keratinocytes in culture".
    J Invest Dermatol. 1999: proteins secreted by
    adult human epidermal keratinocytes included
    anti-neoplastic urinary protein (+) ANUP (-)
    SLURP-1 (-) Ly-6 (-)
  • Fischer et al. "Mutations in the gene encoding
    SLURP-1 in Mal de Meleda". Hum Mol Genet. 2001:
    Three different homozygous mutations (a deletion,
    a nonsense and a splice site mutation) were
    detected in 19 families of Algerian and Croatian
    origin ... first instance of a secreted protein
    being involved in a palmoplantar keratoderma ...
    SLURP-1 (+) Ly-6 (+) ANUP (-)
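
The lost keyword problem can be illustrated with a small synonym check: each record is tested for every known name of the protein, and a search on any single name misses the records that use a different one. This is a sketch only. The record texts are the title and abstract fragments quoted above, the synonym list is an assumption drawn from this slide, and plain case-insensitive substring matching is a simplification of how PubMed actually indexes text.

```python
SYNONYMS = ["SLURP-1", "Ly-6", "ANUP", "anti-neoplastic urinary protein"]

# Fragments quoted on the slide, not full abstracts.
RECORDS = {
    "Adermann 1999": (
        "Structural and phylogenetic characterisation of human SLURP-1, "
        "the first secreted mammalian member of the Ly-6/uPAR protein superfamily"
    ),
    "Katz 1999": (
        "A partial catalog of proteins secreted by epidermal keratinocytes in culture. "
        "proteins secreted by adult human epidermal keratinocytes included "
        "anti-neoplastic urinary protein"
    ),
}


def keyword_hits(text, keywords):
    """Map each keyword to True/False for a case-insensitive substring match."""
    lowered = text.lower()
    return {kw: kw.lower() in lowered for kw in keywords}


def records_found(records, keywords):
    """Return the records matched by at least one of the given keywords."""
    return sorted(name for name, text in records.items()
                  if any(keyword_hits(text, keywords).values()))


for name, text in RECORDS.items():
    flags = " ".join(f"{kw} ({'+' if hit else '-'})"
                     for kw, hit in keyword_hits(text, SYNONYMS).items())
    print(f"{name}: {flags}")

print(records_found(RECORDS, ["SLURP-1"]))  # misses the Katz record
print(records_found(RECORDS, SYNONYMS))     # the synonym union finds both
```

Querying with the union of synonyms is the simplest remedy, but it only works once the synonyms are known, which is exactly what the sequence analysis has to establish first.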

23
First RUP Orthologue in Mouse: Ensembl, Chr 9
24
NCBI Genomic Pipeline also Predicts New
Orthologues in Mouse
25
Not all Pipelines are Equal: Matching NCBI
XP_135421 to the UCSC Mouse Genome
26
Unknown Biology for the Short Ly-6 Proteins
  • Single-domain proteins of 85-100 residues, mostly
    with a signal peptide
  • Probable ligands by inference from toxin
    structures?
  • Recently duplicated rat paralogous family of up
    to 10 gene loci
  • Liver and spleen expression in rat
  • SLURP linked to skin physiology?
  • Caltrin/SVS VII: phospholipid binding?
  • Foetal expression for probable pig and bovine
    orthologues
  • Fast-evolving orthologues in mouse but only
    homologues in human?
  • Distant homologues involved in myelopoiesis in
    Xenopus and liver acute phase in rainbow trout
  • Distant homologues in C. elegans

27
Proteins in SwissProt/TREMBL Submitted or Updated
from this Work
  • Submitted:
      • RSP1_RAT (Q9QXN2) Spleen protein 1
      • UP1_RAT (P81827) Urinary protein 1 (RUP-1)
      • UP2_RAT (P81828) Urinary protein 2 (RUP-2)
      • P8312 Urinary protein 3 (RUP-3) Rat
      • P83106 PIP1 protein (PIP1) - Sus scrofa
      • P83107 BOP1 protein (BOP1) - Bos taurus
      • Q9BZG9 Ly-6 neurotoxin-like protein Lynx1 - Homo
        sapiens
  • Updated/corrected links:
      • SLUR_MOUSE (Q9Z0K7) Secreted Ly-6/uPAR protein
      • SLUR_HUMAN (P55000) Secreted Ly-6/uPAR protein
      • CALT_MOUSE (Q09098) Caltrin (now renamed Seminal
        Vesicle Protein 7)

28
The Pitfall List (1)
  • TIGR EST assembly has merged paralogues by shared
    identity
  • The chimeric and pre-mRNAs lead to:
  • Artefactual clustering of ESTs and non-homologous
    gene products in UniGene
  • Protein database conflicts and artefacts
  • Annotation errors in LocusLink and dbEST
  • RUP gene products missed in LocusLink/UniGene
  • Chimeric mRNA picked as RefSeq and therefore
    transitively propagated to RATMAP
  • Translation of cryptic novel protein not captured
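
The first pitfall, paralogues merged by shared identity, follows from how identity-threshold clustering behaves. The sketch below is not the TIGR assembly procedure: it is a generic single-linkage clustering on ungapped identity, with toy sequences and threshold values chosen purely for illustration. It shows that two sequences at 96% identity fall into one cluster whenever the cut-off sits below that figure.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def single_linkage(seqs, threshold):
    """Cluster sequence names, joining any pair at or above the identity threshold."""
    names = list(seqs)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if identity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# Toy 25-base sequences (assumed data): the two "paralogues" differ at one position.
TOY = {
    "paralogue_1": "ACGTACGTACGTACGTACGTACGTA",
    "paralogue_2": "ACGTACGTACGTTCGTACGTACGTA",
    "unrelated":   "TTTTTGGGGGCCCCCAAAAATTTTT",
}

print(single_linkage(TOY, 0.95))  # paralogues merged into one cluster
print(single_linkage(TOY, 0.97))  # paralogues kept apart
```

A stricter threshold separates the paralogues here, but in real EST data it would also split reads of the same transcript that differ only by sequencing error, which is why such merges are hard to avoid automatically.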

29
The Pitfall List (2)
  • Loose ends and sequence errors in old data
    captured by SwissProt but unresolved by authors
  • Equivocal functional annotation transitively
    perpetuated
  • Sequence-literature links broken by gene name
    ambiguities
  • Incorrect signal peptide annotation
  • Similarity scores for Ly-6 homologues fall below
    those used in domain databases
  • Big time lag for sequences without mRNA appearing
    in NCBI system

30
Conclusions
  • Finding quirks in database entries is definitely
    part of the fun of bioinformatics, but:
  • Sequence anomalies can seriously confound
    automated and manual annotation
  • They can only be unravelled, or at least exposed,
    by:
      • transitive and broad sequence/keyword searching
      • detailed examination of sequence and literature
        links
      • understanding the technology of sequencing from
        different sources and database building
        procedures
  • Conflicting data links should ideally be
    resolved by new data but may have to be resolved
    by judgment
  • Inferring biological meaning from database search
    results requires an understanding of the
    experiments and the in-silico analyses
    underpinning the annotations

31
Acknowledgements
Southan, C., Cutler, P., Birrell, H., Connelly,
J., Fantom, K.G.M., Sims, M., Shaikh, N., and
Schneider, K. "The characterisation of novel
secreted Ly-6 proteins from rat urine by the
combined use of 2-dimensional gel
electrophoresis, microbore HPLC and expressed
sequence tag data". Proteomics (2002) 2, 187-196.