GENOME ANNOTATION AND FUNCTIONAL GENOMICS: The protein sequence perspective

Slides: 73
Provided by: Beat273

Transcript and Presenter's Notes


1
GENOME ANNOTATION AND FUNCTIONAL GENOMICS: The
protein sequence perspective
2
GENOME ANNOTATION
  • Two main levels
  • STRUCTURAL ANNOTATION: Finding genes and other
    biologically relevant sites, thus building up a
    model of the genome as objects with specific
    locations
  • FUNCTIONAL ANNOTATION: Objects are used in
    database searches (and experiments); the aim is
    to attribute biologically relevant information to
    the whole sequence and to individual objects

3
WHY PROTEIN RATHER THAN DNA?
  • Larger alphabet: more sensitive comparisons
  • Protein sequences give a better signal-to-noise ratio
  • Less redundancy and no frameshifts
  • Each aa has different properties like size,
    charge etc
  • Closer to biological function
  • 3D structure of similar proteins may be known
  • Evolutionary relationships more evident
  • Availability of good, well annotated protein
    sequence and pattern databases

4
Large-scale genome analysis projects
  • Rate-limiting step is annotation
  • Whole genome availability provides context
    information
  • Main goal is to bridge gap between genotype and
    phenotype

5
Definitions of Annotation
  • Addition of as much reliable and up-to-date
    information as possible to describe a sequence
  • Identification, structural description,
    characterisation of putative protein products and
    other features in primary genomic sequence
  • Information attached to genomic coordinates with
    start and end point, can occur at different
    levels
  • Interpreting raw sequence data into useful
    biological information

6
ANNOTATION/FUNCTION CAN BE MAPPED TO DIFFERENT
LEVELS
  • ORGANISM: phenotypic function (morphology,
    physiology, behaviour, environmental response);
    NB context
  • CELLULAR: metabolic pathway, signal cascades,
    cellular localisation. Context dependent
  • MOLECULAR: binding sites, catalytic activity,
    PTM, 3D structure
  • DOMAIN
  • SINGLE RESIDUE

7
Annotation is the description of
  • Function(s) of the protein
  • Post-translational modification(s)
  • Domains and sites
  • Secondary structure
  • Quaternary structure
  • Similarities to other proteins
  • Disease(s) associated with deficiency(ies) in the
    protein
  • Sequence conflicts, variants, etc.

8
Additional information for proteins
  • ALTERNATIVE PRODUCTS
  • CATALYTIC ACTIVITY
  • COFACTOR
  • DEVELOPMENTAL STAGE
  • DISEASE
  • DOMAIN
  • ENZYME REGULATION
  • FUNCTION
  • INDUCTION
  • PATHWAY
  • PHARMACEUTICALS
  • POLYMORPHISM
  • PTM
  • SIMILARITY
  • SUBCELLULAR LOCATION
  • SUBUNIT
  • TISSUE SPECIFICITY

9
Amino-acid sites are
  • Post-translational modification of a residue
  • Covalent binding of a lipidic moiety
  • Disulfide bond
  • Thiolester bond
  • Thioether bond
  • Glycosylation site
  • Binding site for a metal ion
  • Binding site for any chemical group (co-enzyme,
    prosthetic group, etc.)

10
Regions
  • SIGNAL SEQUENCE
  • TRANSIT PEPTIDE
  • PROPEPTIDE
  • CHAIN
  • PEPTIDE
  • DOMAIN
  • ACTIVE SITE
  • DNA BIND SITE
  • METAL BIND SITE
  • MOLECULE BIND SITE
  • TRANSMEMBRANE

11
Annotation sources
  • publications that report new sequence data
  • review articles to periodically update the
    annotation of families or groups of proteins
  • external experts
  • protein sequence analysis

12
Approaches to functional annotation
  • Automatic annotation (sequence homology, rules,
    transfer of information from PDB)
  • Automatic classification (pattern databases,
    clustering, structure)
  • Automatic characterisation (functional databases)
  • Context information (comparative genome analysis,
    metabolic pathway databases)
  • Experimental results (2D gels, microarrays)
  • Full manual annotation (SWISS-PROT style)

13
PROTEIN SEQUENCE ANALYSIS
  • Protein sequence can come from gene predictions,
    literature or peptide sequencing
  • Analysis on different levels
  • molecular
  • cellular
  • organism
  • Simplest case: a match for the whole sequence in a
    database, which determines structure and function
  • In between: partial matches across the sequence to
    diverse or hypothetical proteins
  • Difficult case: no match, so information has to be
    derived from amino acid properties, pattern
    searches etc.

14
From sequence to function
15
Predicting function from sequence similarity
  • Orthologues: arose from speciation, same gene in
    different organisms; can have <30% sequence identity
  • Paralogues: arose from duplication within a genome;
    the second copy may have a new or changed function
  • (difficult to distinguish between ortho- and
    paralogues unless the whole genome is available)
  • Equivalog- proteins with equivalent functions
  • Analog- proteins catalyzing same reaction but not
    structurally related
  • Some enzymes may have seq similarity simply
    because common catalytic site, substrate,
    pathway.
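
The difficulty of separating orthologues from paralogues without whole genomes can be illustrated with a heuristic the slides do not name: reciprocal best hits between two complete proteomes. The sketch below assumes the best-hit tables have already been produced by some search tool; the function name and input layout are hypothetical.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Propose orthologue pairs between genome A and genome B.

    best_a_to_b: dict mapping each protein of A to its best hit in B.
    best_b_to_a: dict mapping each protein of B to its best hit in A.
    A pair (a, b) is kept only if a's best hit is b AND b's best hit is a.
    """
    return sorted(
        (a, b) for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a
    )


if __name__ == "__main__":
    a_to_b = {"a1": "b1", "a2": "b2", "a3": "b2"}
    b_to_a = {"b1": "a1", "b2": "a3"}
    # a2 hits b2, but b2 prefers a3, so a2 is left without a partner
    # (it may be a paralogue).
    print(reciprocal_best_hits(a_to_b, b_to_a))
```

Reciprocal best hits are only a first approximation: gene loss or unequal divergence after duplication can still mislead them, which is the caveat the slide raises.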

16
TYPES OF HOMOLOGY
(Diagram: tree of a PROTEIN/DOMAIN superfamily.
Duplication within a species gives paralogues A and
B, which may have different functions. Speciation
gives orthologues B1 and B2, which may have
different functions; if the functions are the same,
they are equivalogs.)
17
Sequence homology in genomes
  • When you do a whole genome BLAST search there is
    a general pattern of results

Maverick genes tend to diverge more frequently
than core genes
18
Using homology information for automatic
annotation- automatic annotation of TrEMBL as an
example
19
Requirements for automatic annotation
  • Well-annotated reference database (eg SWISS-PROT)
  • Highly reliable diagnostic protein family
    signature database with the means to assign
    proteins to groups (eg CDD, InterPro, IProClass)
  • A RuleBase to store and manage the annotation
    rules, their sources and their usage

20
Direct Transfer
  • Search target
  • Transfer annotation to target database
  • Example: FASTA against sequence database and
    transfer of DE line of best hit
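
A minimal sketch of direct transfer, assuming the search results have already been parsed into (identifier, E-value, DE line) records. The `Hit` type, the E-value cut-off and the wording of the transferred description are illustrative assumptions, not part of the presentation.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    subject_id: str    # identifier of the matched database entry
    evalue: float      # expectation value reported by the search tool
    description: str   # DE line of the matched entry


def transfer_best_hit(hits: List[Hit], max_evalue: float = 1e-10) -> Optional[str]:
    """Return the DE line of the best hit, or None if no hit is good enough."""
    if not hits:
        return None
    best = min(hits, key=lambda h: h.evalue)   # lowest E-value wins
    if best.evalue > max_evalue:               # assumed cut-off
        return None
    return f"{best.description} (By similarity to {best.subject_id})"


if __name__ == "__main__":
    hits = [
        Hit("P03437", 1e-50, "HEMAGGLUTININ PRECURSOR"),
        Hit("Q00000", 1e-3, "HYPOTHETICAL PROTEIN"),  # made-up entry
    ]
    print(transfer_best_hit(hits))
```

Taking only the single best hit is exactly what makes this approach prone to over-prediction, which the later slides contrast with rule-based transfer.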

(Diagram labels: XDB, Target)
21
Multiple Sources
  • Usually more than one external database is used
  • Combine the different results

(Diagram labels: XDB, Target)
22
Conflicts
  • Contradiction
  • Inconsistencies
  • Synonyms
  • Redundancy

23
Translation
  • Use a translator to map XDB language to target
    language

(Diagram labels: XDB, Target)
24
Translation Examples
  • ENZYME -> TrEMBL:
    CA L-ALANINE = D-ALANINE
    becomes
    CC -!- CATALYTIC ACTIVITY: L-ALANINE = D-ALANINE.
  • PROSITE -> TrEMBL:
    /SITE=3,heme_iron
    becomes
    FT METAL IRON
  • Pfam -> TrEMBL:
    FT DOMAIN zf_C3HC4
    becomes
    FT ZN_FING C3HC4-TYPE
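
The translation step can be sketched as a lookup table plus a small rewriting function. The table below contains only the two feature mappings shown in the examples above; the function names, the spacing of the output lines and the data layout are assumptions made for illustration.

```python
# Hypothetical translation table: (source database, source term) -> target line.
TRANSLATIONS = {
    ("PROSITE", "heme_iron"): "FT   METAL   IRON",
    ("Pfam", "zf_C3HC4"): "FT   ZN_FING   C3HC4-TYPE",
}


def translate(source_db, term):
    """Map an external-database term to target wording; None if unmapped."""
    return TRANSLATIONS.get((source_db, term))


def translate_enzyme_ca(ca_line):
    """Rewrite an ENZYME 'CA' line as a catalytic-activity comment line."""
    if not ca_line.startswith("CA"):
        raise ValueError("not a CA line")
    reaction = ca_line[2:].strip()
    return f"CC   -!- CATALYTIC ACTIVITY: {reaction}"


if __name__ == "__main__":
    print(translate("PROSITE", "heme_iron"))
    print(translate_enzyme_ca("CA   L-ALANINE = D-ALANINE."))
```

Returning `None` for unmapped terms keeps the translator from inventing target vocabulary, which matters for the "standardized vocabulary" requirement listed on the next slide.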

25
Demands on a system for automated data analysis
and annotation
  • Correctness
  • Scalability
  • Updateable
  • Low level of redundant information
  • Completeness
  • Standardized vocabulary

26
What do we have?
  • SWISS-PROT
  • RuleBase
  • TrEMBL
  • PROSITE (and Pfam, PRINTS, ProDom, SMART, Blocks
    etc)
  • SWISS-PROT/TrEMBL/RuleBase in Oracle

27
Standardized transfer of annotation from
characterized proteins in SWISS-PROT to TrEMBL
entries
  • TrEMBL entry is reliably recognized by a given
    method as a member of a certain group of proteins
  • corresponding group of proteins in SWISS-PROT
    shares certain annotation
  • common annotation is transferred to the TrEMBL
    entry and flagged as annotated by similarity

28
Automatic annotation information flow
  • Get information necessary to assign proteins to
    groups eg using InterPro or other biological or
    family information- store in RuleBase
  • Group proteins in SWISS-PROT by these conditions
  • Extract common annotation shared by all these
    proteins- store in RuleBase
  • Group unannotated sequences by the conditions
  • Transfer common annotation flagged with evidence
    tags
  • Note can add taxonomic constraints
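
The information flow above can be sketched end to end on toy data. This is a simplified illustration under stated assumptions: entries are plain dictionaries, a "group" is defined by a single signature identifier, common annotation is a strict set intersection, and the evidence tag is a plain string. None of these names come from the actual RuleBase implementation.

```python
from collections import defaultdict


def common_annotation_by_group(reference):
    """reference: dict entry_id -> (set of signature ids, set of annotation lines).

    Returns dict signature id -> annotation lines shared by EVERY reference
    entry carrying that signature.
    """
    groups = defaultdict(list)
    for signatures, annotation in reference.values():
        for sig in signatures:
            groups[sig].append(annotation)
    return {sig: set.intersection(*annots) for sig, annots in groups.items()}


def annotate(targets, rules, evidence="By similarity"):
    """targets: dict entry_id -> set of signature ids.

    Returns dict entry_id -> sorted list of (annotation line, evidence tag).
    """
    result = {}
    for entry_id, signatures in targets.items():
        lines = set()
        for sig in signatures:
            lines |= rules.get(sig, set())
        result[entry_id] = sorted((line, evidence) for line in lines)
    return result


if __name__ == "__main__":
    reference = {
        "P1": ({"PF00509"}, {"GN HA", "KW SIGNAL"}),
        "P2": ({"PF00509"}, {"GN HA"}),
    }
    rules = common_annotation_by_group(reference)
    print(annotate({"Q1": {"PF00509"}}, rules))
```

A taxonomic constraint, as mentioned in the last bullet, would be one more condition checked before a rule's annotation is added to an entry.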

29
Extract Reference Entries
  • Use XDB to extract entries from standard database
  • Example: Pfam PF00509 (Hemagglutinin):
    HEMA_IAVI7/P03435, HEMA_IANT6/P03436,
    HEMA_IAAIC/P03437, HEMA_IAX31/P03438,
    HEMA_IAME2/P03439, HEMA_IAEN7/P03440,
    HEMA_IABAN/P03441, HEMA_IADU3/P03442,
    HEMA_IADA1/P03443, HEMA_IADMA/P03444,
    HEMA_IADM1/P03445, HEMA_IADA2/P03446,
    HEMA_IASH5/P03447

(Diagram labels: Pfam, TrEMBL, SWISS-PROT)
30
Extract Common Annotation
  • 132 entries read (number of entries, then the
    annotation line they share):
    131 ID HEMA_XXXXX
    125 DE HEMAGGLUTININ PRECURSOR.
      6 DE HEMAGGLUTININ.
    131 GN HA
    130 CC -!- FUNCTION HEMAGGLUTININ IS RESPONSIBLE FOR ATTACHING THE
    130 CC VIRUS TO CELL RECEPTORS AND FOR INITIATING INFECTION.
    125 CC -!- SUBUNIT HOMOTRIMER. EACH OF THE MONOMER IS FORMED BY TWO
    125 CC CHAINS (HA1 AND HA2) LINKED BY A DISULFIDE BOND.
     75 DR HSSP P03437 1HGD.
     31 DR HSSP P03437 1DLH.
    131 KW HEMAGGLUTININ GLYCOPROTEIN ENVELOPE PROTEIN
    102 KW SIGNAL
      1 KW COAT PROTEIN POLYPROTEIN 3D-STRUCTURE
    130 FT CHAIN HA1 CHAIN.
    107 FT CHAIN HA2 CHAIN.
    102 FT SIGNAL
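
The tally above (how many of the 132 entries carry each annotation line) can be reproduced with a simple counter. The sketch assumes each entry is available as a list of annotation lines. The `min_fraction` threshold is an illustrative assumption, since the slides do not say how "common" is decided when a line is present in most but not all entries.

```python
from collections import Counter


def annotation_counts(entries):
    """entries: list of entries, each a list of annotation lines.

    Returns a Counter of how many entries contain each line.
    """
    counts = Counter()
    for lines in entries:
        counts.update(set(lines))  # count each line once per entry
    return counts


def shared_lines(entries, min_fraction=0.9):
    """Lines present in at least min_fraction of the entries (assumed rule)."""
    counts = annotation_counts(entries)
    threshold = min_fraction * len(entries)
    return sorted(line for line, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    entries = [["GN HA", "KW SIGNAL"], ["GN HA"], ["GN HA", "KW SIGNAL"]]
    print(annotation_counts(entries))
    print(shared_lines(entries))
```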

31
Store Common Annotation
  • Store the used conditions and the extracted
    common annotation in a separate database

(Diagram labels: XDB, TrEMBL, SWISS-PROT, RuleBase)
32
RULES
  • Rules describe
  • the content of the annotation to be transferred
    (ACTIONS),
  • the CONDITIONS which the target TrEMBL entry
    must fulfill in order to allow transfer of the
    annotation.
  • Rules uniquely describe or delineate a set of
    SWISS-PROT entries.
  • The common annotation in these entries is
    transferred to TrEMBL.

33
//
RULE RU000482
DATE 2001-01-11
USER OPSWFL
PACK PROSITE
?PSAC PS00449
?EMOT PS00449
!ECNO 3.6.1.34
!SPDE ATP synthase A chain
!CCFU KEY COMPONENT OF THE PROTON CHANNEL; IT MAY PLAY A
      DIRECT ROLE IN THE TRANSLOCATION OF PROTONS ACROSS
      THE MEMBRANE (BY SIMILARITY)
!CCSU F-TYPE ATPASES HAVE 2 COMPONENTS, CF(1) - THE
      CATALYTIC CORE - AND CF(0) - THE MEMBRANE PROTON
      CHANNEL. CF(1) HAS FIVE SUBUNITS ALPHA(3), BETA(3),
      GAMMA(1), DELTA(1), EPSILON(1). CF(0) HAS THREE MAIN
      SUBUNITS A, B AND C (BY SIMILARITY)
!CCLO INTEGRAL MEMBRANE PROTEIN (By Similarity)
!CCSI TO THE ATPASE A CHAIN FAMILY
!SPKW CF(0)
!SPKW Hydrogen ion transport
!SPKW Transmembrane
//
(Diagram labels: lines beginning with "?" are the
CONDITIONS; lines beginning with "!" are the ACTIONS.)
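
A rule record of this kind can be split into header fields, conditions and actions with a few lines of code. The sketch assumes the record is laid out one tag per line, with indented continuation lines, and that tags starting with "?" are conditions and tags starting with "!" are actions. The real RuleBase storage format is not described in the slides.

```python
def parse_rule(text):
    """Split a rule record into (header dict, conditions, actions).

    Conditions and actions are lists of (tag, value) tuples with the
    leading '?' or '!' removed from the tag.
    """
    fields = []  # [tag, value] pairs in file order
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped == "//":
            continue                      # skip blanks and record delimiters
        if raw[0].isspace() and fields:
            fields[-1][1] += " " + stripped   # continuation of previous field
            continue
        tag, _, value = stripped.partition(" ")
        fields.append([tag, value.strip()])

    header, conditions, actions = {}, [], []
    for tag, value in fields:
        if tag.startswith("?"):
            conditions.append((tag[1:], value))
        elif tag.startswith("!"):
            actions.append((tag[1:], value))
        else:
            header[tag] = value
    return header, conditions, actions


if __name__ == "__main__":
    record = (
        "//\nRULE RU000482\nPACK PROSITE\n?PSAC PS00449\n"
        "!ECNO 3.6.1.34\n!CCSI TO THE ATPASE A\n      CHAIN FAMILY\n//"
    )
    print(parse_rule(record))
```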
34
Add Annotation to Target
  • Use conditions to extract entries from TrEMBL
  • Add common annotation to the entries

(Diagram labels: XDB, TrEMBL, SWISS-PROT, RuleBase)
35
Automatic annotation using multiple dbs
  • Extract conditions from XDB
  • Group SWISS-PROT by conditions
  • Extract common annotation
  • Group TrEMBL by conditions
  • Add common annotation to TrEMBL

(Diagram labels: ENZYME, Pfam, INTERPRO, PROSITE,
TrEMBL, SWISS-PROT, RuleBase)
36
Using tree structure of InterPro
37
RU000652 with additional condition connected by
AND
//
RULE RU000652
DATE 2001-01-11
USER OPSWFL
PACK PROSITE
?IPRO IPR002379
?PSAC PS00605
?EMOT PS00605
!SPDE ATP synthase C chain (Lipid-binding protein)
      (Subunit C)
!ECNO 3.6.1.34
!CCSU F-TYPE ATPASES HAVE 2 COMPONENTS, CF(1) - THE
      CATALYTIC CORE - AND CF(0) - THE MEMBRANE PROTON
      CHANNEL. CF(1) HAS FIVE SUBUNITS ALPHA(3), BETA(3),
      GAMMA(1), DELTA(1), EPSILON(1). CF(0) HAS THREE MAIN
      SUBUNITS A, B AND C (By Similarity)
!CCSI TO THE ATPASE C CHAIN FAMILY
!SPKW CF(0)
!SPKW Hydrogen ion transport
!SPKW Lipid-binding
!SPKW Transmembrane
//
Additional condition (parent signature)
38
Condition types
  • Signature hits
    - Prosite, Prints, Pfam, Prodom
  • Taxonomy
    - Broad groups like Archaea, Bacteriophage,
      Eukaryota, Prokaryota, Eukaryotic viruses
    - more specific, such as species
  • Organelle
  • Conditions
  • Negated conditions
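
How such conditions might be evaluated against an entry is sketched below. The `Condition` class, the two condition kinds and the entry layout are hypothetical; the sketch only illustrates AND-combination and negation as listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    kind: str              # "signature" or "taxonomy" (hypothetical labels)
    value: str             # e.g. "PS00449" or "Eukaryota"
    negated: bool = False  # True means the condition must NOT hold


def matches(entry, conditions):
    """entry: dict with 'signatures' (set of ids) and 'lineage' (list of taxa).

    All conditions must hold (AND); a negated condition must not hold.
    """
    for cond in conditions:
        if cond.kind == "signature":
            hit = cond.value in entry["signatures"]
        elif cond.kind == "taxonomy":
            hit = cond.value in entry["lineage"]
        else:
            raise ValueError(f"unknown condition kind: {cond.kind}")
        if hit == cond.negated:   # plain condition missed, or negated one hit
            return False
    return True


if __name__ == "__main__":
    entry = {"signatures": {"PS00449"}, "lineage": ["Bacteria"]}
    rule = [
        Condition("signature", "PS00449"),
        Condition("taxonomy", "Eukaryota", negated=True),
    ]
    print(matches(entry, rule))
```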

39
Rule-building
  • Grouping and extraction of common annotation
    - semi-automated, but involves manual data-mining
      assisted by Perl/shell scripts.
  • Algorithmic data-mining
    - fully automated.
    - fast.
    - exhaustive exploration of the
      condition-set/annotation search space.
    - non-biological; validity of rules is being
      assessed by comparison with the semi-manual
      approach.

40
Advantages of this method
  • Uses reliable ref database, prevents propagation
    of incorrect annotation
  • Using common annotation of multiple entries,
    lower over-prediction than from best hit of BLAST
  • Can standardize annotation and nomenclature of
    target sequences, since reference is standardized
  • Can have different levels of common annotation
    from different levels of family hierarchy
  • Independent of multi-domain organisation
  • Evidence tags allow for easy tracking and updating

41
Pitfalls of automatic functional analysis
  • Multifunctional proteins- genome projects often
    assign single function, info is lost in homology
    search
  • Hypothetical proteins (40% of ORFs unknown), and
    poorly or even wrongly annotated proteins
  • No coverage of position-specific annotation eg
    active sites
  • Current methods provide only a phrase describing
    some properties of the unknown protein
  • It is important to have evidence for all
    annotation added

42
EVIDENCE TAGS
43
(No Transcript)
44
Predicting function from non-homology
  • Look at position of genes relative to others,
    compare with other organisms
  • Can still build up rules from annotated sequences
    using information you have on other features like
    fold, physical properties etc.
  • Use physical properties and known attributes

45
Protein functions from regions
  • Active sites- short, highly conserved regions
  • Loops- charged residues and variable sequence
  • Interior of protein- conservation of charged
    amino acids

46
Protein functions from specific residues
  • Polar (C,D,E,H,K,N,Q,R,S,T): active sites
  • Aromatic (F,H,W,Y): protein ligand-binding
    sites
  • Zn-coord (C,D,E,H,N,Q): active site, zinc
    finger
  • Ca2+-coord (D,E,N,Q): ligand-binding site
  • Mg/Mn-coord (D,E,N,S,R,T): Mg2+ or Mn2+
    catalysis, ligand binding
  • Ph-bind (H,K,R,S,T): phosphate and sulphate
    binding
  • C: disulphide-rich, metallothionein, zinc
    fingers
  • DE: acidic proteins (unknown)
  • G: collagens
  • H: histidine-rich glycoprotein
  • KR: nuclear proteins, nuclear localisation
  • P: collagen, filaments
  • SR: RNA binding motifs
  • ST: mucins
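
The residue classes and the compositional hints listed above lend themselves to a simple composition check. The sketch below encodes four of the classes from the slide and reports single residues that are strongly over-represented. The 20% enrichment threshold and all function names are assumptions chosen for illustration.

```python
from collections import Counter

# Residue classes copied from the slide (one-letter amino acid codes).
RESIDUE_CLASSES = {
    "polar": set("CDEHKNQRST"),
    "aromatic": set("FHWY"),
    "Zn-coord": set("CDEHNQ"),
    "Ca-coord": set("DENQ"),
}


def class_fractions(sequence):
    """Fraction of residues in the sequence that fall into each class."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return {
        name: sum(aa in members for aa in seq) / len(seq)
        for name, members in RESIDUE_CLASSES.items()
    }


def enriched_residues(sequence, min_fraction=0.2):
    """Residues making up at least min_fraction of the sequence (assumed cut-off)."""
    seq = sequence.upper()
    counts = Counter(seq)
    return sorted(aa for aa, n in counts.items() if n / len(seq) >= min_fraction)


if __name__ == "__main__":
    print(class_fractions("CCHH"))
    print(enriched_residues("GPGPGPAK"))  # G and P rich, as in collagen-like repeats
```

Such composition hints are weak evidence on their own and would normally be combined with pattern and homology searches.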

47
Supplement annotation with Xrefs to other
databases
  • DDBJ/EMBL/GenBank Nucleotide Sequence Database
  • PDB
  • Genomic databases (FlyBase, MGD, SGD)
  • 2D-Gel databases (ECO2DBASE, SWISS-2DPAGE,
    Aarhus/Ghent, YEPD, Harefield), Gene expression
    data
  • Specialized collections (OMIM, InterPro, PROSITE,
    PRINTS, PFAM, ProDom, SMART, ENZYME, GPCRDB,
    Transfac, HSSP)

48
Approaches to functional annotation
  • Automatic annotation (sequence homology, rules,
    transfer of information from PDB)
  • Automatic classification (pattern databases,
    clustering, structure)
  • Automatic characterisation (functional databases)
  • Context information (comparative genome analysis,
    metabolic pathway databases)
  • Experimental results (2D gels, microarrays)
  • Full manual annotation (SWISS-PROT style)

49
AUTOMATIC CLASSIFICATION
Annotation can be done using clustering methods, e.g.
CluSTr (EBI), and pattern searches (InterPro
etc.): classification of proteins into different
families
50
(No Transcript)
51
(No Transcript)
52
AUTOMATIC CHARACTERIZATION- FUNCTIONAL ANNOTATION
SCHEMES
  • First attempt: Riley classification of E. coli
  • Genome sequencing projects driving force
  • Need standardised system and vocabulary
  • Functional schemes normally hierarchies of
    different levels of generalisation

53
Databases for Functional Information
  • KEGG - Kyoto Encyclopedia of Genes and Genomes
    (http://www.genome.ad.jp/kegg/)
    - Links genome information (GENES database) to
      high-order functional information stored in the
      PATHWAY database.
    - Also has the LIGAND database for chemical
      compounds, molecules and reactions.
  • PEDANT - Protein Extraction, Description and
    Analysis Tool (http://pedant.gsf.de/)
    - Annotation for complete and incomplete genomes,
      e.g. list of ORFs, EC numbers, functional
      categories, list of sequences with homologs, gene
      clusters, domain hits, TM, structure links,
      search facility for sequences etc.
  • WIT - What Is There
    (http://www.cme.msu.edu/WIT)
    - Database of metabolic pathways; can text search
      for ORFs, pathways, enzymes

54
Databases for Functional Information (2)
  • COG - Clusters of Orthologous Groups
    (http://www.ncbi.nlm.nih.gov/COG)
    - Phylogenetic classification of proteins encoded
      in complete genomes.
    - Contains 2791 COGs covering 30 genomes.
    - COGs are thought to contain orthologous proteins,
      classified into broad functional categories
      (transcription, replication, cell division).
    - COGNITOR assigns proteins to COGs based on
      best hit, and divides multi-domain proteins
    - Can compare results with complete genomes and
      look for missing functions
  • GO - Gene Ontology
    (http://www.geneontology.org)
    - Standard vocabulary first used for mouse, fly and
      yeast
    - Three ontologies: molecular function, biological
      process and cellular component

55
Databases for Functional Information (3)
  • MIPS MYGD FunCat - Functional catalogue (yeast)
    http://www.mips.biochem.mpg.de/proj/yeast
  • EcoCyc - Encyclopedia of E. coli Genes and
    Metabolism
    http://ecocyc.doubletwist.com/ecocyc/ecocyc.html
  • Enzyme database
    http://www.expasy.ch/sprot/enzyme.html
  • TIGR Gene identification list
    http://www.tigr.org/tdb/mdb/mdb.html
  • All schemes have different depths, breadths and
    resolutions
  • Schemes need to be applicable to all organisms,
    standardized for comparisons and permit multiple
    assignments

56
Assignment of function
  • Use a combination of databases, especially those
    with standardised functional information
  • Search function databases with sequences to find
    matches and assign function, e.g. PEDANT, PIR
    superfamilies, COGs, GO (via InterPro)

57
FUNCTIONAL CLASSIFICATION USING INTERPRO
  • InterPro classification with 3-4 letter codes
  • Mapping of InterPro entries to GO
  • GO - Gene Ontology (SGD, FB, MGD): universal
    ontology for
  • molecular function
  • biological process
  • cellular component

58
Classification of IPRs
  • CGD Cell cycle/growth/death
    - CGDc cell cycle/division
    - CGDg cell growth/development
    - CGDd cell death
  • CYS Cytoskeletal/structural
    - CYSc cytoskeletal
    - CYSs structural
    - CYSv virus coat/capsid protein
  • DPT Defense/pathogenesis/toxin
  • DRG DNA/RNA-binding/regulation
  • DRM DNA/RNA metabolism
    - DRMr DNA repair/recombination
    - DRMp DNA replication
    - DRMm DNA/RNA modification
    - DRMt transcription/translation
    - DRMb ribosomal protein
  • MET Metabolism
    - METs substrate metabolism
    - METe electron transfer
    - METa amino acid metabolism
    - METn nucleic acid metabolism
    - METm metal binding proteins
  • OTH Other functions
    - OTHm cell motility
    - OTHt transposition
    - OTHa cell adhesion
    - OTHg miscellaneous functions
    - OTHh hormones
    - OTHi immune-response proteins
    - OTHf multifunctional proteins
    - OTHo multifunctional domains
  • PFD Protein folding and degradation
    - PFDc chaperone
    - PFDp protease/endopeptidase
    - PFDi protease inhibitor
  • PRG Protein-binding/other regulation
    - PRGg GPCRs
    - PRGr other receptors
    - PRGo other regulation
  • STD Signal transduction
    - STDk signal transduction kinases
    - STDp signal transduction phosphatases
    - STDr signal transduction response regulators
    - STDs signal transduction sensors
    - STDc cell signalling
  • TRS Transport and secretion
    - TRSt transport (substrates)
    - TRSi transport (ions)
    - TRSs secretion
    - TRSr carrier proteins
  • UNK Unknown function
59
(No Transcript)
60
(No Transcript)
61
Pie charts of whole proteome analysis of 4
organisms
62
Distribution of protein functions
63
GENOME ANNOTATION TOOLS
  • Oak Ridge Genome Annotation Channel
    (http://compbio.ornl.gov/channel/)
  • ENSEMBL (http://ensembl.ebi.ac.uk)
  • Artemis (http://www.sanger.ac.uk/Software/Artemis):
    sequence viewer and annotation tool
  • GeneQuiz (http://www.sander.ebi.ac.uk/genequiz/):
    system for automated annotation of sequences, web
    access required
  • Genome Annotation Assessment Project (GASP1)
    (http://www.fruitfly.org/GASP1)

64
PEDANT SYSTEM
  • Layer 1: bioinformatics tools, databases for
    searching, and a parser of results
    - Tools: PSI-BLAST, IMPALA, PREDATOR, CLUSTALW,
      TMAP, SIGNALP, SEG, PROSEARCH, COILS, HMMER
    - Databases: MIPS, PROSITE, BLOCKS, PIR, COGS
  • Layer 2: database to store information (MySQL);
    manual annotation tool
  • Layer 3: user interface to display results
  • Programs are written in Perl5 and some in C, so
    they are portable. Processing of one sequence
    takes about 3 minutes
65
Summary of protein sequence annotation
  • Mask compositionally-biased and coiled-coil
    regions
  • Identify transmembrane regions, signal peptides,
    GPI anchors
  • Predict secondary structure
  • Look for known domains from protein pattern
    databases
  • Search sequence database for similar sequences
  • If no or few results search with subsequences, do
    iterative searches
  • Functional annotation: consider function of each
    domain present, annotation from database
    homologs, function from hits with 3D structure

66
SUMMARY OF ANNOTATION PIPELINE
NB: look out for multi-domain proteins and put
results into genome context.
Supplement with manual curation and use evidence
tags.
67
LIMITS OF PROTEIN SEQUENCE ANALYSIS
  • Predicting function from sequence requires
    another sequence to be mapped to a function; there
    are many hypothetical proteins in the databases,
    and UPFs
  • If sequence homologues are found, they may not be
    functional homologues: a qualitative rather than
    quantitative process
    - orthologues may have different functions
    - enzyme homologues may be inactive
    - equivalent functions may use different genes
      that are not orthologues
  • Analogy can infer molecular function, but not
    necessarily cellular function

68
LIMITS OF PROTEIN SEQUENCE ANALYSIS (2)
  • Databases are biased in sequence and aa
    composition and search is dependent on size
  • If no homology found- limited amount of
    information can be inferred
  • Incorrect annotation can be propagated when
    similarity is over a part of the sequence not used
    in the annotation
  • No answers to tissue-specificity, binding of
    ligands, relationship between genotype and
    phenotype

69
LIMITS OF PROTEIN SEQUENCE ANALYSIS (3)
  • Need additional information from experiments, eg
    can predict glycosylation sites, but not kind of
    sugar attached
  • Problem with multidomain proteins (assign
    orthology on basis of domains or domain
    composition of whole protein?) -check also known
    domain architectures and their taxonomic
    limitations

70
Using different approaches to functional
annotation: Status for SPTR
  • Automatic annotation (RuleBase): 20% of all
    protein sequences / 20% of all new sequences
  • Automatic classification (InterPro, CluSTr,
    Structure): 60% of all protein sequences / 60% of
    all new sequences
  • Automatic characterisation (GO): 40% of all
    protein sequences / 40% of all new sequences
  • Full annotation (SWISS-PROT style): 20% of all
    protein sequences / 5% of all new sequences

71
Using different approaches to functional
annotation: Future for SPTR
  • Automatic annotation (RuleBase): 50% of all
    protein sequences in 2004
  • Automatic classification (InterPro, CluSTr,
    Structure): 90% of all protein sequences in 2004
  • Automatic characterisation (GO): 70% of all
    protein sequences in 2004
  • Full annotation (SWISS-PROT style): 10% of all
    protein sequences in 2004

72
IMPORTANT TO NOTE
  • DON'T COMPLETELY TRUST COMPUTER RESULTS
  • CHECK LITERATURE
  • CONFIRM WITH WETLAB WORK- mutational analysis
    gives valuable info about function
  • COMPROMISE BETWEEN OVER AND UNDER-PREDICTIONS
    -overpredictions can be checked by curators,
    easier to delete than find missing info.
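
The trade-off between over- and under-prediction can be made concrete by comparing a set of predicted annotation lines with a curated set for the same entry. The function and field names below are hypothetical, and the precision and recall measures are standard metrics added for illustration rather than taken from the slides.

```python
def prediction_errors(predicted, curated):
    """Compare predicted annotation lines against a curated reference set."""
    predicted, curated = set(predicted), set(curated)
    correct = predicted & curated
    return {
        "over": predicted - curated,    # over-predictions: a curator can delete these
        "under": curated - predicted,   # under-predictions: missing information
        "precision": len(correct) / len(predicted) if predicted else 0.0,
        "recall": len(correct) / len(curated) if curated else 0.0,
    }


if __name__ == "__main__":
    report = prediction_errors(
        ["KW Transmembrane", "KW CF(0)", "KW Signal"],
        ["KW Transmembrane", "KW CF(0)", "KW Hydrogen ion transport"],
    )
    print(report)
```

Following the slide's reasoning, a system that will be reviewed by curators may prefer higher recall at some cost in precision, since deleting an over-prediction is easier than finding missing information.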