Title: UniRef Sequence clusters
UniProt - The Universal Protein Resource
- The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.
- UniProt provides four databases, each optimized for different uses: UniProtKB, UniRef, UniParc and UniMES.
- UniProt is produced by the UniProt Consortium (SIB, EBI, PIR).
- Contact: help@uniprot.org
- Web site: www.uniprot.org
UniProtKB - Protein sequence knowledgebase
UniProtKB gives access to publicly available protein sequences. UniProtKB is composed of 2 sections:
- UniProtKB/Swiss-Prot
  - Manually annotated (Reviewed)
  - Merge of all sequences derived from the same gene and the same species: low redundancy and high accuracy of the protein sequence
  - Integration of biological data from publications, external expertise, as well as high-performance bioinformatic tools, etc.: high-quality manual annotation
  - Addition of cross-references to relevant databases (links to about 100 databases are available): central hub for biological data
UniRef - Sequence clusters
- UniRef supports comprehensive BLAST similarity searches by providing 3 collections of sequence clusters (UniRef100, UniRef90 and UniRef50) based on UniProtKB and selected UniParc records.
- One UniRef90 entry groups sequences that have 90% or more identity across species (database reduction of 40%).
- The use of UniRef50 reduces the size of the database by 65%.
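To make the identity thresholds and the quoted size reductions concrete, here is a toy sketch in Python. It is not how UniRef is actually built: `identity` assumes equal-length, pre-aligned sequences, `greedy_cluster` is a hypothetical simplified clustering scheme, and the example sequences are invented for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(seqs: list[str], threshold: float) -> list[list[str]]:
    """Put each sequence into the first cluster whose representative (its first
    member) is at least `threshold` identical; otherwise start a new cluster."""
    clusters: list[list[str]] = []
    for s in seqs:
        for c in clusters:
            if identity(c[0], s) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def reduction(n_before: int, n_after: int) -> float:
    """Relative size reduction, e.g. 0.40 for a database that is 40% smaller."""
    return 1 - n_after / n_before


# Invented toy sequences, not real UniProt records.
seqs = ["MKTAYIAKQR", "MKTAYIAKQR", "MKTAYIAKQK", "MKTAYLLLLL", "GGGGGGGGGG"]
for threshold in (1.0, 0.9, 0.5):
    clusters = greedy_cluster(seqs, threshold)
    print(threshold, len(clusters), round(reduction(len(seqs), len(clusters)), 2))
```

Lowering the threshold merges more sequences into each cluster, which is why a 50% collection is smaller than a 90% collection built from the same input.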
- UniProtKB/TrEMBL
  - Computer annotated (Unreviewed)
  - Merge of 100% identical sequences from the same organism
  - Protein family and domain attribution (InterPro)
  - Automated annotation
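The "merge of 100% identical sequences from the same organism" rule can be illustrated with a short Python sketch. The record layout `(accession, organism, sequence)`, the function name `merge_identical` and the accessions are assumptions made for this example, not the actual TrEMBL pipeline.

```python
from collections import defaultdict


def merge_identical(records):
    """Group accessions that share both organism and exact sequence.

    `records` is an iterable of (accession, organism, sequence) tuples;
    this layout is hypothetical and chosen only for illustration.
    """
    merged = defaultdict(list)
    for accession, organism, sequence in records:
        merged[(organism, sequence)].append(accession)
    return dict(merged)


# Invented records: same sequence twice in one organism, once in another.
records = [
    ("A1", "Homo sapiens", "MKTAYIAKQR"),
    ("A2", "Homo sapiens", "MKTAYIAKQR"),
    ("A3", "Mus musculus", "MKTAYIAKQR"),
]
merged = merge_identical(records)
print(len(merged))
```

Identical sequences from different organisms stay in separate entries because the organism is part of the grouping key.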
UniParc - Sequence archive
- UniParc allows the tracking of a protein sequence and of its integration into various databases.
- UniParc gives access to archived non-redundant protein sequences (records active or not) found in publicly accessible databases (UniProtKB, PIR, EMBL, Ensembl, IPI, PDB, RefSeq, FlyBase, WormBase, Patent Offices).
- Each entry contains a protein sequence, taxonomy data and cross-references to source databases.
- Use with caution: also contains pseudogenes, incorrect CDS predictions, etc.
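As a rough illustration of a non-redundant archive that tracks where each sequence appears, the following Python sketch stores one entry per unique sequence and accumulates cross-references with an active flag. The class names, the use of an MD5 digest as the key, and the identifiers are assumptions for this example; taxonomy data is omitted for brevity.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ArchiveEntry:
    sequence: str
    # Each cross-reference is (database, identifier, active).
    xrefs: list = field(default_factory=list)


class SequenceArchive:
    """Hypothetical toy archive: one entry per distinct sequence."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def key_for(sequence: str) -> str:
        # Assumption: an MD5 digest of the sequence serves as the entry key.
        return hashlib.md5(sequence.encode("ascii")).hexdigest()

    def add(self, sequence, database, identifier, active=True):
        key = self.key_for(sequence)
        entry = self._entries.setdefault(key, ArchiveEntry(sequence))
        entry.xrefs.append((database, identifier, active))
        return key

    def xrefs(self, sequence):
        entry = self._entries.get(self.key_for(sequence))
        return list(entry.xrefs) if entry else []

    def __len__(self):
        return len(self._entries)


archive = SequenceArchive()
k1 = archive.add("MKTAYIAKQR", "UniProtKB", "ID_A")            # invented identifier
k2 = archive.add("MKTAYIAKQR", "RefSeq", "ID_B", active=False)  # inactive record
k3 = archive.add("GGGGGGGGGG", "PDB", "ID_C")
print(len(archive), archive.xrefs("MKTAYIAKQR"))
```

Keeping inactive cross-references rather than deleting them is what lets such an archive show the history of a sequence across databases.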
[Figure labels: EMBL/GenBank/DDBJ, Ensembl, VEGA, RefSeq, other protein resources]
UniMES - Metagenomic and Environmental Sequences
Currently the database contains only data from the Global Ocean Sampling Expedition (GOS). UniMES is released in FASTA format together with a file of UniMES matches to InterPro methods.
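Since the release is distributed as FASTA, a minimal reader is sketched below in Python. It handles only the basic convention (a `>` header line followed by one or more sequence lines); the function name `parse_fasta` and the sample records are invented, and the header layout of real UniMES files is not assumed here.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Invented sample, not real UniMES data.
sample = """>seq1 hypothetical ocean sample
MKTAYIAK
QRQISFVK
>seq2
GGGG
"""
records = list(parse_fasta(sample))
print(records)
```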
Swiss Institute of Bioinformatics (SIB)
European Bioinformatics Institute (EMBL-EBI)
Protein Information Resource (PIR)
UniProt is mainly supported by the National
Institutes of Health (NIH) grant 2 U01
HG02712-04. Additional support for the EBI's
involvement in UniProt comes from the European
Commission (EC)'s FELICS grant (021902RII3) and
from the NIH grant 1R01HGO2273-01.
UniProtKB/Swiss-Prot activities at the SIB are
supported by the Swiss Federal Government through
the Federal Office of Education and Science. PIR
activities are also supported by the NIH grants
and contracts HHSN266200400061C, NCI-caBIG, and
1R01GM080646-01, and the National Science
Foundation (NSF) grant IIS-0430743.