Title: GenBank
1GenBank
- Nucleotide-only sequence database
- Archival in nature
- Data shared nightly among three collaborating databases
- - GenBank at NCBI
- - DNA Data Bank of Japan (DDBJ)
- - EMBL at EBI
2The International Sequence Database Collaboration
Source: NCBI
3NCBI site map: a good place to find resources
http://www.ncbi.nlm.nih.gov/Sitemap/index.html
4GenBank Release 131.0, December 15, 2003
- 30,968,418 sequences
- 36,553,368,485 bases
- Full release every two months
- Incremental and cumulative updates daily
- Available only over the Internet
ftp://ftp.ncbi.nih.gov/genbank/
5GenBank Record
- Header
- - information that applies to the whole record
- Features
- - annotations on the record
- Sequence
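The three-part layout above can be illustrated with a minimal sketch that splits one flat-file record into header, features, and sequence. It assumes the usual flat-file markers (a `FEATURES` line, an `ORIGIN` line, and a closing `//`); the sample record and the function name `split_record` are hypothetical and exist only for illustration.

```python
# Hypothetical, heavily abbreviated record used only for illustration.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE1                  24 bp    mRNA    linear   PRI 01-JAN-2004
DEFINITION  Illustrative record, not real data.
ACCESSION   EXAMPLE1
VERSION     EXAMPLE1.1
FEATURES             Location/Qualifiers
     source          1..24
                     /organism="Homo sapiens"
     CDS             1..24
ORIGIN
        1 atggcgacca ttgagctgaa ctaa
//
"""


def split_record(text):
    """Return (header, features, sequence) for a single flat-file record."""
    header, features, seq_parts = [], [], []
    section = "header"
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            section = "features"      # annotations start after this line
            continue
        if line.startswith("ORIGIN"):
            section = "sequence"      # sequence data starts after this line
            continue
        if line.startswith("//"):
            break                     # end-of-record marker
        if section == "header":
            header.append(line)
        elif section == "features":
            features.append(line)
        else:
            # Keep letters only: drops position numbers and spacing.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
    return "\n".join(header), "\n".join(features), "".join(seq_parts)


header, features, sequence = split_record(SAMPLE_RECORD)
```

A production parser would also need to handle multi-record files and records without an `ORIGIN` section; this sketch deliberately ignores both.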
6GenBank Record
Header
- Locus Name
- Sequence Length
- Molecule Type
- GenBank Division
- Modification Date
- Accession Number
- Version Number
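As a rough illustration of where these header fields live, the sketch below pulls them out of a `LOCUS` line and a `VERSION` line. It assumes whitespace-separated fields in the common order (name, length, `bp`, molecule type, optional topology, division, date); the sample lines, the `Locus` tuple, and both function names are hypothetical.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    name: str
    length: int
    molecule_type: str
    division: str
    modification_date: str


def parse_locus_line(line):
    """Extract header fields from a LOCUS line (assumed field order)."""
    fields = line.split()
    if not fields or fields[0] != "LOCUS" or "bp" not in fields:
        raise ValueError("not a recognizable LOCUS line")
    bp = fields.index("bp")
    return Locus(
        name=fields[1],
        length=int(fields[bp - 1]),       # number just before "bp"
        molecule_type=fields[bp + 1],     # e.g. mRNA, DNA
        division=fields[-2],              # three-letter division code
        modification_date=fields[-1],
    )


def parse_version_line(line):
    """Split 'VERSION  ACC.N' into (accession, version number)."""
    accession, _, version = line.split()[1].rpartition(".")
    return accession, int(version)


# Illustrative values only, not a real record.
locus = parse_locus_line(
    "LOCUS       EXAMPLE1                  24 bp    mRNA    linear   PRI 01-JAN-2004"
)
accession, version = parse_version_line("VERSION     EXAMPLE1.1")
```

Reading the division and date from the end of the line is a design choice: it keeps the sketch working whether or not a topology field is present.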
7GenBank Record
Features
- Link to sequence
8GenBank Record
Sequence
9Entrez
10Entrez
http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
Select GenBank
11Find mRNA sequence for human epidermal growth
factor receptor
12Specify human as the organism
Click Preview/Index
Specify human by selecting Organism from the All Fields drop-down menu
14Limit your search
Exclude all technology-generated records
Select mRNA in the Molecule list
Select RefSeq in the database list
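The same limits can also be written as a single Entrez query string. The sketch below is a hedged illustration: the function `build_entrez_query` is hypothetical, and the property terms (`biomol_mrna`, `srcdb_refseq`, `gbdiv_est`, `gbdiv_sts`, `gbdiv_gss`) are assumptions about Entrez nucleotide query syntax that are not shown on the slides and should be checked against the Entrez help pages.

```python
def build_entrez_query(phrase, organism=None, molecule=None,
                       source_db=None, exclude_divisions=()):
    """Combine a search phrase with Entrez-style field limits."""
    parts = [f'"{phrase}"[All Fields]']
    if organism:
        parts.append(f"{organism}[Organism]")
    if molecule:
        # Assumed property term, e.g. biomol_mrna
        parts.append(f"biomol_{molecule.lower()}[Properties]")
    if source_db:
        # Assumed property term, e.g. srcdb_refseq
        parts.append(f"srcdb_{source_db.lower()}[Properties]")
    query = " AND ".join(parts)
    for division in exclude_divisions:
        # Assumed property term, e.g. gbdiv_est for EST records
        query += f" NOT gbdiv_{division.lower()}[Properties]"
    return query


query = build_entrez_query(
    "epidermal growth factor receptor",
    organism="human",
    molecule="mRNA",
    source_db="RefSeq",
    exclude_divisions=("EST", "STS", "GSS"),  # technology-generated records
)
```

Writing the limits out this way makes a search reproducible, since the query can be pasted into the search box instead of re-clicking each limit.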
15RefSeq
- Database of reference sequences
- Curated
- Non-redundant: one record for each gene, or each splice variant, from each organism represented
- Each record is intended to present an encapsulation of the current understanding of a gene or protein, similar to a review article (source: RefSeq FAQ)
16Molecular databases
17Find Gene Name by searching LocusLink
http://www.ncbi.nlm.nih.gov/LocusLink/
Select organism
18LocusLink
19Find mRNA sequence for epidermal growth factor
receptor (EGFR)
Start with the gene name EGFR
- Limit search to
- - Gene Name
- Exclude all technology-generated records
- Select mRNA as Molecule
- Select RefSeq as source database
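A gene-name search like this one can also be issued programmatically. The sketch below only builds the request URL and makes no network call. It assumes NCBI's E-utilities `esearch` endpoint as it exists today, which the slides do not mention; the field tag and property terms in `term` are likewise assumptions, and `esearch_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint: present-day NCBI E-utilities, not part of the slides.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="nucleotide", retmax=20):
    """Build an esearch URL for the given query term (no request is sent)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH_BASE}?{urlencode(params)}"


# Gene-name search limited to mRNA records from RefSeq (assumed terms).
term = "EGFR[Gene Name] AND biomol_mrna[Properties] AND srcdb_refseq[Properties]"
url = esearch_url(term)
```

`urlencode` takes care of escaping the spaces and square brackets in the query, which is the part most easily gotten wrong when the URL is assembled by hand.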
20Entrez Neighbors and Hard Links
[Figure: Entrez databases connected by neighbors and hard links. Labels: word weight, 3-D structure, VAST, phylogeny, protein sequences, BLAST. Source: NCBI]
21SRS List of Public SRS Servers
22SRS List of Public SRS Servers
23SRS Tutorial
24http://srs.ebi.ac.uk
Database information
- which databases are present
- when they were indexed
25What is SRS?
- Central resource for molecular biology data
- Data retrieval system
- - more than 250 databanks have been indexed
- - more than 35 SRS servers on the WWW
- Data analysis applications server
- - 11 protein applications
- - 6 nucleic acid applications
- Uniform query interface on the web
26History of SRS
- 1990
- - Main author: Dr. Thure Etzold
- - Development started at EMBL, Heidelberg
- 1997
- - Moved to EBI in Cambridge. Development work was supported by various grants, among others from EMBnet.
- 1998
- - Etzold and his group join LION Bioscience
27Why SRS?
- Information retrieval
- - Easy way to retrieve information from sequence and sequence-related databases
- - Possibility to search for multiple words/other criteria
- Linkage between different databases
- - E.g. find all primary structures with known three-dimensional structure
- ... and much more
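The linkage example (primary structures with a known three-dimensional structure) amounts to following cross-references from one database into another. The sketch below shows that idea on hypothetical in-memory data; it does not use SRS itself or its query language, and all identifiers are made up.

```python
# Hypothetical data: sequence entry ID -> cross-referenced structure IDs.
SEQUENCE_XREFS = {
    "SEQ_A": {"STR_1", "STR_2"},
    "SEQ_B": set(),          # no structure cross-reference
    "SEQ_C": {"STR_9"},      # points at an entry the structure set lacks
}

# Hypothetical set of entry IDs present in the structure database.
STRUCTURE_DB = {"STR_1", "STR_2", "STR_3"}


def sequences_with_structure(xrefs, structures):
    """Return sequence IDs with at least one cross-reference found in `structures`."""
    return sorted(seq for seq, refs in xrefs.items() if refs & structures)


linked = sequences_with_structure(SEQUENCE_XREFS, STRUCTURE_DB)
```

With this data only `SEQ_A` qualifies, because a cross-reference counts only when the target entry actually exists in the other database.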
28Philosophy of SRS
Original database file: plain text, HTML, XML
29The Library Select Page
30SRS main toolbar tabs
- Top Page: displays databases in different database groups
- Query: displays either the standard or extended query form
- Results (the query manager): maintains a history of all the results obtained during a session
- Projects (the project manager): maintains a history of all queries and views used during a session
- Views: allows a user to define a user-specific view for one or more databases
- Databanks: contains a list of, and some facts about, the databases available in the system
31Search terms in SRS
- SRS indexed fields can be searched using any of the following:
- Single word search
- Multiple word phrases
- Numbers and dates
- Regular expressions
- Wildcards
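To make the difference between single-word, wildcard, and regular-expression searches concrete, here is a small sketch over a hypothetical list of indexed words. Python's `fnmatch` and `re` modules stand in for SRS's own matching; the slides do not show SRS's exact pattern syntax, so the patterns below should be read as illustrations rather than SRS queries.

```python
import fnmatch
import re

# Hypothetical indexed words from one field.
INDEX = ["kinase", "kinesin", "receptor", "receptors", "egfr"]


def match_word(index, word):
    """Single word search: exact, case-insensitive match."""
    return [w for w in index if w == word.lower()]


def match_wildcard(index, pattern):
    """Wildcard search: '*' matches any run of characters, '?' one character."""
    return [w for w in index if fnmatch.fnmatchcase(w, pattern.lower())]


def match_regex(index, pattern):
    """Regular-expression search: the whole word must match the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [w for w in index if rx.fullmatch(w)]


exact = match_word(INDEX, "EGFR")
wild = match_wildcard(INDEX, "kin*")
regex = match_regex(INDEX, r"receptors?")
```

Wildcards are usually enough for prefix searches such as `kin*`, while regular expressions are useful when the variation is in the middle or at the end of a word.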