Title: BLAST Sequence Searching in Registry
1BLAST Sequence Searching in Registry
- Soichi Tokizane
- November 2002
2You will learn
- How sequences are represented in the Registry file today
- How to use BLAST for similarity searching
- Techniques for finding references to BLAST results
3Sequence Information from CAS
4CAS creates the Registry database
CAS Registry growth since 1965
Substances Registered (millions)
5CA has covered biochemistry journals and patents
since 1907
Oxidizing enzymes - (III) Specific nature of
tyrosinase and its action on products of
disintegration of protein compds. Arch.
Sci. phys. nat. gen. 24 1907
6Today, the CA database contains a very complete
bioscience collection
It covers
- Journals and patents from
- More than 3,000 bioscience titles
- Patents from 33 countries plus EP and WO
- Over 500 books and over 300 book series
- Conference proceedings
- Dissertations
7Over 40% of the 21.5 million bibliographic records in CA cover bioscience information
8Biomolecules (sequences) are a major substance
class in REGISTRY
36 million substances
9Virtually all types of sequences are covered in
Registry
- Sequences from earlier literature
- Novel nucleic acid primers and probes
- Protein sequences deduced from gene translation and ESTs
- Sequences with uncommon or non-natural residues
- Chemically modified sequences
- Fusion proteins
- Genetically engineered sequences
- Peptide nucleic acids (PNAs)
10BLAST Sequence Similarity Searching
11Registry offers several sequence search techniques
- BLAST similarity (homology) searching
- Similarity searching is the retrieval of sequence matches based on identity, conservation, and gaps
- Sequence code match: exact, family, motif, pattern
- Sequence name search
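To make "identity, conservation, and gaps" concrete, here is a minimal Python sketch that counts the three quantities for a pair of already-aligned strings. The residue groups are a toy assumption chosen for illustration; BLAST itself scores conservation with a substitution matrix, and the function names are hypothetical.

```python
# Toy grouping of amino acids with similar side-chain chemistry.
# This is an illustrative assumption, not the scoring matrix BLAST uses.
CONSERVATIVE_GROUPS = [
    set("ILVM"),   # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("KRH"),    # basic
    set("DE"),     # acidic
    set("STNQ"),   # polar, uncharged
    set("AG"),     # small
]


def is_conservative(a: str, b: str) -> bool:
    """Return True if both residues fall in the same toy group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def alignment_stats(query: str, subject: str) -> dict:
    """Count identities, conservative substitutions, and gaps in an
    already-aligned pair of equal-length strings ('-' marks a gap)."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identities = conserved = gaps = 0
    for q, s in zip(query.upper(), subject.upper()):
        if q == "-" or s == "-":
            gaps += 1
        elif q == s:
            identities += 1
        elif is_conservative(q, s):
            conserved += 1
    return {
        "length": len(query),
        "identities": identities,
        "conserved": conserved,
        "gaps": gaps,
    }


# Hypothetical aligned pair: one gap, one conservative change (L/I),
# one non-conservative change (L/K).
stats = alignment_stats("GYLGG-FLLV", "GYIGGAFLKV")
print(stats)
```

Dividing `identities` by `length` gives a percent identity comparable in spirit to the figure shown on a BLAST alignment, although the exact scores there depend on the matrix and gap penalties in use.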
12BLAST is a similarity matching algorithm
- BLAST stands for Basic Local Alignment Search Tool
- Produced and offered by the U.S. National Center for Biotechnology Information (NCBI)
- Designed to quickly compare nucleic acid and amino acid sequences against desired databases
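The speed of BLAST comes from a seed-and-extend strategy: find short words shared by query and subject, then extend each match outward. The Python sketch below is a deliberately simplified illustration of that idea, not NCBI's implementation. It extends only while residues are identical, whereas BLAST scores extensions with a substitution matrix and a drop-off threshold. All function names and the word size are assumptions.

```python
from collections import defaultdict


def index_words(subject: str, w: int) -> dict:
    """Map every length-w word in the subject to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - w + 1):
        index[subject[i:i + w]].append(i)
    return index


def extend_seed(query: str, subject: str, qi: int, si: int, w: int):
    """Extend an exact word match left and right while residues stay
    identical (a simplification of BLAST's scored extension)."""
    q_start, s_start = qi, si
    while q_start > 0 and s_start > 0 and query[q_start - 1] == subject[s_start - 1]:
        q_start -= 1
        s_start -= 1
    q_end, s_end = qi + w, si + w
    while q_end < len(query) and s_end < len(subject) and query[q_end] == subject[s_end]:
        q_end += 1
        s_end += 1
    return q_start, s_start, q_end - q_start


def best_local_match(query: str, subject: str, w: int = 3):
    """Return (query_start, subject_start, length) of the longest ungapped
    exact match grown from any shared word, or None if no word is shared."""
    index = index_words(subject, w)
    best = None
    for qi in range(len(query) - w + 1):
        for si in index.get(query[qi:qi + w], []):
            hit = extend_seed(query, subject, qi, si, w)
            if best is None or hit[2] > best[2]:
                best = hit
    return best


# The subject is the start of the collagen query from the Search Application
# slide; the query is a fragment of it.
subject = "MRAWIFFLLCLAGRALAAPLADYKDDDDKPGYLGG"
print(best_local_match("AAPLADYKDDDDKP", subject))
```

Indexing the subject once and looking up query words avoids comparing every position against every other position, which is the main reason a heuristic of this kind is faster than a full dynamic-programming alignment.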
13Search Application
Find patent references for sequences similar to
the following recombinant human collagen. Conduct
a comprehensive search in Registry on STN.
MRAWIFFLLCLAGRALAAPLADYKDDDDKP GYLGGFLLVLHSQTDQEP
TCPLGMPRLWTG YSLLYLEGQEKAHNQDLGLAGSCLPVFSTL
HQVCHYAQRNDRSYWLASAAPLPRAWIFF MMPLSEEAIRPYVSRCAVC
EAPAQAVAVHS QDQSIPPCPQTWRSLWIGYSFLMHTGAGDQ
GGGQALMSPRAAPFLECQGRQGTLADY CHFFANKYSFWLTTVKADLQ
FSSAPAPDTL KESQAISRCQVCVKYS
14CAS Registry BLAST via STN on the Web is easy to
use
- 1. Install sequence plug-in
- 2. Conduct Registry BLAST similarity search
- 3. Search selected BLAST answers in STN to
get the literature references
15BLAST is available via STN on the Web
- A plug-in must be downloaded and installed before using the BLAST module
- It is a one-time only requirement
- The plug-in is free
- Clicking on Get Sequence Plug-in takes you to
easy-to-use Instructions
16Plug-in instruction page
17Conduct Registry BLAST Similarity Search
18Follow these steps for Registry BLAST searching
- Launch CAS Registry BLAST
- Submit sequence query
- Examine results and return to STN
- Continue searching in STN on the Web
19Logon to STN on the Web and select the Sequence
Assistant
20Select from one of three STN online options
before launch
Click on Launch button
21The main and new search windows appear
22Submit sequence query
- In a new session, the only available option is Similar Sequences
- Fast BLAST is available after the first search
- Click on the Similar Sequences button to open the Search by Sequence query page
23The Search by Sequence screen is easy to use
24Type in a result name
- Type desired name for sequence search
- Alpha or numeric
- Spaces and punctuation allowed
- STN will assign a sequential number if you do not name the search
- The name can also be changed later in the Main Menu
25Recall Sequence is useful for re-submitting the
same query with different settings
- The most recently searched sequence is stored in a buffer that can be retrieved using this function
- This function is grayed out when you first begin
26Read from File allows you to upload directly from
a file
- The file can be
- A text file (e.g. .txt)
- In GCG or FASTA format
- An STN record (SQIDE display)
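For readers preparing a query file outside STN, the following Python sketch shows how FASTA-formatted text is typically structured and parsed: a header line beginning with ">" followed by one or more sequence lines. The function name and the example header are hypothetical; the plug-in's own file reader is not documented here and may behave differently.

```python
def read_fasta(text: str) -> list:
    """Parse FASTA-formatted text into (header, sequence) pairs.
    Header lines start with '>'; sequence lines are joined and any
    internal whitespace is removed."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical file content; the residues are the start of the collagen query.
example = """>query1 example
MRAWIFF LLCLAG
RALAAP
"""
print(read_fasta(example))
```

To read from disk, pass the file contents, for example `read_fasta(open("query.fasta").read())`, where `query.fasta` is a placeholder path.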
27The sequence query must be in 1-letter code
- The sequence query can be
- Copied and pasted
- Read from File
- Typed directly
- A recalled sequence
- The sequence length limit is 50,000 characters
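A query can be checked against these two requirements before it is pasted in. The Python sketch below converts 3-letter amino acid codes to 1-letter code and enforces the 50,000-character limit. The function names are hypothetical, and counting the limit after whitespace is removed is an assumption, since the slide does not say whether spaces count.

```python
MAX_QUERY_LENGTH = 50_000  # limit stated on the slide

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(residues: str) -> str:
    """Convert a hyphen- or space-separated 3-letter sequence to 1-letter code."""
    tokens = residues.replace("-", " ").split()
    try:
        return "".join(THREE_TO_ONE[t.capitalize()] for t in tokens)
    except KeyError as err:
        raise ValueError(f"unknown residue code: {err.args[0]}") from None


def prepare_query(raw: str) -> str:
    """Remove whitespace, upper-case the residues, and check the result.
    Assumption: the length limit applies after whitespace is removed."""
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("empty query")
    if not (seq.isascii() and seq.isalpha()):
        raise ValueError("query must contain only 1-letter residue codes")
    if len(seq) > MAX_QUERY_LENGTH:
        raise ValueError(f"query exceeds {MAX_QUERY_LENGTH} characters")
    return seq


print(three_to_one("Met-Arg-Ala-Trp"))
print(prepare_query("mraw iff\nLLC"))
```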
28This screen is for inserting a sequence query
from file
29The BLAST program to be used is selected next
30Searches can be run on a subset of the Registry
File
- For proteins, the three options are
- The default is all CA sequences
Other options are available for nucleic acids, such as including or excluding GenBank records.
31BLAST default settings are optimized
- Parameters can be modified
- Search sensitivity
- Low complexity filtering
- Maximum number of answers
- Show advanced options
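Low complexity filtering masks stretches of biased composition (for example, long runs of a single residue) so they do not produce spurious hits. NCBI BLAST uses dedicated programs for this (SEG for proteins, DUST for nucleotides). The Python sketch below is a much cruder stand-in based on Shannon entropy in a sliding window; the window size, threshold, and function names are all illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace every residue that lies in a low-entropy window with 'X'.
    Window size and threshold are illustrative, not BLAST defaults."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


# A diverse stretch followed by a homopolymer run (hypothetical sequence).
print(mask_low_complexity("ACDEFGHIKLMN" + "Q" * 12))
```

Because whole windows are masked, a few residues bordering the repeat are also replaced; dedicated filters trim masked regions more carefully.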
32Advanced functions should only be modified with a
thorough understanding of BLAST principles
- Users are encouraged to contact bioinformatics departments for details, advice, and recommendations
- Additional information is also available at the NCBI Web page: http://www.ncbi.nlm.nih.gov/
33The Main Window is for managing results
- The Main Window has columns for
- Assigned name
- Type of search
- Time created
- Status
- Results
- Reviewed status
34Results can be viewed once the search is complete
- The results are permanently stored on STN until deleted by the user
- Old results can be reviewed when desired
- Up to 50 result sets can be stored
Highlight a result, then view it
36Alignments can be viewed individually
37Alignments can be saved or printed
38The saved file has a summary of all the hits
and scores
39Select desired alignments for transfer to STN
- Check boxes
- Select by score category
- Select all
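Selecting by score amounts to keeping only the hits above a chosen cutoff before their RNs are transferred. The Python sketch below shows that logic on a hypothetical in-memory hit list. The `Hit` record, the placeholder identifiers, and the cutoff value are assumptions for illustration, and the score categories offered by the actual interface are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    rn: str        # placeholder identifier, not a real CAS Registry Number
    score: float   # alignment score reported for the hit


def select_by_min_score(hits, min_score):
    """Return identifiers of hits scoring at least min_score, best first."""
    chosen = [h for h in hits if h.score >= min_score]
    chosen.sort(key=lambda h: h.score, reverse=True)
    return [h.rn for h in chosen]


# Hypothetical hit list.
hits = [Hit("hit-1", 120.0), Hit("hit-2", 45.5), Hit("hit-3", 300.0)]
print(select_by_min_score(hits, 100))
```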
40Transfer RNs to STN
- Select Transfer RNs to STN
- Message indicates when the transfer is complete
- Log off the BLAST system: select Exit from the File menu or close the browser
41Retrieve RNs from BLAST
- The Sequence Assistant page appears after you exit BLAST
- Select the Retrieve RNs from BLAST option
42Return to STN on the Web
- STN will indicate if session is logged off
- If so, log on to STN on the Web
- Select Sequence Assistant
- Retrieve RNs from BLAST
To obtain a transcript of your session, you must
log in again.
Back to the STN on the Web login page
43Continue STN Searching
44L-Numbers are created from the automatic transfer
45L-Numbers are used for reference searches
These search results can optionally be combined with DGENE results through routine use of STN's multifile search interaction.
46STN Express with Discover! 6.01 is now available for Sequence Searching
http://www.cas.org/ONLINE/STN/interact/express.html
47CAS REGISTRY BLAST is now searchable from Express
48Transferring BLAST data into an STN session is
seamlessly integrated into the software
49A report merges an STN transcript and BLAST
alignment data
50A report merges an STN transcript and BLAST
alignment data
51CAS REGISTRY BLAST will offer enhancements that
are in great demand by customers
- BLAST Alerts
- 1000 answers (increased from 200)
- Searching on <50 residues
- BLAST version 2.2.3 from NCBI
These BLAST enhancements are also available through STN on the Web; a new plug-in is required.
52Setting up and managing CAS REGISTRY BLAST alerts
is easy
53Searches can now be set to retrieve only sequences that have 50 residues or fewer, a big help for primers and drug targets
54Summary
55In conclusion, CAS Registry BLAST is necessary for comprehensive sequence searching
- The Registry file is a key resource for biotechnology information
- CAS Registry BLAST provides a powerful and easy-to-use search engine
- BLAST RNs can be searched using STN on the Web or STN Express to get related patent and journal references
- Similarity searches in Registry can be combined with results from DGENE
56The End