Title: BLAST
1. BLAST: A heuristic algorithm
Anjali Tiwari, Pannaben Patel, Pushkala Venkataraman
3. Basic Local Alignment Search Tool (BLAST)
Rapid searching of protein and nucleotide databases
Seeking similar sequences
Databases: GenBank, nr, SwissProt, PDB, PRF, PIR
nr = non-redundant database
4. BLAST programs
Program   Query        Database     Search level
blastp    Amino acid   Amino acid   Amino acid
blastn    Nucleotide   Nucleotide   Nucleotide
blastx    Nucleotide   Amino acid   Amino acid
tblastn   Amino acid   Nucleotide   Amino acid
tblastx   Nucleotide   Nucleotide   Amino acid
BLAST 3-step algorithm
Compile words -> Scan DB -> Extend
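The translated programs in the table (blastx, tblastn, tblastx) search at the amino acid level, so the nucleotide side has to be translated before comparison. A minimal sketch of six-frame translation, assuming the standard genetic code and uppercase A/C/G/T input. The function names are illustrative and this is not the NCBI implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate one reading frame, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(dna):
    """Three forward frames followed by three reverse-complement frames."""
    rc = reverse_complement(dna)
    return [translate(s[offset:]) for s in (dna, rc) for offset in range(3)]


if __name__ == "__main__":
    for frame in six_frames("ATGGCC"):
        print(frame)
```

Each of the six protein strings would then be searched as if it were an amino acid query (blastx) or database entry (tblastn).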
5. Some definitions
Alignment: the process of lining up 2 or more sequences to assess similarity
BLOSUM62: a 20x20 substitution matrix for amino acids
Gap: a space introduced into an alignment to compensate for insertions/deletions in 1 sequence relative to another
6. Local Search Algorithms
Similarity measures (similarity matrix: BLOSUM)
Identities and conservative replacements: +ve score
Unlikely replacements: -ve score
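The sign convention above can be checked column by column in an alignment. The sketch below labels each aligned pair using BLOSUM62 entries for five residues only (Q, E, G, K, D); that subset, the "neutral" label for a zero score, and the function name are assumptions made for illustration, and the values should be verified against a published matrix before reuse.

```python
# BLOSUM62 entries for five residues only (an illustrative subset).
RESIDUES = "QEGKD"
ROWS = [
    [5, 2, -2, 1, 0],      # Q
    [2, 5, -2, 1, 2],      # E
    [-2, -2, 6, -2, -1],   # G
    [1, 1, -2, 5, -1],     # K
    [0, 2, -1, -1, 6],     # D
]
SCORE = {(a, b): ROWS[i][j]
         for i, a in enumerate(RESIDUES)
         for j, b in enumerate(RESIDUES)}


def classify_columns(seq_a, seq_b):
    """Label each aligned pair by identity or by the sign of its score."""
    labels = []
    for a, b in zip(seq_a, seq_b):
        s = SCORE[(a, b)]
        if a == b:
            labels.append("identity")
        elif s > 0:
            labels.append("conservative")
        elif s < 0:
            labels.append("unlikely")
        else:
            labels.append("neutral")  # score of exactly zero
    return labels


if __name__ == "__main__":
    print(classify_columns("QQGKEED", "QQGKQEG"))
```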
7. General concept of how BLAST works
Input: query, 1000s of sequences
Calculate HSPs
Calculate MSP
Display output
MSP: Maximal Segment Pair; HSP: High-scoring Segment Pair
8. Key Idea: BLAST1
Step 1: Compile a list of high-scoring words of length w from the query (w = 3 for proteins, 11 for nucleic acids).
Step 2: Scan the database for word hits with a score greater than the threshold T.
Step 3: Extend each word hit in both directions to find High Scoring Pairs with scores greater than S.
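Step 1 starts from every overlapping word of length w in the query. A minimal sketch of that enumeration, together with a lookup table from each word to its query positions (0-based); the function names are illustrative, and no scoring against T is done here.

```python
def compile_words(query, w):
    """All overlapping words of length w, in query order."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def word_index(query, w):
    """Map each word to the list of 0-based positions where it starts."""
    index = {}
    for i, word in enumerate(compile_words(query, w)):
        index.setdefault(word, []).append(i)
    return index


if __name__ == "__main__":
    query = "QQGPHUIQEGQQGKEEDPP"
    print(compile_words(query, 3)[:5])
    print(word_index(query, 3)["QQG"])
```

Keeping the positions alongside each word means a later database hit can be mapped straight back to the place in the query where extension should begin.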
9. Example
Step 1
Query: QQGPHUIQEGQQGKEEDPP
Words of length 3 (w): QQG, QGP, GPH, PHU, HUI, ...
Take the first triple, QQG.
Make neighborhood words (w') of QQG: QQG, QEG, GQG
Find high-scoring triples: Blosum(w, w') > T, where T is the threshold parameter.
Suppose Blosum(QQG, QEG) = 18, Blosum(QQG, GQG) = 12, Blosum(QQG, QQG) = 16, and T = 13.
Choose QQG and QEG, since their Blosum values are greater than T.
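The neighborhood step can be sketched by scoring every candidate word against the query word and keeping those above T. The sketch assumes a five-residue alphabet (Q, E, G, K, D) with their BLOSUM62 entries, so its scores differ from the supposed values 18 and 12 in the example: with these entries Blosum(QQG, QEG) = 13 and Blosum(QQG, GQG) = 9. The threshold of 11 used below is chosen only so that several neighbors qualify.

```python
from itertools import product

# BLOSUM62 entries for five residues only (an illustrative subset).
RESIDUES = "QEGKD"
ROWS = [
    [5, 2, -2, 1, 0],      # Q
    [2, 5, -2, 1, 2],      # E
    [-2, -2, 6, -2, -1],   # G
    [1, 1, -2, 5, -1],     # K
    [0, 2, -1, -1, 6],     # D
]
SCORE = {(a, b): ROWS[i][j]
         for i, a in enumerate(RESIDUES)
         for j, b in enumerate(RESIDUES)}


def word_score(u, v):
    """Sum of substitution scores over aligned positions of two words."""
    return sum(SCORE[(a, b)] for a, b in zip(u, v))


def neighborhood(word, T):
    """All words over RESIDUES scoring strictly above T against `word`."""
    candidates = ("".join(p) for p in product(RESIDUES, repeat=len(word)))
    return sorted(c for c in candidates if word_score(word, c) > T)


if __name__ == "__main__":
    print(word_score("QQG", "QQG"))
    print(neighborhood("QQG", 11))
```

Enumerating all candidate words is affordable for w = 3 (8000 words over the full 20-letter alphabet), which is one reason short words are practical for protein searches.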
10. Step 2
Suppose the database sequence is PKLMMQQGKQEGM.
Matching words in the DB sequence: QQG and QEG (PKLMM QQG K QEG M)
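The scan can be sketched as a sliding window over the database sequence that reports every position where one of the chosen words occurs. The function name is illustrative, positions are 0-based, and all words are assumed to have the same length.

```python
def scan_database(db_seq, words):
    """Return (position, word) for each occurrence of a chosen word."""
    w = len(next(iter(words)))  # all words assumed to share one length
    hits = []
    for i in range(len(db_seq) - w + 1):
        window = db_seq[i:i + w]
        if window in words:
            hits.append((i, window))
    return hits


if __name__ == "__main__":
    print(scan_database("PKLMMQQGKQEGM", {"QQG", "QEG"}))
```

Storing the words in a set keeps each window lookup constant time, so the scan is linear in the length of the database sequence.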
11. Step 3
Query:       QQGPHUIQEGQQGKEEDPP
DB sequence: PKLMMQQGKQEGM
Extending the QQG hit to the right, one residue at a time:
Blosum(QQG, QQG) = 16
Blosum(QQGK, QQGK) = 21
Blosum(QQGKE, QQGKQ) = 23
Blosum(QQGKEE, QQGKQE) = 28
Blosum(QQGKEED, QQGKQEG) = 27
12. Extension to the right stops here because the BLOSUM score has begun to decrease (from 28 to 27).
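The right-hand extension in the example can be reproduced with a short loop that adds one aligned pair at a time and stops at the first decrease. This stopping rule follows the slide and is a simplification. The score table holds BLOSUM62 entries for only the five residues the extension touches (Q, E, G, K, D), and the function name is illustrative.

```python
# BLOSUM62 entries for five residues only (an illustrative subset).
RESIDUES = "QEGKD"
ROWS = [
    [5, 2, -2, 1, 0],      # Q
    [2, 5, -2, 1, 2],      # E
    [-2, -2, 6, -2, -1],   # G
    [1, 1, -2, 5, -1],     # K
    [0, 2, -1, -1, 6],     # D
]
SCORE = {(a, b): ROWS[i][j]
         for i, a in enumerate(RESIDUES)
         for j, b in enumerate(RESIDUES)}


def extend_right(query, db, q_start, d_start, w):
    """Extend a word hit to the right until the running score drops.

    Returns the best score and the list of running scores, including
    the first decreasing value that triggered the stop.
    """
    total = sum(SCORE[(query[q_start + k], db[d_start + k])] for k in range(w))
    trace = [total]
    q, d = q_start + w, d_start + w
    while q < len(query) and d < len(db):
        candidate = total + SCORE[(query[q], db[d])]
        trace.append(candidate)
        if candidate < total:
            break  # score began to decrease: stop extending
        total = candidate
        q += 1
        d += 1
    return total, trace


if __name__ == "__main__":
    query = "QQGPHUIQEGQQGKEEDPP"
    db = "PKLMMQQGKQEGM"
    # The QQG hit starts at query position 10 and DB position 5 (0-based).
    print(extend_right(query, db, 10, 5, 3))
```

The running scores match the sequence 16, 21, 23, 28, 27 from the example, and the best segment pair is QQGKEE / QQGKQE with score 28.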
- ADVANTAGES
  - Faster than dynamic programming
  - Removes low-complexity regions
  - Spends less time on uninteresting searches
  - Statistical significance of results can be obtained, and these estimates are very good
- DISADVANTAGES
  - Finds and reports only local alignments
  - Finds too many word hits per sequence, thus reducing speed
  - Does not allow for gaps in sequence
New models to combat these disadvantages: BLAST2, PSI-BLAST
13. BLAST2: Combination of 2-Hit and Gapped
2-Hit Method: a 3-step method. Steps 1 and 2 are the same as in BLAST1; Step 3 is where they differ. BLAST now looks for 2 words in a sequence instead of 1 while aligning. The 2 words are at a distance < A and do not overlap. Typically A = 40.
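The two-hit condition can be sketched as a filter over word hits: keep pairs of hits that do not overlap and lie less than A apart. In the published two-hit method the two hits must also fall on the same diagonal (the same offset between database and query position); that requirement is not stated on the slide and is included here as an assumption, as are the function name and the (query position, DB position) hit format.

```python
from collections import defaultdict


def two_hit_pairs(hits, w=3, A=40):
    """Find consecutive same-diagonal hits that are non-overlapping and < A apart.

    hits: iterable of (query_pos, db_pos) word hits, 0-based.
    """
    by_diagonal = defaultdict(list)
    for q_pos, d_pos in hits:
        by_diagonal[d_pos - q_pos].append(q_pos)

    pairs = []
    for diagonal, positions in by_diagonal.items():
        positions.sort()
        for first, second in zip(positions, positions[1:]):
            distance = second - first
            if w <= distance < A:  # distance >= w means the words do not overlap
                pairs.append(((first, first + diagonal),
                              (second, second + diagonal)))
    return pairs


if __name__ == "__main__":
    hits = [(0, 10), (5, 15), (6, 16), (50, 60), (3, 30)]
    print(two_hit_pairs(hits))
```

Only pairs that pass this filter would trigger an extension, which is how the two-hit method cuts down on the number of extensions compared with extending every single hit.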
14. Gapped BLAST
- Gapped alignment is introduced to get an optimal alignment
- Two sequences
  - Seq A: ACGTA
  - Seq B: ACATA
- Normal alignment is
  - ACGTA
  - ACATA
But if the penalty for a mismatch is larger than the penalty for the two gaps needed to avoid it, then the optimal alignment is one of the following:
AC-GTA      ACG-TA
ACA-TA      AC-ATA
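The trade-off between a mismatch and gaps can be checked with a small dynamic-programming scorer. The sketch below is a plain global alignment (Needleman-Wunsch) with a linear gap penalty, not the gapped extension used inside BLAST; the penalty values are illustrative assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score under a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Cheap mismatch: the ungapped alignment wins (4 matches, 1 mismatch).
    print(global_alignment_score("ACGTA", "ACATA", mismatch=-1, gap=-2))
    # Expensive mismatch: two gaps are cheaper (4 matches, 2 gaps).
    print(global_alignment_score("ACGTA", "ACATA", mismatch=-5, gap=-1))
```

With mismatch = -1 and gap = -2 the best score is 3, from the ungapped alignment. With mismatch = -5 and gap = -1 the best score is 2, which requires the two-gap alignment, since the ungapped one would score -1.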
15. Gapped BLAST: allows gaps to be introduced while aligning
Query: ATTGTCAAAGACTTGAGCTGATGCAT
DB:    GGCAGACATGACTGACAAGGGTATCG
Aligned with a gap (shown as "-") in the DB sequence:
ATTGTCAAAGACTTGAGCTGATGCAT
GGCAGACATGA-CTGACAAGGGTATCG
Figure labels: Mismatch, Gap
16. PSI-BLAST: Position-Specific Iterated BLAST
Uses multiple alignments to build a search profile
- Query sequence
- BLAST search of DB
- Sequences with high scores collected
- Multiple alignment made and profile built
- DB searched with profile
- New sequences added; process iterated
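The profile step can be sketched as per-column residue frequencies taken from the aligned high-scoring sequences, which are then used to score candidate segments position by position. This is a simplification: PSI-BLAST derives log-odds scores with pseudocounts and sequence weighting. The aligned sequences and function names below are hypothetical.

```python
from collections import Counter


def build_profile(aligned):
    """Per-column residue frequencies from equal-length aligned sequences."""
    profile = []
    for column in range(len(aligned[0])):
        counts = Counter(seq[column] for seq in aligned)
        profile.append({res: n / len(aligned) for res, n in counts.items()})
    return profile


def profile_score(profile, segment):
    """Sum of the column frequencies of the residues in `segment`."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, segment))


if __name__ == "__main__":
    # Hypothetical ungapped alignment of four high-scoring hits.
    aligned = ["QQGK", "QEGK", "QQGR", "EQGK"]
    profile = build_profile(aligned)
    print(profile_score(profile, "QQGK"))
    print(profile_score(profile, "EEGR"))
```

Unlike a fixed substitution matrix, the profile scores the same residue differently at different positions, which is what lets later iterations pick up more distant relatives of the query.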