Title: Identifying sequences with
Speaker: S. Gaj
BioInformatics Lunch Meeting
Date: 04-03-2005
2. Annotation
- Annotation
  - The best possible description available for a given sequence at the current time.
- How to annotate?
  - By combining:
    - Alignment tools
    - Databases
    - Data mining (scripts)
Background
3. Microarrays
6. Part I: Sequence Alignment
7. Introduction
- Global alignment
  - Optimal alignment of two sequences over their full length, covering as many characters of the query as possible.
  - Example: predicting the evolutionary relationship between genes.
- Local alignment
  - Optimal alignment of the most similar region(s) of two sequences, ignoring the rest.
  - Example: identifying key molecular structures (disulfide (S-S) bonds, α-helices, etc.).
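Global alignment is classically computed with the Needleman-Wunsch dynamic programming algorithm. The sketch below is a minimal version with a linear gap penalty; the scores (match +1, mismatch -1, gap -2) and the function name `global_alignment` are illustrative assumptions, not the settings of any particular tool (protein aligners normally use a substitution matrix and affine gap penalties instead).

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Trace back from the bottom-right cell: every residue of both sequences is used.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_alignment("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Because the traceback always starts in the bottom-right cell, the result spans both sequences end to end, which is what makes it "global".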
8. Global vs Local Alignment
Global Alignment
Score -42 at (seq1)1..90 (seq2)1..90
seq1   1  MA-----STVTSCLEPTEVFMDLWPEDHSNWQELSPLEPSDPLNPPTPPRAAPSPVVPST
seq2   1  MSHGIQMSTIKKRRSTDEEVFCLPIKGREIYEILVKIYQIENYNMECAPPAGASSVSVGA

seq1  56  EDYGGDFDFRVGFVEAGTAKSVTCTYSPVLNKVYC
seq2  61  TEAEPTEVFMDLWPEDHSNWQELSPLEPSD-----

Local Alignment
Score 148 at (seq1)10..36 (seq2)64..90
seq1  10  EPTEVFMDLWPEDHSNWQELSPLEPSD
seq2  64  EPTEVFMDLWPEDHSNWQELSPLEPSD
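- Local alignment stops extending the alignment when extension is no longer profitable (the score would drop), so it reports only the shared segment; the global alignment, forced to cover both full sequences, does not line this segment up.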
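Local alignment is classically computed with the Smith-Waterman algorithm. In the sketch below the `max(0, ...)` floor is what implements "stop when extension is not profitable": any partial alignment whose score would turn negative is dropped. The scores (match +2, mismatch -1, gap -2) and the name `local_alignment` are illustrative assumptions; BLAST itself uses faster heuristics rather than this full table.

```python
def local_alignment(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (best_score, region_of_a, region_of_b) for the best-scoring local match.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 floor discards an extension whose running score would go negative,
            # so a new local alignment can start fresh further on.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score returns to 0.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end_i], b[j:end_j]


# Short fragments of seq1 (positions 1-18) and seq2 (positions 55-72) from the slide
print(local_alignment("MASTVTSCLEPTEVFMDL", "SVSVGATEAEPTEVFMDL"))
```

With these toy scores only the shared stretch `EPTEVFMDL` is reported; the unrelated flanks are ignored.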
9. BLAST
- Basic Local Alignment Search Tool
- Aligns an unknown sequence (the query) against all sequences in a chosen database and ranks the hits by alignment score.
- Aim
  - Obtaining structural or functional information on the unknown sequence.
Introduction
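BLAST avoids filling a full alignment table against every database sequence: it first looks for short "seed" words shared by query and subject, then extends each seed and stops once extension stops paying off. The sketch below is a deliberately simplified, hypothetical version: exact word matches of length `w`, identity scoring (+1/-1) and an ungapped X-drop extension. Real BLAST additionally uses neighbourhood words scored with a substitution matrix, a word-score threshold and gapped extension; the names `find_seeds` and `extend_seed` are invented for illustration.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, w: int = 3):
    """Return (query_pos, subject_pos) pairs where the same length-w word occurs in both."""
    index = defaultdict(list)  # word -> start positions in the query
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return [(i, j)
            for j in range(len(subject) - w + 1)
            for i in index.get(subject[j:j + w], [])]


def extend_seed(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=3):
    """Ungapped extension of a seed in both directions with an X-drop cut-off.

    Extension stops once the running score falls more than x_drop below the best
    score seen so far, i.e. when extending is no longer profitable.
    """
    def walk(q_positions, s_positions):
        best = running = best_len = 0
        for length, (qi, sj) in enumerate(zip(q_positions, s_positions), start=1):
            running += match if query[qi] == subject[sj] else mismatch
            if running > best:
                best, best_len = running, length
            elif best - running > x_drop:
                break
        return best, best_len

    right_gain, right_len = walk(range(i + w, len(query)), range(j + w, len(subject)))
    left_gain, left_len = walk(range(i - 1, -1, -1), range(j - 1, -1, -1))
    q_start, q_end = i - left_len, i + w + right_len
    s_start = j - left_len
    score = w * match + left_gain + right_gain  # seed itself is an exact match
    return score, query[q_start:q_end], subject[s_start:s_start + (q_end - q_start)]


query, subject = "GGGGEPTEVFMDLKKKK", "WWEPTEVFMDLWW"  # hypothetical sequences
print(extend_seed(query, subject, 4, 2))  # seed "EPT" at query 4, subject 2
```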
11. Programs
- Different BLAST programs are available, depending on the type of query and database (see table below).
- Parameters
  - Maximum E-value, gap opening penalty (GOP), gap extension penalty (GEP)
- Terms
  - Query: sequence that will be aligned
  - Subject: sequence present in the database
  - Hit: alignment result

Query \ Database   Nucleic   Protein
Nucleic            BlastN    BlastX
Protein            -         BlastP
BLAST
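The E-value and gap parameters can be illustrated with two small formulas: an affine gap penalty (opening cost plus a per-residue extension cost) and the Karlin-Altschul expectation E = K·m·n·e^(-λS), the number of hits with score at least S expected by chance. This is a sketch under stated assumptions: the GOP/GEP values 11 and 1 and the λ/K values below are example numbers of the kind seen for gapped BLOSUM62 searches, not values taken from this talk, and real BLAST also corrects m and n for edge effects.

```python
import math


def affine_gap_cost(length: int, gop: int = 11, gep: int = 1) -> int:
    """Penalty for one gap of `length` residues, counted as gop + gep * length.

    Some programs charge the extension penalty only from the second gap position
    onwards, so check which convention a given tool uses.
    """
    return gop + gep * length if length > 0 else 0


def expect_value(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance hits: E = K * m * n * exp(-lambda * S).

    m = query length, n = total database length (residues); lam and k depend on
    the scoring matrix and gap penalties.
    """
    return k * m * n * math.exp(-lam * score)


# Hypothetical search: 300-residue query, 10^8-residue database, assumed lambda/K.
e = expect_value(score=100, m=300, n=10**8, lam=0.267, k=0.041)
max_e_value = 0.001  # example "Maximum E-value" cut-off chosen by the user
print(f"E = {e:.2g}, reported: {e <= max_e_value}")
```

Hits are kept only if their E-value is at or below the chosen maximum; note that E grows linearly with database size, so the same score is less significant in a bigger database.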
12. Substitution Matrices: What?
- A substitution matrix estimates the rate at which each possible residue in a sequence changes into each other residue over time.
  - For example, a hydrophobic residue is more likely to stay hydrophobic than not.
- Each matrix is tailored to detect certain types of sequence relationships: KNOW WHAT YOU ARE LOOKING FOR!
BLAST Matrices
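Matrices such as PAM and BLOSUM store log-odds scores: the logarithm of how often two residues are observed aligned in trusted alignments, relative to how often they would pair by chance. A minimal sketch, assuming ordered-pair frequencies and hypothetical toy numbers; the scale factor 2 expresses scores in half-bit units, the unit used by BLOSUM62.

```python
import math


def log_odds_score(q_ab: float, p_a: float, p_b: float, scale: float = 2.0) -> int:
    """Substitution score = round(scale * log2(observed / expected)).

    q_ab: how often residues a and b are seen aligned in trusted alignments
    p_a, p_b: background frequencies of a and b (chance expectation p_a * p_b)
    scale=2 expresses the score in half-bit units.
    """
    return round(scale * math.log2(q_ab / (p_a * p_b)))


# Hypothetical toy frequencies, not taken from any real matrix:
print(log_odds_score(0.02, 0.10, 0.06))   # seen more often than chance -> 3
print(log_odds_score(0.001, 0.10, 0.05))  # seen less often than chance -> -5
```

A pair observed exactly as often as chance predicts scores 0.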
13. Substitution Matrices: Why?
- To determine the likelihood of homology between two sequences:
  - Substitutions that are more likely should get a higher score.
  - Substitutions that are less likely should get a lower score.
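The effect of such a matrix on an alignment score can be shown with a toy example. The scores below are hypothetical and deliberately simple (identity 5, same chemical class 2, different class -3); they are NOT PAM or BLOSUM values, only an illustration that likely substitutions should score higher than unlikely ones.

```python
# Hypothetical toy substitution scores (NOT PAM or BLOSUM values).
HYDROPHOBIC = set("AVLIMFWC")
CHARGED = set("DEKR")


def toy_score(x: str, y: str) -> int:
    if x == y:
        return 5  # identity
    if (x in HYDROPHOBIC and y in HYDROPHOBIC) or (x in CHARGED and y in CHARGED):
        return 2  # conservative change within one chemical class: likely
    return -3     # change across classes: unlikely


def score_alignment(a: str, b: str, score=toy_score) -> int:
    """Sum pair scores over an ungapped alignment of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(score(x, y) for x, y in zip(a, b))


print(score_alignment("LIV", "LIV"))  # identical -> 15
print(score_alignment("LIV", "IVL"))  # hydrophobic -> hydrophobic -> 6
print(score_alignment("LIV", "DKE"))  # hydrophobic -> charged -> -9
```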
14. Matrices: PAM
- Point Accepted Mutation
- Mostly used in global amino acid alignments
- PAM1 represents 1% change
- PAM250 = (PAM1)^250
- PAM1
  - Applies to the time period over which we expect 1% of the amino acids to undergo accepted point mutations within the species of interest.
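"PAM250 = (PAM1)^250" means the PAM1 mutation-probability matrix multiplied by itself 250 times. The sketch below uses a hypothetical 4-letter alphabet in which every residue changes with probability 1% (spread evenly over the others); the real PAM1 is a 20x20 matrix estimated from observed mutations, and the PAM scoring matrices are log-odds versions of these probabilities.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(M, power):
    """M raised to an integer power by repeated squaring."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = M
    while power:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


def toy_pam1(alphabet_size=4, change=0.01):
    """Hypothetical PAM1-like matrix: row i, column j = probability that residue i
    becomes residue j; each residue changes with probability `change` (1%)."""
    off = change / (alphabet_size - 1)
    return [[1 - change if i == j else off for j in range(alphabet_size)]
            for i in range(alphabet_size)]


pam250 = mat_pow(toy_pam1(), 250)
# Probability that a residue is unchanged after 250 PAM units
print(round(pam250[0][0], 3))
```

In this toy model only about 28% of residues are still unchanged after 250 PAM units, which is why high PAM numbers correspond to larger evolutionary distances.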
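15. BLOSUM
- Mostly used in local amino acid alignments
- Derived directly from observed alignments (conserved blocks), not extrapolated from closely related sequences as PAM matrices are.
- BLOSUM80, BLOSUM62, BLOSUM45
- Default in BLAST protein searches: BLOSUM62
  - Calculated from blocks in which sequences that are at least 62% identical are clustered and counted as one, so the substitution counts come from sequences that are no more than 62% identical.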
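The "62" in BLOSUM62 refers to a clustering step: within each block of aligned segments, sequences that are at least 62% identical are grouped, and each group counts as a single sequence when substitution pairs are tallied. A minimal sketch of that one step only, assuming gap-free block rows and single-linkage grouping; the block contents and function names are hypothetical, and the full construction then turns the weighted pair counts into log-odds scores.

```python
from typing import List


def percent_identity(s: str, t: str) -> float:
    """Percent identical positions between two equal-length, gap-free block rows."""
    if len(s) != len(t):
        raise ValueError("block rows must have equal length")
    return 100.0 * sum(x == y for x, y in zip(s, t)) / len(s)


def cluster_block(rows: List[str], threshold: float = 62.0) -> List[List[str]]:
    """Single-linkage clustering: a row joins (and merges) every cluster that
    already holds a row at least `threshold` percent identical to it."""
    clusters: List[List[str]] = []
    for row in rows:
        linked = [c for c in clusters
                  if any(percent_identity(row, r) >= threshold for r in c)]
        merged = [row] + [r for c in linked for r in c]
        linked_ids = {id(c) for c in linked}
        clusters = [c for c in clusters if id(c) not in linked_ids] + [merged]
    return clusters


# Hypothetical block of aligned, gap-free segments
block = ["LIVKD", "LIVKE", "AGWPS"]
for cluster in cluster_block(block):
    # Each cluster counts as ONE sequence, so each member gets weight 1 / size.
    print(cluster, "weight per member:", 1 / len(cluster))
```

Raising the threshold (e.g. 80 for BLOSUM80) merges fewer sequences, so more closely related pairs contribute to the counts.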
16. PAM vs BLOSUM
- Closely related sequences
  - Low PAM
  - High BLOSUM
- Distantly related sequences
  - High PAM
  - Low BLOSUM
17. BlastN Example
18. BlastN Example
19. Common BLAST problems
[Figure: clone sequence vs. mRNA, with a sequencing error in the clone]
- Solution
  - Use low penalties for GOP and GEP (e.g. 1)
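Why lower gap penalties help: a single extra or missing base from a sequencing error forces a one-position gap, and with strict penalties that gap can cost more than the alignment is worth, so a local aligner may stop at the error and report only part of the clone. The sketch below scores a given gapped alignment under two penalty settings; the match/mismatch scores, the GOP/GEP values and the sequences are all hypothetical.

```python
def score_gapped(a: str, b: str, match=1, mismatch=-2, gop=5, gep=2) -> int:
    """Score a pairwise alignment written as two equal-length strings with '-' gaps.

    A run of k consecutive gap positions in the same sequence costs gop + gep * k.
    """
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    score = 0
    gap_in_a = gap_in_b = False  # currently inside a gap run in a / in b?
    for x, y in zip(a, b):
        if x == "-":
            score -= gep if gap_in_a else gop + gep
            gap_in_a, gap_in_b = True, False
        elif y == "-":
            score -= gep if gap_in_b else gop + gep
            gap_in_a, gap_in_b = False, True
        else:
            score += match if x == y else mismatch
            gap_in_a = gap_in_b = False
    return score


clone = "ACGTACTGTACGT"  # hypothetical clone read with one extra base (sequencing error)
mrna = "ACGTAC-GTACGT"   # the same region of the mRNA
print(score_gapped(clone, mrna))                # stricter gap penalties
print(score_gapped(clone, mrna, gop=1, gep=1))  # low GOP and GEP
```

Both alignments contain 12 matching bases; the single error costs 7 points with the stricter settings (score 5) but only 2 with GOP = GEP = 1 (score 10).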
21. Translation Problems
Example: >embl|J03801|HSLSZ Human lysozyme mRNA, complete cds with an Alu repeat in the 3' flank.
Six-frame translation of the start of the sequence (* = stop codon):
Frame  3: S T L T * Q S T * R L S L F W G
Frame  2: * H S D L A V N M K A L I V L G
Frame  1: L A L * P S S Q H E G S H C S G A
Sequence: ctagcactctgacctagcagtcaacatgaaggctctcattgttctggggct...
Frame -1: V L Q S L L E C L Y F L H H A
Frame -2: C C K A F N N V C I F Y I M H
Frame -3: V A K P L I R M F V F F T S C I
Six-frame translation tool: http://searchlauncher.bcm.tmc.edu/cgi-bin/seq-util/sixframe.pl
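A six-frame translation reads the sequence in three offsets on the given strand and three on the reverse complement. The sketch below uses the standard genetic code (NCBI translation table 1); the function names are illustrative, and the numbering of the reverse frames here (offsets 0-2 of the reverse complement) is an assumption, since tools such as the one linked above may number or position them differently. In the fragment from the slide, the ATG start codon lies in frame 2 (`...N M K A L I V L G`).

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated with the first, second and
# third base each running through T, C, A, G. '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; an incomplete trailing codon is ignored and
    codons containing non-ACGT characters become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Frames 1-3: offsets 0-2 of the given strand; -1 to -3: offsets 0-2 of the
    reverse complement (other tools may number the reverse frames differently)."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


fragment = "ctagcactctgacctagcagtcaacatgaaggctctcattgttctggggct"  # fragment from the slide
for frame, protein in sorted(six_frames(fragment).items(), reverse=True):
    print(f"{frame:+d}  {protein}")
```

Stop codons (`*`) scattered through a frame are a quick hint that it is not the coding frame.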
22. Common BLAST problems
[Figure: Gene X (exons and introns); splicing gives the full mRNA, which is translated; BLAST of the full mRNA]
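Splicing removes the introns, so an mRNA (or a clone derived from it) matches its gene only piece by piece: the exons align, while each intron appears as a long stretch missing from the mRNA, and BLAST typically reports several separate local hits, roughly one per exon. A minimal sketch of splicing with a hypothetical toy gene and hypothetical exon coordinates (0-based, end-exclusive):

```python
from typing import List, Tuple


def splice(genomic: str, exons: List[Tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive coordinates) in genomic order;
    everything between them (the introns) is dropped."""
    return "".join(genomic[start:end] for start, end in sorted(exons))


# Hypothetical toy gene X: exon 1, an intron (starting GT, ending AG), exon 2
genomic = "ATGAAA" + "GTAAGTTTTCAG" + "CCCTAA"
mrna = splice(genomic, [(0, 6), (18, 24)])
print(mrna)  # ATGAAACCCTAA
```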
23. Common BLAST problems
[Figure: clones derived from mRNA]
- BlastX against a protein sequence: 3 possible hit situations
24. Common BLAST problems
- The clone aligns with the protein in 1 of the 6 frames.
- Only part of the clone aligns perfectly.
25. Questions?
End