Identifying sequences with - PowerPoint PPT Presentation

1
Identifying sequences with
Speaker: S. Gaj
BioInformatics Lunch Meeting
Date: 04-03-2005
2
Annotation
  • Annotation
  • Best possible description available for a given
    sequence at the current time.
  • How to annotate?
  • Combining
  • Alignment Tools
  • Databases
  • Datamining (scripts)

Background
3
Microarrays
4
(No Transcript)
5
(No Transcript)
6
Part I: Sequence Alignment
7
Introduction
  • Global alignment
  • Optimal alignment between two sequences
    containing as many characters of the query as
    possible.
  • Ex.: predicting evolutionary relationships
    between genes, ...
  • Local alignment
  • Optimal specific alignment between two sequences,
    identifying identical area(s).
  • Ex.: identifying key molecular structures
    (S-bonds, α-helices, ...)

Background
8
Global vs Local Alignment
Global Alignment
Score: -42 at (seq1)[1..90] : (seq2)[1..90]
 1 MA-----STVTSCLEPTEVFMDLWPEDHSNWQELSPLEPSDPLNPPTPPRAAPSPVVPST
 1 MSHGIQMSTIKKRRSTDEEVFCLPIKGREIYEILVKIYQIENYNMECAPPAGASSVSVGA

56 EDYGGDFDFRVGFVEAGTAKSVTCTYSPVLNKVYC
61 TEAEPTEVFMDLWPEDHSNWQELSPLEPSD-----
  • Includes the total sequence
  • The highest score

Local Alignment
Score: 148 at (seq1)[10..36] : (seq2)[64..90]
10 EPTEVFMDLWPEDHSNWQELSPLEPSD
64 EPTEVFMDLWPEDHSNWQELSPLEPSD
  • The highest score
  • Stop the alignment extension if it is not
    profitable
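The difference between the two modes can be sketched with a small dynamic-programming scorer. This is only an illustration, not the method used to produce the scores on the slide: the match, mismatch and gap values and the two short test strings are assumptions chosen for the example, and a simple linear gap penalty is used. Setting `local=True` adds the "stop if not profitable" rule by never letting a cell drop below zero.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of a and b with a linear gap penalty.

    local=False: global score (both sequences aligned end to end).
    local=True:  local score (best-scoring pair of substrings).
    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    # First row: aligning a prefix of b against nothing.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                # An unprofitable extension is dropped: restart from zero.
                score = max(score, 0)
                best = max(best, score)
            cur.append(score)
        prev = cur
    return best if local else prev[-1]


# Two toy sequences sharing the segment EPTEVFMD in different positions.
seq1 = "AAAEPTEVFMD"
seq2 = "EPTEVFMDKKK"
global_score = align_score(seq1, seq2)
local_score = align_score(seq1, seq2, local=True)
print("global:", global_score, "local:", local_score)
```

With these toy inputs the local score counts only the shared segment, while the global score also pays for the unmatched ends, which mirrors the negative global and positive local scores in the example above.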

9
BLAST
  • Basic Local Alignment Search Tool
  • Aligning an unknown sequence (query) against all
    sequences present in a chosen database based on a
    score-value.
  • Aim
  • Obtaining structural or functional information
    on the unknown sequence.

Introduction
10
(No Transcript)
11
Programs
  • Different BLAST programs available
  • Parameters
  • Maximum E-Value, Gap Opening Penalty (GOP), Gap
    Extension Penalty (GEP), ...
  • Terms
  • Query: sequence which will be aligned
  • Subject: sequence present in the database
  • Hit: alignment result

Query \ Database   Nucleic   Protein
Nucleic            BlastN    BlastX
Protein            -         BlastP
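How the two gap parameters interact can be shown with a short sketch. It assumes one common convention, in which a gap pays the opening penalty once plus the extension penalty for every gapped position; tools differ in the exact formula, and the numeric values here are hypothetical rather than the defaults of any BLAST program.

```python
def affine_gap_cost(length, gop=10, gep=1):
    """Cost of a single gap spanning `length` positions.

    Assumed convention: cost = GOP + GEP * length.
    gop and gep are hypothetical example values.
    """
    if length <= 0:
        return 0
    return gop + gep * length


# One long gap is cheaper than several short ones of the same total length.
one_gap_of_five = affine_gap_cost(5)
five_gaps_of_one = 5 * affine_gap_cost(1)
print(one_gap_of_five, five_gaps_of_one)
```

Lowering both penalties makes the aligner more tolerant of isolated insertions and deletions, at the price of more spurious gapped hits.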
BLAST
12
Substitution Matrices: What?
  • Estimate the rate at which each possible residue
    in a sequence changes to each other residue over
    time.
  • For example, a hydrophobic residue is more likely
    to stay hydrophobic than not.
  • Each matrix is tailored to look for certain types
    of sequences: KNOW WHAT YOU ARE LOOKING FOR!

BLAST Matrices
13
Substitution Matrices: Why?
  1. Determine likelihood of homology between two
    sequences
  2. Substitutions that are more likely should get a
    higher score
  3. Substitutions that are less likely should get a
    lower score.
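The idea that likely substitutions score higher is usually expressed as a log-odds ratio: how often a residue pair is observed in trusted alignments, compared with how often it would pair up by chance. The sketch below assumes that form with a half-bit scale; all frequencies are invented for illustration and are not taken from any published matrix.

```python
import math


def log_odds_score(p_pair, q_a, q_b, scale=2.0):
    """Rounded log-odds score for one residue pair.

    p_pair: observed frequency of the pair in aligned sequences.
    q_a, q_b: background frequencies of the two residues.
    scale=2.0 gives half-bit units (an assumption for this sketch).
    """
    return round(scale * math.log2(p_pair / (q_a * q_b)))


# Hypothetical frequencies, for illustration only.
q_leu, q_ile, q_asp = 0.10, 0.06, 0.05
likely = log_odds_score(0.012, q_leu, q_ile)      # seen twice as often as chance
unlikely = log_odds_score(0.00125, q_leu, q_asp)  # seen a quarter as often
print(likely, unlikely)
```

A pair observed more often than chance gets a positive score and a pair observed less often gets a negative one, which is exactly the ordering the three points above ask for.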

BLAST Matrices
14
Matrices - PAM
  • Point Accepted Mutation
  • Mostly used in global amino acid alignments
  • PAM1 represents 1% of change
  • PAM250 = (PAM1)^250
  • PAM1
  • Applied for a time period over which we expect
    1% of the amino acids to undergo accepted point
    mutations within the species of interest.
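The relation between PAM1 and PAM250 is a matrix power, which a toy example makes concrete. The sketch below uses a hypothetical two-residue alphabet with a 1% chance of change per step instead of the real 20 x 20 matrix; only the arithmetic carries over.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


# Toy "PAM1": each residue has a 1% chance of changing in one step.
# The two-letter alphabet is an assumption made to keep the example small.
TOY_PAM1 = [[0.99, 0.01],
            [0.01, 0.99]]
toy_pam250 = mat_pow(TOY_PAM1, 250)
for row in toy_pam250:
    print([round(x, 4) for x in row])
```

Each row still sums to 1 after 250 steps, and the chance of ending on a different residue stays below one half in this toy case: repeated and reverted changes mean that 250 steps do not amount to 250% change.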

BLAST Matrices
15
BLOSUM
  • Mostly used in local AA alignments
  • Based on observed alignments, not predicted ones.
  • BLOSUM 80, BLOSUM 62, BLOSUM 45
  • Default BLOSUM 62
  • Matrix calculated from comparisons of sequences
    with no more than 62% identity.
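The number in a BLOSUM name refers to a percent-identity threshold, so it helps to see how percent identity between two aligned sequences can be computed. This is a minimal sketch under one assumed definition (identical positions divided by ungapped aligned positions); other definitions use a different denominator, and the example sequences are made up.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences of equal length.

    Columns containing a gap ('-') in either sequence are ignored.
    This denominator is one assumed convention among several.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


identity = percent_identity("EPTEV", "EPSEV")
print(identity)
```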

BLAST Matrices
16
PAM vs BLOSUM
  • Closely related
  • Low PAM
  • High BLOSUM
  • Distantly related
  • High PAM
  • Low BLOSUM

BLAST Matrices
17
BlastN Example
BLAST
18
BlastN Example
BLAST
19
Common BLAST problems
  • BlastN

[Figure: clone sequence vs. mRNA, with a sequencing error]
BLAST
  • Solution
  • Low penalty for GOP and GEP 1

20
Translation Problems
  • 6-Frame translation

>embl|J03801|HSLSZ Human lysozyme mRNA, complete cds with an Alu repeat in the 3' flank.
Frame  1: L A L P S S Q H E G S H C S G A
DNA:      ctagcactctgacctagcagtcaacatgaaggctctcattgttctggggct...
BLAST
21
Translation Problems
  • 6-Frame translation

>embl|J03801|HSLSZ Human lysozyme mRNA, complete cds with an Alu repeat in the 3' flank.
Frame  3: S T L T Q S T R L S L F W G
Frame  2: H S D L A V N M K A L I V L G
Frame  1: L A L P S S Q H E G S H C S G A
Frame  0: ctagcactctgacctagcagtcaacatgaaggctctcattgttctggggct...
Frame -1: V L Q S L L E C L Y F L H H A
Frame -2: C C K A F N N V C I F Y I M H
Frame -3: V A K P L I R M F V F F T S C I
http://searchlauncher.bcm.tmc.edu/cgi-bin/seq-util/sixframe.pl
BLAST
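A six-frame translation can be sketched in a few lines. The code below assumes the standard genetic code and is not the implementation behind the tool linked above. It translates the first 51 bases of the sequence shown on the slide and prints stop codons as `*`. The forward-frame residues on the slide agree with this translation once the stops are left out; the reverse frames of such a short fragment will differ from the slide, since the reverse complement of the full-length sequence starts at its other end.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return {frame: protein} for frames 1, 2, 3 and -1, -2, -3."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(reverse[offset:])
    return frames


# First 51 bases of the sequence shown on the slide.
fragment = "ctagcactctgacctagcagtcaacatgaaggctctcattgttctggggct"
frames = six_frames(fragment)
for frame in (3, 2, 1, -1, -2, -3):
    print(f"{frame:+d}", frames[frame])
```

Frame 2 contains the start codon followed by `MKALIVLG`, which illustrates why the reading frame has to be known, or all six tried, before a nucleotide sequence is compared with proteins.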
22
Common BLAST problems
[Figure: Gene X with introns and exons; labels: Translation, full mRNA, Splicing, mRNA]
BLAST
23
Common BLAST problems
mRNA
Clones derived from mRNA
BLAST
BlastX against protein sequence
3 possible hit-situations
24
Common BLAST problems
  • Yields no protein hit
  • Aligns with protein in 1 of the 6 frames.
  • Partly perfect alignment

BLAST
25
Questions?
End