Title:

PowerPoint Presentation

Provided by: cerm
1
Why sequence comparison?
The sequence determines the properties of the
macromolecule
  • Sequence comparison is used
  • To discover structural, functional and
    evolutionary relationships
  • Similar sequence → similar structure/function of
    the protein
  • To identify conserved patterns
  • To find known structural and functional domains
    in unknown proteins

A comparison may be the basis of further
experimental investigations
2
What is a sequence alignment?
Procedure of comparing two (pairwise) or more
(multiple) sequences by searching for a series of
individual characters that are in the same order
in the sequences
3
Examples comparing some strings
  • Comparing some names

What have we learnt? To compare the names
(strings) we mentally searched for the
alignment that maximises the number of
identities → if this number is high the names are
similar; otherwise they are different.
What have we learnt? To choose the best alignment
among many candidates we have to score the results.
What have we learnt? In some cases, to find the
best alignment we have to insert gaps.
What have we learnt? Gaps must be penalized.
4
Comparing sequences
The algorithms that compare sequences perform the
same operations we have just done. They try many
alignments and select the one with the highest
score as the best. In some cases they insert gaps
to increase the quality of the alignment.
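The procedure described above can be sketched with a small dynamic-programming aligner. This is a minimal illustration, assuming a Needleman-Wunsch style global alignment with identity scoring and a single per-position gap penalty; the function name `global_align` and the default scores are hypothetical choices for this example, not values taken from the slides.

```python
def global_align(a, b, match=1, mismatch=0, gap=-1):
    """Global alignment of two strings with a linear gap penalty."""
    n, m = len(a), len(b)

    def pair(i, j):
        # Score for aligning a[i-1] with b[j-1] (identity scoring).
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two characters
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


# Comparing two names, as in the earlier slides.
score, top, bottom = global_align("MARIA", "MARA")
print(top)
print(bottom)
print("score:", score)
```

Dynamic programming considers every possible alignment implicitly, without listing them one by one, which is why it remains practical even though the number of alignments is very large.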
A different score
Comparing amino acid or nucleic acid sequences
differs from comparing names because each letter
stands for a molecule with its own steric
hindrance and physicochemical properties, which
influence how readily one residue replaces
another during evolution.
5
An example of sequence comparison
In principle an algorithm evaluates all the
possible alignments (there are very many of them)
and chooses the one with the highest score.
The best alignment is:
D A R I A E S K - A
The score of the alignment is:
$$\text{Score} = \frac{\sum_s m(a_s, b_s) - (\text{gap insertion penalty} \times \text{number of gaps}) - (\text{gap extension penalty} \times \text{length of gaps})}{\text{length of alignment}}$$
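The alignment score described above can be computed for an already aligned pair as in the following sketch. It assumes that both gap penalties are subtracted from the summed substitution scores and that the result is divided by the alignment length; the function `alignment_score`, the identity-based `identity` scoring function and the penalty values are hypothetical and chosen only for illustration.

```python
def identity(x, y):
    """Toy substitution score: 1 for identical residues, 0 otherwise."""
    return 1.0 if x == y else 0.0


def alignment_score(row_a, row_b, sub=identity, gap_open=2.0, gap_extend=0.5):
    """Length-normalised score of two aligned rows ('-' marks a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    substitution_total = 0.0
    n_gaps = 0       # number of separate gaps (gap insertion events)
    gap_length = 0   # total number of gapped columns
    in_gap = False
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            gap_length += 1
            if not in_gap:
                n_gaps += 1   # a new gap starts in this column
            in_gap = True
        else:
            substitution_total += sub(x, y)
            in_gap = False
    raw = substitution_total - gap_open * n_gaps - gap_extend * gap_length
    return raw / len(row_a)


# 4 identities, one gap of length 1: (4 - 2.0 - 0.5) / 5
value = alignment_score("MARIA", "MAR-A")
print(value)
```

For simplicity, adjacent gapped columns are counted as one gap even if the gap moves from one row to the other; real programs may treat that case differently.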
6
Scoring matrices
Scoring matrices reflect:
  • The probabilities of mutual substitutions
  • The probability of occurrence of each residue
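One common way to combine these two kinds of probability into a single score is a log-odds ratio: how often a pair of residues is observed aligned, relative to how often it would be expected by chance from the residue frequencies. The sketch below assumes this log-odds form; the function name `log_odds` and all the numbers are made up for illustration and do not come from any published matrix.

```python
import math


def log_odds(pair_frequency, freq_a, freq_b):
    """log2 of observed pair frequency over the frequency expected by chance."""
    expected = freq_a * freq_b
    return math.log2(pair_frequency / expected)


# Hypothetical frequencies, for illustration only.
favoured = log_odds(0.02, 0.1, 0.1)     # pair seen twice as often as chance
disfavoured = log_odds(0.005, 0.1, 0.1)  # pair seen half as often as chance
print(favoured, disfavoured)
```

A positive score marks a substitution seen more often than chance would predict, and a negative score marks one seen less often.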
7
Protein Score Matrix
PAM240 (Percent Accepted Mutation)
BLOSUM62 (BLOcks SUbstitution Matrix)
The BLOSUM model is designed to find conserved
domains of proteins. The BLOSUM statistics are
based on the BLOCKS library
(http://bioinformatics.weizmann.ac.il/blocks/process_blocks.html),
a collection of multiple alignments of protein
fragments without gaps. The numbers in the
matrix reflect the frequency of substitution of
one residue by another in these alignments, after
sequences sharing more than 62% identity have
been clustered together.
The PAM model is designed to track the
evolutionary origin of proteins. The numbers in
the matrix reflect the probability that a residue
is substituted by another after 240 evolutionary
steps.
GONNET250: the Gonnet matrices are similar to PAM
but are more up to date and are based on a far
larger data set. They appear to be more sensitive.
PAM and GONNET: the higher the index, the greater
the distance of the matrix from the identity matrix.
BLOSUM: the lower the index, the greater the
distance of the matrix from the identity matrix.
8
How good is my alignment?
The output parameters
  • Score: the value calculated for the alignment
    using the substitution matrix and the gap
    penalties.
  • Percent identity: percent of exactly matching
    residues in the alignment.
  • Percentage of similarity: percent of similar
    residues aligned (depends on the definition of
    similarity, but is biologically more meaningful).
  • Percentage of gaps: percent of gaps present in
    the alignment.
  • Expected value (E): the number of matches with a
    score at least this high that would be expected
    by chance, i.e. when comparing random sequences.
NOTE: different systems use different forms of
these statistics.
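The percentage statistics listed above can be computed from an aligned pair as in the sketch below. It assumes that all percentages are taken over the full alignment length, and it uses a hypothetical grouping of similar amino acids (`SIMILAR_GROUPS`); as the note above says, different programs use different definitions.

```python
# Assumed similarity groups, for illustration only.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ"]


def alignment_stats(row_a, row_b, groups=SIMILAR_GROUPS):
    """Percent identity, similarity and gaps for two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = similar = gaps = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            identical += 1
            similar += 1          # identical residues also count as similar
        elif any(x in g and y in g for g in groups):
            similar += 1
    length = len(row_a)
    return {
        "identity": 100.0 * identical / length,
        "similarity": 100.0 * similar / length,
        "gaps": 100.0 * gaps / length,
    }


stats = alignment_stats("MKV-LD", "MRVALE")
print(stats)
```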
9
Multiple Sequence Alignment
  • Compare all sequences pairwise.
  • Perform cluster analysis on the pairwise data to
    generate a hierarchy for alignment (guide tree).
  • Build the alignment step by step according to the
    guide tree: first align the most similar pair of
    sequences, then add another sequence or another
    pairwise alignment.
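The first two steps above (pairwise comparison and clustering into a guide tree) can be sketched as follows. This is a simplified illustration: it assumes toy sequences of equal length, so that a distance can be computed by counting mismatches instead of running a pairwise alignment, and it uses average-linkage clustering. The names `p_distance` and `guide_tree` are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions (toy sequences of equal length)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def guide_tree(seqs):
    """Average-linkage clustering; returns the tree as nested tuples."""
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}

    def average(c1, c2):
        # Mean distance between all members of the two clusters.
        total = sum(dist[frozenset((x, y))] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


toy = {
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAC",
    "C": "CCCCCAAAAA",
    "D": "CCCCCCAAAA",
}
tree = guide_tree(toy)
print(tree)
```

The nested tuples give the order in which a progressive aligner would proceed: the closest pairs are aligned first, and the resulting alignments are then aligned to each other.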

10
Steps in Multiple Alignment
(1) Pairwise alignment (prepare the guide tree):
6 pairwise alignments, then cluster analysis
(2) Multiple alignment following the tree from (A):
align pairs, then align alignments, preserving gaps
11
  • We can deduce a pairwise alignment for any two
    sequences in the multiple alignment (projected
    pairwise alignment)
  • The projected pairwise alignment is NOT
    necessarily the best pairwise alignment for the
    two sequences.
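Projecting a pairwise alignment out of a multiple alignment amounts to taking the two rows and dropping the columns in which both rows have a gap. A minimal sketch, with a hypothetical function name `project_pair` and a made-up toy alignment:

```python
def project_pair(msa, name_a, name_b):
    """Extract two rows of an MSA and drop columns where both have a gap."""
    kept = [(x, y) for x, y in zip(msa[name_a], msa[name_b])
            if not (x == "-" and y == "-")]
    row_a = "".join(x for x, _ in kept)
    row_b = "".join(y for _, y in kept)
    return row_a, row_b


msa = {
    "s1": "AC--GT",
    "s2": "A---GT",
    "s3": "ACTTGT",
}
row_a, row_b = project_pair(msa, "s1", "s2")
print(row_a)
print(row_b)
```

The gaps in the projection are inherited from the multiple alignment, where they were placed to accommodate all sequences at once, which is why the projection can differ from the best alignment of the two sequences on their own.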

Best Pairwise alignment
Projected Pairwise alignment
12
Isopenicillin N Synthase
  • Mononuclear iron proteins electron carrier
    proteins. Iron atoms are bound to amino acid side
    chains.
  • In IPNS the metal ion is coordinated by three
    protein ligands

13
IsoPenicillin N Synthase
  • IPNS is involved in biosynthesis of penicillin

[Figure: reaction scheme for the conversion of ACV to isopenicillin N by IPNS; labels: Fe2+, ascorbate, O2, 2 H2O]
14
Research IPNS
  • Goal: identify the Fe2+-binding residues.
  • Possible solutions:
  • Empirical approach (alanine walk)
  • Bioinformatic approach (comparing different IPNS
    sequences).

15
Step 1
  • Multiple alignment of known IPNS

16
PileUp - output
  • !!AA_MULTIPLE_ALIGNMENT 1.0
  • PileUp of: @ipns.fil
  • Symbol comparison table: GenRunData:blosum62.cmp CompCheck: 1102
  • GapWeight: 8
  • GapLengthWeight: 2
  • ipns.msf MSF: 338 Type: P March 14, 2002 09:29 Check: 7631 ..
  • Name: IPNS_STRJU Len: 338 Check: 6344 Weight: 1.00
  • Name: IPNS_STRCL Len: 338 Check: 4249 Weight: 1.00
  • Name: IPNS_NOCLA Len: 338 Check: 7020 Weight: 1.00
  • Name: IPNS_CEPAC Len: 338 Check: 18 Weight: 1.00
  • //
  • 1 50
  • IPNS_STRJU MPILMPSAE VPTIDISPLS GDDAKAKQRV AQEINKAARG SGFFYASNHG
  • IPNS_STRCL MPVLMPSAH VPTIDISPLF GTDAAAKKRV AEEIHGACRG SGFFYATNHG
  • IPNS_NOCLA MKMPSAE VPTIDVSPLF GDDAQEKVRV GQEINKACRG SGFFYAANHG
  • IPNS_CEPAC MGSVPVPVAN VPRIDVSPLF GDDKEKKLEV ARAIDAASRD TGFFYAVNHG

17
Multiple alignment (MA): bacteria and fungi
  • Multiple Sequence alignment of IPNS

Not enough variation
18
Step 2
  • Add more enzymes, similar to IPNS

19
Isopenicillin N Synthase
  • Alignment of IPNSs, hydroxylases and expandases
    (same biochemical pathway)

20
Isopenicillin N Synthase
  • New multiple alignment, narrowing down the
    possibilities

21
Simple multiple alignment
  • The known IPNS sequences are very similar.
  • The sequences of closely related enzymes are also quite similar.
  • Not enough variability to categorize the active
    sites.
  • We need to obtain even more distant sequences.

22
Step 3
  • Using the multiple alignment for further searches

23
Consensus Sequence
  • We can deduce a consensus sequence from the
    multiple sequence alignment. The consensus
    sequence holds the most frequent character of the
    alignment at each column.
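Deriving a consensus sequence can be sketched in a few lines. This example assumes that the alignment is given as a list of equal-length strings and that ties are broken by whichever character appears first in the column; the function name `consensus` and the toy alignment are hypothetical.

```python
from collections import Counter


def consensus(alignment):
    """Most frequent character in each column of the alignment."""
    result = []
    for column in zip(*alignment):
        # most_common(1) returns the (character, count) pair with the top count
        character, _count = Counter(column).most_common(1)[0]
        result.append(character)
    return "".join(result)


toy_alignment = ["ACGT", "ACGA", "TCGA"]
toy_consensus = consensus(toy_alignment)
print(toy_consensus)
```

Gap characters are treated like any other character here, so a column made mostly of gaps yields a gap in the consensus.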

24
Profile
  • We can deduce a statistical model describing the
    multiple sequence alignment. A profile holds
    statistical information about the characters
    found in each column of the alignment.

25
Profile vs. Consensus
  • Consensus: each position reflects the most common
    character found at that position.
  • Profile: each position reflects the frequency of
    each character found at that position.

26
Profile vs. Consensus
  • The following multiple alignments will have the
    same consensus

27
Profile vs. Consensus
  • But have a different profile
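The difference can be illustrated with two small made-up alignments. The sketch below assumes a profile is simply the per-column frequency of each character; `profile`, `consensus_from_profile` and both toy alignments are hypothetical and are not the examples shown on the original slides.

```python
from collections import Counter


def profile(alignment):
    """Per-column character frequencies of an alignment."""
    n_rows = len(alignment)
    return [{ch: count / n_rows for ch, count in Counter(column).items()}
            for column in zip(*alignment)]


def consensus_from_profile(prof):
    """Most frequent character in each column of the profile."""
    return "".join(max(column, key=column.get) for column in prof)


alignment_1 = ["ACD", "ACD", "ACD", "ACD"]   # no variation at all
alignment_2 = ["ACD", "ACD", "ACD", "GCE"]   # variation in columns 1 and 3

profile_1 = profile(alignment_1)
profile_2 = profile(alignment_2)
print(consensus_from_profile(profile_1), profile_1)
print(consensus_from_profile(profile_2), profile_2)
```

Both alignments give the same consensus, but only the profile records that the second alignment tolerates other residues at the first and last positions.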

28
ProfileMake and ProfileSearch
  • ProfileMake creates a profile (a position-specific
    scoring table).
  • The profile is constructed from a multiple
    sequence alignment.
  • profilemake alignment.msf -beg -end

29
ProfileSearch
  • ProfileSearch searches the database for
    sequences that match the profile.
  • profilesearch profile.prf
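The idea behind building a profile and searching with it can be sketched as follows. This is not the algorithm used by the ProfileMake and ProfileSearch programs, whose internals are not described in these slides; it is a simplified stand-in that assumes an ungapped alignment, a uniform background frequency, a pseudocount of 1 and no gaps in the match. All function names and sequences are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds position-specific scoring table from an ungapped alignment."""
    background = 1.0 / len(ALPHABET)   # assumed uniform background
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in ALPHABET})
    return pssm


def best_window(pssm, sequence):
    """Best-scoring ungapped placement of the profile: (score, start) or None."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[k][sequence[start + k]] for k in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best


def profile_search(pssm, database):
    """Rank database sequences by their best profile match, highest first."""
    hits = []
    for name, sequence in database.items():
        found = best_window(pssm, sequence)
        if found is not None:
            hits.append((found[0], name, found[1]))
    return sorted(hits, reverse=True)


pssm = build_pssm(["HVDG", "HIDG", "HLDG"])
database = {"hit": "MKHVDGAA", "miss": "MKAAAAAA"}
results = profile_search(pssm, database)
for score, name, start in results:
    print(name, start, round(score, 2))
```

Conserved columns contribute large positive scores and variable columns contribute less, so the search rewards matches at the positions the family actually conserves.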

30
Close enzymes
  • IPNS, Hydroxylase, Expandase
  • Ethylene forming enzyme (EFE, ACCO)
  • Hyoscyamine 6β-hydroxylase
  • Flavanone-3-hydroxylases
  • Flavonol synthases
  • Anthocyanidin hydroxylases
  • Anthocyanidin synthases
  • Gibberellin A20 oxidases
  • Gibberellin 3β oxidases
  • Gibberellin 2β, 3β hydroxylase
  • Gibberellin 7-oxidase
  • Desacetoxyvindoline 4-hydroxylase
  • L-proline 3-hydroxylases
  • Prolyl 4-hydroxylases
  • Lysyl hydroxylases

31
Isopenicillin N Synthase
  • Common to these enzymes is their involvement in
    secondary metabolism, such as the production of
    penicillin and cephalosporin antibiotics in
    bacteria and fungi, gibberellins, alkaloids,
    ethylene, anthocyanidins and flavonoids in
    plants, and the modification of collagen.

The HXD(53-57)XH motif plays a role in binding
the iron in the active site.
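A motif of this kind can be searched for with a regular expression. The sketch below assumes that "(53-57)X" means a spacer of 53 to 57 arbitrary residues between the aspartate and the second histidine, which is an interpretation of the notation rather than something stated explicitly here. The function name `find_motif` and the test sequence are hypothetical.

```python
import re

# H, any residue, D, then 53 to 57 arbitrary residues, then H (assumed reading)
MOTIF = re.compile(r"H.D.{53,57}H")


def find_motif(sequence):
    """Return 1-based positions (first His, Asp, second His) for each match."""
    positions = []
    for match in MOTIF.finditer(sequence):
        first_his = match.start() + 1
        asp = match.start() + 3
        second_his = match.end()
        positions.append((first_his, asp, second_his))
    return positions


# Synthetic sequence: 10 residues, HGD, a 53-residue spacer, H, 5 residues.
synthetic = "A" * 10 + "HGD" + "A" * 53 + "H" + "A" * 5
hits = find_motif(synthetic)
print(hits)
```

`finditer` reports non-overlapping matches only, so a sequence with several overlapping candidates would need a lookahead-based pattern instead.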
32
Isopenicillin N Synthase
  • Experimental evidence supports the finding that
    His212, Asp214 and His268 are the endogenous
    ligands that bind Fe2+ in IPNS.

Enzyme      Relative   Km      kcat      kcat/Km
            activity   (mM)    (min-1)   (mM-1 min-1)
Wild type   100        0.4     38.8      96.9
His48Ala    16         0.56    7.5       13.4
His63Ala    31         1.0     14.2      14.2
His114Ala   28         0.85    12.5      14.7
His124Ala   48         0.84    32.1      38.1
His135Ala   22         0.59    11.7      19.8
His212Ala   <0.007     n.d.    n.d.
His268Ala   <0.003     n.d.    n.d.

Asp14Ala    5          0.86    0.56      0.7
Asp113Ala   63         0.45    23.8      52.8
Asp131Ala   68         0.48    36.3      75.5
Asp203Ala   32         0.91    12.3      13.5
Asp214Ala   <0.004     n.d.    n.d.
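The last column of the table is the catalytic efficiency, kcat divided by Km. As a quick arithmetic check, the sketch below recomputes it from the Km and kcat values listed above for the mutants with measurable activity; the dictionary name `KINETICS` and the function name are hypothetical.

```python
# name: (Km in mM, kcat in min^-1, reported kcat/Km in mM^-1 min^-1)
KINETICS = {
    "Wild type": (0.4, 38.8, 96.9),
    "His48Ala": (0.56, 7.5, 13.4),
    "His63Ala": (1.0, 14.2, 14.2),
    "His114Ala": (0.85, 12.5, 14.7),
    "His124Ala": (0.84, 32.1, 38.1),
    "His135Ala": (0.59, 11.7, 19.8),
    "Asp14Ala": (0.86, 0.56, 0.7),
    "Asp113Ala": (0.45, 23.8, 52.8),
    "Asp131Ala": (0.48, 36.3, 75.5),
    "Asp203Ala": (0.91, 12.3, 13.5),
}


def catalytic_efficiency(km, kcat):
    """kcat / Km."""
    return kcat / km


differences = {}
for name, (km, kcat, reported) in KINETICS.items():
    computed = catalytic_efficiency(km, kcat)
    differences[name] = abs(computed - reported)
    print(f"{name:10s} computed {computed:6.1f}  reported {reported:6.1f}")
```

The recomputed values agree with the reported column to within rounding, presumably because the published ratios were calculated from unrounded Km and kcat values.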

33
Searching Databases with Multiple Alignments
  • Pros:
  • A representative of the multiple sequence
    alignment (a consensus or a profile) is used in
    the database search. It carries more information
    than a single sequence, resulting in higher
    sensitivity.
  • Cons:
  • Searches take longer and are often more difficult
    to interpret.

34
Psi Blast
  • PSI-BLAST (Position-Specific Iterated BLAST) is an
    automatic profile-like search
  • The program first performs a gapped BLAST search
    of the database. The information from the
    significant alignments is then used to construct
    a position-specific score matrix. This matrix
    replaces the query sequence in the next round of
    database searching
  • The program may be iterated until no new
    significant alignments are found
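The iteration described above can be illustrated with a deliberately simplified toy. It is not the PSI-BLAST algorithm: it assumes ungapped sequences of equal length, replaces the position-specific score matrix with the set of residues seen in each column, and uses a fixed score threshold in place of a statistical significance cutoff. All names and sequences are hypothetical.

```python
def build_profile(members):
    """Toy position-specific model: the residues seen in each column."""
    return [set(column) for column in zip(*members)]


def score(profile, sequence):
    """Number of positions whose residue is allowed by the profile."""
    return sum(residue in allowed for allowed, residue in zip(profile, sequence))


def iterated_search(query, database, threshold, max_rounds=10):
    """Repeat search and profile rebuild until no new hits appear."""
    members = [query]
    found = {}   # sequence name -> round in which it was first detected
    for round_no in range(1, max_rounds + 1):
        profile = build_profile(members)
        new_hits = [name for name, sequence in database.items()
                    if name not in found and score(profile, sequence) >= threshold]
        if not new_hits:
            break                      # converged: nothing new was found
        for name in new_hits:
            found[name] = round_no
            members.append(database[name])
    return found


toy_database = {
    "close": "HIDGH",
    "distant": "HIEGH",
    "unrelated": "AAAAA",
}
detected = iterated_search("HVDGH", toy_database, threshold=4)
print(detected)
```

The distant sequence falls below the threshold when compared with the query alone, and is only picked up in the second round, after the close relative has widened the model. That is the gain in sensitivity the iteration is meant to provide.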

35
Psi-Blast Output
  • Detected by Psi-Blast
  • IPNS
  • Deacetoxy cephalosporin C synthase (expandase)
  • Deacetyl cephalosporin C synthase (hydroxylase)
  • Ethylene Forming Enzyme (EFE, ACCO)
  • Hyoscyamine 6β-hydroxylase
  • Flavanone-3-hydroxylases
  • Flavonol synthases
  • Anthocyanidin hydroxylases
  • Gibberellin A20 oxidases
  • Desacetoxyvindoline 4-hydroxylase
  • Not detected by Psi-Blast
  • (Prolyl 4-hydroxylases and Lysyl hydroxylases)