Roadmap - PowerPoint PPT Presentation


Provided by: duan85

Learn more at: https://www.cs.uakron.edu


Transcript and Presenter's Notes

1
Roadmap

The topics
basic concepts of molecular biology
more on Perl
overview of the field
biological databases and database searching
sequence alignments
phylogenetics
structure prediction
microarray data analysis

2
Sequence alignments

Introduction
What is an alignment?
Why do alignments?
A bit of history
Dot matrix comparison
Scoring alignments
Alignment methods
Significance of alignments

3
What is sequence alignment?

Sequence alignment is an arrangement of two or
more sequences, highlighting their similarity.

4
Why do alignments?

Sequence alignment is useful for discovering
structural, functional and evolutionary
information in biological sequences.

5
Over time, genes accumulate mutations

Environmental factors
Radiation
Oxidation
Mistakes in replication/repair
Deletions, Duplications
Insertions
Inversions
Point mutations

6
Comparing two sequences

Point mutations, easy
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT
Insertions/deletions, must align
ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT

ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT

7
Sequence Alignment

Doolittle RF, Hunkapiller MW, Hood LE,
Devare SG, Robbins KC, Aaronson SA,
Antoniades HN. Science 221:275-277, 1983.
A sequence for platelet-derived
growth factor (PDGF) from mammalian cells was
virtually identical to the sequence for the
retrovirus-encoded oncogene known as v-sis (a gene
causing cancer in animals).
The retrovirus had acquired the gene from the host
cell through some kind of genetic exchange event and
then had produced a mutant that could alter the
function of the normal protein when it infected
another animal.

8
Dot Matrix Comparison

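A: T C A G A G G T C T G
B: T C A G A G C T G

Dot matrix (A across the top, B down the side;
X marks a match, . marks no match)

  T C A G A G G T C T G
T X . . . . . . X . X .
C . X . . . . . . X . .
A . . X . X . . . . . .
G . . . X . X X . . . X
A . . X . X . . . . . .
G . . . X . X X . . . X
C . X . . . . . . X . .
T X . . . . . . X . X .
G . . . X . X X . . . X
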
9
Interpretation of dot matrix

Regions of similarity appear as diagonal runs of
dots
Reverse diagonals (perpendicular to diagonal)
indicate inversions
Can link or "join" separate diagonals to form
alignment with "gaps"

10
More on Dot Matrix

Detection of matching regions can be improved by
filtering:
use a sliding window to compare the two
sequences. For example, print a dot at a matrix
position only if
7 out of the next 11 positions in the sequences
are identical, or
the similarity score of the next 11 positions in the
sequences is greater than 5.
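
The windowed filter described above can be sketched in Python. This is only an illustration under stated assumptions: the function names dot_matrix and render are hypothetical, the two sequences are A and B from the dot matrix slide, and because those sequences are so short the example uses a window of 3 with 3 required matches instead of the 7-out-of-11 rule.

```python
def dot_matrix(seq_a, seq_b, window=1, min_matches=1):
    """Return the set of (i, j) positions where a dot should be printed.

    A dot is placed at (i, j) when at least `min_matches` of the `window`
    positions starting at seq_a[i] and seq_b[j] are identical.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the matrix as text: seq_a across the top, seq_b down the side."""
    lines = ["  " + " ".join(seq_a)]
    for j, b in enumerate(seq_b):
        row = ["X" if (i, j) in dots else "." for i in range(len(seq_a))]
        lines.append(b + " " + " ".join(row))
    return "\n".join(lines)


seq_a = "TCAGAGGTCTG"  # sequence A from the dot matrix slide
seq_b = "TCAGAGCTG"    # sequence B from the dot matrix slide

# window=1 marks every single-letter match (the unfiltered matrix).
unfiltered = dot_matrix(seq_a, seq_b)
# window=3, min_matches=3 keeps only positions that start a 3-letter exact match.
filtered = dot_matrix(seq_a, seq_b, window=3, min_matches=3)

print(render(seq_a, seq_b, unfiltered))
print()
print(render(seq_a, seq_b, filtered))
```

With the filter on, the isolated off-diagonal dots disappear and the main diagonal run at the start of both sequences stands out. For longer sequences, window=11 and min_matches=7 would correspond to the rule quoted on the slide.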

11
Sequence repeats

Many sequences contain repetitive regions.

Example: a retrovirus vector sequence compared against
itself using a window size of 9 and a mismatch limit of 2
(http://arbl.cvmbs.colostate.edu/molkit/dnadot/bkg.html)

12
More on Dot Matrix

A dot matrix graphically presents regions of
identity or similarity between two sequences
The use of windows and thresholds can reduce
noise in a dot matrix
Inversions and duplications have unique
signatures in a dot matrix

13
Software

Dotlet (java applet)
www.ch.embnet.org
Dnadot
arbl.cvmbs.colostate.edu/molkit/dnadot/
Dotter
www.cgr.ki.se/cgr/groups/sonnhammer/Dotter.html
Dottup
www.emboss.org

14
How to measure the similarity

Basically, three kinds of changes can occur at any
given position within a sequence:
Mutation (substitution)
Insertion
Deletion
Insertions and deletions have been found to occur
in nature at a significantly lower frequency than
mutations.

15
Scoring Matrices for Aligning DNA Sequences

Transitions --- substitutions in which a purine
(A/G) is replaced by another purine (A/G) or a
pyrimidine (C/T) is replaced by another
pyrimidine (C/T).
Transversions --- substitutions in which a purine
is replaced by a pyrimidine or vice versa:
(A/G) <-> (C/T)
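
The two substitution classes can be told apart mechanically. The sketch below is an illustration only; the function name classify_substitution is hypothetical, and the two sequences are the point-mutation pair from the "Comparing two sequences" slide.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(x, y):
    """Classify an aligned pair of DNA bases as match, transition or transversion."""
    x, y = x.upper(), y.upper()
    for base in (x, y):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if x == y:
        return "match"
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) is a transition.
    same_class = (x in PURINES) == (y in PURINES)
    return "transition" if same_class else "transversion"


seq1 = "ACGTCTGATACGCCGTATAGTCTATCT"
seq2 = "ACGTCTGATTCGCCCTATCGTCTATCT"
counts = Counter(classify_substitution(x, y) for x, y in zip(seq1, seq2))
print(counts)
```

A DNA scoring matrix can use this distinction to penalize transversions more heavily than transitions, if the two are expected to occur at different rates.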

16
Scoring a sequence alignment

Match score: 1
Mismatch score: 0
Gap penalty: 1
ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
Matches: 18 x (+1)
Mismatches: 2 x 0
Gaps: 7 x (-1)

Score = 11
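
The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name score_alignment and its parameter names are hypothetical, and the gap penalty is passed as a negative score.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Score a pairwise alignment column by column with a fixed per-gap penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap        # every gap position costs the same
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


top = "ACGTCTGATACGCCGTATAGTCTATCT"
bottom = "----CTGATTCGC---ATCGTCTATCT"
print(score_alignment(top, bottom))  # 18 matches, 2 mismatches, 7 gaps -> 11
```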
17
Gap opening and extension penalties

We want to find alignments that are
evolutionarily likely.
Which of the following alignments seems more
likely to you?
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGAT-------ATAGTCTATCT

ACGTCTGATACGCCGTATAGTCTATCT
AC-T-TGA--CG-CGT-TA-TCTATCT
We can favor the more likely alignment by
penalizing the opening of a new gap more than the
extension of an existing gap

18
Scoring a sequence alignment

Match/mismatch score: 1/0
Open/extension penalty: 2/1
ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
Matches: 18 x (+1)
Mismatches: 2 x 0
Open: 2 x (-2)
Extension: 5 x (-1)

Score = 9
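
Scoring with separate gap-opening and gap-extension penalties can be sketched as follows. The function name score_affine and its parameters are hypothetical. The convention assumed here is the one the slide's counts imply: the first position of each gap costs the opening penalty and every further position of the same gap costs the extension penalty.

```python
def score_affine(top, bottom, match=1, mismatch=0, gap_open=2, gap_extend=1):
    """Score an alignment with separate gap-opening and gap-extension penalties."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    prev_gap = None  # row ("top"/"bottom") holding the previous column's gap, if any
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gap_row = "top" if a == "-" else "bottom"
            # Continuing a gap in the same row is an extension; otherwise a new gap opens.
            score -= gap_extend if gap_row == prev_gap else gap_open
            prev_gap = gap_row
        else:
            score += match if a == b else mismatch
            prev_gap = None
    return score


ref = "ACGTCTGATACGCCGTATAGTCTATCT"
print(score_affine(ref, "----CTGATTCGC---ATCGTCTATCT"))  # 18 - 2*2 - 5*1 = 9

# The two candidate alignments from the gap opening/extension slide:
one_long_gap = score_affine(ref, "ACGTCTGAT-------ATAGTCTATCT")
many_short_gaps = score_affine(ref, "AC-T-TGA--CG-CGT-TA-TCTATCT")
print(one_long_gap, many_short_gaps)
```

Both candidate alignments contain 20 matches and 7 gap positions, so a single fixed gap penalty cannot distinguish them. With the open/extension scheme, the alignment with one long gap scores higher than the one with many short gaps.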
19
Amino Acid Substitution Matrices

PAM (point accepted mutation) - based on global
alignments and an evolutionary model
BLOSUM (blocks substitution matrix) - based on local
alignments of conserved sequence blocks

20
Part of PAM 250 Matrix
    C   S   T   P   A   G
C  12
S   0   2
T  -2   1   3
P  -3   1   0   6
A  -2   1   1   1   2
G  -3   1   0  -1   1   5

21
PAM matrices

The PAM 1 matrix reflects an amount of evolution
producing on average one accepted mutation per hundred
amino acids (1 unit of evolution).
PAM 250 --- 250 units of evolution
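
In Dayhoff's model, the mutation probability matrix for n units of evolution is obtained by multiplying the 1-unit probability matrix by itself n times; the scoring matrix is then derived from those probabilities. The sketch below illustrates only the repeated multiplication. The 2-letter alphabet and the values in toy_pam1 are made up for the example and are not real PAM data.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Hypothetical 1-unit matrix: each letter has a 1% chance of changing per unit.
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]

toy_pam250 = mat_pow(toy_pam1, 250)
for row in toy_pam250:
    print([round(p, 4) for p in row])
```

After 250 units the rows are close to uniform in this toy case, which illustrates why high-numbered PAM matrices suit more distantly related sequences: most of the original signal has been overwritten by repeated change.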

22
Limitations of PAM Matrices

Constructed based on the phylogenetic
relationships prior to scoring mutations
Difficulty of determining ancestral relationships
among sequences
Based on a small set of closely related proteins

23
BLOSUM Matrices

Based on the observed amino acid substitutions in
a large set of about 2000 conserved amino acid
patterns (blocks). The blocks are found in a
database of protein sequences representing more
than 500 families of related proteins and act as
signatures of these protein families.
The matrices are derived from the multiple
alignments of the blocks.
The entries of the matrices are computed based on
the same principle used in PAM -- log(odds
ratio).
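
The log(odds ratio) principle can be illustrated with a small calculation: a pair of amino acids scores positive when it is observed in alignments more often than chance would predict, and negative when it is observed less often. This is a simplified sketch. The function name log_odds_score, the frequencies used, and the scale factor of 2 with a base-2 logarithm are assumptions for illustration, and details of how published matrices count pairs are ignored.

```python
import math


def log_odds_score(observed_pair_freq, freq_a, freq_b, scale=2.0):
    """Rounded, scaled log2 of the observed/expected frequency ratio for a pair."""
    expected = freq_a * freq_b  # chance of seeing the pair if residues were independent
    return round(scale * math.log2(observed_pair_freq / expected))


# Hypothetical frequencies: both residues have background frequency 0.1,
# so the pair is expected by chance with frequency 0.01.
print(log_odds_score(0.04, 0.1, 0.1))   # seen 4x more often than chance -> positive
print(log_odds_score(0.01, 0.1, 0.1))   # seen as often as chance -> zero
print(log_odds_score(0.005, 0.1, 0.1))  # seen half as often as chance -> negative
```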

24
Part of BLOSUM 62 Matrix

BLOSUM62 was derived from blocks in which sequences
sharing at least 62% identical amino acids were
clustered together.

    C   S   T   P   A   G
C   9
S  -1   4
T  -1   1   5
P  -3  -1  -1   7
A   0   1   0  -1   4
G  -3   0  -2  -2   0   6
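
A substitution matrix is used by looking up each aligned pair of residues and summing the scores. The sketch below uses only the six-residue portion of BLOSUM 62 shown on this slide. The names BLOSUM62_PART, pair_score and score_ungapped are hypothetical, and the two peptides are made up for the example.

```python
# Lower triangle of the matrix as shown on the slide: (row, column) -> score.
BLOSUM62_PART = {
    ("C", "C"): 9,
    ("S", "C"): -1, ("S", "S"): 4,
    ("T", "C"): -1, ("T", "S"): 1, ("T", "T"): 5,
    ("P", "C"): -3, ("P", "S"): -1, ("P", "T"): -1, ("P", "P"): 7,
    ("A", "C"): 0, ("A", "S"): 1, ("A", "T"): 0, ("A", "P"): -1, ("A", "A"): 4,
    ("G", "C"): -3, ("G", "S"): 0, ("G", "T"): -2, ("G", "P"): -2,
    ("G", "A"): 0, ("G", "G"): 6,
}


def pair_score(a, b):
    """Look up a pair in either order, since the matrix is symmetric."""
    if (a, b) in BLOSUM62_PART:
        return BLOSUM62_PART[(a, b)]
    return BLOSUM62_PART[(b, a)]


def score_ungapped(seq1, seq2):
    """Sum the substitution scores of two equal-length, ungapped sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


# Hypothetical peptides restricted to the six residues in the table.
print(score_ungapped("CSTPAG", "CTSPGA"))  # 9 + 1 + 1 + 7 + 0 + 0 = 18
```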
25
PAM vs. BLOSUM

PAM
Based on a mutational model of evolution (Markov
process)
PAM1 is based on sequences of 85% similarity
Designed to track evolutionary origins
BLOSUM
Based on the multiple alignment of blocks
Well suited to comparing distantly related sequences
Designed to find conserved domains in proteins

26
Gap Penalty

Optimal penalties vary from sequence to sequence,
and finding the most adequate value is a matter
of empirical trial and error.
When comparing distantly related sequences, a high
gap-opening penalty and a very low gap-extension
penalty often give better results
When comparing closely related sequences, gaps
should be penalized for both gap opening and
gap extension
