Position Weight Matrices for Representing Signals in Sequences. Triinu Tasa, Koke 04.02.05 (PowerPoint presentation)

Slides: 17
Provided by: Pee114



1
Position Weight Matrices for Representing Signals
in Sequences
Triinu Tasa, Koke 04.02.05
2
Definitions
  • Sequence (string): an ordered arrangement of the letters
    'A', 'C', 'G', 'T'
  • Pattern: a simplified regular expression over the alphabet
    'A', 'C', 'G', 'T', '.', where '.' is a wild card
    of length 1 (matching 'A', 'C', 'G' or 'T')
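A pattern in this sense maps directly onto a regular expression in which '.' is narrowed to the class [ACGT]. The sketch below is a minimal Python illustration, assuming 0-based positions and overlapping matches; `pattern_to_regex` and `find_occurrences` are hypothetical helper names, not part of the original work.

```python
import re

ALPHABET = "ACGT"


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a pattern over 'A', 'C', 'G', 'T', '.' into a regular expression.

    The pattern '.' is a wild card of length 1 restricted to A, C, G or T,
    so it becomes the class [ACGT] rather than the regex '.' (any character).
    """
    if not pattern or any(ch not in ALPHABET + "." for ch in pattern):
        raise ValueError(f"invalid pattern: {pattern!r}")
    return re.compile("".join("[ACGT]" if ch == "." else ch for ch in pattern))


def find_occurrences(pattern: str, sequence: str) -> list:
    """All 0-based start positions where the pattern matches; overlaps are allowed."""
    regex = pattern_to_regex(pattern)
    # Every pattern symbol matches exactly one letter, so each match spans len(pattern).
    last_start = len(sequence) - len(pattern)
    return [i for i in range(last_start + 1) if regex.match(sequence, i)]


print(find_occurrences("G.GATGAG.T", "TTGAGATGAGATCC"))  # [2]
```

Testing every start position with `Pattern.match(string, pos)` keeps overlapping matches, which a plain `finditer` would skip.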

3
What is a weight matrix?
  • GATGAG
  • GATGAT
  • TGATAT
  • GATGAT
  • or
  • GTAGTAGTAGT

4
What is a weight matrix?
Better
GATGAG GATGAT TGATAT
  • Alignment matrix C
  • A 0 2 1 0 3 0
  • C 0 0 0 0 0 0
  • G 2 1 0 2 0 1
  • T 1 0 2 1 0 2
  • Frequency matrix F = C / N (N = 3; values rounded to one decimal)
  • A 0 0.7 0.3 0 1 0
  • C 0 0 0 0 0 0
  • G 0.7 0.3 0 0.7 0 0.3
  • T 0.3 0 0.7 0.3 0 0.7
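Both matrices follow directly from the aligned sequences. A minimal Python sketch, assuming the sequences are already aligned (equal length) and storing each matrix as a dict from letter to a list of per-position values; the function names are illustrative only.

```python
ALPHABET = "ACGT"
SEQUENCES = ["GATGAG", "GATGAT", "TGATAT"]


def count_matrix(seqs):
    """Alignment matrix C: C[a][j] = number of sequences with letter a at position j."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    return {a: [sum(s[j] == a for s in seqs) for j in range(length)] for a in ALPHABET}


def frequency_matrix(counts, n):
    """Frequency matrix F = C / N, where N is the number of sequences."""
    return {a: [c / n for c in row] for a, row in counts.items()}


C = count_matrix(SEQUENCES)
F = frequency_matrix(C, len(SEQUENCES))
for a in ALPHABET:
    print(a, C[a], [round(f, 1) for f in F[a]])
```

The printed rows reproduce both matrices; the 0.7 and 0.3 entries are 2/3 and 1/3 rounded to one decimal.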

5
Or weight matrix W
What is a weight matrix?
  • where (the formula for W itself is not preserved in the transcript)
  • N: the number of sequences used
  • p_i: the a priori probability of letter i
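The slide's formula for W did not survive extraction. Its values on slide 8 are, for every nonzero entry, equal to ln(F(i, j) / p_i) with p_i = 0.25 (ln 4 ≈ 1.386, ln(8/3) ≈ 0.98, ln(4/3) ≈ 0.29), and every zero count appears as -6.9 ≈ ln(0.001). The sketch below therefore assumes W(i, j) = ln(max(F(i, j) / p_i, 0.001)) with a uniform background. Both the floor and the uniform p_i are assumptions chosen to match those numbers, not the presentation's stated method.

```python
import math

ALPHABET = "ACGT"
BACKGROUND = {a: 0.25 for a in ALPHABET}  # assumed uniform a priori probabilities p_i
RATIO_FLOOR = 0.001  # assumed floor on F/p, so a zero count gives ln(0.001) ~ -6.9


def weight_matrix(seqs, background=BACKGROUND, floor=RATIO_FLOOR):
    """W[a][j] = ln(max(F[a][j] / p_a, floor)) with F = counts / N (assumed form)."""
    n, length = len(seqs), len(seqs[0])
    w = {}
    for a in ALPHABET:
        w[a] = [
            math.log(max(sum(s[j] == a for s in seqs) / n / background[a], floor))
            for j in range(length)
        ]
    return w


# The three example sequences listed for pattern G.GATGAG.T on slide 8.
W = weight_matrix(["GAGATGAGAT", "GTGATGAGAT", "GAGATGAGGT"])
for a in ALPHABET:
    print(a, [round(x, 2) for x in W[a]])
```

On those three sequences every entry agrees with the slide-8 matrix to within 0.01.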

6
Importance matrix I
What is a weight matrix?
  • I(i, j) (formula not preserved in the transcript); values for
    GATGAG, GATGAT, TGATAT:
  • A 0 1.4 0.3 0 3 0
  • C 0 0 0 0 0 0
  • G 1.4 0.3 0 1.4 0 0.3
  • T 0.3 0 1.4 0.3 0 1.4
7
Applications
Applications - Clustering
  • Pattern clustering
  • 1. G.GATGAG.T 62/75 1:39/49 2:23/26 R:17.3026 BP:1.12008e-37
  • 2. G.GATGAG 89/110 1:45/60 2:44/50 R:10.436 BP:1.61764e-34
  • 3. GATGAG.T 124/148 1:52/70 2:72/78 R:7.36961 BP:2.79148e-33
  • 4. TG.AAA.TTT 132/145 1:53/61 2:79/84 R:6.84578 BP:1.83509e-32
  • 5. AAAATTTT 200/231 1:63/77 2:137/154 R:4.69239 BP:1.19109e-30
  • 6. TGAAAA.TTT 104/114 1:45/53 2:59/61 R:7.78277 BP:3.86086e-29
  • 7. AAA.TTTT 343/537 1:79/145 2:264/392 R:3.05349 BP:5.66833e-29
  • 8. G.AAA.TTTT 135/156 1:51/62 2:84/94 R:6.19534 BP:5.69933e-29
  • 9. TG.GATGAG 49/57 1:30/35 2:19/22 R:16.1117 BP:9.35765e-28
  • 10. TG.AAA.TTTT 86/91 1:40/43 2:46/48 R:8.87311 BP:1.1124e-27
  • ...

8
G.GATGAG.T
Applications - Clustering
  • GAGATGAGAT
  • GTGATGAGAT
  • GAGATGAGGT
  • ...
  • A  -6.9  0.98  -6.9  1.38  -6.9  -6.9  1.38  -6.9  0.98  -6.9
  • C  -6.9  -6.9  -6.9  -6.9  -6.9  -6.9  -6.9  -6.9  -6.9  -6.9
  • G  1.38  -6.9  1.38  -6.9  -6.9  1.38  -6.9  1.38  0.29  -6.9
  • T  -6.9  0.29  -6.9  -6.9  1.38  -6.9  -6.9  -6.9  -6.9  1.38

9
Compare matrices with each other using the
dynamic programming approach
Applications - Clustering
  • where (the recurrence for D is not preserved in the transcript)
  • A, B: the matrices being compared
  • i, j: column indices
  • If D(m, n) > threshold, the matrices are different
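Since the recurrence for D was lost, here is a minimal sketch of one common choice, assuming a global (Needleman-Wunsch-style) alignment of matrix columns. Aligning two columns costs half their L1 distance (0 for identical, 1 for disjoint columns), and leaving a column unaligned costs a hypothetical `GAP_COST = 0.5`. `THRESHOLD = 2.0` is likewise only illustrative. Columns are built from patterns, with '.' as a uniform 0.25 column.

```python
ALPHABET = "ACGT"
GAP_COST = 0.5   # hypothetical cost of leaving one column unaligned
THRESHOLD = 2.0  # hypothetical decision threshold


def columns(pattern):
    """Frequency columns for a pattern: letter -> one-hot, '.' -> 0.25 for each letter."""
    return [{a: 0.25 if ch == "." else float(ch == a) for a in ALPHABET} for ch in pattern]


def column_distance(x, y):
    """Half the L1 distance between two columns: 0 if identical, 1 if disjoint."""
    return 0.5 * sum(abs(x[a] - y[a]) for a in ALPHABET)


def matrix_distance(A, B, gap=GAP_COST):
    """Global DP over columns; returns D(m, n) for matrices A (m columns) and B (n columns)."""
    m, n = len(A), len(B)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * gap
    for j in range(1, n + 1):
        D[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + column_distance(A[i - 1], B[j - 1]),  # align column i with j
                D[i - 1][j] + gap,  # column i of A left unaligned
                D[i][j - 1] + gap,  # column j of B left unaligned
            )
    return D[m][n]


for other in ["G.GATGAG", "AAAATTTT"]:
    d = matrix_distance(columns("G.GATGAG.T"), columns(other))
    print(other, round(d, 2), "different" if d > THRESHOLD else "similar")
```

With these settings G.GATGAG.T and G.GATGAG differ only by two unaligned end columns (D = 1.0), while AAAATTTT lands well above the threshold.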

10
Applications - Clustering
  • Clusters:
  • G.GATGAG.T, G.GATGAG, GATGAG.T
  • TG.AAA.TTT, TGAAAA.TTT, TG.AAA.TTTT
  • AAAATTTT, AAA.TTTT
  • We want to represent the clusters by logos
  • We need to align the patterns first: position
    the similar parts of the patterns above each
    other
  • G.GATGAG.T
  • G.GATGAG--
  • --GATGAG.T
  • otherwise the logo will look like this (figure not preserved in the transcript)
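The logo figure is not preserved. For reference, the common information-content convention for sequence logos gives each column a total height of 2 - H bits (H being the column's Shannon entropy) and splits that height between letters in proportion to their frequencies. The sketch below applies this convention to the aligned patterns above. It assumes '.' counts as 0.25 of each letter and '-' is ignored, and it omits the small-sample correction. It is not necessarily how the presentation drew its logos.

```python
import math

ALPHABET = "ACGT"


def column_frequencies(rows, j):
    """Letter frequencies in column j: '.' counts 0.25 for every letter, '-' is skipped."""
    totals = {a: 0.0 for a in ALPHABET}
    used = 0
    for row in rows:
        ch = row[j]
        if ch == "-":
            continue
        used += 1
        for a in ALPHABET:
            totals[a] += 0.25 if ch == "." else float(ch == a)
    return {a: totals[a] / used for a in ALPHABET} if used else totals


def logo_heights(rows):
    """Letter heights per column: frequency * (2 - entropy in bits), no small-sample correction."""
    heights = []
    for j in range(len(rows[0])):
        freqs = column_frequencies(rows, j)
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        info = 2.0 - entropy if any(freqs.values()) else 0.0
        heights.append({a: f * info for a, f in freqs.items()})
    return heights


ALIGNED = ["G.GATGAG.T", "G.GATGAG--", "--GATGAG.T"]
for j, col in enumerate(logo_heights(ALIGNED), start=1):
    print(j, {a: round(h, 2) for a, h in col.items() if h > 0})
```

Wild-card columns come out with zero height, while fully conserved columns reach the 2-bit maximum.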

11
Multiple Alignment
Applications - Multiple alignment
  • Importance matrix I represents the aligned
    patterns.
  • Example
  • G.GATGAG.T
  • GATGAG.T
  • G.GATGAG
  • 1. Insert the first pattern into I (a '.' contributes
    0.25 to each letter)
  • A 0 0.25 0 1 0 0 1 0 0.25 0
  • C 0 0.25 0 0 0 0 0 0 0.25 0
  • G 1 0.25 1 0 0 1 0 1 0.25 0
  • T 0 0.25 0 0 1 0 0 0 0.25 1
  • 2. Align the second pattern with I using a
    dynamic programming approach
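Step 1 turns a single pattern into a matrix with one column per pattern position. A minimal Python sketch, storing I as a dict from letter to a list of column values; `pattern_to_matrix` is a hypothetical name.

```python
ALPHABET = "ACGT"


def pattern_to_matrix(pattern):
    """Importance matrix I for a single pattern, one column per pattern position.

    A letter puts 1 in its own row; the wild card '.' puts 0.25 in every row.
    """
    if any(ch not in ALPHABET + "." for ch in pattern):
        raise ValueError(f"invalid pattern: {pattern!r}")
    return {
        a: [0.25 if ch == "." else (1.0 if ch == a else 0.0) for ch in pattern]
        for a in ALPHABET
    }


I = pattern_to_matrix("G.GATGAG.T")
for a in ALPHABET:
    print(a, " ".join(f"{v:g}" for v in I[a]))
```

For G.GATGAG.T this prints the same four rows as the matrix in step 1.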

12
Applications - Multiple alignment
  • Dynamic programming matrix (rows: the pattern GATGAG.T; columns:
    the columns of I, after an initial boundary column)
                G     .     G     A     T     G     A     G     .     T
    G  0.00  0.10  0.01  0.10  0.00  0.00  0.10  0.00  0.10  0.01  0.00
    A  0.00  0.00  0.11  0.00  0.20  0.00  0.00  0.20  0.00  0.11  0.00
    T  0.00  0.00  0.01  0.00  0.00  0.30  0.00  0.00  0.00  0.01  0.21
    G  0.00  0.10  0.01  0.11  0.00  0.00  0.40  0.00  0.10  0.01  0.00
    A  0.00  0.00  0.11  0.00  0.21  0.00  0.00  0.50  0.00  0.11  0.00
    G  0.00  0.10  0.01  0.21  0.00  0.00  0.10  0.00  0.60  0.01  0.00
    .  0.00  0.00  0.10  0.01  0.21  0.00  0.00  0.10  0.00  0.60  0.01
    T  0.00  0.00  0.01  0.00  0.00  0.31  0.00  0.00  0.00  0.01  0.70
  • G.GATGAG.T
  • --GATGAG.T
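The slide does not state its scoring function. Its DP values grow by about 0.10 per matched column, which suggests a scaled score. The sketch below is a Smith-Waterman-style local alignment under two assumptions: column similarity is the dot product of the pattern column and the matrix column, and a hypothetical linear gap penalty `GAP = 0.5` applies. It reproduces the alignment shown (--GATGAG.T), though not the exact numbers. Here I is stored column by column (`I_columns`), and `align_to_matrix` is a hypothetical name.

```python
ALPHABET = "ACGT"
GAP = 0.5  # hypothetical linear gap penalty


def columns(pattern):
    """Pattern or matrix columns: letter -> one-hot, '.' -> 0.25 for each letter."""
    return [{a: 0.25 if ch == "." else float(ch == a) for a in ALPHABET} for ch in pattern]


def score(x, y):
    """Assumed column similarity: dot product of the two columns."""
    return sum(x[a] * y[a] for a in ALPHABET)


def align_to_matrix(pattern, matrix_cols, gap=GAP):
    """Local (Smith-Waterman-style) alignment of a pattern against the columns of I.

    Returns the pattern written over the matrix columns, '-' where no letter is aligned.
    Pattern letters outside the local alignment, or inserted between columns, are
    dropped in this sketch; a fuller version would add matrix columns for them.
    """
    p = columns(pattern)
    m, n = len(p), len(matrix_cols)
    H = [[0.0] * (n + 1) for _ in range(m + 1)]
    best, best_i, best_j = 0.0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(
                0.0,
                H[i - 1][j - 1] + score(p[i - 1], matrix_cols[j - 1]),
                H[i - 1][j] - gap,
                H[i][j - 1] - gap,
            )
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until a zero cell is reached.
    out = ["-"] * n
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(p[i - 1], matrix_cols[j - 1]):
            out[j - 1] = pattern[i - 1]
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return "".join(out)


I_columns = columns("G.GATGAG.T")  # I after step 1, stored column by column
print(align_to_matrix("GATGAG.T", I_columns))
```

A local alignment is used so that a shorter pattern can attach anywhere along I without paying for the unmatched ends.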

13
Applications - Multiple alignment
  • 3. Add the aligned pattern '--GATGAG.T' to I, adding
    columns to the matrix if necessary.
  • 4. Repeat the procedure for every pattern.
  • Output
  • G.GATGAG.T
  • G.GATGAG--
  • --GATGAG.T
  • Why an importance matrix?
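Steps 3 and 4 can be sketched as accumulating each aligned pattern into I. This minimal Python version assumes that I stores plain sums of the per-pattern contributions (1 for a letter, 0.25 per letter for '.', nothing for '-') and that new columns are only ever appended on the right. The slides do not say whether I is normalised, and `add_to_matrix` is a hypothetical name.

```python
ALPHABET = "ACGT"


def add_to_matrix(matrix, aligned):
    """Add one aligned pattern to I, stored as {letter: [value per column]}.

    Letters add 1 to their own row, '.' adds 0.25 to every row, '-' adds nothing.
    Assumption: I accumulates these contributions as sums, and new columns are only
    ever appended on the right.
    """
    if any(ch not in ALPHABET + ".-" for ch in aligned):
        raise ValueError(f"invalid aligned pattern: {aligned!r}")
    extra = len(aligned) - len(matrix["A"])
    if extra > 0:
        for a in ALPHABET:
            matrix[a].extend([0.0] * extra)
    for j, ch in enumerate(aligned):
        for a in ALPHABET:
            if ch == ".":
                matrix[a][j] += 0.25
            elif ch == a:
                matrix[a][j] += 1.0
    return matrix


I = {a: [] for a in ALPHABET}
for row in ["G.GATGAG.T", "G.GATGAG--", "--GATGAG.T"]:  # the output alignment above
    add_to_matrix(I, row)
for a in ALPHABET:
    print(a, I[a])
```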

14
Applications - Multiple alignment
  • Example
  • Pattern: GATG
  • Aligned so far:
  • GATGATGTA-
  • ---GATGTGG
  • We want w(G, 4) > w(G, 1) > w(G, 9), where w(G, j) is
    the weight of letter G in column j
  • Solution: the importance matrix
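The importance formula itself is not in the transcript. One definition that satisfies this ordering is I = C · F, where C counts the letter in a column and F is its frequency among that column's non-gap entries. Counts alone tie columns 1 and 9, and frequencies alone tie columns 4 and 1. The same product also reproduces the slide-6 values if F is first rounded to one decimal (2 × 0.7 = 1.4). The definition is an assumption, sketched here with hypothetical names.

```python
ALIGNED = ["GATGATGTA-", "---GATGTGG"]


def count_and_frequency(rows, column, letter):
    """C = occurrences of letter in the (1-based) column; F = C / number of non-gap entries."""
    entries = [row[column - 1] for row in rows if row[column - 1] != "-"]
    count = entries.count(letter)
    return count, (count / len(entries) if entries else 0.0)


def importance(rows, column, letter):
    """Assumed importance I = C * F (not confirmed by the source)."""
    count, freq = count_and_frequency(rows, column, letter)
    return count * freq


for col in (4, 1, 9):
    c, f = count_and_frequency(ALIGNED, col, "G")
    print(f"column {col}: C={c} F={f:.2f} I={importance(ALIGNED, col, 'G'):.2f}")
```

The product rewards a letter that is both frequent and supported by many aligned patterns, which is the ordering the slide asks for.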

15
Applications - Weight matrix matching
  • Weight Matrix Matching
  • Purpose: find, in a given text file, the sequences that
    the weight matrix describes best
  • ...CATAGGAAATTCCACCTCTTTGGCTTTGCCCAGTCTTCCCTTGAGGATGCCTACGTTC...
  • 1. Calculate the score for each position
  • 2. If score > threshold, report a signal
  • Problem: finding a good threshold
  • Threshold: the 99.5% quantile
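A minimal Python sketch of the two steps, under stated assumptions. The weights take the log-odds form that matches slide 8 (ln(F/p) with p = 0.25, zero frequencies floored at a ratio of 0.001). The window score is the sum of the column weights. The 99.5% quantile is taken over the scores of all windows in the scanned text, since the slide does not say which score distribution it uses. The site GAGATGAGAT is planted into the slide's text fragment purely for illustration.

```python
import math

ALPHABET = "ACGT"


def weight_matrix(seqs, p=0.25, floor=0.001):
    """Assumed log-odds weights: W[a][j] = ln(max(F[a][j] / p, floor))."""
    n, length = len(seqs), len(seqs[0])
    return {
        a: [math.log(max(sum(s[j] == a for s in seqs) / n / p, floor)) for j in range(length)]
        for a in ALPHABET
    }


def window_scores(text, w):
    """Step 1: score of the window starting at each position = sum of its column weights."""
    width = len(w["A"])
    return [
        sum(w[text[i + j]][j] for j in range(width))
        for i in range(len(text) - width + 1)
    ]


def quantile(values, q):
    """Linear-interpolation quantile (the same rule as NumPy's default method)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


W = weight_matrix(["GAGATGAGAT", "GTGATGAGAT", "GAGATGAGGT"])
# Slide fragment with a hypothetical site planted in the middle for illustration.
LEFT, SITE, RIGHT = "CATAGGAAATTCCACCTCTTTGGCTTTGCCCAGTCTTCCCTTGAGGA", "GAGATGAGAT", "TGCCTACGTTC"
text = LEFT + SITE + RIGHT
scores = window_scores(text, W)
threshold = quantile(scores, 0.995)  # assumption: quantile over this text's own window scores
hits = [i for i, score in enumerate(scores) if score > threshold]  # step 2
print(hits, round(threshold, 2))
```

On a text this short the 99.5% quantile sits just below the single best window. On a long genomic text the same rule would flag roughly the top 0.5% of positions, which is why choosing the threshold remains the hard part.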

16
Questions?