1 Position Weight Matrices for Representing Signals in Sequences
Triinu Tasa, Koke 04.02.05
2 Definitions
- Sequence, string: an ordered arrangement of the letters 'A', 'C', 'G', 'T'
- Pattern: a simplified regular expression over the alphabet 'A', 'C', 'G', 'T', '.', where '.' is a wild-card of length 1 ('A', 'C', 'G' or 'T')
3 What is a weight matrix?
- GATGAG
- GATGAT
- TGATAT
- GATGAT
- or
- GTAGTAGTAGT
4 What is a weight matrix?
Better
GATGAG GATGAT TGATAT
- Alignment matrix C
- A 0 2 1 0 3 0
- C 0 0 0 0 0 0
- G 2 1 0 2 0 1
- T 1 0 2 1 0 2
- Frequency matrix F
- A 0 0.7 0.3 0 1 0
- C 0 0 0 0 0 0
- G 0.7 0.3 0 0.7 0 0.3
- T 0.3 0 0.7 0.3 0 0.7
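The counting behind C and F can be sketched in a few lines of Python. The function names are illustrative and not taken from the slides; the input is the three sequences shown above.

```python
ALPHABET = "ACGT"

def alignment_matrix(seqs):
    """Count how often each letter occurs at each position (matrix C)."""
    width = len(seqs[0])
    counts = {a: [0] * width for a in ALPHABET}
    for s in seqs:
        for j, ch in enumerate(s):
            counts[ch][j] += 1
    return counts

def frequency_matrix(counts):
    """Divide each count by the column total (matrix F)."""
    width = len(counts["A"])
    totals = [sum(counts[a][j] for a in ALPHABET) for j in range(width)]
    return {a: [counts[a][j] / totals[j] for j in range(width)] for a in ALPHABET}

seqs = ["GATGAG", "GATGAT", "TGATAT"]
C = alignment_matrix(seqs)
F = frequency_matrix(C)

for a in ALPHABET:
    print(a, C[a], [round(f, 1) for f in F[a]])
```

Rounded to one decimal, F reproduces the 0.7 and 0.3 entries shown on the slide.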
5 Or weight matrix W
What is a weight matrix?
- where
- N: number of sequences used
- p_i: a priori probability of letter i
6 Importance matrix I
What is a weight matrix?
- A 0 1.4 0.3 0 3 0
- C 0 0 0 0 0 0
- G 1.4 0.3 0 1.4 0 0.3
- T 0.3 0 1.4 0.3 0 1.4
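The entries of I shown here equal the element-wise product of the alignment matrix C and the frequency matrix F, with F rounded to one decimal as printed earlier. Whether this product is the author's definition of the importance matrix is an assumption; the sketch below only checks that it reproduces the numbers.

```python
# Alignment matrix C as printed earlier (rows A, C, G, T; three sequences).
C = {
    "A": [0, 2, 1, 0, 3, 0],
    "C": [0, 0, 0, 0, 0, 0],
    "G": [2, 1, 0, 2, 0, 1],
    "T": [1, 0, 2, 1, 0, 2],
}
N = 3  # number of aligned sequences

# Frequencies rounded to one decimal, as printed on the slide.
F = {a: [round(c / N, 1) for c in row] for a, row in C.items()}

# Assumed definition: importance = count * frequency, element-wise.
I = {a: [round(c * f, 1) for c, f in zip(C[a], F[a])] for a in C}

for a in "ACGT":
    print(a, I[a])
```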
7 Applications
Applications - Clustering
- Pattern clustering
- 1. G.GATGAG.T 62/75 1:39/49 2:23/26 R:17.3026 BP:1.12008e-37
- 2. G.GATGAG 89/110 1:45/60 2:44/50 R:10.436 BP:1.61764e-34
- 3. GATGAG.T 124/148 1:52/70 2:72/78 R:7.36961 BP:2.79148e-33
- 4. TG.AAA.TTT 132/145 1:53/61 2:79/84 R:6.84578 BP:1.83509e-32
- 5. AAAATTTT 200/231 1:63/77 2:137/154 R:4.69239 BP:1.19109e-30
- 6. TGAAAA.TTT 104/114 1:45/53 2:59/61 R:7.78277 BP:3.86086e-29
- 7. AAA.TTTT 343/537 1:79/145 2:264/392 R:3.05349 BP:5.66833e-29
- 8. G.AAA.TTTT 135/156 1:51/62 2:84/94 R:6.19534 BP:5.69933e-29
- 9. TG.GATGAG 49/57 1:30/35 2:19/22 R:16.1117 BP:9.35765e-28
- 10. TG.AAA.TTTT 86/91 1:40/43 2:46/48 R:8.87311 BP:1.1124e-27
- ...
8 G.GATGAG.T
Applications - Clustering
- GAGATGAGAT
- GTGATGAGAT
- GAGATGAGGT
- ...
- A -6.9 0.98 -6.9 1.38 -6.9 -6.9 1.38 -6.9 0.98 -6.9
- C -6.9 -6.9 -6.9 -6.9 -6.9 -6.9 -6.9 -6.9 -6.9 -6.9
- G 1.38 -6.9 1.38 -6.9 -6.9 1.38 -6.9 1.38 0.29 -6.9
- T -6.9 0.29 -6.9 -6.9 1.38 -6.9 -6.9 -6.9 -6.9 1.38
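A rough check of these weights, assuming they are log-odds ln(f / 0.25) against a uniform background, with cells that have no observations set to a floor of -6.9 (about ln 0.001). Both the formula and the floor are inferred from the printed numbers, not stated on the slide, and only the three listed sequences are used, so the results come out close to the table rather than identical.

```python
import math

ALPHABET = "ACGT"

def weight_matrix(seqs, background=0.25, floor=-6.9):
    """Log-odds weights per letter and position (assumed formula, see lead-in)."""
    width = len(seqs[0])
    W = {a: [] for a in ALPHABET}
    for j in range(width):
        column = [s[j] for s in seqs]
        for a in ALPHABET:
            f = column.count(a) / len(column)
            # Unobserved letters get the fixed floor instead of ln(0).
            W[a].append(math.log(f / background) if f > 0 else floor)
    return W

W = weight_matrix(["GAGATGAGAT", "GTGATGAGAT", "GAGATGAGGT"])
for a in ALPHABET:
    print(a, [round(w, 2) for w in W[a]])
```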
9 Compare matrices with each other using the dynamic programming approach
Applications - Clustering
- where
- A, B: matrices
- i, j: columns
- If D(m,n) > threshold => matrices are different
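The recurrence for D is not preserved in this transcript. As an illustration only, the sketch below assumes an edit-distance-style recurrence in which aligning column i of A with column j of B costs their Euclidean distance, and skipping a column costs a fixed gap penalty. The names `column_distance` and `matrix_distance`, the `gap` parameter, and the threshold value are all hypothetical.

```python
import math

def column_distance(a, b):
    """Euclidean distance between two matrix columns (assumed cost function)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def matrix_distance(A, B, gap=1.0):
    """D(m, n) for two matrices given as lists of columns (assumed recurrence)."""
    m, n = len(A), len(B)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * gap
    for j in range(1, n + 1):
        D[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + column_distance(A[i - 1], B[j - 1]),
                D[i - 1][j] + gap,   # skip a column of A
                D[i][j - 1] + gap,   # skip a column of B
            )
    return D[m][n]

# Columns in the order A, C, G, T; A encodes "GAT", B encodes "GA".
A = [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
B = A[:2]
threshold = 0.5  # illustrative value only
print(matrix_distance(A, B), matrix_distance(A, B) > threshold)
```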
10 Applications - Clustering
- G.GATGAG.T TG.AAA.TTT AAAATTTT
- G.GATGAG TGAAAA.TTT AAA.TTTT
- GATGAG.T TG.AAA.TTTT
- We want to represent the clusters by logos
- We need to align the patterns first: position the similar parts of the patterns above each other
- G.GATGAG.T
- G.GATGAG--
- --GATGAG.T
- or the logo will look like this
11 Multiple Alignment
Applications - Multiple alignment
- Importance matrix I represents the aligned patterns.
- Example
- G.GATGAG.T
- GATGAG.T
- G.GATGAG
- 1. Insert the first pattern into I ('.' gives 0.25 to each letter)
- A 0 0.25 0 1 0 0 1 0 0.25 0
- C 0 0.25 0 0 0 0 0 0 0.25 0
- G 1 0.25 1 0 0 1 0 1 0.25 0
- T 0 0.25 0 0 1 0 0 0 0.25 1
- 2. Align the second pattern with I using a dynamic programming approach
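Step 1 above can be sketched as follows. The function name `pattern_to_matrix` is illustrative; the rule itself (a letter contributes 1 to its own row, '.' contributes 0.25 to every row) is the one stated on the slide.

```python
ALPHABET = "ACGT"

def pattern_to_matrix(pattern):
    """Turn a pattern into a 4-row matrix; '.' spreads 0.25 over all letters."""
    M = {a: [] for a in ALPHABET}
    for ch in pattern:
        for a in ALPHABET:
            if ch == ".":
                M[a].append(0.25)
            else:
                M[a].append(1.0 if ch == a else 0.0)
    return M

I = pattern_to_matrix("G.GATGAG.T")
for a in ALPHABET:
    print(a, I[a])
```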
12 Applications - Multiple alignment
- Dynamic programming matrix
-        G    .    G    A    T    G    A    G    .    T
- G 0.00 0.10 0.01 0.10 0.00 0.00 0.10 0.00 0.10 0.01 0.00
- A 0.00 0.00 0.11 0.00 0.20 0.00 0.00 0.20 0.00 0.11 0.00
- T 0.00 0.00 0.01 0.00 0.00 0.30 0.00 0.00 0.00 0.01 0.21
- G 0.00 0.10 0.01 0.11 0.00 0.00 0.40 0.00 0.10 0.01 0.00
- A 0.00 0.00 0.11 0.00 0.21 0.00 0.00 0.50 0.00 0.11 0.00
- G 0.00 0.10 0.01 0.21 0.00 0.00 0.10 0.00 0.60 0.01 0.00
- . 0.00 0.00 0.10 0.01 0.21 0.00 0.00 0.10 0.00 0.60 0.01
- T 0.00 0.00 0.01 0.00 0.00 0.31 0.00 0.00 0.00 0.01 0.70
- G.GATGAG.T
- --GATGAG.T
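The printed matrix is reproduced by a simple diagonal rule, inferred from the numbers rather than stated on the slide: a letter matching a letter adds 0.10 to the diagonal predecessor, a letter against a '.' column of I adds 0.01, a '.' in the new pattern carries the diagonal value unchanged, and a mismatch resets the cell to 0. The sketch works on the first pattern's string instead of a full matrix I, which is equivalent only while I holds a single pattern. The constants and the function names are assumptions.

```python
def alignment_scores(profile, pattern, match=0.10, wild=0.01):
    """Fill the DP table; row 0 and column 0 are zero boundaries."""
    D = [[0.0] * (len(profile) + 1) for _ in range(len(pattern) + 1)]
    for r, p in enumerate(pattern, start=1):
        for c, q in enumerate(profile, start=1):
            diag = D[r - 1][c - 1]
            if p == ".":
                D[r][c] = diag            # wild-card in the new pattern
            elif q == ".":
                D[r][c] = diag + wild     # letter against a wild-card column
            elif p == q:
                D[r][c] = diag + match    # letter against the same letter
            else:
                D[r][c] = 0.0             # mismatch
    return D

def place(profile, pattern):
    """Pad the pattern with '-' so its best-scoring diagonal lines up.

    Assumes the pattern fits inside the profile; otherwise columns
    would have to be added to the matrix first.
    """
    D = alignment_scores(profile, pattern)
    _, r, c = max(
        (round(D[r][c], 2), r, c)
        for r in range(1, len(pattern) + 1)
        for c in range(1, len(profile) + 1)
    )
    offset = c - r
    tail = len(profile) - offset - len(pattern)
    return "-" * offset + pattern + "-" * tail

D = alignment_scores("G.GATGAG.T", "GATGAG.T")
print([round(x, 2) for x in D[-1]])
print(place("G.GATGAG.T", "GATGAG.T"))
```

The highest cell sits in the last row and last column, two columns to the right of the main diagonal, which gives the two leading gaps in '--GATGAG.T'.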
13 Applications - Multiple alignment
- 3. Add the pattern '--GATGAG.T' to I; if necessary, add columns to the matrix.
- 4. Repeat the procedure for every pattern.
- Output
- G.GATGAG.T
- G.GATGAG--
- --GATGAG.T
- Why importance matrix?
14 Applications - Multiple alignment
- Example
- Pattern: GATG
- So far aligned:
- GATGATGTA-
- ---GATGTGG
- We want w(G, 4) > w(G, 1) > w(G, 9)
- Solution: importance matrix
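A small check of the ordering above, assuming the importance of a letter in a column is its count multiplied by its frequency among the non-gap letters of that column. This definition is an inference that matches the importance values printed earlier in the talk, not something the slide states. Frequency alone would give w(G, 4) = w(G, 1), and the count alone would give w(G, 1) = w(G, 9); the product separates all three.

```python
def importance(aligned, letter, col):
    """Count times frequency of `letter` in 1-based column `col`, ignoring gaps."""
    column = [s[col - 1] for s in aligned if s[col - 1] != "-"]
    if not column:
        return 0.0
    count = column.count(letter)
    return count * (count / len(column))

aligned = ["GATGATGTA-", "---GATGTGG"]
w4 = importance(aligned, "G", 4)  # G in both patterns
w1 = importance(aligned, "G", 1)  # G in the only pattern covering column 1
w9 = importance(aligned, "G", 9)  # G in one of two patterns
print(w4, w1, w9)
```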
15 Applications - Weight matrix matching
- Weight Matrix Matching
- Purpose: find the sequences that the weight matrix describes best in a given text file
- ...CATAGGAAATTCCACCTCTTTGGCTTTGCCCAGTCTTCCCTTGAGGATGCCTACGTTC...
- 1. Calculate the score for each position
- 2. If score > threshold => signal
- Problem: finding a good threshold
- Threshold: 99.5% quantile
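The two steps above can be sketched as a sliding-window scan. The slide does not say how the quantile is computed or over which scores; the sketch assumes a nearest-rank quantile over the window scores of a reference text. The toy weight matrix, the function names, and the fixed threshold in the first call are illustrative assumptions.

```python
import math

def window_scores(text, W):
    """Score of the weight matrix W at every start position of text."""
    width = len(W["A"])
    return [
        sum(W[text[p + k]][k] for k in range(width))
        for p in range(len(text) - width + 1)
    ]

def quantile(values, q):
    """Nearest-rank quantile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def find_signals(text, W, threshold):
    """Start positions whose score exceeds the threshold."""
    return [p for p, s in enumerate(window_scores(text, W)) if s > threshold]

# Toy weight matrix for the motif GATGAG: 1.38 for the expected letter, -6.9 otherwise.
motif = "GATGAG"
W = {a: [1.38 if a == m else -6.9 for m in motif] for a in "ACGT"}

# Scan with a fixed, hand-picked threshold.
hits = find_signals("CCGATGAGCC", W, threshold=4.0)
print(hits)

# Derive a threshold from the score distribution of a reference text,
# here the fragment shown on the slide.
reference = "CATAGGAAATTCCACCTCTTTGGCTTTGCCCAGTCTTCCCTTGAGGATGCCTACGTTC"
threshold = quantile(window_scores(reference, W), 0.995)
print(threshold)
```

With a reference text this short the 99.5% quantile is simply the highest window score, so a strict "score > threshold" test finds nothing in the reference itself; a much larger background set is needed for the quantile to be meaningful.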
16 Questions?