Title: Biological Language Modeling Toolkit
1Biological Language Modeling Toolkit Graphing
Utilities
2Overview
- BLMT
  - Example: computes association measures in protein sequences
- Graphing Utilities
  - Display how well the association measures or other data line up with (known or surmised) feature boundaries
  - Step 1: Automatic extraction of feature boundaries from given source files
  - Step 2: Plot data along with feature positions along a sequence
3BLMT Mutual Information
- Mutual Information -> Computes "mutual information", which is a measure of association between adjacent amino acids.
- Input: amino acid sequence file(s)
  - (e.g.) Swiss-Prot (SW) datasets
- Output: file.mi.out.av ->
  - first column is position in sequence
  - second column is mutual information value associated with that position
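The two-column output described above can be read with a few lines of code. The following is a minimal sketch, assuming whitespace-separated columns (an integer position followed by a floating-point value); the function name `read_mi_output` and the sample numbers are made up for illustration and are not part of BLMT.

```python
from typing import List, Tuple


def read_mi_output(text: str) -> Tuple[List[int], List[float]]:
    """Split two-column text into positions and mutual information values."""
    positions: List[int] = []
    values: List[float] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        positions.append(int(fields[0]))   # column 1: position in sequence
        values.append(float(fields[1]))    # column 2: mutual information value
    return positions, values


# Illustrative values only, not real BLMT output.
sample = "1 0.12\n2 -0.05\n3 0.40\n"
positions, values = read_mi_output(sample)
print(positions, values)
```

To read a real file, pass the result of `open(path).read()` in place of `sample`.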
4Feature Positions
- Extract feature position information (via Swiss-Prot)
  - Extracellular (EC),
  - Cytoplasmic (CP),
  - Helices (H)
  - -> label where the EC, CP, and H regions are in the sequence.
5
DR   PROSITE; PS00238; OPSIN; 1.
KW   Photoreceptor; Retinal protein; Transmembrane; Glycoprotein; Vision;
KW   Phosphorylation; Lipoprotein; Palmitate; G-protein coupled receptor;
KW   Acetylation; Retinitis pigmentosa; Disease mutation.
FT   DOMAIN        1     36       EXTRACELLULAR.
FT   TRANSMEM     37     61       1 (POTENTIAL).
FT   DOMAIN       62     73       CYTOPLASMIC.
FT   TRANSMEM     74     98       2 (POTENTIAL).
FT   DOMAIN       99    113       EXTRACELLULAR.
FT   TRANSMEM    114    133       3 (POTENTIAL).
FT   DOMAIN      134    152       CYTOPLASMIC.
FT   TRANSMEM    153    176       4 (POTENTIAL).
FT   DOMAIN      177    202       EXTRACELLULAR.
FT   TRANSMEM    203    230       5 (POTENTIAL).
FT   DOMAIN      231    252       CYTOPLASMIC.
FT   TRANSMEM    253    276       6 (POTENTIAL).
FT   DOMAIN      277    284       EXTRACELLULAR.
FT   TRANSMEM    285    309       7 (POTENTIAL).
FT   DOMAIN      310    348       CYTOPLASMIC.
FT   MOD_RES       1      1       ACETYLATION (BY SIMILARITY).
FT   CARBOHYD      2      2       N-LINKED (GLCNAC...) (BY SIMILARITY).
FT   CARBOHYD     15     15       N-LINKED (GLCNAC...) (BY SIMILARITY).
FT   DISULFID    110    187       BY SIMILARITY.
FT   BINDING     296    296       RETINAL CHROMOPHORE.
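The feature-extraction step can be sketched against FT lines like those in the record above. This is an illustration only, not the toolkit's actual extractor: it assumes one feature per line, treats `DOMAIN` lines described as EXTRACELLULAR or CYTOPLASMIC as EC and CP, and assumes `TRANSMEM` lines correspond to the helices (H). The name `extract_regions` is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Matches e.g. "FT   DOMAIN   1   36   EXTRACELLULAR."
FT_LINE = re.compile(r"^FT\s+(DOMAIN|TRANSMEM)\s+(\d+)\s+(\d+)\s+(.*)$")


def extract_regions(record: str) -> Dict[str, List[Tuple[int, int]]]:
    """Collect (start, end) pairs for EC, CP and H regions from FT lines."""
    regions: Dict[str, List[Tuple[int, int]]] = {"EC": [], "CP": [], "H": []}
    for line in record.splitlines():
        match = FT_LINE.match(line.strip())
        if match is None:
            continue  # DR, KW and other FT keys are ignored
        key, description = match.group(1), match.group(4).upper()
        span = (int(match.group(2)), int(match.group(3)))
        if key == "TRANSMEM":
            regions["H"].append(span)   # assumption: transmembrane segment = helix
        elif description.startswith("EXTRACELLULAR"):
            regions["EC"].append(span)
        elif description.startswith("CYTOPLASMIC"):
            regions["CP"].append(span)
    return regions


# First few features of the record shown on the slide.
record = """\
FT   DOMAIN        1     36       EXTRACELLULAR.
FT   TRANSMEM     37     61       1 (POTENTIAL).
FT   DOMAIN       62     73       CYTOPLASMIC.
FT   TRANSMEM     74     98       2 (POTENTIAL).
FT   DOMAIN       99    113       EXTRACELLULAR.
FT   MOD_RES       1      1       ACETYLATION (BY SIMILARITY).
"""
regions = extract_regions(record)
print(regions)
```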
8Problems/Solution
- Problems
  - Making one subplot graph (MATLAB) requires program customization
  - Generating multiple subplots together requires even more tedious work: a waste of time and effort.
- Solution
  - Need a clear interface that generates subplot graphs for you without writing tedious MATLAB code.
9
% Read position (integer) and value (float) columns
[a1,b1] = textread('test.out', '%d %f');
hold on
subplot(1,1,1)
hold on
hh1 = plot(a1, b1, 'linewidth', 2.5);
hold on
ylabel('yaxis','fontsize',16,'Color','k','fontweight','bold')
set(hh1, 'MarkerSize', 5)
set(gca, 'YLim', [-1, 3])
set(gca, 'ytick', [-.6, -.2, .2])
% Cytoplasmic (cp) regions: NaN breaks the line between segments
xdash = [NaN,62,73,NaN,134,152,NaN,231,252,NaN,310,348];
ydash = (-.2)*(ones(size(xdash)));
line(xdash, ydash, 'color','y','linewidth',3)
% Extracellular (ec) regions
xdash = [1,36,NaN,99,113,NaN,177,202,NaN,277,284,NaN];
ydash = (-.2)*(ones(size(xdash)));
line(xdash, ydash, 'color','r','linewidth',3)
hold on
xlabel('x_axis','fontsize',16,'Color','k')
print -dpsc -r0 sample
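The hand-written `xdash` vectors rely on NaN entries to break one line into separate segments, one per feature region. A small sketch of how such a vector could be built from (start, end) pairs follows; the helper names `segments_to_dash` and `to_matlab_vector` are hypothetical and only illustrate the idea.

```python
import math
from typing import List, Tuple


def segments_to_dash(segments: List[Tuple[int, int]]) -> List[float]:
    """Flatten (start, end) pairs into start, end, NaN, start, end, NaN, ..."""
    xdash: List[float] = []
    for start, end in segments:
        xdash.extend([float(start), float(end), math.nan])
    return xdash


def to_matlab_vector(xdash: List[float]) -> str:
    """Render the list as a MATLAB row-vector literal."""
    items = ["NaN" if math.isnan(x) else str(int(x)) for x in xdash]
    return "[" + ",".join(items) + "]"


# Cytoplasmic regions from the example record.
cp = [(62, 73), (134, 152), (231, 252), (310, 348)]
cp_vector = to_matlab_vector(segments_to_dash(cp))
print(cp_vector)
# prints [62,73,NaN,134,152,NaN,231,252,NaN,310,348,NaN]
```

Whether the NaN comes before or after each pair does not matter for the plot; it only has to sit between consecutive segments.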
10Design Capabilities
- Access multiple mutual information output datasets
- Display combination of EC/CP/H position information on MI datasets (color coded)
- Specify range (Y limits) and naming conventions (X axis)
- Output into convenient picture files (e.g. .tiff file).
11Subplotter
- Version 1 (in-house use only)
  - Initially the program takes as input:
    - -> .SW file (EC/CP/H)
    - -> .m file (MATLAB file that code will be generated in)
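To make the idea of generating the .m file concrete, here is a rough sketch of a generator that writes MATLAB plotting commands from a data file name and a set of feature regions. It is not the Subplotter source: the function `generate_matlab`, the color for helices, and the fixed y level of -0.2 are assumptions for illustration (the red and yellow colors follow the hand-written example).

```python
from typing import Dict, List, Tuple

# 'r' and 'y' follow the hand-written example; 'b' for helices is an assumption.
COLORS = {"EC": "r", "CP": "y", "H": "b"}


def nan_vector(segments: List[Tuple[int, int]]) -> str:
    """MATLAB row vector with NaN after each (start, end) pair."""
    parts: List[str] = []
    for start, end in segments:
        parts += [str(start), str(end), "NaN"]
    return "[" + ",".join(parts) + "]"


def generate_matlab(data_file: str,
                    regions: Dict[str, List[Tuple[int, int]]],
                    out_name: str,
                    y_level: float = -0.2) -> str:
    """Return MATLAB code that plots the data and marks each region type."""
    lines = [
        f"[a1,b1] = textread('{data_file}', '%d %f');",
        "subplot(1,1,1)",
        "hold on",
        "plot(a1, b1, 'linewidth', 2.5)",
    ]
    for label, segments in regions.items():
        if not segments:
            continue  # nothing to draw for this region type
        lines.append(f"xdash = {nan_vector(segments)}; % {label}")
        lines.append(f"ydash = ({y_level})*ones(size(xdash));")
        lines.append(
            f"line(xdash, ydash, 'color', '{COLORS[label]}', 'linewidth', 3)")
    lines.append(f"print -dtiff {out_name}")  # write a TIFF image
    return "\n".join(lines) + "\n"


script = generate_matlab("opsdh_1gpcr.out",
                         {"EC": [(1, 36)], "CP": [(62, 73)], "H": []},
                         "sample")
print(script)
```

The returned string would then be saved to the chosen .m file and run in MATLAB.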
12Subplotter ( Version 1)
How many output files to textread 1
What is the file to be textread into matlab program
output file 1 opsdh_1gpcr.out
How many TOTAL subplots do you request? 1
13Subplotter ( Version 1)
Subplot(1,1,1)
Which file do you want results to be graphed on this subplot?
0 opsdh_1gpcr.out
Make selection (0) 0
How many items (EC,CP,H) do you want plotted (1,2, 3 GPCR, 4 Loops)?
--> 3
14Subplotter ( Version 1)
Specify Y-Axis Label? (y/n) n
Y-Axis Label GPCR
Specify YLim? (y,n) n
Give name to X-Axis sample
Give name to .tiff file for output (no extension!) sample
Matlab Program completed! wait ...
15Subplotter (Version 1)
16Current/Future Work
- Generate graphing utility for every tool on the
BLMT website.
17Questions?