Bioinformatics at Promega Corporation

1
Bioinformatics at Promega Corporation
Intro to Bioinformatics, Biotec, November 28, 2006
Ethan Strauss, Sr. Scientist, R&D Bioinformatics, Promega
Ethan.strauss_at_promega.com
http://q7.com/ethan/molbio
2
My Background

Bachelor's degree in biology
PhD and work experience in Molecular Biology
Eight years in Promega Technical Services
Almost two years in Bioinformatics (officially)
No formal computer training
No formal bioinformatics training

3
Bioinformatics at Promega Corporation

Bioinformatics did not exist as a separate function until 2001
One person 2001-2005
Two people 2005-?
Bioinformatics primarily supports R&D (100 scientists)
Mentor and train R&D scientists
Provide expertise for projects (120 requests per year)
Propose and evaluate new acquisitions
Liaison to IT department
Manage bioinformatics infrastructure (15 tools)
Develop new tools and adapt existing tools in-house

4
Bioinformatics Projects

Programming
Tools for internal and external Promega customers
Plexor Primer Design System (https://www.promega.com/techserv/tools/plexor/logon.aspx)
Biomath (http://www.promega.com/biomath/)
siRNA Designer (http://www.promega.com/siRNADesigner/)
Sequence analysis for Excel and Microsoft Word (http://www.promega.com/enotes/features/fe0025.htm)
Analysis of BLAST results
Automated data retrieval (Web services)
Database for tracking vector construction
Database for keeping track of plasmid features

5
Bioinformatics Projects

Biocomputing (use of computers in biological research)
Database searches
Data mining
Discovery research
Primer design
BLAST analysis and interpretation
Etc.

6
NCBI

I recently took the PowerScripting course from NCBI.
NCBI has a lot of very powerful tools and databases.
They are not as well documented as they might be.
Check them out periodically.
Databases at NCBI that I was not aware of, but am now:
PubMed Central: articles with free full text
3D Domains, Structure: 3D structural information
GEO (Gene Expression Omnibus): microarray expression data
There are many more that I see on the drop-down list but don't really know anything about.

7
NCBI ftp site

Most NCBI data is available by FTP from http://www.ncbi.nlm.nih.gov/Ftp/
I have used it for a number of projects, including an analysis of amino acid residue distribution for the first 11 positions of human and E. coli proteins.
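A positional residue count of the kind described above could be sketched as follows. This is an illustration only, not the analysis used for the project: the function names and the toy FASTA records are made up, and real input would be a protein file downloaded from the FTP site.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def position_counts(sequences, n_positions=11):
    """Return one Counter per position covering the first n_positions residues."""
    counts = [Counter() for _ in range(n_positions)]
    for seq in sequences:
        for i, residue in enumerate(seq[:n_positions]):
            counts[i][residue] += 1
    return counts


# Made-up toy records, not real proteins.
EXAMPLE_FASTA = """>toy1
MHKLAVERTYPQ
>toy2
MSKLA
>toy3
MHALAVERTYPQWW
"""

if __name__ == "__main__":
    seqs = [seq for _, seq in parse_fasta(EXAMPLE_FASTA)]
    for pos, counter in enumerate(position_counts(seqs), start=1):
        print(pos, dict(counter))
```

Sequences shorter than 11 residues simply contribute nothing to the later positions, so each position's total can differ.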

8
NCBI - Entrez Programming Utilities
Programmatic access to Entrez: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
Allows incorporation of Entrez functionality into third-party tools: http://www.promega.com/techserv/tools/plexor/NewQpcrProject.aspx
Allows automation of Entrez searches
Analysis of large datasets
Automation of searches and queries
Accessible using HTTP or SOAP
9
NCBI - Entrez Programming Utilities

Programs available
ESearch: Searches and retrieves primary IDs and term translations and optionally retains results for future use in the user's environment.
ESummary: Retrieves document summaries from a list of primary IDs or from the user's environment.
EFetch: Retrieves records in the requested format from a list of one or more primary IDs or from the user's environment.
ELink: Checks for links from the query ID numbers to other Entrez databases.
EInfo: Provides field index term counts, last update, and available links for each database.
EPost: Posts a file containing a list of primary IDs for future use in the user's environment to use with subsequent search strategies.
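Each of these programs is reached through an ordinary HTTP request, so a request URL can be assembled from a program name and a set of parameters. The sketch below only builds URLs and does not contact NCBI. The helper name eutils_url is made up for illustration, the base URL is the one quoted in this 2006 presentation and may have changed since, and the WebEnv value is a placeholder.

```python
from urllib.parse import urlencode

# Base URL as quoted in this 2006 presentation; NCBI may have moved it since.
EUTILS_BASE = "http://www.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(program, **params):
    """Build a request URL for one E-utility, e.g. program='esearch'."""
    return f"{EUTILS_BASE}/{program}.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # ESearch: look up IDs and keep the result set on the server (usehistory=y).
    print(eutils_url("esearch", db="pubmed", term="zanzibar",
                     retmax=1, usehistory="y"))
    # EFetch: query_key and WebEnv would come from the ESearch reply;
    # the values used here are placeholders.
    print(eutils_url("efetch", db="pubmed", query_key="1",
                     WebEnv="PLACEHOLDER", rettype="abstract",
                     retmode="text", retstart=0, retmax=3))
```

Using urlencode rather than string concatenation means that search terms containing spaces or punctuation are escaped correctly.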

10
NCBI - Entrez Programming Utilities
Let's try it!
Go to http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.html and play.
Now try http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/epipe.html
11
NCBI - Entrez Programming Utilities
These sorts of utilities can be accessed programmatically using Perl.
See the demonstration programs at http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
12
NCBI - Entrez Programming Utilities
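use strict;
use warnings;
use LWP::Simple;    # provides get()

my $utils  = "http://www.ncbi.nlm.nih.gov/entrez/eutils";
my $db     = ask_user("Database", "Pubmed");
my $query  = ask_user("Query",    "zanzibar");
my $report = ask_user("Report",   "abstract");

# ESearch: run the query and keep the result set on the history server
my $esearch = "$utils/esearch.fcgi?db=$db&retmax=1&usehistory=y&term=";
my $esearch_result = get($esearch . $query);
print "\nESEARCH RESULT: $esearch_result\n";

$esearch_result =~
  m|<Count>(\d+)</Count>.*<QueryKey>(\d+)</QueryKey>.*<WebEnv>(\S+)</WebEnv>|s;
my $Count    = $1;
my $QueryKey = $2;
my $WebEnv   = $3;
print "Count = $Count; QueryKey = $QueryKey; WebEnv = $WebEnv\n";

# EFetch: page through the stored results, $retmax records at a time
my $retstart;
my $retmax = 3;
for ($retstart = 0; $retstart < $Count; $retstart += $retmax) {
  my $efetch = "$utils/efetch.fcgi?" .
               "rettype=$report&retmode=text&retstart=$retstart&retmax=$retmax&" .
               "db=$db&query_key=$QueryKey&WebEnv=$WebEnv";
  print "\nEF_QUERY=$efetch\n";
  my $efetch_result = get($efetch);
  print "---------\nEFETCH RESULT(" .
        ($retstart + 1) . ".." . ($retstart + $retmax) . "): " .
        "$efetch_result\n-----PRESS ENTER!!!-------\n";
  <STDIN>;    # wait for Enter before fetching the next batch
}

# ask_user is not shown on the slide; this is an assumed minimal version
sub ask_user {
  my ($prompt, $default) = @_;
  print "$prompt [$default]: ";
  chomp(my $answer = <STDIN> // "");
  return $answer eq "" ? $default : $answer;
}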

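The script above pulls Count, QueryKey and WebEnv out of the ESearch reply with a regular expression. As an alternative sketch, the same three values can be read with an XML parser, which does not depend on the order of the elements. The sample reply below is illustrative, with made-up values, and a real reply carries more elements. The function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative reply with made-up values; not captured from NCBI.
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>42</Count>
  <RetMax>1</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV_TOKEN</WebEnv>
  <IdList><Id>12345</Id></IdList>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Return (count, query_key, web_env) from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count"))
    return count, root.findtext("QueryKey"), root.findtext("WebEnv")


def batch_starts(count, retmax):
    """Return the retstart values needed to page through count records."""
    return list(range(0, count, retmax))


if __name__ == "__main__":
    count, query_key, web_env = parse_esearch(SAMPLE_ESEARCH_XML)
    print(count, query_key, web_env)
    print(batch_starts(count, 10))
```

batch_starts mirrors the for loop in the Perl script: each value is one retstart to send to EFetch along with query_key and WebEnv.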
13
Bioinformatics Advice

Be aware of bias in databases!
Search GenBank (nucleotide) for Human[Organism] apoptosis. How many hits?
Now try Orcinus[Organism] apoptosis. How many hits?
Can you conclude that Orcinus does not have apoptosis?

14
Bioinformatics Advice

Bioinformatics is changing and advancing very rapidly.
Don't forget to notice what is new.
NCBI now has 20 different databases. They had two only 3-5 years ago.
If you want to do something that you know can't be done, check again in two weeks!
My standard computer can process the entire human genome for restriction sites, ORFs, etc. in a few hours. Not long ago, the best computers couldn't even hold that much data!
If old tools work, don't feel you need to use the newest tools.
I still do much of my analysis with Microsoft Word.

15
LIMS: Laboratory Information Management System
Goal: Manage in-house DNA sequences and associated data
Eval: UW-Madison Center for Eukaryotic Structural Genomics Sesame http://www.sesame.wisc.edu/
Sesame is designed to organize and record data relevant to complex scientific projects, to launch computer-controlled processes, and to help decide about subsequent steps on the basis of information available. The Sesame system is based on the multi-tier paradigm, and it consists of a framework and application modules that carry out specific tasks. Users interact with Sesame through a series of web-based Java applet-applications designed to organize data. It allows collaborators on a given project to enter, process, view, and extract relevant data, regardless of location, so long as web access is available. Data reside in an Oracle relational database. Sesame serves as a digital laboratory notebook and allows users to attach numerous files and images.
16
Programming

Tools for Promega customers
Biomath (http://www.promega.com/biomath/)
Basic calculations (most can be done easily by hand)
Simple code (JavaScript)
Established theory
Universal (not Promega specific)
siRNA Designer (http://www.promega.com/siRNADesigner/)
Complex calculations
More complex code (VBScript)
Rapidly evolving theory
Partially Promega specific
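As an example of the kind of basic calculation that can be done by hand, the sketch below converts a mass of double-stranded DNA to picomoles. It uses the common approximation of 660 g/mol per base pair. Whether Biomath implements exactly this formula is an assumption, and the function name is made up for illustration.

```python
# Approximate average mass of one base pair of dsDNA, in g/mol (textbook figure).
AVG_BP_MASS = 660.0


def dsdna_ug_to_pmol(micrograms, length_bp):
    """Convert micrograms of dsDNA of a given length (bp) to picomoles."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    # ug -> pg is a factor of 1e6; pg divided by (g/mol) gives pmol.
    return micrograms * 1e6 / (length_bp * AVG_BP_MASS)


if __name__ == "__main__":
    # 1 ug of a 1000 bp fragment
    print(round(dsdna_ug_to_pmol(1.0, 1000), 2))
```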

17
Programming

Tools for Promega customers
Plexor Primer Design (https://www.promega.com/techserv/tools/plexor)
Complex calculations
Complex code (C#.NET)
Separate user interface and main calculations
Multiple interacting modules
Database integration
Integration with Genbank (through a web service)
Proprietary improvements on established theory
Very Promega specific

18
Programming

Tools for internal use
BLAST analysis of Plexor Primers
Primer specificity is important
BLAST can determine specificity, but output is
very complex.
Simplify
Combine all hits from the same gene
Only show hits which could mis-prime
Group hits by species
Allow sorting by species
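The simplification steps above can be sketched as a small filter-and-group routine. This is not the internal Promega tool. The hit fields and the mis-priming rule (the 3' end pairs and there are at most two mismatches) are assumptions chosen only to make the example concrete.

```python
from collections import defaultdict


def simplify_hits(hits, max_mismatches=2):
    """Filter, combine and group BLAST-like hits.

    hits: iterable of dicts with keys gene, species, mismatches,
    three_prime_match (a hypothetical, pre-parsed representation).
    """
    by_species = defaultdict(dict)
    for hit in hits:
        # Assumed rule: a hit can mis-prime only if the 3' end pairs
        # and the total number of mismatches is small.
        if not hit["three_prime_match"] or hit["mismatches"] > max_mismatches:
            continue
        genes = by_species[hit["species"]]
        best = genes.get(hit["gene"])
        # Combine all hits from the same gene, keeping the closest match.
        if best is None or hit["mismatches"] < best["mismatches"]:
            genes[hit["gene"]] = hit
    # Sort species alphabetically, and genes within each species.
    return {sp: [by_species[sp][g] for g in sorted(by_species[sp])]
            for sp in sorted(by_species)}


# Toy hits with invented values.
EXAMPLE_HITS = [
    {"gene": "GAPDH", "species": "Homo sapiens", "mismatches": 1, "three_prime_match": True},
    {"gene": "GAPDH", "species": "Homo sapiens", "mismatches": 0, "three_prime_match": True},
    {"gene": "ACTB", "species": "Mus musculus", "mismatches": 2, "three_prime_match": True},
    {"gene": "TP53", "species": "Homo sapiens", "mismatches": 1, "three_prime_match": False},
    {"gene": "MYC", "species": "Mus musculus", "mismatches": 5, "three_prime_match": True},
]

if __name__ == "__main__":
    for species, kept in simplify_hits(EXAMPLE_HITS).items():
        print(species, [(h["gene"], h["mismatches"]) for h in kept])
```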

19
Programming

Tools for internal use
BLAST analysis of Plexor Primers

Initial BLAST results (1 page out of 30)
Analyzed BLAST results (complete!)
20
Programming

Tools for internal use
Vector/Insert Database
Promega's Flexi Vector system has a very structured cloning procedure.
R&D has been making many different Flexi Vector backbones with many inserts.
Keeping track has been a problem.
A database is in development.

21
Programming

Tools for internal use

22
Programming

Internal Projects
Which restriction enzyme cuts least frequently in human ORFs?
Method
Download human RefSeq database (ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/)
Load into local database
Scan each sequence for each RE site
The scan took 2-3 hours to complete

http://www.promega.com/pnotes/89/12416_11/12416_11.pdf
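A minimal sketch of the scan step, assuming the sequences are already in memory as strings rather than in a local database. The three recognition sites are standard ones, the toy sequences are made up, and the function names are illustrative. All three sites are palindromic, so scanning one strand is enough for them.

```python
def count_sites(sequence, site):
    """Count occurrences of site in sequence, including overlapping ones."""
    count, start = 0, sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


def rank_enzymes(sequences, enzymes):
    """Return (enzyme, total cuts) pairs, least frequent cutter first."""
    totals = {name: 0 for name in enzymes}
    for seq in sequences:
        seq = seq.upper()
        for name, site in enzymes.items():
            totals[name] += count_sites(seq, site)
    return sorted(totals.items(), key=lambda item: item[1])


ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "NotI": "GCGGCCGC"}

# Made-up toy sequences standing in for RefSeq ORFs.
TOY_ORFS = ["ATGGAATTCGGATCCGAATTCTAA", "ATGGGATCCAAAGAATTCTGA"]

if __name__ == "__main__":
    for name, total in rank_enzymes(TOY_ORFS, ENZYMES):
        print(name, total)
```

For non-palindromic or degenerate sites, the reverse complement and ambiguity codes would also have to be handled, which this sketch leaves out.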
23
Programming

Internal Projects
Which human genes in GenBank are the most popular?
Method
Download Gene database (ftp://ftp.ncbi.nlm.nih.gov/gene/)
Download Gene Ontology information (http://www.geneontology.org/)
Use web services to get pathway information from KEGG (http://www.genome.jp/kegg/)
Use web services to get citation information from PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed)
Load all into local database
Rank genes by desired criteria
Size
Function
Localization
Pathways
Publications
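Once the data are in one place, the ranking step amounts to a sort over the chosen criteria. The sketch below assumes each gene has been reduced to a record with numeric fields. The gene symbols and numbers are invented, and categorical criteria such as function or localization would need a filter rather than a sort.

```python
# Hypothetical records; symbols and numbers are invented for illustration only.
GENES = [
    {"symbol": "GENE_A", "size_bp": 1200, "publications": 15, "pathways": 2},
    {"symbol": "GENE_B", "size_bp": 300, "publications": 240, "pathways": 5},
    {"symbol": "GENE_C", "size_bp": 4500, "publications": 240, "pathways": 1},
]


def rank_genes(records, *criteria):
    """Sort descending by each criterion in turn; later criteria break ties."""
    return sorted(records,
                  key=lambda rec: tuple(rec[c] for c in criteria),
                  reverse=True)


if __name__ == "__main__":
    for rec in rank_genes(GENES, "publications", "size_bp"):
        print(rec["symbol"], rec["publications"], rec["size_bp"])
```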

24
Database searches and data mining
Question: Can you reformat this sequence for me?
Tools: ReadSeq http://bimas.dcrt.nih.gov/molbio/readseq, macros
Question: How many viral proteins start with Met-His?
Tool: Hits database motif searches http://hits.isb-sib.ch/
Question: How many different bacterial two-domain proteins are known?
Tool: SCOP database http://scop.berkeley.edu/
Question: How do I design PCR primers selective for bacterial species X?
Tool: Ribosomal database 16S rRNA alignment http://rdp.cme.msu.edu
