Title: Bridging Bioinformatics and Chemoinformatics
1Bridging Bioinformatics and Chem(o)informatics
- Gary Wiggins
- School of Informatics
- Indiana University
- wiggins_at_indiana.edu
- Yan He (SLIS MLS Student)
- Meredith Saba (SLIS MLS Student)
2Provocative Thought
- "While much bioscience is published with the
knowledge that machines will be expected to
understand at least part of it, almost all
chemistry is published purely for humans to
read."
- Murray-Rust et al. Org. Biomol. Chem. 2004, 2,
3201.
3Overview of the Talk
- Review of ACS CINF 2004 Papers
- Review of Relevant Articles
- Public Chemistry Databases and Data Repositories
with Bioinformatics Info/Links
- Overview of Web Services
- NIH-funded Projects Underway or Planned at
Indiana University
4The Bigger Picture: Linking Bioinformatics to
Cheminformatics
- American Chemical Society Division of Chemical
Information (CINF) Symposium, Anaheim, Spring
2004
- All-day session with 16 papers
- http://www.acscinf.org/new/docs/meetings/227nm/227cinfabstracts.htm
5Problems from ACS CINF 2004
- Both technical and people factors hinder
knowledge exchange between biology and chemistry.
(Lipinski)
- People Problems per Chris Lipinski
- Meta data capture is complicated by people
issues, particularly those between chemists and
biologists.
- Discipline-based disconnects occur distressingly
often and are frequently overlooked as a cause of
lost productivity.
6Interdisciplinary Collaborations: Biology and
Chemistry
- "What's ... important for these collaborations
is, not only do you have to accept the other
guy's paradigm or at least live with it, you have
to be willing to accept the other guy's foibles
or your perception of the other guy's foibles
(and recognize the opposite of this). We each
have our own approaches to how we do science, and
it's just different cultures."
- --Thom Kauffman interview in ACS LiveWire, March
2005, 7.3. http://pubs.acs.org/4librarians/livewire/2006/7.3/profile.html
7Some Questions from the ACS CINF 2004 Symposium
- "Find all proteins related to protein A (i.e.
within a given path length of A) in a protein
interaction graph, and retrieve related assay
results and compound structures."
- "Find all pathways where compound X inhibits or
slows a reaction, and retrieve Gene Ontology
classifications for all proteins involved in the
reaction."
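The first question is a bounded graph traversal. A minimal sketch, assuming the interaction graph is held in memory as an adjacency mapping (the function name, the toy graph, and the protein labels are all hypothetical, not taken from the symposium papers):

```python
from collections import deque

def proteins_within(graph, start, max_hops):
    """Return {protein: hop count} for every protein within max_hops of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the path-length limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    dist.pop(start)  # the query protein itself is not a "related" protein
    return dist

# Hypothetical toy interaction graph, stored as adjacency sets.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}
related = proteins_within(toy_graph, "A", 2)
```

The returned proteins would then serve as keys for looking up assay results and compound structures in whatever store holds them; that join is not shown here.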
8Problems from ACS CINF 2004
- Commercial vs. public data
- Batch mode data processing possible in biology,
but primitive in chemistry
- Primary HTS data has a very high noise factor
- Data format standardization problem
- Chemoinformatics and bioinformatics use
completely different data formats and analysis
tools
- Chemical and protein sequence information has
been largely analyzed separately
9Solutions from ACS CINF 2004
- Linking biological and chemical information in
computational approaches to predict biological
activity, ADME profiles, and adverse drug
reactions (ADR)
- Energetics of binding for more accurate and
sensitive chemical representation of DNA-protein
interactions
- A discovery informatics platform that facilitates
archival, sharing, integration, and exploration
of synthetic methods and biological activity data
10Solutions from ACS CINF 2004
- Data pipelining approach makes it possible to
apply bioinformatics and chemoinformatics data
and analyses together.
- Visualizations are the best way for people to
understand data.
11Solutions from ACS CINF 2004
- Cabinet (Chemical And Biological Information
NETwork, formerly Fedora) servers include
- Metabolic pathway network chart (Empath)
- Protein-Ligand Association Network (Planet)
- Enzyme Commission Codebook (EC Book)
- Traditional Chinese Medicines (TCM)
- World Drug Index (WDI), and others.
- Built on the Daylight HTTP toolkit
- http://www.metaphorics.com/products/cabinet.html
13What is Chemoinformatics? (Brown)
- "the essence of chemoinformatics is integration
and focus rather than its components, which are
independent disciplines."
- Supporting disciplines
- Chemical information
- Computational chemistry
- Chemometrics
14Chemoinformatics and Disease
15Toolkits as Integrators (Brown)
- Companies such as Daylight, Advanced Visual
Systems, OpenEye, and SciTegic provide
integration systems for
- Statistical methods
- Text mining
- Computational chemistry
- Visualization
16GeneGo's MetaDrug Product
- Toxicogenomics platform for the prediction of
human drug metabolism and toxicity of novel
compounds
- Enables the visualization of pre-clinical and
clinical high-throughput data in the context of
the complete biological system
- Integrates chemical, biological, and protein
function data
- http://www.genego.com/
17BioWisdom
- Examination of vast amounts of available
information using its Sofia KnowledgeScan
methodology
- SRS data integration platform
- http://www.biowisdom.com/
18Lessons from Hip Hop (Salamone)
- Mashup technique
- Bring together disparate informatics, biological,
chemical, and imaging information when conducting
research
- Example of an integration tool: iSpecies.org
- A search for a species returns a page with NCBI
genomics information, Yahoo images of the
species, and articles culled from Google Scholar
19iSpecies.org Search
20Chemogenomics and Chemoproteomics (Gagna)
- Chemogenomics (def.): The description of all
potential drugs that can be used against all
possible target sites, OR the actions of
target-specific chemical ligands and how they are
used to globally examine genes
- Chemoproteomics (def.): Uses chemistry to
characterize protein structure and functions
- "They are . . . a form of chemical biology
brought up to date in the area of genome and
proteome analysis."
21New Interdisciplinary Journals
- ACS Chemical Biology (ACS)
- ChemBioChem: A European Journal of Chemical
Biology (Wiley/VCH)
- Chemical Biology and Drug Design (Blackwell)
- JBIC Journal of Biological and Inorganic
Chemistry (Springer)
- Journal of Biochemical and Molecular Toxicology
(Wiley)
- Molecular Biosystems (RSC)
- Nature Chemical Biology (Nature Publishing)
- Organic & Biomolecular Chemistry (RSC)
22Open Source Software (Geldenhuys)
- Log P calculator from Interactive Analysis
- http://www.logp.com
- University of Utah's Computational Science and
Engineering Online
- Can submit jobs for molecular mechanics, quantum
chemical calculations, and biomolecular
interfaces for viewing PDB files
- http://www.cse-online.net
- Virtual Computational Chemistry Laboratory
- http://www.vcclab.org
23The Blue Obelisk (Guha)
- Several open chemistry and chemoinformatics
projects that have pooled forces to enhance
interoperability
- Maintain
- Chemoinformatics Algorithms Dictionary
- Data Repository for standardized data for
chemical properties and other facts (e.g., mass)
- http://www.blueobelisk.org/
24BlueObelisk.org
- Working collaboratively on projects such as
- Chemistry Development Kit (CDK)
- JChemPaint
- Jmol
- JUMBO
- NMRShiftDB
- Octet
- Open Babel
- QSAR
- World Wide Molecular Matrix (WWMM)
25Barriers to the Use of Open Source Software
- Unix command line
- Problem: Lack of known standards and datasets of
compounds for validation, e.g., in docking
programs
26Lessons from the Human Genome Project (Austin)
- Keys to success in the HGP were
- Comprehensiveness
- Commitment to open access to the sequence as a
research tool without encumbrance
- Proposed tools for a genome functionation
toolbox
- Whole-genome transcriptome and proteome
characterization
- Development of small inhibitory RNAs (siRNAs) and
knockout mice for every gene
- Small molecules and the druggable genome
27ChemDB http://cdb.ics.uci.edu/CHEM/Web/
28ChEBI, Chemical Entities of Biological Interest
- Dictionary of molecular entities focused on small
chemical compounds
- Features an ontological classification, showing
the relationships between molecular entities or
classes of entities and their parents and/or
children
29Vioxx Entry in ChEBI
30The IUPAC International Chemical Identifier
(InChI)
- Open source, non-proprietary, public-domain
identifier for chemicals
- String of characters that uniquely represents a
molecular substance
- Independent of the way the chemical structure is
drawn
- Enables reliable structure recognition and easy
linking of diverse data compilations
- Accepts as input MOLfiles (or SDfiles) and CML
files
- Download the program to your computer at
- http://www.iupac.org/inchi/license.html
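Because an InChI is a plain string of slash-separated layers, it can be taken apart with ordinary string handling before being used as a linking key. A minimal sketch, assuming one layer per tag (a simplification; some InChIs repeat tags in later layers). The function name is hypothetical, and the ethanol string is used only as a small well-known example; its "1S" prefix is the later "standard InChI" form.

```python
def split_inchi(inchi):
    """Split an InChI string into its version prefix and slash-separated layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, *layers = inchi[len("InChI="):].split("/")
    parsed = {"version": version, "formula": layers[0] if layers else ""}
    for layer in layers[1:]:
        # After the formula, each layer starts with a one-letter tag,
        # e.g. 'c' (connectivity) or 'h' (hydrogens).
        parsed[layer[0]] = layer[1:]
    return parsed

# Ethanol, chosen only as a short illustrative input.
ethanol = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
```

Generating the InChI from a drawn structure still requires the IUPAC program or a toolkit that wraps it; this sketch only reads a string that already exists.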
31Generation of InChI for Vioxx with wInChI
32Vioxx Entry in PubChem Compounds Found with InChI
33Vioxx Bioassay Data in PubChem
34Vioxx PubChem Link to External Sources of
Information
35The Elsevier MDL/NIH Link via PubChem and
DiscoveryGate
- Cross-indexes PubChem to the Compound Index
hosted on Elsevier MDL's DiscoveryGate platform
- MDL added 5 million structures from PubChem to
their index, resulting in over 14 million unique
chemical structures
- Links go both ways
- Can move from biological data in PubChem to
bioactivity, chemical sourcing, synthetic
methodology, and EHS data in DiscoveryGate
sources
36Elsevier MDL's xPharm
- Comprehensive set of records linking
- Agents (compounds) (2300)
- Targets (600)
- Disorders (450)
- Principles that govern their interactions (180)
- Answers questions such as
- What targets are associated with control of blood
pressure?
- What adverse effects are associated with
monoamine oxidase inhibitors?
37Text Datamining (Banville)
- "In the pharmaceutical field, it is ideally the
marriage of biological and chemical information
that needs to be the ultimate focus of text data
mining applications."
- Problems
- Lack of universal publication standards for
identifying each unique chemical entity
- Selective indexing policies of A&I services
- Need to understand how chemical structures link
to biological processes
38Chemical Datamining Software
- SureChem
- http://surechem.reeltwo.com/
- CLiDE
- Recognizes structures, reactions, and text
- http://www.simbiosys.ca/clide/
- OSCAR
- OSCAR1 to check experimental data
- http://www.ch.cam.ac.uk/magnus/checker.html
- http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/
- CSR (Chemical Structure Reconstruction)
- http://www.scai.fraunhofer.de/uploads/media/MZ-ERCIM05_04.pdf
- MDL DocSearch--combines MDL's Isentris platform
and EMC's Documentum
40Themes from SwissProt's 20th Anniversary
Conference, In silico Analysis of Proteins
- Knowledgebases, databases and other information
resources for proteins
- Sequence searches and alignments
- Protein sequence analysis
- Protein structure prediction, analysis and
visualization
- Proteomics data analysis
41Chemoinformatics Databases (Jónsdóttir)
- Lists databases relevant to drug discovery and
development, including
- General databases
- DBs for screening compounds
- DBs for medicinal agents
- DBs with ADMET properties
- DBs with physico-chemical properties
- Curiously does not mention Chemical Abstracts
42Databases with Protein and Ligand Information
(Jónsdóttir)
- Protein Data Bank
- Target Registration Database
- Relibase--uses structural info to analyze
protein-ligand interactions; Relibase for
protein-protein interaction searching
- Cambridge Structural Database
- KEGG LIGAND DB for enzyme reactions
- http://www.genome.ad.jp/ligand
43Other Databases with Protein and Ligand
Information
- SitesBase--a database of known ligand binding
sites within the PDB
- http://www.bioinformatics.leeds.ac.uk/sb/main.html
- Binding MOAD
- http://www.bindingmoad.org/
- sc-PDB (Kellenberger)
- http://bioinfo-pharma.u-strasbg.fr:8080/scPDB/index.jsp
44sc-PDB http://bioinfo-pharma.u-strasbg.fr:8080/scPDB/index.jsp
45Isatin Search on sc-PDB
46Other Databases with Protein-Protein Interaction
Data (Jónsdóttir)
- YPD, Yeast Proteome Database (for proteins from
S. cerevisiae)
- http://www.biobase.de/pages/index.php?id=139
- Human Protein Reference Database
- http://www.hprd.org/
- BIND, Biomolecular Interaction Network Database
(ceased as of 11/16/2005?)
- http://www.bind.ca/Action
47International Molecular Exchange (IMEx)
Consortium http://imex.sourceforge.net/
- BIND (http://www.blueprint.org), The Blueprint
Initiative Asia Pte. Ltd, Singapore, and The
Blueprint Initiative North America, Toronto, Canada
- DIP (http://dip.doe-mbi.ucla.edu), UCLA-DOE
Institute for Genomics & Proteomics
- IntAct (http://www.ebi.ac.uk/intact),
EMBL-European Bioinformatics Institute, Hinxton,
UK
- MINT (http://mint.bio.uniroma2.it/mint/),
University of Rome Tor Vergata, Rome, Italy
- MPact (http://mips.gsf.de/genre/proj/mpact), MIPS
/ Institute for Bioinformatics, Munich, Germany.
48Protein Sites from IU I533 Students and others
- LigandDepot--integrated source for small molecules
- http://ligand-depot.rutgers.edu/index.html
- PSIPRED Protein Structure Prediction Server
- http://bioinf.cs.ucl.ac.uk/psipred/
- DSSP--a database of secondary structure
assignments (and much more) for all protein
entries in the PDB
- http://swift.cmbi.ru.nl/gv/dssp/
- Dr. Predrag Radivojac's I690 class on Structural
Bioinformatics
- http://www.informatics.indiana.edu/predrag/2006springi690/2006springi690.htm
49Protein Secondary Structure Prediction
- Methods
- Neural Network
- Rule Based
- Other Machine Learning
- Homology Based
50Protein Secondary Structure Prediction Software
- PredictProtein
- http//www.predictprotein.org/
- Chou-Fasman
- http://fasta.bioch.virginia.edu/fasta_www/chofas.htm
- NN Predict
- http://www.cmpharm.ucsf.edu/nomi/nnpredict.html
51Structure-Based Docking Methods
- Method
- Scans many small molecules and docks them to a
site of interest on a protein structure
- Predicts free energy of binding
- Filters thousands of compounds relatively quickly
- Top hits can be used for more rigorous
computational/experimental characterization and
optimization
52Structure-Based Docking Methods
- DOCK
- http://dock.compbio.ucsf.edu/
- Accelrys's Insight (built on DOCK)
- http://www.accelrys.com/products/insight/
- FlexX
- http://www.biosolveit.de/FlexX/
- Glide
- http://www.schrodinger.com/ProductDescription.php?mID=6&sID=6
- GOLD
- http://www.ccdc.cam.ac.uk/products/life_sciences/gold/
53Useful Structure Databases
- ModBase
- http://modbase.compbio.ucsf.edu/modbase-cgi-new/search_form.cgi
- Dali Database (Fold classification based on PDB)
- http://ekhidna.biocenter.helsinki.fi/dali/start
- Protein Structure Analysis, Comparison, &/or
Classification Guide
- http://www.bio.vu.nl/nvtb/Structures.html
54SCOP, Structural Classification of Proteins
- Curated database of structural and evolutionary
relationships
- All known protein folds (v. 1.69, July 2005)
- 70,859 domains organized into 2,845 families,
1,539 superfamilies, and 945 folds
- Detailed information about close relatives
- Links to coordinates, images of structures,
interactive viewers, and literature references
- http://scop.mrc-lmb.cam.ac.uk/scop/
55SCOP Search Options
- Homology search yields a list of structures with
significant levels of sequence similarity
- Keyword search matches words in SCOP and PDB
56CATH Protein Structure Classification
- Like SCOP, structured hierarchically by
- Class (determined by secondary structure)
- Architecture (overall shape, e.g., barrel,
sandwich, roll, etc.); no equivalent in SCOP
- Topology (grouped into fold families based on
overall shape and connectivity of secondary
structures)
- Homologous Superfamily (domains thought to share
a common ancestor)
- As of January 2005, had 43,229 domains classified
into 1,467 superfamilies and 5,107 sequence
families. A protein family database (CATH-PFDB)
contained a total of 616,470 domain sequences
classified into 23,876 sequence families.
- http://cathwww.biochem.ucl.ac.uk/latest/index.html
57CATH Search Options
- Can browse or search the classification by CATH
code
- CATH codes can be used to search other databases,
e.g., DHS, Gene3D, and Impala
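A CATH code is a dot-separated number whose four fields follow the Class, Architecture, Topology, Homologous superfamily hierarchy, so comparing two domains at a given level reduces to comparing code prefixes. A minimal sketch; the helper names are hypothetical and the codes in the usage lines are illustrative inputs only:

```python
from typing import NamedTuple

class CathCode(NamedTuple):
    cath_class: int
    architecture: int
    topology: int
    superfamily: int

def parse_cath_code(code):
    """Parse a four-level CATH code such as '3.40.50.720'."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated levels: C.A.T.H")
    return CathCode(*(int(p) for p in parts))

def same_level(a, b, depth):
    """True if two codes agree on their first `depth` levels (1=Class ... 4=Superfamily)."""
    return parse_cath_code(a)[:depth] == parse_cath_code(b)[:depth]

code = parse_cath_code("3.40.50.720")
share_topology = same_level("3.40.50.720", "3.40.50.300", depth=3)
share_superfamily = same_level("3.40.50.720", "3.40.50.300", depth=4)
```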
58Gasteiger's Biochemical Pathways Database
- Database of biochemical pathways that represents
chemical structures and reactions on the atomic
level
- Gives access to each atom and bond of the
substrates of enzyme reactions
- Allows the study of transition state hypotheses
of enzyme reactions
- Analysis of the physicochemical effects operating
at the reaction site allows a classification of
enzyme reactions that goes beyond the traditional
EC code for enzymes.
- 1533 biochemical molecules and 2175 reactions
- http://www2.chemie.uni-erlangen.de/services/biopath/index.html
59A Gene Expression Database for NCI60 (Scherf)
- Published in Nature Genetics, 2000
- First study to integrate gene expression with
molecular pharmacology databases
- Gene expression profiles for NCI60 assessed
using microarray technology
- Gene-drug relationships investigated by how the
gene transcription levels vary with respect to
drug activities
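One way to quantify how transcription levels vary with drug activity is a correlation coefficient computed across the cell lines, one value per gene-drug pair. A minimal sketch, assuming Pearson correlation as the measure and using made-up numbers for four cell lines; the function names, gene names, and drug names are hypothetical:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (one value per cell line)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)

def correlation_matrix(gene_profiles, drug_profiles):
    """Return {gene: {drug: r}} for every gene-drug pair."""
    return {
        gene: {drug: pearson(g, d) for drug, d in drug_profiles.items()}
        for gene, g in gene_profiles.items()
    }

# Made-up expression and activity values across the same four cell lines.
genes = {"geneA": [1.0, 2.0, 3.0, 4.0], "geneB": [4.0, 3.0, 2.0, 1.0]}
drugs = {"drugX": [2.0, 4.0, 6.0, 8.0]}
matrix = correlation_matrix(genes, drugs)
```

Here geneA rises with drugX activity and geneB falls, so the two entries sit at opposite ends of the correlation scale.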
60Correlation Matrix Between Drug Activity and Gene
Expression
61Other Relevant Databases/Servers
- Each year Nucleic Acids Research publishes a
Database Issue in January and a Web Server Issue
in July (See refs in Bibliography section).
Examples from the most recent issues
63Web Services Overview
- What are Web Services?
- A distributed invocation system built on Grid
computing
- Independent of platform and programming language
- Built on existing Web standards
- A service oriented architecture with
- Interfaces based on Internet protocols
- Messages in XML (except for binary data
attachments)
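The point that messages travel as XML, independent of platform and language, can be illustrated with a round trip through the Python standard library. This is a sketch of the idea only, not SOAP or any real service's message format; the element names, the operation name, and the parameters are hypothetical:

```python
import xml.etree.ElementTree as ET

def build_request(operation, params):
    """Serialize an operation name and its parameters as an XML message."""
    root = ET.Element("request")
    ET.SubElement(root, "operation").text = operation
    args = ET.SubElement(root, "parameters")
    for name, value in params.items():
        ET.SubElement(args, "param", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")

def parse_request(xml_text):
    """Recover the operation name and parameters on the receiving side."""
    root = ET.fromstring(xml_text)
    operation = root.findtext("operation")
    params = {p.get("name"): p.text for p in root.iter("param")}
    return operation, params

message = build_request("similaritySearch", {"smiles": "CCO", "cutoff": 0.8})
operation, params = parse_request(message)
```

All values arrive as text, so the receiver is responsible for converting them back to numbers or other types.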
64Service-Oriented Architecture
- From Curcin et al. DDT, 2005, 10(12),867
65Web Services for Chemistry: Problems
- Performance and scalability
- Proprietary data
- Competition from high-performance desktop
applications
- -- Geoff Hutchison, "it's a puzzle" blog,
2005-01-05
- ALSO
- Lack of a substantial body of trustworthy Open
Access databases
- Non-standard chemical data formats (over 40 in
regular use and requiring normalization to one
another)
67Indiana University Planned Projects
http://www.chembiogrid.org
- Design of a Grid-based distributed data
architecture
- Development of tools for HTS data analysis and
virtual screening
- Database for quantum mechanical simulation data
- Chemical prototype projects
- Novel routes to enzymatic reaction mechanisms
- Mechanism-based drug design
- Data-inquiry-based development of new methods in
natural product synthesis
68Web Services for Chemistry at IU
69NCI Developmental Therapeutics Program (DTP)
- Downloadable data
- In vitro 60 cell line results
- in vitro anti-HIV results
- Yeast assay
- 200,000 chemical structures
- molecular targets
- microarray data
- Or search the database at
- http://dtp.nci.nih.gov/docs/dtp_search.html
70IU Database of NIH DTP Data
- Contains over 200,000 chemical structures tested
in 60 cellular assays from different human tumor
cell lines
- Also includes microarray assay profiles for the
untreated cell lines (14,000 datapoints)
- A local PostgreSQL database containing the data
that is exposed as a web service
- Using workflows and complex SQL queries, we can
do advanced data mining that exploits the
chemical, biological and genomic information for
particular audiences (chemists, biologists, etc.)
71Mining the NIH DTP database
14,000 gene expression values
60 cell lines
Cell lines can be clustered based on gene
expression similarity
200,000 compounds
Compounds can be clustered based on similarity of
profile across cell lines, or by chemical
structure fingerprint similarity
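Structure-based clustering and similarity searching both start from a pairwise fingerprint comparison. A minimal sketch, assuming each fingerprint is represented as the set of its "on" bit positions and that the Tanimoto coefficient is the similarity measure (the slide does not name one); the compound names and bit sets are made up:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def most_similar(query, library, top_n=3):
    """Rank library compounds by similarity to the query fingerprint."""
    scored = sorted(
        ((tanimoto(query, fp), name) for name, fp in library.items()),
        reverse=True,
    )
    return [(name, score) for score, name in scored[:top_n]]

# Hypothetical fingerprints given as sets of on-bit positions.
library = {
    "cmpd1": {1, 2, 3, 4},
    "cmpd2": {1, 2, 5},
    "cmpd3": {7, 8},
}
hits = most_similar({1, 2, 3, 4}, library, top_n=2)
```

The same pairwise scores could feed any standard clustering routine; profile-based clustering across the 60 cell lines would swap the fingerprint comparison for a correlation between activity profiles.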
72Use of Taverna at IU
- A protein implicated in tumor growth is supplied
to the docking program (in this case HSP90 taken
from the PDB 1Y4 complex)
- The workflow employs our local NIH DTP database
service to search 200,000 compounds tested in
human tumor cellular assays for similar
structures to the ligand.
- Client portlets are used to browse these
structures
- Once docking is complete, the user visualizes the
high-scoring docked structures in a portlet using
the Jmol applet.
- Similar structures are filtered for druggability,
and are automatically passed to the OpenEye FRED
docking program for docking into the target
protein.
- A 2D structure is supplied for input into the
similarity search (in this case, the extracted
bound ligand from the PDB 1Y4 complex)
- Correlation of docking results and biological
fingerprints across the human tumor cell lines
can help identify potential mechanisms of action
of DTP compounds
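The chain of steps described above (similarity search, druggability filter, docking, ranking) can be sketched as a plain pipeline. This is an illustration under stated assumptions, not the IU workflow itself: the similarity and docking services are replaced by callables over toy data, the druggability check is assumed to be a Lipinski-style rule-of-five filter, and lower docking scores are assumed to be better. All names and numbers are hypothetical.

```python
def passes_rule_of_five(props):
    """Lipinski-style filter, assumed here as the druggability check."""
    return (
        props["mol_weight"] <= 500
        and props["logp"] <= 5
        and props["h_donors"] <= 5
        and props["h_acceptors"] <= 10
    )

def run_workflow(candidates, similarity, dock, min_similarity=0.7, top_n=2):
    """candidates: {name: property dict}; similarity and dock stand in for remote services."""
    similar = [n for n in candidates if similarity(n) >= min_similarity]
    druglike = [n for n in similar if passes_rule_of_five(candidates[n])]
    scored = sorted((dock(n), n) for n in druglike)  # lower score = better (assumption)
    return [name for _, name in scored[:top_n]]

# Toy stand-ins for the database and docking services.
candidates = {
    "c1": {"mol_weight": 350, "logp": 2.1, "h_donors": 2, "h_acceptors": 5},
    "c2": {"mol_weight": 620, "logp": 4.0, "h_donors": 3, "h_acceptors": 8},
    "c3": {"mol_weight": 410, "logp": 3.3, "h_donors": 1, "h_acceptors": 6},
    "c4": {"mol_weight": 300, "logp": 1.0, "h_donors": 1, "h_acceptors": 3},
}
similarity_scores = {"c1": 0.9, "c2": 0.95, "c3": 0.8, "c4": 0.4}
docking_scores = {"c1": -7.5, "c2": -8.0, "c3": -9.1, "c4": -6.0}
top_hits = run_workflow(candidates, similarity_scores.get, docking_scores.get)
```

In a workflow tool such as Taverna each of these functions would be a separate web service call, with the tool handling the data hand-off between them.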
73Taverna Workflow
Workflow definition
Available web services (WSDL)
Visual depiction of workflow
74Taverna in Action
75Overall Workflow
76Pre-Closing Quote
- "There is not going to be a 'voila' moment at the
computer terminal. Instead, there is systematic
use of wide-ranging computational tools to
facilitate and enhance the drug discovery
process."
- Jorgensen. Science, March 19, 2004, 303, 1814.
77Closing quote
- "The future of chemistry depends on the
automated analysis of chemical knowledge,
combining disparate data sources in a single
resource, such as the World-Wide Molecular
Matrix, which can be analysed using computational
techniques to assess and build on these data."
- Townsend et al. Org. Biomol. Chem. 2004, 2, 3299.
78Post-closing quote zzzzzCAS
- "In an industry first, Chemical Abstracts Service
(CAS) has unveiled a revolutionary new literature
searching tool which will permit scientists to
search and retrieve the world's chemical
literature--including patents and obscure
technical reports--in their sleep."
- --Author unknown
79Acknowledgements
- Randy Arnold
- Xiao Dong
- Sean Mooney
- Peter Murray-Rust
- David J. Wild
- I533 Chemical Informatics Seminar Students
- Elsevier Science
80Bibliography: Articles, Books, and Conference
Papers
- The Bigger Picture: Linking Bioinformatics to
Cheminformatics. CINF Symposium Abstracts
1-16, 227th ACS National Meeting, Anaheim, CA,
March 28-April 1, 2004. http://www.acscinf.org/new/docs/meetings/227nm/227cinfabstracts.htm
- Austin, C.P. The completed human genome:
implications for chemical biology. Current
Opinion in Chemical Biology 2003, 7, 511-515.
- Bajorath, Jürgen, ed. Chemoinformatics: concepts,
methods, and tools for drug discovery. Totowa,
N.J.: Humana Press, c2004. (Methods in molecular
biology; v. 275)
- Banville, Debra L. Mining chemical structural
information from the drug literature. Drug
Discovery Today January 2006, 11(1/2), 35-42.
- Brown F. Editorial opinion: chemoinformatics - a
ten year update. Current Opinion in Drug
Discovery and Development 2005 May; 8(3):
298-302.
81Bibliography: Articles (cont'd)
- Coles, Simon J.; Day, Nick E.; Murray-Rust,
Peter; Rzepa, Henry S.; Zhang, Yong. Enhancement
of the chemical semantic web through
InChIfication. Organic & Biomolecular Chemistry
2005, 3, 1832-1834.
- Curcin, Vera; Ghanem, Moustafa; Guo, Yike. "Web
services in the life sciences." Drug Discovery
Today 2005, 10(12), 865-871.
- Gagna CE, Winokur D, Clark Lambert W. Cell
biology, chemogenomics and chemoproteomics. Cell
Biol Int. 2004; 28(11): 755-64.
- Geldenhuys, W.J.; Gaasch, K.E.; Watson, M.;
Allen, D.D.; Van Der Schyf, C.J. Optimizing the
use of open-source software applications in drug
discovery. Drug Discovery Today February 2006,
11(3/4), 127-132.
- Guha, R.; Howard, M.T.; Hutchison, G.R.;
Murray-Rust, P.; Rzepa, H.; Steinbeck, C.; Wegner,
J.; Willighagen, E.L. The Blue
Obelisk--Interoperability in chemical
informatics. Journal of Chemical Information and
Modeling 2006; Web Release Date: 22-Feb-2006; DOI:
10.1021/ci050400b
82Bibliography: Articles (cont'd)
- Jónsdóttir, S.O.; Jorgensen, F.S.; Brunak, S.
Prediction methods and databases within
chemoinformatics: emphasis on drugs and drug
candidates. Bioinformatics 2005 May 15; 21(10):
2145-60.
- Jorgensen, William L. The many roles of
computation in drug discovery. Science March 19,
2004, 303, 1813-1818.
- Kauffman, Thom. Profile. Interview. LiveWire,
March 2005, 7.3. http://pubs.acs.org/4librarians/livewire/2006/7.3/profile.html
- Murray-Rust, Peter S.; Mitchell, John B.O.;
Rzepa, Henry S. Communication and re-use of
chemical information in bioscience. BMC
Bioinformatics 2005, 6, 180.
- Murray-Rust, Peter; Mitchell, John B.O.; Rzepa,
Henry S. Chemistry in bioinformatics. BMC
Bioinformatics 2005, 6, 141-144.
- Povolna, Vera; Dixon, Scott; Weininger, David.
Cabinet--Chemical and Biological Informatics
NETwork. In Oprea, Tudor I., ed.
Chemoinformatics in Drug Discovery. Weinheim:
Wiley-VCH, 2004, 241-269.
83Bibliography: Articles (cont'd)
- Salamone, Salvatore. Hip Hop offers lessons on
life sciences data integration. Bio-IT World
February 2006, 36.
- Scherf Uwe, Ross Douglas T., Waltham Mark, Smith
Lawrence H., Lee Jae K., Tanabe Lorraine, Kohn
Kurt W., Reinhold William C., Myers Timothy G.,
Andrews Darren T., Scudiero Dominic A., Eisen
Michael B., Sausville Edward A., Pommier Yves,
Botstein David, Brown Patrick O., Weinstein John
N. A gene expression database for the molecular
pharmacology of cancer. Nature Genetics 2000,
24, 236-244.
- Souchelnytskyi, S. "Bridging proteomics and
systems biology: What are the roads to be
traveled?" Proteomics 2005 (November), 5(16),
4123-4137.
- Tetko, Igor V. Computing chemistry on the web.
Drug Discovery Today November 2005, 10(22),
1497-1500.
84Bibliography: Articles (cont'd)
- Zimmermann, Marc; Thi, Le Thuy Bui; Hofmann,
Martin. Combating illiteracy in chemistry:
Towards computer-based chemical structure
reconstruction. ERCIM News January 2005, 60,
40-41.
- http://www.scai.fraunhofer.de/uploads/media/MZ-ERCIM05_04.pdf
- Zimmermann, Marc; Fluck, Juliane; Thi, Le Thuy
Bui; Kolarik, Corinna; Kumpf, Kai; Hofmann,
Martin. Information extraction in the life
sciences: Perspectives for medicinal chemistry,
pharmacology and toxicology. Current Topics in
Medicinal Chemistry 2005, 5(8), 785-796.
85Bibliography: Databases
- Andreeva, A.; Howorth, D.; Brenner, S.E.;
Hubbard, T.J.P.; Chothia, C.; Murzin, A.G. SCOP
database in 2004: refinements integrate structure
and sequence family data. Nucleic Acids Research
2004, 32, Database issue, D226-D229. doi:
10.1093/nar/gkh039
- Chen J, Swamidass SJ, Dou Y, Bruand J, Baldi P.
ChemDB: a public database of small molecules and
related chemoinformatics resources.
Bioinformatics. 2005 Nov 15; 21(22): 4133-9.
- Dunkel, M.; Fullbeck, M.; Neumann, S.; Preissner,
R. SuperNatural: a searchable database of
available natural compounds. Nucleic Acids
Research 2006, 34, Database issue, D678-D683. doi:
10.1093/nar/gkj132
- Gold, Nicola D.; Jackson, Richard M. A
searchable database for comparing protein-ligand
binding sites for the analysis of
structure-function relationships. Journal of
Chemical Information and Modeling 2006, 46(2),
736-742.
86Bibliography: Databases (cont'd)
- Kanehisa, M.; Goto, S.; Hattori, M.;
Aoki-Kinoshita, F.; Itoh, M.; Kawashima, S.;
Katayama, T.; Araki, M.; Hirakawa, M. From
genomics to chemical genomics: new developments
in KEGG. Nucleic Acids Research 2006, 34,
Database issue, D354-D357. doi:
10.1093/nar/gkj102
- Kellenberger, Esther; Muller, Pascal; Schalon,
Claire; Bret, Guillaume; Foata, Nicolas; Rognan,
Didier. sc-PDB: An annotated database of
druggable binding sites from the Protein Data
Bank. Journal of Chemical Information and
Modeling 2006, 46(2), 717-727.
- Irwin, J.J.; Shoichet, B.K. ZINC--A free
database of commercially available compounds for
virtual screening. Journal of Chemical
Information and Modeling 2005, 45, 177-182.
- Kouranov, A.; Xie, L.; de la Cruz, J.; Chen, L.;
Westbrook, J.; Bourne, P.E.; Berman, H.M. The
RCSB PDB information portal for structural
genomics. Nucleic Acids Research 2006, 34,
Database issue, D302-D305. doi: 10.1093/nar/gkj120
- Kumar, M.D.S.; Gromiha, M.M. PINT:
Protein-protein interactions thermodynamic
database. Nucleic Acids Research 2006, 34,
Database issue, D195-D198. doi: 10.1093/nar/gkj017
87Bibliography: Databases (cont'd)
- Lo Conte, L.; Brenner, S.E.; Hubbard, T.J.P.;
Chothia, C.; Murzin, A.G. SCOP database in 2002:
refinements accommodate structural genomics.
Nucleic Acids Research 2002, 30(1), 264-267.
- Murzin, A.G.; Brenner, S.E.; Hubbard, T.;
Chothia, C. SCOP: A structural classification of
proteins database for the investigation of
sequences and structures. Journal of Molecular
Biology 1995, 247, 536-540.
- Okuno, Y.; Yang, J.; Taneishi, K.; Yabuuchi, H.;
Tsujimoto, G. GLIDA: GPCR-ligand database for
chemical genomic drug discovery. Nucleic Acids
Research 2006, 34, Database issue, D673-D677. doi:
10.1093/nar/gkj028
- Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O,
Lewis T, Bennett C, Marsden R, Grant A, Lee D,
Akpor A, Maibaum M, Harrison A, Dallman T, Reeves
G, Diboun I, Addou S, Lise S, Johnston C, Sillero
A, Thornton J, Orengo C. The CATH Domain
Structure Database and related resources Gene3D
and DHS provide comprehensive domain family
information for genome analysis. Nucleic Acids
Research 2005, 33, Database issue, D247-D251.
88Bibliography: Databases (cont'd)
- Wheeler, D.L. et al. Database resources of the
National Center for Biotechnology Information.
Nucleic Acids Research 2006, 34, Database issue,
D173-D180. doi: 10.1093/nar/gkj158
- Wishart DS, Knox C, Guo AC, Shrivastava S,
Hassanali M, Stothard P, Chang Z, Woolsey J.
DrugBank: a comprehensive resource for
in silico drug discovery and exploration. Nucleic
Acids Res. 2006 Jan 1; 34(Database issue):
D668-72.
89Biotech Validation Suite for Protein Structures
- Send the server a PDB file
- Server provides a comprehensive check of the
protein, including
- Atomic volume analysis
- Full geometric analysis
- NMR restraint data
- http://biotech.ebi.ac.uk:8400/
90Knowledge-Driven Bioinformatics Enhanced with
Chemistry
91ToxTree
- An in silico toxicology prediction suite
- Based on the CDK toolkit
- Built on CML
- Released as OpenSource under the GPL
- Standalone PC software
- User Manual: http://ecb.jrc.it/DOCUMENTS/QSAR/TOXTREE/toxTree_user_manual.pdf
92Tools for Genomic and Proteomic Scientists
vis-à-vis Cell Biology (Gagna et al.)
- Tools to fully exploit the techniques in cellular
biology
- Light microscopy for high resolution images
- Fractionation of cells into basic components via
ultracentrifugation
- Analysis of individual cells through flow
cytometry
- LCM, normal and diseased TMAs (tissue
microarrays), quantitative computer image
analysis, cell micromanipulation, and
high-throughput microscopy
93InChI Generation on the Web
- The following websites provide the facility to
generate InChIs
- www.acdlabs.com/download/chemsk.html--ACD/Labs'
freely available structure-drawing program
ChemSketch includes the facility to generate
InChIs from drawn structures.
- pubchem.ncbi.nlm.nih.gov/edit/--PubChem Server
Side Structure Editor v1.8 includes a facility
for generating InChIs as you draw the structure.
94Advances in Macromolecular Crystallography by CCG
- More protein structures available now
- Use of 3D info in bioinformatics makes functional
inferences more dependable
- CCG Structural Family Database distributed with
MOE
- Includes fold detection methodology to ID
structurally similar proteins
- Simultaneous sequence and structural alignment of
large collections of proteins
- 3D structural family analysis for insight into
conserved geometry, water molecules, salt
bridges, hydrogen bonds, hydrophobic contacts,
and disulfide bonds
95CCG's Cheminformatics Offerings
- MOE Molecular Database
- Molecular Descriptors calculated and used for
classification, clustering, filtering, and
predictive model construction
- QSAR/QSPR Predictive Modeling
- Diversity and Similarity Searching
- High Throughput Conformational Search
- 3D Pharmacophore Search
96Components of the Semantic Web for Chemistry
- XML: eXtensible Markup Language
- RDF: Resource Description Framework
- RSS: Rich Site Summary
- Dublin Core allows metadata-based newsfeeds
- OWL for ontologies
- BPEL4WS for workflow and web services
- Murray-Rust et al. Org. Biomol. Chem. 2004, 2,
3192-3203.
97Web Services Integration Projects: Biosciences
- myGrid
- http://www.mygrid.org.uk/
- BIOPIPE
- http://biopipe.org/
- BioMOBY
- http://biomoby.org/
98BIOT 2006
- Major themes, areas and suggested topics include
- Bio-molecular and Phylogenetic Databases
- Molecular Evolution and Phylogenetic analysis
- Drug Delivery Systems
- Bio-Ontology and Data Mining
- Sequence Search and Alignment
- Microarray Analysis
- System Biology
- Pathway analysis
- Identification and Classification of Genes
- Protein Structure Prediction and Molecular
Simulation
- Functional Genomics
- Proteomics
- Tertiary structure prediction
- Drug Docking
- Gene Expression Analysis
- Biomedical Imaging
99Proteomics: What is it?
- Proteomics is the study of protein expression,
regulation, modification, and function in living
systems for understanding how living systems use
proteins. Using a variety of techniques,
proteomics can be used to study how proteins
interact within a system, or how proteins change
due to applied stresses.
- Requires advanced measurement techniques,
especially separations and mass spectrometry
100Proteomics Needs Informatics for
- Locating peaks in 2 or more dimensions
- MS/MS spectra interpretation
- Protein/Peptide quantification
- Peptide detectability
- Experimental data -> Biological information
- enzyme or pathway regulation
- disease susceptibility
- drug efficacy
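The first item, peak location, can be illustrated in its simplest one-dimensional form: a point counts as a peak if it exceeds both neighbors and clears a height threshold. This is a deliberately reduced sketch (real LC-MS data is two-dimensional and noisy, and flat-topped peaks are ignored here); the function name, the threshold, and the intensity values are made up:

```python
def find_peaks(intensities, min_height=0.0):
    """Return indices of strict local maxima at or above min_height."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        # Strict comparison on both sides: plateaus are not reported.
        if y >= min_height and y > intensities[i - 1] and y > intensities[i + 1]:
            peaks.append(i)
    return peaks

# Made-up intensity trace along a single m/z (or retention time) axis.
spectrum = [0, 2, 9, 3, 1, 4, 12, 5, 0, 1, 0]
peak_positions = find_peaks(spectrum, min_height=3)
```

The small bump near the end of the trace is a local maximum but falls below the threshold, so it is not reported.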