Bridging Bioinformatics and Chemoinformatics - PowerPoint PPT Presentation


Slides: 101
Provided by: iuin

Transcript and Presenter's Notes


1
Bridging Bioinformatics and Chem(o)informatics
  • Gary Wiggins
  • School of Informatics
  • Indiana University
  • wiggins_at_indiana.edu
  • Yan He (SLIS MLS Student)
  • Meredith Saba (SLIS MLS Student)

2
Provocative Thought
  • While much bioscience is published with the
    knowledge that machines will be expected to
    understand at least part of it, almost all
    chemistry is published purely for humans to
    read.
  • Murray-Rust et al. Org. Biomol. Chem. 2004, 2,
    3201.

3
Overview of the Talk
  • Review of ACS CINF 2004 Papers
  • Review of Relevant Articles
  • Public Chemistry Databases and Data Repositories
    with Bioinformatics Info/Links
  • Overview of Web Services
  • NIH-funded Projects Underway or Planned at
    Indiana University

4
The Bigger Picture: Linking Bioinformatics to
Cheminformatics
  • American Chemical Society Division of Chemical
    Information (CINF) Symposium, Anaheim, Spring
    2004
  • All-day session with 16 papers
  • http://www.acscinf.org/new/docs/meetings/227nm/227cinfabstracts.htm

5
Problems from ACS CINF 2004
  • Both technical and people factors hinder
    knowledge exchange between biology and chemistry.
    (Lipinski)
  • People problems, per Chris Lipinski:
  • Metadata capture is complicated by people
    issues, particularly those between chemists and
    biologists.
  • Discipline-based disconnects occur distressingly
    often and are frequently overlooked as a cause of
    lost productivity.

6
Interdisciplinary Collaborations Biology and
Chemistry
  • What's ... important for these collaborations
    is, not only do you have to accept the other
    guy's paradigm, or at least live with it, you have
    to be willing to accept the other guy's foibles
    or your perception of the other guy's foibles
    (and recognize the opposite of this). We each
    have our own approaches to how we do science, and
    it's just different cultures.
  • --Thom Kauffman interview in ACS LiveWire, March
    2005, 7.3. http://pubs.acs.org/4librarians/livewire/2006/7.3/profile.html

7
Some Questions from the ACS CINF 2004 Symposium
  • "Find all proteins related to protein A (i.e.,
    within a given path length of A) in a protein
    interaction graph, and retrieve related assay
    results and compound structures."
  • "Find all pathways where compound X inhibits or
    slows a reaction, and retrieve Gene Ontology
    classifications for all proteins involved in the
    reaction."
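The first question is essentially a bounded graph traversal. The following is a minimal Python sketch. It assumes the interaction graph fits in memory as an adjacency map. A breadth-first search finds every protein within `max_hops` interactions of the query protein, and a join with an assay table then pulls the related results. The graph, the protein names, and the assay records are all hypothetical.

```python
from collections import deque

# Hypothetical, undirected protein interaction graph: protein -> interaction partners.
ppi = {
    "A": {"B", "E"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": {"A"},
}

# Hypothetical assay table: protein -> (compound_id, activity) records.
assays = {
    "B": [("cpd-1", 0.8)],
    "C": [("cpd-2", 0.3)],
    "D": [("cpd-3", 0.9)],
}


def proteins_within(graph: dict[str, set[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search: every protein reachable from `start` in at most
    `max_hops` interactions, mapped to its shortest path length."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the path-length limit
        for partner in graph.get(node, set()):
            if partner not in dist:
                dist[partner] = dist[node] + 1
                queue.append(partner)
    del dist[start]  # the query protein itself is not a "related" protein
    return dist


def related_assays(graph, assay_table, start, max_hops):
    """Join the neighborhood of `start` with the assay records of each neighbor."""
    return {p: assay_table[p] for p in proteins_within(graph, start, max_hops) if p in assay_table}
```

Breadth-first search is used because it reaches each protein first along a shortest path. The stored distance is therefore the path length that the question asks about.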

8
Problems from ACS CINF 2004
  • Commercial vs. public data
  • Batch mode data processing possible in biology,
    but primitive in chemistry
  • Primary HTS data has a very high noise factor
  • Data format standardization problem
  • Chemoinformatics and bioinformatics use
    completely different data formats and analysis
    tools
  • Chemical and protein sequence information has
    been largely analyzed separately

9
Solutions from ACS CINF 2004
  • Linking biological and chemical information in
    computational approaches to predict biological
    activity, ADME profiles, and adverse drug
    reactions (ADR)
  • Energetics of binding for more accurate and
    sensitive chemical representation of DNA-protein
    interactions
  • A discovery informatics platform that facilitates
    archival, sharing, integration, and exploration
    of synthetic methods and biological activity data

10
Solutions from ACS CINF 2004
  • Data pipelining approach makes it possible to
    apply bioinformatics and chemoinformatics data
    and analyses together.
  • Visualizations are the best way for people to
    understand data.

11
Solutions from ACS CINF 2004
  • Cabinet (Chemical And Biological Information
    NETwork, formerly Fedora) servers include:
  • Metabolic pathway network chart (Empath)
  • Protein-Ligand Association Network (Planet)
  • Enzyme Commission Codebook (EC Book)
  • Traditional Chinese Medicines (TCM)
  • World Drug Index (WDI), and others.
  • Built on the Daylight HTTP toolkit
  • http://www.metaphorics.com/products/cabinet.html

12
Overview of the Talk
  • Review of ACS CINF 2004 Papers
  • Review of Relevant Articles
  • Public Chemistry Databases and Data Repositories
    with Bioinformatics Info/Links
  • Overview of Web Services
  • NIH-funded Projects Underway or Planned at
    Indiana University

13
What is Chemoinformatics? (Brown)
  • the essence of chemoinformatics is integration
    and focus rather than its components, which are
    independent disciplines.
  • Supporting disciplines
  • Chemical information
  • Computational chemistry
  • Chemometrics

14
Chemoinformatics and Disease
15
Toolkits as Integrators (Brown)
  • Companies such as Daylight, Advanced Visual
    Systems, OpenEye, and SciTegic provide
    integration systems for:
  • Statistical methods
  • Text mining
  • Computational chemistry
  • Visualization

16
GeneGo's MetaDrug Product
  • Toxicogenomics platform for the prediction of
    human drug metabolism and toxicity of novel
    compounds
  • Enables the visualization of pre-clinical and
    clinical high-throughput data in the context of
    the complete biological system
  • Integrates chemical, biological, and protein
    function data
  • http://www.genego.com/

17
BioWisdom
  • Examination of vast amounts of available
    information using its Sofia KnowledgeScan
    methodology
  • SRS data integration platform
  • http://www.biowisdom.com/

18
Lessons from Hip Hop (Salamone)
  • Mashup technique
  • Bring together disparate informatics, biological,
    chemical, and imaging information when conducting
    research
  • Example of an integration tool: iSpecies.org
  • A search for a species returns a page with NCBI
    genomics information, Yahoo images of the
    species, and articles culled from Google Scholar

19
iSpecies.org Search
  • For Mus musculus

20
Chemogenomics and Chemoproteomics (Gagna)
  • Chemogenomics (def.): The description of all
    potential drugs that can be used against all
    possible target sites, OR the actions of
    target-specific chemical ligands and how they are
    used to globally examine genes
  • Chemoproteomics (def.): Uses chemistry to
    characterize protein structure and functions
  • They are . . . a form of chemical biology
    brought up to date in the area of genome and
    proteome analysis.

21
New Interdisciplinary Journals
  • ACS Chemical Biology (ACS)
  • ChemBioChem: A European Journal of Chemical
    Biology (Wiley/VCH)
  • Chemical Biology and Drug Design (Blackwell)
  • JBIC: Journal of Biological Inorganic
    Chemistry (Springer)
  • Journal of Biochemical and Molecular Toxicology
    (Wiley)
  • Molecular BioSystems (RSC)
  • Nature Chemical Biology (Nature Publishing)
  • Organic & Biomolecular Chemistry (RSC)

22
Open Source Software (Geldenhuys)
  • Log P calculator from Interactive Analysis
  • http://www.logp.com
  • University of Utah's Computational Science and
    Engineering Online
  • Can submit jobs for molecular mechanics and quantum
    chemical calculations, and use biomolecular
    interfaces for viewing PDB files
  • http://www.cse-online.net
  • Virtual Computational Chemistry Laboratory
  • http://www.vcclab.org

23
The Blue Obelisk (Guha)
  • Several open chemistry and chemoinformatics
    projects that have pooled forces to enhance
    interoperability
  • Maintains:
  • Chemoinformatics Algorithms Dictionary
  • Data Repository for standardized data for
    chemical properties and other facts (e.g., mass)
  • http://www.blueobelisk.org/

24
BlueObelisk.org
  • Working collaboratively on projects such as:
  • Chemistry Development Kit (CDK)
  • JChemPaint
  • Jmol
  • JUMBO
  • NMRShiftDB
  • Octet
  • Open Babel
  • QSAR
  • World Wide Molecular Matrix (WWMM)

25
Barriers to the Use of Open Source Software
  • Unix command line
  • Problem: lack of known standards and datasets of
    compounds for validation, e.g., in docking
    programs

26
Lessons from the Human Genome Project (Austin)
  • Keys to success in the HGP were:
  • Comprehensiveness
  • Commitment to open access to the sequence as a
    research tool without encumbrance
  • Proposed tools for a genome functionation
    toolbox
  • Whole-genome transcriptome and proteome
    characterization
  • Development of small inhibitory RNAs (siRNAs) and
    knockout mice for every gene
  • Small molecules and the druggable genome

27
ChemDB: http://cdb.ics.uci.edu/CHEM/Web/
28
ChEBI, Chemical Entities of Biological Interest
  • Dictionary of molecular entities focused on small
    chemical compounds
  • Features an ontological classification, showing
    the relationships between molecular entities or
    classes of entities and their parents and/or
    children
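Navigating from an entity to its parent classes can be sketched as a walk over is_a links. The relations below are a toy example loosely modeled on a chemical ontology. They are not taken from ChEBI. A real client would load ChEBI's own ontology files instead.

```python
# Toy is_a relations for illustration only; these are NOT ChEBI records or IDs.
IS_A = {
    "ethanol": ["primary alcohol"],
    "1-propanol": ["primary alcohol"],
    "primary alcohol": ["alcohol"],
    "alcohol": ["organic hydroxy compound"],
}


def ancestors(entity: str) -> set[str]:
    """All classes above `entity`, following is_a links transitively
    (an entity may have several parents, so this walks a DAG, not a tree)."""
    seen: set[str] = set()
    stack = list(IS_A.get(entity, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def children(cls: str) -> list[str]:
    """Direct children of `cls`, i.e. entities whose is_a list names it."""
    return sorted(e for e, parents in IS_A.items() if cls in parents)
```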

29
Vioxx Entry in ChEBI
30
The IUPAC International Chemical Identifier
(InChI)
  • Open source, non-proprietary, public-domain
    identifier for chemicals
  • A string of characters that uniquely represents a
    molecular substance
  • Independent of the way the chemical structure is
    drawn
  • Enables reliable structure recognition and easy
    linking of diverse data compilations
  • Accepts as input MOLfiles (or SDfiles) and CML
    files
  • Download the program to your computer at:
  • http://www.iupac.org/inchi/license.html
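An InChI is built from layers separated by slashes. After the formula layer, each layer starts with a prefix letter, for example c for connectivity and h for hydrogen atoms. The minimal sketch below only splits a string into those layers, using ethanol's standard InChI as the example. Standard InChIs begin `InChI=1S/`, while strings from the early 1.0 releases began `InChI=1/`. The sketch does not validate or interpret the chemistry.

```python
def parse_inchi(inchi: str) -> dict:
    """Split an InChI into its version, formula layer, and the remaining
    layers as ordered (prefix letter, content) pairs.

    This is a structural split only: layer contents are not interpreted, and
    a prefix that recurs (e.g. inside a fixed-H sublayer) is kept in order.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError(f"not an InChI string: {inchi!r}")
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = [(part[0], part[1:]) for part in rest if part]
    return {"version": version, "formula": formula, "layers": layers}


ethanol = parse_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
```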

31
Generation of InChI for Vioxx with wInChI
32
Vioxx Entry in PubChem Compounds Found with InChI
33
Vioxx Bioassay Data in PubChem
34
Vioxx PubChem Link to External Sources of
Information
35
The Elsevier MDL/NIH Link via PubChem and
DiscoveryGate
  • Cross-indexes PubChem to the Compound Index
    hosted on Elsevier MDL's DiscoveryGate platform
  • MDL added 5 million structures from PubChem to
    their index, resulting in over 14 million unique
    chemical structures
  • Links go both ways
  • Can move from biological data in PubChem to
    bioactivity, chemical sourcing, synthetic
    methodology, and EHS data in DiscoveryGate
    sources
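The slide does not say how structures were matched between PubChem and the Compound Index. If both sides carry a shared structure key, such as an InChI, two-way linking reduces to a join. The sketch below builds both directions at once. The record IDs and both sources are hypothetical.

```python
from collections import defaultdict

# Hypothetical records from two sources; IDs and fields are made up.
source_a = [
    {"id": "A-1", "inchi": "InChI=1S/H2O/h1H2"},
    {"id": "A-2", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},
]
source_b = [
    {"id": "B-7", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},
    {"id": "B-8", "inchi": "InChI=1S/CH4O/c1-2/h2H,1H3"},
]


def cross_index(left, right, key="inchi"):
    """Link two record sets through a shared structure key, in both directions."""
    right_by_key = defaultdict(list)
    for rec in right:
        right_by_key[rec[key]].append(rec["id"])
    forward, backward = {}, defaultdict(list)
    for rec in left:
        matches = right_by_key.get(rec[key], [])
        if matches:
            forward[rec["id"]] = matches
            for other in matches:
                backward[other].append(rec["id"])
    return forward, dict(backward)


a_to_b, b_to_a = cross_index(source_a, source_b)
```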

36
Elsevier MDL's xPharm
  • Comprehensive set of records linking:
  • Agents (compounds) (2300)
  • Targets (600)
  • Disorders (450)
  • Principles that govern their interactions (180)
  • Answers questions such as:
  • What targets are associated with control of blood
    pressure?
  • What adverse effects are associated with
    monoamine oxidase inhibitors?

37
Text Datamining (Banville)
  • In the pharmaceutical field, it is ideally the
    marriage of biological and chemical information
    that needs to be the ultimate focus of text data
    mining applications.
  • Problems
  • Lack of universal publication standards for
    identifying each unique chemical entity
  • Selective indexing policies of A&I (abstracting
    and indexing) services
  • Need to understand how chemical structures link
    to biological processes
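A document sometimes names a substance by registry number rather than by structure. In that case a text miner can at least confirm that the number is well formed. The sketch below is not taken from any tool named in this talk. It finds tokens shaped like CAS Registry Numbers and keeps only those whose check digit is correct. To compute the check, multiply each digit before the check digit by its position counted from the right, sum the products, and take the result modulo 10. The regular expression is a simplification.

```python
import re

# A CAS RN is 2-7 digits, a hyphen, 2 digits, a hyphen, and one check digit.
CAS_RN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def cas_check_ok(match: re.Match) -> bool:
    """Check digit = sum of each preceding digit times its position counted
    from the right (starting at 1), modulo 10."""
    digits = match.group(1) + match.group(2)
    total = sum(pos * int(d) for pos, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def find_cas_numbers(text: str) -> list[str]:
    """Return CAS RN-shaped tokens in `text` whose check digit validates."""
    return [m.group(0) for m in CAS_RN.finditer(text) if cas_check_ok(m)]
```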

38
Chemical Datamining Software
  • SureChem
  • http://surechem.reeltwo.com/
  • CLiDE
  • Recognizes structures, reactions, and text
  • http://www.simbiosys.ca/clide/
  • OSCAR
  • OSCAR1 to check experimental data
  • http://www.ch.cam.ac.uk/magnus/checker.html
  • http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/
  • CSR (Chemical Structure Reconstruction)
  • http://www.scai.fraunhofer.de/uploads/media/MZ-ERCIM05_04.pdf
  • MDL DocSearch: combines MDL's Isentris platform
    and EMC's Documentum

39
Overview of the Talk
  • Review of ACS CINF 2004 Papers
  • Review of Relevant Articles
  • Public Chemistry Databases and Data Repositories
    with Bioinformatics Info/Links
  • Overview of Web Services
  • NIH-funded Projects Underway or Planned at
    Indiana University

40
Themes from Swiss-Prot's 20th Anniversary
Conference, "In Silico Analysis of Proteins"
  • Knowledgebases, databases and other information
    resources for proteins
  • Sequence searches and alignments
  • Protein sequence analysis
  • Protein structure prediction, analysis and
    visualization
  • Proteomics data analysis

41
Chemoinformatics Databases (Jónsdóttir)
  • Lists databases relevant to drug discovery and
    development, including
  • General databases
  • DBs for screening compounds
  • DBs for medicinal agents
  • DBs with ADMET properties
  • DBs with physico-chemical properties
  • Curiously does not mention Chemical Abstracts

42
Databases with Protein and Ligand Information
(Jónsdóttir)
  • Protein Data Bank
  • Target Registration Database
  • Relibase: uses structural info to analyze
    protein-ligand interactions; Relibase+ for
    protein-protein interaction searching
  • Cambridge Structural Database
  • KEGG LIGAND DB for enzyme reactions
  • http://www.genome.ad.jp/ligand

43
Other Databases with Protein and Ligand
Information
  • SitesBase: a database of known ligand binding
    sites within the PDB
  • http://www.bioinformatics.leeds.ac.uk/sb/main.html
  • Binding MOAD
  • http://www.bindingmoad.org/
  • sc-PDB (Kellenberger)
  • http://bioinfo-pharma.u-strasbg.fr:8080/scPDB/index.jsp

44
sc-PDB: http://bioinfo-pharma.u-strasbg.fr:8080/scPDB/index.jsp
45
Isatin Search on sc-PDB
46
Other Databases with Protein-Protein Interaction
Data (Jónsdóttir)
  • YPD, Yeast Proteome Database (for proteins from
    S. cerevisiae)
  • http://www.biobase.de/pages/index.php?id=139
  • Human Protein Reference Database
  • http://www.hprd.org/
  • BIND, Biomolecular Interaction Network Database
    (ceased as of 11/16/2005?)
  • http://www.bind.ca/Action

47
International Molecular Exchange (IMEx)
Consortium: http://imex.sourceforge.net/
  • BIND (http://www.blueprint.org): The Blueprint
    Initiative Asia Pte. Ltd., Singapore, and The
    Blueprint Initiative North America, Toronto, Canada
  • DIP (http://dip.doe-mbi.ucla.edu): UCLA-DOE
    Institute for Genomics and Proteomics
  • IntAct (http://www.ebi.ac.uk/intact):
    EMBL-European Bioinformatics Institute, Hinxton,
    UK
  • MINT (http://mint.bio.uniroma2.it/mint/):
    University of Rome Tor Vergata, Rome, Italy
  • MPact (http://mips.gsf.de/genre/proj/mpact): MIPS
    / Institute for Bioinformatics, Munich, Germany.

48
Protein Sites from IU I533 Students and others
  • LigandDepot: integrated source for small molecules
  • http://ligand-depot.rutgers.edu/index.html
  • PSIPRED Protein Structure Prediction Server
  • http://bioinf.cs.ucl.ac.uk/psipred/
  • DSSP: a database of secondary structure
    assignments (and much more) for all protein
    entries in the PDB
  • http://swift.cmbi.ru.nl/gv/dssp/
  • Dr. Predrag Radivojac's I690 class on Structural
    Bioinformatics
  • http://www.informatics.indiana.edu/predrag/2006springi690/2006springi690.htm

49
Protein Secondary Structure Prediction
  • Methods
  • Neural Network
  • Rule Based
  • Other Machine Learning
  • Homology Based

50
Protein Secondary Structure Prediction Software
  • PredictProtein
  • http://www.predictprotein.org/
  • Chou-Fasman: http://fasta.bioch.virginia.edu/fasta_www/chofas.htm
  • NN Predict
  • http://www.cmpharm.ucsf.edu/nomi/nnpredict.html

51
Structure-Based Docking Methods
  • Method
  • Scans many small molecules and docks them to a
    site of interest on a protein structure
  • Predicts free energy of binding
  • Filters thousands of compounds relatively quickly
  • Top hits can be used for more rigorous
    computational/experimental characterization and
    optimization
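The filtering step can be sketched as a threshold followed by a top-N selection over predicted binding free energies. The scores and compound IDs are hypothetical. Real docking programs report scores with their own sign conventions and units, so check those before reusing this rule.

```python
import heapq

# Hypothetical docking output: compound ID -> predicted binding free energy (kcal/mol).
predicted_dg = {"cpd-1": -9.2, "cpd-2": -5.1, "cpd-3": -7.8, "cpd-4": -11.0}


def top_hits(scores: dict[str, float], n: int, cutoff: float) -> list[tuple[str, float]]:
    """Keep compounds predicted to bind at least as strongly as `cutoff`
    (more negative = stronger), then return the best `n`, strongest first."""
    passing = [(cid, dg) for cid, dg in scores.items() if dg <= cutoff]
    return heapq.nsmallest(n, passing, key=lambda item: item[1])
```

`heapq.nsmallest` avoids sorting the whole list when only a few top hits are wanted from thousands of docked compounds.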

52
Structure-Based Docking Methods
  • DOCK
  • http://dock.compbio.ucsf.edu/
  • Accelrys's Insight (built on DOCK)
  • http://www.accelrys.com/products/insight/
  • FlexX
  • http://www.biosolveit.de/FlexX/
  • Glide
  • http://www.schrodinger.com/ProductDescription.php?mID=6&sID=6
  • GOLD
  • http://www.ccdc.cam.ac.uk/products/life_sciences/gold/

53
Useful Structure Databases
  • ModBase
  • http://modbase.compbio.ucsf.edu/modbase-cgi-new/search_form.cgi
  • Dali Database (Fold classification based on PDB)
  • http://ekhidna.biocenter.helsinki.fi/dali/start
  • Protein Structure Analysis, Comparison, and/or
    Classification Guide
  • http://www.bio.vu.nl/nvtb/Structures.html

54
SCOP, Structural Classification of Proteins
  • Curated database of structural and evolutionary
    relationships
  • All known protein folds (v. 1.69, July 2005)
  • 70,859 domains organized into 2,845 families,
    1,539 superfamilies, and 945 folds
  • Detailed information about close relatives
  • Links to coordinates, images of structures,
    interactive viewers, and literature references
  • http://scop.mrc-lmb.cam.ac.uk/scop/

55
SCOP Search Options
  • Homology search yields a list of structures with
    significant levels of sequence similarity
  • Keyword search matches words in SCOP and PDB

56
CATH Protein Structure Classification
  • Like SCOP, structured hierarchically by:
  • Class (determined by secondary structure)
  • Architecture (overall shape, e.g., barrel,
    sandwich, roll, etc.); no equivalent in SCOP
  • Topology (grouped into fold families based on
    overall shape and connectivity of secondary
    structures)
  • Homologous Superfamily (domains thought to share
    a common ancestor)
  • As of January 2005, had 43,229 domains classified
    into 1,467 superfamilies and 5,107 sequence
    families. A protein family database (CATH-PFDB)
    contained a total of 616,470 domain sequences
    classified into 23,876 sequence families.
  • http://cathwww.biochem.ucl.ac.uk/latest/index.html

57
CATH Search Options
  • Can browse or search the classification by CATH
    code
  • CATH codes can be used to search other databases,
    e.g., DHS, Gene3D, and Impala
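A CATH code lists the class, architecture, topology, and homologous superfamily numbers in order. Each prefix of the code therefore identifies a node in the hierarchy. The minimal sketch below expands a code and tests membership. "3.40.50.300" is used only as a well-formed example code. Real CATH identifiers can carry further sequence-family levels that this sketch ignores.

```python
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")


def parse_cath(code: str) -> dict[str, str]:
    """Expand a dotted CATH code (e.g. "3.40.50.300") into the node ID at each level."""
    parts = code.split(".")
    if not 1 <= len(parts) <= len(LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a CATH code: {code!r}")
    return {LEVELS[i]: ".".join(parts[: i + 1]) for i in range(len(parts))}


def under(domain_code: str, node_code: str) -> bool:
    """True if `domain_code` lies within the CATH node `node_code`.
    Compares whole fields, so "3.4" does not match "3.40"."""
    node = node_code.split(".")
    return domain_code.split(".")[: len(node)] == node
```

Comparing dot-separated fields, rather than raw string prefixes, keeps a search for node 3.4 from pulling in everything under 3.40.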

58
Gasteiger's Biochemical Pathways Database
  • Database of biochemical pathways that represents
    chemical structures and reactions on the atomic
    level
  • Gives access to each atom and bond of the
    substrates of enzyme reactions
  • Allows the study of transition state hypotheses
    of enzyme reactions
  • Analysis of the physicochemical effects operating
    at the reaction site allows a classification of
    enzyme reactions that goes beyond the traditional
    EC code for enzymes.
  • 1533 biochemical molecules and 2175 reactions
  • http://www2.chemie.uni-erlangen.de/services/biopath/index.html

59
A Gene Expression Database for NCI60 (Scherf)
  • Published in Nature Genetics, 2000
  • First study to integrate gene expression with
    molecular pharmacology databases
  • Gene expression profiles for NCI60 assessed
    using microarray technology
  • Gene-drug relationships investigated by
    correlating gene transcription levels with drug
    activities across the cell lines
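A gene-drug correlation of this kind can be sketched as a Pearson correlation between a gene's expression profile and a drug's activity profile over the same cell lines. The slide does not state which coefficient Scherf et al. used, so Pearson is an assumption here. The values are hypothetical and cover five cell lines instead of sixty.

```python
from statistics import mean, pstdev

# Hypothetical values across five cell lines (real NCI60 data have 60 lines).
expression = {"gene_x": [1.0, 2.0, 3.0, 4.0, 5.0], "gene_y": [5.0, 4.0, 3.0, 2.0, 1.0]}
activity = {"drug_q": [2.0, 4.0, 6.0, 8.0, 10.0]}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length profiles (undefined if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def gene_drug_correlations(expr, act):
    """Correlation for every (gene, drug) pair: one cell of a gene-by-drug matrix."""
    return {(g, d): pearson(e, a) for g, e in expr.items() for d, a in act.items()}


matrix = gene_drug_correlations(expression, activity)
```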

60
Correlation Matrix Between Drug Activity and Gene
Expression
61
Other Relevant Databases/Servers
  • Each year Nucleic Acids Research publishes a
    Database Issue in January and a Web Server Issue
    in July (See refs in Bibliography section).
    Examples from the most recent issues

62
Overview of the Talk
  • Review of ACS CINF 2004 Papers
  • Review of Relevant Articles
  • Public Chemistry Databases and Data Repositories
    with Bioinformatics Info/Links
  • Overview of Web Services
  • NIH-funded Projects Underway or Planned at
    Indiana University

63
Web Services Overview
  • What are Web Services?
  • A distributed invocation system built on Grid
    computing
  • Independent of platform and programming language
  • Built on existing Web standards
  • A service-oriented architecture with:
  • Interfaces based on Internet protocols
  • Messages in XML (except for binary data
    attachments)
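To illustrate XML messaging, the sketch below builds and reads a SOAP 1.1-style request with Python's standard ElementTree module. The envelope namespace is the real SOAP 1.1 one. The service namespace, the operation name `similaritySearch`, and its parameters are hypothetical. A real client would normally generate such messages from the service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:chem-service"  # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)


def build_request(operation: str, **params: str) -> bytes:
    """Wrap an operation call and its string parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


def _local(tag: str) -> str:
    """Strip the "{namespace}" prefix ElementTree puts on qualified tags."""
    return tag.split("}", 1)[1]


def read_request(message: bytes) -> tuple[str, dict[str, str]]:
    """Recover the operation name and parameters on the receiving side."""
    body = ET.fromstring(message).find(f"{{{SOAP_ENV}}}Body")
    call = body[0]
    return _local(call.tag), {_local(child.tag): child.text for child in call}


message = build_request("similaritySearch", smiles="CCO", threshold="0.8")
```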

64
Service-Oriented Architecture
  • From Curcin et al. DDT 2005, 10(12), 867

65
Web Services for Chemistry Problems
  • Performance and scalability
  • Proprietary data
  • Competition from high-performance desktop
    applications
  • --Geoff Hutchison, "it's a puzzle" blog,
    2005-01-05
  • ALSO
  • Lack of a substantial body of trustworthy Open
    Access databases
  • Non-standard chemical data formats (over 40 in
    regular use and requiring normalization to one
    another)

66
Overview of the Talk
  • Review of ACS CINF 2004 Papers
  • Review of Relevant Articles
  • Public Chemistry Databases and Data Repositories
    with Bioinformatics Info/Links
  • Overview of Web Services
  • NIH-funded Projects Underway or Planned at
    Indiana University

67
Indiana University Planned Projects: http://www.chembiogrid.org
  • Design of a Grid-based distributed data
    architecture
  • Development of tools for HTS data analysis and
    virtual screening
  • Database for quantum mechanical simulation data
  • Chemical prototype projects
  • Novel routes to enzymatic reaction mechanisms
  • Mechanism-based drug design
  • Data-inquiry-based development of new methods in
    natural product synthesis

68
Web Services for Chemistry at IU
69
NCI Developmental Therapeutics Program (DTP)
  • Downloadable data:
  • In vitro 60 cell line results
  • In vitro anti-HIV results
  • Yeast assay
  • 200,000 chemical structures
  • Molecular targets
  • Microarray data
  • Or search the database at
  • http://dtp.nci.nih.gov/docs/dtp_search.html

70
IU Database of NIH DTP Data
  • Contains over 200,000 chemical structures tested
    in 60 cellular assays from different human tumor
    cell lines
  • Also includes microarray assay profiles for the
    untreated cell lines (14,000 datapoints)
  • A local PostgreSQL database containing the data
    that is exposed as a web service
  • Using workflows and complex SQL queries, we can
    do advanced data mining that exploits the
    chemical, biological and genomic information for
    particular audiences (chemists, biologists, etc.)
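The kind of query described can be sketched with SQLite standing in for PostgreSQL, since the SQL used here is portable to both. The table layout, the hypothetical compound IDs, and the use of log GI50 as the activity measure are assumptions. They are not the IU schema.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL server; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE compound (nsc TEXT PRIMARY KEY, smiles TEXT NOT NULL);
    CREATE TABLE cell_assay (
        nsc TEXT NOT NULL REFERENCES compound(nsc),
        cell_line TEXT NOT NULL,
        log_gi50 REAL NOT NULL  -- assumed: log10 of molar GI50; lower means more potent
    );
    """
)
conn.executemany(
    "INSERT INTO compound VALUES (?, ?)",
    [("hypo-1", "CCO"), ("hypo-2", "c1ccccc1"), ("hypo-3", "CCN")],
)
conn.executemany(
    "INSERT INTO cell_assay VALUES (?, ?, ?)",
    [
        ("hypo-1", "line-A", -6.5), ("hypo-1", "line-B", -6.1),
        ("hypo-2", "line-A", -4.0), ("hypo-2", "line-B", -5.0),
        ("hypo-3", "line-A", -7.0),
    ],
)

# Compounds tested in at least N cell lines whose mean log GI50 is at or below a cutoff.
POTENT_ACROSS_LINES = """
    SELECT c.nsc, c.smiles, AVG(a.log_gi50) AS mean_log_gi50, COUNT(*) AS n_lines
    FROM compound AS c
    JOIN cell_assay AS a ON a.nsc = c.nsc
    GROUP BY c.nsc, c.smiles
    HAVING COUNT(*) >= ? AND AVG(a.log_gi50) <= ?
    ORDER BY mean_log_gi50
"""
rows = conn.execute(POTENT_ACROSS_LINES, (2, -6.0)).fetchall()
```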

71
Mining the NIH DTP database
  • 14,000 gene expression values
  • 60 cell lines
  • Cell lines can be clustered based on gene
    expression similarity
  • 200,000 compounds
  • Compounds can be clustered based on similarity of
    profile across cell lines, or by chemical
    structure fingerprint similarity
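Fingerprint similarity is commonly measured with the Tanimoto coefficient. The minimal sketch below assumes each fingerprint is represented as a set of on-bit positions. The bit sets are made up. For clustering, one would typically use 1 minus this value as a distance.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two fingerprints given as sets of on-bit positions."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here: two empty fingerprints share nothing
    return len(a & b) / len(union)


def similar_compounds(query: set[int], library: dict[str, set[int]], threshold: float):
    """Library compounds with Tanimoto similarity >= threshold, most similar first."""
    scored = ((cid, tanimoto(query, fp)) for cid, fp in library.items())
    return sorted((hit for hit in scored if hit[1] >= threshold), key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints; a real workflow would compute them with a cheminformatics toolkit.
library = {"cpd-x": {1, 2, 3, 4}, "cpd-y": {1, 2, 3}, "cpd-z": {7, 8}}
```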
72
Use of Taverna at IU
  • A protein implicated in tumor growth is supplied
    to the docking program (in this case HSP90 taken
    from the PDB 1Y4 complex)
  • A 2D structure is supplied for input into the
    similarity search (in this case, the extracted
    bound ligand from the PDB 1Y4 complex)
  • The workflow employs our local NIH DTP database
    service to search 200,000 compounds tested in
    human tumor cellular assays for structures
    similar to the ligand.
  • Client portlets are used to browse these
    structures.
  • Similar structures are filtered for druggability
    and are automatically passed to the OpenEye FRED
    docking program for docking into the target
    protein.
  • Once docking is complete, the user visualizes the
    high-scoring docked structures in a portlet using
    the Jmol applet.
  • Correlation of docking results and biological
    fingerprints across the human tumor cell lines
    can help identify potential mechanisms of action
    of DTP compounds.
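The slide does not say how druggability was judged. One common choice is Lipinski's rule of five, which flags a molecular weight over 500, a calculated log P over 5, more than 5 hydrogen-bond donors, or more than 10 acceptors, and usually tolerates one violation. Assuming that filter, the step between similarity search and docking can be sketched as follows. The descriptor values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Props:
    """Precomputed descriptors; in practice these would come from a toolkit."""
    mol_weight: float  # daltons
    clogp: float       # calculated octanol/water log P
    h_donors: int
    h_acceptors: int


def rule_of_five_violations(p: Props) -> int:
    """Count Lipinski rule-of-five violations."""
    return sum([p.mol_weight > 500, p.clogp > 5, p.h_donors > 5, p.h_acceptors > 10])


def select_for_docking(candidates: dict[str, Props], max_violations: int = 1) -> list[str]:
    """Keep similarity-search hits that pass the filter, ready to hand to a docking step."""
    return sorted(cid for cid, p in candidates.items() if rule_of_five_violations(p) <= max_violations)


# Hypothetical similarity-search hits with made-up descriptor values.
hits = {
    "cpd-1": Props(350.4, 2.1, 2, 5),
    "cpd-2": Props(720.9, 6.3, 3, 12),
    "cpd-3": Props(520.0, 3.0, 2, 6),
}
```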

73
Taverna Workflow
Workflow definition
Available web services (WSDL)
Visual depiction of workflow
74
Taverna in Action
75
Overall Workflow
76
Pre-Closing Quote
  • There is not going to be a voilà moment at the
    computer terminal. Instead, there is systematic
    use of wide-ranging computational tools to
    facilitate and enhance the drug discovery
    process.
  • Jorgensen. Science, March 19, 2004, 303, 1814.

77
Closing quote
  • The future of chemistry depends on the
    automated analysis of chemical knowledge,
    combining disparate data sources in a single
    resource, such as the World-Wide Molecular
    Matrix, which can be analysed using computational
    techniques to assess and build on these data.
  • Townsend et al. Org. Biomol. Chem. 2004, 2, 3299.

78
Post-closing quote zzzzzCAS
  • In an industry first, Chemical Abstracts Service
    (CAS) has unveiled a revolutionary new literature
    searching tool which will permit scientists to
    search and retrieve the world's chemical
    literature, including patents and obscure
    technical reports, in their sleep.
  • --Author unknown

79
Acknowledgements
  • Randy Arnold
  • Xiao Dong
  • Sean Mooney
  • Peter Murray-Rust
  • David J. Wild
  • I533 Chemical Informatics Seminar Students
  • Elsevier Science

80
Bibliography Articles, Books, and Conference
Papers
  • The Bigger Picture: Linking Bioinformatics to
    Cheminformatics. CINF Symposium Abstracts
    1-16, 227th ACS National Meeting, Anaheim, CA,
    March 28-April 1, 2004. http://www.acscinf.org/new/docs/meetings/227nm/227cinfabstracts.htm
  • Austin, C.P. The completed human genome:
    implications for chemical biology. Current
    Opinion in Chemical Biology 2003, 7, 511-515.
  • Bajorath, Jürgen, ed. Chemoinformatics: concepts,
    methods, and tools for drug discovery. Totowa,
    N.J.: Humana Press, c2004. (Methods in molecular
    biology v. 275)
  • Banville, Debra L. Mining chemical structural
    information from the drug literature. Drug
    Discovery Today January 2006, 11(1/2), 35-42.
  • Brown, F. Editorial opinion: chemoinformatics - a
    ten year update. Current Opinion in Drug
    Discovery and Development 2005 May; 8(3):
    298-302.

81
Bibliography Articles (contd)
  • Coles, Simon J.; Day, Nick E.; Murray-Rust,
    Peter; Rzepa, Henry S.; Zhang, Yong. Enhancement
    of the chemical semantic web through
    InChIfication. Organic & Biomolecular Chemistry
    2005, 3, 1832-1834.
  • Curcin, Vera; Ghanem, Moustafa; Guo, Yike. "Web
    services in the life sciences." Drug Discovery
    Today 2005, 10(12), 865-871.
  • Gagna CE, Winokur D, Clark Lambert W. Cell
    biology, chemogenomics and chemoproteomics. Cell
    Biol Int. 2004; 28(11): 755-64.
  • Geldenhuys, W.J.; Gaasch, K.E.; Watson, M.;
    Allen, D.D.; Van Der Schyf, C.J. Optimizing the
    use of open-source software applications in drug
    discovery. Drug Discovery Today February 2006,
    11(3/4), 127-132.
  • Guha, R.; Howard, M.T.; Hutchison, G.R.;
    Murray-Rust, P.; Rzepa, H.; Steinbeck, C.; Wegner,
    J.; Willighagen, E.L. The Blue Obelisk:
    Interoperability in chemical informatics. Journal
    of Chemical Information and Modeling 2006, Web
    Release Date 22-Feb-2006, DOI 10.1021/ci050400b.

82
Bibliography Articles (contd)
  • Jónsdóttir, S.O.; Jorgensen, F.S.; Brunak, S.
    Prediction methods and databases within
    chemoinformatics: emphasis on drugs and drug
    candidates. Bioinformatics 2005 May 15; 21(10):
    2145-60.
  • Jorgensen, William L. The many roles of
    computation in drug discovery. Science March 19,
    2004, 303, 1813-1818.
  • Kauffman, Thom. Profile (interview). LiveWire,
    March 2005, 7.3. http://pubs.acs.org/4librarians/livewire/2006/7.3/profile.html
  • Murray-Rust, Peter S.; Mitchell, John B.O.;
    Rzepa, Henry S. Communication and re-use of
    chemical information in bioscience. BMC
    Bioinformatics 2005, 6, 180.
  • Murray-Rust, Peter; Mitchell, John B.O.; Rzepa,
    Henry S. Chemistry in bioinformatics. BMC
    Bioinformatics 2005, 6, 141-144.
  • Povolna, Vera; Dixon, Scott; Weininger, David.
    Cabinet: Chemical and Biological Informatics
    NETwork. In Oprea, Tudor I., ed.
    Chemoinformatics in Drug Discovery. Weinheim:
    Wiley-VCH, 2004, 241-269.

83
Bibliography Articles (contd)
  • Salamone, Salvatore. Hip Hop offers lessons on
    life sciences data integration. Bio-IT World
    February 2006, 36.
  • Scherf Uwe, Ross Douglas T., Waltham Mark, Smith
    Lawrence H., Lee Jae K., Tanabe Lorraine, Kohn
    Kurt W., Reinhold William C., Myers Timothy G.,
    Andrews Darren T., Scudiero Dominic A., Eisen
    Michael B., Sausville Edward A., Pommier Yves,
    Botstein David, Brown Patrick O., Weinstein John
    N. A gene expression database for the molecular
    pharmacology of cancer. Nature Genetics 2000,
    24, 236-244.
  • Souchelnytskyi, S. "Bridging proteomics and
    systems biology: What are the roads to be
    traveled?" Proteomics 2005 (November), 5(16),
    4123-4137.
  • Tetko, Igor V. Computing chemistry on the web.
    Drug Discovery Today November 2005, 10(22),
    1497-1500.

84
Bibliography Articles (contd)
  • Zimmermann, Marc; Thi, Le Thuy Bui; Hofmann,
    Martin. Combating illiteracy in chemistry:
    Towards computer-based chemical structure
    reconstruction. ERCIM News January 2005, 60,
    40-41.
  • http://www.scai.fraunhofer.de/uploads/media/MZ-ERCIM05_04.pdf
  • Zimmermann, Marc; Fluck, Juliane; Thi, Le Thuy
    Bui; Kolarik, Corinna; Kumpf, Kai; Hofmann,
    Martin. Information extraction in the life
    sciences: Perspectives for medicinal chemistry,
    pharmacology and toxicology. Current Topics in
    Medicinal Chemistry 2005, 5(8), 785-796.

85
Bibliography Databases
  • Andreeva, A.; Howorth, D.; Brenner, S.E.;
    Hubbard, T.J.P.; Chothia, C.; Murzin, A.G. SCOP
    database in 2004: refinements integrate structure
    and sequence family data. Nucleic Acids Research
    2004, 32, Database issue D226-D229. doi
    10.1093/nar/gkh039
  • Chen J, Swamidass SJ, Dou Y, Bruand J, Baldi P.
    ChemDB: a public database of small molecules and
    related chemoinformatics resources.
    Bioinformatics. 2005 Nov 15; 21(22): 4133-9.
  • Dunkel, M.; Fullbeck, M.; Neumann, S.; Preissner,
    R. SuperNatural: a searchable database of
    available natural compounds. Nucleic Acids
    Research 2006, 34, Database issue D678-D683. doi
    10.1093/nar/gkj132
  • Gold, Nicola D.; Jackson, Richard M. A
    searchable database for comparing protein-ligand
    binding sites for the analysis of
    structure-function relationships. Journal of
    Chemical Information and Modeling 2006, 46(2),
    736-742.

86
Bibliography Databases (contd)
  • Kanehisa, M.; Goto, S.; Hattori, M.;
    Aoki-Kinoshita, F.; Itoh, M.; Kawashima, S.;
    Katayama, T.; Araki, M.; Hirakawa, M. From
    genomics to chemical genomics: new developments
    in KEGG. Nucleic Acids Research 2006, 34,
    Database issue D354-D357. doi
    10.1093/nar/gkj102
  • Kellenberger, Esther; Muller, Pascal; Schalon,
    Claire; Bret, Guillaume; Foata, Nicolas; Rognan,
    Didier. sc-PDB: An annotated database of
    druggable binding sites from the Protein Data
    Bank. Journal of Chemical Information and
    Modeling 2006, 46(2), 717-727.
  • Irwin, J.J.; Shoichet, B.K. ZINC: A free
    database of commercially available compounds for
    virtual screening. Journal of Chemical
    Information and Modeling 2005, 45, 177-182.
  • Kouranov, A.; Xie, L.; de la Cruz, J.; Chen, L.;
    Westbrook, J.; Bourne, P.E.; Berman, H.M. The
    RCSB PDB information portal for structural
    genomics. Nucleic Acids Research 2006, 34,
    Database issue D302-D305. doi 10.1093/nar/gkj120
  • Kumar, M.D.S.; Gromiha, M.M. PINT:
    Protein-protein interactions thermodynamic
    database. Nucleic Acids Research 2006, 34,
    Database issue D195-D198. doi 10.1093/nar/gkj017

87
Bibliography Databases (contd)
  • Lo Conte, L.; Brenner, S.E.; Hubbard, T.J.P.;
    Chothia, C.; Murzin, A.G. SCOP database in 2002:
    refinements accommodate structural genomics.
    Nucleic Acids Research 2002, 30(1), 264-267.
  • Murzin, A.G.; Brenner, S.E.; Hubbard, T.;
    Chothia, C. SCOP: A structural classification of
    proteins database for the investigation of
    sequences and structures. Journal of Molecular
    Biology 1995, 247, 536-540.
  • Okuno, Y.; Yang, J.; Taneishi, K.; Yabuuchi, H.;
    Tsujimoto, G. GLIDA: GPCR-ligand database for
    chemical genomic drug discovery. Nucleic Acids
    Research 2006, 34, Database issue D673-D677. doi
    10.1093/nar/gkj028
  • Pearl, F. Todd, A. Sillitoe, I. Dibley, M.
    Redfern, O. Lewis, T. Bennett, C. Marsden, R.
    Grant, A. Lee, D. Akpor, A. Maibaum, M.
    Harrison, A. Dallman, T. Reeves, G. Diboun, I.
    Addou, S. Lise, S. Johnston, C. Sillero, A.
    Thornton, J. Orengo, C. The CATH Domain
    Structure Database and related resources Gene3D
    and DHS provide comprehensive domain family
    information for genome analysis. Nucleic Acids
    Research 2005, 33, Database issue D247-D251.

88
Bibliography Databases (contd)
  • Wheeler, D.L. et al. Database resources of the
    National Center for Biotechnology Information.
    Nucleic Acids Research 2006, 34, Database issue
    D173-D180. doi 10.1093/nar/gkj158.
  • Wishart, D.S. Knox, C. Guo, A.C. Shrivastava, S.
    Hassanali, M. Stothard, P. Chang, Z. Woolsey, J.
    DrugBank: a comprehensive resource for in silico
    drug discovery and exploration. Nucleic Acids
    Research 2006, 34, Database issue D668-D672.

89
Biotech Validation Suite for Protein Structures
  • Send the server a PDB file
  • Server provides a comprehensive check of the
    protein, including
  • Atomic volume analysis
  • Full geometric analysis
  • NMR restraint data
  • http://biotech.ebi.ac.uk:8400/
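The slide does not describe how the server performs its checks. As a hypothetical illustration of one simple geometric check, the Python sketch below reads C-alpha coordinates from the fixed-column ATOM records of a PDB file and flags consecutive residues whose C-alpha to C-alpha distance falls outside an assumed window around the typical trans-peptide value of about 3.8 Å. The function names and the 3.6 to 4.0 Å tolerance are assumptions, not the validation suite's criteria.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]
CAAtom = Tuple[str, int, Coord]  # (chain ID, residue number, xyz)


def parse_ca_atoms(pdb_text: str) -> List[CAAtom]:
    """Collect C-alpha atoms from PDB ATOM records using fixed column slices."""
    atoms: List[CAAtom] = []
    for line in pdb_text.splitlines():
        # Record name is in columns 1-6; atom name in columns 13-16.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]                 # column 22
        res_seq = int(line[22:26])       # columns 23-26
        xyz = (float(line[30:38]),       # x: columns 31-38
               float(line[38:46]),       # y: columns 39-46
               float(line[46:54]))       # z: columns 47-54
        atoms.append((chain, res_seq, xyz))
    return atoms


def flag_ca_distances(atoms: List[CAAtom], low: float = 3.6,
                      high: float = 4.0) -> List[Tuple[str, int, int, float]]:
    """Report sequential residue pairs whose CA-CA distance lies outside [low, high] Å.

    The window is a hypothetical tolerance chosen for illustration.
    """
    flagged = []
    for (c1, r1, p1), (c2, r2, p2) in zip(atoms, atoms[1:]):
        if c1 != c2 or r2 != r1 + 1:
            continue  # only compare sequence neighbours within one chain
        d = math.dist(p1, p2)
        if not low <= d <= high:
            flagged.append((c1, r1, r2, round(d, 2)))
    return flagged


def atom_line(serial: int, res_name: str, chain: str, res_seq: int,
              x: float, y: float, z: float) -> str:
    """Build a minimal, column-aligned ATOM record for a C-alpha atom (demo helper)."""
    return (f"ATOM  {serial:5d}  CA  {res_name:>3} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}  1.00  0.00           C")


if __name__ == "__main__":
    demo = "\n".join([
        atom_line(1, "GLY", "A", 1, 0.0, 0.0, 0.0),
        atom_line(2, "ALA", "A", 2, 3.8, 0.0, 0.0),
        atom_line(3, "SER", "A", 3, 8.8, 0.0, 0.0),  # deliberately stretched
    ])
    print(flag_ca_distances(parse_ca_atoms(demo)))  # [('A', 2, 3, 5.0)]
```

Parsing by column position rather than by whitespace splitting matters for PDB files, because adjacent fields can run together when values are large.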

90
Knowledge-Driven Bioinformatics Enhanced with
Chemistry
91
ToxTree
  • An in silico toxicology prediction suite
  • Based on the Chemistry Development Kit (CDK)
  • Built on CML
  • Released as open source under the GPL
  • Standalone PC software
  • User Manual: http://ecb.jrc.it/DOCUMENTS/QSAR/TOXTREE/toxTree_user_manual.pdf
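Toxtree estimates toxic hazard by walking a decision tree of yes/no questions about a compound. The Python sketch below shows only that general mechanism. The two questions, the precomputed feature names (`has_nitro_group`, `mol_weight`), the 500 Da cutoff, and the class labels are hypothetical placeholders; they are not Toxtree's rule sets or its API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    question: str
    test: Callable[[dict], bool]
    yes: "Tree"
    no: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, compound: dict) -> Tuple[str, List[Tuple[str, bool]]]:
    """Walk the tree from the root, recording each question and its answer."""
    path: List[Tuple[str, bool]] = []
    while isinstance(tree, Node):
        answer = tree.test(compound)
        path.append((tree.question, answer))
        tree = tree.yes if answer else tree.no
    return tree.label, path


# Hypothetical two-question tree; the features are assumed to be precomputed.
TOY_TREE: Tree = Node(
    question="Contains a nitro group?",
    test=lambda c: c["has_nitro_group"],
    yes=Leaf("toy class A (alert)"),
    no=Node(
        question="Molecular weight above 500 Da?",
        test=lambda c: c["mol_weight"] > 500,
        yes=Leaf("toy class B"),
        no=Leaf("toy class C"),
    ),
)

if __name__ == "__main__":
    label, path = classify(TOY_TREE, {"has_nitro_group": False, "mol_weight": 180.2})
    print(label)  # toy class C
    for question, answer in path:
        print(f"  {question} -> {'yes' if answer else 'no'}")
```

Returning the path along with the label is a deliberate choice: a rule-based prediction is easier to review when the user can see which questions led to it.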

92
Tools for Genomic and Proteomic Scientists
vis-à-vis Cell Biology (Gagna et al.)
  • Tools to fully exploit the techniques in cellular
    biology
  • Light microscopy for high resolution images
  • Fractionation of cells into basic components via
    ultracentrifugation
  • Analysis of individual cells through flow
    cytometry
  • LCM (laser capture microdissection), tissue
    microarrays (TMAs) of normal and diseased tissue,
    quantitative computer image analysis, cell
    micromanipulation, and high-throughput microscopy

93
InChI Generation on the Web
  • The following websites can generate InChIs:
  • www.acdlabs.com/download/chemsk.html: ACD/Labs'
    freely available structure-drawing program
    ChemSketch can generate InChIs from drawn
    structures.
  • pubchem.ncbi.nlm.nih.gov/edit/: the PubChem
    Server Side Structure Editor v1.8 generates
    InChIs as you draw the structure.
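Generating an InChI requires dedicated software, but reading one is straightforward because an InChI is a sequence of slash-separated layers. The Python sketch below splits a standard InChI into its version, formula, and prefixed layers, using ethanol (`InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3`) as the example; `1S` marks a standard InChI, `c` the connection layer, and `h` the hydrogen layer. The function name is illustrative, and the sketch is simplified: if a layer prefix occurs more than once (as can happen with fixed-hydrogen sublayers), only the last value is kept.

```python
from typing import Dict


def parse_inchi(inchi: str) -> Dict[str, str]:
    """Split an InChI string into version, formula, and prefixed layers.

    Simplified sketch: a layer prefix that occurs more than once keeps
    only its last value.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, *rest = inchi[len(prefix):].split("/")
    result = {"version": version}
    if rest:
        result["formula"] = rest[0]  # main layer: molecular formula
    for layer in rest[1:]:
        if layer:
            # The first character names the layer, e.g. 'c' connections, 'h' hydrogens.
            result[layer[0]] = layer[1:]
    return result


if __name__ == "__main__":
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    print(parse_inchi(ethanol))
    # {'version': '1S', 'formula': 'C2H6O', 'c': '1-2-3', 'h': '3H,2H2,1H3'}
```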

94
Advances in Macromolecular Crystallography by CCG
  • More protein structures available now
  • Use of 3D info in bioinformatics makes functional
    inferences more dependable
  • CCG Structural Family Database distributed with
    MOE (Molecular Operating Environment)
  • Includes fold detection methodology to identify
    structurally similar proteins
  • Simultaneous sequence and structural alignment of
    large collections of proteins
  • 3D structural family analysis for insight into
    conserved geometry, water molecules, salt
    bridges, hydrogen bonds, hydrophobic contacts,
    and disulfide bonds
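MOE's own alignment method is not described on the slide. As a generic illustration of how structural similarity is often quantified once two structures are aligned, the Python sketch below computes the root-mean-square deviation (RMSD) between corresponding atoms, RMSD = sqrt((1/N) Σ |a_i - b_i|²). It assumes the coordinate lists are already paired atom by atom and optimally superposed (for example with the Kabsch algorithm, which is not shown); on unsuperposed coordinates RMSD would overstate the difference.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally long, paired coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


if __name__ == "__main__":
    # Hypothetical C-alpha traces, assumed already superposed.
    ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]
    print(round(rmsd(ref, model), 3))  # sqrt(9 / 3) = 1.732
```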

95
CCG's Cheminformatics Offerings
  • MOE Molecular Database
  • Molecular descriptors calculated and used for
    classification, clustering, filtering, and
    predictive model construction
  • QSAR/QSPR Predictive Modeling
  • Diversity and Similarity Searching
  • High Throughput Conformational Search
  • 3D Pharmacophore Search
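Similarity searching is commonly scored with the Tanimoto coefficient on binary fingerprints: the number of bits set in both molecules divided by the number set in either. The Python sketch below represents each fingerprint as a set of on-bit indices and ranks a small library against a query. The fingerprints, compound names, and 0.5 threshold are invented for illustration; they are not MOE output, and MOE's own descriptors and similarity metrics may differ.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """|A ∩ B| / |A ∪ B| for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # assumed convention for empty input


def similarity_search(query: Set[int], library: Dict[str, Set[int]],
                      threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return library entries at or above the threshold, most similar first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical fingerprints; real ones would come from a fingerprinting tool.
    library = {
        "cmpd-1": {1, 2, 3, 4},
        "cmpd-2": {2, 3, 4, 9},
        "cmpd-3": {7, 8, 9},
    }
    print(similarity_search({1, 2, 3, 4}, library))
    # [('cmpd-1', 1.0), ('cmpd-2', 0.6)]
```

For diversity selection the same measure can be inverted: 1 minus the Tanimoto coefficient serves as a distance, and compounds far from those already chosen are preferred.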

96
Components of the Semantic Web for Chemistry
  • XML: eXtensible Markup Language
  • RDF: Resource Description Framework
  • RSS: Rich Site Summary
  • Dublin Core allows metadata-based newsfeeds
  • OWL for ontologies
  • BPEL4WS for workflow and web services
  • Murray-Rust et al. Org. Biomol. Chem. 2004, 2,
    3192-3203.
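To make the XML layer concrete for chemistry, the Python sketch below uses the standard library's `xml.etree.ElementTree` to write a minimal molecule in Chemical Markup Language (CML), the XML dialect for chemistry developed by Murray-Rust and Rzepa. It covers only the `molecule`, `atomArray`/`atom`, and `bondArray`/`bond` elements for the heavy atoms of ethanol. The helper names and atom IDs are illustrative, and a real CML document would usually carry more (coordinates, hydrogen counts, metadata).

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)  # serialize CML as the default namespace


def q(tag: str) -> str:
    """Qualify a tag with the CML namespace in ElementTree's {uri}tag form."""
    return f"{{{CML_NS}}}{tag}"


def molecule_to_cml(mol_id: str, atoms: List[Tuple[str, str]],
                    bonds: List[Tuple[str, str, int]]) -> str:
    """Serialize atoms (id, element) and bonds (id1, id2, order) as CML."""
    mol = ET.Element(q("molecule"), id=mol_id)
    atom_array = ET.SubElement(mol, q("atomArray"))
    for atom_id, element in atoms:
        ET.SubElement(atom_array, q("atom"), id=atom_id, elementType=element)
    bond_array = ET.SubElement(mol, q("bondArray"))
    for a1, a2, order in bonds:
        ET.SubElement(bond_array, q("bond"), atomRefs2=f"{a1} {a2}", order=str(order))
    return ET.tostring(mol, encoding="unicode")


if __name__ == "__main__":
    # Heavy atoms of ethanol: C-C-O
    print(molecule_to_cml("ethanol",
                          [("a1", "C"), ("a2", "C"), ("a3", "O")],
                          [("a1", "a2", 1), ("a2", "a3", 1)]))
```

Because the output is ordinary namespaced XML, generic tools (XPath, XSLT, RDF converters) can process it without chemistry-specific code, which is the point of building chemistry on the Semantic Web stack.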

97
Web Services Integration Projects: Biosciences
  • myGrid
  • http://www.mygrid.org.uk/
  • BIOPIPE
  • http://biopipe.org/
  • BioMOBY
  • http://biomoby.org/

98
BIOT 2006
  • Major themes, areas, and suggested topics include:
  • Biomolecular and Phylogenetic Databases
  • Molecular Evolution and Phylogenetic Analysis
  • Drug Delivery Systems
  • Bio-Ontology and Data Mining
  • Sequence Search and Alignment
  • Microarray Analysis
  • Systems Biology
  • Pathway Analysis
  • Identification and Classification of Genes
  • Protein Structure Prediction and Molecular
    Simulation
  • Functional Genomics
  • Proteomics
  • Tertiary Structure Prediction
  • Drug Docking
  • Gene Expression Analysis
  • Biomedical Imaging

99
Proteomics: What Is It?
  • Proteomics studies protein expression,
    regulation, modification, and function in order
    to understand how living systems use proteins.
    Using a variety of techniques, it can reveal how
    proteins interact within a system or how they
    change in response to applied stresses.
  • Requires advanced measurement techniques,
    especially separations and mass spectrometry

100
Proteomics Needs Informatics for:
  • Locating peaks in 2 or more dimensions
  • MS/MS spectra interpretation
  • Protein/Peptide quantification
  • Peptide detectability
  • Experimental data → biological information, e.g.
  • enzyme or pathway regulation
  • disease susceptibility
  • drug efficacy
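MS/MS interpretation usually starts from the fragment masses a candidate peptide would produce. The Python sketch below computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses: a b ion is the sum of the N-terminal residues plus a proton, and a y ion is the sum of the C-terminal residues plus water plus a proton. The function name is illustrative, and the sketch ignores modifications, higher charge states, and neutral losses; matching the predicted values against observed peaks would be the next step.

```python
from typing import List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, monoisotopic
PROTON = 1.007276   # mass of a proton


def fragment_ions(peptide: str) -> Tuple[List[float], List[float]]:
    """Return singly charged m/z values (b1..b(n-1), y1..y(n-1))."""
    masses = [RESIDUE_MASS[aa] for aa in peptide.upper()]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)            # N-terminal fragment
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)   # C-terminal fragment
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    print("b:", [round(m, 3) for m in b])
    print("y:", [round(m, 3) for m in y])
```

A useful sanity check follows from the definitions: each complementary pair b_i + y_(n-i) equals the neutral peptide mass plus two protons.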