The NIH Roadmap and PubChem - PowerPoint PPT Presentation

Slides: 41
Provided by: iuin

Transcript and Presenter's Notes

Title: The NIH Roadmap and PubChem


1
The NIH Roadmap and PubChem
  • Gary Wiggins
  • I533
  • Spring 2006

2
NIH Roadmap
  • Series of initiatives designed to pursue major
    opportunities in biomedical research and gaps in
    current knowledge that cannot be addressed by any
    single NIH Institute or Center
  • Goal: enable rapid transformation of new
    scientific knowledge into tangible benefits for
    public health
  • http://nihroadmap.nih.gov/

3
NIH Molecular Libraries and Imaging Initiative
  • Part of the New Pathways to Discovery area
  • Goal: augment the toolbox for understanding the
    functionally interconnected molecular events
    that maintain health and lead to disease
  • Build on high-throughput, highly specific,
    mechanism-based biological assays
  • Aims to develop and discover small molecules that
    hold promise as research tools to probe cellular
    physiology and pathophysiology

4
NIH Molecular Imaging Roadmap
  • High specificity/high sensitivity molecular
    imaging probes
  • Molecular imaging and contrast database
  • Imaging probe development center

5
NIH Roadmap Molecular Libraries Initiative (MLI)
  • A series of integrated research programs with the
    goal of making small molecule screening and
    screening data more widely available to the
    research community
  • http://nihroadmap.nih.gov/molecularlibraries/index.asp

6
MLI Aims
  • Go beyond the identification of compounds with
    potential therapeutic properties
  • Will result in the identification of compounds to
    use as probes to study cellular processes in
    health and disease
  • Biological screening data, assay protocols, and
    chemical structures for compounds to be publicly
    available in PubChem

7
NIH MLI Components
  • Molecular Libraries Screening Center Network
    (MLSCN)
  • Cheminformatics (centered around PubChem)
  • Technology development

8
NIH MLI Technology Development Areas
  • Chemical diversity
  • Pilot-scale libraries for investigation of novel
    chemical diversity space
  • Novel methods for natural product chemistry
  • Development of assays
  • Novel instrumentation and detection technologies
    for high throughput screening
  • Datasets and algorithms for better prediction of
    absorption, distribution, metabolism, excretion,
    and toxicity properties of small molecules

9
Assay Guidance Manual
  • Originally written as a guide for therapeutic
    project teams within Eli Lilly; covers:
  • Identifying potential assay formats compatible
    with High-Throughput Screening (HTS) and
    Structure-Activity Relationship (SAR)
  • Developing optimal assay reagents
  • Optimizing assay protocol with respect to
    sensitivity, dynamic range, signal intensity and
    stability
  • Adaptation of the assay to the microtiter plate
    formats
  • Validation of the assay performance
  • Orthogonal follow-up assays for chemical probe
    validation and refinement
  • http://www.ncgc.nih.gov/guidance/index.html

10
NIH Molecular Libraries Small Molecule Repository
  • Run under contract by Discovery Partners
    International
  • Collects samples for high throughput biological
    screening and distributes them to the NIH
    Molecular Libraries Screening Center Network
  • http://mlsmr.discoverypartners.com/MLSMR_HomePage/

11
Roadmap MLI Funded Areas
  • Molecular Libraries Screening Centers (MLSCN)
  • Ten of them at academic institutions
  • NIH Chemical Genomics Center
  • http://www.ncgc.nih.gov/
  • http://nihroadmap.nih.gov/molecularlibraries/fundedresearch.asp

12
Roadmap MLI Funded Areas
  • Submitting assays for HTS in the MLSCN
  • 28 different submissions
  • Pilot-scale libraries for HTS (8)
  • New methodologies for natural product chemistry
    (6)
  • Assay development for HT molecular screening (39)
  • Molecular libraries screening instrumentation (4)

13
Roadmap MLI Funded Areas
  • Novel preclinical tools for predictive
    ADME-Toxicology (5)
  • Innovation in molecular imaging probes (11)
  • Development of high-resolution probes for
    cellular imaging (9)

14
Roadmap MLI Funded Areas
  • Exploratory Centers for Cheminformatics Research
    at:
  • Indiana University
  • University of Michigan
  • Rensselaer Polytechnic Institute
  • MIT
  • North Carolina State University, Raleigh
  • University of North Carolina, Chapel Hill

15
IU Projects Underway
  • Innovative cross-screen analysis of NIH
    Developmental Therapeutics Project Human Tumor
    Cell Line data
  • Development of cheminformatics web services and
    use cases in Taverna
  • Development of a novel interface for the analysis
    of PubChem HTS data
  • A structure storage and searching system for
    Distributed Drug Discovery
  • Quantum chemical computer simulations database
  • Training modules for cheminformatics instruction
    on the Web
  • Web guide for essential cheminformatics resources
    (http://www.indiana.edu/cheminfo/cicc/resources.html)
  • Design of a grid-based distributed data
    architecture for chemistry

16
NIH NCI Developmental Therapeutics Program
  • The NCI has been collecting and testing compounds
    for 50 years. For about 30 years this has been
    managed by the Developmental Therapeutics Program
    (DTP). From 1955 to 1985 the primary test was to
    look for an increase in survival of mice bearing
    transplantable tumors. In 1990, the primary
    screen switched to looking for inhibition of
    growth of 60 human tumor cell lines in culture.
    DTP also ran the anti-HIV screen for about 10
    years and managed the yeast anti-cancer screen in
    which compounds were tested for their ability to
    inhibit the growth of yeast strains with defined
    mutations in cell cycle genes. These assays
    provide the bulk of the data DTP makes publicly
    available.

17
NIH NCI DTP
  • DTP's correlation analyses allow one to associate
    a list of genes with a given compound or vice
    versa
  • Want to get workflows running that integrate
    chemical structure data with the gene expression
    and sequence data in the bioinformatics world
  • Need help in the practical details of creating
    web services that will work in the myGrid/Taverna
    (or equivalent) framework

18
NIH DTP Data

19
NCI Panel of 60 Human Cancer Cell Lines
  • Protein levels
  • RNA measurements
  • Mutation status
  • Enzyme activity levels

20
NIH DTP's COMPARE Program
  • The pattern of activity across all 60 cell lines
    that a compound exhibits is related to the
    mechanism of action
  • Can be used to discover the mechanism of a
    compound's actions by looking at which compounds
    of known activity are correlated with the unknown
  • Has been used to discover novel compounds with a
    given activity by testing the top correlating
    compounds to a compound with the activity of
    interest
  • Used to prioritize compounds that seem to have a
    novel mechanism
  • Calculates a correlation coefficient between two
    vectors in 60-dimensional space
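
The correlation in the last bullet can be sketched as follows. This is a minimal illustration, assuming Pearson's correlation coefficient (the slide does not say which coefficient COMPARE uses) and made-up activity values; the function name `pearson` is hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)

# Made-up activity profiles: one value per cell line, 60 in total.
rng = random.Random(0)
seed_profile = [rng.uniform(4.0, 8.0) for _ in range(60)]
similar = [v + rng.gauss(0.0, 0.2) for v in seed_profile]   # same pattern plus noise
unrelated = [rng.uniform(4.0, 8.0) for _ in range(60)]      # independent pattern

r_similar = pearson(seed_profile, similar)
r_unrelated = pearson(seed_profile, unrelated)
```

A compound whose profile tracks the seed compound's profile scores close to 1, which is the sense in which a shared pattern of activity suggests a shared mechanism.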

21
NIH DTP
  • Given a compound tested in the 60 cell assay, one
    can look for the genes whose expression most
    highly correlates with the ability of the
    compound to inhibit cell growth. Conversely,
    given a gene, one can look for compounds whose
    ability to inhibit cell growth is most highly
    correlated with the expression of that gene.
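
The lookup described above amounts to ranking one set of profiles by their correlation with a query profile. A minimal sketch, assuming Pearson correlation and made-up numbers over five cell lines instead of 60; the gene names and the helper `rank_by_correlation` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rank_by_correlation(query, candidates):
    """Return (name, r) pairs sorted from most to least positively correlated."""
    scored = [(name, pearson(query, profile)) for name, profile in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up data: growth inhibition by one compound across five cell lines.
growth_inhibition = [5.1, 6.3, 4.8, 7.0, 5.9]

# Made-up expression levels of three genes across the same cell lines.
gene_expression = {
    "GENE_A": [1.0, 2.1, 0.8, 2.9, 1.8],   # tracks the compound's activity
    "GENE_B": [3.0, 1.9, 3.2, 1.1, 2.2],   # runs opposite to it
    "GENE_C": [2.0, 1.5, 1.6, 2.0, 2.6],   # only weakly related
}

ranked_genes = rank_by_correlation(growth_inhibition, gene_expression)
```

The converse lookup uses the same function with the roles swapped: pass a gene's expression profile as `query` and a dictionary of compound activity profiles as `candidates`.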

22
NIH DTP Needs
  • Grid Web services
  • Visualization: may use VOTables
  • Tools to squish a set of points in a large
    dimensional space down into 2D or 3D while
    attempting to preserve the relative distances
  • Looking at the nearest neighbors of the point of
    interest with such a map could reveal relations
    that would be missed in just a table listed by
    distance
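
One way to build such a map is metric multidimensional scaling. The slide does not name a technique, so the following is only a sketch under that assumption: it places points in 2D by gradient descent on the squared difference between map distances and original distances, using made-up data. All function names are hypothetical.

```python
import math
import random

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def stress(coords, target):
    """Sum of squared gaps between map distances and original distances."""
    n = len(coords)
    return sum(
        (euclidean(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n) for j in range(i + 1, n)
    )

def project_2d(points, sweeps=500, rate=0.05, seed=0):
    """Lay points out in 2D by gradient descent on the stress function."""
    rng = random.Random(seed)
    n = len(points)
    target = [[euclidean(p, q) for q in points] for p in points]
    coords = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    initial = stress(coords, target)
    for _ in range(sweeps):
        for i in range(n):
            grad = [0.0, 0.0]
            for j in range(n):
                if i == j:
                    continue
                d = euclidean(coords[i], coords[j]) or 1e-9  # avoid dividing by zero
                pull = 2.0 * (d - target[i][j]) / d
                grad[0] += pull * (coords[i][0] - coords[j][0])
                grad[1] += pull * (coords[i][1] - coords[j][1])
            coords[i][0] -= rate * grad[0] / n
            coords[i][1] -= rate * grad[1] / n
    return coords, initial, stress(coords, target)

def nearest_neighbors(coords, index, k=3):
    """Indices of the k points closest to coords[index] on the map."""
    others = [i for i in range(len(coords)) if i != index]
    return sorted(others, key=lambda i: euclidean(coords[index], coords[i]))[:k]

# Made-up data: two groups of four points in a 6-dimensional space.
rng = random.Random(1)
group_a = [[rng.gauss(0.0, 0.3) for _ in range(6)] for _ in range(4)]
group_b = [[rng.gauss(3.0, 0.3) for _ in range(6)] for _ in range(4)]
layout, stress_before, stress_after = project_2d(group_a + group_b)
```

Comparing `stress_before` with `stress_after` shows how much closer the 2D distances come to the original ones; `nearest_neighbors(layout, 0)` then lists the neighbors of the first point on the map.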

23
NIH DTP Main Search Page
  • http://dtp.nci.nih.gov/docs/dtp_search.html

24
High-Throughput Screening (HTS)
  • the integration of biological, chemical and
    clinical data
  • automated standardized statistical analysis of
    large and complex data volumes
  • biological and chemical profiling by use of
    statistical analyses on combined data from
    screening, pharmacological profiling, and
    structural properties

25
Other Potential Partners
  • Center for Chemical Genomics at the University of
    Michigan
  • http://www.lifesciences.umich.edu/institute/labs/ccg/index.html
  • Milos Novotny (IUB Chemistry): $3.5 million
    National Center for Research Resources (NIH)
    grant to conduct research in the analysis of
    glycoproteins
  • David Flockhart (IUB School of Medicine):
    Cytochrome P450 database
    http://medicine.iupui.edu/flockhart/

26
PubChem
  • 5,298,729 compounds as of 1/16/2006
  • the place to go for biological and related data
  • the central depository of all information related
    to the NIH Roadmap project
  • expected that the actual data will reside there,
    and only some things may be held elsewhere, with
    PubChem acting as a pointer
  • May even have the images from screens and assays
  • chemical structures from Elsevier's xPharm
    database

27
PubChem Data (as of 10/25/2005)
  • Bioassays deposited: 177
  • Bioassay test results: 3,158,669
  • Substances deposited: 7,848,390
  • Unique substances: 5,269,228

28
PubChem Technical Details
  • Entrez database system: for all textual
    information in the database
  • NCBI Toolkit: an open-source infrastructure
    toolkit
  • OpenEye OEChem toolkit and associated software:
    for most structure standardization tasks, plus
    some structure identifier computations like
    SMILES and IUPAC name generation
  • NIST InChI library: for computing the InChI
    identifier
  • CACTVS Chemoinformatics Toolkit: for structure
    depictions, structure database system, structure
    query execution, structure deduplication, some
    property calculations, and the WWW structure and
    image editors
  • Various general low-level support libraries,
    e.g., zlib, png, gd, and freetype
  • In-house code: for the queuing system, deposition
    system, display CGIs, structure standardization
    set-up, update scripts, etc.

29
PubChem Database Display and Query Subsystems - 1
  • A special Entrez version
  • stores textual and numerical data
  • hosted on a MS SQL Server relational database
    cluster
  • holds precomputed structure images for display,
    ASN.1 structure data blobs for download, and
    extensive crosslinking functions for linking to
    other NCBI databases

30
PubChem Display and Query Subsystems - 2
  • structure search component
  • based on the CACTVS structure search system
  • pseudo-relational in nature (the underlying
    storage manager is the Sleepycat BDB database
    manager)
  • hosted on a Linux server cluster
  • structure search file is not stored in the SQL
    database, but there is an automatic
    synchronization and update mechanism
  • Some data, such as Lipinski filter criteria, are
    stored in both databases
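
The slide does not list the filter criteria themselves. As a rough sketch, assuming the commonly cited Lipinski rule-of-five thresholds and property values supplied by the caller, a check could look like this. The class, function names, and example values are hypothetical and are not PubChem's stored fields.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Precomputed properties for one compound (values supplied by the caller)."""
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int

def lipinski_violations(d):
    """Return the rule-of-five criteria that the compound violates."""
    checks = [
        ("molecular weight > 500", d.mol_weight > 500),
        ("logP > 5", d.log_p > 5),
        ("H-bond donors > 5", d.h_bond_donors > 5),
        ("H-bond acceptors > 10", d.h_bond_acceptors > 10),
    ]
    return [label for label, violated in checks if violated]

def passes_lipinski(d, max_violations=1):
    """Common convention: tolerate at most one violation."""
    return len(lipinski_violations(d)) <= max_violations

# Made-up descriptor values for two illustrative compounds.
small_compound = Descriptors(mol_weight=310.0, log_p=2.4,
                             h_bond_donors=1, h_bond_acceptors=4)
large_compound = Descriptors(mol_weight=820.0, log_p=6.1,
                             h_bond_donors=3, h_bond_acceptors=9)
```

Storing the four values alongside each structure, as the slide describes, lets either database apply a filter like this without recomputing the properties.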

31
PubChem Programming Utilities
  • Entrez Programming Utilities
  • http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
  • CACTVS chemoinformatics toolkit
  • a full ASN.1 parser for CACTVS that understands
    the full data spec for structures and assay data
  • modules for talking to the Entrez database for
    accessing structure blobs and some other NCBI
    systems
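
As a small illustration of the Entrez Programming Utilities, the sketch below only builds request URLs and makes no network call. It assumes the present-day E-utilities base URL (which differs from the 2006 help page cited on the slide) and the Entrez database name `pccompound` for PubChem compounds; the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the current E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(term, db="pccompound", retmax=20):
    """Build an ESearch URL that would return record IDs matching a text query."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"

def esummary_url(ids, db="pccompound"):
    """Build an ESummary URL for a list of record IDs."""
    params = {"db": db, "id": ",".join(str(i) for i in ids)}
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"

search = esearch_url("aspirin")
summary = esummary_url([2244, 5090])
```

Fetching `search` with any HTTP client would return an XML list of matching IDs, which can then be passed to `esummary_url`.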

32
PubChem Data Deposition
  • PubChem Deposition Gateway
  • http://pubchem.ncbi.nlm.nih.gov/deposit/deposit.cgi

33
PubChem Sketcher
  • No need to worry about the type of structure
    definition displayed in the top line
  • uses a hidden internal representation to transfer
    the information
  • http://pubchem.ncbi.nlm.nih.gov/search/

34
InChI, The IUPAC International Chemical Identifier
  • Official site: http://www.iupac.org/inchi/
  • Unofficial InChI FAQ:
    http://wwmm.ch.cam.ac.uk/inchifaq/
  • WSDL InChI server at
    http://wwmm.ch.cam.ac.uk/gridsphere/gridsphere

35
Searching InChIs
  • Sample search:
    "InChI=1/C17H14O4S/c1-22(19,20)14-9-7-12(8-10-14)15-11-21-17(18)16(15)13-5-3-2-4-6-13/h2-10H,11H2,1H3"
  • Must include the quotation marks
  • no carriage return or line feed in the string
  • InChI code for C60 fullerene:
    InChI=1/C60/c1-2-5-6-3(1)8-12-10-4(1)9-11-7(2)17-21-13(5)23-24-14(6)22-18(8)28-20(12)30-26-16(10)15(9)25-29-19(11)27(17)37-41-31(21)33(23)43-44-34(24)32(22)42-38(28)48-40(30)46-36(26)35(25)45-39(29)47(37)55-49(41)51(43)57-52(44)50(42)56(48)59-54(46)53(45)58(55)60(57)59
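
The two search rules above (quotation marks, no line breaks) are easy to get wrong when an InChI is copied from wrapped text. A minimal sketch of a clean-up helper follows; the function names are hypothetical, and the only format detail assumed is that an InChI starts with `InChI=` and separates its layers with `/`.

```python
def prepare_inchi_query(raw):
    """Remove line breaks and stray whitespace, then wrap the InChI in quotation marks."""
    joined = "".join(raw.split())    # drops CR, LF, tabs, and spaces
    joined = joined.strip('"')       # tolerate input that is already quoted
    if not joined.startswith("InChI="):
        raise ValueError("not an InChI string: expected the 'InChI=' prefix")
    return f'"{joined}"'

def formula_layer(inchi):
    """Return the chemical-formula layer (the second '/'-separated field)."""
    parts = inchi.strip('"').split("/")
    if len(parts) < 2:
        raise ValueError("InChI has no formula layer")
    return parts[1]

# The sample search string, as it might arrive when copied from wrapped text.
wrapped = (
    "InChI=1/C17H14O4S/c1-22(19,20)14-9-7-12(8-10-14)1\n"
    "5-11-21-17(18)16(15)13-5-3-2-4-6-13/h2-10H,11H2,1H\r\n"
    "3"
)
query = prepare_inchi_query(wrapped)
```

Here `query` is the single-line, quoted form that can be pasted into the search box, and `formula_layer(query)` gives a quick check that the string was rejoined correctly.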

36
ACD Labs and InChIs
  • Transferring structures from PubChem to
    ACD/ChemSketch
  • http://www.acdlabs.com/download/technotes/90/draw_db/pubchem.pdf

37
InChI Support in BKChem
  • BKChem - a free chemical drawing program
  • Successfully reads most InChIs
  • http://bkchem.zirael.org/inchi_en.html

38
InChI
  • PubChem sketcher also supports generation of
    InChI strings
  • http://pubchem.ncbi.nlm.nih.gov/edit/
  • change the format selector to "InChI"

39
Protein Data Bank (PDB) Data Dictionaries
  • develop software and data definitions to support
    the structural genomics efforts
  • enable high-throughput data deposition
  • data dictionaries define items at the level of
    detail of the materials and methods section of a
    journal
  • uses macromolecular Crystallographic Information
    File (mmCIF) data dictionaries
  • http://mmcif.pdb.org/index.html

40
Translate WSDL to Human Readable Form
  • http://soapclient.com/soaptest.html
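
The same idea can be tried locally. The sketch below is not the tool linked above; it is a minimal illustration that assumes a WSDL 1.1 document and lists each port type and operation in plain text. The embedded WSDL fragment and all names in it are made up.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 elements.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# A tiny hand-written WSDL 1.1 fragment, for illustration only.
SAMPLE_WSDL = """
<definitions name="DemoService"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="urn:example:demo"
             targetNamespace="urn:example:demo">
  <message name="ConvertRequest"/>
  <message name="ConvertResponse"/>
  <portType name="DemoPort">
    <operation name="convertStructure">
      <input message="tns:ConvertRequest"/>
      <output message="tns:ConvertResponse"/>
    </operation>
    <operation name="getVersion">
      <output message="tns:ConvertResponse"/>
    </operation>
  </portType>
</definitions>
"""

def describe_wsdl(text):
    """Return plain-text lines naming each port type and its operations."""
    root = ET.fromstring(text.strip())
    lines = [f"Service description: {root.get('name', '(unnamed)')}"]
    for port in root.findall("wsdl:portType", WSDL_NS):
        lines.append(f"  Port type: {port.get('name')}")
        for op in port.findall("wsdl:operation", WSDL_NS):
            inp = op.find("wsdl:input", WSDL_NS)
            out = op.find("wsdl:output", WSDL_NS)
            takes = inp.get("message") if inp is not None else "nothing"
            gives = out.get("message") if out is not None else "nothing"
            lines.append(f"    {op.get('name')}: takes {takes}, returns {gives}")
    return lines

summary_lines = describe_wsdl(SAMPLE_WSDL)
```

Printing `summary_lines` one per line gives a short outline of what the service offers, which is usually enough to decide which operation to call.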