Title: Cheminformatics
1- Cheminformatics Pharmainformatics
2In this presentation
- Part 1 Molecular Conventions
- Part 2 Resources
- Part 3 Drug Design
- Part 4 Drug Development
3Part1
4Cheminformatics
- Cheminformatics, the combination of chemistry and
information technology, is required for the
processing and analysis of chemical data
- Cheminformatics is relevant to biologists because
chemistry data are important in many areas of
molecular biology, e.g., in the study of protein
interactions and metabolism
5Molecular formulae
- Molecules can be represented by simple formulae,
which give the number and type of atoms
- However, this does not show how they are
connected
- Structural formulae provide some information
about the arrangement of atoms in a molecule and
thus allow isomers to be distinguished
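To make the limitation of molecular formulae concrete, here is a minimal Python sketch that counts atoms in a simple formula. The function name `parse_formula` is an illustrative assumption, and the parser deliberately ignores parentheses and charges. It shows that ethanol and dimethyl ether, two isomers, are indistinguishable by formula alone.

```python
import re
from collections import Counter

# One element symbol (e.g. C, Cl) followed by an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C2H5OH'.

    Sketch only: parentheses, charges and hydrates are not supported.
    """
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for symbol, number in TOKEN.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts

ethanol = parse_formula("C2H5OH")
dimethyl_ether = parse_formula("CH3OCH3")

# Both molecules contain 2 C, 6 H and 1 O, so the atom counts are equal
# even though the atoms are connected differently.
print(dict(ethanol))
print(ethanol == dimethyl_ether)
```

The equal atom counts are exactly why a structural formula is needed to tell the two isomers apart.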
6Structural representations of ethane showing the
tetrahedral distribution of coordinated groups
about saturated carbon atoms. Panels (a) and (b)
show two extreme conformations. The energetically
favourable conformation (a), which predominates
in nature, has the H atoms on opposite sides of
the C-C bond, as far as possible from each other
(the staggered conformation). The less favourable
conformation (b) has the H atoms in the eclipsed
conformation. Panels (c) and (d) show the
conformations viewed from the end of the molecule
[Figure: panels (a)-(d) not reproduced]
7Structural formulae and full and simplified
structural diagrams for some common organic
compounds
Name                 Formula
Methane              CH4
Ethane               C2H6
Ethene (ethylene)    C2H4
[Full and simplified structure diagrams not reproduced]
8Structural formulae and full and simplified
structural diagrams for some common organic
compounds
Name                     Formula
Cyclohexane              C6H12
Ethanol                  C2H5OH
Ethanal (acetaldehyde)   CH3CHO
[Full and simplified structure diagrams not reproduced]
9Structural diagrams
- Molecules can be represented using simple graphs,
which show atoms as nodes and bonds as links
- For organic molecules, further simplification is
achieved by assuming that carbon atoms make up
the molecular backbone and that the valency of
four is satisfied by hydrogen atoms unless
otherwise shown
- Such diagrams present all molecules as planar
shapes and do not indicate the spatial
distribution of atoms in 3D
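The graph view can be sketched in a few lines of Python. In this illustrative example, atoms are nodes in a list, bonds are edges carrying a bond order, and any unused valency is assumed to be filled by hydrogen. The `VALENCE` table and the function name `implicit_hydrogens` are assumptions made for the sketch, covering only the usual valencies of C, N and O.

```python
# Typical valencies for a few elements (an assumption of this sketch).
VALENCE = {"C": 4, "N": 3, "O": 2}

def implicit_hydrogens(atoms, bonds):
    """Return the number of implied hydrogens on each atom.

    atoms: list of element symbols (the graph nodes)
    bonds: list of (i, j, order) tuples (the graph edges)
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [VALENCE[symbol] - u for symbol, u in zip(atoms, used)]

# Ethanol drawn as the skeleton C-C-O
print(implicit_hydrogens(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]))  # [3, 2, 1]

# Ethene drawn as C=C (a single edge of order 2)
print(implicit_hydrogens(["C", "C"], [(0, 1, 2)]))  # [2, 2]
```

Note that the graph stores connectivity only; as the slide says, it carries no 3D information.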
10Chirality
- If four different groups are coordinated around a
central carbon atom, the molecule is described as
chiral
- Chiral molecules exist in two configurations,
called enantiomers, which are mirror images of
each other
- Although enantiomers have the same chemical
properties, many enzymes and other proteins show
chiral sensitivity, which is important in drug
development and related fields
11Multi-chiral configuration
- Molecules may contain any number of chiral
centers, and a series of forms, called
diastereoisomers, may exist
- These may have different chemical properties
because of the way different groups interact
within the molecule
12DL and RS conventions
- The absolute configuration of groups around a
chiral carbon atom can be described using a
number of conventions
- In the DL system, molecules are named D or L
according to whether the coordinated groups are
arranged in a similar fashion to those in
D-glyceraldehyde or L-glyceraldehyde
- In the RS system, molecules are named R (rectus)
or S (sinister) according to the priority, ranked
by atomic number, of the chemical groups
surrounding the carbon atom
13Representation of a tetrahedrally coordinated
saturated carbon atom in an organic molecule:
(a) the carbon atom is at the centre of a tetrahedron
with four coordinated groups; (b) simplified
representation with the central carbon
removed; (c) representation of the tetrahedron as
a flat image
[Figure: panels (a)-(c) not reproduced]
14Chirality representation
(a) The structural formula of glyceraldehyde,
CH2OHCHOHCHO, gives no indication of its chirality
(b) If the molecule is represented as a
tetrahedron, the D and L enantiomers can be
distinguished
(c) These can be shown as 2D graphs using the
Fischer convention
[Figure: D and L forms for panels (b) and (c) not reproduced]
15Part2
16SMILES
- SMILES is a system for representing chemical
structures as strings, based on a valence model in
which all valencies are considered to be
satisfied by hydrogen atoms unless otherwise
shown
- The system has conventions for representing
different bond types, cyclic molecules, branches,
cis/trans isomers and chirality
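As a rough illustration of the valence model behind SMILES, the following Python sketch reads a small subset of the notation and fills unused valencies with hydrogen to produce a molecular formula. It is not a full SMILES implementation: the function names, the `VALENCE` table, and the restriction to uppercase organic atoms, the bond symbols `-`, `=`, `#`, parenthesised branches and single-digit ring closures are all assumptions of the sketch. Aromatic atoms, bracket atoms, charges and stereochemistry are not handled.

```python
from collections import Counter

# Usual valencies for the atoms this sketch accepts (an assumption).
VALENCE = {"B": 3, "C": 4, "N": 3, "O": 2, "P": 3, "S": 2,
           "F": 1, "Cl": 1, "Br": 1, "I": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}

def parse_smiles(smiles):
    """Return (atoms, bonds) for a restricted SMILES string."""
    atoms, bonds = [], []
    branch_stack = []   # atoms to return to when a branch closes
    ring_open = {}      # ring digit -> (atom index, bond order)
    prev, order, i = None, 1, 0
    while i < len(smiles):
        ch = smiles[i]
        if smiles[i:i + 2] in ("Cl", "Br"):
            symbol, i = smiles[i:i + 2], i + 2
        elif ch in VALENCE:
            symbol, i = ch, i + 1
        else:
            symbol, i = None, i + 1
        if symbol is not None:
            atoms.append(symbol)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, order))
            prev, order = len(atoms) - 1, 1
        elif ch in BOND_ORDER:
            order = BOND_ORDER[ch]
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            if ch in ring_open:
                start, open_order = ring_open.pop(ch)
                bonds.append((start, prev, max(open_order, order)))
            else:
                ring_open[ch] = (prev, order)
            order = 1
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return atoms, bonds

def molecular_formula(smiles):
    """Fill free valencies with H and return a formula string."""
    atoms, bonds = parse_smiles(smiles)
    used = [0] * len(atoms)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    counts = Counter(atoms)
    counts["H"] += sum(VALENCE[s] - u for s, u in zip(atoms, used))
    others = sorted(k for k in counts if k not in ("C", "H") and counts[k])
    keys = [k for k in ("C", "H") if counts[k]] + others
    return "".join(k + (str(counts[k]) if counts[k] > 1 else "") for k in keys)

for name, smiles in [("ethanol", "CCO"), ("ethene", "C=C"),
                     ("ethanal", "CC=O"), ("cyclohexane", "C1CCCCC1")]:
    print(name, smiles, molecular_formula(smiles))
```

For real work, an established cheminformatics toolkit would be the sensible choice; the point here is only that the string encodes connectivity and that hydrogens follow from the valence model.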
17RasMol and Chime
- There are several specialized data formats for
chemical structures based on the principle of a
molecular formula and associated table of
connections
- Viewing utilities such as RasMol and Chime can
interpret these file formats and display
interactive molecular structures in a variety of
user-defined schemes and colors
18Chemical structure and databases
- Structural information about different molecules
can be obtained from a number of comprehensive
WWW resources, including Chemical Abstracts
On-Line, Chemfinder and MedChem
- Each of these resources provides a chemical
database that can be searched using a variety of
query formats, e.g., systematic name,
non-systematic name, formula, molecular weight or
CAS registry number
- Search results provide physical, chemical and
biomedical information with links to other
databases and resources
- MedChem also provides the SMILES string
19QSAR
- A quantitative structure-activity relationship
(QSAR) is a statistical model used to determine
how the structural features of a molecule are
related to biological activity
- The QSAR approach is particularly useful for
categorizing the activities of related molecules
with multiple functional groups
- Each molecule is broken down into a series of
descriptors (molecular properties) and the QSAR
determines which descriptors are most likely to
promote biological activity
- This gives rise to a set of rules that can be
used to evaluate the potential activity of new
molecules
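A very small QSAR-style calculation might look like the Python sketch below. It ranks two descriptors by how strongly they correlate with activity, then fits a straight line to the best one so that a new molecule's activity can be estimated. Everything here is assumed for illustration: the descriptor names `logP` and `h_bond_donors`, the four toy molecules, and their invented activity values. Real QSAR models use many descriptors, far more data, and proper validation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Invented toy data: one value per molecule for each descriptor.
descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0],
    "h_bond_donors": [2, 1, 2, 1],
}
activity = [2.1, 3.9, 6.1, 7.9]  # invented activity scores

# Rank descriptors by the strength of their correlation with activity.
best = max(descriptors, key=lambda name: abs(pearson(descriptors[name], activity)))
slope, intercept = fit_line(descriptors[best], activity)

# The fitted "rule" can then score a hypothetical new molecule.
predicted = slope * 2.5 + intercept
print(best, round(slope, 2), round(intercept, 2), round(predicted, 2))
```

With this toy data the lipophilicity-like descriptor dominates, so the fitted line on that descriptor serves as the rule for scoring new molecules.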
20Part3
21Pharmainformatics
- Pharmainformatics is the combination of biology,
chemistry, mathematics and information technology
that is essential for efficient data management,
processing and analysis in the pharmaceutical
industry
22Drugs
- Drugs interact with targets, usually proteins, in
the body and through these interactions cause
physiological responses
- The pharmaceutical industry aims to discover
drugs with specific beneficial effects to treat
human diseases
23Gene drug life
- Knowing a gene's chemical structure and
composition is one thing, but understanding its
actual function is another
- Though sequencing and analysis would help in
answering questions on aging, diseases,
disorders and much more, a new discipline of
designer drugs is around the corner, waiting to
be tapped
- Even a single nucleotide polymorphism (SNP,
pronounced "snip"), for instance a T at a position
in a gene sequence where your neighbour has a C,
can spell trouble
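Spotting a SNP between two aligned sequences is a simple position-by-position comparison, as this Python sketch shows. The function name `snp_positions` and both short sequences are made up for illustration, and the sketch assumes the sequences are already aligned and of equal length.

```python
def snp_positions(seq_a, seq_b):
    """List (index, base_a, base_b) wherever two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

# Invented example: the two people differ at a single position (C versus T).
person_1 = "ATGCCTGA"
person_2 = "ATGCTTGA"
print(snp_positions(person_1, person_2))  # [(4, 'C', 'T')]
```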
24Gene drug life
- Many drugs work in only about 30 percent of the
human population
- In extreme cases, a drug that saves one person
may poison another. For instance, the type 2
diabetes drug Rezulin has been linked to more than
60 deaths from liver toxicity worldwide
- This is where in silico drug design would help,
not only by reducing the design, modeling and
testing time but also by reducing the expenditure
on manpower, resources and the various phases of
drug design and development
25Areas of drug design
- For drug design, the process must be viewed from
three different dimensions, viz., drug design for
- Diseases such as HIV, cancer, etc. that have long
afflicted people
- Lifestyle drugs
- Drugs for repairing genetic disorders
- There is an urgent need to develop drugs for
diseases such as hepatitis C, leprosy and malaria,
since these diseases are widespread and affect
the population at large
- Other infectious diseases such as tuberculosis,
HIV, etc. are also highly troublesome
26In silico drug design
- Earlier, the drug design process used to take
many decades and was carried out haphazardly,
without any direction, whereas presently there is
a systems approach. Added to this is a tremendous
reduction in research and production costs
- Already the surge in bioinformatics solutions has
redefined the way drug trials are done, making a
shift from in vitro to in silico
- In silico drug design could shorten the time
needed for drug design, and shortening it will
remain the biggest challenge for years to come
27Drugs are insoluble in water
- Proteins in cells are surrounded by water (about
two-thirds of the human body consists of water)
and hence do not behave like rigid bodies;
consequently, the behavioural pattern differs
from protein to protein
- Drugs normally do not dissolve in water.
Designing drugs in silico (on chips, without
water) should take this point into account
28Important areas for drug design
- The four most important areas of consideration
for successful drug design are the
- binding sites
- molecular shape
- molecular size
- inhibitory properties of the proteins
29Important areas for drug design
- The study of the crystallization of membrane
protein structures also plays a vital role in drug
design. This area of research would be highly
challenging and would prove to be an excellent
foundation for further research
- Since the genome of the dengue virus is only
about 11 kb long, it would be highly useful for
carrying out a lot of work quickly and conveniently
30Medical applications
- Bioinformatics and drug design can be highly
useful for the diagnosis and treatment of various
neurological disorders. It has been found that
many neurological disorders are due to unusual
gene structures such as the triple-A formation
AAA (the A of the ATGC nucleotides) in the genes.
The problem becomes more complex with multiple
repeats or occurrences of the triple A. More than
eight such repeats are known, and in such cases
children are permanently bedridden or have to use
wheelchairs
31Part4
32Bioinformatics in drug development
- Genomics, proteomics, combinatorial chemistry and
high-throughput screening (HTS) have all
contributed to a massive increase in the amount
of data generated by the pharmaceutical industry
- The role of bioinformatics is to store, track and
provide tools for the analysis of these data, in
something like an automated environment
33Bioinformatics in drug development
- Specific applications include the modeling of
protein interactions with small molecules
allowing rational drug design, the association of
genotype and drug response patterns
(pharmacogenomics), the design and assessment of
chemical diversity in combinatorial libraries,
and the processing and storage of data from
high-throughput screens of lead compounds
34Areas of biology
Genomics/proteomics (human genome project)
  Application: Characterization of human genes and proteins
  Role of bioinformatics: Target identification/validation in the human genome; cataloging SNPs and association with drug response patterns (pharmacogenomics)
Genomics/proteomics (human pathogen genome project)
  Application: Characterization of genes and proteins of organisms that are pathogenic to humans
  Role of bioinformatics: Target identification/validation in pathogens
Functional genomics (protein structures)
  Application: Analysis of protein structures (humans and their pathogens)
  Role of bioinformatics: Prediction of drug/target interactions; rational drug design
35Areas of biology
Functional genomics (expression profiling)
  Application: Determining gene expression patterns in disease and health
  Role of bioinformatics: Gene classification based on drug responses; pathway reconstruction
Functional genomics (genome-wide mutagenesis)
  Application: Determining the mutant phenotypes for all genes in the genome
  Role of bioinformatics: Databases of animal models; target identification/validation
Functional genomics (protein interactions)
  Application: Determining interactions among all proteins
  Role of bioinformatics: Characterization of protein interactions; reconstruction of pathways; prediction of binding sites
36Areas of chemistry
HTS
  Application: Highly parallel assay formats for lead identification
  Role of bioinformatics: Storing, tracking and analyzing data
Combinatorial chemistry
  Application: Synthesis of large numbers of chemical compounds
  Role of bioinformatics: Cataloging chemical libraries; assessing library quality/diversity; predicting drug/target interactions
37Principles of drug development
- Drug development begins with the identification
of a suitable target, which must contribute
significantly to a human disease
- Ideally, altering the activity of this target
should have a beneficial effect, thus showing its
potential for therapeutic intervention
- The next stage of the process is lead discovery,
where compounds showing some of the desired
activity of an ideal drug are sought
38Principles of drug development
- Optimization of lead compounds results in drug
candidates that may be registered and submitted
for clinical trials, which establish their safety
and metabolic behaviour in human subjects
39Genetic link to drugs
- An early example of the utility of bioinformatics
in drug design is cathepsin K, an enzyme that
might turn out to be an important target for
treating osteoporosis, a crippling disease caused
by the breakdown of bone
- While analyzing osteoclasts (cells that break
down bone in the normal course of bone
replenishment) taken from people with bone
tumors, it was found that a particular gene
sequence was overexpressed in these cells and
could be overactive in individuals with
osteoporosis
- The sequence matched a previously identified
class of molecules called cathepsins. Efforts are
under way to find a potential drug to block the
cathepsin K target
40Genetic link to drugs
- Scientists believe that 99.9 percent of your
genes perfectly match those of the person sitting
beside you. But the remaining 0.1 percent of the
genes vary, and it is these variations in which
the drug companies are interested
- Several years after the debut of tests for BRCA1
and BRCA2, scientists are still trying to
determine exactly to what degree those genes
contribute to a woman's cancer risk
41Chemical diversity
- Diverse chemical libraries are required for
efficient lead discovery if little is known about
the binding properties of the drug target
- Conversely, focused libraries are required if the
structure of the target is known, since this
defines a particular set of ligands
- Chemical diversity can be defined by comparing
molecules on the basis of descriptors (functional
groups) and how these fill chemical space
- A number of software tools are available for the
design and assessment of diverse or focused
chemical libraries and for virtual screening
against drug targets
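One common way to compare molecules by their descriptors is the Tanimoto (Jaccard) coefficient on sets of features. The Python sketch below uses it to score how diverse a small library is. The two toy libraries, the functional-group labels and the function names are assumptions for illustration only; the slides do not name a specific similarity measure.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto similarity of two descriptor sets (1.0 means identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mean_pairwise_distance(library):
    """Average (1 - Tanimoto) over all pairs; higher means more diverse."""
    pairs = list(combinations(library, 2))
    return sum(1 - tanimoto(a, b) for a, b in pairs) / len(pairs)

# Invented libraries: each molecule is described by its functional groups.
focused = [{"hydroxyl", "amine"}, {"hydroxyl", "amine", "methyl"}]
diverse = [{"hydroxyl"}, {"nitro"}]

print(mean_pairwise_distance(focused))
print(mean_pairwise_distance(diverse))
```

A focused library scores low because its members share most descriptors, while a diverse library scores close to 1.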
42Computational screening
- Software applications like DOCK and AutoDock
match potential ligands to binding sites by
calculating steric constraints and bond energies
- These can be used to search chemical databases
and find potential drug leads
- Some applications consider the ligand and binding
site as inflexible structures, rather like pieces
of a jigsaw, while others can incorporate
flexibility into the molecules by calculating
allowable and compatible bond torsions
43Functional genomics
- The large-scale functional annotation of genes is
known as functional genomics and incorporates
areas such as homology searching, structural
analysis, expression analysis, large-scale
mutagenesis and the analysis of protein
interactions
- All of these areas are important in drug
development
44Genome-scale mutagenesis
- Genome-scale mutagenesis is a rich source of
animal disease models for target identification
and validation, and large mutant collections in
simple organisms can be used for the rapid
high-throughput screening of potential lead
compounds
45Approaches in functional genomics
Approach: Functional annotation method
Homology searching: Comparison to related sequences with known function
Protein structure determination (structural genomics): Comparison to molecules with related structure and known function
Comparative genomics: Functional annotation by domain conservation, conserved phylogeny or conserved genomic organization
Expression analysis: Similar expression profiles indicate conserved function
Mutagenesis: Function based on mutant phenotype, e.g., knockout mice
Protein interaction screening: Function based on presence in multi-subunit complex or on interaction with proteins of known function
Small molecule informatics: Interaction with small molecules
46Pharmacogenomics
- Pharmacogenomics is the study of how variation in
the human population correlates with drug
response patterns
- The analysis of genomic data and its comparison
with drug response data allows patients to be
clustered into drug response groups, so that
appropriate drugs and dose regimens can be
administered
- Variation is catalogued by analyzing data on
mutation (particularly SNPs) and gene expression
profiles
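The idea of clustering patients into drug response groups can be illustrated, in its simplest form, by grouping on the genotype at a single SNP and averaging the observed response. The Python sketch below does this; the genotype labels, the response scores and the function name `response_by_genotype` are all invented for illustration, and real studies would involve many markers and statistical testing.

```python
from collections import defaultdict
from statistics import mean

def response_by_genotype(patients):
    """Average drug response for each genotype group.

    patients: iterable of (genotype, response) pairs
    """
    groups = defaultdict(list)
    for genotype, response in patients:
        groups[genotype].append(response)
    return {genotype: mean(values) for genotype, values in groups.items()}

# Invented data: genotype at one SNP and a response score between 0 and 1.
patients = [("CC", 0.9), ("CC", 0.7), ("CT", 0.5), ("TT", 0.1), ("TT", 0.3)]

for genotype, avg in sorted(response_by_genotype(patients).items()):
    print(genotype, round(avg, 2))
```

In this toy data the CC group responds best and the TT group worst, which is the kind of pattern that would guide the choice of drug or dose.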
47In lab vs. out of lab effort
- Companies and individuals plug into the
effort of drug design at various points:
collecting and storing data, searching databases,
and interpreting the data
- The race and competition is all about who can
mine the massive information best
- Just modeling or computing the drug design or
protein structure would not be sufficient; a
lot of information on test results and clinical
trials from outside is also very important
- Most of the time should be spent on this aspect
to ensure success in drug design and
development
48Issues of drug design
- Even though the human genome has been sequenced,
there are a number of problems awaiting
solutions: technical, legal and social
- It is not at all clear how much one must
know about a gene in order to patent it
- There is also a need to review all failed
drugs, i.e., drugs that failed during clinical
trials, since their molecular composition and
experimentation process could give a lot of
valuable information
49- Various aspects connected to successful drug
design include supercomputing, modeling of
proteins through software, biotechnology,
computational methods and analysis, biochemistry,
in silico drug design, etc.
- It is notable that a drug that works for protein
A may not work for protein B, or may behave
differently, due to various factors. That is why
many drugs fail, and hence an integrated
(teamwork) effort is required, with a tremendous
amount of information and interaction
50- At the moment, many patent applications rely on
computerized prediction techniques that are often
referred to as in silico biology
- With a full or partial gene sequence, scientists
enter the data into a computer program that
predicts the amino acid sequence of the resulting
protein
- By comparing this hypothetical protein with known
proteins, the researchers take a guess at what
the underlying gene sequence does and how it
might be useful in developing a drug, say, or a
diagnostic test
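The prediction step described here, going from a gene sequence to an amino acid sequence, can be sketched in Python with the standard genetic code. The function name `translate` and the short example sequence are illustrative assumptions. The sketch reads a coding DNA strand in a single frame from its first base and stops at the first stop codon; it does not search for reading frames or handle introns.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate coding DNA into one-letter amino acids up to the first stop."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)

# Invented example: ATG (Met), GCC (Ala), then the stop codon TGA.
print(translate("ATGGCCTGA"))  # MA
```

The predicted protein string is what would then be compared against known proteins to guess at the gene's function.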
51- Searches for compounds that bind to and have the
desired effect on drug targets still take place
mainly in a biochemist's traditional wet lab,
where evaluations for activity, toxicity and
absorption can take years
- But now, with bioinformatics initiatives,
tools and growing databases of protein structures
and biomolecular pathways, this aspect of drug
development is shifting to computers
- As the saying goes, "genomics without
bioinformatics will not have much of a payoff"
52Ayurveda and tribal medicine
- To date, not much attention has been paid to
biodiversity, especially the research and
knowledge base on alternative medicine, Ayurveda,
applications of herbs and shrubs from remote
villages, etc.
- This area of medicine, and the study of its
effect on genes and proteins, could be another
challenging and interesting area
53Future of pharmainformatics
- Drug companies collect the genetic know-how to
make medicines tailored to specific genes, an
effort called pharmacogenomics
- In the years to come, pharmacists may hand over
one version of a blood pressure drug based on your
unique genetic profile, while the person behind
you in line would get a different version of the
same medicine
- There is going to be a day when somebody comes in
with cancer, and diagnosis can be done not on the
basis of the morphology of the cancer but by
looking at the detailed patterns of gene
expression and protein-binding activities in that
cell
54Target for the industry
- It is expected that in this decade, the
pharmaceutical industry will be faced with
evaluating up to 10,000 human proteins against
which new therapeutics might be directed
- That is 25 times the number of drug targets that
have been evaluated by all the companies since
the dawn of the industry
55Resources
- For a primer on genetic testing and a directory
of genetic tests, visit GeneTests at
www.genetests.org
- For more on the ethical, legal and social
implications of human genome research, visit the
National Human Genome Research Institute's web
site at www.nhgri.nih.gov/ELSI