Cheminformatics

Slides: 56

Provided by: tvisw

Transcript and Presenter's Notes

Title: Cheminformatics

1

Cheminformatics Pharmainformatics

2
In this presentation

Part 1 Molecular Conventions
Part 2 Resources
Part 3 Drug Design
Part 4 Drug Development

3
Part 1

Molecular Conventions

4
Cheminformatics

Cheminformatics, a combination of chemistry and
information technology, is required for the
processing and analysis of chemical data
Cheminformatics is relevant to biologists because
chemistry data are important in many areas of
molecular biology, e.g., in the study of protein
interactions and metabolism

5
Molecular formulae

Molecules can be represented by simple formulae,
which give the number and type of atoms
However, this does not show how they are
connected
Structural formulae provide some information
about the arrangement of atoms in a molecule and
thus allow isomers to be distinguished
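
A minimal sketch of this point in Python, assuming a toy representation in which a molecule is a list of atoms plus a list of bonds. The example pair (ethanol and dimethyl ether) and the helper names molecular_formula and heavy_atom_bonds are illustrative assumptions, not part of the slides. Both molecules give the same formula, and only the connections tell them apart.

```python
from collections import Counter

# Each molecule: a list of element symbols (atoms) and a list of bonds
# given as pairs of atom indices. Hydrogens are listed explicitly.
ETHANOL = {
    "atoms": ["C", "C", "O", "H", "H", "H", "H", "H", "H"],
    # CH3-CH2-OH
    "bonds": [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)],
}
DIMETHYL_ETHER = {
    "atoms": ["C", "O", "C", "H", "H", "H", "H", "H", "H"],
    # CH3-O-CH3
    "bonds": [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (2, 6), (2, 7), (2, 8)],
}


def molecular_formula(molecule):
    """Return a formula string: C first, then H, then other elements alphabetically."""
    counts = Counter(molecule["atoms"])
    order = [e for e in ("C", "H") if e in counts]
    order += sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def heavy_atom_bonds(molecule):
    """Return a sorted list of bonds between non-hydrogen atoms as element pairs."""
    atoms = molecule["atoms"]
    return sorted(
        tuple(sorted((atoms[i], atoms[j])))
        for i, j in molecule["bonds"]
        if atoms[i] != "H" and atoms[j] != "H"
    )


# Same formula for both isomers, different connectivity.
print(molecular_formula(ETHANOL), molecular_formula(DIMETHYL_ETHER))
print(heavy_atom_bonds(ETHANOL))
print(heavy_atom_bonds(DIMETHYL_ETHER))
```

The formula discards the bond list entirely, which is why it cannot separate isomers; a structural formula keeps that information.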

6
Structural representation of ethane showing the
tetrahedral distribution of coordinated groups
about saturated carbon atoms. Panels (a) and (b)
show two extreme conformations. The energetically
favourable conformation (a), which predominates
in nature, has the H atoms on opposite sides of
the C-C bond as far as possible from each other
(the staggered conformation). The less favourable
conformation (b) has the H atoms in the eclipsed
conformation. Panels (c) and (d) show the
conformations viewed from the end of the molecule
7
Structural formulae and full and simplified
structural diagrams for some common organic
compounds
Name                      Formula
Methane                   CH4
Ethane                    C2H6
Ethene (ethylene)         C2H4
[Diagrams: full structure and simplified structure for each compound]
8
Structural formulae and full and simplified
structural diagrams for some common organic
compounds (continued)
Name                      Formula
Cyclohexane               C6H12
Ethanol                   C2H5OH
Ethanal (acetaldehyde)    CH3CHO
[Diagrams: full structure and simplified structure for each compound]
9
Structural diagrams

Molecules can be represented using simple graphs,
which show atoms as nodes and bonds as links
For organic molecules, further simplification is
achieved by assuming that carbon atoms make up
the molecular backbone and that the valency of
four is satisfied by hydrogen atoms unless
otherwise shown
Such diagrams present all molecules as planar
shapes and do not indicate the spatial
distribution of atoms in 3D
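
The hydrogen-filling convention can be sketched in Python as follows. This is a minimal illustration that assumes neutral atoms with the usual valencies (C 4, N 3, O 2); the function name implicit_hydrogens and the tuple layout for bonds are assumptions made for the example.

```python
# Typical valencies assumed for this sketch (neutral atoms only).
VALENCY = {"C": 4, "N": 3, "O": 2}


def implicit_hydrogens(atoms, bonds):
    """Return the number of implied hydrogens on each heavy atom.

    atoms: list of element symbols (the nodes of the graph)
    bonds: list of (i, j, order) tuples (the links), with order 1, 2 or 3
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    # Whatever valency is not used by drawn bonds is filled with hydrogen.
    return [VALENCY[atom] - u for atom, u in zip(atoms, used)]


# Simplified diagrams: ethanol C-C-O, ethene C=C, cyclohexane as a 6-ring.
ethanol = implicit_hydrogens(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethene = implicit_hydrogens(["C", "C"], [(0, 1, 2)])
cyclohexane = implicit_hydrogens(
    ["C"] * 6, [(i, (i + 1) % 6, 1) for i in range(6)]
)

print(ethanol)           # [3, 2, 1], i.e. CH3-CH2-OH
print(ethene)            # [2, 2], i.e. CH2=CH2
print(sum(cyclohexane))  # 12, i.e. C6H12
```

Note that the graph stores only connectivity and bond order, so it carries no 3D information.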

10
Chirality

If four different groups are coordinated around a
central carbon atom, the molecule is described as
chiral
Chiral molecules exist in two configurations,
called enantiomers, which are mirror images of
each other
Although enantiomers have the same chemical
properties in an achiral environment, many enzymes
and other proteins show chiral sensitivity, which
is important in drug development and related
fields

11
Multi-chiral configuration

Molecules may contain any number of chiral
centers and a series of forms, called
diastereoisomers, may exist
These may have different chemical properties
because of the way different groups interact
within the molecule
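
As a rough illustration, a molecule with n chiral centers has at most 2^n stereoisomers (fewer when internal symmetry makes some forms identical). The Python sketch below assumes each center is simply labelled R or S, a naming convention for the configuration at a center, and ignores such symmetry. The function names are hypothetical.

```python
from itertools import product


def configurations(n_centers):
    """List every R/S labelling of n chiral centers (an upper bound on stereoisomers)."""
    return ["".join(labels) for labels in product("RS", repeat=n_centers)]


def mirror(config):
    """The mirror image inverts every center: R becomes S and S becomes R."""
    return config.translate(str.maketrans("RS", "SR"))


def relationship(a, b):
    """Classify two distinct labellings of the same molecule."""
    if a == b:
        return "identical"
    return "enantiomer" if b == mirror(a) else "diastereoisomer"


configs = configurations(2)
print(configs)
for other in configs[1:]:
    print(configs[0], other, relationship(configs[0], other))
```

Only the fully inverted labelling is the mirror image; every other combination is a diastereoisomer, which is why diastereoisomers can differ in chemical properties while enantiomers do not.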

12
DL and RS conventions

The absolute configuration of groups around a
chiral carbon atom can be described using a
number of conventions
In the DL system, molecules are named D or L
according to whether the coordinated groups are
arranged in a similar fashion to those in
D-glyceraldehyde or L-alanine
In the RS system, molecules are named R (rectus)
or S (sinister) according to the priority, based
on atomic number, of the chemical groups
surrounding the carbon atom

13
Representation of a tetrahedrally coordinated
saturated carbon atom in an organic molecule:
(a) the carbon atom is at the centre of a
tetrahedron with four coordinated groups
(b) simplified representation with the central
carbon removed
(c) representation of the tetrahedron as a flat
image
14
Chirality representation
(a) The structural formula of glyceraldehyde,
CH2OHCHOHCHO, gives no indication of its
chirality
(b) If the molecule is represented as a
tetrahedron, the D and L enantiomers can be
distinguished
(c) These can be shown as 2D graphs using the
Fischer convention
15
Part 2

Resources

16
SMILES

SMILES is a system for representing chemical
formulae as strings, based on a valence model in
which all valencies are considered to be
satisfied by hydrogen atoms unless otherwise
shown
The system has conventions for representing
different bond types, cyclic molecules, branches,
cis/trans isomers and chirality
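
To make the valence model concrete, here is a minimal Python sketch of a reader for a small subset of SMILES: the atoms C, N and O, the bond symbols -, = and #, branches in parentheses, and single-digit ring closures. It is not a full SMILES parser (no aromatic atoms, charges, brackets, chirality or cis/trans marks), and the function names are assumptions made for the example. The SMILES strings are those of the compounds tabulated earlier in the presentation.

```python
VALENCY = {"C": 4, "N": 3, "O": 2}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def parse_smiles(smiles):
    """Parse a restricted SMILES string into (atoms, bonds)."""
    atoms, bonds = [], []
    stack = []       # atoms to return to when a branch closes
    ring_open = {}   # ring-closure digit -> (atom index, bond order)
    prev = None      # atom that the next atom will bond to
    order = 1        # order of the pending bond
    for ch in smiles:
        if ch in VALENCY:
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, order))
            prev, order = idx, 1
        elif ch in BOND_ORDER:
            order = BOND_ORDER[ch]
        elif ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch.isdigit():
            if ch in ring_open:
                start, start_order = ring_open.pop(ch)
                bonds.append((start, prev, max(order, start_order)))
            else:
                ring_open[ch] = (prev, order)
            order = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return atoms, bonds


def hydrogen_count(smiles):
    """Total hydrogens implied by unfilled valencies."""
    atoms, bonds = parse_smiles(smiles)
    used = [0] * len(atoms)
    for i, j, bond_order in bonds:
        used[i] += bond_order
        used[j] += bond_order
    return sum(VALENCY[a] - u for a, u in zip(atoms, used))


EXAMPLES = {
    "methane": "C",
    "ethane": "CC",
    "ethene": "C=C",
    "cyclohexane": "C1CCCCC1",
    "ethanol": "CCO",
    "acetaldehyde": "CC=O",
}
for name, smiles in EXAMPLES.items():
    print(name, smiles, hydrogen_count(smiles))
```

For real work a cheminformatics toolkit should be used; the sketch only shows how a short string can encode bonds, branches and rings while leaving hydrogens implicit.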

17
RasMol and Chime

There are several specialized data formats for
chemical structures based on the principle of a
molecular formula and associated table of
connections
Viewing utilities such as RasMol and Chime can
interpret these file formats and display
interactive molecular structures in a variety of
user-defined schemes and colors

18
Chemical structure and databases

Structural information about different molecules
can be obtained from a number of comprehensive
WWW resources, including Chemical Abstracts
On-Line, Chemfinder and MedChem
Each of these resources provides a chemical
database that can be searched using a variety of
query formats, e.g., systematic name,
non-systematic name, formula, molecular weight or
CAS registry number
Search results provide physical, chemical and
biomedical information with links to other
databases and resources
MedChem also provides the SMILES string

19
QSAR

A quantitative structure-activity relationship
(QSAR) is a statistical model used to determine
how the structural features of a molecule are
related to its biological activity
The QSAR approach is particularly useful for
categorizing the activities of related molecules
with multiple functional groups
Each molecule is broken down into a series of
descriptors (molecular properties) and the QSAR
determines which descriptors are most likely to
promote biological activity
This gives rise to a set of rules that can be
used to evaluate the potential activity of new
molecules
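
A toy version of the descriptor step might look like the Python below. It is only a sketch: the descriptor names (logp, h_bond_donors), the five data points and the activity values are invented for illustration, and ranking by simple Pearson correlation stands in for the multivariate statistics a real QSAR study would use.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical descriptor values for five related molecules.
DESCRIPTORS = {
    "logp": [1.0, 2.0, 3.0, 4.0, 5.0],
    "h_bond_donors": [2, 1, 3, 1, 2],
}
# Hypothetical measured activity for the same five molecules.
ACTIVITY = [1.2, 1.9, 3.2, 3.8, 5.1]


def rank_descriptors(descriptors, activity):
    """Order descriptors by the strength of their correlation with activity."""
    scores = {name: pearson(values, activity) for name, values in descriptors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


for name, r in rank_descriptors(DESCRIPTORS, ACTIVITY):
    print(f"{name}: r = {r:.3f}")
```

In this made-up data set the first descriptor tracks activity closely and the second barely at all, so a rule derived from it would favour new molecules with higher values of the first descriptor.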

20
Part 3

Drug Design

21
Pharmainformatics

Pharmainformatics is the combination of biology,
chemistry, mathematics and information technology
that is essential for efficient data management,
processing and analysis in the pharmaceutical
industry

22
Drugs

Drugs interact with targets, usually proteins, in
the body and through these interactions cause
physiological responses
The pharmaceutical industry aims to discover
drugs with specific beneficial effects to treat
human diseases

23
Gene drug life

To know a gene's chemical structure and
composition is one thing, but understanding its
actual function is another
Though sequencing and analysis would help in
answering questions on aging, diseases,
disorders, and much more, a new discipline of
designer drugs is around the corner, waiting to
be tapped
Even a single nucleotide polymorphism (SNP,
pronounced "snip"), for instance a T in one
person's gene sequence where a neighbour has a
C, can spell trouble
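
Finding such single-base differences between two aligned sequences is straightforward, as the Python sketch below shows. It assumes the two sequences are already aligned and of equal length; the short sequences and the function name find_snps are invented for the example.

```python
def find_snps(reference, sample):
    """Return (position, reference base, sample base) for each mismatch.

    Positions are 1-based. The sequences must already be aligned
    and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Hypothetical fragments: one person has a T where the neighbour has a C.
neighbour = "ATGCCGTA"
person = "ATGTCGTA"
print(find_snps(neighbour, person))
```

Real SNP calling works on sequencing reads and has to deal with alignment gaps and sequencing errors, which this sketch leaves out.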

24
Gene drug life

Many drugs work in only 30 percent of the human
population
In extreme cases, a drug that saves one person
may poison another. For instance, the type II
diabetes drug Rezulin has been linked to more
than 60 deaths from liver toxicity worldwide
This is where in silico drug design would help,
not only by reducing the time spent on design,
modeling and testing but also by reducing the
expenditure on manpower and resources across the
various phases of drug design and development

25
Areas of drug design

For drug design, the process must be viewed from
three different dimensions, viz., drug design for
Diseases such as HIV, cancer, etc. that have been
afflicting people
Lifestyle drugs
Drugs for repairing genetic disorders
There is an urgent need to develop drugs for
diseases such as hepatitis C, leprosy and malaria,
since these diseases are widespread and affect
the population at large
Other infectious diseases such as tuberculosis,
HIV, etc. are also highly troublesome

26
In silico drug design

Earlier, the drug design process used to take
many decades and was carried out haphazardly,
without any direction, whereas presently there is
a systems approach. Added to this is a tremendous
reduction in research and production costs
The surge in bioinformatics solutions has already
redefined the way drug trials are done, with a
shift from in vitro to in silico
In silico drug design could be used to shorten
the time taken by drug design, and shortening it
will remain the biggest challenge for years to
come

27
Drugs are insoluble in water

Water makes up a large part of a protein's
surroundings (two-thirds of the human body
consists of water), so proteins in cells do not
behave like rigid bodies and, consequently, the
behavioural pattern differs from protein to
protein
Many drugs do not dissolve readily in water.
Designing drugs in silico (on chips, without
water) should take this into account

28
Important areas for drug design

The four most important areas of consideration
for successful drug design are the
binding sites
molecular shape
molecular size
inhibitory properties of the proteins

29
Important areas for drug design

The study related to crystallization of membrane
protein structure also plays a vital role in drug
design. This area of research would be highly
challenging and would prove to be an excellent
foundation for further research
Since the genome of the dengue virus is only
about 11 kb long, it would be highly useful for
carrying out a lot of work quickly and
conveniently

30
Medical applications

Bioinformatics and drug design can be highly
useful for the diagnosis and treatment of various
neurological disorders. It has been found that
many neurological disorders are due to unusual
gene structures such as the triple-A formation
AAA (the A of the ATGC nucleotides) in the genes.
The problem becomes more complex with multiple
repeats or occurrences of triple A. More than
eight such repeats are known, and in such cases
children are permanently bedridden or have to use
wheelchairs

31
Part 4

Drug Development

32
Bioinformatics in drug development

Genomics, proteomics, combinatorial chemistry and
high-throughput screening (HTS) have all
contributed to a massive increase in the amount
of data generated by the pharmaceutical industry
The role of bioinformatics is to store, track and
provide tools for the analysis of these data,
something like an automated environment

33
Bioinformatics in drug development

Specific applications include the modeling of
protein interactions with small molecules
allowing rational drug design, the association of
genotype and drug response patterns
(pharmacogenomics), the design and assessment of
chemical diversity in combinatorial libraries,
and the processing and storage of data from
high-throughput screens of lead compounds

34
Areas of biology
Genomics/proteomics (human genome project)
Application: Characterization of human genes and proteins
Role of bioinformatics: Target identification/validation in the human genome; cataloging SNPs and association with drug response patterns (pharmacogenomics)
Genomics/proteomics (human pathogen genome project)
Application: Characterization of genes and proteins of organisms that are pathogenic to humans
Role of bioinformatics: Target identification/validation in pathogens
Functional genomics (protein structures)
Application: Analysis of protein structures (humans and their pathogens)
Role of bioinformatics: Prediction of drug/target interactions; rational drug design
35
Areas of biology (continued)
Functional genomics (expression profiling)
Application: Determining gene expression patterns in disease and health
Role of bioinformatics: Gene classification based on drug responses; pathway reconstruction
Functional genomics (genome-wide mutagenesis)
Application: Determining the mutant phenotypes for all genes in the genome
Role of bioinformatics: Databases of animal models; target identification/validation
Functional genomics (protein interactions)
Application: Determining interactions among all proteins
Role of bioinformatics: Characterization of protein interactions; reconstruction of pathways; prediction of binding sites
36
Areas of chemistry
HTS
Application: Highly parallel assay formats for lead identification
Role of bioinformatics: Storing, tracking and analyzing data
Combinatorial chemistry
Application: Synthesis of large numbers of chemical compounds
Role of bioinformatics: Cataloging chemical libraries; assessing library quality/diversity; predicting drug/target interactions
37
Principles of drug development

Drug development begins with the identification
of a suitable target, which must contribute
significantly to a human disease
Ideally, altering the activity of this target
should have a beneficial effect thus showing its
potential for therapeutic intervention
The next stage of the process is lead discovery,
where compounds showing some of the desired
activity of an ideal drug are sought

38
Principles of drug development

Optimization of lead compounds results in drug
candidates that may be registered and submitted
for clinical trials, which establish their safety
and metabolic behaviour in human subjects

39
Genetic link to drugs

An early example of the utility of bioinformatics
in drug design is cathepsin K, an enzyme that
might turn out to be an important target for
treating osteoporosis, a crippling disease caused
by the breakdown of bone
While analyzing osteoclasts (cells that break
down bone in the normal course of bone
replenishment) taken from people with bone
tumors, researchers found gene sequences that
were overexpressed in these cells and could be
overactive in individuals with osteoporosis
The sequences matched a previously identified
class of molecules called cathepsins. Efforts are
under way to find a potential drug to block the
cathepsin K target

40
Genetic link to drugs

Scientists believe that 99.9 percent of your
genes perfectly match those of the person sitting
beside you. But the remaining 0.1 percent of the
genes vary, and it is these variations that
interest the drug companies
Several years after the debut of tests for BRCA1
and BRCA2, scientists are still trying to
determine exactly to what degree those genes
contribute to a woman's cancer risk

41
Chemical diversity

Diverse chemical libraries are required for
efficient lead discovery if little is known about
the binding properties of the drug target
Conversely, focused libraries are required if the
structure of the target is known, since this
defines a particular set of ligands
Chemical diversity can be defined by comparing
molecules on the basis of descriptors (functional
groups) and how these fill chemical space
A number of software tools are available for the
design and assessment of diverse or focused
chemical libraries and for virtual screening
against drug targets
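
One common way to compare molecules on the basis of descriptors is the Tanimoto (Jaccard) coefficient on sets of features. The Python sketch below assumes each molecule is reduced to a set of functional-group labels; the three molecules, their labels and the function names are invented for illustration and do not come from any particular software tool.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two descriptor sets: shared features / all features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def library_diversity(library):
    """Mean pairwise dissimilarity (1 - Tanimoto) over all molecule pairs."""
    pairs = list(combinations(library.values(), 2))
    return sum(1.0 - tanimoto(x, y) for x, y in pairs) / len(pairs)


# Hypothetical library: molecule name -> set of functional groups present.
LIBRARY = {
    "mol_a": {"hydroxyl", "amine"},
    "mol_b": {"hydroxyl", "carbonyl"},
    "mol_c": {"nitro", "halide"},
}

print(tanimoto(LIBRARY["mol_a"], LIBRARY["mol_b"]))
print(library_diversity(LIBRARY))
```

A diverse library scores close to 1 on this measure, while a focused library built around one set of ligands scores close to 0.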

42
Computational screening

Software applications like DOCK and AutoDock
match potential ligands to binding sites by
calculating steric constraints and bond energies
These can be used to search chemical databases
and find potential drug leads
Some applications consider the ligand and binding
site as inflexible structures, rather like pieces
of a jigsaw, while others can incorporate
flexibility into the molecules by calculating
allowable and compatible bond torsions
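
The rigid, jigsaw-like case can be illustrated with a toy scoring function in Python. This is not the algorithm used by DOCK or AutoDock: it is a sketch that assumes a simple 12-6 Lennard-Jones term between every ligand atom and every binding-site atom, with arbitrary parameters (r_min, depth) and made-up coordinates, and it picks the best of a few fixed poses.

```python
import math


def pair_energy(distance, r_min=3.8, depth=0.2):
    """12-6 Lennard-Jones energy: large and positive for steric clashes,
    with a minimum of -depth at distance r_min."""
    ratio = r_min / distance
    return depth * (ratio ** 12 - 2 * ratio ** 6)


def score_pose(ligand_atoms, site_atoms):
    """Sum pair energies over all ligand-atom / site-atom pairs (lower is better)."""
    return sum(
        pair_energy(math.dist(lig, site))
        for lig in ligand_atoms
        for site in site_atoms
    )


def best_pose(poses, site_atoms):
    """Return the name of the rigid pose with the lowest score."""
    return min(poses, key=lambda name: score_pose(poses[name], site_atoms))


# Hypothetical coordinates: one site atom and three rigid placements
# of a one-atom ligand.
SITE = [(0.0, 0.0, 0.0)]
POSES = {
    "clash": [(1.0, 0.0, 0.0)],   # too close: steric clash
    "fit": [(3.8, 0.0, 0.0)],     # at the energy minimum
    "far": [(10.0, 0.0, 0.0)],    # too far to interact
}

for name, atoms in POSES.items():
    print(name, score_pose(atoms, SITE))
print(best_pose(POSES, SITE))
```

A flexible approach would additionally vary the ligand's bond torsions and rescore each resulting conformation, which multiplies the number of poses to evaluate.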

43
Functional genomics

The large-scale functional annotation of genes is
known as functional genomics and incorporates
areas such as homology searching, structural
analysis, expression analysis, large scale
mutagenesis and the analysis of protein
interactions
All of these areas are important in drug
development

44
Genome-scale mutagenesis

Genome-scale mutagenesis is a rich source of
animal disease models for target identification
and validation, and large mutant collections in
simple organisms can be used for the rapid
high-throughput screening of potential lead
compounds

45
Approaches in functional genomics
Approach: Functional annotation method
Homology searching: Comparison to related sequences with known function
Protein structure determination (structural genomics): Comparison to molecules with related structure and known function
Comparative genomics: Functional annotation by domain conservation, conserved phylogeny or conserved genomic organization
Expression analysis: Similar expression profiles indicate conserved function
Mutagenesis: Function based on mutant phenotype, e.g. knockout mice
Protein interaction screening: Function based on presence in multi-subunit complex or on interaction with proteins of known function
Small molecule informatics: Interaction with small molecules
46
Pharmacogenomics

Pharmacogenomics is the study of how genetic
variation in the human population correlates with
drug response patterns
The analysis of genomic data and its comparison
with drug response data allows patients to be
clustered into drug response groups, so that
appropriate drugs and dose regimens can be
administered
Variation is catalogued by analyzing data on
mutation (particularly SNPs) and gene expression
profiles
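
The grouping step can be sketched in a few lines of Python. The patient records below are invented purely for illustration (one SNP genotype and one response score per patient), and response_by_genotype is a hypothetical helper; a real study would use many markers and proper statistical testing.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (patient id, genotype at one SNP, measured drug response).
PATIENTS = [
    ("p1", "CC", 0.82),
    ("p2", "CT", 0.55),
    ("p3", "CC", 0.78),
    ("p4", "TT", 0.20),
    ("p5", "CT", 0.49),
    ("p6", "TT", 0.26),
]


def response_by_genotype(patients):
    """Group patients by genotype and return the mean response of each group."""
    groups = defaultdict(list)
    for _patient_id, genotype, response in patients:
        groups[genotype].append(response)
    return {genotype: mean(values) for genotype, values in groups.items()}


for genotype, avg in sorted(response_by_genotype(PATIENTS).items()):
    print(genotype, round(avg, 2))
```

If the group means differ clearly, as they do in this made-up data, the genotype can be used to assign patients to drug response groups before treatment.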

47
In lab vs. out of lab effort

Companies and individuals plug into the drug
design effort at various points: collecting and
storing data, searching databases, and
interpreting the data
The race and competition is all about who can
mine the massive information best
Modeling or computing the drug design or protein
structure alone would not be sufficient; a lot of
outside information on test results and clinical
trials is also very important
Most of the time should be spent on this aspect
to ensure success in drug design and development

48
Issues of drug design

Even though the human genome has been sequenced,
a number of problems still await solutions:
technical, legal, and social
It is not at all clear how much one must know
about a gene in order to patent it
There is also a need to review all failed drugs,
i.e., drugs that failed during clinical trials,
since their molecular composition and
experimentation process could give a lot of
valuable information

Various aspects connected to successful drug
design include supercomputing, modeling of
proteins through software, biotechnology,
computational methods and analysis, biochemistry,
in silico drug design, etc.
It is notable that a drug that works for protein
A may not work for protein B, or may behave
differently, due to various factors. That is why
many drugs fail, and hence an integrated
(teamwork) effort is required, drawing on a
tremendous amount of information and interaction

At the moment, many patent applications rely on
computerized prediction techniques that are often
referred to as in silico biology
With full or partial gene sequence, scientists
enter the data into a computer program that
predicts the amino acid sequence of the resulting
protein
By comparing this hypothetical protein with known
proteins, the researchers take a guess at what
the underlying gene sequence does and how it
might be useful in developing a drug, say, or a
diagnostic test
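
The prediction step can be sketched as follows in Python, using the standard genetic code. The function name translate and the short test sequence are assumptions for the example; the sketch reads a single frame from the first base and stops at the first stop codon, whereas real gene prediction must also locate the reading frame and handle introns.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence into one-letter amino acids,
    reading from the first base and stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

The resulting hypothetical protein sequence is what would then be compared against databases of known proteins to guess at the gene's function.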

Searches for compounds that bind to and have the
desired effect on drug targets still take place
mainly in a biochemist's traditional wet lab,
where evaluations for activity, toxicity and
absorption can take years
But now, with the bioinformatics initiatives,
tools and growing databases of protein structures
and biomolecular pathways, this aspect of drug
development is shifting to computers
As the saying goes, genomics without
bioinformatics will not have much of a payoff

52
Ayurveda and tribal medicine

To date, not much attention has been given to
biodiversity, especially the research and
knowledge base on alternative medicine, Ayurveda,
applications of herbs and shrubs from remote
villages, etc.
This area of medicine and the study of its effect
on genes and proteins could be another
challenging and interesting area

53
Future of pharmainformatics

Drug companies collect the genetic know-how to
make medicines tailored to specific genes, an
effort called pharmacogenomics
In the years to come, pharmacists may hand over
one version of a blood pressure drug based on your
unique genetic profile, while the person behind
you in line would get a different version of the
same medicine
There is going to be a day when somebody comes in
with cancer, and diagnosis can be done not on the
basis of morphology of the cancer but by looking
at the detailed patterns of gene expression and
protein-binding activities in that cell

54
Target for the industry

It is expected that in this decade, the
pharmaceutical industry will be faced with
evaluating up to 10,000 human proteins against
which new therapeutics might be directed
That is 25 times the number of drug targets that
have been evaluated by all the companies since
the dawn of the industry

55
Resources

For a primer on genetic testing and a directory
of genetic tests, visit GeneTests at
www.genetests.org
For more on the ethical, legal and social
implications of human genome research, visit the
National Human Genome Research Institute's web
site at www.nhgri.nih.gov/ELSI
