Title: Bioinformatics from a drug discovery perspective

1
Bioinformatics from a drug discovery perspective

EMBRACE Workshop, 22-23 March 2007
Niclas Jareborg
AstraZeneca RD Södertälje

2
AstraZeneca Drug Discovery

Research Areas
CV/GI (Cardiovascular/Gastrointestinal), RIRA (Respiratory/Inflammation), CNS/Pain, Cancer, Infection
Discovery Sites
UK: Charnwood (RIRA), Alderley Park (Cancer, CV/GI, RIRA)
North America: Boston (Cancer, Infection), Wilmington (CNS/Pain), Montreal (CNS/Pain)
Sweden: Lund (RIRA), Mölndal (CV/GI), Södertälje (CNS/Pain)
India: Bangalore (Infection)
Bioinformatics
All research areas (RAs) have their own bioinformatics teams
Infrastructure at Alderley Park (databases, large Linux clusters)
IS organisation

3
A target is defined as

... a biological target protein on which a chemical entity (e.g. a drug molecule) exerts its action
A drug target must be associated with a disease

4
Drug discovery process
[Figure: pipeline from genes to candidate drug]
Stages: Target identification, Target validation, Hit identification (HTS), Hit to lead (Lead identification), Lead optimisation, Clinical trials
Other labels: Genes, Protein, Assay, Compound library, Hit, Candidate drug, Effort
5
Target Definition

Alternative Splicing
Identify pharmacologically relevant target variant(s)
Sequence variation
Function
Target
Metabolizing enzyme
Binding of substance
Identify most common variant
Might differ in different populations!

6
Target Definition

Expression
Is the target expressed in a relevant human tissue?
Databases
Microarrays
Immunohistochemistry
In situ hybridization
Proteomics
Literature

7
Target Definition

Selectivity
How similar are related proteins?
Do similar proteins have functions that we do not want to affect?
Animal models
Orthologous genes
Same family size?
Splice variants
Same as in human?
Polymorphisms
Differences between inbred strains
Tissue expression
Overlap with human?
Available transgenes or knock-outs

8
Genetics Bioinformatics
Bioinformatics input to the drug discovery process
[Figure: drug discovery pipeline with bioinformatics inputs]
Phases: Research, Development, Commercialisation
Milestones: MS1, MS2, MS3, MS4, MS5
Stages: Target Identification, Hit Identification, Lead Identification, Lead Optimisation, CD Pre-nomination, Development for Launch, Registration, Launch, Sales
Bioinformatics inputs:
Support target identification
Primary screening: identify polymorphic and splice variants
Selectivity screening: identify paralogues
Support choice of model organism(s)
Support biomarker identification
Flag up population variants in target
9
In-house generated gene-centric information resource
10
In-house generated gene-centric information resource
11
Target identification
Targets from different experimental approaches as well as validation using different technologies
EST sequencing campaigns
Genetics/genome information
Proteomics
Literature
Differential biology
Target Candidates
In silico
Microarrays (Affymetrix, glass, etc.)
Validation (in silico, lab bench)
Validation as potential targets
Specificity / selectivity
12
Target identification
30000 human genes
What?
Link to disease?
Where?
Novel?
1 potential target
13
The human genome offers many potential drug targets
14
Current Drug Targets - few target classes
Based on 483 drugs in Goodman and Gilman's "The Pharmacological Basis of Therapeutics"
Samuel Svensson, PhD, AstraZeneca RD Södertälje
15
Number of druggable targets smaller than expected?
Only a subfraction of gene products play a direct role in disease pathophysiology
30000 human genes
Druggable genome: 2,000-3,000 genes
500 GPCRs, 50 NHRs, >200 ion channels, >1,000 enzymes (e.g. 450 proteases, 500 kinases, >200 others)
pathogens commensal gut bacteria genes
< 5,000 targets for small molecule drugs
2,000-3,000 druggable targets
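
As a rough cross-check of the figures on this slide, the lower bounds of the listed target classes can be summed and compared with the 30000-gene total. This is only a minimal sketch: the variable names are illustrative, and treating each "more than N" entry as exactly N is an assumption, not something the slide states.

```python
# Lower-bound counts per target class, as listed on the slide.
# Entries the slide gives as "more than N" are treated as N (an assumption).
target_classes = {
    "GPCRs": 500,
    "NHRs": 50,
    "ion channels": 200,
    "enzymes": 1000,
}

HUMAN_GENES = 30000  # gene count used on the slide

# Sum of the lower bounds and its share of the whole genome.
lower_bound = sum(target_classes.values())
fraction = lower_bound / HUMAN_GENES

print(f"Listed classes sum to at least {lower_bound} genes")
print(f"That is about {fraction:.1%} of {HUMAN_GENES} genes")
```

The lower-bound sum comes out just under the slide's estimate of two to three thousand druggable genes, which is what one would expect if the "more than" entries carry the remainder.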
16
Updating the (shrinking?) Targetome
Down to 22K? (see PMID 15174140)
Some of the 120 InterPro domains are unpromising
Many potential targets are still functional orphans
Realistically nearer 2000?
OMIM still only at 1900, and only low numbers of robust genetic association results
17
Current trends

Blue sky genomics -> literature
Finding unknown targets -> prioritizing the lists
Moving from single target focus
Comparing and ranking of target candidates
Integration of relevant but disparate data sources
Better understanding of the target neighbourhood
Disease mechanism
Biomarkers
Toxicology
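
One simple way to picture "comparing and ranking of target candidates" across disparate data sources is a weighted score per candidate. The sketch below is an illustration only, not the method used at AstraZeneca: the criteria, weights, candidate names and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    scores: dict  # criterion -> score in [0, 1]; missing criteria count as 0


# Hypothetical weights; a real project would choose its own criteria.
WEIGHTS = {
    "disease_link": 0.4,
    "tissue_expression": 0.3,
    "selectivity": 0.2,
    "novelty": 0.1,
}


def weighted_score(candidate, weights=WEIGHTS):
    """Combine per-criterion scores into one number using the weights."""
    return sum(w * candidate.scores.get(k, 0.0) for k, w in weights.items())


def rank(candidates):
    """Return candidates ordered from highest to lowest weighted score."""
    return sorted(candidates, key=weighted_score, reverse=True)


# Invented candidates and scores, for illustration only.
candidates = [
    Candidate("TARGET_A", {"disease_link": 0.9, "tissue_expression": 0.5,
                           "selectivity": 0.4, "novelty": 0.2}),
    Candidate("TARGET_B", {"disease_link": 0.6, "tissue_expression": 0.9,
                           "selectivity": 0.8, "novelty": 0.5}),
    Candidate("TARGET_C", {"disease_link": 0.3, "tissue_expression": 0.4,
                           "selectivity": 0.9}),
]

ranked = rank(candidates)
for c in ranked:
    print(f"{c.name}: {weighted_score(c):.2f}")
```

A linear weighted sum is the simplest possible choice; it makes the trade-offs explicit but hides missing data, which is why absent criteria are scored as zero here rather than silently ignored.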

18
Sources of Contextual Information

Structured

Unstructured

80
20
Current approach to retrieving information from unstructured sources is through manual extraction, i.e. finding documents and reading them!

Internal Docs
Tox Reports, Clinical Trial Reports.
External Docs
Patents: USPTO, WIPO, EP, etc.
Literature: Medline, Embase
Press Releases
competitor, supplier, collaborator, academic (etc)
Government Agencies
Conference Proceedings
News Feeds

Internal Chemical Dbs
Internal Biological Dbs
External, Commercial Dbs
GVK Bio, Ingenuity IPA
External Public Dbs
EMBL, PDB, SNPdb, etc

19
Dissecting the Decision Making Process
Finding
Locating relevant documents and information
Retrieving them in a useable format

Extracting
Reading information
Locating the facts within documents

Integrating
Understanding what it means
Putting the information into context
Turning information into knowledge

Creating
Developing new hypotheses
Input into decision making

20
Issues with the Manual Approach
Finding
Difficult to capture breadth
Chance to miss things
White space in failing to find things

Extracting
Limited time to read things
Focus on reviews and summaries

Integrating
Based on individual scientists' own knowledge
Narrow
Biased

Creating
Hypotheses are per project
Reactive not proactive

21
Text mining

Sources
Literature
Patents
In-house reports
Information
Protein-protein interactions
Tissue expression
Pharmacological differences
Splice variants, Polymorphisms
Species
Toxicology
etc

22
Emerging Systems: Text Mining

Extraction of facts from unstructured data sources
Natural Language Processing, Ontologies
Linguamatics I2E
Knowledgebase generation
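
To make "extraction of facts from unstructured data sources" concrete, the sketch below pulls (entity, relation, entity) triples out of free text with a plain regular expression. It is a deliberately naive stand-in for real NLP and ontology-based tools such as Linguamatics I2E, whose actual behaviour is not described in the slides. The gene symbols come from the next slide; the relation verbs and the sample sentences are assumptions made up for illustration.

```python
import re

# Gene symbols taken from the entity-relationship slide.
GENES = ["BCL2", "PARP", "TNF", "CASP9", "CASP3", "CASP8", "MTPN"]
# Hypothetical relation vocabulary; a real system would use an ontology.
RELATIONS = ["activates", "inhibits", "cleaves"]

gene_pat = "|".join(GENES)
rel_pat = "|".join(RELATIONS)
# Matches "<GENE> <relation> <GENE>" with whole-word gene symbols.
PATTERN = re.compile(rf"\b({gene_pat})\b\s+({rel_pat})\s+\b({gene_pat})\b")


def extract_facts(text):
    """Return a list of (subject, relation, object) tuples found in text."""
    return [m.groups() for m in PATTERN.finditer(text)]


# Toy sentences written for this example, not quoted from any document.
text = (
    "CASP9 activates CASP3 in this toy sentence. "
    "CASP3 cleaves PARP. Nothing is said here about MTPN."
)

facts = extract_facts(text)
for subject, relation, obj in facts:
    print(subject, relation, obj)
```

A pattern this rigid misses passive voice, synonyms and negation, which is exactly the gap that natural language processing and ontologies are meant to close.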

23
Biomedical Entity-Relationship Data
BCL2
PARP
TNF
CASP9
CASP3
CASP8
MTPN
24
Pilot Systems: Pathway Analysis (Ingenuity IPA)
25
BER System in Action
Evidence Trail
Literature
Gene Expression
ERSystem (Gene/Metabolite Knowledgebase)

Significant Biological Entity List
Gene List
Protein List
Metabolite List

Biological environment of the list.
Hypothesis Generation
Proteomic
Metabonomic
Canonical pathways associated with the list
Question: What is the underlying biology, pathology, physiology etc. associated with this list of entities? What is it telling me?
Genetic
Diseases, Biological processes associated with the list
26
Structuring the Knowledge
Delivers facts as networks of information: Knowledge Bases
[Figure: GI Tox Knowledge Map]
Node types:
Species: Human, Rat, Dog, etc.
Clinical Observations: Diarrhoea, Vomiting, Loose Stools, Bloating, Nausea, etc.
Compound
Genes
Pathology: GI toxicity, GI pathology
Cellular Processes
Relationship labels: Observed in, Affects, Linked with, Is a, Involved in
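
A knowledge map of this kind can be held as a list of (subject, relation, object) triples and queried by walking outward from a node. The sketch below is a minimal illustration under stated assumptions: the relation labels are those on the slide, but the specific nodes "CompoundX", "GeneA" and "Cellular process P" and every individual edge are hypothetical, since the transcript does not preserve which nodes the arrows connect.

```python
from collections import defaultdict

# Hypothetical edges; only the relation labels come from the slide.
triples = [
    ("CompoundX", "affects", "GeneA"),
    ("GeneA", "involved_in", "Cellular process P"),
    ("CompoundX", "linked_with", "GI toxicity"),
    ("GI toxicity", "is_a", "Pathology"),
    ("CompoundX", "linked_with", "Diarrhoea"),
    ("Diarrhoea", "observed_in", "Rat"),
]


def build_index(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return index


def reachable(index, start):
    """Collect every triple reachable from start by following edges."""
    found = []
    visited = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for relation, obj in index.get(node, []):
            found.append((node, relation, obj))
            if obj not in visited:
                visited.add(obj)
                stack.append(obj)
    return found


index = build_index(triples)
facts = reachable(index, "CompoundX")
for fact in facts:
    print(fact)
```

Starting from a compound and collecting everything it touches gives the kind of evidence trail the slide describes: genes affected, processes involved, and the observations and species linked to them.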
27
Data source integration
28
Workflow technology

Enables scientists to use, modify and implement solutions that specialist groups help them put in place
Removes (in principle) the need to make extensive IS projects for new data types

29
The Knowledge Technology Ziggurat
[Figure: layered pyramid, top to bottom]
Create: Modelling
Builds on
Integrate: Information Structuring
Builds on
Extract: Fact Extraction (Text Mining)
Builds on
Find: Document Retrieval and Storage
Builds on
Content Licensing Access
Side labels: Decision Making Process, Unstructured Information
30
Bio and Chemo Informatics Joins to Aid Target Selection
Sequences
Structures
Chemistry
31
What do we need to do?
Clinical Practice
Chemistry
Biology
32
Hypothesis Generation Using Informatics/Modelling
Proteins
Testicular Degeneration
Candidate Compound
33
A multidimensional jigsaw puzzle