Title: Bioinformatics from a drug discovery perspective
1Bioinformatics from a drug discovery perspective
- EMBRACE Workshop, 22-23 March 2007
- Niclas Jareborg
- AstraZeneca RD Södertälje
2AstraZeneca Drug Discovery
- Research Areas
- CV/GI (Cardiovasc/Gastrointest), RIRA (Resp/Infl), CNS/Pain, Cancer, Infection
- Discovery Sites
- UK
- Charnwood (RIRA), Alderley Park (Cancer, CV/GI, RIRA)
- North America
- Boston (Cancer, Infection), Wilmington (CNS/Pain), Montreal (CNS/Pain)
- Sweden
- Lund (RIRA), Mölndal (CV/GI), Södertälje (CNS/Pain)
- India
- Bangalore (Infection)
- Bioinformatics
- All RAs have their own bioinformatics teams
- Infrastructure at Alderley Park (databases, large Linux clusters)
- IS organisation
3A target is defined as
- ... a biological target protein on which a chemical entity (e.g. a drug molecule) exerts its action
- A drug target must be associated with a disease
4Drug discovery process
Diagram labels: Genes, Protein, Target identification, Target validation, Assay, Compound library, Hit identification (HTS), Hit, Hit to lead (Lead identification), Lead optimisation, Candidate drug, Clinical trials, Effort
5Target Definition
- Alternative Splicing
- Identify pharmacologically relevant target variant(s)
- Sequence variation
- Function
- Target
- Metabolizing enzyme
- Binding of substance
- Identify most common variant
- Might differ in different populations!
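The point about the most common variant differing between populations can be illustrated with a small tally. This is only a sketch: the population names, variant labels and observations are invented for illustration and do not come from any real data set.

```python
from collections import Counter

# Hypothetical observations: (population, variant label).
# All names and counts below are made up for illustration.
observations = [
    ("population_A", "variant_1"),
    ("population_A", "variant_1"),
    ("population_A", "variant_2"),
    ("population_B", "variant_2"),
    ("population_B", "variant_2"),
    ("population_B", "variant_1"),
]


def most_common_variant_by_population(obs):
    """Return {population: (variant, frequency)} for the most frequent variant."""
    counts = {}
    for population, variant in obs:
        counts.setdefault(population, Counter())[variant] += 1
    result = {}
    for population, counter in counts.items():
        variant, n = counter.most_common(1)[0]
        result[population] = (variant, n / sum(counter.values()))
    return result


summary = most_common_variant_by_population(observations)
for population, (variant, freq) in summary.items():
    print(f"{population}: {variant} ({freq:.0%})")
```

In this toy data the most frequent variant is different in the two populations, which is exactly the situation the slide warns about.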
6Target Definition
- Expression
- Is the target expressed in a relevant human tissue?
- Databases
- Microarrays
- Immunohistochemistry
- In situ hybridization
- Proteomics
- Literature
7Target Definition
- Selectivity
- How similar are related proteins?
- Do similar proteins have functions that we do not want to affect?
- Animal models
- Orthologous genes
- Same family size?
- Splice variants
- Same as in human?
- Polymorphisms
- Differences between inbred strains
- Tissue expression
- Overlap human?
- Available transgenes or knock-outs
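One simple way to put a number on "how similar are related proteins?" is percent identity over an existing pairwise alignment. The sketch below assumes the two sequences are already aligned, with "-" as the gap character; the sequences are invented, and a real selectivity analysis would use a proper alignment tool and more than one similarity measure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments of a target and a related protein.
target = "MKT-LLVA"
paralogue = "MKTALIVA"
identity = percent_identity(target, paralogue)
print(f"{identity:.1f}% identity")
```

The same function could be applied to human/animal orthologue pairs when judging whether an animal model is likely to be representative.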
8Genetics Bioinformatics
Bioinformatics input to the drug discovery process
Phases: Research, Development, Commercialisation
Milestones: MS1, MS2, MS3, MS4, MS5
Stages: Target Identification, Hit Identification, Lead Identification, Lead Optimisation, CD Pre-nomination, Development for Launch, Registration, Launch, Sales
Bioinformatics input:
- Support target identification
- Primary screening: identify polymorphic and splice variants
- Selectivity screening: identify paralogues
- Support choice of model organism(s)
- Support biomarker identification
- Flag up population variants in target
9In-house generated gene centric information
resource
10In-house generated gene centric information
resource
11Target identification
Targets from different experimental approaches
as well as validation using different technologies
EST sequencing campaigns
Genetics/genome information
Proteomics
Literature
Differential biology
Target Candidates
In silico
Microarrays (Affymetrix, glass etc.)
Validation (in silico, lab bench)
Validation as potential targets
Specificity / selectivity
12Target identification
30000 human genes
What?
Link to disease?
Where?
Novel?
1 potential target
13The human genome offers many potential drug
targets
14Current Drug Targets - few target classes
Based on 483 drugs in Goodman and Gilman's "The Pharmacological Basis of Therapeutics"
Samuel Svensson, PhD AstraZeneca RD Södertälje
15Number of druggable targets smaller than expected?
Only a subfraction of gene products play a direct role in disease pathophysiology
30000 human genes
Druggable genome: 2,000-3,000 genes
500 GPCRs, 50 NHRs, >200 ion channels, >1,000 enzymes (e.g. 450 proteases, 500 kinases, >200 others)
Pathogens, commensal gut bacteria genes
< 5,000 targets for small molecule drugs
2,000-3,000 druggable targets
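The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above and treats the ">" figures as lower bounds; the variable names are illustrative.

```python
# Numbers taken from the slide; ">" figures are treated as lower bounds.
total_genes = 30000
druggable_low, druggable_high = 2000, 3000

target_class_lower_bounds = {
    "GPCRs": 500,
    "NHRs": 50,
    "ion channels": 200,
    "enzymes": 1000,
}

fraction_low = druggable_low / total_genes
fraction_high = druggable_high / total_genes
class_total = sum(target_class_lower_bounds.values())

print(f"Druggable fraction of the genome: {fraction_low:.1%} to {fraction_high:.1%}")
print(f"Sum of listed target classes (lower bound): {class_total}")
```

The listed classes add up to at least 1,750 genes, which is consistent with the 2,000-3,000 estimate, and the druggable fraction works out at roughly 7-10% of 30,000 genes.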
16Updating the (shrinking?) Targetome
Down to 22K? (see PMID 15174140)
Some of the 120 InterPro domains are unpromising
Many potential targets are still functional orphans
Realistically nearer 2000?
OMIM still only at 1900, and only low numbers of robust genetic association results
17Current trends
- Blue sky genomics -> literature
- Finding unknown targets -> prioritizing the lists
- Moving from single target focus
- Comparing and ranking of target candidates
- Integration of relevant but disparate data sources
- Better understanding of the target neighbourhood
- Disease mechanism
- Biomarkers
- Toxicology
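Comparing and ranking target candidates across disparate evidence can be sketched as a weighted score. Everything here is an assumption for illustration: the criteria names, the weights, the candidate names and the evidence scores are hypothetical and are not an AstraZeneca scoring scheme.

```python
# Hypothetical criteria and weights (must be agreed per project in practice).
weights = {
    "disease_link": 0.4,
    "tissue_expression": 0.3,
    "selectivity": 0.2,
    "novelty": 0.1,
}

# Hypothetical evidence scores in the range 0..1 for three made-up candidates.
candidates = {
    "target_A": {"disease_link": 0.9, "tissue_expression": 0.5, "selectivity": 0.4, "novelty": 0.2},
    "target_B": {"disease_link": 0.5, "tissue_expression": 0.9, "selectivity": 0.8, "novelty": 0.6},
    "target_C": {"disease_link": 0.2, "tissue_expression": 0.3, "selectivity": 0.9, "novelty": 0.9},
}


def score(evidence):
    """Weighted sum of evidence scores; missing criteria count as zero."""
    return sum(weights[name] * evidence.get(name, 0.0) for name in weights)


ranked = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {score(candidates[name]):.2f}")
```

A weighted sum is the simplest possible choice; it makes the trade-offs explicit, but the ranking is only as good as the evidence scores and weights fed into it.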
18Sources of Contextual Information
80
20
Current approach to retrieving information from unstructured sources is through manual extraction, i.e. finding documents and reading them!
- Internal Docs
- Tox Reports, Clinical Trial Reports.
- External Docs
- Patents: USPTO, WIPO, EP, etc
- Literature: Medline, Embase
- Press Releases
- competitor, supplier, collaborator, academic (etc)
- Government Agencies
- Conference Proceedings
- News Feeds
- Internal Chemical Dbs
- Internal Biological Dbs
- External, Commercial Dbs
- GVK Bio, Ingenuity IPA
- External Public Dbs
- EMBL, PDB, dbSNP, etc
19Dissecting the Decision Making Process
Finding
Extracting
Integrating
Creating
- Locating relevant documents and information
- Retrieving them in a useable format
- Reading information
- Locating the facts within documents
- Understanding what it means
- Putting the information into context
- Turning information into knowledge
- Developing new hypotheses
- Input into decision making
20Issues with the Manual Approach
Finding
Extracting
Integrating
Creating
- Difficult to capture breadth
- Chance to miss things
- White space in failing to find things
- Limited time to read things
- Focus on reviews and summaries
- Based on individual scientists' own knowledge
- Narrow
- Biased
- Hypotheses are per project
- Reactive not proactive
21Text mining
- Sources
- Literature
- Patents
- In-house reports
- Information
- Protein-protein interactions
- Tissue expression
- Pharmacological differences
- Splice variants, Polymorphisms
- Species
- Toxicology
- etc
22Emerging Systems: Text Mining
- Extraction of facts from unstructured data sources
- Natural Language Processing, Ontologies
- Linguamatics I2E
- Knowledgebase generation
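To give a feel for fact extraction from unstructured text, here is a deliberately naive sketch: dictionary-based entity recognition followed by sentence-level co-occurrence. It is not how Linguamatics I2E works and uses no real NLP or ontology; the gene dictionary reuses symbols that appear elsewhere in this deck, and the input text is invented.

```python
import re
from itertools import combinations

# Small gene-symbol dictionary (symbols taken from the entity-relationship slide).
GENE_DICTIONARY = {"BCL2", "PARP", "TNF", "CASP9", "CASP3", "CASP8", "MTPN"}


def extract_cooccurrences(text):
    """Return (gene_a, gene_b, sentence) for each gene pair sharing a sentence."""
    facts = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = re.findall(r"[A-Za-z0-9]+", sentence)
        found = sorted({t for t in tokens if t in GENE_DICTIONARY})
        for a, b in combinations(found, 2):
            facts.append((a, b, sentence))
    return facts


# Invented toy text, not taken from any publication.
toy_text = (
    "CASP9 is mentioned together with CASP3 here. "
    "TNF appears alone. "
    "BCL2 and CASP9 share this sentence."
)
facts = extract_cooccurrences(toy_text)
pairs = [(a, b) for a, b, _ in facts]
print(pairs)
```

Keeping the source sentence with each pair gives a minimal evidence trail, which matters when the extracted facts are later loaded into a knowledge base.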
23Biomedical Entity-Relationship Data
BCL2
PARP
TNF
CASP9
CASP3
CASP8
MTPN
24Pilot Systems: Pathway Analysis (Ingenuity IPA)
25BER System in Action
Evidence Trail
Literature
Gene Expression
ERSystem (Gene/Metabolite Knowledgebase)
- Significant Biological
- Entity List
- Gene List
- Protein List
- Metabolite List
Biological environment of the list.
Hypothesis Generation
Proteomic
Metabonomic
Canonical pathways associated with the list
Question: What is the underlying biology, pathology, physiology etc associated with this list of entities? What is it telling me?
Genetic
Diseases, Biological processes associated with
the list
26Structuring the Knowledge
Delivers facts as networks of information: Knowledge Bases
GI Tox Knowledge Map
Species: Human, Rat, Dog, etc.
Observed in
Clinical Observations: Diarrhoea, Vomiting, Loose Stools, Bloating, Nausea, etc.
Observed in
Affects
Linked with
Compound
Genes
Is a
Linked with
Affects
Pathology: GI toxicity, GI pathology
Linked with
Involved in
Affects
Involved in
Cellular Processes
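A knowledge map of this kind can be represented as subject-relation-object triples. In the sketch below the relation names ("Affects", "Linked with", "Observed in", "Is a", "Involved in") are the ones on the slide, but which entities each relation connects is an assumption, and the entity names compound_X, gene_A and cellular_process_P are hypothetical placeholders.

```python
from collections import defaultdict

# Illustrative triples; the endpoints of each relation are assumptions.
triples = [
    ("compound_X", "Affects", "gene_A"),
    ("gene_A", "Involved in", "cellular_process_P"),
    ("compound_X", "Linked with", "GI toxicity"),
    ("GI toxicity", "Is a", "GI pathology"),
    ("Diarrhoea", "Observed in", "Rat"),
]

# Index triples by subject for fast lookup.
index = defaultdict(list)
for subject, relation, obj in triples:
    index[subject].append((relation, obj))


def neighbours(entity, relation=None):
    """Objects directly connected to an entity, optionally filtered by relation."""
    return [o for r, o in index.get(entity, []) if relation is None or r == relation]


def reachable(start):
    """All entities reachable from start by following relations forwards."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, obj in index.get(node, []):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


print(neighbours("compound_X", "Affects"))
print(sorted(reachable("compound_X")))
```

Following relations outwards from a compound is one way to ask the kind of question the map is meant to answer, for example which processes and pathologies a compound might be connected to.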
27Data source integration
28Workflow technology
- Enables scientists to use, modify and implement solutions that specialist groups help them put in place
- Removes (in principle) the need for extensive IS projects for new data types
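The idea behind workflow technology can be sketched as a list of small, reusable steps that a scientist can reorder or swap without a new software project. This is a toy illustration in plain Python, not a representation of any particular workflow product; the step names and input records are hypothetical.

```python
from typing import Callable, List

# A step takes a list of records and returns a new list of records.
Step = Callable[[List[str]], List[str]]


def run_workflow(steps: List[Step], data: List[str]) -> List[str]:
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        data = step(data)
    return data


def drop_empty(records: List[str]) -> List[str]:
    return [r for r in records if r.strip()]


def normalise(records: List[str]) -> List[str]:
    return [r.strip().upper() for r in records]


def deduplicate(records: List[str]) -> List[str]:
    # dict.fromkeys keeps the first occurrence and preserves order.
    return list(dict.fromkeys(records))


# The workflow is just data: steps can be added, removed or reordered.
workflow = [drop_empty, normalise, deduplicate]
result = run_workflow(workflow, [" bcl2", "TNF ", "", "tnf"])
print(result)
```

Supporting a new data type then means writing or adapting one step rather than rebuilding the whole pipeline.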
29The Knowledge Technology Ziggurat
Modelling
Create
Builds on
Information Structuring
Integrate
Builds on
Decision Making Process
Fact Extraction (Text Mining)
Extract
Builds on
Document Retrieval and Storage
Find
Builds on
Content Licensing Access
Unstructured Information
30Bio and Chemo Informatics Joins to Aid Target
Selection
Sequences
Structures
Chemistry
31What do we need to do ?
Clinical Practice
Chemistry
Biology
32Hypothesis Generation Using Informatics/Modelling
Proteins
Testicular Degeneration
Candidate Compound
33A multidimensional jigsaw puzzle
- Target - Biological mechanisms - Disease
- Target/Off-target - Biological mechanisms - Toxicology
- Polymorphisms
- Splice variants
- Interaction partners
- Tissues
- Compounds
- Animal models
- etc etc etc
34Current needs
- Pathways / Systems biology
- Mining of unstructured data
- Connect biology and chemistry informatics domains
- System / data integration
- Ontologies!
- Workflow technology
35AZ - EBI
- AZ member of the Industry programme
- Training and Education
- Network meetings
- Research, Standards