Bioinformatics from a drug discovery perspective - PowerPoint PPT Presentation

1
Bioinformatics from a drug discovery perspective
  • EMBRACE Workshop, 22-23 March 2007
  • Niclas Jareborg
  • AstraZeneca R&D Södertälje

2
AstraZeneca Drug Discovery
  • Research Areas
  • CV/GI (Cardiovasc/Gastrointest), RIRA
    (Resp/Infl), CNS/Pain, Cancer, Infection
  • Discovery Sites
  • UK
  • Charnwood (RIRA), Alderley Park (Cancer, CV/GI,
    RIRA)
  • North America
  • Boston (Cancer, Infection), Wilmington
    (CNS/Pain), Montreal (CNS/Pain)
  • Sweden
  • Lund (RIRA), Mölndal (CV/GI), Södertälje
    (CNS/Pain)
  • India
  • Bangalore (Infection)
  • Bioinformatics
  • All RAs have their own bioinformatics teams
  • Infrastructure at Alderley Park (dbs, large
    Linux clusters)
  • IS organisation

3
A target is defined as
  • ... a biological target protein on which a
    chemical entity (e.g. a drug molecule) exerts its
    action
  • A drug target must be associated with a disease

4
Drug discovery process
  • Stages: Target identification, Target validation,
    Hit identification (HTS), Hit to lead (Lead
    identification), Lead optimisation, Candidate
    drug, Clinical trials
  • Other diagram labels: Genes, Protein, Assay,
    Compound library, Hit, Effort

5
Target Definition
  • Alternative Splicing
  • Identify pharmacologically relevant target
    variant(s)
  • Sequence variation
  • Function
  • Target
  • Metabolizing enzyme
  • Binding of substance
  • Identify most common variant
  • Might differ in different populations!
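
The point about the most common variant differing between populations can be illustrated with a minimal sketch. The population names and allele observations below are made up for illustration and are not data from the presentation.

```python
from collections import Counter


def most_common_variant(observed_alleles):
    """Return (allele, frequency) for the most frequent allele in one population."""
    counts = Counter(observed_alleles)
    allele, n = counts.most_common(1)[0]
    return allele, n / len(observed_alleles)


# Hypothetical allele observations at one variant site of a target gene.
populations = {
    "population_A": ["G"] * 80 + ["A"] * 20,
    "population_B": ["G"] * 35 + ["A"] * 65,
}

summary = {name: most_common_variant(alleles) for name, alleles in populations.items()}

# True when the populations do not share the same most common allele.
differs = len({allele for allele, _ in summary.values()}) > 1

for name, (allele, freq) in summary.items():
    print(f"{name}: most common allele {allele} at frequency {freq:.2f}")
print("most common variant differs between populations:", differs)
```

In this toy case the two populations disagree on the most common allele, which is the situation the slide warns about when choosing the variant to screen against.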

6
Target Definition
  • Expression
  • Is the target expressed in a relevant human
    tissue?
  • Databases
  • Microarrays
  • Immunohistochemistry
  • In situ hybridization
  • Proteomics
  • Literature

7
Target Definition
  • Selectivity
  • How similar are related proteins?
  • Do similar proteins have functions that we do not
    want to affect?
  • Animal models
  • Orthologous genes
  • Same family size?
  • Splice variants
  • Same as in human?
  • Polymorphisms
  • Differences between inbred strains
  • Tissue expression
  • Overlap human?
  • Available transgenes or knock-outs

8
Genetics Bioinformatics
Bioinformatics input to the drug discovery process
  • Phases
  • Research, Development, Commercialisation
  • Milestones
  • MS1, MS2, MS3, MS4, MS5
  • Process stages
  • Target Identification, Hit Identification, Lead
    Identification, Lead Optimisation, CD
    Pre-nomination, Development for Launch,
    Registration, Launch, Sales
  • Bioinformatics input
  • Support target identification
  • Primary screening: identify polymorphic and splice
    variants
  • Selectivity screening: identify paralogues
  • Support choice of model organism(s)
  • Support biomarker identification
  • Flag up population variants in target

9
In-house generated gene centric information
resource
10
In-house generated gene centric information
resource
11
Target identification
Targets from different experimental approaches
as well as validation using different technologies
ESTs sequencing campaigns
Genetics/genome information
Proteomics
Literature
Differential biology
Target Candidates
In silico
Microarrays (Affymetrix, glass etc.)
Validation (in silico, lab bench)
Validation as potential targets
Specificity / selectivity
12
Target identification
30000 human genes
What?
Link to disease?
Where?
Novel?
1 potential target
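
The funnel from many genes to one potential target can be sketched as a chain of yes/no filters. The field names, the reading of each question ("What?" as known function, "Where?" as expression in a relevant tissue) and the gene records are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    symbol: str
    function_known: bool       # "What?"
    disease_link: bool         # "Link to disease?"
    expressed_in_tissue: bool  # "Where?"
    novel: bool                # "Novel?"


# Questions from the slide, applied in order; the predicates are assumptions.
FILTERS = [
    ("What?", lambda g: g.function_known),
    ("Link to disease?", lambda g: g.disease_link),
    ("Where?", lambda g: g.expressed_in_tissue),
    ("Novel?", lambda g: g.novel),
]


def funnel(genes):
    """Apply each filter in turn and record how many genes remain after it."""
    remaining = list(genes)
    trail = [("start", len(remaining))]
    for label, keep in FILTERS:
        remaining = [g for g in remaining if keep(g)]
        trail.append((label, len(remaining)))
    return remaining, trail


# Hypothetical gene records.
genes = [
    Gene("GENE_A", True, True, True, True),
    Gene("GENE_B", True, True, False, True),
    Gene("GENE_C", True, False, True, True),
    Gene("GENE_D", False, True, True, True),
    Gene("GENE_E", True, True, True, False),
]

survivors, trail = funnel(genes)
print([g.symbol for g in survivors])
print(trail)
```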
13
The human genome offers many potential drug
targets
14
Current Drug Targets - few target classes
Based on 483 drugs in Goodman and Gilman's "The
Pharmacological Basis of Therapeutics"
Samuel Svensson, PhD, AstraZeneca R&D Södertälje
15
Number of druggable targets smaller than expected?
Only a subfraction of gene products play a direct
role in disease pathophysiology
30000 human genes
Druggable genome: 2,000-3,000 genes (500 GPCRs, 50
NHRs, >200 ion channels, >1,000 enzymes, e.g. 450
proteases, 500 kinases, >200 others)
Pathogen and commensal gut bacteria genes
<5,000 targets for small molecule drugs
2,000-3,000 druggable targets
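
As a quick arithmetic check of these figures, the sketch below sums the per-class counts and expresses the druggable range as a share of the genome. It uses only numbers quoted on the slide. Treating each ">" figure as a lower bound is an assumption.

```python
GENOME_SIZE = 30000             # "30000 human genes"
DRUGGABLE_RANGE = (2000, 3000)  # "2-3,000 druggable targets"

# Per-class counts from the slide; ">" values are treated as lower bounds.
CLASS_LOWER_BOUNDS = {"GPCRs": 500, "NHRs": 50, "ion channels": 200, "enzymes": 1000}
ENZYME_EXAMPLES = {"proteases": 450, "kinases": 500, "others": 200}

classes_total = sum(CLASS_LOWER_BOUNDS.values())  # 1750
enzymes_total = sum(ENZYME_EXAMPLES.values())     # 1150
druggable_share = tuple(n / GENOME_SIZE for n in DRUGGABLE_RANGE)

print(f"class lower bounds sum to {classes_total}")
print(f"enzyme examples sum to {enzymes_total}")
print(f"druggable share of genome: {druggable_share[0]:.1%} to {druggable_share[1]:.1%}")
```

The class lower bounds add up to 1750, just under the 2,000-3,000 range, and the enzyme examples (1150) agree with the ">1,000 enzymes" figure. On these numbers the druggable genome is roughly 7 to 10 percent of the 30000 genes.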
16
Updating the (shrinking?) Targetome
Down to 22K? (see PMID 15174140)
Some of the 120 InterPro domains are unpromising;
many potentials are still functional orphans
Realistically nearer 2000?
OMIM still only at 1900, and only low numbers of
robust genetic association results
17
Current trends
  • Blue sky genomics -> literature
  • Finding unknown targets -> prioritizing the
    lists
  • Moving from single target focus
  • Comparing and ranking of target candidates
  • Integration of relevant but disparate data
    sources
  • Better understanding of the target
    neighbourhood
  • Disease mechanism
  • Biomarkers
  • Toxicology
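
Comparing and ranking target candidates across disparate data sources could, in its simplest form, be a weighted sum of normalised evidence scores. The evidence categories, weights, candidate names and scores below are hypothetical. The presentation does not describe a scoring scheme.

```python
# Hypothetical weights per evidence type (they sum to 1.0).
WEIGHTS = {"disease_link": 0.5, "tissue_expression": 0.3, "selectivity": 0.2}


def score(evidence):
    """Weighted sum of evidence scores in [0, 1]; missing evidence counts as 0."""
    return sum(weight * evidence.get(kind, 0.0) for kind, weight in WEIGHTS.items())


# Made-up candidates with evidence pulled from different (imaginary) sources.
candidates = {
    "TARGET_1": {"disease_link": 0.9, "tissue_expression": 0.4, "selectivity": 0.5},
    "TARGET_2": {"disease_link": 0.6, "tissue_expression": 0.9, "selectivity": 0.9},
    "TARGET_3": {"disease_link": 0.2, "tissue_expression": 0.8},
}

ranked = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {score(candidates[name]):.2f}")
```

Counting missing evidence as zero penalises poorly characterised candidates. Whether that is desirable depends on how much weight a project puts on novelty.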

18
Sources of Contextual Information
  • Structured
  • Unstructured

80 / 20
Current approach to retrieving information from
unstructured sources is through manual
extraction, i.e. finding documents and reading
them!
  • Internal Docs
  • Tox Reports, Clinical Trial Reports.
  • External Docs
  • Patents: USPTO, WIPO, EP, etc
  • Literature: Medline, Embase
  • Press Releases
  • competitor, supplier, collaborator, academic
    (etc)
  • Government Agencies
  • Conference Proceedings
  • News Feeds
  • Internal Chemical Dbs
  • Internal Biological Dbs
  • External, Commercial Dbs
  • GVK Bio, Ingenuity IPA
  • External Public Dbs
  • EMBL, PDB, dbSNP, etc

19
Dissecting the Decision Making Process
Finding
Extracting
Integrating
Creating
  • Locating relevant documents and information
  • Retrieving them in a useable format
  • Reading information
  • Locating the facts within documents
  • Understanding what it means
  • Putting the information into context
  • Turning information into knowledge
  • Developing new hypotheses
  • Input into decision making

20
Issues with the Manual Approach
Finding
Extracting
Integrating
Creating
  • Difficult to capture breadth
  • Chance to miss things
  • White space in failing to find things
  • Limited time to read things
  • Focus on reviews and summaries
  • Based on individual scientist's own knowledge
  • Narrow
  • Biased
  • Hypotheses are per project
  • Reactive not proactive

21
Text mining
  • Sources
  • Literature
  • Patents
  • In-house reports
  • Information
  • Protein-protein interactions
  • Tissue expression
  • Pharmacological differences
  • Splice variants, Polymorphisms
  • Species
  • Toxicology
  • etc

22
Emerging Systems: Text Mining
  • Extraction of facts from unstructured data
    sources
  • Natural Language Processing, Ontologies
  • Linguamatics I2E
  • Knowledgebase generation
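
Extraction of facts from unstructured text can be illustrated with a deliberately naive pattern matcher that pulls "entity verb entity" statements out of sentences. This is a toy sketch and not how Linguamatics I2E or any real NLP system works. The verb list and the example text are assumptions, and the entity dictionary reuses gene symbols that appear elsewhere in the presentation.

```python
import re

# Small entity dictionary (a stand-in for an ontology or gene-name lexicon).
GENE_SYMBOLS = {"BCL2", "PARP", "TNF", "CASP9", "CASP3", "CASP8", "MTPN"}
# Hypothetical list of relation verbs to look for.
INTERACTION_VERBS = ("activates", "inhibits", "cleaves", "binds")

PATTERN = re.compile(
    r"\b(?P<subject>[A-Z0-9]+)\s+(?P<verb>"
    + "|".join(INTERACTION_VERBS)
    + r")\s+(?P<object>[A-Z0-9]+)\b"
)


def extract_facts(text):
    """Return (subject, verb, object) triples where both ends are known entities."""
    facts = []
    for match in PATTERN.finditer(text):
        subject, obj = match.group("subject"), match.group("object")
        if subject in GENE_SYMBOLS and obj in GENE_SYMBOLS:
            facts.append((subject, match.group("verb"), obj))
    return facts


# Toy text written for this example, not quoted from any document.
text = (
    "In this toy example CASP9 activates CASP3. "
    "Later, CASP3 cleaves PARP in the nucleus. "
    "The assay binds nothing relevant."
)
facts = extract_facts(text)
print(facts)
```

Triples of this shape are what a knowledgebase generation step would then store and connect.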

23
Biomedical Entity-Relationship Data
BCL2
PARP
TNF
CASP9
CASP3
CASP8
MTPN
24
Pilot Systems: Pathway Analysis (Ingenuity IPA)
25
BER System in Action
Evidence Trail
Literature
Gene Expression
ER System (Gene/Metabolite Knowledgebase)
  • Significant Biological Entity List
  • Gene List
  • Protein List
  • Metabolite List

Biological environment of the list
Hypothesis Generation
Proteomic
Metabonomic
Canonical pathways associated with the list
Question: What is the underlying biology,
pathology, physiology etc. associated with this
list of entities? What is it telling me?
Genetic
Diseases, Biological processes associated with
the list
26
Structuring the Knowledge
Delivers facts as networks of information: Knowledge
Bases
GI Tox Knowledge Map
  • Entity types
  • Species: Human, Rat, Dog, etc.
  • Clinical Observations: Diarrhoea, Vomiting, Loose
    Stools, Bloating, Nausea, etc.
  • Pathology: GI toxicity, GI pathology
  • Compound
  • Genes
  • Cellular Processes
  • Relationship types
  • Observed in, Affects, Linked with, Is a, Involved
    in
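
A knowledge map of this kind can be held as subject-relation-object triples and searched for a chain of facts linking two entities, which is one way to produce an evidence trail. The relation names come from the slide. Which entities each relation connects is not recoverable from the transcript, so every edge below, and the names COMPOUND_X, GENE_A and "cellular process P", are illustrative assumptions.

```python
from collections import defaultdict, deque

# Relation names as they appear on the knowledge map.
RELATIONS = {"Observed in", "Affects", "Linked with", "Is a", "Involved in"}

# Hypothetical facts; entity names and edges are made up for illustration.
triples = [
    ("COMPOUND_X", "Affects", "GENE_A"),
    ("GENE_A", "Involved in", "cellular process P"),
    ("cellular process P", "Linked with", "GI toxicity"),
    ("Diarrhoea", "Is a", "GI toxicity"),
    ("Diarrhoea", "Observed in", "Rat"),
]
assert all(relation in RELATIONS for _, relation, _ in triples)


def evidence_trail(facts, start, goal):
    """Breadth-first search for a shortest chain of directed facts from start to goal."""
    outgoing = defaultdict(list)
    for subject, relation, obj in facts:
        outgoing[subject].append((relation, obj))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, obj in outgoing[node]:
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, path + [(node, relation, obj)]))
    return None  # no chain of facts connects the two entities


trail = evidence_trail(triples, "COMPOUND_X", "GI toxicity")
for fact in trail:
    print(" -> ".join(fact))
```

The search follows edges in their stated direction only. A real knowledge base would likely also need inverse relations and provenance for each fact.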
27
Data source integration
28
Workflow technology
  • Enables scientists to use, modify and implement
    solutions that specialist groups help them put in
    place
  • Removes (in principle) the need to make
    extensive IS projects for new data types

29
The Knowledge Technology Ziggurat
  • Modelling (Create)
  • Builds on: Information Structuring (Integrate)
  • Builds on: Fact Extraction (Text Mining) (Extract)
  • Builds on: Document Retrieval and Storage (Find)
  • Builds on: Content Licensing Access
  • Other diagram labels: Decision Making Process,
    Unstructured Information

30
Bio and Chemo Informatics Joins to Aid Target
Selection
Sequences
Structures
Chemistry
31
What do we need to do?
Clinical Practice
Chemistry
Biology
32
Hypothesis Generation Using Informatics/Modelling
Proteins
Testicular Degeneration
Candidate Compound
33
A multidimensional jigsaw puzzle
  • Target - Biological mechanisms - Disease
  • Target/Off-target - Biological mechanisms -
    Toxicology
  • Polymorphisms
  • Splice variants
  • Interaction partners
  • Tissues
  • Compounds
  • Animal models
  • etc etc etc

34
Current needs
  • Pathways / Systems biology
  • Mining of unstructured data
  • Connect biology and chemistry informatics domains
  • System / data integration
  • Ontologies!
  • Workflow technology

35
AZ - EBI
  • AZ member of the Industry programme
  • Training and Education
  • Network meetings
  • Research, Standards