Title: Understanding genes using mathematical tools Adam Sartiel COMPUGEN
Short History of Compugen
- 1993: Founded
- 1994: First Bioccelerator sold (Merck)
- 1997: LEADS project initiated
- 1998: Pfizer collaboration
- 1999: USPTO agreement; LabOnWeb launched
- 2000: Launch of Z3; IPO
- 2001: Gencarta and OligoLibraries launched; Novartis collaboration
Unique R&D Team
- Substantial
  - 120 professionals: 32 PhD/MD, 37 M.Sc.
- Multidisciplinary
  - Algorithm development, molecular biology, software engineering, statistics, physics, chemistry
- Integrated
  - Synergy and feedback between disciplines
Gene analysis using mathematics
- Drug discovery and Bioinformatics
- Principles of sequence alignment
- The EST opportunity and the Transcriptome
- Applications (Gencarta and DNA chips)
Cellular pathways are highly complex
The Drug Development Process
Some definitions
- Drug: a protein, lipid, antibody, or small organic molecule that has a proven effect and an approved safety level.
- Lead: a molecule in development that may one day become a drug.
- Target: a protein (in most cases) whose activity a drug or lead affects in order to create a desirable effect in the body.
- Validated target: a target that has a proven, demonstrated effect on a disease or condition.
30,000 GENES?
- Fewer genes than initially thought?
- Some complexity is due to alternative splicing
- Gene prediction is problematic
  - Complex genes (interleaved, nested, ...) are especially difficult to identify
  - Both HGP and Celera tried to minimize false positives
- Conclusion: more genes may be found
Wright et al., Genome Biology 2001, 2(7): there are 65,000 to 75,000 genes
ONE GENE, ONE PROTEIN???
Gene identification using sequence comparison
Similar sequences, common ancestor...
Understand genes: know your targets
... common ancestor, similar function
The genetic code is redundant
Proteins see deeper
Unrelated DNA sequences?
Highly related proteins!
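The two slides above can be illustrated with a small toy sketch. It is not Compugen code. It builds the standard genetic code (NCBI translation table 1) from its compact 64-letter form. It then translates two hypothetical DNA fragments, `seq_a` and `seq_b`, that are built from synonymous codons. The fragments agree at only 5 of 12 nucleotides, yet they encode the same peptide.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated with bases in T, C, A, G
# order, first codon position varying slowest; '*' marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical coding fragments (illustration only), built from synonymous codons.
seq_a = "ATGCTTTCTCGT"  # ATG CTT TCT CGT
seq_b = "ATGTTAAGCAGA"  # ATG TTA AGC AGA

print(AMINO_ACIDS.count("L"), AMINO_ACIDS.count("S"), AMINO_ACIDS.count("R"))  # 6 6 6
print(f"DNA identity:     {identity(seq_a, seq_b):.2f}")  # 0.42
print(f"Protein identity: {identity(translate(seq_a), translate(seq_b)):.2f}")  # 1.00
print(translate(seq_a), translate(seq_b))  # MLSR MLSR
```

Leucine, serine, and arginine each have six codons. This is why DNA-level comparison can miss relationships that are obvious at the protein level.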
How to align proteins?
MARQGEFPSILK
M-RHGEFP-LLKWC
Running a good alignment algorithm against 2001-era databases requires supercomputers
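The slide does not name the alignment algorithm. As a hedged illustration, the sketch below aligns the example proteins with textbook Needleman-Wunsch global alignment. The second protein is the slide's sequence with its gaps removed. The scores are toy values (match +1, mismatch -1, gap -1) rather than a substitution matrix such as BLOSUM62 with affine gaps. The function name and scores are assumptions. Filling the score table costs O(n·m) per sequence pair. Searching a whole database multiplies that cost by the number of sequences compared, which is why the slide calls for supercomputers.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b with toy linear scores; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, top, bottom = needleman_wunsch("MARQGEFPSILK", "MRHGEFPLLKWC")
print(best)  # 2
print(top)
print(bottom)
```

Several alignments tie at the optimal score. The gap placement printed may therefore differ from the slide's, even though the score is the same.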
Another direction: find genes by sequence
- Gene regions have a different nucleotide composition than non-coding regions
- Introns and exons have distinct sequence characteristics
- Splice junctions are clearly detectable
ACGATCGAGCATGCATCATCAGCATCTAGCGATCAGCAGGCATCGAGCAG
CTAGCATGCATG
TGCTAGCACGTACGTAGTAGTCGTAGATCGCTAGTCGTCCGTAGCTCGTC
GATCGTACGTCAC
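Here is a minimal sketch of two of the signals listed above, under simplifying assumptions:
- `gc_profile` measures composition as the GC fraction in a sliding window.
- `candidate_introns` enumerates spans that obey the canonical GT...AG splice rule.

Both function names, the window sizes, and the minimum intron length are hypothetical choices. Practical gene finders combine many such signals statistically, for example with hidden Markov models and splice-site weight matrices.

```python
def gc_profile(seq: str, window: int = 20, step: int = 10) -> list[tuple[int, float]]:
    """(window_start, GC fraction) for each window; a crude proxy for nucleotide composition."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((start, (chunk.count("G") + chunk.count("C")) / window))
    return profile


def candidate_introns(seq: str, min_len: int = 20) -> list[tuple[int, int]]:
    """Spans [start, end) that begin with a GT donor and end with an AG acceptor.

    Only the canonical GT...AG rule is checked (an assumption); most such spans are not real introns.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptor_ends = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return [(d, e) for d in donors for e in acceptor_ends if e - d >= min_len]


# First line of the example sequence shown on the slide.
seq = "ACGATCGAGCATGCATCATCAGCATCTAGCGATCAGCAGGCATCGAGCAG"
for start, gc in gc_profile(seq):
    print(f"window at {start:2d}: GC = {gc:.2f}")
print(len(candidate_introns(seq)), "GT...AG spans of at least 20 nt")
```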
One step ahead: the story of the ESTs
Craig Venter
Public-domain ESTs (Expressed Sequence Tags): > 5,000,000
The ESTs: Rough Diamonds?
- Short, inaccurate, badly annotated
- Full of repeats and alternative splicing
- Too many
- The shredder effect
USING ESTs TO GET THE TRANSCRIPTOME
Input: GenBank, a pool of ESTs and mRNAs
Process 1: Clustering
Process 2: Assembly
Output: The transcriptome
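The slide does not say how LEADS clusters or assembles ESTs, so the sketch below is a toy stand-in for the two processes:
- Clustering: single-linkage grouping of reads that share an exact k-mer, using union-find.
- Assembly: greedy merging of the largest suffix-prefix overlap.

All names, the k-mer size, and the minimum overlap are assumptions. Real EST data would also require handling sequencing errors, repeats, reverse-complemented reads, and alternative splicing, none of which this sketch handles.

```python
from itertools import permutations


def kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster(reads: list[str], k: int = 8) -> list[list[str]]:
    """Process 1 (toy): single-linkage clustering; reads sharing any exact k-mer end up together."""
    parent = list(range(len(reads)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen: dict[str, int] = {}
    for idx, read in enumerate(reads):
        for kmer in kmers(read, k):
            if kmer in first_seen:
                parent[find(idx)] = find(first_seen[kmer])  # union the two clusters
            else:
                first_seen[kmer] = idx
    groups: dict[int, list[str]] = {}
    for idx, read in enumerate(reads):
        groups.setdefault(find(idx), []).append(read)
    return list(groups.values())


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(reads: list[str], min_len: int = 5) -> list[str]:
    """Process 2 (toy): repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]  # drop contained reads
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, j in permutations(range(len(contigs)), 2):
            n = overlap(contigs[i], contigs[j], min_len)
            if n > best_n:
                best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: keep the remaining contigs separate
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)] + [merged]
    return contigs


# Hypothetical ESTs drawn from two made-up transcripts, deliberately interleaved.
ests = ["ATGGCGTACGTTAGCC", "CAGTTTAACCGAGTCA", "CGTTAGCCTAGGATCC", "CCGAGTCAAATGCTTG"]
transcriptome = [contig for group in cluster(ests) for contig in assemble(group)]
print(transcriptome)
# ['ATGGCGTACGTTAGCCTAGGATCC', 'CAGTTTAACCGAGTCAAATGCTTG']
```

The greedy step always takes the largest overlap. As a result, ESTs from two splice variants of one gene could be merged into a single chimeric contig, so telling variants apart needs more than plain assembly.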
The Transcriptome: Definition
- The collection of mRNAs present at any given moment in a cell or tissue, and its behavior over time and across cell states
Introducing the Transcriptome
- The Genome
  - Index to the range of possible proteins
  - Useful as a map and for cross-organism analysis
- The Proteome
  - Describes what actually happens in the cell
  - Complex tools, partial results
- The Transcriptome
  - Golden path: proteome information obtained with DNA technology
Transcriptome applications
- Discovery of new proteins
  - that are present in specific tissues
  - that have specific cell locations
  - that respond to specific cell states
- Discovery of new variants
  - of important genes
  - that increase or decrease the activity of the native protein
Example: Alternative Splicing, One Gene, Multiple mRNAs
[Figure: a pre-mRNA with exons 1 to 6 is alternatively spliced into various mature mRNA transcripts in tissue A, tissue B, and other tissues]
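The exon composition of each transcript did not survive in the figure, so this sketch uses a hypothetical model instead. The gene's first and last exons are constitutive, and each internal exon can be included or skipped (cassette exons). Other mechanisms are not modeled, such as alternative 5' or 3' splice sites, intron retention, and mutually exclusive exons. Under this assumption, a gene with n exons has 2^(n-2) possible mature mRNAs.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """Hypothetical cassette-exon model: keep the first and last exons, include any ordered subset of internal ones."""
    if len(exons) < 2:
        return ["".join(exons)]
    first, internal, last = exons[0], exons[1:-1], exons[-1]
    isoforms = []
    for r in range(len(internal), -1, -1):  # from full length down to every internal exon skipped
        for kept in combinations(internal, r):  # combinations preserve exon order
            isoforms.append(first + "".join(kept) + last)
    return isoforms


# Hypothetical 6-exon gene, one character per exon for readability.
variants = exon_skipping_isoforms(["1", "2", "3", "4", "5", "6"])
print(len(variants))  # 16 (= 2**4 choices for the four internal exons)
print(variants[:3])   # ['123456', '12346', '12356']
```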
Alternative Splicing vs. Contiging
[Figure panels: Contiging; Assembling]
Extreme example of alternative splicing
[Figure: from the genomic locus, PSA RNA → PSA precursor → mature PSA, and a modified mRNA → LM precursor → mature LM protein]
Is This the Only Example?
[Figure: exon maps (exons 1 to 5) of the PSA and KLK-2 genomic loci, with the LM and KLM variants and a stop codon marked]
Validation: Northern Blot
- Like PSA, LM expression is restricted to prostate tissue
- Multiple bands may reflect conserved regions or alternative splicing
Example: receptor with DN
DN = dominant negative
Natural Antisense: a regulation mechanism?
LEADS Antisense Prediction
- When analyzing EST data for antisense transcripts:
  - Use the original EST orientation annotation
  - Check splicing signals on both strands
  - Examine the library description for the enzymes used
  - Mark PolyA signals and PolyA tails (compare to genomic PolyA)
  - Take NotI sites into account
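This is a hedged sketch of how some of these cues could be combined into a strand call. The voting scheme, the function names, and the 10-nt tail threshold are assumptions, not the LEADS rules. The `introns` argument is assumed to hold the genomic sequence between the EST's aligned blocks, written in the EST's own orientation. Orientation annotation, library enzymes, and NotI sites are not modeled.

```python
POLYA_SIGNAL = "AATAAA"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def strand_votes(est: str, introns: list[str], tail_len: int = 10) -> int:
    """Hypothetical vote: positive means the EST reads in mRNA (sense) orientation, negative means antisense."""
    est = est.upper()
    votes = 0
    # Poly-A tail at the 3' end vs. poly-T at the 5' end (the tail read from the other strand).
    if est.endswith("A" * tail_len):
        votes += 1
    if est.startswith("T" * tail_len):
        votes -= 1
    # Polyadenylation signal on this strand vs. on the reverse complement.
    if POLYA_SIGNAL in est:
        votes += 1
    if POLYA_SIGNAL in revcomp(est):
        votes -= 1
    # Splice signals on both strands: GT...AG here, or CT...AC (i.e. GT...AG on the other strand).
    for intron in (seq.upper() for seq in introns):
        if intron.startswith("GT") and intron.endswith("AG"):
            votes += 1
        elif intron.startswith("CT") and intron.endswith("AC"):
            votes -= 1
    return votes


# Hypothetical EST and intron, for illustration only.
est = "GGCATCGATCGAATAAAGC" + "A" * 12
intron = "GTAAGTCCTTTCAG"
print(strand_votes(est, [intron]))                    # 3
print(strand_votes(revcomp(est), [revcomp(intron)]))  # -3
```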
Example: A Putative SNP
Cluster T07189, position 347
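A putative SNP like this one appears as a column in a cluster's multiple alignment where the ESTs disagree. The sketch below calls such columns using assumed thresholds: the minor allele must appear in at least two reads and at a frequency of at least 10%. A single discordant read is therefore treated as a likely sequencing error. The function name and example columns are hypothetical, not the data for cluster T07189.

```python
from collections import Counter


def call_putative_snp(column: str, min_minor_count: int = 2, min_minor_freq: float = 0.1):
    """Return (major, minor, minor_frequency) for one alignment column, or None.

    `column` holds the base each EST in a cluster shows at one aligned position;
    gaps ('-') and ambiguous calls ('N') are ignored. Thresholds are assumptions.
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    counts = Counter(bases)
    if len(counts) < 2:
        return None
    (major, _), (minor, n_minor) = counts.most_common(2)
    freq = n_minor / len(bases)
    if n_minor >= min_minor_count and freq >= min_minor_freq:
        return major, minor, round(freq, 3)
    return None


print(call_putative_snp("AAAAAAGGG-N"))  # ('A', 'G', 0.333)
print(call_putative_snp("AAAAAAAAAG"))   # None: a single discordant read
```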
SNP Verification
Cluster T07189, position 347
Using Compugen's Transcriptome Technology
- Large-scale collaborations: Pfizer, Novartis
- Co-development of molecules: TNF, chemokine receptors, kinases, GPCRs
- Academic research: UCSF, NYU, TAU
- Database products
- DNA chip design
- Mass-spec analysis
- Gene Ontology
Chip Design on Alternative Splicing
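The slide gives only the topic. One common way to make a chip sensitive to splice variants is to add probes that span exon-exon junctions and keep only the junctions that occur in a single variant; the sketch below shows this under stated assumptions. The function name, the exon sequences, and the probe length of 2 × `half` are hypothetical. Real probe design would also check melting temperature, uniqueness against the genome, and cross-hybridization.

```python
def junction_probes(exons: dict[str, str], isoforms: dict[str, list[str]], half: int = 12) -> dict[str, str]:
    """Map each variant-specific junction probe to the single isoform it identifies.

    A junction probe (hypothetical design) is the last `half` nt of one exon joined
    to the first `half` nt of the next exon in that isoform.
    """
    owners: dict[str, set[str]] = {}
    for name, path in isoforms.items():
        for left, right in zip(path, path[1:]):
            probe = exons[left][-half:] + exons[right][:half]
            owners.setdefault(probe, set()).add(name)
    # Junctions shared by several isoforms cannot tell them apart, so drop them.
    return {probe: next(iter(names)) for probe, names in owners.items() if len(names) == 1}


# Hypothetical exons and splice variants, for illustration only.
exons = {"E1": "AAAACCCC", "E2": "GGGGTTTT", "E3": "CCCCAAAA"}
isoforms = {
    "full": ["E1", "E2", "E3"],
    "skip_E2": ["E1", "E3"],
    "short": ["E1", "E2"],
}
print(junction_probes(exons, isoforms, half=4))
# {'TTTTCCCC': 'full', 'CCCCCCCC': 'skip_E2'}
```

The E1-E2 junction is dropped because both `full` and `short` contain it. Telling those two apart would need another probe, for example one inside E3.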
How many genes are there really?
- Raw data
  - 3,770,969 human sequences
  - 2,061,357 mouse sequences
  - 297,568 rat sequences
- Non-singleton clusters: 120,372 (H), 63,043 (M), 33,396 (R)
  - with splice variants: 26% (H), 32% (M), 23% (R)
- Homology (to SwissProt/TrEMBL, InterPro, other GC proteins): 20% (H, M), 27% (R)
- Total unique proteins: 236,797 (H), 106,119 (M), 32,352 (R)
The Novartis Agreement
- Signed August 2001
- Novartis non-exclusively licensed the LEADS platform and related software, and plans to use it for:
  - In-silico drug target identification and prioritization
  - Genome-wide chip design
- Agreement was signed after a detailed pilot study run in November 2000
  - Discovered novel genes and splice variants using Incyte and Celera data
  - Genes were subsequently verified in the Novartis laboratory
GENCARTA
- Result of LEADS applied to:
  - Public genome information
  - Published mRNA
  - ESTs
- In-house designed interface, Oracle-based infrastructure
- Installed at Kyowa-Hakko, Avalon Pharma, Weizmann Institute, YU
- Version 2.2 out in October 2001
Let's go for the real thing
- Gencarta Demonstration
- OligoLibrary Demonstration
Conclusion: Advantages of the Transcriptome
- Identify new drug targets
- Understand splice variant behavior
- Isolate natural drugs
- Annotate Proteomics experiments
- Design better DNA chips
Solve the real bottlenecks in drug discovery and development