Understanding genes using mathematical tools, Adam Sartiel, COMPUGEN

1
Understanding genes using mathematical tools
Adam Sartiel, COMPUGEN
2
Short History Of Compugen
  • 1993: Founded
  • 1994: First Bioccelerator sold (Merck)
  • 1997: LEADS project initiated
  • 1998: Pfizer collaboration
  • 1999: USPTO agreement; LabOnWeb launched
  • 2000: Launch of Z3; IPO
  • 2001: Gencarta and OligoLibraries launched;
    Novartis collaboration

3
Unique R&D Team
  • Substantial
    • 120 professionals: 32 PhD/MD, 37 M.Sc.
  • Multidisciplinary
    • Algorithm development, Molecular biology,
      Software engineering, Statistics, Physics,
      Chemistry
  • Integrated
    • Synergy between disciplines and feedback

4
Gene analysis using mathematics
  • Drug discovery and Bioinformatics
  • Principles of sequence alignment
  • The EST opportunity and the Transcriptome
  • Applications (Gencarta and DNA chips)

5
Cellular pathways are highly complex
6
The Drug Development Process
7
Some definitions
  • Drug: A protein, lipid, antibody, or small
    organic molecule with a proven effect and an
    approved safety level.
  • Lead: A molecule in development which may one
    day become a drug.
  • Target: A protein (in most cases) whose
    activity a drug lead would affect in order to
    create a desirable effect on the body.
  • Validated target: A target with a proven,
    demonstrated effect on a disease or condition.

8
30,000 GENES?
  • Fewer genes than initially thought?
  • Some complexity due to alternative splicing
  • Gene prediction is problematic
  • Complex genes (interleaved, nested,...) are
    especially difficult to identify
  • Both HGP and Celera tried to minimize false
    positives
  • Conclusion: more genes may be found

Wright et al., Genome Biology 2001, 2(7): There
are 65,000 to 75,000 genes
9
ONE GENE → ONE PROTEIN???
10
Gene identification using sequence comparison
11
Similar sequences, common ancestor...
Understand genes: know your targets
... common ancestor, similar function
12
The genetic code is redundant
13
Proteins see deeper
Unrelated DNA sequences?
Highly related proteins!
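A minimal sketch of the point made on slides 12 and 13: because the genetic code is redundant, two DNA sequences that look only weakly related can encode the same protein. The two sequences below are hypothetical examples made up for illustration, not data from the presentation; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# for the first, second, and third base.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence into one-letter amino acids ('*' = stop)."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical sequences that differ only by synonymous codon choices.
seq_a = "ATGTCTCTTCGTGGT"
seq_b = "ATGAGCTTGAGAGGG"

print("DNA identity:    ", round(identity(seq_a, seq_b), 2))
print("Protein from a:  ", translate(seq_a))
print("Protein from b:  ", translate(seq_b))
```

Comparing at the protein level removes the noise introduced by synonymous codons, which is why protein comparison can detect relationships that DNA comparison misses.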
14
How to align proteins?
MARQGEFPSILK
M-RHGEFP-LLKWC
Running a good alignment algorithm against
databases of 2001 size requires supercomputers
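To make the cost argument concrete, here is a minimal sketch of global alignment by dynamic programming (Needleman-Wunsch), applied to the two peptides shown on the slide. It is not the algorithm used in Compugen's products. The scoring values (match, mismatch, gap) are assumptions chosen for illustration; real protein alignment typically uses a substitution matrix and affine gap penalties.

```python
def global_align(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = global_align("MARQGEFPSILK", "MRHGEFPLLKWC")
print(score)
print(top)
print(bottom)
```

The table has one cell per pair of positions, so the work grows with the product of the two sequence lengths. Repeating that for a query against every entry in a large database is what makes rigorous alignment expensive.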
15
Another direction: find genes by sequence
  • Gene regions have a different nucleotide
    composition than non-coding regions.
  • Introns and exons have distinct sequence properties
  • Splice junctions are clearly detectable

ACGATCGAGCATGCATCATCAGCATCTAGCGATCAGCAGGCATCGAGCAGCTAGCATGCATG
TGCTAGCACGTACGTAGTAGTCGTAGATCGCTAGTCGTCCGTAGCTCGTCGATCGTACGTCAC
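As a toy illustration of composition-based analysis, the sketch below computes GC content in sliding windows along a sequence. It only shows the idea of scanning for regions with different nucleotide composition; actual gene finders rely on much richer statistics. The window and step sizes are arbitrary assumptions.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def sliding_gc(seq: str, window: int = 20, step: int = 10):
    """Return (start, GC fraction) for each full window along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Sequence fragment taken from the slide.
fragment = "ACGATCGAGCATGCATCATCAGCATCTAGCGATCAGCAGGCATCGAGCAG"
for start, gc in sliding_gc(fragment):
    print(start, round(gc, 2))
```

Windows whose composition departs from the background would be candidates for closer inspection, for example by checking for splice junction signals.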
16
One step ahead: the story of the ESTs
Craig Venter
Public domain ESTs (Expressed Sequence Tags) >
5,000,000
17
The ESTs: Rough Diamonds?
  • Short, inaccurate, badly annotated
  • Abundant with repeats, alternative splicing
  • Too many
  • The shredder effect

18
USING ESTS TO GET THE TRANSCRIPTOME
Input: GenBank, a pool of ESTs and mRNAs
Process 1: Clustering
Process 2: Assembly
Output: The transcriptome
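The clustering step can be sketched as follows. This is a deliberately naive illustration, not the LEADS method: two sequences are placed in the same cluster if they share at least one exact k-mer, and clusters are merged with a union-find structure. The function name, the value of k, and the example reads are all hypothetical. The sketch ignores reverse complements, sequencing errors, and repeats, which a real pipeline has to handle.

```python
from collections import defaultdict


def cluster_by_shared_kmers(seqs: dict, k: int = 12) -> list:
    """Group sequence names into clusters linked by at least one shared k-mer."""
    parent = {name: name for name in seqs}

    def find(x):
        # Find the cluster representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    first_seen = {}  # k-mer -> name of the first sequence containing it
    for name, seq in seqs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_seen:
                union(first_seen[kmer], name)
            else:
                first_seen[kmer] = name

    clusters = defaultdict(set)
    for name in seqs:
        clusters[find(name)].add(name)
    return list(clusters.values())


# Hypothetical reads: est1 and est2 overlap, est3 is unrelated.
reads = {
    "est1": "ACGTACGGTTCAGG",
    "est2": "GGTTCAGGTTAACC",
    "est3": "TTTTTTTTTTTTTT",
}
print(cluster_by_shared_kmers(reads, k=6))
```

Assembly would then work within each cluster, merging overlapping members into longer consensus transcripts.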
19
The Transcriptome - Definition
  • The collection of mRNAs present at any given
    moment in a cell or a tissue, and its
    behavior over time and across cell states

20
Introducing the Transcriptome
  • The Genome
    • Index to the range of possible proteins
    • Useful as a map and for inter-organism analysis
  • The Proteome
    • Describes what actually happens in the cell
    • Complex tools, partial results
  • The Transcriptome
    • Golden path: proteome information using DNA
      technology.

21
Transcriptome applications
  • Discovery of new proteins
    • Which are present in specific tissues
    • Which have specific cell locations
    • Which respond to specific cell states
  • Discovery of new variants
    • Of important genes
    • Which work to increase/decrease the activity of
      the native protein.

22
Example: Alternative Splicing, One Gene, Multiple mRNAs
[Figure: pre-mRNA (exons 1 to 6) → alternative splicing in tissue A, tissue B, and other tissues → various mature mRNA transcripts]
23
Alternative Splicing vs. Contiging
Contiging
Assembling
24
Extreme example of alternative splicing
[Figure labels: Genomic; PSA RNA; PSA precursor; Mature PSA; Modified mRNA; LM precursor; Mature LM protein]
25
Is This The Only Example?
[Figure labels: PSA genomic and KLK-2 genomic, each with exons 1 to 5; LM; KLM; stop codon]
26
Validation: Northern Blot
  • Like PSA, LM expression is restricted to prostate
    tissue
  • Multiple bands may reflect conserved regions or
    alternative splicing

27
Example: receptor with DN (Dominant Negative)
28
Natural Antisense: a regulation mechanism?
29
LEADS Antisense Prediction
  • When analyzing EST data for antisense:
    • Use original EST orientation annotation
    • Check splicing signals on both strands
    • Examine library description for enzymes used
    • Mark PolyA signals and PolyA tails (compare to
      genomic PolyA)
    • Take into account NotI sites
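One of the checks listed above, looking at splicing signals on both strands, can be sketched as follows. This is an illustration only and not the LEADS implementation: it tests a putative intron for the canonical GT...AG boundary on the forward strand and on the reverse complement. The function names, the coordinate convention (0-based, end exclusive), and the example sequences are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def intron_strand(genomic: str, start: int, end: int):
    """Infer the transcript strand from a putative intron genomic[start:end].

    Returns '+' if the intron shows GT...AG on the forward strand,
    '-' if it shows GT...AG on the reverse strand, and None otherwise.
    If both strands match, '+' is returned.
    """
    intron = genomic[start:end].upper()
    if len(intron) < 4:
        return None
    if intron.startswith("GT") and intron.endswith("AG"):
        return "+"
    flipped = reverse_complement(intron)
    if flipped.startswith("GT") and flipped.endswith("AG"):
        return "-"
    return None


# Hypothetical genomic fragments with an intron at positions 3 to 11.
print(intron_strand("AAAGTCCCCAGTTT", 3, 11))  # forward-strand signal
print(intron_strand("AAACTGGGGACTTT", 3, 11))  # reverse-strand signal
```

An EST whose annotated orientation disagrees with the strand implied by its splice signals is one hint that either the annotation is wrong or the transcript is a genuine antisense candidate, which is why the other checks in the list are applied alongside it.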

30
Example: A Putative SNP
Cluster T07189, Position 347
31
SNP Verification
Cluster T07189, Position 347
32
Using Compugen's Transcriptome Technology
  • Large-scale collaborations: Pfizer, Novartis
  • Co-development of molecules: TNF, Chemokine
    receptors, kinases, GPCRs
  • Academic research: UCSF, NYU, TAU.
  • Database products
  • DNA chip design
  • Mass-spec analysis
  • Gene Ontology

33
Chip Design on Alternative Splicing
34
How many genes are there really?
  • Raw data
    • 3,770,969 human sequences
    • 2,061,357 mouse sequences
    • 297,568 rat sequences
  • Non-singleton clusters: 120,372 (H), 63,043 (M),
    33,396 (R)
    • With splice variants: 26% (H), 32% (M), 23% (R)
    • Homology (to SwissProt/TrEMBL, InterPro, other GC
      proteins): 20% (H, M), 27% (R).
  • Total unique proteins: 236,797 (H), 106,119 (M),
    32,352 (R)

35
The Novartis Agreement
  • Signed August 2001
  • Novartis non-exclusively licensed the LEADS
    platform and related software, and plans to use
    it for:
    • In-silico drug target identification and
      prioritization
    • Genome-wide chip design
  • Agreement was signed after a detailed pilot study
    run in November 2000
    • Discovered novel genes and splice variants using
      Incyte and Celera data
    • Genes were subsequently verified in Novartis
      laboratory.

36
GENCARTA
  • Result of LEADS applied to:
    • Public genome information
    • Published mRNA
    • ESTs
  • In-house designed interface, Oracle-based
    infrastructure.
  • Installed at: Kyowa-Hakko, Avalon Pharma, Weizmann
    Institute, YU
  • Version 2.2 out in October 2001.

37
Let's go for the real thing
  • Gencarta Demonstration
  • OligoLibrary Demonstration

38
Conclusion: Advantages of the Transcriptome
  • Identify new drug targets
  • Understand splice variant behavior
  • Isolate natural drugs
  • Annotate Proteomics experiments
  • Design better DNA chips

Solve the real bottlenecks in drug discovery and
development
39
Understanding genes using mathematical tools
Adam Sartiel, COMPUGEN